73 changes: 73 additions & 0 deletions examples/ai-test-agents/hercules/README.md
@@ -0,0 +1,73 @@
# Hercules AI Test Agent for PyAirbyte

This example demonstrates how to use [Hercules](https://github.com/test-zeus-ai/testzeus-hercules), an open-source AI testing agent, to test PyAirbyte functionality.

## About Hercules

Hercules describes itself as the world's first open-source testing agent. It takes test scenarios written in Gherkin and can run UI, API, and other kinds of tests without hand-written test scripts.

## Prerequisites

1. Python 3.11 or higher
2. Hercules installed in a separate virtual environment (due to dependency conflicts with PyAirbyte)
3. OpenAI API key or other LLM provider credentials

## Installation

Since Hercules has a dependency conflict with PyAirbyte's airbyte-cdk (psutil version), it should be installed in a separate virtual environment:

```bash
# Create a separate virtual environment for Hercules
python -m venv hercules-env
source hercules-env/bin/activate # On Windows: hercules-env\Scripts\activate

# Install Hercules
pip install testzeus-hercules

# Install Playwright (required by Hercules)
playwright install --with-deps
```

## Usage

1. Set your LLM API key:
```bash
export OPENAI_API_KEY="your-api-key-here"
```

2. Run Hercules with the test feature file:
```bash
testzeus-hercules --input-file test_pyairbyte.feature \
    --output-path ./output \
    --test-data-path ./test_data \
    --llm-model gpt-4o \
    --llm-model-api-key "$OPENAI_API_KEY"
```
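
If the run is started from a script rather than an interactive shell, the same command can be assembled in Python and executed through the Hercules virtual environment's own executable. This is a minimal sketch under stated assumptions: the `hercules-env` directory name comes from the installation step, the `bin/` layout is POSIX-specific, and `build_hercules_command` and `run_hercules` are hypothetical helpers, not part of Hercules.

```python
import os
import subprocess
from pathlib import Path


def build_hercules_command(
    feature_file: str,
    api_key: str,
    venv_dir: str = "hercules-env",  # assumption: venv name from the installation step
    model: str = "gpt-4o",
) -> list[str]:
    """Assemble the CLI call from the usage step as an argument list."""
    # Assumption: POSIX venv layout; on Windows the executable lives under Scripts\.
    executable = Path(venv_dir) / "bin" / "testzeus-hercules"
    return [
        str(executable),
        "--input-file", feature_file,
        "--output-path", "./output",
        "--test-data-path", "./test_data",
        "--llm-model", model,
        "--llm-model-api-key", api_key,
    ]


def run_hercules(feature_file: str = "test_pyairbyte.feature") -> None:
    """Run Hercules, reading the API key from the environment."""
    command = build_hercules_command(feature_file, os.environ["OPENAI_API_KEY"])
    # check=True raises CalledProcessError if the process exits with a non-zero status.
    subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with the API key, and calling the executable by path means the calling script does not need the Hercules environment activated.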

## Test Scenario

The included `test_pyairbyte.feature` file contains a simple Gherkin scenario that tests basic PyAirbyte functionality:

- Installing PyAirbyte
- Creating a source connector
- Reading data from the source
- Validating the data
Comment on lines +47 to +54

🛠️ Refactor suggestion | 🟠 Major

Update to reflect both test scenarios.

The README says "Test Scenario" (singular) but the feature file actually contains TWO scenarios:

  1. Test PyAirbyte source connector with faker (lines 3-36)
  2. Test PyAirbyte connector discovery (lines 38-61)

Would you mind updating this section to mention both scenarios? Something like:

## Test Scenarios

The included `test_pyairbyte.feature` file contains two Gherkin scenarios:

1. **Basic functionality test**: Tests installing PyAirbyte, creating a source connector, reading data, and validating the results
2. **Connector discovery test**: Tests querying the registry to discover available source connectors

Both scenarios validate core PyAirbyte functionality.

WDYT?

🤖 Prompt for AI Agents
In examples/ai-test-agents/hercules/README.md around lines 47 to 54, the "Test
Scenario" section incorrectly refers to a single scenario though the feature
file contains two; update the heading and body to plural "Test Scenarios" and
describe both scenarios: (1) the basic functionality test — installing
PyAirbyte, creating a source connector (faker), reading data and validating
results, and (2) the connector discovery test — querying the registry to
discover available source connectors; end with a brief line that both scenarios
validate core PyAirbyte functionality.


## Output

Hercules will generate:
- JUnit XML test results in `./output/`
- HTML test report in `./output/`
- Execution proofs (screenshots, videos, network logs) in `./proofs/`
- Detailed logs in `./log_files/`
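
The JUnit XML results can be summarized with the standard library, for example to fail a CI step when any scenario fails. This is a sketch only: it assumes the reports follow the common JUnit layout (a `<testsuites>` root or a single `<testsuite>` root with `tests`, `failures`, `errors`, and `skipped` attributes) and that they are the `.xml` files in `./output/`. The exact file names Hercules writes are not stated here, and `summarize_junit` and `summarize_output_dir` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

COUNTERS = ("tests", "failures", "errors", "skipped")


def summarize_junit(xml_text: str) -> dict[str, int]:
    """Sum the per-suite counters of one JUnit XML report."""
    root = ET.fromstring(xml_text)
    # The root is either a single <testsuite> or a <testsuites> wrapper.
    suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
    totals = dict.fromkeys(COUNTERS, 0)
    for suite in suites:
        for key in COUNTERS:
            totals[key] += int(suite.get(key, "0"))
    return totals


def summarize_output_dir(output_dir: str = "./output") -> dict[str, int]:
    """Aggregate all *.xml reports in the output directory (assumed location)."""
    totals = dict.fromkeys(COUNTERS, 0)
    for report in sorted(Path(output_dir).glob("*.xml")):
        for key, value in summarize_junit(report.read_text(encoding="utf-8")).items():
            totals[key] += value
    return totals
```

A CI script could call `summarize_output_dir()` and exit with a non-zero status when `failures` or `errors` is greater than zero.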

## Limitations

Due to dependency conflicts (specifically psutil version requirements), Hercules cannot be installed in the same environment as PyAirbyte. This example demonstrates using Hercules in a separate environment to test PyAirbyte functionality.

## Alternative Approach

For integrated testing, consider:
1. Using Docker to run Hercules in an isolated container
2. Creating a CI/CD pipeline that runs Hercules tests separately
3. Using Hercules to test PyAirbyte's CLI or API endpoints rather than importing it directly
5 changes: 5 additions & 0 deletions examples/ai-test-agents/hercules/requirements.txt
@@ -0,0 +1,5 @@
# Requirements for running Hercules AI test agent
# Note: Install these in a separate virtual environment due to dependency conflicts with PyAirbyte

testzeus-hercules>=0.2.0
playwright>=1.40.0
61 changes: 61 additions & 0 deletions examples/ai-test-agents/hercules/test_pyairbyte.feature
@@ -0,0 +1,61 @@
Feature: PyAirbyte Basic Functionality Test

Scenario: Test PyAirbyte source connector with faker
Given I have Python installed on my system
When I install PyAirbyte using "pip install airbyte"
And I create a Python script with the following code:
"""
import airbyte as ab

# Create a source connector
source = ab.get_source(
"source-faker",
config={"count": 10},
install_if_missing=True
)

# Check the connection
source.check()

# Read data into a local cache
cache = ab.new_local_cache()
result = source.read(cache)

# Get data from a stream
df = cache["users"].to_pandas()

# Validate we got data
assert len(df) > 0, "No data was read from source"
assert "id" in df.columns, "Expected 'id' column not found"

print(f"Successfully read {len(df)} records from source-faker")
"""
And I run the Python script
Then the script should execute successfully
And I should see output containing "Successfully read"
And I should see output containing "records from source-faker"

Scenario: Test PyAirbyte connector discovery
Given I have PyAirbyte installed
When I create a Python script to discover available connectors:
"""
import airbyte as ab
from airbyte.registry import get_available_connectors

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify the registry import path.

Is `from airbyte.registry import get_available_connectors` the correct import path? Would be good to verify this is the actual module structure in PyAirbyte. WDYT?


🏁 Script executed:

```shell
#!/bin/bash
# Search for registry module and get_available_connectors function
ast-grep --pattern $'def get_available_connectors($$$) {
  $$$
}'
```

Length of output: 0

Let me try different approaches to verify the registry import:

---

🏁 Script executed:

```shell
#!/bin/bash
# Search for get_available_connectors function definition using rg
rg -nP 'def\s+get_available_connectors' --type=py
```

Length of output: 113

🏁 Script executed:

```shell
#!/bin/bash
# Find files related to registry module
fd -e py '.*registry.*' | head -20
```

Length of output: 185

🏁 Script executed:

```shell
#!/bin/bash
# Search for imports of get_available_connectors to see how it's used
rg -nP 'from\s+airbyte\.registry\s+import|import.*get_available_connectors' --type=py
```

Length of output: 923

🏁 Script executed:

```shell
#!/bin/bash
# Look for airbyte module structure
fd -type d -name 'airbyte' | head -5
```

Length of output: 170

Let me check the actual test file to see the full context:

---

🏁 Script executed:

```shell
#!/bin/bash
# Get the actual content of the test file around lines 43-52
cat examples/ai-test-agents/hercules/test_pyairbyte.feature | sed -n '38,61p' | cat -n
```

Length of output: 1153

🏁 Script executed:

```shell
#!/bin/bash
# Search for connector_type parameter usage in the codebase
rg -nP 'connector_type\s*=' --type=py -A2 -B2
```

Length of output: 3502

🏁 Script executed:

```shell
#!/bin/bash
# Search for how get_available_connectors is called in actual code
rg -nP 'get_available_connectors\(' --type=py -A2 -B1 | head -50
```

Length of output: 3141



The import is correct, but the test script has critical bugs in how it uses the function.

The import path `from airbyte.registry import get_available_connectors` is valid and properly exported. However, the test script has two issues:

  1. Line 46: The parameter should be `install_type`, not `connector_type`. Usage in the codebase shows `get_available_connectors(install_type="docker")` or `install_type="yaml"`, not `connector_type="source"`.

  2. Line 52: `get_available_connectors()` returns `list[str]` (connector names as strings), so accessing `c.name` on each item raises `AttributeError`. The check should compare the strings directly, for example `any(c == "source-faker" for c in sources)`.

WDYT on addressing these two issues in the test script?

🤖 Prompt for AI Agents
In examples/ai-test-agents/hercules/test_pyairbyte.feature around lines 43–52,
the test misuses get_available_connectors: change the keyword arg from
connector_type to install_type (e.g.,
get_available_connectors(install_type="docker") or "yaml") and update the loop
to treat returned items as strings (e.g., check equality against "source-faker"
rather than accessing c.name). Ensure any assertions or membership checks use
string comparisons or contains checks on the list of connector names.
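
A corrected version of the discovery script could look like the sketch below. It relies only on what the review established: the import path `airbyte.registry.get_available_connectors`, the `install_type` keyword, and a `list[str]` return value. Two assumptions are mine: `"python"` is chosen as the install type, and source connectors are identified by the `source-` name prefix. The helper names `load_connector_names` and `report_sources` are hypothetical.

```python
from __future__ import annotations


def load_connector_names(install_type: str = "python") -> list[str]:
    """Fetch connector names from the PyAirbyte registry."""
    # Imported lazily so report_sources() can be used without PyAirbyte installed.
    from airbyte.registry import get_available_connectors

    return list(get_available_connectors(install_type=install_type))


def report_sources(names: list[str], required: str = "source-faker") -> list[str]:
    """Validate the names and return the lines the Gherkin steps look for."""
    # Assumption: source connectors are named with a "source-" prefix.
    sources = [name for name in names if name.startswith("source-")]
    assert sources, "No source connectors found"
    # The registry returns plain strings, so membership is a string comparison.
    assert required in sources, f"{required} not found in available connectors"
    return [f"Found {len(sources)} source connectors", f"{required} is available"]
```

Inside the feature file, the script body would then reduce to `print("\n".join(report_sources(load_connector_names())))`, which keeps the output phrases that the `Then` steps match on.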


# Get list of available source connectors
sources = get_available_connectors(connector_type="source")

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify the get_available_connectors API.

Could you verify that get_available_connectors(connector_type="source") is the correct function signature? Want to make sure the parameter name is connector_type and not something like type or kind. WDYT?


🏁 Script executed:

```shell
#!/bin/bash
# Find the get_available_connectors function definition
rg -n "def get_available_connectors" --type=py -A 10
```

Length of output: 850


I found the issue! The function definition shows `install_type` as the parameter, but the code is calling it with `connector_type="source"`. The API documentation confirms that `get_available_connectors` takes an `install_type` parameter, not `connector_type`, and that it accepts values like `InstallType.PYTHON`, `InstallType.JAVA`, `InstallType.DOCKER`, and `InstallType.YAML`.

Fix the parameter name in get_available_connectors() call.

The function signature uses install_type, not connector_type. Additionally, "source" is not a valid value for install_type—valid options are "python", "docker", "yaml", "java", or None based on the installation method, not connector type. If you need to filter for source connectors specifically, the current API doesn't support that parameter. What's your intention here?

🤖 Prompt for AI Agents
In examples/ai-test-agents/hercules/test_pyairbyte.feature around line 46, the
call uses connector_type="source" but the function signature expects
install_type; change the argument name to install_type and pass one of the valid
install type values ("python", "docker", "yaml", "java") or None as appropriate
(e.g., install_type="python"); if your intent was to filter by connector
category ("source"), remove this unsupported parameter and instead call
get_available_connectors() without install_type and filter the returned
connector names afterwards (the function returns strings, so filter on the
name, e.g. names beginning with "source-").
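
To catch this kind of mix-up before the registry is queried, the argument can be checked against the install types listed in this review. The sketch below is illustrative only: `VALID_INSTALL_TYPES` mirrors the values quoted above rather than importing PyAirbyte's own `InstallType`, and `normalize_install_type` is a hypothetical helper.

```python
from __future__ import annotations

# Values quoted in the review comment above; not imported from PyAirbyte.
VALID_INSTALL_TYPES = ("python", "docker", "yaml", "java")


def normalize_install_type(value: str | None) -> str | None:
    """Return a lowercase install type, or None to leave the choice to the library."""
    if value is None:
        return None
    lowered = value.lower()
    if lowered not in VALID_INSTALL_TYPES:
        raise ValueError(
            f"{value!r} is not an install type; expected one of {VALID_INSTALL_TYPES} or None"
        )
    return lowered
```

With this check in place, `normalize_install_type("source")` raises a `ValueError` that names the accepted values, which makes it clear that connector category and install type are different things.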


# Validate we have connectors
assert len(sources) > 0, "No source connectors found"

# Check that source-faker is available
faker_found = any(c.name == "source-faker" for c in sources)
assert faker_found, "source-faker not found in available connectors"

print(f"Found {len(sources)} source connectors")
print("source-faker is available")
"""
And I run the Python script
Then the script should execute successfully
And I should see output containing "source connectors"
And I should see output containing "source-faker is available"