feat: Add unprivileged and config-free discover for declarative static schemas #559
…cSchemaLoader Co-Authored-By: Aaron <AJ> Steers <[email protected]>
…ation control Co-Authored-By: Aaron <AJ> Steers <[email protected]>
…class Co-Authored-By: Aaron <AJ> Steers <[email protected]>
…ver property Co-Authored-By: Aaron <AJ> Steers <[email protected]>
…discover-for-declarative-static-schemas
📝 Walkthrough
The changes introduce a mechanism to allow sources with dynamic schema loaders to skip configuration validation during the discovery phase. This is achieved by adding a new flag, `check_config_during_discover`.
Sequence Diagram(s)
sequenceDiagram
participant User
participant CLI/Entrypoint
participant Source
participant ManifestDeclarativeSource
User->>CLI/Entrypoint: Run "discover" (with or without --config)
CLI/Entrypoint->>Source: Check check_config_during_discover
alt check_config_during_discover is False and no config
CLI/Entrypoint->>Source: Call discover with empty config
Source->>ManifestDeclarativeSource: (if applicable) Use dynamic schema loader
ManifestDeclarativeSource-->>Source: Provide catalog
Source-->>CLI/Entrypoint: Return discovery messages
else check_config_during_discover is True or config provided
CLI/Entrypoint->>Source: Validate config
Source->>Source: Perform discovery
Source-->>CLI/Entrypoint: Return discovery messages
end
CLI/Entrypoint-->>User: Output discovery results or error
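The two branches of the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the CDK's entrypoint code: `SketchSource`, `StaticSchemaSource`, `validate_config`, and `run_discover` are hypothetical names, and only the flag name `check_config_during_discover` comes from the diagram itself.

```python
from typing import Any, Mapping, Optional


class SketchSource:
    """Hypothetical stand-in for a connector; not the CDK's Source class."""

    # Flag name taken from the diagram; the default of True is an assumption.
    check_config_during_discover: bool = True

    def discover(self, config: Mapping[str, Any]) -> list[str]:
        return ["catalog"]


class StaticSchemaSource(SketchSource):
    """Declares that discover can run without a validated config."""

    check_config_during_discover = False


def validate_config(config: Optional[Mapping[str, Any]]) -> Mapping[str, Any]:
    """Placeholder validation: only checks that a config was supplied."""
    if config is None:
        raise ValueError("discover requires --config for this source")
    return config


def run_discover(source: SketchSource, config: Optional[Mapping[str, Any]]) -> list[str]:
    # First branch of the diagram: the flag is False and no config was passed.
    if not source.check_config_during_discover and config is None:
        return source.discover({})
    # Second branch: the flag is True or a config was provided, so validate first.
    return source.discover(validate_config(config))
```

In this sketch a source that keeps the default flag still fails fast when no config is passed, which matches the "Output discovery results or error" step at the end of the diagram.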
Would you like to consider adding a note in the documentation or CLI help output to clarify when config is optional for discovery, wdyt?
Actionable comments posted: 1
🧹 Nitpick comments (4)
unit_tests/sources/declarative/test_manifest_declarative_source_dynamic_schema.py (3)
7-14: Trim unused imports to avoid linter noise, wdyt?
`pytest` and `check_config_against_spec_or_exit` are imported but never referenced in this module. Removing them keeps the test file lean and prevents future `flake8/ruff` warnings.
-from unittest.mock import MagicMock, patch
-
-import pytest
-
-from airbyte_cdk.models import AirbyteCatalog
-from airbyte_cdk.sources.declarative.manifest_declarative_source import ManifestDeclarativeSource
-from airbyte_cdk.sources.utils.schema_helpers import check_config_against_spec_or_exit
+from unittest.mock import MagicMock, patch
+
+from airbyte_cdk.models import AirbyteCatalog
+from airbyte_cdk.sources.declarative.manifest_declarative_source import ManifestDeclarativeSource
16-43: Collapse almost-identical configs with `@pytest.mark.parametrize`?
Both tests build large, nearly identical `source_config` dicts that differ only by the `schema_loader` type and the expected boolean flag. Switching to a single parametrized test would reduce duplication and make the intent clearer:
@pytest.mark.parametrize(
    "schema_loader, expected_flag",
    [
        ({"type": "DynamicSchemaLoader", ...}, True),
        ({"type": "InlineSchemaLoader", "schema": {}}, False),
    ],
)
def test_check_config_during_discover(schema_loader, expected_flag):
    source_config = {... "schema_loader": schema_loader, ...}
    source = ManifestDeclarativeSource(source_config=source_config)
    assert source.check_config_during_discover is expected_flag
    assert source.check_config_against_spec is True
This keeps the focus on the behavior being exercised rather than on the boilerplate fixture, wdyt?
Also applies to: 52-68
84-87: Simplify mock stream name assignment for clarity
Setting the name via `type(mock_airbyte_stream).name = "test_dynamic_stream"` works but is slightly cryptic. Assigning the attribute directly on the instance is more obvious:
-mock_airbyte_stream = MagicMock()
-type(mock_airbyte_stream).name = "test_dynamic_stream"
+mock_airbyte_stream = MagicMock()
+mock_airbyte_stream.name = "test_dynamic_stream"
Unless there’s a strict need for the attribute to live on the class, the direct instance assignment is easier to read, wdyt?
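A short check with only `unittest.mock` shows why assigning `name` after construction is safe, and why `name` sometimes needs special care at all. The variable names `labelled` and `name_is_string` are illustrative only.

```python
from unittest.mock import MagicMock

# Passing name= to the constructor only labels the mock for its repr;
# the .name attribute is then an auto-created child mock, not the string.
labelled = MagicMock(name="test_dynamic_stream")
name_is_string = isinstance(labelled.name, str)

# Assigning after construction stores a plain string on the instance.
mock_airbyte_stream = MagicMock()
mock_airbyte_stream.name = "test_dynamic_stream"
```

The constructor keyword is the only case that needs a workaround; plain attribute assignment on the instance behaves like any other attribute.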
airbyte_cdk/entrypoint.py (1)
106-110: CLI UX: optional flag still listed under “required” group
`--config` is no longer required for `discover`, but it’s still added to the “required named arguments” group. This can mislead users reading `--help`. Would moving it to the parent parser (or renaming the group) be clearer?
-required_discover_parser = discover_parser.add_argument_group("required named arguments")
-required_discover_parser.add_argument(
+discover_parser.add_argument(
     "--config", type=str, required=False, help="path to the json configuration file"
 )
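As a standalone illustration of the suggestion, the sketch below registers `--config` directly on a `discover` subparser with `argparse`. The program name `source-example` and the `build_parser` helper are hypothetical; the real entrypoint defines more commands and arguments.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="source-example")  # hypothetical program name
    subparsers = parser.add_subparsers(dest="command", required=True)
    discover_parser = subparsers.add_parser("discover")
    # Added directly on the subparser, so --help shows it in argparse's
    # default options group instead of a custom "required" group.
    discover_parser.add_argument(
        "--config", type=str, required=False, help="path to the json configuration file"
    )
    return parser


parser = build_parser()
without_config = parser.parse_args(["discover"])
with_config = parser.parse_args(["discover", "--config", "secrets/config.json"])
```

When the flag is omitted, `without_config.config` is `None`, so the caller still has to decide what an absent config means for discovery.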
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting
📒 Files selected for processing (6)
- airbyte_cdk/cli/source_declarative_manifest/_run.py (1 hunks)
- airbyte_cdk/connector.py (1 hunks)
- airbyte_cdk/entrypoint.py (3 hunks)
- airbyte_cdk/sources/declarative/manifest_declarative_source.py (3 hunks)
- unit_tests/sources/declarative/test_manifest_declarative_source_dynamic_schema.py (1 hunks)
- unit_tests/test_entrypoint.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (6)
airbyte_cdk/connector.py (1)
35-42: Documentation for config validation control flags is clear and well-placed.
The docstrings for both `check_config_against_spec` and `check_config_during_discover` clearly explain their purpose and behavior. The new flag allows sources to provide catalog information without requiring authentication, which is a useful extension.
airbyte_cdk/cli/source_declarative_manifest/_run.py (1)
238-239: Good change to avoid None for config.
This change ensures that `config` is always a dictionary (possibly empty) rather than `None` when no valid config argument is provided, which makes downstream processing more consistent. This supports the new unprivileged discovery flow.
unit_tests/test_entrypoint.py (1)
246-246: Test updated correctly to reflect optional config for discover.
The test definition has been properly updated to reflect the fact that the "discover" command no longer requires a config parameter. This aligns with the changes in the entrypoint implementation.
airbyte_cdk/sources/declarative/manifest_declarative_source.py (2)
143-144: Good pattern for setting check_config_during_discover based on schema loader type.
Setting this flag based on the presence of a `DynamicSchemaLoader` is a clean approach that automatically enables the appropriate behavior without requiring manual configuration for each source.
549-579: Well-implemented schema loader detection method.
The `_uses_dynamic_schema_loader` method is thorough, checking both static streams and dynamic stream templates. The documentation is clear about its purpose and return value.
I particularly appreciate the detailed checks that handle nested configurations and the various ways streams can be defined in the manifest.
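The kind of check described here can be sketched as a plain function over a manifest dictionary. This is not the PR's `_uses_dynamic_schema_loader` implementation: the manifest keys `streams`, `dynamic_streams`, and `stream_template` are assumptions about the layout, and only the loader type names come from the review text.

```python
from typing import Any, Mapping


def uses_dynamic_schema_loader(manifest: Mapping[str, Any]) -> bool:
    """Return True if any stream, or any dynamic stream template, declares a DynamicSchemaLoader."""

    def is_dynamic(stream: Mapping[str, Any]) -> bool:
        loader = stream.get("schema_loader")
        return isinstance(loader, Mapping) and loader.get("type") == "DynamicSchemaLoader"

    # Static streams listed directly in the manifest (assumed key: "streams").
    if any(is_dynamic(stream) for stream in manifest.get("streams", [])):
        return True
    # Assumed layout: each dynamic stream carries its template under "stream_template".
    return any(
        is_dynamic(dynamic.get("stream_template", {}))
        for dynamic in manifest.get("dynamic_streams", [])
    )


static_manifest = {"streams": [{"schema_loader": {"type": "InlineSchemaLoader", "schema": {}}}]}
dynamic_manifest = {
    "dynamic_streams": [{"stream_template": {"schema_loader": {"type": "DynamicSchemaLoader"}}}]
}
```

A source built from `static_manifest` would be a candidate for config-free discovery under this sketch, while one built from `dynamic_manifest` would still need a config.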
airbyte_cdk/entrypoint.py (1)
278-282: Double-check backwards compatibility of the new gate
`discover()` now validates config only when `check_config_during_discover` is `True`. For connectors that never define the flag (older versions), the default value on `BaseConnector` will govern behavior. Could you confirm that default is `True` to preserve existing semantics? If not, adding an explicit default here might avoid accidental skips, wdyt?
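The concern can be made concrete with a small sketch of how a default would be picked up. `BaseConnectorSketch`, `LegacyConnector`, `DuckTypedConnector`, and `should_validate` are hypothetical; whether the real `BaseConnector` defaults to `True` is exactly what the comment asks the author to confirm.

```python
class BaseConnectorSketch:
    """Hypothetical base class with an assumed default of True."""

    check_config_during_discover: bool = True


class LegacyConnector(BaseConnectorSketch):
    """Never mentions the flag, like a connector written before it existed."""


class DuckTypedConnector:
    """Does not inherit from the base class at all."""


def should_validate(connector: object) -> bool:
    # Falling back to True keeps the pre-flag behavior: always validate.
    return getattr(connector, "check_config_during_discover", True)
```

A class-level default covers subclasses, and the `getattr` fallback covers objects that never inherited the attribute, so neither path skips validation by accident.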
PyTest Results (Fast)
3 698 tests +3, 3 687 ✅ +3, 6m 36s ⏱️ +11s
Results for commit e27697e. ± Comparison against base commit 312f2e1.
This pull request removes 2 and adds 5 tests. Note that renamed tests count towards both.
PyTest Results (Full)
3 701 tests +3, 3 690 ✅ +3, 17m 57s ⏱️ -18s
Results for commit e27697e. ± Comparison against base commit 312f2e1.
This pull request removes 2 and adds 5 tests. Note that renamed tests count towards both.
/poetry-lock
👋 Greetings, Airbyte Team Member!
Here are some helpful tips and reminders for your convenience.
Testing This CDK Version
You can test this version of the CDK using the following:
# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@aj/feat/unprivileged-discover-for-declarative-static-schemas#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch aj/feat/unprivileged-discover-for-declarative-static-schemas
Actionable comments posted: 6
🧹 Nitpick comments (1)
airbyte_cdk/test/standard_tests/connector_base.py (1)
154-194: Well-structured scenario loading with good filtering logic
The method properly loads scenarios from YAML config, filters by categories, excludes IAM role tests, and converts paths to absolute. The implementation is comprehensive and handles edge cases well.
One small suggestion: should we consider making the categories configurable or documenting why only "connection" and "spec" are included, wdyt?
📜 Review details
📒 Files selected for processing (4)
- airbyte_cdk/cli/source_declarative_manifest/_run.py (1 hunks)
- airbyte_cdk/sources/declarative/manifest_declarative_source.py (4 hunks)
- airbyte_cdk/test/standard_tests/connector_base.py (2 hunks)
- airbyte_cdk/test/standard_tests/source_base.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- airbyte_cdk/cli/source_declarative_manifest/_run.py
- airbyte_cdk/sources/declarative/manifest_declarative_source.py
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
Learnt from: pnilan
PR: airbytehq/airbyte-python-cdk#0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the `declarative_component_schema.py` file is auto-generated from `declarative_component_schema.yaml` and should be ignored in the recommended reviewing order.
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/spec.json:9-15
Timestamp: 2024-11-15T00:59:08.154Z
Learning: When code in `airbyte_cdk/cli/source_declarative_manifest/` is being imported from another repository, avoid suggesting modifications to it during the import process.
airbyte_cdk/test/standard_tests/source_base.py (4)
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/spec.json:9-15
Timestamp: 2024-11-15T00:59:08.154Z
Learning: When code in `airbyte_cdk/cli/source_declarative_manifest/` is being imported from another repository, avoid suggesting modifications to it during the import process.
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#174
File: unit_tests/source_declarative_manifest/resources/source_the_guardian_api/components.py:21-29
Timestamp: 2025-01-13T23:39:15.457Z
Learning: The CustomPageIncrement class in unit_tests/source_declarative_manifest/resources/source_the_guardian_api/components.py is imported from another connector definition and should not be modified in this context.
airbyte_cdk/test/standard_tests/connector_base.py (4)
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#174
File: unit_tests/source_declarative_manifest/resources/source_the_guardian_api/components.py:21-29
Timestamp: 2025-01-13T23:39:15.457Z
Learning: The CustomPageIncrement class in unit_tests/source_declarative_manifest/resources/source_the_guardian_api/components.py is imported from another connector definition and should not be modified in this context.
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#90
File: Dockerfile:16-21
Timestamp: 2024-12-02T18:36:04.346Z
Learning: Copying files from `site-packages` in the Dockerfile maintains compatibility with both the old file structure that manifest-only connectors expect and the new package-based structure where SDM is part of the CDK.
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
🪛 GitHub Actions: Linters
airbyte_cdk/test/standard_tests/source_base.py
[error] 114-114: Missing named argument "connector_root" for "run_test_job" [call-arg]
[error] 119-119: "ConnectorTestScenario" has no attribute "expect_exception" [attr-defined]
[error] 150-150: "ConnectorTestScenario" has no attribute "expect_exception" [attr-defined]
[error] 164-164: Missing named argument "connector_root" for "get_config_dict" of "ConnectorTestScenario" [call-arg]
airbyte_cdk/test/standard_tests/connector_base.py
[error] 125-125: Item "None" of "list[AirbyteMessage] | None" has no attribute "iter" (not iterable) [union-attr]
[error] 134-134: "ConnectorTestScenario" has no attribute "expect_exception" [attr-defined]
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-shopify
- GitHub Check: SDM Docker Image Build
- GitHub Check: Analyze (python)
🔇 Additional comments (4)
airbyte_cdk/test/standard_tests/source_base.py (1)
66-69: LGTM! Clear skip logic for expected exceptions
The skip logic with a descriptive message improves test clarity when dealing with scenarios that expect exceptions.
airbyte_cdk/test/standard_tests/connector_base.py (3)
11-15: LGTM! Clean import additions
The import changes properly add the necessary dependencies for YAML parsing and connection status handling.
140-143: LGTM! Clean implementation
The method properly delegates to the utility function with appropriate search paths.
145-153: LGTM! Clear error handling
The property provides a clean interface with appropriate error messaging when the config file is not found.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
- Fixed TypeError in ManifestDeclarativeSource._stream_configs() missing required positional argument 'config'
- Added empty_config parameter when calling _stream_configs during schema detection
- This enables config-free discover to work for static schema connectors
Manual testing shows this fixes the immediate TypeError, but additional work is needed for datetime parsing issues.
Co-Authored-By: AJ Steers <[email protected]>
- Fix ConnectorTestScenario.connector_root attribute error in source_base.py
- Update manual test plan with complete CLI testing results
- Document successful config-free discover for source-pokeapi (static schema)
- Confirm dynamic schema connectors correctly require config as expected
- All local quality checks (MyPy, ruff format, ruff check) pass
Key findings:
- PR #559 core functionality is working for static schema connectors
- source-pokeapi successfully returns catalog without config
- source-datascope still has datetime parsing issues (separate fix needed)
- Dynamic schema connectors correctly fail without config as expected
Co-Authored-By: AJ Steers <[email protected]>
- Add blank line after AssertionError in connector_base.py
- Resolves ruff format check CI failure
Co-Authored-By: AJ Steers <[email protected]>
- Convert relative paths to absolute paths before creating frozen ConnectorTestScenario models
- Fixes PyTest failures in CI by preventing attempts to modify frozen Pydantic instances
- Local tests now pass: 7 passed, 1 skipped
Co-Authored-By: AJ Steers <[email protected]>
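The last commit's idea, resolving paths before building an immutable model, can be illustrated with the standard library alone. The sketch swaps in a frozen `dataclass` for the frozen Pydantic model, so it shows the principle rather than the actual `ConnectorTestScenario` code; `ScenarioSketch`, `load_scenario`, and the example paths are hypothetical.

```python
from dataclasses import FrozenInstanceError, dataclass
from pathlib import Path
from typing import Any, Mapping


@dataclass(frozen=True)
class ScenarioSketch:
    """Hypothetical stand-in for ConnectorTestScenario; a frozen dataclass, not a Pydantic model."""

    config_path: Path


def load_scenario(raw: Mapping[str, Any], connector_root: Path) -> ScenarioSketch:
    path = Path(raw["config_path"])
    # Resolve against the connector root before construction, because the
    # instance cannot be modified afterwards.
    if not path.is_absolute():
        path = connector_root / path
    return ScenarioSketch(config_path=path)


scenario = load_scenario({"config_path": "secrets/config.json"}, Path("/connectors/source-example"))

try:
    scenario.config_path = Path("elsewhere")  # type: ignore[misc]
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True
```

Doing the conversion on the raw dictionary keeps the model immutable for every later consumer, which is the property the failing tests were tripping over.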
Actionable comments posted: 0
♻️ Duplicate comments (1)
airbyte_cdk/test/standard_tests/connector_base.py (1)
123-138: Good fixes addressing the previous review comments!
The null check for `result._messages` and the correct use of `scenario.expected_outcome.expect_exception()` both address the issues flagged in previous reviews.
However, there's a formatting issue that needs attention, wdyt?
 if (
-    scenario.expected_outcome.expect_exception()
-    and conn_status.status == Status.SUCCEEDED
+    scenario.expected_outcome.expect_exception()
+    and conn_status.status == Status.SUCCEEDED
     and not result.errors
 ):
🧹 Nitpick comments (1)
airbyte_cdk/test/standard_tests/connector_base.py (1)
153-193: Solid implementation of scenario loading with good error handling!
The method properly loads YAML config, validates scenarios, and handles path resolution. The filtering logic for "iam_role" tests and path conversion to absolute paths is well thought out.
One small suggestion: consider making the categories configurable rather than hardcoded, wdyt?
 @classmethod
 def get_scenarios(
     cls,
+    categories: list[str] | None = None,
 ) -> list[ConnectorTestScenario]:
     """Get acceptance tests for a given category."""
-    categories = ["connection", "spec"]
+    if categories is None:
+        categories = ["connection", "spec"]
📜 Review details
⛔ Files ignored due to path filters (1)
- poetry.lock is excluded by `!**/*.lock`
📒 Files selected for processing (2)
- airbyte_cdk/test/standard_tests/connector_base.py (2 hunks)
- airbyte_cdk/test/standard_tests/source_base.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- airbyte_cdk/test/standard_tests/source_base.py
🪛 GitHub Actions: Linters
airbyte_cdk/test/standard_tests/connector_base.py
[error] 136-137: ruff formatting check failed. File requires reformatting as indicated by 'poetry run ruff format --diff'.
🔇 Additional comments (3)
airbyte_cdk/test/standard_tests/connector_base.py (3)
11-15: Import changes look good!
The addition of `yaml` and `AirbyteConnectionStatus` imports align well with the new YAML config loading functionality and explicit type casting in the test methods.
139-142: Clean implementation of connector root directory discovery!
The method properly leverages the existing `find_connector_root` utility and provides sensible search locations.
144-151: Well-implemented class property with proper error handling!
The implementation correctly uses the connector root directory and provides clear error messaging when the config file is missing.
Actionable comments posted: 0
🧹 Nitpick comments (2)
airbyte_cdk/test/standard_tests/connector_base.py (2)
154-194: Consider adding error handling for YAML parsing and path resolution.
The scenario loading logic is comprehensive, but there are a few areas where additional error handling might be helpful:
- YAML parsing could fail with malformed files
- Path resolution for config and catalog paths could fail if the files don't exist
Would you like to add try/except blocks around the YAML loading and path resolution operations, wdyt?
-        all_tests_config = yaml.safe_load(cls.acceptance_test_config_path.read_text())
+        try:
+            all_tests_config = yaml.safe_load(cls.acceptance_test_config_path.read_text())
+        except yaml.YAMLError as e:
+            raise ValueError(f"Failed to parse YAML config at {cls.acceptance_test_config_path}: {e}") from e
Also, should we validate that the resolved paths actually exist before returning the scenarios, wdyt?
179-185: Consider extracting the filtering logic for better maintainability.
The list comprehension with multiple conditions is functional but could be more readable. Would you consider extracting this into a helper method or breaking it down for clarity, wdyt?
-            test_scenarios.extend(
-                [
-                    ConnectorTestScenario.model_validate(test)
-                    for test in all_tests_config["acceptance_tests"][category]["tests"]
-                    if "config_path" in test and "iam_role" not in test["config_path"]
-                ]
-            )
+            for test in all_tests_config["acceptance_tests"][category]["tests"]:
+                if "config_path" not in test:
+                    continue
+                if "iam_role" in test["config_path"]:
+                    # Skip iam_role tests as they are not supported in the test suite
+                    continue
+                test_scenarios.append(ConnectorTestScenario.model_validate(test))
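The "helper method" variant of this suggestion might look like the sketch below, which works on plain dictionaries so it can run without the CDK. `is_supported_test`, `select_tests`, and the sample data are hypothetical, and tolerating missing categories is an assumption of the sketch rather than behavior taken from the PR.

```python
from typing import Any, Mapping


def is_supported_test(test: Mapping[str, Any]) -> bool:
    """Keep tests that have a config_path and do not target iam_role configs."""
    config_path = test.get("config_path")
    return config_path is not None and "iam_role" not in config_path


def select_tests(all_tests_config: Mapping[str, Any], categories: list[str]) -> list[Mapping[str, Any]]:
    selected: list[Mapping[str, Any]] = []
    for category in categories:
        # Tolerate categories that are absent from the config (an assumption of this sketch).
        tests = all_tests_config["acceptance_tests"].get(category, {}).get("tests", [])
        selected.extend(test for test in tests if is_supported_test(test))
    return selected


sample_config = {
    "acceptance_tests": {
        "connection": {
            "tests": [
                {"config_path": "secrets/config.json"},
                {"config_path": "secrets/iam_role_config.json"},
                {"status": "failed"},
            ]
        }
    }
}
kept = select_tests(sample_config, ["connection", "spec"])
```

Naming the predicate keeps the reason for each exclusion in one place, and the model validation step can then be applied to the already-filtered list.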
📜 Review details
📒 Files selected for processing (3)
- airbyte_cdk/sources/declarative/manifest_declarative_source.py (4 hunks)
- airbyte_cdk/test/standard_tests/connector_base.py (2 hunks)
- airbyte_cdk/test/standard_tests/source_base.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- airbyte_cdk/test/standard_tests/source_base.py
- airbyte_cdk/sources/declarative/manifest_declarative_source.py
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: pnilan
PR: airbytehq/airbyte-python-cdk#0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, ignore all `__init__.py` files when providing a recommended reviewing order.
🧬 Code Graph Analysis (1)
airbyte_cdk/test/standard_tests/connector_base.py (5)
airbyte_cdk/test/entrypoint_wrapper.py (2)
- connection_status_messages (167-168)
- errors (200-201)
airbyte_cdk/test/models/scenario.py (2)
- expected_outcome (93-99)
- ConnectorTestScenario (27-179)
airbyte_cdk/test/models/outcome.py (1)
- expect_exception (55-57)
airbyte_cdk/test/standard_tests/docker_base.py (4)
- get_connector_root_dir (53-55)
- get_test_class_dir (46-50)
- acceptance_test_config_path (69-75)
- get_scenarios (78-132)
airbyte_cdk/utils/connector_paths.py (1)
- find_connector_root (180-198)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-shopify
- GitHub Check: SDM Docker Image Build
- GitHub Check: Analyze (python)
🔇 Additional comments (4)
airbyte_cdk/test/standard_tests/connector_base.py (4)
11-15: Import changes look good.
The addition of the `yaml` and `AirbyteConnectionStatus` imports aligns with the new functionality, and the cleanup of unused imports (`AirbyteMessage`, `Type`) is appropriate.
123-139: Great improvements to error handling and validation!
The enhanced error message formatting and explicit connection status validation significantly improve debugging capabilities. I notice the past review comments have been addressed with the null check for `result._messages` and the proper attribute access for `scenario.expected_outcome.expect_exception()`.
140-143: Clean delegation to utility function.
The `get_connector_root_dir` method properly delegates to `find_connector_root` with appropriate search paths. This aligns with the existing pattern in `DockerConnectorTestSuite`.
145-152: Solid implementation of the acceptance test config path property.
The `classproperty` decorator and error handling provide clear feedback when the config file is missing. Good use of the existing `ACCEPTANCE_TEST_CONFIG` constant.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte_cdk/test/entrypoint_wrapper.py
(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
Learnt from: pnilan
PR: airbytehq/airbyte-python-cdk#0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the `declarative_component_schema.py` file is auto-generated from `declarative_component_schema.yaml` and should be ignored in the recommended reviewing order.
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/spec.json:9-15
Timestamp: 2024-11-15T00:59:08.154Z
Learning: When code in `airbyte_cdk/cli/source_declarative_manifest/` is being imported from another repository, avoid suggesting modifications to it during the import process.
Learnt from: pnilan
PR: airbytehq/airbyte-python-cdk#0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, ignore all `__init__.py` files when providing a recommended reviewing order.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Check: source-shopify
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Analyze (python)
@@ -65,6 +65,8 @@ class AirbyteEntrypointException(Exception):
 raise output.as_exception()
 """

message: str
🛠️ Refactor suggestion
Question about the dataclass field addition - could this break the existing constructor pattern?
Adding the `message: str` field to the dataclass looks good for making the structure more explicit, but I'm wondering if this might cause issues with how `as_exception()` creates the exception on line 224?
The method calls `AirbyteEntrypointException(self.get_formatted_error_message())`, which would now need to match the dataclass signature. Should we consider either:
- Making the field optional with a default value: `message: str = ""`
- Or updating the `as_exception()` method to use keyword arguments: `AirbyteEntrypointException(message=self.get_formatted_error_message())`
wdyt? This would ensure the constructor call aligns with the new dataclass structure.
🤖 Prompt for AI Agents
In airbyte_cdk/test/entrypoint_wrapper.py at line 68, adding the message: str
field to the dataclass changes its constructor signature, which may break the
as_exception() method call at line 224 that instantiates
AirbyteEntrypointException without matching the new signature. To fix this,
either make the message field optional by assigning a default value like
message: str = "" or update the as_exception() method to call
AirbyteEntrypointException with a keyword argument, e.g.,
AirbyteEntrypointException(message=self.get_formatted_error_message()), ensuring
the constructor call matches the dataclass definition.
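To make the two options in the suggestion above concrete, here is a minimal sketch of how a `message` field behaves on an exception class. The hunk does not show whether `AirbyteEntrypointException` is actually decorated with `@dataclass`, so both cases are covered. `DataclassError` and `AnnotatedError` are hypothetical stand-ins, not CDK classes.

```python
from dataclasses import dataclass


@dataclass
class DataclassError(Exception):
    """Hypothetical stand-in: an exception declared as a dataclass."""

    message: str = ""


class AnnotatedError(Exception):
    """Hypothetical stand-in: a plain exception with a class-level default."""

    message: str = ""


# Positional and keyword construction both satisfy the generated __init__.
positional = DataclassError("boom")
keyword = DataclassError(message="boom")

# Without @dataclass, the positional argument only lands in Exception.args;
# the `message` attribute keeps its class-level default.
plain = AnnotatedError("boom")

print(positional.message, keyword.message, repr(plain.message))
print(repr(str(positional)), repr(str(keyword)), repr(str(plain)))
```

One design point worth checking before switching to keyword arguments: `str()` of an exception is built from its positional `args`, so a dataclass exception constructed with `message=...` stringifies to an empty string unless `__str__` is overridden.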
What
Introduces new behavior to attempt discovery even if `config` is omitted. Tested successfully with `source-pokeapi`.
Replaces the Devin-created PRs:
Important
`discover` in practice, even though they could complete it in theory. For example, I tested the below sample run with `klaviyo` in place of `pokeapi`, and the Klaviyo source tries to access `api_key` from `config` during initialization of its custom components. The problem lies in the fact that `discover` has to call `streams()`, which tries to fully initialize all `Stream` objects.
TODO
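The failure mode described under Important above can be sketched as follows. This is an illustration only, not the Klaviyo connector's code: `EagerComponent`, `LazyComponent`, and `can_initialize` are hypothetical names. It shows why a component that reads `api_key` in its constructor blocks config-free discovery, while one that defers the lookup does not.

```python
from typing import Any, Mapping


class EagerComponent:
    """Hypothetical custom component that reads its secret during construction."""

    def __init__(self, config: Mapping[str, Any]) -> None:
        # Raises KeyError when the config has no "api_key" entry.
        self.api_key = config["api_key"]


class LazyComponent:
    """Hypothetical variant that defers the lookup until the value is needed."""

    def __init__(self, config: Mapping[str, Any]) -> None:
        self._config = config

    @property
    def api_key(self) -> str:
        return self._config["api_key"]


def can_initialize(component_cls: type, config: Mapping[str, Any]) -> bool:
    """Return True if the component can be constructed with the given config."""
    try:
        component_cls(config)
    except KeyError:
        return False
    return True


empty_config: dict = {}
eager_ok = can_initialize(EagerComponent, empty_config)
lazy_ok = can_initialize(LazyComponent, empty_config)
print(eager_ok, lazy_ok)
```

Under these assumptions, only the eager variant fails when stream initialization runs with an empty config; the lazy variant would fail later, at the point where the key is actually read.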
Future Improvements
Further future improvements could be made which deliver one or more of the following:
- `streams()` method are treated as warnings and not fatal errors.
- `streams()` is used for discovery.
- `discover` would not fail (for instance) but `check` or `read` would fail.
Sample successful run:
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Chores