-
Notifications
You must be signed in to change notification settings - Fork 22
feat: Add unprivileged and config-free discover for declarative static schemas #559
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: Add unprivileged and config-free discover for declarative static schemas #559
Conversation
…cSchemaLoader Co-Authored-By: Aaron <AJ> Steers <[email protected]>
Co-Authored-By: Aaron <AJ> Steers <[email protected]>
Co-Authored-By: Aaron <AJ> Steers <[email protected]>
Co-Authored-By: Aaron <AJ> Steers <[email protected]>
Co-Authored-By: Aaron <AJ> Steers <[email protected]>
Co-Authored-By: Aaron <AJ> Steers <[email protected]>
…ation control Co-Authored-By: Aaron <AJ> Steers <[email protected]>
Co-Authored-By: Aaron <AJ> Steers <[email protected]>
…class Co-Authored-By: Aaron <AJ> Steers <[email protected]>
Co-Authored-By: Aaron <AJ> Steers <[email protected]>
…ver property Co-Authored-By: Aaron <AJ> Steers <[email protected]>
Co-Authored-By: Aaron <AJ> Steers <[email protected]>
…discover-for-declarative-static-schemas
📝 WalkthroughWalkthroughThis update introduces a mechanism for sources to optionally skip configuration validation during the discovery phase, controlled by a new Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant CLI/Entrypoint
participant Source
participant SchemaLoader
User->>CLI/Entrypoint: Run `discover` (with or without --config)
CLI/Entrypoint->>Source: Check check_config_during_discover
alt check_config_during_discover is True
CLI/Entrypoint->>Source: Validate config
Source->>CLI/Entrypoint: Validation result
CLI/Entrypoint->>Source: Call discover(config)
else check_config_during_discover is False
CLI/Entrypoint->>Source: Call discover({}) (empty config)
end
Source->>SchemaLoader: (if stream uses DynamicSchemaLoader) Load schema
SchemaLoader-->>Source: Schema
Source->>CLI/Entrypoint: Discovery messages
CLI/Entrypoint->>User: Output discovery results
Would you like me to create a separate sequence diagram illustrating the dynamic schema loader detection logic inside Note ⚡️ AI Code Reviews for VS Code, Cursor, WindsurfCodeRabbit now has a plugin for VS Code, Cursor and Windsurf. This brings AI code reviews directly in the code editor. Each commit is reviewed immediately, finding bugs before the PR is raised. Seamless context handoff to your AI code agent ensures that you can easily incorporate review feedback. Note ⚡️ Faster reviews with cachingCodeRabbit now supports caching for code and dependencies, helping speed up reviews. This means quicker feedback, reduced wait times, and a smoother review experience overall. Cached data is encrypted and stored securely. This feature will be automatically enabled for all accounts on May 16th. To opt out, configure 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (1)
🧰 Additional context used🧬 Code Graph Analysis (2)airbyte_cdk/test/standard_tests/_job_runner.py (2)
airbyte_cdk/test/standard_tests/connector_base.py (2)
⏰ Context from checks skipped due to timeout of 90000ms (9)
🔇 Additional comments (5)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (4)
unit_tests/sources/declarative/test_manifest_declarative_source_dynamic_schema.py (3)
7-14
: Trim unused imports to avoid linter noise, wdyt?
pytest
andcheck_config_against_spec_or_exit
are imported but never referenced in this module. Removing them keeps the test file lean and prevents futureflake8/ruff
warnings.-from unittest.mock import MagicMock, patch - -import pytest - -from airbyte_cdk.models import AirbyteCatalog -from airbyte_cdk.sources.declarative.manifest_declarative_source import ManifestDeclarativeSource -from airbyte_cdk.sources.utils.schema_helpers import check_config_against_spec_or_exit +from unittest.mock import MagicMock, patch + +from airbyte_cdk.models import AirbyteCatalog +from airbyte_cdk.sources.declarative.manifest_declarative_source import ManifestDeclarativeSource
16-43
: Collapse almost-identical configs with@pytest.mark.parametrize
?Both tests build large, nearly identical
source_config
dicts that differ only by theschema_loader
type and the expected boolean flag. Switching to a single parametrized test would reduce duplication and make the intent clearer:@pytest.mark.parametrize( "schema_loader, expected_flag", [ ({"type": "DynamicSchemaLoader", ...}, True), ({"type": "InlineSchemaLoader", "schema": {}}, False), ], ) def test_check_config_during_discover(schema_loader, expected_flag): source_config = {... "schema_loader": schema_loader, ...} source = ManifestDeclarativeSource(source_config=source_config) assert source.check_config_during_discover is expected_flag assert source.check_config_against_spec is TrueThis keeps the focus on the behavior being exercised rather than on the boilerplate fixture, wdyt?
Also applies to: 52-68
84-87
: Simplify mock stream name assignment for claritySetting the name via
type(mock_airbyte_stream).name = "test_dynamic_stream"
works but is slightly cryptic. Assigning the attribute directly on the instance is more obvious:-mock_airbyte_stream = MagicMock() -type(mock_airbyte_stream).name = "test_dynamic_stream" +mock_airbyte_stream = MagicMock() +mock_airbyte_stream.name = "test_dynamic_stream"Unless there’s a strict need for the attribute to live on the class, the direct instance assignment is easier to read, wdyt?
airbyte_cdk/entrypoint.py (1)
106-110
: CLI UX: optional flag still listed under “required” group
--config
is no longer required fordiscover
, but it’s still added to the “required named arguments” group. This can mislead users reading--help
. Would moving it to the parent parser (or renaming the group) be clearer?-required_discover_parser = discover_parser.add_argument_group("required named arguments") -required_discover_parser.add_argument( +discover_parser.add_argument( "--config", type=str, required=False, help="path to the json configuration file" )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting
📒 Files selected for processing (6)
airbyte_cdk/cli/source_declarative_manifest/_run.py
(1 hunks)airbyte_cdk/connector.py
(1 hunks)airbyte_cdk/entrypoint.py
(3 hunks)airbyte_cdk/sources/declarative/manifest_declarative_source.py
(3 hunks)unit_tests/sources/declarative/test_manifest_declarative_source_dynamic_schema.py
(1 hunks)unit_tests/test_entrypoint.py
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (6)
airbyte_cdk/connector.py (1)
35-42
: Documentation for config validation control flags is clear and well-placed.The docstrings for both
check_config_against_spec
andcheck_config_during_discover
clearly explain their purpose and behavior. The new flag allows sources to provide catalog information without requiring authentication, which is a useful extension.airbyte_cdk/cli/source_declarative_manifest/_run.py (1)
238-239
: Good change to avoid None for config.This change ensures that
config
is always a dictionary (possibly empty) rather thanNone
when no valid config argument is provided, which makes downstream processing more consistent. This supports the new unprivileged discovery flow.unit_tests/test_entrypoint.py (1)
246-246
: Test updated correctly to reflect optional config for discover.The test definition has been properly updated to reflect the fact that the "discover" command no longer requires a config parameter. This aligns with the changes in the entrypoint implementation.
airbyte_cdk/sources/declarative/manifest_declarative_source.py (2)
143-144
: Good pattern for setting check_config_during_discover based on schema loader type.Setting this flag based on the presence of a
DynamicSchemaLoader
is a clean approach that automatically enables the appropriate behavior without requiring manual configuration for each source.
549-579
: Well-implemented schema loader detection method.The
_uses_dynamic_schema_loader
method is thorough, checking both static streams and dynamic stream templates. The documentation is clear about its purpose and return value.I particularly appreciate the detailed checks that handle nested configurations and the various ways streams can be defined in the manifest.
airbyte_cdk/entrypoint.py (1)
278-282
: Double-check backwards compatibility of the new gate
discover()
now validates config only whencheck_config_during_discover
isTrue
. For connectors that never define the flag (older versions), the default value onBaseConnector
will govern behavior. Could you confirm that default isTrue
to preserve existing semantics? If not, adding an explicit default here might avoid accidental skips, wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
airbyte_cdk/test/standard_tests/source_base.py (1)
63-66
: Conditionally skipping discover test makes sense, but worth considering consistency with test_basic_read approach - wdyt?This addition properly skips the discover test when an exception is expected, which aligns with the PR objective of supporting unprivileged discovery for static schemas.
I noticed that in
test_basic_read
(lines 108-115), there's a different approach - the test runs and then asserts that errors are present whenscenario.expect_exception
is true. Both approaches are valid, but having different patterns might be confusing to future contributors.Would it make sense to either:
- Use the same pattern in both methods (either skip or expect errors), or
- Add a comment explaining why different approaches are used?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting
📒 Files selected for processing (1)
airbyte_cdk/test/standard_tests/source_base.py
(2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
airbyte_cdk/test/standard_tests/source_base.py (1)
airbyte_cdk/test/standard_tests/models/scenario.py (1)
expect_exception
(72-73)
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (Fast)
- GitHub Check: Analyze (python)
🔇 Additional comments (1)
airbyte_cdk/test/standard_tests/source_base.py (1)
6-6
: Import addition looks good.The addition of the pytest import is necessary to support the new test skip functionality. Clean and straightforward.
/autofix
|
@@ -261,7 +279,7 @@ def discover( | |||
self, source_spec: ConnectorSpecification, config: TConfig | |||
) -> Iterable[AirbyteMessage]: | |||
self.set_up_secret_filter(config, source_spec.connectionSpecification) | |||
if self.source.check_config_against_spec: | |||
if self.source.check_config_during_discover: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❓ Am I reading this right that this is a manifest level flag? Should it be a spec level definition instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bnchrch - I will think through this a bit more. As of now, it is a property of the connector base class as of now (defaulting to requiring check), overwritten by Declarative source to not require by default.
I think the desired behavior is that for declarative sources, we'd not validate config unless dynamic schemas are needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to approve if this is useful for this area.
Certainly would be great to enable for all connectors via something in the spec though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like it would take too much to add a proposal
Note:
|
What
Introduces new behavior to attempt discovery even if
config
is omitted. Tested successfully withsource-pokeapi
.Replaces the Devin-created PRs:
Important
discover
in practice, even though they could complete it in theory. For example, I tested the below sample run withklaviyo
in place ofpokeapi
, and the Klaviyo source tries to accessapi_key
fromconfig
during initialization of its custom components. The problem lies in the fact thatdiscover
has to callstreams()
, which tries to fully initialize allStream
objects.TODO
Future Improvements
Further future improvements could be made which delivers one or more of the following:
streams()
method are treated as warnings and not fatal errors.streams()
is used for discovery.discover
would not fail (for instance) butcheck
orread
would fail.Sample successful run:
Summary by CodeRabbit
Summary by CodeRabbit