-
Notifications
You must be signed in to change notification settings - Fork 18
feat: check for request_option mapping conflicts in individual components #328
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
📝 WalkthroughWalkthroughThe changes introduce new validation functionality across the codebase. In the initialization methods of two classes, new import statements and calls to a validation helper have been added to verify request options. Additionally, the utility module has been modified to enhance error messages and introduce a new validation function, while the test suite has been expanded to cover these validation scenarios with additional tests and fixtures. Changes
Sequence Diagram(s)sequenceDiagram
participant DTC as DatetimeBasedCursor
participant MH as mapping_helpers
DTC->>DTC: __post_init__()
DTC->>MH: _validate_component_request_option_paths(config, start_time_option, end_time_option)
MH-->>DTC: Return validation result
sequenceDiagram
participant DP as DefaultPaginator
participant MH as mapping_helpers
DP->>DP: __post_init__()
alt page_token_option provided & not RequestPath
DP->>MH: _validate_component_request_option_paths(config, page_size_option, page_token_option)
MH-->>DP: Return validation result
else No validation required
DP-->>DP: Continue initialization
end
Possibly related PRs
Suggested labels
Suggested reviewers
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (6)
airbyte_cdk/utils/mapping_helpers.py (3)
43-44
: Consider enhancing the error message for better clarity?The error message is good but could be even more helpful by including the conflicting values. What do you think about:
- f"Request body collision, duplicate keys detected at key path: {'.'.join(current_path)}. Please ensure that all keys in the request are unique." + f"Request body collision at key path: {'.'.join(current_path)}. Found duplicate keys with values '{source_value}' and '{target_value}'. Please ensure that all keys in the request are unique."Also applies to: 51-52
117-146
: Consider adding type hints and additional error handling?The function looks good! A few suggestions to make it even better:
- Type hint for
grouped_options
could be more specific:- grouped_options: Dict[RequestOptionType, List[RequestOption]] = {} + grouped_options: Dict[RequestOptionType, List[RequestOption]] = defaultdict(list)
- The error handling could be more specific:
- except ValueError as error: - raise ValueError(error) + except ValueError as error: + raise ValueError(f"Invalid request option configuration: {error}")What do you think about these changes? They would make the code more type-safe and provide better error context.
120-123
: Consider adding usage examples to the docstring?The docstring explains the purpose well, but would it be helpful to add an example showing how the function is used? Something like:
""" Validates that a component with multiple request options does not have conflicting paths. Uses dummy values for validation since actual values might not be available at init time. Example: >>> config = {"api_key": "123"} >>> option1 = RequestOption(field_name="page", inject_into=RequestOptionType.request_parameter) >>> option2 = RequestOption(field_name="limit", inject_into=RequestOptionType.request_parameter) >>> _validate_component_request_option_paths(config, option1, option2) # No conflict """What do you think about adding this example to help future developers?
unit_tests/utils/test_mapping_helpers.py (1)
194-219
: Consider adding more edge cases to test?The edge case tests look good! Would it be valuable to add a few more cases:
- Mixed request option types with same field names
- Deeply nested JSON paths
- Special characters in field names
Something like:
@pytest.mark.parametrize( "test_name, options", [ ( "mixed_types_same_field", [ RequestOption(field_name="page", inject_into=RequestOptionType.header), RequestOption(field_name="page", inject_into=RequestOptionType.request_parameter), ], ), ( "deep_nested_json", [ RequestOption(field_path=["data", "meta", "pagination", "next"], inject_into=RequestOptionType.body_json), RequestOption(field_path=["data", "meta", "pagination", "prev"], inject_into=RequestOptionType.body_json), ], ), ( "special_chars", [ RequestOption(field_name="x-api-key", inject_into=RequestOptionType.header), RequestOption(field_name="x-client-id", inject_into=RequestOptionType.header), ], ), ], )What do you think about adding these cases to make the tests more robust?
airbyte_cdk/sources/declarative/requesters/paginators/default_paginator.py (1)
120-125
: Consider adding a comment explaining the validation?The validation looks good! Would it be helpful to add a brief comment explaining why we validate only when
page_token_option
is not aRequestPath
? Something like:# Validate request options only when page_token_option is a RequestOption # RequestPath options are handled differently and don't need path conflict validation if self.page_token_option and not isinstance(self.page_token_option, RequestPath): _validate_component_request_option_paths( self.config, self.page_size_option, self.page_token_option, )What do you think about adding this context for future maintainers?
airbyte_cdk/sources/declarative/incremental/datetime_based_cursor.py (1)
126-128
: Consider adding a test for the validation?The validation looks good! Would it be valuable to add a test case in the corresponding test file to verify this validation? Something like:
def test_datetime_based_cursor_validation(): """Test that the cursor properly validates request options.""" with pytest.raises(ValueError, match="duplicate keys detected"): DatetimeBasedCursor( start_datetime="2021-01-01", cursor_field="created_at", datetime_format="%Y-%m-%d", config={}, parameters={}, start_time_option=RequestOption(field_name="time", inject_into=RequestOptionType.request_parameter), end_time_option=RequestOption(field_name="time", inject_into=RequestOptionType.request_parameter), )What do you think about adding this test to ensure the validation works as expected?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
airbyte_cdk/sources/declarative/incremental/datetime_based_cursor.py
(2 hunks)airbyte_cdk/sources/declarative/requesters/paginators/default_paginator.py
(2 hunks)airbyte_cdk/utils/mapping_helpers.py
(3 hunks)unit_tests/utils/test_mapping_helpers.py
(4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it feels like we should have a test for datetime_based_cursor and default_paginator. I don't mind not doing the cursor one because we are slowly moving away from this implementation. I'll let you decide if it's worth for default_paginator
The rest LGTM! Thanks @ChristoGrab
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
unit_tests/sources/declarative/requesters/paginators/test_default_paginator.py (1)
452-475
: The test looks good! Here are some suggestions to make it even better.The test effectively validates the request option mapping conflict detection. Would you consider these improvements? wdyt?
- Rename the test to be more descriptive, e.g.,
test_raises_error_when_page_options_have_conflicting_paths
- Add more test cases using
@pytest.mark.parametrize
to cover different field path conflicts- Assert the error message content to ensure it's helpful for debugging
Here's a possible refactor:
-def test_request_option_mapping_validator(): +@pytest.mark.parametrize( + "size_path,token_path,expected_error", + [ + (["variables", "limit"], ["variables", "limit"], "Conflicting paths found"), + (["body", "page"], ["body", "page"], "Conflicting paths found"), + (["headers", "x-page"], ["headers", "x-page"], "Conflicting paths found"), + ], + ids=["json_body_conflict", "form_data_conflict", "header_conflict"] +) +def test_raises_error_when_page_options_have_conflicting_paths(size_path, token_path, expected_error): pagination_strategy = PageIncrement( config={}, page_size=1, start_from_page=0, parameters={}, inject_on_first_request=True ) - with pytest.raises(ValueError): + with pytest.raises(ValueError, match=expected_error): ( DefaultPaginator( page_size_option=RequestOption( - field_path=["variables", "limit"], + field_path=size_path, inject_into=RequestOptionType.body_json, parameters={}, ), page_token_option=RequestOption( - field_path=["variables", "limit"], + field_path=token_path, inject_into=RequestOptionType.body_json, parameters={}, ), pagination_strategy=pagination_strategy, config=MagicMock(), url_base=MagicMock(), parameters={}, ), )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
unit_tests/sources/declarative/requesters/paginators/test_default_paginator.py
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Analyze (python)
* main: fix: update cryptography package to latest version to address CVE (airbytehq#377) fix: (CDK) (HttpRequester) - Make the `HttpRequester.path` optional (airbytehq#370) feat: improved custom components handling (airbytehq#350) feat: add microseconds timestamp format (airbytehq#373) fix: Replace Unidecode with anyascii for permissive license (airbytehq#367) feat: add IncrementingCountCursor (airbytehq#346) feat: (low-code cdk) datetime format with milliseconds (airbytehq#369) fix: (CDK) (AsyncRetriever) - Improve UX on variable naming and interpolation (airbytehq#368) fix: (CDK) (AsyncRetriever) - Add the `request` and `response` to each `async` operations (airbytehq#356) fix: (CDK) (ConnectorBuilder) - Add `auxiliary requests` to slice; support `TestRead` for AsyncRetriever (part 1/2) (airbytehq#355) feat(concurrent perpartition cursor): Add parent state updates (airbytehq#343) fix: update csv parser for builder compatibility (airbytehq#364) feat(low-code cdk): add interpolation for limit field in Rate (airbytehq#353) feat(low-code cdk): add AbstractStreamFacade processing as concurrent streams in declarative source (airbytehq#347) fix: (CDK) (CsvParser) - Fix the `\\` escaping when passing the `delimiter` from Builder's UI (airbytehq#358) feat: expose `str_to_datetime` jinja macro (airbytehq#351) fix: update CDK migration for 6.34.0 (airbytehq#348) feat: Removes `stream_state` interpolation from CDK (airbytehq#320) fix(declarative): Pass `extra_fields` in `global_substream_cursor` (airbytehq#195) feat(concurrent perpartition cursor): Refactor ConcurrentPerPartitionCursor (airbytehq#331) feat(HttpMocker): adding support for PUT requests and bytes responses (airbytehq#342) chore: use certified source for manifest-only test (airbytehq#338) feat: check for request_option mapping conflicts in individual components (airbytehq#328) feat(file-based): sync file acl permissions and identities (airbytehq#260) fix: (CDK) (Connector Builder) - refactor the `MessageGrouper` > `TestRead` (airbytehq#332) fix(low code): Fix missing cursor for ClientSideIncrementalRecordFilterDecorator (airbytehq#334) feat(low-code): Add API Budget (airbytehq#314) chore(decoder): clean decoders and make csvdecoder available (airbytehq#326)
What
Although we check for duplicate keys when combining all request option mappings in a retriever, individual components that define more than one request option lack validations to confirm that there are no key mapping errors. Therefore, we currently override values when a conflict occurs, ie:
This PR adds a new validation specifically for components that allow multiple request options, to ensure that mappings are combined and checked for each component individually.
How
_validate_component_request_option_paths
. This function takes an arbitrary number of request option keys and reuses the existingcombine_mappings
implementation to check for duplicate keys. A dict with dummy values is constructed from the component's keys (since the actual values may not be available at time of init). Resolves #11580Summary by CodeRabbit
Summary by CodeRabbit
New Features
Bug Fixes