-
Notifications
You must be signed in to change notification settings - Fork 31
feat(concurrent cursor): attempt at clamping datetime #234
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
/autofix
|
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
Outdated
Show resolved
Hide resolved
📝 WalkthroughWalkthroughThis pull request introduces a new Changes
Sequence DiagramsequenceDiagram
participant User as User Configuration
participant Cursor as DatetimeBasedCursor
participant Strategy as ClampingStrategy
participant EndProvider as ClampingEndProvider
User->>Cursor: Define clamping (DAY/WEEK/MONTH)
Cursor->>Strategy: Select appropriate strategy
Strategy->>EndProvider: Apply clamping rules
EndProvider-->>Cursor: Return clamped cursor value
Possibly related PRs
Suggested labels
Suggested reviewers
Hey there! 👋 Would you like me to elaborate on any part of these changes? The clamping strategies look pretty neat - wdyt? The implementation seems to provide a flexible way to control time-based data synchronization. Any specific aspects you'd like me to dive deeper into? 🕰️🔍 ✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (6)
airbyte_cdk/sources/streams/concurrent/clamping.py (1)
85-94: Consider adding comments to clarify calculations inWeekClampingStrategy.clamp()The calculations for
days_diff_to_ceilinganddeltain theclampmethod are a bit complex. Adding comments to explain the logic might improve readability and maintainability. Wdyt?airbyte_cdk/sources/streams/concurrent/cursor.py (1)
418-420: Address theFIXMEcomment regarding clamping in_split_per_slice_rangeThere's a
FIXMEcomment indicating that whenclamped_lower >= clamped_upper, the handling might need adjustment, possibly replacing thebreakstatement with proper end provider handling. Would you like assistance in resolving this? Wdyt?airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
1105-1122: Simplify weekday mapping in_assemble_weekdayfunctionWould it be beneficial to simplify the
_assemble_weekdayfunction by directly mapping the input string to theWeekdayenum usingWeekday[weekday.upper()]? This could reduce the amount of code and improve maintainability. WDYT?Here's how it might look:
def _assemble_weekday(self, weekday: str) -> Weekday: - match weekday: - case "MONDAY": - return Weekday.MONDAY - case "TUESDAY": - return Weekday.TUESDAY - case "WEDNESDAY": - return Weekday.WEDNESDAY - case "THURSDAY": - return Weekday.THURSDAY - case "FRIDAY": - return Weekday.FRIDAY - case "SATURDAY": - return Weekday.SATURDAY - case "SUNDAY": - return Weekday.SUNDAY - case _: - raise ValueError(f"Unknown weekday {weekday}") + try: + return Weekday[weekday.upper()] + except KeyError: + raise ValueError(f"Unknown weekday {weekday}")unit_tests/sources/streams/concurrent/test_clamping.py (1)
21-24: Useself.assertEqualinstead ofassertin test casesWould it be better to use
self.assertEqualand otherself.assert*methods provided byunittest.TestCaseinstead of bareassertstatements? This can improve test readability and provide more informative error messages on test failures. WDYT?For example:
# DayClampingStrategyTest - test_when_clamp_then_remove_every_unit_smaller_than_days - assert result.hour == 0 - assert result.minute == 0 - assert result.second == 0 - assert result.microsecond == 0 + self.assertEqual(result.hour, 0) + self.assertEqual(result.minute, 0) + self.assertEqual(result.second, 0) + self.assertEqual(result.microsecond, 0)Also applies to: 42-45, 72-75
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
337-340: Would you consider adding docstrings to clarify the purpose of target_details?The
Clampingmodel looks good, but it might be helpful to document what kind of details can be specified intarget_detailsand provide some examples, wdyt?airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)
787-797: Would you consider adding a description field for the clamping property?The schema structure looks good, but adding a description field would help users understand:
- The purpose of clamping
- How it affects cursor behavior
- When to use each target value (DAY, WEEK, MONTH)
- What can be specified in target_details
wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml(1 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py(3 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py(5 hunks)airbyte_cdk/sources/streams/concurrent/clamping.py(1 hunks)airbyte_cdk/sources/streams/concurrent/cursor.py(6 hunks)airbyte_cdk/sources/streams/concurrent/cursor_types.py(1 hunks)unit_tests/sources/streams/concurrent/test_clamping.py(1 hunks)unit_tests/sources/streams/concurrent/test_cursor.py(4 hunks)
🔇 Additional comments (4)
airbyte_cdk/sources/streams/concurrent/clamping.py (1)
70-76: Consider adding a comment to explainWeekdayenum valuesWould it be helpful to add a comment indicating that the
Weekdayenum values align with Python'sdatetime.weekday()numbering (Monday=0, ..., Sunday=6)? This could enhance readability and prevent confusion. Wdyt?airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
1064-1064: Consider using granularity fromdatetime_based_cursor_modelI noticed that the granularity is hardcoded to
timedelta(seconds=1). Should we update it to usedatetime_based_cursor_model.granularityto make it configurable and consistent with other implementations? WDYT?airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2)
331-335: LGTM! Clear and well-structured enum for time granularity.The
Targetenum provides a clean way to specify time-based clamping granularity with standard time units.
1468-1468: LGTM! Clean integration of the clamping feature.The optional clamping field is well-integrated and maintains backward compatibility.
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
Outdated
Show resolved
Hide resolved
brianjlai
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
approving. i responded to some of your questions and the next commit I push has what you have mentioned addressed
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
Outdated
Show resolved
Hide resolved
…granularity for clamping, formatting, add testing for clamping on ModelToComponentFactory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2)
331-339: Consider adding validation for target values.The
targetfield accepts any string value, but based on the examples and implementation, only "DAY", "WEEK", and "MONTH" are valid. Should we add an enum to restrict the values? WDYT?+class ClampingTarget(Enum): + DAY = "DAY" + WEEK = "WEEK" + MONTH = "MONTH" class Clamping(BaseModel): - target: str = Field( + target: Union[str, ClampingTarget] = Field( ..., description="The period of time that datetime windows will be clamped by", examples=["DAY", "WEEK", "MONTH", "{{ config['target'] }}"], title="Target", ) target_details: Optional[Dict[str, Any]] = None
Line range hint
1057-1102: Consider extracting clamping strategy creation into a separate method.The clamping logic in
create_concurrent_cursor_from_datetime_based_cursoris quite complex. Should we extract it into a dedicated method for better readability? WDYT?+ def _create_clamping_strategy( + self, + clamping_model: Optional[Clamping], + config: Config, + parameters: Optional[Dict[str, Any]], + end_date_provider: Callable[[], datetime.datetime], + cursor_granularity: Optional[datetime.timedelta], + ) -> Tuple[ClampingStrategy, Callable[[], datetime.datetime]]: + if not clamping_model: + return NoClamping(), end_date_provider + + target = InterpolatedString( + string=clamping_model.target, + parameters=parameters or {}, + ) + evaluated_target = target.eval(config=config) + match evaluated_target: + case "DAY": + return ( + DayClampingStrategy(), + ClampingEndProvider( + DayClampingStrategy(is_ceiling=False), + end_date_provider, + granularity=cursor_granularity or timedelta(seconds=1), + ), + ) + # ... rest of the casesairbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)
787-809: The clamping property looks well-designed! A few thoughts for consideration.The implementation provides a clean way to adjust datetime window boundaries. I particularly like the extensibility with
target_detailsfor future enhancements.I noticed the comment about ideally using an enum for the target field. Would it make sense to add validation logic to ensure the interpolated values match the expected "DAY", "WEEK", "MONTH" options, even though we can't use an enum directly? This could help catch configuration errors early. wdyt?
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
1057-1102: Consider improving type system to avoid type ignores.The implementation looks solid, but there are several type ignore comments. Would it be worth improving the type definitions for
GapTypeandCursorValueTypeto avoid these? This could make the code more maintainable in the long run. WDYT?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml(1 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py(4 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py(5 hunks)airbyte_cdk/sources/streams/concurrent/clamping.py(1 hunks)unit_tests/sources/declarative/parsers/test_model_to_component_factory.py(2 hunks)unit_tests/sources/streams/concurrent/test_cursor.py(4 hunks)
👮 Files not reviewed due to content moderation or server errors (3)
- unit_tests/sources/declarative/parsers/test_model_to_component_factory.py
- airbyte_cdk/sources/streams/concurrent/clamping.py
- unit_tests/sources/streams/concurrent/test_cursor.py
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Analyze (python)
🔇 Additional comments (7)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (3)
1464-1468: LGTM! Clear description and well-structured field definition.The clamping field is properly integrated into the
DatetimeBasedCursormodel with a descriptive explanation of its purpose.
Line range hint
462-470: LGTM! Well-organized imports.The clamping-related imports are properly grouped and ordered.
Line range hint
1121-1138: LGTM! Well-structured weekday conversion.The
_assemble_weekdaymethod is clear and handles validation appropriately.airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (4)
462-470: LGTM! Clean import organization.The new imports for clamping functionality are well-organized and follow the project's conventions.
1070-1074: Question about granularity consistency.I notice different default granularities:
- DAY:
timedelta(seconds=1)- WEEK:
timedelta(days=1)- MONTH:
timedelta(days=1)Should we maintain consistency by using the same granularity for all strategies? Or is this difference intentional? WDYT?
Also applies to: 1087-1091, 1094-1098
1059-1065: Great documentation on interpolation decision!Really appreciate the clear comment explaining why interpolation is done in the model factory. This helps future maintainers understand the design decision.
1121-1138: Clean implementation of weekday conversion!The
_assemble_weekdaymethod is well-implemented:
- Uses pattern matching for readability
- Handles all weekday cases
- Includes proper error handling for unknown values
|
I pushed a commit that does the following:
If you want to give this one more look, otherwise I'll merge this in the morning so that we can rebase this into the concurrent cursor for async retriever branch and I'll continue retesting with amazon seller partner on that CDK version |
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (5)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (5)
1062-1064: Enhance the comment about interpolation.The current comment explains why interpolation is done in the model factory, but it could be more descriptive about the trade-offs and alternatives considered. Perhaps we could expand it to include why shifting interpolation to runtime would make the
ConcurrentCursorless maintainable? WDYT?
1073-1077: Consider simplifying granularity fallback logic.The pattern
cursor_granularity or timedelta(...)is repeated for each clamping type. Would it make sense to extract this into a helper method or set a default granularity at the beginning of the method? This could make the code more DRY and easier to maintain. WDYT?+ def _get_clamping_granularity(self, cursor_granularity: Optional[timedelta], default: timedelta) -> timedelta: + return cursor_granularity or default # Then use it like: - granularity=cursor_granularity or datetime.timedelta(seconds=1), + granularity=self._get_clamping_granularity(cursor_granularity, datetime.timedelta(seconds=1)),Also applies to: 1090-1094, 1097-1101
1075-1075: Improve type ignore comments.The type ignore comments mention "Having issues w/ inspection for GapType and CursorValueType as shown in existing tests". Could we add a reference to the specific test case or issue that demonstrates this problem? This would help future maintainers understand why these ignores are necessary.
Also applies to: 1092-1092, 1099-1099
1124-1141: Consider using a mapping for weekday conversion.The current implementation uses a match statement which is clean but could be replaced with a more concise mapping approach. This could make the code more maintainable and potentially more efficient. WDYT?
def _assemble_weekday(self, weekday: str) -> Weekday: - match weekday: - case "MONDAY": - return Weekday.MONDAY - case "TUESDAY": - return Weekday.TUESDAY - case "WEDNESDAY": - return Weekday.WEDNESDAY - case "THURSDAY": - return Weekday.THURSDAY - case "FRIDAY": - return Weekday.FRIDAY - case "SATURDAY": - return Weekday.SATURDAY - case "SUNDAY": - return Weekday.SUNDAY - case _: - raise ValueError(f"Unknown weekday {weekday}") + weekday_map = { + "MONDAY": Weekday.MONDAY, + "TUESDAY": Weekday.TUESDAY, + "WEDNESDAY": Weekday.WEDNESDAY, + "THURSDAY": Weekday.THURSDAY, + "FRIDAY": Weekday.FRIDAY, + "SATURDAY": Weekday.SATURDAY, + "SUNDAY": Weekday.SUNDAY, + } + if weekday not in weekday_map: + raise ValueError(f"Unknown weekday {weekday}") + return weekday_map[weekday]
1080-1085: Enhance error message for missing weekday configuration.The error message could be more helpful by including the available weekday options. This would make it easier for users to fix configuration issues. WDYT?
raise ValueError( - "Given WEEK clamping, weekday needs to be provided as target_details" + "Given WEEK clamping, weekday needs to be provided as target_details. " + "Available weekdays are: MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY" )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py(4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
465-473: LGTM! Clean import organization.The new clamping-related imports are well-organized and properly grouped with other stream-related imports.
What
source-amazon-seller-partner has reports that require specific start and end time (example). The constraints are:
How
Adding clamping in ConcurrentCursor
Can this change be reverted?
Yes
Summary by CodeRabbit
New Features
Improvements
Technical Updates