Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(concurrent cursor): attempt at clamping datetime #234

Merged
merged 8 commits into from
Jan 22, 2025

Conversation

maxi297
Copy link
Contributor

@maxi297 maxi297 commented Jan 20, 2025

What

source-amazon-seller-partner has reports that require specific start and end time (example). The constraints are:

  • Day: based on this, the lower boundary would have to truncate the hours, minutes, seconds and microseconds
pendulum.now("utc").start_of("day")
DateTime(2025, 1, 21, 0, 0, 0, tzinfo=Timezone('UTC'))
pendulum.now("utc").end_of("day")
DateTime(2025, 1, 21, 23, 59, 59, 999999, tzinfo=Timezone('UTC'))
  • Week: Weekday should be Sunday but in the current implementation, we made it configurable
  • Month: Slices will start on the first day of the month

How

Adding clamping in ConcurrentCursor

Can this change be reverted?

Yes

Summary by CodeRabbit

  • New Features

    • Added advanced clamping strategies for date-based cursors.
    • Introduced time-based cursor control with day, week, and month granularity.
    • Enhanced concurrent stream slicing with flexible time interval management.
  • Improvements

    • Expanded cursor configuration options for more precise data retrieval.
    • Implemented robust time-based data synchronization strategies.
  • Technical Updates

    • Updated cursor and stream handling mechanisms.
    • Added new protocols and strategies for cursor value management.
    • Introduced comprehensive unit tests for clamping strategies and cursor behavior.

@maxi297
Copy link
Contributor Author

maxi297 commented Jan 20, 2025

/autofix

Auto-Fix Job Info

This job attempts to auto-fix any linting or formating issues. If any fixes are made,
those changes will be automatically committed and pushed back to the PR.

Note: This job can only be run by maintainers. On PRs from forks, this command requires
that the PR author has enabled the Allow edits from maintainers option.

PR auto-fix job started... Check job output.

✅ Changes applied successfully.

@maxi297 maxi297 requested a review from brianjlai January 21, 2025 01:11
@maxi297 maxi297 marked this pull request as ready for review January 21, 2025 15:58
Copy link
Contributor

coderabbitai bot commented Jan 21, 2025

📝 Walkthrough

Walkthrough

This pull request introduces a new clamping property for the DatetimeBasedCursor in Airbyte's CDK, enhancing time-based data synchronization capabilities. The changes span multiple files, adding support for different clamping strategies (DAY, WEEK, MONTH) that allow more precise control over cursor behavior when retrieving time-based data. The implementation includes new classes, strategies, and unit tests to validate the functionality of these clamping mechanisms.

Changes

File Change Summary
airbyte_cdk/sources/declarative/declarative_component_schema.yaml Added clamping property to DatetimeBasedCursor definition
airbyte_cdk/sources/declarative/models/declarative_component_schema.py Added Clamping class, updated DatetimeBasedCursor and TypesMap classes
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py Updated to support clamping strategies, added method to handle weekday conversion
airbyte_cdk/sources/streams/concurrent/clamping.py Introduced new clamping strategy classes and Weekday enum
airbyte_cdk/sources/streams/concurrent/cursor.py Removed GapType and CursorValueType protocols, added clamping strategy support
airbyte_cdk/sources/streams/concurrent/cursor_types.py Added GapType and CursorValueType protocols with new methods
unit_tests/sources/streams/concurrent/test_clamping.py Added unit tests for clamping strategies
unit_tests/sources/streams/concurrent/test_cursor.py Added integration tests for clamping functionality
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py Added tests for concurrent cursor creation with clamping

Sequence Diagram

sequenceDiagram
    participant User as User Configuration
    participant Cursor as DatetimeBasedCursor
    participant Strategy as ClampingStrategy
    participant EndProvider as ClampingEndProvider

    User->>Cursor: Define clamping (DAY/WEEK/MONTH)
    Cursor->>Strategy: Select appropriate strategy
    Strategy->>EndProvider: Apply clamping rules
    EndProvider-->>Cursor: Return clamped cursor value
Loading

Possibly related PRs

  • chore(refactor): refactor partition generator to take any stream slicer #39: The changes in the main PR regarding the DatetimeBasedCursor and its new clamping property relate to the modifications in the ConcurrentDeclarativeSource, which now accommodates both SimpleRetriever and AsyncRetriever types, enhancing the handling of streams and their configurations.
  • feat(low-code cdk): add component resolver and http component resolver #88: The introduction of the DynamicDeclarativeStream and HttpComponentsResolver in the main PR aligns with the enhancements made to the ConcurrentDeclarativeSource, which now supports dynamic stream configurations, improving the overall flexibility of stream handling.
  • feat(low-code cdk): add dynamic schema loader #104: The addition of the DynamicSchemaLoader, SchemaTypeIdentifier, and TypesMap in the main PR complements the changes in the ConcurrentDeclarativeSource, which has been updated to handle dynamic configurations, thus enhancing the schema management capabilities.

Suggested labels

enhancement, bug

Suggested reviewers

  • maxi297

Hey there! 👋 Would you like me to elaborate on any part of these changes? The clamping strategies look pretty neat - wdyt? The implementation seems to provide a flexible way to control time-based data synchronization. Any specific aspects you'd like me to dive deeper into? 🕰️🔍

✨ Finishing Touches
  • 📝 Generate Docstrings (Beta)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (6)
airbyte_cdk/sources/streams/concurrent/clamping.py (1)

85-94: Consider adding comments to clarify calculations in WeekClampingStrategy.clamp()

The calculations for days_diff_to_ceiling and delta in the clamp method are a bit complex. Adding comments to explain the logic might improve readability and maintainability. Wdyt?

airbyte_cdk/sources/streams/concurrent/cursor.py (1)

418-420: Address the FIXME comment regarding clamping in _split_per_slice_range

There's a FIXME comment indicating that when clamped_lower >= clamped_upper, the handling might need adjustment, possibly replacing the break statement with proper end provider handling. Would you like assistance in resolving this? Wdyt?

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

1105-1122: Simplify weekday mapping in _assemble_weekday function

Would it be beneficial to simplify the _assemble_weekday function by directly mapping the input string to the Weekday enum using Weekday[weekday.upper()]? This could reduce the amount of code and improve maintainability. WDYT?

Here's how it might look:

 def _assemble_weekday(self, weekday: str) -> Weekday:
-    match weekday:
-        case "MONDAY":
-            return Weekday.MONDAY
-        case "TUESDAY":
-            return Weekday.TUESDAY
-        case "WEDNESDAY":
-            return Weekday.WEDNESDAY
-        case "THURSDAY":
-            return Weekday.THURSDAY
-        case "FRIDAY":
-            return Weekday.FRIDAY
-        case "SATURDAY":
-            return Weekday.SATURDAY
-        case "SUNDAY":
-            return Weekday.SUNDAY
-        case _:
-            raise ValueError(f"Unknown weekday {weekday}")
+    try:
+        return Weekday[weekday.upper()]
+    except KeyError:
+        raise ValueError(f"Unknown weekday {weekday}")
unit_tests/sources/streams/concurrent/test_clamping.py (1)

21-24: Use self.assertEqual instead of assert in test cases

Would it be better to use self.assertEqual and other self.assert* methods provided by unittest.TestCase instead of bare assert statements? This can improve test readability and provide more informative error messages on test failures. WDYT?

For example:

         # DayClampingStrategyTest - test_when_clamp_then_remove_every_unit_smaller_than_days
-        assert result.hour == 0
-        assert result.minute == 0
-        assert result.second == 0
-        assert result.microsecond == 0
+        self.assertEqual(result.hour, 0)
+        self.assertEqual(result.minute, 0)
+        self.assertEqual(result.second, 0)
+        self.assertEqual(result.microsecond, 0)

Also applies to: 42-45, 72-75

airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)

337-340: Would you consider adding docstrings to clarify the purpose of target_details?

The Clamping model looks good, but it might be helpful to document what kind of details can be specified in target_details and provide some examples, wdyt?

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)

787-797: Would you consider adding a description field for the clamping property?

The schema structure looks good, but adding a description field would help users understand:

  1. The purpose of clamping
  2. How it affects cursor behavior
  3. When to use each target value (DAY, WEEK, MONTH)
  4. What can be specified in target_details

wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 17dd71f and e9003b1.

📒 Files selected for processing (8)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (3 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (5 hunks)
  • airbyte_cdk/sources/streams/concurrent/clamping.py (1 hunks)
  • airbyte_cdk/sources/streams/concurrent/cursor.py (6 hunks)
  • airbyte_cdk/sources/streams/concurrent/cursor_types.py (1 hunks)
  • unit_tests/sources/streams/concurrent/test_clamping.py (1 hunks)
  • unit_tests/sources/streams/concurrent/test_cursor.py (4 hunks)
🔇 Additional comments (4)
airbyte_cdk/sources/streams/concurrent/clamping.py (1)

70-76: Consider adding a comment to explain Weekday enum values

Would it be helpful to add a comment indicating that the Weekday enum values align with Python's datetime.weekday() numbering (Monday=0, ..., Sunday=6)? This could enhance readability and prevent confusion. Wdyt?

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

1064-1064: Consider using granularity from datetime_based_cursor_model

I noticed that the granularity is hardcoded to timedelta(seconds=1). Should we update it to use datetime_based_cursor_model.granularity to make it configurable and consistent with other implementations? WDYT?

airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2)

331-335: LGTM! Clear and well-structured enum for time granularity.

The Target enum provides a clean way to specify time-based clamping granularity with standard time units.


1468-1468: LGTM! Clean integration of the clamping feature.

The optional clamping field is well-integrated and maintains backward compatibility.

Copy link
Contributor

@brianjlai brianjlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approving. i responded to some of your questions and the next commit I push has what you have mentioned addressed

…granularity for clamping, formatting, add testing for clamping on ModelToComponentFactory
@brianjlai brianjlai changed the title attempt at clamping datetime feat(concurrent cursor) attempt at clamping datetime Jan 22, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2)

331-339: Consider adding validation for target values.

The target field accepts any string value, but based on the examples and implementation, only "DAY", "WEEK", and "MONTH" are valid. Should we add an enum to restrict the values? WDYT?

+class ClampingTarget(Enum):
+    DAY = "DAY"
+    WEEK = "WEEK"
+    MONTH = "MONTH"

class Clamping(BaseModel):
-    target: str = Field(
+    target: Union[str, ClampingTarget] = Field(
        ...,
        description="The period of time that datetime windows will be clamped by",
        examples=["DAY", "WEEK", "MONTH", "{{ config['target'] }}"],
        title="Target",
    )
    target_details: Optional[Dict[str, Any]] = None

Line range hint 1057-1102: Consider extracting clamping strategy creation into a separate method.

The clamping logic in create_concurrent_cursor_from_datetime_based_cursor is quite complex. Should we extract it into a dedicated method for better readability? WDYT?

+    def _create_clamping_strategy(
+        self,
+        clamping_model: Optional[Clamping],
+        config: Config,
+        parameters: Optional[Dict[str, Any]],
+        end_date_provider: Callable[[], datetime.datetime],
+        cursor_granularity: Optional[datetime.timedelta],
+    ) -> Tuple[ClampingStrategy, Callable[[], datetime.datetime]]:
+        if not clamping_model:
+            return NoClamping(), end_date_provider
+
+        target = InterpolatedString(
+            string=clamping_model.target,
+            parameters=parameters or {},
+        )
+        evaluated_target = target.eval(config=config)
+        match evaluated_target:
+            case "DAY":
+                return (
+                    DayClampingStrategy(),
+                    ClampingEndProvider(
+                        DayClampingStrategy(is_ceiling=False),
+                        end_date_provider,
+                        granularity=cursor_granularity or timedelta(seconds=1),
+                    ),
+                )
+            # ... rest of the cases
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)

787-809: The clamping property looks well-designed! A few thoughts for consideration.

The implementation provides a clean way to adjust datetime window boundaries. I particularly like the extensibility with target_details for future enhancements.

I noticed the comment about ideally using an enum for the target field. Would it make sense to add validation logic to ensure the interpolated values match the expected "DAY", "WEEK", "MONTH" options, even though we can't use an enum directly? This could help catch configuration errors early. wdyt?

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

1057-1102: Consider improving type system to avoid type ignores.

The implementation looks solid, but there are several type ignore comments. Would it be worth improving the type definitions for GapType and CursorValueType to avoid these? This could make the code more maintainable in the long run. WDYT?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e9003b1 and 26752b2.

📒 Files selected for processing (6)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (4 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (5 hunks)
  • airbyte_cdk/sources/streams/concurrent/clamping.py (1 hunks)
  • unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (2 hunks)
  • unit_tests/sources/streams/concurrent/test_cursor.py (4 hunks)
👮 Files not reviewed due to content moderation or server errors (3)
  • unit_tests/sources/declarative/parsers/test_model_to_component_factory.py
  • airbyte_cdk/sources/streams/concurrent/clamping.py
  • unit_tests/sources/streams/concurrent/test_cursor.py
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-the-guardian-api' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Analyze (python)
🔇 Additional comments (7)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (3)

1464-1468: LGTM! Clear description and well-structured field definition.

The clamping field is properly integrated into the DatetimeBasedCursor model with a descriptive explanation of its purpose.


Line range hint 462-470: LGTM! Well-organized imports.

The clamping-related imports are properly grouped and ordered.


Line range hint 1121-1138: LGTM! Well-structured weekday conversion.

The _assemble_weekday method is clear and handles validation appropriately.

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (4)

462-470: LGTM! Clean import organization.

The new imports for clamping functionality are well-organized and follow the project's conventions.


1070-1074: Question about granularity consistency.

I notice different default granularities:

  • DAY: timedelta(seconds=1)
  • WEEK: timedelta(days=1)
  • MONTH: timedelta(days=1)

Should we maintain consistency by using the same granularity for all strategies? Or is this difference intentional? WDYT?

Also applies to: 1087-1091, 1094-1098


1059-1065: Great documentation on interpolation decision!

Really appreciate the clear comment explaining why interpolation is done in the model factory. This helps future maintainers understand the design decision.


1121-1138: Clean implementation of weekday conversion!

The _assemble_weekday method is well-implemented:

  • Uses pattern matching for readability
  • Handles all weekday cases
  • Includes proper error handling for unknown values

@brianjlai
Copy link
Contributor

@maxi297

I pushed a commit that does the following:

  • allow for interpolation of the config to get the clamping value: This has the drawback of not being able to use enums since the string could be anything. I did also notice we do this for JWT, but interpolation on JWT doesn't work :/
  • use the cursor_granularity for the clamping end provider
  • Add tests to ModelToComponentFactory to verify that we create the right clamping strategy when parsing the model
  • There were some formatting and things like an duplicated test that coderabbit caught. fixed those too

If you want to give this one more look, otherwise I'll merge this in the morning so that we can rebase this into the concurrent cursor for async retriever branch and I'll continue retesting with amazon seller partner on that CDK version

@brianjlai brianjlai changed the title feat(concurrent cursor) attempt at clamping datetime feat(concurrent cursor): attempt at clamping datetime Jan 22, 2025
@github-actions github-actions bot added the enhancement New feature or request label Jan 22, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (5)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (5)

1062-1064: Enhance the comment about interpolation.

The current comment explains why interpolation is done in the model factory, but it could be more descriptive about the trade-offs and alternatives considered. Perhaps we could expand it to include why shifting interpolation to runtime would make the ConcurrentCursor less maintainable? WDYT?


1073-1077: Consider simplifying granularity fallback logic.

The pattern cursor_granularity or timedelta(...) is repeated for each clamping type. Would it make sense to extract this into a helper method or set a default granularity at the beginning of the method? This could make the code more DRY and easier to maintain. WDYT?

+ def _get_clamping_granularity(self, cursor_granularity: Optional[timedelta], default: timedelta) -> timedelta:
+     return cursor_granularity or default

  # Then use it like:
- granularity=cursor_granularity or datetime.timedelta(seconds=1),
+ granularity=self._get_clamping_granularity(cursor_granularity, datetime.timedelta(seconds=1)),

Also applies to: 1090-1094, 1097-1101


1075-1075: Improve type ignore comments.

The type ignore comments mention "Having issues w/ inspection for GapType and CursorValueType as shown in existing tests". Could we add a reference to the specific test case or issue that demonstrates this problem? This would help future maintainers understand why these ignores are necessary.

Also applies to: 1092-1092, 1099-1099


1124-1141: Consider using a mapping for weekday conversion.

The current implementation uses a match statement which is clean but could be replaced with a more concise mapping approach. This could make the code more maintainable and potentially more efficient. WDYT?

  def _assemble_weekday(self, weekday: str) -> Weekday:
-     match weekday:
-         case "MONDAY":
-             return Weekday.MONDAY
-         case "TUESDAY":
-             return Weekday.TUESDAY
-         case "WEDNESDAY":
-             return Weekday.WEDNESDAY
-         case "THURSDAY":
-             return Weekday.THURSDAY
-         case "FRIDAY":
-             return Weekday.FRIDAY
-         case "SATURDAY":
-             return Weekday.SATURDAY
-         case "SUNDAY":
-             return Weekday.SUNDAY
-         case _:
-             raise ValueError(f"Unknown weekday {weekday}")
+     weekday_map = {
+         "MONDAY": Weekday.MONDAY,
+         "TUESDAY": Weekday.TUESDAY,
+         "WEDNESDAY": Weekday.WEDNESDAY,
+         "THURSDAY": Weekday.THURSDAY,
+         "FRIDAY": Weekday.FRIDAY,
+         "SATURDAY": Weekday.SATURDAY,
+         "SUNDAY": Weekday.SUNDAY,
+     }
+     if weekday not in weekday_map:
+         raise ValueError(f"Unknown weekday {weekday}")
+     return weekday_map[weekday]

1080-1085: Enhance error message for missing weekday configuration.

The error message could be more helpful by including the available weekday options. This would make it easier for users to fix configuration issues. WDYT?

  raise ValueError(
-     "Given WEEK clamping, weekday needs to be provided as target_details"
+     "Given WEEK clamping, weekday needs to be provided as target_details. "
+     "Available weekdays are: MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY"
  )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 26752b2 and 299bacf.

📒 Files selected for processing (1)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (Fast)
🔇 Additional comments (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

465-473: LGTM! Clean import organization.

The new clamping-related imports are well-organized and properly grouped with other stream-related imports.

@brianjlai brianjlai merged commit 14375fe into main Jan 22, 2025
15 of 19 checks passed
@brianjlai brianjlai deleted the maxi297/attempt-at-clamping-datetime branch January 22, 2025 19:23
rpopov pushed a commit to rpopov/airbyte-python-cdk that referenced this pull request Jan 23, 2025
* remotes/airbyte/main:
  fix(airbyte-cdk): Fix RequestOptionsProvider for PerPartitionWithGlobalCursor (airbytehq#254)
  feat(low-code): add profile assertion flow to oauth authenticator component (airbytehq#236)
  feat(Low-Code Concurrent CDK): Add ConcurrentPerPartitionCursor (airbytehq#111)
  fix: don't mypy unit_tests (airbytehq#241)
  fix: handle backoff_strategies in CompositeErrorHandler (airbytehq#225)
  feat(concurrent cursor): attempt at clamping datetime (airbytehq#234)
  ci: use `ubuntu-24.04` explicitly (resolves CI warnings) (airbytehq#244)
  Fix(sdm): module ref issue in python components import (airbytehq#243)
  feat(source-declarative-manifest): add support for custom Python components from dynamic text input (airbytehq#174)
  chore(deps): bump avro from 1.11.3 to 1.12.0 (airbytehq#133)
  docs: comments on what the `Dockerfile` is for (airbytehq#240)
  chore: move ruff configuration to dedicated ruff.toml file (airbytehq#237)
rpopov added a commit to rpopov/airbyte-python-cdk that referenced this pull request Jan 26, 2025
Created a DPath Enhancing Extractor
Refactored the record enhancement logic - moved to the extracted class
Split the tests of DPathExtractor and DPathEnhancingExtractor

Fix the failing tests:

FAILED unit_tests/sources/declarative/parsers/test_model_to_component_factory.py::test_create_custom_components[test_create_custom_component_with_subcomponent_that_uses_parameters]
FAILED unit_tests/sources/declarative/parsers/test_model_to_component_factory.py::test_custom_components_do_not_contain_extra_fields
FAILED unit_tests/sources/declarative/parsers/test_model_to_component_factory.py::test_parse_custom_component_fields_if_subcomponent
FAILED unit_tests/sources/declarative/parsers/test_model_to_component_factory.py::test_create_page_increment
FAILED unit_tests/sources/declarative/parsers/test_model_to_component_factory.py::test_create_offset_increment
FAILED unit_tests/sources/file_based/test_file_based_scenarios.py::test_file_based_read[simple_unstructured_scenario]
FAILED unit_tests/sources/file_based/test_file_based_scenarios.py::test_file_based_read[no_file_extension_unstructured_scenario]

They faile because of comparing string and int values of the page_size (public) attribute.
Imposed an invariant:
  on construction, page_size can be set to a string or int
  keep only values of one type in page_size for uniform comparison (convert the values of the other type)
  _page_size holds the internal / working value
... unless manipulated directly.

Merged:
feat(low-code concurrent): Allow async job low-code streams that are incremental to be run by the concurrent framework (airbytehq#228)
fix(low-code): Fix declarative low-code state migration in SubstreamPartitionRouter (airbytehq#267)
feat: combine slash command jobs into single job steps (airbytehq#266)
feat(low-code): add items and property mappings to dynamic schemas (airbytehq#256)
feat: add help response for unrecognized slash commands (airbytehq#264)
ci: post direct links to html connector test reports (airbytehq#252) (airbytehq#263)
fix(low-code): Fix legacy state migration in SubstreamPartitionRouter (airbytehq#261)
fix(airbyte-cdk): Fix RequestOptionsProvider for PerPartitionWithGlobalCursor (airbytehq#254)
feat(low-code): add profile assertion flow to oauth authenticator component (airbytehq#236)
feat(Low-Code Concurrent CDK): Add ConcurrentPerPartitionCursor (airbytehq#111)
fix: don't mypy unit_tests (airbytehq#241)
fix: handle backoff_strategies in CompositeErrorHandler (airbytehq#225)
feat(concurrent cursor): attempt at clamping datetime (airbytehq#234)
fix(airbyte-cdk): Fix RequestOptionsProvider for PerPartitionWithGlobalCursor (airbytehq#254)
feat(low-code): add profile assertion flow to oauth authenticator component (airbytehq#236)
feat(Low-Code Concurrent CDK): Add ConcurrentPerPartitionCursor (airbytehq#111)
fix: don't mypy unit_tests (airbytehq#241)
fix: handle backoff_strategies in CompositeErrorHandler (airbytehq#225)
feat(concurrent cursor): attempt at clamping datetime (airbytehq#234)
ci: use `ubuntu-24.04` explicitly (resolves CI warnings) (airbytehq#244)
Fix(sdm): module ref issue in python components import (airbytehq#243)
feat(source-declarative-manifest): add support for custom Python components from dynamic text input (airbytehq#174)
chore(deps): bump avro from 1.11.3 to 1.12.0 (airbytehq#133)
docs: comments on what the `Dockerfile` is for (airbytehq#240)
chore: move ruff configuration to dedicated ruff.toml file (airbytehq#237)
Fix(sdm): module ref issue in python components import (airbytehq#243)
feat(low-code): add DpathFlattenFields (airbytehq#227)
feat(source-declarative-manifest): add support for custom Python components from dynamic text input (airbytehq#174)
chore(deps): bump avro from 1.11.3 to 1.12.0 (airbytehq#133)
docs: comments on what the `Dockerfile` is for (airbytehq#240)
chore: move ruff configuration to dedicated ruff.toml file (airbytehq#237)
rpopov added a commit to rpopov/airbyte-python-cdk that referenced this pull request Feb 8, 2025
At record extraction step, in each record add the service field $root holding a reference to:
* the root response object, when parsing JSON format
* the original record, when parsing JSONL format
that each record to process is extracted from.
More service fields could be added in future.
The service fields are available in the record's filtering and transform steps.

Avoid:
* reusing the maps/dictionaries produced, thus avoid building cyclic structures
* transforming the service fields in the Flatten transformation.

Explicitly cleanup the service field(s) after the transform step, thus making them:
* local for the filter and transform steps
* not visible to the next mapping and store steps (as they should be)
* not visible in the tests beyond the test_record_selector (as they should be)
This allows the record transformation logic to define its "local variables" to reuse
some interim calculations.

The contract of body parsing seems irregular in representing the cases of bad JSON, no JSON and empty JSON.
Cannot be unified as that that irregularity is already used.

Update the development environment setup documentation
* to organize and present the setup steps explicitly
* to avoid misunderstandings and wasted efforts.

Update CONTRIBUTING.md to
* collect and organize the knowledge on running the test locally.
* state the actual testing steps.
* clarify and make explicit the procedures and steps.

The unit, integration, and acceptance tests in this exactly version succeed under Fedora 41, while
one of them fails under Oracle Linux 8.7. not related to the contents of this PR.
The integration tests of the CDK fail due to missing `secrets/config.json` file for the Shopify source.
See airbytehq#197

Polish

Integrate the DpathEnhancingExtractor in the UI of Airbyte.
Created a DPath Enhancing Extractor
Refactored the record enhancement logic - moved to the extracted class
Split the tests of DPathExtractor and DPathEnhancingExtractor

Fix the failing tests:

FAILED unit_tests/sources/declarative/parsers/test_model_to_component_factory.py::test_create_custom_components[test_create_custom_component_with_subcomponent_that_uses_parameters]
FAILED unit_tests/sources/declarative/parsers/test_model_to_component_factory.py::test_custom_components_do_not_contain_extra_fields
FAILED unit_tests/sources/declarative/parsers/test_model_to_component_factory.py::test_parse_custom_component_fields_if_subcomponent
FAILED unit_tests/sources/declarative/parsers/test_model_to_component_factory.py::test_create_page_increment
FAILED unit_tests/sources/declarative/parsers/test_model_to_component_factory.py::test_create_offset_increment
FAILED unit_tests/sources/file_based/test_file_based_scenarios.py::test_file_based_read[simple_unstructured_scenario]
FAILED unit_tests/sources/file_based/test_file_based_scenarios.py::test_file_based_read[no_file_extension_unstructured_scenario]

They faile because of comparing string and int values of the page_size (public) attribute.
Imposed an invariant:
  on construction, page_size can be set to a string or int
  keep only values of one type in page_size for uniform comparison (convert the values of the other type)
  _page_size holds the internal / working value
... unless manipulated directly.

Merged:
feat(low-code concurrent): Allow async job low-code streams that are incremental to be run by the concurrent framework (airbytehq#228)
fix(low-code): Fix declarative low-code state migration in SubstreamPartitionRouter (airbytehq#267)
feat: combine slash command jobs into single job steps (airbytehq#266)
feat(low-code): add items and property mappings to dynamic schemas (airbytehq#256)
feat: add help response for unrecognized slash commands (airbytehq#264)
ci: post direct links to html connector test reports (airbytehq#252) (airbytehq#263)
fix(low-code): Fix legacy state migration in SubstreamPartitionRouter (airbytehq#261)
fix(airbyte-cdk): Fix RequestOptionsProvider for PerPartitionWithGlobalCursor (airbytehq#254)
feat(low-code): add profile assertion flow to oauth authenticator component (airbytehq#236)
feat(Low-Code Concurrent CDK): Add ConcurrentPerPartitionCursor (airbytehq#111)
fix: don't mypy unit_tests (airbytehq#241)
fix: handle backoff_strategies in CompositeErrorHandler (airbytehq#225)
feat(concurrent cursor): attempt at clamping datetime (airbytehq#234)
fix(airbyte-cdk): Fix RequestOptionsProvider for PerPartitionWithGlobalCursor (airbytehq#254)
feat(low-code): add profile assertion flow to oauth authenticator component (airbytehq#236)
feat(Low-Code Concurrent CDK): Add ConcurrentPerPartitionCursor (airbytehq#111)
fix: don't mypy unit_tests (airbytehq#241)
fix: handle backoff_strategies in CompositeErrorHandler (airbytehq#225)
feat(concurrent cursor): attempt at clamping datetime (airbytehq#234)
ci: use `ubuntu-24.04` explicitly (resolves CI warnings) (airbytehq#244)
Fix(sdm): module ref issue in python components import (airbytehq#243)
feat(source-declarative-manifest): add support for custom Python components from dynamic text input (airbytehq#174)
chore(deps): bump avro from 1.11.3 to 1.12.0 (airbytehq#133)
docs: comments on what the `Dockerfile` is for (airbytehq#240)
chore: move ruff configuration to dedicated ruff.toml file (airbytehq#237)
Fix(sdm): module ref issue in python components import (airbytehq#243)
feat(low-code): add DpathFlattenFields (airbytehq#227)
feat(source-declarative-manifest): add support for custom Python components from dynamic text input (airbytehq#174)
chore(deps): bump avro from 1.11.3 to 1.12.0 (airbytehq#133)
docs: comments on what the `Dockerfile` is for (airbytehq#240)
chore: move ruff configuration to dedicated ruff.toml file (airbytehq#237)

formatted

Update record_extractor.py

Trigger a new build. Hopefully, the integration test infrastructure is fixed.

Update CONTRIBUTING.md

Trigger a new build
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants