-
Notifications
You must be signed in to change notification settings - Fork 25
Fix: State handling when passed in read() method #387
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Low-code connectors instantiate the source without a state and pass the state via the read() method. This commit updates the ConcurrentDeclarativeSource to support this behavior. Changes are: - Instantiate the ConnectorStateManager with the state passed in read() method. - Update the ModelToComponentFactory to use the stream_state parameter when creating a cursor. - Add a test to verify the behavior: instantiate the source without a state and pass the state via the read() method.
📝 WalkthroughWalkthroughThis pull request enhances state management in the Changes
Possibly related PRs
Suggested labels
Suggested reviewers
Would you like to discuss any of these details further, wdyt? 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
⏰ Context from checks skipped due to timeout of 90000ms (4)
🔇 Additional comments (1)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @marianob-span1! Thanks for the contribution. I would like to ask more information regarding this change to make sure I understand what would be the right solution. I'm asking because we have been slowly fading out the use of state through parameters to encapsulate the state into the Cursor interface.
The reasons are:
- There are a bunch of maintenance problems that comes with multiple classes having access to the state:
- When we are adding capabilities to state management, the change can affect all those classes and each usage needs to be validated which is long and tedious
- It is very hard to debug what updates the state if the state is passed everywhere and we've had surprising behavior associated to that
- We are moving to use concurrency more and more to speed up our sources and having a dict passed around and being able to be modified by many component is not thread safe
Hence, my questions would be:
- What is the problem you are facing that brings you to create this PR?
- Why passing the state at the construction of the source is now a valid solution in your case?
In the meanwhile, I still added a couple of comments on the PR to try to bridge the gap between my understanding and yours.
source.read(logger=source.logger, config=_CONFIG, catalog=catalog, state=state) | ||
) | ||
|
||
concurrent_streams, _ = source._group_streams(config=_CONFIG) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is private and only called once per source hence this does not reflect the normal usage of a source. Could you please comment on what this is trying to achieve?
The consequences of using this private method on the test seem important as the test read performed above is applied to another instance of party_members_stream
and the read should not influence the state that is tested below. The reason the test passes right now is the read swap the instance of _connector_state_manager
and the next time _group_stream
is called, it uses the new state.
In order to properly validate that the input state is taken into account, we would have to call http_mocker._validate_all_matchers_called()
to ensure that the HTTP requests that were performed considered the state as an input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just a way to obtain the streams that I saw in other tests in this same file. I'm happy to find another way to get the streams in order to make the assertions I was intending to do if the general approach of the fix included in this PR is okay.
Hi @maxi297 ! The current issue we are experiencing with low code connectors and the CDK is that, when using incremental streams with a DatetimeBasedCursor, the connector ignores the previous stream state that is passed by the platform or a The reason seems to be that the point where a low code connector passes the current state is in the read() call in elif cmd == "read":
config_catalog = self.source.read_catalog(parsed_args.catalog)
state = self.source.read_state(parsed_args.state)
yield from map(
AirbyteEntrypoint.airbyte_message_to_string,
self.read(source_spec, config, config_catalog, state),
) So my changes introduced here where aimed at making sure that state passed in that read() call reaches the place where the Concurrent Cursor is instantiated, that's why I added the argument Given your question and looking at the whole CDK code, it seems that it is expected that the state is passed when the source is instantiated (not 100% sure about this though), which I'm not sure how that could be done at least for low code connectors where the source is instantiated without any parameters, this is the import sys
from airbyte_cdk.entrypoint import launch
from source_linear import SourceLinear
if __name__ == "__main__":
source = SourceLinear()
launch(source, sys.argv[1:]) It could also be the case that we are using the CDK in a way that it's not expected to be and that's why the state is blank at the moment the read operation occurs, but this happened in a low code connector recently generated using instructions found in here: https://docs.airbyte.com/connector-development/config-based/tutorial/create-source and I'm not aware of any other guides to create a low code connector. I'd love to get some guidance on how a low code connector should be structured if the issue is that there is something wrong with the connector itself Thanks! |
Ok, that makes a lot of sense. Thanks for sharing this context! You can check how it is done for other low-code sources like source-instagram. Don't forget to initialize the uncaught exception handler like this. If your source is all defined within a manifest and a components.py file for the custom components, I would suggest you move to the Let me know if this does not solve your issue and why and we will figure it out! 💪 |
Hi @maxi297 that reply is just gold! I wasn't aware of how the run() of a connector should be properly structured. I think for now we'll just replicate Instagram's structure given that it's easier for development purposes to just run I'm closing the PR but will appreciate if you could reply to this last comment, thank you so much! |
This is mostly true but not 100% true. Everything under the hood is Python so you could create a If you have any other questions, please ping me and we will figure it out! ❤️ |
Low-code connectors instantiate the source without a state and pass the state via the read() method. This PR updates ConcurrentDeclarativeSource to support this behavior.
Changes are:
Summary by CodeRabbit
New Features
Documentation
stream_state
variable is utilized in low-code scenarios.Tests