ENH: Allow third-party packages to register IO engines #61642
doc/source/development/extending.rst
method on it with the arguments provided by the user (except the ``engine`` parameter).

To avoid conflicts in the names of engines, we keep an "IO engines" section in our
[Ecosystem page](https://pandas.pydata.org/community/ecosystem.html#io-engines).
This will need different formatting, since rst hyperlink syntax is different from md.
True, thanks for the heads up. I updated it.
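
For readers following the thread, the dispatch behavior described in the `extending.rst` excerpt (the engine's method is called with the user's arguments, minus `engine`) can be illustrated with a minimal sketch. Everything here is hypothetical: the `_ENGINES` registry, `register_engine`, `allow_engines`, and `DemoEngine` are invented for illustration and are not the names or the discovery mechanism used by the PR.

```python
import functools

# Hypothetical in-memory registry; the PR may discover engines differently.
_ENGINES = {}


def register_engine(name, engine):
    """Register an engine object under a unique name."""
    if name in _ENGINES:
        raise ValueError(f"engine {name!r} is already registered")
    _ENGINES[name] = engine


def allow_engines(func):
    """Route calls to a registered engine when ``engine`` is given."""

    @functools.wraps(func)
    def wrapper(*args, engine=None, **kwargs):
        if engine is None:
            # No engine requested: use the built-in implementation.
            return func(*args, **kwargs)
        if engine not in _ENGINES:
            raise ValueError(f"unknown engine {engine!r}")
        method = getattr(_ENGINES[engine], func.__name__, None)
        if method is None:
            raise ValueError(
                f"engine {engine!r} does not implement {func.__name__}"
            )
        # The engine receives every user argument except ``engine`` itself.
        return method(*args, **kwargs)

    return wrapper


@allow_engines
def read_iceberg(table_identifier, scan_properties=None):
    # Stand-in for the default reader; returns a tuple instead of a DataFrame.
    return ("default", table_identifier, scan_properties)


class DemoEngine:
    # Hypothetical third-party engine exposing a method with the same name.
    @staticmethod
    def read_iceberg(table_identifier, scan_properties=None):
        return ("demo", table_identifier, scan_properties)


register_engine("demo", DemoEngine)
```

In this sketch, `read_iceberg("db.t")` uses the default path and `read_iceberg("db.t", engine="demo")` is routed to `DemoEngine.read_iceberg`. Rejecting duplicate names in `register_engine` mirrors the naming-conflict concern raised in the documentation excerpt.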
@@ -52,6 +56,10 @@ def read_iceberg(
scan_properties : dict of {str: obj}, optional
Additional Table properties as a dictionary of string key value pairs to use
for this scan.
engine : str, optional
Should the `read_*` and `to_*` signatures also have an `engine_kwargs: dict[str, Any] | None` argument to allow engine-specific arguments to be passed per implementation?
Very good point. In `read_parquet` we already have `**kwargs` for engine-specific arguments. In `map`, `apply`, and similar methods it's a regular `engine_kwargs` parameter, since `**kwargs` is used in some cases for the UDF keyword arguments. I think for IO readers/writers, `**kwargs` as `read_parquet` does is fine.
I didn't want to add the engine to all connectors in this PR, to keep it simpler, but I'm planning to follow up with another PR that adds it and adds `**kwargs` to connectors where it's not there already. Happy to add both things here if you prefer; I just thought it would make reviewing simpler to keep the implementation separate from all the changes to parameters.
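
To make the distinction in this reply concrete, here is a small sketch of the two conventions. The functions `read_data` and `apply_udf` are hypothetical stand-ins, not pandas functions, and they only return what they would forward so the routing of arguments is visible.

```python
def read_data(path, engine=None, **kwargs):
    """Reader style: extra keyword arguments are forwarded to the engine."""
    # In a reader there is no other consumer for **kwargs, so the engine
    # can receive them directly.
    return {"path": path, "engine": engine, "engine_options": kwargs}


def apply_udf(values, func, engine=None, engine_kwargs=None, **kwargs):
    """UDF style: **kwargs belong to ``func``, so engine options need
    their own ``engine_kwargs`` dictionary."""
    engine_options = dict(engine_kwargs or {})
    # **kwargs are passed to the user-defined function, not to the engine.
    results = [func(value, **kwargs) for value in values]
    return results, engine_options


def add_offset(value, offset=0):
    # Example user-defined function with its own keyword argument.
    return value + offset
```

With these definitions, `read_data("t", engine="demo", snapshot_id=1)` routes `snapshot_id` to the engine, while `apply_udf([1, 2], add_offset, engine_kwargs={"parallel": True}, offset=10)` sends `offset` to the UDF and `parallel` to the engine. The option names `snapshot_id` and `parallel` are made up for the example.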
/preview
Website preview of this PR available at: https://pandas.pydata.org/preview/pandas-dev/pandas/61642/
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.

Added the new system to the Iceberg connector only, to keep this PR smaller. The idea is to add the decorator to all other connectors; happy to do it here or in a follow-up PR.