feat: Add DuckDB plugin #633
Open
andreahlert wants to merge 6 commits into flyteorg:main from
Add a DuckDB connector plugin following the same patterns as the Snowflake plugin. DuckDB is an embedded analytical database that runs queries locally and synchronously, so the connector executes queries in create() and get() always returns SUCCEEDED.
Features:
- In-memory and file-based database support
- Parameterized SQL queries with typed inputs
- Extension installation and loading (httpfs, json, etc.)
- Query results returned as pandas DataFrames via temp parquet files
- Automatic cleanup of temporary result files
Signed-off-by: André Ahlert <[email protected]>
47fa959 to 100a36c
kumare3
reviewed
Feb 8, 2026
Contributor
kumare3
left a comment
I think you got it a little wrong
Contributor
Author
Thanks for the review! You're right, I should have looked at the existing flytekit DuckDB plugin as reference instead of modeling it after Snowflake. I'll rework this to use TaskTemplate with execute() and add DataFrame input type support.
Drop the AsyncConnector pattern (connector.py, dataframe.py) which is designed for remote services. DuckDB runs locally and needs guaranteed memory isolation, so the plugin now subclasses TaskTemplate directly with an async execute() method, following the same pattern as ContainerTask.
Changes:
- Accept pd.DataFrame, pa.Table and flyte.io.DataFrame as inputs registered as virtual tables via con.register()
- Support parameterized queries with ? and $N placeholders
- Support multi-query execution (list of queries)
- Support runtime queries via 'query' string input
- Remove connector dependency and entry-point from pyproject.toml
- 26 tests covering execution, DataFrame inputs, params, multi-query, runtime queries, extensions and serialization
Signed-off-by: Andre Ahlert <[email protected]>
Signed-off-by: André Ahlert <[email protected]>
Required for DataFrame input registration and Arrow table output. Without these the CI test environment fails on import.
Signed-off-by: Andre Ahlert <[email protected]>
Signed-off-by: André Ahlert <[email protected]>
- Guard against empty query list (query=[]) which would cause
AttributeError on None.to_arrow_table()
- Use startswith("insert") instead of "insert" in query to avoid
false matches on column names like insert_date
- Add tests for both edge cases
Signed-off-by: Andre Ahlert <[email protected]>
Signed-off-by: André Ahlert <[email protected]>
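The two edge cases in this commit can be illustrated with a small, dependency-free sketch. It is not the plugin's code: the helper names (is_insert_naive, is_insert, run_queries) and the execute callable are hypothetical, and the stripping and lowercasing of the query are assumptions made for the example.

```python
from typing import Callable, Optional, Sequence


def is_insert_naive(query: str) -> bool:
    # Substring check: also matches identifiers such as insert_date.
    return "insert" in query.lower()


def is_insert(query: str) -> bool:
    # Prefix check: only matches statements that begin with INSERT.
    # Stripping whitespace and lowercasing are assumptions of this sketch.
    return query.strip().lower().startswith("insert")


def run_queries(
    queries: Sequence[str], execute: Callable[[str], object]
) -> Optional[object]:
    """Run each query in order and return the last result, or None if empty."""
    if not queries:
        # Guard: with an empty list there is no result object to convert later.
        return None
    result = None
    for q in queries:
        result = execute(q)
    return result
```

With these helpers, "SELECT insert_date FROM events" is flagged by the substring check but not by the prefix check, and run_queries([], ...) returns None instead of leaving the caller to call a method on a missing result.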
DuckDB's con.execute() always returns a result object even for DDL statements, so the None guard was unreachable. Replace with a test confirming DDL queries return an empty DataFrame.
Signed-off-by: Andre Ahlert <[email protected]>
Signed-off-by: André Ahlert <[email protected]>
kumare3
reviewed
Mar 31, 2026
…rames
Handle DataFrame inputs that have _raw_df populated (from wrap_df/from_df) by converting directly to Arrow Table instead of trying to open via URI.
Signed-off-by: André Ahlert <[email protected]>
Contributor
i'll review the PR later this week. thanks for porting it over!
Summary
Features
- DuckDBConfig(database_path=...)
Design
Unlike Snowflake/BigQuery, DuckDB runs locally with no remote service, so the connector pattern is adapted:
- create() (wrapped in run_in_executor for async compat)
- get() always returns SUCCEEDED since queries complete in create()
- delete() cleans up temporary parquet result files
Test plan
- pytest plugins/duckdb/tests/ -v
- ruff check plugins/duckdb/
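The Design notes above describe a connector whose work completes inside create(). A minimal sketch of that lifecycle, using only the standard library, might look like the following. Everything here is hypothetical: Phase, LocalQueryConnector and run_query_sync are illustration names and not the plugin's or Flyte's API, and the blocking "query" just writes its text to a temp file where the real plugin would run DuckDB and write parquet.

```python
import asyncio
import os
import tempfile
from enum import Enum


class Phase(Enum):
    # Hypothetical stand-in for the connector framework's task phases.
    SUCCEEDED = "SUCCEEDED"


def run_query_sync(query: str) -> str:
    """Blocking stand-in for a local query; returns the result file path."""
    fd, path = tempfile.mkstemp(suffix=".result")
    with os.fdopen(fd, "w") as fh:
        fh.write(query)
    return path


class LocalQueryConnector:
    """Hypothetical connector whose work finishes inside create()."""

    async def create(self, query: str) -> str:
        loop = asyncio.get_running_loop()
        # Run the blocking call in the default thread pool so the
        # event loop is not blocked while the query executes.
        return await loop.run_in_executor(None, run_query_sync, query)

    async def get(self, result_path: str) -> Phase:
        # Nothing to poll: the query already completed in create().
        return Phase.SUCCEEDED

    async def delete(self, result_path: str) -> None:
        # Remove the temporary result file if it still exists.
        if os.path.exists(result_path):
            os.remove(result_path)


async def demo() -> tuple[Phase, bool, bool]:
    connector = LocalQueryConnector()
    path = await connector.create("SELECT 42")
    existed = os.path.exists(path)
    phase = await connector.get(path)
    await connector.delete(path)
    return phase, existed, os.path.exists(path)
```

Calling asyncio.run(demo()) walks through create, get and delete in order; the result file exists after create() and is gone after delete().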