Several feature requests / issues have come up that I think can all be addressed with the groundwork being laid in #16461:
- Add hooks to `SchemaAdapter` to add custom column generators #15261
- Per file filter evaluation #15057
- Avoid re-implementing expression simplification in pruning.rs #16004
- Support Push down expression evaluation in `TableProviders` #14993
In particular, #16461 introduces a general framework for adapting an expression to a file's schema, handling any necessary casts and missing columns.
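To make the idea concrete, here is a minimal, self-contained sketch of what "adapting an expression to a file's schema" could look like. It is not the code from #16461: the `Expr`, `DataType`, and `Schema` types and the `adapt_to_file_schema` function are toy stand-ins invented for illustration, not DataFusion APIs. Columns missing from the file become typed nulls, and columns whose file type differs from the table type get wrapped in a cast.

```rust
use std::collections::HashMap;

// Toy stand-ins for illustration only; these are NOT DataFusion types.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Null(DataType),
    Cast(Box<Expr>, DataType),
    Gt(Box<Expr>, Box<Expr>),
}

// Column name -> type. A real schema also carries ordering, nullability, metadata.
type Schema = HashMap<String, DataType>;

/// Rewrites `expr` (written against the table schema) so that it can be
/// evaluated against a file with a possibly different schema.
fn adapt_to_file_schema(expr: &Expr, table: &Schema, file: &Schema) -> Result<Expr, String> {
    match expr {
        Expr::Column(name) => {
            let table_type = table
                .get(name)
                .ok_or_else(|| format!("unknown column {}", name))?;
            match file.get(name) {
                // Column missing from the file: fill in with a typed null.
                None => Ok(Expr::Null(table_type.clone())),
                // Types already agree: leave the column reference alone.
                Some(file_type) if file_type == table_type => Ok(expr.clone()),
                // Types differ: cast the file column to the table type.
                Some(_) => Ok(Expr::Cast(Box::new(expr.clone()), table_type.clone())),
            }
        }
        Expr::Cast(inner, ty) => Ok(Expr::Cast(
            Box::new(adapt_to_file_schema(inner, table, file)?),
            ty.clone(),
        )),
        Expr::Gt(left, right) => Ok(Expr::Gt(
            Box::new(adapt_to_file_schema(left, table, file)?),
            Box::new(adapt_to_file_schema(right, table, file)?),
        )),
        Expr::Literal(_) | Expr::Null(_) => Ok(expr.clone()),
    }
}
```

With a table schema of `a: Int64, b: Int64` and a file that only contains `a: Int32`, the predicate `a > b` would be rewritten to `CAST(a AS Int64) > NULL`.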
We can expand this by:
- Optimizing the expressions to minimize the cost of casts; WIP in pydantic/datafusion#31 (add cast unwraps to parquet predicate rewrites). Closes #16004.
- Other optimization passes, such as evaluating literals / nulls. Also related to #16004.
- Hook to handle missing columns (e.g. instead of filling in nulls, do something else based on Field metadata, such as a user defined default value); closes #15261.
- Hook to transform an expression before or after it is rewritten for the physical file schema; closes #15057.
- Optimization to eliminate casts altogether when two types share the same parquet physical type (change the schema the data is read with and remove the cast).
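
As a rough illustration of two of the items above, the sketch below shows one possible shape for a missing-column hook and for a cast-unwrapping pass. Everything in it is hypothetical: `MissingColumnHook`, `DefaultValues`, `unwrap_cast`, and the toy `Expr` / `DataType` types are invented for this example and are not existing DataFusion APIs. The cast unwrap only handles `CAST(col AS Int64) > literal` where the file column is `Int32` and the literal fits in an `i32`.

```rust
use std::collections::HashMap;

// Toy stand-ins for illustration only; these are NOT DataFusion types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Int64,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64, DataType),
    Null(DataType),
    Cast(Box<Expr>, DataType),
    Gt(Box<Expr>, Box<Expr>),
}

/// Hypothetical hook: decides what replaces a column missing from the file.
trait MissingColumnHook {
    fn replace(&self, name: &str, table_type: DataType) -> Expr;
}

/// Uses a configured per-column default, falling back to a typed null.
/// In practice the defaults might be read from Field metadata.
struct DefaultValues(HashMap<String, i64>);

impl MissingColumnHook for DefaultValues {
    fn replace(&self, name: &str, table_type: DataType) -> Expr {
        match self.0.get(name) {
            Some(value) => Expr::Literal(*value, table_type),
            None => Expr::Null(table_type),
        }
    }
}

/// Rewrites `CAST(col AS Int64) > lit` into `col > lit` (with an Int32
/// literal) when the file column is Int32 and `lit` fits in an i32, so the
/// cast moves from every row onto the single literal.
fn unwrap_cast(expr: Expr, file: &HashMap<String, DataType>) -> Expr {
    if let Expr::Gt(left, right) = &expr {
        if let (Expr::Cast(inner, DataType::Int64), Expr::Literal(v, DataType::Int64)) =
            (left.as_ref(), right.as_ref())
        {
            if let Expr::Column(name) = inner.as_ref() {
                let is_int32 = file.get(name) == Some(&DataType::Int32);
                if is_int32 && i32::try_from(*v).is_ok() {
                    return Expr::Gt(
                        inner.clone(),
                        Box::new(Expr::Literal(*v, DataType::Int32)),
                    );
                }
            }
        }
    }
    // Anything else is returned unchanged.
    expr
}
```

Keeping the hook as a trait rather than a fixed policy is one way to let users plug in behaviour such as generated columns; a closure or an enum of built-in strategies would be alternative designs.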