Skip to content

fix query results for predicates referencing partition columns and data columns #15935

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 5, 2025

Conversation

adriangb
Copy link
Contributor

@adriangb adriangb commented May 3, 2025

Fixes #15912

@github-actions github-actions bot added the datasource Changes to the datasource crate label May 3, 2025
fn projected_columns_prevent_pushdown() {
let table_schema = get_basic_table_schema();

fn projected_or_partition_columns_prevent_pushdown() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test actually tests for this... except that since both schemas were being passed in it hit the bug as well.

@adriangb adriangb force-pushed the fix-partiition-bug-2 branch from 3b52297 to 854c264 Compare May 3, 2025 11:09
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label May 3, 2025
@adriangb
Copy link
Contributor Author

adriangb commented May 3, 2025

cc @alamb

Copy link
Contributor

@berkaysynnada berkaysynnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix @adriangb. It's LGTM, but I have one question

@@ -464,16 +464,16 @@ impl FileFormat for ParquetFormat {
fn supports_filters_pushdown(
&self,
file_schema: &Schema,
table_schema: &Schema,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused, as we talked like
"keep supports_filters_pushdown so that TableProviders can do Exact pruning of filters, e.g. using partition columns.",
and that is making sense to me now. So, instead, should we pass only table_schema's to these supports_filters_pushdown() API's at ListingTable level? Theory and practice conflict in my mind now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead, should we pass only table_schema's to these supports_filters_pushdown() API's at ListingTable level

We could! But we'd do that in #15769 right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't change any API?
I mean can we remove file_schema: &Schema or table_schema: &Schemafrom these supports_filters_pushdown() API's? If we can, which one should we remove? In this PR, the fix shows that we utilize file_schema: &Schema, however, what I understand from that conversation is we should depend on table_schema: &Schema, not file_schema: &Schema. That's what confuses me

Copy link
Contributor Author

@adriangb adriangb May 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well keep only the table_schema. This fix (this PR) becomes irrelevant once we merge the other one. But since that may take longer I thought it was best to make this PR as a simpler fix and not mix together a bug fix and larger change. Helps others who have forks copy the change, etc.

@berkaysynnada berkaysynnada merged commit 9c1395f into apache:main May 5, 2025
27 checks passed
@berkaysynnada
Copy link
Contributor

Thanks @adriangb. Let's keep going

PokIsemaine pushed a commit to PokIsemaine/datafusion that referenced this pull request May 5, 2025
…ta columns (apache#15935)

* fix query results for predicates referencing partition columns and data columns

* fmt

* add e2e test

* newline
berkaysynnada added a commit that referenced this pull request May 6, 2025
* feat: ORDER BY ALL

* refactor: orderyby all

* refactor: order_by_to_sort_expr

* refactor: TODO comment

* fix query results for predicates referencing partition columns and data columns (#15935)

* fix query results for predicates referencing partition columns and data columns

* fmt

* add e2e test

* newline

* chore(deps): bump substrait from 0.55.0 to 0.55.1 (#15941)

Bumps [substrait](https://github.com/substrait-io/substrait-rs) from 0.55.0 to 0.55.1.
- [Release notes](https://github.com/substrait-io/substrait-rs/releases)
- [Changelog](https://github.com/substrait-io/substrait-rs/blob/main/CHANGELOG.md)
- [Commits](substrait-io/substrait-rs@v0.55.0...v0.55.1)

---
updated-dependencies:
- dependency-name: substrait
  dependency-version: 0.55.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* feat: create helpers to set the max_temp_directory_size (#15919)

* feat: create helpers to set the max_temp_directory_size

Signed-off-by: Jérémie Drouet <[email protected]>

* refactor: use helper in cli

Signed-off-by: Jérémie Drouet <[email protected]>

* refactor: update error message

Signed-off-by: Jérémie Drouet <[email protected]>

* refactor: use setter in tests

Signed-off-by: Jérémie Drouet <[email protected]>

---------

Signed-off-by: Jérémie Drouet <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>

* Fix main CI (#15942)

* Improve sqllogictest error reporting (#15905)

* refactor filter pushdown apis (#15801)

* refactor filter pushdown apis

* remove commented out code

* fix tests

* fail to fix bug

* fix

* add/fix docs

* lint

* add some docstrings, some minimal cleaup

* review suggestions

* add more comments

* fix doc links

* fmt

* add comments

* make test deterministic

* add bench

* fix bench

* register bench

* fix bench

* cargo fmt

---------

Co-authored-by: berkaysynnada <[email protected]>
Co-authored-by: Berkay Şahin <[email protected]>

---------

Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Jérémie Drouet <[email protected]>
Co-authored-by: silezhou <[email protected]>
Co-authored-by: Adrian Garcia Badaracco <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jérémie Drouet <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
Co-authored-by: xudong.w <[email protected]>
Co-authored-by: Gabriel <[email protected]>
Co-authored-by: berkaysynnada <[email protected]>
Co-authored-by: Berkay Şahin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datasource Changes to the datasource crate sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Wrong query results for filters that involve partition columns and data file columns and pushdown_filters is enabled
2 participants