fix: PushDownFilter for GROUP BY on uppercase col names #16049

Open
wants to merge 2 commits into
base: main
from

Conversation

aditanase
Contributor

Which issue does this PR close?

PushDownFilter does not push a predicate below a GROUP BY when the table has column names that are not all lowercase. I tried with and without enable_ident_normalization, with no change. The logic inside parse_identifiers_normalized does not seem to detect quotes properly, so it lowercases the column used in the GROUP BY expression.

Here's the query I used, just for illustration:

SELECT * FROM (
    SELECT
        fm.Timestamp,
        SUM(fm.ConsumedUnits)
    FROM fm
    WHERE
        fm.Timestamp BETWEEN to_timestamp('2025-04-10') AND to_timestamp('2025-04-20')
    GROUP BY fm.Timestamp
)
WHERE Timestamp = to_timestamp('2025-04-12')

Expected query plan:

Aggregate: groupBy=[[fm.Timestamp]], aggr=[[sum(fm.ConsumedUnits)]]
  TableScan: fm projection=[ConsumedUnits, Timestamp], full_filters=[fm.Timestamp = TimestampMicrosecond(1744416000000000, Some("UTC")), fm.Timestamp >= TimestampMicrosecond(1744243200000000, Some("UTC")), fm.Timestamp <= TimestampMicrosecond(1745107200000000, Some("UTC"))]

Actual query plan:

Filter: fm.Timestamp = TimestampMicrosecond(1744416000000000, Some("UTC"))
  Aggregate: groupBy=[[fm.Timestamp]], aggr=[[sum(fm.ConsumedUnits)]]
    TableScan: fm projection=[ConsumedUnits, Timestamp], full_filters=[fm.Timestamp >= TimestampMicrosecond(1744243200000000, Some("UTC")), fm.Timestamp <= TimestampMicrosecond(1745107200000000, Some("UTC"))]
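
To make the mismatch concrete, here is a minimal, self-contained sketch of the kind of check that could produce the plans above. It assumes the rule only pushes a predicate below an aggregate when every column the predicate references is found in the set of group-by columns. `Col`, `parse`, `parse_normalized` and `can_push_below_aggregate` are hypothetical stand-ins written for this illustration, not DataFusion APIs.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a qualified column reference (not DataFusion's `Column`).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Col {
    relation: Option<String>,
    name: String,
}

/// Parse "table.column" and keep the original case of each part.
fn parse(s: &str) -> Col {
    match s.split_once('.') {
        Some((rel, name)) => Col {
            relation: Some(rel.to_string()),
            name: name.to_string(),
        },
        None => Col {
            relation: None,
            name: s.to_string(),
        },
    }
}

/// Parse "table.column" after lowercasing it, modelling normalization of
/// unquoted identifiers (an assumption about the behaviour described above).
fn parse_normalized(s: &str) -> Col {
    parse(&s.to_lowercase())
}

/// Assumed pushdown condition: every predicate column is a group-by column.
fn can_push_below_aggregate(predicate_cols: &[Col], group_cols: &HashSet<Col>) -> bool {
    predicate_cols.iter().all(|c| group_cols.contains(c))
}

fn main() {
    // The outer filter references the column exactly as the schema spells it.
    let predicate = vec![parse("fm.Timestamp")];

    // Group-by columns rebuilt through a lowercasing parse no longer match.
    let lowered: HashSet<Col> = vec![parse_normalized("fm.Timestamp")].into_iter().collect();
    // Group-by columns that keep their case do match.
    let preserved: HashSet<Col> = vec![parse("fm.Timestamp")].into_iter().collect();

    println!("lowercased: {}", can_push_below_aggregate(&predicate, &lowered));
    println!("case kept:  {}", can_push_below_aggregate(&predicate, &preserved));
}
```

Because the comparison is an exact, case-sensitive set lookup, `fm.timestamp` and `fm.Timestamp` are different keys, which would leave the Filter node above the Aggregate as in the actual plan.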

An alternative fix could use expr_to_columns to extract the columns, as in the Unnest branch above:

let mut group_expr_columns: HashSet<Column> = HashSet::new();
for p in &agg.group_expr {
    expr_to_columns(p, &mut group_expr_columns)?;
}
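
For readers unfamiliar with this approach, the following is a rough, standalone model of what walking the group expressions to collect columns looks like. The `Expr` and `Column` types and the `collect_columns` function are simplified, hypothetical versions written for this sketch; the real DataFusion types and `expr_to_columns` are richer and may behave differently.

```rust
use std::collections::HashSet;

/// Simplified, hypothetical column type (case is preserved as written).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Column {
    name: String,
}

/// Simplified, hypothetical expression tree.
#[allow(dead_code)]
enum Expr {
    Column(Column),
    Literal(i64),
    BinaryOp { left: Box<Expr>, right: Box<Expr> },
}

/// Recursively insert every column referenced by `expr` into `acc`.
/// No string formatting or re-parsing happens, so names are never re-cased.
fn collect_columns(expr: &Expr, acc: &mut HashSet<Column>) {
    match expr {
        Expr::Column(c) => {
            acc.insert(c.clone());
        }
        Expr::Literal(_) => {}
        Expr::BinaryOp { left, right } => {
            collect_columns(left, acc);
            collect_columns(right, acc);
        }
    }
}

/// Convenience constructor for a column expression.
fn col(name: &str) -> Expr {
    Expr::Column(Column {
        name: name.to_string(),
    })
}

fn main() {
    let group_expr = vec![
        col("Timestamp"),
        Expr::BinaryOp {
            left: Box::new(col("ConsumedUnits")),
            right: Box::new(Expr::Literal(1)),
        },
    ];

    let mut group_expr_columns = HashSet::new();
    for e in &group_expr {
        collect_columns(e, &mut group_expr_columns);
    }
    println!("{} columns collected", group_expr_columns.len());
}
```

The point of the design is that columns are taken directly from the expression tree instead of being rendered to a name and parsed back, which is the step where case can be lost.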

Question: should we make the same change in the Window functions branch?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Yes, from a client application.

I did not add any unit tests because none of the existing tests in this module use uppercase columns. I tried to add another table/schema, but the test then failed, and I am unsure how to control the lowercasing of column names.

Are there any user-facing changes?

@github-actions github-actions bot added the optimizer Optimizer rules label May 14, 2025
@aditanase aditanase changed the title fix: PushDownFilter for GROUP BY on uppercase col names fix: PushDownFilter for GROUP BY on uppercase col names May 14, 2025
@aditanase aditanase force-pushed the fix-gby-pushdownfilter branch from 0495f24 to 14ceef8 Compare May 14, 2025 13:16
@aditanase
Contributor Author

@adriangb Saw your recent commits in this area, would appreciate it if you weighed in on this. Thank you! 🙌

@adriangb
Contributor

Hmm I've been doing a lot with the physical optimizer of the same name but haven't touched the logical optimizer. The recent changes may mean that the pushdown ends up happening regardless at the physical level but I think it's worth fixing the logical level anyway.

I don't fully understand the issue: does from_qualified_name_ignore_case do something different with quotes than from_qualified_name? I also don't see any quotes in your example.

@aditanase
Contributor Author

> I don't fully understand the issue: does from_qualified_name_ignore_case do something different with quotes than from_qualified_name? I also don't see any quotes in your example.

Sorry for not being more clear. I was referring to these lines:

.map(|id| match id.quote_style {
Some(_) => id.value,

Reading it made me think that if I used quotes I might convince it to remain unchanged, but it still converts to lowercase timestamp, no matter what I tried, so I didn't include it in the code sample.
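
As a rough illustration of the branch being discussed, the sketch below models an identifier as its text plus an optional quote character and normalizes it. The `Ident` struct and `normalize_ident` function are simplified stand-ins defined here for the example, and the unquoted branch lowercasing the value is an assumption based on the behaviour described in this thread, not a copy of the library code.

```rust
/// Simplified model of a parsed SQL identifier: its text plus the quote
/// character that wrapped it, if any.
struct Ident {
    value: String,
    quote_style: Option<char>,
}

/// Quoted identifiers keep their case; unquoted ones are lowercased
/// (assumed behaviour for this sketch).
fn normalize_ident(id: Ident) -> String {
    match id.quote_style {
        Some(_) => id.value,
        None => id.value.to_ascii_lowercase(),
    }
}

fn main() {
    let quoted = Ident {
        value: "Timestamp".to_string(),
        quote_style: Some('"'),
    };
    let bare = Ident {
        value: "Timestamp".to_string(),
        quote_style: None,
    };
    println!("{}", normalize_ident(quoted)); // prints "Timestamp"
    println!("{}", normalize_ident(bare)); // prints "timestamp"
}
```

Under this model, case survives only if the quote information is still attached to the identifier at the moment it is normalized.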

@alamb
Contributor

alamb commented May 27, 2025

Can we please get a test for this fix so we don't break it again in the future?

@alamb
Contributor

alamb commented Jun 9, 2025

Marking as draft as I think this PR is no longer waiting on feedback and I am trying to make it easier to find PRs in need of review. Please mark it as ready for review when it is ready for another look

@alamb alamb marked this pull request as draft June 9, 2025 13:56
@aditanase aditanase force-pushed the fix-gby-pushdownfilter branch from 8f13d7f to 63a09dc Compare June 20, 2025 11:58
@aditanase aditanase force-pushed the fix-gby-pushdownfilter branch from 4df8ec6 to ba36d39 Compare June 20, 2025 12:27
@aditanase
Contributor Author

@alamb Thanks for waiting, added a test that would break without this change

@aditanase aditanase marked this pull request as ready for review June 20, 2025 12:27
Member

@xudong963 xudong963 left a comment

I also suggest adding a sqllogictest based on the SQL in the PR summary.


#[test]
fn filter_agg_case_insensitive() -> Result<()> {
let table_scan = test_table_scan_with_uppercase_columns()?;
Member

@xudong963 xudong963 Jun 23, 2025

If the table also has a column named 'a', what'll happen?

Contributor Author

Great question. I just tried this and it works as expected for both the uppercase and the lowercase column, even when both are present in the schema at the same time. I added another test; lmk if we should keep it or if it's overkill.

@aditanase
Contributor Author

> I also suggest adding sqllogictest based on the sql in PR summary

I'd be happy to, can you please point me at a sample patch or a good suite to add to? Last time I tried there was quite a bit of ceremony around SLT, not sure if I can get it right on first approach. Thanks!

Labels
optimizer Optimizer rules
4 participants