refactor(query): rewrite function call expr to cast expr #17669

Merged on Apr 3, 2025 (27 commits)

forsaken628 (Collaborator) commented Mar 28, 2025

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

This PR is a series of refactorings and optimizations around ExprVisitor:

  1. Functions such as to_string are rewritten as cast expressions, which makes it easier to add other expression logic later.
  2. check_function now prioritizes wrapping constant parameters in a cast.
  3. Expr display can now optionally display ids.
  4. Adds the unit test test_type_check.
  5. Reduces the memory size of enum Expr from 288 bytes to 128 bytes.
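
The summary does not say how the size of Expr was reduced. One common Rust technique for this kind of change is boxing a large variant so that the enum stores only a pointer to it. The sketch below illustrates that technique only; `BigPayload`, `Inline`, `Boxed` and `sizes` are hypothetical names and are not taken from the Databend code base.

```rust
use std::mem::size_of;

// Hypothetical payload that is much larger than the other variants.
struct BigPayload {
    data: [u64; 32], // 256 bytes
}

// Inline layout: the enum must be at least as large as its largest variant.
enum Inline {
    Small(u64),
    Big(BigPayload),
}

// Boxed layout: the large variant holds only a heap pointer.
enum Boxed {
    Small(u64),
    Big(Box<BigPayload>),
}

// Returns (size of the inline enum, size of the boxed enum) in bytes.
fn sizes() -> (usize, usize) {
    (size_of::<Inline>(), size_of::<Boxed>())
}
```

The trade-off is an extra heap allocation and pointer hop for the boxed variant in exchange for a smaller footprint for every value of the enum.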

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):


github-actions bot added the pr-refactor label (this PR changes the code base without new features or bugfix) on Mar 28, 2025
Signed-off-by: coldWater <[email protected]>
@forsaken628 forsaken628 marked this pull request as ready for review April 2, 2025 17:27
@forsaken628 forsaken628 requested a review from sundy-li April 2, 2025 17:28
sundy-li (Member) commented Apr 3, 2025

So to_xxx will be checked as a CastExpr in the type checker and converted back into the to_xxx function at runtime.

LGTM.
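
The round trip described above can be sketched on a toy expression tree. This is a minimal illustration under stated assumptions, not Databend's implementation: the `Expr` and `DataType` enums, the name table, and the functions `rewrite_to_cast` and `lower_cast` are all hypothetical.

```rust
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    String,
    Int64,
    UInt32,
    UInt64,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    FunctionCall { name: String, args: Vec<Expr> },
    Cast { expr: Box<Expr>, dest: DataType },
}

// Hypothetical table: which function names denote a cast, and to which type.
fn cast_target(name: &str) -> Option<DataType> {
    match name {
        "to_string" => Some(DataType::String),
        "to_int64" => Some(DataType::Int64),
        "to_uint32" => Some(DataType::UInt32),
        "to_uint64" => Some(DataType::UInt64),
        _ => None,
    }
}

// Inverse of `cast_target`.
fn cast_function_name(dest: &DataType) -> &'static str {
    match dest {
        DataType::String => "to_string",
        DataType::Int64 => "to_int64",
        DataType::UInt32 => "to_uint32",
        DataType::UInt64 => "to_uint64",
    }
}

// Type-check direction: rewrite `to_xxx(arg)` into `CAST(arg AS xxx)`, recursively.
fn rewrite_to_cast(expr: Expr) -> Expr {
    match expr {
        Expr::FunctionCall { name, args } => {
            let mut args: Vec<Expr> = args.into_iter().map(rewrite_to_cast).collect();
            match cast_target(&name) {
                Some(dest) if args.len() == 1 => Expr::Cast {
                    expr: Box::new(args.remove(0)),
                    dest,
                },
                // Not a cast-like function: keep the call as it is.
                _ => Expr::FunctionCall { name, args },
            }
        }
        Expr::Cast { expr: inner, dest } => Expr::Cast {
            expr: Box::new(rewrite_to_cast(*inner)),
            dest,
        },
        other => other,
    }
}

// Runtime direction: lower every cast back into a call to the matching function.
fn lower_cast(expr: Expr) -> Expr {
    match expr {
        Expr::Cast { expr: inner, dest } => Expr::FunctionCall {
            name: cast_function_name(&dest).to_string(),
            args: vec![lower_cast(*inner)],
        },
        Expr::FunctionCall { name, args } => Expr::FunctionCall {
            name,
            args: args.into_iter().map(lower_cast).collect(),
        },
        other => other,
    }
}
```

In this toy model, `to_int64(to_uint32(col))` becomes a nested cast after `rewrite_to_cast`, and `lower_cast` restores the original function calls.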

Have you compared the performance of the following SQL statements against the previous version?

select  to_string(number), to_int64(number) from numbers(1000000000) ignore_result;

select  cast(number as string), cast(number as int64) from numbers(1000000000) ignore_result;

forsaken628 (Collaborator, Author) commented

In each pair below, the output above the -- separator is from v1.2.716-nightly-979dab1299 and the output below it is from v1.2.718-nightly-208d6feede.

explain analyze select  to_string(number), to_int64(number) from numbers(1000000000) ignore_result;

EvalScalar
├── output columns: [to_string(number) (#1), to_int64(number) (#2)]
├── expressions: [to_string(numbers.number (#0)), to_int64(numbers.number (#0))]
├── estimated rows: 1000000000.00
├── cpu time: 14.703560079s
├── output rows: 1 billion
├── output bytes: 26.90 GiB
--
EvalScalar
├── output columns: [to_string(number) (#1), to_int64(number) (#2)]
├── expressions: [CAST(numbers.number (#0) AS String), CAST(numbers.number (#0) AS Int64)]
├── estimated rows: 1000000000.00
├── cpu time: 12.791843634s
├── output rows: 1 billion
├── output bytes: 26.90 GiB

explain analyze select cast(number as string), cast(number as int64) from numbers(1000000000) ignore_result;

EvalScalar
├── output columns: [CAST(number AS STRING) (#1), CAST(number AS Int64) (#2)]
├── expressions: [to_string(numbers.number (#0)), to_int64(numbers.number (#0))]
├── estimated rows: 1000000000.00
├── cpu time: 14.755832024s
├── output rows: 1 billion
├── output bytes: 26.90 GiB
--
EvalScalar
├── output columns: [CAST(number AS STRING) (#1), CAST(number AS Int64) (#2)]
├── expressions: [CAST(numbers.number (#0) AS String), CAST(numbers.number (#0) AS Int64)]
├── estimated rows: 1000000000.00
├── cpu time: 12.869861283s
├── output rows: 1 billion
├── output bytes: 26.90 GiB

explain analyze with t as (select l_orderkey,unnest([1,2,3,4,5]) from lineitem) select to_string(l_orderkey) from t;

EvalScalar
├── output columns: [to_string(l_orderkey) (#18)]
├── expressions: [to_string(t.l_orderkey (#0))]
├── estimated rows: 18003645.00
├── cpu time: 344.298703ms
├── output rows: 30.01 million
├── output bytes: 538.40 MiB
--
EvalScalar
├── output columns: [to_string(l_orderkey) (#18)]
├── expressions: [CAST(t.l_orderkey (#0) AS String)]
├── estimated rows: 18003645.00
├── cpu time: 341.178704ms
├── output rows: 30.01 million
├── output bytes: 538.40 MiB

explain analyze with t as (select l_orderkey,unnest([1,2,3,4,5]) from lineitem) select l_orderkey::string from t;

EvalScalar
├── output columns: [l_orderkey::STRING (#18)]
├── expressions: [to_string(t.l_orderkey (#0))]
├── estimated rows: 18003645.00
├── cpu time: 345.455491ms
├── output rows: 30.01 million
├── output bytes: 538.40 MiB
--
EvalScalar
├── output columns: [l_orderkey::STRING (#18)]
├── expressions: [CAST(t.l_orderkey (#0) AS String)]
├── estimated rows: 18003645.00
├── cpu time: 343.893416ms
├── output rows: 30.01 million
├── output bytes: 538.40 MiB

explain analyze with t as (select l_orderkey,unnest([1,2,3,4,5]) from lineitem) select cast(l_orderkey as string) from t;

EvalScalar
├── output columns: [CAST(l_orderkey AS STRING) (#18)]
├── expressions: [to_string(t.l_orderkey (#0))]
├── estimated rows: 18003645.00
├── cpu time: 353.688089ms
├── output rows: 30.01 million
├── output bytes: 538.40 MiB
--
EvalScalar
├── output columns: [CAST(l_orderkey AS STRING) (#18)]
├── expressions: [CAST(t.l_orderkey (#0) AS String)]
├── estimated rows: 18003645.00
├── cpu time: 346.544552ms
├── output rows: 30.01 million
├── output bytes: 538.40 MiB

explain analyze with t as (select l_orderkey,unnest([1,2,3,4,5]) from lineitem) select cast(l_orderkey as int64 null) from t;

EvalScalar
├── output columns: [CAST(l_orderkey AS Int64 NULL) (#18)]
├── expressions: [CAST(t.l_orderkey (#0) AS Int64 NULL)]
├── estimated rows: 18003645.00
├── cpu time: 7.222148ms
├── output rows: 30.01 million
├── output bytes: 232.51 MiB
--
EvalScalar
├── output columns: [CAST(l_orderkey AS Int64 NULL) (#18)]
├── expressions: [CAST(t.l_orderkey (#0) AS Int64 NULL)]
├── estimated rows: 18003645.00
├── cpu time: 3.83956ms
├── output rows: 30.01 million
├── output bytes: 232.51 MiB

explain analyze with t as (select l_orderkey,unnest([1,2,3,4,5]) from lineitem) select cast(l_orderkey as uint64 null) from t;

EvalScalar
├── output columns: [CAST(l_orderkey AS UInt64 NULL) (#18)]
├── expressions: [CAST(t.l_orderkey (#0) AS UInt64 NULL)]
├── estimated rows: 18003645.00
├── cpu time: 153.676638ms
├── output rows: 30.01 million
├── output bytes: 232.51 MiB
--
EvalScalar
├── output columns: [CAST(l_orderkey AS UInt64 NULL) (#18)]
├── expressions: [CAST(t.l_orderkey (#0) AS UInt64 NULL)]
├── estimated rows: 18003645.00
├── cpu time: 79.140626ms
├── output rows: 30.01 million
├── output bytes: 232.51 MiB

explain analyze with t as (select l_orderkey,unnest([1,2,3,4,5]) from lineitem) select to_uint64(l_orderkey) from t;

EvalScalar
├── output columns: [to_uint64(l_orderkey) (#18)]
├── expressions: [to_uint64(t.l_orderkey (#0))]
├── estimated rows: 18003645.00
├── cpu time: 70.598677ms
├── output rows: 30.01 million
├── output bytes: 228.93 MiB
--
EvalScalar
├── output columns: [to_uint64(l_orderkey) (#18)]
├── expressions: [CAST(t.l_orderkey (#0) AS UInt64)]
├── estimated rows: 18003645.00
├── cpu time: 76.400565ms
├── output rows: 30.01 million
├── output bytes: 228.93 MiB

explain analyze with t as (select l_orderkey,unnest([1,2,3,4,5]) from lineitem) select to_int64(to_uint32(l_orderkey)) from t;

EvalScalar
├── output columns: [to_int64(to_uint32(l_orderkey)) (#18)]
├── expressions: [to_int64(to_uint32(t.l_orderkey (#0)))]
├── estimated rows: 18003645.00
├── cpu time: 84.32972ms
├── output rows: 30.01 million
├── output bytes: 228.93 MiB
--
EvalScalar
├── output columns: [to_int64(to_uint32(l_orderkey)) (#18)]
├── expressions: [CAST(CAST(t.l_orderkey (#0) AS UInt32) AS Int64)]
├── estimated rows: 18003645.00
├── cpu time: 99.113361ms
├── output rows: 30.01 million
├── output bytes: 228.93 MiB

explain analyze with t as (select l_orderkey,unnest([1,2,3,4,5]) from lineitem) select to_int64(l_orderkey::uint32 null) from t;

EvalScalar
├── output columns: [to_int64(l_orderkey::UInt32 NULL) (#18)]
├── expressions: [to_int64(CAST(t.l_orderkey (#0) AS UInt32 NULL))]
├── estimated rows: 18003645.00
├── cpu time: 163.237386ms
├── output rows: 30.01 million
├── output bytes: 232.51 MiB
--
EvalScalar
├── output columns: [to_int64(l_orderkey::UInt32 NULL) (#18)]
├── expressions: [CAST(CAST(t.l_orderkey (#0) AS UInt32 NULL) AS Int64 NULL)]
├── estimated rows: 18003645.00
├── cpu time: 91.626866ms
├── output rows: 30.01 million
├── output bytes: 232.51 MiB
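
To make the pairs above easier to compare, the relative change in cpu time can be computed directly from the reported numbers. The helper below is a small sketch with hypothetical names (`relative_change_percent`, `SAMPLES`, `report`); the three samples are copied from the output above, converted to seconds. Each value comes from a single run, so small differences may be noise.

```rust
// Relative change in CPU time in percent; negative means the newer build used less.
fn relative_change_percent(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

// (query, cpu time before in seconds, cpu time after in seconds),
// copied from the EXPLAIN ANALYZE output above.
const SAMPLES: [(&str, f64, f64); 3] = [
    ("to_string(number), to_int64(number)", 14.703560079, 12.791843634),
    ("to_uint64(l_orderkey)", 0.070598677, 0.076400565),
    ("to_int64(to_uint32(l_orderkey))", 0.08432972, 0.099113361),
];

// Formats one line per sample, with the change rounded to one decimal place.
fn report() -> Vec<String> {
    SAMPLES
        .iter()
        .map(|(label, before, after)| {
            format!("{label}: {:+.1}%", relative_change_percent(*before, *after))
        })
        .collect()
}
```

For the first pair this works out to roughly 13% less cpu time on the newer build, while the two to_uint64 and to_int64(to_uint32(...)) pairs are somewhat slower in these runs.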

@sundy-li sundy-li merged commit 1f06d8a into databendlabs:main Apr 3, 2025
146 of 148 checks passed
@forsaken628 forsaken628 deleted the expr_cast branch April 4, 2025 11:40