forsaken628
Collaborator

@forsaken628 forsaken628 commented Mar 28, 2025

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

This PR makes a series of refactorings and optimizations around ExprVisitor.

  1. Functions such as to_string are rewritten as cast expressions, which makes it easier to add other expression logic later.
  2. check_function now prioritizes wrapping const parameters in a cast.
  3. The Expr display can now show ids.
  4. Add the unit test test_type_check.
  5. Reduce the memory size of enum Expr from 288 bytes to 128 bytes.
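The summary does not say how the size of enum Expr was reduced. One common technique in Rust is to box large variant payloads, so that the enum stores a pointer inline instead of the whole payload. The sketch below only illustrates that general technique; `BigPayload`, `FatExpr`, `SlimExpr`, and `sizes` are hypothetical names, not types from this PR.

```rust
use std::mem::size_of;

// Hypothetical payload standing in for a large field stored inline.
#[allow(dead_code)]
pub struct BigPayload {
    data: [u64; 32], // 256 bytes
}

// An enum is at least as large as its largest variant,
// so every FatExpr value pays for the 256-byte payload.
#[allow(dead_code)]
pub enum FatExpr {
    Constant(u64),
    Call(BigPayload),
}

// Boxing the large variant moves the payload to the heap;
// only a pointer remains inline.
#[allow(dead_code)]
pub enum SlimExpr {
    Constant(u64),
    Call(Box<BigPayload>),
}

// Returns (size of FatExpr, size of SlimExpr) in bytes.
pub fn sizes() -> (usize, usize) {
    (size_of::<FatExpr>(), size_of::<SlimExpr>())
}
```

Calling `sizes()` shows the effect on a given target. The trade-off is an extra heap allocation and pointer dereference for the boxed variant.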

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):


@github-actions github-actions bot added the pr-refactor label (this PR changes the code base without new features or bugfix) Mar 28, 2025
Signed-off-by: coldWater <[email protected]>
@forsaken628 forsaken628 marked this pull request as ready for review April 2, 2025 17:27
@forsaken628 forsaken628 requested a review from sundy-li April 2, 2025 17:28
@sundy-li
Member

sundy-li commented Apr 3, 2025

So to_xxx will be checked as a CastExpr in the type checker and checked back into the to_xxx function at runtime.
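A minimal sketch of that round trip, assuming a simplified expression tree. The `Expr` enum, `cast_target`, `rewrite_to_cast`, and `cast_function_name` below are hypothetical illustrations, not Databend's actual types or functions, and the table of conversion functions covers only the four names that appear in this thread.

```rust
/// Simplified expression tree (hypothetical, not Databend's `Expr`).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Cast { expr: Box<Expr>, dest_type: String },
    FunctionCall { name: String, args: Vec<Expr> },
}

/// Maps a conversion function name to its destination type.
/// Only names seen in this thread are listed; the real set is an assumption.
fn cast_target(name: &str) -> Option<&'static str> {
    match name {
        "to_string" => Some("String"),
        "to_int64" => Some("Int64"),
        "to_uint32" => Some("UInt32"),
        "to_uint64" => Some("UInt64"),
        _ => None,
    }
}

/// Type-check step: rewrite single-argument `to_xxx(arg)` calls into casts,
/// recursing into arguments first so nested calls are rewritten too.
pub fn rewrite_to_cast(expr: Expr) -> Expr {
    match expr {
        Expr::FunctionCall { name, args } => {
            let mut args: Vec<Expr> = args.into_iter().map(rewrite_to_cast).collect();
            match (cast_target(&name), args.len()) {
                (Some(dest), 1) => Expr::Cast {
                    expr: Box::new(args.remove(0)),
                    dest_type: dest.to_string(),
                },
                _ => Expr::FunctionCall { name, args },
            }
        }
        Expr::Cast { expr, dest_type } => Expr::Cast {
            expr: Box::new(rewrite_to_cast(*expr)),
            dest_type,
        },
        other => other,
    }
}

/// Runtime step: resolve a cast back to the name of the function that runs it.
pub fn cast_function_name(dest_type: &str) -> String {
    format!("to_{}", dest_type.to_lowercase())
}
```

Under these assumptions, `to_int64(to_uint32(l_orderkey))` becomes a cast to UInt32 nested inside a cast to Int64, and `cast_function_name("UInt32")` resolves back to `to_uint32`.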

LGTM.

Did you benchmark the following SQL statements against the previous version?

select  to_string(number), to_int64(number) from numbers(1000000000) ignore_result;

select  cast(number as string), cast(number as int64) from numbers(1000000000) ignore_result;

@forsaken628
Collaborator Author

v1.2.716-nightly-979dab1299 (plan before each `--`)
--
v1.2.718-nightly-208d6feede (plan after each `--`)

explain analyze select  to_string(number), to_int64(number) from numbers(1000000000) ignore_result;

EvalScalar
├── output columns: [to_string(number) (#1), to_int64(number) (#2)]
├── expressions: [to_string(numbers.number (#0)), to_int64(numbers.number (#0))]
├── estimated rows: 1000000000.00
├── cpu time: 14.703560079s
├── output rows: 1 billion
├── output bytes: 26.90 GiB
--
EvalScalar
├── output columns: [to_string(number) (#1), to_int64(number) (#2)]
├── expressions: [CAST(numbers.number (#0) AS String), CAST(numbers.number (#0) AS Int64)]
├── estimated rows: 1000000000.00
├── cpu time: 12.791843634s
├── output rows: 1 billion
├── output bytes: 26.90 GiB

explain analyze select cast(number as string), cast(number as int64) from numbers(1000000000) ignore_result;

EvalScalar
├── output columns: [CAST(number AS STRING) (#1), CAST(number AS Int64) (#2)]
├── expressions: [to_string(numbers.number (#0)), to_int64(numbers.number (#0))]
├── estimated rows: 1000000000.00
├── cpu time: 14.755832024s
├── output rows: 1 billion
├── output bytes: 26.90 GiB
--
EvalScalar
├── output columns: [CAST(number AS STRING) (#1), CAST(number AS Int64) (#2)]
├── expressions: [CAST(numbers.number (#0) AS String), CAST(numbers.number (#0) AS Int64)]
├── estimated rows: 1000000000.00
├── cpu time: 12.869861283s
├── output rows: 1 billion
├── output bytes: 26.90 GiB

explain analyze with t as (select l_orderkey,unnest([1,2,3,4,5]) from lineitem) select to_string(l_orderkey) from t;

EvalScalar
├── output columns: [to_string(l_orderkey) (#18)]
├── expressions: [to_string(t.l_orderkey (#0))]
├── estimated rows: 18003645.00
├── cpu time: 344.298703ms
├── output rows: 30.01 million
├── output bytes: 538.40 MiB
--
EvalScalar
├── output columns: [to_string(l_orderkey) (#18)]
├── expressions: [CAST(t.l_orderkey (#0) AS String)]
├── estimated rows: 18003645.00
├── cpu time: 341.178704ms
├── output rows: 30.01 million
├── output bytes: 538.40 MiB

explain analyze with t as (select l_orderkey,unnest([1,2,3,4,5]) from lineitem) select l_orderkey::string from t;

EvalScalar
├── output columns: [l_orderkey::STRING (#18)]
├── expressions: [to_string(t.l_orderkey (#0))]
├── estimated rows: 18003645.00
├── cpu time: 345.455491ms
├── output rows: 30.01 million
├── output bytes: 538.40 MiB
--
EvalScalar
├── output columns: [l_orderkey::STRING (#18)]
├── expressions: [CAST(t.l_orderkey (#0) AS String)]
├── estimated rows: 18003645.00
├── cpu time: 343.893416ms
├── output rows: 30.01 million
├── output bytes: 538.40 MiB

explain analyze with t as (select l_orderkey,unnest([1,2,3,4,5]) from lineitem) select cast(l_orderkey as string) from t;

EvalScalar
├── output columns: [CAST(l_orderkey AS STRING) (#18)]
├── expressions: [to_string(t.l_orderkey (#0))]
├── estimated rows: 18003645.00
├── cpu time: 353.688089ms
├── output rows: 30.01 million
├── output bytes: 538.40 MiB
--
EvalScalar
├── output columns: [CAST(l_orderkey AS STRING) (#18)]
├── expressions: [CAST(t.l_orderkey (#0) AS String)]
├── estimated rows: 18003645.00
├── cpu time: 346.544552ms
├── output rows: 30.01 million
├── output bytes: 538.40 MiB

explain analyze with t as (select l_orderkey,unnest([1,2,3,4,5]) from lineitem) select cast(l_orderkey as int64 null) from t;

EvalScalar
├── output columns: [CAST(l_orderkey AS Int64 NULL) (#18)]
├── expressions: [CAST(t.l_orderkey (#0) AS Int64 NULL)]
├── estimated rows: 18003645.00
├── cpu time: 7.222148ms
├── output rows: 30.01 million
├── output bytes: 232.51 MiB
--
EvalScalar
├── output columns: [CAST(l_orderkey AS Int64 NULL) (#18)]
├── expressions: [CAST(t.l_orderkey (#0) AS Int64 NULL)]
├── estimated rows: 18003645.00
├── cpu time: 3.83956ms
├── output rows: 30.01 million
├── output bytes: 232.51 MiB

explain analyze with t as (select l_orderkey,unnest([1,2,3,4,5]) from lineitem) select cast(l_orderkey as uint64 null) from t;

EvalScalar
├── output columns: [CAST(l_orderkey AS UInt64 NULL) (#18)]
├── expressions: [CAST(t.l_orderkey (#0) AS UInt64 NULL)]
├── estimated rows: 18003645.00
├── cpu time: 153.676638ms
├── output rows: 30.01 million
├── output bytes: 232.51 MiB
--
EvalScalar
├── output columns: [CAST(l_orderkey AS UInt64 NULL) (#18)]
├── expressions: [CAST(t.l_orderkey (#0) AS UInt64 NULL)]
├── estimated rows: 18003645.00
├── cpu time: 79.140626ms
├── output rows: 30.01 million
├── output bytes: 232.51 MiB

explain analyze with t as (select l_orderkey,unnest([1,2,3,4,5]) from lineitem) select to_uint64(l_orderkey) from t;

EvalScalar
├── output columns: [to_uint64(l_orderkey) (#18)]
├── expressions: [to_uint64(t.l_orderkey (#0))]
├── estimated rows: 18003645.00
├── cpu time: 70.598677ms
├── output rows: 30.01 million
├── output bytes: 228.93 MiB
--
EvalScalar
├── output columns: [to_uint64(l_orderkey) (#18)]
├── expressions: [CAST(t.l_orderkey (#0) AS UInt64)]
├── estimated rows: 18003645.00
├── cpu time: 76.400565ms
├── output rows: 30.01 million
├── output bytes: 228.93 MiB

explain analyze with t as (select l_orderkey,unnest([1,2,3,4,5]) from lineitem) select to_int64(to_uint32(l_orderkey)) from t;

EvalScalar
├── output columns: [to_int64(to_uint32(l_orderkey)) (#18)]
├── expressions: [to_int64(to_uint32(t.l_orderkey (#0)))]
├── estimated rows: 18003645.00
├── cpu time: 84.32972ms
├── output rows: 30.01 million
├── output bytes: 228.93 MiB
--
EvalScalar
├── output columns: [to_int64(to_uint32(l_orderkey)) (#18)]
├── expressions: [CAST(CAST(t.l_orderkey (#0) AS UInt32) AS Int64)]
├── estimated rows: 18003645.00
├── cpu time: 99.113361ms
├── output rows: 30.01 million
├── output bytes: 228.93 MiB

explain analyze with t as (select l_orderkey,unnest([1,2,3,4,5]) from lineitem) select to_int64(l_orderkey::uint32 null) from t;

EvalScalar
├── output columns: [to_int64(l_orderkey::UInt32 NULL) (#18)]
├── expressions: [to_int64(CAST(t.l_orderkey (#0) AS UInt32 NULL))]
├── estimated rows: 18003645.00
├── cpu time: 163.237386ms
├── output rows: 30.01 million
├── output bytes: 232.51 MiB
--
EvalScalar
├── output columns: [to_int64(l_orderkey::UInt32 NULL) (#18)]
├── expressions: [CAST(CAST(t.l_orderkey (#0) AS UInt32 NULL) AS Int64 NULL)]
├── estimated rows: 18003645.00
├── cpu time: 91.626866ms
├── output rows: 30.01 million
├── output bytes: 232.51 MiB
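
The paired CPU times above can be compared as relative changes. The sketch below does that arithmetic for three of the pairs, with the times copied from the plans and converted to seconds. The names `relative_change`, `SAMPLES`, and `report` are hypothetical. Each figure comes from a single run, so small differences may be noise.

```rust
/// Relative change from `before` to `after`.
/// Negative values mean the newer version used less CPU time.
pub fn relative_change(before_secs: f64, after_secs: f64) -> f64 {
    (after_secs - before_secs) / before_secs
}

/// (query, cpu time before, cpu time after) in seconds, copied from the plans above.
pub const SAMPLES: [(&str, f64, f64); 3] = [
    ("to_string(number), to_int64(number)", 14.703560079, 12.791843634),
    ("to_int64(to_uint32(l_orderkey))", 0.08432972, 0.099113361),
    ("to_int64(l_orderkey::uint32 null)", 0.163237386, 0.091626866),
];

/// Formats one line per sample with the signed percentage change.
pub fn report() -> Vec<String> {
    SAMPLES
        .iter()
        .map(|(label, before, after)| {
            let pct = relative_change(*before, *after) * 100.0;
            format!("{}: {:+.1}%", label, pct)
        })
        .collect()
}
```

Printing each line of `report()` gives a quick view of which queries got faster and which got slower between the two builds.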

@sundy-li sundy-li merged commit 1f06d8a into databendlabs:main Apr 3, 2025
146 of 148 checks passed
@forsaken628 forsaken628 deleted the expr_cast branch April 4, 2025 11:40