-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Fix constant window for evaluate stateful #16430
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
FYI, @alamb @andygrove. I'm not fully familiar with testing beyond |
Thanks @suibianwanwank - I think it would be great if we could use .slt tests to write a reproducer Here are the instructions: https://github.com/apache/datafusion/tree/main/datafusion/sqllogictest Ideally you should be able to extend one of the existing test files in https://github.com/apache/datafusion/tree/main/datafusion/sqllogictest/test_files If you can't find something suitable I can try and find time to help over the next few days |
Thanks @alamb, as mentioned, the PhysicalPlan generated by default |
I tried making a reproducer but I could not reproduce the wrong results or panic reported in @andygrove 's comment #16308 (comment): Here is what I tried: Data: tenk.csv Repro create external table tenk1
(
unique1 int,
unique2 int,
two int,
four int,
ten int,
twenty int,
hundred int,
thousand int,
twothousand int,
fivethous int,
tenthous int,
odd int,
even int,
stringu1 string,
stringu2 string,
string4 string
)
stored as CSV location 'tenk.csv'
OPTIONS('has_header' 'false','format.delimiter' 9);
SELECT * from tenk1 limit 10;
SELECT COUNT(*) OVER () FROM tenk1 WHERE unique2 < 10 But that seems to work just fine: (venv) andrewlamb@Andrews-MacBook-Pro-2:~/Downloads$ datafusion-cli -f repro.sql
DataFusion CLI v48.0.0
0 row(s) fetched.
Elapsed 0.001 seconds.
+---------+---------+-----+------+-----+--------+---------+----------+-------------+-----------+----------+-----+------+----------+----------+---------+
| unique1 | unique2 | two | four | ten | twenty | hundred | thousand | twothousand | fivethous | tenthous | odd | even | stringu1 | stringu2 | string4 |
+---------+---------+-----+------+-----+--------+---------+----------+-------------+-----------+----------+-----+------+----------+----------+---------+
| 8800 | 0 | 0 | 0 | 0 | 0 | 0 | 800 | 800 | 3800 | 8800 | 0 | 1 | MAAAAA | AAAAAA | AAAAxx |
| 1891 | 1 | 1 | 3 | 1 | 11 | 91 | 891 | 1891 | 1891 | 1891 | 182 | 183 | TUAAAA | BAAAAA | HHHHxx |
| 3420 | 2 | 0 | 0 | 0 | 0 | 20 | 420 | 1420 | 3420 | 3420 | 40 | 41 | OBAAAA | CAAAAA | OOOOxx |
| 9850 | 3 | 0 | 2 | 0 | 10 | 50 | 850 | 1850 | 4850 | 9850 | 100 | 101 | WOAAAA | DAAAAA | VVVVxx |
| 7164 | 4 | 0 | 0 | 4 | 4 | 64 | 164 | 1164 | 2164 | 7164 | 128 | 129 | OPAAAA | EAAAAA | AAAAxx |
| 8009 | 5 | 1 | 1 | 9 | 9 | 9 | 9 | 9 | 3009 | 8009 | 18 | 19 | BWAAAA | FAAAAA | HHHHxx |
| 5057 | 6 | 1 | 1 | 7 | 17 | 57 | 57 | 1057 | 57 | 5057 | 114 | 115 | NMAAAA | GAAAAA | OOOOxx |
| 6701 | 7 | 1 | 1 | 1 | 1 | 1 | 701 | 701 | 1701 | 6701 | 2 | 3 | TXAAAA | HAAAAA | VVVVxx |
| 4321 | 8 | 1 | 1 | 1 | 1 | 21 | 321 | 321 | 4321 | 4321 | 42 | 43 | FKAAAA | IAAAAA | AAAAxx |
| 3043 | 9 | 1 | 3 | 3 | 3 | 43 | 43 | 1043 | 3043 | 3043 | 86 | 87 | BNAAAA | JAAAAA | HHHHxx |
+---------+---------+-----+------+-----+--------+---------+----------+-------------+-----------+----------+-----+------+----------+----------+---------+
10 row(s) fetched.
Elapsed 0.007 seconds.
+-------------------------------------------------------------------+
| count(*) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING |
+-------------------------------------------------------------------+
| 10 |
| 10 |
| 10 |
| 10 |
| 10 |
| 10 |
| 10 |
| 10 |
| 10 |
| 10 |
+-------------------------------------------------------------------+
10 row(s) fetched.
Elapsed 0.004 seconds. Notes for myself of where this came from: SELECT COUNT(*) OVER () FROM tenk1 WHERE unique2 < 10 I did some digging and found the table definition is
The data is here: https://github.com/apache/spark/blob/a38d1cef73eda8ab765dc168284b9c113c237a8e/sql/core/src/test/resources/test-data/postgresql/tenk.data |
@andygrove how can we test this with Comet? Can I just pin to a datafusion version? |
Yes, assuming that there are no breaking API changes in DataFusion since 48 ... I will take a look this evening |
Unfortunately, there are multiple breaking API changes (and some are non-trivial to resolve), so we can't easily test this PR from Comet. |
I can base this commit on 48 and test it in comet. |
apache/datafusion-comet#1913 @andygrove CI seems to have passed.🎉 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much @suibianwanwank -- this looks great to me
I took the liberty of merging up from main and adding a link to the github issue in the test but I think we'll be ready to merge after that
use datafusion_physical_plan::{common, ExecutionPlan, InputOrderMode}; | ||
use std::sync::Arc; | ||
|
||
#[tokio::test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also verified that these tests cover the code change. Without the change in this PR and they fail with
attempt to subtract with overflow
thread 'physical_optimizer::window_optimize::test::test_window_constant_aggregate' panicked at datafusion/expr/src/window_state.rs:95:13:
attempt to subtract with overflow
Thanks, LGTM! |
Yes, I confirmed that the test passes now:
Thanks, @suibianwanwank, for backporting your PR to DF 48 so that we could verify the fix in Comet! |
Thanks again for the diligence and follow up here |
Which issue does this PR close?
datafusion_expr::window_state::WindowAggState::update
#16308.Rationale for this change
I'm not very familiar with Comet, and I'm unsure how to verify this issue. So I tried checking the code.
When both
aggregate_evaluate_stateful
and the optimization from #16234 are present, it leads to inconsistent return values compared to the original behavior. (However, this doesn't seem to occur in DataFusion's default planning.)So based on this fix, I created a test, and on the main branch, this error is consistent with #16308: