-
Notifications
You must be signed in to change notification settings - Fork 1.3k
feat(sql): ASOF and LT JOIN TOLERANCE support #5713
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
and the slave table does not contain a given symbol at all
could we improve ergonomics of this error message - it is total dead end for the user: edit by @jerrinot: done |
could we also open a PR to improve syntax highlighting? edit by @jerrinot: PR pending: questdb/sql-grammar#47 |
These two errors should be the same: and edit by @jerrinot: done |
can we say what the limit is? edit by @jerrinot: done |
asof perf is great, just the ergonomics of the SQL could be better |
also, as a nit, it would be good to throttle back the asof fuzz test, these tests take close to 1 min on my desktop. A way to do it is not to go through all combinations at all times, but rather pick a combination of parameters randomly. Right now the test randomizes the data, but not the parameter combinations. edit by @jerrinot: done |
an unrelated failure, fixed by #5758 |
grammar PR: questdb/sql-grammar#47 |
@bluestreak01: about year units: we would need to adjust for leap years. for each master row. instead of using the same constant for all rows as we do now. do you think it's worth it? it could have a performance impact too. |
@jerrinot we support months - don't we already adjust for leap years? |
@bluestreak01 TOLERANCE does not support months either. For the same reason. See supported units: questdb/core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java Lines 771 to 794 in 87117b6
I took the list of units from SAMPLE BY, but removed units where SAMPLE BY has a dedicated sampler instead of converting the period to micros. |
[PR Coverage check]😍 pass : 710 / 724 (98.07%) file detail
|
Currently, ASOF JOIN matches records from the right table with timestamps that are equal to or earlier than the timestamp in the left table. This PR addresses the feature request to limit how far back in time the join should look for a match.
This enhancement adds a
TOLERANCE
clause to the ASOF and LT JOIN syntax. TheTOLERANCE
parameter accepts a time interval value (e.g., 2s, 100ms, 1d). When specified, a record from the left table t1 at t1.ts will only be joined with a record from the right table t2 at t2.ts if t2.ts <= t1.ts AND t1.ts - t2.ts <= tolerance_value.This provides more fine-grained control over ASOF joins, particularly useful in scenarios with sparse data where a simple "equal or earlier" match might pick records that are too distant in time to be relevant.
Or without keys:
Performance impact
Specifying
TOLERANCE
can also improve performance.ASOF JOIN
execution plans often scan backward in time on the right table to find a matching entry for each left-table row.TOLERANCE
allows these scans to terminate early - once a right-table record is older than the left-table record by more than the specified tolerance - thus avoiding unnecessary processing of more distant records.💥 Breaking Change: TOLERANCE as a new keyword in JOIN
This change introduces
TOLERANCE
as a new keyword specifically within the JOIN clause. This may break existing queries whereTOLERANCE
was used as an unquoted table alias for the right-hand table in a JOIN.Example of affected query:
Reason for breakage:
After this change,
TOLERANCE
in the position above is interpreted as the new keyword. The query will fail because the parser expects an interval value (e.g., 2s) to follow theTOLERANCE
keyword, which is missing in the example.Solution:
To use
TOLERANCE
as a table alias in this context, it must now be enclosed in double-quotes, as per standard SQL for identifiers that are also keywords:Additional optimizations
This PR introduces a performance enhancement for specific keyed ASOF JOIN scenarios, particularly when the join key is of the
SYMBOL
type:The optimization works as follows: When processing a row from the left_table, if its particular
SYMBOL
key value is entirely absent in the right_table's corresponding symbol_key column (meaning no records in right_table share this key value), the improved execution plan can now detect this. By "exiting early" from the search for this non-existent key, the overall query performance can be significantly improved, especially in cases with many such missing keys.Closes #5562
Documentation PR: questdb/documentation#195