feat(sql): ASOF and LT JOIN TOLERANCE support #5713

Open: wants to merge 90 commits into base: master


jerrinot
Contributor

@jerrinot jerrinot commented Jun 2, 2025

Currently, ASOF JOIN matches records from the right table with timestamps that are equal to or earlier than the timestamp in the left table. This PR addresses the feature request to limit how far back in time the join should look for a match.

This enhancement adds a TOLERANCE clause to the ASOF and LT JOIN syntax. The TOLERANCE parameter accepts a time interval value (e.g., 2s, 100ms, 1d). When specified, a record from the left table t1 at t1.ts will only be joined with a record from the right table t2 at t2.ts if t2.ts <= t1.ts AND t1.ts - t2.ts <= tolerance_value. For LT JOIN the first comparison is strict (t2.ts < t1.ts), since LT JOIN only matches strictly earlier records.

This provides more fine-grained control over ASOF joins, particularly useful in scenarios with sparse data where a simple "equal or earlier" match might pick records that are too distant in time to be relevant.
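The matching rule described above can be written as a small predicate. This is a minimal sketch, not the code from this PR: the class and method names are hypothetical, and timestamps and the tolerance are assumed to be expressed in microseconds.

```java
// Hypothetical illustration of the TOLERANCE matching rule; not QuestDB code.
final class ToleranceMatchSketch {

    // ASOF JOIN: the right-hand timestamp may be equal to or earlier than the left one,
    // but no further back than the tolerance.
    static boolean asOfMatches(long leftTs, long rightTs, long toleranceMicros) {
        return rightTs <= leftTs && leftTs - rightTs <= toleranceMicros;
    }

    // LT JOIN: same rule, except the right-hand timestamp must be strictly earlier.
    static boolean ltMatches(long leftTs, long rightTs, long toleranceMicros) {
        return rightTs < leftTs && leftTs - rightTs <= toleranceMicros;
    }
}
```

With a tolerance of 2s (2,000,000 microseconds), a right-hand row exactly 2 seconds older than the left-hand row still qualifies, while one that is a single microsecond older than that does not.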

SELECT ...
FROM table1
ASOF JOIN table2 ON <key> TOLERANCE interval_literal
[WHERE ...]

Or without keys:

SELECT ...
FROM table1
ASOF JOIN table2 TOLERANCE interval_literal
[WHERE ...]

Performance impact

Specifying TOLERANCE can also improve performance. ASOF JOIN execution plans often scan backward in time on the right table to find a matching entry for each left-table row. TOLERANCE allows these scans to terminate early: once a right-table record is older than the left-table record by more than the specified tolerance, no earlier record can match, so the scan stops and the more distant records are never processed.
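The early termination can be sketched as follows. This is an illustration under stated assumptions, not the cursor implementation in this PR: the right table is modelled as two parallel arrays sorted by timestamp, `startIdx` is assumed to point at the latest right-hand row whose timestamp is not after the left-hand row, and the `rowsVisited` counter exists only to make the saving visible.

```java
// Hypothetical illustration of a backward scan that stops at the tolerance boundary.
final class BackwardScanSketch {

    // Counts how many right-hand rows the scan touched (for illustration only).
    static int rowsVisited;

    // Returns the index of the matching right-hand row, or -1 when there is none.
    static int findMatch(long[] rightTs, int[] rightKeys, int startIdx,
                         long leftTs, int key, long toleranceMicros) {
        for (int i = startIdx; i >= 0; i--) {
            rowsVisited++;
            if (leftTs - rightTs[i] > toleranceMicros) {
                // Rows are sorted by time, so everything further back is older still.
                return -1;
            }
            if (rightKeys[i] == key) {
                return i;
            }
        }
        return -1;
    }
}
```

Without a tolerance, a key that last appeared long ago forces the scan to walk all the way back to it; with a tolerance, the scan gives up as soon as it crosses the boundary.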

💥 Breaking Change: TOLERANCE as a new keyword in JOIN

This change introduces TOLERANCE as a new keyword specifically within the JOIN clause. This may break existing queries where TOLERANCE was used as an unquoted table alias for the right-hand table in a JOIN.

Example of affected query:

SELECT [...]
FROM tab1 tab1Alias
ASOF JOIN tab2 TOLERANCE  -- Previously, 'TOLERANCE' was an alias for 'tab2'

Reason for breakage:

After this change, TOLERANCE in the position above is interpreted as the new keyword. The query will fail because the parser expects an interval value (e.g., 2s) to follow the TOLERANCE keyword, which is missing in the example.

Solution:

To use TOLERANCE as a table alias in this context, it must now be enclosed in double-quotes, as per standard SQL for identifiers that are also keywords:

SELECT [...]
FROM tab1 tab1Alias
ASOF JOIN tab2 "TOLERANCE" -- 'TOLERANCE' is now correctly treated as an alias
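The distinction between the keyword and the quoted alias can be illustrated with a tiny token check. This is a hypothetical sketch and not the logic in SqlParser or SqlKeywords: the method name is invented, and case-insensitive keyword matching is an assumption.

```java
// Hypothetical illustration: deciding whether the token after the right-hand table
// name is the TOLERANCE keyword or an identifier such as a quoted alias.
final class ToleranceTokenSketch {

    static boolean isToleranceKeyword(String token) {
        boolean quoted = token.length() >= 2 && token.startsWith("\"") && token.endsWith("\"");
        // A double-quoted token is always an identifier, never a keyword.
        return !quoted && token.equalsIgnoreCase("tolerance");
    }
}
```

Under these assumptions, `TOLERANCE` is treated as the keyword, while `"TOLERANCE"` and ordinary aliases such as `t2` are treated as identifiers.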

Additional optimizations

This PR introduces a performance enhancement for specific keyed ASOF JOIN scenarios, particularly when the join key is of the SYMBOL type:

[...]
FROM left_table
ASOF JOIN right_table ON left_table.symbol_key = right_table.symbol_key

The optimization works as follows: When processing a row from the left_table, if its particular SYMBOL key value is entirely absent in the right_table's corresponding symbol_key column (meaning no records in right_table share this key value), the improved execution plan can now detect this. By "exiting early" from the search for this non-existent key, the overall query performance can be significantly improved, especially in cases with many such missing keys.
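The idea can be sketched like this. It is an illustration only: the real change works against the tables' symbol columns (see the `*SymbolShortCircuit` classes in the coverage report), whereas here a plain `Set<String>` stands in for the set of symbol values present in the right table, and all names are hypothetical.

```java
import java.util.Set;

// Hypothetical illustration of skipping the right-table search for absent keys.
final class SymbolShortCircuitSketch {

    // Returns how many left-hand rows still need a right-table scan.
    // Rows whose key never occurs on the right are skipped without scanning.
    static int countScansNeeded(String[] leftKeys, Set<String> rightSymbols) {
        int scans = 0;
        for (String key : leftKeys) {
            if (!rightSymbols.contains(key)) {
                continue; // short-circuit: this key cannot have a match
            }
            scans++; // a real plan would search the right table here
        }
        return scans;
    }
}
```

For left-hand keys `AAPL, MSFT, ZZZ, AAPL, QQQ` and a right table that only contains `AAPL` and `MSFT`, three of the five rows need a scan; the other two are resolved by the membership check alone.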

Closes #5562
Documentation PR: questdb/documentation#195

@jerrinot jerrinot added SQL Issues or changes relating to SQL execution Performance Performance improvements Java Improvements that update Java code labels Jun 2, 2025
@jerrinot jerrinot marked this pull request as ready for review June 4, 2025 08:43
@jerrinot jerrinot marked this pull request as draft June 4, 2025 08:45
@jerrinot jerrinot assigned jerrinot and unassigned jerrinot Jun 4, 2025
@jerrinot jerrinot marked this pull request as ready for review June 4, 2025 09:50
puzpuzpuz
puzpuzpuz previously approved these changes Jun 17, 2025
@bluestreak01
Member

bluestreak01 commented Jun 17, 2025

could we improve the ergonomics of this error message? It is a total dead end for the user:

image

edit by @jerrinot: done

@bluestreak01
Member

bluestreak01 commented Jun 17, 2025

could we also open a PR to improve syntax highlighting?

edit by @jerrinot: PR pending: questdb/sql-grammar#47

@bluestreak01
Member

also this:

image

We have to move away from "unexpected token" proliferation. We should say what is expected.

@bluestreak01
Member

bluestreak01 commented Jun 17, 2025

These two errors should be the same:

image

and

image

edit by @jerrinot: done

@bluestreak01
Member

shall we include 'y' as a year unit?

image

@bluestreak01
Member

bluestreak01 commented Jun 17, 2025

can we say what the limit is?

image

edit by @jerrinot: done

@bluestreak01
Member

asof perf is great, just the ergonomics of the SQL could be better

@bluestreak01
Member

bluestreak01 commented Jun 17, 2025

also, as a nit, it would be good to throttle back the asof fuzz test, these tests take close to 1 min on my desktop. A way to do it is not to go through all combinations at all times, but rather pick a combination of parameters randomly. Right now the test randomizes the data, but not the parameter combinations.

edit by @jerrinot: done
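The suggestion above (pick one parameter combination at random per run instead of iterating over all of them) could look roughly like this. It is a sketch under assumptions: the parameter axes, the tolerance values, and all names are hypothetical, since the actual fuzz test is not shown in this thread.

```java
import java.util.Random;

// Hypothetical illustration of randomizing the parameter combination of a fuzz test.
final class AsOfFuzzParamsSketch {

    enum JoinType { ASOF, LT }

    // Assumed axis: -1 stands for "no TOLERANCE clause".
    static final long[] TOLERANCES_MICROS = {-1L, 0L, 1_000_000L, 60_000_000L};

    static final class Params {
        final JoinType joinType;
        final boolean keyed;
        final long toleranceMicros;

        Params(JoinType joinType, boolean keyed, long toleranceMicros) {
            this.joinType = joinType;
            this.keyed = keyed;
            this.toleranceMicros = toleranceMicros;
        }
    }

    // Draws a single combination; the same seed always yields the same combination.
    static Params pick(Random rnd) {
        JoinType[] types = JoinType.values();
        JoinType joinType = types[rnd.nextInt(types.length)];
        boolean keyed = rnd.nextBoolean();
        long tolerance = TOLERANCES_MICROS[rnd.nextInt(TOLERANCES_MICROS.length)];
        return new Params(joinType, keyed, tolerance);
    }
}
```

Logging the seed of each run keeps a failing combination reproducible, while repeated runs still cover the whole parameter space over time.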

@jerrinot
Contributor Author

an unrelated failure, fixed by #5758

@jerrinot
Contributor Author

grammar PR: questdb/sql-grammar#47

@jerrinot
Contributor Author

@bluestreak01: about year units: we would need to adjust for leap years for each master row, instead of using the same constant for all rows as we do now. Do you think it's worth it? It could have a performance impact too.

@bluestreak01
Member

@jerrinot we support months - don't we already adjust for leap years?

@jerrinot
Contributor Author

jerrinot commented Jun 18, 2025

@bluestreak01 TOLERANCE does not support months either, for the same reason.

See supported units:

switch (unit) {
    case 'U': // microseconds
        multiplier = 1;
        break;
    case 'T': // milliseconds
        multiplier = Timestamps.MILLI_MICROS;
        break;
    case 's': // seconds
        multiplier = Timestamps.SECOND_MICROS;
        break;
    case 'm': // minutes
        multiplier = Timestamps.MINUTE_MICROS;
        break;
    case 'h': // hours
        multiplier = Timestamps.HOUR_MICROS;
        break;
    case 'd': // days
        multiplier = Timestamps.DAY_MICROS;
        break;
    case 'w': // weeks
        multiplier = Timestamps.WEEK_MICROS;
        break;
    default:
        throw SqlException.$(tolerance.position, "unsupported TOLERANCE unit [unit=").put(unit).put(']');
}
(=no month and no year)
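For readers who want to try the unit handling in isolation, here is a self-contained sketch built around the same switch. It is not the PR's code: the class and method names are hypothetical, the constants are defined locally on the assumption that the `Timestamps.*_MICROS` values are microsecond counts, a plain `IllegalArgumentException` replaces `SqlException`, and only a single trailing unit character is handled.

```java
// Hypothetical, standalone illustration of converting an interval literal to microseconds.
final class ToleranceIntervalSketch {

    static final long MILLI_MICROS = 1_000L;
    static final long SECOND_MICROS = 1_000_000L;
    static final long MINUTE_MICROS = 60 * SECOND_MICROS;
    static final long HOUR_MICROS = 60 * MINUTE_MICROS;
    static final long DAY_MICROS = 24 * HOUR_MICROS;
    static final long WEEK_MICROS = 7 * DAY_MICROS;

    static long toMicros(String interval) {
        if (interval == null || interval.length() < 2) {
            throw new IllegalArgumentException("invalid interval: " + interval);
        }
        char unit = interval.charAt(interval.length() - 1);
        // Long.parseLong throws NumberFormatException (an IllegalArgumentException) on bad input.
        long value = Long.parseLong(interval.substring(0, interval.length() - 1));
        long multiplier;
        switch (unit) {
            case 'U': multiplier = 1; break;
            case 'T': multiplier = MILLI_MICROS; break;
            case 's': multiplier = SECOND_MICROS; break;
            case 'm': multiplier = MINUTE_MICROS; break;
            case 'h': multiplier = HOUR_MICROS; break;
            case 'd': multiplier = DAY_MICROS; break;
            case 'w': multiplier = WEEK_MICROS; break;
            default:
                throw new IllegalArgumentException("unsupported TOLERANCE unit [unit=" + unit + ']');
        }
        // Fail loudly on overflow instead of silently wrapping around.
        return Math.multiplyExact(value, multiplier);
    }
}
```

Because every supported unit maps to a fixed multiplier, the tolerance is computed once and the same constant applies to every row.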

I took the list of units from SAMPLE BY, but removed units where SAMPLE BY has a dedicated sampler instead of converting the period to micros.
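The reason months and years cannot be reduced to a single multiplier can be checked with the standard `java.time` API. This is only an illustration of the calendar arithmetic, with hypothetical class and method names; it says nothing about how SAMPLE BY's dedicated samplers are implemented.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Hypothetical illustration: the length of "one year" or "one month" depends on the start date.
final class CalendarUnitLengthSketch {

    static long daysInYearFrom(LocalDate start) {
        return ChronoUnit.DAYS.between(start, start.plusYears(1));
    }

    static long daysInMonthFrom(LocalDate start) {
        return ChronoUnit.DAYS.between(start, start.plusMonths(1));
    }
}
```

One year starting on 2023-01-01 spans 365 days, while one year starting on 2024-01-01 spans 366; one month spans anywhere from 28 to 31 days. A per-row calendar adjustment would therefore be needed, which is the cost mentioned earlier in the thread.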

@glasstiger
Contributor

[PR Coverage check]

😍 pass : 710 / 724 (98.07%)

file detail

| path | covered lines | new lines | coverage |
|---|---|---|---|
| 🔵 io/questdb/cairo/DefaultCairoConfiguration.java | 1 | 2 | 50.00% |
| 🔵 io/questdb/griffin/engine/join/FilteredAsOfJoinNoKeyFastRecordCursorFactory.java | 16 | 19 | 84.21% |
| 🔵 io/questdb/griffin/engine/join/AsOfJoinNoKeyFastRecordCursorFactory.java | 10 | 11 | 90.91% |
| 🔵 io/questdb/griffin/engine/join/LtJoinNoKeyFastRecordCursorFactory.java | 10 | 11 | 90.91% |
| 🔵 io/questdb/griffin/engine/join/AbstractAsOfJoinFastRecordCursor.java | 33 | 35 | 94.29% |
| 🔵 io/questdb/griffin/engine/join/FilteredAsOfJoinFastRecordCursorFactory.java | 111 | 116 | 95.69% |
| 🔵 io/questdb/griffin/SqlCodeGenerator.java | 128 | 129 | 99.22% |
| 🔵 io/questdb/griffin/engine/join/AsOfJoinLightNoKeyRecordCursorFactory.java | 21 | 21 | 100.00% |
| 🔵 io/questdb/griffin/engine/join/AsOfJoinRecordCursorFactory.java | 68 | 68 | 100.00% |
| 🔵 io/questdb/PropServerConfiguration.java | 4 | 4 | 100.00% |
| 🔵 io/questdb/griffin/engine/join/DisabledSymbolShortCircuit.java | 4 | 4 | 100.00% |
| 🔵 io/questdb/griffin/SqlParser.java | 22 | 22 | 100.00% |
| 🔵 io/questdb/griffin/engine/groupby/TimestampSamplerFactory.java | 17 | 17 | 100.00% |
| 🔵 io/questdb/griffin/engine/join/ChainedSymbolShortCircuit.java | 10 | 10 | 100.00% |
| 🔵 io/questdb/griffin/model/QueryModel.java | 11 | 11 | 100.00% |
| 🔵 io/questdb/griffin/BasePlanSink.java | 4 | 4 | 100.00% |
| 🔵 io/questdb/std/CompactIntHashSet.java | 33 | 33 | 100.00% |
| 🔵 io/questdb/griffin/engine/join/LtJoinNoKeyRecordCursorFactory.java | 22 | 22 | 100.00% |
| 🔵 io/questdb/griffin/engine/join/SingleVarcharSymbolShortCircuit.java | 15 | 15 | 100.00% |
| 🔵 io/questdb/PropertyKey.java | 2 | 2 | 100.00% |
| 🔵 io/questdb/griffin/engine/join/AsOfJoinFastRecordCursorFactory.java | 25 | 25 | 100.00% |
| 🔵 io/questdb/griffin/engine/join/LtJoinRecordCursorFactory.java | 70 | 70 | 100.00% |
| 🔵 io/questdb/griffin/engine/ops/CreateMatViewOperationImpl.java | 2 | 2 | 100.00% |
| 🔵 io/questdb/griffin/engine/join/SingleSymbolSymbolShortCircuit.java | 27 | 27 | 100.00% |
| 🔵 io/questdb/griffin/SqlHints.java | 3 | 3 | 100.00% |
| 🔵 io/questdb/cairo/CairoConfigurationWrapper.java | 2 | 2 | 100.00% |
| 🔵 io/questdb/griffin/SqlKeywords.java | 10 | 10 | 100.00% |
| 🔵 io/questdb/griffin/engine/join/LtJoinLightRecordCursorFactory.java | 3 | 3 | 100.00% |
| 🔵 io/questdb/griffin/engine/join/SingleStringSymbolShortCircuit.java | 11 | 11 | 100.00% |
| 🔵 io/questdb/griffin/engine/join/AsOfJoinLightRecordCursorFactory.java | 15 | 15 | 100.00% |

Labels
Enhancement Enhance existing functionality Java Improvements that update Java code Performance Performance improvements SQL Issues or changes relating to SQL execution
Development

Successfully merging this pull request may close these issues.

Add a tolerance for ASOF JOIN
4 participants