Skip to content

fix: IN(subquery) data corruption in ungrouped aggregates with NOT NULL columns#5591

Merged
jussisaurio merged 2 commits intotursodatabase:mainfrom
vimalprakashts:fix/in-subquery-nullrow-corruption
Feb 27, 2026
Merged

fix: IN(subquery) data corruption in ungrouped aggregates with NOT NULL columns#5591
jussisaurio merged 2 commits intotursodatabase:mainfrom
vimalprakashts:fix/in-subquery-nullrow-corruption

Conversation

@vimalprakashts
Copy link
Contributor

Summary

  • Data corruption bug: IN(subquery) returns 1 instead of NULL in ungrouped aggregate queries where no rows match the WHERE clause, when the LHS column is declared NOT NULL
  • Wrong values are physically stored via INSERT...SELECT, corrupting persistent data
  • Root cause: the IN expression compiler skipped the IsNull check when the LHS column was NOT NULL, but NullRow (used in ungrouped aggregates with no matching rows) overrides all column values to NULL regardless of constraints
  • Fix: always emit IsNull before NotFound/Found in IN expressions

Reproducer

CREATE TABLE src(a TEXT PRIMARY KEY, val REAL NOT NULL, flag INTEGER);
INSERT INTO src VALUES (NULL, 1.5, NULL);

-- Query returns 1|NULL instead of NULL|NULL
SELECT val IN (SELECT a FROM src), AVG(1) FROM src WHERE flag >= 99;

-- Wrong value gets stored (data corruption)
CREATE TABLE dst(in_result, agg_result);
INSERT INTO dst SELECT val IN (SELECT a FROM src), AVG(1) FROM src WHERE flag >= 99;
SELECT in_result, typeof(in_result) FROM dst;
-- SQLite: NULL|null
-- Turso (before fix): 1|integer

Simulator gap

IN(subquery) expression generation is commented out in sql_generation/generation/expr.rs line 177:

// TODO: skip InSelect as still need to implement ArbitratyFrom for Select

The differential fuzzer (which does generate IN subqueries) found this bug with seed 16648648351493581373.

Test plan

  • 3 new regression tests in testing/runner/tests/in-subquery-ungrouped-aggregate.sqltest (all pass)
  • Full sqltest suite: 6896 passed, 0 failed
  • Core unit tests: 1339 passed
  • cargo clippy clean
  • Updated 3 bytecode snapshot tests

…aggregates

IN(subquery) returns 1 instead of NULL when used in an ungrouped
aggregate query where no rows pass the WHERE clause. The stale cursor
value is evaluated against the subquery instead of NULL, and wrong
data gets physically stored via INSERT...SELECT.

Reproducer: SELECT val IN (SELECT a FROM t), AVG(1) FROM t WHERE false;
Differential fuzzer seed: 16648648351493581373

Simulator gap: IN(subquery) generation is commented out in
sql_generation/generation/expr.rs line 177.
… column

The IN expression compiler skipped the IsNull check on the LHS value
when the column was declared NOT NULL. However, NullRow (used in
ungrouped aggregates when no rows match the WHERE clause) overrides
all column values to NULL regardless of the NOT NULL constraint.

This caused IN(subquery) to evaluate against stale cursor data instead
of NULL, returning 1 (true) when it should return NULL. Wrong values
were then physically stored via INSERT...SELECT.

Fix: always emit IsNull before NotFound/Found in IN expressions,
even for NOT NULL columns.

Differential fuzzer seed: 16648648351493581373
Simulator gap: IN(subquery) generation is commented out in
sql_generation/generation/expr.rs line 177.
Copy link

@turso-bot turso-bot bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please review @PThorpe92

@jussisaurio jussisaurio merged commit 21bdf09 into tursodatabase:main Feb 27, 2026
89 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants