Skip to content

Sync engine push: column order mismatch after DROP COLUMN + ADD COLUMN (ABA problem) #5640

@sivukhin

Description

@sivukhin

Summary

The sync engine's push path (client → server) uses positional mapping from CDC binary records to resolve column names via pragma_table_info on the client's current schema. If ALTER TABLE ... DROP COLUMN followed by ALTER TABLE ... ADD COLUMN changes the ordinal positions of columns, old CDC records get misinterpreted — values map to the wrong columns, causing silent data corruption.

Root Cause

CDC binary records store values positionally. The DatabaseReplayGenerator (database_replay_generator.rs) maps these positions to column names using the client's current schema (table_columns_info()pragma_table_info). DROP COLUMN shifts ordinal positions of subsequent columns. If the same column name is re-added later (or any column is added), old CDC records encoded with the original ordering are replayed against the new ordering.

Relevant code path: create_row_full() in database_replay_generator.rs:

fn create_row_full(&self, info: &ReplayInfo, values: &[turso_core::Value])
    -> HashMap<String, turso_core::Value> {
    for (i, value) in values.iter().enumerate() {
        row.insert(info.column_names[i].clone(), value.clone());
    }
}

ABA Reproducer

  1. Start: users(id, name, temp, email) — client & server in sync
  2. Client inserts: (1, 'Alice', 'junk', '[email protected]') — CDC captures positional record [1, 'Alice', 'junk', '[email protected]'] where temp=pos2, email=pos3
  3. Client: ALTER TABLE users DROP COLUMN temp → schema becomes (id, name, email)
  4. Client: ALTER TABLE users ADD COLUMN temp TEXT → schema becomes (id, name, email, temp)temp is now at position 3, email at position 2 (swapped vs original)
  5. On push, replay generator reads current client schema via pragma_table_info[id, name, email, temp]
  6. Maps the old CDC record: values[2]='junk' → email, values[3]='[email protected]' → temp
  7. Silent data corruption — email and temp values are swapped

Simpler variant (no re-add)

Even a single DROP COLUMN without re-adding can cause corruption:

  1. Client and server have users(id, name, temp, email)
  2. Client inserts row — CDC captures 4-value record
  3. One side does ALTER TABLE users DROP COLUMN temp → 3 columns
  4. Positional mapping shifts: values[2] (which was temp) now maps to email
  5. Data corruption or SQL error on replay

Impact

  • Silent data corruption — values end up in wrong columns with no error
  • Affects only the push path (client → server via logical replay). Pull path (server → client) operates at the physical level (WAL frames) and is not affected.

Possible Mitigations

  • Store column names (not just positions) in CDC records
  • Fence/barrier that forces schema version agreement before replaying data operations
  • Detect schema changes (specifically column drops) and force a full re-sync
  • Reject DROP COLUMN when sync-engine is active (defensive, least ideal)

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions