Description
Connector Name
source-postgres
Connector Version
3.7.0
What step the error happened?
During the sync
Relevant information
Summary
The Postgres source connector's CTID-based full refresh scan silently drops rows that are updated by concurrent transactions during the scan. No error or warning is logged. The sync completes successfully with a lower-than-expected row count.
Root cause
`InitialSyncCtidIterator.getStream()` opens a new query (and therefore a new MVCC snapshot) for each CTID batch:

```java
private Stream<RowDataWithCtid> getStream(final Pair<Ctid, Ctid> p) throws SQLException {
  return database.unsafeQuery(
      connection -> getCtidStatement(connection, p.getLeft(), p.getRight()),
      sourceOperations::recordWithCtid);
}
```

In PostgreSQL, when a row is updated, the old version is marked dead and a new version is written to a different physical location (typically an earlier page found via the free space map). Because each batch uses a separate snapshot, a row updated mid-scan can be invisible to all batches.
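The race can be reproduced without a database. The following is a minimal, self-contained sketch (all names here, `Version`, `visible`, `scan`, are hypothetical, and the visibility rule is a drastic simplification of real MVCC) that models one snapshot per batch and shows the row vanishing:

```java
import java.util.ArrayList;
import java.util.List;

// Toy MVCC model: each row version lives on a physical page and carries the
// transaction id that created it (xmin) and the one that deleted it (xmax, 0 = live).
public class CtidRaceDemo {
    record Version(int page, int slot, long xmin, long xmax, int id) {}

    // Simplified snapshot visibility: created before the snapshot,
    // and not deleted before the snapshot.
    static boolean visible(Version v, long snapshotXid) {
        return v.xmin() < snapshotXid && (v.xmax() == 0 || v.xmax() >= snapshotXid);
    }

    // One CTID batch: scan a page range under a snapshot taken at batch start.
    static List<Integer> scan(List<Version> heap, int loPage, int hiPage, long snapshotXid) {
        List<Integer> out = new ArrayList<>();
        for (Version v : heap)
            if (v.page() >= loPage && v.page() <= hiPage && visible(v, snapshotXid))
                out.add(v.id());
        return out;
    }

    public static void main(String[] args) {
        List<Version> heap = new ArrayList<>();
        heap.add(new Version(7500, 3, 1, 0, 42));   // row id=42, created by xid 1

        // Batch 1: pages 0-4999, snapshot taken at xid 10. Row 42 is on page 7500.
        List<Integer> batch1 = scan(heap, 0, 4999, 10);

        // Concurrent UPDATE (xid 20): old version dies, new one lands on page 2000.
        heap.set(0, new Version(7500, 3, 1, 20, 42));
        heap.add(new Version(2000, 14, 20, 0, 42));

        // Batch 2: pages 5000-9999, snapshot taken at xid 30.
        // (7500,3) is dead under this snapshot; (2000,14) is out of range.
        List<Integer> batch2 = scan(heap, 5000, 9999, 30);

        System.out.println(batch1);  // []
        System.out.println(batch2);  // []  -> row 42 returned by neither batch

        // Reusing one early snapshot for both ranges (the proposed fix) finds it:
        List<Integer> fixed = new ArrayList<>(scan(heap, 0, 4999, 10));
        fixed.addAll(scan(heap, 5000, 9999, 10));
        System.out.println(fixed);   // [42]
    }
}
```

With a single snapshot, the old version `(7500,3)` stays visible even after the update commits, so the second range still returns the row.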
Concrete example
Consider a table orders with 10,000 pages. Airbyte splits the scan into two batches:
- Batch 1: pages 0–4999
- Batch 2: pages 5000–9999
Row id=42 currently lives at CTID (7500, 3) (page 7500, tuple slot 3).
| Time | Event | State |
|---|---|---|
| T0 | Batch 1 starts. Opens Snapshot A. Scans pages 0–4999. Row `id=42` is on page 7500, outside this range. | `(7500,3)` = live |
| T1 | Concurrent `UPDATE orders SET status='shipped' WHERE id=42`. PostgreSQL marks `(7500,3)` as dead and writes the new version to `(2000,14)` (page 2000 had free space). | `(7500,3)` = dead, `(2000,14)` = live |
| T2 | Batch 1 finishes. Connection is closed. | |
| T3 | Batch 2 starts. Opens Snapshot B. Scans pages 5000–9999. | |
Under Snapshot B (taken at T3):
- `(7500,3)` is dead (the update at T1 is committed and visible) — skipped
- `(2000,14)` is alive but on page 2000, which is outside Batch 2's range
Under Snapshot A (taken at T0):
- Batch 1 scanned page 2000, but `(2000,14)` didn't exist yet at T0 — invisible
Result: Row id=42 is returned by neither batch. It is silently dropped.
Why this happens
PostgreSQL uses MVCC (Multi-Version Concurrency Control): updates don't modify rows in place but create new physical row versions. The physical location (CTID) of the new version is unrelated to the old one. By scanning physical page ranges with different snapshots, the connector creates a window where a row's old version is dead in the later snapshot but its new version landed in a range already scanned by the earlier snapshot.
The code handles VACUUM (file node changes between batches) but does not handle concurrent row updates.
Both query strategies are affected:
- `ctidLegacyQueryPlan` (PG < 14, `WHERE ctid = ANY(ARRAY[...])`)
- `ctidQueryPlan` (PG 14+, `WHERE ctid > ? AND ctid <= ?`)
Evidence from production
- Table with ~1.1M rows, scanned in 5 batches over ~11 minutes
- Sync reported 1,125,948 records emitted, roughly 2,600 fewer than Postgres's estimate of 1,128,581 rows
- A specific row confirmed to exist in source before and after sync was absent from destination
- Table receives frequent updates (multiple per second)
- No error, warning, or indication of data loss in logs
Expected behavior
Full refresh should return every row that exists at a consistent point in time. No row should be silently dropped.
Possible fixes
Option 1: REPEATABLE READ transaction across all batches
Open a single REPEATABLE READ transaction before the first batch and run all batch queries within it. All batches see the same MVCC snapshot, eliminating the race.
Trade-off: the long-lived transaction holds back VACUUM from cleaning dead tuples for the duration of the scan. For very large tables with multi-hour scans, this could cause table bloat. For most tables (minutes-long scans) the impact is negligible; a read-only REPEATABLE READ transaction takes no row locks and never blocks writers.
Option 2: Exported snapshots (pg_export_snapshot / SET TRANSACTION SNAPSHOT)
Export a snapshot ID from the first batch's transaction and import it into each subsequent batch's transaction. Each batch can use its own connection, but they all see the same consistent snapshot. The originating session must stay connected until the last batch completes. Requires PostgreSQL 9.2+.
Option 3: Primary key scanning instead of CTID scanning
Scan by WHERE id > ? ORDER BY id LIMIT ? instead of by CTID ranges. Primary key values don't change when rows are updated, so the race condition is eliminated regardless of snapshots. Slower (index scan vs sequential scan) but inherently correct and naturally resumable.
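Why keyset pagination is immune can be shown with an in-memory sketch (the `batch` helper and the `TreeMap` standing in for the table are hypothetical, simulating `WHERE id > ? ORDER BY id LIMIT ?`): a concurrent update changes a row's value but never its primary key, so the row cannot move out of the unscanned key range.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Keyset pagination: each batch resumes from the last primary key seen.
public class KeysetScanDemo {
    // One batch of "WHERE id > afterId ORDER BY id LIMIT limit".
    static List<Integer> batch(NavigableMap<Integer, String> table, int afterId, int limit) {
        List<Integer> out = new ArrayList<>();
        for (Integer id : table.tailMap(afterId, false).keySet()) {
            out.add(id);
            if (out.size() == limit) break;
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<Integer, String> table = new TreeMap<>();
        for (int id = 1; id <= 10; id++) table.put(id, "pending");

        List<Integer> seen = new ArrayList<>(batch(table, 0, 5));  // ids 1..5

        // Concurrent update between batches: row 8 changes value,
        // but its key (and therefore its scan position) is unchanged.
        table.put(8, "shipped");

        seen.addAll(batch(table, seen.get(seen.size() - 1), 5));   // ids 6..10
        System.out.println(seen);  // all ten ids, no row lost
    }
}
```

Contrast with CTID batching, where the same update physically relocates the row and can move it behind the scan frontier.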
Option 4: Document the limitation
At minimum, add a warning that full refresh on tables with concurrent writes may silently drop rows, and recommend incremental sync modes for write-heavy tables.
Workaround
Switch affected streams to Incremental | Append + Deduped with a cursor column (e.g. updated_at). This avoids CTID scanning for ongoing syncs. Rows missed on the initial load self-heal when next updated.
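The self-healing behavior can be sketched in a few lines (the `Row` record, `sync` helper, and integer timestamps are hypothetical simplifications of a cursor-based incremental sync): each pass pulls rows whose `updated_at` moved past the watermark and upserts them by primary key, so a row missed on the initial load reappears in the destination the next time it is updated.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Simulates Incremental | Append + Deduped with an updated_at cursor.
public class IncrementalSelfHealDemo {
    record Row(int id, String status, long updatedAt) {}

    // One incremental pass: read rows with updated_at > cursor, upsert by id,
    // and advance the watermark to the highest cursor value seen.
    static long sync(Collection<Row> source, Map<Integer, Row> destination, long cursor) {
        long newCursor = cursor;
        for (Row r : source) {
            if (r.updatedAt() > cursor) {
                destination.put(r.id(), r);  // dedupe on primary key
                newCursor = Math.max(newCursor, r.updatedAt());
            }
        }
        return newCursor;
    }

    public static void main(String[] args) {
        Map<Integer, Row> source = new HashMap<>();
        source.put(41, new Row(41, "pending", 100));
        source.put(42, new Row(42, "pending", 100));

        // The initial full refresh dropped row 42 (the bug described above).
        Map<Integer, Row> destination = new HashMap<>();
        destination.put(41, source.get(41));
        long cursor = 100;

        // Row 42 is updated later; the next incremental pass heals it.
        source.put(42, new Row(42, "shipped", 200));
        cursor = sync(source.values(), destination, cursor);

        System.out.println(destination.containsKey(42));  // true
        System.out.println(cursor);                       // 200
    }
}
```

Note the limitation this illustrates: a dropped row that is never updated again stays missing until the next full resync.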
Relevant log output
```
2026-02-23 19:49:07 source | io.airbyte.integrations.source.postgres.ctid.InitialSyncCtidIterator : Querying ctid from (0,0) to (3437,291)
2026-02-23 19:51:12 source | io.airbyte.integrations.source.postgres.ctid.InitialSyncCtidIterator : Querying ctid from (3437,0) to (6873,291)
2026-02-23 19:53:30 source | io.airbyte.integrations.source.postgres.ctid.InitialSyncCtidIterator : Querying ctid from (6873,0) to (10309,291)
2026-02-23 19:55:23 source | io.airbyte.integrations.source.postgres.ctid.InitialSyncCtidIterator : Querying ctid from (10309,0) to (13746,291)
2026-02-23 19:57:39 source | io.airbyte.integrations.source.postgres.ctid.InitialSyncCtidIterator : Querying ctid from (13746,0) to (17182,291)
2026-02-23 20:00:01 source | io.airbyte.integrations.source.postgres.ctid.InitialSyncCtidIterator : Finished initial sync of stream accounts. Rows read: 1125948
```

Contribute
- Yes, I want to contribute
Internal Tracking: https://github.com/airbytehq/oncall/issues/11440