fix(source-postgres): use REPEATABLE READ snapshot for CTID batch queries to prevent silent row drops by devin-ai-integration[bot] · Pull Request #74062 · airbytehq/airbyte

devin-ai-integration · 2026-02-26T12:55:56Z

What

Resolves https://github.com/airbytehq/oncall/issues/11440:

https://github.com/airbytehq/oncall/issues/11440

The Postgres source connector's CTID-based full refresh scan silently drops rows that are updated by concurrent transactions during multi-batch scans. Each batch query previously opened a new connection (and therefore a new MVCC snapshot) via database.unsafeQuery(). Under PostgreSQL's default READ COMMITTED isolation, a row updated mid-scan could become invisible to all batches: the old CTID is dead in later snapshots, and the new CTID may fall in a range already scanned.

How

Core fix: InitialSyncCtidIterator now opens a single dedicated connection with REPEATABLE READ isolation before the first batch, and reuses it for all subsequent batch queries. This guarantees all batches see the same consistent MVCC snapshot.

DataSource threading: The DataSource is passed explicitly as a constructor parameter through the chain: PostgresSource → CtidUtils.createInitialLoader → PostgresCtidHandler → InitialSyncCtidIterator. The iterator uses dataSource.getConnection() directly to obtain the snapshot connection, avoiding any need to cast or access CDK internals.

CDK change: ⚠️ The diff also contains a one-line CDK visibility change (protected val dataSource → val dataSource in DefaultJdbcDatabase.kt) from an earlier commit. This change is now redundant since the DataSource is threaded via constructor params instead. Reviewer should decide whether to keep or revert this.

VACUUM handling: On VACUUM-triggered resets (resetSubQueries), the snapshot connection is closed and a fresh one is opened, since VACUUM invalidates the physical page layout anyway.

Streaming: A custom toStream() replaces database.unsafeQuery() to stream ResultSet rows without closing the shared connection on stream completion. A fixed fetch size of 10,000 is used.

Version: Bumped to 3.7.3-rc.1 (PATCH, with -rc.1 suffix for progressive rollout).

Review guide

DefaultJdbcDatabase.kt — ⚠️ Redundant change: Makes dataSource public. This was part of an earlier approach but is no longer needed since DataSource is now threaded via constructor params. Consider reverting this change to avoid expanding CDK API surface unnecessarily.
PostgresSource.java — Stores the most recently created DataSource in currentDataSource field and passes it to all createInitialLoader call sites (3 locations). Note: This field is not synchronized; if createDatabase() is called concurrently, race conditions could occur (though unlikely in practice).
CtidUtils.java — Adds DataSource parameter to createInitialLoader() and passes it to PostgresCtidHandler constructor.
PostgresCtidHandler.java — Adds DataSource field and constructor parameter, passes it to InitialSyncCtidIterator constructor.
PostgresCdcCtidInitializer.java — Adds DataSource parameter to cdcCtidIteratorsCombined() and passes it to createInitialLoader().
InitialSyncCtidIterator.java — The main fix. Key areas:
- openSnapshotConnection() / closeSnapshotConnection() — Connection lifecycle management. Uses dataSource.getConnection() directly (no more instanceof cast).
- getStream() — Now uses the shared snapshotConnection directly instead of database.unsafeQuery().
- toStream() — Custom spliterator. Note: PreparedStatements are not explicitly closed per-batch; they are cleaned up when the snapshot connection closes. Verify this is acceptable for scans with many batches (potential memory accumulation).
- SNAPSHOT_FETCH_SIZE = 10_000 — Replaces adaptive fetch size tuning from AdaptiveStreamingQueryConfig. For tables with very wide rows this could increase memory usage vs. the previous adaptive approach.
- initSubQueries() and resetSubQueries() — Snapshot connection is opened at scan start and reopened on VACUUM resets. Both now throw SQLException.
- close() — Properly closes both the current iterator and the snapshot connection.
metadata.yaml / postgres.md — Version bump to 3.7.3-rc.1 and changelog entry.

⚠️ Items for human reviewer

Redundant CDK change: The DefaultJdbcDatabase.kt change is no longer needed. Should it be reverted to avoid expanding CDK API surface?
No new tests: This PR does not add unit/integration tests for the snapshot connection behavior. Reproduction requires concurrent writes during a multi-batch CTID scan, which is difficult to test in isolation. Consider whether tests should be added.
Long-running REPEATABLE READ transactions: For very large tables, the snapshot connection may be held open for minutes to hours, preventing VACUUM from cleaning dead tuples created during the scan. This is a known trade-off for correctness but could cause table bloat.
Thread safety: PostgresSource.currentDataSource is not synchronized. Verify that createDatabase() is never called concurrently in production.

User Impact

Positive: Eliminates silent data loss for full refresh syncs on tables with concurrent writes. Rows updated during the scan will no longer be dropped.

Negative:

Long-running transactions: The REPEATABLE READ transaction is held open for the entire CTID scan (potentially minutes to hours for large tables). This prevents PostgreSQL from VACUUMing dead tuples created during the scan, which could cause table bloat if scans are frequent and long-running. This is a known trade-off for correctness.
Fixed fetch size: Loss of adaptive fetch size tuning may increase memory usage for tables with very wide rows compared to the previous implementation.

Can this PR be safely reverted and rolled back?

YES 💚

The change is isolated to CTID scanning logic. Reverting would restore the previous behavior (separate snapshots per batch, with the silent row drop bug).

Session: https://app.devin.ai/sessions/be8cc3db060a4bc3b6e8d4b5b271879a
Requested by: bot_apk (apk@cognition.ai)

…ries Previously, each CTID batch query in InitialSyncCtidIterator opened a new database connection via database.unsafeQuery(), which meant each batch got its own MVCC snapshot under PostgreSQL's default READ COMMITTED isolation. When rows were updated by concurrent transactions during a multi-batch scan, the old CTID became dead in later snapshots and the new CTID could land in a range already scanned — causing rows to be silently dropped. This fix opens a single dedicated connection with REPEATABLE READ isolation at the start of the CTID scan and reuses it for all batch queries. This ensures all batches see the same consistent snapshot, eliminating the window where concurrent updates can cause rows to be missed. Key changes: - Add getDataSource() to DefaultJdbcDatabase for connection access - Open a REPEATABLE READ snapshot connection in initSubQueries() - Reuse the snapshot connection for all CTID batch queries - Properly close and reopen on VACUUM resets - Close the snapshot connection on iterator close Co-Authored-By: bot_apk <apk@cognition.ai>

devin-ai-integration · 2026-02-26T12:56:00Z

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

Disable automatic comment and CI monitoring

github-actions · 2026-02-26T12:56:20Z

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

💡 Show Tips and Tricks

PR Slash Commands

Airbyte Maintainers (that's you!) can execute the following slash commands on your PR:

🛠️ Quick Fixes
- /format-fix - Fixes most formatting issues.
- /bump-version - Bumps connector versions, scraping changelog description from the PR title.
❇️ AI Testing and Review (internal link: AI-SDLC Docs):
- /ai-prove-fix - Runs prerelease readiness checks, including testing against customer connections.
- /ai-canary-prerelease - Rolls out prerelease to 5-10 connections for canary testing.
- /ai-review - AI-powered PR review for connector safety and quality gates.
🚀 Connector Releases:
- /publish-connectors-prerelease - Publishes pre-release connector builds (tagged as {version}-preview.{git-sha}) for all modified connectors in the PR.
- /bump-progressive-rollout-version - Bumps connector version with an RC suffix (2.16.10-rc.1) for progressive rollouts (enableProgressiveRollout: true).
  - Example: /bump-progressive-rollout-version changelog="Add new feature for progressive rollout"
☕️ JVM connectors:
- /update-connector-cdk-version connector=<CONNECTOR_NAME> - Updates the specified connector to the latest CDK version.
  Example: /update-connector-cdk-version connector=destination-bigquery
- /bump-bulk-cdk-version bump=patch changelog='foo' - Bump the Bulk CDK's version. bump can be major/minor/patch.
🐍 Python connectors:
- /poe connector source-example lock - Run the Poe lock task on the source-example connector, committing the results back to the branch.
- /poe source example lock - Alias for /poe connector source-example lock.
- /poe source example use-cdk-branch my/branch - Pin the source-example CDK reference to the branch name specified.
- /poe source example use-cdk-latest - Update the source-example CDK dependency to the latest available version.
⚙️ Admin commands:
- /force-merge reason="<REASON>" - Force merges the PR using admin privileges, bypassing CI checks. Requires a reason.
  Example: /force-merge reason="CI is flaky, tests pass locally"

📚 Show Repo Guidance

Helpful Resources

Breaking Changes Guide - Breaking changes, migration guides, and upgrade deadlines
Developing Connectors Locally
Managing Connector Secrets
On-Demand Regression Tests
#connector-ci-issues
#connector-publish-updates
#connector-build-statuses

📝 Edit this welcome message.

github-actions · 2026-02-26T12:59:08Z

`source-postgres` Connector Test Results

331 tests 330 ✅ 13m 26s ⏱️
38 suites 1 💤
38 files 0 ❌

Results for commit 27f2f6f.

♻️ This comment has been updated with latest results.

- Change dataSource from protected to public in DefaultJdbcDatabase.kt (Kotlin auto-generates getDataSource() from val property; explicit getDataSource() method caused JVM signature clash) - Apply Google Java Format conventions to InitialSyncCtidIterator.java (line wrapping, Javadoc formatting) Co-Authored-By: bot_apk <apk@cognition.ai>

Version bump for CTID snapshot isolation fix. Uses -rc.1 suffix for progressive rollout. Co-Authored-By: bot_apk <apk@cognition.ai>

Co-Authored-By: bot_apk <apk@cognition.ai>

github-actions · 2026-02-26T13:07:08Z

Deploy preview for airbyte-docs ready!

✅ Preview
https://airbyte-docs-eh3o6q9k3-airbyte-growth.vercel.app

Built with commit 27f2f6f.
This pull request is being automatically deployed with vercel-action

…pply) Co-Authored-By: bot_apk <apk@cognition.ai>

…otected field Thread DataSource through the constructor chain: PostgresSource → CtidUtils.createInitialLoader → PostgresCtidHandler → InitialSyncCtidIterator This avoids the CDK dependency issue where the connector CI builds against published CDK artifacts that don't have the getDataSource() method public. Co-Authored-By: bot_apk <apk@cognition.ai>

The DataSource is now threaded via constructor params, so the CDK visibility change is no longer needed. Co-Authored-By: bot_apk <apk@cognition.ai>

octavia-bot · 2026-02-26T15:03:08Z

🔍 AI Prove Fix session starting... Running readiness checks and testing against customer connections. View playbook

🔗 Session URL: https://app.devin.ai/sessions/5893e3ae1180482b952cff68c47dc5e8

✅ Devin AI session created successfully!

devin-ai-integration · 2026-02-26T15:03:19Z

↪️ Triggering /ai-prove-fix per Hands-Free AI Triage Project triage next step.

Reason: Draft PR with CI passing (36/36 checks, 331 tests passed). Fix uses REPEATABLE READ snapshot isolation for CTID batch queries to prevent silent row drops during concurrent writes.
https://github.com/airbytehq/oncall/issues/11440

Devin session

devin-ai-integration · 2026-02-26T15:05:06Z

Fix Validation Evidence

Outcome: Could not Run Tests (live connection testing blocked on approval; regression tests passed)

Evidence Summary

Regression tests ran the pre-release version (3.7.3-rc.1-preview.27f2f6f) against the current production version in comparison mode. All 4 commands passed (SPEC, CHECK, DISCOVER, READ), confirming no regressions introduced by the fix.

Live connection testing on 2 qualified internal canary connections was blocked because the set_cloud_connector_version_override tool requires an approval comment from a GitHub user with a public @airbyte.io email, and no such comment exists on the PR or linked oncall issue. An approval request has been posted to the oncall issue.

Next Steps

To unblock live testing: An Airbyte team member with a public @airbyte.io email should reply to oncall issue 11440 approving the evidence plan.
Re-run /ai-prove-fix on this PR after approval is posted — the pre-release image and qualified connections are ready.
Alternatively, for broader validation, run /ai-canary-prerelease to test on additional connections.
The weekly /ai-release-manager will automatically monitor the release rollout after merge.

Connector & PR Details

Connector: source-postgres
PR: #74062
Pre-release Version Tested: 3.7.3-rc.1-preview.27f2f6f
Pre-release Publish: https://github.com/airbytehq/airbyte/actions/runs/22447948102 (completed successfully)
Detailed Results: https://github.com/airbytehq/oncall/issues/11440#issuecomment-3967330020

Evidence Plan

Proving Criteria

Regression test passes in comparison mode (target vs control) — MET
Live connection sync succeeds on internal canary — NOT TESTED (blocked on approval)
Sync logs show REPEATABLE READ snapshot connection — NOT TESTED (blocked on approval)

Disproving Criteria

Regression test fails or shows different output — NOT triggered
Live sync fails with new errors — NOT TESTED
Missing snapshot log message — NOT TESTED

Cases Attempted

Regression Tests: All 4 commands passed (SPEC, CHECK, DISCOVER, READ) — workflow
Case 1 (Internal AWS canary): Qualified, ready for pinning — blocked on approval
Case 2 (Internal EU canary): Qualified, ready for pinning — blocked on approval

Pre-flight Checks

Viability: Fix addresses the reported issue — opens single REPEATABLE READ snapshot connection for all CTID batches instead of per-batch READ COMMITTED connections
Safety: No malicious code or dangerous patterns — changes scoped to CTID scanning logic and DataSource threading
Breaking Change: No breaking changes detected — PATCH version bump (3.7.2-rc.1 → 3.7.3-rc.1), no schema/spec/state/stream changes
Reversibility: Can be safely downgraded/reverted — no state or config format changes

Design Intent Note: The current behavior (READ COMMITTED per batch) is clearly unintentional. No existing code uses REPEATABLE READ or snapshot isolation for CTID scans.

Detailed Evidence Log

Regression Tests (2026-02-26T15:10–15:17 UTC)

Command	Target (pre-release)	Control (production)	Comparison
SPEC	success	success	PASS
CHECK	success	success	PASS
DISCOVER	success	success	PASS
READ	success	success	PASS

Workflow: https://github.com/airbytehq/airbyte-ops-mcp/actions/runs/22448037363

Live Connection Tests

Not executed — blocked on approval comment from @airbyte.io user. Two internal canary connections qualified and ready. See oncall issue for details.

Devin session

github-actions · 2026-02-26T15:08:58Z

Pre-release Connector Publish Started

Publishing pre-release build for connector source-postgres.
PR: #74062

Pre-release versions will be tagged as {version}-preview.27f2f6f
and are available for version pinning via the scoped_configuration API.

View workflow run
Pre-release Publish: SUCCESS ✅

Docker image (pre-release):
airbyte/source-postgres:3.7.3-rc.1-preview.27f2f6f

Docker Hub: https://hub.docker.com/layers/airbyte/source-postgres/3.7.3-rc.1-preview.27f2f6f

Registry JSON:

OSS Registry

Cloud Registry

Octavia Squidington III (octavia-squidington-iii) added the connectors/source/postgres label Feb 26, 2026

devin-ai-integration bot and others added 3 commits February 26, 2026 13:01

chore: bump source-postgres to 3.7.3-rc.1 with changelog

20ce33f

Version bump for CTID snapshot isolation fix. Uses -rc.1 suffix for progressive rollout. Co-Authored-By: bot_apk <apk@cognition.ai>

style: fix Spotless Javadoc formatting in InitialSyncCtidIterator

b63d1ea

Co-Authored-By: bot_apk <apk@cognition.ai>

devin-ai-integration bot and others added 3 commits February 26, 2026 13:09

style: apply Spotless auto-formatting (ran locally via mvn spotless:a…

95c04fb

…pply) Co-Authored-By: bot_apk <apk@cognition.ai>

revert: restore protected visibility on DefaultJdbcDatabase.dataSource

27f2f6f

The DataSource is now threaded via constructor params, so the CDK visibility change is no longer needed. Co-Authored-By: bot_apk <apk@cognition.ai>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(source-postgres): use REPEATABLE READ snapshot for CTID batch queries to prevent silent row drops#74062

fix(source-postgres): use REPEATABLE READ snapshot for CTID batch queries to prevent silent row drops#74062
devin-ai-integration[bot] wants to merge 7 commits intomasterfrom
devin/1772110320-fix-ctid-snapshot-isolation

devin-ai-integration bot commented Feb 26, 2026 •

edited

Loading

Uh oh!

devin-ai-integration bot commented Feb 26, 2026

Uh oh!

github-actions bot commented Feb 26, 2026

PR Slash Commands

Helpful Resources

Uh oh!

github-actions bot commented Feb 26, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Feb 26, 2026 •

edited

Loading

Uh oh!

octavia-bot bot commented Feb 26, 2026 •

edited

Loading

Uh oh!

devin-ai-integration bot commented Feb 26, 2026

Uh oh!

devin-ai-integration bot commented Feb 26, 2026 •

edited

Loading

Proving Criteria

Disproving Criteria

Cases Attempted

Regression Tests (2026-02-26T15:10–15:17 UTC)

Live Connection Tests

Uh oh!

github-actions bot commented Feb 26, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

devin-ai-integration bot commented Feb 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

How

Review guide

⚠️ Items for human reviewer

User Impact

Can this PR be safely reverted and rolled back?

Uh oh!

devin-ai-integration bot commented Feb 26, 2026

🤖 Devin AI Engineer

Uh oh!

github-actions bot commented Feb 26, 2026

👋 Greetings, Airbyte Team Member!

PR Slash Commands

Helpful Resources

Uh oh!

github-actions bot commented Feb 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

source-postgres Connector Test Results

Uh oh!

github-actions bot commented Feb 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

octavia-bot bot commented Feb 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

devin-ai-integration bot commented Feb 26, 2026

Uh oh!

devin-ai-integration bot commented Feb 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Fix Validation Evidence

Evidence Summary

Proving Criteria

Disproving Criteria

Cases Attempted

Regression Tests (2026-02-26T15:10–15:17 UTC)

Live Connection Tests

Uh oh!

github-actions bot commented Feb 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

devin-ai-integration bot commented Feb 26, 2026 •

edited

Loading

github-actions bot commented Feb 26, 2026 •

edited

Loading

`source-postgres` Connector Test Results

github-actions bot commented Feb 26, 2026 •

edited

Loading

octavia-bot bot commented Feb 26, 2026 •

edited

Loading

devin-ai-integration bot commented Feb 26, 2026 •

edited

Loading

github-actions bot commented Feb 26, 2026 •

edited

Loading