Merged
2 changes: 1 addition & 1 deletion Cargo.lock

8 changes: 6 additions & 2 deletions crates/codegraph-core/src/insert_nodes.rs
@@ -122,6 +122,9 @@ pub(crate) fn do_insert_nodes(
// Definitions
for def in &batch.definitions {
let scope: Option<&str> = def.name.rfind('.').map(|i| &def.name[..i]);
// .as_deref() converts Option<String> → Option<&str> so rusqlite
// serialises None as SQL NULL unambiguously (#709).
let vis = def.visibility.as_deref();
stmt.execute(params![
&def.name,
&def.kind,
@@ -131,7 +134,7 @@ pub(crate) fn do_insert_nodes(
None::<i64>,
&def.name,
scope,
&def.visibility
vis
])?;
}

@@ -198,6 +201,7 @@ pub(crate) fn do_insert_nodes(

for child in &def.children {
let qname = format!("{}.{}", def.name, child.name);
let child_vis = child.visibility.as_deref();
child_stmt.execute(params![
&child.name,
&child.kind,
@@ -207,7 +211,7 @@ pub(crate) fn do_insert_nodes(
def_id,
&qname,
&def.name,
&child.visibility
child_vis
])?;
}
}
52 changes: 42 additions & 10 deletions src/domain/graph/builder/stages/insert-nodes.ts
@@ -11,10 +11,12 @@
import path from 'node:path';
import { performance } from 'node:perf_hooks';
import { bulkNodeIdsByFile } from '../../../../db/index.js';
import { loadNative } from '../../../../infrastructure/native.js';
import type {
BetterSqlite3Database,
ExtractorOutput,
MetadataUpdate,
NativeDatabase,
SqliteStatement,
} from '../../../../types.js';
import type { PipelineContext } from '../context.js';
@@ -39,15 +41,28 @@ interface PrecomputedFileData {
// ── Native fast-path ─────────────────────────────────────────────────

function tryNativeInsert(ctx: PipelineContext): boolean {
// Disabled: bulkInsertNodes corrupts the DB when both the JS (better-sqlite3)
// and Rust (rusqlite) connections are open to the same WAL-mode file.
// The native path was never operational before — it always crashed on null
// visibility serialisation. See #696 for the dual-connection fix.
if (ctx.db) return false;

// Use NativeDatabase persistent connection (Phase 6.15+).
// Standalone napi functions were removed in 6.17 — falls through to JS if nativeDb unavailable.
if (!ctx.nativeDb?.bulkInsertNodes) return false;
// Open a temporary native connection for the insert. The pipeline closes
// ctx.nativeDb before pipeline stages to prevent dual-connection WAL
// corruption (#696). We open our own connection and coordinate with
// suspendJsDb / resumeJsDb WAL checkpoints (#709, same pattern as the
// analysis stages in features/).
Contributor

P2 Comment references helpers that aren't used

The comment says the function "coordinate[s] with suspendJsDb / resumeJsDb WAL checkpoints… same pattern as the analysis stages." But those callbacks are not called here — the equivalent logic is inlined directly. Readers who then look at dataflow.ts or ast.ts will find suspendJsDb?.() / resumeJsDb?.() called on engineOpts, which is a different mechanism.

The inline approach is fine, but the comment overstates the similarity and may mislead future authors into thinking they can safely call ctx.engineOpts.suspendJsDb from here (which would not make sense since that callback uses ctx.nativeDb, already closed at this point).

Suggested change
- // Open a temporary native connection for the insert. The pipeline closes
- // ctx.nativeDb before pipeline stages to prevent dual-connection WAL
- // corruption (#696). We open our own connection and coordinate with
- // suspendJsDb / resumeJsDb WAL checkpoints (#709, same pattern as the
- // analysis stages in features/).
+ // Open a temporary native connection for the insert. The pipeline closes
+ // ctx.nativeDb before pipeline stages to prevent dual-connection WAL
+ // corruption (#696). We open our own connection and issue WAL checkpoints
+ // manually (flush JS WAL before open, flush native WAL before close) to
+ // coordinate safely (#709).

Contributor Author

Fixed in ec903cc. Updated the comment to accurately describe what happens — the logic is inlined, not delegated to suspendJsDb/resumeJsDb:

// Open a temporary native connection for the insert.  The pipeline closes
// ctx.nativeDb before pipeline stages to prevent dual-connection WAL
// corruption (#696).  We open our own connection and issue WAL checkpoints
// manually (flush JS WAL before open, flush native WAL before close) to
// coordinate safely (#709).

This matches the suggested wording and avoids misleading future contributors into thinking suspendJsDb/resumeJsDb are relevant here.

const native = loadNative();
if (!native?.NativeDatabase) return false;

let nativeDb: NativeDatabase | undefined;
try {
nativeDb = native.NativeDatabase.openReadWrite(ctx.dbPath);
} catch {
return false;
}
if (!nativeDb?.bulkInsertNodes) {
try {
nativeDb?.close();
} catch {
/* ignore */
}
return false;
}

const { allSymbols, filesToParse, metadataUpdates, rootDir, removed } = ctx;

@@ -144,7 +159,24 @@ function tryNativeInsert(ctx: PipelineContext): boolean {
fileHashes.push({ file: item.relPath, hash: item.hash, mtime, size });
}

return ctx.nativeDb!.bulkInsertNodes(batches, fileHashes, removed);
// WAL checkpoint dance: flush JS WAL so the native connection starts clean,
// do the write, then flush the native WAL so better-sqlite3 sees the changes.
try {
ctx.db.pragma('wal_checkpoint(TRUNCATE)');
Contributor

P1 WAL checkpoint result silently ignored

ctx.db.pragma('wal_checkpoint(TRUNCATE)') returns { busy, log, checkpointed } and busy > 0 means the WAL could not be fully flushed because an active reader holds a shared lock. The code proceeds to open the rusqlite connection and write to the WAL regardless. If better-sqlite3's WAL frames haven't been moved to the main DB file, both SQLite instances end up appending frames to the same WAL in the same run — which is the exact scenario the checkpoint dance is designed to prevent.

The same unconditional call exists in pipeline.ts's suspendJsDb, so this pattern is established, but there it operates against a persistent connection that is about to be closed. Here the better-sqlite3 connection stays open the whole time. Consider at minimum logging a warning when busy > 0, or verifying the result and falling back to JS when the checkpoint is incomplete:

const cpResult = ctx.db.pragma('wal_checkpoint(TRUNCATE)') as { busy: number }[] | undefined;
if (cpResult?.[0]?.busy) {
  // WAL not fully flushed — fall back to JS to avoid mixed-writer WAL state
  return false;
}

Contributor Author

Fixed in ec903cc. The pre-write checkpoint result is now inspected:

const cpResult = ctx.db.pragma('wal_checkpoint(TRUNCATE)') as { busy: number }[] | undefined;
if (cpResult?.[0]?.busy) {
  // WAL not fully flushed — fall back to JS to avoid mixed-writer WAL state
  return false;
}

When busy > 0, tryNativeInsert returns false and the JS fallback handles the insert instead. This prevents proceeding with a partially-flushed WAL.
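
For readers who want to exercise the busy check in isolation, here is a minimal sketch of the same idea as a standalone helper. The names `CheckpointRow`, `PragmaCapable`, and `checkpointFullyFlushed` are hypothetical and do not appear in this PR; the row shape follows the `{ busy, log, checkpointed }` result described in the review comment above. Unlike the snippet in the PR, this sketch also treats a missing result as "not flushed", which is a more conservative assumption rather than the merged behaviour.

```typescript
// Hypothetical structural types for the sketch; not the project's real types.
interface CheckpointRow {
  busy: number;
  log: number;
  checkpointed: number;
}

interface PragmaCapable {
  // Mirrors the shape of a pragma() call that returns rows.
  pragma(source: string): unknown;
}

/**
 * Runs a TRUNCATE checkpoint and reports whether the WAL was fully flushed.
 * Returns false when the checkpoint reports busy, or when no row came back
 * (assumption: an unknown state should not be treated as safe).
 */
function checkpointFullyFlushed(db: PragmaCapable): boolean {
  const rows = db.pragma('wal_checkpoint(TRUNCATE)') as CheckpointRow[] | undefined;
  const first = rows?.[0];
  if (!first) return false;
  return first.busy === 0;
}
```

A caller in the position of `tryNativeInsert` could then return `false` whenever the helper does, leaving the insert to the JS fallback.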

const ok = nativeDb!.bulkInsertNodes(batches, fileHashes, removed);
try {
nativeDb!.exec('PRAGMA wal_checkpoint(TRUNCATE)');
} catch {
/* ignore — checkpoint failure is non-fatal */
}
return ok;
Contributor

P2 Native WAL checkpoint issued before nativeDb.close()

The post-write native checkpoint (nativeDb!.exec('PRAGMA wal_checkpoint(TRUNCATE)')) runs while nativeDb is still open, which is correct. However, SQLite's TRUNCATE mode only truncates WAL pages that no reader is holding; with ctx.db (better-sqlite3) open as an active connection, the TRUNCATE may leave WAL frames in place even after close.

A more defensive approach would be to issue the post-write checkpoint through ctx.db after nativeDb is fully closed (so rusqlite holds no locks during the checkpoint), ensuring better-sqlite3 can acquire the necessary exclusive WAL lock:

// In finally, after nativeDb.close():
try { ctx.db.pragma('wal_checkpoint(TRUNCATE)'); } catch { /* ignore */ }

This would guarantee that the native WAL frames are visible to better-sqlite3 before the next stage reads from ctx.db.

Contributor Author

Fixed in ec903cc. The post-write checkpoint is now issued through ctx.db after nativeDb.close() in the finally block:

} finally {
  try {
    nativeDb!.close();
  } catch {
    /* ignore close errors */
  }
  // Post-write checkpoint through ctx.db *after* nativeDb is closed, so
  // rusqlite holds no locks and better-sqlite3 can acquire the exclusive
  // WAL lock needed for TRUNCATE (#709).
  try {
    ctx.db.pragma('wal_checkpoint(TRUNCATE)');
  } catch {
    /* ignore — checkpoint failure is non-fatal */
  }
}

This ensures rusqlite holds no locks during the TRUNCATE, so better-sqlite3 can acquire the exclusive WAL lock and fully compact the WAL frames into the main DB file.

Contributor Author

Follow-up fix in 32358c0: After further analysis of the CI SQLITE_CORRUPT failures on Linux/macOS, I identified the root cause more precisely.

The post-write checkpoint must go through nativeDb (rusqlite), NOT through ctx.db (better-sqlite3). The key insight is that better-sqlite3 must never apply WAL frames written by rusqlite — these are different SQLite libraries with potentially different WAL implementations. This is exactly what the resumeJsDb callback does in the analysis stages: it checkpoints through ctx.nativeDb (rusqlite), not ctx.db.

The corrected sequence:

  1. Pre-write: ctx.db.pragma('wal_checkpoint(TRUNCATE)') — flush JS WAL
  2. Check busy > 0 — fall back to JS if WAL not fully flushed
  3. Native write via nativeDb.bulkInsertNodes(...)
  4. Post-write: nativeDb.exec('PRAGMA wal_checkpoint(TRUNCATE)') — rusqlite flushes its own WAL frames
  5. nativeDb.close()

This matches the established suspend/resume pattern and should resolve the SQLITE_CORRUPT errors.

Contributor Author

Follow-up fix in 38014fb: The earlier fix (32358c0) correctly checkpointed through rusqlite before close, but missed a crucial detail: better-sqlite3 keeps a cached SHM/WAL mapping that doesn't automatically refresh when another SQLite library modifies the WAL.

The final WAL coordination sequence is now:

  1. Pre-write: ctx.db.pragma('wal_checkpoint(TRUNCATE)') — flush JS WAL
  2. Busy check — fall back to JS if WAL not fully flushed
  3. Native write via nativeDb.bulkInsertNodes(...)
  4. Post-write: nativeDb.exec('PRAGMA wal_checkpoint(TRUNCATE)') — rusqlite flushes its own WAL frames to main DB
  5. nativeDb.close()
  6. New: ctx.db.pragma('wal_checkpoint(PASSIVE)') — forces better-sqlite3 to re-read the WAL header/SHM, preventing SQLITE_CORRUPT on the next read

This resolves the CI failures on Linux/macOS.
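
The six steps listed above can be sketched as one function with the two connections passed in as small interfaces. This is an illustration only, not the merged code: `JsDb`, `NativeDb`, `tryOpen`, and `nativeInsertWithWalCoordination` are hypothetical names, the native connection is opened after the busy check (the step list does not say where the open happens), and the argument types are simplified to `unknown[]`.

```typescript
// Hypothetical minimal interfaces; the real project types are richer.
interface JsDb {
  pragma(source: string): unknown;
}

interface NativeDb {
  bulkInsertNodes(batches: unknown[], fileHashes: unknown[], removed: string[]): boolean;
  exec(sql: string): void;
  close(): void;
}

// Returns undefined instead of throwing when the native connection cannot open.
function tryOpen(openNative: () => NativeDb): NativeDb | undefined {
  try {
    return openNative();
  } catch {
    return undefined;
  }
}

function nativeInsertWithWalCoordination(
  jsDb: JsDb,
  openNative: () => NativeDb,
  batches: unknown[],
  fileHashes: unknown[],
  removed: string[],
): boolean {
  // Steps 1-2: flush the JS-side WAL; fall back to JS if it could not be flushed.
  const pre = jsDb.pragma('wal_checkpoint(TRUNCATE)') as { busy: number }[] | undefined;
  if (pre?.[0]?.busy) return false;

  const nativeDb = tryOpen(openNative);
  if (!nativeDb) return false;

  try {
    // Step 3: native write.
    const ok = nativeDb.bulkInsertNodes(batches, fileHashes, removed);
    // Step 4: the native connection checkpoints its own WAL frames.
    try {
      nativeDb.exec('PRAGMA wal_checkpoint(TRUNCATE)');
    } catch {
      /* checkpoint failure is treated as non-fatal */
    }
    return ok;
  } finally {
    // Step 5: close the native connection.
    try {
      nativeDb.close();
    } catch {
      /* ignore close errors */
    }
    // Step 6: PASSIVE checkpoint on the JS connection after the native close.
    try {
      jsDb.pragma('wal_checkpoint(PASSIVE)');
    } catch {
      /* non-fatal */
    }
  }
}
```

Passing the connections in as parameters makes the call order easy to check with fakes that record each call, without touching a real database file.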

} finally {
try {
nativeDb!.close();
} catch {
/* ignore close errors */
}
}
}

// ── JS fallback: Phase 1 ────────────────────────────────────────────