Skip to content
15 changes: 8 additions & 7 deletions crates/codegraph-core/src/insert_nodes.rs
Original file line number Diff line number Diff line change
Expand Up @@ -6,44 +6,45 @@

use std::collections::HashMap;

use napi_derive::napi;
use rusqlite::{params, Connection};
use serde::{Deserialize, Serialize};

// ── Input types (received from JS via napi) ─────────────────────────

#[napi(object)]
/// Child node of a definition (parameter, nested function, etc.).
///
/// Deserialized via serde (not napi object conversion) so that `null` visibility
/// maps to `None` instead of crashing napi's `Option<String>` conversion (#709).
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct InsertNodesChild {
pub name: String,
pub kind: String,
pub line: u32,
#[napi(js_name = "endLine")]
#[serde(default, rename = "endLine")]
pub end_line: Option<u32>,
#[serde(default)]
pub visibility: Option<String>,
}

#[napi(object)]
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct InsertNodesDefinition {
pub name: String,
pub kind: String,
pub line: u32,
#[napi(js_name = "endLine")]
#[serde(default, rename = "endLine")]
pub end_line: Option<u32>,
#[serde(default)]
pub visibility: Option<String>,
pub children: Vec<InsertNodesChild>,
}
Comment on lines +37 to 41
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 children field lacks #[serde(default)] for defensive deserialization

end_line and visibility on both InsertNodesDefinition and InsertNodesChild correctly use #[serde(default)] to tolerate missing fields, but children: Vec<InsertNodesChild> on InsertNodesDefinition does not. If a future caller or extractor omits the children key entirely (instead of sending []), serde would return a deserialization error, surfaced as a napi error from the ? in serde_json::from_value. The JS side currently always populates children (via (def.children ?? []).map(...)) but adding the attribute maintains the same defensive pattern used for the optional fields:

Suggested change
#[serde(default)]
pub visibility: Option<String>,
pub children: Vec<InsertNodesChild>,
}
#[serde(default)]
pub children: Vec<InsertNodesChild>,

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 50c4a7b. Added #[serde(default)] to InsertNodesDefinition.children, consistent with the defensive pattern on the other optional fields.


#[napi(object)]
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct InsertNodesExport {
pub name: String,
pub kind: String,
pub line: u32,
}

#[napi(object)]
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct InsertNodesBatch {
pub file: String,
Expand Down
12 changes: 10 additions & 2 deletions crates/codegraph-core/src/native_db.rs
Original file line number Diff line number Diff line change
Expand Up @@ -736,13 +736,21 @@ impl NativeDatabase {
/// Bulk-insert nodes, children, containment edges, exports, and file hashes.
/// Reuses the persistent connection instead of opening a new one.
/// Returns `true` on success, `false` on failure.
#[napi]
///
/// Batches are received as `serde_json::Value` and deserialized via serde so
/// that `null` visibility values map to `None` instead of crashing napi's
/// `Option<String>` object conversion (#709).
#[napi(ts_args_type = "batches: InsertNodesBatch[], fileHashes: FileHashEntry[], removedFiles: string[]")]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 InsertNodesBatch referenced in ts_args_type but no longer declared as a napi type

InsertNodesBatch, InsertNodesDefinition, InsertNodesChild, and InsertNodesExport all had their #[napi(object)] attribute removed in insert_nodes.rs, so they will no longer appear as interface declarations in the generated .d.ts file. However, the ts_args_type annotation here still references InsertNodesBatch[] by name in the generated TypeScript signature:

// generated .d.ts would have:
bulkInsertNodes(batches: InsertNodesBatch[], fileHashes: FileHashEntry[], removedFiles: string[]): boolean
// but InsertNodesBatch would not be declared → TypeScript error

The CI item "CI validates Rust crate compilation" is unchecked, so the generated .d.ts from this PR hasn't been verified. The tsc --noEmit check in the test plan passes against a pre-built .d.ts that still has the old type definitions. Once the Rust crate is rebuilt, downstream TypeScript consumers would see Cannot find name 'InsertNodesBatch'.

Consider either re-adding #[napi(object)] to the structs (keeping serde attributes alongside), or changing ts_args_type to use an inline structural type or any[].

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 50c4a7b. Two changes:

  1. Restored use napi_derive::napi; in insert_nodes.rs — it was removed when other structs dropped #[napi(object)], but FileHashEntry still needs it for FromNapiValue. This was the root cause of the Rust compile error (FileHashEntry: FromNapiValue is not satisfied).

  2. Replaced InsertNodesBatch[] in the ts_args_type annotation with an inline structural type, since those structs are no longer napi-exported objects. FileHashEntry remains in the generated .d.ts because it retains #[napi(object)].

pub fn bulk_insert_nodes(
&self,
batches: Vec<InsertNodesBatch>,
batches: serde_json::Value,
file_hashes: Vec<FileHashEntry>,
removed_files: Vec<String>,
) -> napi::Result<bool> {
let batches: Vec<InsertNodesBatch> = serde_json::from_value(batches)
.map_err(|e| {
napi::Error::from_reason(format!("bulk_insert_nodes: invalid batches: {e}"))
})?;
let conn = self.conn()?;
Ok(insert_nodes::do_insert_nodes(conn, &batches, &file_hashes, &removed_files)
.inspect_err(|e| eprintln!("[NativeDatabase] bulk_insert_nodes failed: {e}"))
Expand Down
26 changes: 26 additions & 0 deletions src/domain/graph/builder/pipeline.ts
Original file line number Diff line number Diff line change
Expand Up @@ -223,7 +223,33 @@ async function runPipelineStages(ctx: PipelineContext): Promise<void> {
if (ctx.earlyExit) return;

await parseFiles(ctx);

// Temporarily reopen nativeDb for insertNodes — it uses the WAL checkpoint
// guard internally (same pattern as feature modules). Closed again before
// resolveImports/buildEdges which don't yet have the guard (#709).
if (hadNativeDb && ctx.engineName === 'native') {
const native = loadNative();
if (native?.NativeDatabase) {
try {
ctx.nativeDb = native.NativeDatabase.openReadWrite(ctx.dbPath);
} catch {
ctx.nativeDb = undefined;
}
}
}

await insertNodes(ctx);

// Close nativeDb after insertNodes — remaining pipeline stages use JS paths.
if (ctx.nativeDb && ctx.db) {
try {
ctx.nativeDb.close();
} catch {
/* ignore close errors */
}
ctx.nativeDb = undefined;
}
Comment on lines +243 to +269
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Missing rusqlite WAL checkpoint before closing nativeDb

After insertNodes commits its transaction, all written data lives in the WAL file as frames written by rusqlite. Closing ctx.nativeDb without first calling wal_checkpoint(TRUNCATE) means better-sqlite3 will have to apply those WAL frames itself during resolveImports, buildEdges, and buildStructure — exactly the cross-library WAL reading scenario that PRs #715 and #717 identified as causing corruption/crashes.

Every other nativeDb close site in this file applies the checkpoint first (see line 203 and line 285 — both explicitly document "Checkpoint WAL through rusqlite before closing so better-sqlite3 never needs to apply WAL frames written by a different SQLite library (#715, #717)"). This new close block is missing the same guard:

Suggested change
// Close nativeDb after insertNodes — remaining pipeline stages use JS paths.
if (ctx.nativeDb && ctx.db) {
try {
ctx.nativeDb.close();
} catch {
/* ignore close errors */
}
ctx.nativeDb = undefined;
}
// Close nativeDb after insertNodes — remaining pipeline stages use JS paths.
if (ctx.nativeDb && ctx.db) {
// Checkpoint WAL through rusqlite before closing so better-sqlite3 never
// needs to apply WAL frames written by a different SQLite library (#715, #717).
try {
ctx.nativeDb.exec('PRAGMA wal_checkpoint(TRUNCATE)');
} catch {
/* ignore checkpoint errors */
}
try {
ctx.nativeDb.close();
} catch {
/* ignore close errors */
}
ctx.nativeDb = undefined;
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 0f3cb1e. Added the WAL checkpoint (TRUNCATE) before close, matching the pattern at lines 199-212 and 281-294 in the same file. This was indeed causing the SQLITE_CORRUPT errors visible in CI — better-sqlite3 was reading WAL frames committed by rusqlite during resolveImports/buildEdges/buildStructure.


await resolveImports(ctx);
await buildEdges(ctx);
await buildStructure(ctx);
Expand Down
13 changes: 7 additions & 6 deletions src/domain/graph/builder/stages/insert-nodes.ts
Original file line number Diff line number Diff line change
Expand Up @@ -39,16 +39,17 @@ interface PrecomputedFileData {
// ── Native fast-path ─────────────────────────────────────────────────

function tryNativeInsert(ctx: PipelineContext): boolean {
// Disabled: bulkInsertNodes corrupts the DB when both the JS (better-sqlite3)
// and Rust (rusqlite) connections are open to the same WAL-mode file.
// The native path was never operational before — it always crashed on null
// visibility serialisation. See #696 for the dual-connection fix.
if (ctx.db) return false;

// Use NativeDatabase persistent connection (Phase 6.15+).
// Standalone napi functions were removed in 6.17 — falls through to JS if nativeDb unavailable.
if (!ctx.nativeDb?.bulkInsertNodes) return false;

// WAL checkpoint: flush WAL to main DB file so the native (rusqlite) connection
// sees a clean slate. This is the same dual-connection guard used by feature
// modules (ast, cfg, complexity, dataflow). See #696, #709.
if (ctx.db) {
ctx.db.pragma('wal_checkpoint(TRUNCATE)');
}

const { allSymbols, filesToParse, metadataUpdates, rootDir, removed } = ctx;

// Marshal allSymbols → InsertNodesBatch[]
Expand Down
Loading