Merged
Changes from 1 commit
152 changes: 123 additions & 29 deletions src/domain/graph/journal.ts
@@ -4,6 +4,106 @@ import { debug, warn } from '../../infrastructure/logger.js';

export const JOURNAL_FILENAME = 'changes.journal';
const HEADER_PREFIX = '# codegraph-journal v1 ';
const LOCK_SUFFIX = '.lock';
const LOCK_TIMEOUT_MS = 5_000;
const LOCK_STALE_MS = 30_000;
const LOCK_RETRY_MS = 25;

function sleepSync(ms: number): void {
  const buf = new Int32Array(new SharedArrayBuffer(4));
  Atomics.wait(buf, 0, 0, ms);
}
Comment on lines +17 to +22
Contributor

P2 Atomics.wait freezes the Node.js event loop during lock contention

Atomics.wait is a synchronous, blocking call — it stops the entire V8 event loop for the full ms duration. In a watcher process, every filesystem notification, timer, and pending I/O callback is silenced for each 25 ms retry. In the worst case (5 000 ms timeout, 200 retries), the watcher becomes completely unresponsive for up to 5 seconds before ever throwing.

A lighter alternative that avoids blocking the event loop is a simple busy-spin with process.hrtime.bigint():

function sleepSync(ms: number): void {
  const end = process.hrtime.bigint() + BigInt(ms) * 1_000_000n;
  while (process.hrtime.bigint() < end) { /* spin */ }
}

This keeps each retry short and doesn't starve unrelated callbacks (though it does keep the CPU busy, which is acceptable for the brief per-retry duration).
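The blocking behaviour the review describes is easy to observe. A minimal standalone sketch (not from the PR): a timer due in 10 ms cannot fire while a synchronous `Atomics.wait` pins the thread.

```typescript
// A timer scheduled for 10 ms from now...
let fired = false;
setTimeout(() => { fired = true; }, 10);

// ...cannot run while Atomics.wait blocks the event loop for ~100 ms.
// (Node.js permits Atomics.wait on the main thread, unlike browsers.)
const buf = new Int32Array(new SharedArrayBuffer(4));
Atomics.wait(buf, 0, 0, 100);

// We are still in the same synchronous run, so the callback is pending.
console.log(fired); // false
```

Only once the synchronous code returns does the event loop turn and deliver the long-overdue callback, which is exactly the starvation the review is flagging for watcher processes.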


Contributor Author


Fixed in f5c737c. Replaced Atomics.wait with a short process.hrtime.bigint busy-spin per your suggestion. The 25ms retry interval keeps CPU burn negligible while letting pending FS events, timers, and I/O callbacks in watcher processes keep firing during contention.


function isPidAlive(pid: number): boolean {
  if (!Number.isFinite(pid) || pid <= 0) return false;
  try {
    process.kill(pid, 0);
    return true;
  } catch (e) {
    // EPERM means the process exists but we lack permission — still alive.
    return (e as NodeJS.ErrnoException).code === 'EPERM';
  }
}
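Run standalone, the signal-0 probe behaves as follows (a sketch that duplicates the function above so it is self-contained; signal 0 delivers nothing, it only checks that the PID exists and is signalable):

```typescript
function isPidAlive(pid: number): boolean {
  if (!Number.isFinite(pid) || pid <= 0) return false;
  try {
    process.kill(pid, 0); // existence/permission check only; no signal sent
    return true;
  } catch (e) {
    // EPERM: the process exists but we may not signal it, so it is alive.
    return (e as NodeJS.ErrnoException).code === 'EPERM';
  }
}

console.log(isPidAlive(process.pid)); // true: we can always signal ourselves
console.log(isPidAlive(2147483646)); // false (ESRCH) on typical systems
console.log(isPidAlive(0), isPidAlive(NaN)); // both false: guarded inputs
```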

function acquireJournalLock(lockPath: string): number {
  const start = Date.now();
  for (;;) {
    try {
      const fd = fs.openSync(lockPath, 'wx');
      try {
        fs.writeSync(fd, `${process.pid}\n`);
      } catch {
        /* PID stamp is advisory; fd is still exclusive */
      }
      return fd;
    } catch (e) {
      if ((e as NodeJS.ErrnoException).code !== 'EEXIST') throw e;
    }
Comment on lines +98 to +127
Contributor


P1 Silent writeSync failure voids the nonce and breaks mutual exclusion

If fs.writeSync throws (e.g. ENOSPC, I/O error), the lockfile is created by openSync('wx') but remains empty — the nonce is never written. Two consequences compound:

  1. releaseJournalLock reads the empty file, content.includes(lock.nonce) is false, and the lockfile is orphaned rather than unlinked.
  2. Any concurrent waiter reads the empty file, computes Number('') → 0, then calls isPidAlive(0) which returns false (the pid <= 0 guard). With holderAlive = false, it immediately calls trySteal — stealing a live, active lock and breaking mutual exclusion.

The comment "PID stamp is advisory; fd is still exclusive" only holds when there are no concurrent waiters, but a journal with a watcher + build process is exactly the concurrent scenario this lock is meant to protect.

Fix: when the write fails, release the fd and unlink synchronously (we still hold the exclusive fd at that point) and retry, rather than proceeding into fn() with an un-stamped lockfile.


Contributor Author


Fixed in e17da0a. When the nonce stamp writeSync throws, acquireJournalLock now closes the fd, unlinks the empty lockfile, and retries instead of returning with an un-stamped lockfile. This prevents concurrent waiters from reading an empty nonce, treating isPidAlive(0) as dead, and stealing the live lock.
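The back-out-and-retry shape described here can be sketched as a small helper (the helper name and structure are illustrative, not the PR's exact code): a failed PID stamp is treated as a failed acquisition, backed out while the exclusive fd is still held.

```typescript
import * as fs from 'node:fs';

// Illustrative helper: open the lockfile exclusively and stamp our PID.
// Returns the fd on success, or null to tell the caller to loop and retry.
function tryOpenAndStamp(lockPath: string): number | null {
  const fd = fs.openSync(lockPath, 'wx'); // throws EEXIST if already held
  try {
    fs.writeSync(fd, `${process.pid}\n`);
    return fd; // stamped: waiters can now read a real PID
  } catch {
    // Stamp failed (e.g. ENOSPC). We still hold the exclusive fd, so
    // closing and unlinking here cannot remove another writer's lock.
    fs.closeSync(fd);
    try {
      fs.unlinkSync(lockPath);
    } catch {
      /* already gone */
    }
    return null;
  }
}
```

The key invariant: the function never returns a usable fd while the lockfile on disk is un-stamped, which is what protects concurrent waiters from reading an empty PID.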


    let holderAlive = true;
    try {
      const pidContent = fs.readFileSync(lockPath, 'utf-8').trim();
      holderAlive = isPidAlive(Number(pidContent));
    } catch {
      /* unreadable — fall through to age check */
    }

    if (!holderAlive) {
      try {
        fs.unlinkSync(lockPath);
      } catch {
        /* another writer stole it first */
      }
      continue;
Contributor


P1 TOCTOU race allows two processes to concurrently enter the critical section

When multiple processes are waiting on the same stale (dead-PID) lockfile, they can each independently steal the lock from one another's fresh acquisition:

  1. Process A dies — lockfile contains dead PID A.
  2. Processes B and C both call openSync('wx'), both get EEXIST, and both read dead PID A.
  3. B unlinks and immediately re-acquires (new inode I1, fn() starts).
  4. C now executes unlinkSync(lockPath) — it unlinks B's live lockfile (I1), not the stale one.
  5. C loops, calls openSync('wx'), succeeds (new inode I2), and its fn() begins concurrently with B's.

Both B and C are now inside the critical section simultaneously — exactly the write-serialisation invariant this PR is meant to guarantee.

The standard mitigation is to verify ownership after the unlink/re-create cycle: write a nonce (or use the inode number) inside the lockfile, then re-read it after openSync succeeds and bail if it doesn't match yours. Alternatively, use fs.renameSync(tmpfile, lockPath) to perform the steal atomically (create a temp file, rename onto the lockfile — POSIX rename is atomic).


Contributor Author


Fixed in f5c737c. Replaced the unlink + openSync('wx') steal pattern with an atomic write-tmp + fs.renameSync steal. Each writer generates a random nonce, writes it into a temp file, and atomically renames onto the lockfile. After rename we re-read the lockfile and only proceed if our nonce is still there — if another stealer's rename landed after ours, we bail and retry instead of unlinking their live lock. Release also nonce-verifies before unlinking. Added a regression test that stages a lockfile with a different-writer nonce after a real steal cycle and asserts we do not retroactively unlink it.
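The write-tmp + rename steal cycle described in this reply can be sketched as follows (names and file layout are assumptions, not the PR's exact code). POSIX rename atomically replaces the target, so there is no window in which the lockfile is absent, and the re-read confirms whose nonce actually landed last.

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';
import { randomUUID } from 'node:crypto';

// Illustrative steal: write our nonce to a temp file, atomically rename it
// onto the stale lockfile, then re-read to verify we won any race.
function stealLock(lockPath: string): string | null {
  const nonce = randomUUID();
  const tmpPath = path.join(path.dirname(lockPath), `.steal-${nonce}.tmp`);
  fs.writeFileSync(tmpPath, `${process.pid} ${nonce}\n`);
  fs.renameSync(tmpPath, lockPath); // atomic replace of the stale lockfile
  // If another stealer's rename landed after ours, their nonce is in the
  // file and we must back off instead of unlinking their live lock.
  const content = fs.readFileSync(lockPath, 'utf-8');
  return content.includes(nonce) ? nonce : null; // null: lost the race, retry
}
```

Unlike the unlink + `openSync('wx')` pattern in this commit, a loser here never touches the winner's lockfile, which is what closes the TOCTOU window the review describes.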

    }

    try {
      const stat = fs.statSync(lockPath);
      if (Date.now() - stat.mtimeMs > LOCK_STALE_MS) {
        try {
          fs.unlinkSync(lockPath);
        } catch {
          /* raced */
        }
        continue;
      }
    } catch {
      /* stat failed — keep retrying */
    }

    if (Date.now() - start > LOCK_TIMEOUT_MS) {
      throw new Error(`Failed to acquire journal lock at ${lockPath} within ${LOCK_TIMEOUT_MS}ms`);
    }
    sleepSync(LOCK_RETRY_MS);
  }
}

function releaseJournalLock(lockPath: string, fd: number): void {
  try {
    fs.closeSync(fd);
  } catch {
    /* ignore */
  }
  try {
    fs.unlinkSync(lockPath);
  } catch {
    /* ignore */
  }
}

function withJournalLock<T>(rootDir: string, fn: () => T): T {
  const dir = path.join(rootDir, '.codegraph');
  if (!fs.existsSync(dir)) {
    fs.mkdirSync(dir, { recursive: true });
  }
  const lockPath = path.join(dir, `${JOURNAL_FILENAME}${LOCK_SUFFIX}`);
  const fd = acquireJournalLock(lockPath);
  try {
    return fn();
  } finally {
    releaseJournalLock(lockPath, fd);
  }
}

interface JournalResult {
  valid: boolean;
@@ -63,43 +163,37 @@ export function appendJournalEntries(
  rootDir: string,
  entries: Array<{ file: string; deleted?: boolean }>,
): void {
-  const dir = path.join(rootDir, '.codegraph');
-  const journalPath = path.join(dir, JOURNAL_FILENAME);
-
-  if (!fs.existsSync(dir)) {
-    fs.mkdirSync(dir, { recursive: true });
-  }
-
-  if (!fs.existsSync(journalPath)) {
-    fs.writeFileSync(journalPath, `${HEADER_PREFIX}0\n`);
-  }
-
-  const lines = entries.map((e) => {
-    if (e.deleted) return `DELETED ${e.file}`;
-    return e.file;
-  });
-
-  fs.appendFileSync(journalPath, `${lines.join('\n')}\n`);
+  withJournalLock(rootDir, () => {
+    const journalPath = path.join(rootDir, '.codegraph', JOURNAL_FILENAME);
+
+    if (!fs.existsSync(journalPath)) {
+      fs.writeFileSync(journalPath, `${HEADER_PREFIX}0\n`);
+    }
+
+    const lines = entries.map((e) => {
+      if (e.deleted) return `DELETED ${e.file}`;
+      return e.file;
+    });
+
+    fs.appendFileSync(journalPath, `${lines.join('\n')}\n`);
+  });
}

export function writeJournalHeader(rootDir: string, timestamp: number): void {
-  const dir = path.join(rootDir, '.codegraph');
-  const journalPath = path.join(dir, JOURNAL_FILENAME);
-  const tmpPath = `${journalPath}.tmp`;
-
-  if (!fs.existsSync(dir)) {
-    fs.mkdirSync(dir, { recursive: true });
-  }
-
-  try {
-    fs.writeFileSync(tmpPath, `${HEADER_PREFIX}${timestamp}\n`);
-    fs.renameSync(tmpPath, journalPath);
-  } catch (err) {
-    warn(`Failed to write journal header: ${(err as Error).message}`);
-    try {
-      fs.unlinkSync(tmpPath);
-    } catch {
-      /* ignore */
-    }
-  }
+  withJournalLock(rootDir, () => {
+    const journalPath = path.join(rootDir, '.codegraph', JOURNAL_FILENAME);
+    const tmpPath = `${journalPath}.tmp`;
+
+    try {
+      fs.writeFileSync(tmpPath, `${HEADER_PREFIX}${timestamp}\n`);
+      fs.renameSync(tmpPath, journalPath);
+    } catch (err) {
+      warn(`Failed to write journal header: ${(err as Error).message}`);
+      try {
+        fs.unlinkSync(tmpPath);
+      } catch {
+        /* ignore */
+      }
+    }
+  });
}
47 changes: 47 additions & 0 deletions tests/unit/journal.test.ts
@@ -196,6 +196,53 @@ describe('appendJournalEntries', () => {
});
});

describe('concurrent-append safety', () => {
  it('cleans up the .lock file after a successful append', () => {
    const root = makeRoot();
    writeJournalHeader(root, 1700000000000);
    appendJournalEntries(root, [{ file: 'src/a.js' }]);

    const lockPath = path.join(root, '.codegraph', `${JOURNAL_FILENAME}.lock`);
    expect(fs.existsSync(lockPath)).toBe(false);
  });

  it('steals a stale lock whose holder PID is dead', () => {
    const root = makeRoot();
    writeJournalHeader(root, 1700000000000);

    // Pre-stage a lockfile with a PID that is guaranteed not to exist
    // (max 32-bit value; well above any real process).
    const lockPath = path.join(root, '.codegraph', `${JOURNAL_FILENAME}.lock`);
    fs.writeFileSync(lockPath, '2147483646\n');

    expect(() => appendJournalEntries(root, [{ file: 'src/a.js' }])).not.toThrow();
    expect(fs.existsSync(lockPath)).toBe(false);

    const result = readJournal(root);
    expect(result.changed).toEqual(['src/a.js']);
  });

  it('produces no interleaved lines under repeated appends', () => {
    const root = makeRoot();
    writeJournalHeader(root, 1700000000000);

    // Many small appends — every emitted line must be a complete,
    // well-formed entry (no truncated "DELETED " prefixes, no split paths).
    for (let i = 0; i < 200; i++) {
      appendJournalEntries(root, [
        { file: `src/changed-${i}.js` },
        { file: `src/gone-${i}.js`, deleted: true },
      ]);
    }

    const content = fs.readFileSync(path.join(root, '.codegraph', JOURNAL_FILENAME), 'utf-8');
    for (const line of content.split('\n')) {
      if (!line || line.startsWith('#')) continue;
      expect(line).toMatch(/^(DELETED src\/gone-\d+\.js|src\/changed-\d+\.js)$/);
    }
  });
});

describe('read/write/append lifecycle', () => {
  it('full lifecycle: header → append → read → new header', () => {
    const root = makeRoot();