Codex heartbeat ack fails due to Node.js version mismatch (system v18 vs compiled v22) #445

@voyax

Summary

The Codex bot's heartbeat ack fails because it runs c4-control.js with the system Node.js (v18) instead of the nvm-managed Node.js (v22). Because better-sqlite3 in comm-bridge was compiled against Node v22, loading it under v18 fails with ERR_DLOPEN_FAILED. The ack therefore never succeeds, and the bot is trapped in a recovery restart loop.

Fault Chain (observed on production VM)

  1. 08:32 SGT — Codex session context reaches 75%, triggers normal session rotation
  2. 08:51 SGT — New session starts, heartbeat is not acked → health transitions to RECOVERING
  3. The recovery mechanism sends a heartbeat every 5 minutes (with a 300s available_in delay), but recovery_timeout also fires within ~5 minutes → restart loop
  4. User messages reset the cooldown, further delaying heartbeat delivery
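
The timing in steps 3 and 4 can be sketched as simple arithmetic. This is a hypothetical model, not the actual activity-monitor logic: the function names, the cooldownSec parameter, and the way a user message pushes delivery back are assumptions made for illustration. Only the 300s available_in delay and the ~5 minute recovery_timeout come from the observations above.

```javascript
// Hypothetical model of the heartbeat/recovery timing (all times in seconds,
// measured from the moment the heartbeat is queued). Names are illustrative only.

// Earliest time the heartbeat can be delivered: after its available_in delay,
// and (assumption) not before the cooldown following the last user message ends.
function heartbeatDeliveryTime({ availableInSec, lastUserMessageSec = null, cooldownSec = 0 }) {
  const afterCooldown = lastUserMessageSec === null ? 0 : lastUserMessageSec + cooldownSec;
  return Math.max(availableInSec, afterCooldown);
}

// Time left to ack between delivery and recovery_timeout firing.
// Zero or negative means the restart fires before an ack can land.
function ackWindowSec(opts, recoveryTimeoutSec) {
  return recoveryTimeoutSec - heartbeatDeliveryTime(opts);
}

console.log(ackWindowSec({ availableInSec: 300 }, 300));
console.log(ackWindowSec({ availableInSec: 300, lastUserMessageSec: 200, cooldownSec: 180 }, 300));
```

Under these assumptions the window is already zero with no user activity, and a user message during recovery makes it negative. Even a working ack would have little room to succeed, which is worth checking separately from the Node.js version problem.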

Root Cause

The VM's default Node.js is v18 (system-installed), but comm-bridge's better-sqlite3 native addon was compiled against Node v22 (nvm-managed). When the Codex bot runs c4-control.js ack from tmux, node resolves to the system binary (v18), and loading the addon fails with:

ERR_DLOPEN_FAILED

The ack never succeeds → health never transitions from recovering to ok → activity-monitor keeps restarting the bot.
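
To confirm the mismatch from inside the bot's tmux session, a small diagnostic along these lines may help. It is a sketch, not part of zylos-core: describeRuntime, checkAddon, and loadBetterSqlite3 are hypothetical helper names, and it assumes it is run from a directory where better-sqlite3 can be resolved (for example inside comm-bridge). Note that ERR_DLOPEN_FAILED can have other causes than an ABI mismatch, so the result is a hint rather than proof.

```javascript
// Diagnostic sketch: which Node.js binary is running, and does the native addon load?

function describeRuntime() {
  return {
    execPath: process.execPath,    // absolute path of the node binary actually in use
    version: process.version,      // Node.js version string of that binary
    abi: process.versions.modules, // native addon ABI version; differs between major releases
  };
}

// `load` is any function that forces a native binding to load.
function checkAddon(load) {
  try {
    load();
    return { ok: true, code: null, dlopenFailed: false };
  } catch (err) {
    const code = err.code || null;
    return { ok: false, code, dlopenFailed: code === 'ERR_DLOPEN_FAILED' };
  }
}

// Depending on the version, better-sqlite3 may load its binding only when a
// Database is constructed, so open and close an in-memory database instead of
// relying on require() alone.
function loadBetterSqlite3() {
  const Database = require('better-sqlite3');
  new Database(':memory:').close();
}

// Example (run it once with each node binary and compare the output):
//   console.log(describeRuntime(), checkAddon(loadBetterSqlite3));
```

If execPath points at the system binary and dlopenFailed is true, while the same script succeeds under the nvm-managed binary, that matches the root cause described here.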

Workaround

Manually run ack with the correct Node version:

~/.nvm/versions/node/v22.22.0/bin/node c4-control.js ack --id <id>

Expected Fix

Ensure that all zylos-core scripts (especially those invoked by the Codex bot in tmux, such as c4-control.js) use the nvm-managed Node.js rather than falling back to the system Node.js. Possible approaches:

  1. Shebang or wrapper: Add explicit nvm node path resolution in scripts or use a wrapper that sources nvm before execution
  2. PATH enforcement: Ensure the bot's tmux session inherits the correct PATH with nvm node first
  3. Codex runtime init: When starting the Codex runtime, explicitly set NODE / PATH in the tmux environment
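
As one possible shape for approach 1, a script could check its own Node.js major version at startup and re-run itself under the nvm-managed binary. The sketch below is an assumption-laden illustration rather than a proposed patch: REQUIRED_MAJOR, the ZYLOS_NODE_REEXEC guard variable, and the helper names are hypothetical, and the binary path is taken from the workaround above, which hardcodes one specific nvm version.

```javascript
const { spawnSync } = require('node:child_process');
const os = require('node:os');
const path = require('node:path');

const REQUIRED_MAJOR = 22; // assumption: the major version the native addons were built against
// Path from the workaround in this issue; a real fix would likely resolve it dynamically.
const NVM_NODE = path.join(os.homedir(), '.nvm', 'versions', 'node', 'v22.22.0', 'bin', 'node');

// "v18.19.0" -> 18
function majorOf(version) {
  return Number(version.replace(/^v/, '').split('.')[0]);
}

// Re-exec only when the major version differs and we have not already re-exec'd
// (the guard prevents an endless loop if the target binary is also wrong).
function needsReexec(currentVersion, requiredMajor, alreadyReexeced) {
  return !alreadyReexeced && majorOf(currentVersion) !== requiredMajor;
}

// Run the same script and arguments under `nodePath`; returns the child's exit code.
function reexecWith(nodePath, argv) {
  const result = spawnSync(nodePath, argv, {
    stdio: 'inherit',
    env: { ...process.env, ZYLOS_NODE_REEXEC: '1' }, // hypothetical guard variable
  });
  if (result.error) throw result.error;
  return result.status === null ? 1 : result.status;
}
```

At the top of a script such as c4-control.js, before any native addon is loaded, the guard would be used roughly like this:

```javascript
if (needsReexec(process.version, REQUIRED_MAJOR, process.env.ZYLOS_NODE_REEXEC === '1')) {
  process.exit(reexecWith(NVM_NODE, process.argv.slice(1)));
}
```

This keeps the fix local to each script, at the cost of one extra process spawn per invocation. Approaches 2 and 3 avoid that cost by fixing PATH once for the whole tmux session.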

Environment

  • VM default node: v18 (system)
  • nvm node: v22.22.0
  • Affected module: comm-bridge/better-sqlite3 (native addon)
  • Runtime: Codex (tmux-based)

Labels: bug