fix: harden tool lifecycle and current-state docs by evelynadriaan · Pull Request #168 · tuannvm/codex-mcp-server

evelynadriaan · 2026-04-26T07:41:13Z

Summary

serialize and harden MCP tool lifecycle behavior across command-backed tools
complete timeout/abort recovery for codex, review, websearch, and help
expose and validate bypassApprovals, including resume-mode forwarding
gate startup and command logging behind explicit debug env vars
refresh API, lifecycle, session, and CLI integration docs to match current runtime behavior

Verification

npm run lint
npm run format:check
npm run build
npm test -- --runInBand

Summary by CodeRabbit

New Features
- Added bypassApprovals and timeoutMs options for codex tool calls
- Added debug toggles for startup and command logging
Updates
- Default model changed to GPT-5.4
- Tool calls serialized with improved per-request timeout handling and abort semantics
Bug Fixes
- Timeout recovery improved so subsequent tool calls continue reliably
Docs
- Expanded and reorganized documentation, CLI examples, and session/resume guidance
Tests
- New and expanded test suites covering lifecycle, timeouts, validation, and logging

coderabbitai · 2026-04-26T07:41:44Z

Walkthrough

Implements per-call timeoutMs with override, serializes tool execution via an internal queue with per-request AbortController and timeout racing, adds bypassApprovals, bumps default Codex model to gpt-5.4, refactors command execution (abort signaling, progress, gated logging), extracts runtime config, and adds extensive tests and docs.

Changes

Cohort / File(s)	Summary
Docs — Core & Navigation `CLAUDE.md`, `README.md`, `docs/api-reference.md`, `docs/codex-cli-integration.md`, `docs/TODO.md`, `docs/plan.md`, `docs/plans/2026-04-25-timeout-followups.md`, `docs/architecture-and-fixes.md`, `docs/claude-code-lifecycle-investigation.md`, `docs/codex-cli-integration.md`, `docs/session-management.md`, `docs/claude-code-lifecycle-investigation.md`	Large documentation rewrite and additions: default model changed to `gpt-5.4`, new env vars for debug, added/clarified `timeoutMs` and `bypassApprovals`, documented serialized calls, resume constraints (`sandbox`/`workingDirectory` not applied), and investigation/architecture notes.
Ignore `.gitignore`	Added `.codex` to ignored patterns.
Runtime config & bootstrap `src/runtime-config.ts`, `src/index.ts`	New runtime-config module exposing `SERVER_CONFIG`, version detection, and debug-flag helpers; `index.ts` now imports `SERVER_CONFIG`.
Server flow & timeouts `src/server.ts`	Serializes tool invocations via internal queue, creates per-request `AbortController`, enforces timeouts (global `CODEX_TOOL_TIMEOUT_MS` with per-call `timeoutMs` override for `codex`), forwards abort signals to handlers, and gates startup logging.
Tool schemas & types `src/types.ts`, `src/tools/definitions.ts`	Default model updated to `gpt-5.4`; added `bypassApprovals?: boolean` and `timeoutMs?: number`; cross-field validation rejects `bypassApprovals` with `sandbox`/`fullAuto`; `ToolHandlerContext` gains optional `abortSignal`.
Command execution utils `src/utils/command.ts`	Centralized command logic: added `signal` and `onProgress` options, streaming delegates, process-group abort (SIGTERM→SIGKILL on non-Windows), output caps, debounced progress, and conditional command logging.
Handlers & response shaping `src/tools/handlers.ts`	Introduced `CommandBackedToolHandler` base; unified execute paths using `executeCommand`/streaming, centralized result shaping (including structuredContent gating), added `bypassApprovals` CLI injection, and tightened handler typings/signatures.
Tests — new and updated `src/__tests__/*` (many files, new lifecycle/runtime-config/command-logging tests)	Extensive test additions/updates: lifecycle/integration tests, command/startup logging, runtime-config, schema and validation tests for `timeoutMs`/`bypassApprovals`, model default updates to `gpt-5.4`, dynamic imports for isolation, and added error/edge-case coverage.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant MCP as "MCP Server"
    participant Queue as "callQueue"
    participant Handler
    participant Timeout as "withTimeout"

    Client->>MCP: tools/call (args, optional timeoutMs)
    MCP->>MCP: resolve timeout (args.timeoutMs || CODEX_TOOL_TIMEOUT_MS)
    MCP->>Queue: enqueue request {handler, args, timeout}
    Queue->>MCP: dequeue next (serialized)
    MCP->>MCP: create AbortController
    par Execute handler
        MCP->>Handler: execute(args, { abortSignal })
        Handler->>Handler: run executeCommand/streaming (receives signal)
    and Monitor timeout
        Timeout->>Timeout: start timer
    end
    alt Handler completes before timeout
        Handler-->>MCP: result
        Timeout->>MCP: cancel timer
        MCP-->>Client: success result
    else Timeout fires first
        Timeout->>MCP: call abort()
        MCP->>Handler: signal abort (SIGTERM→SIGKILL)
        Handler-->>MCP: cleanup/exit (may emit stdout/stderr)
        MCP-->>Client: timeout error
    end
    MCP->>Queue: allow next dequeued request

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related issues

structuredContent disabled-by-default causes -32600 on newer Claude Code MCP clients #161: Changes centralize structured-content gating into a module-level constant in handlers, directly touching the behavior described in the issue.

Possibly related PRs

fix: set spawn cwd for workingDirectory to work around codex -C bug #138: Related changes to command execution option shapes and tests that migrate executeCommand to an options object.
feat(tools): gate structuredContent behind STRUCTURED_CONTENT_ENABLED #105: Related to structuredContent gating and tests that toggle structured content emission.
feat(model): default to gpt-5.2-codex and update model docs/options #84: Related updates to the Codex default model selection and test/doc updates.

Poem

🐰 A queue of hops, one tool at a time,

Timeouts set tidy, aborts in their prime.
Bypass the fences, model five-point-four,
Tests all green, docs updated, and more.
Hooray — small rabbit, big server chore!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 4.35% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the primary changes: hardening tool lifecycle behavior and updating documentation to reflect current state.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/tools/handlers.ts (1)
185-222: ⚠️ Potential issue | 🟠 Major

Remove fullAuto from resume mode due to Codex CLI limitation (not just flag-combination concern).

The schema already rejects bypassApprovals + fullAuto together, so the flag-combination guardrail is already in place. However, there is a more fundamental issue: according to documented Codex CLI constraints, fullAuto must not be applied in resume mode at all—only sandbox and workingDirectory are documented as constrained.

Lines 185–187 currently apply --full-auto when resuming. This should be skipped (like sandbox and workingDirectory at lines 242–245). The pattern already exists in the codebase for sandbox and workingDirectory; apply the same logic to fullAuto:
♻️ Remove fullAuto from resume mode
       if (useResume && codexConversationId) {
         // Resume mode: codex exec resume has limited flags
         // All exec options (--skip-git-repo-check, -c) must come BEFORE 'resume' subcommand
         cmdArgs = ['exec', '--skip-git-repo-check'];

         // Model must be set via -c config in resume mode (before subcommand)
         cmdArgs.push('-c', `model="${selectedModel}"`);

         // Reasoning effort via config (before subcommand)
         if (reasoningEffort) {
           cmdArgs.push('-c', `model_reasoning_effort="${reasoningEffort}"`);
         }

-        if (fullAuto) {
-          cmdArgs.push('--full-auto');
-        }

         // Bypass all approval prompts and sandboxing — use only in externally sandboxed envs
         if (bypassApprovals) {
           cmdArgs.push('--dangerously-bypass-approvals-and-sandbox');
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tools/handlers.ts` around lines 185 - 222, The resume branch is
incorrectly adding the --full-auto flag; update the resume-mode block that
builds cmdArgs (where it pushes 'resume', codexConversationId, enhancedPrompt)
to omit any handling of fullAuto (do not push '--full-auto' in resume mode) —
mirror the existing pattern used for sandbox and workingDirectory in that branch
so only sandbox and workingDirectory are applied for resume, and leave fullAuto
handling only in the exec branch (where cmdArgs.push('--full-auto') remains).

🧹 Nitpick comments (15)

src/__tests__/startup-logging.test.ts (2)
18-21: Consider preserving and restoring the original env var.

Reflect.deleteProperty(process.env, STARTUP_LOG_ENV_VAR) unconditionally clears the variable for every test in this suite, but it isn't restored in afterEach/afterAll. If a developer runs the suite locally with that flag set, subsequent tests in the same Jest worker (cross-suite) could observe a different env state than expected. Snapshot the original value in beforeEach and restore in afterEach for stricter isolation, mirroring the pattern used in resume-functionality.test.ts for STRUCTURED_CONTENT_ENABLED.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/__tests__/startup-logging.test.ts` around lines 18 - 21, The test setup
unconditionally deletes STARTUP_LOG_ENV_VAR in beforeEach which alters global
process.env without restoring it; change the setup to snapshot the original
value of process.env[STARTUP_LOG_ENV_VAR] at the start of beforeEach and then
restore that original value in an afterEach (mirroring the pattern used for
STRUCTURED_CONTENT_ENABLED in resume-functionality.test.ts) so tests isolate
environment changes; update beforeEach and add afterEach to use the saved
original value when restoring process.env[STARTUP_LOG_ENV_VAR].
23-38: Brittle assertion: errorSpy may catch unrelated stderr writes.

expect(errorSpy).not.toHaveBeenCalled() will fail if anything else in the startup path writes to console.error (now or in the future). Tightening to expect(errorSpy).not.toHaveBeenCalledWith(expect.stringContaining('started successfully')) keeps the intent (no startup banner) without coupling to all stderr usage.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/__tests__/startup-logging.test.ts` around lines 23 - 38, Replace the
brittle global assertion on console.error with a targeted check that it was not
called with the startup banner string: update the test using the existing
errorSpy (jest.spyOn(console, 'error')) to assert
expect(errorSpy).not.toHaveBeenCalledWith(expect.stringContaining('started
successfully')) instead of expect(errorSpy).not.toHaveBeenCalled(); keep the
rest of the setup (mocking Server.prototype.connect and instantiating
CodexMcpServer/start) unchanged so the test only verifies that the startup
message wasn't written to stderr.
src/__tests__/resume-functionality.test.ts (1)
30-43: Redundant STRUCTURED_CONTENT_ENABLED assignment.

process.env.STRUCTURED_CONTENT_ENABLED = '1' is set on line 32 (before the dynamic imports, so module-load reads see it) and then again on line 41. Unless handlers.ts mutates this variable on import, the second assignment is dead. Either drop line 41 or add a comment explaining why a re-set after construction is needed.
♻️ Proposed cleanup
   beforeEach(async () => {
     jest.resetModules();
     process.env.STRUCTURED_CONTENT_ENABLED = '1';
     ({ CodexToolHandler } = await import('../tools/handlers.js'));
     const commandModule = await import('../utils/command.js');
     mockedExecuteCommand = commandModule.executeCommand as jest.MockedFunction<
       typeof commandModule.executeCommand
     >;
     sessionStorage = new InMemorySessionStorage();
     handler = new CodexToolHandler(sessionStorage);
     mockedExecuteCommand.mockClear();
-    process.env.STRUCTURED_CONTENT_ENABLED = '1';
     delete process.env.CODEX_MCP_CALLBACK_URI;
   });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/__tests__/resume-functionality.test.ts` around lines 30 - 43, The
beforeEach contains a redundant second assignment to
process.env.STRUCTURED_CONTENT_ENABLED ('1'); update the setup in the test to
remove the duplicate assignment or, if a re-set is intentionally required
because an import or CodexToolHandler constructor mutates that env var, add a
brief comment explaining that intent; locate the duplicate within the beforeEach
that creates CodexToolHandler and adjust the second STRUCTURED_CONTENT_ENABLED
assignment accordingly (reference: beforeEach, CodexToolHandler,
process.env.STRUCTURED_CONTENT_ENABLED).
src/__tests__/index.test.ts (1)
68-84: Optional: assert minimum/integer constraint on timeoutMs.

The plan in docs/plans/2026-04-25-timeout-followups.md calls out rejecting 0, negatives, and non-integers for timeoutMs. This test only verifies presence; runtime/Zod-level validation is presumably covered in runtime-config.test.ts, but a quick assertion here on inputSchema.properties.timeoutMs.type === 'integer' and the documented minimum/exclusive-min would harden the public schema contract surfaced via tools/list.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/__tests__/index.test.ts` around lines 68 - 84, Add assertions to the
existing tests that the CODEX tool's public JSON schema enforces integer and
minimum constraints for timeoutMs: locate the codex tool via
toolDefinitions.find(tool => tool.name === TOOLS.CODEX) and assert on
codexTool.inputSchema.properties.timeoutMs that type equals 'integer' (or
integer-like schema) and that it has minimum/exclusiveMinimum set to reject
0/negatives (and/or exclusiveMinimum: 0 to reject 0). This ensures the public
tools/list contract for timeoutMs (symbol: timeoutMs on inputSchema.properties
for tool named TOOLS.CODEX) matches the documented plan.
src/server.ts (1)
86-118: LGTM with one caveat on the indefinite wait.

Queue serialization is correct: each tools/call chains onto this.callQueue, errors inside the .then callback are caught and reported via reject(err) (so the chain itself never rejects), and the finally await operation?.then(...) keeps the next call blocked until the in-flight operation actually settles.

One reliability caveat: if a handler's underlying child process never honors controller.abort() (e.g., it ignores SIGTERM), this await has no upper bound and the entire server stops accepting tool calls. The comment captures the intent ("Keep the queue blocked until the child process actually exits"), but consider documenting that the per-tool handler is responsible for ensuring abort eventually terminates the child (e.g., escalating to SIGKILL after a grace period in executeCommand).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server.ts` around lines 86 - 118, The finally block can block
indefinitely if a handler's child process ignores AbortController; change the
finally to wait only up to a bounded grace period instead of unbounded await on
operation?.then(...). Implement a configurable CHILD_EXIT_GRACE_MS and replace
the current await operation?.then(...) with a bounded wait (e.g., use
this.withTimeout or Promise.race) that returns after the grace period and
logs/warns if the child hasn’t exited, so this.callQueue (the chain) can
continue; reference the existing symbols: this.callQueue, handler.execute,
AbortController/createProgressContext, operation, and this.withTimeout.
docs/api-reference.md (1)
38-38: Clarify the timeoutMs default cell.

CODEX_TOOL_TIMEOUT_MS or 120000 reads as if timeoutMs itself defaults to that value. The actual semantics (per src/server.ts::getRequestTimeoutMs) are: when omitted, the request inherits the global timeout (env or 120_000). Tightening the wording avoids implying the field has its own resolved default.
📝 Proposed wording
-| `timeoutMs` | integer | No | `CODEX_TOOL_TIMEOUT_MS` or `120000` | Per-request override for this tool only |
+| `timeoutMs` | integer | No | — (falls back to `CODEX_TOOL_TIMEOUT_MS`, else `120000`) | Per-request override; positive integer, applies only to `codex` |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/api-reference.md` at line 38, The docs table entry for the timeoutMs
field is misleading: update the `timeoutMs` default description to clarify that
`timeoutMs` has no inherent default value and, if omitted, the request inherits
the global timeout (from the environment or the server fallback used in
getRequestTimeoutMs in src/server.ts, which defaults to 120_000); reference the
`timeoutMs` field and the getRequestTimeoutMs behavior and reword the cell to
something like "Optional; if omitted the request uses the global timeout (env or
120000) via getRequestTimeoutMs" so it doesn't imply timeoutMs itself defaults
to CODEX_TOOL_TIMEOUT_MS or 120000.
src/__tests__/mcp-lifecycle.test.ts (3)
11-12: Decouple the per-RPC timeout from the Jest test timeout.

TEST_TIMEOUT_MS is reused both for jest.setTimeout() and as the per-request RPC reader timeout. If a request stalls, the RPC timeout would fire at exactly the same moment Jest declares the test failed — masking which side actually timed out. Using a smaller RPC-level timeout (e.g., 8000) keeps the Jest budget as a hard outer bound.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/__tests__/mcp-lifecycle.test.ts` around lines 11 - 12, TEST_TIMEOUT_MS is
being used both for jest.setTimeout() and the per-request RPC reader timeout,
which can mask which side timed out; introduce a separate constant (e.g.,
RPC_TIMEOUT_MS = 8000) and use that for the RPC reader timeout logic while
leaving TEST_TIMEOUT_MS (10000) for jest.setTimeout(), and update any references
to the per-request timeout (search for uses of TEST_TIMEOUT_MS in RPC/read
timeout code) to use RPC_TIMEOUT_MS instead.
1-6: Consolidate the two child_process imports.

spawn and exec are imported from the same module on separate lines.
♻️ Proposed cleanup
-import { spawn } from 'child_process';
+import { exec, spawn } from 'child_process';
 import { chmodSync, existsSync, mkdtempSync, rmSync, writeFileSync } from 'fs';
 import { tmpdir } from 'os';
 import path from 'path';
 import { promisify } from 'util';
-import { exec } from 'child_process';
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/__tests__/mcp-lifecycle.test.ts` around lines 1 - 6, Consolidate the
duplicate imports from 'child_process' by importing both spawn and exec in a
single import statement (replace the separate import lines that currently bring
in spawn and exec). Locate the import declarations referencing spawn and exec
and merge them into one: import { spawn, exec } from 'child_process'; ensuring
no other usage or renaming is affected (check for the separate exec import using
promisify or existing exec references).
259-694: Extract the per-test server harness to remove ~3x duplication.

The three timeout tests (kills child process on timeout, uses per-call timeout override, unblocks queued calls after a review timeout) each re-implement the spawn/stdout-parser/sendTimeoutRequest/finally-teardown scaffold with only env vars and assertions differing. Extracting a small withTimeoutServer({ env }, async (send) => { ... }) helper would significantly improve readability and make future lifecycle tests cheap to add. Equivalent scaffolding already exists for the top-level beforeAll/afterAll pair, so the helper is essentially the same shape generalized over env.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/__tests__/mcp-lifecycle.test.ts` around lines 259 - 694, Extract the
repeated spawn/stdout-parse/send/teardown scaffold into a helper function
withTimeoutServer that accepts an env override and an async callback; move logic
that creates the stub dir via createTimeoutCodexStub, spawns the child (the
current timeoutServer creation using spawn(process.execPath, [distPath],
{...})), wires stdout parsing and the timeoutPending Map/resolvers (currently
duplicated in sendTimeoutRequest and the stdout 'data' handler), sets up the
exit handler to capture timeoutExitCode/timeoutExitSignal, and performs the
finally teardown/kill+rmSync; have the helper provide the test body a
send(request) wrapper equivalent to sendTimeoutRequest and allow passing
CODEX_TOOL_TIMEOUT_MS via env and TEST_TIMEOUT_MS for timers so the three tests
(`kills child process on timeout`, `uses per-call timeout override`, `unblocks
queued calls after a review timeout`) simply call await withTimeoutServer({ env:
{...} }, async (send) => { ... }) and remove duplicate code.
src/__tests__/command-logging.test.ts (1)
47-92: Restore process.env[COMMAND_LOG_ENV_VAR] after the suite to avoid worker-level env leakage.

The second test mutates process.env but cleanup only happens in beforeEach. If this is the last test in the worker process or the worker is reused, the '1' value can outlive the suite and influence unrelated tests that read this env var.
♻️ Proposed cleanup
   beforeEach(() => {
     jest.restoreAllMocks();
     mockedSpawn.mockReset();
     Reflect.deleteProperty(process.env, COMMAND_LOG_ENV_VAR);
   });
+
+  afterAll(() => {
+    Reflect.deleteProperty(process.env, COMMAND_LOG_ENV_VAR);
+  });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/__tests__/command-logging.test.ts` around lines 47 - 92, The test suite
mutates process.env[COMMAND_LOG_ENV_VAR] in the 'logs command execution details
when explicitly enabled' test and only relies on beforeEach to clear it, which
can leak to other tests; add a teardown that restores/deletes
process.env[COMMAND_LOG_ENV_VAR] after the suite (e.g., an afterAll or
afterEach) so the environment variable is always removed/reset once tests
complete, ensuring COMMAND_LOG_ENV_VAR is not left set between worker reuse and
that executeCommand and the tests using console.error behave deterministically.
src/tools/handlers.ts (2)
42-44: Inconsistent gating model: module-level vs per-call env evaluation.

STRUCTURED_CONTENT_ENABLED is read once at import time, while the sibling logging toggles in src/runtime-config.ts (isStartupLoggingEnabled, isCommandLoggingEnabled) are evaluated per call. That asymmetry means tests (and any future runtime toggling, e.g., reload via SIGHUP) need jest.resetModules() here but not for the logging flags. Either move this through a small helper in runtime-config.ts (isStructuredContentEnabled()) for parity, or document why structured-content gating is intentionally frozen at import time.
♻️ Proposed alignment

In src/runtime-config.ts:
+export const STRUCTURED_CONTENT_ENV_VAR = 'STRUCTURED_CONTENT_ENABLED' as const;
+
+export function isStructuredContentEnabled(
+  env: Record<string, string | undefined> = process.env
+): boolean {
+  return isTruthyDebugFlag(env[STRUCTURED_CONTENT_ENV_VAR]);
+}
In src/tools/handlers.ts:
-const STRUCTURED_CONTENT_ENABLED = ['1', 'true', 'yes', 'on'].includes(
-  (process.env.STRUCTURED_CONTENT_ENABLED ?? '').toLowerCase()
-);
+import { isStructuredContentEnabled } from '../runtime-config.js';
@@
 class CommandBackedToolHandler {
-  protected readonly structuredContentEnabled = STRUCTURED_CONTENT_ENABLED;
+  protected get structuredContentEnabled(): boolean {
+    return isStructuredContentEnabled();
+  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tools/handlers.ts` around lines 42 - 44, The module-level constant
STRUCTURED_CONTENT_ENABLED is evaluated once at import time causing asymmetry
with per-call flags isStartupLoggingEnabled and isCommandLoggingEnabled; change
to a per-call accessor: add isStructuredContentEnabled() in runtime-config.ts
that reads and normalizes process.env.STRUCTURED_CONTENT_ENABLED and then
replace usages of STRUCTURED_CONTENT_ENABLED in src/tools/handlers.ts with calls
to isStructuredContentEnabled(); alternatively, if freezing at import is
intentional, add a brief comment explaining the reason and expected behavior and
update tests to account for the frozen value.
244-254: Redundant signal in cmdOptions.

executeCodexCommand already merges options?.signal ?? context.abortSignal (L68–71), so explicitly setting signal: toolContext.abortSignal here just re-states the fallback. The ReviewToolHandler and WebSearchToolHandler paths correctly omit it; aligning CodexToolHandler keeps a single source of truth for abort wiring.
♻️ Minor cleanup
       const cmdOptions = {
         cwd: useResume ? undefined : resolvedWorkDir,
         envOverride,
-        signal: toolContext.abortSignal,
       };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tools/handlers.ts` around lines 244 - 254, The cmdOptions in
CodexToolHandler redundantly sets signal: toolContext.abortSignal even though
executeCodexCommand already merges options?.signal ?? context.abortSignal;
remove the signal property from the cmdOptions object (the const cmdOptions used
before calling this.executeCodexCommand) so the abort wiring is handled in one
place, matching the behavior of ReviewToolHandler and WebSearchToolHandler and
avoiding duplicate fallback logic.
src/runtime-config.ts (1)
24-39: Silent '0.0.0' fallback can mask real packaging/path issues.

If package.json cannot be located (e.g., when process.argv[1] doesn't sit under the repo and cwd is unrelated, like under some test runners or npx invocations), getServerVersion swallows the error and SERVER_CONFIG.version becomes '0.0.0' with no signal. That's fine as a runtime fallback, but consider emitting a single guarded warning (gated behind isStartupLoggingEnabled()) to make misconfiguration debuggable without changing default behavior.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/runtime-config.ts` around lines 24 - 39, getServerVersion currently
swallows all errors and returns '0.0.0' silently; update its catch block to,
when isStartupLoggingEnabled() returns true, emit a single guarded warning
including the entryScriptPath and the caught error details (e.g., error.message
or stack) so misconfigured package.json lookups are visible at startup. Keep the
runtime fallback behavior unchanged (still return '0.0.0'), and reference the
existing getServerVersion function and SERVER_CONFIG constant when making this
change.
src/utils/command.ts (1)
260-275: Doc comment slightly overstates the delegation.

The JSDoc says "delegating to executeCommand with onProgress set", but executeCommandStreaming only forwards whatever options the caller passed — including the case where onProgress is undefined, in which case there is no streaming. Either tighten the wording (e.g., "delegating to executeCommand with the caller-provided onProgress") or assert options.onProgress here. Purely a documentation nit.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/command.ts` around lines 260 - 275, The JSDoc for the
executeCommandStreaming function incorrectly implies that onProgress is always
set when delegating to executeCommand, but in reality options.onProgress can be
undefined. To fix this, update the JSDoc to clarify that the function delegates
to executeCommand with the caller-provided onProgress callback rather than
always setting it, or alternatively modify executeCommandStreaming to ensure
options.onProgress is always set before calling executeCommand.
docs/codex-cli-integration.md (1)
49-55: Consider adding a note in the flags documentation that --full-auto and --dangerously-bypass-approvals-and-sandbox are mutually exclusive.

The Codex CLI enforces this constraint and will return a parse error if both flags are passed together. Since the server independently forwards these parameters when enabled, users who set both fullAuto and bypassApprovals will encounter a CLI error. A brief caveat at lines 52–53 (and in the resume notes at lines 76–77) explaining the mutual exclusivity would clarify the expected behavior upfront and prevent confusion.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/codex-cli-integration.md` around lines 49 - 55, Add a brief caveat to
the flags documentation noting that the CLI flags --full-auto and
--dangerously-bypass-approvals-and-sandbox are mutually exclusive (the CLI will
return a parse error if both are supplied); also mirror this warning in the
resume notes where fullAuto and bypassApprovals are referenced so users
understand that setting both server-side fields (fullAuto and bypassApprovals)
will cause the CLI to fail when it forwards parameters.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/api-reference.md`:
- Around line 114-123: Update the session metadata docs to match the API:
replace the `turns` field with the integer `turnCount` used by the
`listSessions` response (as implemented in the listSessions handler in
src/tools/handlers.ts) and update any example JSON and descriptive text in
docs/session-management.md to reference `turnCount` (an integer count) instead
of an array `turns`.

In `@src/utils/command.ts`:
- Around line 212-225: The close handler registered on child (child.on('close',
...)) currently always calls options.onProgress with a "[Completed]" message
when stdout or stderr exists, which incorrectly signals completion for
killed/aborted runs; update that handler to check options.signal?.aborted (or
equivalent aborted flag) before emitting "[Completed]" and either skip emitting
a progress update entirely or emit "[Aborted] ..." instead, preserving the
existing behavior for non-aborted runs and leaving the isStreaming, stdout,
stderr and commandLoggingEnabled logic unchanged.

---

Outside diff comments:
In `@src/tools/handlers.ts`:
- Around line 185-222: The resume branch is incorrectly adding the --full-auto
flag; update the resume-mode block that builds cmdArgs (where it pushes
'resume', codexConversationId, enhancedPrompt) to omit any handling of fullAuto
(do not push '--full-auto' in resume mode) — mirror the existing pattern used
for sandbox and workingDirectory in that branch so only sandbox and
workingDirectory are applied for resume, and leave fullAuto handling only in the
exec branch (where cmdArgs.push('--full-auto') remains).

---

Nitpick comments:
In `@docs/api-reference.md`:
- Line 38: The docs table entry for the timeoutMs field is misleading: update
the `timeoutMs` default description to clarify that `timeoutMs` has no inherent
default value and, if omitted, the request inherits the global timeout (from the
environment or the server fallback used in getRequestTimeoutMs in src/server.ts,
which defaults to 120_000); reference the `timeoutMs` field and the
getRequestTimeoutMs behavior and reword the cell to something like "Optional; if
omitted the request uses the global timeout (env or 120000) via
getRequestTimeoutMs" so it doesn't imply timeoutMs itself defaults to
CODEX_TOOL_TIMEOUT_MS or 120000.

In `@docs/codex-cli-integration.md`:
- Around line 49-55: Add a brief caveat to the flags documentation noting that
the CLI flags --full-auto and --dangerously-bypass-approvals-and-sandbox are
mutually exclusive (the CLI will return a parse error if both are supplied);
also mirror this warning in the resume notes where fullAuto and bypassApprovals
are referenced so users understand that setting both server-side fields
(fullAuto and bypassApprovals) will cause the CLI to fail when it forwards
parameters.

In `@src/__tests__/command-logging.test.ts`:
- Around line 47-92: The test suite mutates process.env[COMMAND_LOG_ENV_VAR] in
the 'logs command execution details when explicitly enabled' test and only
relies on beforeEach to clear it, which can leak to other tests; add a teardown
that restores/deletes process.env[COMMAND_LOG_ENV_VAR] after the suite (e.g., an
afterAll or afterEach) so the environment variable is always removed/reset once
tests complete, ensuring COMMAND_LOG_ENV_VAR is not left set between worker
reuse and that executeCommand and the tests using console.error behave
deterministically.

In `@src/__tests__/index.test.ts`:
- Around line 68-84: Add assertions to the existing tests that the CODEX tool's
public JSON schema enforces integer and minimum constraints for timeoutMs:
locate the codex tool via toolDefinitions.find(tool => tool.name ===
TOOLS.CODEX) and assert on codexTool.inputSchema.properties.timeoutMs that type
equals 'integer' (or integer-like schema) and that it has
minimum/exclusiveMinimum set to reject 0/negatives (and/or exclusiveMinimum: 0
to reject 0). This ensures the public tools/list contract for timeoutMs (symbol:
timeoutMs on inputSchema.properties for tool named TOOLS.CODEX) matches the
documented plan.

In `@src/__tests__/mcp-lifecycle.test.ts`:
- Around line 11-12: TEST_TIMEOUT_MS is being used both for jest.setTimeout()
and the per-request RPC reader timeout, which can mask which side timed out;
introduce a separate constant (e.g., RPC_TIMEOUT_MS = 8000) and use that for the
RPC reader timeout logic while leaving TEST_TIMEOUT_MS (10000) for
jest.setTimeout(), and update any references to the per-request timeout (search
for uses of TEST_TIMEOUT_MS in RPC/read timeout code) to use RPC_TIMEOUT_MS
instead.
- Around line 1-6: Consolidate the duplicate imports from 'child_process' by
importing both spawn and exec in a single import statement (replace the separate
import lines that currently bring in spawn and exec). Locate the import
declarations referencing spawn and exec and merge them into one: import { spawn,
exec } from 'child_process'; ensuring no other usage or renaming is affected
(check for the separate exec import using promisify or existing exec
references).
- Around line 259-694: Extract the repeated spawn/stdout-parse/send/teardown
scaffold into a helper function withTimeoutServer that accepts an env override
and an async callback; move logic that creates the stub dir via
createTimeoutCodexStub, spawns the child (the current timeoutServer creation
using spawn(process.execPath, [distPath], {...})), wires stdout parsing and the
timeoutPending Map/resolvers (currently duplicated in sendTimeoutRequest and the
stdout 'data' handler), sets up the exit handler to capture
timeoutExitCode/timeoutExitSignal, and performs the finally
teardown/kill+rmSync; have the helper provide the test body a send(request)
wrapper equivalent to sendTimeoutRequest and allow passing CODEX_TOOL_TIMEOUT_MS
via env and TEST_TIMEOUT_MS for timers so the three tests (`kills child process
on timeout`, `uses per-call timeout override`, `unblocks queued calls after a
review timeout`) simply call await withTimeoutServer({ env: {...} }, async
(send) => { ... }) and remove duplicate code.

In `@src/__tests__/resume-functionality.test.ts`:
- Around line 30-43: The beforeEach contains a redundant second assignment to
process.env.STRUCTURED_CONTENT_ENABLED ('1'); update the setup in the test to
remove the duplicate assignment or, if a re-set is intentionally required
because an import or CodexToolHandler constructor mutates that env var, add a
brief comment explaining that intent; locate the duplicate within the beforeEach
that creates CodexToolHandler and adjust the second STRUCTURED_CONTENT_ENABLED
assignment accordingly (reference: beforeEach, CodexToolHandler,
process.env.STRUCTURED_CONTENT_ENABLED).

In `@src/__tests__/startup-logging.test.ts`:
- Around line 18-21: The test setup unconditionally deletes STARTUP_LOG_ENV_VAR
in beforeEach which alters global process.env without restoring it; change the
setup to snapshot the original value of process.env[STARTUP_LOG_ENV_VAR] at the
start of beforeEach and then restore that original value in an afterEach
(mirroring the pattern used for STRUCTURED_CONTENT_ENABLED in
resume-functionality.test.ts) so tests isolate environment changes; update
beforeEach and add afterEach to use the saved original value when restoring
process.env[STARTUP_LOG_ENV_VAR].
- Around line 23-38: Replace the brittle global assertion on console.error with
a targeted check that it was not called with the startup banner string: update
the test using the existing errorSpy (jest.spyOn(console, 'error')) to assert
expect(errorSpy).not.toHaveBeenCalledWith(expect.stringContaining('started
successfully')) instead of expect(errorSpy).not.toHaveBeenCalled(); keep the
rest of the setup (mocking Server.prototype.connect and instantiating
CodexMcpServer/start) unchanged so the test only verifies that the startup
message wasn't written to stderr.

In `@src/runtime-config.ts`:
- Around line 24-39: getServerVersion currently swallows all errors and returns
'0.0.0' silently; update its catch block to, when isStartupLoggingEnabled()
returns true, emit a single guarded warning including the entryScriptPath and
the caught error details (e.g., error.message or stack) so misconfigured
package.json lookups are visible at startup. Keep the runtime fallback behavior
unchanged (still return '0.0.0'), and reference the existing getServerVersion
function and SERVER_CONFIG constant when making this change.

In `@src/server.ts`:
- Around line 86-118: The finally block can block indefinitely if a handler's
child process ignores AbortController; change the finally to wait only up to a
bounded grace period instead of unbounded await on operation?.then(...).
Implement a configurable CHILD_EXIT_GRACE_MS and replace the current await
operation?.then(...) with a bounded wait (e.g., use this.withTimeout or
Promise.race) that returns after the grace period and logs/warns if the child
hasn’t exited, so this.callQueue (the chain) can continue; reference the
existing symbols: this.callQueue, handler.execute,
AbortController/createProgressContext, operation, and this.withTimeout.

In `@src/tools/handlers.ts`:
- Around line 42-44: The module-level constant STRUCTURED_CONTENT_ENABLED is
evaluated once at import time causing asymmetry with per-call flags
isStartupLoggingEnabled and isCommandLoggingEnabled; change to a per-call
accessor: add isStructuredContentEnabled() in runtime-config.ts that reads and
normalizes process.env.STRUCTURED_CONTENT_ENABLED and then replace usages of
STRUCTURED_CONTENT_ENABLED in src/tools/handlers.ts with calls to
isStructuredContentEnabled(); alternatively, if freezing at import is
intentional, add a brief comment explaining the reason and expected behavior and
update tests to account for the frozen value.
- Around line 244-254: The cmdOptions in CodexToolHandler redundantly sets
signal: toolContext.abortSignal even though executeCodexCommand already merges
options?.signal ?? context.abortSignal; remove the signal property from the
cmdOptions object (the const cmdOptions used before calling
this.executeCodexCommand) so the abort wiring is handled in one place, matching
the behavior of ReviewToolHandler and WebSearchToolHandler and avoiding
duplicate fallback logic.

In `@src/utils/command.ts`:
- Around line 260-275: The JSDoc for the executeCommandStreaming function
incorrectly implies that onProgress is always set when delegating to
executeCommand, but in reality options.onProgress can be undefined. To fix this,
update the JSDoc to clarify that the function delegates to executeCommand with
the caller-provided onProgress callback rather than always setting it, or
alternatively modify executeCommandStreaming to ensure options.onProgress is
always set before calling executeCommand.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e4905f6c-3d1e-4252-a1d2-6757e14b5f32

📥 Commits

Reviewing files that changed from the base of the PR and between a57acb6 and f9231fc.

📒 Files selected for processing (30)

.gitignore
CLAUDE.md
README.md
docs/TODO.md
docs/api-reference.md
docs/architecture-and-fixes.md
docs/claude-code-lifecycle-investigation.md
docs/codex-cli-integration.md
docs/plan.md
docs/plans/2026-04-25-timeout-followups.md
docs/session-management.md
src/__tests__/command-logging.test.ts
src/__tests__/context-building.test.ts
src/__tests__/default-model.test.ts
src/__tests__/edge-cases.test.ts
src/__tests__/error-scenarios.test.ts
src/__tests__/index.test.ts
src/__tests__/mcp-lifecycle.test.ts
src/__tests__/model-selection.test.ts
src/__tests__/resume-functionality.test.ts
src/__tests__/runtime-config.test.ts
src/__tests__/startup-logging.test.ts
src/__tests__/working-directory.test.ts
src/index.ts
src/runtime-config.ts
src/server.ts
src/tools/definitions.ts
src/tools/handlers.ts
src/types.ts
src/utils/command.ts

coderabbitai · 2026-04-26T07:52:08Z

    child.on('close', (code) => {
-      // Send final progress if there's any remaining output
      if (options.onProgress && (stdout || stderr)) {
        const finalOutput = stdout || stderr;
-        const lastChunk = finalOutput.slice(-500); // Last 500 chars
+        const lastChunk = finalOutput.slice(-500);
        if (lastChunk.trim()) {
          options.onProgress(
            `[Completed] ${lastChunk.trim().slice(0, 200)}...`
          );
        }
      }

+      if (!isStreaming && stderr && commandLoggingEnabled) {
+        console.error(chalk.yellow('Command stderr:'), stderr);
+      }


⚠️ Potential issue | 🟡 Minor

[Completed] progress is still emitted after an aborted/killed run.

When options.signal triggers and the child is killed, the close handler still fires and, because options.onProgress && (stdout || stderr) is typically true, calls options.onProgress with [Completed] …. For an aborted/timed-out run that's a misleading "completed" signal to upstream MCP progress consumers. Consider checking options.signal?.aborted here and either skipping the message or reporting [Aborted] instead.

♻️ Suggested guard

- if (options.onProgress && (stdout || stderr)) { + if ( + options.onProgress && + (stdout || stderr) && + !options.signal?.aborted + ) { const finalOutput = stdout || stderr; const lastChunk = finalOutput.slice(-500); if (lastChunk.trim()) { options.onProgress( `[Completed] ${lastChunk.trim().slice(0, 200)}...` ); } }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

child.on('close', (code) => {

// Send final progress if there's any remaining output

if (options.onProgress && (stdout || stderr)) {

const finalOutput = stdout || stderr;

const lastChunk = finalOutput.slice(-500); // Last 500 chars

const lastChunk = finalOutput.slice(-500);

if (lastChunk.trim()) {

options.onProgress(

`[Completed] ${lastChunk.trim().slice(0, 200)}...`

);

}

}

if (!isStreaming && stderr && commandLoggingEnabled) {

console.error(chalk.yellow('Command stderr:'), stderr);

}

child.on('close', (code) => {

if (

options.onProgress &&

(stdout || stderr) &&

!options.signal?.aborted

) {

const finalOutput = stdout || stderr;

const lastChunk = finalOutput.slice(-500);

if (lastChunk.trim()) {

options.onProgress(

`[Completed] ${lastChunk.trim().slice(0, 200)}...`

);

}

}

if (!isStreaming && stderr && commandLoggingEnabled) {

console.error(chalk.yellow('Command stderr:'), stderr);

}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/utils/command.ts` around lines 212 - 225, The close handler registered on child (child.on('close', ...)) currently always calls options.onProgress with a "[Completed]" message when stdout or stderr exists, which incorrectly signals completion for killed/aborted runs; update that handler to check options.signal?.aborted (or equivalent aborted flag) before emitting "[Completed]" and either skip emitting a progress update entirely or emit "[Aborted] ..." instead, preserving the existing behavior for non-aborted runs and leaving the isStreaming, stdout, stderr and commandLoggingEnabled logic unchanged.

- serialize and harden MCP tool lifecycle behavior across command-backed tools - complete timeout/abort recovery for codex, review, websearch, and help - expose and validate bypassApprovals, including resume-mode forwarding - gate startup and command logging behind explicit debug env vars - refresh API, lifecycle, session, and CLI integration docs to match current runtime behavior - fix: session-management.md: correct `turns` → `turnCount` (matches listSessions response) - fix: command.ts: skip [Completed] progress notification when signal is aborted Signed-off-by: evelynadriaan <[email protected]>

- docs/session-management.md: correct `turns` → `turnCount` to match actual listSessions response field (handlers.ts line 407) - src/utils/command.ts: skip [Completed] progress notification when signal is aborted, avoiding misleading output after timeout/kill Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

coderabbitai

Actionable comments posted: 4

♻️ Duplicate comments (1)

src/utils/command.ts (1)
212-219: ⚠️ Potential issue | 🟡 Minor

Don't emit [Completed] after an abort.

The close handler still reports completion even when options.signal has already been aborted, so timed-out/killed runs can look successful to MCP progress consumers.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/command.ts` around lines 212 - 219, The close handler
(child.on('close')) currently emits a "[Completed]" progress update even when
the run was aborted; modify the close-handler logic in the function handling
child so it checks options.signal?.aborted (or an internal aborted flag set by
an options.signal 'abort' listener) and return early if aborted, so
options.onProgress is not called for aborted/killed runs; update references in
the same handler where stdout/stderr/finalOutput/lastChunk are used to only call
options.onProgress when not aborted.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/codex-cli-integration.md`:
- Around line 7-9: Replace the literal example that shows passing an API key on
the command line (the "codex login --api-key \"...\"" text) with guidance to use
the interactive "codex login" flow or a safer secret input/credential store
(e.g., environment variable or OS keyring) and update adjacent references like
"codex exec resume" and "codex review" to assume the user is already
authenticated via that safer method; ensure the docs explicitly warn against
shell-argument API keys and show the preferred command (e.g., "codex login" or
instructions to set CODEx_API_KEY in a secure place) instead of the current
`--api-key` example.

In `@src/server.ts`:
- Around line 173-186: Change the bare timer calls and their type annotation to
use globalThis: update the declaration of timeoutHandle from ReturnType<typeof
setTimeout> | undefined to ReturnType<typeof globalThis.setTimeout> | undefined,
replace setTimeout(...) with globalThis.setTimeout(...) when creating
timeoutPromise, and replace clearTimeout(timeoutHandle) with
globalThis.clearTimeout(timeoutHandle); keep the rest of the logic around
timeoutPromise, controller, timeoutMs and operation intact.

In `@src/tools/handlers.ts`:
- Around line 56-58: getCommandResponse currently returns stdout first which
hides meaningful stderr output; update the function (getCommandResponse and its
CommandResult handling) to prefer stderr and merge stderr before stdout (e.g.,
if stderr exists return stderr plus any stdout appended, otherwise return
stdout, otherwise fallback) so stderr-first responses are preserved for all
command-backed tools.

---

Duplicate comments:
In `@src/utils/command.ts`:
- Around line 212-219: The close handler (child.on('close')) currently emits a
"[Completed]" progress update even when the run was aborted; modify the
close-handler logic in the function handling child so it checks
options.signal?.aborted (or an internal aborted flag set by an options.signal
'abort' listener) and return early if aborted, so options.onProgress is not
called for aborted/killed runs; update references in the same handler where
stdout/stderr/finalOutput/lastChunk are used to only call options.onProgress
when not aborted.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 64f28494-e38c-48f1-b8ea-c9c8598412d2

📥 Commits

Reviewing files that changed from the base of the PR and between f9231fc and 30d8935.

📒 Files selected for processing (30)

.gitignore
CLAUDE.md
README.md
docs/TODO.md
docs/api-reference.md
docs/architecture-and-fixes.md
docs/claude-code-lifecycle-investigation.md
docs/codex-cli-integration.md
docs/plan.md
docs/plans/2026-04-25-timeout-followups.md
docs/session-management.md
src/__tests__/command-logging.test.ts
src/__tests__/context-building.test.ts
src/__tests__/default-model.test.ts
src/__tests__/edge-cases.test.ts
src/__tests__/error-scenarios.test.ts
src/__tests__/index.test.ts
src/__tests__/mcp-lifecycle.test.ts
src/__tests__/model-selection.test.ts
src/__tests__/resume-functionality.test.ts
src/__tests__/runtime-config.test.ts
src/__tests__/startup-logging.test.ts
src/__tests__/working-directory.test.ts
src/index.ts
src/runtime-config.ts
src/server.ts
src/tools/definitions.ts
src/tools/handlers.ts
src/types.ts
src/utils/command.ts

✅ Files skipped from review due to trivial changes (9)

.gitignore
docs/plan.md
src/tests/edge-cases.test.ts
docs/plans/2026-04-25-timeout-followups.md
docs/architecture-and-fixes.md
README.md
src/tests/startup-logging.test.ts
src/runtime-config.ts
src/tests/mcp-lifecycle.test.ts

🚧 Files skipped from review as they are similar to previous changes (12)

src/index.ts
src/tests/runtime-config.test.ts
docs/TODO.md
docs/claude-code-lifecycle-investigation.md
src/tests/index.test.ts
src/tests/error-scenarios.test.ts
CLAUDE.md
src/tools/definitions.ts
docs/session-management.md
src/tests/command-logging.test.ts
src/tests/working-directory.test.ts
src/tests/model-selection.test.ts

coderabbitai · 2026-04-28T20:51:37Z

+- Recommended minimum: Codex CLI `v0.75.0+`
+- Authentication: `codex login --api-key "..."`
+- This server expects modern `codex exec resume`, `codex review`, sandbox, and


⚠️ Potential issue | 🟠 Major

Avoid documenting API keys directly in shell arguments.

Using codex login --api-key "..." encourages secret exposure via shell history and process inspection. Prefer documenting interactive codex login (or a safer secret input flow) instead.

🔐 Suggested doc tweak

-- Authentication: `codex login --api-key "..."` +- Authentication: run `codex login` (interactive) to avoid placing API keys in shell history.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

- Recommended minimum: Codex CLI `v0.75.0+`

- Authentication: `codex login --api-key "..."`

- This server expects modern `codex exec resume`, `codex review`, sandbox, and

- Recommended minimum: Codex CLI `v0.75.0+`

- Authentication: run `codex login` (interactive) to avoid placing API keys in shell history.

- This server expects modern `codex exec resume`, `codex review`, sandbox, and

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/codex-cli-integration.md` around lines 7 - 9, Replace the literal example that shows passing an API key on the command line (the "codex login --api-key \"...\"" text) with guidance to use the interactive "codex login" flow or a safer secret input/credential store (e.g., environment variable or OS keyring) and update adjacent references like "codex exec resume" and "codex review" to assume the user is already authenticated via that safer method; ensure the docs explicitly warn against shell-argument API keys and show the preferred command (e.g., "codex login" or instructions to set CODEx_API_KEY in a secure place) instead of the current `--api-key` example.

coderabbitai · 2026-04-28T20:51:37Z

+        return await new Promise<ToolResult>((resolve, reject) => {
+          this.callQueue = this.callQueue.then(async () => {
+            let operation: Promise<ToolResult> | undefined;

-        const handler = toolHandlers[name];
-        const context = createProgressContext();
-        return await handler.execute(args, context);
+            try {
+              if (!this.isValidToolName(name)) {
+                throw new Error(`Unknown tool: ${name}`);
+              }
+
+              const handler = toolHandlers[name];
+              const controller = new AbortController();
+              const context = createProgressContext(controller.signal);
+              const timeoutMs = this.getRequestTimeoutMs(name, args);
+              operation = handler.execute(args, context);
+
+              resolve(
+                await this.withTimeout(
+                  operation,
+                  timeoutMs,
+                  controller
+                )
+              );
+            } catch (err) {
+              reject(err);
+            } finally {
+              // Keep the queue blocked until the child process actually exits.
+              await operation?.then(
+                () => undefined,
+                () => undefined
+              );
+            }
+          });
+        });


⚠️ Potential issue | 🟠 Major

Start the timeout before enqueueing the request.

timeoutMs and the AbortController are created only after the call reaches the head of this.callQueue, so a queued request can wait indefinitely behind another tool call and still get its full timeout once execution starts. That means the configured timeout does not cap end-to-end tool latency.

coderabbitai · 2026-04-28T20:51:37Z

+    let timeoutHandle: ReturnType<typeof setTimeout> | undefined;
+
+    try {
+      const timeoutPromise = new Promise<never>((_resolve, reject) => {
+        timeoutHandle = setTimeout(() => {
+          controller?.abort();
+          reject(new Error(`Tool call timed out after ${timeoutMs}ms`));
+        }, timeoutMs);
+      });
+
+      return await Promise.race([operation, timeoutPromise]);
+    } finally {
+      if (timeoutHandle) {
+        clearTimeout(timeoutHandle);


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check the current src/server.ts around lines 173-186 head -190 src/server.ts | tail -30

Repository: tuannvm/codex-mcp-server

Length of output: 767

🏁 Script executed:

# Check src/utils/command.ts for globalThis usage patterns cat -n src/utils/command.ts | grep -A 2 -B 2 "globalThis\|setTimeout\|clearTimeout"

Repository: tuannvm/codex-mcp-server

Length of output: 410

🏁 Script executed:

# Check ESLint configuration if [ -f .eslintrc.json ]; then cat .eslintrc.json; elif [ -f .eslintrc.js ]; then cat .eslintrc.js; elif [ -f .eslintrc.yml ]; then cat .eslintrc.yml; elif [ -f eslintrc.json ]; then cat eslintrc.json; fi

Repository: tuannvm/codex-mcp-server

Length of output: 50

🏁 Script executed:

# Check tsconfig.json for lib configuration cat tsconfig.json

Repository: tuannvm/codex-mcp-server

Length of output: 557

🏁 Script executed:

# Search for all timer usage patterns in src directory rg "setTimeout|clearTimeout" src/ -A 1 -B 1

Repository: tuannvm/codex-mcp-server

Length of output: 3740

🏁 Script executed:

# Check package.json for ESLint configuration cat package.json | grep -A 30 "eslint"

Repository: tuannvm/codex-mcp-server

Length of output: 1100

🏁 Script executed:

# Check if .eslintrc is hidden or elsewhere find . -name ".eslintrc*" -o -name "eslint.config.*" 2>/dev/null | head -20

Repository: tuannvm/codex-mcp-server

Length of output: 290

🏁 Script executed:

# Verify the ReturnType for globalThis.setTimeout in TypeScript node -e "const x: ReturnType<typeof globalThis.setTimeout> = globalThis.setTimeout(() => {}, 100); console.log('Type check passes')" 2>&1 || echo "Type check result above"

Repository: tuannvm/codex-mcp-server

Length of output: 86

🏁 Script executed:

# Check if there are any npm lint scripts that might reveal ESLint setup grep -A 5 '"lint"' package.json

Repository: tuannvm/codex-mcp-server

Length of output: 276

🏁 Script executed:

# Check eslint.config.mjs cat eslint.config.mjs

Repository: tuannvm/codex-mcp-server

Length of output: 1932

🏁 Script executed:

# Also check if there's a jest.config or similar that might affect linting cat jest.config.js 2>/dev/null || cat jest.config.ts 2>/dev/null || echo "No jest config file found"

Repository: tuannvm/codex-mcp-server

Length of output: 94

Use globalThis for timer functions to pass ESLint checks.

Lines 177 and 186 use bare setTimeout and clearTimeout, which are not defined as globals in the ESLint configuration and will trigger no-undef errors. Prefixing with globalThis aligns with the pattern used throughout the codebase (src/utils/command.ts and all test files already follow this convention). Additionally, update the type annotation on line 173 from ReturnType<typeof setTimeout> to ReturnType<typeof globalThis.setTimeout>.

Proposed fix

- let timeoutHandle: ReturnType<typeof setTimeout> | undefined; + let timeoutHandle: ReturnType<typeof globalThis.setTimeout> | undefined; @@ - timeoutHandle = setTimeout(() => { + timeoutHandle = globalThis.setTimeout(() => { controller?.abort(); reject(new Error(`Tool call timed out after ${timeoutMs}ms`)); }, timeoutMs); @@ - clearTimeout(timeoutHandle); + globalThis.clearTimeout(timeoutHandle);

🧰 Tools

🪛 ESLint

[error] 173-173: 'setTimeout' is not defined.

(no-undef)

[error] 177-177: 'setTimeout' is not defined.

(no-undef)

[error] 186-186: 'clearTimeout' is not defined.

(no-undef)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/server.ts` around lines 173 - 186, Change the bare timer calls and their type annotation to use globalThis: update the declaration of timeoutHandle from ReturnType<typeof setTimeout> | undefined to ReturnType<typeof globalThis.setTimeout> | undefined, replace setTimeout(...) with globalThis.setTimeout(...) when creating timeoutPromise, and replace clearTimeout(timeoutHandle) with globalThis.clearTimeout(timeoutHandle); keep the rest of the logic around timeoutPromise, controller, timeoutMs and operation intact.

coderabbitai · 2026-04-28T20:51:37Z

+function getCommandResponse(result: CommandResult, fallback: string): string {
+  return result.stdout || result.stderr || fallback;
+}


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cd /tmp && find . -name "handlers.ts" -type f 2>/dev/null | head -20

Repository: tuannvm/codex-mcp-server

Length of output: 50

🏁 Script executed:

git ls-files | grep -E "(handlers|command)" | head -20

Repository: tuannvm/codex-mcp-server

Length of output: 149

🏁 Script executed:

# Read the handlers.ts file around lines 56-58 and surrounding context cat -n src/tools/handlers.ts | sed -n '40,100p'

Repository: tuannvm/codex-mcp-server

Length of output: 2136

🏁 Script executed:

# Search for getCommandResponse usage to understand impact rg "getCommandResponse" -A 2 -B 2

Repository: tuannvm/codex-mcp-server

Length of output: 1456

🏁 Script executed:

# Check if there's any stderr/stdout handling elsewhere in the file rg "stderr|stdout" src/tools/handlers.ts -A 2 -B 2

Repository: tuannvm/codex-mcp-server

Length of output: 1154

Preserve stderr-first output instead of discarding it.

getCommandResponse() returns stdout whenever it is non-empty, so incidental stdout noise will hide the actual Codex response written to stderr. This affects every command-backed tool routed through this helper. The issue is inconsistent with how the codebase handles thread ID extraction elsewhere (lines 270–273), which correctly merges stderr before stdout.

♻️ Proposed fix

function getCommandResponse(result: CommandResult, fallback: string): string { - return result.stdout || result.stderr || fallback; + const output = [result.stderr, result.stdout] + .map((chunk) => chunk.trim()) + .filter(Boolean) + .join('\n'); + + return output || fallback; }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/tools/handlers.ts` around lines 56 - 58, getCommandResponse currently returns stdout first which hides meaningful stderr output; update the function (getCommandResponse and its CommandResult handling) to prefer stderr and merge stderr before stdout (e.g., if stderr exists return stderr plus any stdout appended, otherwise return stdout, otherwise fallback) so stderr-first responses are preserved for all command-backed tools.

coderabbitai Bot reviewed Apr 26, 2026

View reviewed changes

evelynadriaan force-pushed the upstream-pr/harden-tool-lifecycle branch from f9231fc to 30d8935 Compare April 28, 2026 20:43

coderabbitai Bot reviewed Apr 28, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: harden tool lifecycle and current-state docs#168

fix: harden tool lifecycle and current-state docs#168
evelynadriaan wants to merge 1 commit intotuannvm:mainfrom
evelynadriaan:upstream-pr/harden-tool-lifecycle

evelynadriaan commented Apr 26, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Apr 26, 2026 •

edited

Loading

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

coderabbitai Bot Apr 26, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Apr 28, 2026

Uh oh!

coderabbitai Bot Apr 28, 2026

Uh oh!

coderabbitai Bot Apr 28, 2026

Uh oh!

coderabbitai Bot Apr 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

evelynadriaan commented Apr 26, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Verification

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related issues

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

evelynadriaan commented Apr 26, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 26, 2026 •

edited

Loading