fix(cli): show inference health in sandbox status output#2002

Open
ericksoa wants to merge 2 commits into main from
fix/995-status-inference-health

Conversation

@ericksoa ericksoa commented Apr 17, 2026

Summary

  • Adds remote provider health probing to nemoclaw <name> status so all providers (not just local) show an Inference line
  • Local probing (vllm-local, ollama-local) already worked — this fills the gap for remote providers (nvidia-prod, openai-api, anthropic-prod, gemini-api)
  • Creates a unified probeProviderHealth() dispatcher in new inference-health.ts module that handles both local and remote providers
  • Remote probes use lightweight reachability checks (any HTTP response including 401/403 = reachable, no API keys sent)
  • compatible-* providers show "not probed" since their endpoint URLs aren't known
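The reachability rule in the fourth bullet can be sketched as below. This is a minimal illustration, not the code from inference-health.ts: `ProbeOutcome` and `isReachable` are hypothetical names, and the sketch assumes the probe reports an HTTP status of 0 when no response arrived (curl's `%{http_code}` prints `000` in that case).

```typescript
// Hypothetical shape for the result of one probe attempt.
interface ProbeOutcome {
  httpStatus: number; // 0 means no HTTP response was received
  timedOut: boolean;
}

// Any HTTP response counts as reachable, including 401/403: the probe
// sends no API key, so it only answers "is the endpoint responding?".
function isReachable(outcome: ProbeOutcome): boolean {
  if (outcome.timedOut) return false;
  return outcome.httpStatus >= 100 && outcome.httpStatus <= 599;
}
```

Treating 401/403 as success keeps the check independent of credentials, at the cost of not detecting an invalid or expired key.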

Fixes #995

Test plan

  • 23 new unit tests in inference-health.test.ts covering endpoint mapping, reachability semantics, timeouts, and unified dispatch
  • All 1832 existing tests continue to pass
  • Manual: nemoclaw <sandbox> status with a remote provider shows new Inference line
  • Manual: nemoclaw <sandbox> status with a local provider output is unchanged
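The timeout cases in the test plan depend on how a failed curl run is turned into a message. A minimal sketch, assuming the probe shells out to curl and inspects its exit code; `describeCurlExit` is a hypothetical helper, not a function from this PR. The exit codes themselves are the ones documented in the curl man page.

```typescript
// Map a curl exit code to a short detail string for the status output.
// Per the curl man page: 6 = could not resolve host,
// 7 = failed to connect, 28 = operation timed out.
function describeCurlExit(exitCode: number): string {
  switch (exitCode) {
    case 0:
      return "request completed";
    case 6:
      return "could not resolve host";
    case 7:
      return "failed to connect";
    case 28:
      return "timed out";
    default:
      return `curl exited with code ${exitCode}`;
  }
}
```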

Summary by CodeRabbit

  • New Features

    • Added unified health checking for inference providers, supporting both local and remote provider monitoring.
    • Enhanced status reporting with more granular health states and detailed diagnostics.
  • Tests

    • Added comprehensive test coverage for inference provider health probing and endpoint configuration.

sandboxStatus() already probed local providers (vllm-local, ollama-local)
but showed no Inference line for remote providers. Add a unified
probeProviderHealth() dispatcher that performs lightweight reachability
checks for remote cloud endpoints (nvidia-prod, openai-api, anthropic-prod,
gemini-api) and a "not probed" fallback for compatible-* providers whose
URLs are unknown.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai bot commented Apr 17, 2026

Walkthrough

Introduces a unified inference-provider health-probing layer with functions to map providers to endpoints, probe remote providers via curl with timeout control, and delegate between local and remote probing strategies. Includes comprehensive test coverage and integrates the new health-probing API into nemoclaw status reporting.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Health Probing Infrastructure<br/>src/lib/inference-health.ts, src/lib/inference-health.test.ts | New unified health-probing module exporting probeProviderHealth, probeRemoteProviderHealth, and getRemoteProviderHealthEndpoint. Implements provider-to-endpoint mapping, curl-based reachability checking with 3s connect and 5s max timeouts, and delegation logic. Treats HTTP 401/403 as reachable. Special-cases compatible endpoints as "not probed". Comprehensive test suite validates provider mapping, curl integration, probe outcomes for reachable/unreachable cases, and timeout/error handling. |
| Integration & Usage<br/>src/nemoclaw.ts | Updated sandboxStatus to replace local-only health probing with unified probeProviderHealth call. Enhanced Inference: status reporting to distinguish three states: "not probed" (when probed: false), "healthy" (when ok: true), and "unreachable" (when ok: false), with detail output on probe failure. |
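The provider-to-endpoint mapping and the curl timeouts described in the summary might look roughly like this. It is a sketch under stated assumptions: the URLs are placeholders (the real endpoints are not shown in this conversation), `lookupHealthEndpoint` and `buildCurlProbeArgs` are hypothetical names, and `/dev/null` assumes a POSIX host. The curl flags themselves are standard.

```typescript
// Placeholder URLs only; the real values live in src/lib/inference-health.ts.
const REMOTE_HEALTH_ENDPOINTS = new Map<string, string>([
  ["nvidia-prod", "https://nvidia-prod.example.invalid/"],
  ["openai-api", "https://openai-api.example.invalid/"],
  ["anthropic-prod", "https://anthropic-prod.example.invalid/"],
  ["gemini-api", "https://gemini-api.example.invalid/"],
]);

// Returns null for providers without a known endpoint (e.g. compatible-*).
function lookupHealthEndpoint(provider: string): string | null {
  return REMOTE_HEALTH_ENDPOINTS.get(provider) ?? null;
}

// Build curl arguments that discard the body and print only the HTTP
// status code, with a 3s connect timeout and a 5s overall limit.
// No Authorization header is added, so no API key leaves the machine.
function buildCurlProbeArgs(endpoint: string): string[] {
  return [
    "--silent",
    "--output", "/dev/null",
    "--write-out", "%{http_code}",
    "--connect-timeout", "3",
    "--max-time", "5",
    endpoint,
  ];
}
```

The argument list can be passed to a process spawner without going through a shell, which avoids quoting problems with the `%{http_code}` format string.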

Sequence Diagram

sequenceDiagram
    participant Caller
    participant probeProviderHealth
    participant probeLocalProviderHealth
    participant probeRemoteProviderHealth
    participant getRemoteProviderHealthEndpoint
    participant curl as runCurlProbeImpl<br/>(curl probe)

    Caller->>probeProviderHealth: provider, options
    
    alt Local Provider
        probeProviderHealth->>probeLocalProviderHealth: Attempt local probe
        probeLocalProviderHealth-->>probeProviderHealth: ProviderHealthStatus | null
    else Remote Provider
        probeProviderHealth->>probeRemoteProviderHealth: Delegate to remote
        probeRemoteProviderHealth->>getRemoteProviderHealthEndpoint: Map provider to endpoint
        getRemoteProviderHealthEndpoint-->>probeRemoteProviderHealth: endpoint URL | null
        
        alt Compatible Endpoint
            probeRemoteProviderHealth-->>probeProviderHealth: {probed: false, ok: true}
        else Remote Endpoint Found
            probeRemoteProviderHealth->>curl: curl with timeouts + endpoint
            curl-->>probeRemoteProviderHealth: CurlProbeResult
            probeRemoteProviderHealth-->>probeProviderHealth: {probed: true, ok: boolean}
        end
    else Unknown Provider
        probeProviderHealth-->>Caller: null
    end
    
    probeProviderHealth-->>Caller: ProviderHealthStatus | null
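The delegation in the sequence diagram can be sketched as a small dispatcher. This is an illustration only: `dispatchProbe` and `ProbeDeps` are hypothetical names, the probes are injected and synchronous so the sketch can run with fakes, and the status fields (`ok`, `probed`, `providerLabel`, `endpoint`, `detail`) are taken from the review suggestion later in this conversation rather than from the module itself.

```typescript
interface ProviderHealthStatus {
  ok: boolean;
  probed: boolean;
  providerLabel: string;
  endpoint: string;
  detail: string;
}

// Hypothetical dependency bundle so the dispatcher can be tested with fakes.
interface ProbeDeps {
  probeLocal: (provider: string) => ProviderHealthStatus | null;
  resolveEndpoint: (provider: string) => string | null;
  probeRemote: (endpoint: string) => number; // HTTP status, 0 if no response
}

const LOCAL_PROVIDERS = new Set(["vllm-local", "ollama-local"]);

function dispatchProbe(provider: string, deps: ProbeDeps): ProviderHealthStatus | null {
  // Local providers keep their existing probe.
  if (LOCAL_PROVIDERS.has(provider)) {
    return deps.probeLocal(provider);
  }
  // compatible-* endpoints are user-defined, so there is nothing to probe.
  if (provider.startsWith("compatible-")) {
    return {
      ok: true,
      probed: false,
      providerLabel: provider,
      endpoint: "",
      detail: "endpoint URL is not known",
    };
  }
  const endpoint = deps.resolveEndpoint(provider);
  if (endpoint === null) return null; // unknown provider: no Inference line
  const status = deps.probeRemote(endpoint);
  const ok = status > 0; // any HTTP response means reachable
  return {
    ok,
    probed: true,
    providerLabel: provider,
    endpoint,
    detail: ok ? `HTTP ${status}` : "no HTTP response received",
  };
}
```

Injecting the probes mirrors the `runCurlProbeImpl` participant in the diagram and lets tests cover each branch without network access.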

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 A probe hops through the provider maze,
Checking endpoints in curious ways,
Local or remote, it finds the right path,
Curl whispers secrets, health in its grasp,
Now nemoclaw knows when all's okay! 🌟

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: adding inference health visibility to the sandbox status command output. |
| Linked Issues check | ✅ Passed | The PR addresses issue #995 by implementing unified provider health probing for both local and remote providers, improving visibility of inference backend health through the status command. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to implementing inference health probing: new test suite, health probing module, and integration into status command. No unrelated modifications detected. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%. |

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (2)
src/nemoclaw.ts (1)

1214-1226: Extract inference rendering to keep sandboxStatus complexity in check.

Line 1200’s function is already complexity-suppressed, and this new branch block adds more decision paths. Consider moving this rendering logic to a small helper.

♻️ Proposed refactor
+function printInferenceHealthStatus(inferenceHealth) {
+  if (!inferenceHealth) return;
+  if (!inferenceHealth.probed) {
+    console.log(`    Inference: ${D}not probed${R} (${inferenceHealth.detail})`);
+    return;
+  }
+  if (inferenceHealth.ok) {
+    console.log(`    Inference: ${G}healthy${R} (${inferenceHealth.endpoint})`);
+    return;
+  }
+  console.log(`    Inference: ${_RD}unreachable${R} (${inferenceHealth.endpoint})`);
+  console.log(`      ${inferenceHealth.detail}`);
+}
...
-    if (inferenceHealth) {
-      if (!inferenceHealth.probed) {
-        console.log(`    Inference: ${D}not probed${R} (${inferenceHealth.detail})`);
-      } else if (inferenceHealth.ok) {
-        console.log(
-          `    Inference: ${G}healthy${R} (${inferenceHealth.endpoint})`,
-        );
-      } else {
-        console.log(
-          `    Inference: ${_RD}unreachable${R} (${inferenceHealth.endpoint})`,
-        );
-        console.log(`      ${inferenceHealth.detail}`);
-      }
-    }
+    printInferenceHealthStatus(inferenceHealth);

As per coding guidelines, **/*.{js,ts,tsx,jsx}: Limit cyclomatic complexity to 20 in JavaScript/TypeScript files, with target of 15.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/nemoclaw.ts` around lines 1214 - 1226, The inference rendering block
inside the sandboxStatus function is increasing cyclomatic complexity; extract
it into a small helper named something like
renderInferenceHealth(inferenceHealth) that takes the existing inferenceHealth
object and the color constants (D, R, G, _RD) and returns or prints the exact
same lines (handle !probed, ok, and unreachable cases including detail and
endpoint) and replace the inline branch in sandboxStatus with a single call to
that helper to preserve behavior and reduce complexity.
src/lib/inference-health.ts (1)

92-95: Prefer not probed over null for recognized-but-unmapped providers.

If a provider is recognized by config but missing endpoint mapping, returning null drops the Inference line entirely. Returning a probed: false status is safer and keeps output stable as providers evolve.

♻️ Proposed refactor
   const endpoint = getRemoteProviderHealthEndpoint(provider);
   if (!endpoint) {
-    return null;
+    if (config) {
+      return {
+        ok: true,
+        probed: false,
+        providerLabel,
+        endpoint: "",
+        detail: "Health probe endpoint is not defined for this provider.",
+      };
+    }
+    return null;
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/inference-health.ts` around lines 92 - 95, The current code in
inference-health.ts calls getRemoteProviderHealthEndpoint(provider) and returns
null when endpoint is missing, which removes the provider from output; change
the behavior so that when endpoint is falsy you return an object indicating the
provider is recognized but not probed (e.g., { provider, probed: false, status:
'not probed' } or matching the existing Inference/Health shape) instead of null.
Update the branch that checks `if (!endpoint)` (the code referencing endpoint
from getRemoteProviderHealthEndpoint) to construct and return the non-probed
status object so downstream consumers still see the provider entry.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: b11baa0d-7e08-4b7b-a922-78b0e2db8c65

📥 Commits

Reviewing files that changed from the base of the PR and between 56ee83f and efd5a8f.

📒 Files selected for processing (3)
  • src/lib/inference-health.test.ts
  • src/lib/inference-health.ts
  • src/nemoclaw.ts

Development

Successfully merging this pull request may close these issues.

[nemoclaw] [MacOS] No clear error message when Ollama backend is stopped
