Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 12
Note
Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
🟡 Minor comments (11)
nemoclaw-blueprint/policies/presets/docker-proxy.yaml-36-36 (1)
36-36: ⚠️ Potential issue | 🟡 Minor

Duplicate rule: `POST /v*/containers/**` appears twice.

Line 36 and line 44 both specify the same allow rule. This is redundant and should be removed.
🔧 Proposed fix
```diff
   - allow: { method: GET, path: "/v*/networks/**" }
   - allow: { method: GET, path: "/v*/volumes/**" }
   - allow: { method: GET, path: "/v*/events" }
-  - allow: { method: POST, path: "/v*/containers/**" }
 binaries:
```

Also applies to: 44-44
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemoclaw-blueprint/policies/presets/docker-proxy.yaml` at line 36, Remove the duplicated policy rule entry `- allow: { method: POST, path: "/v*/containers/**" }` so the `POST /v*/containers/**` allow rule appears only once in the docker-proxy preset; locate both identical allow entries and delete the redundant occurrence, leaving the single canonical rule intact.

bin/nemoclaw.js-602-609 (1)
602-609: ⚠️ Potential issue | 🟡 Minor

Missing input validation for the `--port` argument.

The `--port` value is passed directly to the shell without validating it's a valid port number. A malicious or accidental non-numeric value could cause unexpected behavior.

🛡️ Proposed fix to add port validation
```diff
   if (args.includes("--port")) {
     const idx = args.indexOf("--port");
-    if (args[idx + 1]) envParts.push(`DASHBOARD_PORT=${shellQuote(args[idx + 1])}`);
+    const portVal = args[idx + 1];
+    if (portVal && /^\d+$/.test(portVal) && +portVal > 0 && +portVal <= 65535) {
+      envParts.push(`DASHBOARD_PORT=${shellQuote(portVal)}`);
+    } else if (portVal) {
+      console.error(`  Invalid port: ${portVal} (must be 1-65535)`);
+      process.exit(1);
+    }
   }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/nemoclaw.js` around lines 602 - 609, Validate the value passed for --port before adding it to envParts: in the block that reads args/indexOf("--port") and pushes DASHBOARD_PORT using shellQuote, parse and verify the next token is a numeric integer within the valid TCP port range (1–65535); if it’s invalid, either throw/exit with a clear error or skip adding the env var and log an error; keep the same shellQuote usage for the safe value when valid. Use the same symbols: args, idx, envParts, shellQuote and the DASHBOARD_PORT handling to locate where to add validation.

Dockerfile.base-65-68 (1)
65-68: ⚠️ Potential issue | 🟡 Minor

Pin xvfb, scrot, and x11-utils to exact Debian bookworm versions.

The file header (lines 17–18) states "apt packages — pinned to exact Debian bookworm versions," but `xvfb`, `scrot`, and `x11-utils` at lines 66–68 are installed without version pins. This breaks the stated policy and risks reproducibility issues. Add version pins matching the pattern of other packages in this RUN command:

```
    xvfb=2:21.1.7-3+deb12u11 \
    scrot=1.8.1-1 \
    x11-utils=7.7+5 \
```

Also pin `docker-ce-cli` at line 83 (currently unpinned from the Docker official repository).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Dockerfile.base` around lines 65 - 68, Update the RUN apt-get install line to pin the virtual framebuffer and X utilities by appending exact Debian bookworm versions for xvfb, scrot, and x11-utils (use xvfb=2:21.1.7-3+deb12u11, scrot=1.8.1-1, x11-utils=7.7+5) so they match the existing package pinning pattern, and likewise pin docker-ce-cli to its exact version in the Docker official repo; modify the package tokens xvfb, scrot, x11-utils, and docker-ce-cli to include the specified =version strings where they are installed so the build is reproducible.

.agents/skills/nemoclaw-gpu-sandbox/SKILL.md-1-4 (1)
1-4: ⚠️ Potential issue | 🟡 Minor

Missing SPDX license header.
As per coding guidelines, Markdown files require an HTML comment SPDX header at the top.
📝 Proposed fix
```diff
+<!--
+SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+SPDX-License-Identifier: Apache-2.0
+-->
+
 ---
 name: nemoclaw-gpu-sandbox
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.agents/skills/nemoclaw-gpu-sandbox/SKILL.md around lines 1 - 4, The SKILL.md file is missing the required SPDX license HTML comment header; add an SPDX identifier comment as the first line of .agents/skills/nemoclaw-gpu-sandbox/SKILL.md (above the YAML frontmatter starting with ---) using the project's standard header (for example <!-- SPDX-License-Identifier: Apache-2.0 -->) so the file begins with the SPDX HTML comment followed by the existing frontmatter/name/description block.

Dockerfile.sandbox-ai-1-17 (1)
1-17: ⚠️ Potential issue | 🟡 Minor

Missing SPDX license header.

As per coding guidelines, all source files require an SPDX header. For Dockerfiles, use `#` comments.

📝 Proposed fix
```diff
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
 # NemoClaw AI sandbox image — CUDA-enabled for Python AI/ML workloads.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Dockerfile.sandbox-ai` around lines 1 - 17, Add a Dockerfile SPDX header as a comment at the very top of Dockerfile.sandbox-ai (before any other comments) using the format "# SPDX-License-Identifier: Apache-2.0" (or replace Apache-2.0 with the repo's chosen identifier), ensuring the header is a single-line comment so the Dockerfile passes the project's licensing check.

.agents/skills/nemoclaw-docker-proxy/SKILL.md-1-4 (1)
1-4: ⚠️ Potential issue | 🟡 Minor

Missing SPDX license header.
As per coding guidelines, Markdown files require an HTML comment SPDX header at the top.
📝 Proposed fix
```diff
+<!--
+SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+SPDX-License-Identifier: Apache-2.0
+-->
+
 ---
 name: nemoclaw-docker-proxy
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.agents/skills/nemoclaw-docker-proxy/SKILL.md around lines 1 - 4, Add an HTML comment SPDX license header to the very top of the SKILL.md document (the one whose frontmatter has name: nemoclaw-docker-proxy) so the file begins with a valid SPDX identifier in an HTML comment (for example include "SPDX-License-Identifier: MIT" inside the comment); ensure the comment is the first line before the YAML frontmatter and matches the project's chosen SPDX identifier.

scripts/docker-proxy.js-14-18 (1)
14-18: ⚠️ Potential issue | 🟡 Minor

Comment contradicts the allowlist.

Line 16 states "No exec, no attach" but line 106 allows `POST /containers/{id}/attach`. The attach endpoint is necessary for `docker run -it` functionality, so either the comment should be updated or the allowlist reconsidered.

📝 Suggested comment fix
```diff
 // SECURITY MODEL:
 // - Only a hard-coded allowlist of Docker API operations is forwarded.
 // - Container-create requests are inspected; Privileged mode, host network,
 //   CAP_ADD, and mounts of sensitive host paths are rejected.
-// - No exec, no attach, no build — agents can run pre-built images only.
+// - No exec, no build — agents can run pre-built images only.
+// - Attach is allowed for interactive container sessions (docker run -it).
 // - No token auth here; OpenShell network policy is gate 1 (sandbox traffic
 //   only), this allowlist is gate 2.
```

Also applies to: 106-106
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/docker-proxy.js` around lines 14 - 18, The top-of-file comment in docker-proxy.js contradicts the allowlist: it says "No exec, no attach, no build" but the proxy allows POST /containers/{id}/attach; update the comment to reflect the actual allowlist by removing or revising the "no attach" statement (or explicitly note that /containers/{id}/attach is permitted for interactive runs) so the comment matches the implemented allowlist (reference the "No exec, no attach, no build" comment block and the POST /containers/{id}/attach endpoint).

scripts/resume.sh-54-57 (1)
54-57: ⚠️ Potential issue | 🟡 Minor

Duplicate example line in usage text.

Lines 55 and 56 show identical examples.
📝 Suggested fix
```diff
 echo "Examples:"
 echo "  $0 my-assistant"
-echo "  $0 my-assistant"
+echo "  DASHBOARD_PORT=8080 $0 my-assistant"
 echo ""
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/resume.sh` around lines 54 - 57, The usage text prints two identical example lines; remove the duplicate echo or replace the second line with a different example so the output shows only unique examples (update the two echo "  $0 my-assistant" lines in the usage section to a single example or a distinct second example that uses "$0" appropriately).

scripts/setup-autostart.sh-79-80 (1)
79-80: ⚠️ Potential issue | 🟡 Minor

Non-portable `grep -oP` may fail on macOS.

The `-P` (Perl regex) flag is not available in BSD grep (the macOS default). Consider using a more portable alternative.

🐛 Portable alternative using sed
```diff
-GATEWAY_NAME=$(openshell gateway info 2>&1 | grep -oP 'Gateway:\s+\K\S+' || echo "nemoclaw")
+GATEWAY_NAME=$(openshell gateway info 2>&1 | sed -n 's/.*Gateway:[[:space:]]*\([^[:space:]]*\).*/\1/p' | head -1)
+GATEWAY_NAME="${GATEWAY_NAME:-nemoclaw}"
```

Or using awk:

```
GATEWAY_NAME=$(openshell gateway info 2>&1 | awk '/Gateway:/ {print $2; exit}')
GATEWAY_NAME="${GATEWAY_NAME:-nemoclaw}"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/setup-autostart.sh` around lines 79 - 80, The current extraction of GATEWAY_NAME uses non-portable grep -oP which can fail on macOS; replace the grep -oP usage in the GATEWAY_NAME assignment with a portable command (e.g., awk or sed) that parses the "Gateway:" line and defaults to "nemoclaw" if empty, leaving GATEWAY_CTR="openshell-cluster-${GATEWAY_NAME}" unchanged; update the GATEWAY_NAME assignment (the line that currently calls openshell gateway info | grep -oP) to use the portable parser and preserve the existing fallback behavior.

bin/lib/sandbox-init.js-276-292 (1)
276-292: ⚠️ Potential issue | 🟡 Minor

Policy preset failures are silently marked as successful.

Based on context from `bin/lib/policies.js`, `applyPreset` returns `false` when a preset is not found (instead of throwing). The current code only catches exceptions but doesn't check the return value, so a missing preset will show a green checkmark.

🐛 Suggested fix to check return value
```diff
-  const policiestoApply = skipGithub
+  const policiesToApply = skipGithub
     ? [...allPolicies]
     : ["github", ...allPolicies];
-  if (policiestoApply.length === 0) {
+  if (policiesToApply.length === 0) {
     console.log("  • No policies to apply (--no-github and no --policy flags)");
   } else {
-    for (const preset of [...new Set(policiestoApply)]) {
+    for (const preset of [...new Set(policiesToApply)]) {
       process.stdout.write(`  • ${preset} … `);
       try {
-        policies.applyPreset(sandboxName, preset);
-        console.log("\x1b[32m✓\x1b[0m");
+        const success = policies.applyPreset(sandboxName, preset);
+        if (success) {
+          console.log("\x1b[32m✓\x1b[0m");
+        } else {
+          console.log("\x1b[33m⚠\x1b[0m (skipped)");
+          console.warn(`  Preset '${preset}' not found or invalid`);
+        }
       } catch (err) {
         console.log("\x1b[33m⚠\x1b[0m (skipped)");
         console.warn(`  ${err.message}`);
       }
     }
   }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/sandbox-init.js` around lines 276 - 292, The loop treating policies.applyPreset(sandboxName, preset) as successful on non-throwing failures must be fixed: check the return value of policies.applyPreset (called with sandboxName and preset) and if it returns false treat it as skipped (print the yellow "⚠ (skipped)" and console.warn with the error/diagnostic), otherwise print the green check; keep the existing try/catch to handle thrown errors but ensure the success branch only runs when applyPreset returns a truthy success value.

scripts/docker-proxy.js-163-181 (1)
163-181: ⚠️ Potential issue | 🟡 Minor

Mount validation may miss the top-level `Mounts` array.

The Docker API allows mounts to be specified at `spec.Mounts` (top-level) in addition to `spec.HostConfig.Mounts`. The current validation only checks `HostConfig.Mounts`.

🐛 Add check for top-level Mounts
```diff
   // Also check Mounts (new-style)
   const mounts = spec.HostConfig?.Mounts || [];
   for (const mount of mounts) {
     if (mount.Type === "bind") {
       const source = mount.Source || "";
       if (source.startsWith("/") && isBlockedMount(source)) {
         return `Mount of host path '${source}' is not allowed`;
       }
     }
   }
+  // Also check top-level Mounts (used by some API versions)
+  const topLevelMounts = spec.Mounts || [];
+  for (const mount of topLevelMounts) {
+    if (mount.Type === "bind") {
+      const source = mount.Source || "";
+      if (source.startsWith("/") && isBlockedMount(source)) {
+        return `Mount of host path '${source}' is not allowed`;
+      }
+    }
+  }
+
   return null; // safe
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/docker-proxy.js` around lines 163 - 181, Validation currently only inspects spec.HostConfig.Mounts and can miss a top-level spec.Mounts array; update the mount validation in scripts/docker-proxy.js to also iterate spec.Mounts (in addition to HostConfig.Mounts) and apply the same checks (Type === "bind", inspect mount.Source, call isBlockedMount) so bind mounts declared at the top level are blocked just like HostConfig.Mounts; reuse the existing variables/naming (binds, mounts, mount.Source, isBlockedMount) and return the same error strings when a blocked path is detected.
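The combined check can be sketched as a standalone function. This is an illustrative sketch only — `findBlockedMount` and the injected `isBlockedMount` predicate are hypothetical names, not the proxy's actual symbols:

```javascript
// Sketch: validate bind mounts from both spec.HostConfig.Mounts and the
// top-level spec.Mounts array. The isBlockedMount predicate is injected
// for illustration; the real proxy defines its own blocklist.
function findBlockedMount(spec, isBlockedMount) {
  const all = [
    ...(spec.HostConfig?.Mounts || []),
    ...(spec.Mounts || []),
  ];
  for (const mount of all) {
    if (mount.Type === "bind") {
      const source = mount.Source || "";
      if (source.startsWith("/") && isBlockedMount(source)) {
        return `Mount of host path '${source}' is not allowed`;
      }
    }
  }
  return null; // no blocked bind mounts found
}
```

With this shape, a mount declared only at the top level is rejected the same way as one declared under `HostConfig`.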
🧹 Nitpick comments (30)
bin/nemoclaw.js (2)
516-520: Consider extracting the inline Python command.

The kubectl exec with embedded Python for reading the auth token is functional but hard to read. Consider extracting this to a helper function or using a simpler approach like `jq` if available in the sandbox.

🔧 Suggested refactor using jq (if available)
```diff
-  const token = runCapture(
-    `openshell doctor exec -- kubectl exec ${sandboxName} -n openshell -- ` +
-    `bash -c "HOME=/sandbox python3 -c \\"import json; cfg=json.load(open('/sandbox/.openclaw/openclaw.json')); print(cfg.get('gateway',{}).get('auth',{}).get('token',''))\\""`,
-    { ignoreError: true }
-  ) || null;
+  const token = runCapture(
+    `openshell doctor exec -- kubectl exec ${sandboxName} -n openshell -- ` +
+    `bash -c "HOME=/sandbox jq -r '.gateway.auth.token // empty' /sandbox/.openclaw/openclaw.json 2>/dev/null"`,
+    { ignoreError: true }
+  ) || null;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/nemoclaw.js` around lines 516 - 520, Extract the inline Python kubectl command used to populate token into a small helper function (e.g., getSandboxAuthToken) that takes sandboxName and returns the token, or switch to a simpler jq-based command if jq is available in the sandbox; replace the long inline command passed to runCapture with a call to that helper, keep the runCapture invocation and ignoreError option, and ensure the helper handles missing file/field cases and returns null on failure so the token assignment (const token = ...) remains unchanged.
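If the helper-function route is taken instead, the parsing half of the one-liner is small enough to isolate. A minimal sketch, assuming the config shape quoted above (`readGatewayToken` is a hypothetical name, not an existing function in the codebase):

```javascript
// Sketch: parse openclaw.json text and extract gateway.auth.token,
// returning null on malformed JSON, a missing key, or an empty token —
// mirroring the inline one-liner's "empty on missing" behavior.
function readGatewayToken(jsonText) {
  try {
    const cfg = JSON.parse(jsonText);
    return cfg?.gateway?.auth?.token || null;
  } catch {
    return null; // unreadable config behaves like "no token"
  }
}
```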
652-664: Help text formatting issue on line 653.

Line 653 combines multiple commands on one line, making it harder to read. The alignment appears inconsistent with other help entries.
📝 Suggested formatting fix
```diff
 ${G}Sandbox Management:${R}
-  ${B}nemoclaw list${R}  List all sandboxes ${B}nemoclaw add-gpu-agent <name>${R} Create a GPU-enabled agent + register in parent dashboard nemoclaw <name> connect Shell into a running sandbox
+  ${B}nemoclaw list${R}                  List all sandboxes
+  ${B}nemoclaw add-gpu-agent <name>${R}  Create a GPU-enabled agent + register in parent
+  nemoclaw <name> connect                Shell into a running sandbox
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/nemoclaw.js` around lines 652 - 664, The help text line containing multiple commands ("nemoclaw list", "nemoclaw add-gpu-agent <name>", "nemoclaw <name> connect") is compressed onto one line and misaligned with other entries; split these into separate lines so each command has its own entry, match the spacing/column alignment used elsewhere, and preserve the color tokens (${G}, ${B}, ${D}, ${R}) and surrounding padding; update the block that renders "Sandbox Management" so entries like "nemoclaw list", "nemoclaw add-gpu-agent <name>", and "nemoclaw <name> connect" are each on their own aligned line consistent with "Data & Filesystem" formatting.

docs/deployment/run-multiple-gpu-agents.md (2)
122-123: Unusual relative path in troubleshooting section.

The path `./../../../.agents/skills/nemoclaw-gpu-sandbox/SKILL.md` uses an unusual upward traversal pattern. Per docs guidelines, documentation source files should be edited under the `docs/` directory. Consider linking to a docs page instead of an internal skill file, or use a simpler relative path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/deployment/run-multiple-gpu-agents.md` around lines 122 - 123, In docs/deployment/run-multiple-gpu-agents.md replace the unusual upward-traversal link target "./../../../.agents/skills/nemoclaw-gpu-sandbox/SKILL.md" with a docs-friendly target (e.g., a new or existing docs page under docs/ such as "docs/agents/gpu-sandbox.md" or a simpler relative path from this file), update the link text if needed, and ensure the referenced page exists and is added to the docs index/TOC so the troubleshooting link resolves correctly.
23-24: Multiple sentences on a single line.

Per the docs style guide, each sentence should be on its own line for readable diffs.
📝 Suggested fix
```diff
-NemoClaw supports running multiple GPU-accelerated agent sandboxes on a single physical GPU using NVIDIA device plugin **time-slicing**. Each sandbox gets its own `nvidia.com/gpu` allocation and sees the full GPU (with shared access at the hardware scheduler level).
+NemoClaw supports running multiple GPU-accelerated agent sandboxes on a single physical GPU using NVIDIA device plugin **time-slicing**.
+Each sandbox gets its own `nvidia.com/gpu` allocation and sees the full GPU (with shared access at the hardware scheduler level).
```

(Split into separate lines in the source)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/deployment/run-multiple-gpu-agents.md` around lines 23 - 24, The two sentences in the paragraph ("NemoClaw supports running multiple GPU-accelerated agent sandboxes on a single physical GPU using NVIDIA device plugin **time-slicing**." and "Each sandbox gets its own `nvidia.com/gpu` allocation and sees the full GPU (with shared access at the hardware scheduler level).") must be split onto separate lines in the markdown source so each sentence occupies its own line for cleaner diffs; update the text block containing those sentences so the first sentence is on one line and the second sentence starts on the next line.

Dockerfile.base (1)
83-83: `docker-ce-cli` is not version-pinned.

For reproducibility, consider pinning `docker-ce-cli` to a specific version (e.g., `docker-ce-cli=5:27.5.1-1~debian.12~bookworm`). The comment mentions 29.x compatibility but the package could auto-upgrade to a newer major version on rebuild.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Dockerfile.base` at line 83, The Dockerfile installs docker-ce-cli without a pinned version (see the apt-get install token docker-ce-cli); update the apt-get install command to pin docker-ce-cli to a specific package version (e.g., docker-ce-cli=5:27.5.1-1~debian.12~bookworm) and update the accompanying comment to document the chosen major-version compatibility (e.g., "pin to 27.x for reproducible builds"); ensure the pinned version exists in the configured APT repos (or add the appropriate repository) and include the version pin in any subsequent rebuild instructions so the image remains reproducible.

extensions/docker-proxy/index.ts (2)
136-140: Empty catch block silently swallows errors.

The container removal is best-effort, but logging the error would aid debugging when cleanup fails unexpectedly.
♻️ Proposed improvement
```diff
 // Auto-remove after collecting output
 try {
   await docker.containerRemove(containerId);
-} catch {
-  // best-effort cleanup
+} catch (err) {
+  // best-effort cleanup — log but don't fail
+  api.logger.warn(`[docker_run] Failed to remove container ${shortId}: ${err}`);
 }
```

Note: This requires capturing `api` in scope or passing `logger` to the execute function.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@extensions/docker-proxy/index.ts` around lines 136 - 140, The empty catch around docker.containerRemove(containerId) swallows errors; change it to log the failure instead: catch the error and call the available logger (or capture/pass api to get api.logger) with a clear message including containerId and the error; update the execute function signature or closure to accept a logger/api if necessary so execute can call logger.error when docker.containerRemove fails for best-effort cleanup.
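The pattern — log but never rethrow — can be isolated into a helper for illustration. `bestEffortRemove`, `removeFn`, and the `logger` parameter below are stand-ins for `docker.containerRemove` and `api.logger`, not existing symbols:

```javascript
// Sketch: best-effort cleanup that reports failures instead of silently
// swallowing them. Returns whether removal succeeded.
async function bestEffortRemove(removeFn, containerId, logger) {
  try {
    await removeFn(containerId);
    return true;
  } catch (err) {
    // log the failure but do not propagate — cleanup stays best-effort
    logger.warn(`[docker_run] Failed to remove container ${containerId}: ${err}`);
    return false;
  }
}
```

The caller's control flow is unchanged; only the diagnostic trail improves.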
64-68: Consider using a more specific type for PluginApi.

The `PluginApi` type uses `unknown` and `Record<string, unknown>` extensively. If OpenClaw provides type definitions, importing them would improve type safety and IDE support.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@extensions/docker-proxy/index.ts` around lines 64 - 68, The PluginApi type is too generic; replace the widespread use of unknown/Record<string, unknown> by importing or defining concrete types from OpenClaw (or your app) and applying them to PluginApi's fields: use the library's PluginConfig type for pluginConfig, a Logger interface for logger with proper method signatures and optional metadata, and a typed RegisterTool signature for registerTool (e.g., ToolDefinition / RegisterOptions) instead of Record<string, unknown>; update the type alias name PluginApi to reference these imported/defined interfaces so IDEs and TS checks can validate usages of pluginConfig, logger, and registerTool.

scripts/mount-sandbox.sh (2)
160-170: `find -printf` is GNU-specific.

The `-printf` option is not available in BSD `find` (macOS). If cross-platform support is needed, use `find ... -exec basename {} \;` or `ls -1` instead.

♻️ Proposed portable alternative
```diff
 if mountpoint -q "$mountpoint" 2>/dev/null; then
   info "Mounted at ${mountpoint}"
   echo "  Contents:"
-  find "$mountpoint" -maxdepth 1 -mindepth 1 -printf '    %f\n' 2>/dev/null
+  ls -1 "$mountpoint" 2>/dev/null | sed 's/^/    /'
 else
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/mount-sandbox.sh` around lines 160 - 170, The do_status function uses GNU-only find -printf which breaks on BSD/macOS; update the listing command inside do_status (the find "$mountpoint" ... -printf call) to a portable alternative such as using find "$mountpoint" -maxdepth 1 -mindepth 1 -exec basename {} \; or piping find to xargs -n1 basename or simply using ls -1 "$mountpoint" to produce the same indented " name" output so the function works cross-platform.
51-70: Potential race condition when appending to SSH config.

Multiple concurrent invocations could produce duplicate or corrupted config entries. The `grep` check and append are not atomic. Consider using a lock file for the SSH config modification.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/mount-sandbox.sh` around lines 51 - 70, The ensure_ssh_config function has a race between the grep check and appending to ${HOME}/.ssh/config which can produce duplicates or corruption under concurrent runs; to fix it, serialize access with a filesystem lock (e.g., use flock or a dedicated lockfile) around the entire check-and-append section in ensure_ssh_config so the host existence test and the append are performed atomically, perform the write to a temporary file and move it into place (preserving chmod 0600) while still holding the lock, and ensure the lock is released on all exit paths; reference ensure_ssh_config, the host variable (${SSH_HOST_PREFIX}${sandbox}), and config (${HOME}/.ssh/config) when making the change.

Dockerfile.sandbox-ai (2)
73-75: Missing `--no-install-recommends` for Node.js installation.

The NodeSource setup script may pull in recommended packages that increase image size unnecessarily.
♻️ Proposed fix
```diff
 RUN curl -fsSL https://deb.nodesource.com/setup_22.x | bash - \
-    && apt-get install -y nodejs \
+    && apt-get install -y --no-install-recommends nodejs \
     && rm -rf /var/lib/apt/lists/*
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Dockerfile.sandbox-ai` around lines 73 - 75, The RUN instruction installs Node.js with apt-get without suppressing recommended packages; update the RUN command that pipes NodeSource setup and calls apt-get (the line containing "apt-get install -y nodejs") to use apt-get install -y --no-install-recommends nodejs so recommended packages are not pulled in, keeping the rest of the chain (curl setup and rm -rf /var/lib/apt/lists/*) unchanged.
164-179: `NEMOCLAW_PROVIDER_KEY` in ARG/ENV may expose sensitive data in image layers.

Trivy flagged this as a potential secret exposure. While the value appears to be `"nvidia"` (a provider identifier, not an API key), the variable name suggests it could be confused with actual credentials. Consider renaming to `NEMOCLAW_PROVIDER_ID` to clarify it's not a secret.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Dockerfile.sandbox-ai` around lines 164 - 179, The ARG/ENV name NEMOCLAW_PROVIDER_KEY likely triggers secret scanners; rename it to NEMOCLAW_PROVIDER_ID to clarify it's an identifier, not a credential: update the ARG declaration (replace NEMOCLAW_PROVIDER_KEY), the corresponding ENV entry, and any references elsewhere (e.g., build-time uses or scripts reading process.env.NEMOCLAW_PROVIDER_KEY) to use NEMOCLAW_PROVIDER_ID so the value "nvidia" is treated as a non-secret provider id.

scripts/resume-all-sandboxes.sh (1)
68-76: Python JSON parser silently fails on errors.

The `|| true` at line 76 suppresses all Python errors, including malformed JSON or missing keys. If `sandboxes.json` is corrupted, the script proceeds with zero sandboxes and reports success. Consider logging a warning when Python parsing fails.

♻️ Proposed fix to log parsing failures
```diff
 done < <(python3 -c "
 import json, sys
-d = json.load(open('${SANDBOXES_FILE}'))
-sbs = d.get('sandboxes', {})
-if isinstance(sbs, dict):
-    for sb in sbs.values(): print(sb.get('name',''))
-else:
-    for sb in sbs: print(sb.get('name',''))
-" 2>/dev/null || true)
+try:
+    d = json.load(open('${SANDBOXES_FILE}'))
+    sbs = d.get('sandboxes', {})
+    if isinstance(sbs, dict):
+        for sb in sbs.values(): print(sb.get('name',''))
+    else:
+        for sb in sbs: print(sb.get('name',''))
+except Exception as e:
+    import sys
+    print(f'[warning] Failed to parse sandboxes.json: {e}', file=sys.stderr)
+" 2>&1)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/resume-all-sandboxes.sh` around lines 68 - 76, The Python JSON parsing invocation using python3 -c (which reads ${SANDBOXES_FILE}) is currently silenced by "2>/dev/null || true", so parsing errors are ignored; remove the "|| true", capture the python command's exit status and any stderr, and if it fails log a warning that includes the error and file name (reference SANDBOXES_FILE and the python3 -c invocation) and exit or skip with a non-zero status as appropriate; ensure the script prints the parsed sandbox names only on successful parse and propagate failures instead of silently continuing.

scripts/start-services.sh (1)
193-209: Consider a readiness check instead of a fixed sleep.

The `sleep 1` after starting `kubectl port-forward` is a timing assumption. If the port-forward takes longer to establish, the relay may fail to connect. Consider polling for port availability.

♻️ Proposed improvement
```diff
   docker exec -d "$_gw_container" \
     kubectl port-forward pod/"${SANDBOX_NAME}" "${DASHBOARD_PORT}:${DASHBOARD_PORT}" \
     -n openshell --address 0.0.0.0 >/dev/null 2>&1 || true
-  sleep 1
+  # Wait for port-forward to become ready (up to 5 seconds)
+  for _ in $(seq 1 10); do
+    if docker exec "$_gw_container" nc -z 127.0.0.1 "${DASHBOARD_PORT}" 2>/dev/null; then
+      break
+    fi
+    sleep 0.5
+  done
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/start-services.sh` around lines 193 - 209, Replace the fixed sleep 1 with a readiness poll that waits for the forwarded dashboard port to be accepting connections before launching start_service gateway-relay; specifically, after the docker exec ... kubectl port-forward command, implement a loop that tests the port (e.g., using nc -z, timeout + nc, curl --max-time, or bash /dev/tcp/"$_gw_container_ip"/"$DASHBOARD_PORT") with a short sleep and an overall timeout, and only proceed to call start_service gateway-relay when the port is reachable (otherwise log an error and abort); update the block around the kubectl port-forward and the start_service gateway-relay invocation to perform this readiness check instead of relying on sleep 1.

scripts/nemoclaw-start.sh (1)
314-325: Minor: PID file ownership may be overly restrictive.

The PID file is set to `root:root` with mode `600`, which prevents the sandbox user from reading it. If any cleanup logic needs to read the Xvfb PID as the sandbox user, this would fail. Consider whether `644` would be more appropriate for a read-only PID file, or if the current restriction is intentional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/nemoclaw-start.sh` around lines 314 - 325, The PID file /tmp/xvfb.pid is currently owned root:root with mode 600 which prevents the sandbox user from reading it; change the post-start permissions so the sandbox user can read the PID (either make it world-readable or make sandbox the owner). Concretely, replace the chown/chmod pair for /tmp/xvfb.pid (the lines using chown root:root /tmp/xvfb.pid and chmod 600 /tmp/xvfb.pid) with either chown root:root /tmp/xvfb.pid && chmod 644 /tmp/xvfb.pid or chown sandbox:sandbox /tmp/xvfb.pid && chmod 600 /tmp/xvfb.pid depending on whether you want root to remain owner; update any cleanup logic that assumes the file is unreadable accordingly.

test/sandbox-add-gpu-agent.test.js (3)
142-154: Dead code: first `runCapture` assignment is immediately overwritten.

Line 144 assigns `runner.runCapture = makeDefaultCaptureMock()` but line 146 immediately overwrites it with a new mock, making the first assignment unused.

♻️ Remove the dead assignment
```diff
 describe("GPU time-slicing", () => {
   it("skips time-slicing when GPU count is already > 1", async () => {
-    // Default mock returns "4" for kubectl get nodes
-    runner.runCapture = makeDefaultCaptureMock();
-    // Sandbox list: cortana ready, jarvis NOT in list yet (so we can create it)
+    // Sandbox list: cortana ready, jarvis NOT in list yet (so we can create it)
+    // Mock returns "4" for kubectl get nodes (triggers skip path)
     runner.runCapture = vi.fn().mockImplementation((cmd) => {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/sandbox-add-gpu-agent.test.js` around lines 142 - 154, The test contains a dead assignment where runner.runCapture is first set to makeDefaultCaptureMock() and then immediately overwritten; remove the initial assignment to avoid confusion and ensure the test only defines runner.runCapture once (keep the vi.fn() mock block). Update the test in test/sandbox-add-gpu-agent.test.js by deleting the line that calls makeDefaultCaptureMock() so only the intended mock implementation for runner.runCapture remains.
10-12: Setting `process.env.HOME` at module scope may affect other test files.

This global mutation happens before any test setup and persists across the module's lifetime. Consider documenting this or using Vitest's `beforeAll`/`afterAll` to restore the original `HOME` after tests complete.

♻️ Suggested improvement

+const originalHome = process.env.HOME;
 const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), "nemoclaw-add-gpu-agent-test-"));
 process.env.HOME = tmpDir;
+
+// Restore HOME after all tests (optional cleanup)
+afterAll(() => {
+  process.env.HOME = originalHome;
+  fs.rmSync(tmpDir, { recursive: true, force: true });
+});
Verify each finding against the current code and only fix it if needed. In `@test/sandbox-add-gpu-agent.test.js` around lines 10 - 12, The test sets process.env.HOME at module scope using fs.mkdtempSync into tmpDir which mutates global state for other tests; change this to set and restore HOME inside test lifecycle hooks by creating tmpDir in a beforeAll (or beforeEach) and assigning process.env.HOME there, and restore the original HOME in an afterAll (or afterEach); reference the tmpDir variable, the fs.mkdtempSync call, and process.env.HOME so you move creation into beforeAll and restore originalHome in afterAll to avoid leaking the environment change.
235-247: Dead code: the first `runCapture` assignment is immediately overwritten.

Same issue as above — line 236 assigns a mock that is immediately overwritten at line 239.

♻️ Remove the dead assignment

 it("skips import when image is already in k3s", async () => {
-  runner.runCapture = makeDefaultCaptureMock(); // returns "1" for images ls
-
-  // Sandbox list: only cortana (so jarvis does not yet exist)
+  // Sandbox list: only cortana (so jarvis does not yet exist)
+  // Returns "1" for images ls (image already present)
   runner.runCapture = vi.fn().mockImplementation((cmd) => {
Verify each finding against the current code and only fix it if needed. In `@test/sandbox-add-gpu-agent.test.js` around lines 235 - 247, Remove the dead assignment to runner.runCapture by deleting the initial call to makeDefaultCaptureMock() so the test only sets runner.runCapture once via vi.fn().mockImplementation; update the test "skips import when image is already in k3s" to rely solely on the mockImplementation block (referencing runner.runCapture and makeDefaultCaptureMock and vi.fn().mockImplementation) to avoid the overwritten, unnecessary assignment.scripts/resume.sh (1)
100-107: Consider using `grep -F` for fixed-string matching.

Although the RFC 1123 validation restricts sandbox names to `[a-z0-9-]`, using `-F` (fixed string) with `grep` is more robust and explicit about intent.

♻️ Example fix at line 104

-  if echo "$sandbox_list" | grep -q "$SANDBOX_NAME" && echo "$sandbox_list" | grep "$SANDBOX_NAME" | grep -q "Ready"; then
+  if echo "$sandbox_list" | grep -qF "$SANDBOX_NAME" && echo "$sandbox_list" | grep -F "$SANDBOX_NAME" | grep -q "Ready"; then

Also applies to: 169-177
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/resume.sh` around lines 100 - 107, The grep checks that search the sandbox name in resume.sh should use fixed-string matching to avoid accidental regex interpretation: replace the two occurrences where sandbox_list is piped into grep "$SANDBOX_NAME" (the pair used to detect presence and to check for "Ready") with grep -F "$SANDBOX_NAME" (and the second pipeline where you also check for "Ready" keep the grep -F for the sandbox name and leave the "Ready" grep unchanged); update both the block around the current_status check (variables current_status, sandbox_list, sandbox_reachable) and the similar checks at lines ~169-177 to use grep -F for the sandbox name.scripts/docker-proxy.js (1)
380-384: Binding to `0.0.0.0` exposes the proxy to all network interfaces.

While this is needed for sandbox access via `host.openshell.internal`, it also exposes the proxy to any other network interface on the host (e.g., a public IP). Consider documenting this risk or binding more restrictively. For example, add a comment explaining why `0.0.0.0` is necessary, or provide an environment variable to restrict the bind address:

const BIND_ADDR = process.env.NEMOCLAW_DOCKER_PROXY_BIND || "0.0.0.0";
server.listen(PROXY_PORT, BIND_ADDR, () => { ... });
Verify each finding against the current code and only fix it if needed. In `@scripts/docker-proxy.js` around lines 380 - 384, The current server.listen call binds to 0.0.0.0 (server.listen(PROXY_PORT, "0.0.0.0", ...)), which exposes the proxy on all host interfaces; update the binding to use a configurable address and document why 0.0.0.0 is used for sandbox access: introduce an environment-driven BIND_ADDR (e.g., NEMOCLAW_DOCKER_PROXY_BIND) defaulting to "0.0.0.0", replace the literal in server.listen(PROXY_PORT, BIND_ADDR, ...), and add an inline comment explaining that 0.0.0.0 is required for host.openshell.internal sandbox access but can be restricted in production via the env var to limit exposure.extensions/claw3d/index.ts (1)
36-44: Inconsistent error handling compared to PUT/POST helpers.

`claw3dGet` throws an error without including the response body, while `claw3dPut` and `claw3dPost` both include the response text in errors. Additionally, `res.json()` will throw if the response is empty or not valid JSON.

♻️ Suggested improvement for consistency and robustness

 async function claw3dGet(baseUrl: string, path: string, params?: Record<string, string>): Promise<unknown> {
   const url = new URL(`${baseUrl}${path}`);
   if (params) {
     for (const [k, v] of Object.entries(params)) url.searchParams.set(k, v);
   }
   const res = await fetch(url.toString(), { headers: { Accept: "application/json" } });
-  if (!res.ok) throw new Error(`Claw3D GET ${path} failed: ${res.status} ${res.statusText}`);
-  return res.json();
+  if (!res.ok) {
+    const text = await res.text().catch(() => "");
+    throw new Error(`Claw3D GET ${path} failed: ${res.status} ${res.statusText}${text ? " — " + text : ""}`);
+  }
+  const text = await res.text();
+  return text ? JSON.parse(text) : {};
 }
Verify each finding against the current code and only fix it if needed. In `@extensions/claw3d/index.ts` around lines 36 - 44, claw3dGet currently throws errors without the response body and directly calls res.json() which can throw on empty/invalid JSON; update claw3dGet to mirror claw3dPut/claw3dPost by reading the response text when !res.ok and including that text in the thrown Error, and parse JSON defensively (try JSON.parse on the response text and fall back to returning the raw text or undefined if empty) so callers don't crash on empty or non-JSON responses; locate the logic in function claw3dGet to implement these changes.scripts/setup-autostart.sh (1)
131-140: Path handling may break if `NEMOCLAW_ROOT` contains spaces.

When `NEMOCLAW_BIN` is set to `node ${NEMOCLAW_ROOT}/bin/nemoclaw.js` (line 134) and later used unquoted in the generated script (line 160: `${NEMOCLAW_CMD} "$@"`), paths with spaces will cause issues. The `# shellcheck disable=SC2139` acknowledges this but doesn't fix it.

♻️ Suggested fix using an array or proper quoting

 # Final fallback: run directly from the repo via node
 if [[ -f "${NEMOCLAW_ROOT}/bin/nemoclaw.js" ]]; then
-  NEMOCLAW_BIN="node ${NEMOCLAW_ROOT}/bin/nemoclaw.js"
+  NEMOCLAW_BIN="node '${NEMOCLAW_ROOT}/bin/nemoclaw.js'"
   info "nemoclaw not linked globally — using repo binary directly"
 else

Then in the generated script, use `eval`:

nemoclaw_run() {
  eval ${NEMOCLAW_CMD} "\$@"
}
Verify each finding against the current code and only fix it if needed. In `@scripts/setup-autostart.sh` around lines 131 - 140, The generated fallback sets NEMOCLAW_BIN to an unquoted string "node ${NEMOCLAW_ROOT}/bin/nemoclaw.js" which will break if NEMOCLAW_ROOT contains spaces; update the assignment and invocation so the command is safely preserved—either store the command as a properly quoted string or an array and change the runtime invocation (where NEMOCLAW_CMD is used) to evaluate it safely (e.g., use eval with quoted positional parameters or invoke the array form), ensuring references to NEMOCLAW_BIN / NEMOCLAW_CMD and the later call `${NEMOCLAW_CMD} "$@"` are modified accordingly to handle spaces in paths.bin/lib/sandbox-add-gpu-agent.js (2)
22-29: Duplicate `DASHBOARD_PORT` constant creates drift risk.

`DASHBOARD_PORT` is defined here and also in `sandbox-resume.js` (line 16). Per context snippet 2, `bin/nemoclaw.js` imports it only from `sandbox-resume.js`. Consider importing from a shared location to avoid potential drift.

♻️ Suggested approach

Either import from `sandbox-resume.js`:

+const { DASHBOARD_PORT } = require("./sandbox-resume");
+
 const IMAGE_NAME = "nemoclaw-sandbox-ai";
 const IMAGE_TAG = "v3";
 const IMAGE_REF = `${IMAGE_NAME}:${IMAGE_TAG}`;
-const DASHBOARD_PORT = 18789;
 const MAX_READY_WAIT_S = 60;

Or create a shared constants file that both modules import from.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/sandbox-add-gpu-agent.js` around lines 22 - 29, DASHBOARD_PORT is duplicated in sandbox-add-gpu-agent.js and sandbox-resume.js; remove the local constant from sandbox-add-gpu-agent.js and centralize the value by exporting DASHBOARD_PORT from a single source (either export it from sandbox-resume.js and import it where needed, or create a new shared constants module and import from that) so both sandbox-add-gpu-agent.js and any other consumers (e.g., bin/nemoclaw.js) read the same exported DASHBOARD_PORT; update the imports and remove the redundant declaration in the file that previously declared it.
31-34: Platform-specific sleep implementation.

Using `spawnSync("sleep", [...])` will not work on Windows. Consider using a cross-platform approach.

♻️ Cross-platform alternative

 /** `@internal` — exposed for testing */
 function _sleep(ms) {
-  spawnSync("sleep", [String(ms / 1000)]);
+  const end = Date.now() + ms;
+  while (Date.now() < end) {
+    // busy-wait (acceptable for short CLI pauses)
+  }
 }

Or use `Atomics.wait` if available:

function _sleep(ms) {
  Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
}
Verify each finding against the current code and only fix it if needed. In `@bin/lib/sandbox-add-gpu-agent.js` around lines 31 - 34, The _sleep function uses spawnSync("sleep", ...) which fails on Windows; replace it with a cross-platform implementation: detect if Atomics.wait is available and use it for a blocking sleep, otherwise branch on process.platform—use the POSIX sleep command on non-win32 and use the Windows timeout command (e.g. "timeout /T ... /NOBREAK") on win32—so update the _sleep function to pick the correct mechanism at runtime.test/sandbox-init.test.js (1)
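A runnable version of the `Atomics.wait` alternative mentioned above (a standalone sketch, not the `_sleep` from `sandbox-add-gpu-agent.js`):

```javascript
// Blocking, cross-platform sleep: Atomics.wait times out after `ms` because
// the value at index 0 never changes from 0. Node permits this on the main
// thread; browsers do not.
function sleepMs(ms) {
  Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
}

const start = Date.now();
sleepMs(50);
console.log(Date.now() - start >= 45); // true (allowing a little timer slop)
```

Unlike the busy-wait variant, this yields the CPU to the OS timer while blocked.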
10-22: Consider cleaning up the temp directory after tests complete.

The temp directory created at line 12 is never removed. While OS temp directories are eventually cleaned, explicitly removing it in an `afterAll` hook would be cleaner and prevent accumulation during repeated test runs.

Proposed addition

+import { afterAll } from "vitest";
 ...
+afterAll(() => {
+  fs.rmSync(tmpDir, { recursive: true, force: true });
+});
Verify each finding against the current code and only fix it if needed. In `@test/sandbox-init.test.js` around lines 10 - 22, Add cleanup for the temporary HOME created by tmpDir (from fs.mkdtempSync) by adding an afterAll test hook that removes the directory and restores environment state; specifically, in the test file add an afterAll that calls fs.rmSync(tmpDir, { recursive: true, force: true }) (or fs.rmdirSync for older Node) and optionally resets process.env.HOME to its original value saved before assignment, referencing the tmpDir variable and the code block that sets process.env.HOME.extensions/docker-proxy/docker-client.ts (2)
173-187: Pull errors may be silently ignored.

Docker image pull errors appear as `{"error": "..."}` in the stream. The current implementation only returns `last?.status`, potentially masking pull failures. Consider checking for an error field.

Suggested improvement

 const lines = raw.trim().split("\n");
 const statuses = lines.map((l) => { try { return JSON.parse(l); } catch { return null; } }).filter(Boolean);
 const last = statuses[statuses.length - 1];
+if (last?.error) {
+  throw new Error(`Image pull failed: ${last.error}`);
+}
 return last?.status ?? "Pull complete";
Verify each finding against the current code and only fix it if needed. In `@extensions/docker-proxy/docker-client.ts` around lines 173 - 187, The imagePull function currently parses the stream into statuses but only returns last?.status, which can hide errors because Docker streams error objects like {"error":"..."}; update imagePull (and its parsing of raw/lines/statuses/last) to inspect parsed status objects for an error or Error field (check any status with error before taking last), and if found either throw a descriptive Error or return the error message instead of a success status; ensure JSON.parse failures are still handled and include the error text from the status object in the thrown/returned value so pull failures are not silently ignored.
98-103: Inconsistent status code handling.

The `noContent` method only rejects statuses `>= 400`, so it accepts 3xx redirects, while `json` and `text` reject anything `>= 300`. This inconsistency could mask unexpected redirect responses.

Suggested fix for consistency

 private async noContent(opts: RequestOptions): Promise<void> {
   const { status, body } = await this.request(opts);
-  if (status < 200 || status >= 400) {
+  if (status < 200 || status >= 300) {
     throw new Error(`Docker API ${opts.method} ${opts.path} returned ${status}: ${body.slice(0, 200)}`);
   }
 }
Verify each finding against the current code and only fix it if needed. In `@extensions/docker-proxy/docker-client.ts` around lines 98 - 103, The noContent method currently allows 3xx redirects while json and text reject any status >=300; update noContent (method noContent and its use of RequestOptions) to treat any status <200 or >=300 as error to match json/text behavior by changing the condition from "status < 200 || status >= 400" to "status < 200 || status >= 300" and keep the same error construction (`Docker API ${opts.method} ${opts.path} returned ${status}: ${body.slice(0, 200)`) so redirect responses are not silently accepted.bin/lib/sandbox-resume.js (1)
39-48: Substring match may cause false positives.

`sbListOut.includes(sandboxName)` could incorrectly match if one sandbox name is a substring of another (e.g., "foo" matches "foobar"). Consider using a regex with word boundaries or parsing the structured output.

Suggested improvement

-  if (!sbListOut.includes(sandboxName)) {
+  // Use word boundary to avoid substring false positives
+  const namePattern = new RegExp(`\\b${sandboxName.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')}\\b`);
+  if (!namePattern.test(sbListOut)) {
Verify each finding against the current code and only fix it if needed. In `@bin/lib/sandbox-resume.js` around lines 39 - 48, The current check uses sbListOut.includes(sandboxName) which can false-positive on substrings; update the validation after runner.runCapture("openshell sandbox list...") to detect exact sandbox names by splitting sbListOut into lines (e.g., sbListOut.split(/\r?\n/).map(l => l.trim())) and checking for an exact equality match against sandboxName, or use a regex with anchors/word-boundaries (e.g., new RegExp(`^${escapeRegExp(sandboxName)}$`, 'm')) to ensure whole-name matches; adjust the error branch to use the new boolean (e.g., foundSandbox) and keep the existing messaging and return shape.scripts/add-gpu-agent.sh (3)
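A quick standalone check of the word-boundary approach suggested above (`sandboxListed` is a hypothetical helper, not the `sandbox-resume.js` code):

```javascript
// Demonstrates why a bare includes() check false-positives on substrings,
// and how an escaped word-boundary regex avoids it.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function sandboxListed(listOutput, name) {
  // \b rejects matches that continue into another word character.
  const pattern = new RegExp(`\\b${escapeRegExp(name)}\\b`);
  return pattern.test(listOutput);
}

const out = "foobar   Ready   2m";
console.log(out.includes("foo"));          // true: substring false positive
console.log(sandboxListed(out, "foo"));    // false: word boundary rejects it
console.log(sandboxListed(out, "foobar")); // true
```

One caveat: since sandbox names may contain hyphens and `-` is a non-word character, `\b` still lets "foo" match inside "foo-bar"; parsing the list line by line and comparing the name column exactly is stricter.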
330-365: Shell variable injection into Python code is fragile.

The pattern `'"${AGENT_NAME}"'` embeds shell variables directly into Python source. If `AGENT_NAME` contains quotes or special characters, the Python syntax breaks. Earlier validation (line 87) helps, but consider passing values via environment variables for safety.

Safer approach using environment variables

-docker exec -i "${GATEWAY_CONTAINER}" kubectl exec -n openshell "${PARENT_NAME}" -- python3 -c '
+docker exec -i "${GATEWAY_CONTAINER}" kubectl exec -n openshell "${PARENT_NAME}" -- \
+  env AGENT_NAME="${AGENT_NAME}" PARENT_NAME="${PARENT_NAME}" python3 -c '
 import json, os, stat
 config_path = "/sandbox/.openclaw/openclaw.json"
 config = json.load(open(config_path))
+agent_name = os.environ["AGENT_NAME"]
+parent_name = os.environ["PARENT_NAME"]
 if "agents" not in config:
     config["agents"] = {"defaults": {}, "list": []}
 ...
-existing = [a for a in agent_list if a.get("id") == "'"${AGENT_NAME}"'"]
+existing = [a for a in agent_list if a.get("id") == agent_name]
Verify each finding against the current code and only fix it if needed. In `@scripts/add-gpu-agent.sh` around lines 330 - 365, The Python snippet in scripts/add-gpu-agent.sh injects shell variables directly into source (e.g., '"${AGENT_NAME}"' and '"${PARENT_NAME}"'), which breaks on special chars; instead export/pass AGENT_NAME and PARENT_NAME as environment variables into the docker/kubectl exec call and read them inside the Python block via os.environ (use os.environ.get("AGENT_NAME") / os.environ.get("PARENT_NAME")), then use those values when building/ searching agent_list and parent; keep using config_path, parent, and agent_list identifiers but remove all inline '"${...}"' substitutions so the Python source is static and robust.
107-108: GNU grep `-P` flag limits portability.

`grep -oP` uses Perl-compatible regex, which isn't available in BSD/macOS grep. Since this script targets GPU setups (Linux), this is likely acceptable, but consider using `sed` or `awk` for broader compatibility.

Portable alternative using sed

-GATEWAY_NAME=$(openshell gateway info 2>&1 | grep -oP 'Gateway:\s+\K\S+' || echo "nemoclaw")
+GATEWAY_NAME=$(openshell gateway info 2>&1 | sed -n 's/.*Gateway:[[:space:]]*\([^[:space:]]*\).*/\1/p' | head -1)
+[[ -n "$GATEWAY_NAME" ]] || GATEWAY_NAME="nemoclaw"
Verify each finding against the current code and only fix it if needed. In `@scripts/add-gpu-agent.sh` around lines 107 - 108, The use of grep -oP to extract the gateway name is not portable; replace the grep -oP invocation used to set GATEWAY_NAME (the command invoking "openshell gateway info") with a POSIX-friendly text extraction such as awk or sed that matches the "Gateway:" label and returns the following non-space token (so GATEWAY_NAME remains the gateway identifier). Update the line that defines GATEWAY_NAME to pipe the openshell output into the chosen awk/sed expression and keep the fallback echo "nemoclaw" intact.
372-392: `datetime.utcnow()` is deprecated in Python 3.12+.

Line 382 uses `datetime.datetime.utcnow()`, which is deprecated. For forward compatibility, use `datetime.datetime.now(datetime.UTC)`.

Suggested fix

-    'createdAt': datetime.datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%S.000Z'),
+    'createdAt': datetime.datetime.now(datetime.UTC).strftime('%Y-%m-%dT%H:%M:%S.000Z'),
Verify each finding against the current code and only fix it if needed. In `@scripts/add-gpu-agent.sh` around lines 372 - 392, The Python snippet in scripts/add-gpu-agent.sh uses datetime.datetime.utcnow(), which is deprecated; update the imports and call to use an explicit UTC-aware timestamp (e.g., use datetime.datetime.now(datetime.timezone.utc) or from datetime import datetime, timezone and call datetime.now(timezone.utc)) and keep the existing .strftime('%Y-%m-%dT%H:%M:%S.000Z') formatting; specifically modify the import line and replace datetime.datetime.utcnow() in the block that constructs 'createdAt' (the dict assigned to d['sandboxes']['${AGENT_NAME}']) so the timestamp is timezone-aware and compatible with Python 3.12+.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8f9deee4-3e2f-44cf-bd59-53684922ab8f
📒 Files selected for processing (36)
- .agents/skills/docs/nemoclaw-mount-filesystem/SKILL.md
- .agents/skills/nemoclaw-docker-proxy/SKILL.md
- .agents/skills/nemoclaw-gpu-sandbox/SKILL.md
- Dockerfile
- Dockerfile.base
- Dockerfile.sandbox-ai
- bin/lib/nim.js
- bin/lib/onboard.js
- bin/lib/sandbox-add-gpu-agent.js
- bin/lib/sandbox-init.js
- bin/lib/sandbox-resume.js
- bin/nemoclaw.js
- docs/deployment/run-multiple-gpu-agents.md
- docs/index.md
- extensions/claw3d/index.ts
- extensions/claw3d/openclaw.plugin.json
- extensions/docker-proxy/docker-client.ts
- extensions/docker-proxy/index.ts
- extensions/docker-proxy/openclaw.plugin.json
- nemoclaw-blueprint/policies/presets/claw3d.yaml
- nemoclaw-blueprint/policies/presets/docker-proxy.yaml
- scripts/add-gpu-agent.sh
- scripts/backup-workspace.sh
- scripts/docker-proxy.js
- scripts/mount-sandbox.sh
- scripts/nemoclaw-start.sh
- scripts/post-onboard-gpu.sh
- scripts/resume-all-sandboxes.sh
- scripts/resume.sh
- scripts/setup-autostart.sh
- scripts/start-services.sh
- test/cli.test.js
- test/nim.test.js
- test/sandbox-add-gpu-agent.test.js
- test/sandbox-init.test.js
- test/sandbox-resume.test.js
Force-pushed from f72f57f to 07a8307
✨ Thanks for submitting this PR with a detailed summary. It addresses a bug with GPU detection on specific hardware platforms and proposes a fix to improve NemoClaw's compatibility with these devices.
detectGpu() queries nvidia-smi for VRAM, which returns [N/A] on unified-memory architectures. The existing fallback matched GPU names containing "GB10" (DGX Spark) but missed Jetson AGX Thor and Orin, leaving those devices undetected. Broaden the name check to ["GB10", "Thor", "Orin"].

Align nim.js with the runner.runCapture() indirection already used in sandbox-add-gpu-agent.js to enable mocked test coverage of the fallback path. Five new tests exercise each device tag, a desktop GPU negative case, and the standard VRAM-queryable early return.

Fixes NVIDIA#300
Force-pushed from d3f80af to f49bd0c
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/nim.test.js`:
- Around line 138-150: The test currently uses an if (gpu) guard which can hide
failures; instead assert the GPU was detected and that unified-memory fallback
was not taken: first assert gpu is defined (or truthy) after calling
nim.detectGpu(), then assert runner.runCapture was never called with a command
containing "free -m" (to ensure no fallback probe occurred), and finally assert
gpu.spark is undefined to confirm no unified-memory tagging; reference
nim.detectGpu and runner.runCapture in your changes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e709151a-2e91-451a-981e-c650e5617499
📒 Files selected for processing (2)
- bin/lib/nim.js
- test/nim.test.js
🚧 Files skipped from review as they are similar to previous changes (1)
- bin/lib/nim.js
it("does not trigger fallback for desktop GPU names", () => {
  // Desktop GPUs return valid VRAM, but even if VRAM were [N/A],
  // the name must not match any unified-memory tag.
  runner.runCapture = vi.fn().mockImplementation((cmd) => {
    if (cmd.includes("--query-gpu=memory.total")) return "[N/A]";
    if (cmd.includes("--query-gpu=name")) return "NVIDIA GeForce RTX 4090";
    return "";
  });
  const gpu = nim.detectGpu();
  // Should NOT match as a unified-memory device
  if (gpu) {
    expect(gpu.spark).toBeUndefined();
  }
Tighten the negative-case assertion to avoid false positives.
The if (gpu) guard allows this test to pass even when it validates nothing. Assert that unified-memory fallback was not taken by checking command behavior (no free -m call), and keep the spark assertion.
Suggested test hardening
const gpu = nim.detectGpu();
- // Should NOT match as a unified-memory device
- if (gpu) {
- expect(gpu.spark).toBeUndefined();
- }
+ const commands = runner.runCapture.mock.calls.map(([cmd]) => cmd);
+ expect(commands.some((cmd) => cmd.includes("--query-gpu=name"))).toBe(true);
+ expect(commands.some((cmd) => cmd.includes("free -m"))).toBe(false);
+ expect(gpu?.spark).toBeUndefined();

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
it("does not trigger fallback for desktop GPU names", () => {
  // Desktop GPUs return valid VRAM, but even if VRAM were [N/A],
  // the name must not match any unified-memory tag.
  runner.runCapture = vi.fn().mockImplementation((cmd) => {
    if (cmd.includes("--query-gpu=memory.total")) return "[N/A]";
    if (cmd.includes("--query-gpu=name")) return "NVIDIA GeForce RTX 4090";
    return "";
  });
  const gpu = nim.detectGpu();
  const commands = runner.runCapture.mock.calls.map(([cmd]) => cmd);
  expect(commands.some((cmd) => cmd.includes("--query-gpu=name"))).toBe(true);
  expect(commands.some((cmd) => cmd.includes("free -m"))).toBe(false);
  expect(gpu?.spark).toBeUndefined();
});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/nim.test.js` around lines 138 - 150, The test currently uses an if (gpu)
guard which can hide failures; instead assert the GPU was detected and that
unified-memory fallback was not taken: first assert gpu is defined (or truthy)
after calling nim.detectGpu(), then assert runner.runCapture was never called
with a command containing "free -m" (to ensure no fallback probe occurred), and
finally assert gpu.spark is undefined to confirm no unified-memory tagging;
reference nim.detectGpu and runner.runCapture in your changes.
## Summary

- stop requiring `NVIDIA_API_KEY` for local-only `nemoclaw start` and only gate the Telegram bridge when that bridge actually needs the key
- clean up the dashboard forward, `nemoclaw` gateway, and `openshell-cluster-nemoclaw` Docker volumes when the last sandbox is destroyed
- broaden unified-memory NVIDIA GPU detection beyond `GB10` while keeping `spark: true` specific to GB10
- harden policy merge/retry behavior so truncated or error-like current-policy reads rebuild from a clean `version: 1` scaffold instead of producing malformed YAML

## Issue Mapping

Fixes #1191
Fixes #1160
Fixes #1182
Fixes #1162
Related #991

## Notes

- `#1188` was investigated but is not included in this PR.
- The current evidence still points to a deeper runtime / proxy reachability problem on macOS + Colima rather than a bounded NemoClaw-only fix.
- Keeping it out of this branch avoids speculative networking changes without strong reproduction and cross-platform coverage.

## Validation

```bash
npx vitest run
npx eslint bin/nemoclaw.js bin/lib/nim.js bin/lib/policies.js test/cli.test.js test/nim.test.js test/policies.test.js test/service-env.test.js
npx tsc -p jsconfig.json --noEmit
```

## References Reviewed

- #1106
- #308
- #95
- #770

Signed-off-by: Kevin Jones <kejones@nvidia.com>

## Summary by CodeRabbit

* **New Features**
  - Core services can start without an NVIDIA API key.
  - Enhanced unified‑memory GPU detection with more accurate capability reporting.
* **Bug Fixes**
  - Gateway and forwarded‑port cleanup only runs when the last sandbox is removed and no live sandboxes remain.
  - Telegram bridge now starts only when both required tokens are present; clearer startup warnings.
  - Policy parsing/merge more robust for metadata‑only or malformed inputs; consistent version header formatting.
* **Tests**
  - Added tests covering GPU detection, policy parsing/merge, CLI sandbox/gateway flows, and service startup.
Thanks for working on this! This was superseded by #1271, which migrated
Problem
`detectGpu()` fails to recognize NVIDIA Jetson AGX Thor and Jetson AGX Orin as valid GPU devices, returning `null` and blocking sandbox creation on supported hardware.

Root Cause
On unified-memory architectures, `nvidia-smi --query-gpu=memory.total` returns `[N/A]` rather than a numeric value. The VRAM query path parses this as `NaN`, filters it out, and falls through to a name-based fallback. That fallback only matched `"GB10"` (DGX Spark), so Thor and Orin were silently dropped.

Fix
Broaden the unified-memory fallback to match `["GB10", "Thor", "Orin"]` via a declarative tag list. No changes to detection semantics — the same fallback path executes, reading system RAM as the VRAM estimate.

The `runCapture` call site was aligned with the `runner.runCapture()` indirection already established in `sandbox-add-gpu-agent.js`, enabling direct unit test coverage of the fallback without introducing a new mocking pattern.

Validation
Fixes #300
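For illustration, the broadened fallback described above can be sketched like this (hypothetical structure and helper names; the real `detectGpu()` in `bin/lib/nim.js` differs in detail):

```javascript
// Sketch of the detection flow. runCapture is injected so the nvidia-smi and
// free calls can be mocked, mirroring the runner.runCapture() indirection.
const UNIFIED_MEMORY_TAGS = ["GB10", "Thor", "Orin"];

function detectGpuSketch(runCapture) {
  const vram = runCapture("nvidia-smi --query-gpu=memory.total --format=csv,noheader");
  const vramMb = parseInt(vram, 10);
  if (!Number.isNaN(vramMb)) return { vramMb }; // VRAM-queryable GPU: early return

  // Unified memory: nvidia-smi reports "[N/A]", so fall back to the name check.
  const name = runCapture("nvidia-smi --query-gpu=name --format=csv,noheader");
  if (!UNIFIED_MEMORY_TAGS.some((tag) => name.includes(tag))) return null;

  // Estimate VRAM from system RAM (second column of the "Mem:" row of free -m).
  const memRow = (runCapture("free -m").split("\n")[1] || "").trim().split(/\s+/);
  return { vramMb: parseInt(memRow[1], 10) || 0, spark: name.includes("GB10") || undefined };
}
```

A desktop GPU with unreadable VRAM still returns `null` here, because its name matches none of the tags; only GB10 carries the `spark` flag.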
Summary by CodeRabbit
New Features
Bug Fixes
Tests