## Summary
Three distinct issues in the NVIDIA Endpoints (`nvidia-prod`) onboarding probe path make custom model selection from the public NVIDIA Build catalog look broken even when the model would otherwise work.

The pre-listed Nemotron models (the default NemoClaw onboarding path) work fine today. These bugs only bite users who pick a custom model ID from https://integrate.api.nvidia.com/v1/models. Bugs 2 and 3 are the actual user-impact issues; Bug 1 is cosmetic noise that only becomes visible on already-failing models.

Reported by @kaviin27 in the NVIDIA Developer Discord.
## Reporter's symptoms

Setting the model to `mistralai/mistral-medium-3-instruct` (or other IDs from https://integrate.api.nvidia.com/v1/models):

- Responses API with tool calling: `HTTP 404: 404 page not found`
- Chat Completions API: `HTTP 404: Function 'ZZZZ': Not found for account 'XXXX'`

Pre-listed Nemotron 120B models work fine.
## Live reproduction (against https://integrate.api.nvidia.com/v1 with a free NVIDIA Build key)

| Probe | Result | Notes |
| --- | --- | --- |
| `GET /v1/models/mistralai/mistral-medium-3-instruct` | 200 | model exists in catalog |
| `POST /v1/chat/completions` (mistral-medium-3-instruct) | timeout >60s, no bytes | NVIDIA backend hangs |
| `POST /v1/chat/completions` (mistral-large) | 404 `Function 'XXX': Not found for account 'YYY'` | NVCF deployment missing for account |
| `POST /v1/chat/completions` (mistral-7b-instruct-v0.3) | 200 in 0.64s | works |
| `POST /v1/chat/completions` (mistral-small-3.1-24b-instruct-2503) | 200 in 0.55s | works |
| `POST /v1/chat/completions` (nvidia/llama-3.3-nemotron-super-49b-v1) | 200 in 0.67s | control |
| `POST /v1/responses` (any model, including bogus IDs) | 404 `page not found` | endpoint does not exist on NVIDIA Build at all |
| `POST /v1/chat/completions` (bogus/does-not-exist) | 404 `page not found` | distinct from the NVCF "Function not found" string |
## Status
All three bugs are fixed in PR #1602.
## Bug 1 (cosmetic): Responses API probe fires on every NVIDIA Build model and always returns 404
- Code: `bin/lib/onboard.js:997-1001` — `shouldRequireResponsesToolCalling()` returns `true` for `nvidia-prod`.
- NVIDIA Build does not expose `/v1/responses` for any model. Confirmed by sending the probe to bogus model IDs and even to `nvidia/llama-3.3-nemotron-super-49b-v1` — same `404 page not found` every time.
- The probe loop in `bin/lib/onboard.js:1101-1113` silently falls through to the Chat Completions probe on failure, so working models like Nemotron still onboard successfully — the 404 is only user-visible when chat completions also fails. That's why @kaviin27's error message had both halves.
- Severity: cosmetic noise. Skipping a known-404 probe removes the misleading first half of failure messages and makes the real error easier to read.
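A minimal sketch of the skip. The names `PROVIDERS_WITHOUT_RESPONSES_API` and `buildProbePlan` are hypothetical illustrations, not the actual `onboard.js` internals:

```javascript
// Hypothetical sketch: providers whose backends are known to 404 on
// POST /v1/responses, so onboarding should not probe that endpoint.
const PROVIDERS_WITHOUT_RESPONSES_API = new Set(["nvidia-prod"]);

// Decide which validation probes to run for a given provider.
// Skipping the known-404 Responses probe keeps "404 page not found"
// out of the combined failure message when chat completions also fails.
function buildProbePlan(providerId) {
  const probes = [];
  if (!PROVIDERS_WITHOUT_RESPONSES_API.has(providerId)) {
    probes.push({ name: "responses-tool-calling", path: "/v1/responses" });
  }
  probes.push({ name: "chat-completions", path: "/v1/chat/completions" });
  return probes;
}
```

With this shape, `nvidia-prod` goes straight to the Chat Completions probe while other providers keep the two-step plan.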
## Bug 2: NVCF "Function not found for account" is not detected and the user has no actionable message
- Code: `bin/lib/onboard.js:1115-1119` — `probeOpenAiLikeEndpoint()` joins all probe failures into one raw string.
- Many catalog models (e.g. `mistral-large`) return `404 Function '<uuid>': Not found for account '<account-id>'`. This is NVCF's way of saying "the model is in the catalog but not deployed for your account/org".
- The raw NVCF error gets surfaced with no explanation, so users assume NemoClaw is broken.
- Suggested fix: detect the `Function .*Not found for account` pattern in chat-completions probe failures and reframe with a clear message like: "Model `<id>` is in the NVIDIA Build catalog but is not deployed for your account. Pick a different model, or check the model card on https://build.nvidia.com to see if it requires org-level access."
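The suggested detection could look like the sketch below. `NVCF_NOT_DEPLOYED` and `explainProbeFailure` are hypothetical names for illustration, not existing `onboard.js` functions:

```javascript
// Hypothetical sketch: recognize NVCF's
// "Function '<uuid>': Not found for account '<account-id>'" 404 body
// and reframe it as an actionable onboarding message.
const NVCF_NOT_DEPLOYED = /Function '[^']+': Not found for account '[^']+'/;

function explainProbeFailure(modelId, rawError) {
  if (NVCF_NOT_DEPLOYED.test(rawError)) {
    return (
      `Model ${modelId} is in the NVIDIA Build catalog but is not deployed ` +
      `for your account. Pick a different model, or check the model card on ` +
      `https://build.nvidia.com to see if it requires org-level access.`
    );
  }
  return rawError; // any other failure surfaces unchanged
}
```

Keeping the fallback as the raw error preserves today's behavior for every failure mode the pattern doesn't match.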
## Bug 3: Probes have no per-request timeout, so a hung NVIDIA backend hangs onboard forever
- Confirmed: chat completions for `mistralai/mistral-medium-3-instruct` consistently hangs 30+ seconds with zero bytes returned (3 attempts, all timed out).
- `runCurlProbe`'s curl invocation needs a `--max-time` so a misbehaving model doesn't lock up the wizard. Recommend ~15s for the validation probe.
## Repro commands (anyone can run with their own key)
```sh
KEY="<your nvapi-... key>"

echo "=== broken model that hangs ==="
curl -sS -m 30 -w "\nHTTP %{http_code}\n" \
  https://integrate.api.nvidia.com/v1/chat/completions \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"mistralai/mistral-medium-3-instruct","messages":[{"role":"user","content":"hi"}],"max_tokens":1}'

echo "=== broken model that NVCF-404s ==="
curl -sS -m 30 -w "\nHTTP %{http_code}\n" \
  https://integrate.api.nvidia.com/v1/chat/completions \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"mistralai/mistral-large","messages":[{"role":"user","content":"hi"}],"max_tokens":1}'

echo "=== responses API 404 (all models) ==="
curl -sS -m 15 -w "\nHTTP %{http_code}\n" \
  https://integrate.api.nvidia.com/v1/responses \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"nvidia/llama-3.3-nemotron-super-49b-v1","input":"hi"}'
```