## Summary
Three distinct issues in the NVIDIA Endpoints (`nvidia-prod`) onboarding probe path make custom model selection from the public NVIDIA Build catalog look broken even when the model would otherwise work.

The pre-listed Nemotron models (the default NemoClaw onboarding path) work fine today. These bugs only bite users who pick a custom model ID from https://integrate.api.nvidia.com/v1/models. Bugs 2 and 3 are the actual user-impact issues; Bug 1 is cosmetic noise that only becomes visible on already-failing models.

Reported by @kaviin27 in the NVIDIA Developer Discord.
## Reporter's symptoms

Setting the model to `mistralai/mistral-medium-3-instruct` (or other IDs from https://integrate.api.nvidia.com/v1/models):

- Responses API with tool calling: `HTTP 404: 404 page not found`
- Chat Completions API: `HTTP 404: Function 'ZZZZ': Not found for account 'XXXX'`

Pre-listed Nemotron 120B models work fine.
## Live reproduction (against https://integrate.api.nvidia.com/v1 with a free NVIDIA Build key)

| Probe | Result | Notes |
| --- | --- | --- |
| `GET /v1/models/mistralai/mistral-medium-3-instruct` | 200 | model exists in catalog |
| `POST /v1/chat/completions` (mistral-medium-3-instruct) | timeout >60s, no bytes | NVIDIA backend hangs |
| `POST /v1/chat/completions` (mistral-large) | 404 `Function 'XXX': Not found for account 'YYY'` | NVCF deployment missing for account |
| `POST /v1/chat/completions` (mistral-7b-instruct-v0.3) | 200 in 0.64s | works |
| `POST /v1/chat/completions` (mistral-small-3.1-24b-instruct-2503) | 200 in 0.55s | works |
| `POST /v1/chat/completions` (nvidia/llama-3.3-nemotron-super-49b-v1) | 200 in 0.67s | control |
| `POST /v1/responses` (any model, including bogus IDs) | 404 `page not found` | endpoint does not exist on NVIDIA Build at all |
| `POST /v1/chat/completions` (bogus/does-not-exist) | 404 `page not found` | distinct from the NVCF "Function not found" string |
## Status
All three bugs are fixed in PR #1602.
## Bug 1 (cosmetic): Responses API probe fires on every NVIDIA Build model and always returns 404
- Code: `bin/lib/onboard.js:997-1001` — `shouldRequireResponsesToolCalling()` returns `true` for `nvidia-prod`.
- NVIDIA Build does not expose `/v1/responses` for any model. Confirmed by sending the probe to bogus model IDs and even to `nvidia/llama-3.3-nemotron-super-49b-v1` — same `404 page not found` every time.
- The probe loop in `bin/lib/onboard.js:1101-1113` silently falls through to the Chat Completions probe on failure, so working models like Nemotron still onboard successfully — the 404 is only user-visible when chat completions also fails. That's why @kaviin27's error message had both halves.
- Severity: cosmetic noise. Skipping a known-404 probe removes the misleading first half of failure messages and makes the real error easier to read.
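A minimal sketch of the skip. The names `PROVIDERS_WITHOUT_RESPONSES_API` and `buildProbePlan` are hypothetical illustrations, not the actual `onboard.js` internals:

```javascript
// Hypothetical sketch: providers whose backends are known to 404 on
// POST /v1/responses, so onboarding should not probe that endpoint.
const PROVIDERS_WITHOUT_RESPONSES_API = new Set(["nvidia-prod"]);

// Decide which validation probes to run for a given provider.
// Skipping the known-404 Responses probe keeps "404 page not found"
// out of the combined failure message when chat completions also fails.
function buildProbePlan(providerId) {
  const probes = [];
  if (!PROVIDERS_WITHOUT_RESPONSES_API.has(providerId)) {
    probes.push({ name: "responses-tool-calling", path: "/v1/responses" });
  }
  probes.push({ name: "chat-completions", path: "/v1/chat/completions" });
  return probes;
}
```

With this shape, `nvidia-prod` goes straight to the Chat Completions probe while other providers keep the two-step plan.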
## Bug 2: NVCF "Function not found for account" is not detected and the user has no actionable message
- Code: `bin/lib/onboard.js:1115-1119` — `probeOpenAiLikeEndpoint()` joins all probe failures into one raw string.
- Many catalog models (e.g. `mistral-large`) return `404 Function '<uuid>': Not found for account '<account-id>'`. This is NVCF's way of saying "the model is in the catalog but not deployed for your account/org".
- The raw NVCF error gets surfaced with no explanation, so users assume NemoClaw is broken.
- Suggested fix: detect the `Function .*Not found for account` pattern in chat-completions probe failures and reframe with a clear message like: "Model `<id>` is in the NVIDIA Build catalog but is not deployed for your account. Pick a different model, or check the model card on https://build.nvidia.com to see if it requires org-level access."
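The suggested detection could look like the sketch below. `NVCF_NOT_DEPLOYED` and `explainProbeFailure` are hypothetical names for illustration, not existing `onboard.js` functions:

```javascript
// Hypothetical sketch: recognize NVCF's
// "Function '<uuid>': Not found for account '<account-id>'" 404 body
// and reframe it as an actionable onboarding message.
const NVCF_NOT_DEPLOYED = /Function '[^']+': Not found for account '[^']+'/;

function explainProbeFailure(modelId, rawError) {
  if (NVCF_NOT_DEPLOYED.test(rawError)) {
    return (
      `Model ${modelId} is in the NVIDIA Build catalog but is not deployed ` +
      `for your account. Pick a different model, or check the model card on ` +
      `https://build.nvidia.com to see if it requires org-level access.`
    );
  }
  return rawError; // any other failure surfaces unchanged
}
```

Keeping the fallback as the raw error preserves today's behavior for every failure mode the pattern doesn't match.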
## Bug 3: Probes have no per-request timeout, so a hung NVIDIA backend hangs onboard forever
- Confirmed: chat completions for `mistralai/mistral-medium-3-instruct` consistently hangs 30+ seconds with zero bytes returned (3 attempts, all timed out).
- `runCurlProbe`'s curl invocation needs a `--max-time` so a misbehaving model doesn't lock up the wizard. Recommend ~15s for the validation probe.
## Repro commands (anyone can run with their own key)
```sh
KEY="<your nvapi-... key>"

echo "=== broken model that hangs ==="
curl -sS -m 30 -w "\nHTTP %{http_code}\n" \
  https://integrate.api.nvidia.com/v1/chat/completions \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"mistralai/mistral-medium-3-instruct","messages":[{"role":"user","content":"hi"}],"max_tokens":1}'

echo "=== broken model that NVCF-404s ==="
curl -sS -m 30 -w "\nHTTP %{http_code}\n" \
  https://integrate.api.nvidia.com/v1/chat/completions \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"mistralai/mistral-large","messages":[{"role":"user","content":"hi"}],"max_tokens":1}'

echo "=== responses API 404 (all models) ==="
curl -sS -m 15 -w "\nHTTP %{http_code}\n" \
  https://integrate.api.nvidia.com/v1/responses \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"nvidia/llama-3.3-nemotron-super-49b-v1","input":"hi"}'
```