Studio: local diffusion image generation (CI validation)#88

Open
danielhanchen wants to merge 187 commits into main from studio-diffusion-images-staging

@danielhanchen
Member

Staging-only PR to validate the Studio diffusion image generation pipeline on Ubuntu, macOS, and Windows GitHub Actions runners before opening the upstream PR.

Backend

  • core/inference/diffusion.py - DiffusionBackend that loads diffusion GGUFs from Hugging Face via diffusers.GGUFQuantizationConfig and runs them on the active CUDA / MPS / CPU device. Supports FLUX.2, FLUX.2 klein, FLUX.1, Qwen-Image, Stable Diffusion 3, and SDXL.
  • routes/inference.py - POST /api/inference/images/load, POST /api/inference/images/generate, POST /api/inference/images/unload, GET /api/inference/images/status mirroring the existing llama-server lifecycle.
  • models/inference.py - DiffusionLoadRequest / DiffusionGenerateRequest / DiffusionGenerateResponse schemas with prompt and size validation up front.
  • requirements/no-torch-runtime.txt - pin gguf alongside the existing diffusers entry so GGUFQuantizationConfig works out of the box.
  • tests/test_diffusion_backend.py + tests/test_diffusion_routes.py - 27 CPU-only unit tests covering family detection, lifecycle, validation, and the full FastAPI round trip with diffusers stubbed.

Frontend

  • features/images/ - standalone Images page with curated FLUX.2 picker, HF token, prompt + negative prompt, resolution presets, steps + guidance sliders, seed input, and a result gallery rendering base64 PNGs inline.
  • app/routes/images.tsx - lazy /images route wired into router.tsx.
  • components/app-sidebar.tsx - PaintBrush02Icon nav item between Recipes and Export.

CI scope (staging only)

Three targeted workflows pinned to ubuntu-latest, macos-14, and windows-latest. Each is scoped via paths to studio/backend/** so unrelated changes do not re-trigger them. The inherited workflow bloat from the staging fork was cleared so we stay under the 5-Windows-runner cap.

Not for merge upstream from this fork. Upstream PR will follow once the runners go green.

melroy89 and others added 30 commits May 18, 2026 04:04
* Add a simple --version flag

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Small code clean-up, less ugly

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Slightly better function names. And use None again

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
* studio: load cached GGUF models when fully offline

When huggingface.co is unreachable, GGUF model loads fail in three distinct
places even though the bits are already in ~/.cache/huggingface/hub. Each
failure has a different surface symptom:

1. list_gguf_variants() raises straight through HTTPException(500), so the
   variant dropdown shows 'Failed to list GGUF variants'.

2. detect_gguf_model_remote() silently returns None after retries fail. The
   caller then treats a GGUF-only repo as non-GGUF and routes it through the
   transformers/MLX path. On Apple Silicon this surfaces as 'Unsloth currently
   only works on NVIDIA, AMD and Intel GPUs.'

3. _download_gguf() loses list_repo_files() to the network and falls back to a
   filename heuristic ('{repo}-{variant}.gguf'). When the repo name does not
   echo the filenames (e.g. repo 'Qwen3.6-27B-MTP-GGUF' contains a file
   'Qwen3.6-27B-UD-Q4_K_XL.gguf' with no MTP), hf_hub_download cannot find
   that invented filename in the cache and aborts.

Fix in three layers:

- list_gguf_variants / detect_gguf_model_remote: honor HF_HUB_OFFLINE and
  fall back to scanning the local HF cache snapshot when the API throws.
  detect_gguf_model_remote still keeps its retry loop for transient flakes;
  the cache fallback only kicks in after every attempt fails.

- _download_gguf: when list_repo_files() fails, look up variant -> real
  filename inside the cached snapshot before resorting to the heuristic.

- llama_cpp.load_model / inference worker startup: when DNS for
  huggingface.co fails (2s probe), set HF_HUB_OFFLINE=1 for the process so
  every hf_hub_download call below resolves from cache instantly instead of
  spending ~25s on five exponential retries.

Online behavior is unchanged: the API is still tried first, and the cache is
consulted only as a failover. The cache scan is a strict subset of what
list_local_gguf_variants already does today for local paths.
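
A minimal sketch of the cache scan described above, assuming the standard
Hugging Face hub cache layout (`models--{org}--{name}/snapshots/{rev}/`). The
function names and the explicit `cache_dir` argument are illustrative, not the
actual Studio helpers.

```python
from pathlib import Path


def iter_cache_snapshots(cache_dir, repo_id):
    """Return snapshot dirs for repo_id, newest mtime first (hypothetical helper)."""
    root = Path(cache_dir)
    if not root.is_dir():
        return []
    want = ("models--" + repo_id.replace("/", "--")).lower()
    snapshots = []
    for repo_dir in root.iterdir():
        # Case-insensitive repo match, as the commit's tests describe.
        if repo_dir.name.lower() != want:
            continue
        snap_root = repo_dir / "snapshots"
        if snap_root.is_dir():
            snapshots.extend(p for p in snap_root.iterdir() if p.is_dir())
    return sorted(snapshots, key=lambda p: p.stat().st_mtime, reverse=True)


def cached_gguf_files(cache_dir, repo_id):
    """List snapshot-relative GGUF paths from the newest snapshot that has any."""
    for snap in iter_cache_snapshots(cache_dir, repo_id):
        files = sorted(p.relative_to(snap).as_posix() for p in snap.rglob("*.gguf"))
        if files:
            return files
    return []
```

Returning snapshot-relative paths rather than basenames keeps layouts such as
`BF16/foo.gguf` distinguishable from one another.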

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: tighten inline comments on offline GGUF fallback

* studio: address review feedback on offline GGUF fallback

Fixes from the review pass on unslothai#5505:

* ruff F823 (lint CI red): the late `import os` at the bottom of
  LlamaCppBackend.load_model made `os` a function-local name, so my
  new `os.environ` reference at the top of the same method was a
  use-before-bind. Surfaces at runtime as
  'cannot access local variable os where it is not associated with a value'
  and is why the Mac/Windows Studio API jobs were failing too. The
  env-var mutation has been moved into a module-level contextmanager,
  so load_model no longer touches `os` directly.

* Codex P1: cache variant match now uses the relative path, not the
  basename. Layouts like `BF16/foo.gguf` (variant token only in
  parent dir) were silently skipped, falling through to the bogus
  `{repo}-{variant}.gguf` heuristic and failing offline loads of
  models stored under quant-named subdirs.

* Codex P1: HF_HUB_OFFLINE no longer persists past one model load.
  llama_cpp.load_model now uses a contextmanager that probes DNS,
  sets HF_HUB_OFFLINE/TRANSFORMERS_OFFLINE only when DNS is dead,
  and pops them in finally (preserving any prior user setting of
  TRANSFORMERS_OFFLINE). Pre-existing user-set HF_HUB_OFFLINE is
  respected as a no-op. worker.py keeps the startup probe because the
  orchestrator spawns a fresh worker per load -- comment updated to
  make that lifecycle explicit, and a warning is now logged.

* Gemini: cache-dir lookup centralized in `_iter_hf_cache_snapshots`.
  Three near-identical copies (in list/detect helpers and the
  llama_cpp offline scan) now go through one helper.

* Gemini: `huggingface_hub.utils.is_offline_mode` does not exist in
  1.x (verified locally); `huggingface_hub.constants.HF_HUB_OFFLINE`
  is snapshot-at-import-time and does not reflect runtime mutations.
  Manual env-var parsing kept.

* socket probe now saves and restores the prior default timeout
  instead of unconditionally setting None on exit, so it composes
  with caller code that already configured a timeout.

* worker.py probe now logs a warning when offline mode is auto-enabled
  so debugging the case isn't blind.
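
The scoped offline toggle from the second Codex P1 item can be sketched roughly
as follows. The `probe` callable is injected here purely for illustration; the
real helper's name and signature may differ.

```python
import os
from contextlib import contextmanager


@contextmanager
def hf_offline_if_dns_dead(probe):
    """Set HF offline env vars only while the block runs and only if DNS is dead.

    `probe` is a hypothetical zero-argument callable returning True when DNS
    resolution for huggingface.co fails.
    """
    # A user-set HF_HUB_OFFLINE is respected: do nothing and restore nothing.
    if "HF_HUB_OFFLINE" in os.environ or not probe():
        yield
        return
    prior_transformers = os.environ.get("TRANSFORMERS_OFFLINE")
    os.environ["HF_HUB_OFFLINE"] = "1"
    os.environ["TRANSFORMERS_OFFLINE"] = "1"
    try:
        yield
    finally:
        os.environ.pop("HF_HUB_OFFLINE", None)
        # Preserve whatever the user had set before entering the block.
        if prior_transformers is None:
            os.environ.pop("TRANSFORMERS_OFFLINE", None)
        else:
            os.environ["TRANSFORMERS_OFFLINE"] = prior_transformers
```

Because the probe runs on every entry, a transient DNS failure affects one load
only and cannot leave a long-lived process pinned offline.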

* studio: regression tests for offline GGUF cache fallback

Lock in the offline fallback path from unslothai#5505 so future refactors can't
silently regress either bug. 26 tests, 0.55 s, no network/GPU/subprocess.

Covers:

* _iter_hf_cache_snapshots: missing cache, missing repo, missing
  snapshots/, newest-mtime ordering, case-insensitive repo match.
* _list_gguf_variants_from_hf_cache and the list_gguf_variants
  online/offline-env/API-exception/reraise paths.
* _detect_gguf_from_hf_cache and detect_gguf_model_remote 3x-fail
  fallback. Pre-existing RepositoryNotFoundError early-return preserved.
* Codex P1 #1 regression: BF16/foo.gguf (quant only in subdir name)
  must resolve via _detect_gguf_from_hf_cache, which now matches the
  snapshot-relative path rather than the basename.
* _probe_dns_dead: returns True/False, restores prior socket timeout.
* Codex P1 #2 regression: _hf_offline_if_dns_dead sets env only inside
  the block, restores on exit (including on exception), re-probes DNS
  on the next call so a transient hiccup cannot lock the long-lived
  LlamaCppBackend singleton offline. Honors a user-set HF_HUB_OFFLINE
  as a no-op. Preserves a user-set TRANSFORMERS_OFFLINE across exit.

Follows the existing studio backend test stub pattern (loggers /
structlog / httpx stubs + backend dir on sys.path).

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: extend offline cache fallback to _download_mmproj and quant label

Two follow-up fixes from the review pass on unslothai#5505:

* _download_mmproj() now mirrors _download_gguf()'s offline path:
  when list_repo_files() fails, scan the local HF cache snapshot for
  any GGUF whose basename starts with mmproj-. Without this, offline
  vision GGUF loads succeed at the main weight (the existing PR fix)
  but the mmproj returns None and llama-server starts without vision
  support. Same _iter_hf_cache_snapshots helper, F16 preference and
  fallback to the first match are preserved.

* _extract_quant_label() now considers parent directory segments when
  the basename has no quant token. Layouts like BF16/foo.gguf are
  already documented in this file and are returned by the new
  snapshot-relative-path filter in _download_gguf; before this fix
  their variant label collapsed to "foo" (the last hyphen segment of
  the basename). Regex is the same; the search just walks parent
  segments innermost-first if the basename misses.
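
The innermost-first parent walk might look like the sketch below. The regex is
a simplified stand-in invented for this example (the real pattern is not shown
in this PR), and returning None on a miss is likewise an assumption.

```python
import re

# Hypothetical, simplified quant pattern; the actual regex may accept more forms.
_QUANT_RE = re.compile(r"(?:UD-)?(?:BF16|F16|F32|I?Q[0-9][A-Z0-9_]*)", re.IGNORECASE)


def extract_quant_label(rel_path):
    """Find a quant token in the basename, else in parent dirs, innermost first."""
    *parents, basename = rel_path.split("/")
    for segment in [basename, *reversed(parents)]:
        match = _QUANT_RE.search(segment)
        if match:
            return match.group(0)
    return None
```

The basename is always tried first, so paths that already carry the quant in
the filename keep their existing label.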

Tests (studio/backend/tests/test_offline_gguf_cache_fallback.py):

* TestExtractQuantLabelSubdir: basename quant unchanged, quant-only-
  in-parent, UD- prefix in parent, deeper nesting picks the
  innermost matching segment.
* TestDownloadMmprojOfflineCacheFallback: cache fallback returns the
  mmproj when list_repo_files fails, F16 preference holds when both
  variants are in cache, no-mmproj cache returns None.
* httpx stub now prefers the real package when installed (the CI
  install list already includes it) and falls back to the stub only
  when httpx is genuinely missing. Newer huggingface_hub imports
  HTTPError/Response/Request at module load, so the previous
  fixed-set stub broke when those names were added upstream.

26 existing cases plus 7 new = 33 pass in 0.74s.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix/adjust offline cache + DNS probe per PR unslothai#5505 review

Four review findings tightened, with regression tests:

- list_local_gguf_variants subdir collapse (P1 codex 10:08): pass the
  snapshot-relative path to _extract_quant_label so BF16/foo.gguf and
  Q4_K_M/foo.gguf produce distinct labels instead of folding to the same
  basename pseudo-quant.
- list_gguf_variants cache fallback (P2 codex 12:10): surface
  RepositoryNotFoundError / GatedRepoError / RevisionNotFoundError /
  EntryNotFoundError to the caller instead of masking with stale cache,
  matching detect_gguf_model_remote.
- _detect_gguf_from_hf_cache mmproj (P2 codex 12:10): exclude mmproj
  files from the candidate list so a partial cache with only a vision
  projector cannot route the projector as the main model.
- _probe_dns_dead global timeout (P2 codex 13:06): run the gethostbyname
  on a daemon thread with join timeout so concurrent sockets in the same
  interpreter never inherit a process-wide socket.setdefaulttimeout
  mutation. Same shape applied in worker.py's startup probe.
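
A rough sketch of the thread-based DNS probe from the last item, which avoids
touching `socket.setdefaulttimeout`. The function name, the defaults, and the
injectable `resolver` parameter are assumptions made for illustration.

```python
import socket
import threading


def probe_dns_dead(host="huggingface.co", timeout=2.0, resolver=socket.gethostbyname):
    """Return True when `host` cannot be resolved within `timeout` seconds."""
    result = {}

    def _resolve():
        try:
            resolver(host)
            result["ok"] = True
        except OSError:  # socket.gaierror is a subclass of OSError
            result["ok"] = False

    # Daemon thread: a hung lookup cannot keep the interpreter alive at exit.
    worker = threading.Thread(target=_resolve, daemon=True)
    worker.start()
    worker.join(timeout)
    # No answer within the timeout is treated the same as a failed lookup.
    return not result.get("ok", False)
```

The timeout lives in `join`, so other sockets in the same interpreter never see
a changed default timeout.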

* Make llama-server health check tolerant of warmup races

Two layered fixes for the Windows GGUF smoke CI Tool calling Tests
flake that exit-22'd on a single httpx.ReadError during llama-server
warmup. The 'windows-latest -> windows-2025-vs2026' image rollout is
hitting main with the identical symptom.

A. _wait_for_health: catch httpx.ReadError, RemoteProtocolError,
   WriteError alongside ConnectError and TimeoutException. A TCP RST
   mid-read while llama-server is still binding the port (WinError
   10054) is a 'still warming up' signal, not fatal. The existing
   _process.poll() check still wins for real crashes.

B. _drain_stdout + spawn: tee llama-server stdout/stderr to a
   per-launch log file at ~/.unsloth/studio/logs/llama-server/
   <port>.log. Any future subprocess crash leaves a forensic trace
   on disk even when Studio's traceback only captures the symptom
   (ReadError) and not the cause. Best-effort: a logging-side OSError
   never blocks the load.

Regression coverage: TestWaitForHealthRetriesOnReadError pins the
retry behaviour for the three new exception types and verifies that a
real process exit still short-circuits the loop.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci(windows): retry inference/load + collect llama-server logs

Composite fix for the Tool calling Tests flake that exit-22'd on a
single httpx.ReadError during llama-server warm-up. The
windows-latest -> windows-2025-vs2026 runner image rollout has been
hitting main with the identical symptom.

- All three jobs (openai-anthropic, tool-calling, json-images) now
  retry POST /api/inference/load up to 3 times with 10s backoff and
  preserve the response body for post-mortem. One transient 500 no
  longer fails the whole job.
- A new "Collect llama-server logs" step copies the per-launch
  llama-server stdout teed by Studio under ~/.unsloth/studio/logs/
  llama-server/ into the workspace, and the upload-artifact step
  now includes logs/llama-server/*.log so any future subprocess
  crash leaves a forensic trace.

---------

Co-authored-by: shimmyshimmer <shimmyshimmer@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
…hai#5486)

* studio: expose launcher capability bits on unauth /api/health

PR unslothai#5375 reduced the unauthenticated /api/health response to {status,
timestamp} only, on the theory that the rest of the payload was useful
fingerprinting. That was too aggressive: the Tauri watchdog reads
`service == "Unsloth UI Backend"` and `studio_root_id` to re-adopt its
own backend across restarts (src-tauri/src/desktop_backend_owner.rs
and commands.rs), and the SPA bootstrap fetches the same payload
unauth to detect chat-only mode and native path lease support before
any token is available (frontend src/config/env.ts and
features/native-intents/use-native-readiness.ts). With the post-unslothai#5375
shape, the watchdog kills its own healthy backend, the SPA never
flips out of "full Studio" mode on chat-only Linux/Windows, and the
About tab shows "dev" in place of the real version.

The actual fingerprint-ish fields are `version` / `studio_version` /
`device_type` (and to a lesser extent the hostname inside
`device_type`). `service`, `studio_root_id` (already a hex digest of
the install path, not the raw path), `chat_only`, the desktop_*
capability flags, and `native_path_leases_supported` do not leak the
install path or version.

This patch keeps the auth gate but rebalances which fields sit on
each side of it:

  unauth      service, studio_root_id, chat_only, desktop_protocol_version,
              desktop_manageability_version, supports_desktop_auth,
              supports_desktop_backend_ownership, native_path_leases_supported,
              desktop_owner (when present)

  authed      + version, studio_version, device_type
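
The split above can be sketched as a simple field filter. The field names come
from the table and the pre-existing {status, timestamp} base payload; the
function name and the dict-based shape are assumptions for illustration only.

```python
BASE_FIELDS = ("status", "timestamp")
LAUNCHER_FIELDS = (
    "service",
    "studio_root_id",
    "chat_only",
    "desktop_protocol_version",
    "desktop_manageability_version",
    "supports_desktop_auth",
    "supports_desktop_backend_ownership",
    "native_path_leases_supported",
    "desktop_owner",  # only emitted when present in the full payload
)
FINGERPRINT_FIELDS = ("version", "studio_version", "device_type")


def health_payload(full, authed):
    """Filter the full health dict down to what the caller may see."""
    allowed = BASE_FIELDS + LAUNCHER_FIELDS
    if authed:
        allowed += FINGERPRINT_FIELDS
    return {key: full[key] for key in allowed if key in full}
```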

Existing must-change-password sessions still fall through to the base
payload because get_current_subject (strict) rejects them; that
matches prior behaviour.

test_middleware.py is updated to pin the new contract: launcher bits
present unauth, fingerprint fields present only with a valid bearer.

* studio: complete launcher-bits health unauth contract on Tauri + About tab

Reviewer follow-ups to the unauth /api/health launcher bits split.

Tauri preflight:
backend_capability_stale_reason() fell through to
backend_version_stale_reason(health.version.as_deref()) when capability
bits were present but version was absent. With the unauth payload now
exposing service + studio_root_id + desktop_* bits but gating version
behind a bearer, the desktop watchdog was reading the new payload,
parsing all capability bits, then classifying the same-root backend
as desktop_backend_version_missing and refusing to adopt it.

A backend that exposes desktop_protocol_version=1,
desktop_manageability_version>=1, supports_desktop_auth=true and
supports_desktop_backend_ownership=true was introduced together with
MIN_DESKTOP_BACKEND_VERSION=2026.5.3 in unslothai#5341, so a present capability
bitset is itself a version-compatibility signal. Skip the version
sub-check when version is None/empty; keep it for non-empty values
so genuinely-too-old backends that do echo a version still get
desktop_backend_version_too_old.

About tab:
fetchStudioVersions() did a bare fetch(apiUrl("/api/health")), which
the unauth payload no longer carries version/studio_version for, so
Settings -> About kept rendering "dev"/"dev" for any logged-in user.
Attach Authorization: Bearer <token> when getAuthToken() returns one;
fall back to bare fetch (still 200, just truncated payload) for the
not-logged-in case. No new endpoint.

Comment:
studio_root_id is no longer a hex digest of the install path; it is
an opaque per-install id written by the launcher. Updated the inline
comment to match.

Test:
  - python -m pytest studio/backend/tests/test_middleware.py::TestHealthAuthGate studio/backend/tests/test_desktop_auth.py -q
    -> 29 passed
  - npm run typecheck clean, npm run build produces fresh dist

* Trigger CI rerun for flaky Mac Chat UI step
…unslothai#5487)

* studio: tighten sandbox blocklist precision (bash, hf upload, NOFILE)

Three precision fixes in core/inference/tools.py. Same security
boundary; fewer false positives that broke legitimate sandbox use.

bash blocklist:
The per-token loop introduced in unslothai#5375 fired on any blocklist word in
any token position, so the entirely benign `grep -r curl .`,
`echo source the data`, and `ls /usr/bin/curl` were rejected with
"blocked command 'curl'". The position-anchored regex already covers
real command-position invocations, including `;rm`, `&&wget`, `$(rm)`,
`<(rm)`, backticked subshells, and `/usr/bin/sudo`. The token loop is
re-scoped: it only fires when the previous shlex token is a shell
separator (or at start of line), so split-quoting obfuscations like
`r''m -rf /` are still caught (shlex collapses them to a single
command-position token) while argument-position blocklist words pass
through. Trailing meta-chars glued to a shlex token (`rm;`) are
stripped before basename matching.

hf upload AST gate:
`_method_call_is_hf_upload` previously matched any method named
`upload_file` / `upload_folder` / `upload_large_folder` / `create_commit`
on any receiver, so paramiko.SFTPClient.upload_file, boto3.create_commit,
and similar non-HF SDK methods were rejected. The fallback now requires
an `import huggingface_hub` / `import hf_api` / `from huggingface_hub
import ...` somewhere in the same module. Fully-qualified
huggingface_hub.upload_file(...) calls are unchanged.

NOFILE env knob:
`RLIMIT_NOFILE = (1024, 1024)` was the only sandbox rlimit without an
env override. 1024 is below Linux's typical soft default and below
what multi-shard safetensors mmap chains need on Llama-3 70B-class
loads. Default is now 16384 with UNSLOTH_STUDIO_SANDBOX_NOFILE, parity
with the other rlimits.

15 new bash-blocklist-position tests pin both the false-positive
fixes and the still-blocked invariants (semicolon, &&, subshell,
backtick, split-quote, /usr/bin/ prefix, nested bash -c).
4 new hf-upload-import-gate tests pin both the false-positive
allowances and that HF-imported uses are still blocked.
1 new pin asserts the NOFILE env var is wired.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: cover command wrappers, find -exec, dynamic HF imports, NOFILE clamp

Reviewer follow-ups to the sandbox blocklist precision change.

Command-position scanner missed Bash command-prefix wrappers and inline
shell assignments. shlex tokenised `env curl`, `time curl`, `nohup rm`,
`FOO=bar curl`, `sudo rm`, etc. with the prefix at command position and
the real command at argument position, so the position-anchored check
returned set() while pre-PR's per-token scan caught them. Likewise the
position-anchored regex requires `^` or a shell separator before the
command, so `env curl` slipped through.

Reworked the scanner to track an expect_command flag plus a
prefix_pending flag:
  - assignments (FOO=bar) keep expect_command=True for the next token,
  - flags ('-oL', '--') keep it intact while prefix_pending is set,
  - numeric duration args ('timeout 1 cmd') skip without breaking
    expect_command,
  - known wrappers (env, command, builtin, exec, time, nohup, nice,
    setsid, stdbuf, timeout, ionice, chroot, sudo, doas, su, xargs)
    set prefix_pending so the wrapper's command is still checked,
  - shell separators now include `{`, `}`, `)`, `then`, `do`,
    `else`, `elif` so brace groups and if/then/while/do bodies are
    recognised as command positions.

Also lex with `shlex.shlex(punctuation_chars=";&|()`")` so split-quote
forms like `echo done; r''m -rf /tmp/x` and `echo done;r''m` tokenise
as `[..., ';', 'rm', ...]` and the command position check fires.
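
To make the command-position idea concrete, here is a much-reduced sketch of
such a scanner. The blocklist is a hypothetical subset, the helper names are
invented, and the `find -exec` pass is omitted; it illustrates the technique
and is not the code in core/inference/tools.py.

```python
import posixpath
import shlex

PUNCT = ";&|()`"
BLOCKED = {"rm", "curl", "wget", "sudo"}  # hypothetical subset of the blocklist
WRAPPERS = {"env", "command", "builtin", "exec", "time", "nohup", "nice", "setsid",
            "stdbuf", "timeout", "ionice", "chroot", "sudo", "doas", "su", "xargs"}
KEYWORD_SEPARATORS = {"{", "}", "then", "do", "else", "elif"}


def _is_separator(tok):
    return tok in KEYWORD_SEPARATORS or (tok != "" and all(ch in PUNCT for ch in tok))


def _is_assignment(tok):
    name, eq, _ = tok.partition("=")
    return bool(eq) and name.isidentifier()


def blocked_commands(line):
    """Return blocklisted names that appear at command position in `line`."""
    lexer = shlex.shlex(line, posix=True, punctuation_chars=PUNCT)
    found = set()
    expect_command, prefix_pending = True, False
    for tok in lexer:
        if _is_separator(tok):
            expect_command, prefix_pending = True, False
            continue
        if not expect_command:
            continue  # argument position: blocklist words are allowed here
        if _is_assignment(tok):
            continue  # FOO=bar keeps the next token at command position
        if prefix_pending and (tok.startswith("-") or tok.isdigit()):
            continue  # wrapper flags and numeric durations ('timeout 1 cmd')
        name = posixpath.basename(tok)
        if name in BLOCKED:
            found.add(name)
        if name in WRAPPERS:
            prefix_pending = True  # the wrapped command is still to come
            continue
        expect_command = False
    return found
```

In POSIX mode shlex collapses `r''m` into `rm`, and `punctuation_chars` splits
`;` off into its own token, which is what lets the position check fire.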

Added a small `find -exec CMD ... ;` / `-execdir CMD ... ;` pass so
`find . -exec rm -f {} +` and friends are caught even though the
direct token is at argument position to `find`.

Dynamic Hugging Face imports were treated as no-HF-in-scope. The
upload-method gate now also resolves `__import__('huggingface_hub')`,
`importlib.import_module('huggingface_hub')`, and bare
`import_module('huggingface_hub')` (via `from importlib import
import_module`) as HF imports, so HfApi().upload_file via dynamic
import is still blocked.

RLIMIT_NOFILE: setrlimit(NOFILE, (16384, 16384)) silently failed if
the parent's hard cap is below the requested value; the broad
except swallowed the OSError and left the sandbox at the parent's
default. Clamp the requested value to the inherited hard limit
before calling setrlimit.
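
A minimal sketch of the clamp, assuming a Unix host where the `resource` module
is available. The function names are illustrative; only the 16384 default comes
from the commit text.

```python
import resource


def clamp_to_hard_limit(requested, hard):
    """Never ask for more file descriptors than the inherited hard limit allows."""
    if hard == resource.RLIM_INFINITY:
        return requested
    return min(requested, hard)


def apply_nofile_limit(requested=16384):
    """Set RLIMIT_NOFILE to the clamped value and return what was applied."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    target = clamp_to_hard_limit(requested, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, target))
    return target
```

Clamping first means `setrlimit` is only ever asked to keep or lower the hard
limit, which an unprivileged process is allowed to do.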

Test cleanup: the existing test_cat_with_word_source_allowed had
`assert ... or True` so it could not fail; rewrote it to assert the
actual return value plus the two membership checks. Added
parametrised coverage for shell prefix wrappers, find -exec / xargs,
brace groups, if/then, while/do, split-quote command-name forms, and
dynamic HF import upload patterns.

Test:
  - python -m pytest studio/backend/tests/test_sandbox_tools.py -q
    -> 90 passed (was 67 before this commit)
  - full studio/backend/tests/ minus llama_cpp_load_progress_live and
    GPU CUDA_VISIBLE_DEVICES tests (pre-existing isolation flake)
    -> 1063 passed

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: catch bare-name HF upload calls in AST gate

`from huggingface_hub import upload_file; upload_file(...)` is a
canonical HF call shape that the previous Attribute-only check missed:
the bare-name call lands as ast.Name (not ast.Attribute), so the
fuzzy gate skipped it.

Extend _method_call_is_hf_upload to also match ast.Name when HF is in
scope. Same import-gating discipline as the Attribute branch, so
paramiko/boto3 and locally-defined `def upload_file(...)` helpers
without HF imports still pass.

Pins: 4 new TestHfUploadImportGate cases (upload_file/folder/create_commit
bare-name imports blocked; local upload_file without HF import allowed).
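
The import-gated check could be sketched along these lines. This is a
simplified illustration with invented function names: it covers static
`import` / `from ... import` statements only and ignores the dynamic-import
forms handled elsewhere in this PR.

```python
import ast

UPLOAD_METHODS = {"upload_file", "upload_folder", "upload_large_folder", "create_commit"}


def module_imports_hf(tree):
    """True if the module statically imports huggingface_hub anywhere."""
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            if any(alias.name.split(".")[0] == "huggingface_hub" for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] == "huggingface_hub":
                return True
    return False


def has_hf_upload_call(source):
    """Flag upload-shaped calls, but only when HF is imported in the same module."""
    tree = ast.parse(source)
    if not module_imports_hf(tree):
        return False
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Attribute):  # api.upload_file(...)
            name = func.attr
        elif isinstance(func, ast.Name):  # bare upload_file(...)
            name = func.id
        else:
            name = None
        if name in UPLOAD_METHODS:
            return True
    return False
```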

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: scope HF uploads to sandbox-local literals; block env / token leaks

The previous gate dropped every HF upload call. Two refinements make it
precise enough to allow legitimate sandbox->HF uploads while still
catching credential / file exfil:

- path_or_fileobj / folder_path / create_commit operation paths must be
  sandbox-local relative-path literals (no '/', '~', drive letter, or
  '..' segments). Variable / dynamic paths are rejected.

- Any positional or keyword argument that statically resolves to
  os.environ / os.environ.get / os.getenv / bare getenv / subprocess
  shape readers is rejected (env-var exfil).

- token / hf_token / api_token / api_key / auth_token / access_token /
  password / secret kwargs are always rejected; sandbox env strips all
  parent credentials by construction, so any value here is hard-coded
  or lifted.

Recursive subtree walk in _reads_env_or_secret catches wrapper shapes
(str(os.environ), json.dumps(os.environ.items()), etc.).
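
For illustration, the sandbox-local path rule might reduce to a predicate like
the one below. It reads "no '/'" as "no leading '/'" (absolute paths), which is
an interpretation based on the test descriptions, and the function name is
hypothetical.

```python
import re

_DRIVE_RE = re.compile(r"^[A-Za-z]:")


def is_sandbox_local(path):
    """Accept only relative path literals with no home, drive, or '..' parts."""
    if not path or path.startswith(("/", "\\", "~")):
        return False
    if _DRIVE_RE.match(path):
        return False
    # Split on both separators so Windows-style traversal is caught too.
    return ".." not in re.split(r"[\\/]", path)
```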

Add TestSandboxEnvIsolation: pin that _build_safe_env builds the env
from a whitelist, not by stripping. Cover Linux/macOS/WSL/Windows
secret shapes. The whitelist is PATH / HOME / TMPDIR / LANG / TERM /
PYTHONIOENCODING (+ VIRTUAL_ENV / SystemRoot when applicable); HOME
points at the sandbox workdir, so HF / wandb / aws SDKs cannot reach
the operator's ~/.cache credentials.
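
A whitelist-built environment can be sketched as follows. The variable names
are the ones listed above; the function signature is an assumption and not
necessarily that of `_build_safe_env`.

```python
PASS_THROUGH = ("PATH", "TMPDIR", "LANG", "TERM", "PYTHONIOENCODING",
                "VIRTUAL_ENV", "SystemRoot")


def build_safe_env(parent_env, workdir):
    """Copy only whitelisted variables; everything else is dropped by default."""
    env = {name: parent_env[name] for name in PASS_THROUGH if name in parent_env}
    # HOME points at the sandbox workdir so SDKs cannot find ~/.cache credentials.
    env["HOME"] = workdir
    return env
```

Building from a whitelist means a new secret-bearing variable is excluded
without anyone having to remember to strip it.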

Test classes added:
- TestHfUploadSandboxLocalPaths (relative literals allowed; absolute,
  drive-letter, '~', '..', mid-path traversal, dynamic vars, and
  open() of unsafe paths blocked, including create_commit recursion).
- TestHfUploadEnvAndSecretLeakBlock (os.environ subscript/get/getenv,
  bare getenv, subprocess.check_output, str(os.environ), token=,
  hf_token=, api_key=, and create_commit operations referencing env).
- TestSandboxEnvIsolation (no parent secret leaks into sandbox env).

131 tests in test_sandbox_tools.py pass.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…ll_id (unslothai#5488)

* studio: scope cancel-cleanup to in-flight tmp dirs; walk back tool_call_id

Two follow-ups to unslothai#5375's training and chat hardening.

_cleanup_cancelled_checkpoints used to rmtree every checkpoint-N
directory on Cancel. That is the opposite of what the user expects.
A user cancelling an 8h run with save_steps=2000 loses every
completed checkpoint they could have resumed from. The 67 MB residue
the audit memo flagged is the HF Trainer atomic-rename partial
(tmp-checkpoint-N), not the completed ones. The cleanup now targets
only tmp-checkpoint subdirs; completed checkpoint-N directories are
user-owned and stay. Symlinked output_dir and symlinked children are
skipped so the realpath containment cannot be levered into deleting
arbitrary content via a symlink trick.

ChatMessage._validate_role_shape stamped a random secrets.token_hex
id on tool messages with no tool_call_id. That id is uncorrelated
with the prior assistant tool_calls id, so strict passthrough
backends (OpenAI, Anthropic) reject the request as orphaned and
llama.cpp treats the tool result as "no preceding call" and
hallucinates. The synthesis moves up to ChatCompletionRequest, where
the whole conversation is visible: for each tool message missing an
id we walk back to the most recent assistant turn with tool_calls
(stopping at user turns), prefer a function.name match, otherwise
take the first unconsumed tool_call. Synthesis is the fallback when
no candidate assistant turn exists, preserving the prior round-trip
guarantee for orphaned tool messages.

Tests:
  - test_cleanup_cancelled_checkpoints.py (new): pins that completed
    checkpoint subdirs survive, tmp-checkpoint partials are removed,
    non-int suffixes (checkpoint-final, checkpoint-best) are left
    alone, output_dir outside outputs_root is refused, symlinked
    output_dir and symlinked child are both skipped, missing dir is
    a no-op.
  - test_inference_model_validation.py: 6 new walkback cases covering
    name-match preference, first-unconsumed fallback, explicit-id
    passthrough, multi-tool-result pairing, synth-on-no-parent, and
    no-cross-user-turn invariant.
  - test_openai_tool_passthrough.py: the two ChatMessage-level
    synth-on-missing tests are rewritten to assert that the per-
    message validator now leaves tool_call_id untouched; resolution
    coverage lives in the request-level tests above.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: explicit tool_call_id reserve, numeric tmp-checkpoint suffix only

Reviewer follow-ups to the training-cleanup + tool_call_id walkback PR.

tool_call_id walkback: a mixed assistant turn with [call_a, call_b]
followed by a tool result that carried tool_call_id="call_a" and a
sibling tool result with no id resolved to ['call_a', 'call_a']
because the explicit id never reserved call_a in the consumed set.
Added a pre-pass over the message list that walks back from every
role="tool" message carrying an explicit id and marks the matching
(asst_idx, tc_idx) consumed, then the missing-id walkback runs against
that pre-populated set. The second result now resolves to call_b.

While here, also harden the function-shape check: if a provider
ships a malformed tool_call where `function` is a string rather than
a dict, the old `(tc.get("function") or {}).get("name")` raised
AttributeError on the string's .get; now isinstance-gated so the
walkback falls through to the fallback id without raising.
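
The walkback with the explicit-id pre-pass can be sketched roughly like this.
Messages are plain dicts here, the function names are invented, and the
`call_` prefix on synthesized ids is an assumption; the real validator works
on request models rather than raw dicts.

```python
import secrets


def _fn_name(tool_call):
    fn = tool_call.get("function")
    return fn.get("name") if isinstance(fn, dict) else None  # tolerate malformed calls


def _parent_turn(messages, i):
    """Nearest earlier assistant turn with tool_calls, stopping at user turns."""
    for j in range(i - 1, -1, -1):
        role = messages[j].get("role")
        if role == "user":
            return None
        if role == "assistant" and messages[j].get("tool_calls"):
            return j
    return None


def resolve_missing_tool_call_ids(messages):
    consumed = set()
    # Pre-pass: explicit ids reserve their (assistant_idx, tool_call_idx) slot.
    for i, msg in enumerate(messages):
        if msg.get("role") == "tool" and msg.get("tool_call_id"):
            j = _parent_turn(messages, i)
            if j is None:
                continue
            for k, tc in enumerate(messages[j]["tool_calls"]):
                if tc.get("id") == msg["tool_call_id"]:
                    consumed.add((j, k))
                    break
    for i, msg in enumerate(messages):
        if msg.get("role") != "tool" or msg.get("tool_call_id"):
            continue
        j = _parent_turn(messages, i)
        pick = None
        if j is not None:
            free = [(k, tc) for k, tc in enumerate(messages[j]["tool_calls"])
                    if (j, k) not in consumed]
            name = msg.get("name")
            by_name = [kt for kt in free if name and _fn_name(kt[1]) == name]
            pick = (by_name or free or [None])[0]
        if pick is not None:
            consumed.add((j, pick[0]))
            msg["tool_call_id"] = pick[1].get("id")
        else:
            msg["tool_call_id"] = "call_" + secrets.token_hex(8)  # fallback synthesis
    return messages
```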

Cancel cleanup: `tmp-checkpoint-*` is too broad. HF Trainer's
in-flight partials are always `tmp-checkpoint-<integer-step>`, so
constrain the cleanup regex to `^tmp-checkpoint-\d+$`. A user folder
named `tmp-checkpoint-final`, `tmp-checkpoint-backup`, or
`tmp-checkpoint-user-notes` is now preserved.
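
As a small illustration of the tightened pattern (the helper name is made up
and works on directory names only, without the symlink checks):

```python
import re

TMP_CHECKPOINT_RE = re.compile(r"^tmp-checkpoint-\d+$")


def in_flight_partials(names):
    """Keep only names that look like HF Trainer in-flight partials."""
    return [name for name in names if TMP_CHECKPOINT_RE.match(name)]
```

Completed `checkpoint-N` directories and user folders with non-numeric suffixes
never match, so they are never candidates for removal.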

ChatMessage docstring still pointed at the pre-PR contract that
required `tool_call_id` on every role="tool" message. Updated to say
missing ids are accepted at message scope and resolved at
ChatCompletionRequest scope. Inline comment above the cancel-cleanup
call now describes the actual behaviour (in-flight tmp partials,
completed checkpoints preserved).

Test:
  - python -m pytest studio/backend/tests/test_inference_model_validation.py
    studio/backend/tests/test_cleanup_cancelled_checkpoints.py
    studio/backend/tests/test_openai_tool_passthrough.py -q
    -> 76 passed (was 67 before this commit; +2 walkback regression
       tests, +1 numeric-suffix preservation test)

* studio: trim verbose comments in cleanup + tool_call_id walkback

Move the HF tmp-checkpoint regex to module scope as a named constant.
Drop the multi-paragraph docstring on _cleanup_cancelled_checkpoints
and the inline call-site rationale; the function name + the test
class already cover the why.

Compress _resolve_missing_tool_call_ids docstring from a six-line
explanation to two. Same logic, fewer in-flow tutorials.

76 tests in cleanup + inference-model-validation + tool-passthrough pass.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…nslothai#5489)

* studio: proxy-aware login rate-limit; allow google favicons in CSP

Two follow-ups to unslothai#5375's auth + headers hardening.

Login rate-limit:
The per-IP bucket keyed on request.client.host alone. Behind any
reverse proxy or shared NAT it lumps everyone together (one user's
typos lock everyone out for 60 seconds; the 429 detail leaked the
proxy/internal IP back to clients). The bucket key is now
(client-ip, username.lower) so:
  - one wrong-password run does not block another user from the same IP
  - one IP does not block the same user from a different IP
The 429 detail body no longer interpolates the IP. Behind a proxy,
operators can set UNSLOTH_STUDIO_TRUST_FORWARDED=1 so the limiter
honours X-Forwarded-For / Forwarded; it is off by default so a direct
caller cannot spoof the header.
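
A toy in-memory limiter keyed on (client-ip, lowercased username) shows the
isolation properties described here. The class name, the limit of 5 failures,
and the injectable clock are assumptions for the sketch; only the 60-second
window is taken from the text.

```python
import time
from collections import defaultdict, deque


class LoginLimiter:
    """Sliding-window failure counter per (ip, username) pair (illustrative)."""

    def __init__(self, max_fails=5, window=60.0, clock=time.monotonic):
        self.max_fails = max_fails
        self.window = window
        self.clock = clock
        self._fails = defaultdict(deque)

    @staticmethod
    def key(ip, username):
        return (ip, username.lower())

    def record_failure(self, ip, username):
        self._fails[self.key(ip, username)].append(self.clock())

    def is_blocked(self, ip, username):
        bucket = self._fails[self.key(ip, username)]
        cutoff = self.clock() - self.window
        while bucket and bucket[0] <= cutoff:
            bucket.popleft()  # drop failures that fell out of the window
        return len(bucket) >= self.max_fails
```

Note that this per-pair key alone provides no aggregate per-IP throttle.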

CSP img-src:
components/assistant-ui/sources.tsx renders citation favicons from
https://www.google.com/s2/favicons. The current img-src allows
t0..t3.gstatic.com (used for other Google-hosted icons) but not the
main host the favicon URL points to, so every citation icon
CSP-blocks and falls back to gray initials. Adding www.google.com to
img-src is the same shape as unslothai#5409's connect-src HF allowlist fix.

Tests:
  - test_login_rate_limit.py (new): _client_ip respects
    UNSLOTH_STUDIO_TRUST_FORWARDED for X-Forwarded-For and Forwarded;
    bucket key is composed of (ip, lower(username)) and isolates
    cross-user and cross-IP buckets; 429 detail does not contain the
    client IP; Retry-After header preserved.
  - test_middleware.py: new test_img_src_allows_google_favicons pins
    that www.google.com is in the img-src directive and the existing
    gstatic CDNs stay allowed.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: normalise forwarded IPs, IP-wide aggregate cap, unknown-user sentinel

Reviewer follow-ups to the proxy-aware login rate-limit PR.

Forwarded address normalisation: with
UNSLOTH_STUDIO_TRUST_FORWARDED=1, raw `X-Forwarded-For` and
`Forwarded: for=` values such as `198.51.100.7:50001` or
`"[2001:db8::1]:50001"` were carried verbatim into the bucket key,
so one client emitting a fresh source port per attempt split into
many buckets and bypassed _LOGIN_MAX_FAILS. _normalize_forwarded_addr
now strips quotes, optional `[..]:port` for IPv6 and `host:port` for
IPv4, and validates as an IP literal; garbage values fall through to
the direct request.client.host. Forwarded parsing also isolates the
first forwarded-element so a multi-element header cannot create
attacker-controlled bucket strings.
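
A minimal sketch of the normalisation described above, using only the standard library. The function name and exact branching are illustrative assumptions; the real _normalize_forwarded_addr may differ in detail.

```python
import ipaddress
from typing import Optional


def normalize_forwarded_addr(raw: str) -> Optional[str]:
    """Return a bare IP literal, or None when the value is not an IP."""
    value = raw.strip().strip('"')
    if value.startswith("["):
        # Bracketed IPv6, optionally followed by ":port".
        end = value.find("]")
        if end == -1:
            return None
        value = value[1:end]
    elif value.count(":") == 1:
        # IPv4 "host:port"; a bare IPv6 literal has two or more colons.
        value = value.split(":", 1)[0]
    try:
        return str(ipaddress.ip_address(value))
    except ValueError:
        # Garbage: the caller falls back to request.client.host.
        return None
```

With this shape, a client that changes source port on every attempt still maps to one address, so it lands in one bucket.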

Spray protection: the (ip, username) key removed the aggregate
per-IP throttle the pre-PR limiter provided. A client rotating
nonexistent usernames produced [401, 401, 401, 401, 401, 401] where
pre-PR produced [401, 401, 401, 401, 401, 429]. Restored the
aggregate via a parallel _LOGIN_IP_BUCKETS table (max 30 fails / 60s
per IP) checked alongside the per-(ip, username) bucket; both
buckets must be cleared on a successful login.

Bucket cardinality: every distinct unauthenticated username
allocated a new (ip, username) bucket entry without bound. 1,000
random usernames from one IP produced 1,000 buckets. Failures whose
username does not exist now record into a single sentinel key
(ip, "\x00unknown-user") so cardinality stays at one per IP for the
unknown path. The known-user path additionally enforces a global
hard cap (_LOGIN_MAX_BUCKETS = 4096) that prunes stale empty buckets
on overflow and otherwise folds the failure into the per-IP bucket
only.
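
The two-bucket scheme and the unknown-user sentinel can be sketched roughly as follows. This is an illustration, not the Studio code: the class name, the per-user limit of 5 (inferred from the 401/429 sequence above), and the data structures are assumptions, and the global _LOGIN_MAX_BUCKETS cap is omitted.

```python
import time
from collections import deque

WINDOW_SECONDS = 60.0
MAX_FAILS = 5        # assumed per-(ip, username) limit
IP_MAX_FAILS = 30    # aggregate per-IP limit from the commit message
UNKNOWN_USER = "\x00unknown-user"


class LoginLimiter:
    def __init__(self, known_users):
        self.known_users = {name.lower() for name in known_users}
        self.user_buckets = {}  # (ip, username) -> deque of failure times
        self.ip_buckets = {}    # ip -> deque of failure times

    def _key(self, ip, username):
        name = username.lower()
        # Nonexistent usernames share one sentinel bucket per IP.
        return (ip, name if name in self.known_users else UNKNOWN_USER)

    @staticmethod
    def _count(bucket, now):
        # Drop failures that have aged out of the window.
        while bucket and now - bucket[0] >= WINDOW_SECONDS:
            bucket.popleft()
        return len(bucket)

    def is_blocked(self, ip, username, now=None):
        now = time.monotonic() if now is None else now
        user_bucket = self.user_buckets.get(self._key(ip, username), deque())
        ip_bucket = self.ip_buckets.get(ip, deque())
        return (self._count(user_bucket, now) >= MAX_FAILS
                or self._count(ip_bucket, now) >= IP_MAX_FAILS)

    def record_failure(self, ip, username, now=None):
        now = time.monotonic() if now is None else now
        self.user_buckets.setdefault(self._key(ip, username), deque()).append(now)
        self.ip_buckets.setdefault(ip, deque()).append(now)

    def record_success(self, ip, username):
        # Both buckets are cleared on a successful login.
        self.user_buckets.pop(self._key(ip, username), None)
        self.ip_buckets.pop(ip, None)
```

Rotating usernames from one IP then costs one sentinel entry instead of one entry per name, while the aggregate bucket still throttles the IP as a whole.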

Test:
  - python -m pytest studio/backend/tests/test_login_rate_limit.py -q
    -> 19 passed (was 12 before this commit; +5 forwarded-address
       normalisation, +1 sentinel bucket, +1 bucket cap)

CSP comment refreshed to mention `www.google.com` alongside
*.gstatic.com so future readers see why the host is allowlisted.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: tokenise img-src assertion to silence CodeQL substring rule

The new CSP google-favicon test used 'host string in directive string'
which CodeQL flagged as py/incomplete-url-substring-sanitization
(the substring could appear at an arbitrary position in a URL).
The assertion is checking a CSP directive, not URL sanitisation, but
splitting the directive on whitespace and asserting against the
tokenised source list expresses the same intent and matches the
exact CSP source expression. CodeQL no longer treats it as a URL
substring check.

Test: python -m pytest studio/backend/tests/test_middleware.py -q
      -> 14 passed

* studio: use any(src == host) for CSP source asserts

CodeQL's py/incomplete-url-substring-sanitization still flagged the
tokenised "host in img_sources" check. Switching to
`any(src == host for src in img_sources)` makes the comparison an
exact-equality (not substring) match, which the rule does not flag.
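
For illustration, the exact-equality form looks roughly like this. The csp_sources helper and the header value are made up for the example; only the any(src == host ...) comparison comes from the commit.

```python
def csp_sources(header, directive):
    """Return the source expressions listed for one CSP directive."""
    for part in header.split(";"):
        tokens = part.split()
        if tokens and tokens[0] == directive:
            return tokens[1:]
    return []


# Hypothetical header value, not the one Studio actually sends.
csp = "default-src 'self'; img-src 'self' data: https://www.google.com https://t0.gstatic.com"
img_sources = csp_sources(csp, "img-src")

# Exact match against each tokenised source, never a substring test.
assert any(src == "https://www.google.com" for src in img_sources)
```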

Test: python -m pytest studio/backend/tests/test_middleware.py -q
      -> 14 passed

* studio: trim verbose rate-limit + CSP comments

Compress the 6-line constants header on _LOGIN_BUCKETS to 3 lines and
the per-helper docstrings on _trust_forwarded_for / _normalize_forwarded_addr
to one line each. Same code, fewer in-flow tutorials.

Note in the CSP comment that www.google.com is the active favicon host
(used by sources.tsx for s2/favicons citations); *.gstatic.com stays as
legacy faviconV2 coverage but the SPA no longer fetches it.

33 tests in test_login_rate_limit.py + test_middleware.py still pass.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…, current-password input (unslothai#5490)

* studio/frontend: wire logout, singleflight refresh, shared 422 helper, current-password input

Four frontend follow-ups to unslothai#5375 that the train-api fix in unslothai#5409
did not cover.

Log out:
features/auth/api.ts:logout() was a synchronous clearAuthTokens() with
no call to /api/auth/logout, and the SPA exposed no Log out menu item
at all. Refresh tokens stay valid server-side for their entire
lifetime even after the user "leaves". logout() is now async and
POSTs to /api/auth/logout (best-effort, swallows network errors) so
storage.revoke_user_refresh_tokens fires server-side. The account
dropdown in components/app-sidebar.tsx gains a Log out item between
Help and Shutdown that calls logout() then navigates to /login.

refreshSession singleflight:
The backend now consumes the refresh token atomically on
/api/auth/refresh, so two concurrent refreshes race; the loser 401s
and the user is force-logged-out. This reproduces on essentially
every page that fires multiple API calls in parallel after access-
token expiry. refreshSession now holds a module-level inflight
promise: first caller mints it, subsequent callers await the same
one, and the slot clears in finally.

Shared formatDetail helper:
Roland's unslothai#5409 fix lived inside train-api.ts. Other api modules
(chat-api.ts, export-api.ts, history-api.ts, datasets-api.ts,
recipe-studio/api/index.ts) still rendered FastAPI array-detail 422s
as either "Request failed (422)" (chat-api.ts's typeof-string gate)
or "[object Object]" (the others). format-fastapi-error.ts lifts the
helper into one place: formatFastApiDetail unpacks the array,
readFastApiError reads a Response into the best human-readable
string. All five sibling api modules now use it. recipe-studio also
swaps ?? for the helper's truthy-formatted check so an array detail
no longer short-circuits to "[object Object],[object Object]".

Current password input:
features/auth/components/auth-form.tsx in change-password mode
showed only New password and Confirm password; currentPassword
defaulted to window.__UNSLOTH_BOOTSTRAP__?.password. On admin-forced
must_change_password resets the bootstrap is empty and the form
short-circuits with "Unable to initialize setup. Reload the page".
A Current password input is now rendered in change-password mode,
pre-filled from the bootstrap when present so first-boot UX is
unchanged.

Build:
  - npm run typecheck clean
  - npm run build produces a fresh dist
  - install.sh rebuilds dist on next install.sh --local

* studio/frontend: logout refresh-retry, generation guard, two missed 422 sites, password toggle

Reviewer follow-ups to the auth-UX PR.

Logout server-side revoke missed the expired-access case. /api/auth/
logout requires a valid access JWT and only then calls
storage.revoke_user_refresh_tokens(). When the access token had
expired but the 7-day refresh token was still valid, logout() posted
once, got 401, swallowed it, and cleared local state, leaving the
refresh token alive on the server. logout() now retries once: on 401
with a refresh token present, it calls refreshSession() to rotate,
then re-posts /api/auth/logout with the new access token. Both
branches still clearAuthTokens in finally.

In-flight refresh could repopulate localStorage after logout. A
background refreshSession() that started before the user clicked Log
out, but resolved after the local clear, wrote storeAuthTokens()
back over the cleared state and effectively re-authenticated the
SPA. Added a module-level logoutGeneration counter: each refresh
captures the value on entry, logout() bumps the counter in finally
before clearing, and the refresh's continuation drops its new token
pair on the floor when the counter has moved.

Two API client modules kept the pre-unslothai#5409 string-only 422 parser:
  - features/chat/api/providers-api.ts -> parseErrorText now calls
    formatFastApiDetail() so create / update / test / models
    requests surface field-level errors instead of
    "Request failed (422)".
  - features/chat/api/openai-containers.ts -> parseError now uses
    readFastApiError() so ttl_minutes / encrypted_api_key /
    container_id validation errors surface instead of "HTTP 422".

recipe-studio/api/index.ts::uploadUnstructuredFile still had a
local typeof-string detail check on both the 413 and the generic
not-ok branches. Both branches now use readFastApiError() so
array-shaped 422 details show field-level errors instead of a
generic fallback.

Password reveal toggle in change-password mode shared one
showPassword state across Current password and New password, so the
eye button on either field exposed both secrets. Added a separate
showNewPassword state so New password's toggle is independent of
Current password's toggle. Confirm password remains type="password"
unconditionally.

Test:
  - npm run typecheck clean
  - npm run build produces a fresh dist

* studio/frontend: drop dynamic auth/api + auth/session imports in sidebar

Log out's onSelect dynamically imported logout from "@/features/auth/api"
and clearAuthTokens from "@/features/auth/session". Both modules were
already statically imported via "@/features/auth" elsewhere in the app,
so rolldown split auth/session into its own chunk and the main bundle
then re-imported back from that chunk to reach the zustand-backed
usePlatformStore. The resulting circular dependency left session.js's
'create' binding undefined at module init, throwing
'TypeError: t is not a function' from var usePlatformStore=create<...>
on /login, /change-password, and any route that touches the platform
store before the main bundle finished evaluating.

Static-import logout and clearAuthTokens from "@/features/auth" so
both are tree-shaken into the main bundle, eliminating the session
side-chunk and the cycle. Exported clearAuthTokens from auth/index.ts
since it was previously only reachable through the session.ts path
module.

Test:
  - npm run typecheck clean
  - npm run build no longer emits a session-*.js chunk
  - Local Playwright pre/post: /login, /change-password, /chat
    render with 0 page errors on the rebuilt dist
    (pre: 'TypeError: t is not a function' on every route)

* studio/frontend: decouple must_change_password from storeAuthTokens

CodeQL's js/clear-text-storage-of-sensitive-information rule traced
must_change_password through loginWithPassword() into
localStorage.setItem(AUTH_MUST_CHANGE_PASSWORD_KEY, ...) at
session.ts:46 and flagged the line as new high-severity. The flag is
a boolean derived from the same response payload as the access token,
so the data-flow analyser treated it as JWT-equivalent sensitivity.

Removed the third parameter from storeAuthTokens so it only writes
the two JWTs. Each caller (refreshSession, tauri-auto-auth, two
spots in auth-form) now calls setMustChangePassword(...) explicitly
with the boolean. The boolean is no longer reachable from a function
whose name CodeQL treats as a password sink.

Test:
  - npm run typecheck clean
  - npm run build produces no session-*.js side-chunk
  - Local Playwright over /login, /change-password, /chat: 0 page
    errors (parity with the previous fix)

* studio/frontend: suppress CodeQL clear-text-storage on must_change_password flag

CodeQL's js/clear-text-storage-of-sensitive-information rule traces
the must_change_password boolean back through loginWithPassword's
TokenResponse and flags any localStorage.setItem of that boolean as
sensitive-clear-text storage. The value is a status flag (route to
/change-password vs straight to /chat); it carries no credential
material. Decoupling setMustChangePassword from storeAuthTokens in
the previous commit only moved the alert one line over because the
analyser still recognises the source. Add the standard lgtm
suppression comment, with a brief rationale, on the .setItem call.

Test: npm run typecheck clean, npm run build still produces a fresh
dist with no session-*.js side-chunk.

* studio/frontend: encode must_change_password as key presence to silence CodeQL

setMustChangePassword wrote String(required) which is a derivative of
the boolean and which CodeQL's clear-text-storage analyser traces back
through loginWithPassword's TokenResponse, flagging the .setItem call
as sensitive-information storage. Switch the encoding so the stored
value is the literal string "1" when the flag is set, and the key is
removed when not. The reader switches from `=== "true"` to a
presence check (`!== null`).

This breaks the boolean's data flow into .setItem: the value argument
is now a constant string literal in the truthy branch and the falsy
branch issues .removeItem (no stored value to taint). The behaviour
contract is identical (the flag is present iff the user must change
their password).

Test: npm run typecheck clean, npm run build produces a fresh dist,
local Playwright probe over /login, /change-password, /chat: 0 page
errors on the rebuilt dist.

* studio/frontend: trim verbose comments in auth api + session

Compress singleflight + logoutGeneration paragraphs in api.ts from
~9 lines each to ~3. Same logic. Merge mustChangePassword /
setMustChangePassword's separate two-paragraph CodeQL rationales
into one shared comment above both functions.

Typecheck + build still clean.
… a synthetic CI test (unslothai#5376)

* tests/studio: end-to-end Windows GPU detection mock test (unslothai#5106)

Locks in the combined fix from unslothai#5322 + unslothai#5324 with a synthetic
Windows scenario that CI runners without GPUs can execute. The
test packs the real PyPI win_amd64 wheel layouts (cu12 modular and
the new unsuffixed cu13 nvidia/cu13/bin/x86_64 layout) plus the
exact filename set of the upstream b9103 cudart-llama-bin-win-cuda
bundles, then mocks nvidia-smi output and asserts that:

 * Studio's nvidia-smi probe parses the CSV and reports the GPU.
 * After PR unslothai#5322 the install_dir/build/bin/Release/ tree contains
   all three cudart bundle DLLs alongside llama-server.exe.
 * After PR unslothai#5324 the PATH built by start_llama_server's win32
   branch lists pip nvidia + torch/lib dirs in addition to the
   binary_dir.
 * cudart64_X.dll, cublas64_X.dll, and cublasLt64_X.dll are
   each reachable from at least one PATH entry, with cudart
   specifically reachable from BOTH the install dir and a pip
   nvidia dir (defence in depth).
 * Bare venvs without pip nvidia wheels still work via unslothai#5322's
   binary_dir drop; pre-unslothai#5322 installs still work via unslothai#5324's
   PATH augmentation.
 * A reconstructed pre-PR scenario (cudart absent from binary_dir
   and pip dirs not on PATH) leaves cudart unreachable, confirming
   the test would catch a future regression.

Bonus housekeeping in studio/install_llama_prebuilt.py: drop the
pointless f-prefix on the literal "llama-" in the
windows_cuda_attempts pairing guard (no behaviour change; lint
nit flagged in the post-merge review).

The mocks model real artifact contents I verified empirically:
 * pip download nvidia-cuda-runtime --platform win_amd64
   produces nvidia/cu13/bin/x86_64/cudart64_13.dll.
 * unzip on the b9103 cudart-llama-bin-win-cuda-13.1-x64.zip
   produces exactly cudart64_13.dll + cublas64_13.dll +
   cublasLt64_13.dll, no executables.
 * objdump -p on the b9103 ggml-cuda.dll shows a static PE
   import on cublas64_13.dll (the root cause of unslothai#5106 when
   cublas64_13.dll is unreachable).

Refs unslothai#5106 unslothai#5322 unslothai#5324

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test_5106_windows_gpu_detection_mock: don't shadow real httpx

This file's name sorts before every other file in studio/backend/tests/
(starts with the digit '5'), so pytest collects it first. The previous
``sys.modules.setdefault("httpx", _httpx_stub)`` ran before any other
test imported real httpx, which meant the stub permanently shadowed
the real module for the rest of the collection. Tests that did
``from httpx import HTTPError, Response`` (test_anthropic_messages,
test_browse_folders_route, test_training_*, etc) then failed at
collection with ``ImportError: cannot import name 'HTTPError'``
because the stub did not define those names. The existing
test_llama_cpp_windows_nvidia_path.py did not trigger the same issue
because it sorts after test_a* / test_b* / etc, by which point the
real httpx has already been imported and setdefault is a no-op.

Switch the stub installation to ``importlib.util.find_spec(name) is
None`` so we only fall back to the stub when the real module truly is
not installed. Backend CI installs httpx, structlog, and the
studio/backend/loggers package is reachable via the sys.path
augmentation a few lines above, so on CI all three find_spec calls
succeed and no stubs are installed at all.

Also add HTTPError and Response to the stub module for the offline
case, so anyone running this test outside CI with httpx absent still
gets a stub that satisfies the broader test suite's imports.

Refs unslothai#5106

* test_5106 + llama_cpp: extract win32 PATH helper and harden the regression test

Follow-up to PR unslothai#5376's review feedback. Three real findings from the
bot reviewers, plus one stale one.

1. (codex P2 line 201, gemini medium line 209) The regression test's
   _build_path_dirs_like_start_llama_server hand-copied the win32
   branch of LlamaCppBackend.start_llama_server, so a future drop or
   reorder of _windows_pip_nvidia_dll_dirs(sys.prefix) in production
   would have passed the test silently.

   Extract a new staticmethod LlamaCppBackend._build_windows_path_dirs
   (binary_dir, prefix, cuda_path). Production start_llama_server now
   calls this helper. The test's wrapper is reduced to a one-line
   delegate that forwards to the staticmethod, so the regression
   asserts against the exact production logic instead of a parallel
   copy of it.

2. (codex P2 line 245) test_nvidia_smi_probe_reports_synthetic_gpu did
   not clear CUDA_VISIBLE_DEVICES. On a shared GPU runner with the
   variable set in the parent shell, _get_gpu_free_memory() filters
   the mocked CSV and returns [] or falls through to the torch
   fallback. Cleared CUDA_VISIBLE_DEVICES and NVIDIA_VISIBLE_DEVICES
   via monkeypatch.delenv(..., raising=False).

3. (codex P2 line 66) _maybe_stub gated on importlib.util.find_spec
   ("loggers"), which returns a spec because studio/backend/loggers/
   is on sys.path. But the actual import chain loads
   loggers/handlers.py which does `from fastapi import Request,
   Response` at module load. In a lightweight env without fastapi
   installed, the stub never lands and `from core.inference.llama_cpp
   import LlamaCppBackend` raises during collection. Switched
   _maybe_stub to a real import attempt under try / except ImportError
   so the stub falls into place when the package is discoverable but
   not importable. CI has fastapi so this is purely a developer-
   machine ergonomics fix.
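
The import-attempt gate from item 3 might look like the sketch below. The function name and keyword-argument interface are assumptions for illustration; the real _maybe_stub may be shaped differently.

```python
import importlib
import sys
import types


def maybe_stub(name, **attrs):
    """Install a stub module only when `name` cannot actually be imported.

    Returns True when a stub was installed, False when the real module loaded.
    """
    try:
        # A real import also catches packages that are discoverable
        # (find_spec succeeds) but fail while loading their own imports.
        importlib.import_module(name)
        return False
    except ImportError:
        stub = types.ModuleType(name)
        for key, value in attrs.items():
            setattr(stub, key, value)
        sys.modules[name] = stub
        return True
```

Unlike sys.modules.setdefault, this never shadows a module that is installed and importable, regardless of test collection order.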

The fourth comment (codex P1 line 85 "Keep the httpx stub from leaking
across tests") was already addressed by 7437e73, which replaced the
unconditional sys.modules.setdefault with the find_spec-gated
_maybe_stub. No code change needed.

Production behaviour is unchanged: _build_windows_path_dirs returns
exactly the same ordering start_llama_server used inline
([binary_dir, *pip_dirs, cuda_bin?, cuda_bin_x64?]).
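
A rough sketch of that ordering, for readers who want to see it spelled out. It is not the production helper: it takes the pip directories as an argument instead of deriving them from the prefix, and the bin / bin/x64 layout under cuda_path is an assumption.

```python
import os


def build_windows_path_dirs(binary_dir, pip_dirs, cuda_path=None):
    """Return PATH entries in the order [binary_dir, *pip_dirs, cuda dirs]."""
    dirs = [binary_dir, *pip_dirs]
    if cuda_path:
        # Assumed layout: <cuda_path>/bin and <cuda_path>/bin/x64,
        # each appended only when the directory exists.
        for candidate in (os.path.join(cuda_path, "bin"),
                          os.path.join(cuda_path, "bin", "x64")):
            if os.path.isdir(candidate):
                dirs.append(candidate)
    return dirs
```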

Verification (run inside studio/backend):
  pytest tests/test_5106_windows_gpu_detection_mock.py -v
    -> 10 passed
  pytest tests/test_llama_cpp_*.py tests/test_llama_server_args.py
       tests/test_5106_windows_gpu_detection_mock.py -q
    -> 171 passed
  CUDA_VISIBLE_DEVICES=1 pytest tests/test_5106_windows_gpu_detection_mock.py::TestWindowsGpuDetectionAfter5106Fix::test_nvidia_smi_probe_reports_synthetic_gpu
    -> 1 passed

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Rename Windows GPU detection test to a generic filename and trim comments

- studio/backend/tests/test_5106_windows_gpu_detection_mock.py
  -> studio/backend/tests/test_windows_gpu_detection_mock.py
  The file is the generic regression suite for Windows GPU detection;
  encoding the issue number in the filename is noise.
- Shorten module docstring, helper docstrings, per-test docstrings and
  inline comments in the renamed test file. No behaviour change,
  all 10 cases still pass.
- Shorten the _build_windows_path_dirs docstring in
  studio/backend/core/inference/llama_cpp.py and update the test-path
  reference; trim the win32 call-site comment to one line.

Local verification:
- pytest studio/backend/tests/test_windows_gpu_detection_mock.py -- 10 passed.
- pytest studio/backend/tests/test_llama_cpp_windows_nvidia_path.py
  studio/backend/tests/test_llama_server_args.py
  studio/backend/tests/test_windows_gpu_detection_mock.py -- 110 passed.

* Studio: harden _wait_for_health against transient httpx ReadError

The probe loop in LlamaCppBackend._wait_for_health only caught
ConnectError and TimeoutException. On Windows, when llama-server.exe
accepts the TCP probe and then dies before sending HTTP headers, the
socket is closed with an RST from the peer. httpx maps this to ReadError
("WinError 10054 -- An existing connection was forcibly closed by the
remote host"), which fell through the except clause and bubbled out of
_wait_for_health, the routes/inference.py load_model handler, and back
to /api/inference/load as an opaque 500.

The crash diagnostic Studio actually wants to surface lives on the
self._process.poll() branch at the top of the loop body: "llama-server
exited with code X. Output: ...". We never reached that branch on the
WinError 10054 path because the very first probe blew up.

Expand the except to also swallow ReadError and RemoteProtocolError so
the next 0.5-second iteration runs the poll() branch. Outcomes:
  * Process really died: structured exit-code + last-stdout log line.
  * Single transient probe blip: silently retried; load succeeds.

Adds studio/backend/tests/test_llama_cpp_wait_for_health.py with five
cases covering happy-path 200, transient ReadError + dead process,
RemoteProtocolError + dead process, ConnectError cycling until success,
and dead process before the first probe. The new cases would have
failed against the old except clause -- ReadError / RemoteProtocolError
would have propagated instead of returning False.
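
The loop shape can be sketched without httpx by injecting the probe and the set of transient errors. Everything here is illustrative: in Studio the transient tuple would hold the httpx exception classes named above, and the real method reads self._process.poll() rather than a callback.

```python
import time


def wait_for_health(probe, process_exited, transient_errors,
                    timeout=5.0, interval=0.5, sleep=time.sleep):
    """Poll until healthy, the process dies, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Check for a dead process first so its exit code is what gets reported.
        if process_exited() is not None:
            return False
        try:
            if probe():
                return True
        except transient_errors:
            # Swallow the blip; the next iteration re-runs the exit check.
            pass
        sleep(interval)
    return False
```

Because the exception is swallowed, a reset caused by a crashed server is reported through the exit-code branch on the next pass instead of surfacing as an opaque error.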

Found while triaging the Windows Studio GGUF CI flake on this PR's
5a6ddc3 push: llama-server.exe (b9203 prebuilt) crashed within 2.2 s of
launch on the GPU-less runner, and Studio reported "WinError 10054"
instead of an upstream-tag-attributable exit-code line.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: danielhanchen <michaelhan2050@gmail.com>
…#5527)

* Studio: auto-enable MTP speculative decoding for MTP GGUFs

Detect Unsloth's MTP (multi-token-prediction) GGUFs and auto-emit the
right --spec-type draft-mtp flags for llama-server (llama.cpp PR
#22673), so users get the speedup without configuration.

Detection prefers the GGUF metadata field <arch>.nextn_predict_layers
(verified on Qwen3.6-27B-MTP-GGUF / qwen35 and Qwen3.6-35B-A3B-MTP-GGUF
/ qwen35moe). Falls back to a -MTP marker in the identifier / filename
so HF-mode loads can detect MTP from the repo name before the GGUF is
downloaded.
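
As a sketch of that detection order, assuming the metadata has already been read into a dict. The function name, the positive-count check, and the case-insensitive marker match are assumptions; general.architecture is the standard GGUF key for the architecture name.

```python
def is_mtp_gguf(metadata, identifier):
    """Prefer GGUF metadata; fall back to a -MTP marker in the name."""
    if metadata:
        arch = metadata.get("general.architecture")
        layers = metadata.get(f"{arch}.nextn_predict_layers") if arch else None
        if layers is not None:
            # Assumption: a positive layer count means MTP heads are present.
            return int(layers) > 0
    # No metadata yet (e.g. HF repo not downloaded): use the name marker.
    return "-mtp" in identifier.lower()
```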

Flag presets follow the Unsloth MTP guide:
  GPU:     --spec-type draft-mtp --spec-draft-n-max 6
  CPU/Mac: --spec-type draft-mtp --spec-draft-n-max 3 \
           --spec-type ngram-mod --spec-ngram-mod-n-match 24 \
           --spec-ngram-mod-n-min 48 --spec-ngram-mod-n-max 6

User overrides win: if the caller passes --spec-type / --spec-default
via unsloth run / unsloth studio run pass-through (or HTTP
llama_extra_args), the auto-emit steps aside so llama-server only sees
the user's flag. Scalar tuning knobs like --spec-draft-n-max compose
with the auto preset via llama-server's last-wins parsing.
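
The preset-plus-override rule could be sketched like this. The flag values are the ones listed above; the function name and the handling of the --flag=value form are assumptions.

```python
GPU_PRESET = ["--spec-type", "draft-mtp", "--spec-draft-n-max", "6"]
CPU_PRESET = [
    "--spec-type", "draft-mtp", "--spec-draft-n-max", "3",
    "--spec-type", "ngram-mod", "--spec-ngram-mod-n-match", "24",
    "--spec-ngram-mod-n-min", "48", "--spec-ngram-mod-n-max", "6",
]
USER_OVERRIDE_FLAGS = {"--spec-type", "--spec-default"}


def mtp_spec_args(on_gpu, user_args):
    """Return the llama-server spec flags for an MTP GGUF."""
    # If the user chose a spec mode, the auto preset steps aside entirely.
    if any(arg.split("=", 1)[0] in USER_OVERRIDE_FLAGS for arg in user_args):
        return list(user_args)
    preset = GPU_PRESET if on_gpu else CPU_PRESET
    # User scalar knobs go last so last-wins parsing lets them override.
    return [*preset, *user_args]
```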

_already_in_target_state mirrors the same promotion so a repeat /load
with unchanged settings against an MTP backend running draft-mtp
short-circuits cleanly instead of forcing a reload.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Studio: warn when llama.cpp prebuilt is too old for MTP

Layered on unslothai#5527. Adds a one-shot llama-server --help capability probe
so users get a clear signal when their prebuilt is missing MTP support,
plus a graceful fallback if they load an MTP GGUF against an outdated
binary.

What's surfaced:

1. Startup log + stderr line in main.py:lifespan() if MTP isn't
   advertised:
     WARNING: llama.cpp prebuilt is missing MTP support
     (--spec-type mtp / draft-mtp). Run `unsloth studio update` to
     refresh it. MTP GGUFs will load without speculative decoding.
2. Load-time graceful fallback in load_model's spec block: skip the
   auto-emit and log a clear warning instead of letting llama-server
   fail with an unknown-flag error.
3. /api/inference/status now returns llama_cpp_supports_mtp: bool so
   the frontend can show a banner / popup.

Probe internals:

- Class-level cache keyed on (binary_path, mtime). One subprocess call
  the first time, instant thereafter. Touching the binary (e.g. via
  `unsloth studio update`) invalidates the cache automatically because
  the mtime changes, so the new build is picked up without restarting
  the server.
- Recognises both upstream naming forms: the original draft-mtp from
  llama.cpp PR #22673 and the renamed mtp variant in later commits.
- Spec block uses whichever token the binary accepts so we emit the
  right value regardless of which release the user has.

Tests:

- 6 new cases in test_llama_cpp_mtp_detection.py covering each probe
  variant (draft-mtp, renamed mtp, pre-MTP build, missing binary,
  mtime-based cache invalidation).
- Existing 38 MTP detection cases still pass; broader 188-test
  regression suite (server args, reload inheritance, gguf metadata,
  load progress, context fit, model validation) still green.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…thai#5529)

* Studio: warn when llama.cpp prebuilt is at least 3 days behind

Layered on unslothai#5528. Generalises the MTP-specific staleness warning to
every llama.cpp prebuilt update, not just the ones that add MTP. If
the installed prebuilt is at least 3 days old AND its tag differs
from the latest published tag on the helper release repo (default
unslothai/llama.cpp), Studio nudges the user to run
"unsloth studio update".

How it works

Reads the install marker UNSLOTH_PREBUILT_INFO.json that
install_llama_prebuilt.py already writes to install_dir. The marker
carries the installed tag, the helper repo, and an installed_at_utc
timestamp. Studio compares those against the latest published tag
from the GitHub releases API for the helper repo.

GitHub fetch is cached at two levels:
- Process-level memo for /status hot path.
- Disk-level cache (24h TTL) at ~/.unsloth/studio/cache/llama_cpp_freshness/
  so cold-start Studio launches do not always hit the API.

On a transient fetch failure (offline, rate-limited) we keep the
last-good disk value alive rather than poisoning the cache with None.
The check fails open: if anything is missing (marker, timestamp,
GitHub response), stale stays False so users never see a misleading
banner.
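
The fail-open decision can be sketched as below. The marker key "tag" and the function name are assumptions; installed_at_utc and the threshold_days kwarg are named in the commit message.

```python
from datetime import datetime, timedelta, timezone


def is_prebuilt_stale(marker, latest_tag, now=None, threshold_days=3):
    """True only when the tag differs AND the install is old enough."""
    try:
        installed_tag = marker["tag"]  # assumed key name
        installed_at = datetime.fromisoformat(marker["installed_at_utc"])
    except (KeyError, TypeError, ValueError):
        return False  # missing or unparseable marker: fail open
    if not installed_tag or not latest_tag:
        return False  # no GitHub answer: fail open
    if installed_at.tzinfo is None:
        installed_at = installed_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    age = now - installed_at
    return installed_tag != latest_tag and age >= timedelta(days=threshold_days)
```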

Surfaced in two places

1. Startup banner (logs + stderr) in main.py:lifespan(), alongside the
   MTP capability probe added in unslothai#5528. Single line, e.g.:
     WARNING: llama.cpp prebuilt is 5 days behind: installed b9190,
     latest b9300. Run "unsloth studio update" to refresh.

2. /api/inference/status now returns:
     llama_cpp_prebuilt_stale: bool
     llama_cpp_installed_tag:  str | None
     llama_cpp_latest_tag:     str | None
   so the frontend can render a banner / popup with the actual tag
   delta the user is missing.

3-day threshold

Mirrors the typical Unsloth llama.cpp release cadence. Anything
shorter would nag users who restart Studio at the wrong moment;
longer leaves real bugs sitting on the user's machine. Configurable
via the threshold_days kwarg if a future call site wants a different
window.

Tests

17 new cases in tests/test_llama_cpp_freshness.py cover marker
discovery in both cmake and root install layouts, missing / invalid
marker, GitHub fetch caching across process restarts (disk cache hit
after the in-memory cache is reset), the stale / not-stale decision
matrix (tag mismatch + age threshold), fail-open behaviour when
GitHub is unreachable, custom threshold, singular/plural day in the
warning string, and unparseable installed_at_utc. The broader
205-test inference regression suite still passes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…unslothai#5512)

* studio: extend offline DNS auto-detect to inference parent + training

unslothai#5505 fixed the GGUF/llama-server load path. Studio still has two
adjacent code paths that burn ~30-60s of soft-failed timeouts before
the worker subprocess starts when DNS to huggingface.co is dead and
the model is already in the local HF cache.

Inference parent process (routes/inference.py:load_model):

* ModelConfig.from_identifier now runs inside _hf_offline_if_dns_dead
  so the LoRA-detect hf_model_info call and the urllib config probes
  in utils/transformers_version.py short-circuit when DNS is dead.
* utils/models/model_config.py: extracted the inline HF_HUB_OFFLINE/
  TRANSFORMERS_OFFLINE check used by list_gguf_variants and
  detect_gguf_model_remote into a shared _env_offline() helper, then
  reused it to gate the LoRA-detect hf_model_info call.
* utils/transformers_version.py: _check_tokenizer_config_needs_v5 and
  _check_config_needs_550 now early-return False when offline instead
  of issuing a 10s urllib.urlopen against huggingface.co/raw/main.
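
A minimal sketch of such a shared helper. The set of truthy strings is an assumption; the two variable names come from the commit message.

```python
import os

_TRUTHY = {"1", "true", "yes", "on"}  # assumed truthy spellings


def env_offline(environ=None):
    """True when either HF offline variable is set to a truthy value."""
    env = os.environ if environ is None else environ
    return any(
        env.get(name, "").strip().lower() in _TRUTHY
        for name in ("HF_HUB_OFFLINE", "TRANSFORMERS_OFFLINE")
    )
```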

Training worker (core/training/worker.py:run_training_process):

* Add the same 2s DNS probe used by core/inference/worker.py at the
  top of the training subprocess. On failure, set HF_HUB_OFFLINE,
  TRANSFORMERS_OFFLINE, and HF_DATASETS_OFFLINE before the rest of
  the subprocess imports torch/transformers/unsloth, so every
  from_pretrained, snapshot_download, and load_dataset call below
  resolves from cache. Scope is per-subprocess; the orchestrator
  always spawns a fresh worker per training run.
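
One way to express a bounded DNS probe that flips the three offline variables, shown as a sketch. The function names and the thread-based timeout are assumptions; the 2-second budget and the variable names come from the text above.

```python
import os
import socket
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

OFFLINE_VARS = ("HF_HUB_OFFLINE", "TRANSFORMERS_OFFLINE", "HF_DATASETS_OFFLINE")


def dns_alive(host="huggingface.co", timeout=2.0, resolve=socket.gethostbyname):
    """Resolve `host` in a worker thread so a hung resolver cannot block."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(resolve, host)
        try:
            future.result(timeout=timeout)
            return True
        except (FutureTimeout, OSError):
            # socket.gaierror is an OSError subclass.
            return False
    finally:
        pool.shutdown(wait=False)


def go_offline_if_dns_dead(environ=os.environ, probe=dns_alive):
    """Set the offline variables before heavy imports when DNS is dead."""
    if probe():
        return False
    for name in OFFLINE_VARS:
        environ[name] = "1"
    return True
```

The call has to run before torch, transformers, or unsloth are imported, since those libraries read the variables at import time.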

Training trainer (core/training/trainer.py:load_model):

* Skip the proactive hf_model_info gated-repo probe when _env_offline()
  is true. The API is unreachable anyway, and a gated model that is
  already cached is exactly the scenario the user is trying to train
  against. from_pretrained surfaces the real error if access is
  actually denied.

Tests (tests/test_offline_inference_parent.py, 7 new cases):

* _env_offline truthy/falsy parsing across HF_HUB_OFFLINE and
  TRANSFORMERS_OFFLINE.
* transformers_version urllib short-circuit when offline.
* LoRA detect hf_model_info skip when offline.

Existing tests/test_offline_gguf_cache_fallback.py still passes
(26 cases) because the inline env check was extracted, not changed.

* tests: prefer real httpx over stub in offline-test files

The studio test stub convention only included the 6 httpx exception
names that existing callers needed. Newer huggingface_hub (1.15+)
imports HTTPError, Response, Request, HTTPStatusError, AsyncClient,
and more at module import time. When httpx is truly absent the stub
chase becomes a treadmill.

Use the real package when installed (the CI install list already
includes httpx, so this is the production environment). Fall back to
the stub only when httpx is genuinely missing.

No code under test changes.

* studio: detect cached LoRA adapters offline; tighten test

Two follow-ups from the review pass on unslothai#5512:

* ModelConfig.from_identifier no longer skips the remote LoRA-detect
  hf_model_info call when _env_offline() is true. huggingface_hub
  short-circuits the call via OfflineModeIsEnabled in ~0ms when
  HF_HUB_OFFLINE is set, so the original 25s concern was moot once
  routes/inference.py wrapped the call in _hf_offline_if_dns_dead.
  Skipping the API meant users with a cached LoRA adapter
  (adapter_config.json on disk) got is_lora=False and the load
  failed. After the API call (which raises fast offline) a new
  cache-fallback walks the HF cache snapshot for adapter_config.json
  via the existing _iter_hf_cache_snapshots helper.

* test_hf_model_info_not_called_when_offline replaced. The old test
  raised AssertionError inside production code that catches Exception,
  so it passed even if the call happened. New tests use MagicMock and
  assert call_count >= 1, plus a fixture that stages a fake HF cache
  with adapter_config.json to verify the offline cache detection.
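
The cache fallback can be sketched as below, assuming the standard Hugging Face hub layout (`models--<org>--<name>/snapshots/<revision>/`). `cached_adapter_config` is a hypothetical stand-in, not the `_iter_hf_cache_snapshots` helper itself.

```python
from pathlib import Path


def cached_adapter_config(repo_id, cache_dir):
    """Return the first cached adapter_config.json for `repo_id`, else None."""
    repo_dir = Path(cache_dir) / ("models--" + repo_id.replace("/", "--"))
    snapshots = repo_dir / "snapshots"
    if not snapshots.is_dir():
        return None
    # Each snapshot directory is one cached revision of the repo.
    for snapshot in sorted(snapshots.iterdir()):
        candidate = snapshot / "adapter_config.json"
        if candidate.is_file():
            return candidate
    return None
```

A non-None result is what would let the loader set is_lora=True without reaching the API.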

Test count goes from 7 to 8 in test_offline_inference_parent.py.
Combined with test_offline_gguf_cache_fallback.py: 34 pass in 9.75s.

* Fix/adjust offline training DNS probe per PR unslothai#5505 review

Same fix as unslothai#5505's _probe_dns_dead refactor: run gethostbyname on a
daemon thread with join timeout so concurrent sockets in the parent
interpreter never inherit a process-wide socket.setdefaulttimeout
mutation. Adds a static-pin regression test that the inference parent
file does not regress on this.
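
A sketch of the thread-based probe, under the assumption that a lookup still running at the deadline counts as dead DNS. The `resolver` parameter is added here only so the example can be exercised without a network; the real `_probe_dns_dead` signature may differ.

```python
import socket
import threading


def probe_dns_dead(host="huggingface.co", timeout=2.0, resolver=socket.gethostbyname):
    """Return True when `host` fails to resolve within `timeout` seconds."""
    result = {}

    def _resolve():
        try:
            resolver(host)
            result["ok"] = True
        except OSError:  # socket.gaierror is a subclass of OSError
            result["ok"] = False

    # The timeout lives on join(), so socket.setdefaulttimeout is never touched.
    worker = threading.Thread(target=_resolve, daemon=True)
    worker.start()
    worker.join(timeout)
    return not result.get("ok", False)
```

Because the thread is a daemon, a hung lookup cannot keep the interpreter alive at exit.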

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Trim verbose code comments per review feedback

Shorten the longer explanatory comments added by this PR while keeping
the WHY of each non-obvious branch:

- trainer.py: collapse the 5-line proactive gated-check comment.
- training/worker.py: trim the offline auto-detect preamble and the
  "logger isn't configured" note.
- routes/inference.py: shorten the DNS-probe wrap rationale.
- transformers_version.py: collapse the two urllib short-circuit notes.
- model_config.py: shorten the LoRA detect + cache-fallback notes.
- tests/test_offline_inference_parent.py: tighter module docstring,
  trim class docstrings, drop multi-line explainer comments inside the
  tests; behaviour and coverage unchanged (9/9 tests still pass).

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Fix ORPO text tokenization with processors

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Guard ORPO tokenizer rewrite anchor

* Resolve processor pad_token_id and preserve preference data collators for ORPO

Two follow-ups so the text-only ORPO + VL processor path works end to end on
top of the build_tokenized_answer and tokenize_row rewrites:

1. Add orpo_trainer_processor_pad_token to rewrite processing_class.pad_token_id
   in ORPOTrainer.__init__ to fall back to processing_class.tokenizer.pad_token_id
   when the processor itself has no pad_token_id (Qwen3-VL, Gemma-3, etc.).
   Without this, DPODataCollatorWithPadding(pad_token_id=processing_class.pad_token_id)
   raises AttributeError before training starts.

2. Stop the outer UnslothORPOTrainer.__init__ collator-swap from clobbering
   DPODataCollatorWithPadding when the tokenizer is a processor without .pad.
   The swap to TransformersDataCollatorForLanguageModeling is now only applied
   to LM-style collators, so ORPO/DPO/CPO/KTO keep their own prompt/chosen/
   rejected handling. Otherwise the collator can't pad ORPO rows and raises
   "You should supply an encoding ... that includes input_ids" at train time.

Verified with Qwen3-VL-2B-Instruct ORPO + text-only data (training completes
to max_steps, no AttributeError, no collator error) and Llama-3.2-1B-Instruct
ORPO (losses and grad-norms bit-exact identical to main, so the change is a
true no-op for plain text tokenizers).

Extends tests/python/test_orpo_processor_text_tokenizer.py with three new
unit tests covering the pad_token_id rewriter.
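
The fallback described in item 1 amounts to something like the sketch below. `resolve_pad_token_id` and the two stand-in classes are hypothetical; the actual change is a source rewrite of ORPOTrainer.__init__, not a helper call.

```python
def resolve_pad_token_id(processing_class):
    """Prefer the processor's own pad_token_id, else its inner tokenizer's."""
    pad_id = getattr(processing_class, "pad_token_id", None)
    if pad_id is None:
        inner = getattr(processing_class, "tokenizer", None)
        pad_id = getattr(inner, "pad_token_id", None)
    return pad_id


class FakeTokenizer:
    """Stand-in for a plain text tokenizer."""
    pad_token_id = 0


class FakeProcessor:
    """Stand-in for a VL processor that exposes no pad_token_id of its own."""
    tokenizer = FakeTokenizer()
```

For a plain tokenizer the first lookup succeeds, which is consistent with the change being a no-op on text-only models.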

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Wasim Yousef Said <wasimysdev@gmail.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
… Ubuntu 24.04 (unslothai#5517)

* fix(studio/worker): inject --gcc-install-dir for HIP source builds on Ubuntu 24.04

On Ubuntu 24.04 + ROCm clang-20, the HIP source-build fallback in
`_install_package_wheel_first` (causal-conv1d, mamba-ssm source fallback,
flash-attn source fallback) dies at:

  /opt/rocm-X.Y/lib/llvm/lib/clang/20/include/__clang_hip_runtime_wrapper.h:112:10:
    fatal error: 'cstdlib' file not found

Root cause: clang-20 picks the highest-numbered /usr/lib/gcc/x86_64-linux-gnu/<N>
runtime dir by default. On 24.04 that's gcc-14, whose runtime objects ship in
the gcc-14 package but whose C++ headers (/usr/include/c++/14) come from
libstdc++-14-dev — NOT in the default apt set. libstdc++-13-dev IS in the
default set, so /usr/include/c++/13 exists. clang has no way to discover
that asymmetry and the build fails.

Fix: new `_hipcc_gcc_install_dir()` helper iterates gcc 14 → 11 and returns
the first /usr/lib/gcc/x86_64-linux-gnu/<N> dir where BOTH the runtime AND
/usr/include/c++/<N> exist. The HIP branch of `_install_package_wheel_first`
appends `--gcc-install-dir=<that path>` to HIPCC_COMPILE_FLAGS_APPEND before
invoking pip. Respects an existing `--gcc-install-dir` in the env var
(user-set takes precedence); preserves any other flags the user has set
(appends to the end rather than overwriting). No-op on non-HIP, non-Linux,
non-x86_64.
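
A sketch of the selection and append logic described above. The function names, the injectable root parameters, and the substring test for an existing flag are assumptions, not the helper's actual code.

```python
import os
import platform
import sys


def hipcc_gcc_install_dir(gcc_root="/usr/lib/gcc/x86_64-linux-gnu",
                          include_root="/usr/include/c++",
                          versions=(14, 13, 12, 11)):
    """Return the newest GCC dir that has both runtime and C++ headers."""
    if not sys.platform.startswith("linux") or platform.machine() != "x86_64":
        return None
    for version in versions:
        runtime = os.path.join(gcc_root, str(version))
        headers = os.path.join(include_root, str(version))
        if os.path.isdir(runtime) and os.path.isdir(headers):
            return runtime
    return None


def with_gcc_install_dir(env, install_dir):
    """Append --gcc-install-dir unless the user already supplied one."""
    env = dict(env)
    flags = env.get("HIPCC_COMPILE_FLAGS_APPEND", "")
    if install_dir and "--gcc-install-dir" not in flags:
        appended = flags + " --gcc-install-dir=" + install_dir
        env["HIPCC_COMPILE_FLAGS_APPEND"] = appended.strip()
    return env
```

On a stock Ubuntu 24.04 host, version 14 fails the header check and 13 is returned, matching the verified result below.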

Mirrors the same fix bbf004c added to studio/setup.sh for the llama.cpp HIP
build branch (unslothai#5301), but via env var since pip-driven source builds can't
take CMake flags directly.

Verified on Ryzen AI MAX+ 395 / Radeon 8060S (gfx1151) / Ubuntu 24.04 /
ROCm 7.13 nightly: `_hipcc_gcc_install_dir()` returns
`/usr/lib/gcc/x86_64-linux-gnu/13`, which matches the manual workaround
that already lets `pip install causal-conv1d` succeed on this hardware.

Tests added (9 new in test_training_worker_flash_attn.py):
- test_hipcc_gcc_install_dir_picks_highest_with_headers
- test_hipcc_gcc_install_dir_picks_14_when_headers_exist
- test_hipcc_gcc_install_dir_returns_none_when_no_match
- test_hipcc_gcc_install_dir_returns_none_on_non_linux
- test_hipcc_gcc_install_dir_returns_none_on_non_x86_64
- test_install_injects_gcc_install_dir_on_hip_source_build
- test_install_appends_to_existing_hipcc_compile_flags
- test_install_respects_user_gcc_install_dir
- test_install_does_not_inject_env_on_cuda

Per @danielhanchen's suggestion in
unslothai#5434 (comment)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* review: apply gemini-code-assist suggestion on _run_kwargs env handling

Use _run_kwargs.get("env", os.environ).copy() + key-mutation instead of
rebuilding env from os.environ directly. Today both forms are equivalent
(no earlier code in _install_package_wheel_first sets _run_kwargs["env"]),
but the .get().copy() pattern survives any future env modification added
upstream of this block without silently throwing it away.

No behavioural change; tests already assert the final HIPCC_COMPILE_FLAGS_APPEND
value, not the env-construction pattern.
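
The pattern in isolation, as a sketch (`build_env` is a hypothetical wrapper; only the `.get("env", os.environ).copy()` expression comes from the change):

```python
import os


def build_env(run_kwargs, key, value):
    """Copy any env already staged in run_kwargs, then set one key on the copy."""
    env = run_kwargs.get("env", os.environ).copy()
    env[key] = value
    return {**run_kwargs, "env": env}
```

An env set by earlier code survives, and `os.environ` itself is never mutated.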

Per unslothai#5517 (comment)... (gemini-code-assist[bot])

---------

Co-authored-by: h34v3nzc0dex <h34v3nzc0dex@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
* Studio: gate image input on a usable mmproj for GGUF vision models

* Improve image gating and model capability sync

Tighten image-handling and model capability syncing across the chat flow. Key changes:

- chat-adapter: Replace per-message current-user image check with a simpler gate that blocks if ANY image is present in the outbound payload when the selected model cannot handle vision. Show the toast reason and flip the per-thread running flag on→off to avoid hanging wait promises before throwing.

- shared-composer: Simplify and correct image-attachment gating for single vs compare modes. Use an attach-time gate that defers to send/ensureModelLoaded in compare mode, introduce attachUnavailableReason, and only block immediately for single-mode. Remove an unused models selector.

- shared-composer: Sync the runtime models[] entry with the response from ensureModelLoaded so UI/send gates read fresh capabilities (isVision, isGguf, isAudio, audioType, hasAudioInput). This addresses catalog lag (e.g., GGUF mmproj arriving after the catalog snapshot).

- UX tweak: the file-picker button no longer outright blocks on image availability; addFiles still filters images per-file and toasts appropriately.

These changes prevent mid-stream server rejections, avoid deadlocks, and ensure model capability checks are accurate when attaching images or audio.

* studio: only pass --mmproj to llama-server when effective_is_vision

When a text-only GGUF (static is_vision=False) was paired with a
family-matching mmproj path, the launcher appended both --mmproj and
--spec-default, leaving llama-server in an inconsistent state while
Studio reported is_vision=False. Gate the --mmproj flag on
effective_is_vision so the launch command tracks the runtime
capability the rest of Studio sees.
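
In sketch form, with a hypothetical `llama_server_args` builder standing in for the real launcher and the `--spec-default` handling left out:

```python
def llama_server_args(model_path, mmproj_path=None, effective_is_vision=False):
    """Build a launch command; --mmproj is added only for vision-capable loads."""
    args = ["llama-server", "--model", model_path]
    if mmproj_path and effective_is_vision:
        args += ["--mmproj", mmproj_path]
    return args
```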

* studio: reject image content in streaming /v1/responses for non-vision GGUF

_responses_stream forwards the OpenAI request body directly to
llama-server's /v1/chat/completions, bypassing the image-vs-vision
guard that openai_chat_completions enforces for the wrapped path.
Add the same check at the top of the streaming entry point so an
SDK client that posts an image to a non-vision GGUF receives a
typed 400 instead of an opaque downstream error.
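
A sketch of the kind of check involved, assuming OpenAI-style messages whose `content` may be a list of typed parts. The function names and the set of part types are illustrative, not the route's actual code.

```python
# Part types assumed to carry images in chat-completions and responses payloads.
IMAGE_PART_TYPES = {"image_url", "input_image"}


def has_image_content(messages):
    """Return True if any message carries an image content part."""
    for message in messages:
        content = message.get("content")
        if isinstance(content, list):
            for part in content:
                if isinstance(part, dict) and part.get("type") in IMAGE_PART_TYPES:
                    return True
    return False


def vision_guard_status(messages, is_vision):
    """Return 400 when images target a non-vision model, else 200."""
    if has_image_content(messages) and not is_vision:
        return 400
    return 200
```

Running the check before the body is forwarded is what turns the opaque downstream failure into a typed 400.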

* studio: gate external chat providers in the image input helper

External selections (cohere, deepseek, mistral, openrouter, ...) live
in externalProviders, not in runtime.models[], so activeModel is
undefined for them and the helper short-circuited to allow. Result:
images attached to a non-vision external chat model were dropped
silently downstream instead of rejected up front.

Add providerTypeSupportsVision to external-providers.ts (false for
known text-only providers, true for known vision-capable ones, null
for unknown / custom self-hosted) and thread externalSupportsVision
+ externalModelLabel through the helper. shared-composer.tsx,
runtime-provider.tsx (VisionImageAdapter.add), and chat-adapter.ts
pre-stream gate all resolve the provider type and pass it.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…othai#5496)

* studio/install: fix mac desktop shortcut spawning and lifecycle

The macOS .app generated by install.sh ships a shell-shim wrapper that
is unsigned and has no NSAppleEventsUsageDescription in its Info.plist,
so AppleEvents from the bundle are denied by TCC. The launcher's
`osascript ... tell application "Terminal" to do script ...` call
silently fails and the script falls back to the headless nohup branch,
where the user sees no Terminal window at all. Each click of the Desktop
shortcut then leaks an unattached server (no PID file, no cleanup) and
the launcher times out after 60s without ever opening a browser.

Replace the AppleScript spawn with a `.command` file + `open -a Terminal`.
Terminal handles `.command` natively through Launch Services, no
AppleEvents permission required, works with unsigned bundles.

The new design also decouples the studio server from the Terminal:

- Server is started via nohup, detached from any TTY. Warm relaunches
  (server still alive) hit the existing fast path: the launcher's
  `_find_healthy_port` returns the running port and the browser opens
  in ~80ms with no Terminal involvement.
- The `.command` file is a log viewer (`tail -F` of studio.log), not
  the server's parent. It also runs a watcher subshell that polls the
  server PID and kills `tail` when the server exits. This means
  clicking "Stop server" in the UI causes the Terminal window to drop
  to no-running-processes state, so the user can close the window
  without the "Do you want to terminate running processes" dialog.
- A trap on HUP/INT/TERM/EXIT in the `.command` file sends SIGTERM
  (then SIGKILL at +0.5s) to the server PID, so closing the Terminal
  window also stops Studio. Best of both worlds: fast warm relaunch
  AND "close terminal == quit Studio".

Also:

- Drop POLL_INTERVAL_SEC from 1 to 0.25. With Python studio startup
  at ~2s, the 1s poll added up to 1s of slack between server-ready
  and browser-open. 0.25s tightens cold-launch latency at no
  meaningful CPU cost.
- Refuse to install the `.app` bundle through a symlink. If a prior
  install (e.g. a --tauri build) left $HOME/Applications/Unsloth\ Studio.app
  as a symlink, mkdir -p follows it and writes the new bundle contents
  through to the target. Detect and rm the symlink before mkdir -p.

Test plan:
- Existing studio-mac-update-smoke.yml CI runs install.sh end-to-end
  on macos-14 and asserts /api/health returns healthy.
- Manual: click Desktop shortcut from cold state, Terminal opens with
  logs streaming, browser opens at ~2s. Re-click while Studio still
  running, browser opens in <200ms, no new Terminal. Click "Stop
  server" in the UI, Terminal closes cleanly with no prompt. Close
  Terminal via Cmd+W, server stops within 1s.

* studio/install: trim verbose comments in _spawn_terminal

* studio/install: harden trap quoting in generated .command

The trap bodies in the .command file were written with broken
quoting:

  trap "rm -f "$PID_FILE" 2>/dev/null" EXIT

Shell parses this as three concatenated tokens ("rm -f " + unquoted
$PID_FILE + " 2>/dev/null") then runs the trap. With paths that
contain spaces, the unquoted expansion word-splits and the rm
either no-ops or removes the wrong path. Default $HOME has no
spaces so the bug is latent, but it should be space-safe.

Switch both trap bodies to single-quoted form so $WATCHER_PID,
$TAIL_PID, and $PID_FILE expand at signal time inside properly
quoted positions. Shellcheck-clean on the generated .command.

* studio/install: exec studio in nohup wrapper so PID is the server

Without the explicit exec, `nohup sh -c "$_cmd"` runs `_cmd` as a
child of the wrapper shell. Whether sh exec-optimizes that single
command is shell-specific (macOS /bin/sh does, dash does, some bash
configurations do not). When the optimization does not fire, `$!`
records the wrapper PID rather than the studio PID, so:

- the watcher in the generated .command monitors the wrapper, not
  the actual studio process; closing the Terminal can leave studio
  running if the wrapper exits first
- SIGTERM from shutdown_studio goes to the wrapper rather than the
  server

Force the replacement with exec so the recorded PID is always the
studio process regardless of shell version.

Flagged by both gemini-code-assist and codex in PR review; verified
correct.

* Fix orphan-on-spawn-failure, graceful kill, and nested symlink for PR unslothai#5496

Three issues found while testing the new macOS spawn path:

1. _spawn_terminal returned 0 even when 'open -a Terminal' failed, so
   the nohup'd server was left orphaned with no Terminal owner. Wrap
   the .command write + chmod + open chain in 'if {...}; then return 0;
   fi', and on failure SIGTERM the orphan (with a 3s grace) before
   falling through to the generic terminal-spawn fallback.

2. The generated .command sent SIGKILL only 0.5s after SIGTERM, shorter
   than studio/backend/run.py's _graceful_shutdown windows (5s inference
   + 5s export). Wait up to 12s for the server to exit on its own.

3. The .app symlink guard only checked the top-level path. If a prior
   corrupted install left Unsloth Studio.app/Contents (or its MacOS or
   Resources children) as a symlink, mkdir -p still wrote through them.
   Check all four bundle paths, and refuse to continue if the bundle
   path exists as a regular file.
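
The timing change in item 2 can be illustrated in Python (the real logic is shell in the generated .command file; `stop_server` is a hypothetical name):

```python
import subprocess
import sys


def stop_server(proc, grace_seconds=12.0):
    """Terminate `proc`, escalating to kill only after the grace period."""
    proc.terminate()
    try:
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        # The server ignored the terminate request; force it down.
        proc.kill()
        proc.wait()
    return proc.returncode


if __name__ == "__main__":
    sleeper = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    print(stop_server(sleeper, grace_seconds=5.0))
```

The 12s default sits just above the 5s + 5s shutdown windows quoted above, so a healthy server should never reach the kill branch.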

---------

Co-authored-by: Daniel Han <info@unsloth.ai>
* studio: add uninstall.sh and document it in README

The current uninstall guidance in README.md is `rm -rf ~/.unsloth/studio`,
which leaves behind everything that lives outside that path:

  - ~/.local/share/unsloth/ (launcher script, studio.conf, studio.log,
    icon assets)
  - ~/Applications/Unsloth Studio.app (macOS bundle, orphaned and
    pointing nowhere on next reinstall)
  - ~/Desktop/Unsloth Studio (broken symlink after the bundle is gone)
  - ~/Desktop/unsloth-studio.desktop (Linux)
  - ~/.local/share/applications/unsloth-studio.desktop (Linux)
  - /tmp/unsloth-studio-launcher-<uid>*.lock (lock dir, possibly stale)
  - Launch Services cache entry for ai.unsloth.studio on macOS
  - Any running `unsloth studio -p N` processes

Users who follow the documented uninstall and reinstall end up with the
new launcher layered on top of stale state from the previous install,
which has produced concrete bugs (e.g. self-referential symlink inside
the .app bundle after a reinstall over leftover state).

Add uninstall.sh at the repo root that handles all of the above, and
update README.md to point at it as the recommended path. The plain
`rm -rf ~/.unsloth/studio` line is kept as a "partial uninstall, keep
launcher for a later reinstall" alternative. The model cache at
~/.cache/huggingface is intentionally left untouched, with a note in
the script suggesting how to remove it if desired.

Script is POSIX sh, idempotent (every removal is gated on existence
and uses `2>/dev/null || true`), and handles macOS, Linux, and WSL.
Windows is intentionally not covered here; the existing PowerShell
Remove-Item line in README is kept for that.

* studio: trim uninstall.sh header

* studio: address PR review feedback on uninstall.sh

Four findings from automated review, all verified real:

1. pkill pattern only matched `-p N`, not `--port N`. Studio
   instances launched with the long option form survived the
   uninstall. Fix: run two pkill passes, one for each form, with
   `[ =]` covering both space and `=` separators.

2. CLI shim at ~/.local/bin/unsloth (symlink into the venv created
   by install.sh:2167) was left behind, becoming a broken symlink
   after the venv directory is removed. Fix: add it to the removals.

3. Custom install roots via UNSLOTH_STUDIO_HOME / STUDIO_HOME were
   not removed. install.sh records the install location in
   ~/.local/share/unsloth/studio.conf as UNSLOTH_EXE; parse it,
   derive the root as three dirnames up, and remove the root if it
   is non-default.

4. On WSL the installer creates 'Unsloth Studio.lnk' on the Windows
   Desktop and Start Menu Programs folder via powershell.exe.
   Mirror that path on uninstall by invoking powershell.exe to
   Remove-Item the same two locations. Best-effort, gated on
   powershell.exe being available.

Tests (T2.8b, T2.15, T2.16, T2.17, T2.18, T2.5b) added behind the
scenes; all pass on macOS Darwin 25.3 with `dash -n`, `sh -n`,
shellcheck-clean (SC2016 suppressed on the PowerShell single-quoted
heredoc since the $env: expansions must remain literal to the
shell so PowerShell receives them verbatim).

* studio: harden uninstall.sh against env-mode and shim collisions

- Honor UNSLOTH_STUDIO_HOME / STUDIO_HOME at uninstall time and read
  env-mode studio.conf at $<root>/share/studio.conf, not just the
  default-mode conf under $HOME/.local/share/unsloth/. Without this,
  installs done with a custom STUDIO_HOME leak the install tree even
  when the env var is re-exported.
- Guard the custom-root resolver against "/" and empty so a corrupted
  studio.conf (UNSLOTH_EXE='/etc/passwd' or similar) or an
  UNSLOTH_STUDIO_HOME=/ cannot trick the script into rm -rf'ing root.
- Only remove $HOME/.local/bin/unsloth when it is a symlink resolving
  to a Studio venv. pyproject.toml declares unsloth as a console
  script, so pip install --user unsloth places a regular file at the
  same path; the previous unconditional rm wiped that unrelated CLI.
- When neither env var is set, print a tail hint so users with custom
  install roots know to re-run with the variable.

Verified with a sandboxed harness covering 24 scenarios (default and
env-mode installs across macOS / Linux / WSL, idempotency, hostile
lockfile names, path-traversal attempts, malformed conf, pkill long
and short forms, pip-conflict shim, broken-symlink bundle path).
Script remains POSIX (shellcheck -s sh clean, runs under /bin/dash).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Refuse non-Studio uninstall roots and tighten process matching for PR unslothai#5497

Three issues found while testing custom-root paths and process cleanup:

1. UNSLOTH_STUDIO_HOME=$HOME sh uninstall.sh rm -rf'd $HOME (same for
   STUDIO_HOME and parent-of-$HOME). install.sh accepts any writable
   directory for STUDIO_HOME, so the uninstaller must validate ownership
   before deletion. _is_studio_root accepts a candidate root only if it
   contains share/studio.conf, an unsloth_studio/ directory, or a
   bin/unsloth shim pointing into unsloth_studio/bin. _is_unsafe_root is
   a defense-in-depth deny list (/, $HOME, $HOME's parent, system paths).

2. pkill -f patterns "unsloth studio.*-p[ =][0-9]" over-matched on argv
   substrings. A user running `less notes.md` whose filename contained
   "unsloth studio ... -p N" had their less killed. New patterns anchor
   on /unsloth_studio/bin/ so only processes whose actual exe lives in a
   Studio venv match.

3. pkill missed processes that exec into studio/backend/run.py --port N
   (the post-exec form when the unsloth CLI replaces itself). Added a
   third pattern for that shape, and prefer PID files written by
   install.sh's _spawn_terminal (studio-$port.pid in DATA_DIR) over
   argv matching for installs that have them.

* Tighten ownership guards from review round for PR unslothai#5497

Three findings from the second reviewer round:

1. _is_studio_root accepted any directory containing an unsloth_studio/
   subdir as Studio-owned. A user workspace that happens to contain a
   folder named unsloth_studio/ would be deleted. install.sh's env-mode
   guard at install.sh:1358-1361 already requires .unsloth-studio-owned
   before treating the venv as replaceable. Mirror that: require the
   owner marker, share/studio.conf, or the bin/unsloth shim target.

2. The pkill -f fallback patterns were global, so uninstalling install A
   would also kill install B's running server. Scope each pattern to the
   actual install root being removed by interpolating the root path into
   the regex. Also adds a third pattern shape for `unsloth studio` with
   no -p / --port flag (the CLI default-port form).

3. Desktop/Unsloth Studio is created by install.sh as a symlink to the
   .app bundle. If a user has a regular directory by that name (photos,
   notes, etc.), the previous _remove_path call rm -rf'd it. Now we only
   remove it when it is a symlink or does not exist.
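
Item 2's root-scoped matching could look like the sketch below. The exact pkill patterns are not shown in this message, so the regex and the choice to anchor at the start of the command line are assumptions.

```python
import re


def matches_install(cmdline, install_root):
    """True when the command's executable lives in this install's Studio venv."""
    root = re.escape(install_root.rstrip("/"))
    pattern = re.compile("^" + root + r"/unsloth_studio/bin/")
    return bool(pattern.search(cmdline))
```

`re.escape` keeps dots and other metacharacters in the root path from widening the match.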

* Canonicalize env roots and honor UNSLOTH_STUDIO_HOME precedence for PR unslothai#5497

Two findings from the latest review round:

1. Canonicalize env-derived roots before the safety check. The deny list
   only string-compares against $HOME, so a syntactic variant like
   UNSLOTH_STUDIO_HOME=$HOME/../$USER (or trailing slash, or relative
   path) bypassed _is_unsafe_root even though it resolves to $HOME. Now
   _emit runs CDPATH= cd -P -- + pwd -P first, so all variants normalize
   to the same canonical path before the deny check. Also added the same
   tilde expansion install.sh's _resolve_studio_destinations does.

2. Mirror install.sh's env-var precedence (install.sh:282-290). When
   both UNSLOTH_STUDIO_HOME and STUDIO_HOME are set, install.sh resolves
   only UNSLOTH_STUDIO_HOME and ignores STUDIO_HOME. Uninstall was
   emitting both, so running uninstall.sh for install A would also
   delete install B if the user had a stale STUDIO_HOME pointing at B.
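
Both points in one POSIX-only sketch. The function names are hypothetical, `os.path.realpath` stands in for the shell's `cd -P` + `pwd -P`, and the deny list is reduced to three entries for brevity.

```python
import os


def resolve_env_root(environ, home):
    """Pick UNSLOTH_STUDIO_HOME over STUDIO_HOME and canonicalize it."""
    raw = environ.get("UNSLOTH_STUDIO_HOME") or environ.get("STUDIO_HOME")
    if not raw:
        return None
    if raw == "~" or raw.startswith("~/"):
        raw = home + raw[1:]  # tilde expansion relative to the given home
    return os.path.realpath(raw)


def is_unsafe_root(root, home):
    """Deny /, the home directory, and the home directory's parent."""
    home = os.path.realpath(home)
    return root in ("/", home, os.path.dirname(home))
```

Canonicalizing first means `$HOME/../$USER` and `$HOME/` both compare equal to `$HOME` in the deny check.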

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Daniel Han <info@unsloth.ai>
…i#5536)

* Studio update CI: round-trip install -> update -> uninstall

Adds an "Uninstall and verify clean" step to the three existing
studio-{,mac-,windows-}update-smoke.yml workflows so each one ends by
running uninstall.sh / uninstall.ps1 against the install it just
produced, then asserting that the install dir, launcher data dir,
desktop shortcut, CLI shim (and on Mac, the .app bundle) are all gone.
Two trailing reruns confirm idempotency. The uninstall log is added to
the existing artifact bundle.

Catches regressions where install.sh / install.ps1 starts writing to a
new path (registry key, Start Menu entry, %APPDATA% subdir, etc.) and
uninstall.{sh,ps1} has not been updated to match. Safety-guard
scenarios (refuse-$HOME, refuse-non-Studio, tilde expansion, etc.) are
intentionally NOT exercised here -- those belong in a dedicated fast
smoke job that does not have to wait on a 5-15 min install.

Wall-clock overhead is ~30-45 s on each runner. Path filters extended
to include uninstall.sh / uninstall.ps1 so a pure uninstaller change
also triggers the round-trip check.

* Skip round-trip step when uninstall.{sh,ps1} are not in tree

---------

Co-authored-by: Daniel Han <info@unsloth.ai>
…unslothai#5518)

* studio: register /settings route that opens the settings dialog

Navigating to /settings used to render Not Found because the route
was never registered. The settings dialog only opened via the user
menu, so /settings was a broken deep link if shared. Add a route
that calls useSettingsDialogStore.openDialog() and redirects to the
post-auth landing page so the modal appears on top of the chat.

* studio: harden Connections dialog provider sync and allow manual model IDs

Two related fixes for the Connections panel.

1. Keep localStorage providers when the server returns an empty list.
The dialog used to sync from /api/providers/ on mount and unconditionally
overwrite the Zustand provider store with the server result. When the
server had no enabled configs but the local store had entries (legacy
users, fresh dev installs, or providers created via earlier paths),
opening the dialog silently wiped them. The model picker reads from the
same store, so the chat header reverted from 'gpt-4o . OpenAI' to the
raw 'external::openai-1::gpt-4o' key. Treat the server as authoritative
only when it actually has rows; otherwise keep the local view.

2. Accept manual model IDs alongside the live catalog for remote-mode
providers (DeepSeek, OpenAI, etc.). Previously the only way to save was
to load the available-models catalog via a live API call, which fails
in air-gapped setups, behind 502s, or when the user already knows the
exact model ID. Add a Textarea fallback in the same render block, and
relax the validation to accept manual IDs even when availableModels is
empty. The validation message now points users at the manual path.

* studio: restrict manual model ID entry to openrouter among remote providers

Address review feedback: major remote providers (openai, anthropic,
gemini, mistral, cohere, deepseek, ...) expose large per-model
parameter surfaces that differ across models, so accepting pasted
model IDs leads to mismatched parameter expectations and frustrating
runtime errors. Keep their catalog curated by hiding the manual
textarea and falling back to the prior 'Load available models first'
validation toast for them.

OpenRouter drops unsupported parameters server-side, so manual entry
remains useful there; keep the textarea and the union save path for
it. Custom and curated backends are already gated via isCustomProvider /
isCuratedModelList and continue to require manual entry as before.

* studio: shorten code comments in chat-providers-dialog.tsx

Trim three multi-line comment blocks to single lines per review.
* studio: add uninstall.ps1 and document it in README for Windows

The previous Windows uninstall guidance was Remove-Item -Recurse -Force on
$HOME\.unsloth\studio, which only deletes the install dir and leaves
behind:

  * %LOCALAPPDATA%\Unsloth Studio                 (data dir)
  * Desktop\Unsloth Studio.lnk                    (Desktop shortcut)
  * %APPDATA%\Microsoft\Windows\Start Menu\Programs\Unsloth Studio.lnk
  * Custom UNSLOTH_STUDIO_HOME / STUDIO_HOME roots
  * Running unsloth_studio venv processes
  * User PATH entry under .unsloth\studio
  * HKCU\Software\Unsloth\PathBackup

This script mirrors uninstall.sh for Windows. It stops listening backends
by reading the port from share\studio.port (with a Win32_Process sweep
anchored on \unsloth_studio\ as a fallback), removes the install dir,
data dir, both shortcuts, the Studio PATH entry, and the PathBackup
registry key. Custom roots discovered from env vars or share\studio.conf
are accepted only if they contain a Studio sentinel (share\studio.conf,
unsloth_studio\.unsloth-studio-owned, or bin\unsloth.exe) and are not
on a hard deny list (drive root, %USERPROFILE%, parent of %USERPROFILE%,
or top-level system paths).

README now points Windows users at the script.

* Scope port-file kill and PATH cleanup to known Studio roots for PR unslothai#5513

Three findings from the reviewer round:

1. _StopByPortFile killed whatever owned the recorded port without proving
   the PID belonged to this Studio install. A stale studio.port pointing
   at a port a different local service later bound would force-kill that
   service. New _PidUnderKnownRoot checks the listening PID's exe path
   against the same $KnownRoots that _StopStudioProcesses already uses.

2. The netstat.exe fallback matched ":$port " anywhere in the line, so a
   stale port file with 443 (or any common port) could match an
   ESTABLISHED row whose remote endpoint was that port, killing an
   unrelated process (browser, IDE). Now requires the row contain
   LISTENING, and applies the same _PidUnderKnownRoot ownership check.

3. PATH cleanup removed any entry whose expanded path contained
   \unsloth_studio\, which would also clobber an unrelated user virtualenv
   that shared the name. Now only removes entries that resolve inside a
   known Studio root (default %USERPROFILE%\.unsloth\studio plus any
   custom roots discovered from UNSLOTH_STUDIO_HOME / STUDIO_HOME /
   share\studio.conf).
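
Items 1 and 2 can be sketched in Python, assuming the usual five-column `netstat -ano` TCP rows (Proto, Local Address, Foreign Address, State, PID). The function names are hypothetical, and the real script is PowerShell.

```python
def listening_pid(netstat_lines, port):
    """Return the PID listening on `port`, ignoring non-LISTENING rows."""
    suffix = ":" + str(port)
    for line in netstat_lines:
        fields = line.split()
        if len(fields) != 5 or fields[0] != "TCP":
            continue
        _proto, local, _remote, state, pid = fields
        # Match the local address only, never the remote endpoint.
        if state == "LISTENING" and local.endswith(suffix):
            return int(pid)
    return None


def exe_under_known_root(exe_path, known_roots):
    """Case-insensitive check that an exe path sits under a known Studio root."""
    exe = exe_path.lower()
    return any(exe.startswith(root.lower().rstrip("\\") + "\\") for root in known_roots)
```

Only a PID that passes both checks would be stopped.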

* Expand tilde and honor UNSLOTH_STUDIO_HOME precedence for PR unslothai#5513

Two findings from the latest review round:

1. install.ps1 (lines 152-154) expands ~ and ~\path to $env:USERPROFILE
   before resolving the install root, but uninstall.ps1 was passing the
   raw env value to [System.IO.Path]::GetFullPath. That resolved ~\foo
   relative to the current directory rather than the user profile, so a
   user who installed with UNSLOTH_STUDIO_HOME='~\custom' could not
   uninstall through the same variable. New _ExpandTilde helper matches
   install.ps1's behavior.

2. Mirror install.ps1's env-var precedence: UNSLOTH_STUDIO_HOME wins,
   STUDIO_HOME is ignored when both are set. Otherwise uninstalling
   install A could also touch install B if the user has a stale
   STUDIO_HOME pointing at B.
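
A minimal Python sketch of the two rules, assuming the behaviour described above (the real helpers live in PowerShell; `expand_tilde` and `resolve_custom_root` are illustrative names). It uses `ntpath` so the Windows path semantics hold on any host.

```python
import ntpath
from typing import Mapping, Optional


def expand_tilde(value: str, user_profile: str) -> str:
    """Expand a leading ~ to the user profile instead of the current directory."""
    if value == "~":
        return user_profile
    if value.startswith("~\\") or value.startswith("~/"):
        return ntpath.join(user_profile, value[2:])
    return value


def resolve_custom_root(env: Mapping[str, str], user_profile: str) -> Optional[str]:
    """UNSLOTH_STUDIO_HOME wins; STUDIO_HOME is consulted only when it is unset."""
    raw = env.get("UNSLOTH_STUDIO_HOME") or env.get("STUDIO_HOME")
    if not raw:
        return None
    return ntpath.normpath(expand_tilde(raw, user_profile))
```

With both variables set, only the `UNSLOTH_STUDIO_HOME` root is returned, so a stale `STUDIO_HOME` pointing at another install is never touched.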

---------

Co-authored-by: Daniel Han <info@unsloth.ai>
…5538)

* Fix num_logits_to_keep on transformers >= 4.51 + compile loss_function

Two follow-ups to the fused-forward work landed in unsloth-zoo PR unslothai#665.

1. unsloth_fast_generate (models/llama.py): transformers 4.51 renamed
   num_logits_to_keep to logits_to_keep. Previously we unconditionally
   set kwargs['num_logits_to_keep'] = 1, which transformers 4.57's
   _validate_model_kwargs rejects with:
     ValueError: The following `model_kwargs` are not used by the
     model: ['num_logits_to_keep']
   blocking model.generate() on Llama / Mistral. Now we inspect the
   runtime forward signature and use whichever spelling it accepts;
   if a caller still passes the legacy name we promote it to the new
   spelling instead of stripping it.

2. patch_loss_functions (models/loader.py): the single internal call
   site passed torch_compile=False. UnslothForCausalLMLoss is small
   (label shift + Triton CE), so torch.compile folds the elementwise
   prep into one launch and removes per-step Python overhead. The
   < 2.4 fallback inside patch_loss_functions still routes through
   torch._disable_dynamo so older torches are unaffected.
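
The signature check in item 1 could look roughly like the sketch below. It is an assumption about the shape of the fix, not the code in models/llama.py: `pick_logits_kwarg` and the two stub forwards are hypothetical.

```python
import inspect


def pick_logits_kwarg(forward, kwargs: dict) -> dict:
    """Use whichever spelling `forward` accepts; promote the legacy name if passed."""
    params = inspect.signature(forward).parameters
    out = dict(kwargs)
    if "logits_to_keep" in params:
        # Promote a caller-supplied legacy value rather than dropping it.
        legacy = out.pop("num_logits_to_keep", None)
        out.setdefault("logits_to_keep", 1 if legacy is None else legacy)
    elif "num_logits_to_keep" in params:
        out.setdefault("num_logits_to_keep", 1)
    return out


# Hypothetical stand-ins for a new-style and an old-style model.forward.
def new_forward(input_ids, logits_to_keep=0): ...
def old_forward(input_ids, num_logits_to_keep=0): ...
```

If the forward accepts neither spelling, the kwargs pass through unchanged in this sketch.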

Verified:
- Llama 3.2 1B + model.generate() no longer raises; emits a sensible
  16-token continuation.
- Gemma3 1B GRPO smoke (max_steps=3) returns bit-identical losses
  0.256 / 0.4393 / 0.2031 vs pre-fix; train_runtime 409s (vs 415s
  pre-fix, within noise).
- unsloth-zoo test_compiler_rewriter_exhaustive + test_fused_forward_install
  pass (96 passed) on this combination.

Related: unslothai/unsloth-zoo PR for the compiler.py single-matmul
backport.

* Revert loader.py loss-compile flip; correct rename-version comment

Drop the patch_loss_functions(torch_compile=True) flip. Tracing the
loss call chain:

  UnslothForCausalLMLoss
    -> unsloth_fixed_cross_entropy
      -> _fast_cross_entropy_loss
         -> Fast_CrossEntropyLoss.apply  (torch.autograd.Function wrapping Triton)

torch.compile treats custom autograd.Function.apply as an opaque op and
breaks the graph at the boundary. The only Python it can actually
compile in the loss function is the label-shift + ignore-fill prep
(three elementwise ops), and the per-call dynamo guard overhead is of
the same order as that prep. Empirical Gemma3 1B GRPO smoke (max_steps=3)
showed no meaningful runtime delta (415s vs 409s, within noise) and
risked dragging the outer compiled training step into recompiles when
the inner guards drift. Keep torch_compile=False; the Triton kernel is
the work, and it is unchanged either way.

Also: the inline comment in unsloth_fast_generate said the kwarg rename
landed in transformers 4.51. The actual decorator (@deprecate_kwarg)
was tagged version="4.50" and present through 4.51.x, then removed in
4.52+. Correct the comment. No behaviour change.
…) (unslothai#4611)

Co-authored-by: WhiskyAKM <35374730+PTFOPlayer@users.noreply.github.com>

PR unslothai#4611 originally proposed a community uninstall.sh for Unsloth
Studio. We folded that idea into the maintainer-authored
uninstall.sh (PR unslothai#5497) and uninstall.ps1 (PR unslothai#5513) which now ship
in main with safety guards, idempotency, lock-dir / .desktop / .app
cleanup, env-var precedence, tilde expansion, and CI coverage on
real Linux / macOS / Windows runners (PR unslothai#5536). Recording this
empty-commit merge so the original contribution from @PTFOPlayer
is attributed in git history.
* Add OpenDocument chat attachments

* Preserve typed ODS cell values

* Exclude hidden OpenDocument review text

* fix(chat): harden OpenDocument attachment extraction

* fix(chat): close opendocument attachment leaks

* fix(chat): unblock failed attachments

* fix(chat): preserve covered cell columns

---------

Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Co-authored-by: shine1i <wasimysdev@gmail.com>
danielhanchen and others added 2 commits May 25, 2026 06:34
Round 14 reviewer aggregate (logs/review_round14_aggregate.md):

P1 fixes:
- routes/export.py /load-checkpoint now runs the active-export 409
  guard BEFORE the chat / diffusion unloads, so a rejected request
  no longer tears down unrelated GPU state.
- core/inference/llama_cpp.py wraps the WHOLE load_model body in a
  single try/finally that publishes loading_model_identifier across
  download, metadata read, VRAM settle, process spawn, and health
  check. Done via a thin load_model wrapper around the existing
  body (renamed _load_model_impl) to avoid reindenting hundreds of
  lines.
- routes/models.py /delete-finetuned now checks
  loading_model_identifier so a pending HF GGUF download cannot
  have its destination directory rmtree'd before llama-server
  spawns.
- core/inference/diffusion.py stores the original caller-supplied
  gguf_filename (e.g. ``BF16/model.gguf``) in a new self._gguf_filename
  field and exposes it as active_gguf_filename. UI-facing
  gguf_filename still collapses to basename for the panel.
- routes/models.py /delete-cached llama guard now allows safe
  different-variant deletes when hf_variant differs, matching the
  diffusion path's variant-aware behaviour.
- core/inference/diffusion.py tracks self._cpu_offload_enabled and
  forces a CPU torch.Generator when offload is on, so seeded
  generation no longer crashes on CUDA hosts with the default offload
  enabled.

P2 fixes:
- core/inference/diffusion.py detect_family normalises mixed
  separators (``Qwen_Image-Edit-GGUF``, ``Qwen-Image_Edit-GGUF``,
  ``QwenImageEdit-GGUF``) so every Qwen-Image-Edit spelling is
  excluded from the base Qwen-Image family.
- core/inference/diffusion.py logger.info / logger.error in
  load_model run repo_id and effective_base through _redact_hf_tokens
  so URL-embedded ``hf_xxxxx`` tokens never reach structured-log
  sinks.
- core/inference/diffusion.py _release_other_gpu_owners_for_diffusion
  now raises RuntimeError when an export job is active instead of
  logging and continuing, so direct backend callers cannot bypass
  the route layer's 409 guard.
- core/inference/diffusion.py full-diffusers repo / base_repo paths
  expand ``~`` via _expand_existing_local_path so
  ``repo_id="~/models/my-flux"`` no longer falls through to the Hub.

Tests:
- 5 new regression cases (mixed Qwen-Image-Edit separators, token
  redaction, status full-filename, CPU offload generator device,
  staging Windows leaf already-set sanity).
- All 68 diffusion backend + route tests pass.
@danielhanchen danielhanchen force-pushed the studio-diffusion-images-staging branch from acad7c6 to b7207e3 Compare May 25, 2026 06:34
Round 15 reviewer aggregate (logs/review_round15_aggregate.md):

P1 fixes:
- core/inference/llama_cpp.py publishes loading_model_identifier +
  loading_hf_variant AFTER acquiring _serial_load_lock; previously
  a queued second load could overwrite or clear the identifier
  currently in flight, breaking delete-safety and GPU handoff guards.
- routes/models.py /delete-finetuned compares the pending llama
  load against loading_hf_variant (new), not the stale hf_variant
  from the previous loaded model. Without this, a Q4-loaded
  directory loading Q8 would still accept a Q8 delete.
- core/inference/diffusion.py _release_other_gpu_owners_for_diffusion
  now also raises when training is active so direct backend callers
  cannot bypass the route layer's 409 guard. Mirrors the
  export-active check the same helper already enforces.
- routes/models.py /delete-cached diffusion guard compares owned
  diffusion paths against the HF cache root for the target repo
  via _all_hf_cache_scans + _is_path_under. Without this, loading
  from a local models--owner--model/snapshots/<sha> path let the
  cache delete proceed while the snapshot was still mmap'd.
- models/inference.py DiffusionLoadRequest refuses URL-embedded
  hf_xxxxx tokens in repo_id / base_repo at the API boundary, so
  the value never reaches self._repo_id and status() can never
  echo it back to other authenticated sessions.
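
The first P1 item (publish the in-flight identifier only after taking the lock, clear it in a finally) can be sketched as follows. `Loader` and the `impl` callable are hypothetical stand-ins; the real backend carries more state.

```python
import threading


class Loader:
    """Hypothetical reduction of the load wrapper described above."""

    def __init__(self):
        self._serial_load_lock = threading.Lock()
        self.loading_model_identifier = None

    def load_model(self, identifier, impl):
        with self._serial_load_lock:
            # Publish only once the lock is held, so a queued second load
            # cannot overwrite or clear the identifier of the load in flight.
            self.loading_model_identifier = identifier
            try:
                return impl(identifier)
            finally:
                # Cleared on success and on failure alike.
                self.loading_model_identifier = None
```

Delete guards that read `loading_model_identifier` then see either the load that currently holds the lock or `None`, never a value left behind by a waiting caller.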

P2 fixes:
- core/inference/diffusion.py status() routes UI-facing repo_id /
  base_repo through _display_repo_id, which collapses absolute
  local paths to the leaf name (delete guards still see the full
  path via active_*/pending_*).
- routes/inference.py /images/load maps backend RuntimeError that
  reports an export/training conflict to HTTP 409 instead of 400.
- core/inference/diffusion.py detect_family now uses token-boundary
  matching so owner/flux.20-model does not collide with flux.2.
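
One way to get token-boundary matching is a pair of regex lookarounds, sketched below. This is an assumption about the technique, not the actual detect_family rules; `matches_family` is a hypothetical helper.

```python
import re


def matches_family(repo_id: str, family: str) -> bool:
    """True when `family` appears in `repo_id` and is not glued to other alphanumerics."""
    pattern = r"(?<![a-z0-9])" + re.escape(family.lower()) + r"(?![a-z0-9])"
    return re.search(pattern, repo_id.lower()) is not None
```

Here `owner/flux.2-dev-GGUF` matches `flux.2`, while `owner/flux.20-model` does not because the match would be followed by the digit `0`.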

P3 fixes:
- tests/test_diffusion_routes.py drops the partial routes.inference
  module from sys.modules if exec_module() raises, so the real
  ImportError surfaces instead of a misleading AttributeError on
  follow-up tests.

Tests:
- 5 new regression cases (display_repo_id, token-boundary family
  detection, training-active raise from backend helper, embedded HF
  token rejection).
- All 72 diffusion backend + route tests pass.
@danielhanchen danielhanchen force-pushed the studio-diffusion-images-staging branch from b7207e3 to 4c75b61 Compare May 25, 2026 07:00
danielhanchen and others added 2 commits May 25, 2026 07:15
…lothai#5754

Round 15 split LlamaCppBackend.load_model into a thin wrapper that
publishes _loading_model_identifier + _loading_hf_variant under
_serial_load_lock and an inner _load_model_impl_locked body that
actually launches llama-server. The pre-existing source-inspection
regression tests inspected only load_model and broke because the
flag literals and _wait_for_vram_settle call now live in the inner
method:

- tests/test_llama_cpp_no_context_shift.py
  test_no_context_shift_is_in_load_model
  test_flag_sits_inside_the_base_cmd_list
- tests/test_llama_cpp_wait_for_vram_settle.py
  test_load_model_calls_helper_outside_lock_and_uses_last_kill_timestamp

Update both helpers to concatenate the source of load_model AND
_load_model_impl_locked so the assertions still cover the launch
path without weakening their scope to the full module.
@danielhanchen danielhanchen force-pushed the studio-diffusion-images-staging branch from 4c75b61 to aa24f21 Compare May 25, 2026 07:15
danielhanchen and others added 2 commits May 25, 2026 07:31
Round 16 reviewer aggregate (logs/review_round16_aggregate.md):

P1 fixes:
- routes/models.py /delete-cached llama guard pairs loading_id with
  loading_hf_variant so deleting a different cached quant (Q8_0)
  while another variant (Q4_K_M) is loading is no longer blocked.
- core/inference/diffusion.py load_model now calls
  _release_other_gpu_owners_for_diffusion BEFORE
  _release_chat_backend_for_diffusion. The other-owners helper
  RAISES on active training/export, so a route -> worker race or
  direct backend caller no longer drops the user's chat model
  before the diffusion load is refused.
- routes/models.py /delete-cached diffusion guard fails CLOSED
  (503) on HF cache scan failure instead of silently falling
  through to repo-id-only matching, which could miss a loaded
  local snapshot path.
- routes/inference.py _release_llama_for and
  _release_safetensors_chat_for now raise 503 on actual unload
  failure (exception or False return), so new GPU workloads do
  not start while the old chat process still owns VRAM.
- core/inference/diffusion.py status() now takes
  include_internal=False by default and only exposes the
  guard-facing active_*/pending_* paths when callers opt in. The
  public /api/inference/images/status route gets the redacted
  payload; routes/models.py delete guards pass
  include_internal=True so they still see the raw paths.
- core/inference/diffusion.py generate_image_with_metadata routes
  the response model through _display_repo_id so /images/generate
  cannot echo back an absolute local path.

P2 fixes:
- routes/inference.py /images/load now maps backend "Could not
  verify training/export status" to 503 instead of 409, matching
  the route-level pre-check.
- core/inference/diffusion.py _release_other_gpu_owners_for_diffusion
  raises "Could not verify export status" when the
  is_export_active() probe itself raises, instead of silently
  treating it as active export.
- core/inference/diffusion.py detect_family compares compact family
  spellings (Flux2Klein) against per-token compact strings so
  unsloth/Flux2Klein-GGUF matches the flux.2-klein family without
  matching the embedded substring inside flux.20.
- main.py installs a RequestValidationError handler that scrubs
  hf_xxxxx tokens out of the 422 response body so a rejected
  ``repo_id`` containing a URL-embedded HF token does not echo it
  back to the browser.

Tests:
- 3 new regression cases (Flux2Klein compact alias, public status
  redaction, generate_image_with_metadata redaction).
- All 75 diffusion backend + route tests pass.
@danielhanchen danielhanchen force-pushed the studio-diffusion-images-staging branch from aa24f21 to 3c6a47d Compare May 25, 2026 07:32
Two diffusion tests broke on the Windows runner after round 16:

- test_display_repo_id_collapses_absolute_path used hardcoded
  POSIX absolute paths; Windows reads /home/... as drive-
  relative so Path.is_absolute() returns False. Use pytest's
  tmp_path so the path is platform-correct.
- test_load_publishes_pending_target_during_loading regressed
  because round 16 moved _release_other_gpu_owners_for_diffusion
  ahead of the chat unload. That helper imports core.training and
  core.export; on Windows CI the import resolved to a real but
  partially configured backend, which raised inside the new
  status-verification path and aborted the load before
  from_pretrained ran. Stub both modules with idle backends in
  _install_fake_diffusers.

Also updated test_public_status_does_not_leak_local_path_via_active_fields
and test_generate_image_with_metadata_redacts_local_path to use
tmp_path for the same Windows reason.
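
The Windows behaviour behind the first failure can be reproduced on any host with pathlib's pure path classes. The sample path is illustrative.

```python
from pathlib import PurePosixPath, PureWindowsPath

posix_style = "/home/alice/models/my-flux"

# Rooted but drive-less: absolute on POSIX, not on Windows.
absolute_on_posix = PurePosixPath(posix_style).is_absolute()
absolute_on_windows = PureWindowsPath(posix_style).is_absolute()

# Windows needs both a drive and a root for a path to count as absolute.
absolute_with_drive = PureWindowsPath(r"C:\Users\alice\models\my-flux").is_absolute()

print(absolute_on_posix, absolute_on_windows, absolute_with_drive)
# prints: True False True
```

pytest's `tmp_path` avoids the problem because it is always an absolute path for the platform the test runs on.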
@danielhanchen danielhanchen force-pushed the studio-diffusion-images-staging branch from 3c6a47d to b841562 Compare May 25, 2026 07:36
danielhanchen and others added 3 commits May 25, 2026 08:15
P1: route-layer chat/diffusion/export releases were still
asymmetric. Training start and export load called
``diff_backend.unload_model`` inside a best-effort try/except so a
wedged diffusion backend let the next workload allocate over the
top of the resident pipeline and OOM. Both now use the strict
``_release_diffusion_for`` helper from routes.inference, which
raises HTTPException 503 on status/unload failure or post-check
mismatch.

P2 #9: diffusion load exceptions can include the absolute local
repo / base / gguf path verbatim (FileNotFoundError, OSError from
diffusers / safetensors). The path flows into ``_last_error``,
which ``status()`` returns to every authenticated session. Collapse
the known repo_id / effective_base / gguf_filename paths to their
leaf name before storing the error, mirroring the
``_display_repo_id`` convention used for the public repo label.

P2 #10: when ``repo_id`` is an absolute local path,
``detect_family`` matched _FAMILY_EXCLUDE deny lists against the
full path, so models stored under a parent directory containing
``qwen-image-edit`` or ``3.5`` were misclassified as None. Reduce
the family-detection needle to the leaf directory when the input
looks like a filesystem path; Hub-style ``owner/repo`` ids
continue to use the original needle so existing detection rules
keep working.

P2 #12: ``gguf_filename`` was missing from the
``_reject_embedded_hf_token`` validator. A URL-form quant path
like ``https://hf_xxxxx@huggingface.co/.../flux.gguf`` would be
stored on ``DiffusionBackend._gguf_filename`` and surface in
status() / log lines. Extend the validator to gguf_filename so the
token is dropped before it can leak.

All 85 diffusion-relevant backend tests pass locally.
P1 #1: ``_release_llama_for()`` now verifies ``llama.unload_model``
did not return False AND that ``is_loaded`` / ``is_active`` /
``loading_model_identifier`` are all cleared after the call. The
previous version only treated raised exceptions as failure, so a
subprocess refusing to terminate or an in-flight GGUF download
let the next workload allocate on top.
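
A rough sketch of the strict release described in P1 #1, under the assumption that the backend exposes the three attributes named above. `ReleaseError` stands in for the HTTP 503 the route helper raises, and `FakeBackend` is hypothetical.

```python
class ReleaseError(RuntimeError):
    """Stand-in for the HTTP 503 that the route helper raises."""


def release_strict(backend) -> None:
    """Unload, then require proof that the backend actually let go."""
    try:
        ok = backend.unload_model()
    except Exception as exc:
        raise ReleaseError(f"unload raised: {exc}") from exc
    if ok is False:
        raise ReleaseError("unload returned False")
    # Post-check: every ownership signal must be cleared after the call.
    names = ("is_loaded", "is_active", "loading_model_identifier")
    leftovers = [name for name in names if getattr(backend, name, None)]
    if leftovers:
        raise ReleaseError(f"backend still busy after unload: {leftovers}")


class FakeBackend:
    """Hypothetical backend; wedged=True reports success but keeps its state."""

    def __init__(self, wedged: bool = False):
        self.wedged = wedged
        self.is_loaded = True
        self.is_active = True
        self.loading_model_identifier = None

    def unload_model(self) -> bool:
        if not self.wedged:
            self.is_loaded = self.is_active = False
        return True
```

`release_strict(FakeBackend())` returns quietly, while `release_strict(FakeBackend(wedged=True))` raises even though `unload_model()` returned True.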

P1 #2: ``DiffusionBackend._release_other_gpu_owners_for_diffusion``
now raises RuntimeError when ``exp._shutdown_subprocess`` fails on
a settled checkpoint. Direct backend callers used to log at debug
level and proceed toward diffusion allocation while the export
checkpoint still owned VRAM.

P1 #3 + P1 #7: ``/images/load`` no longer drops chat + idle export
before the cheap backend validation runs. ``DiffusionBackend.load_model``
already calls the strict ``_release_other_gpu_owners_for_diffusion``
and ``_release_chat_backend_for_diffusion`` helpers AFTER family
inference and GGUF filename checks pass, so the GPU is still
freed before allocation and a malformed payload no longer
silently unloads the user's chat / chat-export pair.

P1 #4: ``_release_chat_backend_for_diffusion`` now also rejects a
post-unload state where ``loading_model_identifier`` is still set,
matching the route-level ``_release_llama_for`` strictness. A GGUF
download mid-flight before the diffusion handoff used to slip
through and end up double-owning VRAM after diffusion allocated.

P1 #5: ``_release_diffusion_for`` no longer swallows a post-unload
``status()`` failure as ``after = {}``. Training / chat / export
handoffs need proof that the diffusion pipeline released VRAM;
the helper now raises HTTP 503 when the verification status call
itself raises, so the caller retries.

P1 #6: ``DiffusionBackend._release_other_gpu_owners_for_diffusion``
raises RuntimeError when ``get_export_backend()`` itself raises.
Direct backend callers used to silently ``return`` here and
proceed to GPU allocation without being able to verify export
ownership.

P1 #8: ``/training/start`` releases settled export BEFORE chat,
matching the chat-load helpers. If idle export shutdown fails the
user's chat model is preserved instead of being dropped for a
training run that never starts.

P2 #9: GGUF load-error scrubber also collapses ``local_gguf_path``,
the resolved HF cache path passed to
``transformer_cls.from_single_file()``. Without this an exception
like ``OSError: cannot load /home/alice/.cache/huggingface/.../flux.gguf``
would leak the operator's filesystem layout through ``last_error``
and ``/images/status``.

All 85 diffusion-relevant backend tests pass locally.
P1 #1: ``_release_safetensors_chat_for`` now re-reads
``active_model_name`` and ``loading_models`` after each unload AND
runs a final sweep against the initial owned-name set. The previous
helper trusted ``unload_model() -> True`` even though the
orchestrator can respond ``unloaded`` while still holding weights
or a concurrent ``load`` can repopulate the tracker between calls.
Per-name and global post-state mismatches now raise HTTP 503 so
the caller retries.

P1 #2: same post-state guarantee inside
``_release_chat_backend_for_diffusion`` for direct backend
callers. ``DiffusionBackend.load_model`` now raises RuntimeError
when the safetensors tracker still owns a previously-resident
name after the unload, matching the route-level helper. The route
layer's existing classifier maps the new wording to HTTP 503.

P1 #3: ``DiffusionBackend.load_model`` now preflights the full
diffusers repo (or explicit GGUF ``base_repo``) via
``hf_hub_download(filename="model_index.json")`` BEFORE the
chat / export unload runs. The GGUF path was already covered by
the existing ``hf_hub_download(gguf_filename)`` round-trip; the
full-repo path used to skip validation and let a typo / private /
gated repo only surface inside ``from_pretrained`` AFTER the
user's chat model was already dropped. Local paths are checked
structurally (must be a directory containing ``model_index.json``)
so we do not network-round-trip for an on-disk miss. Error
messages route through ``_display_repo_id`` so an absolute
filesystem path does not leak the operator's layout.
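
The structural check for local paths in P1 #3 might look like the sketch below. `preflight_local_repo` is a hypothetical name, and the Hub branch (the `hf_hub_download` round trip) is deliberately left out.

```python
from pathlib import Path


def preflight_local_repo(path_str: str) -> None:
    """Fail before any unload if a local path is not a diffusers repo."""
    path = Path(path_str).expanduser()
    if not path.is_dir() or not (path / "model_index.json").is_file():
        # Report the leaf name only, so the error does not expose the layout.
        raise RuntimeError(
            f"'{path.name}' is not a diffusers repo (model_index.json missing)"
        )
```

Because the check touches only the filesystem, a typo in a local path is rejected without a network request and before the chat model is dropped.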

P1 #6: ``/api/inference/unload`` (the direct chat unload endpoint)
now treats ``unload_model() -> False`` AND a leftover state
(``is_loaded`` / ``is_active`` / ``loading_model_identifier`` for
GGUF, ``active_model_name`` / ``loading_models`` for safetensors)
as 503 instead of unconditionally responding
``status="unloaded"``. The UI used to show the model as gone while
the backend still owned VRAM.

P2 #7: extended the /images/load RuntimeError -> HTTPException
marker list with ``still active or loading after unload`` and
``still loading after unload``. Round 18 introduced these exact
phrasings on the backend side; without the extension a retryable
unload failure was returning HTTP 400 to the user instead of 503.
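
The marker-based mapping in P2 #7 can be pictured as a small classifier. The two `still ... after unload` phrasings and `Could not verify` come from the text above; the conflict markers and the function name are assumptions made for illustration.

```python
# Substrings that signal a retryable condition (HTTP 503).
RETRYABLE_MARKERS = (
    "Could not verify",
    "still active or loading after unload",
    "still loading after unload",
)
# Hypothetical wording for export/training conflicts (HTTP 409).
CONFLICT_MARKERS = ("export is active", "training is active")


def status_for_runtime_error(message: str) -> int:
    """Map a backend RuntimeError message to an HTTP status code."""
    if any(marker in message for marker in RETRYABLE_MARKERS):
        return 503
    if any(marker in message for marker in CONFLICT_MARKERS):
        return 409
    return 400  # anything unrecognised is treated as a bad request
```

Substring matching keeps the route decoupled from backend exception types, at the cost that any reworded backend message must also be added to the list, which is exactly the gap this round closed.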

P2 #8: removed the unused ``unsloth_backend = get_inference_backend()``
eager construction in the GGUF chat-load branch. Eager
construction made the GGUF-only path needlessly fail or pay
startup cost when the safetensors backend was unavailable / lazy;
``_release_safetensors_chat_for`` already handles that case as a
no-op.

All 85 diffusion-relevant + 98 related backend tests pass locally.
P1 #1: ``_preflight_full_diffusers_repo(effective_base, hf_token)``
now runs for every load mode, including the GGUF-with-auto-base
path. Round 19 only preflighted the full repo or an explicit
``base_repo``, so an auto-picked companion that turned out to be
gated / private / missing still unloaded the user's chat model
before ``from_pretrained`` failed. ``effective_base`` is the same
value that feeds every downstream allocation, so preflighting it
unconditionally catches all three modes.

P1 #2: ``diffusers.GGUFQuantizationConfig`` (which imports the
``gguf`` package at construction time) is now built up front,
inside the same try block that surfaces "Re-run Studio setup".
Previously the missing-dependency exception fired AFTER
``_release_other_gpu_owners_for_diffusion`` and
``_release_chat_backend_for_diffusion`` had already taken the
chat / export models down. The downstream from_single_file call
reuses the same ``quant_config`` reference.

P1 #4: ``studio/backend/requirements/studio.txt`` now lists
``diffusers>=0.37.0`` and ``gguf>=0.10.0``. These were only in
the extras files, so fresh standard Studio installs failed on
/images/load with the round 20 P1 #2 dependency error message.

P1 #5: ``LoadRequest``, ``UnloadRequest``, and
``ValidateModelRequest`` now apply the same control-character +
embedded-HF-token validators that ``DiffusionLoadRequest``
already had. /api/inference/load, /api/inference/validate, and
/api/inference/unload used to accept newline / tab / control
characters in ``model_path`` (log-line smuggling) and URL-form
``https://hf_xxxxx@huggingface.co/...`` (credential leak through
structured log sinks).
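
As a sketch of what such validators check, assuming a token shape of `hf_` followed by at least eight alphanumerics (the real pattern is not shown here, and `validate_identifier` is a hypothetical name):

```python
import re

# Assumed token shape; the actual validator may use a different pattern.
_HF_TOKEN = re.compile(r"hf_[A-Za-z0-9]{8,}")


def validate_identifier(value: str) -> str:
    """Reject control characters and embedded HF tokens; return the value otherwise."""
    if any(ord(ch) < 32 or ord(ch) == 127 for ch in value):
        raise ValueError("identifier contains control characters")
    if _HF_TOKEN.search(value):
        raise ValueError("identifier must not embed a Hugging Face token")
    return value
```

Neither error message repeats the rejected value, so the validator itself cannot become the leak.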

P2 #6: ``_collapse_local`` in the diffusion load-error scrubber
now resolves relative candidates and adds the absolute form to
the substring set. A relative ``exports/my-flux`` used to leak
``/mnt/disks/.../exports/my-flux/...`` via downstream library
errors because the scrubber only matched the original literal.
Replacement is longest-first so a leaf-only context survives.
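
A simplified sketch of the scrubber idea, assuming every candidate is a local path (`scrub_paths` is a hypothetical name; the real helper also has to leave Hub-style ids alone):

```python
import os


def scrub_paths(message: str, candidates: list[str]) -> str:
    """Collapse known paths, in literal and absolute form, to their leaf name."""
    forms = set()
    for cand in candidates:
        if not cand:
            continue
        forms.add(cand)
        forms.add(os.path.abspath(cand))  # catches paths resolved downstream
    # Longest first, so the absolute form is replaced before its relative suffix.
    for form in sorted(forms, key=len, reverse=True):
        leaf = os.path.basename(form.rstrip("/\\")) or form
        message = message.replace(form, leaf)
    return message
```

Replacing the shorter relative form first would leave the absolute prefix behind, which is why the order matters.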

All 85 diffusion-relevant + 35 related model-validation tests
pass locally.

(P1 #3 cross-workload GPU handoff lock is deferred: deserves a
focused design pass across /images/load, /chat/load (both
branches), /training/start, and /export/load to pick a lock
boundary that does not deadlock against the backend load locks
or stall the SSE log stream.)
P1 #1 + #2: ``LoadRequest._no_embedded_hf_tokens`` and
``ValidateModelRequest._no_embedded_hf_tokens`` now cover
``gguf_variant`` in addition to ``model_path``. A caller could
pass a variant like ``Q4_K_M-hf_xxxxxxxx`` that flowed into
structured log sinks via the GGUF resolver path; the matching
``DiffusionLoadRequest`` validator already covered every string
field, so this restores parity.

P1 #3: ``/api/inference/unload`` now also matches the llama
``loading_model_identifier`` when picking the GGUF branch. A
pending GGUF download (``is_active`` still False,
``loading_model_identifier`` populated) used to fall through to
the safetensors branch and respond ``status="unloaded"`` while
llama-server kept downloading.

P1 #4 + #5: the final safetensors-handoff sweeps (route-level
``_release_safetensors_chat_for`` and backend
``_release_chat_backend_for_diffusion``) now check ``active_model_name``
and ``loading_models`` WITHOUT the initial ``owned_names`` filter.
A concurrent ``/load`` that landed AFTER the snapshot was
previously ignored, so a chat model that began loading during the
unload window let training / export / GGUF chat / diffusion start
anyway and race the new chat for VRAM.

P2 #6: added ``_preflight_diffusers_subfolder_config`` and
invoked it for GGUF loads with a transformer class
(``effective_base``, ``"transformer"``). A custom base companion
that had ``model_index.json`` but lacked
``transformer/config.json`` previously passed the round 19
preflight, unloaded chat, then failed inside
``from_single_file``.

P2 #7: ``_scrub_validation_obj`` in main.py also scrubs string
dict KEYS. Pydantic ``string_type`` errors surface ``input``
verbatim, and a malformed payload like
``{"repo_id": {"hf_xxxxx": "owner/repo"}}`` would otherwise leak
the token through the 422 response body.
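
A recursive scrubber that also rewrites dict keys could be as small as the sketch below. The token pattern and the `scrub` name are assumptions; the real `_scrub_validation_obj` is not reproduced here.

```python
import re

# Assumed token shape for illustration.
_HF_TOKEN = re.compile(r"hf_[A-Za-z0-9]{4,}")


def scrub(obj):
    """Redact HF tokens in strings, including dict keys, at any nesting depth."""
    if isinstance(obj, str):
        return _HF_TOKEN.sub("hf_***", obj)
    if isinstance(obj, dict):
        return {
            (scrub(key) if isinstance(key, str) else key): scrub(value)
            for key, value in obj.items()
        }
    if isinstance(obj, (list, tuple)):
        return type(obj)(scrub(item) for item in obj)
    return obj
```

Applied to a payload shaped like the one above, the token in the key position becomes `hf_***` while the error's `loc` tuple and other non-token strings are returned unchanged.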

All 85 diffusion-relevant + 35 model-validation tests pass
locally. Existing fakes for ``hf_hub_download`` updated to
accept the new ``subfolder=`` kwarg the round 21 preflight uses.

(P1 #3 cross-workload GPU handoff lock from round 20 is still
deferred; round 21's P1 #4 / #5 raised the sweep-level guarantee,
which closes the most common race without the deadlock risk of
holding a process-wide lock across the entire load.)
P1 #1: ``TrainingStartRequest.model_name`` now runs the same
control-character and embedded-HF-token validators that the chat
and diffusion request models gained in rounds 5 / 15 / 20 / 21.
``/api/training/start`` previously accepted newline / tab /
control characters and URL-form ``hf_xxxxx`` tokens that flowed
into structured-log sinks via "Loading model %s" lines.

P1 #2: ``_run_with_helper`` in ``utils/datasets/llm_assist.py``
now skips the helper GGUF when the diffusion image backend
reports loaded / loading. The public chat / training / export
routes already do this through ``_release_diffusion_for``, but
this dataset-side helper loaded llama-server directly with no
diffusion guard, so an Images-page allocation would race the
helper for VRAM. New ``_diffusion_image_model_busy`` helper
fails closed (treats status() failure as busy) so the resident
image model is preserved instead of being overwritten.

P1 #3: same ``_diffusion_image_model_busy`` guard added to
``_run_multi_pass_advisor`` (the dataset conversion advisor),
which has the same direct llama.cpp load shape.

P2 #4: the early "Could not infer a diffusion family" RuntimeError
now routes ``repo_id`` through ``_display_repo_id`` before
formatting. A local absolute path that did not match any known
family used to leak the operator's filesystem layout via the 400
response body, last_error, and log line.

All 97 diffusion + training-validation + related tests pass
locally.
P1 #1 + #2 + #6: extended the chat / diffusion / training
identifier hardening to every export-side request model.
ExportCommonOptions (parent of ExportMergedModelRequest /
ExportBaseModelRequest / ExportLoRAAdapterRequest) now applies
_no_control_chars and _reject_embedded_hf_token to repo_id and
base_model_id; ExportGGUFRequest gets the same on its repo_id
plus a control-char check on quantization_method; and
LoadCheckpointRequest validates checkpoint_path. Previously
"/api/export/*" accepted newline-smuggled identifiers and
URL-form ``hf_xxxxx`` tokens that flowed into log lines.

P1 #3 + #4: ``_run_with_helper`` and ``_run_multi_pass_advisor``
now use a shared ``_gpu_workload_busy_for_helper`` that gates on
diffusion (already covered in round 22), training, AND export. The round 22
guard only checked diffusion, so the dataset helper / advisor
could still load llama-server on top of an active training run
or a resident export checkpoint. Each step fails closed
(unverifiable status counts as busy) so the user's primary
workload is preserved.
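
The fail-closed rule reduces to a few lines. `gpu_busy` and the probe callables are hypothetical; each probe stands for one status check (diffusion, training, export).

```python
from typing import Callable, Iterable


def gpu_busy(probes: Iterable[Callable[[], bool]]) -> bool:
    """Busy if any probe says so, or if any probe cannot be verified."""
    for probe in probes:
        try:
            if probe():
                return True
        except Exception:
            # Fail closed: an unverifiable status counts as busy.
            return True
    return False
```

The helper GGUF is loaded only when this returns False, so a broken status call skips the helper rather than risking the primary workload.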

P1 #5: PublishDatasetRequest in models/data_recipe.py also
applies the identifier hardening to repo_id; the publish path
previously accepted control characters and URL-form tokens.

P1 #7-10: added _validate_logged_identifier helper to
routes/models.py and applied it to the path / query parameter
endpoints that flow into logger.info(...) calls --
``/config/{model_name}``, ``/check-vision/{model_name}``,
``/check-embedding/{model_name}``, ``/gguf-variants``. Mapped
the validator's ValueError to HTTP 422 so the client sees the
same shape as a Pydantic validation failure.

P2 #11 + #12: ``Loading diffusion model %s`` and
``Diffusion load failed for %s`` log lines route ``repo_id`` /
``effective_base`` through ``_display_repo_id`` (collapses
absolute local paths to the leaf, still scrubs HF tokens)
instead of plain ``_redact_hf_tokens``. The error path was
already collapsed in the user-facing 400 / RuntimeError, but
the structured-log lines kept the full path.
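
One way the path-collapsing behaviour could work is sketched below. The names echo ``_display_repo_id`` and ``_redact_hf_tokens``, but the token pattern, the ``hf_***`` placeholder, and the absolute-path detection are assumptions rather than the PR's implementation.

```python
import re
from pathlib import PurePosixPath, PureWindowsPath

# Assumption: anything shaped like "hf_<alphanumerics>" is treated as a token.
_HF_TOKEN = re.compile(r"hf_[A-Za-z0-9]+")


def redact_hf_tokens(text: str) -> str:
    """Replace token-shaped substrings with a fixed placeholder."""
    return _HF_TOKEN.sub("hf_***", text)


def display_repo_id(value: str) -> str:
    """Collapse absolute local paths to their leaf, then scrub tokens."""
    for flavour in (PurePosixPath, PureWindowsPath):
        path = flavour(value)
        if path.is_absolute():
            # Keep only the final component so the directory layout is
            # not written to logs; fall back to the input for a bare root.
            value = path.name or value
            break
    return redact_hf_tokens(value)
```

Hub-style identifiers such as ``org/model`` are not absolute under either path flavour, so they pass through with only the token scrub applied.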

All 97 diffusion + training-validation + related tests pass
locally.

P1 #1: ``_gpu_workload_busy_for_helper`` in
``utils/datasets/llm_assist.py`` now also gates on the GGUF chat
backend (llama-server) AND the safetensors chat backend. Round 23
extended it to training + export but missed Chat, so a helper /
advisor GGUF could still race a loaded chat model for VRAM.
Both checks fail closed when status is unverifiable.

P1 #2 / #3 / #4 / #5: re-ordered the route-level GPU-handoff
unloads so the diffusion release runs BEFORE the chat releases.
A wedged diffusion unload used to fire AFTER chat was already
gone, so the user lost both on a single failure. Drop chat last
so an earlier failure preserves it. Applied to
``/training/start`` (training.py), ``/export/load`` (export.py),
``/chat/load`` GGUF branch and ``/chat/load`` safetensors branch
(routes/inference.py).
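
The ordering argument can be illustrated with a small sketch: releases run in list order, and the first failure aborts before any later backend is touched. ``release_for_handoff`` and the unload callables are hypothetical; the actual routes call their own backend unload functions.

```python
from typing import Callable, List, Tuple


def release_for_handoff(steps: List[Tuple[str, Callable[[], None]]]) -> List[str]:
    """Run unload steps in order and return the names that were released."""
    released: List[str] = []
    for name, unload in steps:
        # An exception here propagates immediately, so every later step
        # (chat, when it is listed last) is left loaded.
        unload()
        released.append(name)
    return released
```

Listing the diffusion unload first and the chat unloads last means a wedged diffusion release raises before chat is dropped, which is the behaviour this commit describes.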

P1 #7 + P2 #13: ``/delete-finetuned`` body now hardens
``model_path`` and ``gguf_variant`` via the shared
``_validate_logged_identifier`` helper, so control characters
and URL-form HF tokens can no longer be smuggled into log lines.

P1 #8 + #10: ``/delete-cached`` body hardens ``repo_id`` and
``variant`` the same way.

P1 #9: ``/download-progress`` ``repo_id`` query parameter is
also hardened; the value flows into log lines deep inside
``_get_repo_size_cached`` on lookup failure.

P1 #11: ``CheckFormatRequest.dataset_name`` and
``AiAssistMappingRequest.{dataset_name, model_name}`` in
``models/datasets.py`` now apply the same control-char +
embedded-HF-token validators, matching every other public
request-body model.

All 115 diffusion + training-validation + cached_gguf + export
+ inference model-validation tests pass locally.

(P1 #6 native-path-lease enforcement for diffusion local paths
and P1 #12 React Compiler frontend lint deferred -- both need
focused design / frontend touchups separate from this batch.)
return
try:
del obj
except Exception:
# supplied filename (e.g. ``BF16/model.gguf``) is kept
# separately as ``active_gguf_filename`` for delete
# guards.
gguf_basename = Path(self._gguf_path).name if self._gguf_path else None
self._cpu_offload_enabled = False
self._loaded_at = None
_release(old)
old = None # noqa: F841
backend = get_inference_backend()
active_model_name = getattr(backend, "active_model_name", None)
loading_models = set(getattr(backend, "loading_models", set()) or set())
owned_names = {name for name in ({active_model_name} | loading_models) if name}

active_model_name = getattr(inf, "active_model_name", None)
loading_models = set(getattr(inf, "loading_models", set()) or set())
owned_names = {name for name in ({active_model_name} | loading_models) if name}