Studio: local diffusion image generation (CI validation) #88
Open
danielhanchen wants to merge 187 commits into
* Add a simple --version flag
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Small code clean-up, less ugly
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Slightly better function names. And use again None
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
* studio: load cached GGUF models when fully offline
When huggingface.co is unreachable, GGUF model loads fail in three distinct
places even though the bits are already in ~/.cache/huggingface/hub. Each
failure has a different surface symptom:
1. list_gguf_variants() raises straight through HTTPException(500), so the
variant dropdown shows 'Failed to list GGUF variants'.
2. detect_gguf_model_remote() silently returns None after retries fail. The
caller then treats a GGUF-only repo as non-GGUF and routes it through the
transformers/MLX path. On Apple Silicon this surfaces as 'Unsloth currently
only works on NVIDIA, AMD and Intel GPUs.'
3. _download_gguf() loses list_repo_files() to the network and falls back to a
filename heuristic ('{repo}-{variant}.gguf'). When the repo name does not
echo the filenames (e.g. repo 'Qwen3.6-27B-MTP-GGUF' contains a file
'Qwen3.6-27B-UD-Q4_K_XL.gguf' with no MTP), hf_hub_download cannot find
that invented filename in the cache and aborts.
Fix in three layers:
- list_gguf_variants / detect_gguf_model_remote: honor HF_HUB_OFFLINE and
fall back to scanning the local HF cache snapshot when the API throws.
detect_gguf_model_remote still keeps its retry loop for transient flakes;
the cache fallback only kicks in after every attempt fails.
- _download_gguf: when list_repo_files() fails, look up variant -> real
filename inside the cached snapshot before resorting to the heuristic.
- llama_cpp.load_model / inference worker startup: when DNS for
huggingface.co fails (2s probe), set HF_HUB_OFFLINE=1 for the process so
every hf_hub_download call below resolves from cache instantly instead of
spending ~25s on five exponential retries.
Online behavior is unchanged: the API is tried first and the cache is used
only as a failover. The cache scan is a strict subset of what
list_local_gguf_variants already does today for local paths.
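A minimal sketch of the cache-snapshot lookup described above, not the code from this PR. It assumes the standard Hugging Face hub cache layout (`models--{org}--{name}/snapshots/{revision}/...`); the helper names `iter_cached_snapshots` and `find_cached_gguf` and the plain substring match are hypothetical simplifications.

```python
import tempfile
from pathlib import Path
from typing import Iterator, Optional


def iter_cached_snapshots(cache_root: Path, repo_id: str) -> Iterator[Path]:
    """Yield snapshot directories for repo_id, newest mtime first (hypothetical helper)."""
    snapshots = cache_root / ("models--" + repo_id.replace("/", "--")) / "snapshots"
    if not snapshots.is_dir():
        return
    yield from sorted(
        (p for p in snapshots.iterdir() if p.is_dir()),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )


def find_cached_gguf(cache_root: Path, repo_id: str, variant: str) -> Optional[str]:
    """Map a variant token to a real cached filename instead of guessing one.

    The match runs against the snapshot-relative path, so a layout such as
    BF16/foo.gguf (variant token only in the parent dir) is still found.
    Substring matching is a deliberate simplification for this sketch.
    """
    needle = variant.lower()
    for snapshot in iter_cached_snapshots(cache_root, repo_id):
        for path in sorted(snapshot.rglob("*.gguf")):
            if path.name.lower().startswith("mmproj"):
                continue  # vision projector, never the main weights
            rel = path.relative_to(snapshot).as_posix()
            if needle in rel.lower():
                return rel
    return None
```

A caller would try `list_repo_files()` first and reach for a lookup like this only after the API call fails, falling back to the `{repo}-{variant}.gguf` heuristic last.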
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* studio: tighten inline comments on offline GGUF fallback
* studio: address review feedback on offline GGUF fallback
Fixes from the review pass on unslothai#5505:
* ruff F823 (lint CI red): the late `import os` at the bottom of
LlamaCppBackend.load_model made `os` a function-local name, so my
new `os.environ` reference at the top of the same method was a
use-before-bind. Surfaces at runtime as
'cannot access local variable os where it is not associated with a value'
and is why the Mac/Windows Studio API jobs were failing too. The
env-var mutation has been moved into a module-level contextmanager,
so load_model no longer touches `os` directly.
* Codex P1: cache variant match now uses the relative path, not the
basename. Layouts like `BF16/foo.gguf` (variant token only in
parent dir) were silently skipped, falling through to the bogus
`{repo}-{variant}.gguf` heuristic and failing offline loads of
models stored under quant-named subdirs.
* Codex P1: HF_HUB_OFFLINE no longer persists past one model load.
llama_cpp.load_model now uses a contextmanager that probes DNS,
sets HF_HUB_OFFLINE/TRANSFORMERS_OFFLINE only when DNS is dead,
and pops them in finally (preserving any prior user setting of
TRANSFORMERS_OFFLINE). Pre-existing user-set HF_HUB_OFFLINE is
respected as a no-op. worker.py keeps the startup probe because the
orchestrator spawns a fresh worker per load -- comment updated to
make that lifecycle explicit, and a warning is now logged.
* Gemini: cache-dir lookup centralized in `_iter_hf_cache_snapshots`.
Three near-identical copies (in list/detect helpers and the
llama_cpp offline scan) now go through one helper.
* Gemini: `huggingface_hub.utils.is_offline_mode` does not exist in
1.x (verified locally); `huggingface_hub.constants.HF_HUB_OFFLINE`
is snapshot-at-import-time and does not reflect runtime mutations.
Manual env-var parsing kept.
* socket probe now saves and restores the prior default timeout
instead of unconditionally setting None on exit, so it composes
with caller code that already configured a timeout.
* worker.py probe now logs a warning when offline mode is auto-enabled
so debugging the case isn't blind.
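The scoped offline toggle described above could look roughly like the following. This is a sketch under assumptions, not the PR's implementation: the name `offline_if_dns_dead`, the injected `probe_dns_dead` callable, and the set of truthy strings are all hypothetical.

```python
import os
from contextlib import contextmanager
from typing import Callable, Iterator

_OFFLINE_VARS = ("HF_HUB_OFFLINE", "TRANSFORMERS_OFFLINE")
_TRUTHY = {"1", "true", "yes", "on"}  # assumed set of truthy spellings


def _env_truthy(name: str) -> bool:
    # Manual parsing: the env var is read at call time, not at import time.
    return os.environ.get(name, "").strip().lower() in _TRUTHY


@contextmanager
def offline_if_dns_dead(probe_dns_dead: Callable[[], bool]) -> Iterator[bool]:
    """Force offline mode for one block only, and only when DNS is dead."""
    # A user-set HF_HUB_OFFLINE is respected as a no-op (the probe is skipped).
    if _env_truthy("HF_HUB_OFFLINE") or not probe_dns_dead():
        yield False
        return
    saved = {name: os.environ.get(name) for name in _OFFLINE_VARS}
    for name in _OFFLINE_VARS:
        os.environ[name] = "1"
    try:
        yield True
    finally:
        # Restore prior values, so a user-set TRANSFORMERS_OFFLINE survives.
        for name, old in saved.items():
            if old is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = old
```

Because the probe runs on every entry, a transient DNS failure affects a single load rather than pinning a long-lived backend object offline.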
* studio: regression tests for offline GGUF cache fallback
Lock in the offline fallback path from unslothai#5505 so future refactors can't
silently regress either bug. 26 tests, 0.55 s, no network/GPU/subprocess.
Covers:
* _iter_hf_cache_snapshots: missing cache, missing repo, missing
snapshots/, newest-mtime ordering, case-insensitive repo match.
* _list_gguf_variants_from_hf_cache and the list_gguf_variants
online/offline-env/API-exception/reraise paths.
* _detect_gguf_from_hf_cache and detect_gguf_model_remote 3x-fail
fallback. Pre-existing RepositoryNotFoundError early-return preserved.
* Codex P1 #1 regression: BF16/foo.gguf (quant only in subdir name)
must resolve via _detect_gguf_from_hf_cache, which now matches the
snapshot-relative path rather than the basename.
* _probe_dns_dead: returns True/False, restores prior socket timeout.
* Codex P1 #2 regression: _hf_offline_if_dns_dead sets env only inside
the block, restores on exit (including on exception), re-probes DNS
on the next call so a transient hiccup cannot lock the long-lived
LlamaCppBackend singleton offline. Honors a user-set HF_HUB_OFFLINE
as a no-op. Preserves a user-set TRANSFORMERS_OFFLINE across exit.
Follows the existing studio backend test stub pattern (loggers /
structlog / httpx stubs + backend dir on sys.path).
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* studio: extend offline cache fallback to _download_mmproj and quant label
Two follow-up fixes from the review pass on unslothai#5505:
* _download_mmproj() now mirrors _download_gguf()'s offline path:
when list_repo_files() fails, scan the local HF cache snapshot for
any GGUF whose basename starts with mmproj-. Without this, offline
vision GGUF loads succeed at the main weight (the existing PR fix)
but the mmproj returns None and llama-server starts without vision
support. Same _iter_hf_cache_snapshots helper, F16 preference and
fallback to the first match are preserved.
* _extract_quant_label() now considers parent directory segments when
the basename has no quant token. Layouts like BF16/foo.gguf are
already documented in this file and are returned by the new
snapshot-relative-path filter in _download_gguf; before this fix
their variant label collapsed to "foo" (the last hyphen segment of
the basename). Regex is the same; the search just walks parent
segments innermost-first if the basename misses.
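As an illustration of the parent-segment walk, here is a small sketch. The regex below is a simplified stand-in (an assumption; the real pattern used by `_extract_quant_label` is not shown in this PR text), and `extract_quant_label` is a hypothetical name.

```python
import re
from pathlib import PurePosixPath

# Simplified quant-token pattern for illustration only (assumption).
_QUANT_RE = re.compile(
    r"(?<![A-Za-z0-9])((?:UD-)?(?:I?Q\d+(?:_[A-Za-z0-9]+)*|BF16|F16|F32))(?![A-Za-z0-9])",
    re.IGNORECASE,
)


def extract_quant_label(rel_path: str) -> str:
    """Return a quant label from the basename, else from parent dirs innermost-first."""
    path = PurePosixPath(rel_path)
    stem = path.stem
    # Basename first, then parent segments from innermost to outermost.
    for segment in [stem, *reversed(path.parts[:-1])]:
        match = _QUANT_RE.search(segment)
        if match:
            return match.group(1)
    # Pre-fix behaviour: fall back to the last hyphen segment of the basename.
    return stem.rsplit("-", 1)[-1]
```

With this shape, `BF16/foo.gguf` and `Q4_K_M/foo.gguf` yield distinct labels instead of both collapsing to `foo`.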
Tests (studio/backend/tests/test_offline_gguf_cache_fallback.py):
* TestExtractQuantLabelSubdir: basename quant unchanged, quant-only-
in-parent, UD- prefix in parent, deeper nesting picks the
innermost matching segment.
* TestDownloadMmprojOfflineCacheFallback: cache fallback returns the
mmproj when list_repo_files fails, F16 preference holds when both
variants are in cache, no-mmproj cache returns None.
* httpx stub now prefers the real package when installed (the CI
install list already includes it) and falls back to the stub only
when httpx is genuinely missing. Newer huggingface_hub imports
HTTPError/Response/Request at module load, so the previous
fixed-set stub broke when those names were added upstream.
26 existing cases plus 7 new = 33 pass in 0.74s.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix/adjust offline cache + DNS probe per PR unslothai#5505 review
Four review findings tightened, with regression tests:
- list_local_gguf_variants subdir collapse (P1 codex 10:08): pass the
snapshot-relative path to _extract_quant_label so BF16/foo.gguf and
Q4_K_M/foo.gguf produce distinct labels instead of folding to the same
basename pseudo-quant.
- list_gguf_variants cache fallback (P2 codex 12:10): surface
RepositoryNotFoundError / GatedRepoError / RevisionNotFoundError /
EntryNotFoundError to the caller instead of masking with stale cache,
matching detect_gguf_model_remote.
- _detect_gguf_from_hf_cache mmproj (P2 codex 12:10): exclude mmproj
files from the candidate list so a partial cache with only a vision
projector cannot route the projector as the main model.
- _probe_dns_dead global timeout (P2 codex 13:06): run the gethostbyname
on a daemon thread with join timeout so concurrent sockets in the same
interpreter never inherit a process-wide socket.setdefaulttimeout
mutation. Same shape applied in worker.py's startup probe.
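The thread-based probe from the last finding might be sketched as follows. The function name and the injectable `resolver` parameter are assumptions made so the example can run without a network; only `socket.gethostbyname` and `threading.Thread` are real library calls.

```python
import socket
import threading
from typing import Callable, Dict


def probe_dns_dead(
    host: str = "huggingface.co",
    timeout: float = 2.0,
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> bool:
    """Return True when `host` cannot be resolved within `timeout` seconds.

    The lookup runs on a daemon thread and is bounded by join(timeout), so
    the process-wide socket default timeout is never modified.
    """
    outcome: Dict[str, bool] = {}

    def _resolve() -> None:
        try:
            resolver(host)
            outcome["ok"] = True
        except OSError:  # socket.gaierror is a subclass of OSError
            outcome["ok"] = False

    worker = threading.Thread(target=_resolve, daemon=True)
    worker.start()
    worker.join(timeout)
    # A lookup that is still running after the timeout counts as dead.
    return not outcome.get("ok", False)
```

A hung lookup leaves the daemon thread behind, but it cannot block interpreter shutdown and other sockets in the process keep their own timeouts.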
* Make llama-server health check tolerant of warmup races
Two layered fixes for the Windows GGUF smoke CI Tool calling Tests
flake that exit-22'd on a single httpx.ReadError during llama-server
warmup. The 'windows-latest -> windows-2025-vs2026' image rollout is
hitting main with the identical symptom.
A. _wait_for_health: catch httpx.ReadError, RemoteProtocolError,
WriteError alongside ConnectError and TimeoutException. A TCP RST
mid-read while llama-server is still binding the port (WinError
10054) is a 'still warming up' signal, not fatal. The existing
_process.poll() check still wins for real crashes.
B. _drain_stdout + spawn: tee llama-server stdout/stderr to a
per-launch log file at ~/.unsloth/studio/logs/llama-server/
<port>.log. Any future subprocess crash leaves a forensic trace
on disk even when Studio's traceback only captures the symptom
(ReadError) and not the cause. Best-effort: a logging-side OSError
never blocks the load.
Regression coverage: TestWaitForHealthRetriesOnReadError pins the
retry behaviour for the three new exception types and verifies that a
real process exit still short-circuits the loop.
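The retry shape in fix A can be sketched without httpx by injecting the probe and the set of transient exceptions. Everything named here (`wait_for_health`, its parameters, the `RuntimeError` on exit) is a hypothetical stand-in; in Studio the transient tuple would presumably hold the httpx exception types listed above.

```python
import time
from typing import Callable, Optional, Tuple, Type


def wait_for_health(
    check: Callable[[], bool],
    process_exit_code: Callable[[], Optional[int]],
    transient: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    timeout: float = 30.0,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check` until it succeeds, treating transient errors as 'still warming up'."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # A real crash still wins: stop polling as soon as the process has exited.
        code = process_exit_code()
        if code is not None:
            raise RuntimeError(f"server exited with code {code}")
        try:
            if check():
                return True
        except transient:
            pass  # e.g. a connection reset while the port is still binding
        sleep(interval)
    return False
```

Checking the exit code before each probe keeps the loop from retrying for the full timeout against a process that is already gone.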
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* ci(windows): retry inference/load + collect llama-server logs
Composite fix for the Tool calling Tests flake that exit-22'd on a
single httpx.ReadError during llama-server warm-up. The
windows-latest -> windows-2025-vs2026 runner image rollout has been
hitting main with the identical symptom.
- All three jobs (openai-anthropic, tool-calling, json-images) now
retry POST /api/inference/load up to 3 times with 10s backoff and
preserve the response body for post-mortem. One transient 500 no
longer fails the whole job.
- A new "Collect llama-server logs" step copies the per-launch
llama-server stdout teed by Studio under ~/.unsloth/studio/logs/
llama-server/ into the workspace, and the upload-artifact step
now includes logs/llama-server/*.log so any future subprocess
crash leaves a forensic trace.
---------
Co-authored-by: shimmyshimmer <shimmyshimmer@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
…hai#5486)
* studio: expose launcher capability bits on unauth /api/health
PR unslothai#5375 reduced the unauthenticated /api/health response to {status, timestamp} only, on the theory that the rest of the payload was useful fingerprinting. That was too aggressive: the Tauri watchdog reads `service == "Unsloth UI Backend"` and `studio_root_id` to re-adopt its own backend across restarts (src-tauri/src/desktop_backend_owner.rs and commands.rs), and the SPA bootstrap fetches the same payload unauth to detect chat-only mode and native path lease support before any token is available (frontend src/config/env.ts and features/native-intents/use-native-readiness.ts).
With the post-unslothai#5375 shape, the watchdog kills its own healthy backend, the SPA never flips out of "full Studio" mode on chat-only Linux/Windows, and the About tab shows "dev" in place of the real version.
The actual fingerprint-ish fields are `version` / `studio_version` / `device_type` (and to a lesser extent the hostname inside `device_type`). `service`, `studio_root_id` (already a hex digest of the install path, not the raw path), `chat_only`, the desktop_* capability flags, and `native_path_leases_supported` do not leak the install path or version.
This patch keeps the auth gate but rebalances which fields sit on each side of it:
unauth: service, studio_root_id, chat_only, desktop_protocol_version, desktop_manageability_version, supports_desktop_auth, supports_desktop_backend_ownership, native_path_leases_supported, desktop_owner (when present)
authed: + version, studio_version, device_type
Existing must-change-password sessions still fall through to the base payload because get_current_subject (strict) rejects them; that matches prior behaviour.
test_middleware.py is updated to pin the new contract: launcher bits present unauth, fingerprint fields present only with a valid bearer.
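A rough sketch of the field split, assuming the full health record is available as a dict. The function name `health_payload` and the `authenticated` flag are hypothetical; the field names come from the commit message above.

```python
from typing import Any, Dict

# Launcher capability bits: safe to expose without a bearer token.
UNAUTH_FIELDS = (
    "service",
    "studio_root_id",
    "chat_only",
    "desktop_protocol_version",
    "desktop_manageability_version",
    "supports_desktop_auth",
    "supports_desktop_backend_ownership",
    "native_path_leases_supported",
    "desktop_owner",  # only copied when present
)
# Fingerprint-ish fields: only returned to an authenticated caller.
AUTHED_FIELDS = ("version", "studio_version", "device_type")


def health_payload(full: Dict[str, Any], authenticated: bool) -> Dict[str, Any]:
    """Build the /api/health body for an unauthenticated or authenticated caller."""
    payload = {"status": full["status"], "timestamp": full["timestamp"]}
    allowed = UNAUTH_FIELDS + (AUTHED_FIELDS if authenticated else ())
    payload.update({key: full[key] for key in allowed if key in full})
    return payload
```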
* studio: complete launcher-bits health unauth contract on Tauri + About tab
Reviewer follow-ups to the unauth /api/health launcher bits split.
Tauri preflight: backend_capability_stale_reason() fell through to backend_version_stale_reason(health.version.as_deref()) when capability bits were present but version was absent. With the unauth payload now exposing service + studio_root_id + desktop_* bits but gating version behind a bearer, the desktop watchdog was reading the new payload, parsing all capability bits, then classifying the same-root backend as desktop_backend_version_missing and refusing to adopt it. A backend that exposes desktop_protocol_version=1, desktop_manageability_version>=1, supports_desktop_auth=true and supports_desktop_backend_ownership=true was introduced together with MIN_DESKTOP_BACKEND_VERSION=2026.5.3 in unslothai#5341, so a present capability bitset is itself a version-compatibility signal. Skip the version sub-check when version is None/empty; keep it for non-empty values so genuinely-too-old backends that do echo a version still get desktop_backend_version_too_old.
About tab: fetchStudioVersions() did a bare fetch(apiUrl("/api/health")), which the unauth payload no longer carries version/studio_version for, so Settings -> About kept rendering "dev"/"dev" for any logged-in user. Attach Authorization: Bearer <token> when getAuthToken() returns one; fall back to bare fetch (still 200, just truncated payload) for the not-logged-in case. No new endpoint.
Comment: studio_root_id is no longer a hex digest of the install path; it is an opaque per-install id written by the launcher. Updated the inline comment to match.
Test:
- python -m pytest studio/backend/tests/test_middleware.py::TestHealthAuthGate studio/backend/tests/test_desktop_auth.py -q -> 29 passed
- npm run typecheck clean, npm run build produces fresh dist
* Trigger CI rerun for flaky Mac Chat UI step
…unslothai#5487)
* studio: tighten sandbox blocklist precision (bash, hf upload, NOFILE)
Three precision fixes in core/inference/tools.py. Same security boundary; fewer false positives that broke legitimate sandbox use.
bash blocklist: The per-token loop introduced in unslothai#5375 fired on any blocklist word in any token position, so the entirely benign `grep -r curl .`, `echo source the data`, and `ls /usr/bin/curl` were rejected with "blocked command 'curl'". The position-anchored regex already covers real command-position invocations, including `;rm`, `&&wget`, `$(rm)`, `<(rm)`, backticked subshells, and `/usr/bin/sudo`. The token loop is re-scoped: it only fires when the previous shlex token is a shell separator (or at start of line), so split-quoting obfuscations like `r''m -rf /` are still caught (shlex collapses them to a single command-position token) while argument-position blocklist words pass through. Trailing meta-chars glued to a shlex token (`rm;`) are stripped before basename matching.
hf upload AST gate: `_method_call_is_hf_upload` previously matched any method named `upload_file` / `upload_folder` / `upload_large_folder` / `create_commit` on any receiver, so paramiko.SFTPClient.upload_file, boto3.create_commit, and similar non-HF SDK methods were rejected. The fallback now requires an `import huggingface_hub` / `import hf_api` / `from huggingface_hub import ...` somewhere in the same module. Fully-qualified huggingface_hub.upload_file(...) calls are unchanged.
NOFILE env knob: `RLIMIT_NOFILE = (1024, 1024)` was the only sandbox rlimit without an env override. 1024 is below Linux's typical soft default and below what multi-shard safetensors mmap chains need on Llama-3 70B-class loads. Default is now 16384 with UNSLOTH_STUDIO_SANDBOX_NOFILE, parity with the other rlimits.
15 new bash-blocklist-position tests pin both the false-positive fixes and the still-blocked invariants (semicolon, &&, subshell, backtick, split-quote, /usr/bin/ prefix, nested bash -c). 4 new hf-upload-import-gate tests pin both the false-positive allowances and that HF-imported uses are still blocked. 1 new pin asserts the NOFILE env var is wired.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* studio: cover command wrappers, find -exec, dynamic HF imports, NOFILE clamp
Reviewer follow-ups to the sandbox blocklist precision change.
Command-position scanner missed Bash command-prefix wrappers and inline shell assignments. shlex tokenised `env curl`, `time curl`, `nohup rm`, `FOO=bar curl`, `sudo rm`, etc. with the prefix at command position and the real command at argument position, so the position-anchored check returned set() while pre-PR's per-token scan caught them. Likewise the position-anchored regex requires `^` or a shell separator before the command, so `env curl` slipped through. Reworked the scanner to track an expect_command flag plus a prefix_pending flag:
- assignments (FOO=bar) keep expect_command=True for the next token,
- flags ('-oL', '--') keep it intact while prefix_pending is set,
- numeric duration args ('timeout 1 cmd') skip without breaking expect_command,
- known wrappers (env, command, builtin, exec, time, nohup, nice, setsid, stdbuf, timeout, ionice, chroot, sudo, doas, su, xargs) set prefix_pending so the wrapper's command is still checked,
- shell separators now include `{`, `}`, `)`, `then`, `do`, `else`, `elif` so brace groups and if/then/while/do bodies are recognised as command positions.
Also lex with `shlex.shlex(punctuation_chars=";&|()`")` so split-quote forms like `echo done; r''m -rf /tmp/x` and `echo done;r''m` tokenise as `[..., ';', 'rm', ...]` and the command position check fires. Added a small `find -exec CMD ... ;` / `-execdir CMD ... ;` pass so `find . -exec rm -f {} +` and friends are caught even though the direct token is at argument position to `find`.
Dynamic Hugging Face imports were treated as no-HF-in-scope. The upload-method gate now also resolves `__import__('huggingface_hub')`, `importlib.import_module('huggingface_hub')`, and bare `import_module('huggingface_hub')` (via `from importlib import import_module`) as HF imports, so HfApi().upload_file via dynamic import is still blocked.
RLIMIT_NOFILE: setrlimit(NOFILE, (16384, 16384)) silently failed if the parent's hard cap is below the requested value; the broad except swallowed the OSError and left the sandbox at the parent's default. Clamp the requested value to the inherited hard limit before calling setrlimit.
Test cleanup: the existing test_cat_with_word_source_allowed had `assert ... or True` so it could not fail; rewrote it to assert the actual return value plus the two membership checks. Added parametrised coverage for shell prefix wrappers, find -exec / xargs, brace groups, if/then, while/do, split-quote command-name forms, and dynamic HF import upload patterns.
Test:
- python -m pytest studio/backend/tests/test_sandbox_tools.py -q -> 90 passed (was 67 before this commit)
- full studio/backend/tests/ minus llama_cpp_load_progress_live and GPU CUDA_VISIBLE_DEVICES tests (pre-existing isolation flake) -> 1063 passed
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* studio: catch bare-name HF upload calls in AST gate
`from huggingface_hub import upload_file; upload_file(...)` is a canonical HF call shape that the previous Attribute-only check missed: the bare-name call lands as ast.Name (not ast.Attribute), so the fuzzy gate skipped it. Extend _method_call_is_hf_upload to also match ast.Name when HF is in scope. Same import-gating discipline as the Attribute branch, so paramiko/boto3 and locally-defined `def upload_file(...)` helpers without HF imports still pass.
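The import-gated upload check described in these commits could be approximated as below. This is a simplified sketch, not `_method_call_is_hf_upload` itself: the function names are hypothetical and it flags every matching call once an HF import is seen anywhere in the module.

```python
import ast
from typing import Optional

HF_UPLOAD_NAMES = {"upload_file", "upload_folder", "upload_large_folder", "create_commit"}
HF_MODULES = {"huggingface_hub", "hf_api"}


def _callee_name(call: ast.Call) -> Optional[str]:
    """Return the called name for both obj.method(...) and bare name(...) shapes."""
    func = call.func
    if isinstance(func, ast.Attribute):
        return func.attr
    if isinstance(func, ast.Name):
        return func.id
    return None


def _imports_hf(tree: ast.AST) -> bool:
    """True if the module imports Hugging Face statically or dynamically."""
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            if any(alias.name.split(".")[0] in HF_MODULES for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] in HF_MODULES:
                return True
        elif isinstance(node, ast.Call) and node.args:
            # __import__('huggingface_hub') / import_module('huggingface_hub')
            arg = node.args[0]
            if (
                _callee_name(node) in {"__import__", "import_module"}
                and isinstance(arg, ast.Constant)
                and isinstance(arg.value, str)
                and arg.value.split(".")[0] in HF_MODULES
            ):
                return True
    return False


def has_hf_upload_call(source: str) -> bool:
    """Flag upload-style calls only when Hugging Face is in scope for the module."""
    tree = ast.parse(source)
    if not _imports_hf(tree):
        return False
    return any(
        isinstance(node, ast.Call) and _callee_name(node) in HF_UPLOAD_NAMES
        for node in ast.walk(tree)
    )
```

Gating on the import is what lets `paramiko` or `boto3` code with a same-named method pass while the canonical `from huggingface_hub import upload_file` shape is still caught.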
Pins: 4 new TestHfUploadImportGate cases (upload_file/folder/create_commit bare-name imports blocked; local upload_file without HF import allowed).
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* studio: scope HF uploads to sandbox-local literals; block env / token leaks
The previous gate dropped every HF upload call. Two refinements make it precise enough to allow legitimate sandbox->HF uploads while still catching credential / file exfil:
- path_or_fileobj / folder_path / create_commit operation paths must be sandbox-local relative-path literals (no '/', '~', drive letter, or '..' segments). Variable / dynamic paths are rejected.
- Any positional or keyword argument that statically resolves to os.environ / os.environ.get / os.getenv / bare getenv / subprocess shape readers is rejected (env-var exfil).
- token / hf_token / api_token / api_key / auth_token / access_token / password / secret kwargs are always rejected; sandbox env strips all parent credentials by construction, so any value here is hard-coded or lifted.
Recursive subtree walk in _reads_env_or_secret catches wrapper shapes (str(os.environ), json.dumps(os.environ.items()), etc.).
Add TestSandboxEnvIsolation: pin that _build_safe_env builds the env from a whitelist, not by stripping. Cover Linux/macOS/WSL/Windows secret shapes. The whitelist is PATH / HOME / TMPDIR / LANG / TERM / PYTHONIOENCODING (+ VIRTUAL_ENV / SystemRoot when applicable); HOME points at the sandbox workdir, so HF / wandb / aws SDKs cannot reach the operator's ~/.cache credentials.
Test classes added:
- TestHfUploadSandboxLocalPaths (relative literals allowed; absolute, drive-letter, '~', '..', mid-path traversal, dynamic vars, and open() of unsafe paths blocked, including create_commit recursion).
- TestHfUploadEnvAndSecretLeakBlock (os.environ subscript/get/getenv, bare getenv, subprocess.check_output, str(os.environ), token=, hf_token=, api_key=, and create_commit operations referencing env).
- TestSandboxEnvIsolation (no parent secret leaks into sandbox env).
131 tests in test_sandbox_tools.py pass.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
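For the whitelist-built sandbox environment, a minimal sketch might look like this. The function name `build_safe_env` and its signature are assumptions (the real helper is `_build_safe_env`, whose signature is not shown here); the variable names are the ones listed in the commit message.

```python
from typing import Dict, Mapping

# Variables copied from the parent when present; everything else is dropped.
_ALLOWED = ("PATH", "TMPDIR", "LANG", "TERM", "PYTHONIOENCODING", "VIRTUAL_ENV", "SystemRoot")


def build_safe_env(parent_env: Mapping[str, str], workdir: str) -> Dict[str, str]:
    """Build the sandbox env from a whitelist rather than by stripping secrets."""
    env = {name: parent_env[name] for name in _ALLOWED if name in parent_env}
    # HOME points at the sandbox workdir, so SDKs cannot find ~/.cache credentials.
    env["HOME"] = workdir
    return env
```

Starting from an empty dict means a secret with an unfamiliar name is excluded by default, which a strip-list approach cannot guarantee.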
…ll_id (unslothai#5488)
* studio: scope cancel-cleanup to in-flight tmp dirs; walk back tool_call_id
Two follow-ups to unslothai#5375's training and chat hardening.
_cleanup_cancelled_checkpoints used to rmtree every checkpoint-N directory on Cancel. That is the opposite of what the user expects. A user cancelling an 8h run with save_steps=2000 loses every completed checkpoint they could have resumed from. The 67 MB residue the audit memo flagged is the HF Trainer atomic-rename partial (tmp-checkpoint-N), not the completed ones. The cleanup now targets only tmp-checkpoint subdirs; completed checkpoint-N directories are user-owned and stay. Symlinked output_dir and symlinked children are skipped so the realpath containment cannot be levered into deleting arbitrary content via a symlink trick.
ChatMessage._validate_role_shape stamped a random secrets.token_hex id on tool messages with no tool_call_id. That id is uncorrelated with the prior assistant tool_calls id, so strict passthrough backends (OpenAI, Anthropic) reject the request as orphaned and llama.cpp treats the tool result as "no preceding call" and hallucinates. The synthesis moves up to ChatCompletionRequest, where the whole conversation is visible: for each tool message missing an id we walk back to the most recent assistant turn with tool_calls (stopping at user turns), prefer a function.name match, otherwise take the first unconsumed tool_call. Synthesis is the fallback when no candidate assistant turn exists, preserving the prior round-trip guarantee for orphaned tool messages.
Tests:
- test_cleanup_cancelled_checkpoints.py (new): pins that completed checkpoint subdirs survive, tmp-checkpoint partials are removed, non-int suffixes (checkpoint-final, checkpoint-best) are left alone, output_dir outside outputs_root is refused, symlinked output_dir and symlinked child are both skipped, missing dir is a no-op.
- test_inference_model_validation.py: 6 new walkback cases covering name-match preference, first-unconsumed fallback, explicit-id passthrough, multi-tool-result pairing, synth-on-no-parent, and no-cross-user-turn invariant.
- test_openai_tool_passthrough.py: the two ChatMessage-level synth-on-missing tests are rewritten to assert that the per-message validator now leaves tool_call_id untouched; resolution coverage lives in the request-level tests above.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* studio: explicit tool_call_id reserve, numeric tmp-checkpoint suffix only
Reviewer follow-ups to the training-cleanup + tool_call_id walkback PR.
tool_call_id walkback: a mixed assistant turn with [call_a, call_b] followed by a tool result that carried tool_call_id="call_a" and a sibling tool result with no id resolved to ['call_a', 'call_a'] because the explicit id never reserved call_a in the consumed set. Added a pre-pass over the message list that walks back from every role="tool" message carrying an explicit id and marks the matching (asst_idx, tc_idx) consumed, then the missing-id walkback runs against that pre-populated set. The second result now resolves to call_b.
While here, also harden the function-shape check: if a provider ships a malformed tool_call where `function` is a string rather than a dict, the old `(tc.get("function") or {}).get("name")` raised AttributeError on the string's .get; now isinstance-gated so the walkback falls through to the fallback id without raising.
Cancel cleanup: `tmp-checkpoint-*` is too broad. HF Trainer's in-flight partials are always `tmp-checkpoint-<integer-step>`, so constrain the cleanup regex to `^tmp-checkpoint-\d+$`. A user folder named `tmp-checkpoint-final`, `tmp-checkpoint-backup`, or `tmp-checkpoint-user-notes` is now preserved.
ChatMessage docstring still pointed at the pre-PR contract that required `tool_call_id` on every role="tool" message. Updated to say missing ids are accepted at message scope and resolved at ChatCompletionRequest scope. Inline comment above the cancel-cleanup call now describes the actual behaviour (in-flight tmp partials, completed checkpoints preserved).
Test:
- python -m pytest studio/backend/tests/test_inference_model_validation.py studio/backend/tests/test_cleanup_cancelled_checkpoints.py studio/backend/tests/test_openai_tool_passthrough.py -q -> 76 passed (was 67 before this commit; +2 walkback regression tests, +1 numeric-suffix preservation test)
* studio: trim verbose comments in cleanup + tool_call_id walkback
Move the HF tmp-checkpoint regex to module scope as a named constant. Drop the multi-paragraph docstring on _cleanup_cancelled_checkpoints and the inline call-site rationale; the function name + the test class already cover the why. Compress _resolve_missing_tool_call_ids docstring from a six-line explanation to two. Same logic, fewer in-flow tutorials.
76 tests in cleanup + inference-model-validation + tool-passthrough pass.
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
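The walkback with its explicit-id pre-pass can be sketched on plain dict messages. This is an illustrative reading of the commit text, not `_resolve_missing_tool_call_ids` itself: the function names, the use of a `name` field on tool messages for the name match, and the `call_` prefix on synthesized ids are assumptions.

```python
import secrets
from typing import Any, Dict, List, Optional, Set, Tuple

Message = Dict[str, Any]


def _parent_assistant(messages: List[Message], tool_idx: int) -> Optional[int]:
    """Walk back to the nearest assistant turn with tool_calls, stopping at user turns."""
    for i in range(tool_idx - 1, -1, -1):
        role = messages[i].get("role")
        if role == "user":
            return None
        if role == "assistant" and messages[i].get("tool_calls"):
            return i
    return None


def _call_name(tool_call: Dict[str, Any]) -> Optional[str]:
    # isinstance gate: a malformed `function` (e.g. a string) must not raise.
    fn = tool_call.get("function")
    return fn.get("name") if isinstance(fn, dict) else None


def resolve_missing_tool_call_ids(messages: List[Message]) -> List[Message]:
    consumed: Set[Tuple[int, int]] = set()
    # Pre-pass: explicit ids reserve their (assistant index, call index) slot.
    for idx, msg in enumerate(messages):
        if msg.get("role") == "tool" and msg.get("tool_call_id"):
            parent = _parent_assistant(messages, idx)
            if parent is None:
                continue
            for tc_idx, tc in enumerate(messages[parent]["tool_calls"]):
                if tc.get("id") == msg["tool_call_id"]:
                    consumed.add((parent, tc_idx))
                    break
    # Main pass: fill in missing ids, preferring a function-name match.
    for idx, msg in enumerate(messages):
        if msg.get("role") != "tool" or msg.get("tool_call_id"):
            continue
        parent = _parent_assistant(messages, idx)
        chosen: Optional[str] = None
        if parent is not None:
            free = [
                (i, tc)
                for i, tc in enumerate(messages[parent]["tool_calls"])
                if (parent, i) not in consumed and tc.get("id")
            ]
            named = [p for p in free if msg.get("name") and _call_name(p[1]) == msg["name"]]
            candidates = named or free
            if candidates:
                tc_idx, tc = candidates[0]
                consumed.add((parent, tc_idx))
                chosen = tc["id"]
        # Fallback: synthesize an id when no candidate assistant turn exists.
        msg["tool_call_id"] = chosen or "call_" + secrets.token_hex(8)
    return messages
```

Running the pre-pass first is what prevents the `['call_a', 'call_a']` double assignment described above: the explicit `call_a` is already consumed when the sibling result is resolved.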
…nslothai#5489)
* studio: proxy-aware login rate-limit; allow google favicons in CSP
Two follow-ups to unslothai#5375's auth + headers hardening.
Login rate-limit: The per-IP bucket keyed on request.client.host alone. Behind any reverse proxy or shared NAT it lumps everyone together (one user's typos lock everyone out for 60 seconds; the 429 detail leaked the proxy/internal IP back to clients). The bucket key is now (client-ip, username.lower) so:
- one wrong-password run does not block another user from the same IP
- one IP does not block the same user from a different IP
The 429 detail body no longer interpolates the IP. Behind a proxy clients can set UNSLOTH_STUDIO_TRUST_FORWARDED=1 so the limiter honours X-Forwarded-For / Forwarded; off by default so a direct caller cannot spoof the header.
CSP img-src: components/assistant-ui/sources.tsx renders citation favicons from https://www.google.com/s2/favicons. The current img-src allows t0..t3.gstatic.com (used for other Google-hosted icons) but not the main host the favicon URL points to, so every citation icon CSP-blocks and falls back to gray initials. Adding www.google.com to img-src is the same shape as unslothai#5409's connect-src HF allowlist fix.
Tests:
- test_login_rate_limit.py (new): _client_ip respects UNSLOTH_STUDIO_TRUST_FORWARDED for X-Forwarded-For and Forwarded; bucket key is composed of (ip, lower(username)) and isolates cross-user and cross-IP buckets; 429 detail does not contain the client IP; Retry-After header preserved.
- test_middleware.py: new test_img_src_allows_google_favicons pins that www.google.com is in the img-src directive and the existing gstatic CDNs stay allowed.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* studio: normalise forwarded IPs, IP-wide aggregate cap, unknown-user sentinel
Reviewer follow-ups to the proxy-aware login rate-limit PR.
Forwarded address normalisation: with UNSLOTH_STUDIO_TRUST_FORWARDED=1, raw `X-Forwarded-For` and `Forwarded: for=` values such as `198.51.100.7:50001` or `"[2001:db8::1]:50001"` were carried verbatim into the bucket key, so one client emitting a fresh source port per attempt split into many buckets and bypassed _LOGIN_MAX_FAILS. _normalize_forwarded_addr now strips quotes, optional `[..]:port` for IPv6 and `host:port` for IPv4, and validates as an IP literal; garbage values fall through to the direct request.client.host. Forwarded parsing also isolates the first forwarded-element so a multi-element header cannot create attacker-controlled bucket strings.
Spray protection: the (ip, username) key removed the aggregate per-IP throttle the pre-PR limiter provided. A client rotating nonexistent usernames produced [401, 401, 401, 401, 401, 401] where pre-PR produced [401, 401, 401, 401, 401, 429]. Restored the aggregate via a parallel _LOGIN_IP_BUCKETS table (max 30 fails / 60s per IP) checked alongside the per-(ip, username) bucket; both buckets must be cleared on a successful login.
Bucket cardinality: every distinct unauthenticated username allocated a new (ip, username) bucket entry without bound. 1,000 random usernames from one IP produced 1,000 buckets. Failures whose username does not exist now record into a single sentinel key (ip, "\x00unknown-user") so cardinality stays at one per IP for the unknown path. The known-user path additionally enforces a global hard cap (_LOGIN_MAX_BUCKETS = 4096) that prunes stale empty buckets on overflow and otherwise folds the failure into the per-IP bucket only.
Test:
- python -m pytest studio/backend/tests/test_login_rate_limit.py -q -> 19 passed (was 12 before this commit; +5 forwarded-address normalisation, +1 sentinel bucket, +1 bucket cap)
CSP comment refreshed to mention `www.google.com` alongside *.gstatic.com so future readers see why the host is allowlisted.
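The address normalisation step could be sketched with the standard `ipaddress` module as follows. The function names and the `None`-on-garbage return convention are assumptions; the sketch also assumes the caller has already checked UNSLOTH_STUDIO_TRUST_FORWARDED before reading the header.

```python
import ipaddress
from typing import Optional


def normalize_forwarded_addr(raw: str) -> Optional[str]:
    """Reduce a forwarded address to a bare IP literal, or None if it is not one."""
    value = raw.strip().strip('"').strip()
    if value.startswith("["):
        # "[2001:db8::1]:50001" -> "2001:db8::1"
        end = value.find("]")
        if end == -1:
            return None
        value = value[1:end]
    elif value.count(":") == 1:
        # "198.51.100.7:50001" -> "198.51.100.7" (bare IPv6 has two or more colons)
        value = value.split(":", 1)[0]
    try:
        return str(ipaddress.ip_address(value))
    except ValueError:
        return None


def client_ip(x_forwarded_for: Optional[str], direct_host: str) -> str:
    """Use the first X-Forwarded-For element when valid, else the direct peer address."""
    if x_forwarded_for:
        first = x_forwarded_for.split(",", 1)[0]
        normalized = normalize_forwarded_addr(first)
        if normalized:
            return normalized
    return direct_host
```

Stripping the port before keying means a client that opens a new source port per attempt still lands in one bucket.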
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * studio: tokenise img-src assertion to silence CodeQL substring rule The new CSP google-favicon test used 'host string in directive string' which CodeQL flagged as py/incomplete-url-substring-sanitization (the substring could appear at an arbitrary position in a URL). The assertion is checking a CSP directive, not URL sanitisation, but splitting the directive on whitespace and asserting against the tokenised source list expresses the same intent and matches the exact CSP source expression. CodeQL no longer treats it as a URL substring check. Test: python -m pytest studio/backend/tests/test_middleware.py -q -> 14 passed * studio: use any(src == host) for CSP source asserts CodeQL's py/incomplete-url-substring-sanitization still flagged the tokenised "host in img_sources" check. Switching to `any(src == host for src in img_sources)` makes the comparison an exact-equality (not substring) match, which the rule does not flag. Test: python -m pytest studio/backend/tests/test_middleware.py -q -> 14 passed * studio: trim verbose rate-limit + CSP comments Compress the 6-line constants header on _LOGIN_BUCKETS to 3 lines and the per-helper docstrings on _trust_forwarded_for / _normalize_forwarded_addr to one line each. Same code, fewer in-flow tutorials. Note in the CSP comment that www.google.com is the active favicon host (used by sources.tsx for s2/favicons citations); *.gstatic.com stays as legacy faviconV2 coverage but the SPA no longer fetches it. 33 tests in test_login_rate_limit.py + test_middleware.py still pass. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
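The forwarded-address normalisation and bucket-key rules described above can be sketched roughly as follows. This is a minimal illustration, not the actual Studio code: `normalize_forwarded_addr` and `bucket_key` are hypothetical stand-ins for `_normalize_forwarded_addr` and the limiter's key construction, and only the sentinel string is taken from the commit message.

```python
import ipaddress
from typing import Optional, Tuple

# Sentinel named in the commit message for failures against unknown usernames.
UNKNOWN_USER = "\x00unknown-user"


def normalize_forwarded_addr(raw: str) -> Optional[str]:
    """Strip quotes and ports, then validate the value as an IP literal.

    Returns None for garbage so the caller can fall back to the direct
    peer address (request.client.host in the real code).
    """
    value = raw.strip().strip('"').strip()
    if not value:
        return None
    if value.startswith("["):
        # Bracketed IPv6, optionally followed by ":port".
        end = value.find("]")
        if end == -1:
            return None
        value = value[1:end]
    elif value.count(":") == 1:
        # IPv4 "host:port"; a bare IPv6 literal has more than one colon.
        value = value.split(":", 1)[0]
    try:
        return str(ipaddress.ip_address(value))
    except ValueError:
        return None


def bucket_key(ip: str, username: str, user_exists: bool) -> Tuple[str, str]:
    """Per-(ip, username) key; unknown users share one sentinel bucket per IP."""
    if not user_exists:
        return (ip, UNKNOWN_USER)
    return (ip, username.lower())
```

With this shape, a client that rotates source ports or random usernames still lands in a single bucket per IP, which is the property the reviewer follow-ups were after.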
…, current-password input (unslothai#5490) * studio/frontend: wire logout, singleflight refresh, shared 422 helper, current-password input Four frontend follow-ups to unslothai#5375 that the train-api fix in unslothai#5409 did not cover. Log out: features/auth/api.ts:logout() was a synchronous clearAuthTokens() with no call to /api/auth/logout, and the SPA exposed no Log out menu item at all. Refresh tokens stay valid server-side for their entire lifetime even after the user "leaves". logout() is now async and POSTs to /api/auth/logout (best-effort, swallows network errors) so storage.revoke_user_refresh_tokens fires server-side. The account dropdown in components/app-sidebar.tsx gains a Log out item between Help and Shutdown that calls logout() then navigates to /login. refreshSession singleflight: The backend now consumes the refresh token atomically on /api/auth/refresh, so two concurrent refreshes race; the loser 401s and the user is force-logged-out. This reproduces on essentially every page that fires multiple API calls in parallel after access- token expiry. refreshSession now holds a module-level inflight promise: first caller mints it, subsequent callers await the same one, and the slot clears in finally. Shared formatDetail helper: Roland's unslothai#5409 fix lived inside train-api.ts. Other api modules (chat-api.ts, export-api.ts, history-api.ts, datasets-api.ts, recipe-studio/api/index.ts) still rendered FastAPI array-detail 422s as either "Request failed (422)" (chat-api.ts's typeof-string gate) or "[object Object]" (the others). format-fastapi-error.ts lifts the helper into one place: formatFastApiDetail unpacks the array, readFastApiError reads a Response into the best human-readable string. All five sibling api modules now use it. recipe-studio also swaps ?? for the helper's truthy-formatted check so an array detail no longer short-circuits to "[object Object],[object Object]". 
Current password input: features/auth/components/auth-form.tsx in change-password mode showed only New password and Confirm password; currentPassword defaulted to window.__UNSLOTH_BOOTSTRAP__?.password. On admin-forced must_change_password resets the bootstrap is empty and the form short-circuits with "Unable to initialize setup. Reload the page". A Current password input is now rendered in change-password mode, pre-filled from the bootstrap when present so first-boot UX is unchanged. Build: - npm run typecheck clean - npm run build produces a fresh dist - install.sh rebuilds dist on next install.sh --local * studio/frontend: logout refresh-retry, generation guard, two missed 422 sites, password toggle Reviewer follow-ups to the auth-UX PR. Logout server-side revoke missed the expired-access case. /api/auth/ logout requires a valid access JWT and only then calls storage.revoke_user_refresh_tokens(). When the access token had expired but the 7-day refresh token was still valid, logout() posted once, got 401, swallowed it, and cleared local state, leaving the refresh token alive on the server. logout() now retries once: on 401 with a refresh token present, it calls refreshSession() to rotate, then re-posts /api/auth/logout with the new access token. Both branches still clearAuthTokens in finally. In-flight refresh could repopulate localStorage after logout. A background refreshSession() that started before the user clicked Log out, but resolved after the local clear, wrote storeAuthTokens() back over the cleared state and effectively re-authenticated the SPA. Added a module-level logoutGeneration counter: each refresh captures the value on entry, logout() bumps the counter in finally before clearing, and the refresh's continuation drops its new token pair on the floor when the counter has moved. 
Two API client modules kept the pre-unslothai#5409 string-only 422 parser: - features/chat/api/providers-api.ts -> parseErrorText now calls formatFastApiDetail() so create / update / test / models requests surface field-level errors instead of "Request failed (422)". - features/chat/api/openai-containers.ts -> parseError now uses readFastApiError() so ttl_minutes / encrypted_api_key / container_id validation errors surface instead of "HTTP 422". recipe-studio/api/index.ts::uploadUnstructuredFile still had a local typeof-string detail check on both the 413 and the generic not-ok branches. Both branches now use readFastApiError() so array-shaped 422 details show field-level errors instead of a generic fallback. Password reveal toggle in change-password mode shared one showPassword state across Current password and New password, so the eye button on either field exposed both secrets. Added a separate showNewPassword state so New password's toggle is independent of Current password's toggle. Confirm password remains type="password" unconditionally. Test: - npm run typecheck clean - npm run build produces a fresh dist * studio/frontend: drop dynamic auth/api + auth/session imports in sidebar Log out's onSelect dynamically imported logout from "@/features/auth/api" and clearAuthTokens from "@/features/auth/session". Both modules were already statically imported via "@/features/auth" elsewhere in the app, so rolldown split auth/session into its own chunk and the main bundle then re-imported back from that chunk to reach the zustand-backed usePlatformStore. The resulting circular dependency left session.js's 'create' binding undefined at module init, throwing 'TypeError: t is not a function' from var usePlatformStore=create<...> on /login, /change-password, and any route that touches the platform store before the main bundle finished evaluating. 
Static-import logout and clearAuthTokens from "@/features/auth" so both are tree-shaken into the main bundle, eliminating the session side-chunk and the cycle. Exported clearAuthTokens from auth/index.ts since it was previously only reachable through the session.ts path module. Test: - npm run typecheck clean - npm run build no longer emits a session-*.js chunk - Local Playwright pre/post: /login, /change-password, /chat render with 0 page errors on the rebuilt dist (pre: 'TypeError: t is not a function' on every route) * studio/frontend: decouple must_change_password from storeAuthTokens CodeQL's js/clear-text-storage-of-sensitive-information rule traced must_change_password through loginWithPassword() into localStorage.setItem(AUTH_MUST_CHANGE_PASSWORD_KEY, ...) at session.ts:46 and flagged the line as new high-severity. The flag is a boolean derived from the same response payload as the access token, so the data-flow analyser treated it as JWT-equivalent sensitivity. Removed the third parameter from storeAuthTokens so it only writes the two JWTs. Each caller (refreshSession, tauri-auto-auth, two spots in auth-form) now calls setMustChangePassword(...) explicitly with the boolean. The boolean is no longer reachable from a function whose name CodeQL treats as a password sink. Test: - npm run typecheck clean - npm run build produces no session-*.js side-chunk - Local Playwright over /login, /change-password, /chat: 0 page errors (parity with the previous fix) * studio/frontend: suppress CodeQL clear-text-storage on must_change_password flag CodeQL's js/clear-text-storage-of-sensitive-information rule traces the must_change_password boolean back through loginWithPassword's TokenResponse and flags any localStorage.setItem of that boolean as sensitive-clear-text storage. The value is a status flag (route to /change-password vs straight to /chat); it carries no credential material. 
Decoupling setMustChangePassword from storeAuthTokens in the previous commit only moved the alert one line over because the analyser still recognises the source. Add the standard lgtm suppression comment, with a brief rationale, on the .setItem call. Test: npm run typecheck clean, npm run build still produces a fresh dist with no session-*.js side-chunk. * studio/frontend: encode must_change_password as key presence to silence CodeQL setMustChangePassword wrote String(required) which is a derivative of the boolean and which CodeQL's clear-text-storage analyser traces back through loginWithPassword's TokenResponse, flagging the .setItem call as sensitive-information storage. Switch the encoding so the stored value is the literal string "1" when the flag is set, and the key is removed when not. The reader switches from `=== "true"` to a presence check (`!== null`). This breaks the boolean's data flow into .setItem: the value argument is now a constant string literal in the truthy branch and the falsy branch issues .removeItem (no stored value to taint). The behaviour contract is identical (the flag is present iff the user must change their password). Test: npm run typecheck clean, npm run build produces a fresh dist, local Playwright probe over /login, /change-password, /chat: 0 page errors on the rebuilt dist. * studio/frontend: trim verbose comments in auth api + session Compress singleflight + logoutGeneration paragraphs in api.ts from ~9 lines each to ~3. Same logic. Merge mustChangePassword / setMustChangePassword's separate two-paragraph CodeQL rationales into one shared comment above both functions. Typecheck + build still clean.
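The singleflight refresh described for refreshSession is a language-neutral pattern. The sketch below shows it in Python asyncio rather than the SPA's TypeScript, purely as an illustration; `SingleFlight` and its method names are hypothetical and do not appear in the frontend code.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional


class SingleFlight:
    """Share one in-flight call among concurrent callers.

    The first caller starts the underlying coroutine; later callers await
    the same task; the slot clears in `finally` so the next call after
    completion starts a fresh one.
    """

    def __init__(self, fn: Callable[[], Awaitable[Any]]) -> None:
        self._fn = fn
        self._inflight: Optional["asyncio.Future[Any]"] = None

    async def run(self) -> Any:
        if self._inflight is None:
            # First caller mints the shared task.
            self._inflight = asyncio.ensure_future(self._call())
        return await self._inflight

    async def _call(self) -> Any:
        try:
            return await self._fn()
        finally:
            # Clear the slot whether the call succeeded or raised.
            self._inflight = None
```

Because every concurrent caller awaits the same task, a refresh token that the server consumes atomically is only ever presented once per expiry.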
… a synthetic CI test (unslothai#5376) * tests/studio: end-to-end Windows GPU detection mock test (unslothai#5106) Locks in the combined fix from unslothai#5322 + unslothai#5324 with a synthetic Windows scenario that CI runners without GPUs can execute. The test packs the real PyPI win_amd64 wheel layouts (cu12 modular and the new unsuffixed cu13 nvidia/cu13/bin/x86_64 layout) plus the exact filename set of the upstream b9103 cudart-llama-bin-win-cuda bundles, then mocks nvidia-smi output and asserts that: * Studio's nvidia-smi probe parses the CSV and reports the GPU. * After PR unslothai#5322 the install_dir/build/bin/Release/ tree contains all three cudart bundle DLLs alongside llama-server.exe. * After PR unslothai#5324 the PATH built by start_llama_server's win32 branch lists pip nvidia + torch/lib dirs in addition to the binary_dir. * cudart64_X.dll, cublas64_X.dll, and cublasLt64_X.dll are each reachable from at least one PATH entry, with cudart specifically reachable from BOTH the install dir and a pip nvidia dir (defence in depth). * Bare venvs without pip nvidia wheels still work via unslothai#5322's binary_dir drop; pre-unslothai#5322 installs still work via unslothai#5324's PATH augmentation. * A reconstructed pre-PR scenario (cudart absent from binary_dir and pip dirs not on PATH) leaves cudart unreachable, confirming the test would catch a future regression. Bonus housekeeping in studio/install_llama_prebuilt.py: drop the pointless f-prefix on the literal "llama-" in the windows_cuda_attempts pairing guard (no behaviour change; lint nit flagged in the post-merge review). The mocks model real artifact contents I verified empirically: * pip download nvidia-cuda-runtime --platform win_amd64 produces nvidia/cu13/bin/x86_64/cudart64_13.dll. * unzip on the b9103 cudart-llama-bin-win-cuda-13.1-x64.zip produces exactly cudart64_13.dll + cublas64_13.dll + cublasLt64_13.dll, no executables. 
* objdump -p on the b9103 ggml-cuda.dll shows a static PE import on cublas64_13.dll (the root cause of unslothai#5106 when cublas64_13.dll is unreachable). Refs unslothai#5106 unslothai#5322 unslothai#5324 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test_5106_windows_gpu_detection_mock: don't shadow real httpx This file's name sorts before every other file in studio/backend/tests/ (starts with the digit '5'), so pytest collects it first. The previous ``sys.modules.setdefault("httpx", _httpx_stub)`` ran before any other test imported real httpx, which meant the stub permanently shadowed the real module for the rest of the collection. Tests that did ``from httpx import HTTPError, Response`` (test_anthropic_messages, test_browse_folders_route, test_training_*, etc) then failed at collection with ``ImportError: cannot import name 'HTTPError'`` because the stub did not define those names. The existing test_llama_cpp_windows_nvidia_path.py did not trigger the same issue because it sorts after test_a* / test_b* / etc, by which point the real httpx has already been imported and setdefault is a no-op. Switch the stub installation to ``importlib.util.find_spec(name) is None`` so we only fall back to the stub when the real module truly is not installed. Backend CI installs httpx, structlog, and the studio/backend/loggers package is reachable via the sys.path augmentation a few lines above, so on CI all three find_spec calls succeed and no stubs are installed at all. Also add HTTPError and Response to the stub module for the offline case, so anyone running this test outside CI with httpx absent still gets a stub that satisfies the broader test suite's imports. Refs unslothai#5106 * test_5106 + llama_cpp: extract win32 PATH helper and harden the regression test Follow-up to PR unslothai#5376's review feedback. Three real findings from the bot reviewers, plus one stale one. 1. 
(codex P2 line 201, gemini medium line 209) The regression test's _build_path_dirs_like_start_llama_server hand-copied the win32 branch of LlamaCppBackend.start_llama_server, so a future drop or reorder of _windows_pip_nvidia_dll_dirs(sys.prefix) in production would have passed the test silently. Extract a new staticmethod LlamaCppBackend._build_windows_path_dirs (binary_dir, prefix, cuda_path). Production start_llama_server now calls this helper. The test's wrapper is reduced to a one-line delegate that forwards to the staticmethod, so the regression asserts against the exact production logic instead of a parallel copy of it. 2. (codex P2 line 245) test_nvidia_smi_probe_reports_synthetic_gpu did not clear CUDA_VISIBLE_DEVICES. On a shared GPU runner with the variable set in the parent shell, _get_gpu_free_memory() filters the mocked CSV and returns [] or falls through to the torch fallback. Cleared CUDA_VISIBLE_DEVICES and NVIDIA_VISIBLE_DEVICES via monkeypatch.delenv(..., raising=False). 3. (codex P2 line 66) _maybe_stub gated on importlib.util.find_spec ("loggers"), which returns a spec because studio/backend/loggers/ is on sys.path. But the actual import chain loads loggers/handlers.py which does `from fastapi import Request, Response` at module load. In a lightweight env without fastapi installed, the stub never lands and `from core.inference.llama_cpp import LlamaCppBackend` raises during collection. Switched _maybe_stub to a real import attempt under try / except ImportError so the stub falls into place when the package is discoverable but not importable. CI has fastapi so this is purely a developer- machine ergonomics fix. The fourth comment (codex P1 line 85 "Keep the httpx stub from leaking across tests") was already addressed by 7437e73, which replaced the unconditional sys.modules.setdefault with the find_spec-gated _maybe_stub. No code change needed. 
Production behaviour is unchanged: _build_windows_path_dirs returns exactly the same ordering start_llama_server used inline ([binary_dir, *pip_dirs, cuda_bin?, cuda_bin_x64?]). Verification (run inside studio/backend): pytest tests/test_5106_windows_gpu_detection_mock.py -v -> 10 passed pytest tests/test_llama_cpp_*.py tests/test_llama_server_args.py tests/test_5106_windows_gpu_detection_mock.py -q -> 171 passed CUDA_VISIBLE_DEVICES=1 pytest tests/test_5106_windows_gpu_detection_mock.py::TestWindowsGpuDetectionAfter5106Fix::test_nvidia_smi_probe_reports_synthetic_gpu -> 1 passed * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Rename Windows GPU detection test to a generic filename and trim comments - studio/backend/tests/test_5106_windows_gpu_detection_mock.py -> studio/backend/tests/test_windows_gpu_detection_mock.py The file is the generic regression suite for Windows GPU detection; encoding the issue number in the filename is noise. - Shorten module docstring, helper docstrings, per-test docstrings and inline comments in the renamed test file. No behaviour change, all 10 cases still pass. - Shorten the _build_windows_path_dirs docstring in studio/backend/core/inference/llama_cpp.py and update the test-path reference; trim the win32 call-site comment to one line. Local verification: - pytest studio/backend/tests/test_windows_gpu_detection_mock.py -- 10 passed. - pytest studio/backend/tests/test_llama_cpp_windows_nvidia_path.py studio/backend/tests/test_llama_server_args.py studio/backend/tests/test_windows_gpu_detection_mock.py -- 110 passed. * Studio: harden _wait_for_health against transient httpx ReadError The probe loop in LlamaCppBackend._wait_for_health only caught ConnectError and TimeoutException. 
On Windows, when llama-server.exe accepts the TCP probe and then dies before sending HTTP headers, the peer process RST closes the socket. httpx maps this to ReadError ("WinError 10054 -- An existing connection was forcibly closed by the remote host"), which fell through the except clause and bubbled out of _wait_for_health, the routes/inference.py load_model handler, and back to /api/inference/load as an opaque 500. The crash diagnostic Studio actually wants to surface lives on the self._process.poll() branch at the top of the loop body: "llama-server exited with code X. Output: ...". We never reached that branch on the WinError 10054 path because the very first probe blew up. Expand the except to also swallow ReadError and RemoteProtocolError so the next 0.5-second iteration runs the poll() branch. Outcomes: * Process really died: structured exit-code + last-stdout log line. * Single transient probe blip: silently retried; load succeeds. Adds studio/backend/tests/test_llama_cpp_wait_for_health.py with five cases covering happy-path 200, transient ReadError + dead process, RemoteProtocolError + dead process, ConnectError cycling until success, and dead process before the first probe. The new cases would have failed against the old except clause -- ReadError / RemoteProtocolError would have propagated instead of returning False. Found while triaging the Windows Studio GGUF CI flake on this PR's 5a6ddc3 push: llama-server.exe (b9203 prebuilt) crashed within 2.2 s of launch on the GPU-less runner, and Studio reported "WinError 10054" instead of an upstream-tag-attributable exit-code line. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: danielhanchen <michaelhan2050@gmail.com>
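The control flow of the hardened probe loop can be sketched as below. This is an assumption-laden illustration, not the body of LlamaCppBackend._wait_for_health: the exception classes are local stand-ins for the httpx exceptions of the same names, and the `probe`, `poll`, and `sleep` parameters are hypothetical hooks added so the sketch runs without httpx or a real subprocess.

```python
import time
from typing import Callable, Optional


# Local stand-ins for httpx.ConnectError, TimeoutException, ReadError and
# RemoteProtocolError, so the sketch has no third-party dependency.
class ConnectError(Exception): ...
class TimeoutException(Exception): ...
class ReadError(Exception): ...
class RemoteProtocolError(Exception): ...


TRANSIENT = (ConnectError, TimeoutException, ReadError, RemoteProtocolError)


def wait_for_health(
    probe: Callable[[], int],            # returns an HTTP status code
    poll: Callable[[], Optional[int]],   # returns exit code, or None if alive
    timeout: float = 5.0,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Check for a dead process first so the exit-code diagnostic wins
        # over whatever socket error the probe would have produced.
        if poll() is not None:
            return False
        try:
            if probe() == 200:
                return True
        except TRANSIENT:
            # Swallow the blip; the next iteration re-runs the poll() branch.
            pass
        sleep(interval)
    return False
```

The point of the wider except tuple is ordering: a reset connection no longer escapes the loop, so the following iteration reaches the exit-code branch.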
…#5527) * Studio: auto-enable MTP speculative decoding for MTP GGUFs Detect Unsloth's MTP (multi-token-prediction) GGUFs and auto-emit the right --spec-type draft-mtp flags for llama-server (llama.cpp PR #22673), so users get the speedup without configuration. Detection prefers the GGUF metadata field <arch>.nextn_predict_layers (verified on Qwen3.6-27B-MTP-GGUF / qwen35 and Qwen3.6-35B-A3B-MTP-GGUF / qwen35moe). Falls back to a -MTP marker in the identifier / filename so HF-mode loads can detect MTP from the repo name before the GGUF is downloaded. Flag presets follow the Unsloth MTP guide: GPU: --spec-type draft-mtp --spec-draft-n-max 6 CPU/Mac: --spec-type draft-mtp --spec-draft-n-max 3 \ --spec-type ngram-mod --spec-ngram-mod-n-match 24 \ --spec-ngram-mod-n-min 48 --spec-ngram-mod-n-max 6 User overrides win: if the caller passes --spec-type / --spec-default via unsloth run / unsloth studio run pass-through (or HTTP llama_extra_args), the auto-emit steps aside so llama-server only sees the user's flag. Scalar tuning knobs like --spec-draft-n-max compose with the auto preset via llama-server's last-wins parsing. _already_in_target_state mirrors the same promotion so a repeat /load with unchanged settings against an MTP backend running draft-mtp short-circuits cleanly instead of forcing a reload. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
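The preset selection and the user-override rule can be sketched as follows. The flag values are the ones listed in the commit message; the function name `mtp_spec_args` and its parameters are hypothetical and only illustrate the decision, not the real load_model code.

```python
from typing import List, Sequence

# Presets as listed in the commit message.
GPU_PRESET = ["--spec-type", "draft-mtp", "--spec-draft-n-max", "6"]
CPU_PRESET = [
    "--spec-type", "draft-mtp", "--spec-draft-n-max", "3",
    "--spec-type", "ngram-mod", "--spec-ngram-mod-n-match", "24",
    "--spec-ngram-mod-n-min", "48", "--spec-ngram-mod-n-max", "6",
]

# Flags that make the auto-emit step aside entirely.
USER_OVERRIDES = ("--spec-type", "--spec-default")


def mtp_spec_args(is_mtp: bool, has_gpu: bool, extra_args: Sequence[str]) -> List[str]:
    """Return the auto-emitted speculative-decoding flags, or [] if none apply."""
    if not is_mtp:
        return []
    for arg in extra_args:
        # Accept both "--flag value" and "--flag=value" spellings.
        if arg.split("=", 1)[0] in USER_OVERRIDES:
            return []
    return list(GPU_PRESET if has_gpu else CPU_PRESET)
```

Scalar knobs such as --spec-draft-n-max are deliberately not treated as overrides here, matching the note that they compose with the preset through llama-server's last-wins parsing.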
* Studio: warn when llama.cpp prebuilt is too old for MTP Layered on unslothai#5527. Adds a one-shot llama-server --help capability probe so users get a clear signal when their prebuilt is missing MTP support, plus a graceful fallback if they load an MTP GGUF against an outdated binary. What's surfaced: 1. Startup log + stderr line in main.py:lifespan() if MTP isn't advertised: WARNING: llama.cpp prebuilt is missing MTP support (--spec-type mtp / draft-mtp). Run `unsloth studio update` to refresh it. MTP GGUFs will load without speculative decoding. 2. Load-time graceful fallback in load_model's spec block: skip the auto-emit and log a clear warning instead of letting llama-server fail with an unknown-flag error. 3. /api/inference/status now returns llama_cpp_supports_mtp: bool so the frontend can show a banner / popup. Probe internals: - Class-level cache keyed on (binary_path, mtime). One subprocess call the first time, instant thereafter. Touching the binary (e.g. via `unsloth studio update`) invalidates the cache automatically because the mtime changes, so the new build is picked up without restarting the server. - Recognises both upstream naming forms: the original draft-mtp from llama.cpp PR #22673 and the renamed mtp variant in later commits. - Spec block uses whichever token the binary accepts so we emit the right value regardless of which release the user has. Tests: - 6 new cases in test_llama_cpp_mtp_detection.py covering each probe variant (draft-mtp, renamed mtp, pre-MTP build, missing binary, mtime-based cache invalidation). - Existing 38 MTP detection cases still pass; broader 188-test regression suite (server args, reload inheritance, gguf metadata, load progress, context fit, model validation) still green. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…thai#5529) * Studio: warn when llama.cpp prebuilt is at least 3 days behind Layered on unslothai#5528. Generalises the MTP-specific staleness warning to every llama.cpp prebuilt update, not just the ones that add MTP. If the installed prebuilt is at least 3 days old AND its tag differs from the latest published tag on the helper release repo (default unslothai/llama.cpp), Studio nudges the user to run "unsloth studio update". How it works Reads the install marker UNSLOTH_PREBUILT_INFO.json that install_llama_prebuilt.py already writes to install_dir. The marker carries the installed tag, the helper repo, and an installed_at_utc timestamp. Studio compares those against the latest published tag from the GitHub releases API for the helper repo. GitHub fetch is cached at two levels: - Process-level memo for /status hot path. - Disk-level cache (24h TTL) at ~/.unsloth/studio/cache/llama_cpp_freshness/ so cold-start Studio launches do not always hit the API. On a transient fetch failure (offline, rate-limited) we keep the last-good disk value alive rather than poisoning the cache with None. The check fails open: if anything is missing (marker, timestamp, GitHub response), stale stays False so users never see a misleading banner. Surfaced in two places 1. Startup banner (logs + stderr) in main.py:lifespan(), alongside the MTP capability probe added in unslothai#5528. Single line, e.g.: WARNING: llama.cpp prebuilt is 5 days behind: installed b9190, latest b9300. Run "unsloth studio update" to refresh. 2. /api/inference/status now returns: llama_cpp_prebuilt_stale: bool llama_cpp_installed_tag: str | None llama_cpp_latest_tag: str | None so the frontend can render a banner / popup with the actual tag delta the user is missing. 3-day threshold Mirrors the typical Unsloth llama.cpp release cadence. Anything shorter would nag users who restart Studio at the wrong moment; longer leaves real bugs sitting on the user's machine. 
Configurable via the threshold_days kwarg if a future call site wants a different window. Tests 17 new cases in tests/test_llama_cpp_freshness.py cover marker discovery in both cmake and root install layouts, missing / invalid marker, GitHub fetch caching across process restarts (disk cache hit after the in-memory cache is reset), the stale / not-stale decision matrix (tag mismatch + age threshold), fail-open behaviour when GitHub is unreachable, custom threshold, singular/plural day in the warning string, and unparseable installed_at_utc. The broader 205-test inference regression suite still passes. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
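The stale / not-stale decision reduces to a small pure function. The sketch below assumes installed_at_utc is an ISO 8601 string, which the commit message does not specify; `is_prebuilt_stale` and its `now` parameter are hypothetical names used for illustration only.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_prebuilt_stale(
    installed_tag: Optional[str],
    latest_tag: Optional[str],
    installed_at_utc: Optional[str],
    now: Optional[datetime] = None,
    threshold_days: int = 3,
) -> bool:
    """Stale only if the tags differ AND the install is old enough.

    Fails open: any missing or unparseable input yields False so users
    never see a misleading banner.
    """
    if not installed_tag or not latest_tag or not installed_at_utc:
        return False
    try:
        # Assumption: ISO 8601 timestamp, possibly with a trailing "Z".
        installed_at = datetime.fromisoformat(installed_at_utc.replace("Z", "+00:00"))
    except ValueError:
        return False
    if installed_at.tzinfo is None:
        installed_at = installed_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    old_enough = (now - installed_at) >= timedelta(days=threshold_days)
    return installed_tag != latest_tag and old_enough
```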
…unslothai#5512) * studio: extend offline DNS auto-detect to inference parent + training unslothai#5505 fixed the GGUF/llama-server load path. Studio still has two adjacent code paths that burn ~30-60s of soft-failed timeouts before the worker subprocess starts when DNS to huggingface.co is dead and the model is already in the local HF cache. Inference parent process (routes/inference.py:load_model): * ModelConfig.from_identifier now runs inside _hf_offline_if_dns_dead so the LoRA-detect hf_model_info call and the urllib config probes in utils/transformers_version.py short-circuit when DNS is dead. * utils/models/model_config.py: extracted the inline HF_HUB_OFFLINE/ TRANSFORMERS_OFFLINE check used by list_gguf_variants and detect_gguf_model_remote into a shared _env_offline() helper, then reused it to gate the LoRA-detect hf_model_info call. * utils/transformers_version.py: _check_tokenizer_config_needs_v5 and _check_config_needs_550 now early-return False when offline instead of issuing a 10s urllib.urlopen against huggingface.co/raw/main. Training worker (core/training/worker.py:run_training_process): * Add the same 2s DNS probe used by core/inference/worker.py at the top of the training subprocess. On failure, set HF_HUB_OFFLINE, TRANSFORMERS_OFFLINE, and HF_DATASETS_OFFLINE before the rest of the subprocess imports torch/transformers/unsloth, so every from_pretrained, snapshot_download, and load_dataset call below resolves from cache. Scope is per-subprocess; the orchestrator always spawns a fresh worker per training run. Training trainer (core/training/trainer.py:load_model): * Skip the proactive hf_model_info gated-repo probe when _env_offline() is true. The API is unreachable anyway, and a gated model that is already cached is exactly the scenario the user is trying to train against. from_pretrained surfaces the real error if access is actually denied. 
Tests (tests/test_offline_inference_parent.py, 7 new cases): * _env_offline truthy/falsy parsing across HF_HUB_OFFLINE and TRANSFORMERS_OFFLINE. * transformers_version urllib short-circuit when offline. * LoRA detect hf_model_info skip when offline. Existing tests/test_offline_gguf_cache_fallback.py still passes (26 cases) because the inline env check was extracted, not changed. * tests: prefer real httpx over stub in offline-test files The studio test stub convention only included the 6 httpx exception names that existing callers needed. Newer huggingface_hub (1.15+) imports HTTPError, Response, Request, HTTPStatusError, AsyncClient, and more at module import time. When httpx is truly absent the stub chase becomes a treadmill. Use the real package when installed (the CI install list already includes httpx, so this is the production environment). Fall back to the stub only when httpx is genuinely missing. No code under test changes. * studio: detect cached LoRA adapters offline; tighten test Two follow-ups from the review pass on unslothai#5512: * ModelConfig.from_identifier no longer skips the remote LoRA-detect hf_model_info call when _env_offline() is true. huggingface_hub short-circuits the call via OfflineModeIsEnabled in ~0ms when HF_HUB_OFFLINE is set, so the original 25s concern was moot once routes/inference.py wrapped the call in _hf_offline_if_dns_dead. Skipping the API meant users with a cached LoRA adapter (adapter_config.json on disk) got is_lora=False and the load failed. After the API call (which raises fast offline) a new cache-fallback walks the HF cache snapshot for adapter_config.json via the existing _iter_hf_cache_snapshots helper. * test_hf_model_info_not_called_when_offline replaced. The old test raised AssertionError inside production code that catches Exception, so it passed even if the call happened. 
New tests use MagicMock and assert call_count >= 1, plus a fixture that stages a fake HF cache with adapter_config.json to verify the offline cache detection. Test count goes from 7 to 8 in test_offline_inference_parent.py. Combined with test_offline_gguf_cache_fallback.py: 34 pass in 9.75s. * Fix/adjust offline training DNS probe per PR unslothai#5505 review Same fix as unslothai#5505's _probe_dns_dead refactor: run gethostbyname on a daemon thread with join timeout so concurrent sockets in the parent interpreter never inherit a process-wide socket.setdefaulttimeout mutation. Adds a static-pin regression test that the inference parent file does not regress on this. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Trim verbose code comments per review feedback Shorten the longer explanatory comments added by this PR while keeping the WHY of each non-obvious branch: - trainer.py: collapse the 5-line proactive gated-check comment. - training/worker.py: trim the offline auto-detect preamble and the "logger isn't configured" note. - routes/inference.py: shorten the DNS-probe wrap rationale. - transformers_version.py: collapse the two urllib short-circuit notes. - model_config.py: shorten the LoRA detect + cache-fallback notes. - tests/test_offline_inference_parent.py: tighter module docstring, trim class docstrings, drop multi-line explainer comments inside the tests; behaviour and coverage unchanged (9/9 tests still pass). --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
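The thread-based DNS probe can be sketched as below. It is an illustration of the technique named in the commit (gethostbyname on a daemon thread with a join timeout), not the real _probe_dns_dead: the `resolver` parameter is a hypothetical hook for testing, and setting the offline variables to "1" is an assumption about the value used.

```python
import socket
import threading
from typing import Callable, Dict, MutableMapping


def probe_dns_dead(
    host: str = "huggingface.co",
    timeout: float = 2.0,
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> bool:
    """Return True if `host` fails to resolve within `timeout` seconds.

    The lookup runs on a daemon thread and is bounded by join(timeout),
    so no process-wide socket.setdefaulttimeout mutation is needed and a
    hung resolver cannot block interpreter exit.
    """
    result: Dict[str, bool] = {}

    def _resolve() -> None:
        try:
            resolver(host)
            result["ok"] = True
        except OSError:  # socket.gaierror is a subclass of OSError
            result["ok"] = False

    worker = threading.Thread(target=_resolve, daemon=True)
    worker.start()
    worker.join(timeout)
    # Still running after the timeout counts as dead.
    return not result.get("ok", False)


def apply_offline_env(env: MutableMapping[str, str]) -> None:
    """Set the three offline switches named in the commit message."""
    for name in ("HF_HUB_OFFLINE", "TRANSFORMERS_OFFLINE", "HF_DATASETS_OFFLINE"):
        env[name] = "1"  # assumed value
```

In a worker subprocess the second helper would be applied to os.environ before torch, transformers, or unsloth are imported, since those libraries read the variables at import time.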
* Fix ORPO text tokenization with processors * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Guard ORPO tokenizer rewrite anchor * Resolve processor pad_token_id and preserve preference data collators for ORPO Two follow-ups so the text-only ORPO + VL processor path works end to end on top of the build_tokenized_answer and tokenize_row rewrites: 1. Add orpo_trainer_processor_pad_token to rewrite processing_class.pad_token_id in ORPOTrainer.__init__ to fall back to processing_class.tokenizer.pad_token_id when the processor itself has no pad_token_id (Qwen3-VL, Gemma-3, etc.). Without this, DPODataCollatorWithPadding(pad_token_id=processing_class.pad_token_id) raises AttributeError before training starts. 2. Stop the outer UnslothORPOTrainer.__init__ collator-swap from clobbering DPODataCollatorWithPadding when the tokenizer is a processor without .pad. The swap to TransformersDataCollatorForLanguageModeling is now only applied to LM-style collators, so ORPO/DPO/CPO/KTO keep their own prompt/chosen/ rejected handling. Otherwise the collator can't pad ORPO rows and raises "You should supply an encoding ... that includes input_ids" at train time. Verified with Qwen3-VL-2B-Instruct ORPO + text-only data (training completes to max_steps, no AttributeError, no collator error) and Llama-3.2-1B-Instruct ORPO (losses and grad-norms bit-exact identical to main, so the change is a true no-op for plain text tokenizers). Extends tests/python/test_orpo_processor_text_tokenizer.py with three new unit tests covering the pad_token_id rewriter. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Wasim Yousef Said <wasimysdev@gmail.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>
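The pad_token_id fallback in item 1 amounts to the lookup sketched below. The actual change rewrites the expression inside ORPOTrainer.__init__; `resolve_pad_token_id` is a hypothetical standalone helper that only illustrates the lookup order.

```python
from typing import Any, Optional


def resolve_pad_token_id(processing_class: Any) -> Optional[int]:
    """Prefer the processor's own pad_token_id, else its inner tokenizer's.

    Plain text tokenizers expose pad_token_id directly; VL processors may
    only expose it on the wrapped `.tokenizer`.
    """
    pad_id = getattr(processing_class, "pad_token_id", None)
    if pad_id is not None:
        return pad_id
    tokenizer = getattr(processing_class, "tokenizer", None)
    return getattr(tokenizer, "pad_token_id", None)
```

For a plain tokenizer the first branch returns immediately, which is consistent with the reported no-op behaviour on Llama-3.2-1B-Instruct.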
… Ubuntu 24.04 (unslothai#5517) * fix(studio/worker): inject --gcc-install-dir for HIP source builds on Ubuntu 24.04 On Ubuntu 24.04 + ROCm clang-20, the HIP source-build fallback in `_install_package_wheel_first` (causal-conv1d, mamba-ssm source fallback, flash-attn source fallback) dies at: /opt/rocm-X.Y/lib/llvm/lib/clang/20/include/__clang_hip_runtime_wrapper.h:112:10: fatal error: 'cstdlib' file not found Root cause: clang-20 picks the highest-numbered /usr/lib/gcc/x86_64-linux-gnu/<N> runtime dir by default. On 24.04 that's gcc-14, whose runtime objects ship in the gcc-14 package but whose C++ headers (/usr/include/c++/14) come from libstdc++-14-dev — NOT in the default apt set. libstdc++-13-dev IS in the default set, so /usr/include/c++/13 exists. clang has no way to discover that asymmetry and the build fails. Fix: new `_hipcc_gcc_install_dir()` helper iterates gcc 14 → 11 and returns the first /usr/lib/gcc/x86_64-linux-gnu/<N> dir where BOTH the runtime AND /usr/include/c++/<N> exist. The HIP branch of `_install_package_wheel_first` appends `--gcc-install-dir=<that path>` to HIPCC_COMPILE_FLAGS_APPEND before invoking pip. Respects an existing `--gcc-install-dir` in the env var (user-set takes precedence); preserves any other flags the user has set (appends to the end rather than overwriting). No-op on non-HIP, non-Linux, non-x86_64. Mirrors the same fix bbf004c added to studio/setup.sh for the llama.cpp HIP build branch (unslothai#5301), but via env var since pip-driven source builds can't take CMake flags directly. Verified on Ryzen AI MAX+ 395 / Radeon 8060S (gfx1151) / Ubuntu 24.04 / ROCm 7.13 nightly: `_hipcc_gcc_install_dir()` returns `/usr/lib/gcc/x86_64-linux-gnu/13`, which matches the manual workaround that already lets `pip install causal-conv1d` succeed on this hardware. 
Tests added (9 new in test_training_worker_flash_attn.py):
- test_hipcc_gcc_install_dir_picks_highest_with_headers
- test_hipcc_gcc_install_dir_picks_14_when_headers_exist
- test_hipcc_gcc_install_dir_returns_none_when_no_match
- test_hipcc_gcc_install_dir_returns_none_on_non_linux
- test_hipcc_gcc_install_dir_returns_none_on_non_x86_64
- test_install_injects_gcc_install_dir_on_hip_source_build
- test_install_appends_to_existing_hipcc_compile_flags
- test_install_respects_user_gcc_install_dir
- test_install_does_not_inject_env_on_cuda
Per @danielhanchen's suggestion in unslothai#5434 (comment)
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* review: apply gemini-code-assist suggestion on _run_kwargs env handling
Use _run_kwargs.get("env", os.environ).copy() + key-mutation instead of
rebuilding env from os.environ directly. Today both forms are equivalent
(no earlier code in _install_package_wheel_first sets _run_kwargs["env"]),
but the .get().copy() pattern survives any future env modification added
upstream of this block without silently throwing it away.
No behavioural change; tests already assert the final
HIPCC_COMPILE_FLAGS_APPEND value, not the env-construction pattern.
Per unslothai#5517 (comment)... (gemini-code-assist[bot])
---------
Co-authored-by: h34v3nzc0dex <h34v3nzc0dex@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
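The selection rule described for `_hipcc_gcc_install_dir()` (walk gcc 14 down to 11 and take the first version whose runtime dir and C++ header dir both exist), plus the append-without-overwriting handling of HIPCC_COMPILE_FLAGS_APPEND, could look roughly like the sketch below. The function names `pick_gcc_install_dir` and `append_gcc_install_dir` and their parameters are hypothetical, and the non-Linux / non-x86_64 no-op gate from the PR is left out so the logic stays platform-independent.

```python
import os


def pick_gcc_install_dir(
    gcc_root="/usr/lib/gcc/x86_64-linux-gnu",
    include_root="/usr/include/c++",
    versions=(14, 13, 12, 11),
):
    """Return the first <gcc_root>/<N> whose headers <include_root>/<N> also exist."""
    for version in versions:
        runtime_dir = os.path.join(gcc_root, str(version))
        header_dir = os.path.join(include_root, str(version))
        if os.path.isdir(runtime_dir) and os.path.isdir(header_dir):
            return runtime_dir
    return None


def append_gcc_install_dir(env, install_dir):
    """Return a copy of `env` with --gcc-install-dir appended when appropriate."""
    env = dict(env)
    existing = env.get("HIPCC_COMPILE_FLAGS_APPEND", "")
    # A user-supplied --gcc-install-dir takes precedence; other flags are preserved.
    if install_dir is None or "--gcc-install-dir" in existing:
        return env
    env["HIPCC_COMPILE_FLAGS_APPEND"] = f"{existing} --gcc-install-dir={install_dir}".strip()
    return env
```

Working on a copy of the environment mapping mirrors the `.get("env", os.environ).copy()` pattern mentioned in the review follow-up: the caller's environment is never mutated in place.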
* Studio: gate image input on a usable mmproj for GGUF vision models * Improve image gating and model capability sync Tighten image-handling and model capability syncing across the chat flow. Key changes: - chat-adapter: Replace per-message current-user image check with a simpler gate that blocks if ANY image is present in the outbound payload when the selected model cannot handle vision. Show the toast reason and flip the per-thread running flag on→off to avoid hanging wait promises before throwing. - shared-composer: Simplify and correct image-attachment gating for single vs compare modes. Use an attach-time gate that defers to send/ensureModelLoaded in compare mode, introduce attachUnavailableReason, and only block immediately for single-mode. Remove an unused models selector. - shared-composer: Sync the runtime models[] entry with the response from ensureModelLoaded so UI/send gates read fresh capabilities (isVision, isGguf, isAudio, audioType, hasAudioInput). This addresses catalog lag (e.g., GGUF mmproj arriving after the catalog snapshot). - UX tweak: the file-picker button no longer outright blocks on image availability; addFiles still filters images per-file and toasts appropriately. These changes prevent mid-stream server rejections, avoid deadlocks, and ensure model capability checks are accurate when attaching images or audio. * studio: only pass --mmproj to llama-server when effective_is_vision When a text-only GGUF (static is_vision=False) was paired with a family-matching mmproj path, the launcher appended both --mmproj and --spec-default, leaving llama-server in an inconsistent state while Studio reported is_vision=False. Gate the --mmproj flag on effective_is_vision so the launch command tracks the runtime capability the rest of Studio sees. 
* studio: reject image content in streaming /v1/responses for non-vision GGUF _responses_stream forwards the OpenAI request body directly to llama-server's /v1/chat/completions, bypassing the image-vs-vision guard that openai_chat_completions enforces for the wrapped path. Add the same check at the top of the streaming entry point so an SDK client that posts an image to a non-vision GGUF receives a typed 400 instead of an opaque downstream error. * studio: gate external chat providers in the image input helper External selections (cohere, deepseek, mistral, openrouter, ...) live in externalProviders, not in runtime.models[], so activeModel is undefined for them and the helper short-circuited to allow. Result: images attached to a non-vision external chat model were dropped silently downstream instead of rejected up front. Add providerTypeSupportsVision to external-providers.ts (false for known text-only providers, true for known vision-capable ones, null for unknown / custom self-hosted) and thread externalSupportsVision + externalModelLabel through the helper. shared-composer.tsx, runtime-provider.tsx (VisionImageAdapter.add), and chat-adapter.ts pre-stream gate all resolve the provider type and pass it. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…othai#5496) * studio/install: fix mac desktop shortcut spawning and lifecycle The macOS .app generated by install.sh ships a shell-shim wrapper that is unsigned and has no NSAppleEventsUsageDescription in its Info.plist, so AppleEvents from the bundle are denied by TCC. The launcher's `osascript ... tell application "Terminal" to do script ...` call silently fails and the script falls back to the headless nohup branch, where the user sees no Terminal window at all. Each click of the Desktop shortcut then leaks an unattached server (no PID file, no cleanup) and the launcher times out after 60s without ever opening a browser. Replace the AppleScript spawn with a `.command` file + `open -a Terminal`. Terminal handles `.command` natively through Launch Services, no AppleEvents permission required, works with unsigned bundles. The new design also decouples the studio server from the Terminal: - Server is started via nohup, detached from any TTY. Warm relaunches (server still alive) hit the existing fast path: the launcher's `_find_healthy_port` returns the running port and the browser opens in ~80ms with no Terminal involvement. - The `.command` file is a log viewer (`tail -F` of studio.log), not the server's parent. It also runs a watcher subshell that polls the server PID and kills `tail` when the server exits. This means clicking "Stop server" in the UI causes the Terminal window to drop to no-running-processes state, so the user can close the window without the "Do you want to terminate running processes" dialog. - A trap on HUP/INT/TERM/EXIT in the `.command` file sends SIGTERM (then SIGKILL at +0.5s) to the server PID, so closing the Terminal window also stops Studio. Best of both worlds: fast warm relaunch AND "close terminal == quit Studio". Also: - Drop POLL_INTERVAL_SEC from 1 to 0.25. With Python studio startup at ~2s, the 1s poll added up to 1s of slack between server-ready and browser-open. 0.25s tightens cold-launch latency at no meaningful CPU cost. 
- Refuse to install the `.app` bundle through a symlink. If a prior install (e.g. a --tauri build) left $HOME/Applications/Unsloth\\ Studio.app as a symlink, mkdir -p follows it and writes the new bundle contents through to the target. Detect and rm the symlink before mkdir -p. Test plan: - Existing studio-mac-update-smoke.yml CI runs install.sh end-to-end on macos-14 and asserts /api/health returns healthy. - Manual: click Desktop shortcut from cold state, Terminal opens with logs streaming, browser opens at ~2s. Re-click while Studio still running, browser opens in <200ms, no new Terminal. Click "Stop server" in the UI, Terminal closes cleanly with no prompt. Close Terminal via Cmd+W, server stops within 1s. * studio/install: trim verbose comments in _spawn_terminal * studio/install: harden trap quoting in generated .command The trap bodies in the .command file were written with broken quoting: trap "rm -f "$PID_FILE" 2>/dev/null" EXIT Shell parses this as three concatenated tokens ("rm -f " + unquoted $PID_FILE + " 2>/dev/null") then runs the trap. With paths that contain spaces, the unquoted expansion word-splits and the rm either no-ops or removes the wrong path. Default $HOME has no spaces so the bug is latent, but it should be space-safe. Switch both trap bodies to single-quoted form so $WATCHER_PID, $TAIL_PID, and $PID_FILE expand at signal time inside properly quoted positions. Shellcheck-clean on the generated .command. * studio/install: exec studio in nohup wrapper so PID is the server Without the explicit exec, `nohup sh -c "$_cmd"` runs `_cmd` as a child of the wrapper shell. Whether sh exec-optimizes that single command is shell-specific (macOS /bin/sh does, dash does, some bash configurations do not). 
When the optimization does not fire, `$!` records the wrapper PID rather than the studio PID, so: - the watcher in the generated .command monitors the wrapper, not the actual studio process; closing the Terminal can leave studio running if the wrapper exits first - SIGTERM from shutdown_studio goes to the wrapper rather than the server Force the replacement with exec so the recorded PID is always the studio process regardless of shell version. Flagged by both gemini-code-assist and codex in PR review; verified correct. * Fix orphan-on-spawn-failure, graceful kill, and nested symlink for PR unslothai#5496 Three issues found while testing the new macOS spawn path: 1. _spawn_terminal returned 0 even when 'open -a Terminal' failed, so the nohup'd server was left orphaned with no Terminal owner. Wrap the .command write + chmod + open chain in 'if {...}; then return 0; fi', and on failure SIGTERM the orphan (with a 3s grace) before falling through to the generic terminal-spawn fallback. 2. The generated .command sent SIGKILL only 0.5s after SIGTERM, shorter than studio/backend/run.py's _graceful_shutdown windows (5s inference + 5s export). Wait up to 12s for the server to exit on its own. 3. The .app symlink guard only checked the top-level path. If a prior corrupted install left Unsloth Studio.app/Contents (or its MacOS or Resources children) as a symlink, mkdir -p still wrote through them. Check all four bundle paths, and refuse to continue if the bundle path exists as a regular file. --------- Co-authored-by: Daniel Han <info@unsloth.ai>
* studio: add uninstall.sh and document it in README
The current uninstall guidance in README.md is `rm -rf ~/.unsloth/studio`,
which leaves behind everything that lives outside that path:
- ~/.local/share/unsloth/ (launcher script, studio.conf, studio.log,
icon assets)
- ~/Applications/Unsloth Studio.app (macOS bundle, orphaned and
pointing nowhere on next reinstall)
- ~/Desktop/Unsloth Studio (broken symlink after the bundle is gone)
- ~/Desktop/unsloth-studio.desktop (Linux)
- ~/.local/share/applications/unsloth-studio.desktop (Linux)
- /tmp/unsloth-studio-launcher-<uid>*.lock (lock dir, possibly stale)
- Launch Services cache entry for ai.unsloth.studio on macOS
- Any running `unsloth studio -p N` processes
Users who follow the documented uninstall and reinstall end up with the
new launcher layered on top of stale state from the previous install,
which has produced concrete bugs (e.g. self-referential symlink inside
the .app bundle after a reinstall over leftover state).
Add uninstall.sh at the repo root that handles all of the above, and
update README.md to point at it as the recommended path. The plain
`rm -rf ~/.unsloth/studio` line is kept as a "partial uninstall, keep
launcher for a later reinstall" alternative. The model cache at
~/.cache/huggingface is intentionally left untouched, with a note in
the script suggesting how to remove it if desired.
Script is POSIX sh, idempotent (every removal is gated on existence
and uses `2>/dev/null || true`), and handles macOS, Linux, and WSL.
Windows is intentionally not covered here; the existing PowerShell
Remove-Item line in README is kept for that.
* studio: trim uninstall.sh header
* studio: address PR review feedback on uninstall.sh
Four findings from automated review, all verified real:
1. pkill pattern only matched `-p N`, not `--port N`. Studio
instances launched with the long option form survived the
uninstall. Fix: run two pkill passes, one for each form, with
`[ =]` covering both space and `=` separators.
2. CLI shim at ~/.local/bin/unsloth (symlink into the venv created
by install.sh:2167) was left behind, becoming a broken symlink
after the venv directory is removed. Fix: add it to the removals.
3. Custom install roots via UNSLOTH_STUDIO_HOME / STUDIO_HOME were
not removed. install.sh records the install location in
~/.local/share/unsloth/studio.conf as UNSLOTH_EXE; parse it,
derive the root as three dirnames up, and remove the root if it
is non-default.
4. On WSL the installer creates 'Unsloth Studio.lnk' on the Windows
Desktop and Start Menu Programs folder via powershell.exe.
Mirror that path on uninstall by invoking powershell.exe to
Remove-Item the same two locations. Best-effort, gated on
powershell.exe being available.
Tests (T2.8b, T2.15, T2.16, T2.17, T2.18, T2.5b) added behind the
scenes; all pass on macOS Darwin 25.3. The script parses under
`dash -n` and `sh -n` and is shellcheck-clean (SC2016 suppressed on
the PowerShell single-quoted heredoc, since the $env: expansions must
remain literal to the shell so PowerShell receives them verbatim).
* studio: harden uninstall.sh against env-mode and shim collisions
- Honor UNSLOTH_STUDIO_HOME / STUDIO_HOME at uninstall time and read
env-mode studio.conf at $<root>/share/studio.conf, not just the
default-mode conf under $HOME/.local/share/unsloth/. Without this,
installs done with a custom STUDIO_HOME leak the install tree even
when the env var is re-exported.
- Guard the custom-root resolver against "/" and empty so a corrupted
studio.conf (UNSLOTH_EXE='/etc/passwd' or similar) or an
UNSLOTH_STUDIO_HOME=/ cannot trick the script into rm -rf'ing root.
- Only remove $HOME/.local/bin/unsloth when it is a symlink resolving
to a Studio venv. pyproject.toml declares unsloth as a console
script, so pip install --user unsloth places a regular file at the
same path; the previous unconditional rm wiped that unrelated CLI.
- When neither env var is set, print a tail hint so users with custom
install roots know to re-run with the variable.
Verified with a sandboxed harness covering 24 scenarios (default and
env-mode installs across macOS / Linux / WSL, idempotency, hostile
lockfile names, path-traversal attempts, malformed conf, pkill long
and short forms, pip-conflict shim, broken-symlink bundle path).
Script remains POSIX (shellcheck -s sh clean, runs under /bin/dash).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Refuse non-Studio uninstall roots and tighten process matching for PR unslothai#5497
Three issues found while testing custom-root paths and process cleanup:
1. UNSLOTH_STUDIO_HOME=$HOME sh uninstall.sh rm -rf'd $HOME (same for
STUDIO_HOME and parent-of-$HOME). install.sh accepts any writable
directory for STUDIO_HOME, so the uninstaller must validate ownership
before deletion. _is_studio_root accepts a candidate root only if it
contains share/studio.conf, an unsloth_studio/ directory, or a
bin/unsloth shim pointing into unsloth_studio/bin. _is_unsafe_root is
a defense-in-depth deny list (/, $HOME, $HOME's parent, system paths).
2. pkill -f patterns "unsloth studio.*-p[ =][0-9]" over-matched on argv
substrings. A user running `less notes.md` whose filename contained
"unsloth studio ... -p N" had their less killed. New patterns anchor
on /unsloth_studio/bin/ so only processes whose actual exe lives in a
Studio venv match.
3. pkill missed processes that exec into studio/backend/run.py --port N
(the post-exec form when the unsloth CLI replaces itself). Added a
third pattern for that shape, and prefer PID files written by
install.sh's _spawn_terminal (studio-$port.pid in DATA_DIR) over
argv matching for installs that have them.
* Tighten ownership guards from review round for PR unslothai#5497
Three findings from the second reviewer round:
1. _is_studio_root accepted any directory containing an unsloth_studio/
subdir as Studio-owned. A user workspace that happens to contain a
folder named unsloth_studio/ would be deleted. install.sh's env-mode
guard at install.sh:1358-1361 already requires .unsloth-studio-owned
before treating the venv as replaceable. Mirror that: require the
owner marker, share/studio.conf, or the bin/unsloth shim target.
2. The pkill -f fallback patterns were global, so uninstalling install A
would also kill install B's running server. Scope each pattern to the
actual install root being removed by interpolating the root path into
the regex. Also adds a third pattern shape for `unsloth studio` with
no -p / --port flag (the CLI default-port form).
3. Desktop/Unsloth Studio is created by install.sh as a symlink to the
.app bundle. If a user has a regular directory by that name (photos,
notes, etc.), the previous _remove_path call rm -rf'd it. Now we only
remove it when it is a symlink or does not exist.
* Canonicalize env roots and honor UNSLOTH_STUDIO_HOME precedence for PR unslothai#5497
Two findings from the latest review round:
1. Canonicalize env-derived roots before the safety check. The deny list
only string-compares against $HOME, so a syntactic variant like
UNSLOTH_STUDIO_HOME=$HOME/../$USER (or trailing slash, or relative
path) bypassed _is_unsafe_root even though it resolves to $HOME. Now
_emit runs CDPATH= cd -P -- + pwd -P first, so all variants normalize
to the same canonical path before the deny check. Also added the same
tilde expansion install.sh's _resolve_studio_destinations does.
2. Mirror install.sh's env-var precedence (install.sh:282-290). When
both UNSLOTH_STUDIO_HOME and STUDIO_HOME are set, install.sh resolves
only UNSLOTH_STUDIO_HOME and ignores STUDIO_HOME. Uninstall was
emitting both, so running uninstall.sh for install A would also
delete install B if the user had a stale STUDIO_HOME pointing at B.
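The two findings above (canonicalize before the deny check, and let UNSLOTH_STUDIO_HOME win over STUDIO_HOME) are implemented in POSIX sh in uninstall.sh. The Python sketch below only illustrates the logic; the function names are hypothetical, "set" is assumed to mean non-empty, os.path.realpath stands in for the script's `cd -P` + `pwd -P`, and the deny list is reduced to /, $HOME, and $HOME's parent (the system-path entries are omitted).

```python
import os


def resolve_uninstall_root(env):
    """Pick the env-derived root the way the precedence rule describes.

    UNSLOTH_STUDIO_HOME wins; STUDIO_HOME is consulted only when the
    former is unset (assumed here to mean empty or missing).
    """
    raw = env.get("UNSLOTH_STUDIO_HOME") or env.get("STUDIO_HOME")
    if not raw:
        return None
    home = env.get("HOME", "")
    # Tilde expansion, applied before canonicalization.
    if raw == "~":
        raw = home
    elif raw.startswith("~/"):
        raw = os.path.join(home, raw[2:])
    if not raw:
        return None
    # Canonicalize so "$HOME/../$USER", trailing slashes, and relative
    # paths all normalize to one string before the deny check.
    return os.path.realpath(raw)


def is_unsafe_root(root, home):
    """Deny-list check against canonical paths (reduced list, see lead-in)."""
    home = os.path.realpath(home)
    denied = {os.sep, home, os.path.dirname(home)}
    return root in denied
```

Comparing canonical paths on both sides is the point of the fix: a pure string comparison against $HOME misses every syntactic variant that resolves to the same directory.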
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Daniel Han <info@unsloth.ai>
…i#5536) * Studio update CI: round-trip install -> update -> uninstall Adds an "Uninstall and verify clean" step to the three existing studio-{,-mac-,-windows-}update-smoke.yml workflows so each one ends by running uninstall.sh / uninstall.ps1 against the install it just produced, then asserting that the install dir, launcher data dir, desktop shortcut, CLI shim (and on Mac, the .app bundle) are all gone. Two trailing reruns confirm idempotency. The uninstall log is added to the existing artifact bundle. Catches regressions where install.sh / install.ps1 starts writing to a new path (registry key, Start Menu entry, %APPDATA% subdir, etc.) and uninstall.{sh,ps1} has not been updated to match. Safety-guard scenarios (refuse-\$HOME, refuse-non-Studio, tilde expansion, etc.) are intentionally NOT exercised here -- those belong in a dedicated fast smoke job that does not have to wait on a 5-15 min install. Wall-clock overhead is ~30-45 s on each runner. Path filters extended to include uninstall.sh / uninstall.ps1 so a pure uninstaller change also triggers the round-trip check. * Skip round-trip step when uninstall.{sh,ps1} are not in tree --------- Co-authored-by: Daniel Han <info@unsloth.ai>
…unslothai#5518) * studio: register /settings route that opens the settings dialog Navigating to /settings used to render Not Found because the route was never registered. The settings dialog only opened via the user menu, so /settings was a broken deep link if shared. Add a route that calls useSettingsDialogStore.openDialog() and redirects to the post-auth landing page so the modal appears on top of the chat. * studio: harden Connections dialog provider sync and allow manual model IDs Two related fixes for the Connections panel. 1. Keep localStorage providers when the server returns an empty list. The dialog used to sync from /api/providers/ on mount and unconditionally overwrite the Zustand provider store with the server result. When the server had no enabled configs but the local store had entries (legacy users, fresh dev installs, or providers created via earlier paths), opening the dialog silently wiped them. The model picker reads from the same store, so the chat header reverted from 'gpt-4o . OpenAI' to the raw 'external::openai-1::gpt-4o' key. Treat the server as authoritative only when it actually has rows; otherwise keep the local view. 2. Accept manual model IDs alongside the live catalog for remote-mode providers (DeepSeek, OpenAI, etc.). Previously the only way to save was to load the available-models catalog via a live API call, which fails in air-gapped setups, behind 502s, or when the user already knows the exact model ID. Add a Textarea fallback in the same render block, and relax the validation to accept manual IDs even when availableModels is empty. The validation message now points users at the manual path. * studio: restrict manual model ID entry to openrouter among remote providers Address review feedback: major remote providers (openai, anthropic, gemini, mistral, cohere, deepseek, ...) 
expose large per-model parameter surfaces that differ across models, so accepting pasted model IDs leads to mismatched parameter expectations and frustrating runtime errors. Keep their catalog curated by hiding the manual textarea and falling back to the prior 'Load available models first' validation toast for them. OpenRouter drops unsupported parameters server-side, so manual entry remains useful there; keep the textarea and the union save path for it. Custom and curated backends already gated via isCustomProvider / isCuratedModelList and continue to require manual entry as before. * studio: shorten code comments in chat-providers-dialog.tsx Trim three multi-line comment blocks to single lines per review.
* studio: add uninstall.ps1 and document it in README for Windows
The previous Windows uninstall guidance was Remove-Item -Recurse -Force
on $HOME\.unsloth\studio, which only deletes the install dir and leaves
behind:
- %LOCALAPPDATA%\Unsloth Studio (data dir)
- Desktop\Unsloth Studio.lnk (Desktop shortcut)
- %APPDATA%\Microsoft\Windows\Start Menu\Programs\Unsloth Studio.lnk
- Custom UNSLOTH_STUDIO_HOME / STUDIO_HOME roots
- Running unsloth_studio venv processes
- User PATH entry under .unsloth\studio
- HKCU\Software\Unsloth\PathBackup
This script mirrors uninstall.sh for Windows. It stops listening
backends by reading the port from share\studio.port (with a
Win32_Process sweep anchored on \unsloth_studio\ as a fallback), removes
the install dir, data dir, both shortcuts, the Studio PATH entry, and
the PathBackup registry key. Custom roots discovered from env vars or
share\studio.conf are accepted only if they contain a Studio sentinel
(share\studio.conf, unsloth_studio\.unsloth-studio-owned, or
bin\unsloth.exe) and are not on a hard deny list (drive root,
%USERPROFILE%, parent of %USERPROFILE%, or top-level system paths).
README now points Windows users at the script.
* Scope port-file kill and PATH cleanup to known Studio roots for PR unslothai#5513
Three findings from the reviewer round:
1. _StopByPortFile killed whatever owned the recorded port without
proving the PID belonged to this Studio install. A stale studio.port
pointing at a port a different local service later bound would
force-kill that service. New _PidUnderKnownRoot checks the listening
PID's exe path against the same $KnownRoots that _StopStudioProcesses
already uses.
2. The netstat.exe fallback matched ":$port " anywhere in the line, so a
stale port file with 443 (or any common port) could match an
ESTABLISHED row whose remote endpoint was that port, killing an
unrelated process (browser, IDE). Now requires the row contain
LISTENING, and applies the same _PidUnderKnownRoot ownership check.
3. PATH cleanup removed any entry whose expanded path contained
\unsloth_studio\, which would also clobber an unrelated user virtualenv
that shared the name. Now only removes entries that resolve inside a
known Studio root (default %USERPROFILE%\.unsloth\studio plus any custom
roots discovered from UNSLOTH_STUDIO_HOME / STUDIO_HOME /
share\studio.conf).
* Expand tilde and honor UNSLOTH_STUDIO_HOME precedence for PR unslothai#5513
Two findings from the latest review round:
1. install.ps1 (lines 152-154) expands ~ and ~\path to $env:USERPROFILE
before resolving the install root, but uninstall.ps1 was passing the raw
env value to [System.IO.Path]::GetFullPath. That resolved ~\foo relative
to the current directory rather than the user profile, so a user who
installed with UNSLOTH_STUDIO_HOME='~\custom' could not uninstall
through the same variable. New _ExpandTilde helper matches install.ps1's
behavior.
2. Mirror install.ps1's env-var precedence: UNSLOTH_STUDIO_HOME wins,
STUDIO_HOME is ignored when both are set. Otherwise uninstalling install
A could also touch install B if the user has a stale STUDIO_HOME
pointing at B.
---------
Co-authored-by: Daniel Han <info@unsloth.ai>
…5538) * Fix num_logits_to_keep on transformers >= 4.51 + compile loss_function Two follow-ups to the fused-forward work landed in unsloth-zoo PR unslothai#665. 1. unsloth_fast_generate (models/llama.py): transformers 4.51 renamed num_logits_to_keep to logits_to_keep. Previously we unconditionally set kwargs['num_logits_to_keep'] = 1, which transformers 4.57's _validate_model_kwargs rejects with: ValueError: The following `model_kwargs` are not used by the model: ['num_logits_to_keep'] blocking model.generate() on Llama / Mistral. Now we inspect the runtime forward signature and use whichever spelling it accepts; if a caller still passes the legacy name we promote it to the new spelling instead of stripping it. 2. patch_loss_functions (models/loader.py): the single internal call site passed torch_compile=False. UnslothForCausalLMLoss is small (label shift + Triton CE), so torch.compile folds the elementwise prep into one launch and removes per-step Python overhead. The < 2.4 fallback inside patch_loss_functions still routes through torch._disable_dynamo so older torches are unaffected. Verified: - Llama 3.2 1B + model.generate() no longer raises; emits a sensible 16-token continuation. - Gemma3 1B GRPO smoke (max_steps=3) returns bit-identical losses 0.256 / 0.4393 / 0.2031 vs pre-fix; train_runtime 409s (vs 415s pre-fix, within noise). - unsloth-zoo test_compiler_rewriter_exhaustive + test_fused_forward_install pass (96 passed) on this combination. Related: unslothai/unsloth-zoo PR for the compiler.py single-matmul backport. * Revert loader.py loss-compile flip; correct rename-version comment Drop the patch_loss_functions(torch_compile=True) flip. Tracing the loss call chain: UnslothForCausalLMLoss -> unsloth_fixed_cross_entropy -> _fast_cross_entropy_loss -> Fast_CrossEntropyLoss.apply (torch.autograd.Function wrapping Triton) torch.compile treats custom autograd.Function.apply as an opaque op and breaks the graph at the boundary. 
The only Python it can actually compile in the loss function is the label-shift + ignore-fill prep (three elementwise ops), and the per-call dynamo guard overhead is in the same order as that prep. Empirical Gemma3 1B GRPO smoke (max_steps=3) showed no meaningful runtime delta (415s vs 409s, within noise) and risked dragging the outer compiled training step into recompiles when the inner guards drift. Keep torch_compile=False; the Triton kernel is the work, and it is unchanged either way. Also: the inline comment in unsloth_fast_generate said the kwarg rename landed in transformers 4.51. The actual decorator (@deprecate_kwarg) was tagged version="4.50" and present through 4.51.x, then removed in 4.52+. Correct the comment. No behaviour change.
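The kwarg handling described for unsloth_fast_generate (inspect the forward signature, use whichever spelling it accepts, and promote a legacy num_logits_to_keep to the new name rather than stripping it) might be sketched as below. This is an illustration under assumptions, not the code in models/llama.py: `pick_logits_kwarg` is a hypothetical name, and dropping both spellings when the forward accepts neither is a choice made for the example that the commit message does not state.

```python
import inspect


def pick_logits_kwarg(forward, kwargs, default=1):
    """Return a copy of `kwargs` using the spelling that `forward` accepts."""
    params = inspect.signature(forward).parameters
    new_name, old_name = "logits_to_keep", "num_logits_to_keep"
    out = dict(kwargs)

    if new_name in params:
        target, other = new_name, old_name
    elif old_name in params:
        target, other = old_name, new_name
    else:
        # Assumption: neither spelling is accepted, so pass neither.
        out.pop(new_name, None)
        out.pop(old_name, None)
        return out

    if other in out:
        # Promote the caller's value to the accepted spelling instead of
        # discarding it; an explicit value under `target` still wins.
        out.setdefault(target, out.pop(other))
    out.setdefault(target, default)
    return out
```

Checking the signature at call time avoids pinning the behaviour to a specific transformers version number, which matters here because the follow-up commit notes that the rename's version was initially misreported.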
…) (unslothai#4611) Co-authored-by: WhiskyAKM <35374730+PTFOPlayer@users.noreply.github.com> PR unslothai#4611 originally proposed a community uninstall.sh for Unsloth Studio. We folded that idea into the maintainer-authored uninstall.sh (PR unslothai#5497) and uninstall.ps1 (PR unslothai#5513) which now ship in main with safety guards, idempotency, lock-dir / .desktop / .app cleanup, env-var precedence, tilde expansion, and CI coverage on real Linux / macOS / Windows runners (PR unslothai#5536). Recording this empty-commit merge so the original contribution from @PTFOPlayer is attributed in git history.
* Add OpenDocument chat attachments * Preserve typed ODS cell values * Exclude hidden OpenDocument review text * fix(chat): harden OpenDocument attachment extraction * fix(chat): close opendocument attachment leaks * fix(chat): unblock failed attachments * fix(chat): preserve covered cell columns --------- Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com> Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com> Co-authored-by: shine1i <wasimysdev@gmail.com>
Round 14 reviewer aggregate (logs/review_round14_aggregate.md):

P1 fixes:
- routes/export.py /load-checkpoint now runs the active-export 409 guard BEFORE the chat / diffusion unloads, so a rejected request no longer tears down unrelated GPU state.
- core/inference/llama_cpp.py wraps the WHOLE load_model body in a single try/finally that publishes loading_model_identifier across download, metadata read, VRAM settle, process spawn, and health check. Done via a thin load_model wrapper around the existing body (renamed _load_model_impl) to avoid reindenting hundreds of lines.
- routes/models.py /delete-finetuned now checks loading_model_identifier so a pending HF GGUF download cannot have its destination directory rmtree'd before llama-server spawns.
- core/inference/diffusion.py stores the original caller-supplied gguf_filename (e.g. ``BF16/model.gguf``) in a new self._gguf_filename field and exposes it as active_gguf_filename. UI-facing gguf_filename still collapses to basename for the panel.
- routes/models.py /delete-cached llama guard now allows safe different-variant deletes when hf_variant differs, matching the diffusion path's variant-aware behaviour.
- core/inference/diffusion.py tracks self._cpu_offload_enabled and forces a CPU torch.Generator when offload is on, so seeded generation no longer crashes on CUDA hosts with the default offload enabled.

P2 fixes:
- core/inference/diffusion.py detect_family normalises mixed separators (``Qwen_Image-Edit-GGUF``, ``Qwen-Image_Edit-GGUF``, ``QwenImageEdit-GGUF``) so every Qwen-Image-Edit spelling is excluded from the base Qwen-Image family.
- core/inference/diffusion.py logger.info / logger.error in load_model run repo_id and effective_base through _redact_hf_tokens so URL-embedded ``hf_xxxxx`` tokens never reach structured-log sinks.
- core/inference/diffusion.py _release_other_gpu_owners_for_diffusion now raises RuntimeError when an export job is active instead of logging and continuing, so direct backend callers cannot bypass the route layer's 409 guard.
- core/inference/diffusion.py full-diffusers repo / base_repo paths expand ``~`` via _expand_existing_local_path so ``repo_id="~/models/my-flux"`` no longer falls through to the Hub.

Tests:
- 5 new regression cases (mixed Qwen-Image-Edit separators, token redaction, status full-filename, CPU offload generator device, staging Windows leaf already-set sanity).
- All 68 diffusion backend + route tests pass.
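The token-redaction fix above can be illustrated with a small sketch. The body of ``_redact_hf_tokens`` is not shown in this PR, so the regular expression, the minimum token length, and the ``hf_***`` replacement text below are all assumptions chosen for illustration.

```python
import re

# Assumed token shape: "hf_" followed by at least 20 letters/digits.
# The length floor keeps ordinary identifiers such as "hf_hub_download"
# or "hf_token" from being rewritten. The real helper may differ.
_HF_TOKEN_RE = re.compile(r"hf_[A-Za-z0-9]{20,}")


def redact_hf_tokens(text: str) -> str:
    """Replace anything that looks like a Hugging Face token with a marker."""
    return _HF_TOKEN_RE.sub("hf_***", text)
```

Redacting at the logging call site, rather than inside the logger, keeps the scrub next to the values that are known to carry user-supplied URLs.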
acad7c6 to b7207e3
Round 15 reviewer aggregate (logs/review_round15_aggregate.md):

P1 fixes:
- core/inference/llama_cpp.py publishes loading_model_identifier + loading_hf_variant AFTER acquiring _serial_load_lock; previously a queued second load could overwrite or clear the identifier currently in flight, breaking delete-safety and GPU handoff guards.
- routes/models.py /delete-finetuned compares the pending llama load against loading_hf_variant (new), not the stale hf_variant from the previous loaded model. Without this, a Q4-loaded directory loading Q8 would still accept a Q8 delete.
- core/inference/diffusion.py _release_other_gpu_owners_for_diffusion now also raises when training is active so direct backend callers cannot bypass the route layer's 409 guard. Mirrors the export-active check the same helper already enforces.
- routes/models.py /delete-cached diffusion guard compares owned diffusion paths against the HF cache root for the target repo via _all_hf_cache_scans + _is_path_under. Without this, loading from a local models--owner--model/snapshots/<sha> path let the cache delete proceed while the snapshot was still mmap'd.
- models/inference.py DiffusionLoadRequest refuses URL-embedded hf_xxxxx tokens in repo_id / base_repo at the API boundary, so the value never reaches self._repo_id and status() can never echo it back to other authenticated sessions.

P2 fixes:
- core/inference/diffusion.py status() routes UI-facing repo_id / base_repo through _display_repo_id, which collapses absolute local paths to the leaf name (delete guards still see the full path via active_*/pending_*).
- routes/inference.py /images/load maps backend RuntimeError that reports an export/training conflict to HTTP 409 instead of 400.
- core/inference/diffusion.py detect_family now uses token-boundary matching so owner/flux.20-model does not collide with flux.2.

P3 fixes:
- tests/test_diffusion_routes.py drops the partial routes.inference module from sys.modules if exec_module() raises, so the real ImportError surfaces instead of a misleading AttributeError on follow-up tests.

Tests:
- 5 new regression cases (display_repo_id, token-boundary family detection, training-active raise from backend helper, embedded HF token rejection).
- All 72 diffusion backend + route tests pass.
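A minimal sketch of the publish-under-lock pattern from the first P1 item, assuming a plain ``threading.Lock``. ``FakeBackend`` and ``seen_during_load`` are hypothetical stand-ins; only ``_serial_load_lock``, ``loading_model_identifier``, ``loading_hf_variant`` and the wrapper/inner split come from the commit message.

```python
import threading
from typing import Optional, Tuple


class FakeBackend:
    """Hypothetical stand-in for the llama.cpp backend, for illustration only."""

    def __init__(self) -> None:
        self._serial_load_lock = threading.Lock()
        self.loading_model_identifier: Optional[str] = None
        self.loading_hf_variant: Optional[str] = None
        self.seen_during_load: Optional[Tuple[Optional[str], Optional[str]]] = None

    def load_model(self, identifier: str, hf_variant: Optional[str] = None) -> bool:
        # Publish only AFTER the lock is held, so a queued second load
        # cannot overwrite or clear the identifier of the load in flight.
        with self._serial_load_lock:
            self.loading_model_identifier = identifier
            self.loading_hf_variant = hf_variant
            try:
                return self._load_model_impl_locked(identifier, hf_variant)
            finally:
                # Cleared on success and on failure alike.
                self.loading_model_identifier = None
                self.loading_hf_variant = None

    def _load_model_impl_locked(self, identifier: str, hf_variant: Optional[str]) -> bool:
        # Placeholder for download, metadata read, spawn and health check.
        self.seen_during_load = (self.loading_model_identifier, self.loading_hf_variant)
        if identifier == "bad":
            raise RuntimeError("load failed")
        return True
```

Guards that read the two fields from another thread see either the load currently holding the lock or ``None``, never a value written by a load that is still waiting.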
b7207e3 to 4c75b61
for holding ``_serial_load_lock`` and for publishing /
clearing ``_loading_model_identifier`` + ``_loading_hf_variant``
in the surrounding try/finally."""
if True:
…lothai#5754

Round 15 split LlamaCppBackend.load_model into a thin wrapper that publishes _loading_model_identifier + _loading_hf_variant under _serial_load_lock and an inner _load_model_impl_locked body that actually launches llama-server. The pre-existing source-inspection regression tests inspected only load_model and broke because the flag literals and _wait_for_vram_settle call now live in the inner method:

- tests/test_llama_cpp_no_context_shift.py
  test_no_context_shift_is_in_load_model
  test_flag_sits_inside_the_base_cmd_list
- tests/test_llama_cpp_wait_for_vram_settle.py
  test_load_model_calls_helper_outside_lock_and_uses_last_kill_timestamp

Update both helpers to concatenate the source of load_model AND _load_model_impl_locked so the assertions still cover the launch path without weakening their scope to the full module.
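The source-concatenation approach could look roughly like the sketch below. ``combined_method_source`` is a hypothetical helper name, and the standard library's ``json.JSONDecoder`` stands in for ``LlamaCppBackend`` so the snippet is runnable on its own; its ``decode`` / ``raw_decode`` pair plays the role of the wrapper and the inner method.

```python
import inspect
import json


def combined_method_source(cls, *method_names: str) -> str:
    """Concatenate the source of the named methods, in the order given."""
    return "\n".join(inspect.getsource(getattr(cls, name)) for name in method_names)


# Stand-in usage: decode() is the thin outer method, raw_decode() the inner one.
source = combined_method_source(json.JSONDecoder, "decode", "raw_decode")
```

Inspecting two named methods keeps the assertion scoped to the launch path, whereas reading the whole module would let a flag literal anywhere in the file satisfy the test.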
4c75b61 to aa24f21
Round 16 reviewer aggregate (logs/review_round16_aggregate.md):

P1 fixes:
- routes/models.py /delete-cached llama guard pairs loading_id with loading_hf_variant so deleting a different cached quant (Q8_0) while another variant (Q4_K_M) is loading is no longer blocked.
- core/inference/diffusion.py load_model now calls _release_other_gpu_owners_for_diffusion BEFORE _release_chat_backend_for_diffusion. The other-owners helper RAISES on active training/export, so a route -> worker race or direct backend caller no longer drops the user's chat model before the diffusion load is refused.
- routes/models.py /delete-cached diffusion guard fails CLOSED (503) on HF cache scan failure instead of silently falling through to repo-id-only matching, which could miss a loaded local snapshot path.
- routes/inference.py _release_llama_for and _release_safetensors_chat_for now raise 503 on actual unload failure (exception or False return), so new GPU workloads do not start while the old chat process still owns VRAM.
- core/inference/diffusion.py status() now takes include_internal=False by default and only exposes the guard-facing active_*/pending_* paths when callers opt in. The public /api/inference/images/status route gets the redacted payload; routes/models.py delete guards pass include_internal=True so they still see the raw paths.
- core/inference/diffusion.py generate_image_with_metadata routes the response model through _display_repo_id so /images/generate cannot echo back an absolute local path.

P2 fixes:
- routes/inference.py /images/load now maps backend "Could not verify training/export status" to 503 instead of 409, matching the route-level pre-check.
- core/inference/diffusion.py _release_other_gpu_owners_for_diffusion raises "Could not verify export status" when the is_export_active() probe itself raises, instead of silently treating it as active export.
- core/inference/diffusion.py detect_family compares compact family spellings (Flux2Klein) against per-token compact strings so unsloth/Flux2Klein-GGUF matches the flux.2-klein family without matching the embedded substring inside flux.20.
- main.py installs a RequestValidationError handler that scrubs hf_xxxxx tokens out of the 422 response body so a rejected ``repo_id`` containing a URL-embedded HF token does not echo it back to the browser.

Tests:
- 3 new regression cases (Flux2Klein compact alias, public status redaction, generate_image_with_metadata redaction).
- All 75 diffusion backend + route tests pass.
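The family-matching rules described across rounds 14 to 16 (separator normalisation, token-boundary matching, compact aliases) can be sketched as follows. This is not the real ``detect_family``: the two-entry ``FAMILIES`` tuple, the separator set, and the helper names are assumptions made for the example.

```python
import re
from typing import List, Optional

# Hypothetical family table, most specific entry first.
FAMILIES = ("flux.2-klein", "flux.2")


def _tokens(text: str) -> List[str]:
    """Lower-case and split on separators; dots stay inside a token."""
    return [t for t in re.split(r"[-_/\s]+", text.lower()) if t]


def _compact(text: str) -> str:
    """Drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def detect_family_sketch(repo_id: str) -> Optional[str]:
    tokens = _tokens(repo_id)
    compact_tokens = {_compact(t) for t in tokens}
    for family in FAMILIES:
        wanted = _tokens(family)
        n = len(wanted)
        # Token-boundary match: the family must appear as a whole token run.
        if any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1)):
            return family
        # Compact alias: a single token that spells the family without separators.
        if _compact(family) in compact_tokens:
            return family
    return None
```

Comparing whole tokens rather than substrings is what keeps ``flux.20`` from matching ``flux.2``, while the compact comparison still accepts ``Flux2Klein``.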
aa24f21 to 3c6a47d
Two diffusion tests broke on the Windows runner after round 16:

- test_display_repo_id_collapses_absolute_path used hardcoded POSIX absolute paths; Windows reads /home/... as drive-relative so Path.is_absolute() returns False. Use pytest's tmp_path so the path is platform-correct.
- test_load_publishes_pending_target_during_loading regressed because round 16 moved _release_other_gpu_owners_for_diffusion ahead of the chat unload. That helper imports core.training and core.export; on Windows CI the import resolved to a real but partially configured backend, which raised inside the new status-verification path and aborted the load before from_pretrained ran. Stub both modules with idle backends in _install_fake_diffusers.

Also updated test_public_status_does_not_leak_local_path_via_active_fields and test_generate_image_with_metadata_redacts_local_path to use tmp_path for the same Windows reason.
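The Windows behaviour behind the first failure can be reproduced on any platform with ``pathlib``'s pure path classes. The sample paths below are made up for the demonstration.

```python
from pathlib import PurePosixPath, PureWindowsPath

posix_style = "/home/alice/models/my-flux"  # illustrative path only

# Absolute under POSIX rules: it starts at the root.
posix_is_abs = PurePosixPath(posix_style).is_absolute()

# Under Windows rules the same string has a root but no drive,
# so it is relative to the current drive and not absolute.
windows_is_abs = PureWindowsPath(posix_style).is_absolute()

# A path with both drive and root is absolute on Windows.
windows_drive_is_abs = PureWindowsPath(r"C:\Users\alice\models\my-flux").is_absolute()
```

A path derived from pytest's ``tmp_path`` fixture avoids the issue because it is absolute on whichever platform runs the test.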
3c6a47d to b841562
P1: route-layer chat/diffusion/export releases were still asymmetric. Training start and export load called ``diff_backend.unload_model`` inside a best-effort try/except so a wedged diffusion backend let the next workload allocate over the top of the resident pipeline and OOM. Both now use the strict ``_release_diffusion_for`` helper from routes.inference, which raises HTTPException 503 on status/unload failure or post-check mismatch.

P2 #9: diffusion load exceptions can include the absolute local repo / base / gguf path verbatim (FileNotFoundError, OSError from diffusers / safetensors). The path flows into ``_last_error``, which ``status()`` returns to every authenticated session. Collapse the known repo_id / effective_base / gguf_filename paths to their leaf name before storing the error, mirroring the ``_display_repo_id`` convention used for the public repo label.

P2 #10: when ``repo_id`` is an absolute local path, ``detect_family`` matched _FAMILY_EXCLUDE deny lists against the full path, so models stored under a parent directory containing ``qwen-image-edit`` or ``3.5`` were misclassified as None. Reduce the family-detection needle to the leaf directory when the input looks like a filesystem path; Hub-style ``owner/repo`` ids continue to use the original needle so existing detection rules keep working.

P2 #12: ``gguf_filename`` was missing from the ``_reject_embedded_hf_token`` validator. A URL-form quant path like ``https://hf_xxxxx@huggingface.co/.../flux.gguf`` would be stored on ``DiffusionBackend._gguf_filename`` and surface in status() / log lines. Extend the validator to gguf_filename so the token is dropped before it can leak.

All 85 diffusion-relevant backend tests pass locally.
P1 #1: ``_release_llama_for()`` now verifies ``llama.unload_model`` did not return False AND that ``is_loaded`` / ``is_active`` / ``loading_model_identifier`` are all cleared after the call. The previous version only treated raised exceptions as failure, so a subprocess refusing to terminate or an in-flight GGUF download let the next workload allocate on top.

P1 #2: ``DiffusionBackend._release_other_gpu_owners_for_diffusion`` now raises RuntimeError when ``exp._shutdown_subprocess`` fails on a settled checkpoint. Direct backend callers used to log at debug level and proceed toward diffusion allocation while the export checkpoint still owned VRAM.

P1 #3 + P1 #7: ``/images/load`` no longer drops chat + idle export before the cheap backend validation runs. ``DiffusionBackend.load_model`` already calls the strict ``_release_other_gpu_owners_for_diffusion`` and ``_release_chat_backend_for_diffusion`` helpers AFTER family inference and GGUF filename checks pass, so the GPU is still freed before allocation and a malformed payload no longer silently unloads the user's chat / chat-export pair.

P1 #4: ``_release_chat_backend_for_diffusion`` now also rejects a post-unload state where ``loading_model_identifier`` is still set, matching the route-level ``_release_llama_for`` strictness. A GGUF download mid-flight before the diffusion handoff used to slip through and end up double-owning VRAM after diffusion allocated.

P1 #5: ``_release_diffusion_for`` no longer swallows a post-unload ``status()`` failure as ``after = {}``. Training / chat / export handoffs need proof that the diffusion pipeline released VRAM; the helper now raises HTTP 503 when the verification status call itself raises, so the caller retries.

P1 #6: ``DiffusionBackend._release_other_gpu_owners_for_diffusion`` raises RuntimeError when ``get_export_backend()`` itself raises. Direct backend callers used to silently ``return`` here and proceed to GPU allocation without being able to verify export ownership.

P1 #8: ``/training/start`` releases settled export BEFORE chat, matching the chat-load helpers. If idle export shutdown fails the user's chat model is preserved instead of being dropped for a training run that never starts.

P2 #9: GGUF load-error scrubber also collapses ``local_gguf_path``, the resolved HF cache path passed to ``transformer_cls.from_single_file()``. Without this an exception like ``OSError: cannot load /home/alice/.cache/huggingface/.../flux.gguf`` would leak the operator's filesystem layout through ``last_error`` and ``/images/status``.

All 85 diffusion-relevant backend tests pass locally.
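The strict post-unload verification described for ``_release_llama_for()`` might be sketched like this. ``UnloadNotVerified`` stands in for the HTTP 503 the route helper raises, and ``FakeLlama`` is a hypothetical test double; only the three attribute names and ``unload_model`` come from the commit message.

```python
class UnloadNotVerified(RuntimeError):
    """Stand-in for the HTTP 503 raised by the real route helper."""


def release_llama_strict(llama) -> None:
    """Unload, then prove the backend really let go before continuing."""
    try:
        result = llama.unload_model()
    except Exception as exc:
        raise UnloadNotVerified(f"unload raised {type(exc).__name__}") from exc
    # A False return is a failure even though nothing was raised.
    if result is False:
        raise UnloadNotVerified("unload_model() returned False")
    # Any of these still being truthy means the backend may still own VRAM.
    leftovers = [
        name
        for name in ("is_loaded", "is_active", "loading_model_identifier")
        if getattr(llama, name, None)
    ]
    if leftovers:
        raise UnloadNotVerified("still set after unload: " + ", ".join(leftovers))


class FakeLlama:
    """Hypothetical double with just the attributes the check reads."""

    def __init__(self, result=True, loading_model_identifier=None):
        self._result = result
        self.is_loaded = False
        self.is_active = False
        self.loading_model_identifier = loading_model_identifier

    def unload_model(self):
        return self._result
```

Treating a leftover ``loading_model_identifier`` as failure is what covers the in-flight download case, where nothing is loaded yet but VRAM is about to be claimed.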
P1 #1: ``_release_safetensors_chat_for`` now re-reads ``active_model_name`` and ``loading_models`` after each unload AND runs a final sweep against the initial owned-name set. The previous helper trusted ``unload_model() -> True`` even though the orchestrator can respond ``unloaded`` while still holding weights or a concurrent ``load`` can repopulate the tracker between calls. Per-name and global post-state mismatches now raise HTTP 503 so the caller retries.

P1 #2: same post-state guarantee inside ``_release_chat_backend_for_diffusion`` for direct backend callers. ``DiffusionBackend.load_model`` now raises RuntimeError when the safetensors tracker still owns a previously-resident name after the unload, matching the route-level helper. The route layer's existing classifier maps the new wording to HTTP 503.

P1 #3: ``DiffusionBackend.load_model`` now preflights the full diffusers repo (or explicit GGUF ``base_repo``) via ``hf_hub_download(filename="model_index.json")`` BEFORE the chat / export unload runs. The GGUF path was already covered by the existing ``hf_hub_download(gguf_filename)`` round-trip; the full-repo path used to skip validation and let a typo / private / gated repo only surface inside ``from_pretrained`` AFTER the user's chat model was already dropped. Local paths are checked structurally (must be a directory containing ``model_index.json``) so we do not network-round-trip for an on-disk miss. Error messages route through ``_display_repo_id`` so an absolute filesystem path does not leak the operator's layout.

P1 #6: ``/api/inference/unload`` (the direct chat unload endpoint) now treats ``unload_model() -> False`` AND a leftover state (``is_loaded`` / ``is_active`` / ``loading_model_identifier`` for GGUF, ``active_model_name`` / ``loading_models`` for safetensors) as 503 instead of unconditionally responding ``status="unloaded"``. The UI used to show the model as gone while the backend still owned VRAM.

P2 #7: extended the /images/load RuntimeError -> HTTPException marker list with ``still active or loading after unload`` and ``still loading after unload``. Round 18 introduced these exact phrasings on the backend side; without the extension a retryable unload failure was returning HTTP 400 to the user instead of 503.

P2 #8: removed the unused ``unsloth_backend = get_inference_backend()`` eager construction in the GGUF chat-load branch. Eager construction made the GGUF-only path needlessly fail or pay startup cost when the safetensors backend was unavailable / lazy; ``_release_safetensors_chat_for`` already handles that case as a no-op.

All 85 diffusion-relevant + 98 related backend tests pass locally.
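For the local-path half of the preflight in P1 #3, a structural check could look like the sketch below. The function name and error wording are hypothetical; the only facts taken from the commit message are that the path must be a directory containing ``model_index.json`` and that errors should mention the leaf name rather than the absolute path.

```python
from pathlib import Path


def preflight_local_diffusers_dir(path: str) -> None:
    """Cheap on-disk check to run before any chat / export model is unloaded."""
    p = Path(path).expanduser()
    if not p.is_dir():
        # Report only the leaf so the operator's layout is not disclosed.
        raise RuntimeError(f"{p.name!r} is not a directory")
    if not (p / "model_index.json").is_file():
        raise RuntimeError(f"{p.name!r} has no model_index.json")
```

Running this before the unload means a mistyped local path fails while the user's chat model is still resident.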
P1 #1: ``_preflight_full_diffusers_repo(effective_base, hf_token)`` now runs for every load mode, including the GGUF-with-auto-base path. Round 19 only preflighted the full repo or an explicit ``base_repo``, so an auto-picked companion that turned out to be gated / private / missing still unloaded the user's chat model before ``from_pretrained`` failed. ``effective_base`` is the same value that feeds every downstream allocation, so preflighting it unconditionally catches all three modes.

P1 #2: ``diffusers.GGUFQuantizationConfig`` (which imports the ``gguf`` package at construction time) is now built up front, inside the same try block that surfaces "Re-run Studio setup". Previously the missing-dependency exception fired AFTER ``_release_other_gpu_owners_for_diffusion`` and ``_release_chat_backend_for_diffusion`` had already taken the chat / export models down. The downstream from_single_file call reuses the same ``quant_config`` reference.

P1 #4: ``studio/backend/requirements/studio.txt`` now lists ``diffusers>=0.37.0`` and ``gguf>=0.10.0``. These were only in the extras files, so fresh standard Studio installs failed on /images/load with the round 20 P1 #2 dependency error message.

P1 #5: ``LoadRequest``, ``UnloadRequest``, and ``ValidateModelRequest`` now apply the same control-character + embedded-HF-token validators that ``DiffusionLoadRequest`` already had. /api/inference/load, /api/inference/validate, and /api/inference/unload used to accept newline / tab / control characters in ``model_path`` (log-line smuggling) and URL-form ``https://hf_xxxxx@huggingface.co/...`` (credential leak through structured log sinks).

P2 #6: ``_collapse_local`` in the diffusion load-error scrubber now resolves relative candidates and adds the absolute form to the substring set. A relative ``exports/my-flux`` used to leak ``/mnt/disks/.../exports/my-flux/...`` via downstream library errors because the scrubber only matched the original literal. Replacement is longest-first so a leaf-only context survives.

All 85 diffusion-relevant + 35 related model-validation tests pass locally.

(P1 #3 cross-workload GPU handoff lock is deferred: deserves a focused design pass across /images/load, /chat/load (both branches), /training/start, and /export/load to pick a lock boundary that does not deadlock against the backend load locks or stall the SSE log stream.)
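Why the longest-first ordering in P2 #6 matters can be seen in a small sketch. The function below is not the real ``_collapse_local``: its signature is assumed, and it expects the caller to pass only candidates already known to be local paths.

```python
import os
from pathlib import Path
from typing import Dict, Iterable


def collapse_local_sketch(message: str, candidates: Iterable[str]) -> str:
    """Replace known local paths in an error message with their leaf name."""
    replacements: Dict[str, str] = {}
    for raw in candidates:
        leaf = Path(raw).name if raw else ""
        if not leaf:
            continue
        replacements[raw] = leaf
        # Library errors usually quote the resolved form, so match that too.
        replacements[os.path.abspath(os.path.expanduser(raw))] = leaf
    # Longest needle first: the absolute form contains the relative literal,
    # so replacing the short one first would leave the parent directories behind.
    for needle in sorted(replacements, key=len, reverse=True):
        message = message.replace(needle, replacements[needle])
    return message
```

With the order reversed, ``/mnt/disks/.../exports/my-flux`` would be rewritten to ``/mnt/disks/.../my-flux`` and the parent directories would still leak.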
P1 #1 + #2: ``LoadRequest._no_embedded_hf_tokens`` and ``ValidateModelRequest._no_embedded_hf_tokens`` now cover ``gguf_variant`` in addition to ``model_path``. A caller could pass a variant like ``Q4_K_M-hf_xxxxxxxx`` that flowed into structured log sinks via the GGUF resolver path; the matching ``DiffusionLoadRequest`` validator already covered every string field, so this restores parity.

P1 #3: ``/api/inference/unload`` now also matches the llama ``loading_model_identifier`` when picking the GGUF branch. A pending GGUF download (``is_active`` still False, ``loading_model_identifier`` populated) used to fall through to the safetensors branch and respond ``status="unloaded"`` while llama-server kept downloading.

P1 #4 + #5: the final safetensors-handoff sweeps (route-level ``_release_safetensors_chat_for`` and backend ``_release_chat_backend_for_diffusion``) now check ``active_model_name`` and ``loading_models`` WITHOUT the initial ``owned_names`` filter. A concurrent ``/load`` that landed AFTER the snapshot was previously ignored, so a chat model that began loading during the unload window let training / export / GGUF chat / diffusion start anyway and race the new chat for VRAM.

P2 #6: added ``_preflight_diffusers_subfolder_config`` and invoked it for GGUF loads with a transformer class (``effective_base``, ``"transformer"``). A custom base companion that had ``model_index.json`` but lacked ``transformer/config.json`` previously passed the round 19 preflight, unloaded chat, then failed inside ``from_single_file``.

P2 #7: ``_scrub_validation_obj`` in main.py also scrubs string dict KEYS. Pydantic ``string_type`` errors surface ``input`` verbatim, and a malformed payload like ``{"repo_id": {"hf_xxxxx": "owner/repo"}}`` would otherwise leak the token through the 422 response body.

All 85 diffusion-relevant + 35 model-validation tests pass locally. Existing fakes for ``hf_hub_download`` updated to accept the new ``subfolder=`` kwarg the round 21 preflight uses.

(P1 #3 cross-workload GPU handoff lock from round 20 is still deferred; round 21's P1 #4 / #5 raised the sweep-level guarantee, which closes the most common race without the deadlock risk of holding a process-wide lock across the entire load.)
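The key-scrubbing behaviour from P2 #7 amounts to a recursive walk that treats dictionary keys like any other string. The sketch below is an assumption about the shape of ``_scrub_validation_obj``; the token pattern (at least 20 characters after ``hf_``) and the ``hf_***`` marker are illustrative choices.

```python
import re
from typing import Any

# Assumed token shape; the real pattern is not shown in this PR.
_HF_TOKEN_RE = re.compile(r"hf_[A-Za-z0-9]{20,}")


def scrub_validation_obj(obj: Any) -> Any:
    """Recursively redact token-like substrings in values AND in dict keys."""
    if isinstance(obj, str):
        return _HF_TOKEN_RE.sub("hf_***", obj)
    if isinstance(obj, dict):
        return {
            (scrub_validation_obj(k) if isinstance(k, str) else k): scrub_validation_obj(v)
            for k, v in obj.items()
        }
    if isinstance(obj, list):
        return [scrub_validation_obj(item) for item in obj]
    if isinstance(obj, tuple):
        return tuple(scrub_validation_obj(item) for item in obj)
    return obj
```

Non-string leaves such as numbers and ``None`` pass through unchanged, so the shape of the 422 body stays the same.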
P1 #1: ``TrainingStartRequest.model_name`` now runs the same control-character and embedded-HF-token validators that the chat and diffusion request models gained in rounds 5 / 15 / 20 / 21. ``/api/training/start`` previously accepted newline / tab / control characters and URL-form ``hf_xxxxx`` tokens that flowed into structured-log sinks via "Loading model %s" lines.

P1 #2: ``_run_with_helper`` in ``utils/datasets/llm_assist.py`` now skips the helper GGUF when the diffusion image backend reports loaded / loading. The public chat / training / export routes already do this through ``_release_diffusion_for``, but this dataset-side helper loaded llama-server directly with no diffusion guard, so an Images-page allocation would race the helper for VRAM. New ``_diffusion_image_model_busy`` helper fails closed (treats status() failure as busy) so the resident image model is preserved instead of being overwritten.

P1 #3: same ``_diffusion_image_model_busy`` guard added to ``_run_multi_pass_advisor`` (the dataset conversion advisor), which has the same direct llama.cpp load shape.

P2 #4: the early "Could not infer a diffusion family" RuntimeError now routes ``repo_id`` through ``_display_repo_id`` before formatting. A local absolute path that did not match any known family used to leak the operator's filesystem layout via the 400 response body, last_error, and log line.

All 97 diffusion + training-validation + related tests pass locally.
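The fail-closed behaviour of ``_diffusion_image_model_busy`` can be sketched as below. The callable parameter and the ``loaded`` / ``loading`` status keys are assumptions; the commit message only states that a ``status()`` failure is treated as busy.

```python
from typing import Any, Callable, Mapping


def diffusion_image_model_busy(get_status: Callable[[], Mapping[str, Any]]) -> bool:
    """Return True when the image backend is, or might be, holding the GPU."""
    try:
        status = get_status()
        return bool(status.get("loaded") or status.get("loading"))
    except Exception:
        # Fail closed: if the state cannot be verified, assume the image
        # model is resident so the helper does not load on top of it.
        return True
```

The cost of a false "busy" is a skipped helper run, while a false "idle" could evict or crash the user's image model, which is why the uncertain case resolves to busy.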
P1 #1 + #2 + #6: extended the chat / diffusion / training identifier hardening to every export-side request model. ExportCommonOptions (parent of ExportMergedModelRequest / ExportBaseModelRequest / ExportLoRAAdapterRequest) now applies _no_control_chars and _reject_embedded_hf_token to repo_id and base_model_id; ExportGGUFRequest gets the same on its repo_id plus a control-char check on quantization_method; and LoadCheckpointRequest validates checkpoint_path. Previously "/api/export/*" accepted newline-smuggled identifiers and URL-form ``hf_xxxxx`` tokens that flowed into log lines.

P1 #3 + #4: ``_run_with_helper`` and ``_run_multi_pass_advisor`` now use a shared ``_gpu_workload_busy_for_helper`` that gates on diffusion (round 22 already), training, AND export. The round 22 guard only checked diffusion, so the dataset helper / advisor could still load llama-server on top of an active training run or a resident export checkpoint. Each step fails closed (unverifiable status counts as busy) so the user's primary workload is preserved.

P1 #5: PublishDatasetRequest in models/data_recipe.py also applies the identifier hardening to repo_id; the publish path previously accepted control characters and URL-form tokens.

P1 #7-10: added _validate_logged_identifier helper to routes/models.py and applied it to the path / query parameter endpoints that flow into logger.info(...) calls -- ``/config/{model_name}``, ``/check-vision/{model_name}``, ``/check-embedding/{model_name}``, ``/gguf-variants``. Mapped the validator's ValueError to HTTP 422 so the client sees the same shape as a Pydantic validation failure.

P2 #11 + #12: ``Loading diffusion model %s`` and ``Diffusion load failed for %s`` log lines route ``repo_id`` / ``effective_base`` through ``_display_repo_id`` (collapses absolute local paths to the leaf, still scrubs HF tokens) instead of plain ``_redact_hf_tokens``. The error path was already collapsed in the user-facing 400 / RuntimeError, but the structured-log lines kept the full path.

All 97 diffusion + training-validation + related tests pass locally.
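An identifier check in the spirit of ``_validate_logged_identifier`` might look like the following sketch. Both regular expressions, the ``field`` parameter, and the error wording are assumptions; the commit message only says the helper rejects control characters and embedded HF tokens and raises ``ValueError``, which the route maps to HTTP 422.

```python
import re

# ASCII control characters, including newline and tab (log-line smuggling).
_CONTROL_CHARS_RE = re.compile(r"[\x00-\x1f\x7f]")
# Assumed token shape; the real pattern is not shown in this PR.
_HF_TOKEN_RE = re.compile(r"hf_[A-Za-z0-9]{20,}")


def validate_logged_identifier(value: str, field: str = "identifier") -> str:
    """Return the value unchanged, or raise ValueError without echoing it."""
    if _CONTROL_CHARS_RE.search(value):
        raise ValueError(f"{field} contains control characters")
    if _HF_TOKEN_RE.search(value):
        raise ValueError(f"{field} must not contain an embedded Hugging Face token")
    return value
```

The error text names the field but never the offending value, so the rejection itself cannot become the leak.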
P1 #1: ``_gpu_workload_busy_for_helper`` in ``utils/datasets/llm_assist.py`` now also gates on the GGUF chat backend (llama-server) AND the safetensors chat backend. Round 23 extended it to training + export but missed Chat, so a helper / advisor GGUF could still race a loaded chat model for VRAM. Both checks fail closed when status is unverifiable.

P1 #2 / #3 / #4 / #5: re-ordered the route-level GPU-handoff unloads so the diffusion release runs BEFORE the chat releases. A wedged diffusion unload used to fire AFTER chat was already gone, so the user lost both on a single failure. Drop chat last so an earlier failure preserves it. Applied to ``/training/start`` (training.py), ``/export/load`` (export.py), ``/chat/load`` GGUF branch and ``/chat/load`` safetensors branch (routes/inference.py).

P1 #7 + P2 #13: ``/delete-finetuned`` body now hardens ``model_path`` and ``gguf_variant`` via the shared ``_validate_logged_identifier`` helper, so control characters and URL-form HF tokens can no longer log-line-smuggle.

P1 #8 + #10: ``/delete-cached`` body hardens ``repo_id`` and ``variant`` the same way.

P1 #9: ``/download-progress`` ``repo_id`` query parameter is also hardened; the value flows into log lines deep inside ``_get_repo_size_cached`` on lookup failure.

P1 #11: ``CheckFormatRequest.dataset_name`` and ``AiAssistMappingRequest.{dataset_name, model_name}`` in ``models/datasets.py`` now apply the same control-char + embedded-HF-token validators, matching every other public request-body model.

All 115 diffusion + training-validation + cached_gguf + export + inference model-validation tests pass locally.

(P1 #6 native-path-lease enforcement for diffusion local paths and P1 #12 React Compiler frontend lint deferred -- both need focused design / frontend touchups separate from this batch.)
return
try:
del obj
except Exception:

# supplied filename (e.g. ``BF16/model.gguf``) is kept
# separately as ``active_gguf_filename`` for delete
# guards.
gguf_basename = Path(self._gguf_path).name if self._gguf_path else None

self._cpu_offload_enabled = False
self._loaded_at = None
_release(old)
old = None # noqa: F841

backend = get_inference_backend()
active_model_name = getattr(backend, "active_model_name", None)
loading_models = set(getattr(backend, "loading_models", set()) or set())
owned_names = {name for name in ({active_model_name} | loading_models) if name}

active_model_name = getattr(inf, "active_model_name", None)
loading_models = set(getattr(inf, "loading_models", set()) or set())
owned_names = {name for name in ({active_model_name} | loading_models) if name}
Staging-only PR to validate the Studio diffusion image generation pipeline on Ubuntu, macOS, and Windows GitHub Actions runners before opening the upstream PR.
Backend
Frontend
CI scope (staging only)
Three targeted workflows pinned to ubuntu-latest, macos-14, and windows-latest. Each scoped via paths to studio/backend/** so unrelated changes do not re-trigger them. The inherited workflow bloat from the staging fork was cleared so we stay under the 5-Windows-runner cap.
Not for merge upstream from this fork. Upstream PR will follow once the runners go green.