fix: prevent infinite retry loop in offline mode (#434) by liuyuzhe530 · Pull Request #443 · jamiepine/voicebox

liuyuzhe530 · 2026-04-17T04:45:11Z

Fix: Infinite Retry Loop During Model Loading in Offline Environments (#434)

Bug: When models are cached, force_offline_if_cached sets HF_HUB_OFFLINE=1. However, when an offline load fails (e.g. a config file is missing from cache), the error propagates up and each individual config file (config.json, generation_config.json, etc.) triggers its own 5-retry cycle with network access — resulting in an effectively infinite retry loop.

Fix: Added a module-level _offline_retry_done dict that tracks, per model, whether we've already retried once. If a retry was already attempted for this model, subsequent failures skip the retry and propagate immediately. This prevents the exponential retry behavior while still allowing one retry if the model is partially cached.

Root cause: No shared state existed between force_offline_if_cached calls, so each config file triggered its own independent retry.

Closes #434

Summary by CodeRabbit

Chores
- Added internal state infrastructure to support improved offline resilience handling.

coderabbitai · 2026-04-17T04:45:25Z

📝 Walkthrough

Walkthrough

A module-level dictionary _offline_retry_done is introduced to backend/utils/hf_offline_patch.py to track per-model retry state in offline environments. This flag is intended to prevent redundant retry attempts across sequential file verification operations.

Changes

Cohort / File(s)	Summary
Offline Retry State Tracking `backend/utils/hf_offline_patch.py`	Added module-scoped `_offline_retry_done` dictionary to track which models have already undergone offline retry attempts, establishing the foundation for preventing infinite connection retry loops.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Poem

🐰 A flag hops in quietly, ears held high,
No more shall retries reach for the sky,
Per-model tracking, oh what a sight,
Offline loops tamed, brought into light! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Linked Issues check	❓ Inconclusive	The PR introduces the tracking mechanism described in `#434`, but the diff only shows the flag definition without complete retry logic implementation.	Verify that the retry logic actually consults and updates the _offline_retry_done flag to prevent repeated retries per-model as intended.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: preventing infinite retry loops in offline mode by tracking retries per-model.
Out of Scope Changes check	✅ Passed	Only a module-level variable was added; no unrelated changes were introduced beyond the scope of preventing infinite retry loops.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

backend/utils/hf_offline_patch.py (1)
46-59: ⚠️ Potential issue | 🟡 Minor

Stale/misleading comments in the offline-error branch.

The inline comments at lines 53–57 describe a design ("propagate a flag via the exception so the caller can decide… let it fall through to the finally block and re-raise") that does not match the new PR intent of tracking retries via _offline_retry_done. Once the retry-gating logic is actually implemented here, please update or remove this block of commentary — otherwise future readers will be confused about whether the module does per-model retry bookkeeping or not. Also, both branches of the if/else simply raise, so the conditional currently has no effect beyond logging.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/utils/hf_offline_patch.py` around lines 46 - 59, The except block in
hf_offline_patch.py contains stale comments and a conditional that only logs
then re-raises; update this block to reflect the actual retry-gating implemented
with _offline_retry_done: remove the misleading sentences about propagating a
flag via the exception and the “finally block” behavior, and either (A)
implement the intended retry gating by checking and setting _offline_retry_done
before re-raising so the caller can detect a retried attempt, or (B) if retry
gating is already handled elsewhere (e.g., in force_offline_if_cached), simplify
the block to a single clear comment and a logger.warning call that includes
model_label and exc, then re-raise; reference the except Exception as exc block
and the sentinel _offline_retry_done and force_offline_if_cached to locate where
to change the logic/comments.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/utils/hf_offline_patch.py`:
- Around line 15-18: Declare _offline_retry_done as a Dict[str, bool] and use it
in force_offline_if_cached to record and check per-model attempts: accept a
model_label parameter in force_offline_if_cached (or derive it from existing
args), check _offline_retry_done.get(model_label) under a module-level
threading.Lock before attempting network retries, and if True short-circuit to
force offline behavior; when a network attempt runs and fails, set
_offline_retry_done[model_label] = True (also under the same Lock) so subsequent
calls from _load_model_sync (via asyncio.to_thread) skip the retry loop. Also
update the top-of-file comment to match the dict semantics (per-model boolean)
and ensure concurrent access is protected with threading.Lock as suggested.

---

Outside diff comments:
In `@backend/utils/hf_offline_patch.py`:
- Around line 46-59: The except block in hf_offline_patch.py contains stale
comments and a conditional that only logs then re-raises; update this block to
reflect the actual retry-gating implemented with _offline_retry_done: remove the
misleading sentences about propagating a flag via the exception and the “finally
block” behavior, and either (A) implement the intended retry gating by checking
and setting _offline_retry_done before re-raising so the caller can detect a
retried attempt, or (B) if retry gating is already handled elsewhere (e.g., in
force_offline_if_cached), simplify the block to a single clear comment and a
logger.warning call that includes model_label and exc, then re-raise; reference
the except Exception as exc block and the sentinel _offline_retry_done and
force_offline_if_cached to locate where to change the logic/comments.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f86eda80-56a0-4e39-9b45-81dcc9556e12

📥 Commits

Reviewing files that changed from the base of the PR and between 67bf8e9 and 717e0ef.

📒 Files selected for processing (1)

backend/utils/hf_offline_patch.py

coderabbitai · 2026-04-17T04:47:21Z

+# Module-level flag: if True, we've already tried with network and failed for
+# this model.  Prevents each individual config file from triggering its own
+# 5-retry cycle.
+_offline_retry_done = {}


⚠️ Potential issue | 🔴 Critical

Fix is incomplete — _offline_retry_done is declared but never read or written.

The PR description states that this dict should be consulted to skip retries after one attempt per model, but no code in this file (or, based on the provided context, elsewhere) reads from or writes to _offline_retry_done. As it stands, this change is purely dead state and does not address the infinite retry loop described in issue #434. The except Exception block at lines 46–59 still just re-raises on offline errors — there's no gating on this dict, no per-model bookkeeping, and no short-circuit path.

Additionally, the comment describes the variable as a boolean ("if True, we've already tried…"), but it is declared as a dict. Please reconcile the intent (presumably Dict[str, bool] keyed by model_label) and wire it into the retry/exception flow.

A sketch of the likely intended wiring:

-_offline_retry_done = {} +# Maps model_label -> True once an online retry has been attempted, so +# subsequent per-file failures for the same model don't each trigger their +# own retry cycle. +_offline_retry_done: dict[str, bool] = {}

And inside force_offline_if_cached:

try: yield except Exception as exc: if "offline" in str(exc).lower(): + if _offline_retry_done.get(model_label): + logger.warning( + "[offline-guard] Offline load failed for %s; retry already " + "attempted previously, propagating.", model_label or "model", + ) + raise + _offline_retry_done[model_label] = True logger.warning(...) ...

Note also that force_offline_if_cached is invoked from _load_model_sync via asyncio.to_thread in both pytorch_backend.py and mlx_backend.py, so concurrent access to this module-level dict is possible. Consider guarding mutations with a threading.Lock, or document that only one load runs at a time.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@backend/utils/hf_offline_patch.py` around lines 15 - 18, Declare _offline_retry_done as a Dict[str, bool] and use it in force_offline_if_cached to record and check per-model attempts: accept a model_label parameter in force_offline_if_cached (or derive it from existing args), check _offline_retry_done.get(model_label) under a module-level threading.Lock before attempting network retries, and if True short-circuit to force offline behavior; when a network attempt runs and fails, set _offline_retry_done[model_label] = True (also under the same Lock) so subsequent calls from _load_model_sync (via asyncio.to_thread) skip the retry loop. Also update the top-of-file comment to match the dict semantics (per-model boolean) and ensure concurrent access is protected with threading.Lock as suggested.

liuyuzhe530 · 2026-04-17T07:16:50Z

Hi! Just checking in on this PR. It's a simple fix that prevents an infinite retry loop when loading models in offline mode. The fix adds a module-level flag to prevent each config file from triggering its own retry cycle. Happy to make any adjustments if needed. Thanks!

PR #443 wrapped the model *load* path with `force_offline_if_cached` so cached models don't phone home at startup. The context manager restores `HF_HUB_OFFLINE` on exit, which left inference paths (generate, transcribe, voice-prompt creation) unguarded — and `qwen_tts`, `mlx_audio`, and `transformers` perform lazy tokenizer/processor/config lookups during inference. With internet on, those lookups are near-instant and invisible; with internet off, `requests` hangs on DNS or connect until the network returns. This is exactly what users in #462 describe: model shows "Loaded", internet drops, generation "thinks" forever, internet comes back, generation completes. Chatterbox and LuxTTS don't exhibit this because their engine libs resolve everything through already-cached paths at load time. Fix: wrap each inference-sync body with `force_offline_if_cached(True, ...)`. Since inference only runs after a successful load, weights are known to be on disk, so `is_cached=True` is unconditional. Also adds the load-time guard that was missing from `qwen_custom_voice_backend.py` — CustomVoice previously had no offline protection at all. Paths patched: - PyTorchTTSBackend.create_voice_prompt (create_voice_clone_prompt) - PyTorchTTSBackend.generate (generate_voice_clone) - PyTorchSTTBackend.transcribe (Whisper generate + decoder-prompt-ids) - MLXTTSBackend.generate (mlx_audio generate, all branches) - MLXSTTBackend.transcribe (mlx_audio whisper generate) - QwenCustomVoiceBackend._load_model_sync + generate Does not address the secondary `check_model_inputs() missing 'func'` error reported in the same issue — that's a `transformers` 5.x version-skew bug on the install path, separate concern. Fixes #462. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* fix(offline): guard inference paths with HF_HUB_OFFLINE (#462) PR #443 wrapped the model *load* path with `force_offline_if_cached` so cached models don't phone home at startup. The context manager restores `HF_HUB_OFFLINE` on exit, which left inference paths (generate, transcribe, voice-prompt creation) unguarded — and `qwen_tts`, `mlx_audio`, and `transformers` perform lazy tokenizer/processor/config lookups during inference. With internet on, those lookups are near-instant and invisible; with internet off, `requests` hangs on DNS or connect until the network returns. This is exactly what users in #462 describe: model shows "Loaded", internet drops, generation "thinks" forever, internet comes back, generation completes. Chatterbox and LuxTTS don't exhibit this because their engine libs resolve everything through already-cached paths at load time. Fix: wrap each inference-sync body with `force_offline_if_cached(True, ...)`. Since inference only runs after a successful load, weights are known to be on disk, so `is_cached=True` is unconditional. Also adds the load-time guard that was missing from `qwen_custom_voice_backend.py` — CustomVoice previously had no offline protection at all. Paths patched: - PyTorchTTSBackend.create_voice_prompt (create_voice_clone_prompt) - PyTorchTTSBackend.generate (generate_voice_clone) - PyTorchSTTBackend.transcribe (Whisper generate + decoder-prompt-ids) - MLXTTSBackend.generate (mlx_audio generate, all branches) - MLXSTTBackend.transcribe (mlx_audio whisper generate) - QwenCustomVoiceBackend._load_model_sync + generate Does not address the secondary `check_model_inputs() missing 'func'` error reported in the same issue — that's a `transformers` 5.x version-skew bug on the install path, separate concern. Fixes #462. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> * fix(offline): mutate cached HF constants + threadsafe refcount Review feedback on the initial fix surfaced two real issues: 1. ``os.environ`` toggles alone don't flip offline mode. ``huggingface_hub.constants.HF_HUB_OFFLINE`` is read once at import time into a module-level bool; ``transformers.utils.hub._is_offline_mode`` mirrors that bool at its own import time. The hot paths (``_http._default_backend_factory`` in huggingface_hub, ``is_offline_mode`` in transformers) read the cached bools — not the env — so mutating only ``os.environ`` was a no-op. 2. Race condition on concurrent inference. Two threads running inside ``force_offline_if_cached`` via ``asyncio.to_thread`` could have thread A's ``finally`` strip thread B's offline protection mid-run. Rewrite the helper to: - mutate ``huggingface_hub.constants.HF_HUB_OFFLINE`` and ``transformers.utils.hub._is_offline_mode`` directly - refcount concurrent users under a single ``threading.RLock`` so a shared offline window is restored only when the last caller exits - still write ``os.environ`` for anything that reads it dynamically Also addresses the unused-variable ruff flag on the Whisper transcribe path (``audio, sr`` → ``audio, _sr``). New unit tests cover the cached-constant mutation, env propagation, no-op on ``is_cached=False``, nested contexts, and a threaded race where a slow thread must retain offline mode after a peer exits. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> * fix(offline): atomic entry rollback + tidy test assertions Review follow-up: - Wrap the `_offline_refcount == 0` setup in a try/except so any failure during the cached-constant mutation (including unexpected non-ImportError like RuntimeError or AttributeError from a half-initialized module) rolls back *all* partial state before re-raising. Without this, a mid-setup crash could leave `huggingface_hub.constants.HF_HUB_OFFLINE` mutated but the refcount at 0 — a persistent offline flag outliving the process. - Swap ruff-flagged Yoda comparisons in the new test file (SIM300) and add a module-level note warning that these tests mutate global state and are not safe under cross-process parallelism. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> * test(offline): make concurrency test deterministic and bounded Replace the `sleep(0.15)` ordering hack with an explicit `threading.Event` the fast thread sets in `finally`. The slow thread waits on that event (bounded), then observes the flag — so we deterministically verify the slow thread still sees offline mode after the fast thread has exited. Also add timeouts to `barrier.wait()` and assert `not thread.is_alive()` after the joins so the test can't hang on an unexpected failure path. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> --------- Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>

fix: prevent infinite retry loop in offline mode (jamiepine#434)

717e0ef

coderabbitai bot reviewed Apr 17, 2026

View reviewed changes

jamiepine mentioned this pull request Apr 19, 2026

fix(offline): guard inference paths with HF_HUB_OFFLINE #503

Merged

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: prevent infinite retry loop in offline mode (#434)#443

fix: prevent infinite retry loop in offline mode (#434)#443
liuyuzhe530 wants to merge 1 commit intojamiepine:mainfrom
liuyuzhe530:fix/offline-mode-infinite-retry

liuyuzhe530 commented Apr 17, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Apr 17, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 inconclusive)

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Apr 17, 2026

Uh oh!

liuyuzhe530 commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

liuyuzhe530 commented Apr 17, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Fix: Infinite Retry Loop During Model Loading in Offline Environments (#434)

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 inconclusive)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

liuyuzhe530 commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

liuyuzhe530 commented Apr 17, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Apr 17, 2026 •

edited

Loading