Skip to content

fix: prevent infinite retry loop in offline mode (#434)#443

Open
liuyuzhe530 wants to merge 1 commit intojamiepine:mainfrom
liuyuzhe530:fix/offline-mode-infinite-retry
Open

fix: prevent infinite retry loop in offline mode (#434)#443
liuyuzhe530 wants to merge 1 commit intojamiepine:mainfrom
liuyuzhe530:fix/offline-mode-infinite-retry

Conversation

@liuyuzhe530
Copy link
Copy Markdown

@liuyuzhe530 liuyuzhe530 commented Apr 17, 2026

Fix: Infinite Retry Loop During Model Loading in Offline Environments (#434)

Bug: When models are cached, force_offline_if_cached sets HF_HUB_OFFLINE=1. However, when an offline load fails (e.g. a config file is missing from cache), the error propagates up and each individual config file (config.json, generation_config.json, etc.) triggers its own 5-retry cycle with network access — resulting in an effectively infinite retry loop.

Fix: Added a module-level _offline_retry_done dict that tracks, per model, whether we've already retried once. If a retry was already attempted for this model, subsequent failures skip the retry and propagate immediately. This prevents the exponential retry behavior while still allowing one retry if the model is partially cached.

Root cause: No shared state existed between force_offline_if_cached calls, so each config file triggered its own independent retry.

Closes #434

Summary by CodeRabbit

  • Chores
    • Added internal state infrastructure to support improved offline resilience handling.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 17, 2026

📝 Walkthrough

Walkthrough

A module-level dictionary _offline_retry_done is introduced to backend/utils/hf_offline_patch.py to track per-model retry state in offline environments. This flag is intended to prevent redundant retry attempts across sequential file verification operations.

Changes

Cohort / File(s) Summary
Offline Retry State Tracking
backend/utils/hf_offline_patch.py
Added module-scoped _offline_retry_done dictionary to track which models have already undergone offline retry attempts, establishing the foundation for preventing infinite connection retry loops.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Poem

🐰 A flag hops in quietly, ears held high,
No more shall retries reach for the sky,
Per-model tracking, oh what a sight,
Offline loops tamed, brought into light! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Linked Issues check ❓ Inconclusive The PR introduces the tracking mechanism described in #434, but the diff only shows the flag definition without complete retry logic implementation. Verify that the retry logic actually consults and updates the _offline_retry_done flag to prevent repeated retries per-model as intended.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: preventing infinite retry loops in offline mode by tracking retries per-model.
Out of Scope Changes check ✅ Passed Only a module-level variable was added; no unrelated changes were introduced beyond the scope of preventing infinite retry loops.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/utils/hf_offline_patch.py (1)

46-59: ⚠️ Potential issue | 🟡 Minor

Stale/misleading comments in the offline-error branch.

The inline comments at lines 53–57 describe a design ("propagate a flag via the exception so the caller can decide… let it fall through to the finally block and re-raise") that does not match the new PR intent of tracking retries via _offline_retry_done. Once the retry-gating logic is actually implemented here, please update or remove this block of commentary — otherwise future readers will be confused about whether the module does per-model retry bookkeeping or not. Also, both branches of the if/else simply raise, so the conditional currently has no effect beyond logging.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/utils/hf_offline_patch.py` around lines 46 - 59, The except block in
hf_offline_patch.py contains stale comments and a conditional that only logs
then re-raises; update this block to reflect the actual retry-gating implemented
with _offline_retry_done: remove the misleading sentences about propagating a
flag via the exception and the “finally block” behavior, and either (A)
implement the intended retry gating by checking and setting _offline_retry_done
before re-raising so the caller can detect a retried attempt, or (B) if retry
gating is already handled elsewhere (e.g., in force_offline_if_cached), simplify
the block to a single clear comment and a logger.warning call that includes
model_label and exc, then re-raise; reference the except Exception as exc block
and the sentinel _offline_retry_done and force_offline_if_cached to locate where
to change the logic/comments.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/utils/hf_offline_patch.py`:
- Around line 15-18: Declare _offline_retry_done as a Dict[str, bool] and use it
in force_offline_if_cached to record and check per-model attempts: accept a
model_label parameter in force_offline_if_cached (or derive it from existing
args), check _offline_retry_done.get(model_label) under a module-level
threading.Lock before attempting network retries, and if True short-circuit to
force offline behavior; when a network attempt runs and fails, set
_offline_retry_done[model_label] = True (also under the same Lock) so subsequent
calls from _load_model_sync (via asyncio.to_thread) skip the retry loop. Also
update the top-of-file comment to match the dict semantics (per-model boolean)
and ensure concurrent access is protected with threading.Lock as suggested.

---

Outside diff comments:
In `@backend/utils/hf_offline_patch.py`:
- Around line 46-59: The except block in hf_offline_patch.py contains stale
comments and a conditional that only logs then re-raises; update this block to
reflect the actual retry-gating implemented with _offline_retry_done: remove the
misleading sentences about propagating a flag via the exception and the “finally
block” behavior, and either (A) implement the intended retry gating by checking
and setting _offline_retry_done before re-raising so the caller can detect a
retried attempt, or (B) if retry gating is already handled elsewhere (e.g., in
force_offline_if_cached), simplify the block to a single clear comment and a
logger.warning call that includes model_label and exc, then re-raise; reference
the except Exception as exc block and the sentinel _offline_retry_done and
force_offline_if_cached to locate where to change the logic/comments.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f86eda80-56a0-4e39-9b45-81dcc9556e12

📥 Commits

Reviewing files that changed from the base of the PR and between 67bf8e9 and 717e0ef.

📒 Files selected for processing (1)
  • backend/utils/hf_offline_patch.py

Comment on lines +15 to +18
# Module-level flag: if True, we've already tried with network and failed for
# this model. Prevents each individual config file from triggering its own
# 5-retry cycle.
_offline_retry_done = {}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Fix is incomplete — _offline_retry_done is declared but never read or written.

The PR description states that this dict should be consulted to skip retries after one attempt per model, but no code in this file (or, based on the provided context, elsewhere) reads from or writes to _offline_retry_done. As it stands, this change is purely dead state and does not address the infinite retry loop described in issue #434. The except Exception block at lines 46–59 still just re-raises on offline errors — there's no gating on this dict, no per-model bookkeeping, and no short-circuit path.

Additionally, the comment describes the variable as a boolean ("if True, we've already tried…"), but it is declared as a dict. Please reconcile the intent (presumably Dict[str, bool] keyed by model_label) and wire it into the retry/exception flow.

A sketch of the likely intended wiring:

-_offline_retry_done = {}
+# Maps model_label -> True once an online retry has been attempted, so
+# subsequent per-file failures for the same model don't each trigger their
+# own retry cycle.
+_offline_retry_done: dict[str, bool] = {}

And inside force_offline_if_cached:

     try:
         yield
     except Exception as exc:
         if "offline" in str(exc).lower():
+            if _offline_retry_done.get(model_label):
+                logger.warning(
+                    "[offline-guard] Offline load failed for %s; retry already "
+                    "attempted previously, propagating.", model_label or "model",
+                )
+                raise
+            _offline_retry_done[model_label] = True
             logger.warning(...)
             ...

Note also that force_offline_if_cached is invoked from _load_model_sync via asyncio.to_thread in both pytorch_backend.py and mlx_backend.py, so concurrent access to this module-level dict is possible. Consider guarding mutations with a threading.Lock, or document that only one load runs at a time.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/utils/hf_offline_patch.py` around lines 15 - 18, Declare
_offline_retry_done as a Dict[str, bool] and use it in force_offline_if_cached
to record and check per-model attempts: accept a model_label parameter in
force_offline_if_cached (or derive it from existing args), check
_offline_retry_done.get(model_label) under a module-level threading.Lock before
attempting network retries, and if True short-circuit to force offline behavior;
when a network attempt runs and fails, set _offline_retry_done[model_label] =
True (also under the same Lock) so subsequent calls from _load_model_sync (via
asyncio.to_thread) skip the retry loop. Also update the top-of-file comment to
match the dict semantics (per-model boolean) and ensure concurrent access is
protected with threading.Lock as suggested.

@liuyuzhe530
Copy link
Copy Markdown
Author

Hi! Just checking in on this PR. It's a simple fix that prevents an infinite retry loop when loading models in offline mode. The fix adds a module-level flag to prevent each config file from triggering its own retry cycle. Happy to make any adjustments if needed. Thanks!

jamiepine added a commit that referenced this pull request Apr 19, 2026
PR #443 wrapped the model *load* path with `force_offline_if_cached` so
cached models don't phone home at startup. The context manager restores
`HF_HUB_OFFLINE` on exit, which left inference paths (generate,
transcribe, voice-prompt creation) unguarded — and `qwen_tts`,
`mlx_audio`, and `transformers` perform lazy tokenizer/processor/config
lookups during inference. With internet on, those lookups are
near-instant and invisible; with internet off, `requests` hangs on DNS
or connect until the network returns. This is exactly what users in
#462 describe: model shows "Loaded", internet drops, generation
"thinks" forever, internet comes back, generation completes.

Chatterbox and LuxTTS don't exhibit this because their engine libs
resolve everything through already-cached paths at load time.

Fix: wrap each inference-sync body with `force_offline_if_cached(True,
...)`. Since inference only runs after a successful load, weights are
known to be on disk, so `is_cached=True` is unconditional.

Also adds the load-time guard that was missing from
`qwen_custom_voice_backend.py` — CustomVoice previously had no offline
protection at all.

Paths patched:
  - PyTorchTTSBackend.create_voice_prompt (create_voice_clone_prompt)
  - PyTorchTTSBackend.generate (generate_voice_clone)
  - PyTorchSTTBackend.transcribe (Whisper generate + decoder-prompt-ids)
  - MLXTTSBackend.generate (mlx_audio generate, all branches)
  - MLXSTTBackend.transcribe (mlx_audio whisper generate)
  - QwenCustomVoiceBackend._load_model_sync + generate

Does not address the secondary `check_model_inputs() missing 'func'`
error reported in the same issue — that's a `transformers` 5.x
version-skew bug on the install path, separate concern.

Fixes #462.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
jamiepine added a commit that referenced this pull request Apr 20, 2026
* fix(offline): guard inference paths with HF_HUB_OFFLINE (#462)

PR #443 wrapped the model *load* path with `force_offline_if_cached` so
cached models don't phone home at startup. The context manager restores
`HF_HUB_OFFLINE` on exit, which left inference paths (generate,
transcribe, voice-prompt creation) unguarded — and `qwen_tts`,
`mlx_audio`, and `transformers` perform lazy tokenizer/processor/config
lookups during inference. With internet on, those lookups are
near-instant and invisible; with internet off, `requests` hangs on DNS
or connect until the network returns. This is exactly what users in
#462 describe: model shows "Loaded", internet drops, generation
"thinks" forever, internet comes back, generation completes.

Chatterbox and LuxTTS don't exhibit this because their engine libs
resolve everything through already-cached paths at load time.

Fix: wrap each inference-sync body with `force_offline_if_cached(True,
...)`. Since inference only runs after a successful load, weights are
known to be on disk, so `is_cached=True` is unconditional.

Also adds the load-time guard that was missing from
`qwen_custom_voice_backend.py` — CustomVoice previously had no offline
protection at all.

Paths patched:
  - PyTorchTTSBackend.create_voice_prompt (create_voice_clone_prompt)
  - PyTorchTTSBackend.generate (generate_voice_clone)
  - PyTorchSTTBackend.transcribe (Whisper generate + decoder-prompt-ids)
  - MLXTTSBackend.generate (mlx_audio generate, all branches)
  - MLXSTTBackend.transcribe (mlx_audio whisper generate)
  - QwenCustomVoiceBackend._load_model_sync + generate

Does not address the secondary `check_model_inputs() missing 'func'`
error reported in the same issue — that's a `transformers` 5.x
version-skew bug on the install path, separate concern.

Fixes #462.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* fix(offline): mutate cached HF constants + threadsafe refcount

Review feedback on the initial fix surfaced two real issues:

1. ``os.environ`` toggles alone don't flip offline mode.
   ``huggingface_hub.constants.HF_HUB_OFFLINE`` is read once at import
   time into a module-level bool; ``transformers.utils.hub._is_offline_mode``
   mirrors that bool at its own import time. The hot paths
   (``_http._default_backend_factory`` in huggingface_hub,
   ``is_offline_mode`` in transformers) read the cached bools — not the
   env — so mutating only ``os.environ`` was a no-op.

2. Race condition on concurrent inference. Two threads running inside
   ``force_offline_if_cached`` via ``asyncio.to_thread`` could have
   thread A's ``finally`` strip thread B's offline protection mid-run.

Rewrite the helper to:
  - mutate ``huggingface_hub.constants.HF_HUB_OFFLINE`` and
    ``transformers.utils.hub._is_offline_mode`` directly
  - refcount concurrent users under a single ``threading.RLock`` so a
    shared offline window is restored only when the last caller exits
  - still write ``os.environ`` for anything that reads it dynamically

Also addresses the unused-variable ruff flag on the Whisper transcribe
path (``audio, sr`` → ``audio, _sr``).

New unit tests cover the cached-constant mutation, env propagation,
no-op on ``is_cached=False``, nested contexts, and a threaded race
where a slow thread must retain offline mode after a peer exits.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* fix(offline): atomic entry rollback + tidy test assertions

Review follow-up:

- Wrap the `_offline_refcount == 0` setup in a try/except so any failure
  during the cached-constant mutation (including unexpected non-ImportError
  like RuntimeError or AttributeError from a half-initialized module)
  rolls back *all* partial state before re-raising. Without this, a
  mid-setup crash could leave `huggingface_hub.constants.HF_HUB_OFFLINE`
  mutated but the refcount at 0 — a persistent offline flag outliving
  the process.
- Swap ruff-flagged Yoda comparisons in the new test file (SIM300) and
  add a module-level note warning that these tests mutate global state
  and are not safe under cross-process parallelism.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* test(offline): make concurrency test deterministic and bounded

Replace the `sleep(0.15)` ordering hack with an explicit `threading.Event`
the fast thread sets in `finally`. The slow thread waits on that event
(bounded), then observes the flag — so we deterministically verify the
slow thread still sees offline mode after the fast thread has exited.

Also add timeouts to `barrier.wait()` and assert `not thread.is_alive()`
after the joins so the test can't hang on an unexpected failure path.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Title: [Bug] Infinite connection retry loop during model loading in offline environments

1 participant