fix: add ACESTEP_SKIP_VRAM_PREFLIGHT env var + gc before preflight check #1091
FlexOr2 wants to merge 3 commits into ace-step:main
Conversation
Two improvements to the VRAM pre-flight check in `generate_music()`:

1. Run `gc.collect()` + `torch.cuda.empty_cache()` before the check so it measures actual free VRAM, not memory held by PyTorch's caching allocator. On shared-GPU setups (desktop + generation) the check was reporting e.g. "1.3 GB free" while 0.5 GB was reclaimable cache, causing spurious rejections.
2. Add an `ACESTEP_SKIP_VRAM_PREFLIGHT=1` env var to bypass the check entirely. Useful for tight-VRAM setups (24 GB card with desktop compositor) where the conservative estimate blocks generations that PyTorch can actually handle via `expandable_segments`.

The default behavior (check enabled) is unchanged. Non-target platforms are unaffected: the env var check only runs on CUDA paths; CPU/MPS/XPU paths already return None before reaching it.
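The shape of the change can be sketched as follows. This is an illustrative stand-alone version, not the actual diff: `run_vram_preflight`, `preflight_check`, and `empty_cuda_cache` are hypothetical stand-ins for the real method and `torch.cuda.empty_cache()`, injected so the control flow runs without a GPU.

```python
import gc
import os
import logging

logger = logging.getLogger("generate_music")

def run_vram_preflight(preflight_check, empty_cuda_cache=lambda: None):
    """Return an error string from preflight_check, or None if the check
    passes or is explicitly skipped via ACESTEP_SKIP_VRAM_PREFLIGHT."""
    skip = os.environ.get("ACESTEP_SKIP_VRAM_PREFLIGHT", "").lower() in ("1", "true", "yes")
    if skip:
        logger.warning("[generate_music] VRAM pre-flight check skipped "
                       "via ACESTEP_SKIP_VRAM_PREFLIGHT")
        return None
    # Reclaim cached blocks first so the check sees actual free VRAM:
    # gc.collect() drops dead Python references, then empty_cache() lets
    # the caching allocator return those blocks to the driver.
    gc.collect()
    empty_cuda_cache()  # torch.cuda.empty_cache() in the real code
    return preflight_check()
```

With the env var set, even a failing check is bypassed; unset, the error propagates to the caller as before.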
📝 Walkthrough

Conditionally runs CUDA-specific memory cleanup (`gc.collect` + `torch.cuda.empty_cache`) and adds an environment-variable-controlled bypass (`ACESTEP_SKIP_VRAM_PREFLIGHT`) for the VRAM pre-flight check.
ChuxiJ
left a comment
Thanks @FlexOr2 — nice, surgical patch. I traced through the surrounding code (generate_music.py:105-163 for _vram_preflight_check and the call site at :329) and it's correct in all the ways I checked:
- `gc.collect()` paired with `torch.cuda.empty_cache()` is the right idiom here, not cargo-cult. Without the GC pass, `empty_cache()` can't return blocks that still have live Python references; without `empty_cache()`, `gc.collect()` alone leaves PyTorch's caching allocator holding the memory. This matches existing patterns elsewhere in the repo (e.g. `llm_inference.py:107,136,584`, `init_service_loader.py:134`, `reinitialize_route.py:99`).
- Safe on non-CUDA platforms. I verified `torch.cuda.empty_cache()` is a silent no-op on Mac (no CUDA available), and `_vram_preflight_check` itself early-returns at line 128 when `torch.cuda.is_available()` is False, so CPU/MPS/XPU see no behavioral change beyond a cheap extra GC pass.
- Env-var naming (`ACESTEP_SKIP_VRAM_PREFLIGHT`) and boolean parsing (`.lower() in ("1", "true", "yes")`) are consistent with the project's existing `ACESTEP_*` conventions.
- Required imports (`gc`, `torch`, `logger`) were already present; the diff only adds `import os`, which is correct.
One change I'd like before this lands:
Please raise the skip notice from debug to warning
```python
if skip_preflight:
    logger.debug(
        "[generate_music] VRAM pre-flight skipped "
        "(ACESTEP_SKIP_VRAM_PREFLIGHT=true)"
    )
```

`logger.debug` is filtered out at the default log level, so a user who sets `ACESTEP_SKIP_VRAM_PREFLIGHT=1` gets no feedback that the safety net is off. The bigger problem is that if a later generation OOMs, nothing in the logs explains why preflight didn't catch it — support/debugging becomes harder because there's no breadcrumb.
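The filtering behavior is easy to demonstrate with the standard library alone. A minimal sketch, assuming a logger whose effective level is WARNING (Python's default for the root logger); the logger name is illustrative:

```python
import io
import logging

# Capture log output in a string buffer instead of stderr.
stream = io.StringIO()
log = logging.getLogger("demo_preflight")
log.addHandler(logging.StreamHandler(stream))
log.setLevel(logging.WARNING)  # typical default effective level

log.debug("VRAM pre-flight skipped")    # below WARNING: dropped
log.warning("VRAM pre-flight skipped")  # reaches the handler

output = stream.getvalue()
```

Only the warning-level record appears in `output`, which is exactly why a user relying on `logger.debug` sees nothing.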
Suggested change:

```python
if skip_preflight:
    logger.warning(
        "[generate_music] VRAM pre-flight check skipped via "
        "ACESTEP_SKIP_VRAM_PREFLIGHT=1. If generation OOMs, "
        "unset this variable to re-enable the safety check."
    )
```

This way:
- Users who intentionally set the env var see an explicit reminder each run that they're running without the safety net (intentional, healthy friction).
- OOM post-mortems have a clear breadcrumb.
- Still non-blocking — we're not spamming errors, just one warning line per generation.
Optional nit (not blocking)
You could move the `gc.collect()` / `empty_cache()` pair into the `else:` branch so we don't pay the GC cost when the check is skipped anyway:
```python
skip_preflight = os.environ.get("ACESTEP_SKIP_VRAM_PREFLIGHT", "").lower() in ("1", "true", "yes")
if skip_preflight:
    logger.warning(...)
else:
    gc.collect()
    torch.cuda.empty_cache()
    vram_error = self._vram_preflight_check(...)
    if vram_error is not None:
        return vram_error
```

Not a blocker — the GC overhead is negligible in practice (music generation runs on the order of seconds), and keeping it outside has the small advantage of also clearing memory before the actual generation path even if the preflight is skipped. Totally your call.
Happy to approve once the `logger.debug` → `logger.warning` change is in. Nice diagnosis on the root cause in the PR description — the `expandable_segments:True` note in particular helped me understand why the measurement-vs-reality gap is platform-specific.
ChuxiJ review feedback on PR ace-step#1091: a user who sets ACESTEP_SKIP_VRAM_PREFLIGHT=1 needs a visible reminder each run that the safety net is off, and an OOM post-mortem needs a breadcrumb in the logs explaining why preflight didn't catch it. debug-level is filtered out at the default log level, so neither use case worked.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@acestep/core/generation/handler/generate_music.py`:
- Around line 332-350: The new preflight cleanup and skip-bypass should be
limited to CUDA only: wrap the gc.collect(), torch.cuda.empty_cache(), reading
ACESTEP_SKIP_VRAM_PREFLIGHT and the logger.warning/skip behavior inside an if
torch.cuda.is_available() guard so non-CUDA (CPU/MPS/XPU) paths are untouched;
then call self._vram_preflight_check(actual_batch_size=actual_batch_size,
audio_duration=audio_duration, guidance_scale=guidance_scale) as before (and
return vram_error if not None) outside or conditional on CUDA as appropriate.
📒 Files selected for processing (1)
acestep/core/generation/handler/generate_music.py
CodeRabbit review feedback on PR ace-step#1091: the gc.collect() / torch.cuda.empty_cache() pair and the skip-preflight warning should not run on CPU/MPS/XPU. empty_cache is a no-op there and the preflight check itself early-returns on non-CUDA, but a user setting ACESTEP_SKIP_VRAM_PREFLIGHT=1 on a CPU box would see a misleading "VRAM pre-flight skipped" warning referencing GPU semantics that don't apply. Wrap the whole block in torch.cuda.is_available() so non-CUDA paths are untouched.
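A sketch of what that guard might look like; `guarded_preflight`, `cuda_available`, and `empty_cache` are hypothetical stand-ins for the real call site, `torch.cuda.is_available()`, and `torch.cuda.empty_cache()`, injected so the flow is testable without a GPU:

```python
import gc
import os
import logging

logger = logging.getLogger("generate_music")

def guarded_preflight(preflight_check, cuda_available, empty_cache=lambda: None):
    """Run the VRAM pre-flight only on CUDA; non-CUDA paths are untouched,
    so no misleading skip warning is emitted on CPU/MPS/XPU boxes."""
    if not cuda_available:
        return None  # CPU/MPS/XPU: no cleanup, no warning, no check
    if os.environ.get("ACESTEP_SKIP_VRAM_PREFLIGHT", "").lower() in ("1", "true", "yes"):
        logger.warning("[generate_music] VRAM pre-flight check skipped "
                       "via ACESTEP_SKIP_VRAM_PREFLIGHT")
        return None
    gc.collect()
    empty_cache()
    return preflight_check()
```

On a CPU box with the env var set, the function now returns early before the warning, so the GPU-specific message never appears.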
Summary

- Run `gc.collect()` + `torch.cuda.empty_cache()` before the VRAM pre-flight check so it measures actual free VRAM, not memory held by PyTorch's caching allocator
- Add `ACESTEP_SKIP_VRAM_PREFLIGHT=1` env var to bypass the check entirely for tight-VRAM setups

Scope

- `acestep/core/generation/handler/generate_music.py`
- `_vram_preflight_check()` method itself is unchanged

Risk and Compatibility
Why

On shared-GPU setups (e.g., 24 GB RTX 3090 with desktop compositor using 2-3 GB), the pre-flight check reports e.g. "1.3 GB free, needs 1.4 GB" and blocks generation, even though PyTorch's caching allocator can handle it via `expandable_segments:True`. The `gc.collect()` + `torch.cuda.empty_cache()` before the check reclaims fragmented allocator cache so the measurement reflects actual free memory. The env var provides an escape hatch for setups where even the corrected check is too conservative.

Regression Checks
- `ACESTEP_SKIP_VRAM_PREFLIGHT=1`: check skipped, debug log emitted
- `_vram_preflight_check` unit tests pass (method unchanged)

Reviewer Notes
- `ACESTEP_` prefix, boolean parsing via `.lower() in ("1", "true", "yes")`
- Matches existing `ACESTEP_VAE_ON_CPU` in `generate_music_decode.py`
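The parsing convention described above can be captured in a one-line helper; `env_flag` is a hypothetical name used here only for illustration:

```python
import os

# Project-style truthiness for ACESTEP_* flags: only "1", "true", "yes"
# (case-insensitive) count as enabled; unset or any other value is False.
def env_flag(name: str) -> bool:
    return os.environ.get(name, "").lower() in ("1", "true", "yes")
```

Note that values like `"on"` or `"enabled"` deliberately parse as False, keeping the accepted spellings explicit.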