
fix: add ACESTEP_SKIP_VRAM_PREFLIGHT env var + gc before preflight check #1091

Open
FlexOr2 wants to merge 3 commits into ace-step:main from FlexOr2:fix/optional-vram-preflight

Conversation


FlexOr2 commented Apr 12, 2026

Summary

  • Run gc.collect() + torch.cuda.empty_cache() before the VRAM pre-flight check so it measures actual free VRAM, not memory held by PyTorch's caching allocator
  • Add ACESTEP_SKIP_VRAM_PREFLIGHT=1 env var to bypass the check entirely for tight-VRAM setups (a sketch of the resulting call site follows this list)
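For concreteness, a minimal sketch of the resulting call site, not the exact diff: self._vram_preflight_check, logger, and the keyword arguments are assumed from the surrounding handler and the review thread below, and details may differ.

import gc
import os

import torch

# Sketch of the two changes at the preflight call site in generate_music().
skip_preflight = os.environ.get("ACESTEP_SKIP_VRAM_PREFLIGHT", "").lower() in ("1", "true", "yes")

if torch.cuda.is_available():
    # Reclaim fragmented allocator cache so the check measures real free VRAM.
    gc.collect()
    torch.cuda.empty_cache()

if skip_preflight:
    logger.debug(
        "[generate_music] VRAM pre-flight skipped "
        "(ACESTEP_SKIP_VRAM_PREFLIGHT=true)"
    )
else:
    vram_error = self._vram_preflight_check(
        actual_batch_size=actual_batch_size,
        audio_duration=audio_duration,
        guidance_scale=guidance_scale,
    )
    if vram_error is not None:
        return vram_error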

Scope

  • File changed: acestep/core/generation/handler/generate_music.py
  • Out of scope: _vram_preflight_check() method itself is unchanged

Risk and Compatibility

  • Target platform: CUDA (shared-GPU setups with 24 GB cards where desktop compositor shares VRAM)
  • Non-target platforms unchanged: CPU/MPS/XPU paths already return None before reaching the preflight check
  • Default behavior unchanged: check runs as before unless the env var is explicitly set

Why

On shared-GPU setups (e.g., 24 GB RTX 3090 with desktop compositor using 2-3 GB), the pre-flight check reports e.g. "1.3 GB free, needs 1.4 GB" and blocks generation, even though PyTorch's caching allocator can handle it via expandable_segments:True. The gc.collect() + torch.cuda.empty_cache() before the check reclaims fragmented allocator cache so the measurement reflects actual free memory. The env var provides an escape hatch for setups where even the corrected check is too conservative.
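To see the gap the cleanup closes, here is a standalone measurement sketch (illustrative only, not project code): torch.cuda.mem_get_info() reports driver-level free memory, which still counts blocks PyTorch's caching allocator is holding.

import gc

import torch

if torch.cuda.is_available():
    free_before, total = torch.cuda.mem_get_info()
    reserved = torch.cuda.memory_reserved()    # held by PyTorch's caching allocator
    allocated = torch.cuda.memory_allocated()  # actually backing live tensors
    print(f"free={free_before / 2**30:.2f} GB of {total / 2**30:.2f} GB, "
          f"reserved={reserved / 2**30:.2f} GB, allocated={allocated / 2**30:.2f} GB")

    gc.collect()              # drop tensors kept alive only by reference cycles
    torch.cuda.empty_cache()  # return unused cached blocks to the driver

    free_after, _ = torch.cuda.mem_get_info()
    print(f"free after cleanup={free_after / 2**30:.2f} GB")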

Regression Checks

  • Verified on RTX 3090 (24 GB) with desktop compositor (2.5 GB shared)
  • Default path (no env var): check still runs, now with more accurate free VRAM measurement
  • ACESTEP_SKIP_VRAM_PREFLIGHT=1: check skipped, debug log emitted
  • Existing _vram_preflight_check unit tests pass (method unchanged)

Reviewer Notes

  • Follows existing env var convention: ACESTEP_ prefix, boolean parsing via .lower() in ("1", "true", "yes") (illustrated in the sketch after this list)
  • Similar pattern to ACESTEP_VAE_ON_CPU in generate_music_decode.py
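For reference, the convention reads roughly as below. The _env_flag helper is hypothetical, named here only to show the shared pattern; the actual diff inlines the expression at the call site.

import os

def _env_flag(name: str) -> bool:
    # Hypothetical helper: ACESTEP_* boolean env vars accept "1", "true",
    # or "yes" case-insensitively, and default to off when unset.
    return os.environ.get(name, "").lower() in ("1", "true", "yes")

skip_preflight = _env_flag("ACESTEP_SKIP_VRAM_PREFLIGHT")
vae_on_cpu = _env_flag("ACESTEP_VAE_ON_CPU")  # same pattern in generate_music_decode.py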

Summary by CodeRabbit

  • New Features
    • Added an environment-driven option to bypass the GPU VRAM preflight check. When enabled (case-insensitive values like "1", "true", or "yes"), the preflight check is skipped and a warning is logged; otherwise the standard preflight runs and halts generation on failure. Memory is proactively freed before the check, and the bypass is only considered when a CUDA-capable GPU is present.

Two improvements to the VRAM pre-flight check in generate_music():

1. Run gc.collect() + torch.cuda.empty_cache() before the check so it
   measures actual free VRAM, not memory held by PyTorch's caching
   allocator. On shared-GPU setups (desktop + generation) the check
   was reporting e.g. "1.3 GB free" while 0.5 GB was reclaimable
   cache, causing spurious rejections.

2. Add ACESTEP_SKIP_VRAM_PREFLIGHT=1 env var to bypass the check
   entirely. Useful for tight-VRAM setups (24 GB card with desktop
   compositor) where the conservative estimate blocks generations
   that PyTorch can actually handle via expandable_segments. The
   default behavior (check enabled) is unchanged.

Non-target platforms unchanged: the env var check only runs on
CUDA paths; CPU/MPS/XPU paths already return None before reaching it.
Contributor

coderabbitai Bot commented Apr 12, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: cfa99eb2-6707-4422-a6f9-0274aabb2d69

📥 Commits

Reviewing files that changed from the base of the PR and between 5166c17 and d67c4a6.

📒 Files selected for processing (1)
  • acestep/core/generation/handler/generate_music.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • acestep/core/generation/handler/generate_music.py

📝 Walkthrough

Conditionally runs CUDA-specific memory cleanup (gc.collect + torch.cuda.empty_cache) and adds an environment-variable-controlled bypass (ACESTEP_SKIP_VRAM_PREFLIGHT) for the VRAM preflight check in the music generation handler; when CUDA is unavailable, the CUDA-gated block is skipped and existing flow continues.

Changes

Cohort / File(s): VRAM preflight & memory cleanup (acestep/core/generation/handler/generate_music.py)
Summary: Adds an os import; performs gc.collect() and torch.cuda.empty_cache() when CUDA is available; introduces ACESTEP_SKIP_VRAM_PREFLIGHT (case-insensitive "1"/"true"/"yes") to skip _vram_preflight_check with a warning; otherwise runs _vram_preflight_check and preserves its early-exit behavior.

Sequence Diagram(s)

(Skipped — changes are localized and do not introduce multi-component sequential flows that require visualization.)

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes


Suggested reviewers

  • ChuxiJ

Poem

🐰 I hop through RAM and sweep the heap,
I nudge the cache awake from sleep,
A flag to skip, a gentle chime,
Now music runs — one hop at a time! 🎶

🚥 Pre-merge checks: ✅ 5 passed

  • Description Check: ✅ Passed (check skipped: CodeRabbit's high-level summary is enabled)
  • Title Check: ✅ Passed. The title accurately reflects the main changes: adding an environment variable to skip the VRAM preflight check and running garbage collection before the check.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.
  • Linked Issues Check: ✅ Passed (check skipped: no linked issues were found for this pull request)
  • Out of Scope Changes Check: ✅ Passed (check skipped: no linked issues were found for this pull request)



Contributor

ChuxiJ left a comment


Thanks @FlexOr2 — nice, surgical patch. I traced through the surrounding code (generate_music.py:105-163 for _vram_preflight_check and the call site at :329) and it's correct in all the ways I checked:

  • gc.collect() paired with torch.cuda.empty_cache() is the right idiom here, not cargo-cult. Without the GC pass, empty_cache() can't return blocks that still have live Python references; without empty_cache(), gc.collect() alone leaves PyTorch's caching allocator holding the memory. This matches existing patterns elsewhere in the repo (e.g. llm_inference.py:107,136,584, init_service_loader.py:134, reinitialize_route.py:99). A standalone demonstration of this ordering follows this list.
  • Safe on non-CUDA platforms. I verified torch.cuda.empty_cache() is a silent no-op on Mac (no CUDA available), and _vram_preflight_check itself early-returns at line 128 when torch.cuda.is_available() is False, so CPU/MPS/XPU see no behavioral change beyond a cheap extra GC pass.
  • Env-var naming (ACESTEP_SKIP_VRAM_PREFLIGHT) and boolean parsing (.lower() in ("1","true","yes")) are consistent with the project's existing ACESTEP_* conventions.
  • Required imports (gc, torch, logger) were already present; the diff only adds import os which is correct.
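The demonstration mentioned in the first bullet (illustrative only; requires a CUDA device and is not project code):

import gc

import torch

class Holder:
    # A reference cycle keeps the tensor alive even after `del`:
    # its refcount never reaches zero without a GC pass.
    pass

h = Holder()
h.self_ref = h
h.tensor = torch.empty(256 * 2**20, dtype=torch.uint8, device="cuda")  # ~256 MB

del h
torch.cuda.empty_cache()
print(torch.cuda.memory_allocated())  # still ~256 MB: the cycle holds the tensor

gc.collect()              # breaks the cycle, so the tensor is actually freed
torch.cuda.empty_cache()  # the now-unused cached block returns to the driver
print(torch.cuda.memory_allocated())  # ~0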

One change I'd like before this lands:

Please raise the skip notice from debug to warning

if skip_preflight:
    logger.debug(
        "[generate_music] VRAM pre-flight skipped "
        "(ACESTEP_SKIP_VRAM_PREFLIGHT=true)"
    )

logger.debug is filtered out at the default log level, so a user who sets ACESTEP_SKIP_VRAM_PREFLIGHT=1 gets no feedback that the safety net is off. The bigger problem is that if a later generation OOMs, nothing in the logs explains why preflight didn't catch it — support/debugging becomes harder because there's no breadcrumb.

Suggested change:

if skip_preflight:
    logger.warning(
        "[generate_music] VRAM pre-flight check skipped via "
        "ACESTEP_SKIP_VRAM_PREFLIGHT=1. If generation OOMs, "
        "unset this variable to re-enable the safety check."
    )

This way:

  • Users who intentionally set the env var see an explicit reminder each run that they're running without the safety net (intentional, healthy friction).
  • OOM post-mortems have a clear breadcrumb.
  • Still non-blocking — we're not spamming errors, just one warning line per generation.

Optional nit (not blocking)

You could move the gc.collect() / empty_cache() pair into the else: branch so we don't pay the GC cost when the check is skipped anyway:

skip_preflight = os.environ.get("ACESTEP_SKIP_VRAM_PREFLIGHT", "").lower() in ("1", "true", "yes")
if skip_preflight:
    logger.warning(...)
else:
    gc.collect()
    torch.cuda.empty_cache()
    vram_error = self._vram_preflight_check(...)
    if vram_error is not None:
        return vram_error

Not a blocker — the GC overhead is negligible in practice (music generation runs on the order of seconds), and keeping it outside has the small advantage of also clearing memory before the actual generation path even if the preflight is skipped. Totally your call.

Happy to approve once the logger.debug → logger.warning change is in. Nice diagnosis on the root cause in the PR description — the expandable_segments:True note in particular helped me understand why the measurement-vs-reality gap is platform-specific.

ChuxiJ review feedback on PR ace-step#1091: a user who sets
ACESTEP_SKIP_VRAM_PREFLIGHT=1 needs a visible reminder each run that the
safety net is off, and an OOM post-mortem needs a breadcrumb in the logs
explaining why preflight didn't catch it. debug-level is filtered out at
the default log level, so neither use case worked.
Contributor

coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@acestep/core/generation/handler/generate_music.py`:
- Around line 332-350: The new preflight cleanup and skip-bypass should be
limited to CUDA only: wrap the gc.collect(), torch.cuda.empty_cache(), reading
ACESTEP_SKIP_VRAM_PREFLIGHT and the logger.warning/skip behavior inside an if
torch.cuda.is_available() guard so non-CUDA (CPU/MPS/XPU) paths are untouched;
then call self._vram_preflight_check(actual_batch_size=actual_batch_size,
audio_duration=audio_duration, guidance_scale=guidance_scale) as before (and
return vram_error if not None) outside or conditional on CUDA as appropriate.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 870a9690-16d4-44f6-be0e-869258ef88ae

📥 Commits

Reviewing files that changed from the base of the PR and between 84f3a08 and 5166c17.

📒 Files selected for processing (1)
  • acestep/core/generation/handler/generate_music.py

Comment thread: acestep/core/generation/handler/generate_music.py (Outdated)
CodeRabbit review feedback on PR ace-step#1091: the gc.collect() /
torch.cuda.empty_cache() pair and the skip-preflight warning should not
run on CPU/MPS/XPU. empty_cache is a no-op there and the preflight check
itself early-returns on non-CUDA, but a user setting
ACESTEP_SKIP_VRAM_PREFLIGHT=1 on a CPU box would see a misleading "VRAM
pre-flight skipped" warning referencing GPU semantics that don't apply.
Wrap the whole block in torch.cuda.is_available() so non-CUDA paths are
untouched.
