
fix: add ACESTEP_SKIP_VRAM_PREFLIGHT env var + gc before preflight check #1091

Open
FlexOr2 wants to merge 3 commits into ace-step:main from FlexOr2:fix/optional-vram-preflight

Conversation


FlexOr2 commented Apr 12, 2026

Summary

  • Run gc.collect() + torch.cuda.empty_cache() before the VRAM pre-flight check so it measures actual free VRAM, not memory held by PyTorch's caching allocator
  • Add ACESTEP_SKIP_VRAM_PREFLIGHT=1 env var to bypass the check entirely for tight-VRAM setups (a sketch of the resulting call site follows this list)
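For concreteness, a minimal sketch of the resulting call site, not the exact diff: self._vram_preflight_check, logger, and the keyword arguments are assumed from the surrounding handler and the review thread below, and details may differ.

import gc
import os

import torch

# Sketch of the two changes at the preflight call site in generate_music().
skip_preflight = os.environ.get("ACESTEP_SKIP_VRAM_PREFLIGHT", "").lower() in ("1", "true", "yes")

if torch.cuda.is_available():
    # Reclaim fragmented allocator cache so the check measures real free VRAM.
    gc.collect()
    torch.cuda.empty_cache()

if skip_preflight:
    logger.debug(
        "[generate_music] VRAM pre-flight skipped "
        "(ACESTEP_SKIP_VRAM_PREFLIGHT=true)"
    )
else:
    vram_error = self._vram_preflight_check(
        actual_batch_size=actual_batch_size,
        audio_duration=audio_duration,
        guidance_scale=guidance_scale,
    )
    if vram_error is not None:
        return vram_error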

Scope

  • File changed: acestep/core/generation/handler/generate_music.py
  • Out of scope: _vram_preflight_check() method itself is unchanged

Risk and Compatibility

  • Target platform: CUDA (shared-GPU setups with 24 GB cards where desktop compositor shares VRAM)
  • Non-target platforms unchanged: CPU/MPS/XPU paths already return None before reaching the preflight check
  • Default behavior unchanged: check runs as before unless the env var is explicitly set

Why

On shared-GPU setups (e.g., 24 GB RTX 3090 with desktop compositor using 2-3 GB), the pre-flight check reports e.g. "1.3 GB free, needs 1.4 GB" and blocks generation, even though PyTorch's caching allocator can handle it via expandable_segments:True. The gc.collect() + torch.cuda.empty_cache() before the check reclaims fragmented allocator cache so the measurement reflects actual free memory. The env var provides an escape hatch for setups where even the corrected check is too conservative.
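To see the gap the cleanup closes, here is a standalone measurement sketch (illustrative only, not project code): torch.cuda.mem_get_info() reports driver-level free memory, which still counts blocks PyTorch's caching allocator is holding.

import gc

import torch

if torch.cuda.is_available():
    free_before, total = torch.cuda.mem_get_info()
    reserved = torch.cuda.memory_reserved()    # held by PyTorch's caching allocator
    allocated = torch.cuda.memory_allocated()  # actually backing live tensors
    print(f"free={free_before / 2**30:.2f} GB of {total / 2**30:.2f} GB, "
          f"reserved={reserved / 2**30:.2f} GB, allocated={allocated / 2**30:.2f} GB")

    gc.collect()              # drop tensors kept alive only by reference cycles
    torch.cuda.empty_cache()  # return unused cached blocks to the driver

    free_after, _ = torch.cuda.mem_get_info()
    print(f"free after cleanup={free_after / 2**30:.2f} GB")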

Regression Checks

  • Verified on RTX 3090 (24 GB) with desktop compositor (2.5 GB shared)
  • Default path (no env var): check still runs, now with more accurate free VRAM measurement
  • ACESTEP_SKIP_VRAM_PREFLIGHT=1: check skipped, debug log emitted
  • Existing _vram_preflight_check unit tests pass (method unchanged)

Reviewer Notes

  • Follows existing env var convention: ACESTEP_ prefix, boolean parsing via .lower() in ("1", "true", "yes") (illustrated in the sketch after this list)
  • Similar pattern to ACESTEP_VAE_ON_CPU in generate_music_decode.py
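For reference, the convention reads roughly as below. The _env_flag helper is hypothetical, named here only to show the shared pattern; the actual diff inlines the expression at the call site.

import os

def _env_flag(name: str) -> bool:
    # Hypothetical helper: ACESTEP_* boolean env vars accept "1", "true",
    # or "yes" case-insensitively, and default to off when unset.
    return os.environ.get(name, "").lower() in ("1", "true", "yes")

skip_preflight = _env_flag("ACESTEP_SKIP_VRAM_PREFLIGHT")
vae_on_cpu = _env_flag("ACESTEP_VAE_ON_CPU")  # same pattern in generate_music_decode.py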

Summary by CodeRabbit

  • New Features
    • Added an environment-driven option to bypass the GPU VRAM preflight check. When enabled (case-insensitive values like "1", "true", or "yes"), the preflight check is skipped and a warning is logged; otherwise the standard preflight runs and halts generation on failure. Memory is proactively freed before the check, and the bypass is only considered when a CUDA-capable GPU is present.

Two improvements to the VRAM pre-flight check in generate_music():

1. Run gc.collect() + torch.cuda.empty_cache() before the check so it
   measures actual free VRAM, not memory held by PyTorch's caching
   allocator. On shared-GPU setups (desktop + generation) the check
   was reporting e.g. "1.3 GB free" while 0.5 GB was reclaimable
   cache, causing spurious rejections.

2. Add ACESTEP_SKIP_VRAM_PREFLIGHT=1 env var to bypass the check
   entirely. Useful for tight-VRAM setups (24 GB card with desktop
   compositor) where the conservative estimate blocks generations
   that PyTorch can actually handle via expandable_segments. The
   default behavior (check enabled) is unchanged.

Non-target platforms unchanged: the env var check only runs on
CUDA paths; CPU/MPS/XPU paths already return None before reaching it.
Contributor

coderabbitai Bot commented Apr 12, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: cfa99eb2-6707-4422-a6f9-0274aabb2d69

📥 Commits

Reviewing files that changed from the base of the PR and between 5166c17 and d67c4a6.

📒 Files selected for processing (1)
  • acestep/core/generation/handler/generate_music.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • acestep/core/generation/handler/generate_music.py

📝 Walkthrough

Conditionally runs CUDA-specific memory cleanup (gc.collect + torch.cuda.empty_cache) and adds an environment-variable-controlled bypass (ACESTEP_SKIP_VRAM_PREFLIGHT) for the VRAM preflight check in the music generation handler; when CUDA is unavailable, the CUDA-gated block is skipped and existing flow continues.

Changes

Cohort / File(s): VRAM preflight & memory cleanup (acestep/core/generation/handler/generate_music.py)
Summary: Adds an os import; performs gc.collect() and torch.cuda.empty_cache() when CUDA is available; introduces ACESTEP_SKIP_VRAM_PREFLIGHT (case-insensitive "1"/"true"/"yes") to skip _vram_preflight_check with a warning; otherwise runs _vram_preflight_check and preserves its early-exit behavior.

Sequence Diagram(s)

(Skipped — changes are localized and do not introduce multi-component sequential flows that require visualization.)

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes


Suggested reviewers

  • ChuxiJ

Poem

🐰 I hop through RAM and sweep the heap,
I nudge the cache awake from sleep,
A flag to skip, a gentle chime,
Now music runs — one hop at a time! 🎶

🚥 Pre-merge checks: ✅ 5 passed

  • Description Check: ✅ Passed (check skipped: CodeRabbit's high-level summary is enabled)
  • Title Check: ✅ Passed. The title accurately reflects the main changes: adding an environment variable to skip the VRAM preflight check and running garbage collection before the check.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.
  • Linked Issues Check: ✅ Passed (check skipped: no linked issues were found for this pull request)
  • Out of Scope Changes Check: ✅ Passed (check skipped: no linked issues were found for this pull request)



Contributor

ChuxiJ left a comment


Thanks @FlexOr2 — nice, surgical patch. I traced through the surrounding code (generate_music.py:105-163 for _vram_preflight_check and the call site at :329) and it's correct in all the ways I checked:

  • gc.collect() paired with torch.cuda.empty_cache() is the right idiom here, not cargo-cult. Without the GC pass, empty_cache() can't return blocks that still have live Python references; without empty_cache(), gc.collect() alone leaves PyTorch's caching allocator holding the memory. This matches existing patterns elsewhere in the repo (e.g. llm_inference.py:107,136,584, init_service_loader.py:134, reinitialize_route.py:99). A standalone demonstration of this ordering follows this list.
  • Safe on non-CUDA platforms. I verified torch.cuda.empty_cache() is a silent no-op on Mac (no CUDA available), and _vram_preflight_check itself early-returns at line 128 when torch.cuda.is_available() is False, so CPU/MPS/XPU see no behavioral change beyond a cheap extra GC pass.
  • Env-var naming (ACESTEP_SKIP_VRAM_PREFLIGHT) and boolean parsing (.lower() in ("1","true","yes")) are consistent with the project's existing ACESTEP_* conventions.
  • Required imports (gc, torch, logger) were already present; the diff only adds import os which is correct.
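The demonstration mentioned in the first bullet (illustrative only; requires a CUDA device and is not project code):

import gc

import torch

class Holder:
    # A reference cycle keeps the tensor alive even after `del`:
    # its refcount never reaches zero without a GC pass.
    pass

h = Holder()
h.self_ref = h
h.tensor = torch.empty(256 * 2**20, dtype=torch.uint8, device="cuda")  # ~256 MB

del h
torch.cuda.empty_cache()
print(torch.cuda.memory_allocated())  # still ~256 MB: the cycle holds the tensor

gc.collect()              # breaks the cycle, so the tensor is actually freed
torch.cuda.empty_cache()  # the now-unused cached block returns to the driver
print(torch.cuda.memory_allocated())  # ~0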

One change I'd like before this lands:

Please raise the skip notice from debug to warning

if skip_preflight:
    logger.debug(
        "[generate_music] VRAM pre-flight skipped "
        "(ACESTEP_SKIP_VRAM_PREFLIGHT=true)"
    )

logger.debug is filtered out at the default log level, so a user who sets ACESTEP_SKIP_VRAM_PREFLIGHT=1 gets no feedback that the safety net is off. The bigger problem is that if a later generation OOMs, nothing in the logs explains why preflight didn't catch it — support/debugging becomes harder because there's no breadcrumb.

Suggested change:

if skip_preflight:
    logger.warning(
        "[generate_music] VRAM pre-flight check skipped via "
        "ACESTEP_SKIP_VRAM_PREFLIGHT=1. If generation OOMs, "
        "unset this variable to re-enable the safety check."
    )

This way:

  • Users who intentionally set the env var see an explicit reminder each run that they're running without the safety net (intentional, healthy friction).
  • OOM post-mortems have a clear breadcrumb.
  • Still non-blocking — we're not spamming errors, just one warning line per generation.

Optional nit (not blocking)

You could move the gc.collect() / empty_cache() pair into the else: branch so we don't pay the GC cost when the check is skipped anyway:

skip_preflight = os.environ.get("ACESTEP_SKIP_VRAM_PREFLIGHT", "").lower() in ("1", "true", "yes")
if skip_preflight:
    logger.warning(...)
else:
    gc.collect()
    torch.cuda.empty_cache()
    vram_error = self._vram_preflight_check(...)
    if vram_error is not None:
        return vram_error

Not a blocker — the GC overhead is negligible in practice (music generation runs on the order of seconds), and keeping it outside has the small advantage of also clearing memory before the actual generation path even if the preflight is skipped. Totally your call.

Happy to approve once the logger.debug → logger.warning change is in. Nice diagnosis on the root cause in the PR description — the expandable_segments:True note in particular helped me understand why the measurement-vs-reality gap is platform-specific.

ChuxiJ review feedback on PR ace-step#1091: a user who sets
ACESTEP_SKIP_VRAM_PREFLIGHT=1 needs a visible reminder each run that the
safety net is off, and an OOM post-mortem needs a breadcrumb in the logs
explaining why preflight didn't catch it. debug-level is filtered out at
the default log level, so neither use case worked.
Contributor

coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@acestep/core/generation/handler/generate_music.py`:
- Around line 332-350: The new preflight cleanup and skip-bypass should be
limited to CUDA only: wrap the gc.collect(), torch.cuda.empty_cache(), reading
ACESTEP_SKIP_VRAM_PREFLIGHT and the logger.warning/skip behavior inside an if
torch.cuda.is_available() guard so non-CUDA (CPU/MPS/XPU) paths are untouched;
then call self._vram_preflight_check(actual_batch_size=actual_batch_size,
audio_duration=audio_duration, guidance_scale=guidance_scale) as before (and
return vram_error if not None) outside or conditional on CUDA as appropriate.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 870a9690-16d4-44f6-be0e-869258ef88ae

📥 Commits

Reviewing files that changed from the base of the PR and between 84f3a08 and 5166c17.

📒 Files selected for processing (1)
  • acestep/core/generation/handler/generate_music.py

Comment thread: acestep/core/generation/handler/generate_music.py (Outdated)
CodeRabbit review feedback on PR ace-step#1091: the gc.collect() /
torch.cuda.empty_cache() pair and the skip-preflight warning should not
run on CPU/MPS/XPU. empty_cache is a no-op there and the preflight check
itself early-returns on non-CUDA, but a user setting
ACESTEP_SKIP_VRAM_PREFLIGHT=1 on a CPU box would see a misleading "VRAM
pre-flight skipped" warning referencing GPU semantics that don't apply.
Wrap the whole block in torch.cuda.is_available() so non-CUDA paths are
untouched.
