
Fix: bypass torchcodec crash in _save_mp3 on PyTorch 2.10+#1145

Open
psale wants to merge 1 commit into ace-step:main from psale:main

psale commented Apr 24, 2026

Summary

AudioSaver._save_mp3() crashes on environments where torchcodec's shared libraries
(libtorchcodec_core*.so) cannot load — notably Google Colab with PyTorch 2.10.0+cu128.

The fix replaces torchaudio.save() with direct soundfile.write() for the intermediate
WAV file in _save_mp3. This is a minimal, surgical change: the ffmpeg-based MP3 encoding
pipeline is untouched.

Root Cause

In torchaudio >= 2.10, torchaudio.save() unconditionally routes through
save_with_torchcodec() for all formats, even when backend='soundfile' is specified
explicitly. If torchcodec cannot load its native FFmpeg shared libraries (common on Colab
because of mismatched libavutil.so versions or an undefined torch_dtype_float4_e2m1fn_x2
symbol), the call fails with a hard RuntimeError.
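Because the failure is tied to the installed torchaudio version, any guard on it needs a numeric version comparison: a plain string comparison ranks "2.10.0" below "2.9". A minimal sketch, assuming the 2.10 threshold stated above; `major_minor` and `save_uses_torchcodec` are hypothetical helpers, not part of ACE-Step or torchaudio.

```python
import re


def major_minor(version: str) -> tuple[int, int]:
    """Parse the leading "MAJOR.MINOR" of a version string such as "2.10.0+cu128"."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def save_uses_torchcodec(torchaudio_version: str) -> bool:
    """True if this torchaudio version is expected to dispatch save() to torchcodec.

    The (2, 10) threshold follows the description in this PR; it is an assumption,
    not something queried from torchaudio itself.
    """
    return major_minor(torchaudio_version) >= (2, 10)


# String order gets this wrong; tuple order does not.
print("2.10.0" < "2.9")                             # True (lexicographic)
print(major_minor("2.10.0") > major_minor("2.9"))   # True (numeric)
```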

The error chain is:

  _save_mp3
    → torchaudio.save(..., backend='soundfile')
    → save_with_torchcodec()   ← dispatched unconditionally
    → load_torchcodec_shared_libraries()
    → RuntimeError: Could not load libtorchcodec
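Whether an environment is affected can be checked up front by importing torchcodec and catching the failure. A minimal sketch, assuming that importing torchcodec is enough to trigger its shared-library load (the RuntimeError in the chain above); `torchcodec_usable` is a hypothetical name.

```python
import importlib


def torchcodec_usable() -> bool:
    """Return True if torchcodec imports cleanly in this interpreter.

    Assumption: a broken install surfaces at import time, either as an
    ImportError (package missing) or as the RuntimeError/OSError raised while
    loading libtorchcodec_core*.so. Any exception is treated as "not usable".
    """
    try:
        importlib.import_module("torchcodec")
    except Exception:
        return False
    return True


if __name__ == "__main__":
    print("torchcodec usable:", torchcodec_usable())
```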

Because save_audio() has a soundfile fallback for non-MP3 formats (lines 317–338), those
formats survive the failure. MP3 has no fallback: it re-raises immediately, so the entire
generation crashes after the audio has already been computed.
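The asymmetry between MP3 and the other formats can be sketched as a small control-flow model. The names below are hypothetical and this is not the real save_audio(); it only shows why a primary-writer failure is survivable for WAV/FLAC but fatal for MP3.

```python
from typing import Callable

Writer = Callable[[str], None]


def save_with_fallback(path: str, fmt: str, primary: Writer, fallback: Writer) -> str:
    """Try the primary writer; fall back for every format except MP3."""
    try:
        primary(path)
        return "primary"
    except Exception:
        if fmt == "mp3":
            raise  # no fallback: the already-computed audio is lost
        fallback(path)
        return "fallback"


def broken_torchaudio_save(path: str) -> None:
    # Stand-in for torchaudio.save on an environment where torchcodec cannot load.
    raise RuntimeError("Could not load libtorchcodec")


def soundfile_write(path: str) -> None:
    pass  # stand-in for a successful soundfile write


print(save_with_fallback("out.flac", "flac", broken_torchaudio_save, soundfile_write))  # fallback
```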

The Fix

Replace the torchaudio.save() call in _save_mp3 (used only to write a temporary WAV)
with a direct soundfile.write() call. The soundfile library is already a declared
dependency (soundfile>=0.13.1 in pyproject.toml) and writes WAV files without touching
torchcodec.
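The change can be sketched as two steps: write the intermediate WAV with soundfile, then hand it to the ffmpeg CLI. This is an illustration, not the code from the PR: the function names, the 32-bit float WAV subtype, the libmp3lame encoder and the 192k bitrate are all assumptions, and it expects ffmpeg on PATH.

```python
import os
import subprocess
import tempfile


def ffmpeg_mp3_cmd(wav_path: str, mp3_path: str, bitrate: str = "192k") -> list[str]:
    """Build an ffmpeg command that encodes a WAV file to MP3 (bitrate is an assumption)."""
    return ["ffmpeg", "-y", "-i", wav_path, "-codec:a", "libmp3lame", "-b:a", bitrate, mp3_path]


def save_mp3(audio_np, sample_rate: int, mp3_path: str, bitrate: str = "192k") -> None:
    """Write a [samples, channels] float array to MP3 without calling torchaudio.save."""
    import soundfile as sf  # imported lazily so the command builder works without it

    fd, wav_path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    try:
        # libsndfile writes the WAV directly; torchcodec is never loaded.
        # FLOAT keeps full precision for the encoder (an illustrative choice).
        sf.write(wav_path, audio_np, sample_rate, subtype="FLOAT")
        subprocess.run(ffmpeg_mp3_cmd(wav_path, mp3_path, bitrate), check=True, capture_output=True)
    finally:
        os.remove(wav_path)
```

Keeping the command builder separate from the I/O makes it easy to unit-test without ffmpeg or soundfile installed.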

Risk Assessment

  • MP3 export quality (risk: none): WAV→MP3 conversion via ffmpeg is unchanged.
  • Other formats (risk: none): untouched by this change.
  • Non-Colab environments (risk: none): soundfile produces identical WAV output.
  • Platforms, CUDA/MPS/XPU/CPU (risk: none): soundfile is platform-agnostic.

Tested On

  • Google Colab (Ubuntu 22.04, A100, PyTorch 2.10.0+cu128, torchaudio 2.10.0+cu128)
  • Generation completes and MP3 files are saved successfully after patch
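When reproducing or reporting results like these, the relevant package versions can be collected without importing the packages, and so without triggering the torchcodec load. A small sketch using the standard library's importlib.metadata; the default package list is an assumption.

```python
from importlib import metadata


def package_versions(names=("torch", "torchaudio", "torchcodec", "soundfile")) -> dict:
    """Map each distribution name to its installed version, or None if it is absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in package_versions().items():
        print(f"{name:12} {version or 'not installed'}")
```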

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Fixed MP3 export crashing when torchcodec's native libraries cannot load, by writing the intermediate WAV file with soundfile instead of torchaudio.
  • Refactor

    • Switched the intermediate WAV serialization in the MP3 export workflow from torchaudio.save to soundfile.write.

Replaced torchaudio.save with soundfile.write for intermediate WAV files to avoid shared-library (.so) and symbol errors in Colab/Linux environments. Also updated unit tests.
coderabbitai Bot commented Apr 24, 2026

📝 Walkthrough

The PR replaces torchaudio.save with soundfile.write for intermediate WAV file generation in the MP3 export path, converting torch tensors to numpy arrays and transposing from channel-first to sample-first layout before writing.
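The tensor-to-array step described above can be sketched with NumPy. A minimal sketch, assuming channel-first input as in torchaudio; `to_soundfile_layout` is a hypothetical helper, and torch tensors are handled by duck typing so the example runs without torch installed.

```python
import numpy as np


def to_soundfile_layout(audio) -> np.ndarray:
    """Convert channel-first audio ([channels, samples]) to soundfile's [samples, channels].

    1-D mono input is returned as-is (soundfile accepts it).
    """
    if hasattr(audio, "detach"):
        # torch.Tensor: detach, move to CPU, upcast (NumPy has no native bfloat16).
        audio = audio.detach().cpu().float().numpy()
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 1:
        return audio
    if audio.ndim != 2:
        raise ValueError(f"expected 1-D or 2-D audio, got shape {audio.shape}")
    # .T is a strided view; copy to a C-ordered buffer before handing it to soundfile.
    return np.ascontiguousarray(audio.T)
```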

Changes

  • Audio utils implementation (acestep/audio_utils.py): Replaced torchaudio-based WAV serialization with soundfile (sf.write) for MP3 export, including tensor-to-numpy conversion and channel dimension transposition.
  • Audio utils tests (acestep/audio_utils_test.py): Updated MP3 export tests to validate soundfile.write invocation and added regression tests ensuring torchaudio.save is not called during MP3 export.
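Regression tests of this kind can be sketched with unittest.mock: patch both writers, run the MP3 path, and assert which one was called. The stand-in namespaces and `save_intermediate_wav` below are hypothetical; in the real suite the patch targets would presumably be the `sf` and `torchaudio` attributes of acestep.audio_utils, which is an assumption and not taken from the PR's test file.

```python
import types
from unittest import mock

# Hypothetical stand-ins for the modules imported by acestep/audio_utils.py.
sf = types.SimpleNamespace(write=lambda *args, **kwargs: None)
torchaudio = types.SimpleNamespace(save=lambda *args, **kwargs: None)


def save_intermediate_wav(audio_np, sample_rate, wav_path):
    """Stand-in for the patched _save_mp3 step: soundfile only."""
    sf.write(wav_path, audio_np, sample_rate)


def test_mp3_export_uses_soundfile_not_torchaudio():
    audio_np = [[0.0, 0.0], [0.1, -0.1]]  # [samples, channels]
    with mock.patch.object(sf, "write") as sf_write, \
            mock.patch.object(torchaudio, "save") as ta_save:
        save_intermediate_wav(audio_np, 48000, "tmp.wav")
    sf_write.assert_called_once_with("tmp.wav", audio_np, 48000)
    ta_save.assert_not_called()
```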

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • ChuxiJ

Poem

🐰 A hop, a skip, through audio streams,
From torch to sound, with numpy dreams,
WAV files dance in new array form,
MP3s emerge, reborn!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check: ✅ Passed. Check skipped: CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title directly references the specific fix (bypassing the torchcodec crash in _save_mp3 on PyTorch 2.10+), which matches the main technical change: replacing torchaudio.save with soundfile.write to avoid torchcodec library loading failures.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, above the required threshold of 80.00%.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.

coderabbitai Bot left a comment

🧹 Nitpick comments (2)
acestep/audio_utils.py (2)

302-308: Consider extending the torchcodec bypass to the wav/flac path as a follow-up.

The same torchcodec dispatch that motivated this PR affects torchaudio.save(..., backend='soundfile') on PyTorch 2.10+ — so format == "flac"/"wav" here will still crash in the Colab environment described in the PR, even though MP3 is now fixed. The exception handler at lines 321-342 will likely catch it and fall back to sf.write, but that makes the slow/noisy path the common case on 2.10+.

Out of scope for this PR (which explicitly limits the change to MP3), but worth a follow-up to route WAV/FLAC writes directly through sf.write for the same reason.
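Such a follow-up could route lossless formats straight to soundfile and leave everything else to the existing paths. A minimal sketch: the format names mirror save_audio's branches as described here, but the mapping, the subtypes and `write_without_torchaudio` are hypothetical. soundfile.write does accept `subtype=` and `format=` keywords, and libsndfile's FLAC support covers integer PCM subtypes only, not FLOAT.

```python
# Hypothetical mapping from save_audio format names to soundfile (format, subtype).
# The subtypes are illustrative assumptions.
SOUNDFILE_TARGETS = {
    "wav": ("WAV", "PCM_16"),
    "wav32": ("WAV", "FLOAT"),
    "flac": ("FLAC", "PCM_16"),
}


def write_without_torchaudio(audio_np, sample_rate: int, path: str, fmt: str) -> bool:
    """Write lossless formats with soundfile; return False for formats it should not handle.

    audio_np is a [samples, channels] (or 1-D mono) float array.
    """
    target = SOUNDFILE_TARGETS.get(fmt)
    if target is None:
        return False  # e.g. "mp3": leave it to the ffmpeg-based path
    import soundfile as sf  # lazy import: only needed when we actually write

    sf_format, subtype = target
    sf.write(path, audio_np, sample_rate, subtype=subtype, format=sf_format)
    return True
```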

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@acestep/audio_utils.py` around lines 302 - 308, The torchaudio.save call for
WAV/FLAC still goes through the torchcodec dispatch and can fail on PyTorch
2.10+, so update the save routine to bypass torchaudio when format is "wav" or
"flac" by dispatching directly to soundfile's sf.write (the same path used in
the exception fallback) instead of calling torchaudio.save; specifically modify
the branch around torchaudio.save (referencing torchaudio.save, the format
variable, and sf.write) to short-circuit for format in ("wav", "flac")
and write via sf.write with the same audio_tensor/sample_rate handling to avoid
triggering the exception handler and unnecessary slow fallbacks.

288-288: Nit: redundant local import soundfile as sf.

soundfile is now imported at module scope (line 22), so the local imports inside save_audio (line 288 for the wav32 path, line 326 in the exception fallback) are redundant. Safe to remove for consistency.

♻️ Proposed cleanup
@@ -285,8 +285,6 @@
                 if format == "wav32":
                     try:
-                        import soundfile as sf
-                        
                         # Use soundfile directly for 32-bit float
                         audio_np = audio_tensor.transpose(0, 1).numpy() # [channels, samples] -> [samples, channels]
@@ -323,7 +321,6 @@
                 logger.error(f"[AudioSaver] MP3 export failed without fallback: {e}")
                 raise
             try:
-                import soundfile as sf
                 audio_np = audio_tensor.transpose(0, 1).numpy()  # -> [samples, channels]

Also applies to: 326-326

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@acestep/audio_utils.py` at line 288, The local redundant "import soundfile as
sf" statements inside save_audio (present near the wav32 path branch and the
exception fallback) should be removed because soundfile is already imported at
module scope; update the save_audio function by deleting those local imports and
relying on the module-level sf, ensuring any references in the wav32 branch and
the exception handling branch continue to call sf.* without re-importing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: aaec91ae-0945-4722-841c-5a75d84234ac

📥 Commits

Reviewing files that changed from the base of the PR and between d5d958e and 00ad6f7.

📒 Files selected for processing (2)
  • acestep/audio_utils.py
  • acestep/audio_utils_test.py

dvc50 commented May 2, 2026

This fix needs merging in. I used it successfully on my Linux Mint 22 install. It fixed the torchcodec error at the mp3 creation stage.


Labels

None yet

Projects

None yet


3 participants