Fix Zamba2MambaMixer ignoring use_mamba_kernels=False #44853
Cyrilvallez merged 7 commits into huggingface:main from
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks for addressing the underlying issue! 🤗
Just one question: the attribute use_mamba_kernels is only defined in NemotronHConfig, so why do you use it in Zamba2?
Additionally, you modified modeling_nemotron_h.py, but as its file header explains, that file is automatically generated from modular_nemotron_h.py. I think you should modify modular_nemotron_h.py instead.
CC: @ydshieh, @Cyrilvallez, maintainers who recently modified these files.
The problem comes fork
Same idea as before
Thanks for the fix in commit 93840ed: it's clearer now! 🤗 Let's wait for some maintainers to weigh in.
[For maintainers] Suggested jobs to run (before merge): run-slow: nemotron_h, zamba2
In this case,
What does this PR do?
Zamba2MambaMixer.__init__ calls lazy_load_kernel("mamba-ssm") and lazy_load_kernel("causal-conv1d") unconditionally. As a result, models that inherit from it (such as NemotronH) and set use_mamba_kernels=False in their config have that flag ignored, which causes failures when the kernels package is installed but the causal-conv1d CUDA kernels are not available.

Fix: gate the lazy_load_kernel calls behind getattr(config, "use_mamba_kernels", True) in the Zamba2 modular file.

Related to: huggingface/trl#5278
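The gating described in the Fix above can be illustrated with a minimal, self-contained sketch. It assumes a hypothetical fake_lazy_load_kernel stub in place of the real loader (it only records which kernel names were requested) and uses SimpleNamespace objects in place of Zamba2Config and NemotronHConfig. MixerSketch is a hypothetical, simplified class, not the actual Zamba2MambaMixer code.

```python
from types import SimpleNamespace

# Hypothetical stand-in for lazy_load_kernel: it records which kernel names
# were requested so the gating can be checked without the kernels package or CUDA.
LOADED = []


def fake_lazy_load_kernel(name):
    LOADED.append(name)
    return None  # a real loader would return a kernel module (or None if unavailable)


class MixerSketch:
    """Hypothetical, simplified illustration of the gating pattern from this PR."""

    def __init__(self, config):
        # Configs that never define use_mamba_kernels (e.g. a Zamba2-style config)
        # fall back to True, so their previous behavior is unchanged.
        self.use_mamba_kernels = getattr(config, "use_mamba_kernels", True)
        if self.use_mamba_kernels:
            self.mamba_ssm = fake_lazy_load_kernel("mamba-ssm")
            self.causal_conv1d = fake_lazy_load_kernel("causal-conv1d")
        else:
            # Kernel loading is skipped entirely; the mixer would have to rely
            # on its non-kernel code path instead.
            self.mamba_ssm = None
            self.causal_conv1d = None


# A Zamba2-like config without the flag still attempts to load both kernels.
zamba2_like = SimpleNamespace(hidden_size=64)
MixerSketch(zamba2_like)
print(LOADED)  # ['mamba-ssm', 'causal-conv1d']

# A NemotronH-like config with use_mamba_kernels=False loads nothing.
LOADED.clear()
nemotron_like = SimpleNamespace(hidden_size=64, use_mamba_kernels=False)
MixerSketch(nemotron_like)
print(LOADED)  # []
```

Using getattr with a default of True is what answers the review question about the attribute only existing in NemotronHConfig: configs that never define use_mamba_kernels keep the old behavior of always attempting to load the kernels, while configs that set it to False skip the loader calls.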
Who can review?
@ArthurZucker