Add Nemotron 3 to tests via tiny model#5278
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks!
The CI is red:
FAILED tests/test_dpo_trainer.py::TestDPOTrainer::test_train[trl-internal-testing/tiny-NemotronHForCausalLM] - RuntimeError: causal_conv1d with channel last layout requires strides (x.stride(0) and x.stride(2)) to be multiples of 8
FAILED tests/test_sft_trainer.py::TestSFTTrainer::test_train[trl-internal-testing/tiny-NemotronHForCausalLM] - RuntimeError: causal_conv1d with channel last layout requires strides (x.stride(0) and x.stride(2)) to be multiples of 8
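As a hedged illustration of what this kernel error means (an assumption based on the message, not on the trl code): for a channel-last `(batch, dim, seqlen)` tensor whose strides are `(seqlen * dim, 1, dim)`, requiring `x.stride(0)` and `x.stride(2)` to be multiples of 8 reduces to `dim % 8 == 0`, which a tiny test model with a small conv dimension can easily violate.

```python
# Hedged sketch (assumption): model the stride check that causal_conv1d
# performs on channel-last inputs. Function names are hypothetical.

def channel_last_strides(batch: int, dim: int, seqlen: int) -> tuple:
    """Strides of a (batch, dim, seqlen) tensor stored channel-last."""
    return (seqlen * dim, 1, dim)

def causal_conv1d_accepts(batch: int, dim: int, seqlen: int) -> bool:
    """True when stride(0) and stride(2) are both multiples of 8."""
    s = channel_last_strides(batch, dim, seqlen)
    return s[0] % 8 == 0 and s[2] % 8 == 0

# A conv dimension of 16 passes; a tiny model with dim=12 triggers the
# RuntimeError seen in the CI logs above.
```

Under this reading, the fix belongs in how the tiny model is built (pad the conv dimension to a multiple of 8), not in the trainer.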
qgallouedec
left a comment
thanks!! just a few comments
kwargs = {}
if "NemotronH" in model_id:
    kwargs["gradient_checkpointing"] = False
    kwargs["use_cpu"] = True
really not sure about this. We don't train on CPU, so why test it? Plus, we wouldn't know if a GPU-specific issue is introduced.
Is it possible that this error originates from the params used to build the model?
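The per-model override pattern quoted from the diff can be sketched as a small helper (a hypothetical refactor of the inline `kwargs` block, not the code that was actually merged):

```python
# Hedged sketch: per-model TrainingArguments overrides, mirroring the
# kwargs pattern quoted from the diff. The function name is hypothetical.

def model_specific_kwargs(model_id: str) -> dict:
    kwargs = {}
    if "NemotronH" in model_id:
        # Gradient checkpointing was incompatible with NemotronH before the
        # upstream transformers fix, so the test disabled it.
        kwargs["gradient_checkpointing"] = False
        # The reviewer questioned this one: forcing CPU hides GPU-specific
        # regressions, which is why it was dropped from the final PR.
        kwargs["use_cpu"] = True
    return kwargs
```

Keeping such overrides in one helper makes it easy to see (and later delete) every special case a model needs.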
Fixes applied. There is an issue with some dependencies that needs to be addressed in transformers.

Hi @sergiopaniego, is there any update on this or the corresponding upstream PR?

Upstream PR has just been merged!
Cursor Bugbot has reviewed your changes and found 1 potential issue. Reviewed by Cursor Bugbot for commit 0423cdb.
yes! It could be approved and merged if tests are green 😄

The failed test is unrelated.

The gradient checkpointing PR in transformers (huggingface/transformers#45625) is now merged, so I've updated the required version here to the next transformers release (5.7.0). It should finally be fixed once that version is released 😄 Tests are green.

thanks! I think I'll wait for #5637 to be merged or closed before merging this one :)

makes sense, ping me so I can update this PR once #5637 is merged

What does this PR do?
Fixes # (issue)
Who can review?
@qgallouedec @albertvillanova
Note
Medium Risk
Primarily affects test coverage and internal tiny-model generation, but introduces a new architecture path (NemotronH) and tightens minimum `transformers` version expectations, which could cause CI/runtime mismatches if versions diverge.
Overview
Adds a new `trl-internal-testing/tiny-NemotronHForCausalLM` tiny model generator entry (hybrid Mamba+attention) so unit tests can exercise NemotronH/Nemotron 3-style models, including mirroring the model's float32-only Mamba parameters.
Wires this tiny model into the existing tokenizer/data-utils and trainer test parametrizations (`SFTTrainer`/`DPOTrainer`), with `skipif(transformers < 5.7.0)` guards due to NemotronH gradient-checkpointing requirements. Also updates the Nemotron 3 SFT example to require `transformers >= 5.7.0` and removes the forced disabling of gradient checkpointing.
Reviewed by Cursor Bugbot for commit b12633b. Bugbot is set up for automated code reviews on this repo.
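The `skipif(transformers < 5.7.0)` guard described above boils down to a version comparison. A minimal sketch, assuming a plain `major.minor.patch` string (real tests would use `pytest.mark.skipif` with `packaging.version.parse`, which also handles pre-release suffixes; this simplified helper does not):

```python
# Hedged sketch: a version-gated skip predicate like the one the PR adds.
# The function name and tuple-based comparison are illustrative, not trl's code.

def needs_skip(installed: str, minimum: tuple = (5, 7, 0)) -> bool:
    """True when the installed transformers version predates the minimum.

    Assumes a numeric "major.minor.patch" string; suffixes like "rc1"
    would need packaging.version.parse instead.
    """
    parts = tuple(int(p) for p in installed.split(".")[:3])
    return parts < minimum
```

In a test module this would back a marker along the lines of `pytest.mark.skipif(needs_skip(transformers.__version__), reason="NemotronH needs transformers>=5.7.0 for gradient checkpointing")`.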
SFTTrainer/DPOTrainer), withskipif(transformers<5.7.0)guards due to NemotronH gradient-checkpointing requirements. Also updates the Nemotron 3 SFT example to requiretransformers>=5.7.0and removes the forced disabling of gradient checkpointing.Reviewed by Cursor Bugbot for commit b12633b. Bugbot is set up for automated code reviews on this repo. Configure here.