
Fix nested vocab_size for DistillationTrainer and GOLDTrainer #5592

Open
Beichen-Ma wants to merge 3 commits into huggingface:main from Beichen-Ma:fix-nested-vocab-size

Conversation


Beichen-Ma commented Apr 19, 2026

Fixes #5585. The root cause: DistillationTrainer.__init__ calls teacher_model.resize_token_embeddings(self.model.config.vocab_size), which raises AttributeError on configs where vocab_size is nested (e.g. Qwen3_5Config exposes it under config.text_config).
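
To make the failure mode concrete, here is a minimal sketch using toy PretrainedConfig objects (the nested layout is a hypothetical stand-in for Qwen3_5Config, and the vocab size is illustrative):

```python
from transformers import PretrainedConfig

# Flat config: vocab_size lives at the top level.
flat = PretrainedConfig(vocab_size=151936)

# Nested config: vocab_size lives on a text sub-config, as in Qwen3_5Config.
nested = PretrainedConfig(text_config=PretrainedConfig(vocab_size=151936))

print(flat.vocab_size)  # 151936 -- the old lookup works on flat configs

try:
    nested.vocab_size  # the lookup DistillationTrainer.__init__ used to do
except AttributeError as err:
    print(f"old lookup fails on nested configs: {err}")

# get_text_config() returns the text sub-config when one exists and
# self otherwise, so the same expression works for both layouts.
print(flat.get_text_config().vocab_size)    # 151936
print(nested.get_text_config().vocab_size)  # 151936
```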

Fix

  • Replaced it with self.model.config.get_text_config().vocab_size; transformers defines get_text_config() on PretrainedConfig to return the text sub-config on nested configs and self on flat ones (see the sketch below).
  • Applied the same fix to GOLDTrainer, which duplicated the identical pattern, and updated an existing GOLD unit test whose SimpleNamespace mock didn't implement get_text_config (sketched after the Tests list below).
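
Concretely, the change in both trainers amounts to the one-line swap below (a sketch; the surrounding initialization code is elided):

```python
# Before: raises AttributeError on nested configs such as Qwen3_5Config.
teacher_model.resize_token_embeddings(self.model.config.vocab_size)

# After: get_text_config() resolves to the text sub-config on nested
# configs and to the config itself on flat ones.
teacher_model.resize_token_embeddings(self.model.config.get_text_config().vocab_size)
```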

Tests

  • Reproduced the original error on Qwen3.5-0.8B student / Qwen3.5-2B teacher before the fix.
  • After the fix, trainer init and one training step complete cleanly.
  • pytest tests/experimental/test_gold_trainer.py passes.
  • DistillationTrainer has no existing unit test file. Given it's experimental and the fix is one line mirroring an existing pattern, I scoped this PR to the minimal fix rather than introducing a new test file.
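
For the GOLD mock update mentioned under Fix, here is a hedged sketch of the change (the real test lives in tests/experimental/test_gold_trainer.py; the vocab size is illustrative):

```python
from types import SimpleNamespace

# Old mock: a flat namespace exposing vocab_size only, so calling
# config.get_text_config() raises AttributeError under the new code path.
config = SimpleNamespace(vocab_size=32000)

# Updated mock: add get_text_config() returning the config itself,
# mirroring what PretrainedConfig does for flat (non-nested) configs.
config.get_text_config = lambda: config

assert config.get_text_config().vocab_size == 32000
```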

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

AI writing disclosure

We welcome the use of AI tools to help with contributions. For transparency and to help us improve our review process, please indicate the level of AI involvement in this PR.

  • No AI usage: the PR was written entirely by a human.
  • AI-assisted: some parts were suggested or improved by AI, but the PR was written and reviewed by a human.
  • AI-generated: the PR was mostly or fully generated by an AI tool.

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.


Note

Low Risk
Single-line logic change in trainer initialization plus a test mock update; low risk aside from potential edge cases if a model config lacks get_text_config().

Overview
Fixes teacher embedding resizing during initialization for DistillationTrainer and GOLDTrainer by reading the vocabulary size via self.model.config.get_text_config().vocab_size instead of self.model.config.vocab_size, supporting models with nested text configs.

Updates the existing GOLD init unit test mock config to implement get_text_config() so the new code path is exercised.

Reviewed by Cursor Bugbot for commit d1e8543. Bugbot is set up for automated code reviews on this repo.

cmpatino (Collaborator)

The changes look good to me! Let's wait for the review from another maintainer before merging.


