Upload testing suite for DistillationTrainer #5615

Open
cmpatino wants to merge 5 commits into huggingface:main from cmpatino:kd-distillation-tests

Conversation

@cmpatino (Collaborator) commented Apr 21, 2026

What does this PR do?

Uploads tests for the DistillationTrainer

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

AI writing disclosure

We welcome the use of AI tools to help with contributions. For transparency and to help us improve our review process, please indicate the level of AI involvement in this PR.

  • No AI usage: the PR was written entirely by a human.
  • AI-assisted: some parts were suggested or improved by AI, but the PR was written and reviewed by a human.
  • AI-generated: the PR was mostly or fully generated by an AI tool.

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.


Note

Low Risk
Test-only change: no production code paths are modified. Risk is limited to potential CI flakiness from model/dataset dependencies and optional accelerator coverage.

Overview
Adds a new experimental test module covering DistillationTrainer and related utilities.

The suite validates:

  • DistillationConfig argument constraints (teacher server vs. liger, server URL required, reverse-KL mode restrictions).
  • Correctness of the build_teacher_request_inputs trimming and prompt-length logic.
  • generalized_jsd_loss behavior across beta, temperature, and reduction settings.
  • Integration-style tests that run a short train with a local teacher (including checkpoint output) and optionally exercise the liger path.
  • Parity between local-teacher and mocked vLLM-teacher-server losses in sampled top-1 mode.
  • That _RepeatBatchDataLoader forwards set_epoch (sketched below).
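As a concrete illustration of the last item, the set_epoch check reduces to wrapping a recorder and asserting the call is forwarded. In this minimal sketch, RepeatBatchLoader is a local stand-in for trl's _RepeatBatchDataLoader (whose real import path and constructor are not reproduced here); only the test pattern is shown.

```python
# Sketch: RepeatBatchLoader stands in for _RepeatBatchDataLoader and models
# only the assumed behavior under test: forwarding set_epoch to the inner loader.
class RepeatBatchLoader:
    """Wraps an inner loader and forwards set_epoch to it."""

    def __init__(self, inner):
        self.inner = inner

    def set_epoch(self, epoch):
        self.inner.set_epoch(epoch)


class RecordingLoader:
    """Records set_epoch calls so the test can observe forwarding."""

    def __init__(self):
        self.epochs = []

    def set_epoch(self, epoch):
        self.epochs.append(epoch)


def test_forwards_set_epoch():
    inner = RecordingLoader()
    loader = RepeatBatchLoader(inner)
    loader.set_epoch(3)
    assert inner.epochs == [3]
```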

Reviewed by Cursor Bugbot for commit 43950bb.

@cmpatino cmpatino marked this pull request as ready for review April 21, 2026 13:59
@qgallouedec (Member) left a comment


Thanks for adding tests to DistillationTrainer; it had none, so this is directionally welcome. A few things worth addressing before merge:

  1. Heavy use of DistillationTrainer.__new__(...) plus manual attribute assignment, and five Dummy*/custom mock classes. Every test assembles a fake trainer by bypassing __init__ and setting ~10 private attributes inline, which couples every test to internal attribute names: any rename or new attribute used in compute_loss silently breaks the suite. It also cuts against two principles: consistency (the rest of tests/experimental/, e.g. test_gkd_trainer.py, loads a tiny real model and exercises the real __init__) and simplicity (the mock scaffolding and the duplicated attribute setup in test_server_teacher_path_handles_variable_prompt_lengths / ..._padded_completions are exactly that). A tiny-model fixture would remove most of it.

  2. No end-to-end test. One short trainer.train() on a tiny student and teacher would catch far more than the current mocked suite combined, and would make most of the attribute-juggling tests redundant. It would also align better with the principle of testing behavior over implementation; see the sketch below.
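For reference, here is a minimal sketch of that tiny-model approach, modeled on tests/experimental/test_gkd_trainer.py. The import path, the DistillationConfig fields, the teacher_model argument, and the model/dataset IDs are all assumptions to adapt; the point is exercising the real __init__ and one short train() instead of __new__ plus attribute assignment.

```python
# Sketch only: the import path, config fields, and model/dataset IDs below are
# assumptions modeled on other TRL experimental tests, not a confirmed API.
import tempfile

from datasets import load_dataset

from trl.experimental.distillation import DistillationConfig, DistillationTrainer  # assumed path


def test_train_with_tiny_teacher():
    dataset = load_dataset("trl-internal-testing/zen", "standard_prompt_completion", split="train")
    with tempfile.TemporaryDirectory() as tmp_dir:
        config = DistillationConfig(
            output_dir=tmp_dir,
            per_device_train_batch_size=2,
            max_steps=3,  # keep the run short; we only check that training runs
            report_to="none",
        )
        trainer = DistillationTrainer(
            model="trl-internal-testing/tiny-Qwen2ForCausalLM-2.5",          # tiny student
            teacher_model="trl-internal-testing/tiny-Qwen2ForCausalLM-2.5",  # tiny teacher
            args=config,
            train_dataset=dataset,
        )
        trainer.train()
        assert trainer.state.global_step == 3  # the loop actually ran
```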

Comment thread: tests/experimental/test_distillation_trainer.py

```python
torch.testing.assert_close(local_loss, server_loss)


def test_sampled_mode_keeps_teacher_argmax_for_forward_support():
```
Member left a comment

This is tautological: the expected value is computed by calling the same private helper _jsd_divergence, with the same support/mask construction that compute_loss uses internally, so it tests wiring, not semantics. Any bug inside _jsd_divergence is invisible. Derive the expected value from first principles (the plain JSD formula), as sketched below, or drop the test.
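For example, the oracle can be the plain JSD formula written out inline, independent of any trainer helper. A minimal sketch with symmetric beta = 0.5; the trainer_loss mentioned in the final comment is hypothetical plumbing for the value compute_loss would produce on the same logits.

```python
# Sketch: derive the expected value from the plain JSD formula instead of
# reusing the trainer's private _jsd_divergence helper.
import torch
import torch.nn.functional as F


def plain_jsd(student_logits, teacher_logits):
    """Symmetric JSD (beta = 0.5), written out from first principles."""
    p = F.softmax(teacher_logits, dim=-1)
    q = F.softmax(student_logits, dim=-1)
    m = 0.5 * (p + q)

    def kl(a, b):  # KL(a || b) = sum a * log(a / b)
        return (a * (a.clamp_min(1e-12).log() - b.clamp_min(1e-12).log())).sum(-1)

    return (0.5 * kl(p, m) + 0.5 * kl(q, m)).mean()


torch.manual_seed(0)
student_logits = torch.randn(2, 4, 11)
teacher_logits = torch.randn(2, 4, 11)
expected = plain_jsd(student_logits, teacher_logits)
assert expected.item() >= 0  # JSD is non-negative by construction
# In the real test, compare the trainer's output against this independent
# oracle, e.g.: torch.testing.assert_close(trainer_loss, expected)
```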

Comment thread: tests/experimental/test_distillation_trainer.py

```python
report_to="none",
)

assert caught == []
```
Member left a comment

Fragile: any unrelated DeprecationWarning from a dependency will break it. The pytest.raises is already the real assertion; see the sketch below for a targeted alternative.
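A more robust pattern, sketched below with standard warnings/pytest APIs: filter the captured warnings down to the one the code owns (the "liger" match string is illustrative), or use pytest.warns when a warning is expected.

```python
# Sketch: assert on the specific warning rather than on "no warnings at all",
# so unrelated DeprecationWarnings from dependencies cannot break the test.
import warnings

import pytest


def code_under_test():
    # Unrelated third-party noise that should not fail the test.
    warnings.warn("third-party deprecation", DeprecationWarning)


def test_does_not_emit_our_warning():
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        code_under_test()
    ours = [w for w in caught if "liger" in str(w.message)]  # only the warning we own
    assert ours == []


def test_emits_expected_warning():
    # When a warning *is* expected, pytest.warns pins both category and message.
    with pytest.warns(UserWarning, match="liger"):
        warnings.warn("liger path is experimental", UserWarning)
```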

@cmpatino (Collaborator, Author) left a comment

Thank you for the feedback! I addressed your comments with the following changes:

  • Followed GKD's tests and used a small model and dataset to test the trainer.
  • Improved the Liger tests by following the testing approach from the SFTTrainer.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment thread tests/experimental/test_distillation_trainer.py Outdated
@cursor (Bot) left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 1033b7c.

Comment thread tests/experimental/test_distillation_trainer.py
@cmpatino cmpatino requested a review from qgallouedec April 23, 2026 08:30