Fix kontext finetune issue when batch size >1 #11921

Open — wants to merge 5 commits into base: main
Conversation

mymusise (Contributor)
What does this PR do?

Problem

Training fails with a shape mismatch error when custom instance prompts are used with batch_size > 1, because the dataloader can emit a partial final batch.

Solution

Set drop_last=True in BucketBatchSampler so that every batch has the configured size during training. This prevents the shape mismatch error that occurred when the last batch was smaller than the specified batch size.
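To illustrate the fix, here is a minimal, self-contained sketch of the drop_last logic (not the actual BucketBatchSampler from the training script; the `batches` helper below is hypothetical and only mirrors the relevant behavior):

```python
def batches(indices, batch_size, drop_last):
    """Yield batches of indices. With drop_last=True, the trailing
    partial batch is discarded so every batch has a uniform size."""
    batch = []
    for i in indices:
        batch.append(i)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # A leftover partial batch is only emitted when drop_last=False.
    if batch and not drop_last:
        yield batch

# 10 samples with batch_size=4: drop_last=False yields a final batch
# of 2, which is what triggered the shape mismatch during training.
print(list(batches(range(10), 4, drop_last=False)))
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(list(batches(range(10), 4, drop_last=True)))
# [[0, 1, 2, 3], [4, 5, 6, 7]]
```

The trade-off is that up to batch_size - 1 samples per epoch are skipped, which is negligible except for very small datasets.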

Testing

Verified the fix by running training with custom instance prompts and batch_size > 1; the shape mismatch error no longer occurs after this change.

Fixes # (issue)


Before submitting

  • This PR fixes a bug in the training script.
  • Did you read the contributor guideline?
  • Did you read our philosophy doc?
  • Was this discussed/approved via a GitHub issue or the forum? (N/A if not discussed)
  • Did you make sure to update the documentation with your changes? (N/A for code-only bugfix)
  • Did you write any new necessary tests? (Manual test performed)

Who can review?

Anyone in the community is free to review the PR once the tests have passed.

For this example script and dataloader logic, relevant reviewers could be:

Signed-off-by: mymusise <[email protected]>
@mymusise mymusise changed the title Fix kontext finetune issue where batch size >1 Fix kontext finetune issue when batch size >1 Jul 14, 2025
asomoza (Member) commented Jul 15, 2025

cc: @linoytsaban

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

linoytsaban (Collaborator) left a comment
thanks @mymusise!
and thanks @asomoza for the tag. I think I initially had it as False so as not to waste samples on small datasets, but I didn't make the adjustments needed to support batches of varying sizes. Better to have it as True, as this PR suggests.

5 participants