Skip to content

fix: Added code to match interpolation of Google's ViT implementation… #38626

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

lerolynn
Copy link

@lerolynn lerolynn commented Jun 5, 2025

Fixes #28180

What does this PR do?

Fixes the interpolation method in ViT image processors to match the original Google ViT implementation. Changes the default resampling from BILINEAR to BICUBIC interpolation.

Implementation Notes

This implementation follows @NielsRogge's comments from #28180:

  • Added pixel value verification using torch.allclose similar to the DINOv2 conversion
  • Verification ensures HuggingFace preprocessing matches the reference implementation
  • DeiT verification is currently skipped with pass - can be updated in a follow-up PR if needed

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@NielsRogge @amyeroberts @qubvel

@qubvel
Copy link
Member

qubvel commented Jun 6, 2025

@bot /style

Copy link
Contributor

github-actions bot commented Jun 6, 2025

Style fixes have been applied. View the workflow run here.

@qubvel
Copy link
Member

qubvel commented Jun 6, 2025

Hey @lerolynn, thanks for the PR!
Other model image processors depend on ViT image processors, so we should either update them as well (in case they should use BICUBIC interpolation) or change "# Copied from" statement above them, here is a list of files to check:

  • src/transformers/models/efficientnet/image_processing_efficientnet.py: copy does not match models.vit.image_processing_vit.ViTImageProcessor.resize at line 127
  • src/transformers/models/imagegpt/image_processing_imagegpt.py: copy does not match models.vit.image_processing_vit.ViTImageProcessor.resize at line 113
  • src/transformers/models/layoutlmv2/image_processing_layoutlmv2.py: copy does not match models.vit.image_processing_vit.ViTImageProcessor.resize at line 155
  • src/transformers/models/layoutlmv3/image_processing_layoutlmv3.py: copy does not match models.vit.image_processing_vit.ViTImageProcessor.resize at line 183
  • src/transformers/models/pvt/image_processing_pvt.py: copy does not match models.vit.image_processing_vit.ViTImageProcessor.resize at line 104
  • src/transformers/models/segformer/image_processing_segformer.py: copy does not match models.vit.image_processing_vit.ViTImageProcessor.resize at line 140

Also, it would be super helpful if you could provide a link to the original code + line where the correct interpolation is specified, thanks!

@qubvel qubvel added the Vision label Jun 6, 2025
@lerolynn
Copy link
Author

lerolynn commented Jun 6, 2025

Hey @qubvel , thanks for the review!

I can check these models and make the changes if required. I didn't want to make so many changes in a single pull request initially, but if it's fine I can integrate the changes!

There are a few questions I have:

  1. There is a CI failure for style - should I just run black to change the style or is that alright?
  2. Following the DINOv2 conversion, I added an assert to check if the preprocessing matches the original implementation
image I'm a little wary of doing this because it might affect users who used the wrong interpolation/normalization/resize to fine-tune their models. I think it's better to assert only if the default values are used - what are your thoughts on this?

@qubvel
Copy link
Member

qubvel commented Jun 6, 2025

  1. The CI fails because the above files depend on ViT image processor -> as I said, we should either modify "Copied from" statement on top of them or update the interpolation to make CI happy. You can run make fix-copies to apply changes automatically and then review / revert / update applied changes

  2. It's OK to have assert in the conversion script, we will add 🚨 to indicate it's a breaking PR as well + highlight this on release notes

@lerolynn
Copy link
Author

lerolynn commented Jun 7, 2025

Got it, I'll check the original implementations of all the models on the list and update it in a few days

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Verify interpolation of image processors
2 participants