fix: Added code to match interpolation of Google's ViT implementation… #38626

lerolynn · 2025-06-05T23:04:14Z

What does this PR do?

Fixes the interpolation method in ViT image processors to match the original Google ViT implementation. Changes the default resampling from BILINEAR to BICUBIC interpolation.
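To illustrate why the resampling choice changes pixel values, here is a minimal 1-D sketch comparing a linear (triangle) kernel with a cubic convolution kernel. It is not the Pillow or transformers implementation: the helper names, the coefficient `a = -0.5`, and the border clamping are assumptions made for illustration, and real resizing also widens the kernel when downscaling.

```python
import math


def linear_kernel(x: float) -> float:
    """Triangle kernel behind (bi)linear interpolation."""
    x = abs(x)
    return 1.0 - x if x < 1.0 else 0.0


def cubic_kernel(x: float, a: float = -0.5) -> float:
    """Keys cubic convolution kernel; a = -0.5 is an assumed coefficient."""
    x = abs(x)
    if x < 1.0:
        return (a + 2.0) * x**3 - (a + 3.0) * x**2 + 1.0
    if x < 2.0:
        return a * x**3 - 5.0 * a * x**2 + 8.0 * a * x - 4.0 * a
    return 0.0


def sample(signal, pos, kernel, support):
    """Interpolate `signal` at fractional index `pos`, clamping at the borders."""
    base = math.floor(pos)
    total = 0.0
    weight_sum = 0.0
    for i in range(base - support + 1, base + support + 1):
        w = kernel(pos - i)
        clamped = min(max(i, 0), len(signal) - 1)
        total += w * signal[clamped]
        weight_sum += w
    return total / weight_sum


row = [0.0, 0.0, 255.0, 255.0]  # hypothetical image row with a sharp edge
linear_value = sample(row, 1.25, linear_kernel, support=1)
cubic_value = sample(row, 1.25, cubic_kernel, support=2)
print(linear_value, cubic_value)
```

The two values differ at the same sample position, which is the kind of mismatch that shows up when the processor resamples with BILINEAR but the reference pipeline used BICUBIC.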

Implementation Notes

This implementation follows @NielsRogge's comments from #28180:

Added pixel value verification using torch.allclose similar to the DINOv2 conversion
Verification ensures HuggingFace preprocessing matches the reference implementation
DeiT verification is currently skipped with pass - can be updated in a follow-up PR if needed
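
As a rough sketch of what such a check does, the snippet below is a pure-Python stand-in for the tolerance rule documented for torch.allclose, `|actual - expected| <= atol + rtol * |expected|`. The pixel values and the looser `atol` are hypothetical, and the tolerances actually used in the conversion script are not shown here.

```python
def allclose(actual, expected, rtol=1e-05, atol=1e-08):
    """Element-wise |a - e| <= atol + rtol * |e| over two flat sequences.

    Simplification: a length mismatch returns False instead of broadcasting.
    """
    if len(actual) != len(expected):
        return False
    return all(abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected))


# Hypothetical pixel values: reference pipeline vs. converted processor.
reference_pixels = [0.1234, -0.5678, 0.9012]
converted_pixels = [0.1234, -0.5678, 0.90125]

matches_default = allclose(converted_pixels, reference_pixels)
matches_loose = allclose(converted_pixels, reference_pixels, atol=1e-4)
print(matches_default, matches_loose)
```

With the default tolerances the small difference in the last value fails the check, while the looser absolute tolerance accepts it, so the chosen `atol` decides how strict the verification is.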

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

@NielsRogge @amyeroberts @qubvel

…huggingface#28180)

qubvel · 2025-06-06T10:15:16Z

@bot /style

github-actions · 2025-06-06T10:16:47Z

Style fixes have been applied. View the workflow run here.

qubvel · 2025-06-06T10:18:54Z

Hey @lerolynn, thanks for the PR!
Other model image processors depend on the ViT image processor, so we should either update them as well (if they should also use BICUBIC interpolation) or change the "# Copied from" statement above them. Here is a list of files to check:

src/transformers/models/efficientnet/image_processing_efficientnet.py: copy does not match models.vit.image_processing_vit.ViTImageProcessor.resize at line 127
src/transformers/models/imagegpt/image_processing_imagegpt.py: copy does not match models.vit.image_processing_vit.ViTImageProcessor.resize at line 113
src/transformers/models/layoutlmv2/image_processing_layoutlmv2.py: copy does not match models.vit.image_processing_vit.ViTImageProcessor.resize at line 155
src/transformers/models/layoutlmv3/image_processing_layoutlmv3.py: copy does not match models.vit.image_processing_vit.ViTImageProcessor.resize at line 183
src/transformers/models/pvt/image_processing_pvt.py: copy does not match models.vit.image_processing_vit.ViTImageProcessor.resize at line 104
src/transformers/models/segformer/image_processing_segformer.py: copy does not match models.vit.image_processing_vit.ViTImageProcessor.resize at line 140

Also, it would be super helpful if you could provide a link to the original code + line where the correct interpolation is specified, thanks!

lerolynn · 2025-06-06T10:59:50Z

Hey @qubvel, thanks for the review!

I can check these models and make the changes if required. I didn't want to make so many changes in a single pull request initially, but if it's fine I can integrate the changes!

There are a few questions I have:

There is a CI failure for style. Is it alright if I just run black to fix it?
Following the DINOv2 conversion, I added an assert to check whether the preprocessing matches the original implementation.

I'm a little wary of doing this because it might affect users who used the wrong interpolation/normalization/resize to fine-tune their models. I think it's better to assert only if the default values are used - what are your thoughts on this?
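
A minimal sketch of that idea, assuming a plain dict stands in for the processor config: the check only asserts when every setting equals its default and is skipped otherwise. `DEFAULTS`, `uses_defaults`, `verify_preprocessing`, and the tolerance are hypothetical names and values for illustration, not part of the conversion script.

```python
# Hypothetical default settings; a real script would read these from the processor.
DEFAULTS = {
    "resample": "bicubic",
    "size": 224,
    "image_mean": (0.5, 0.5, 0.5),
    "image_std": (0.5, 0.5, 0.5),
}


def uses_defaults(config: dict) -> bool:
    """True when every tracked setting equals its default value."""
    return all(config.get(key) == value for key, value in DEFAULTS.items())


def verify_preprocessing(config: dict, max_abs_diff: float, atol: float = 1e-4) -> str:
    """Assert on a mismatch only for default settings; skip custom configs."""
    if not uses_defaults(config):
        return "skipped"
    assert max_abs_diff <= atol, (
        f"preprocessing differs from the reference by {max_abs_diff}"
    )
    return "checked"


default_result = verify_preprocessing(dict(DEFAULTS), max_abs_diff=1e-6)
custom_config = dict(DEFAULTS, resample="bilinear")
custom_result = verify_preprocessing(custom_config, max_abs_diff=0.3)
print(default_result, custom_result)
```

This way a checkpoint fine-tuned with a different resample or normalization would not trip the assert, while the default path is still verified against the reference.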

qubvel · 2025-06-06T12:13:22Z

The CI fails because the files above depend on the ViT image processor. As I said, we should either modify the "Copied from" statement on top of them or update the interpolation to make CI happy. You can run make fix-copies to apply the changes automatically, then review, revert, or update them.
It's OK to have an assert in the conversion script. We will add 🚨 to indicate it's a breaking PR and highlight this in the release notes.

lerolynn · 2025-06-07T11:31:56Z

Got it, I'll check the original implementations of all the models on the list and update it in a few days

lerolynn added 2 commits June 6, 2025 06:35

fix: Added code to match interpolation of Google's ViT implementation (…

0ded77a

…huggingface#28180)

style: fix import order and whitespace issues

430917c

Apply style fixes

8ddcdb3

qubvel added the Vision label Jun 6, 2025
