[Fast image processors] Improve handling of image-like inputs other than images (segmentation_maps) #39489
Does seem to simplify a lot! Not 100% sure it's BC?
segmentation_maps_kwargs = kwargs.copy()
segmentation_maps_kwargs["do_normalize"] = False
segmentation_maps_kwargs["do_rescale"] = False
segmentation_maps_kwargs["input_data_format"] = ChannelDimension.FIRST
# Nearest interpolation is used for segmentation maps instead of BILINEAR.
segmentation_maps_kwargs["interpolation"] = pil_torch_interpolation_mapping[PILImageResampling.NEAREST]
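To illustrate why the snippet above switches segmentation maps to nearest interpolation, here is a minimal pure-Python sketch on a single row of class ids. The helpers `resize_nearest` and `resize_linear` are hypothetical and written only for this illustration; they are not the torchvision or transformers resize functions, and the exact sampling convention of the real resize ops may differ.

```python
def resize_nearest(row, new_len):
    """Resize a 1D row by copying the nearest source value (illustrative only)."""
    old_len = len(row)
    return [row[min(int(i * old_len / new_len), old_len - 1)] for i in range(new_len)]


def resize_linear(row, new_len):
    """Resize a 1D row by linear interpolation between neighbours (illustrative only)."""
    old_len = len(row)
    out = []
    for i in range(new_len):
        # Map the output index back to a fractional source position.
        pos = i * (old_len - 1) / (new_len - 1) if new_len > 1 else 0.0
        lo = int(pos)
        hi = min(lo + 1, old_len - 1)
        frac = pos - lo
        out.append(row[lo] * (1 - frac) + row[hi] * frac)
    return out


# A toy segmentation row with two class ids: 0 and 5.
labels = [0, 0, 5, 5]

nearest = resize_nearest(labels, 8)
linear = resize_linear(labels, 8)

# Nearest only ever copies existing ids; linear blends them into values
# that do not correspond to any class.
print("nearest:", nearest)
print("linear: ", linear)
```

Blended values between 0 and 5 would be read as other (nonexistent or wrong) classes, which is why label-like inputs should not go through the same interpolation as the images.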
I would pass these explicitly instead of updating them, but it's a small nit and may be complicated.
I would still need to remove them from the kwargs as they have the default values for image processing there, so not sure if that simplifies...
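The trade-off discussed here can be sketched in plain Python. The `preprocess` function and the string values below are hypothetical stand-ins, not the actual fast image processor API; the point is only that passing an override explicitly while also forwarding a kwargs dict that still contains the same key fails, so the keys would have to be removed first anyway.

```python
def preprocess(images, do_normalize=True, do_rescale=True, interpolation="bilinear"):
    """Hypothetical stand-in for a preprocessing function; just echoes its settings."""
    return {
        "images": images,
        "do_normalize": do_normalize,
        "do_rescale": do_rescale,
        "interpolation": interpolation,
    }


# Kwargs as they would arrive with the image-processing defaults filled in.
image_kwargs = {"do_normalize": True, "do_rescale": True, "interpolation": "bilinear"}

# Option A: pass the override explicitly while still forwarding the kwargs.
# Python rejects this because do_normalize is supplied twice.
try:
    preprocess("seg_map", do_normalize=False, **image_kwargs)
    explicit_error = None
except TypeError as exc:
    explicit_error = exc

# Option B: copy the kwargs and override the entries that differ.
segmentation_maps_kwargs = image_kwargs.copy()
segmentation_maps_kwargs.update(
    {"do_normalize": False, "do_rescale": False, "interpolation": "nearest"}
)
result = preprocess("seg_map", **segmentation_maps_kwargs)

print(type(explicit_error).__name__)
print(result)
```

Copying before updating also leaves the original image kwargs untouched, so the image path keeps its own settings.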
@ArthurZucker Should be 100% BC 🤗
[For maintainers] Suggested jobs to run (before merge): run-slow: beit, dpt, eomt, idefics2, idefics3, llava_next, llava_onevision, mobilenet_v2, mobilevit, qwen2_vl, sam, smolvlm, vitmatte
…han images (segmentation_maps) (huggingface#39489) * improve handlike of other image-like inputs in fast image processors * fix issues with _prepare_images_structure * update sam image processor fast * use dict update
What does this PR do?
As the title says, this unbloats (😉) a lot of the fast processing code for models that need to process segmentation maps, trimaps, depth_maps, etc.