Feature request
We want to standardize the logic flow through Processor classes. Since processors can accept different kwargs depending on the model and modality, we are adding a TypedDict for each modality to keep track of which kwargs are accepted.
The initial design has been merged, and an example model was modified to follow the new uniform processor kwargs in #31198. #31197 has two more examples with the standardized API.
This design has to be rolled out to all the processors in Transformers, and contributions are appreciated.
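To make the idea concrete, here is a minimal, self-contained sketch of per-modality TypedDicts and a helper that routes flat kwargs to the right modality. It does not import Transformers, and every name in it (`TextKwargs`, `ImagesKwargs`, `MODALITY_KWARGS`, `split_kwargs`, and the individual keys) is an illustrative assumption rather than the API merged in #31198; see that PR for the actual design.

```python
from typing import Any, Dict, TypedDict


# Hypothetical per-modality schemas; total=False makes every key optional.
class TextKwargs(TypedDict, total=False):
    padding: bool
    max_length: int


class ImagesKwargs(TypedDict, total=False):
    do_resize: bool
    size: Dict[str, int]


# Hypothetical registry mapping a modality group to its schema.
MODALITY_KWARGS = {"text_kwargs": TextKwargs, "images_kwargs": ImagesKwargs}


def split_kwargs(**kwargs: Any) -> Dict[str, Dict[str, Any]]:
    """Route flat kwargs into per-modality dicts; reject unknown keys."""
    routed: Dict[str, Dict[str, Any]] = {name: {} for name in MODALITY_KWARGS}
    for key, value in kwargs.items():
        for name, schema in MODALITY_KWARGS.items():
            # The TypedDict annotations list the keys a modality accepts.
            if key in schema.__annotations__:
                routed[name][key] = value
                break
        else:
            raise TypeError(f"Unexpected processor kwarg: {key!r}")
    return routed
```

With this sketch, `split_kwargs(padding=True, do_resize=False)` places `padding` under `text_kwargs` and `do_resize` under `images_kwargs`, while an unknown key raises a `TypeError`. Keeping the accepted keys in a TypedDict means the same declaration serves both static type checkers and runtime routing.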
Below is an incomplete list of models that need standardization. Feel free to add a model if it's missing:
- Align -> Uniformize model processors #31368
- AltClip -> Uniformize model processors #31368
- BLIP -> Uniformize model processors #31368
- BLIP-2 -> Uniformize model processors #31368
- Bridgetower -> Uniformize model processors #31368
- Chameleon -> Uniformize kwargs for chameleon processor #32181
- Chinese CLIP -> Uniformize model processors #31368
- CLIP -> in progress by @davidgxue
- ClipSeg -> Uniformize model processors (models *with* special arg names) #32841
- Donut -> Uniformize model processors #31368
- Flava -> Uniformize model processors (models w/o special arg names) #32845
- Fuyu -> Uniformize kwargs for image-text-to-text processors #32544
- GIT -> uniformize git processor #33668
- Grounding DINO -> Uniformize kwargs for processors - GroundingDINO #31964
- Idefics -> Uniformize kwargs for Idefics/2 processors #32568
- Idefics-2 -> Uniformize kwargs for Idefics/2 processors #32568
- InstructBlip -> Uniformize kwargs for image-text-to-text processors #32544
- InstructBlipVideo -> Uniformize model processors (models w/o special arg names) #32845
- Kosmos-2 -> Uniformize kwargs for image-text-to-text processors #32544
- LayoutLM (1, 2, 3) -> Uniformize kwargs for Layoutlm (2, 3, X) processors #32180
- LLaVa -> Uniformize kwargs for LLaVa processor and update docs #32858
- LLaVa-NeXT -> Uniformize kwargs for image-text-to-text processors #32544
- LLaVa-NeXT-Video -> Uniformize LlavaNextVideoProcessor kwargs #35613
- MGP-STR -> Uniformize model processors (models w/o special arg names) #32845
- Nougat -> Uniformize model processors (models *with* special arg names) #32841
- OneFormer -> uniformize kwargs for OneFormer #34547
- Owlv2 -> Uniformize OwlViT and Owlv2 processors #35700
- OwlVIT -> Uniformize OwlViT and Owlv2 processors #35700
- Paligemma -> Uniformize kwargs for Paligemma processor and update docs #33571
- Pix2Struct -> Uniformize kwargs for image-text-to-text processors #32544
- Pixtral -> Uniformize kwargs for Pixtral processor #33521
- SAM -> uniformize kwargs for SAM #34578
- SigLip -> Uniformize model processors (models w/o special arg names) #32845
- TrOCR -> 🚨🚨🚨 Uniformize kwargs for TrOCR Processor #34587
- TVP -> Uniformize model processors (models w/o special arg names) #32845
- Udop -> Uniformize kwargs for Udop processor and update docs #33628
- VideoLLaVa -> Uniformize model processors (models w/o special arg names) #32845
- VILT -> Uniformize model processors (models w/o special arg names) #32845
- VisionTextDualEncoder -> uniformize kwargs for VisionTextDualEncoder #34563
- X-CLIP -> Uniformize model processors (models w/o special arg names) #32845
Note: For now we'll start with image or image+text processors. #31368 is an ongoing PR that also includes audio processor standardization.
Motivation
.
Your contribution
.