Add Idefics 3! #32473

Merged: 53 commits merged on Sep 25, 2024.
Changes from 1 commit
Commits (53):
842a28d Add Idefics 3! (andimarafioti, Aug 6, 2024)
afce007 fixes to make both pipelines identical (andimarafioti, Aug 7, 2024)
3e3b31d fix for quantized models (andimarafioti, Aug 8, 2024)
9c8ffc4 First pass at the review (andimarafioti, Aug 8, 2024)
7e3d7a6 remove vocab size from the main config (it's still in the text_config) (andimarafioti, Aug 8, 2024)
dd99bca hot fix for merve (andimarafioti, Aug 8, 2024)
ddac9ec Apply suggestions from code review (andimarafioti, Aug 9, 2024)
188bb76 re-add model_type for text_config (andimarafioti, Aug 9, 2024)
43fb214 remove support for old_cache (andimarafioti, Aug 9, 2024)
c9e0d85 remove hidden_size from main config (andimarafioti, Aug 9, 2024)
1b2b89c rename idefics3 HF repo (andimarafioti, Aug 9, 2024)
6ff766f few changes suggested in the PR (andimarafioti, Aug 12, 2024)
11c2e1a fix to input_data_format computation (andimarafioti, Aug 12, 2024)
c1048ed remove overwrite of _autoset_attn_implementation following @zucchini-… (andimarafioti, Aug 12, 2024)
a163564 improve example (andimarafioti, Aug 12, 2024)
6f0a479 few improvements from amy's review (andimarafioti, Aug 12, 2024)
8361fce big change to enable processing input images as numpy arrays (andimarafioti, Aug 12, 2024)
32970d0 Changes to the code to uniformize processor kwargs (andimarafioti, Aug 13, 2024)
c504f00 image processing tests (andimarafioti, Aug 13, 2024)
a914e41 image processing tests fixes and some bugs they discovered (andimarafioti, Aug 13, 2024)
6722d13 addressed review comments from Yoni (andimarafioti, Aug 13, 2024)
0533eda fix modeling tests (andimarafioti, Aug 13, 2024)
b034091 remove special tokens that are not special (andimarafioti, Aug 15, 2024)
47fb7ce fixes tests (andimarafioti, Aug 15, 2024)
4032a6f skip failing tests - they also fail for idefics2 (andimarafioti, Aug 21, 2024)
757e834 added paper and readded the tests with multi gpu, who knows (andimarafioti, Aug 27, 2024)
7797279 Update docs/source/en/model_doc/idefics3.md (andimarafioti, Aug 30, 2024)
b478124 Apply suggestions from code review (andimarafioti, Aug 30, 2024)
ada6219 review amy until image_processing_idefics3 (andimarafioti, Aug 30, 2024)
164fbe8 last comments from Amy (andimarafioti, Sep 2, 2024)
000c8ea review amy (andimarafioti, Sep 6, 2024)
4d02e0c Update src/transformers/models/idefics3/image_processing_idefics3.py (andimarafioti, Sep 4, 2024)
3bf03c2 Update src/transformers/models/idefics3/modeling_idefics3.py (andimarafioti, Sep 4, 2024)
57bfd51 Update docs/source/en/model_doc/idefics3.md (andimarafioti, Sep 6, 2024)
63b1d7f doc improvement - amy review (andimarafioti, Sep 6, 2024)
6325fbc fix runtime error during fine-tuning (andimarafioti, Sep 10, 2024)
76b8892 amy's review (andimarafioti, Sep 16, 2024)
9a20306 Update src/transformers/models/idefics3/image_processing_idefics3.py (andimarafioti, Sep 16, 2024)
3129920 Update src/transformers/models/idefics3/image_processing_idefics3.py (andimarafioti, Sep 16, 2024)
e1a10b3 Update src/transformers/models/idefics3/modeling_idefics3.py (andimarafioti, Sep 16, 2024)
4c3756f ruff (andimarafioti, Sep 16, 2024)
fbaf07e amy's comment on the order (andimarafioti, Sep 16, 2024)
87fa179 ruff ruff (andimarafioti, Sep 17, 2024)
23d4cf8 fix copies (andimarafioti, Sep 17, 2024)
9e925b9 square images when they are not splitted (andimarafioti, Sep 17, 2024)
215b636 ruff :( (andimarafioti, Sep 17, 2024)
2967974 Update src/transformers/models/idefics3/image_processing_idefics3.py (andimarafioti, Sep 18, 2024)
ee041bf Update tests/models/idefics3/test_processing_idefics3.py (andimarafioti, Sep 18, 2024)
4aad266 fix small bug introduced in refactor (andimarafioti, Sep 18, 2024)
f1ae8ae amy's image processing changes (andimarafioti, Sep 19, 2024)
39d88b2 fixes peft tests and ruff (andimarafioti, Sep 19, 2024)
383f0db modify to_pil_image from transformers. and review from emanuele. (andimarafioti, Sep 23, 2024)
682b82b add modified to_pil_image (andimarafioti, Sep 23, 2024)
modify to_pil_image from transformers. and review from emanuele.
andimarafioti committed Sep 23, 2024
commit 383f0dbca31b5da59c247bfa06a15fce6ab1d809
src/transformers/models/idefics2/modeling_idefics2.py (2 changes: 1 addition, 1 deletion)
@@ -1097,7 +1097,7 @@ class Idefics2PreTrainedModel(PreTrainedModel):
 
     def _init_weights(self, module):
         std = (
-            self.config.text_config.initializer_range
+            self.config.initializer_range
             if hasattr(self.config, "initializer_range")
             else self.config.text_config.initializer_range
         )
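Before this change, both branches of the conditional expression read `self.config.text_config.initializer_range`, so the `hasattr` check never affected the result. The minimal sketch below reproduces the before/after behavior outside the library. It assumes `types.SimpleNamespace` objects as stand-ins for the real config classes; the helper names and the values 0.01 and 0.02 are hypothetical and not taken from any Idefics checkpoint.

```python
from types import SimpleNamespace


def std_before_fix(config):
    # Both branches are identical, so the hasattr() check has no effect.
    return (
        config.text_config.initializer_range
        if hasattr(config, "initializer_range")
        else config.text_config.initializer_range
    )


def std_after_fix(config):
    # Prefer the top-level value; fall back to the text config only when it is absent.
    return (
        config.initializer_range
        if hasattr(config, "initializer_range")
        else config.text_config.initializer_range
    )


# Hypothetical configs: one with a top-level initializer_range, one without.
with_top_level = SimpleNamespace(
    initializer_range=0.01,
    text_config=SimpleNamespace(initializer_range=0.02),
)
text_only = SimpleNamespace(text_config=SimpleNamespace(initializer_range=0.02))

print(std_before_fix(with_top_level))  # 0.02: the top-level value is ignored
print(std_after_fix(with_top_level))   # 0.01
print(std_after_fix(text_only))        # 0.02
```

Because the Idefics3 `_init_weights` is marked `# Copied from` the Idefics2 method, the same one-line fix has to land in both files.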
src/transformers/models/idefics3/image_processing_idefics3.py (37 changes: 1 addition, 36 deletions)
@@ -19,7 +19,7 @@
 import numpy as np
 
 from ...image_processing_utils import BaseImageProcessor, BatchFeature
-from ...image_transforms import PaddingMode, pad, to_channel_dimension_format
+from ...image_transforms import PaddingMode, pad, to_channel_dimension_format, to_pil_image
 from ...image_utils import (
     IMAGENET_STANDARD_MEAN,
     IMAGENET_STANDARD_STD,
@@ -222,40 +222,6 @@ def make_pixel_mask(
     return mask
 
 
-# Custom to_pil_image function to support image_mode
-def to_pil_image(
-    image: Union[np.ndarray, "PIL.Image.Image", TensorType],
-    image_mode: Optional[str] = None,
-) -> "PIL.Image.Image":
-    """
-    Converts `image` to a PIL Image. Optionally rescales it and puts the channel dimension back as the last axis if
-    needed.
-
-    Args:
-        image (`PIL.Image.Image` or `numpy.ndarray` or `torch.Tensor` or `tf.Tensor`):
-            The image to convert to the `PIL.Image` format.
-        image_mode (`str`, *optional*):
-            The mode of the image.
-
-    Returns:
-        `PIL.Image.Image`: The converted image.
-    """
-    if isinstance(image, PIL.Image.Image):
-        return image
-    # Convert all tensors to numpy arrays before converting to PIL image
-    image = to_numpy_array(image)
-
-    # If the channel has been moved to first dim, we put it back at the end.
-    image = to_channel_dimension_format(
-        image, ChannelDimension.LAST, infer_channel_dimension_format(image, num_channels=(1, 3, 4))
-    )
-
-    # If there is a single channel, we squeeze it, as otherwise PIL can't handle it.
-    image = np.squeeze(image, axis=-1) if image.shape[-1] == 1 else image
-    image = image.astype(np.uint8)
-    return PIL.Image.fromarray(image, mode=image_mode)
-
-
 def convert_to_rgb(
     image: np.ndarray,
     palette: Optional[PIL.ImagePalette.ImagePalette] = None,
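The removed helper prepared the array in three steps before calling `PIL.Image.fromarray`: move a channels-first axis to the end, squeeze a single channel, and cast to `uint8`. The sketch below reproduces those steps with NumPy and Pillow only. `to_pil_sketch` is a hypothetical name, the leading-axis check is a simplified stand-in for `infer_channel_dimension_format`, and the `image_mode` argument is omitted.

```python
import numpy as np
from PIL import Image

# Channel counts treated as plausible, mirroring num_channels=(1, 3, 4) above.
NUM_CHANNELS = (1, 3, 4)


def to_pil_sketch(image: np.ndarray) -> Image.Image:
    """Simplified stand-in for the removed helper (no image_mode handling)."""
    if image.ndim == 3 and image.shape[0] in NUM_CHANNELS:
        # Assume channels-first (C, H, W) when the leading axis looks like a channel count.
        image = np.moveaxis(image, 0, -1)
    if image.ndim == 3 and image.shape[-1] == 1:
        # PIL cannot build an image from an (H, W, 1) array, so drop the channel axis.
        image = np.squeeze(image, axis=-1)
    # Plain cast, no rescaling: values are assumed to already be in [0, 255].
    return Image.fromarray(image.astype(np.uint8))


rgb = to_pil_sketch(np.zeros((3, 4, 5)))  # channels-first float array
print(rgb.mode, rgb.size)                  # RGB (5, 4)
```

Like the removed code, the sketch only casts with `astype(np.uint8)` and never rescales, even though the removed docstring says the helper "optionally rescales", so float images in [0, 1] would need scaling to [0, 255] first. A channels-last array whose height is 1, 3, or 4 (for example shape (4, 5, 3)) is ambiguous and is treated as channels-first here.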
@@ -282,7 +248,6 @@ def convert_to_rgb(
     data_format = input_data_format if data_format is None else data_format
 
     mode = "P" if palette is not None else None
-    # Custom to_pil_image function to support image_mode
     image = to_pil_image(image, image_mode=mode)
     if image.mode == "P" and palette is not None:
         image.putpalette(palette)
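When a palette is given, `convert_to_rgb` builds a palette-mode (`"P"`) image and reattaches the palette with `putpalette`; the function name suggests the image is then converted to RGB, which happens outside this hunk. A minimal Pillow-only sketch of that idea follows. It assumes the input is a 2-D array of palette indices and the palette is a flat `[r0, g0, b0, r1, g1, b1, ...]` list; `palette_indices_to_rgb` is a hypothetical helper, and it builds the `"P"` image with `Image.frombytes` rather than an `image_mode` argument.

```python
from typing import Sequence

import numpy as np
from PIL import Image


def palette_indices_to_rgb(indices: np.ndarray, palette: Sequence[int]) -> np.ndarray:
    """Map a 2-D array of palette indices to an (H, W, 3) RGB array via Pillow."""
    height, width = indices.shape
    # Build a "P" (palette) mode image directly from the raw index bytes.
    image = Image.frombytes("P", (width, height), indices.astype(np.uint8).tobytes())
    # Pad the palette to 256 RGB entries so every index has a defined color.
    padded = list(palette) + [0] * (768 - len(palette))
    image.putpalette(padded)
    return np.array(image.convert("RGB"))


indices = np.array([[0, 1], [1, 0]], dtype=np.uint8)
palette = [255, 0, 0, 0, 0, 255]  # index 0 -> red, index 1 -> blue
print(palette_indices_to_rgb(indices, palette)[0, 1])  # [  0   0 255]
```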
src/transformers/models/idefics3/modeling_idefics3.py (2 changes: 1 addition, 1 deletion)
@@ -625,7 +625,7 @@ class Idefics3PreTrainedModel(PreTrainedModel):
     # Copied from transformers.models.idefics2.modeling_idefics2.Idefics2PreTrainedModel._init_weights
     def _init_weights(self, module):
         std = (
-            self.config.text_config.initializer_range
+            self.config.initializer_range
             if hasattr(self.config, "initializer_range")
             else self.config.text_config.initializer_range
         )

Review comment on this hunk:

As it is, this always assigns `self.config.text_config.initializer_range`, while, from what I understand, it should assign `self.config.initializer_range` when `hasattr(self.config, "initializer_range")` is true. Is that right?

Reply from the PR author (Member):

Yes, you're right. This seems to be a mistake in idefics2 as well. Thanks!