Conversation

@pbielak (Collaborator) commented Oct 23, 2025

What does this PR do?

After the Transformers 4.55 update, one of the attention classes failed to compute the attention scores due to an argument mismatch in the `torch.matmul` op. This PR updates the whole `Mllama` code base to be fully aligned with the code in Transformers 4.55. In particular, it uses `_attn_implementation` instead of custom attention classes.
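
For context, here is a minimal sketch of the dispatch pattern this aligns with: a single attention module looks up its forward function in a registry keyed by `config._attn_implementation`, instead of instantiating a separate attention class per backend. The registry and function names below are illustrative, not the actual Transformers or Optimum Habana code.

```python
from typing import Callable, Dict, Optional

import torch
import torch.nn.functional as F


def eager_attention_forward(query, key, value, attention_mask=None, scaling=None):
    # Plain matmul -> softmax -> matmul attention; tensors are (bsz, num_heads, seq_len, head_dim).
    scaling = query.shape[-1] ** -0.5 if scaling is None else scaling
    attn_weights = torch.matmul(query, key.transpose(-2, -1)) * scaling
    if attention_mask is not None:
        attn_weights = attn_weights + attention_mask
    attn_weights = F.softmax(attn_weights, dim=-1)
    return torch.matmul(attn_weights, value), attn_weights


def sdpa_attention_forward(query, key, value, attention_mask=None, scaling=None):
    # Fused scaled-dot-product attention; does not return attention weights.
    attn_output = F.scaled_dot_product_attention(
        query, key, value, attn_mask=attention_mask, scale=scaling  # `scale` needs PyTorch >= 2.1
    )
    return attn_output, None


# Illustrative registry; Transformers keeps a similar mapping internally.
ATTENTION_FUNCTIONS: Dict[str, Callable] = {
    "eager": eager_attention_forward,
    "sdpa": sdpa_attention_forward,
}


class AttentionLayer(torch.nn.Module):
    """One attention class; the backend is picked from the config, not by subclassing."""

    def __init__(self, config):
        super().__init__()
        self.config = config

    def forward(self, query, key, value, attention_mask: Optional[torch.Tensor] = None):
        attn_fn = ATTENTION_FUNCTIONS[self.config._attn_implementation]
        return attn_fn(query, key, value, attention_mask=attention_mask)
```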

@github-actions

The code quality check failed, please run make style.

@pbielak pbielak force-pushed the dev/pbielak/update-mllama-implementation branch 2 times, most recently from 9710363 to a270769 on October 23, 2025 11:27
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@pbielak pbielak self-assigned this Oct 23, 2025
@pbielak pbielak force-pushed the dev/pbielak/update-mllama-implementation branch 2 times, most recently from f5c5287 to 939e520 on October 28, 2025 10:19
@github-actions

The code quality check failed, please run make style.

@pbielak pbielak force-pushed the dev/pbielak/update-mllama-implementation branch from 939e520 to 720f6a7 on October 28, 2025 10:22
After the Transformers 4.55 update, one of the attention classes failed
to compute the attention scores due to an argument mismatch in
the `torch.matmul` op. This commit updates the whole `Mllama` code base
to be fully aligned with the code in Transformers 4.55. In particular, it:
- uses `_attn_implementation` instead of custom attention classes,
- applies the changes from PR [1],
- handles `_attn_implementation` passed to the model,
- fixes argument preparation in `gaudi_fused_sdpa_attention`.

[1] huggingface/transformers#40083
@pbielak pbielak force-pushed the dev/pbielak/update-mllama-implementation branch from 720f6a7 to 79a9ebc on October 28, 2025 13:02
@pbielak pbielak marked this pull request as ready for review October 28, 2025 13:02
@pbielak pbielak requested a review from regisss as a code owner October 28, 2025 13:02
]
args.prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

if model_type == "mllama" and args.use_flash_attention:

Collaborator:
Why not just allow the user to select `attn_implementation` and add a README section about it?

@pbielak (Collaborator, Author):
This is done because this script is also used by other models (such as llava), which are not yet aligned with the `attn_implementation` interface.
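
For reference, a sketch of how the suggestion could look once the other models are aligned as well; the `--attn_implementation` flag and the `model_type` gate are hypothetical, not part of this PR:

```python
import argparse

from transformers import AutoConfig, AutoModelForVision2Seq

parser = argparse.ArgumentParser()
parser.add_argument("--model_name_or_path", default="meta-llama/Llama-3.2-11B-Vision-Instruct")
parser.add_argument(
    "--attn_implementation",
    choices=["eager", "sdpa"],
    default="sdpa",
    help="Attention backend forwarded to from_pretrained (hypothetical flag).",
)
args = parser.parse_args()

config = AutoConfig.from_pretrained(args.model_name_or_path)

if config.model_type == "mllama":
    # Mllama follows the attn_implementation interface after this PR.
    model = AutoModelForVision2Seq.from_pretrained(
        args.model_name_or_path,
        attn_implementation=args.attn_implementation,
    )
else:
    # Other models served by this script (e.g. llava) keep their current attention path.
    model = AutoModelForVision2Seq.from_pretrained(args.model_name_or_path)
```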

) -> tuple[torch.Tensor, None]:
    bsz, num_heads, tgt_len, head_dim = query.shape

    softmax_mode = "fast" if os.getenv("FLASH_ATTENTION_FAST_SOFTMAX") == "1" else "None"

Collaborator:
Since the attention implementation is now separated from the model, we don't have to use env vars. Can you explore whether it's possible to use kwargs instead?

@pbielak (Collaborator, Author):
Makes sense - I will have a look at it
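
A possible direction for that, as a rough sketch: expose `softmax_mode` as a keyword argument on the fused-SDPA wrapper and keep the env var only as a backward-compatible fallback. The signature below is hypothetical and the HPU fused kernel call is replaced by a stand-in; the real `gaudi_fused_sdpa_attention` arguments may differ.

```python
import os
from typing import Optional

import torch
import torch.nn.functional as F


def gaudi_fused_sdpa_attention(
    query: torch.Tensor,
    key: torch.Tensor,
    value: torch.Tensor,
    attention_mask: Optional[torch.Tensor] = None,
    softmax_mode: Optional[str] = None,  # hypothetical kwarg replacing FLASH_ATTENTION_FAST_SOFTMAX
    **kwargs,
) -> tuple[torch.Tensor, None]:
    # Prefer the explicit kwarg; fall back to the env var only if the caller did not set it.
    if softmax_mode is None:
        softmax_mode = "fast" if os.getenv("FLASH_ATTENTION_FAST_SOFTMAX") == "1" else "None"

    # Stand-in for the HPU fused kernel; `softmax_mode` would be forwarded to that kernel.
    attn_output = F.scaled_dot_product_attention(query, key, value, attn_mask=attention_mask)
    return attn_output, None
```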
