Fix Aria tests #37444

Merged · 18 commits into huggingface:main · Apr 24, 2025

Conversation

jiqing-feng (Contributor)

Reproduce:
TRANSFORMERS_TEST_DEVICE=cuda RUN_SLOW=1 pytest -rA tests/models/aria/test_modeling_aria.py::AriaForConditionalGenerationIntegrationTest::test_batched_generation

Error log:

FAILED tests/models/aria/test_modeling_aria.py::AriaForConditionalGenerationIntegrationTest::test_batched_generation - RuntimeError: INDICES element is out of DATA bounds, id=100352 axis_dim=100352

I found 4 issues in this case:

  1. The pad_token_id equals vocab_size, which causes the out-of-bounds error. This happens because both config.json and added_tokens.json define a pad token.
  2. The input images and input prompts should have the same batch size.
  3. The input pixel values should be cast to model.dtype, as shown in the model hub usage example.
  4. This model uses torch MHA, which reads the weights directly and applies torch.matmul without dequantizing them, so we should skip quantizing the MHA weights.

After fixing these four issues, the test runs correctly (a rough sketch of the input-side fixes is shown below).
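For reference, here is a rough sketch of what fixes 2-4 could look like (matching prompt/image batch sizes, casting pixel values, and skipping quantization of the attention weights). The checkpoint id and chat-template usage mirror the test below; the image URLs, the chat-message structure, and the module name passed to llm_int8_skip_modules are illustrative assumptions, not the exact values used in the test:

    import requests
    import torch
    from PIL import Image
    from transformers import AriaForConditionalGeneration, AutoProcessor, BitsAndBytesConfig

    model_id = "rhymes-ai/Aria"

    # Fix 4 (sketch): keep the torch-MHA weights unquantized, since torch.matmul reads them
    # directly without dequantizing. The module name below is an assumption for illustration.
    quantization_config = BitsAndBytesConfig(load_in_4bit=True, llm_int8_skip_modules=["multihead_attn"])
    model = AriaForConditionalGeneration.from_pretrained(
        model_id, quantization_config=quantization_config, torch_dtype=torch.bfloat16, device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(model_id)

    image1 = Image.open(requests.get("https://llava-vl.github.io/static/images/view.jpg", stream=True).raw)
    image2 = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)

    messages = [
        {"role": "user", "content": [{"type": "image"}, {"type": "image"},
                                     {"type": "text", "text": "What's the difference of two images?"}]},
        {"role": "user", "content": [{"type": "image"},
                                     {"type": "text", "text": "Describe the image."}]},
    ]
    prompts = [processor.apply_chat_template([message], add_generation_prompt=True) for message in messages]

    # Fix 2: the nested image list has the same outer batch size (2) as the prompt list.
    images = [[image1, image2], [image2]]

    inputs = processor(text=prompts, images=images, padding=True, return_tensors="pt").to(model.device)
    # Fix 3: cast pixel values to the model dtype before generation.
    inputs["pixel_values"] = inputs["pixel_values"].to(model.dtype)

    output = model.generate(**inputs, max_new_tokens=20)
    print(processor.batch_decode(output, skip_special_tokens=True))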

@github-actions github-actions bot marked this pull request as draft April 11, 2025 06:49
github-actions bot (Contributor)

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

@jiqing-feng jiqing-feng marked this pull request as ready for review April 11, 2025 06:52
@github-actions github-actions bot requested a review from ydshieh April 11, 2025 06:52
@jiqing-feng (Contributor, Author)

Hi @ydshieh @SunMarc. Please review this PR. Thanks!

@Rocketknight1 (Member)

cc @zucchini-nlp for VLMs as well

@zucchini-nlp (Member)

run-slow: aria

github-actions bot (Contributor)

This comment contains run-slow, running the specified jobs:

models: ['models/aria']
quantizations: [] ...

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ydshieh (Collaborator) left a comment

Hi @jiqing-feng, thank you a lot. I left a few comments.

processor = AutoProcessor.from_pretrained("rhymes-ai/Aria")
processor.tokenizer.pad_token_id = model.config.pad_token_id
Collaborator

The pad_token_id equals vocab_size, which causes the out-of-bounds error. This happens because both config.json and added_tokens.json define a pad token.

Do you know if this is also an issue on the Hub repository which needs a fix too?

Member

+1, if the pad token is OOV for the official weights, we'll see errors, for example in batched generation where one sequence emits EOS earlier than the other.
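To make the failure mode concrete, here is a tiny, hypothetical sketch (not tied to Aria's actual kernels, which raise the RuntimeError shown in the log) of why an id equal to vocab_size is out of bounds for the embedding table:

    import torch

    vocab_size = 100352  # from the error log above
    embedding = torch.nn.Embedding(num_embeddings=vocab_size, embedding_dim=8)

    # A pad id equal to vocab_size points one row past the end of the embedding table.
    input_ids = torch.tensor([[1, 2, vocab_size]])
    try:
        embedding(input_ids)
    except (IndexError, RuntimeError) as err:
        print(f"out-of-bounds lookup: {err}")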

Contributor Author

I can see there are 2 different pad tokens in the model hub and am not sure if that's reasonable, but I can confirm that transformers doesn't handle this case. We can either change the model hub or handle this kind of case in transformers.

Member

I see now, the tokenizer has a new pad token added and the embeddings were not resized for it. Yes, can you open a PR on the hub repo to remove <pad> from the tokenizer and assign the pad token to the <end_of_text> token? Thanks

Contributor Author

done here

Collaborator

The tokenizer_config.json there still has

    "100352": {
      "content": "<pad>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }

I think what @zucchini-nlp means is: load the tokenizer from the Hub, assign pad_token_id to 2, save it, and open a PR with the newly saved files.

But what does <end_of_text> mean here, @zucchini-nlp? Do you simply mean the eos_token_id, which is 2 in the config?
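A rough sketch of that load-assign-save flow, assuming the tokenizer's own EOS token matches the eos_token_id=2 from the config, that push_to_hub(..., create_pr=True) is used to open the Hub PR, and with an arbitrary local directory name:

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("rhymes-ai/Aria")

    # Reuse the EOS token (id 2 in the config) as the pad token instead of the extra <pad> entry.
    tokenizer.pad_token = tokenizer.eos_token

    tokenizer.save_pretrained("aria-tokenizer-fixed")        # inspect the re-saved files locally first
    tokenizer.push_to_hub("rhymes-ai/Aria", create_pr=True)  # open a PR on the Hub repo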

@zucchini-nlp (Member), Apr 16, 2025

Exactly, that is the token with id=2, in other words the id specified in the config file. We need to use the same pad token in the config and in the tokenizer for consistency; otherwise it might raise errors, as already seen in this PR.

@jiqing-feng the PR you opened is great but unfortunately doesn't solve the issue. Removing the added_tokens file is not going to remove <pad> from the vocab entirely. I'd recommend doing as suggested by @ydshieh: load, assign, and save back. Currently we still have <pad> in special_tokens_map.json and tokenizer.json.

Contributor Author

This change correctly assigns the pad token id to 2; please check it. Thanks

Comment on lines 461 to 479
prompts = [processor.apply_chat_template([message], add_generation_prompt=True) for message in messages]
images = [[image1, image2], [image2]]
inputs = processor(text=prompts, images=images, padding=True, return_tensors="pt").to(model.device)
inputs["pixel_values"] = inputs["pixel_values"].to(model.dtype)

EXPECTED_OUTPUT = (
    [
        "<|im_start|>user\n<fim_prefix><fim_suffix> <image>\n <image>\n USER: What's the difference of two images?\n ASSISTANT:<fim_prefix><fim_suffix> <image>\n USER: Describe the image.\n ASSISTANT:<|im_end|>\n <|im_start|>assistant\n The first image features a cute, light-colored puppy sitting on a paved surface with",
        "<|im_start|>user\n<fim_prefix><fim_suffix> <image>\n USER: Describe the image.\n ASSISTANT:<|im_end|>\n <|im_start|>assistant\n The image shows a young alpaca standing on a grassy hill. The alpaca has",
    ],  # cpu output
    [
        "<|im_start|>user\n<fim_prefix><fim_suffix> <image>\n <image>\n USER: What's the difference of two images?\n ASSISTANT:<fim_prefix><fim_suffix> <image>\n USER: Describe the image.\n ASSISTANT:<|im_end|>\n <|im_start|>assistant\n The first image features a cute, light-colored puppy sitting on a paved surface with",
        "<|im_start|>user\n<fim_prefix><fim_suffix> <image>\n USER: Describe the image.\n ASSISTANT:<|im_end|>\n <|im_start|>assistant\n The image shows a young alpaca standing on a patch of ground with some dry grass. The",
    ],  # cuda output
    [
        "<|im_start|>user\n<fim_prefix><fim_suffix> <image>\n <image>\n USER: What's the difference of two images?\n ASSISTANT:<fim_prefix><fim_suffix> <image>\n USER: Describe the image.\n ASSISTANT:<|im_end|>\n <|im_start|>assistant\n The first image shows a cute, light-colored puppy sitting on a paved surface with",
        "<|im_start|>user\n<fim_prefix><fim_suffix> <image>\n USER: Describe the image.\n ASSISTANT:<|im_end|>\n <|im_start|>assistant\n The image shows a young alpaca standing on a grassy hill. The alpaca has",
    ],  # xpu output
)
Collaborator

Could you try to apply changes similar to those in https://github.com/huggingface/transformers/pull/37126/files, specifically the changes in tests/models/llama/test_modeling_llama.py there 🙏
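For context, a rough, hypothetical sketch of the device-keyed expectation pattern that PR applies to the llama tests; the dict and names below are illustrative only, not the actual helper introduced there:

    from transformers.testing_utils import torch_device

    # Hypothetical: pick the expected strings for whichever accelerator the test runs on,
    # falling back to the CPU reference when the device is not listed.
    EXPECTED_OUTPUTS_PER_DEVICE = {
        "cpu": ["expected text for cpu ..."],
        "cuda": ["expected text for cuda ..."],
        "xpu": ["expected text for xpu ..."],
    }
    expected_output = EXPECTED_OUTPUTS_PER_DEVICE.get(torch_device, EXPECTED_OUTPUTS_PER_DEVICE["cpu"])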

Contributor Author

Currently we haven't found any failing tests on llama. To avoid confusion, maybe we can do it in our next PR. Let's focus on Aria in this PR.

@zucchini-nlp (Member)

Hm, slow tests are OOM-ing for aria, do you think we need to add @require_torch_large_gpu @ydshieh?

@ydshieh (Collaborator) commented Apr 14, 2025

Hm, slow tests are OOM-ing for aria, do you think we need to add @require_torch_large_gpu @ydshieh?

Adding @require_torch_large_gpu means it will not be run in the daily CI, only on some occasions that team members check with less focus.

However, all those integration tests already use 4-bit and there is no smaller model size for this architecture. Let's add it anyway.

Thank you for bringing this issue up 👍
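A minimal sketch of what adding the decorator could look like on the integration test class (the class body is elided; require_torch_large_gpu is assumed to be importable from transformers.testing_utils, as discussed above):

    import unittest

    from transformers.testing_utils import require_torch_large_gpu, slow


    @slow
    @require_torch_large_gpu  # skip on daily-CI runners without a large-memory GPU
    class AriaForConditionalGenerationIntegrationTest(unittest.TestCase):
        ...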

@jiqing-feng (Contributor, Author)

Hi @ydshieh @zucchini-nlp. I have addressed your comments, please review again. Thanks!

@zucchini-nlp (Member) left a comment

Thanks @jiqing-feng! Looks good to me, left one tiny comment so we get better error messages when tests fail.

Approving, but before merging can you update the hub config as suggested in the comment above, and remove the line processor.tokenizer.pad_token_id = model.config.pad_token_id from the tests (if the pad token is assigned correctly, we shouldn't need to explicitly set it to eos again)?

@jiqing-feng (Contributor, Author)

Thanks @jiqing-feng! Looks good to me, left one tiny comment so we get better error messages when tests fail.

Approving, but before merging can you update the hub config as suggested in the comment above, and remove the line processor.tokenizer.pad_token_id = model.config.pad_token_id from the tests (if the pad token is assigned correctly, we shouldn't need to explicitly set it to eos again)?

Yes, I will remove it once 20 is merged

@jiqing-feng (Contributor, Author)

Hi @ydshieh @zucchini-nlp, the model hub PR is merged and I have also removed the pad_token setting. The failing tests seem unrelated to my changes.

@zucchini-nlp (Member)

@ydshieh can you merge?

@zucchini-nlp (Member)

Tests were fixed on main; a rebase should help now.

@jiqing-feng (Contributor, Author)

Hi @zucchini-nlp. Yes, the tests all passed after I rebased onto the main branch. This PR is ready to be merged.

@zucchini-nlp zucchini-nlp merged commit b7f7aa7 into huggingface:main Apr 24, 2025
14 checks passed
rziga pushed a commit to rziga/transformers that referenced this pull request Apr 24, 2025
* update aria tests

Signed-off-by: jiqing-feng <[email protected]>

* add cuda tests

Signed-off-by: jiqing-feng <[email protected]>

* check outputs for cpu and cuda and xpu

Signed-off-by: jiqing-feng <[email protected]>

* check outputs for cpu and cuda and xpu

Signed-off-by: jiqing-feng <[email protected]>

* check outputs for cpu and cuda and xpu

Signed-off-by: jiqing-feng <[email protected]>

* check output for each device

Signed-off-by: jiqing-feng <[email protected]>

* fix style

Signed-off-by: jiqing-feng <[email protected]>

* fix style

Signed-off-by: jiqing-feng <[email protected]>

* fix xpu output

Signed-off-by: jiqing-feng <[email protected]>

* add comments and use assert list equal

Signed-off-by: jiqing-feng <[email protected]>

* rm pad token assign

Signed-off-by: jiqing-feng <[email protected]>

---------

Signed-off-by: jiqing-feng <[email protected]>
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025