Fixed switched token_type_ids and attention_mask #412
I was having the same error as mentioned in #338, where I could not export my model with model_base `stsb-xlm-roberta-base`. After some debugging, I noticed that the `attention_mask` and `token_type_ids` were switched in the function `forward` (line 50) in `setfit/exporters/onnx.py`. The error then occurs because we try to look up both the token_type_id embedding with index 0 and the one with index 1, but there is only one embedding in the matrix. I believe this did not happen with other model bases because they have more than two token_type embeddings.

However, I must confess that I was not yet able to test this fix with other models that previously worked. We should definitely do this before we merge this code. To make the code safer, I also made use of kwargs when calling `self.model_body` instead of positional arguments. In my case, I was able to export the model after this small fix.
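For illustration, here is a minimal sketch of the kind of change described above. The wrapper class below is a stand-in, not the actual code from `setfit/exporters/onnx.py`, and it assumes `self.model_body` is a Hugging Face transformer that accepts `input_ids`, `attention_mask`, and `token_type_ids` as keyword arguments:

```python
import torch


class OnnxWrapper(torch.nn.Module):
    """Hypothetical stand-in for the export wrapper in setfit/exporters/onnx.py."""

    def __init__(self, model_body):
        super().__init__()
        self.model_body = model_body

    def forward(self, input_ids, attention_mask, token_type_ids):
        # Before the fix: positional arguments, so attention_mask and
        # token_type_ids could end up swapped relative to the body's signature.
        # out = self.model_body(input_ids, token_type_ids, attention_mask)

        # After the fix: keyword arguments make the mapping explicit,
        # regardless of the body's positional argument order.
        out = self.model_body(
            input_ids=input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
        )
        return out
```

With keyword arguments, a body whose signature orders the masks differently can no longer silently receive `token_type_ids` in place of `attention_mask`, which is what triggered the out-of-range embedding lookup for `stsb-xlm-roberta-base`.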