optimize the performance of FlashBert Path for HPU #575

kaixuanliu · 2025-04-09T10:01:45Z

Benchmarked with the WhereIsAI/UAE-Large-V1 model. Below is the throughput (seq/s) comparison by batch size (bs):

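| Batch size | Before (seq/s) | After (seq/s) |
|---|---|---|
| 1 | 199.65 | 219.61 |
| 2 | 239.61 | 284.78 |
| 4 | 321.37 | 373.13 |
| 8 | 506.40 | 549.61 |
| 16 | 759.07 | 822.17 |
| 32 | 1028.31 | 1285.57 |
| 64 | 1130.67 | 1708.73 |
| 128 | OOM | 1030.06 |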
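
As a reading aid, the relative gain per batch size can be worked out from the figures above. The snippet below is only arithmetic on the reported numbers. The names `throughput`, `speedup`, and `speedups` are illustrative and not part of the PR.

```python
# Throughput in seq/s, copied from the benchmark table above.
# None marks the run that hit OOM before the change.
throughput = {
    1: (199.65, 219.61),
    2: (239.61, 284.78),
    4: (321.37, 373.13),
    8: (506.40, 549.61),
    16: (759.07, 822.17),
    32: (1028.31, 1285.57),
    64: (1130.67, 1708.73),
    128: (None, 1030.06),
}


def speedup(before, after):
    """Return after / before, or None when there is no baseline (OOM)."""
    if before is None:
        return None
    return after / before


speedups = {bs: speedup(b, a) for bs, (b, a) in throughput.items()}

for bs, s in speedups.items():
    label = "n/a (baseline OOM)" if s is None else f"{s:.2f}x"
    print(f"bs={bs:<4d} {label}")
```

On these figures the gain ranges from about 1.08x at batch size 16 to about 1.51x at batch size 64, and batch size 128 only runs after the change.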

kaixuanliu · 2025-04-09T11:54:12Z

@regisss @Narsil pls help review, thx!

Signed-off-by: Liu, Kaixuan <[email protected]>

Narsil

LGTM

Narsil · 2025-04-10T12:30:55Z

backends/python/server/text_embeddings_server/models/flash_bert.py

@@ -323,19 +323,21 @@ def batch_type(self) -> Union[FlashBatch, PaddedBatch]:
    def embed(self, batch: Union[FlashBatch, PaddedBatch]) -> List[Embedding]:
        if isinstance(batch, PaddedBatch):
            input_lens = batch.attention_mask.cumsum(-1)[:, -1].to(torch.int32)
-            max_input_lens = input_lens.max().item()
+            max_input_lens = 0  # This value will not be used


Suggested change

max_input_lens = 0 # This value will not be used

NIT

Hi, sorry, there may be a misunderstanding. My comment "This value will not be used" means that this variable can hold any value, but we need to keep it here because it is passed on at L352.

@Narsil , can you help double check?

I guess there are cases where the model's forward pass does need a correct value for this, right? Otherwise, why not remove it there?

Well, this is a common file shared by CPU/XPU and HPU devices. On CPU/XPU, we do need this variable with its exact meaning, while on HPU we do not have a real varlen_attention API, so we pass attn_mask to replace its functionality. Here we just need to set an arbitrary value for max_input_lens. This line cannot be deleted, as we need to pass it to L352.
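
To make the point above concrete, here is a toy, pure-Python sketch of a shared signature in which `max_input_lens` only matters on the variable-length path. It is not the code in `flash_bert.py`. The functions `input_lengths` and `visible_keys` are hypothetical, and the branch on `attn_mask` is an assumption that mirrors the explanation above, not the actual HPU implementation.

```python
from typing import List, Optional


def input_lengths(attention_mask: List[List[int]]) -> List[int]:
    """Count real tokens per row of a 0/1 padding mask.

    For a 0/1 mask this equals the last element of each row's cumulative sum.
    """
    return [sum(row) for row in attention_mask]


def visible_keys(
    input_lens: List[int],
    max_input_lens: int,
    attn_mask: Optional[List[List[int]]] = None,
) -> List[int]:
    """Hypothetical stand-in for a forward() shared by several devices.

    Returns, per sequence, how many key positions attention may look at.
    """
    if attn_mask is not None:
        # Mask-based path (assumed HPU-like): the mask alone decides what is
        # visible, so max_input_lens is accepted but never read.
        return [sum(row) for row in attn_mask]
    # Variable-length path (assumed CPU/XPU-like): the true maximum is needed.
    if max_input_lens < max(input_lens):
        raise ValueError("varlen path needs the true maximum length")
    return list(input_lens)


mask = [[1, 1, 1, 0], [1, 1, 0, 0]]
lens = input_lengths(mask)

# Mask-based call: a placeholder of 0 is harmless.
masked_result = visible_keys(lens, 0, attn_mask=mask)

# Varlen call: the real maximum has to be supplied.
varlen_result = visible_keys(lens, max(lens))
```

The placeholder keeps the call site uniform across devices. The cost is that the argument carries no meaning on the mask-based path, which is what the comment in the diff is meant to flag.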

Got it 👍

kaixuanliu · 2025-04-16T00:58:07Z

@regisss @Narsil Hi, can you help merge this PR?

optimize the performance of FlashBert Path for HPU

fc9dee1

Signed-off-by: Liu, Kaixuan <[email protected]>

Narsil approved these changes Apr 10, 2025

regisss approved these changes Apr 10, 2025

regisss merged commit 5a791e5 into huggingface:main Apr 16, 2025
2 of 13 checks passed
