Commit 4eea5e1

Add tuned parameters for Qwen/Qwen2.5-32B (#8966)
Signed-off-by: Yarong Mu <[email protected]>
1 parent 4583051 commit 4eea5e1

1 file changed: +7 -0 lines changed


torch_xla/experimental/tuned_block_sizes.py

Lines changed: 7 additions & 0 deletions
@@ -33,6 +33,7 @@ def _simplify_key_ragged_paged_attention(q_head_num, kv_head_num, token_num,
 
 
 # TODO: add more tuned block sizes in the table
+# q_head_num, kv_head_num, token_num, max_model_len
 _ragged_attention_table = {
     (32, 8, 4096, 2048): (128, 64),
     (4, 1, 4096, 2048): (128, 128),
@@ -58,6 +59,12 @@ def _simplify_key_ragged_paged_attention(q_head_num, kv_head_num, token_num,
     (4, 1, 2048, 128): (32, 32),
     (32, 8, 1024, 128): (32, 32),
     (1, 1, 1024, 128): (32, 32),
+    (10, 2, 4096, 2048): (128, 32),  # Qwen/Qwen2.5-32B
+    (10, 2, 2048, 2048): (128, 32),  # Qwen/Qwen2.5-32B
+    (10, 2, 1024, 2048): (128, 32),  # Qwen/Qwen2.5-32B
+    (5, 1, 4098, 2048): (128, 64),  # Qwen/Qwen2.5-32B
+    (5, 1, 2048, 2048): (128, 32),  # Qwen/Qwen2.5-32B
+    (5, 1, 1024, 2048): (128, 32),  # Qwen/Qwen2.5-32B
 }
 
 
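For context, each key in the table is a (q_head_num, kv_head_num, token_num, max_model_len) tuple describing a ragged paged attention workload shape, and each value is a pair of tuned kernel block sizes. Below is a minimal sketch of how such a table might be consulted. It is an assumption for illustration only: the helper name get_tuned_block_sizes, the default fallback, and the interpretation of the value pair are not taken from the diff, and the real module also appears to normalize shapes first via the _simplify_key_ragged_paged_attention helper shown in the hunk header, which this sketch omits.

# Minimal sketch, assuming the table maps
# (q_head_num, kv_head_num, token_num, max_model_len) -> tuned block sizes.
# Names and the fallback value are hypothetical, for illustration only.

_DEFAULT_BLOCK_SIZES = (128, 128)  # assumed fallback for shapes with no tuned entry

_ragged_attention_table = {
    (32, 8, 4096, 2048): (128, 64),
    (10, 2, 4096, 2048): (128, 32),  # Qwen/Qwen2.5-32B (added in this commit)
    (5, 1, 2048, 2048): (128, 32),   # Qwen/Qwen2.5-32B (added in this commit)
}


def get_tuned_block_sizes(q_head_num, kv_head_num, token_num, max_model_len):
  """Look up tuned block sizes for a workload shape, falling back to a default."""
  key = (q_head_num, kv_head_num, token_num, max_model_len)
  return _ragged_attention_table.get(key, _DEFAULT_BLOCK_SIZES)


# Example: one of the Qwen/Qwen2.5-32B shapes added in this commit.
print(get_tuned_block_sizes(10, 2, 4096, 2048))  # (128, 32)
print(get_tuned_block_sizes(7, 7, 512, 64))      # falls back to (128, 128)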