Add tuned parameters for Qwen/Qwen2.5-32B #8966

Merged
4 commits merged into pytorch:master on Apr 12, 2025

Conversation

yarongmu-google
Contributor

| num_q_heads | num_kv_heads | max_num_batched_tokens | max_num_seqs | input_len | output_len | best_block_size |
|---|---|---|---|---|---|---|
| 10 | 2 | 4096 | 128 | 1800 | 128 | (128, 32) |
| 5 | 1 | 4096 | 128 | 1800 | 128 | (128, 64) |
| 10 | 2 | 1024 | 1024 | 2000 | 48 | (128, 32) |
| 5 | 1 | 1024 | 1024 | 2000 | 48 | (128, 32) |
| 10 | 2 | 2048 | 128 | 1800 | 128 | (128, 32) |
| 5 | 1 | 2048 | 128 | 1800 | 128 | (128, 32) |
| 10 | 2 | 1024 | 128 | 1800 | 128 | (128, 32) |
| 5 | 1 | 1024 | 128 | 1800 | 128 | (128, 32) |
| 10 | 2 | 4096 | 1024 | 2000 | 48 | (128, 32) |
| 5 | 1 | 4096 | 1024 | 2000 | 48 | (128, 64) |
| 10 | 2 | 2048 | 1024 | 2000 | 48 | (128, 32) |
| 5 | 1 | 2048 | 1024 | 2000 | 48 | (128, 32) |
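
For readers who want to see how results like these might be consumed, here is a minimal sketch of a lookup table keyed on the tuned configuration. It is not the code from this PR: `TUNED_BLOCK_SIZES`, `DEFAULT_BLOCK_SIZE`, and `lookup_block_size` are hypothetical names, and the fallback value is an assumption (the most common result in the table). `input_len` and `output_len` are left out of the key because, in the rows above, they vary only with `max_num_seqs`.

```python
from typing import Dict, Tuple

# Key: (num_q_heads, num_kv_heads, max_num_batched_tokens, max_num_seqs).
# Value: best_block_size exactly as reported in the table above.
TunedKey = Tuple[int, int, int, int]
BlockSize = Tuple[int, int]

# Hypothetical table name; the entries are copied from the PR description.
TUNED_BLOCK_SIZES: Dict[TunedKey, BlockSize] = {
    (10, 2, 4096, 128): (128, 32),
    (5, 1, 4096, 128): (128, 64),
    (10, 2, 1024, 1024): (128, 32),
    (5, 1, 1024, 1024): (128, 32),
    (10, 2, 2048, 128): (128, 32),
    (5, 1, 2048, 128): (128, 32),
    (10, 2, 1024, 128): (128, 32),
    (5, 1, 1024, 128): (128, 32),
    (10, 2, 4096, 1024): (128, 32),
    (5, 1, 4096, 1024): (128, 64),
    (10, 2, 2048, 1024): (128, 32),
    (5, 1, 2048, 1024): (128, 32),
}

# Assumption: fall back to the most frequent tuned value for unknown configs.
DEFAULT_BLOCK_SIZE: BlockSize = (128, 32)


def lookup_block_size(
    num_q_heads: int,
    num_kv_heads: int,
    max_num_batched_tokens: int,
    max_num_seqs: int,
) -> BlockSize:
    """Return the tuned block size for a configuration, or the default."""
    key = (num_q_heads, num_kv_heads, max_num_batched_tokens, max_num_seqs)
    return TUNED_BLOCK_SIZES.get(key, DEFAULT_BLOCK_SIZE)
```

With this sketch, `lookup_block_size(5, 1, 4096, 1024)` returns the tuned entry for that row, while a configuration that was never tuned falls back to `DEFAULT_BLOCK_SIZE`.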

Collaborator

yaochengji left a comment

LGTM, thanks, Yarong!

Signed-off-by: Yarong Mu <[email protected]>
yaochengji enabled auto-merge (squash) April 11, 2025 22:06
yaochengji merged commit 4eea5e1 into pytorch:master Apr 12, 2025
24 checks passed

3 participants