initial support blackwell #747
base: main
Conversation
Hi @johnnynunez, thanks for bringing this up! Could we hold this PR and wait for the official release of torch 2.6 and the Blackwell software stack?
Yeah, for sure! I've put codegen for the whole Blackwell family into PyTorch.
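For context, a minimal sketch of what adding Blackwell codegen means for a source build (this is not code from the PR): PyTorch's JIT extension builder honors the TORCH_CUDA_ARCH_LIST environment variable, so including the Blackwell compute capabilities (10.0 and 12.0, listed at the end of this thread) extends code generation to those targets. The extension name and source file below are hypothetical placeholders.

```python
import os

# Must be set before the extension is compiled. Entries are CUDA compute
# capabilities; 10.0 (B100/B200) and 12.0 (RTX 50) cover Blackwell.
# Note: building for these targets requires a sufficiently new CUDA toolkit.
os.environ["TORCH_CUDA_ARCH_LIST"] = "8.0;9.0;10.0;12.0"

from torch.utils.cpp_extension import load

# JIT-compiles the CUDA source for every architecture listed above.
# "my_kernels" and "my_kernels.cu" are placeholders, not files from this PR.
ext = load(name="my_kernels", sources=["my_kernels.cu"], verbose=True)
```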
This is huge!
@yzh119 can you merge?
@johnnynunez as a reminder: #747 (comment)
Well, sure... PyTorch is coming this week: M6: Release Day (1/29/25)
Is there a prebuilt that can work for B200?
What performance improvement should we expect out of the box on B200 compared to H100 SXM5 for different model sizes (8B, 70B, 400B)? I expected to get some benefit even for 8B (e.g. 30% at low batch sizes), but I am seeing no benefit with Llama 8B. Also, is there any planned or in-progress work on flashinfer utilizing B200-specific capabilities (e.g. the Tensor Memory Accelerator)?
10.0: Blackwell B100/B200
12.0: Blackwell RTX 50 series
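For reference, a quick runtime check, a sketch using PyTorch's public API rather than anything from this PR, that maps those compute capabilities to the detected GPU:

```python
import torch

# (major, minor) compute capability of device 0,
# e.g. (10, 0) on a B200 or (12, 0) on an RTX 50 series card.
major, minor = torch.cuda.get_device_capability(0)

if (major, minor) == (10, 0):
    print("Blackwell datacenter (B100/B200), sm_100")
elif (major, minor) == (12, 0):
    print("Blackwell consumer (RTX 50 series), sm_120")
else:
    print(f"Other architecture: sm_{major}{minor}")
```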