Description
Hi, thank you for releasing the dataset for your inspiring work. I'm trying to reproduce the speedups over torch/torch.compile reported in your paper (Appendix C.2, Table 4, on H100).
I'm using the script run_kernel.py to evaluate all the kernels in your highlighted folder, and I've been able to reproduce most of the results. However, for mnist_cross_entropy forward, my measured speedup over torch.compile is very different from the paper's (8.96 measured vs. 24.87 reported). Is this expected? For example, is the kernel used in the paper for this task different from the one in the highlighted folder?
Attached is a screenshot of my reproduced results. Thank you very much for your time and help!
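In case it helps when comparing numbers, here is a minimal sketch of how a speedup ratio like the ones above can be computed. It is only an illustration and not the logic of run_kernel.py: the helper names `time_callable` and `speedup`, the warmup and iteration counts, and the use of medians are all assumptions of mine.

```python
import statistics
import time
from typing import Callable, List


def time_callable(fn: Callable[[], object], warmup: int = 3, iters: int = 20) -> List[float]:
    """Return per-call wall-clock times in seconds, measured after warmup calls.

    Hypothetical helper. For GPU kernels, fn should synchronize the device
    before returning (e.g. by calling torch.cuda.synchronize()); otherwise
    only the kernel launch is timed.
    """
    for _ in range(warmup):
        fn()  # warmup calls absorb one-time costs such as compilation
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples


def speedup(baseline_times: List[float], candidate_times: List[float]) -> float:
    """Speedup of the candidate over the baseline, as a ratio of median times."""
    return statistics.median(baseline_times) / statistics.median(candidate_times)
```

With this definition, a value of 8.96 would mean the median torch.compile time is 8.96 times the median kernel time. Differences in warmup, iteration count, or synchronization between two setups could shift the ratio.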
