Question on Reproducing Speedup Number in the Paper

Hi, thank you for releasing the dataset for your inspiring work. I'm trying to reproduce the speedups over torch/torch.compile reported in your [paper](https://arxiv.org/pdf/2509.14279) (Appendix C.2, Table 4, on H100).

I'm using the script `run_kernel.py` to evaluate all the kernels in your `highlighted` folder, and I've been able to reproduce most of the results. However, for `mnist_cross_entropy forward`, **my measured speedup over torch.compile is very different from the paper's (8.96 vs. 24.87)**. Is this expected? For example, is the kernel used in the paper for this task different from the one in the `highlighted` folder?

Attached is a screenshot of my reproduced results. Thank you very much for your time and help!

<img width="1295" height="296" alt="Image" src="https://github.com/user-attachments/assets/5d418ec0-0d28-4c2d-8f06-2e3d67ec25a6" />

 
