When running the model--especially in a serverless environment where there may be many cold starts--it would be desirable to cache the auto-tuning results. Is this possible?
Thank you for the issue. Yes, this is possible; I have it in progress.
Interesting. Last time I used Triton, I wasn't sure whether it exposed an API for caching autotune results--I'm guessing it does now? I might take a stab at hacking on this myself, if I can find the API, since I'm trying to ship something soon.
I wish they did. They do, however, have a `cache_key` attribute on kernels. So I was going to throw something together that stores the results as JSON in a cache directory, keyed off `cache_key`, so it only reuses results when the environment and kernel source are the same (just like Triton's own kernel-compilation cache), e.g. `llama_mlp_fused_4_kernel.fn.cache_key`. A rough sketch of the idea is below.
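For anyone who wants to hack on this in the meantime, here's a minimal sketch. It leans on undocumented Triton internals: the `Autotuner`'s `cache` dict (which maps key tuples to `triton.Config` objects) and `Config`'s `kwargs`/`num_warps`/`num_stages` attributes. Those may shift between Triton versions, so treat this as illustrative rather than a drop-in implementation; the cache directory and function names are placeholders.

```python
import ast
import json
import os

import triton

# Hypothetical cache location; pick whatever fits your deployment.
CACHE_DIR = os.path.expanduser("~/.cache/triton-autotune")


def _cache_path(kernel):
    # `kernel` is an Autotuner wrapping a JITFunction. fn.cache_key changes
    # whenever the kernel source or environment changes, so stale results
    # are never reused (mirroring Triton's compilation cache).
    return os.path.join(CACHE_DIR, f"{kernel.fn.cache_key}.json")


def save_autotune_results(kernel):
    """Serialize the autotuner's best-config cache to JSON."""
    entries = {
        # Autotuner.cache is an undocumented internal: key tuple -> Config.
        repr(key): {
            "kwargs": cfg.kwargs,
            "num_warps": cfg.num_warps,
            "num_stages": cfg.num_stages,
        }
        for key, cfg in kernel.cache.items()
    }
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(_cache_path(kernel), "w") as f:
        json.dump(entries, f)


def load_autotune_results(kernel):
    """Pre-populate the autotuner's cache from a previous run, if present."""
    path = _cache_path(kernel)
    if not os.path.exists(path):
        return
    with open(path) as f:
        entries = json.load(f)
    for key, cfg in entries.items():
        # Assumes key tuples contain only primitives (ints/strings), which
        # ast.literal_eval can round-trip from repr().
        kernel.cache[ast.literal_eval(key)] = triton.Config(
            cfg["kwargs"],
            num_warps=cfg["num_warps"],
            num_stages=cfg["num_stages"],
        )
```

Usage would be roughly `load_autotune_results(llama_mlp_fused_4_kernel)` at process startup and `save_autotune_results(llama_mlp_fused_4_kernel)` after a warmup pass; since everything is keyed on `fn.cache_key`, a change to the kernel source or environment just misses the cache and re-tunes.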