When running the model--especially in a serverless environment where there may be many cold starts--it would be desirable to cache the auto-tuning results. Is this possible?
Thank you for the issue. Yes, this is possible; I have it in progress.
Interesting. Last time I used Triton, I wasn't sure whether it exposed an API for caching autotune results--I'm guessing it does now? I might take a stab at hacking on this myself, if I can find the API, since I'm trying to ship something soon.
I wish they did. They do, however, have a `cache_key` attribute on kernels. So I was going to throw something together that stores the results as JSON in a cache directory, keyed off `cache_key`, so it only reuses results when the environment and kernel source are the same (just like Triton's own kernel-compilation cache), e.g. `llama_mlp_fused_4_kernel.fn.cache_key`. A rough sketch of the idea is below.
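For anyone who wants to hack on this in the meantime, here's a minimal sketch. It leans on undocumented Triton internals: the `Autotuner`'s `cache` dict (which maps key tuples to `triton.Config` objects) and `Config`'s `kwargs`/`num_warps`/`num_stages` attributes. Those may shift between Triton versions, so treat this as illustrative rather than a drop-in implementation; the cache directory and function names are placeholders.

```python
import ast
import json
import os

import triton

# Hypothetical cache location; pick whatever fits your deployment.
CACHE_DIR = os.path.expanduser("~/.cache/triton-autotune")


def _cache_path(kernel):
    # `kernel` is an Autotuner wrapping a JITFunction. fn.cache_key changes
    # whenever the kernel source or environment changes, so stale results
    # are never reused (mirroring Triton's compilation cache).
    return os.path.join(CACHE_DIR, f"{kernel.fn.cache_key}.json")


def save_autotune_results(kernel):
    """Serialize the autotuner's best-config cache to JSON."""
    entries = {
        # Autotuner.cache is an undocumented internal: key tuple -> Config.
        repr(key): {
            "kwargs": cfg.kwargs,
            "num_warps": cfg.num_warps,
            "num_stages": cfg.num_stages,
        }
        for key, cfg in kernel.cache.items()
    }
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(_cache_path(kernel), "w") as f:
        json.dump(entries, f)


def load_autotune_results(kernel):
    """Pre-populate the autotuner's cache from a previous run, if present."""
    path = _cache_path(kernel)
    if not os.path.exists(path):
        return
    with open(path) as f:
        entries = json.load(f)
    for key, cfg in entries.items():
        # Assumes key tuples contain only primitives (ints/strings), which
        # ast.literal_eval can round-trip from repr().
        kernel.cache[ast.literal_eval(key)] = triton.Config(
            cfg["kwargs"],
            num_warps=cfg["num_warps"],
            num_stages=cfg["num_stages"],
        )
```

Usage would be roughly `load_autotune_results(llama_mlp_fused_4_kernel)` at process startup and `save_autotune_results(llama_mlp_fused_4_kernel)` after a warmup pass; since everything is keyed on `fn.cache_key`, a change to the kernel source or environment just misses the cache and re-tunes.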