Cache auto-tuning? #15

Open
vedantroy opened this issue May 3, 2023 · 3 comments
@vedantroy

When running the model--especially in a serverless environment where there may be many cold starts--it would be desirable to cache the auto-tuning results. Is this possible?

@fpgaminer
Owner

Thank you for the issue. Yes, this is possible; I have it in progress.

@vedantroy
Author

> Thank you for the issue. Yes, this is possible; I have it in progress.

Interesting. Last time I used Triton, I wasn't sure whether they exposed an API for caching autotune results--I'm guessing they do now? I might take a stab at hacking on this myself, if I can find the API, since I'm trying to ship something soon.

@fpgaminer
Owner

I wish they did. They do, however, have a `cache_key` attribute on kernels. So I was going to throw something together that stores the results as JSON in a cache directory, keyed off `cache_key`, so it reuses results only when the environment and kernel source are the same (just like Triton does when caching kernel compilations), e.g. `llama_mlp_fused_4_kernel.fn.cache_key`.
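
A rough, untested sketch of that plan. It assumes the autotuner exposes its tuned results as `kernel.cache`, a dict mapping argument-key tuples to `triton.Config` (true of the Triton versions I've looked at, but internals shift between releases); `CACHE_DIR` and the two helper names are placeholders:

```python
import json
import os

import triton

CACHE_DIR = os.path.expanduser("~/.cache/autotune-results")  # placeholder location


def save_autotune_results(kernel, name):
    """Dump a kernel's tuned configs to JSON, tagged with its cache_key."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    entry = {
        # cache_key changes whenever the environment or kernel source
        # changes, so stale results invalidate themselves.
        "cache_key": kernel.fn.cache_key,
        # Assumes the autotune key values are JSON-serializable.
        "configs": [
            [list(args_key), config.kwargs, config.num_warps, config.num_stages]
            for args_key, config in kernel.cache.items()
        ],
    }
    with open(os.path.join(CACHE_DIR, f"{name}.json"), "w") as f:
        json.dump(entry, f)


def load_autotune_results(kernel, name):
    """Pre-populate a kernel's autotune cache; returns False on a miss."""
    path = os.path.join(CACHE_DIR, f"{name}.json")
    if not os.path.exists(path):
        return False
    with open(path) as f:
        entry = json.load(f)
    if entry["cache_key"] != kernel.fn.cache_key:
        return False  # environment or source changed; re-tune from scratch
    for args_key, kwargs, num_warps, num_stages in entry["configs"]:
        kernel.cache[tuple(args_key)] = triton.Config(
            kwargs, num_warps=num_warps, num_stages=num_stages
        )
    return True
```

At startup you'd call something like `load_autotune_results(llama_mlp_fused_4_kernel, "llama_mlp_fused_4")` before the first launch, and `save_autotune_results` after a warm-up run, so subsequent cold starts skip tuning entirely.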
