This is a small enhancement release of Tilus.
## Highlights
- Add more examples: flash attention with KV cache and flash linear attention decode (see the reference sketch after this list)
- Fix a bug where multiple Tilus processes concurrently access the dispatch table in the cache (a file-locking sketch follows this list)
- Add targets `sm_100`, `sm_103`, `sm_110`, `sm_120`, and `sm_121` (capability-lookup snippet below)
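For orientation, the new KV-cache attention example implements one decoding step: the query for the new token attends over all cached keys and values. Below is a minimal plain-PyTorch sketch of that computation, not the Tilus kernel itself; the function name and tensor shapes are assumptions for illustration (the fused flash linear attention decode example follows a different, linear-attention recurrence not shown here).

```python
import math
import torch

def decode_attention(q, k_cache, v_cache):
    """One decoding step of attention over a KV cache (reference sketch).

    q:       [batch, heads, 1, head_dim]    query for the new token
    k_cache: [batch, heads, seq, head_dim]  cached keys
    v_cache: [batch, heads, seq, head_dim]  cached values
    """
    # scaled dot-product scores against every cached position
    scores = q @ k_cache.transpose(-1, -2) / math.sqrt(q.size(-1))
    probs = torch.softmax(scores, dim=-1)
    # weighted sum of cached values -> [batch, heads, 1, head_dim]
    return probs @ v_cache
```

The Tilus example fuses these steps into a single kernel rather than materializing the score matrix in global memory.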
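The dispatch-table fix (#28) serializes writers to the on-disk cache so concurrent Tilus processes do not corrupt it. The underlying pattern is file locking; here is a minimal sketch using the third-party `filelock` package, with a hypothetical file name and helper function rather than the actual Tilus internals:

```python
import json
from pathlib import Path

from filelock import FileLock  # pip install filelock

def update_dispatch_table(cache_dir: Path, key: str, entry: dict) -> None:
    """Update a shared on-disk table safely across processes (illustrative)."""
    table_path = cache_dir / "dispatch_table.json"  # hypothetical file name
    lock = FileLock(str(table_path) + ".lock")
    with lock:  # only one process reads/modifies/writes at a time
        table = json.loads(table_path.read_text()) if table_path.exists() else {}
        table[key] = entry
        table_path.write_text(json.dumps(table))
```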
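The new target names follow the usual `sm_<major><minor>` compute-capability scheme. A small PyTorch-based snippet shows how a device maps to one of these names (an assumption for illustration; Tilus may detect the target on its own):

```python
import torch

# compute capability of the current GPU, e.g. (12, 0)
major, minor = torch.cuda.get_device_capability()
target = f"sm_{major}{minor}"  # e.g. "sm_120" for compute capability 12.0
print(target)
```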
## What's Changed
- [Docs] Update README.md by @yaoyaoding in #11
- [CI] Use RTX 4090 for docs building by @yaoyaoding in #12
- [Docs] Update README.md by @yaoyaoding in #13
- [Package] Rename to under @NVIDIA organization by @nekomeowww in #15
- [Docs] Update installation guide by @yaoyaoding in #17
- [CI] Fix concurrency issue by @yaoyaoding in #18
- [Docs] Correct gflops to tflops in examples by @YichengDWu in #19
- [Example] Add the attention example with kv-cache by @yaoyaoding in #21
- [Example] Add example for decoding kernel of flash linear attention by @yaoyaoding in #25
- [Example] Add a kernel in the flash linear attention by @yaoyaoding in #26
- [Example] Add the fused kernel for decoding of flash linear attention by @yaoyaoding in #27
- [Tuning] Add lock to cache dir when dumping the tuning result by @yaoyaoding in #28
- [Target] Add target properties by @yaoyaoding in #29
- [Bump] Bump version of hidet from 0.6.0 to 0.6.1 by @yaoyaoding in #30
## New Contributors
- @nekomeowww made their first contribution in #15
- @YichengDWu made their first contribution in #19
**Full Changelog**: v0.1...v0.1.1