v0.1.7.post3 #1688
LeiWang1999 announced in Announcements
What's Changed
- [BugFix] Fix bugs of varlen attention forward examples caused by `S_q != S_kv` by @hukongyi in #1530
- [Bugfix] Avoid considering `local.var` buffer as `local` by @LeiWang1999 in #1541
- [Bugfix] Fix of `T.Fill` for local.var by @LeiWang1999 in #1543
- [Enhancement] Support larger `H` in deepseek sparse mla backward via split-H by @Rachmanino in #1548
- [Refactor] Introduce layout annotations for `ParallelOPNode` and `CopyNode` by @LeiWang1999 in #1539
- [Misc] Remove unused `tl_pipeline_sync`. by @c8ef in #1566
- [Fix] Add register to read A ptr in `test_tilelang_language_cooperative.py` by @silentCoder-dev in #1593
- [Enhancement] Allow `import tilelang` on CPU-only machines without CUDA libraries by @XuehaiPan in #1481
- [Feature] add `T.sync_warp` & `T.shfl_sync`; change extern pdl into intrin by @silentCoder-dev in #1614
- [BugFix] Fix `ForwardRef` usage in v2 frontend (#1619) by @kurisu6912 in #1621
- [Refactor] Move `ConstrVisitor` to `src/transform/common/constr_visitor.h` for reuse by @silentCoder-dev in #1622
- [Feat] Improve `T.reduce_absmax` to use less abs call by @kurisu6912 in #1626
- [Enhancement][CUDA] Support `nvidia-cuda-nvcc` as `nvcc` by @clouds56 in #1528
- [BugFix] Correct index_map selection for transposed A matrix in MFMA Layout with `k_dim==4` and open rocm-ci for gemmsr by @benenzhu in #1627
- [Feat] Allow dangling producer in wasp pipeline planning (#1263) by @kurisu6912 in #1647
- [Example] Remove redundant T.copy in `examples/deepseek_v32/sparse_mla_fwd.py` by @GoldenStain in #1634
- [Feature] Support `cp.reduce.async.bulk.tensor` by @Rachmanino in #1667
- [Feature] Reimplement `Threadsync` with `ConstrVisitor` by @silentCoder-dev in #1631
- [Clean][Refactor] Phaseout Legacy Pass `ParallelLoopTransformer` by @LeiWang1999 in #1672

New Contributors
- @hukongyi made their first contribution in #1530
- @c8ef made their first contribution in #1566
- @GoldenStain made their first contribution in #1634

Full Changelog: v0.1.7.post1...v0.1.7.post3
This discussion was created from the release v0.1.7.post3.