Release v0.2.0 · AMD-AGI/Primus-Turbo

What's Changed

fix(deepep): eliminate compile warning. by @zhenhuang12 in #123
feat(deep_ep): support num_worst_token and use_default_stream_as_comm_stream for internode. by @zhenhuang12 in #120
feat(token_dispatcher): add DeepEPTokenDispatcher for MoE. by @zhenhuang12 in #114
build: support multi-arch compilation (gfx942;gfx950) by @xiaobochen-amd in #124
feat: attn add is_v3_atomic_fp32 env control by @xiaobochen-amd in #126
chore: move router to moe dir by @xiaobochen-amd in #125
[Sync-free MoE] feat: add swiglu, geglu and tokens_per_expert_to_mask api by @RuibinCheung in #122
[HOTFIX] triton version requirement by @RuibinCheung in #130
[Sync-free MoE] feat: refine act func by @RuibinCheung in #129
fix(deepep): fix bug when use expert_capacity_factor by @zhenhuang12 in #127
chore(docker): update default image to rocm/primus:v25.9_gfx942 by @xiaobochen-amd in #133
[Aiter] Update aiter to fix pybind11 issue by @GeneDer in #132
feat: gemm fp8 support cktile backend for both tensorwise and rowwise by @kyle-256 in #131
[Fix] import activation module by @GeneDer in #137
feat(deepep): move deep_ep header file to primus_turbo common header dir by @zhenhuang12 in #138
feat(permute): permute op support to compute tokens_per_expert by @zhenhuang12 in #140
feat: grouped gemm tensorwise impl update by @kyle-256 in #139
chore: support jax=0.6.2 & jax cicd by @xiaobochen-amd in #136
feat: add elementwise(unary/binary/quant/dequant) kernel by @xiaobochen-amd in #135
chore: remove useless debug code by @kyle-256 in #141
chore: refactor grouped gemm blockwise python code by @xiaobochen-amd in #142
feat: add build ext and opt build efficiency by @xiaobochen-amd in #143
feat: skip patch torch_extension when version >=2.8.0 by @zhenhuang12 in #144
chore: refactor gemm fp8 api by @xiaobochen-amd in #145
fix: skip disabled arch files in build by @xiaobochen-amd in #147
chore: support quant gemm when m%128!=0 by @kyle-256 in #146
feat: unify fp8 gemm API by @RuibinCheung in #148
chore: update aiter version. by @xiaobochen-amd in #151
add public primus-safe link in readme by @wenxie-amd in #152
fix readme's typo error by @wenxie-amd in #153
fix(deepep): fix internode_combine hang when set num_worst_token > 0 by @zhenhuang12 in #149
[feat]: CK based block quant by @kyle-256 in #155
feat(gemm): Add mxfp8 gemm and quantize kernel by @RuibinCheung in #154
opt: grouped gemm perf when len(group_lens)==1 by @xiaobochen-amd in #158
feat: add float4x2_e2m1 and float8_e8m0 data type by @xiaobochen-amd in #156
feat(mxfp8): add k padding in bwd by @RuibinCheung in #160
chore: Allow GEMM with k % 32 = 0 to participate in computation by @kyle-256 in #161
feat: jax backend support grouped_gemm by @kyle-256 in #157

Full Changelog: v0.1.1...v0.2.0
