Question about k_grouped_gemm #243

Open, opened on Dec 25, 2025

Why, when computing wgrad in k_grouped, do dy and dx use per_channel_cast_to_fp8 instead of per_token_cast_to_fp8?
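
For readers unfamiliar with the two granularities the question contrasts, here is a minimal pure-Python sketch. It is not the library's implementation: the helper names `scales_along_columns` and `scales_along_rows` are hypothetical, and mapping them to `per_token_cast_to_fp8` and `per_channel_cast_to_fp8` is an assumption based only on the function names. One plausible reading, not confirmed in this thread, is that wgrad contracts over the token dimension, so the scaling blocks would have to run along tokens rather than along the hidden dimension.

```python
# Hypothetical helpers for illustration only; not the library's code.
FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


def scales_along_columns(x, block):
    """Per-token style (assumed): each row gets one scale per `block` consecutive columns."""
    return [
        [max(abs(v) for v in row[c:c + block]) / FP8_E4M3_MAX
         for c in range(0, len(row), block)]
        for row in x
    ]


def scales_along_rows(x, block):
    """Per-channel style (assumed): each column gets one scale per `block` consecutive rows."""
    n_rows, n_cols = len(x), len(x[0])
    return [
        [max(abs(x[r][c]) for r in range(r0, min(r0 + block, n_rows))) / FP8_E4M3_MAX
         for c in range(n_cols)]
        for r0 in range(0, n_rows, block)
    ]


# Toy tensor: 4 tokens (rows) x 2 channels (columns), block size 2.
x = [[1.0, -2.0], [3.0, 4.0], [-5.0, 6.0], [7.0, -8.0]]
per_token = scales_along_columns(x, block=2)  # 4 rows of scales, 1 block each
per_channel = scales_along_rows(x, block=2)   # 2 row-blocks of scales, 2 columns each
```

In this toy case the per-token scales have one entry per token, while the per-channel scales have one entry per block of tokens for each channel, so only the latter groups values along the token axis.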
