[common] Split cast/gated kernels by scaling mode #2248
base: main
Signed-off-by: Oleg Goncharov <[email protected]>
for more information, see https://pre-commit.ci
Pull Request Overview
This pull request refactors the large `cast_kernels.cuh` and `cast_gated_kernels.cuh` files into smaller header files organized by scaling mode. Each quantization and scaling implementation gets its own specialized header, which makes the code easier to maintain, read, and navigate.
- Breaks down monolithic headers into focused, scaling-mode-specific files
- Reorganizes code structure without modifying functionality or behavior
- Creates dispatcher files to coordinate between different scaling implementations
Reviewed Changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
transformer_engine/common/util/cast_kernels.cuh | Removed all content - entire file deleted as part of refactoring |
transformer_engine/common/cast/nvfp4/quantize_transpose_nvfp4.cuh | NVFP4 quantize with transpose functionality, updated file path and namespacing |
transformer_engine/common/cast/nvfp4/quantize_nvfp4.cuh | New file containing NVFP4-specific quantization kernels |
transformer_engine/common/cast/nvfp4/dequantize_nvfp4.cuh | New file containing NVFP4 dequantization functionality |
transformer_engine/common/cast/nvfp4/core_nvfp4.cuh | New file with core NVFP4 utility functions and device operations |
transformer_engine/common/cast/mxfp8/quantize_mxfp8.cuh | New file containing MXFP8 quantization kernels |
transformer_engine/common/cast/mxfp8/gated_mxfp8.cuh | MXFP8 gated operations, significantly reduced from original gated kernels file |
transformer_engine/common/cast/mxfp8/dequantize_mxfp8.cuh | New file containing MXFP8 dequantization functionality |
transformer_engine/common/cast/fp8/quantize_fp8.cuh | New file containing FP8 quantization kernels |
transformer_engine/common/cast/fp8/gated_fp8.cuh | New file containing FP8 gated operations |
transformer_engine/common/cast/fp8/dequantize_fp8.cuh | New file containing FP8 dequantization functionality |
transformer_engine/common/cast/dispatch/quantize.cuh | New dispatcher file coordinating quantization across scaling modes |
transformer_engine/common/cast/dispatch/gated.cuh | New dispatcher file coordinating gated operations across scaling modes |
transformer_engine/common/cast/dispatch/dequantize.cuh | New dispatcher file coordinating dequantization across scaling modes |
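
The layout in the table (one directory per scaling mode plus a `dispatch/` layer) can be illustrated with a small host-only sketch. Everything in it is hypothetical and is not taken from the PR: the `ScalingMode` enumerators, the `QuantizePlan` struct, and the `plan_quantize` functions are invented for illustration, and the block sizes (32 elements for MXFP8, 16 for NVFP4) are assumptions about the formats rather than values read from these headers. It only shows the shape of the pattern, in which a thin dispatcher selects a per-mode implementation living in its own namespace.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the library's scaling-mode enum.
enum class ScalingMode { kTensorScaling, kMxfp8Block, kNvfp4Block };

// Illustrative result: which per-mode header would own the call, and how
// many scale factors a tensor of n elements would need under that mode.
struct QuantizePlan {
  std::string header;
  std::size_t num_scales;
};

namespace fp8 {
// Per-tensor scaling: a single scale factor regardless of size.
inline QuantizePlan plan_quantize(std::size_t n) {
  (void)n;
  return {"fp8/quantize_fp8.cuh", 1};
}
}  // namespace fp8

namespace mxfp8 {
constexpr std::size_t kBlock = 32;  // assumed block size
inline QuantizePlan plan_quantize(std::size_t n) {
  assert(n % kBlock == 0 && "sketch only handles whole blocks");
  return {"mxfp8/quantize_mxfp8.cuh", n / kBlock};
}
}  // namespace mxfp8

namespace nvfp4 {
constexpr std::size_t kBlock = 16;  // assumed block size
inline QuantizePlan plan_quantize(std::size_t n) {
  assert(n % kBlock == 0 && "sketch only handles whole blocks");
  return {"nvfp4/quantize_nvfp4.cuh", n / kBlock};
}
}  // namespace nvfp4

namespace dispatch {
// The dispatcher holds no numerical logic; it only routes by scaling mode.
inline QuantizePlan plan_quantize(ScalingMode mode, std::size_t n) {
  switch (mode) {
    case ScalingMode::kTensorScaling: return fp8::plan_quantize(n);
    case ScalingMode::kMxfp8Block:    return mxfp8::plan_quantize(n);
    case ScalingMode::kNvfp4Block:    return nvfp4::plan_quantize(n);
  }
  throw std::invalid_argument("unsupported scaling mode");
}
}  // namespace dispatch
```

With this split, adding or changing one scaling path touches only that mode's namespace and a single `case` in the dispatcher, which is the maintainability benefit the overview describes.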
…s from the NVFP4 transpose test suite Signed-off-by: Oleg Goncharov <[email protected]>
/te-ci
Description
Breaks up the large `cast_kernels.cuh` and `cast_gated_kernels.cuh` into smaller headers organized by scaling mode. No functional or behavior changes: code is moved, not modified. This improves structure, readability, and maintainability (it is easier to navigate and extend specific scaling paths). Build includes/exports are updated accordingly; tests are unaffected.
Fixes # (issue)
Type of change
Changes
- Split `cast_kernels.cuh` and `cast_gated_kernels.cuh` into smaller headers organized by scaling mode.

Checklist: