Fix the sm120 compilation with CUDA 12 #2482
Signed-off-by: Przemek Tredak <[email protected]>
Greptile Overview
Greptile Summary
This PR fixes a compilation error that occurred when building for SM120 architecture with CUDA 12. The issue was introduced in PR #2062 where the
Confidence Score: 5/5
Important Files Changed
File Analysis
Sequence Diagram
sequenceDiagram
    participant Compiler as CUDA Compiler
    participant PTX as ptx.cuh
    participant Func as reduce_sync_max_abs_f32()
    Compiler->>PTX: Compile for target arch
    PTX->>PTX: NVTE_CUDA_ARCH_MATCHES(FamilySpecific<100>)
    alt SM100 Family (100, 101, 103, etc.)
        PTX->>Func: is_sm_100f = true
        Func->>Func: Use redux.sync.max.abs.f32
    else SM110 or SM120 Family
        PTX->>Func: is_sm_100f = false
        Func->>Func: Use fallback (abs.f32 + redux.sync.max.u32)
    end
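The fallback branch in the diagram relies on a property of IEEE-754 floats: for finite, non-negative values, ordering by value matches ordering of the raw bit patterns read as unsigned 32-bit integers. So taking abs.f32 per lane and then reducing with redux.sync.max.u32 yields the same maximum absolute value as redux.sync.max.abs.f32. The sketch below is a hypothetical host-side C++ emulation of a 32-lane warp, assuming non-NaN inputs; the names warp_max_abs_native, warp_max_abs_fallback and reduce_max_abs are illustrative only and are not the actual code in ptx.cuh.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical CPU emulation: one array element per warp lane.
constexpr int kWarpSize = 32;

// Reinterpret float bits as uint32 (analogous to moving a .f32 register to .u32).
inline std::uint32_t float_bits(float x) {
  std::uint32_t u;
  std::memcpy(&u, &x, sizeof(u));
  return u;
}

inline float bits_float(std::uint32_t u) {
  float x;
  std::memcpy(&x, &u, sizeof(x));
  return x;
}

// Emulates the sm_100f path: a single max-of-|x| reduction over float values
// (conceptually redux.sync.max.abs.f32).
float warp_max_abs_native(const std::array<float, kWarpSize>& lanes) {
  float m = 0.0f;
  for (float v : lanes) m = std::max(m, std::fabs(v));
  return m;
}

// Emulates the fallback path: |x| per lane, then an unsigned-integer max over
// the bit patterns (conceptually abs.f32 + redux.sync.max.u32).
// Assumption: inputs are not NaN; for non-negative finite floats the bit
// pattern grows monotonically with the value, so the u32 max is the f32 max.
float warp_max_abs_fallback(const std::array<float, kWarpSize>& lanes) {
  std::uint32_t m = 0;
  for (float v : lanes) m = std::max(m, float_bits(std::fabs(v)));
  return bits_float(m);
}

// Compile-time dispatch mirroring the is_sm_100f flag from the diagram (C++17).
template <bool is_sm_100f>
float reduce_max_abs(const std::array<float, kWarpSize>& lanes) {
  if constexpr (is_sm_100f) {
    return warp_max_abs_native(lanes);
  } else {
    return warp_max_abs_fallback(lanes);
  }
}
```

For example, filling the lanes with alternating-sign values 0, -0.25, 0.5, ..., -7.75 makes both reduce_max_abs<true> and reduce_max_abs<false> return 7.75. Choosing the path at compile time, as in the diagram, means the f32 reduction instruction is never emitted for targets such as sm_120a that do not support it.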
1 file reviewed, no comments
timmoon10 left a comment
LGTM
/te-ci
Will look into the failure on B200.
Description
PR #2062 incorrectly used the redux.sync.f32 instruction when compiling for arch 120a, even though this instruction is only available on sm100f. This caused the failures in our PyTorch Build GitHub Actions workflow.