[MLIR][NVGPU] Fix the cga_cluster.mlir test (llvm#112191)
This patch fixes the sm90 cluster test by:
* Fixing a typo in LowerGpuOpsToNVVMOps, where one of the ClusterDimOp
   conversion patterns should actually be for the
   ClusterDimBlocksOp. This resolves the compilation error in this test.
* Changing the grid size from (2,2,1) to (4,4,1). With the larger grid,
   the scf.if check against the threshold of 3 later in the test passes,
   and the GPU actually generates the required prints.

Signed-off-by: Durgadoss R <[email protected]>
durga4github authored Oct 14, 2024
1 parent ddb64e6 commit a8b5115
Showing 2 changed files with 4 additions and 3 deletions.
5 changes: 3 additions & 2 deletions mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
@@ -373,8 +373,9 @@ void mlir::populateGpuToNVVMConversionPatterns(
 NVVM::BlockInClusterIdYOp, NVVM::BlockInClusterIdZOp>>(
 converter, IndexKind::Other, IntrType::Id);
 patterns.add<gpu::index_lowering::OpLowering<
-gpu::ClusterDimOp, NVVM::ClusterDimXOp, NVVM::ClusterDimYOp,
-NVVM::ClusterDimZOp>>(converter, IndexKind::Other, IntrType::Dim);
+gpu::ClusterDimBlocksOp, NVVM::ClusterDimBlocksXOp,
+NVVM::ClusterDimBlocksYOp, NVVM::ClusterDimBlocksZOp>>(
+converter, IndexKind::Other, IntrType::Dim);
 patterns.add<gpu::index_lowering::OpLowering<
 gpu::BlockIdOp, NVVM::BlockIdXOp, NVVM::BlockIdYOp, NVVM::BlockIdZOp>>(
 converter, IndexKind::Grid, IntrType::Id);
2 changes: 1 addition & 1 deletion mlir/test/Integration/GPU/CUDA/sm90/cga_cluster.mlir
@@ -18,7 +18,7 @@ module attributes {gpu.container_module} {
 return
 }
 gpu.module @gpumodule {
-gpu.func @kernel_cluster() kernel attributes {gpu.known_block_size = array<i32: 1, 1, 1>, gpu.known_grid_size = array<i32: 2, 2, 1>} {
+gpu.func @kernel_cluster() kernel attributes {gpu.known_block_size = array<i32: 1, 1, 1>, gpu.known_grid_size = array<i32: 4, 4, 1>} {
 %cidX = gpu.cluster_id x
 %cidY = gpu.cluster_id y
 %cidZ = gpu.cluster_id z
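
The grid-size change can be sanity-checked with a small host-side sketch. It assumes the test's nested scf.if conditions compare the x and y block indices for equality with 3. The commit message only mentions a "threshold of 3", so the exact predicate is an assumption, and `countPrintingBlocks` is a hypothetical helper, not part of MLIR.

```cpp
// Hypothetical helper (not an MLIR API): counts the blocks of a
// gridX x gridY x 1 launch whose x and y indices both equal `threshold`.
// ASSUMPTION: the test guards its prints with equality checks on the
// block indices; the commit message only says "threshold of 3".
constexpr int countPrintingBlocks(int gridX, int gridY, int threshold) {
  int count = 0;
  for (int y = 0; y < gridY; ++y)
    for (int x = 0; x < gridX; ++x)
      if (x == threshold && y == threshold)
        ++count;
  return count;
}

// Grid (2,2,1): indices only reach 1, so no block would print.
static_assert(countPrintingBlocks(2, 2, 3) == 0, "old grid size");
// Grid (4,4,1): block (3,3) satisfies the assumed predicate.
static_assert(countPrintingBlocks(4, 4, 3) == 1, "new grid size");
```

Under that assumption, the old grid size could never satisfy the condition, which is consistent with the commit message's note that the prints are only generated with the (4,4,1) grid.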