Initial GPU pipeline #202
base: main
@@ -46,13 +52,23 @@ int main(int argc, char *argv[]) {
#endif
  mlir::registerAllPasses();
  mlir::gc::registerCPUPipeline();
  mlir::gc::registerGPUPipeline();
Suggested change:
-   mlir::gc::registerGPUPipeline();
+ #ifdef GC_USE_GPU
+   mlir::gc::registerGPUPipeline();
+ #endif
What is this for?
We have a build option for enabling/disabling GPU. The guard makes sense for builds where GPU is disabled.
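The guard suggested in this thread can be sketched in isolation as follows. This is a minimal, self-contained illustration and not the project's code: `registeredPipelines()` and `registerAllPipelines()` are hypothetical stand-ins for MLIR's pipeline registry and for the call sites in the diff, and `GC_USE_GPU` is assumed to be defined by the build system when GPU support is enabled.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for MLIR's global pipeline registry.
std::vector<std::string> &registeredPipelines() {
  static std::vector<std::string> names;
  return names;
}

void registerCPUPipeline() {
  registeredPipelines().push_back("gc-cpu-pipeline");
}

// The declaration/definition is guarded, so a CPU-only build never
// references GPU symbols.
#ifdef GC_USE_GPU
void registerGPUPipeline() {
  registeredPipelines().push_back("gc-gpu-pipeline");
}
#endif

// Every call site needs the same guard as the definition; otherwise a
// CPU-only build fails to compile or link.
void registerAllPipelines() {
  registerCPUPipeline();
#ifdef GC_USE_GPU
  registerGPUPipeline();
#endif
}
```

Guarding the definition and each call site consistently is what the repeated suggestions in this review amount to; a single guarded wrapper such as the hypothetical `registerAllPipelines()` would be one way to keep the guards in one place.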
@@ -18,6 +18,7 @@ using namespace mlir::cpuruntime;

namespace mlir::gc {
void registerCPUPipeline();
void registerGPUPipeline();
Suggested change:
- void registerGPUPipeline();
+ #ifdef GC_USE_GPU
+ void registerGPUPipeline();
+ #endif
@@ -29,6 +30,7 @@ extern "C" {

MLIR_CAPI_EXPORTED void mlirRegisterAllGCPassesAndPipelines() {
  registerCPUPipeline();
  registerGPUPipeline();
Suggested change:
-   registerGPUPipeline();
+ #ifdef GC_USE_GPU
+   registerGPUPipeline();
+ #endif
@@ -32,6 +37,7 @@

namespace mlir::gc {
void registerCPUPipeline();
void registerGPUPipeline();
Suggested change:
- void registerGPUPipeline();
+ #ifdef GC_USE_GPU
+ void registerGPUPipeline();
+ #endif
  PassPipelineRegistration<>("gc-gpu-pipeline",
                             "The GPU pipeline for Graph Compiler",
                             populateGPUPipeline);
}
Suggested change:
- }
+ }
+ #endif
@@ -145,10 +147,45 @@ void populateCPUPipeline(mlir::OpPassManager &pm) {
  populateLLVMPasses(pm);
}

void populateGPUPipeline(mlir::OpPassManager &pm) {
Suggested change:
- void populateGPUPipeline(mlir::OpPassManager &pm) {
+ #ifdef GC_USE_GPU
+ void populateGPUPipeline(mlir::OpPassManager &pm) {
  pm.addPass(createGpuGenAttachTarget());
  GpuModuleToBinaryPassOptions gpuModuleToBinaryPassOptions;
  pm.addPass(createGpuModuleToBinaryPass(gpuModuleToBinaryPassOptions));
}
Suggested change:
- }
+ }
+ #endif
void registerCPUPipeline() {
  PassPipelineRegistration<>("gc-cpu-pipeline",
                             "The CPU pipeline for Graph Compiler",
                             populateCPUPipeline);
}

void registerGPUPipeline() {
Suggested change:
- void registerGPUPipeline() {
+ #ifdef GC_USE_GPU
+ void registerGPUPipeline() {
src/gc-opt/gc-opt.cpp
Outdated
// gpu.module op
mlir::registerAllToLLVMIRTranslations(registry);
mlir::gen::registerGenTargetInterfaceExternalModels(registry);
mlir::registerGENDialectTranslation(registry);
#ifdef GC_USE_GPU
Probably, it should be renamed to GC_USE_IMEX here and in all other places.
lib/gc/Dialect/LLVMIR/CMakeLists.txt
Outdated
@@ -0,0 +1,20 @@
add_mlir_dialect_library(MLIRGENDialect
Should the directory name be "GEN" instead of "LLVMIR"?
No, the dialect is at the same level as llvmir; this follows the upstream structure.
}

std::optional<SmallVector<char, 0>>
GenSerializer::compileToBinary(const std::string &serializedSPV) {
Shall we skip this step and treat the SPIR-V code as the final binary? This would free us from `findTool("ocloc")`, which depends on the environment, and we could pass compilation issues on to the OCL runtime.
This can be controlled by `targetOptions.getCompilationTarget()`. The binary generation here is meant to eliminate latency on the first execution when the target arch is known (inference).
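The trade-off discussed in this thread (ship SPIR-V and let the runtime compile it, or compile ahead of time with ocloc) can be sketched as a simple dispatch. This is an illustrative model only: the `Target` enum, `Bytes`, `Compiler`, and `finalizeModule` are hypothetical names, not the MLIR API, and the real serializer would read the mode from `targetOptions.getCompilationTarget()`.

```cpp
#include <functional>
#include <optional>
#include <vector>

using Bytes = std::vector<char>;
// Hypothetical callback standing in for an ocloc invocation.
using Compiler = std::function<std::optional<Bytes>(const Bytes &)>;

// Hypothetical stand-in for the compilation target carried by the options.
enum class Target { SpirvOnly, NativeBinary };

// Returns the payload to embed in the module, or nullopt on failure.
std::optional<Bytes> finalizeModule(const Bytes &spirv, Target target,
                                    const Compiler &oclocCompile) {
  // SPIR-V as the final artifact: no external tool is needed, and any
  // compilation error surfaces later in the OpenCL runtime.
  if (target == Target::SpirvOnly)
    return spirv;
  // Ahead-of-time path: requires a working compiler, but removes the
  // first-run compilation latency when the target arch is known.
  if (!oclocCompile)
    return std::nullopt;
  return oclocCompile(spirv);
}
```

Under this model the environment dependency only matters on the ahead-of-time path, which matches the reply above: the tool lookup is needed only when a native binary is requested.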
c1d1097 to 1287334
This reverts commit a31238e.
e876bb7 to eefddb6
MLIRSupport
MLIRGPUDialect
MLIRTargetLLVM
LLVMSPIRVCodeGen
This is weird: I couldn't reproduce the linking problem locally, and I get a runtime problem with static LLVM options reinitialization when I include `LLVMSPIRVCodeGen` as a dependency.
GpuModuleToBinaryPassOptions gpuModuleToBinaryPassOptions;
pm.addPass(createGpuModuleToBinaryPass(gpuModuleToBinaryPassOptions));
Which tests did you use to verify the pipeline?
If I try to run the GPU pipeline on this simple matmul test, the `gpu-module-to-binary` pass fails with the following error:
error: LLVM Translation failed for operation: builtin.unrealized_conversion_cast
Don't we need to add `reconcile-unrealized-casts` somewhere?
UPD: simply adding `pm.addPass(createReconcileUnrealizedCastsPass());` before the gpu-to-bin pass didn't help :(
Testing it on a simple vector add for now. I'll add it once we can execute it. Yes, we will need the reconcile pass for the pipeline to be complete. I'd like to get to an end-to-end working scenario first though.
@kurapov-peter What should I change in my mirror branch to ensure the GPU is actually accessed?
Let's review this one before the end of iteration 4.
Ok, let's have the initial version of the GPU pipeline
This adds a GPU pipeline and the necessary glue code for it to work with ocloc through the `gpu-module-to-binary` pass.
The patch mainly adds the `gen` dialect to hold the target attribute for binary generation (it has nothing to do with Triton's gen dialect, although it sits at the same level, alongside LLVM) as well as some target-specific parameters.
The lowering uses `gpu-to-llvm-spv` to generate OpenCL calls from GPU operations. The pass expects a SPIR-V target attached with some required settings, so one is temporarily attached to satisfy the pass. Further down the pipeline, it is replaced with the `gen.target` (otherwise the binary generation logic would produce SPIR-V as the result). `GPUOpsLowering.h` is temporarily copy-pasted until kernel signature conversion is available upstream. The upper part of the pipeline is "dumb": it generalizes the input linalg and converts it into parallel loops, which are then mapped to the GPU.
The current implementation serializes the GPU module to binary SPIR-V via LLVM's SPIR-V backend and uses `ocloc` to convert that into a GEN binary that is wrapped into a `gpu.obj`. Ocloc is searched for in the `PATH`. Although there is some code for its discovery in the base toolkit, the binary currently lives inside vtune there, so it is unclear whether this should be implemented at all.
As is, this does not work with the ocl wrappers (#191). There is no `gpux`, so the path expects wrappers to follow naming conventions.