Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial GPU pipeline #202

Open
wants to merge 19 commits into
base: main
Choose a base branch
from
Open

Conversation

kurapov-peter
Copy link
Contributor

This adds a GPU pipeline and the necessary glue code for it to work with ocloc through the gpu-module-to-binary.

The patch mainly adds the gen dialect to hold the target attribute for binary generation (has nothing to do with Triton's gen dialect, although sits at the same level, alongside LLVM) as well as some target-specific parameters.

The lowering uses gpu-to-llvm-spv to generate OpenCL calls from GPU operations. The pass expects a SPIR-V target attached with some required settings. It is thus temporarily attached to make the pass happy. Down the pipeline, it is replaced with the gen.target (otherwise the logic of binary generation would produce a spirv as a result). GPUOpsLowering.h is temporarily copy-pasted until kernel signature conversion is available upstream. The upper part of the pipeline is "dumb" - generalizes and converts the input linalg into parallel loops to then map to GPU.

The current implementation serializes the GPU module to a binary SPIR-V via LLVM's SPIR-V backend and uses ocloc to convert that into a GEN binary that is wrapped into a gpu.obj. Ocloc is searched for in the PATH. Although there some code for its discovery in the base toolkit, the current location of the binary is inside vtune there, so it is unclear whether this should be implemented at all.

As is, this does not work with the ocl wrappers (#191). There is no gpux, so the path expects wrappers to follow naming conventions.

@kurapov-peter kurapov-peter linked an issue Jul 30, 2024 that may be closed by this pull request
@@ -46,13 +52,23 @@ int main(int argc, char *argv[]) {
#endif
mlir::registerAllPasses();
mlir::gc::registerCPUPipeline();
mlir::gc::registerGPUPipeline();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
mlir::gc::registerGPUPipeline();
#ifdef GC_USE_GPU
mlir::gc::registerGPUPipeline();
#endif

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this for?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have the build option for enabling/disabling GPU. It could make sense if GPU is disabled.

@@ -18,6 +18,7 @@ using namespace mlir::cpuruntime;

namespace mlir::gc {
void registerCPUPipeline();
void registerGPUPipeline();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
void registerGPUPipeline();
#ifdef GC_USE_GPU
void registerGPUPipeline();
#endif

@@ -29,6 +30,7 @@ extern "C" {

MLIR_CAPI_EXPORTED void mlirRegisterAllGCPassesAndPipelines() {
registerCPUPipeline();
registerGPUPipeline();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
registerGPUPipeline();
#ifdef GC_USE_GPU
registerGPUPipeline();
#endif

@@ -32,6 +37,7 @@

namespace mlir::gc {
void registerCPUPipeline();
void registerGPUPipeline();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
void registerGPUPipeline();
#ifdef GC_USE_GPU
void registerGPUPipeline();
#endif

PassPipelineRegistration<>("gc-gpu-pipeline",
"The GPU pipeline for Graph Compiler",
populateGPUPipeline);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
}
}
#endif

@@ -145,10 +147,45 @@ void populateCPUPipeline(mlir::OpPassManager &pm) {
populateLLVMPasses(pm);
}

void populateGPUPipeline(mlir::OpPassManager &pm) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
void populateGPUPipeline(mlir::OpPassManager &pm) {
#ifdef GC_USE_GPU
void populateGPUPipeline(mlir::OpPassManager &pm) {

pm.addPass(createGpuGenAttachTarget());
GpuModuleToBinaryPassOptions gpuModuleToBinaryPassOptions;
pm.addPass(createGpuModuleToBinaryPass(gpuModuleToBinaryPassOptions));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
}
}
#endif

void registerCPUPipeline() {
PassPipelineRegistration<>("gc-cpu-pipeline",
"The CPU pipeline for Graph Compiler",
populateCPUPipeline);
}

void registerGPUPipeline() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
void registerGPUPipeline() {
#ifdef GC_USE_GPU
void registerGPUPipeline() {

// gpu.module op
mlir::registerAllToLLVMIRTranslations(registry);
mlir::gen::registerGenTargetInterfaceExternalModels(registry);
mlir::registerGENDialectTranslation(registry);
#ifdef GC_USE_GPU
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, it should be renamed to GC_USE_IMEX here and in all other places.

@@ -0,0 +1,20 @@
add_mlir_dialect_library(MLIRGENDialect

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the directory name be "GEN" instead of "LLVMIR"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the dialect is at the same level as llvmir, this follows the upstream structure.

}

std::optional<SmallVector<char, 0>>
GenSerializer::compileToBinary(const std::string &serializedSPV) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we skip this step and treat the spirv code as the final binary? This can free us from findTool("ocloc") which depends on the environment, and we can pass the compilation issue to OCL runtime.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be controlled by the targetOptions.getCompilationTarget(). The binary generation here is for latency elimination on the first execution when the target arch is known (inference).

@kurapov-peter kurapov-peter force-pushed the pakurapo/gpu-module-legalize branch from e876bb7 to eefddb6 Compare August 5, 2024 09:51
MLIRSupport
MLIRGPUDialect
MLIRTargetLLVM
LLVMSPIRVCodeGen
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is weird, I couldn't reproduce the linking problem locally and I get a runtime problem with static llvm options reinitialization when I include LLVMSPIRVCodeGen as a dependency.

Comment on lines +179 to +180
GpuModuleToBinaryPassOptions gpuModuleToBinaryPassOptions;
pm.addPass(createGpuModuleToBinaryPass(gpuModuleToBinaryPassOptions));
Copy link
Contributor

@dchigarev dchigarev Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which tests did you use to verify the pipeline?

If I try to run the GPU pipeline on this simple matmul test the gpu-module-to-binary pass fails with the following error:

error: LLVM Translation failed for operation: builtin.unrealized_conversion_cast

don't we need to add reconcile-unrealized-casts somewhere?

UPD:
simply adding pm.addPass(createReconcileUnrealizedCastsPass()); before the gpu-to-bin pass didn't help :(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing it on a simple vector add for now. I'll add it once we can execute it. Yes, we will need the reconcile pass for the pipeline to be complete. I'd like to get to an end-to-end working scenario first though.

@leshikus
Copy link
Contributor

leshikus commented Aug 12, 2024

@kurapov-peter What should I change in my mirror branch to ensure GPU is actually accessed?
https://github.com/intel/graph-compiler/actions/runs/10353628700

@lmontigny
Copy link

Let's review this one before end of iteration 4

Copy link

@lmontigny lmontigny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let's have the initial version of the GPU pipeline

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Construct initial GPU pipeline
6 participants