MatRIS - A Math library using IRIS

MatRIS is a truly heterogeneous BLAS+LAPACK library packaged with compute-unit-specific optimized BLAS libraries provided by different vendors. BLAS and LAPACK are core computational components of high performance computing (HPC) and machine learning applications. Chip vendors and researchers have provided hand-optimized BLAS kernels for specific compute units. Although many implementations of BLAS kernels exist, application developers typically hard-code compute-unit-specific BLAS function calls in their applications, which leaves little room to choose among the underlying heterogeneous compute units on the fly. Some state-of-the-art work wraps the BLAS library APIs for portability so that the BLAS library can be selected at compile time, but it lacks run-time selection of the library function for different compute units. MatRIS provides BLAS APIs written with an extended IRIS framework that supports vendor-specific library function calls. The MatRIS BLAS functions call the appropriate compute-unit-specific BLAS library function based on the run-time task mapping of heterogeneous cores. We demonstrate the proof of concept with example DGEMM/SGEMM BLAS library functions using OpenBLAS, cuBLAS, CLBlast, XilinxBLAS, and hipBLAS implementations, with run-time selection of the compute unit. We show the performance metrics collected by running GEMM on heterogeneous compute units.

MatRIS - Directory Structure

MatRIS

|--- src
  |--- algorithms
    |--- matris_algorithm.h
    |--- blas
      |--- matris_algo_gemm.c
      |--- matris_algo_<method>_gemm.c (if multiple methods exist)
      |--- node
       |--- matris_node_gemm.c
    |--- lapack
      |--- matris_algo_getrf_no_pivot.c
      |--- node
       |--- matris_node_getrf.c
    |--- utility
      |--- matris_algo_util_tile2flat.c
      |--- matris_algo_util_flat2tile.c

  |--- kernel
    |--- matris_kernel.h
    |--- blas
      |--- matris_kernel_blas.h
      |--- cuBLAS
        |--- L3
          |--- matris_cuda_gemm_kernel.c
      |--- L3
        |--- matris_gemm_kernel.h
    |--- lapack
      |--- matris_kernel_lapack.h
      |--- cuBLAS
        |--- L3
          |--- matris_cuda_getrf_kernel.c
      |--- L3
        |--- matris_blas_getrf.h
  |--- utils
    |--- deffe
    |--- scripts
  |--- tests
    |--- matris_test_getrf.c
    |--- matris_test_gemm.c

Heterogeneous MatRIS BLAS APIs with host address pointers

void init_matris_blas(int argc, char **argv);
void finalize_matris_blas();
void set_matris_blas_target(int target);
int get_matris_blas_target();
void matris_core_sgemm(...);
void matris_core_dgemm(...);
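
The set-target/get-target pair above suggests a run-time dispatch pattern. The following is a minimal sketch of that idea only, not the MatRIS implementation: MatRIS maps work through IRIS tasks, whereas this sketch uses a plain function-pointer table. All names (`sketch_set_target`, `sketch_get_target`, `sketch_sgemm`, the `TARGET_*` ids) and the simplified argument list are hypothetical, and both backends are host-side stand-ins for where a vendor library call would go.

```c
#include <assert.h>

/* Hypothetical target ids for this sketch; the real MatRIS values may differ. */
enum { TARGET_CPU = 0, TARGET_GPU = 1, TARGET_COUNT = 2 };

/* Signature shared by every backend here: C = A * B for n x n row-major matrices. */
typedef void (*sgemm_fn)(int n, const float *A, const float *B, float *C);

/* Stand-in for a CPU library call (where e.g. OpenBLAS would be invoked). */
static void sgemm_cpu_stub(int n, const float *A, const float *B, float *C) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            float acc = 0.0f;
            for (int k = 0; k < n; k++)
                acc += A[i * n + k] * B[k * n + j];
            C[i * n + j] = acc;
        }
}

/* Stand-in for a GPU library call (where e.g. cuBLAS or hipBLAS would be
   invoked). It runs on the host; only the loop order differs. */
static void sgemm_gpu_stub(int n, const float *A, const float *B, float *C) {
    for (int i = 0; i < n * n; i++)
        C[i] = 0.0f;
    for (int i = 0; i < n; i++)
        for (int k = 0; k < n; k++)
            for (int j = 0; j < n; j++)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}

/* Dispatch table indexed by target id, plus the currently selected target. */
static const sgemm_fn backends[TARGET_COUNT] = { sgemm_cpu_stub, sgemm_gpu_stub };
static int current_target = TARGET_CPU;

void sketch_set_target(int target) {
    assert(target >= 0 && target < TARGET_COUNT);
    current_target = target;
}

int sketch_get_target(void) {
    return current_target;
}

/* The caller never names a backend; the choice is made at run time. */
void sketch_sgemm(int n, const float *A, const float *B, float *C) {
    backends[current_target](n, A, B, C);
}
```

With this shape, application code calls `sketch_sgemm` unchanged and switches compute units by calling `sketch_set_target` beforehand, which is the property the API list above is designed around.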

Set up toolchains

Set up NVIDIA toolchain

Set the environment variables for the NVIDIA toolchain (adjust the paths to match your installation)

export NVIDIA_PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/21.7
# export MATH_LIBS=${NVIDIA_PATH}/math_libs   # superseded by the more specific path on the next line
export MATH_LIBS=${NVIDIA_PATH}/math_libs/11.4/targets/x86_64-linux/
export PATH=$NVIDIA_PATH/cuda/bin/:$PATH
export LD_LIBRARY_PATH=$NVIDIA_PATH/cuda/lib64/:$MATH_LIBS/lib:$LD_LIBRARY_PATH

Set up AMD toolchain

Set the environment variables for the AMD toolchain (adjust the paths to match your installation)

# export ROCMLOC=/opt/rocm-4.1.0   # alternative: pin a specific ROCm version instead of /opt/rocm
export ROCMLOC=/opt/rocm
export HIP_CLANG_PATH=$ROCMLOC/llvm/bin/
export ROCM_PATH=$ROCMLOC
export HIPCC="$ROCMLOC/bin/hipcc"
export HIP_CFLAGS="-std=c++11 -I$ROCMLOC/hip/include -I$ROCMLOC/hsa/include"
export HIP_LDFLAGS="-L$ROCMLOC/hip/lib -lamdhip64 -L$ROCMLOC/hsa/lib -lhsa-runtime64 -L$ROCMLOC/lib64 -L$ROCMLOC/lib -lamd_comgr -lhsakmt -Wl,-rpath=/$ROCMLOC/lib,--enable-new-dtags"
export PATH=$ROCMLOC/hip/bin:$ROCMLOC/bin:$ROCMLOC/opencl/bin:$PATH
export LD_LIBRARY_PATH=$ROCMLOC/lib:$ROCMLOC/lib64:$LD_LIBRARY_PATH

Build BLAS libraries

The steps below install the vendor-specific BLAS libraries and set the required environment variables.

CLBlast

git clone https://github.com/CNugteren/CLBlast.git 
cd CLBlast
mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=$PWD/../install
make -j install
cd ../install
export CLBLAST=$PWD
export LD_LIBRARY_PATH=$CLBLAST/lib:$LD_LIBRARY_PATH

OpenBLAS

git clone https://github.com/xianyi/OpenBLAS.git
cd OpenBLAS 
mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=$PWD/../install -DBUILD_SHARED_LIBS=ON
make -j install
cd ../install
export OPENBLAS=$PWD
export LD_LIBRARY_PATH=$OPENBLAS/lib:$LD_LIBRARY_PATH

hipBLAS

git clone https://github.com/ROCmSoftwarePlatform/hipBLAS.git
cd hipBLAS
mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=$PWD/../install -DBUILD_SHARED_LIBS=ON
make -j install
cd ../install
export HIPBLAS=$PWD
export LD_LIBRARY_PATH=$HIPBLAS/lib:$LD_LIBRARY_PATH

Build IRIS framework

git clone https://github.com/ORNL/iris.git
cd iris
mkdir build
cd build
cmake ../ -DCMAKE_INSTALL_PREFIX=$PWD/../install -DAUTO_FLUSH=ON -DAUTO_PARALLEL=ON
make -j install
cd ..
source install/setup.source

Build MatRIS BLAS Library

To enable CUDA compute, use '-DUSE_CUDA=ON'. To enable AMD compute, use '-DUSE_HIP=ON'.

mkdir build
cd build
cmake ../ -DCMAKE_INSTALL_PREFIX=$PWD/../install -DUSE_CUDA=ON 
make -j install
cd ..
source install/setup.source

Run MatRIS BLAS test cases

Run SGEMM with input matrix size 8 (first argument); the second argument selects the target, 0 for CPU and 1 for GPU

cd tests
make 
export IRIS_ARCHS=openmp;  ./sgemm.x 8 0
export IRIS_ARCHS=cuda;    ./sgemm.x 8 1
export IRIS_ARCHS=hip;     ./sgemm.x 8 1
export IRIS_ARCHS=opencl;  ./sgemm.x 8 1

Run DGEMM with input matrix size 8 (first argument); the second argument selects the target, 0 for CPU and 1 for GPU

cd tests
make 
export IRIS_ARCHS=openmp;  ./dgemm.x 8 0
export IRIS_ARCHS=cuda;    ./dgemm.x 8 1
export IRIS_ARCHS=hip;     ./dgemm.x 8 1
export IRIS_ARCHS=opencl;  ./dgemm.x 8 1

Run the multi-task DGEMM example with input matrix size 8, target CPU or GPU, and 2 tiles per dimension

cd install/bin/
export IRIS_ARCHS=openmp;  ./dbig_gemm.x 8 0 2   # matrix size 8, target CPU, 2 tiles per dimension: a 2x2 decomposition where each tile is 4x4
export IRIS_ARCHS=cuda;    ./dbig_gemm.x 8 1 2
export IRIS_ARCHS=hip;     ./dbig_gemm.x 8 1 2
export IRIS_ARCHS=opencl;  ./dbig_gemm.x 8 1 2
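
The tile arithmetic in the example above (matrix size 8, 2 tiles per dimension, so a 2x2 grid of 4x4 tiles) can be sketched as a standard tiled GEMM. This is an illustration under assumptions, not the MatRIS code: the function names are hypothetical, the loop runs serially on the host, and it is an assumption that each tile product corresponds to one task that a runtime such as IRIS could schedule.

```c
#include <assert.h>

/* Accumulate one tile product: C(ti,tj) += A(ti,tk) * B(tk,tj).
   Matrices are n x n, row-major; each tile is ts x ts. */
static void tile_gemm_acc(int n, int ts, const double *A, const double *B,
                          double *C, int ti, int tj, int tk) {
    for (int i = ti * ts; i < (ti + 1) * ts; i++)
        for (int j = tj * ts; j < (tj + 1) * ts; j++) {
            double acc = 0.0;
            for (int k = tk * ts; k < (tk + 1) * ts; k++)
                acc += A[i * n + k] * B[k * n + j];
            C[i * n + j] += acc;
        }
}

/* Compute C = A * B by tiles. Returns the number of tile products issued,
   or -1 if n is not evenly divisible by the number of tiles per dimension. */
int tiled_dgemm(int n, int tiles, const double *A, const double *B, double *C) {
    if (tiles <= 0 || n % tiles != 0)
        return -1;
    int ts = n / tiles;            /* tile edge length, e.g. 8 / 2 = 4 */
    for (int i = 0; i < n * n; i++)
        C[i] = 0.0;                /* tiles accumulate into C, so start from zero */
    int tasks = 0;
    for (int ti = 0; ti < tiles; ti++)
        for (int tj = 0; tj < tiles; tj++)
            for (int tk = 0; tk < tiles; tk++) {
                /* In a task-based runtime, each call here would be one task. */
                tile_gemm_acc(n, ts, A, B, C, ti, tj, tk);
                tasks++;
            }
    return tasks;
}
```

For `n = 8` and `tiles = 2`, the sketch issues `tiles * tiles * tiles` tile products. Products that write to different output tiles are independent of each other, which is what makes the decomposition suitable for spreading across several compute units.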

Cite this

Monil MAH, Miniskar NR, Teranishi K, Vetter JS, Valero-Lara P. MatRIS: Multi-level Math Library Abstraction for Heterogeneity and Performance Portability using IRIS Runtime. In Proceedings of the SC'23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis 2023 Nov 12 (pp. 1081-1092).

Contributors

  1. Narasinga Rao Miniskar
  2. Monil Mohammad Alaul Haque
  3. Pedro Valero-Lara
