This is a version optimized by the SCC team from Nanyang Technological University for the ISC17 Student Cluster Competition.
Thanks to Shao Yiyang and Lu Shengliang for helping me fix bugs while optimizing and porting the code.
Make sure you have the following dependencies:
- MAGMA (without OpenMP)
- Intel compilers and MPI
- CUDA (with Fortran thunking cuBLAS interface)
- Nvidia MPS server
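
Before building, it can help to confirm that the tools behind these dependencies are on PATH. The following is a minimal sketch, not part of the original code: `missing_tools` is a hypothetical helper, and the command names in the example line (`ifort`, `mpirun`, `nvcc`, `nvidia-smi`, `nvidia-cuda-mps-control`) are assumptions about a typical install that may differ on your system. MAGMA is a library rather than a command, so it is not covered here.

```shell
#!/bin/sh
# Hypothetical helper: print every command from the argument list
# that cannot be found on PATH. Returns 0 if all were found, 1 otherwise.
missing_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      status=1
    fi
  done
  return "$status"
}

# Example call (tool names are assumptions, adjust to your install):
#   missing_tools ifort mpirun nvcc nvidia-smi nvidia-cuda-mps-control
```

Run the example call after loading the modules and sourcing the Intel environment, since both change PATH.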
 
Go into the src folder:
$ module load CUDA OpenMPI
$ source /opt/intel/compilers_and_libraries/linux/bin/compilervars.sh intel64
$ cp ../fortran_thunking.o .
$ make comp=intel # make sure you have fortran_thunking.o

Optimization macros
- __CUDA: enables CUDA-based optimization
- __MAGMA: enables MAGMA for diagonalization
- __CUBLAS: enables cuBLAS for ZGEMM
- __NONBLOCKING_FFT: enables non-blocking fft_scatter
- __ZHEGVD: enables the MAGMA call to magmaf_zhegvd

Note: OpenMP seems to be problematic; please disable OpenMP.
 
Go into the benchmark folder:
$ sudo nvidia-smi -c 3
$ sudo nvidia-cuda-mps-control -d
$ mpirun -np 88 -ppn 44 -hosts compute0,compute1 bash run.sh
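
The commands above put the GPUs into exclusive-process compute mode (`-c 3`) and start the MPS control daemon, and both settings stay in effect after the benchmark ends. The sketch below shows one way to undo them afterwards. It is an assumption about a sensible cleanup, not something the original workflow prescribes. `run`, `stop_mps`, and the `DRY_RUN` variable are hypothetical names introduced here; the two underlying commands are the standard MPS shutdown (`quit` sent to `nvidia-cuda-mps-control`) and the default compute mode (`nvidia-smi -c 0`).

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    sh -c "$*"
  fi
}

# Hypothetical cleanup: stop the MPS daemon, then restore the compute mode.
stop_mps() {
  run "echo quit | sudo nvidia-cuda-mps-control"  # ask the MPS daemon to exit
  run "sudo nvidia-smi -c 0"                      # compute mode back to DEFAULT
}
```

Call `stop_mps` on each compute node once `mpirun` has returned. Setting `DRY_RUN=1` first prints the two commands without executing them.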