SetUp
- Download the attachment HowToOptimizeGemm.tar.gz (make sure the file is stored as HowToOptimizeGemm.tar.gz)
- Uncompress by executing gunzip HowToOptimizeGemm.tar.gz
- Expand the tar file by executing tar xf HowToOptimizeGemm.tar
- Change into the newly created directory by executing cd HowToOptimizeGemm
In the directory HowToOptimizeGemm you will find the following files
that you will use to systematically optimize the matrix-matrix multiplication
operation:
- makefile: The makefile that describes how to compile, link, and execute the driver/implementations. When you type `make', this file is consulted and the commands in it are executed. Note: there are "tab" characters in the makefile. These are important: make requires each command line in a rule to begin with a tab.
- Test driver
  - parameters.h: File that holds parameters that control what data is collected
  - test_MMult.c: Driver routine that tests and times the different implementations. This routine executes a reference implementation and the current optimization to be timed. Parameters for this routine are initialized in parameters.h. In particular, that file indicates how many times to repeat each experiment (problem size) and how each of the three dimensions m, n, and k is tied to the problem size being timed.
- Matrix multiplication implementations
  - REF_MMult.c: Reference implementation used to check correctness
  - MMult0.c: Version 0: simplest implementation
  - MMult1.c: Optimization 1
- Utility routines
  - compare_matrices.c: Compares the contents of two matrices and returns the maximum absolute difference
  - copy_matrix.c: Copies one matrix to another
  - dclock.c: Returns elapsed time in seconds
  - random_matrix.c: Generates a random matrix
  - print_matrix.c: Prints the contents of a matrix
- Plotting the results
  - PlotAll.m: Plots graphs corresponding to the data in files output_old.m and output_new.m
  - proc_parameters.m: File in which parameters about the architecture are given
These last two files allow one to use Octave to plot the performance of two different implementations.
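To make the roles of the utility routines listed above more concrete, here is a minimal sketch of what such helpers might look like. It is not the code shipped in the tarball: the function names (wall_seconds, fill_random, copy_mat, max_abs_diff), the ELEM macro, the column-major storage, and the use of gettimeofday are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/time.h>

/* Element (i,j) of a column-major matrix with leading dimension ld.
   Column-major storage is an assumption of this sketch. */
#define ELEM(a, ld, i, j) ((a)[(size_t)(j) * (size_t)(ld) + (size_t)(i)])

/* Wall-clock time in seconds (the role played by dclock.c). */
double wall_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + 1.0e-6 * (double)tv.tv_usec;
}

/* Fill an m x n matrix with pseudo-random values in [-1, 1]
   (the role played by random_matrix.c). */
void fill_random(int m, int n, double *a, int lda)
{
    assert(lda >= m);
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++)
            ELEM(a, lda, i, j) = 2.0 * ((double)rand() / (double)RAND_MAX) - 1.0;
}

/* Copy the m x n matrix a into b (the role played by copy_matrix.c). */
void copy_mat(int m, int n, const double *a, int lda, double *b, int ldb)
{
    assert(lda >= m && ldb >= m);
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++)
            ELEM(b, ldb, i, j) = ELEM(a, lda, i, j);
}

/* Largest absolute elementwise difference between a and b
   (the role played by compare_matrices.c). */
double max_abs_diff(int m, int n, const double *a, int lda,
                    const double *b, int ldb)
{
    double max_diff = 0.0;
    assert(lda >= m && ldb >= m);
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++) {
            double d = ELEM(a, lda, i, j) - ELEM(b, ldb, i, j);
            if (d < 0.0)
                d = -d;
            if (d > max_diff)
                max_diff = d;
        }
    return max_diff;
}
```

A driver in the spirit of test_MMult.c would call wall_seconds() immediately before and after the routine being timed, and use max_abs_diff to compare that routine's result against the result of the reference implementation.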
In this exercise, we use the gcc (GNU C) compiler with optimization level -O2. (See the makefile.)
This is neither the best compiler nor the best optimization level. We chose them because this compiler and
level of optimization give us a certain level of control:
- Had we used the Intel compiler, the "simple loops" in MMult0.c would probably have yielded quite good performance: this compiler is very good at optimizing a triple loop. Try it!
- Had we used -O3 (optimization level 3), the GNU compiler would have optimized more aggressively, making the step-by-step optimizations we demonstrate much less predictable.
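For readers who have not yet opened MMult0.c, the "simple loops" referred to above have roughly the following shape. This is a sketch under stated assumptions, not a copy of the file: the function name simple_mmult, the column-major access macros, the argument list, and the choice to accumulate into C are illustrative and may differ from the actual source.

```c
#include <assert.h>

/* Column-major access macros (an assumed storage convention). */
#define A(i, j) a[(j) * lda + (i)]
#define B(i, j) b[(j) * ldb + (i)]
#define C(i, j) c[(j) * ldc + (i)]

/* Hypothetical triple-loop multiply: C := A * B + C,
   where A is m x k, B is k x n, and C is m x n. */
void simple_mmult(int m, int n, int k,
                  const double *a, int lda,
                  const double *b, int ldb,
                  double *c, int ldc)
{
    assert(lda >= m && ldb >= k && ldc >= m);
    for (int i = 0; i < m; i++)             /* rows of C */
        for (int j = 0; j < n; j++)         /* columns of C */
            for (int p = 0; p < k; p++)     /* inner (dot) product */
                C(i, j) += A(i, p) * B(p, j);
}
```

Nothing in this loop nest is tuned by hand, so the performance it achieves depends heavily on how much restructuring the compiler does on its own, which is why the compiler and optimization level matter for this exercise.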