Major refactoring.

- Rename files to make the naming scheme more consistent;
- Move environment variable definitions to the top level;
- Move GBS from the science category to the workflow category;
- Use environment variables to define paths referenced in the documentation.
AgResearch · Jun 19, 2018 · 2e42283
1 parent ce9f45c

Showing 30 changed files with 226 additions and 144 deletions.
diff --git a/README-science-data.md → Benchmark-Input-Data-Packaging.md b/README-science-data.md → Benchmark-Input-Data-Packaging.md
@@ -1,13 +1,10 @@
-# Science Data Tarballs
-
-The science data for the benchmarks lives in the directory pointed to by
-$SCIENCE_DATA_ROOTDIR, which is set in science/science-benchmarks.env
+# Packaging Benchmarks' Input Data
 
 The input data directory (`$INPUT_DATA_ROOT_DIR`) at AgResearch contains many symlinks.  When a tarball
 is made, it should be created with tar's `h` flag so that the files the symlinks point to are archived:
 
 ```
-$ cd $SCIENCE_DATA_ROOTDIR
+$ cd $INPUT_DATA_ROOT_DIR
 $ tar czhf ../data.tgz *
 ```
 
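Because the `h` flag is what turns the symlinks into regular files, a quick post-packaging check can confirm it took effect. The sketch below is an assumption, not part of this repository: `check_no_symlinks` is a hypothetical POSIX sh helper that lists the archive and fails if any entry is still a symbolic link.

```sh
#!/bin/sh
# Sketch only: check_no_symlinks is a hypothetical helper, not part of this repository.
# It fails if a gzipped tarball still contains symbolic-link entries.
check_no_symlinks() {
    tarball=$1
    [ -f "$tarball" ] || { echo >&2 "error: $tarball not found"; return 1; }
    # In a verbose tar listing the first character of each line is the entry
    # type; 'l' marks a symbolic link.
    if tar tvzf "$tarball" | grep -q '^l'; then
        echo >&2 "error: $tarball contains symlinks"
        return 1
    fi
    echo "ok: $tarball contains no symlinks"
}

# Example (assumes benchmark.env has been sourced):
#   cd "$INPUT_DATA_ROOT_DIR" && tar czhf ../data.tgz * && check_no_symlinks ../data.tgz
```

Listing with `tar tvzf` inspects entry types without extracting the (potentially large) archive.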

diff --git a/README.md b/README.md
@@ -10,4 +10,22 @@ This repository includes benchmarks that will be used to ensure the fit for purp
 
 Each individual benchmark has its own README file which describes the purpose of the benchmark, how to run the benchmark and how to verify its output(s).
 
-If a benchmark needs to be built from the source, it should be build and execute in a [Conda](https://conda.io) environment created by using the Conda environment specified in the benchmark's documentation.  This approach ensures a stable, although not necessary optimal, building and executing environment.  If the target platform does not have Conda installed, follow instruction [here](https://conda.io/miniconda.html) to install it on the platform.
+This benchmark suite uses binary distributions from the [Conda](https://conda.io) repositories to deploy benchmark programs.  In such cases, a Conda environment specification file is included in the benchmark's subdirectory; follow the benchmark's README file to deploy the program.  Some benchmark programs must be built from source.  For these, use the Conda environment specification file included with the benchmark to create a Conda environment for building and running the program.  This approach ensures a stable, although not necessarily optimal, environment for building and executing the benchmarks.  If the target platform does not have Conda installed, follow the instructions [here](https://conda.io/miniconda.html) to install it.
+
+## Environment Variables
+
+Update the *BENCHMARK_ROOT* environment variable in the file ```benchmark.env``` included in this repository to suit the target platform's local environment.  This file must be sourced from the root of this repository before deploying and running this benchmark suite, because it sets *BENCHMARK_SOURCE* to the current working directory.
+
+```
+$ source benchmark.env
+```
+
+## Getting and Preparing Input Data
+
+All input data required to execute this benchmark suite can be downloaded from [here](https://url/to/be/confirmed).  Download the tarball into `$BENCHMARK_ROOT` and extract it with the following commands:
+
+```
+$ cd $BENCHMARK_ROOT
+$ wget https://url/to/be/confirmed
+$ tar xzf benchmark_input_data.taz
+```
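Note that the packaging command shown in Benchmark-Input-Data-Packaging.md (`cd $INPUT_DATA_ROOT_DIR; tar czhf ../data.tgz *`) stores members without a leading `benchmark_input_data/` directory. If the downloadable tarball is built that way (not confirmed here), extracting it directly in `$BENCHMARK_ROOT` would place `abyss/` and the other data directories beside `$INPUT_DATA_ROOT_DIR` rather than inside it, where `run-abyss-benchmark` looks for them. A minimal sketch under that assumption, using a hypothetical `extract_input_data` helper:

```sh
#!/bin/sh
# Sketch only: extract_input_data is a hypothetical helper. It assumes the
# tarball was created as described in Benchmark-Input-Data-Packaging.md, i.e.
# its members are stored relative to $INPUT_DATA_ROOT_DIR (abyss/..., etc.).
extract_input_data() {
    tarball=$1
    # Abort with a message if benchmark.env has not been sourced.
    : "${INPUT_DATA_ROOT_DIR:?source benchmark.env first}"
    mkdir -p "$INPUT_DATA_ROOT_DIR" || return 1
    # -C extracts into $INPUT_DATA_ROOT_DIR, where the benchmark scripts look for data.
    tar xzf "$tarball" -C "$INPUT_DATA_ROOT_DIR"
}

# Example:
#   cd "$BENCHMARK_ROOT" && extract_input_data benchmark_input_data.taz
```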
diff --git a/benchmark.env b/benchmark.env
@@ -0,0 +1,55 @@
+# Update this according to local directory structure
+# and source it before running any benchmarks
+
+# root directory of the benchmark source (source this file from the repository root)
+BENCHMARK_SOURCE=$PWD
+export BENCHMARK_SOURCE
+
+# root directory for benchmarks
+BENCHMARK_ROOT=/tmp/benchmarks
+export BENCHMARK_ROOT
+mkdir -p $BENCHMARK_ROOT
+
+# input data for all benchmarks
+INPUT_DATA_ROOT_DIR=$BENCHMARK_ROOT/benchmark_input_data
+export INPUT_DATA_ROOT_DIR
+
+# output directories
+OUTPUT_DATA_ROOT_DIR=$BENCHMARK_ROOT/benchmark_output_data
+export OUTPUT_DATA_ROOT_DIR
+
+## Platform benchmarks
+# IOR benchmark
+
+IOR_CONDA_ENV=$BENCHMARK_ROOT/conda-env/ior
+IOR_CONDA_ENV_SPEC=$BENCHMARK_SOURCE/platform/IOR/ior-conda-env.yml
+export IOR_CONDA_ENV
+export IOR_CONDA_ENV_SPEC
+
+# IOZONE benchmark
+IOZONE_CONDA_ENV=$BENCHMARK_ROOT/conda-env/iozone
+IOZONE_CONDA_ENV_SPEC=$BENCHMARK_SOURCE/platform/IOZONE/iozone-conda-env.yml
+export IOZONE_CONDA_ENV
+export IOZONE_CONDA_ENV_SPEC
+
+## Science benchmarks
+# ABYSS benchmark
+ABYSS_CONDA_ENV=$BENCHMARK_ROOT/conda-env/abyss
+ABYSS_CONDA_ENV_SPEC=$BENCHMARK_SOURCE/science/abyss/abyss-conda-env.yml
+export ABYSS_CONDA_ENV
+export ABYSS_CONDA_ENV_SPEC
+NCORES=20
+export NCORES
+
+# VELVET benchmark
+VELVET_CONDA_ENV=$BENCHMARK_ROOT/conda-env/velvet
+VELVET_CONDA_ENV_SPEC=$BENCHMARK_SOURCE/science/velvet/velvet-conda-env.yml
+export VELVET_CONDA_ENV
+export VELVET_CONDA_ENV_SPEC
+
+## Workflow benchmarks
+# GBS
+GBS_CONDA_ENV=$BENCHMARK_ROOT/conda-env/gbs
+GBS_CONDA_ENV_SPEC=$BENCHMARK_SOURCE/workflow/gbs/gbs-conda-env.yml
+export GBS_CONDA_ENV
+export GBS_CONDA_ENV_SPEC
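Because `BENCHMARK_SOURCE` is set to `$PWD`, sourcing this file from any directory other than the repository root yields `*_CONDA_ENV_SPEC` paths that do not exist, and the mistake only surfaces later at `conda-env create`. A hypothetical POSIX sh check, `check_conda_specs`, could be run right after sourcing; it is a sketch, and its variable list is copied from this file by hand, so it would need updating as benchmarks are added:

```sh
#!/bin/sh
# Sketch only: check_conda_specs is a hypothetical helper. It verifies that every
# *_CONDA_ENV_SPEC variable defined in benchmark.env is set and names an existing file.
check_conda_specs() {
    missing=0
    for name in IOR_CONDA_ENV_SPEC IOZONE_CONDA_ENV_SPEC ABYSS_CONDA_ENV_SPEC \
                VELVET_CONDA_ENV_SPEC GBS_CONDA_ENV_SPEC; do
        # POSIX sh has no ${!name}, so expand the variable named by $name via eval.
        eval "spec=\${$name:-}"
        if [ -z "$spec" ]; then
            echo >&2 "unset: $name (was benchmark.env sourced?)"
            missing=1
        elif [ ! -f "$spec" ]; then
            echo >&2 "missing: $name=$spec"
            missing=1
        fi
    done
    return $missing
}

# Example:
#   . ./benchmark.env && check_conda_specs
```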
diff --git a/platform/IOR/IOR.md b/platform/IOR/IOR.md
@@ -11,27 +11,26 @@ It measures the sustainable bandwidth of a file system  using the various APIs,
 Download the latest release (v3.1.0) from GitHub by cloning the AgResearch repository into `$BENCHMARK_ROOT`.
 
 ```
-mkdir -p /tmp/benchmark/
-cd /tmp/benchmark/
-git clone git@github.com:AgResearch/ior.git
-
+$ cd $BENCHMARK_ROOT
+$ git clone git@github.com:AgResearch/ior.git
 ```
 
-Create a Conda environment based on the provided environment file and then activate the environment before building and running the benchmark.  For example:
+Create a Conda environment based on the provided environment specification file and then activate the environment before building and running the benchmark.  For example:
 
 ```
-conda-env conda-env create -p /tmp/benchmark/ior-env -f <path>/ior-conda-env.yml
-source activate /tmp/benchmark/ior-env
+$ mkdir -p $IOR_CONDA_ENV
+$ conda-env create -p $IOR_CONDA_ENV -f $IOR_CONDA_ENV_SPEC
+$ source activate $IOR_CONDA_ENV
 ```
 
 Use the following commands to change into the `ior` directory and build it.
 
 ```
-cd ior
-./bootstrap
-./configure --prefix=$PWD
-make
-make install
+$ cd $BENCHMARK_ROOT/ior
+$ ./bootstrap
+$ ./configure --prefix=$CONDA_PREFIX
+$ make
+$ make install
 ```
 
 ## Execution
@@ -41,9 +40,8 @@ make install
 Run IOR to benchmark the performance of a single process writing a file and then reading it back sequentially. The following commands serve as an example; you may need to customise them for the benchmarking platform.
 
 ```
-cd /tmp/benchmark/ior/bin
-source activate /tmp/benchmark/ior-env
-./ior -a POSIX -w -r -e -b <block_size> -o <path_to_target_filesystem>\ior_seq_test
+$ source activate $IOR_CONDA_ENV
+$ ior -a POSIX -w -r -e -b <block_size> -o <path_to_target_filesystem>/ior_seq_test
 ```
 
 Where ```<block_size>``` should be at least twice the memory size of the compute node where the benchmark is executed, and ```<path_to_target_filesystem>``` is the path to the target filesystem being benchmarked.
@@ -53,12 +51,13 @@ Where ```<block_size>``` should be at least twice as large as the size of the co
 Run IOR tests concurrently to benchmark the performance of a filesystem on a compute node. The following is an example bash script for this test, although it may need to be customised for the benchmarking platform.
 
 ```
+source activate $IOR_CONDA_ENV
 echo "Preparing testing data..."
-./ior -a POSIX -w -e -k -b <block_size> -o <path_to_target_filesystem>/ior_rw_test > ./ior_concurent.out
+ior -a POSIX -w -e -k -b <block_size> -o <path_to_target_filesystem>/ior_rw_test > ./ior_concurent.out
 echo "Starting Concurrent Read..."
-./ior -a POSIX -r -b <block_size> -o <path_to_target_filesystem>/ior_rw_test > ./ior_concurent_r.out&
+ior -a POSIX -r -b <block_size> -o <path_to_target_filesystem>/ior_rw_test > ./ior_concurent_r.out&
 echo "Starting Concurrent Write..."
-./ior -a POSIX -w -e -b <block_size> -o <path_to_target_filesystem>/ior_rw_test2 > ./ior_concurent_w.out
+ior -a POSIX -w -e -b <block_size> -o <path_to_target_filesystem>/ior_rw_test2 > ./ior_concurent_w.out
 echo "Done!"
 ```
 
@@ -69,9 +68,8 @@ Where ```<block_size>``` should be at least twice as large as the size of the co
 Run IOR as an MPI program to benchmark the write and read performance of a platform's filesystem.  The following commands serve as an example; you may need to customise them for the benchmarking platform.
 
 ```
-cd /tmp/benchmark/ior/bin
-source activate /tmp/benchmark/ior-env
-mpirun -np <num_tasks> -N <num_tasks_per_node> ./ior -a MPIIO -w -r -N <num_tasks> -b <block_size> -o <path_to_target_filesystem>\ior_seq_test
+$ source activate $IOR_CONDA_ENV
+$ mpirun -np <num_tasks> -N <num_tasks_per_node> ior -a MPIIO -w -r -N <num_tasks> -b <block_size> -o <path_to_target_filesystem>/ior_seq_test
 ```
 
 Where ```<num_tasks>``` should be large enough to create sufficient load to test the aggregate bandwidth of the specified filesystem, ```<num_tasks_per_node>``` is the number of tasks to run on each allocated node, ```<block_size>``` times ```<num_tasks_per_node>``` should be twice the memory size of the compute node where the benchmark is executed, and ```<path_to_target_filesystem>``` is the path to the target filesystem being benchmarked.
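The sizing rule above can be computed rather than estimated. The sketch below is an assumption, not part of the benchmark: the hypothetical `ior_block_size` helper takes the node memory in KiB (on Linux, the `MemTotal` value in `/proc/meminfo`) and the number of tasks per node, and prints a per-task block size in MiB using IOR's `m` suffix. Passing 1 task per node gives the value for the single-process tests.

```sh
#!/bin/sh
# Sketch only: ior_block_size is a hypothetical helper, not part of IOR.
# Prints a per-task IOR block size so that block_size x tasks_per_node is
# about twice the node's memory.
ior_block_size() {
    mem_kib=$1          # node memory in KiB, e.g. MemTotal from /proc/meminfo
    tasks_per_node=$2   # use 1 for the single-process tests
    # 2 x memory, converted from KiB to MiB, divided across the tasks on the node
    echo "$(( mem_kib * 2 / 1024 / tasks_per_node ))m"
}

# Example on a Linux node:
#   mem_kib=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
#   ior -a POSIX -w -r -e -b "$(ior_block_size "$mem_kib" 1)" -o <path_to_target_filesystem>/ior_seq_test
```

Rounding down to whole MiB keeps the result a multiple of IOR's default 256 KiB transfer size, since IOR expects the block size to be a multiple of the transfer size; if `-t` is set larger, the value may need rounding to that instead.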
diff --git a/platform/IOZONE/IOZONE.md b/platform/IOZONE/IOZONE.md
@@ -11,17 +11,27 @@ This benchmark is used to measure the performance of the platform's filesystem.
 The benchmark (v3-471) can be downloaded from http://www.iozone.org/src/current/iozone3_471.tar
 
 ```
-wget http://www.iozone.org/src/current/iozone3_471.tar
+$ cd $BENCHMARK_ROOT
+$ wget http://www.iozone.org/src/current/iozone3_471.tar
 ```
 
-Once the file is downloaded, navigate to the directory where the downloaded file is store and use the following instructions to build it.  A C compiler and make is required to build it.
+Create a Conda environment based on the provided environment file and then activate the environment before building and running the benchmark.  For example:
 
 ```
-tar xf iozone3_471.tar
-cd iozone3_471/src
-make
+$ mkdir -p $IOZONE_CONDA_ENV
+$ conda-env create -p $IOZONE_CONDA_ENV -f $IOZONE_CONDA_ENV_SPEC
+$ source activate $IOZONE_CONDA_ENV
+```
+
+In the directory where the tarball was downloaded, use the following commands to build IOzone inside the activated Conda environment.
+
+```
+$ tar xf iozone3_471.tar
+$ cd iozone3_471/src/current
+$ make
 # make will display a list of supported platforms.  Pick the one that matches the testing platform.
-make <target>
+$ make <target>
+$ cp ./iozone $IOZONE_CONDA_ENV/bin
 ```
 
 ## Execution
@@ -35,7 +45,8 @@ Use the following command to test the performance of a specified file system on
 The test produces output covering all tested file operations for record sizes from 4k to 16M and file sizes from 64k up to a specified maximum, which should be twice the memory size of the node where the benchmark is run.  The output is also stored in an Excel file called IOZone_results.xls.
 
 ```
-./iozone -az -i 0 -i 1 -i 2 –c –e -b IOZone_results.xls \
+$ source activate $IOZONE_CONDA_ENV
+$ iozone -az -i 0 -i 1 -i 2 -c -e -b IOZone_results.xls \
          -f <path_to_target_filesystem>/iozone.tmp \
          -y 4 -q 16m \
          -g <max file size>

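The IOzone `-g` value follows the same twice-the-memory rule. As a sketch only, assuming a Linux node, the hypothetical `iozone_max_file_size` helper reads `MemTotal` (in kB) and prints the limit in MiB with IOzone's `m` suffix, the same suffix the command above uses for `-q 16m`:

```sh
#!/bin/sh
# Sketch only: iozone_max_file_size is a hypothetical helper. It reads MemTotal
# (in kB) from a /proc/meminfo-style file and prints twice that amount in MiB,
# for use as IOzone's -g argument.
iozone_max_file_size() {
    meminfo=${1:-/proc/meminfo}
    awk '/^MemTotal:/ { printf "%dm\n", $2 * 2 / 1024 }' "$meminfo"
}

# Example (assumes the IOzone Conda environment is active):
#   iozone -az -i 0 -i 1 -i 2 -c -e -b IOZone_results.xls \
#          -f <path_to_target_filesystem>/iozone.tmp \
#          -y 4 -q 16m -g "$(iozone_max_file_size)"
```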
diff --git a/platform/IOZONE/iozone-conda-env.yml b/platform/IOZONE/iozone-conda-env.yml
@@ -0,0 +1,14 @@
+name: iozone-env
+channels:
+  - bioconda
+  - conda-forge
+  - defaults
+  - r
+dependencies:
+  - gmp=6.1.2=0
+  - mpc=1.1.0=4
+  - mpfr=3.1.5=0
+  - cloog=0.18.0=0
+  - gcc=4.8.5=7
+  - isl=0.12.2=0
+prefix: /tmp/iozone
diff --git a/platform/MDTEST/MDTEST.md b/platform/MDTEST/MDTEST.md
@@ -19,9 +19,8 @@ Benchmark the performance of a specified filesystem by creating 1,048,576 (1024x
 The following example launches a test on a single compute node that creates, stats and then removes the required files.
 
 ```
-cd /tmp/benchmark/ior/bin
-source activate /tmp/benchmark/ior-env
-./mdtest -F -C -T -r -n 1048576 -d <path_to_target_filesystem>
+$ source activate $IOR_CONDA_ENV
+$ mdtest -F -C -T -r -n 1048576 -d <path_to_target_filesystem>
 ```
 
 Where ```<path_to_target_filesystem>``` is the path to the target filesystem being benchmarked.
@@ -31,9 +30,8 @@ Where, ``` <path_to_target_filesystem>``` is the path to the target filesystem t
 The following example launches a test on a group of nodes that creates, stats and then removes the required files.
 
 ```
-cd /tmp/benchmark/ior/bin
-source activate /tmp/benchmark/ior-env
-mpirun -np <num_tasks> -N <num_tasks_per_node> ./mdtest -F -C -T -r -n <1048576/<num_tasks>> -d <path_to_target_filesystem> -N <num_tasks_per_node>
+$ source activate $IOR_CONDA_ENV
+$ mpirun -np <num_tasks> -N <num_tasks_per_node> mdtest -F -C -T -r -n <1048576/<num_tasks>> -d <path_to_target_filesystem> -N <num_tasks_per_node>
 ```
 
 Where ```<num_tasks>``` should be large enough to stress the metadata operations of the specified filesystem, ```<num_tasks_per_node>``` is the number of tasks to run on each allocated node, and ```<path_to_target_filesystem>``` is the path to the target filesystem being benchmarked.
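The per-task file count `<1048576/<num_tasks>>` can be derived in the launch script. The hypothetical `mdtest_items_per_task` helper below is a sketch, not part of the benchmark; it also warns when the division is not exact, because the total number of files created would then fall short of 1,048,576:

```sh
#!/bin/sh
# Sketch only: mdtest_items_per_task is a hypothetical helper that splits the
# 1,048,576 files of this benchmark evenly across <num_tasks> MPI tasks.
mdtest_items_per_task() {
    total=1048576
    num_tasks=$1
    case $num_tasks in
        ''|*[!0-9]*|0) echo >&2 "error: num_tasks must be a positive integer"; return 1 ;;
    esac
    if [ $(( total % num_tasks )) -ne 0 ]; then
        echo >&2 "warning: $total is not divisible by $num_tasks; fewer files will be created in total"
    fi
    echo $(( total / num_tasks ))
}

# Example:
#   n=64
#   mpirun -np "$n" -N <num_tasks_per_node> mdtest -F -C -T -r \
#       -n "$(mdtest_items_per_task "$n")" -d <path_to_target_filesystem> -N <num_tasks_per_node>
```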
diff --git a/science/README.md b/science/README.md
diff --git a/science/abyss/abyss.md b/science/abyss/abyss.md
@@ -1,28 +1,27 @@
 # ABySS
 
-## Purpose
-
 ABySS is a de novo, parallel, paired-end sequence assembler.
 
+## Purpose
+TBD
+
 ## Installation
 
-Once the science datasets have been unpacked, and `../science-benchmarks.env`
-has been updated appropriately:
+Create a Conda environment based on the provided environment specification file and then activate the environment before building and running the benchmark. 
 
 ```
-$ conda env create -f abyss-conda-env.yml
+$ mkdir -p $ABYSS_CONDA_ENV
+$ conda-env create -p $ABYSS_CONDA_ENV -f $ABYSS_CONDA_ENV_SPEC
+$ source activate $ABYSS_CONDA_ENV
 ```
 
-### Sample data [optional]
-
-$SAMPLE_DATA_ROOT/VELVET/*.fastq.gz
-
 ## Execution
 
+Activate the Conda environment created for this benchmark, then run the shell script *run-abyss-benchmark* to launch the benchmark.
+
 ```
 $ source activate $ABYSS_CONDA_ENV
-$ . ../science-benchmarks.env
-$ ./run-abyss-benchmark
+$ $BENCHMARK_SOURCE/science/abyss/run-abyss-benchmark
 ```
 
 ### Output verification

diff --git a/science/abyss/run-abyss-benchmark b/science/abyss/run-abyss-benchmark
@@ -1,12 +1,12 @@
 #!/bin/sh
 # needs to run in the Conda environment $ABYSS_CONDA_ENV
 
-test -n "$SCIENCE_DATA_ROOTDIR" || {
-    echo >&2 "fatal error: missing environment variable SCIENCE_DATA_ROOTDIR - source the top-level environment file"
+test -n "$INPUT_DATA_ROOT_DIR" || {
+    echo >&2 "fatal error: missing environment variable INPUT_DATA_ROOT_DIR - source the top-level environment file"
     exit 1
 }
-test -n "$SCIENCE_OUTPUT_ROOTDIR" || {
-    echo >&2 "fatal error: missing environment variable SCIENCE_OUTPUT_ROOTDIR - source the top-level environment file"
+test -n "$OUTPUT_DATA_ROOT_DIR" || {
+    echo >&2 "fatal error: missing environment variable OUTPUT_DATA_ROOT_DIR - source the top-level environment file"
     exit 1
 }
 
@@ -15,8 +15,8 @@ test -n "$NCORES" -a "$NCORES" -ge 1 -a "$NCORES" -le 100 || {
     exit 1
 }
 
-datadir=$SCIENCE_DATA_ROOTDIR/abyss
-outdir=$SCIENCE_OUTPUT_ROOTDIR/abyss
+datadir=$INPUT_DATA_ROOT_DIR/abyss
+outdir=$OUTPUT_DATA_ROOT_DIR/abyss
 test -d "$datadir" || {
     echo >&2 "fatal error: missing data directory $datadir - unpack the data tarball"
     exit 1

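The script's `NCORES` check chains its tests with `-a`, which POSIX marks as obsolescent, and a non-numeric value makes `-ge` fail with a shell error before the script's own message is printed. A possible stricter check, sketched as a hypothetical `valid_ncores` function that rejects anything but an integer from 1 to 100 before doing the numeric comparison:

```sh
#!/bin/sh
# Sketch only: valid_ncores is a hypothetical replacement for the NCORES check in
# run-abyss-benchmark. It accepts a decimal integer between 1 and 100.
valid_ncores() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    esac
    # Only all-digit strings reach the numeric comparisons.
    [ "$1" -ge 1 ] && [ "$1" -le 100 ]
}

# Example, mirroring the script's existing error handling
# (the message text here is illustrative):
#   valid_ncores "$NCORES" || {
#       echo >&2 "fatal error: NCORES must be an integer between 1 and 100"
#       exit 1
#   }
```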
diff --git a/science/science-benchmarks.env b/science/science-benchmarks.env
diff --git a/science/tassel3-kgd/README.md b/science/tassel3-kgd/README.md