Major refactoring.
- Renaming files to make naming scheme more consistent;
- Moving environment variable definitions to top level;
- Move GBS science to workflow category;
- Use environment variables to define paths referenced in the
   documentation.
Dan Sun authored and Dan Sun committed Jun 19, 2018
1 parent ce9f45c commit 2e42283
Showing 30 changed files with 226 additions and 144 deletions.
7 changes: 2 additions & 5 deletions README-science-data.md → Benchmark-Input-Data-Packaging.md
@@ -1,13 +1,10 @@
# Science Data Tarballs

The science data for the benchmarks lives in the directory pointed to by
$SCIENCE_DATA_ROOTDIR, which is set in science/science-benchmarks.env
# Packaging Benchmarks' Input Data

This directory at AgResearch contains many symlinks. When a tarball is made,
it should be created like this:

```
$ cd $SCIENCE_DATA_ROOTDIR
$ cd $INPUT_DATA_ROOT_DIR
$ tar czhf ../data.tgz *
```
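
The `h` flag in `tar czhf` makes tar follow symbolic links and archive the files they point to, which matters here because the directory contains many symlinks. A minimal sketch of the difference, using a throwaway directory and made-up file names (none of them come from the real data set):

```shell
# Throwaway demo; the directory and file names here are hypothetical.
demo=$(mktemp -d)
mkdir "$demo/real" "$demo/data"
echo "reads" > "$demo/real/sample.fastq"
ln -s "$demo/real/sample.fastq" "$demo/data/sample.fastq"

# Without h the archive stores the symlink itself; with h it stores the file contents.
(cd "$demo/data" && tar czf ../links.tgz * && tar czhf ../files.tgz *)

# In a verbose listing, a stored symlink shows up as "name -> target".
tar tzvf "$demo/links.tgz"
tar tzvf "$demo/files.tgz"
```

A tarball created without `h` would unpack into dangling links on any machine that lacks the original link targets.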

20 changes: 19 additions & 1 deletion README.md
@@ -10,4 +10,22 @@ This repository includes benchmarks that will be used to ensure the fit for purp

Each individual benchmark has its own README file which describes the purpose of the benchmark, how to run the benchmark and how to verify its output(s).

If a benchmark needs to be built from the source, it should be build and execute in a [Conda](https://conda.io) environment created by using the Conda environment specified in the benchmark's documentation. This approach ensures a stable, although not necessary optimal, building and executing environment. If the target platform does not have Conda installed, follow instruction [here](https://conda.io/miniconda.html) to install it on the platform.
This benchmark suite uses binary distributions in the [Conda](https://conda.io) repositories to deploy benchmark programs. In such cases, a Conda environment specification file is included in the benchmark's subdirectory. Please follow the benchmark's README file to deploy the benchmark program. Some benchmark programs need to be built from source. Please use the Conda environment specification file included in the benchmark to create a Conda environment for building and running such a program. This approach ensures a stable, although not necessarily optimal, environment for building and running benchmarks. If the target platform does not have Conda installed, follow the instructions [here](https://conda.io/miniconda.html) to install it on the platform.

## Environment Variables

Please update the *BENCHMARK_ROOT* environment variable in the file ```benchmark.env``` included in this repository to match the target platform's local environment. This file must be sourced before deploying and running this benchmark suite.

```
$ source benchmark.env
```
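
A quick sanity check after sourcing can catch a missing or mistyped variable before any benchmark runs. The sketch below assumes a POSIX shell; the helper name `require_env` is hypothetical and not part of this repository. Note that `benchmark.env` sets `BENCHMARK_SOURCE` from `$PWD`, so it is meant to be sourced from the root of the repository.

```shell
# Hypothetical helper: report every named variable that is unset or empty.
require_env() {
    missing=0
    for name in "$@"; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing environment variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Variables defined in benchmark.env; run this after sourcing the file.
require_env BENCHMARK_SOURCE BENCHMARK_ROOT INPUT_DATA_ROOT_DIR OUTPUT_DATA_ROOT_DIR \
    || echo "source benchmark.env from the repository root first" >&2
```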

## Getting and Preparing Input Data

All input data required to execute this benchmark suite can be downloaded from [here](https://url/to/be/confirmed). Please download it into the benchmark root directory (*BENCHMARK_ROOT*) and then use the following commands to extract the data from the tarball:

```
$ cd $BENCHMARK_ROOT
$ wget https://url/to/be/confirmed
$ tar xzf benchmark_input_data.taz
```
55 changes: 55 additions & 0 deletions benchmark.env
@@ -0,0 +1,55 @@
# Update this according to local directory structure
# and source it before running any benchmarks

# root directory to the benchmark source
BENCHMARK_SOURCE=$PWD
export BENCHMARK_SOURCE

# root directory for benchmarks
BENCHMARK_ROOT=/tmp/benchmarks
export BENCHMARK_ROOT
mkdir -p $BENCHMARK_ROOT

# input data for all benchmarks
INPUT_DATA_ROOT_DIR=$BENCHMARK_ROOT/benchmark_input_data
export INPUT_DATA_ROOT_DIR

# output directories
OUTPUT_DATA_ROOT_DIR=$BENCHMARK_ROOT/benchmark_output_data
export OUTPUT_DATA_ROOT_DIR

## Platform benchmarks
# IOR benchmark

IOR_CONDA_ENV=$BENCHMARK_ROOT/conda-env/ior
IOR_CONDA_ENV_SPEC=$BENCHMARK_SOURCE/platform/IOR/ior-conda-env.yml
export IOR_CONDA_ENV
export IOR_CONDA_ENV_SPEC

# IOZONE benchmark
IOZONE_CONDA_ENV=$BENCHMARK_ROOT/conda-env/iozone
IOZONE_CONDA_ENV_SPEC=$BENCHMARK_SOURCE/platform/IOZONE/iozone-conda-env.yml
export IOZONE_CONDA_ENV
export IOZONE_CONDA_ENV_SPEC

## Science benchmarks
# ABYSS benchmark
ABYSS_CONDA_ENV=$BENCHMARK_ROOT/conda-env/abyss
ABYSS_CONDA_ENV_SPEC=$BENCHMARK_SOURCE/science/abyss/abyss-conda-env.yml
export ABYSS_CONDA_ENV
export ABYSS_CONDA_ENV_SPEC
NCORES=20
export NCORES

# VELVET benchmark
VELVET_CONDA_ENV=$BENCHMARK_ROOT/conda-env/velvet
VELVET_CONDA_ENV_SPEC=$BENCHMARK_SOURCE/science/velvet/velvet-conda-env.yml
export VELVET_CONDA_ENV
export VELVET_CONDA_ENV_SPEC

## Workflow benchmarks
# GBS
GBS_CONDA_ENV=$BENCHMARK_ROOT/conda-env/gbs
GBS_CONDA_ENV_SPEC=$BENCHMARK_SOURCE/workflow/gbs/gbs-conda-env.yml
export GBS_CONDA_ENV
export GBS_CONDA_ENV_SPEC
40 changes: 19 additions & 21 deletions platform/IOR/IOR.md
@@ -11,27 +11,26 @@ It measures the sustainable bandwidth of a file system using the various APIs,
Download the latest release (v3.1.0) from GitHub and unpack the file.

```
mkdir -p /tmp/benchmark/
cd /tmp/benchmark/
git clone [email protected]:AgResearch/ior.git
$ cd $BENCHMARK_ROOT
$ git clone [email protected]:AgResearch/ior.git
```

Create a Conda environment based on the provided environment file and then activate the environment before building and running the benchmark. For example:
Create a Conda environment based on the provided environment specification file and then activate the environment before building and running the benchmark. For example:

```
conda-env conda-env create -p /tmp/benchmark/ior-env -f <path>/ior-conda-env.yml
source activate /tmp/benchmark/ior-env
$ mkdir -p $IOR_CONDA_ENV
$ conda-env create -p $IOR_CONDA_ENV -f $IOR_CONDA_ENV_SPEC
$ source activate $IOR_CONDA_ENV
```

Use the following commands to change into the ```ior``` directory and build it.

```
cd ior
./bootstrap
./configure --prefix=$PWD
make
make install
$ cd $BENCHMARK_ROOT/ior
$ ./bootstrap
$ ./configure --prefix=$CONDA_PREFIX
$ make
$ make install
```

## Execution
@@ -41,9 +40,8 @@ make install
Run IOR to benchmark the performance of a single process writing to a file and then reading that file sequentially. The following commands serve as an example; you may need to customise them for the benchmarking platform.

```
cd /tmp/benchmark/ior/bin
source activate /tmp/benchmark/ior-env
./ior -a POSIX -w -r -e -b <block_size> -o <path_to_target_filesystem>\ior_seq_test
$ source activate $IOR_CONDA_ENV
$ ior -a POSIX -w -r -e -b <block_size> -o <path_to_target_filesystem>/ior_seq_test
```

Where ```<block_size>``` should be at least twice as large as the memory size of the compute node where the benchmark is executed and ```<path_to_target_filesystem>``` is the path to the target filesystem that is being benchmarked.
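
As a worked example of this sizing rule, the sketch below derives a `-b` value from the node's memory, taking the size of the compute node to mean its memory size. It assumes a Linux node where `/proc/meminfo` reports `MemTotal` in kB; the function name `ior_block_size` is made up for illustration.

```shell
# Hypothetical helper: twice the given memory (in kB), printed in MiB with an "m" suffix.
ior_block_size() {
    kb=$1
    echo "$(( kb * 2 / 1024 ))m"
}

# MemTotal is reported in kB on Linux; fall back to 0 if the file is unavailable.
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)
echo "suggested -b value: $(ior_block_size "${mem_kb:-0}")"
```

The result is a lower bound; a larger block size reduces the chance that the page cache serves part of the read.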
@@ -53,12 +51,13 @@ Where ```<block_size>``` should be at least twice as large as the size of the co
Run IOR tests concurrently to benchmark the performance of a filesystem on a compute node. The following is an example bash script for this test, although it may need to be customised for the benchmarking platform.

```
source activate $IOR_CONDA_ENV
echo "Preparing testing data..."
./ior -a POSIX -w -e -k -b <block_size> -o <path_to_target_filesystem>/ior_rw_test > ./ior_concurent.out
ior -a POSIX -w -e -k -b <block_size> -o <path_to_target_filesystem>/ior_rw_test > ./ior_concurent.out
echo "Starting Concurrent Read..."
./ior -a POSIX -r -b <block_size> -o <path_to_target_filesystem>/ior_rw_test > ./ior_concurent_r.out&
ior -a POSIX -r -b <block_size> -o <path_to_target_filesystem>/ior_rw_test > ./ior_concurent_r.out&
echo "Starting Concurrent Write..."
./ior -a POSIX -w -e -b <block_size> -o <path_to_target_filesystem>/ior_rw_test2 > ./ior_concurent_w.out
ior -a POSIX -w -e -b <block_size> -o <path_to_target_filesystem>/ior_rw_test2 > ./ior_concurent_w.out
echo "Done!"
```
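
The script above starts the read in the background with `&`, so "Done!" can be printed while that read is still running. One way to handle this, sketched here with stand-in functions instead of the real `ior` commands, is to record the background job's PID and `wait` for it before reporting completion. The function names and the sleep duration are illustrative only.

```shell
# Stand-ins for the real ior commands; the sleep duration is arbitrary.
concurrent_read()  { sleep 1; echo "read finished"; }
concurrent_write() { echo "write finished"; }

run_concurrent() {
    concurrent_read &        # background job, like the "&" in the script
    read_pid=$!
    concurrent_write         # foreground job runs at the same time
    wait "$read_pid"         # block until the background read has ended
    echo "Done!"
}

run_concurrent
```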

@@ -69,9 +68,8 @@ Where ```<block_size>``` should be at least twice as large as the size of the co
Run IOR as an MPI program to benchmark the write and read performance of a platform's filesystem. The following commands serve as an example; you may need to customise them for the benchmarking platform.

```
cd /tmp/benchmark/ior/bin
source activate /tmp/benchmark/ior-env
mpirun -np <num_tasks> -N <num_tasks_per_node> ./ior -a MPIIO -w -r -N <num_tasks> -b <block_size> -o <path_to_target_filesystem>\ior_seq_test
$ source activate $IOR_CONDA_ENV
$ mpirun -np <num_tasks> -N <num_tasks_per_node> ior -a MPIIO -w -r -N <num_tasks> -b <block_size> -o <path_to_target_filesystem>/ior_seq_test
```

Where ```<num_tasks>``` should be large enough to create sufficient load to test the aggregated bandwidth of the specified filesystem, ```<num_tasks_per_node>``` is the number of tasks to run on an allocated node, ```<block_size>``` times ```<num_tasks_per_node>``` should be twice as large as the memory size of the compute node where the benchmark is executed, and ```<path_to_target_filesystem>``` is the path to the target filesystem that is being benchmarked.
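
To make the sizing rule concrete: with 128 GiB of memory per node and 16 tasks per node, each task needs a block of 2 x 128 GiB / 16 = 16 GiB. A small sketch of that arithmetic, taking the size of the compute node to mean its memory; the function name and the example figures are illustrative only.

```shell
# Hypothetical helper: per-task block size (MiB, "m" suffix) so that
# block size x tasks per node equals twice the node memory.
per_task_block_size() {
    node_mem_mib=$1
    tasks_per_node=$2
    echo "$(( node_mem_mib * 2 / tasks_per_node ))m"
}

per_task_block_size 131072 16   # 128 GiB node, 16 tasks per node; prints 16384m
```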
25 changes: 18 additions & 7 deletions platform/IOZONE/IOZONE.md
@@ -11,17 +11,27 @@ This benchmark is used to measure the performance of the platform's filesystem.
The benchmark (v3-471) can be downloaded from http://www.iozone.org/src/current/iozone3_471.tar

```
wget http://www.iozone.org/src/current/iozone3_471.tar
$ cd $BENCHMARK_ROOT
$ wget http://www.iozone.org/src/current/iozone3_471.tar
```

Once the file is downloaded, navigate to the directory where the downloaded file is store and use the following instructions to build it. A C compiler and make is required to build it.
Create a Conda environment based on the provided environment file and then activate the environment before building and running the benchmark. For example:

```
tar xf iozone3_471.tar
cd iozone3_471/src
make
$ mkdir -p $IOZONE_CONDA_ENV
$ conda-env create -p $IOZONE_CONDA_ENV -f $IOZONE_CONDA_ENV_SPEC
$ source activate $IOZONE_CONDA_ENV
```

Navigate to the directory where the downloaded file is stored and use the following commands to build it in the created Conda environment.

```
$ tar xf iozone3_471.tar
$ cd iozone3_471/src/current
$ make
# make will display a list of supported platforms. Pick the one that matches the testing platform.
make <target>
$ make <target>
$ cp ./iozone $IOZONE_CONDA_ENV/bin
```

## Execution
@@ -35,7 +45,8 @@ Use the following command to test the performance of a specified file system on
The test produces output that covers all tested file operations for record sizes from 4k to 16M and file sizes from 64k up to a specified maximum file size, which should be twice the memory size of the node where the benchmark is run. The output is also stored in an Excel file called IOZone_results.xls.

```
./iozone -az -i 0 -i 1 -i 2 -c -e -b IOZone_results.xls \
$ source activate $IOZONE_CONDA_ENV
$ iozone -az -i 0 -i 1 -i 2 -c -e -b IOZone_results.xls \
-f <file system> \
-y 4 -q 16m \
-g <max file size>
14 changes: 14 additions & 0 deletions platform/IOZONE/iozone-conda-env.yml
@@ -0,0 +1,14 @@
name: iozone-env
channels:
- bioconda
- conda-forge
- defaults
- r
dependencies:
- gmp=6.1.2=0
- mpc=1.1.0=4
- mpfr=3.1.5=0
- cloog=0.18.0=0
- gcc=4.8.5=7
- isl=0.12.2=0
prefix: /tmp/iozone
10 changes: 4 additions & 6 deletions platform/MDTEST/MDTEST.md
@@ -19,9 +19,8 @@ Benchmark the performance of a specified filesystem by creating 1,048,576 (1024x
The following example launches a test on a single compute node to create the required files and directories and then remove them.

```
cd /tmp/benchmark/ior/bin
source activate /tmp/benchmark/ior-env
./mdtest -F -C -T -r -n 1048576 -d <path_to_target_filesystem>
$ source activate $IOR_CONDA_ENV
$ mdtest -F -C -T -r -n 1048576 -d <path_to_target_filesystem>
```

Where ```<path_to_target_filesystem>``` is the path to the target filesystem that is being benchmarked.
@@ -31,9 +30,8 @@ Where, ``` <path_to_target_filesystem>``` is the path to the target filesystem t
The following example launches a test on a group of nodes to create the required files and directories and then remove them.

```
cd /tmp/benchmark/ior/bin
source activate /tmp/benchmark/ior-env
mpirun -np <num_tasks> -N <num_tasks_per_node> ./mdtest -F -C -T -r -n <1048576/<num_tasks>> -d <path_to_target_filesystem> -N <num_tasks_per_node>
$ source activate $IOR_CONDA_ENV
$ mpirun -np <num_tasks> -N <num_tasks_per_node> mdtest -F -C -T -r -n <1048576/<num_tasks>> -d <path_to_target_filesystem> -N <num_tasks_per_node>
```

Where ```<num_tasks>``` should be large enough to stress the metadata operations of the specified filesystem, ```<num_tasks_per_node>``` is the number of tasks to run on an allocated node, and ```<path_to_target_filesystem>``` is the path to the target filesystem that is being benchmarked.
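
The `-n` value is the total of 1,048,576 items divided by the number of MPI tasks. For instance, 64 tasks would each handle 16,384 items. A sketch of the calculation, with the helper name and the task count chosen purely as an example:

```shell
# Hypothetical helper: items per task so that all tasks together create 1,048,576 items.
items_per_task() {
    num_tasks=$1
    echo $(( 1048576 / num_tasks ))
}

items_per_task 64   # prints 16384
```

Shell arithmetic truncates, so a task count that is a power of two keeps the total at exactly 1,048,576.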
4 changes: 0 additions & 4 deletions science/README.md

This file was deleted.

21 changes: 10 additions & 11 deletions science/abyss/abyss.md
@@ -1,28 +1,27 @@
# ABySS

## Purpose

ABySS is a de novo, parallel, paired-end sequence assembler.

## Purpose
TBD

## Installation

Once the science datasets have been unpacked, and `../science-benchmarks.env`
has been updated appropriately:
Create a Conda environment based on the provided environment specification file and then activate the environment before building and running the benchmark.

```
$ conda env create -f abyss-conda-env.yml
$ mkdir -p $ABYSS_CONDA_ENV
$ conda-env create -p $ABYSS_CONDA_ENV -f $ABYSS_CONDA_ENV_SPEC
$ source activate $ABYSS_CONDA_ENV
```

### Sample data [optional]

$SAMPLE_DATA_ROOT/VELVET/*.fastq.gz

## Execution

Activate the Conda environment created for this benchmark then run the shell script, *run-abyss-benchmark*, to launch the benchmark.

```
$ source activate abyss
$ . ../science-benchmarks.env
$ ./run-abyss-benchmark
$ $BENCHMARK_SOURCE/science/abyss/run-abyss-benchmark
```

### Output verification
12 changes: 6 additions & 6 deletions science/abyss/run-abyss-benchmark
@@ -1,12 +1,12 @@
#!/bin/sh
# needs to run in conda environment abyss

test -n "$SCIENCE_DATA_ROOTDIR" || {
echo >&2 "fatal error: missing environment variable SCIENCE_DATA_ROOTDIR - source the top-level environment file"
test -n "$INPUT_DATA_ROOT_DIR" || {
echo >&2 "fatal error: missing environment variable INPUT_DATA_ROOT_DIR - source the top-level environment file"
exit 1
}
test -n "$SCIENCE_OUTPUT_ROOTDIR" || {
echo >&2 "fatal error: missing environment variable SCIENCE_OUTPUT_ROOTDIR - source the top-level environment file"
test -n "$OUTPUT_DATA_ROOT_DIR" || {
echo >&2 "fatal error: missing environment variable OUTPUT_DATA_ROOT_DIR - source the top-level environment file"
exit 1
}

@@ -15,8 +15,8 @@ test -n "$NCORES" -a "$NCORES" -ge 1 -a "$NCORES" -le 100 || {
exit 1
}

datadir=$SCIENCE_DATA_ROOTDIR/abyss
outdir=$SCIENCE_OUTPUT_ROOTDIR/abyss
datadir=$INPUT_DATA_ROOT_DIR/abyss
outdir=$OUTPUT_DATA_ROOT_DIR/abyss
test -d "$datadir" || {
echo >&2 "fatal error: missing data directory $datadir - unpack the data tarball"
exit 1
14 changes: 0 additions & 14 deletions science/science-benchmarks.env

This file was deleted.

25 changes: 0 additions & 25 deletions science/tassel3-kgd/README.md

This file was deleted.
