diff --git a/README-science-data.md b/Benchmark-Input-Data-Packaging.md similarity index 53% rename from README-science-data.md rename to Benchmark-Input-Data-Packaging.md index aceb22a..c1e1116 100644 --- a/README-science-data.md +++ b/Benchmark-Input-Data-Packaging.md @@ -1,13 +1,10 @@ -# Science Data Tarballs - -The science data for the benchmarks lives in the directory pointed to by -$SCIENCE_DATA_ROOTDIR, which is set in science/science-benchmarks.env +# Packaging Benchmarks' Input Data This directory at AgResearch contains many symlinks. When a tarball is made, it should be created like this: ``` -$ cd $SCIENCE_DATA_ROOTDIR +$ cd $INPUT_DATA_ROOT_DIR $ tar czhf ../data.tgz * ``` diff --git a/README.md b/README.md index 11f8c20..5a0538b 100644 --- a/README.md +++ b/README.md @@ -10,4 +10,22 @@ This repository includes benchmarks that will be used to ensure the fit for purp Each individual benchmark has its own README file which describes the purpose of the benchmark, how to run the benchmark and how to verify its output(s). -If a benchmark needs to be built from the source, it should be build and execute in a [Conda](https://conda.io) environment created by using the Conda environment specified in the benchmark's documentation. This approach ensures a stable, although not necessary optimal, building and executing environment. If the target platform does not have Conda installed, follow instruction [here](https://conda.io/miniconda.html) to install it on the platform. +This benchmark suite uses binary distributions in the [Conda](https://conda.io) repositories to deploy benchmark programs. In such cases, a Conda environment specification file is included in the benchmark's subdirectory. Please follow its README file to deploy the benchmark program. Some benchmark programs will need to be built from source. 
Please use the Conda environment specification file included in the benchmark to create a Conda environment for building and running such a benchmark program. This approach ensures a stable, although not necessarily optimal, building and executing environment for benchmarking. If the target platform does not have Conda installed, follow the instructions [here](https://conda.io/miniconda.html) to install it on the platform. + +## Environment Variables + +Please update the *BENCHMARK_ROOT* environment variable in the file ```benchmark.env``` included in this repository to suit the target platform's local environment. This file must be sourced before deploying and running this benchmark suite. + +``` +$ source benchmark.env +``` + +## Getting and Preparing Input Data + +All input data required to execute this benchmark suite can be downloaded from [here](https://url/to/be/confirmed). Please download it and save it in the same root directory as the benchmark suite and then use the following commands to extract data from the tarball: + +``` +$ cd $BENCHMARK_ROOT +$ wget https://url/to/be/confirmed +$ tar xzf benchmark_input_data.taz +``` diff --git a/benchmark.env b/benchmark.env new file mode 100644 index 0000000..ccecb33 --- /dev/null +++ b/benchmark.env @@ -0,0 +1,55 @@ +# Update this according to local directory structure +# and source it before running any benchmarks + +# root directory to the benchmark source +BENCHMARK_SOURCE=$PWD +export BENCHMARK_SOURCE + +# root directory for benchmarks +BENCHMARK_ROOT=/tmp/benchmarks +export BENCHMARK_ROOT +mkdir -p $BENCHMARK_ROOT + +# input data for all benchmarks +INPUT_DATA_ROOT_DIR=$BENCHMARK_ROOT/benchmark_input_data +export INPUT_DATA_ROOT_DIR + +# output directories +OUTPUT_DATA_ROOT_DIR=$BENCHMARK_ROOT/benchmark_output_data +export OUTPUT_DATA_ROOT_DIR + +## Platform benchmarks +# IOR benchmark + +IOR_CONDA_ENV=$BENCHMARK_ROOT/conda-env/ior +IOR_CONDA_ENV_SPEC=$BENCHMARK_SOURCE/platform/IOR/ior-conda-env.yml +export IOR_CONDA_ENV 
+export IOR_CONDA_ENV_SPEC + +# IOZONE benchmark +IOZONE_CONDA_ENV=$BENCHMARK_ROOT/conda-env/iozone +IOZONE_CONDA_ENV_SPEC=$BENCHMARK_SOURCE/platform/IOZONE/iozone-conda-env.yml +export IOZONE_CONDA_ENV +export IOZONE_CONDA_ENV_SPEC + +## Science benchmarks +# ABYSS benchmark +ABYSS_CONDA_ENV=$BENCHMARK_ROOT/conda-env/abyss +ABYSS_CONDA_ENV_SPEC=$BENCHMARK_SOURCE/science/abyss/abyss-conda-env.yml +export ABYSS_CONDA_ENV +export ABYSS_CONDA_ENV_SPEC +NCORES=20 +export NCORES + +# VELVET benchmark +VELVET_CONDA_ENV=$BENCHMARK_ROOT/conda-env/velvet +VELVET_CONDA_ENV_SPEC=$BENCHMARK_SOURCE/science/velvet/velvet-conda-env.yml +export VELVET_CONDA_ENV +export VELVET_CONDA_ENV_SPEC + +## Workflow benchmarks +# GBS +GBS_CONDA_ENV=$BENCHMARK_ROOT/conda-env/gbs +GBS_CONDA_ENV_SPEC=$BENCHMARK_SOURCE/workflow/GBS/gbs-conda-env.yml +export GBS_CONDA_ENV +export GBS_CONDA_ENV_SPEC diff --git a/platform/IOR/IOR.md b/platform/IOR/IOR.md index 19cc187..142cd54 100644 --- a/platform/IOR/IOR.md +++ b/platform/IOR/IOR.md @@ -11,27 +11,26 @@ It measures the sustainable bandwidth of a file system using the various APIs, Download the latest release (v3.1.0) from GitHub and unpack the file. ``` -mkdir -p /tmp/benchmark/ -cd /tmp/benchmark/ -git clone git@github.com:AgResearch/ior.git - +$ cd $BENCHMARK_ROOT +$ git clone git@github.com:AgResearch/ior.git ``` -Create a Conda environment based on the provided environment file and then activate the environment before building and running the benchmark. +Create a Conda environment based on the provided environment specification file and then activate the environment before building and running the benchmark. 
For example: ``` -conda-env conda-env create -p /tmp/benchmark/ior-env -f /ior-conda-env.yml -source activate /tmp/benchmark/ior-env +$ mkdir -p $IOR_CONDA_ENV +$ conda-env create -p $IOR_CONDA_ENV -f $IOR_CONDA_ENV_SPEC +$ source activate $IOR_CONDA_ENV ``` Use the following instructions to navigate into directory, ior, and to build it. ``` -cd ior -./bootstrap -./configure --prefix=$PWD -make -make install +$ cd $BENCHMARK_ROOT/ior +$ ./bootstrap +$ ./configure --prefix=$CONDA_PREFIX +$ make +$ make install ``` ## Execution @@ -41,9 +40,8 @@ make install Run IOR to benchmark the performance of a single process writing to a file and then reading such a file sequentially. The following commands serve as an example, you may need to customise it for the benchmarking platform. ``` -cd /tmp/benchmark/ior/bin -source activate /tmp/benchmark/ior-env -./ior -a POSIX -w -r -e -b -o \ior_seq_test +$ source activate $IOR_CONDA_ENV +$ ior -a POSIX -w -r -e -b -o \ior_seq_test ``` Where `````` should be at least twice as large as the size of the compute node where the benchmark is executed and ``` ``` is the path to the target filesystem that is been benchmarked. @@ -53,12 +51,13 @@ Where `````` should be at least twice as large as the size of the co Run IOR tests concurrently to benchmark the performance of a filesystem on a compute node. The following is an example bash script for this test, although it may need to be customised for the benchmarking platform. ``` +source activate $IOR_CONDA_ENV echo "Preparing testing data..." -./ior -a POSIX -w -e -k -b -o /ior_rw_test > ./ior_concurent.out +ior -a POSIX -w -e -k -b -o /ior_rw_test > ./ior_concurent.out echo "Starging Concurrent Read..." -./ior -a POSIX -r -b -o /ior_rw_test > ./ior_concurent_r.out& +ior -a POSIX -r -b -o /ior_rw_test > ./ior_concurent_r.out& echo "Starting Concurrent Write..." 
-./ior -a POSIX -w -e -b -o /ior_rw_test2 > ./ior_concurent_w.out +ior -a POSIX -w -e -b -o /ior_rw_test2 > ./ior_concurent_w.out echo "Done!" ``` @@ -69,9 +68,8 @@ Where `````` should be at least twice as large as the size of the co Run IOR as a MPI program to benchmark the write and read performance of a platform's filesystem. The following commands serve as an example, you may need to customise it for the benchmarking platform. ``` -cd /tmp/benchmark/ior/bin -source activate /tmp/benchmark/ior-env -mpirun -np -N ./ior -a MPIIO -w -r -N -b -o \ior_seq_test +$ source activate $IOR_CONDA_ENV +$ mpirun -np -N ior -a MPIIO -w -r -N -b -o \ior_seq_test ``` Where `````` should be large to create sufficient load to test the aggregated bandwidth of the specified filesystem, `````` is number of tasks to run on a allocated node, `````` times `````` should be twice as large as the size of the compute node where the benchmark is executed, and `````` is the path to the target filesystem that is been benchmarked. diff --git a/platform/IOZONE/IOZONE.md b/platform/IOZONE/IOZONE.md index 5bba671..3b7894a 100644 --- a/platform/IOZONE/IOZONE.md +++ b/platform/IOZONE/IOZONE.md @@ -11,17 +11,27 @@ This benchmark is used to measure the performance of the platform's filesystem. The benchmark (v3-471)can be downloaded from http://www.iozone.org/src/current/iozone3_471.tar ``` -wget http://www.iozone.org/src/current/iozone3_471.tar +$ cd $BENCHMARK_ROOT +$ wget http://www.iozone.org/src/current/iozone3_471.tar ``` -Once the file is downloaded, navigate to the directory where the downloaded file is store and use the following instructions to build it. A C compiler and make is required to build it. +Create a Conda environment based on the provided environment file and then activate the environment before building and running the benchmark. 
For example: ``` -tar xf iozone3_471.tar -cd iozone3_471/src -make +$ mkdir -p $IOZONE_CONDA_ENV +$ conda-env create -p $IOZONE_CONDA_ENV -f $IOZONE_CONDA_ENV_SPEC +$ source activate $IOZONE_CONDA_ENV +``` + +Navigate to the directory where the downloaded file is stored and use the following instructions to build it in the created Conda environment. + +``` +$ tar xf iozone3_471.tar +$ cd iozone3_471/src/current +$ make # make will display a list of supported platforms. Pick the one that matches the testing platform. -make +$ make +$ cp ./iozone $IOZONE_CONDA_ENV/bin ``` ## Execution @@ -35,7 +45,8 @@ Use the following command to test the performance of a specified file system on The test produces output that cover all tested file operations for record size of 4k to 16M for file size of 64k to a specified file size, which should be twice the size of the memory of the node where the benchmark is run. The output will also be stored in an Excel file called IOZone_results.xls ``` -./iozone -az -i 0 -i 1 -i 2 –c –e -b IOZone_results.xls \ +$ source activate $IOZONE_CONDA_ENV +$ iozone -az -i 0 -i 1 -i 2 -c -e -b IOZone_results.xls \ -f \ -y 4 -q 16m \ -g diff --git a/platform/IOZONE/iozone-conda-env.yml b/platform/IOZONE/iozone-conda-env.yml new file mode 100644 index 0000000..d8c8450 --- /dev/null +++ b/platform/IOZONE/iozone-conda-env.yml @@ -0,0 +1,14 @@ +name: iozone-env +channels: + - bioconda + - conda-forge + - defaults + - r +dependencies: + - gmp=6.1.2=0 + - mpc=1.1.0=4 + - mpfr=3.1.5=0 + - cloog=0.18.0=0 + - gcc=4.8.5=7 + - isl=0.12.2=0 +prefix: /tmp/iozone diff --git a/platform/MDTEST/MDTEST.md b/platform/MDTEST/MDTEST.md index e992fe7..ca0360c 100644 --- a/platform/MDTEST/MDTEST.md +++ b/platform/MDTEST/MDTEST.md @@ -19,9 +19,8 @@ Benchmark the performance of a specified filesystem by creating 1,048,576 (1024x The following example will launch a test on a single compute node to create and remove required files and directories and then remove them. 
``` -cd /tmp/benchmark/ior/bin -source activate /tmp/benchmark/ior-env -./mdtest -F -C -T -r -n 1048576 -d +$ source activate $IOR_CONDA_ENV +$ mdtest -F -C -T -r -n 1048576 -d ``` Where, ``` ``` is the path to the target filesystem that is been benchmarked. @@ -31,9 +30,8 @@ Where, ``` ``` is the path to the target filesystem t The following example will launch a test on group of nodes to create and remove required files and directories and then remove them. ``` -cd /tmp/benchmark/ior/bin -source activate /tmp/benchmark/ior-env -mpirun -np -N ./mdtest -F -C -T -r -n <1048576/> -d -N +$ source activate $IOR_CONDA_ENV +$ mpirun -np -N mdtest -F -C -T -r -n <1048576/> -d -N ``` Where `````` should be sufficiently large to create sufficient load to stress metadata operations of the specified filesystem, `````` is number of tasks to run on a allocated node, and `````` is the path to the target filesystem that is been benchmarked. diff --git a/science/README.md b/science/README.md deleted file mode 100644 index 991003e..0000000 --- a/science/README.md +++ /dev/null @@ -1,4 +0,0 @@ -# Science Benchmarks - -These benchmarks use some large data files, which must be unpacked in an -appropriate directory, which is then set in `science-benchmarks.env`. diff --git a/science/abyss/abyss.md b/science/abyss/abyss.md index d6ff871..eb46e18 100644 --- a/science/abyss/abyss.md +++ b/science/abyss/abyss.md @@ -1,28 +1,27 @@ # ABySS -## Purpose - ABySS is a de novo, parallel, paired-end sequence assembler. +## Purpose +TBD + ## Installation -Once the science datasets have been unpacked, and `../science-benchmarks.env` -has been updated appropriately: +Create a Conda environment based on the provided environment specification file and then activate the environment before building and running the benchmark. 
``` -$ conda env create -f abyss-conda-env.yml +$ mkdir -p $ABYSS_CONDA_ENV +$ conda-env create -p $ABYSS_CONDA_ENV -f $ABYSS_CONDA_ENV_SPEC +$ source activate $ABYSS_CONDA_ENV ``` -### Sample data [optional] - -$SAMPLE_DATA_ROOT/VELVET/*.fastq.gz - ## Execution +Activate the Conda environment created for this benchmark then run the shell script, *run-abyss-benchmark*, to launch the benchmark. + ``` $ source activate abyss -$ . ../science-benchmarks.env -$ ./run-abyss-benchmark +$ $BENCHMARK_SOURCE/science/abyss/run-abyss-benchmark ``` ### Output verification diff --git a/science/abyss/run-abyss-benchmark b/science/abyss/run-abyss-benchmark index c76ad2b..a66e6fe 100755 --- a/science/abyss/run-abyss-benchmark +++ b/science/abyss/run-abyss-benchmark @@ -1,12 +1,12 @@ #!/bin/sh # needs to run in conda environment abyss -test -n "$SCIENCE_DATA_ROOTDIR" || { - echo >&2 "fatal error: missing environment variable SCIENCE_DATA_ROOTDIR - source the top-level environment file" +test -n "$INPUT_DATA_ROOT_DIR" || { + echo >&2 "fatal error: missing environment variable INPUT_DATA_ROOT_DIR - source the top-level environment file" exit 1 } -test -n "$SCIENCE_OUTPUT_ROOTDIR" || { - echo >&2 "fatal error: missing environment variable SCIENCE_OUTPUT_ROOTDIR - source the top-level environment file" +test -n "$OUTPUT_DATA_ROOT_DIR" || { + echo >&2 "fatal error: missing environment variable OUTPUT_DATA_ROOT_DIR - source the top-level environment file" exit 1 } @@ -15,8 +15,8 @@ test -n "$NCORES" -a "$NCORES" -ge 1 -a "$NCORES" -le 100 || { exit 1 } -datadir=$SCIENCE_DATA_ROOTDIR/abyss -outdir=$SCIENCE_OUTPUT_ROOTDIR/abyss +datadir=$INPUT_DATA_ROOT_DIR/abyss +outdir=$OUTPUT_DATA_ROOT_DIR/abyss test -d "$datadir" || { echo >&2 "fatal error: missing data directory $datadir - unpack the data tarball" exit 1 diff --git a/science/science-benchmarks.env b/science/science-benchmarks.env deleted file mode 100644 index 54d20cd..0000000 --- a/science/science-benchmarks.env +++ /dev/null @@ -1,14 
+0,0 @@ -# update this according to local directory structure -# and source it before running the science benchmarks - -# input data for all science benchmarks -SCIENCE_DATA_ROOTDIR=/dataset/invermay_hpc_benchmarking/active/data -export SCIENCE_DATA_ROOTDIR - -# output directories -SCIENCE_OUTPUT_ROOTDIR=/dataset/invermay_hpc_benchmarking/scratch/output -export SCIENCE_OUTPUT_ROOTDIR - -# number of cores to run on -NCORES=20 -export NCORES diff --git a/science/tassel3-kgd/README.md b/science/tassel3-kgd/README.md deleted file mode 100644 index 0cc5aad..0000000 --- a/science/tassel3-kgd/README.md +++ /dev/null @@ -1,25 +0,0 @@ -# Tassel 3 / KGD - -## Purpose - - -## Installation - -Once the science datasets have been unpacked, and `../science-benchmarks.env` -has been updated appropriately: - -``` -$ conda env create -f tassel3-kgd-conda-env.yml -``` - -## Execution - -``` -$ source activate tassel3 -$ . ../science-benchmarks.env -$ ./run-tassel3-kgd-benchmark -``` - -### Output verification - -**TBD** diff --git a/science/velvet/README.md b/science/velvet/README.md deleted file mode 100644 index 1fd2432..0000000 --- a/science/velvet/README.md +++ /dev/null @@ -1,29 +0,0 @@ -# Name of the Benchmark - -## Purpose - -Velvet is a genome assembler that we use reasonably often. Velvetoptimizer is a wrapper script for velvet to find the optimal parameters. This script will generate several instances of the assembler at once, providing a decent benchmark of IO, CPU and memory performance. - -## Installation - -Once the science datasets have been unpacked, and `../science-benchmarks.env` -has been updated appropriately: - -``` -$ conda env create -f velvet-conda-env.yml -``` - -## Execution - -``` -$ source activate velvet -$ . ../science-benchmarks.env -$ ./run-velvet-benchmark -``` - -### Output verification [optional] - -Optimal Velvet hash value should be 189, i.e. 
grep for the followingi n the output: - -"Velvet hash value: 189" - diff --git a/science/velvet/run-velvet-benchmark b/science/velvet/run-velvet-benchmark index 5c60da5..e498285 100755 --- a/science/velvet/run-velvet-benchmark +++ b/science/velvet/run-velvet-benchmark @@ -1,17 +1,17 @@ #!/bin/sh # needs to run in conda environment velvet -test -n "$SCIENCE_DATA_ROOTDIR" || { - echo >&2 "fatal error: missing environment variable SCIENCE_DATA_ROOTDIR - source the top-level environment file" +test -n "$INPUT_DATA_ROOT_DIR" || { + echo >&2 "fatal error: missing environment variable INPUT_DATA_ROOT_DIR - source the top-level environment file" exit 1 } -test -n "$SCIENCE_OUTPUT_ROOTDIR" || { - echo >&2 "fatal error: missing environment variable SCIENCE_OUTPUT_ROOTDIR - source the top-level environment file" +test -n "$OUTPUT_DATA_ROOT_DIR" || { + echo >&2 "fatal error: missing environment variable OUTPUT_DATA_ROOT_DIR - source the top-level environment file" exit 1 } -datadir=$SCIENCE_DATA_ROOTDIR/velvet -outdir=$SCIENCE_OUTPUT_ROOTDIR/velvet +datadir=$INPUT_DATA_ROOT_DIR/velvet +outdir=$OUTPUT_DATA_ROOT_DIR/velvet test -d "$datadir" || { echo >&2 "fatal error: missing data directory $datadir - unpack the data tarball" exit 1 diff --git a/science/velvet/velvet.md b/science/velvet/velvet.md new file mode 100644 index 0000000..13f264c --- /dev/null +++ b/science/velvet/velvet.md @@ -0,0 +1,32 @@ +# Velvet + +Velvet is a genome assembler that we use reasonably often. Velvetoptimizer is a wrapper script for velvet to find the optimal parameters. This script will generate several instances of the assembler at once. + +## Purpose + +Benchmark a platform's IO, CPU and memory performance via a regularly used genome assembler. + +## Installation + +Create a Conda environment based on the provided environment specification file and then activate the environment before building and running the benchmark. 
+ +``` +$ mkdir -p $VELVET_CONDA_ENV +$ conda-env create -p $VELVET_CONDA_ENV -f $VELVET_CONDA_ENV_SPEC +$ source activate $VELVET_CONDA_ENV +``` + +## Execution + +Activate the Conda environment created for this benchmark then run the shell script, *run-velvet-benchmark*, to launch the benchmark. + +``` +$ source activate $VELVET_CONDA_ENV +$ $BENCHMARK_SOURCE/science/velvet/run-velvet-benchmark +``` + +### Output verification [optional] + +Optimal Velvet hash value should be 189, i.e. grep for the following in the output: + +"Velvet hash value: 189" diff --git a/science/tassel3-kgd/01_create_dirs.sh b/workflow/GBS/01_create_dirs.sh similarity index 100% rename from science/tassel3-kgd/01_create_dirs.sh rename to workflow/GBS/01_create_dirs.sh diff --git a/science/tassel3-kgd/02_FastqToTagCount.sh b/workflow/GBS/02_FastqToTagCount.sh similarity index 100% rename from science/tassel3-kgd/02_FastqToTagCount.sh rename to workflow/GBS/02_FastqToTagCount.sh diff --git a/science/tassel3-kgd/03_MergeTaxaTagCount.sh b/workflow/GBS/03_MergeTaxaTagCount.sh similarity index 100% rename from science/tassel3-kgd/03_MergeTaxaTagCount.sh rename to workflow/GBS/03_MergeTaxaTagCount.sh diff --git a/science/tassel3-kgd/04_TagCountToTagPair.sh b/workflow/GBS/04_TagCountToTagPair.sh similarity index 100% rename from science/tassel3-kgd/04_TagCountToTagPair.sh rename to workflow/GBS/04_TagCountToTagPair.sh diff --git a/science/tassel3-kgd/05_TagPairToTBT.sh b/workflow/GBS/05_TagPairToTBT.sh similarity index 100% rename from science/tassel3-kgd/05_TagPairToTBT.sh rename to workflow/GBS/05_TagPairToTBT.sh diff --git a/science/tassel3-kgd/06_TBTToMapInfo.sh b/workflow/GBS/06_TBTToMapInfo.sh similarity index 100% rename from science/tassel3-kgd/06_TBTToMapInfo.sh rename to workflow/GBS/06_TBTToMapInfo.sh diff --git a/science/tassel3-kgd/07_MapInfoToHapMap.sh b/workflow/GBS/07_MapInfoToHapMap.sh similarity index 100% rename from science/tassel3-kgd/07_MapInfoToHapMap.sh rename to 
workflow/GBS/07_MapInfoToHapMap.sh diff --git a/science/tassel3-kgd/08_get_reads_tags_per_samplev2.py b/workflow/GBS/08_get_reads_tags_per_samplev2.py similarity index 100% rename from science/tassel3-kgd/08_get_reads_tags_per_samplev2.py rename to workflow/GBS/08_get_reads_tags_per_samplev2.py diff --git a/science/tassel3-kgd/09_run_KGDs.sh b/workflow/GBS/09_run_KGDs.sh similarity index 100% rename from science/tassel3-kgd/09_run_KGDs.sh rename to workflow/GBS/09_run_KGDs.sh diff --git a/science/tassel3-kgd/GBS-Chip-Gmatrix.R b/workflow/GBS/GBS-Chip-Gmatrix.R similarity index 100% rename from science/tassel3-kgd/GBS-Chip-Gmatrix.R rename to workflow/GBS/GBS-Chip-Gmatrix.R diff --git a/workflow/GBS/GBS.md b/workflow/GBS/GBS.md new file mode 100644 index 0000000..df65b58 --- /dev/null +++ b/workflow/GBS/GBS.md @@ -0,0 +1,31 @@ +# GBS + +Genotyping By Sequencing is a frequently used method in genomics today. It is implemented as a pipeline at AgResearch based on TASSEL3 and KGD (open-source code developed by AgResearch). + +## Purpose + +Validate a platform's fitness for purpose in supporting a major workflow used by scientists at AgResearch. + +## Installation + +Create a Conda environment based on the provided environment specification file and then activate the environment before building and running the benchmark. + +``` +$ mkdir -p $GBS_CONDA_ENV +$ conda-env create -p $GBS_CONDA_ENV -f $GBS_CONDA_ENV_SPEC +$ source activate $GBS_CONDA_ENV +``` + + +## Execution + +This workflow benchmark consists of a series of tasks, which are encapsulated by a single shell script, *run-gbs-benchmark*. To execute this benchmark, first activate the Conda environment created for this benchmark and then execute the script. 
+ +``` +$ source activate $GBS_CONDA_ENV +$ $BENCHMARK_SOURCE/workflow/GBS/run-gbs-benchmark +``` + +### Output verification + +**TBD** diff --git a/science/tassel3-kgd/tassel3-kgd-conda-env.yml b/workflow/GBS/gbs-conda-env.yml similarity index 99% rename from science/tassel3-kgd/tassel3-kgd-conda-env.yml rename to workflow/GBS/gbs-conda-env.yml index b0c2cfd..cef4bf9 100644 --- a/science/tassel3-kgd/tassel3-kgd-conda-env.yml +++ b/workflow/GBS/gbs-conda-env.yml @@ -1,4 +1,4 @@ -name: tassel3-kgd +name: gbs channels: - bioconda - r diff --git a/science/tassel3-kgd/run-tassel3-kgd-benchmark b/workflow/GBS/run-gbs-benchmark similarity index 83% rename from science/tassel3-kgd/run-tassel3-kgd-benchmark rename to workflow/GBS/run-gbs-benchmark index da6b4df..c8a0025 100755 --- a/science/tassel3-kgd/run-tassel3-kgd-benchmark +++ b/workflow/GBS/run-gbs-benchmark @@ -1,17 +1,17 @@ #!/bin/sh -test -n "$SCIENCE_DATA_ROOTDIR" || { - echo >&2 "fatal error: missing environment variable SCIENCE_DATA_ROOTDIR - source the top-level environment file" +test -n "$INPUT_DATA_ROOT_DIR" || { + echo >&2 "fatal error: missing environment variable INPUT_DATA_ROOT_DIR - source the top-level environment file" exit 1 } -test -n "$SCIENCE_OUTPUT_ROOTDIR" || { - echo >&2 "fatal error: missing environment variable SCIENCE_OUTPUT_ROOTDIR - source the top-level environment file" +test -n "$OUTPUT_DATA_ROOT_DIR" || { + echo >&2 "fatal error: missing environment variable OUTPUT_DATA_ROOT_DIR - source the top-level environment file" exit 1 } -datadir=$SCIENCE_DATA_ROOTDIR/tassel3-kgd -outdir=$SCIENCE_OUTPUT_ROOTDIR/tassel3-kgd -scriptdir=`/bin/pwd` +datadir=$INPUT_DATA_ROOT_DIR/tassel3-kgd +outdir=$OUTPUT_DATA_ROOT_DIR/tassel3-kgd +scriptdir=$BENCHMARK_SOURCE/workflow/GBS test -d "$datadir" || { echo >&2 "fatal error: missing data directory $datadir - unpack the data tarball" diff --git a/science/tassel3-kgd/run_KGD_1.R b/workflow/GBS/run_KGD_1.R similarity index 100% rename from 
science/tassel3-kgd/run_KGD_1.R rename to workflow/GBS/run_KGD_1.R diff --git a/workflow/TODO.md b/workflow/TODO.md deleted file mode 100644 index 9f3ef43..0000000 --- a/workflow/TODO.md +++ /dev/null @@ -1 +0,0 @@ -Create subdirectories for each workflow benchmark.