Enable docker.
linh35-rss committed Nov 10, 2021
1 parent 8e940e0 commit b7ff740
Showing 54 changed files with 636 additions and 235 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -0,0 +1,3 @@
__pycache__/
.mypy_cache/
.vscode/
19 changes: 19 additions & 0 deletions Dockerfile
@@ -0,0 +1,19 @@
FROM continuumio/miniconda3:4.10.3

RUN apt-get update \
&& apt-get install -y build-essential procps

#RUN useradd -ms /bin/bash daedalus
#USER daedalus
#WORKDIR /home/daedalus

COPY environment.yml .
COPY install.sh .

RUN conda env create -f environment.yml

RUN chmod 755 /root
RUN echo "conda activate Daedalus" >> ~/.bashrc
SHELL ["/bin/bash", "--login", "-c"]

RUN export GIT_SSL_NO_VERIFY=1 && bash install.sh
11 changes: 11 additions & 0 deletions Dockerfile_bcl2fastq
@@ -0,0 +1,11 @@
FROM continuumio/miniconda3:4.10.3

RUN apt-get update \
&& apt-get install -y build-essential procps

COPY environment_bcl2fastq.yml .

RUN conda env create -f environment_bcl2fastq.yml

RUN chmod 755 /root
RUN echo "conda activate bcl2fastq" >> ~/.bashrc
122 changes: 71 additions & 51 deletions README.md
@@ -3,78 +3,106 @@
Nextflow pipeline for analysis of libraries prepared using the ImmunoPETE assay.

- [Daedalus](#daedalus)
- [Download](#download)
- [Build Conda Environment](#build-conda-environment)
- [Test Pipeline](#test-pipeline)
- [Run Pipeline](#run-pipeline)
- [Load Environment](#load-the-environment)
- [Generate Manifest from Sample Sheet](#generate-manifest-from-sample-sheet)
- [Submit Pipeline Run](#submit-pipeline-run)
- [Output](#output)
- [Workflow](#workflow)
- [Methods](#methods)
- [Install and Configure](#install-and-configure)
- [Software Requirements](#software-requirements)
- [Download git repo](#download-git-repo)
- [Build Conda Environment (Optional)](#build-conda-environment-optional)
- [Install](#install)
- [Build Docker images](#build-docker-images)
- [Configure images](#configure-images)
- [Configure the pipeline](#configure-the-pipeline)
- [Test Pipeline on a single sample](#test-pipeline-on-a-single-sample)
- [Running Pipeline](#running-pipeline)
- [Generate Manifest from Sample Sheet](#generate-manifest-from-sample-sheet)
- [Submit Pipeline Run](#submit-pipeline-run)
- [Output](#output)
- [Workflow](#workflow)
- [Methods](#methods)

## Install and Configure

Note: the Nextflow config file must be configured for your queue.

## Software Requirements
- built on a linux server: CentOS Linux release 7.7.1908 (Core)
- miniconda3, for package management
- nextflow 19.07.0, to run the pipeline
- uge, for cluster job submission
### Software Requirements

- Python 3.6
- Java 8
- Nextflow 19.07.0, to run the pipeline
- UGE, for cluster job submission
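
A quick way to confirm these are available on your `PATH` before installing (a minimal sketch; exact version strings will differ between systems):

```bash
# Sanity-check the core requirements for running Daedalus.
python --version       # expect Python 3.6.x
java -version          # expect a Java 8 runtime
nextflow -version      # expect Nextflow 19.07.0
qstat -help | head -1  # UGE client tools used for cluster job submission
```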

### Download git repo

## Download git repo
```bash
git clone [email protected]:bioinform/Daedalus.git
cd Daedalus
git checkout tags/${release-version}
```
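
The `${release-version}` placeholder refers to a tagged release of the repository; listing the available tags first is a small helper step, not part of the original instructions:

```bash
# Show the release tags you can pass to `git checkout tags/...`.
git tag --list
```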

### Build Conda Environment (Optional)

It's recommended to create a conda environment:

```bash
conda create -n Daedalus python=3.6
conda activate Daedalus
```

## Install SWIFR aligner
A Smith-Waterman alignment implementation (C++) was developed and is used to identify primers and V/J gene segments from FASTQ-formatted reads. Please read the full README for swifr in the packages folder `./packages/swifr/` for instructions on how to install it.
### Install

## Build Conda Environment
Within the Daedalus directory, execute the following command.

Build the conda environment for running the pipeline:
```bash
conda env create -f environment.yml
pip install .
```

## Install python packages in the loaded conda env
### Build Docker images

Due to license restrictions, you will have to build the bcl2fastq image yourself using the provided Dockerfile (`Dockerfile_bcl2fastq`).
Please refer to [Docker Hub](https://docs.docker.com/docker-hub/) for creating a repo and pushing images.

```bash
conda activate Daedalus_env
./install_packages.sh
docker build -t {dockerhub_username}/bcl2fastq:{version} -f Dockerfile_bcl2fastq .
docker push {dockerhub_username}/bcl2fastq:{version}
```
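
After the build, an optional sanity check can confirm the tool is reachable inside the image (a sketch; `conda run` ships with the miniconda3 base image, and `bcl2fastq` is the environment name defined in `environment_bcl2fastq.yml`):

```bash
# Print the bcl2fastq version from inside the freshly built image.
docker run --rm {dockerhub_username}/bcl2fastq:{version} \
    conda run -n bcl2fastq bcl2fastq --version
```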

## Nextflow configuration
Nextflow must be configured for each system. The ipete profile in the nextflow config file `./nextflow/nextflow.config` should be updated accordingly.
### Configure images

After building your own images, point the following params in `nextflow/defaults-ipete.config` at them.

```javascript
params.bcl2fastq_docker = "{dockerhub_username}/bcl2fastq:{version}"
```
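
If you also build the main pipeline image from the top-level `Dockerfile`, the `params.daedalus_docker` value referenced in `nextflow/config/VDJ_detect/VDJ_detect.config` will likely need the same treatment. A sketch, assuming a hypothetical `daedalus` image name and the same config file:

```bash
# Hypothetical: point the pipeline at your own Daedalus image as well.
echo 'params.daedalus_docker = "{dockerhub_username}/daedalus:{version}"' \
    >> nextflow/defaults-ipete.config
```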

### Configure the pipeline

The pipeline runs on a UGE cluster by default. If you install it on a different system, modify the cluster settings in `nextflow/nextflow.config` accordingly.

```javascript
ipete_docker {
process.clusterOptions = { "-l h_vmem=${task.ext.vmem} -S /bin/bash -l docker_version=new -V" }
}
docker.runOptions = "-u=\$UID --rm -v /path/to/input_and_output:/path/to/input_and_output"
```

## Test Pipeline on a single sample
### Test Pipeline on a single sample
Once all the software has been installed and Nextflow has been configured, the pipeline bats test can be run. The bats test runs the pipeline on a single sample, using the paired FASTQ files provided:
- PBMC_1000ng_25ul_2_S6_R1_001.fastq.gz
- PBMC_1000ng_25ul_2_S6_R2_001.fastq.gz

In order to run the test, download both files from Dropbox and move them into the data folder `Daedalus/data`. Once the data is available, run the test using the following commands:

```bash
conda activate Daedalus_env
cd test
bats single-sample-ipete.bats
```

An example of the pipeline output has also been provided: `PBMC_1000ng_25ul_2.tar.gz`.
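
To compare against your own test run, the example archive can be unpacked with standard tar usage:

```bash
# Unpack the provided example output next to your test results.
tar -xzf PBMC_1000ng_25ul_2.tar.gz
```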


## Running Pipeline
Running the pipeline requires a complete flowcell's worth of ImmunoPETE libraries.

### Load the Environment

```bash
conda activate Daedalus_env
```
Running the pipeline requires a complete flowcell's worth of ImmunoPETE libraries.

### Generate Manifest for ImmunoPETE Run from the Sample Sheet
### Generate Manifest from Sample Sheet

```bash
manifestGenerator=/path/to/Daedalus/pipeline_runner/manifest_generator.py
@@ -84,48 +112,40 @@ sampleSheet = /path/to/sampleSheet.csv
python ${manifestGenerator} \
--pipeline_run_id Daedalus_example_run \
--sequencing_run_folder ${illuminaDir} \
--sequencing_platform NextSeq \
--output Daedalus_example_manifest.csv \
--subsample 1 \
--umi_mode True \
--umi2 'NNNNNNNNN' \
--umi_type R2 \
${sampleSheet}

${sampleSheet}
```

The manifest file contains all parameters needed for the pipeline to run. Sample-specific tuning of parameters, or any updates to the parameters, can be achieved by editing the generated manifest file. After edits are complete, the pipeline can be submitted using the manifest file alone.
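
Before editing, it can help to list the manifest columns one per line (a minimal sketch; the exact column names are whatever the generator emits):

```bash
# Show the manifest header as one parameter name per line for review.
head -n 1 Daedalus_example_manifest.csv | tr ',' '\n'
```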

### Submit the Pipeline Run on the cluster
### Submit Pipeline Run

Using the output from Manifest Generator `Daedalus_example_manifest.csv` pipeline runs are submitted using the script: pipeline_runner.py.
Using the output from the manifest generator, `Daedalus_example_manifest.csv`, pipeline runs can be submitted using the script `pipeline_runner.py`.

```bash
pipelineRunner=/path/to/Daedalus/pipeline_runner/pipeline_runner.py
outDir=/path/to/analysis/output

python ${pipelineRunner} -g rssprbf --wait --resume -o ${outDir} Daedalus_example_manifest.csv
python ${pipelineRunner} -g rssprbfprj --wait --resume -o ${outDir} Daedalus_example_manifest.csv
```

A `-g $group` option needs to be provided to submit jobs to the SGE cluster on SC1.

### Pipeline Output
### Output

In the specified output directory `${outDir}`, the analysis folder will be written using the `pipeline_run_id` "Daedalus_example_run":

```bash
${outDir}/Daedalus_example_run
```

## Nextflow Workflow DAG
## Workflow

![workflow](docs/img/flowchart.png)

## Methods
Overview of the [Pipeline Methods](docs/Daedalus_methods.md) for key processing steps.






5 changes: 3 additions & 2 deletions database/daedalus_db/pipeline_logger.py
@@ -113,8 +113,9 @@ def check_log_status(self, log_file):
if re.search("Execution status: failed", line):
return("fail")
elif re.search("Execution status: OK", line):
return("pass")

return("pass")
return("fail")

def get_analysis_info(self):
"""
Summary of Pipeline/Analysis status
4 changes: 4 additions & 0 deletions database/daedalus_db/run_info.py
@@ -34,10 +34,14 @@ def _parse(self, fname):
self.flowcell_id = run.find('Flowcell').text
self.instrument = run.find('Instrument').text
self.sequencing_date = run.find('Date').text
if len(self.sequencing_date) > 10:
self.sequencing_date = self.sequencing_date.split(' ')[0]
if len(self.sequencing_date) == 6:
self.sequencing_date = datetime.strptime(self.sequencing_date, '%y%m%d').date()
elif len(self.sequencing_date) == 8:
self.sequencing_date = datetime.strptime(self.sequencing_date, '%Y%m%d').date()
elif len(self.sequencing_date) == 9:
self.sequencing_date = datetime.strptime(self.sequencing_date, '%m/%d/%Y').date()
else:
self.logger.warning(
'Unrecognized sequencing date format: {}. Record raw string instead.'.format(self.sequencing_date)
4 changes: 2 additions & 2 deletions docs/Daedalus_methods.md
@@ -1,6 +1,6 @@
## Methods
- [V-J segment detection](alignment.md)
- [Alignment Parsing, CDR3 detection](alignment_parsing.md)
- [Alignment](alignment.md)
- [V-D-J recombinant detection](alignment_parsing.md)
- [UMI-CDR3 deduplication](deduplication.md)
- [UMI-CDR3 consensus](consensus.md)

2 changes: 1 addition & 1 deletion docs/alignment.md
@@ -1,4 +1,4 @@
## V-J segment detection
## V-D-J recombinant detection
V and J gene segments are aligned against reads using [swifr](http://ghe-rss.roche.com/plsRED-Bioinformatics/swifr).
SWIFR (Smith Waterman Implementation for Fast Read identification) utilizes a `kmer similarity index` and the `Smith Waterman` alignment algorithm to identify gene segments within reads.

2 changes: 1 addition & 1 deletion docs/alignment_parsing.md
@@ -1,4 +1,4 @@
## Alignment Parsing
## V-D-J recombinant detection
After V and J gene alignment, read alignments are parsed along with the reference annotation to find the CDR3 sequence.

CDR3 boundary elements are annotated in the ImmunoDB reference: http://ghe-rss.roche.com/plsRED-Bioinformatics/immunoDB
5 changes: 4 additions & 1 deletion environment.yml
@@ -8,6 +8,7 @@ channels:
- defaults
- bioconda
- conda-forge
- dranew
dependencies:
- bats=0.4.0=1
- blas=1.0=mkl
@@ -49,9 +50,11 @@ dependencies:
- wrapt=1.10.11=py36_0
- xz=5.2.3=0
- zlib=1.2.11=0
- samtools=1.9=h46bd0b3_0
- bzip2=1.0.6=3
- pip:
- argparse==1.4.0
- bio==0.1.0
- bio==1.2.8
- boto==2.49.0
- boto3==1.12.19
- botocore==1.15.19
49 changes: 49 additions & 0 deletions environment_bcl2fastq.yml
@@ -0,0 +1,49 @@
name: bcl2fastq
channels:
- bioconda/label/cf201901
- https://repo.continuum.io/pkgs/free
- conda-forge/label/cf201901
- anaconda
- guillaumecharbonnier
- defaults
- bioconda
- conda-forge
- dranew
dependencies:
- blas=1.0=mkl
- boltons=19.2.0=py_0
- ca-certificates=2018.11.29=ha4d7672_0
- certifi=2018.11.29=py36_1000
- contextlib2=0.6.0.post1=py_0
- debtcollector=1.21.0=py36_0
- decorator=4.4.1=py_0
- dit=1.2.3=py_1
- httplib2=0.12.0=py36_1000
- libgcc-ng=9.1.0=hdf63c60_0
- libgfortran=3.0.0=1
- libgfortran-ng=7.3.0=hdf63c60_0
- libstdcxx-ng=9.1.0=hdf63c60_0
- mkl=2017.0.3=0
- ncurses=5.9=10
- networkx=2.4=py_0
- numpy=1.13.3=py36ha266831_3
- openssl=1.0.2p=h470a237_2
- pandas=0.25.3=py36he6710b0_0
- pbr=5.4.2=py_0
- pip=19.3.1=py36_0
- python=3.6.2=0
- python-dateutil=2.8.1=py_0
- pytz=2019.3=py_0
- readline=6.2=2
- requests=2.12.5=py36_0
- setuptools=36.4.0=py36_1
- six=1.13.0=py36_0
- sqlite=3.13.0=0
- tk=8.5.18=0
- wheel=0.29.0=py36_0
- wrapt=1.10.11=py36_0
- xz=5.2.3=0
- zlib=1.2.11=0
- samtools=1.9=h46bd0b3_0
- bcl2fastq=2.19.0=1
- bzip2=1.0.6=3
15 changes: 15 additions & 0 deletions install.sh
@@ -0,0 +1,15 @@
pip install --upgrade git+http://ghe-rss.roche.com/pls-red-packages/fastq-streamer#v0.1.0
pip install --upgrade git+http://ghe-rss.roche.com/pls-red-packages/bam-streamer#v0.1.0
pip install --upgrade git+http://ghe-rss.roche.com/pls-red-packages/VDJ-recombinant-detection#v0.1.0
pip install --upgrade git+http://ghe-rss.roche.com/pls-red-packages/parse-umi#v0.1.0
pip install --upgrade git+http://ghe-rss.roche.com/pls-red-packages/extract-umi#v0.1.0
pip install --upgrade git+http://ghe-rss.roche.com/pls-red-packages/SeqNetworks#v0.1.0
pip install --upgrade git+http://ghe-rss.roche.com/pls-red-packages/ipete-dedup#v0.1.1
pip install --upgrade git+http://ghe-rss.roche.com/pls-red-packages/ipete-metrics#v0.1.2
pip install --upgrade git+http://ghe-rss.roche.com/pls-red-packages/ipete-reporter#v0.1.2
pip install --upgrade git+http://ghe-rss.roche.com/pls-red-packages/spikein-split#v0.1.0
pip install --upgrade git+http://ghe-rss.roche.com/pls-red-packages/trim-primers#v0.1.0




8 changes: 4 additions & 4 deletions nextflow/config/VDJ_detect/VDJ_detect.config
@@ -1,10 +1,10 @@
process {
withName: VDJ_detect {
memory = '30 GB'
cpus = 1
module = [
'miniconda3'
]
cpus = 1
container = {
"${params.daedalus_docker}"
}
ext {
command = {"$workflow.projectDir/config/VDJ_detect/VDJ_detect.sh"}
vmem = '30G'
6 changes: 4 additions & 2 deletions nextflow/config/VDJ_detect/VDJ_detect.sh
@@ -1,5 +1,7 @@
#!/bin/bash
source activate Daedalus_env
#!/bin/bash -e

# Load conda env within Docker
source /root/.bashrc || echo "Failed to source /root/.bashrc" >&2

VDJdetector -v $vSortBam -j $jSortBam -b ${sample} -i ${percentId} -r ${referenceData}
