Add the -i flag to the instructions so that they will work with Themisto v2.0.0 or newer.
tmaklin committed Nov 19, 2021
1 parent 0a99031 commit 65aae01
Showing 2 changed files with 42 additions and 6 deletions.
27 changes: 22 additions & 5 deletions README.md
@@ -87,9 +87,18 @@ tree presented in Mäklin et al. 2020 using mGEMS is available in the
[docs folder of this repository](docs/TUTORIAL.md).

### Quickstart — full pipeline
#### Index the reference sequences
Build a [Themisto](https://github.com/algbio/themisto) index to
align against.

__Themisto version v2.0.0 or newer__

```
mkdir themisto_index
mkdir themisto_index/tmp
themisto build -k 31 -i example.fasta -o themisto_index/index --temp-dir themisto_index/tmp
```

__Themisto versions v0.1.1 to v1.2.0__

```
@@ -98,8 +107,16 @@ mkdir themisto_index/tmp
build_index --k 31 --input-file example.fasta --auto-colors --index-dir themisto_index --temp-dir themisto_index/tmp
```
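
The two build variants above differ only in the command name and in how the flags are spelled. As a minimal sketch, the helper below prints the matching command for a given major version so it can be inspected before running. The function name `themisto_build_cmd` and the convention of passing the major version by hand are assumptions made for this illustration; they are not part of Themisto or mGEMS.

```shell
#!/bin/sh
# Hypothetical helper (not part of Themisto or mGEMS): print the index
# build command from the instructions above for a given major version.
# The major version is supplied by the caller; it is not auto-detected.
themisto_build_cmd() {
    major="$1"   # 2 for v2.0.0 or newer; 0 or 1 for v0.1.1 to v1.2.0
    fasta="$2"   # reference sequences, e.g. example.fasta
    if [ "$major" -ge 2 ]; then
        echo "themisto build -k 31 -i $fasta -o themisto_index/index --temp-dir themisto_index/tmp"
    else
        echo "build_index --k 31 --input-file $fasta --auto-colors --index-dir themisto_index --temp-dir themisto_index/tmp"
    fi
}

themisto_build_cmd 2 example.fasta
themisto_build_cmd 1 example.fasta
```

The helper only echoes the command; `themisto_index` and `themisto_index/tmp` still need to be created with `mkdir` first, as in the instructions.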

#### Pseudoalign the reads
Align paired-end reads 'reads_1.fastq.gz' and 'reads_2.fastq.gz' with Themisto (note the **--sort-output** flag must be used!)

__Themisto version v2.0.0 or newer__

```
themisto pseudoalign -q reads_1.fastq.gz -o pseudoalignments_1.aln -i themisto_index/index --temp-dir themisto_index/tmp --rc --n-threads 16 --sort-output --gzip-output
themisto pseudoalign -q reads_2.fastq.gz -o pseudoalignments_2.aln -i themisto_index/index --temp-dir themisto_index/tmp --rc --n-threads 16 --sort-output --gzip-output
```
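
The two calls above are identical apart from the mate number, so they can be generated in a loop. This is a sketch only: the function name `pseudoalign_cmds` is an assumption for this example, and every flag is copied unchanged from the commands above.

```shell
#!/bin/sh
# Hypothetical helper: print one pseudoalign command per mate file.
# All flags are taken verbatim from the two commands in the instructions.
pseudoalign_cmds() {
    for mate in 1 2; do
        echo "themisto pseudoalign -q reads_${mate}.fastq.gz -o pseudoalignments_${mate}.aln -i themisto_index/index --temp-dir themisto_index/tmp --rc --n-threads 16 --sort-output --gzip-output"
    done
}

pseudoalign_cmds          # print the commands for inspection
# pseudoalign_cmds | sh   # uncomment to actually run them
```

Printing first and piping to `sh` second keeps the `--sort-output` requirement visible in the output before anything is executed.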

__Themisto versions v0.1.1 to v1.2.0__

```
@@ -117,7 +134,7 @@ mSWEEP --themisto-1 pseudoalignments_1.aln.gz --themisto-2 pseudoalignments_2.al
Bin the reads and write all bins to the 'mGEMS-out' folder
```
mkdir mGEMS-out
mGEMS -r reads_1.fastq.gz,reads_2.fastq.gz --themisto-alns pseudoalignments_1.aln.gz,pseudoalignments_2.aln.gz -o mGEMS-out --probs mSWEEP_probs.csv -a mSWEEP_abundances.txt --index themisto_index
mGEMS -r reads_1.fastq.gz,reads_2.fastq.gz -i reference_grouping.txt --themisto-alns pseudoalignments_1.aln.gz,pseudoalignments_2.aln.gz -o mGEMS-out --probs mSWEEP_probs.csv -a mSWEEP_abundances.txt --index themisto_index
```
This will write the binned paired-end reads for *all groups* in the
mSWEEP_abundances.txt file in the mGEMS-out folder (compressed with
@@ -128,25 +145,25 @@ You can also extract the read-to-group assignments table that mGEMS
uses internally by adding the `--write-assignment-table` toggle to the
call to `mGEMS` or `mGEMS bin`:
```
mGEMS --groups group-3,group-4 -r reads_1.fastq.gz,reads_2.fastq.gz --themisto-alns pseudoalignments_1.aln.gz,pseudoalignments_2.aln.gz -o mGEMS-out --probs mSWEEP_probs.csv -a mSWEEP_abundances.txt --index themisto_index --write-assignment-table
mGEMS --groups group-3,group-4 -r reads_1.fastq.gz,reads_2.fastq.gz -i reference_grouping.txt --themisto-alns pseudoalignments_1.aln.gz,pseudoalignments_2.aln.gz -o mGEMS-out --probs mSWEEP_probs.csv -a mSWEEP_abundances.txt --index themisto_index --write-assignment-table
```

... or bin and write only the reads that are assigned to "group-3" or
"group-4" by adding the '--groups group-3,group-4' flag
```
mGEMS --groups group-3,group-4 -r reads_1.fastq.gz,reads_2.fastq.gz --themisto-alns pseudoalignments_1.aln.gz,pseudoalignments_2.aln.gz -o mGEMS-out --probs mSWEEP_probs.csv -a mSWEEP_abundances.txt --index themisto_index
mGEMS --groups group-3,group-4 -r reads_1.fastq.gz,reads_2.fastq.gz -i reference_grouping.txt --themisto-alns pseudoalignments_1.aln.gz,pseudoalignments_2.aln.gz -o mGEMS-out --probs mSWEEP_probs.csv -a mSWEEP_abundances.txt --index themisto_index
```

... write the reads that pseudoaligned to a reference sequence but were not assigned to any group by adding the `--write-unassigned` flag:
```
mGEMS --groups group-3,group-4 -r reads_1.fastq.gz,reads_2.fastq.gz --themisto-alns pseudoalignments_1.aln.gz,pseudoalignments_2.aln.gz -o mGEMS-out --probs mSWEEP_probs.csv -a mSWEEP_abundances.txt --index themisto_index --write-unassigned
mGEMS --groups group-3,group-4 -r reads_1.fastq.gz,reads_2.fastq.gz -i reference_grouping.txt --themisto-alns pseudoalignments_1.aln.gz,pseudoalignments_2.aln.gz -o mGEMS-out --probs mSWEEP_probs.csv -a mSWEEP_abundances.txt --index themisto_index --write-unassigned
```

Alternatively, find and write only the read bins for "group-3",
"group-4", and the reads that pseudoaligned but were not assigned to
any group, without extracting the reads
```
mGEMS bin --groups group-3,group-4 --themisto-alns pseudoalignments_1.aln.gz,pseudoalignments_2.aln.gz -o mGEMS-out --probs mSWEEP_probs.csv -a mSWEEP_abundances.txt --index themisto_index --write-unassigned
mGEMS bin --groups group-3,group-4 --themisto-alns pseudoalignments_1.aln.gz,pseudoalignments_2.aln.gz -i reference_grouping.txt -o mGEMS-out --probs mSWEEP_probs.csv -a mSWEEP_abundances.txt --index themisto_index --write-unassigned
```

... and extract the reads when feeling like it
21 changes: 20 additions & 1 deletion docs/TUTORIAL.md
@@ -95,6 +95,14 @@ tar -zxvf mGEMS-ecoli-reference-v1.0.0.tar.gz
Create a *31*-mer pseudoalignment index with Themisto using two
threads and a maximum of 8192 megabytes of RAM.

__Themisto version v2.0.0 or newer__

```
mkdir mGEMS-ecoli-reference
mkdir mGEMS-ecoli-reference/tmp
themisto build -k 31 -i mGEMS-ecoli-reference-sequences-v1.0.0.fasta.gz -o mGEMS-ecoli-reference/index --temp-dir mGEMS-ecoli-reference/tmp --mem-megas 8192 --n-threads 2
```

__Themisto versions v0.1.1 to v1.2.0__

```
@@ -131,6 +139,17 @@ gzip $oldid""_2.fastq
### <a name="pseudoalignment"></a>Pseudoalignment
Align the mixed sample files against the index using two threads

__Themisto version v2.0.0 or newer__

```
for f1 in *_1.fastq.gz; do
f=${f1%_1.fastq.gz}
f2=$f""_2.fastq.gz
themisto pseudoalign -q $f1 -o $f""_1.aln -i mGEMS-ecoli-reference/index --temp-dir mGEMS-ecoli-reference/tmp --n-threads 2 --rc --sort-output --gzip-output
themisto pseudoalign -q $f2 -o $f""_2.aln -i mGEMS-ecoli-reference/index --temp-dir mGEMS-ecoli-reference/tmp --n-threads 2 --rc --sort-output --gzip-output
done
```
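
The loop above derives the sample prefix and the second mate's filename from the first mate's filename using the shell's suffix-removal expansion `${var%suffix}`. A small sketch of that step in isolation, with a hypothetical filename `sampleA_1.fastq.gz` standing in for a real sample:

```shell
#!/bin/sh
# Hypothetical input filename, used only to illustrate the expansion.
f1="sampleA_1.fastq.gz"

# ${f1%_1.fastq.gz} removes the shortest matching suffix "_1.fastq.gz".
f=${f1%_1.fastq.gz}

# Same construction as in the loop: prefix followed by the mate-2 suffix.
f2=$f""_2.fastq.gz

echo "$f $f2"
# prints: sampleA sampleA_2.fastq.gz
```

The empty `""` between `$f` and `_2` stops the shell from reading `f_2` as the variable name; `${f}_2.fastq.gz` would be an equivalent spelling.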

__Themisto versions v0.1.1 to v1.2.0__

```
@@ -160,7 +179,7 @@ Bin the reads with mGEMS and write the binned samples to the
while read line; do
id=$(echo $line | cut -f3 -d' ')
cluster=$(echo $line | cut -f2 -d' ')
mGEMS --groups $cluster -r $id""_1.fastq.gz,$id""_2.fastq.gz --themisto-alns $id""_1.aln.gz,$id""_2.aln.gz -o $id --probs $id/$id""_probs.csv.gz -a $id/$id""_abundances.txt --index mGEMS-ecoli-reference
mGEMS --groups $cluster -r $id""_1.fastq.gz,$id""_2.fastq.gz -i mGEMS-ecoli-reference-grouping-v1.0.0.txt --themisto-alns $id""_1.aln.gz,$id""_2.aln.gz -o $id --probs $id/$id""_probs.csv.gz -a $id/$id""_abundances.txt --index mGEMS-ecoli-reference
done < mixed_samples.tsv
```
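
The loop above takes the sample id from the third whitespace-separated field of each line and the cluster from the second. The sketch below shows that parsing step on one made-up row; the values `field1`, `clusterX` and `sampleY` are hypothetical and say nothing about the actual contents of `mixed_samples.tsv`.

```shell
#!/bin/sh
# Hypothetical tab-separated row; the real file's contents are not shown here.
line="$(printf 'field1\tclusterX\tsampleY')"

# Unquoted $line is split on whitespace and echo rejoins the words with
# single spaces, which is why cut can use ' ' as the delimiter on a TSV row.
id=$(echo $line | cut -f3 -d' ')
cluster=$(echo $line | cut -f2 -d' ')

echo "$id $cluster"
# prints: sampleY clusterX
```

This approach assumes that no field contains spaces; a field with an embedded space would shift the column positions seen by `cut`.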
Note that by default mGEMS creates bins for **all** reference lineages. If you know