36 changes: 12 additions & 24 deletions docs/advanced/assembly.rst → docs/advanced/assembly.md
@@ -1,11 +1,9 @@
Pre-Assambly-processing
------------------------
# Pre-Assembly-processing

Normalization Parameters
``````````````````````````
## Normalization Parameters

To improve assembly time and often assemblies themselves, coverage is
normalized across kmers to a target depth and can be set using::
normalized across kmers to a target depth and can be set using:

# kmer length over which we calculated coverage
normalization_kmer_length: 21
@@ -14,30 +12,22 @@ normalized across kmers to a target depth and can be set using::
# reads must have at least this many kmers over min depth to be retained
normalization_minimum_kmers: 8
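
The retention rule in the comment above can be illustrated with a minimal Python sketch. This is not how BBTools implements normalization; it only shows the idea of keeping reads that have enough k-mers above a minimum depth. The function names and the `min_depth` argument are hypothetical and chosen for illustration only.

```python
from collections import Counter


def kmers(seq, k):
    """Return all overlapping k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def retain_reads(reads, k=21, min_depth=2, min_kmers=8):
    """Keep reads with at least `min_kmers` k-mers seen `min_depth` times or more.

    `min_depth` is a hypothetical stand-in for the minimum depth setting.
    """
    # Count every k-mer across the whole read set.
    counts = Counter(km for read in reads for km in kmers(read, k))
    kept = []
    for read in reads:
        # Number of k-mers in this read that reach the minimum depth.
        solid = sum(1 for km in kmers(read, k) if counts[km] >= min_depth)
        if solid >= min_kmers:
            kept.append(read)
    return kept
```

With short toy reads and `k=4`, a read whose k-mers all occur only once is dropped, while duplicated reads are kept.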

## Error Correction


Error Correction
``````````````````````````

Optionally perform error correction using ``tadpole.sh`` from BBTools::
Optionally perform error correction using `tadpole.sh` from BBTools:

perform_error_correction: true

# Assembly Parameters

## Assembler

Assembly Parameters
------------------------


Assembler
``````````````````````````

Currently, the supported assemblers are 'spades' and 'megahit' with the
default setting of::
Currently, the supported assemblers are 'spades' and 'megahit' with
the default setting of:

assembler: megahit

Both assemblers have settings that can be altered in the configuration::
Both assemblers have settings that can be altered in the configuration:

# minimum multiplicity for filtering (k_min+1)-mers
megahit_min_count: 2
@@ -58,11 +48,9 @@ Both assemblers have settings that can be altered in the configuration::
# comma-separated list of k-mer sizes (must be odd and less than 128)
spades_k: auto

## Contig Filtering

Contig Filtering
``````````````````````````

After assembly, contigs can be filtered based on several metrics::
After assembly, contigs can be filtered based on several metrics:

# Discard contigs with lower average coverage.
minimum_average_coverage: 5
8 changes: 8 additions & 0 deletions docs/advanced/index.md
@@ -0,0 +1,8 @@
# Advanced Usage

```{toctree}
:maxdepth: 2

assembly
qc
```
104 changes: 104 additions & 0 deletions docs/advanced/qc.md
@@ -0,0 +1,104 @@
# Quality control of reads

## Adapter Trimming

FASTA file paths for adapter sequences to be trimmed from the sequence
ends.

We provide the adapter reference FASTA included in `bbmap`
for various

preprocess_adapters: /database_dir/adapters.fa

## Quality Trimming

Trim regions with an average quality below this threshold. Higher is
more stringent.

preprocess_minimum_base_quality: 10

## Adapter Trimming at Read Tips

Allow shorter kmer matches down to `mink` at the read ends.
0 disables.

preprocess_adapter_min_k: 8

## Allowable Mismatches in Adapter Hits

Maximum number of substitutions between the target adapter kmer and the
query sequence kmer. Lower is more stringent.

preprocess_allowable_kmer_mismatches: 1
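
To make the mismatch setting concrete, here is a minimal Python sketch that counts substitutions between an adapter k-mer and each window of a read. It illustrates the concept only and is not the matching code used by BBTools; `hamming` and `find_adapter_hits` are hypothetical helper names.

```python
def hamming(a, b):
    """Count substitutions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def find_adapter_hits(read, adapter_kmer, max_mismatches=1):
    """Return start positions where the read matches the adapter k-mer
    with at most `max_mismatches` substitutions."""
    k = len(adapter_kmer)
    return [
        i
        for i in range(len(read) - k + 1)
        if hamming(read[i:i + k], adapter_kmer) <= max_mismatches
    ]
```

With `max_mismatches=0` only exact k-mer matches are reported, which is the most stringent setting.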

## Contaminant Kmer Length

Kmer length used for finding contaminants. Contaminant matches shorter
than this length will not be found.

preprocess_reference_kmer_match_length: 27

## Read Length Threshold

This is applied after quality and adapter trimming have been applied to
the sequence.

preprocess_minimum_passing_read_length: 51

## Sequence Complexity Filter

Require this fraction of each nucleotide per sequence to eliminate low
complexity reads.

preprocess_minimum_base_frequency: 0.05
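
As a rough illustration of this filter, the following Python sketch keeps a read only if each of the four nucleotides makes up at least the given fraction of its bases. The function name is hypothetical, and the handling of ambiguous bases such as `N` in the real tool is not covered here.

```python
def passes_complexity_filter(seq, min_base_frequency=0.05):
    """Return True if A, C, G and T each reach the minimum fraction of the read."""
    seq = seq.upper()
    if not seq:
        return False
    return all(
        seq.count(base) / len(seq) >= min_base_frequency
        for base in "ACGT"
    )
```

A homopolymer run such as forty `A` bases fails the check because three of the four nucleotides have a frequency of zero.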

## Contamination Parameters

Contamination reference sequences in the form of nucleotide FASTA files
can be provided and filtered from the reads using the following
parameters.

If 'rRNA' is defined, it will be added back to metagenomes but not to
metatranscriptomes. Additional references can be added arbitrarily, such
as:

    contaminant_references:
      rRNA: /database_dir/silva_rfam_all_rRNAs.fa
      phiX: /database_dir/phiX174_virus.fa

Do not look for indels longer than this:

contaminant_max_indel: 20

Fraction of max alignment score required to keep a site:

contaminant_min_ratio: 0.65

Mapping kmer length (range 8-15). Longer is faster but uses more memory;
shorter is more sensitive:

contaminant_kmer_length: 12

Minimum number of seed hits required for candidate sites:

contaminant_minimum_hits: 1

Set behavior on ambiguously-mapped reads (with multiple top-scoring
mapping locations):

- best (use the first best site)
- toss (consider unmapped, retain in reads for assembly)
- random (select one top-scoring site randomly)
- all (retain all top-scoring sites)

contaminant_ambiguous: best
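
The four options can be summarized with a small Python sketch. It assumes a read's top-scoring mapping sites are already available as a list and only models which of them each option keeps; `resolve_ambiguous` is a hypothetical helper, not part of atlas or BBTools.

```python
import random


def resolve_ambiguous(top_sites, policy="best", rng=None):
    """Return the mapping sites kept for a read with several top-scoring sites."""
    if not top_sites:
        return []
    if policy == "best":
        # Use the first best site.
        return top_sites[:1]
    if policy == "toss":
        # Treat the read as unmapped, so no site is reported.
        return []
    if policy == "random":
        # Pick one top-scoring site at random.
        chooser = rng if rng is not None else random
        return [chooser.choice(top_sites)]
    if policy == "all":
        # Keep every top-scoring site.
        return list(top_sites)
    raise ValueError(f"unknown policy: {policy}")
```

Passing a seeded `random.Random` instance as `rng` makes the `random` option reproducible in tests.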

For host decontamination we suggest the following genomes, where
contaminants and low complexity regions were masked.

Many thanks to Brian Bushnell for providing the genomes of
[human](https://drive.google.com/file/d/0B3llHR93L14wd0pSSnFULUlhcUk/edit?resourcekey=0-PsIKmg2q4EvTGWGOUjsKGQ>),
[mouse](https://drive.google.com/file/d/0B3llHR93L14wYmJYNm9EbkhMVHM/view?resourcekey=0-jSsdejBncqPu4eiFfJvf1w),
[dog](https://drive.google.com/file/d/0B3llHR93L14wTHdWRG55c2hPUXM/view?resourcekey=0-nJ2WQzTQYrTizK0pllVRZg),
and
[cat](https://drive.google.com/file/d/0B3llHR93L14wOXJhWXRlZjBpVUU/view?resourcekey=0-xxh33oYWp5FGBpRzobD_uw).
[Source](https://www.seqanswers.com/forum/bioinformatics/bioinformatics-aa/37175-introducing-removehuman-human-contaminant-removal?p=286481#post286481)
127 changes: 0 additions & 127 deletions docs/advanced/qc.rst

This file was deleted.

10 changes: 7 additions & 3 deletions docs/conf.py
@@ -17,7 +17,7 @@
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))

@@ -36,8 +36,12 @@
"sphinx.ext.todo",
"sphinx.ext.viewcode",
"sphinx.ext.napoleon",
"myst_parser",
"sphinx.ext.autosectionlabel",
]

autosectionlabel_prefix_document = True

# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]

@@ -69,12 +73,12 @@
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None
language = "en"

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This patterns also effect to html_static_path and html_extra_path
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store", "old"]
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store", "old", os.path.abspath("../CHANGELOG.md")]

# The name of the Pygments (syntax highlighting) style to use.
pygments_style = "sphinx"
42 changes: 42 additions & 0 deletions docs/index.md
@@ -0,0 +1,42 @@
[![image](https://anaconda.org/bioconda/metagenome-atlas/badges/version.svg)](https://anaconda.org/bioconda/metagenome-atlas)

[![image](https://img.shields.io/conda/dn/bioconda/metagenome-atlas.svg?label=Bioconda)](https://bioconda.github.io/recipes/metagenome-atlas/README.html)

[![image](https://img.shields.io/twitter/follow/SilasKieser.svg?style=social&label=Follow)](https://twitter.com/search?f=tweets&q=%40SilasKieser%20%23metagenomeAtlas&src=typd)

# Metagenome-Atlas

![Metagenome-atlas logo](../resources/images/atlas_image.png)

Metagenome-Atlas is an easy-to-use metagenomic pipeline based on
[snakemake](https://snakemake.github.io/). It handles all steps from QC,
Assembly, Binning, to Annotation.

You can start using atlas with three commands:

mamba install -c bioconda -c conda-forge metagenome-atlas={latest_version}
atlas init --db-dir databases path/to/fastq/files
atlas run

where `{latest_version}` should be replaced by

[![image](https://anaconda.org/bioconda/metagenome-atlas/badges/version.svg)](https://anaconda.org/bioconda/metagenome-atlas)

## Publication

> ATLAS: a Snakemake workflow for assembly, annotation, and genomic
> binning of metagenome sequence data. Kieser, S., Brown, J., Zdobnov,
> E. M., Trajkovski, M. & McCue, L. A. BMC Bioinformatics 21, 257
> (2020). doi:
> [10.1186/s12859-020-03585-4](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03585-4)

```{toctree}
:maxdepth: 2
:caption: Documentation

usage/getting_started
usage/output
usage/configuration
advanced/index
usage/changelog
```