metagenome-atlas · google-labs-jules · Sep 8, 2025
diff --git a/docs/advanced/assembly.rst → docs/advanced/assembly.md b/docs/advanced/assembly.rst → docs/advanced/assembly.md
@@ -1,11 +1,9 @@
-Pre-Assambly-processing
-------------------------
+# Pre-Assembly Processing
 
-Normalization Parameters
-``````````````````````````
+## Normalization Parameters
 
 To improve assembly time and often assemblies themselves, coverage is
-normalized across kmers to a target depth and can be set using::
+normalized across kmers to a target depth and can be set using:
 
     # kmer length over which we calculated coverage
     normalization_kmer_length: 21
@@ -14,30 +12,22 @@ normalized across kmers to a target depth and can be set using::
     # reads must have at least this many kmers over min depth to be retained
     normalization_minimum_kmers: 8
 
+## Error Correction
 
-
-Error Correction
-``````````````````````````
-
-Optionally perform error correction using ``tadpole.sh`` from BBTools::
+Optionally perform error correction using `tadpole.sh` from BBTools:
 
     perform_error_correction: true
 
+# Assembly Parameters
 
+## Assembler
 
-Assembly Parameters
-------------------------
-
-
-Assembler
-``````````````````````````
-
-Currently, the supported assemblers are 'spades' and 'megahit' with the
-default setting of::
+Currently, the supported assemblers are 'spades' and 'megahit' with
+the default setting of:
 
     assembler: megahit
 
-Both assemblers have settings that can be altered in the configuration::
+Both assemblers have settings that can be altered in the configuration:
 
     # minimum multiplicity for filtering (k_min+1)-mers
     megahit_min_count: 2
@@ -58,11 +48,9 @@ Both assemblers have settings that can be altered in the configuration::
     # comma-separated list of k-mer sizes (must be odd and less than 128)
     spades_k: auto
 
+## Contig Filtering
 
-Contig Filtering
-``````````````````````````
-
-After assembly, contigs can be filtered based on several metrics::
+After assembly, contigs can be filtered based on several metrics:
 
     # Discard contigs with lower average coverage.
     minimum_average_coverage: 5
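
The `spades_k` comment in the hunk above limits the value to `auto` or a comma-separated list of odd k-mer sizes below 128. A minimal Python sketch of that rule follows. The helper name `parse_spades_k` is hypothetical and not part of atlas, which may validate the setting differently or leave the check to SPAdes itself.

```python
def parse_spades_k(value):
    """Return None for 'auto', otherwise a list of k-mer sizes.

    Hypothetical helper, not part of atlas. It enforces the rule stated
    in the config comment: each size must be odd and less than 128.
    Requiring a positive size is an added assumption.
    """
    text = str(value).strip()
    if text == "auto":
        return None
    sizes = []
    for part in text.split(","):
        k = int(part)  # raises ValueError on non-numeric entries
        if k <= 0 or k % 2 == 0 or k >= 128:
            raise ValueError(
                f"invalid k-mer size {k}: must be odd and less than 128"
            )
        sizes.append(k)
    return sizes
```

For example, `parse_spades_k("21,33,55")` yields the three sizes as integers, while `parse_spades_k("22")` raises `ValueError` because 22 is even.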

diff --git a/docs/advanced/index.md b/docs/advanced/index.md
@@ -0,0 +1,8 @@
+# Advanced Usage
+
+```{toctree}
+:maxdepth: 2
+
+assembly
+qc
+```
diff --git a/docs/advanced/qc.md b/docs/advanced/qc.md
@@ -0,0 +1,104 @@
+# Quality control of reads
+
+## Adapter Trimming
+
+FASTA file paths for adapter sequences to be trimmed from the sequence
+ends.
+
+We provide the adapter reference FASTA included in `bbmap`
+for various
+
+    preprocess_adapters: /database_dir/adapters.fa
+
+## Quality Trimming
+
+Trim regions with an average quality below this threshold. Higher is
+more stringent.
+
+    preprocess_minimum_base_quality: 10
+
+## Adapter Trimming at Read Tips
+
+Allow shorter kmer matches down to `mink` at the read ends.
+0 disables.
+
+    preprocess_adapter_min_k: 8
+
+## Allowable Mismatches in Adapter Hits
+
+Maximum number of substitutions between the target adapter kmer and the
+query sequence kmer. Lower is more stringent.
+
+    preprocess_allowable_kmer_mismatches: 1
+
+## Contaminant Kmer Length
+
+Kmer length used for finding contaminants. Contaminant matches shorter
+than this length will not be found.
+
+    preprocess_reference_kmer_match_length: 27
+
+## Read Length Threshold
+
+This is applied after quality and adapter trimming have been applied to
+the sequence.
+
+    preprocess_minimum_passing_read_length: 51
+
+## Sequence Complexity Filter
+
+Require this fraction of each nucleotide per sequence to eliminate low
+complexity reads.
+
+    preprocess_minimum_base_frequency: 0.05
+
+## Contamination Parameters
+
+Contamination reference sequences in the form of nucleotide FASTA files
+can be provided and filtered from the reads using the following
+parameters.
+
+If 'rRNA' is defined, it will be added back to metagenomes but not to
+metatranscriptomes. Additional references can be added arbitrarily, such
+as:
+
+    contaminant_references:
+        rRNA: /database_dir/silva_rfam_all_rRNAs.fa
+        phiX: /database_dir/phiX174_virus.fa
+
+Don't look for indels longer than this:
+
+    contaminant_max_indel: 20
+
+Fraction of max alignment score required to keep a site:
+
+    contaminant_min_ratio: 0.65
+
+Mapping kmer length; range 8-15; longer is faster but uses more memory;
+shorter is more sensitive:
+
+    contaminant_kmer_length: 12
+
+Minimum number of seed hits required for candidate sites:
+
+    contaminant_minimum_hits: 1
+
+Set behavior on ambiguously-mapped reads (with multiple top-scoring
+mapping locations):
+
+-   best (use the first best site)
+-   toss (consider unmapped, retain in reads for assembly)
+-   random (select one top-scoring site randomly)
+-   all (retain all top-scoring sites)
+
+    contaminant_ambiguous: best
+
+For host decontamination we suggest the following genomes, where
+contaminants and low complexity regions were masked.
+
+Many thanks to Brian Bushnell for providing the genomes of
+[human](https://drive.google.com/file/d/0B3llHR93L14wd0pSSnFULUlhcUk/edit?resourcekey=0-PsIKmg2q4EvTGWGOUjsKGQ), [mouse](https://drive.google.com/file/d/0B3llHR93L14wYmJYNm9EbkhMVHM/view?resourcekey=0-jSsdejBncqPu4eiFfJvf1w),
+[dog](https://drive.google.com/file/d/0B3llHR93L14wTHdWRG55c2hPUXM/view?resourcekey=0-nJ2WQzTQYrTizK0pllVRZg),
+and
+[cat](https://drive.google.com/file/d/0B3llHR93L14wOXJhWXRlZjBpVUU/view?resourcekey=0-xxh33oYWp5FGBpRzobD_uw).
+[Source](https://www.seqanswers.com/forum/bioinformatics/bioinformatics-aa/37175-introducing-removehuman-human-contaminant-removal?p=286481#post286481)
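
Several contamination parameters documented in `qc.md` carry explicit constraints: a kmer length in the range 8-15 and four allowed modes for ambiguously mapped reads. The sketch below shows one way those constraints could be checked before a run. The function `check_contaminant_config` is hypothetical and not part of atlas, and the 0-1 bound on `contaminant_min_ratio` is an assumption drawn from its description as a fraction.

```python
# Allowed values, copied from the qc.md text in the diff above.
AMBIGUOUS_MODES = {"best", "toss", "random", "all"}


def check_contaminant_config(config):
    """Return a list of problems found in a config dict (empty if none).

    Hypothetical helper, not part of atlas. Keys match the documented
    parameter names; keys that are absent are skipped.
    """
    problems = []

    k = config.get("contaminant_kmer_length")
    if k is not None and not 8 <= k <= 15:
        problems.append(f"contaminant_kmer_length={k} is outside 8-15")

    # Assumption: a "fraction of max alignment score" lies between 0 and 1.
    ratio = config.get("contaminant_min_ratio")
    if ratio is not None and not 0 <= ratio <= 1:
        problems.append(f"contaminant_min_ratio={ratio} is not a fraction")

    mode = config.get("contaminant_ambiguous")
    if mode is not None and mode not in AMBIGUOUS_MODES:
        problems.append(f"contaminant_ambiguous={mode!r} is not a known mode")

    return problems


# The values shown in the documentation pass every check.
documented = {
    "contaminant_kmer_length": 12,
    "contaminant_min_ratio": 0.65,
    "contaminant_ambiguous": "best",
}
print(check_contaminant_config(documented))  # []
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem in one pass.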
diff --git a/docs/advanced/qc.rst b/docs/advanced/qc.rst
diff --git a/docs/conf.py b/docs/conf.py
@@ -17,7 +17,7 @@
 # add these directories to sys.path here. If the directory is relative to the
 # documentation root, use os.path.abspath to make it absolute, like shown here.
 #
-# import os
+import os
 # import sys
 # sys.path.insert(0, os.path.abspath('.'))
 
@@ -36,8 +36,12 @@
     "sphinx.ext.todo",
     "sphinx.ext.viewcode",
     "sphinx.ext.napoleon",
+    "myst_parser",
+    "sphinx.ext.autosectionlabel",
 ]
 
+autosectionlabel_prefix_document = True
+
 # Add any paths that contain templates here, relative to this directory.
 templates_path = ["_templates"]
 
@@ -69,12 +73,12 @@
 #
 # This is also used if you do content translation via gettext catalogs.
 # Usually you set "language" from the command line for these cases.
-language = None
+language = "en"
 
 # List of patterns, relative to source directory, that match files and
 # directories to ignore when looking for source files.
 # This patterns also effect to html_static_path and html_extra_path
-exclude_patterns = ["_build", "Thumbs.db", ".DS_Store", "old"]
+exclude_patterns = ["_build", "Thumbs.db", ".DS_Store", "old", os.path.abspath("../CHANGELOG.md")]
 
 # The name of the Pygments (syntax highlighting) style to use.
 pygments_style = "sphinx"

diff --git a/docs/index.md b/docs/index.md
@@ -0,0 +1,42 @@
+[![image](https://anaconda.org/bioconda/metagenome-atlas/badges/version.svg)](https://anaconda.org/bioconda/metagenome-atlas)
+
+[![image](https://img.shields.io/conda/dn/bioconda/metagenome-atlas.svg?label=Bioconda)](https://bioconda.github.io/recipes/metagenome-atlas/README.html)
+
+[![image](https://img.shields.io/twitter/follow/SilasKieser.svg?style=social&label=Follow)](https://twitter.com/search?f=tweets&q=%40SilasKieser%20%23metagenomeAtlas&src=typd)
+
+# Metagenome-Atlas
+
+![Metagenome-atlas logo](../resources/images/atlas_image.png)
+
+Metagenome-Atlas is an easy-to-use metagenomic pipeline based on
+[snakemake](https://snakemake.github.io/). It handles all steps from QC,
+Assembly, Binning, to Annotation.
+
+You can start using atlas with three commands:
+
+    mamba install -c bioconda -c conda-forge metagenome-atlas={latest_version}
+    atlas init --db-dir databases path/to/fastq/files
+    atlas run
+
+where `{latest_version}` should be replaced by
+
+[![image](https://anaconda.org/bioconda/metagenome-atlas/badges/version.svg)](https://anaconda.org/bioconda/metagenome-atlas)
+
+## Publication
+
+> ATLAS: a Snakemake workflow for assembly, annotation, and genomic
+> binning of metagenome sequence data. Kieser, S., Brown, J., Zdobnov,
+> E. M., Trajkovski, M. & McCue, L. A. BMC Bioinformatics 21, 257
+> (2020). doi:
+> [10.1186/s12859-020-03585-4](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03585-4)
+
+```{toctree}
+:maxdepth: 2
+:caption: Documentation
+
+usage/getting_started
+usage/output
+usage/configuration
+advanced/index
+usage/changelog
+```