50 changes: 37 additions & 13 deletions README.org → README.md
@@ -1,15 +1,21 @@
* proovframe: frame-shift correction for long read (meta)genomics
[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/proovframe/README.html)
[![Anaconda-Server Badge](https://img.shields.io/conda/dn/bioconda/proovframe.svg?style=flat)](https://anaconda.org/bioconda/proovframe)
[![DOI](https://img.shields.io/badge/DOI-10.1101/2021.08.23.457338-blue)](https://doi.org/10.1101/2021.08.23.457338)
[![Anaconda-Server Badge](https://anaconda.org/bioconda/proovframe/badges/license.svg)](https://anaconda.org/bioconda/proovframe)


proovframe: frame-shift correction for long read (meta)genomics
=========================================

Gene prediction on long reads (e.g. PacBio and Nanopore) is often impaired by
indels that cause frameshifts. Proovframe detects and corrects frameshifts in coding
sequences from raw long reads or long-read derived assemblies.

#+ATTR_HTML: :width 600px
[[file:implementation.png]]
<img src="implementation.png" width="1000px" height="600px" />

Proovframe uses frameshift-aware alignments to reference proteins as guides, and
conservatively restores frame-fidelity by 1/2-base deletions or insertions of
"N/NN"s, and masking of premature stops ("NNN").
`N/NN`s, and masking of premature stops (`NNN`).

Good results can already be obtained with distantly related guide proteins;
it has been successfully tested with guide sets at <60% amino acid identity.
@@ -20,22 +26,40 @@ consensus-polishing approaches for assemblies.
It can be used on raw reads directly, which means it can be used on data lacking
sequencing depth for consensus polishing - a common problem for a lot of rare
things from environmental metagenomic samples, for example.


** Usage
## Install

### bioconda

Requires [[https://github.com/bbuchfink/diamond][DIAMOND v2.0.3]] or newer for mapping.
```
conda install -c bioconda proovframe
```

#+begin_src sh
# install
### Manual

Requires [DIAMOND v2.0.3](https://github.com/bbuchfink/diamond) or newer for mapping.

```
git clone https://github.com/thackl/proovframe
# map proteins to reads
```

The tool is then ready to use; it lives in `proovframe/bin/proovframe`.
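
For a manual install it can help to confirm that the DIAMOND on your `PATH` meets the v2.0.3 requirement. The following is a minimal sketch, not part of proovframe: `version_ge` is a hypothetical helper, it relies on `sort -V` being available, and it assumes that `diamond version` prints a line ending in the version number.

```sh
# Hypothetical helper: succeeds if version $1 is >= version $2.
# Relies on `sort -V` (version sort), which GNU coreutils provides.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Assumption: `diamond version` prints something like "diamond version 2.0.15".
if command -v diamond >/dev/null 2>&1; then
    found="$(diamond version 2>/dev/null | awk 'NR==1 {print $NF}')"
    if version_ge "$found" 2.0.3; then
        echo "DIAMOND $found is recent enough"
    else
        echo "DIAMOND $found is older than 2.0.3" >&2
    fi
else
    echo "diamond not found on PATH" >&2
fi
```

If the check reports an older version, updating DIAMOND before running the `map` step avoids surprises later.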


## Usage

Map proteins to reads:
```
proovframe/bin/proovframe map -a proteins.faa -o raw-seqs.tsv raw-seqs.fa
# fix frameshifts in reads
```

Fix frameshifts in reads:
```
proovframe/bin/proovframe fix -o corrected-seqs.fa raw-seqs.fa raw-seqs.tsv
#+end_src
```
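
The two steps above can be chained in a small wrapper. This is only a sketch under stated assumptions: the function `correct_frameshifts`, the `run` helper, and the `PROOVFRAME` and `DRY_RUN` variables are illustrative names, not part of proovframe. The only proovframe options used are the `-a` and `-o` flags shown in the commands above. By default the script only prints the commands it would run.

```sh
#!/bin/sh
# Sketch: run `proovframe map` and `proovframe fix` back to back.
# PROOVFRAME, DRY_RUN, run and correct_frameshifts are illustrative names.
set -eu

PROOVFRAME="${PROOVFRAME:-proovframe/bin/proovframe}"
DRY_RUN="${DRY_RUN:-1}"   # set DRY_RUN=0 to actually execute the commands

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

correct_frameshifts() {
    proteins="$1"   # guide proteins (FASTA)
    seqs="$2"       # raw reads or assembly (FASTA)
    out="$3"        # corrected sequences (FASTA)
    tsv="${seqs%.*}.tsv"   # alignment table, e.g. raw-seqs.fa -> raw-seqs.tsv

    run "$PROOVFRAME" map -a "$proteins" -o "$tsv" "$seqs"
    run "$PROOVFRAME" fix -o "$out" "$seqs" "$tsv"
}

correct_frameshifts proteins.faa raw-seqs.fa corrected-seqs.fa
```

Running it with `DRY_RUN=0` executes both steps; because of `set -e`, a failing `map` step stops the script before `fix` is attempted.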


** Citing
## Citing

If you use proovframe and DIAMOND please cite:
