Commit c27d7e0

fix: pip install mikado now works (#443)
* Update requirements.txt and environment.yml and fix scipy import issue.
* Update README
1 parent c8f6b7c commit c27d7e0

4 files changed: 48 additions and 22 deletions
README.md

Lines changed: 39 additions & 16 deletions
@@ -5,25 +5,48 @@
 
 # Mikado - pick your transcript: a pipeline to determine and select the best RNA-Seq prediction
 
-Mikado is a lightweight Python3 pipeline to identify the most useful or “best” set of transcripts from multiple transcript assemblies. Our approach leverages transcript assemblies generated by multiple methods to define expressed loci, assign a representative transcript and return a set of gene models that selects against transcripts that are chimeric, fragmented or with short or disrupted CDS. Loci are first defined based on overlap criteria and each transcript therein is scored based on up to 50 available metrics relating to ORF and cDNA size, relative position of the ORF within the transcript, UTR length and presence of multiple ORFs. Mikado can also utilize blast data to score transcripts based on proteins similarity and to identify and split chimeric transcripts. Optionally, junction confidence data as provided by [Portcullis] can be used to improve the assessment. The best-scoring transcripts are selected as the primary transcripts of their respective gene loci; additionally, Mikado can bring back other valid splice variants that are compatible with the primary isoform.
+Mikado is a lightweight Python3 pipeline to identify the most useful or “best” set of transcripts from multiple transcript assemblies. Our approach leverages transcript assemblies generated by multiple methods to define expressed loci, assign a representative transcript and return a set of gene models that selects against transcripts that are chimeric, fragmented or with short or disrupted CDS. Loci are first defined based on overlap criteria and each transcript therein is scored based on up to 50 available metrics relating to ORF and cDNA size, relative position of the ORF within the transcript, UTR length and presence of multiple ORFs. Mikado can also utilize blast data to score transcripts based on proteins similarity and to identify and split chimeric transcripts. Optionally, junction confidence data as provided by [Portcullis][Portcullis] can be used to improve the assessment. The best-scoring transcripts are selected as the primary transcripts of their respective gene loci; additionally, Mikado can bring back other valid splice variants that are compatible with the primary isoform.
 
-Mikado uses GTF or GFF files as mandatory input. Non-mandatory but highly recommended input data can be generated by obtaining a set of reliable splicing junctions with Portcullis_, by locating coding ORFs on the transcripts using either [Transdecoder] or [Prodigal], and by obtaining homology information through either [BLASTX][Blast+] or [DIAMOND].
+Mikado uses GTF or GFF files as mandatory input. Non-mandatory but highly recommended input data can be generated by obtaining a set of reliable splicing junctions with Portcullis_, by locating coding ORFs on the transcripts using either [Transdecoder][Transdecoder] or [Prodigal][Prodigal], and by obtaining homology information through either [BLASTX][Blast+] or [DIAMOND][DIAMOND].
 
 Our approach is amenable to include sequences generated by *de novo* Illumina assemblers or reads generated from long read technologies such as Pacbio.
 
 Extended documentation is hosted on ReadTheDocs: http://mikado.readthedocs.org/
 
 ## Installation
 
-Mikado can be installed from PyPI with pip:
+Using mamba
 
-```pip3 install mikado```
+download mamba using pip
+
+```bash
+pip install mamba=0.27.0
+```
+
+Create a mamba environment using the environment.yml file
+
+```bash
+mamba env create -f environment.yml
+conda activate mikado2
+```
+
+Check and run mikado
+
+```bash
+mikado --help
+```
+
+
+
+Mikado can also be be installed from PyPI with pip (**deprecated**):
+
+``pip3 install mikado``
 
 Alternatively, you can clone the repository from source and install with:
 
 pip wheel -w dist .
-pip install dist/*whl
-
+pip install dist/*whl
+
 You can verify the correctness of the installation with the unit tests (*outside of the source folder*, as otherwise Python will get confused and try to use the `Mikado` source folder instead of the system installation):
 
 python -c "import Mikado; Mikado.test(); Mikado.test(label='slow')"
@@ -40,29 +63,29 @@ The steps above will ensure that any additional python dependencies will be inst
 ### Additional dependencies
 
 Mikado by itself does require only the presence of a database solution, such as SQLite (although we do support MySQL and PostGRESQL as well).
-However, the Daijin pipeline requires additional programs to run.
+However, the Daijin pipeline requires additional programs to run.
 
 For driving Mikado through Daijin, the following programs are required:
 
-- [DIAMOND] or [Blast+] to provide protein homology. DIAMOND is preferred for its speed.
-- [Prodigal] or [Transdecoder] to calculate ORFs. The versions of Transdecoder that we tested scale poorly in terms of runtime and disk usage, depending on the size of the input dataset. Prodigal is much faster and lighter, however, the data on our paper has been generated through Transdecoder - not Prodigal. Currently we set Prodigal as default.
-- Mikado also makes use of a dataset of RNA-Seq high-quality junctions. We are using [Portcullis] to calculate this data alongside the alignments and assemblies.
+- [DIAMOND][DIAMOND] or [Blast+][Blast+] to provide protein homology. DIAMOND is preferred for its speed.
+- [Prodigal][Prodigal] or [Transdecoder][Transdecoder] to calculate ORFs. The versions of Transdecoder that we tested scale poorly in terms of runtime and disk usage, depending on the size of the input dataset. Prodigal is much faster and lighter, however, the data on our paper has been generated through Transdecoder - not Prodigal. Currently we set Prodigal as default.
+- Mikado also makes use of a dataset of RNA-Seq high-quality junctions. We are using [Portcullis][Portcullis] to calculate this data alongside the alignments and assemblies.
 
 If you plan to generate the alignment and assembly part as well through Daijin, the pipeline requires the following:
 
 - SAMTools
 - If you have short-read RNA-Seq data:
-- At least one short-read RNA-Seq aligner, choice between [GSNAP], [GMAP], [STAR], [TopHat2], [HISAT2]
-- At least one RNA-Seq assembler, choice between [StringTie], [Trinity], [Cufflinks], [CLASS2]. Trinity additionally requires [GMAP].
-- [Portcullis] is optional, but highly recommended to retrieve high-quality junctions from the data
+- At least one short-read RNA-Seq aligner, choice between [GSNAP], [GMAP][GMAP], [STAR][STAR], [TopHat2][TopHat2], [HISAT2][HISAT2]
+- At least one RNA-Seq assembler, choice between [StringTie][StringTie], [Trinity][Trinity], [Cufflinks], [CLASS2][CLASS2]. Trinity additionally requires [GMAP][GMAP].
+- [Portcullis][Portcullis] is optional, but highly recommended to retrieve high-quality junctions from the data
 - If you have long-read RNA-Seq data:
-- At least one long-read RNA-Seq aligner, current choice between [STAR] and [GMAP]
+- At least one long-read RNA-Seq aligner, current choice between [STAR][STAR] and [GMAP][GMAP]
 
 ## Development guide
 
 We provide source trail files ([https://www.sourcetrail.com/](https://www.sourcetrail.com/)) to aid in development.
 As required by the SourceTrail application, these files are present in the master directory, as "Mikado.srctrl*".
-
+
 ## Citing Mikado
 
 If you use Mikado in your work, please consider to cite:
@@ -84,4 +107,4 @@ If you also use Portcullis to provide reliable junctions to Mikado, either indep
 [HISAT2]: http://ccb.jhu.edu/software/hisat2
 [StringTie]: https://ccb.jhu.edu/software/stringtie/
 [Trinity]: https://github.com/trinityrnaseq/trinityrnaseq
-[CLASS2]: http://ccb.jhu.edu/people/florea/research/CLASS2/
+[CLASS2]: http://ccb.jhu.edu/people/florea/research/CLASS2/

environment.yml

Lines changed: 4 additions & 3 deletions
@@ -19,12 +19,13 @@ dependencies:
 - samtools>=1.11
 - htslib>=1.11
 - pysam>=0.15.3
-- defaults::python>=3.6
+- python>=3.6,<3.10
 - pyyaml>=5.1.2
 - scipy>=1.3.1
+- gmap==2021.08.25
 - snakemake-minimal>=5.7.0
-- sqlalchemy>=1.4.0
-- sqlalchemy-utils>=0.34.1
+- sqlalchemy>1.4.0,<2
+- sqlalchemy-utils==0.34.1
 - sqlite
 - tabulate>=0.8.5
 - wheel

requirements.txt

Lines changed: 2 additions & 2 deletions
@@ -11,8 +11,8 @@ pysam>=0.15.3
 pyyaml>=5.1.2
 scipy>=1.3.1
 snakemake>=5.7.0
-sqlalchemy>=1.4.0
-sqlalchemy-utils>=0.37
+sqlalchemy>1.4.0,<2
+sqlalchemy-utils==0.37
 tabulate>=0.8.5
 pytest>=5.4.1
 python-rapidjson>=1.0.0
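
Note on the new `sqlalchemy>1.4.0,<2` specifier: compared with the old `>=1.4.0`, it adds an upper bound below the 2.x series and makes the lower bound exclusive, so 1.4.0 itself no longer satisfies it. The following is a minimal sketch of that range check, assuming plain dotted numeric versions only (pip applies the full PEP 440 rules, which this does not implement). `parse_version` and `in_sqlalchemy_range` are hypothetical names used for illustration and are not part of Mikado.

```python
def parse_version(version: str) -> tuple:
    """Turn '1.4.46' into (1, 4, 46); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in version.split("."))


def in_sqlalchemy_range(version: str) -> bool:
    """Hypothetical helper mirroring the specifier 'sqlalchemy>1.4.0,<2'."""
    parsed = parse_version(version)
    # Strictly greater than 1.4.0 and strictly below the 2.x series.
    return (1, 4, 0) < parsed < (2,)


if __name__ == "__main__":
    for candidate in ("1.4.0", "1.4.46", "2.0.0"):
        print(candidate, in_sqlalchemy_range(candidate))
```

Under these assumptions, a 1.4.x patch release passes the check while 1.4.0 and any 2.x release do not.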

setup.py

Lines changed: 3 additions & 1 deletion
@@ -12,8 +12,10 @@
 import re
 import sys
 import numpy as np
-from scipy._build_utils import numpy_nodepr_api
 
+### See comment here https://github.com/cython/cython/issues/2498
+numpy_nodepr_api = dict(define_macros=[("NPY_NO_DEPRECATED_API",
+"NPY_1_9_API_VERSION")])
 
 here = path.abspath(path.dirname("__file__"))
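
Note on the `setup.py` change: the added lines define `numpy_nodepr_api` locally as a dictionary of keyword arguments, which removes the import from the private `scipy._build_utils` module. The following is a minimal sketch of how such a dictionary can be merged into the arguments of an extension definition. It assumes the rest of `setup.py` unpacks it into its extension declarations, which this diff does not show. `extension_kwargs`, the module name and the source path are hypothetical.

```python
# Same definition as the lines added to setup.py in this commit.
numpy_nodepr_api = dict(define_macros=[("NPY_NO_DEPRECATED_API",
                                        "NPY_1_9_API_VERSION")])


def extension_kwargs(name, sources):
    """Build keyword arguments for an extension module (hypothetical helper)."""
    kwargs = dict(name=name, sources=list(sources))
    # Merging the dict adds define_macros, so the compiler would define
    # NPY_NO_DEPRECATED_API=NPY_1_9_API_VERSION for this extension.
    kwargs.update(numpy_nodepr_api)
    return kwargs


if __name__ == "__main__":
    # "example._fast" and its source file are placeholders, not Mikado modules.
    print(extension_kwargs("example._fast", ["example/_fast.c"]))
```

The resulting dictionary could then be passed on as `setuptools.Extension(**kwargs)`. Keeping the macro in a plain dictionary means the build no longer depends on scipy's private build helpers.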
1921
