The monorepo provides a grammar-based algorithm to predict the pseudoknot patterns in the secondary structure of any RNA sequence.
Authors: Andrikos Christos, Makris Evaggelos, Pavlatos Christos, Rassias Georgios, Aggelos Kolaitis
The algorithm consists of the following two steps:
1. Parse multiple RNA subsequences to identify the potential pseudoknot structures.
2. Choose the pseudoknot structure that is considered the most stable. This step is based on the concept of energy minimization.
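
As an illustration of step 2, selecting the most stable candidate amounts to taking the minimum over an energy score. The sketch below is not the repository's code: the `Candidate` type, the `most_stable` helper and the energy values are hypothetical, and how the energy is actually computed is not covered here.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """Hypothetical container for one candidate structure from step 1."""
    dot_bracket: str
    energy: float  # assumed score; lower means more stable


def most_stable(candidates):
    """Return the candidate with the lowest energy (energy minimization)."""
    return min(candidates, key=lambda c: c.energy)


if __name__ == "__main__":
    # Made-up candidates and energies, for illustration only.
    candidates = [
        Candidate("((..[[..))..]]", -3.2),
        Candidate("((......))....", -1.5),
    ]
    print(most_stable(candidates).dot_bracket)
```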
The core algorithm was initially implemented in Python on top of the well-known NLTK package. Due to serious performance issues, we moved the parsing to C, using the YAEP parser, which is able to parse ambiguous grammars. The operation goes as follows:
1.
2.
2.
2.
2.
2.
Compare the predicted dot-bracket string with the ground truth and create a confusion matrix:
| Definition | Description |
|---|---|
| true positive | Ground truth has a stem here, and I have correctly found that stem (matching the pair, TODO: distance +- 1) |
| true negative | Ground truth does not have a stem, and I do not have a stem |
| false positive | I have predicted a stem here, but there is no stem in ground truth |
| false negative | I have predicted no stem, but ground truth has a stem here |
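
The definitions in the table can be sketched as a per-position comparison of two dot-bracket strings. This is a minimal sketch, not the benchmark's implementation: the helper names `pair_table` and `confusion_matrix` are hypothetical, `()` and `[]` are assumed to be the only bracket types, matching is exact (the ±1 distance from the TODO is not applied), and a position that is paired in both strings but with different partners is counted as a false positive, which is an assumption the table does not settle.

```python
# Closing bracket -> opening bracket. Assumes "[]" marks the pseudoknot pairs.
BRACKETS = {")": "(", "]": "["}


def pair_table(dot_bracket):
    """Return a list whose entry i is the partner index of position i, or None."""
    partners = [None] * len(dot_bracket)
    stacks = {opener: [] for opener in BRACKETS.values()}
    for i, symbol in enumerate(dot_bracket):
        if symbol in stacks:
            stacks[symbol].append(i)
        elif symbol in BRACKETS:
            j = stacks[BRACKETS[symbol]].pop()
            partners[i], partners[j] = j, i
    return partners


def confusion_matrix(truth, prediction):
    """Count tp/tn/fp/fn per position for two equal-length dot-bracket strings."""
    if len(truth) != len(prediction):
        raise ValueError("truth and prediction must have the same length")
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for t, p in zip(pair_table(truth), pair_table(prediction)):
        if t is None and p is None:
            counts["tn"] += 1  # no stem in either string
        elif p is None:
            counts["fn"] += 1  # ground truth has a stem, prediction has none
        elif t == p:
            counts["tp"] += 1  # same partner in both strings
        else:
            counts["fp"] += 1  # predicted stem absent from truth (assumption: also wrong partner)
    return counts


if __name__ == "__main__":
    print(confusion_matrix("((..))", "(....)"))
    # {'tp': 2, 'tn': 2, 'fp': 0, 'fn': 2}
```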
Building the code consists of two parts: setting up a Python 3 virtual environment and building the C parser library.
$ make deps # install package dependencies
$ make # build the parser and setup the virtual environment at ./.venv

Run for all cases and save result to result.yaml:

$ ./.venv/bin/rna_benchmark --cases cases.yaml --grammar ./libpseudoknot.so --max-dd-size 2 --max-stem-allow-smaller 1 --allow-ug --prune-early > result.yaml

For a single sequence. See --help for a complete list of options:

$ ./.venv/bin/rna_analysis --grammar ./libpseudoknot.so AAAAAACUAAUAGAGGGGGGACUUAGCGCCCCCCAAACCGUAACCCC

Run benchmark for a number of cases, print output in YAML file. See cases.yaml for an example YAML file:

$ ./.venv/bin/rna_benchmark --grammar ./libpseudoknot.so [OPTIONS] --cases cases.yaml > results.yaml

Activate virtual environment and run with Pytest:

$ ./.venv/bin/pytest

- inclusive_start_index = left_core_inner + left_loop_stems + 1
- unused_right_loop_size = right_core_outer - inclusive_start_index
- inclusive_end_index = right_core_outer - 1
- unused_left_loop_size = left_loop_size - right_core_stems
- inclusive_start_index = left_core_outer + 1
- inclusive_end_index = inclusive_start_index + left_loop_size - right_core_stems - 1
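
The index formulas above can be checked with a short worked example. The sketch below is only an illustration: the two function names are hypothetical, the split into a right-loop group and a left-loop group is inferred from the `unused_right_loop_size` and `unused_left_loop_size` names, and the input values are made up. In both groups the inclusive range covers exactly the unused size, that is, `end - start + 1 == size`.

```python
def unused_right_loop(left_core_inner, left_loop_stems, right_core_outer):
    """Return (inclusive_start_index, inclusive_end_index, unused_right_loop_size)."""
    inclusive_start_index = left_core_inner + left_loop_stems + 1
    unused_right_loop_size = right_core_outer - inclusive_start_index
    inclusive_end_index = right_core_outer - 1
    return inclusive_start_index, inclusive_end_index, unused_right_loop_size


def unused_left_loop(left_core_outer, left_loop_size, right_core_stems):
    """Return (inclusive_start_index, inclusive_end_index, unused_left_loop_size)."""
    unused_left_loop_size = left_loop_size - right_core_stems
    inclusive_start_index = left_core_outer + 1
    inclusive_end_index = inclusive_start_index + left_loop_size - right_core_stems - 1
    return inclusive_start_index, inclusive_end_index, unused_left_loop_size


if __name__ == "__main__":
    # Made-up positions, chosen only to exercise the arithmetic.
    print(unused_right_loop(left_core_inner=10, left_loop_stems=3, right_core_outer=20))
    # (14, 19, 6)
    print(unused_left_loop(left_core_outer=4, left_loop_size=7, right_core_stems=2))
    # (5, 9, 5)
```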