LSRPI (Lightweight Statistical and RPI-based Inference) is a Python package designed for building and evaluating machine learning models, with a focus on both classification and regression tasks. This package provides tools for data handling, model definition, training, testing, and a command-line interface for ease of use.
- Modular Design: Clearly separated modules for data utilities, model definitions, and training/testing logic.
- Classification & Regression Support: Built-in functionalities for both common machine learning paradigms.
- Command-Line Interface (CLI): Easy interaction with the package for common tasks like testing.
- Pretrained Models: Access to pretrained models for quick inference.
- Custom Model Support: Ability to test custom models with specified configurations.
To install LSRPI, follow these steps:
- Clone the repository:
git clone <repository_url_here>
cd lsrpi
- Install in editable mode (for development):
pip install -e .
This will install the package and its dependencies, making the lsrpi command available in your terminal.
To test if the installation was successful, you can run the following command in your terminal:
(lsrpi-env) aaron@LSRPI_project:~/lsrpi$ lsrpi test --mode both
This will run the tests for both classification and regression modes. If everything is set up correctly, you should see output indicating that the tests passed. If you encounter any issues, check the installation steps and ensure all dependencies are correctly installed.
You can run the tests for classification and regression modes separately by using:
(lsrpi-env) aaron@LSRPI_project:~/lsrpi$ lsrpi test --mode classification
(lsrpi-env) aaron@LSRPI_project:~/lsrpi$ lsrpi test --mode regression
A command-line utility, lsrpi, is provided for testing models. The utility supports both pretrained models and custom models.
(lsrpi-env) aaron@LSRPI_project:~/lsrpi$ lsrpi -h
usage: lsrpi [-h] {test,run} ...
LSRPI: A tool for predicting RNA-Protein interactions.
What it does:
- Creates maps showing how RNA and proteins interact with each other
- Works in two modes: classification (yes/no interaction) or regression (distance prediction)
What you need:
- MP3Vec files for proteins
- RNAVec files for RNA
- A CSV file listing your protein and RNA pairs (with columns 'prot_id' and 'rna_id')
The file names in your MP3Vec and RNAVec folders should match the IDs in your CSV.
Results:
- Files will be saved as .npy (numpy) arrays
- 2 sub directories will be created in the specified directory:
A. 'npy_files' for the numpy arrays of outputs (thresholding is not applied)
B. 'prediction_images' for visualizations of interaction or distance predictions
- Classification mode: files named 'imat_[protein]_[rna].npy', 'prot_y_[protein].npy', 'rna_y_[rna].npy'
- Regression mode: files named 'dist_[protein]_[rna].npy'
Usage example:
lsrpi -len 320 -m classification -o /path/to/save_dir -p /path/to/pretrained_model.pt -c /path/to/csv_file.csv -mp3vec /path/to/mp3vec_dir -rnavec /path/to/rnavec_dir
You can use our pre-trained models or specify your own with the -p option.
positional arguments:
{test,run} Available commands
test Run tests for LSRPI
run Run LSRPI inference
options:
-h, --help show this help message and exit
To generate MP3Vec and RNAVec files, you need to prepare your protein and RNA sequences in the appropriate format and convert them into vector representations.
To generate MP3Vec vectors, use the mp3vec module as mentioned in the text. Note that MP3Vec is a separate package and must be installed separately.
To generate RNAVec vectors, use the rnavec module as mentioned in the text. Like MP3Vec, RNAVec is a separate package and must be installed separately.
Once you have the MP3Vec and RNAVec files ready, you can proceed to test your models using the lsrpi command.
The CSV file should contain two columns: prot_id and rna_id, which represent the identifiers for the protein and RNA sequences, respectively. The file should look like this:
prot_id,rna_id
PROT1,RNA1
PROT2,RNA2
PROT3,RNA8
PROT1,RNA3
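A file in this shape can be produced with Python's standard csv module. The following is a minimal sketch, not part of LSRPI itself: the function name write_pairs_csv is hypothetical, and only the prot_id and rna_id column names are taken from this README.

```python
import csv
from pathlib import Path


def write_pairs_csv(pairs, csv_path):
    """Write (prot_id, rna_id) tuples to a CSV with the two expected columns.

    Hypothetical helper for illustration; it is not an LSRPI function.
    """
    csv_path = Path(csv_path)
    # newline="" lets the csv module control line endings itself.
    with csv_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["prot_id", "rna_id"])  # header named in this README
        writer.writerows(pairs)
    return csv_path
```

A call such as write_pairs_csv([("PROT1", "RNA1"), ("PROT2", "RNA2")], "pairs.csv") writes a header row followed by one row per pair.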
Ensure that the IDs in the CSV file match the filenames of the MP3Vec and RNAVec files in their respective directories. The filenames should be in the format prot_id.npy for protein vectors and rna_id.npy for RNA vectors.
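Mismatched IDs are easy to miss, so it can help to check the CSV against both directories before running inference. The sketch below assumes the prot_id.npy and rna_id.npy naming described above; the function name find_missing_vectors is hypothetical and is not part of the LSRPI package.

```python
import csv
from pathlib import Path


def find_missing_vectors(csv_path, mp3vec_dir, rnavec_dir):
    """Return (missing_protein_ids, missing_rna_ids) as sorted lists.

    Hypothetical pre-flight check: an ID is "missing" when no file named
    <id>.npy exists in the corresponding directory.
    """
    mp3vec_dir, rnavec_dir = Path(mp3vec_dir), Path(rnavec_dir)
    missing_prot, missing_rna = set(), set()
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            prot_id = row["prot_id"].strip()
            rna_id = row["rna_id"].strip()
            if not (mp3vec_dir / f"{prot_id}.npy").is_file():
                missing_prot.add(prot_id)
            if not (rnavec_dir / f"{rna_id}.npy").is_file():
                missing_rna.add(rna_id)
    return sorted(missing_prot), sorted(missing_rna)
```

If both returned lists are empty, every pair in the CSV has a protein vector and an RNA vector on disk.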
To test a pretrained model, you can use the lsrpi command without the -p option. The package will automatically load the default pretrained model for the specified mode (classification or regression).
lsrpi -len 320 -m classification -o /path/to/save_dir -c /path/to/csv_file.csv -mp3vec /path/to/mp3vec_dir -rnavec /path/to/rnavec_dir
This command will save the results in the specified directory, creating two subdirectories: npy_files for the numpy arrays of outputs and prediction_images for visualizations of interaction or distance predictions.
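To locate the arrays for a given pair after a run, the output paths can be rebuilt from the naming scheme in the help text. This is a sketch under two assumptions: the .npy files sit directly inside npy_files, and the bracketed placeholders in the help text are replaced by the raw IDs. The function name expected_output_files is hypothetical.

```python
from pathlib import Path


def expected_output_files(save_dir, prot_id, rna_id, mode):
    """Build the .npy paths a run is expected to produce for one pair.

    Hypothetical helper; file name patterns follow the lsrpi help text.
    """
    npy_dir = Path(save_dir) / "npy_files"
    if mode == "classification":
        names = [
            f"imat_{prot_id}_{rna_id}.npy",  # interaction map for the pair
            f"prot_y_{prot_id}.npy",
            f"rna_y_{rna_id}.npy",
        ]
    elif mode == "regression":
        names = [f"dist_{prot_id}_{rna_id}.npy"]  # distance prediction
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return [npy_dir / name for name in names]
```

Each returned path can then be passed to numpy.load to read the raw (unthresholded) output array.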
To test a custom model, use the lsrpi command with the -p option to specify the path to your own model file. Ensure that your model is compatible with the expected input format.
lsrpi -len 320 -m classification -o /path/to/save_dir -c /path/to/csv_file.csv -mp3vec /path/to/mp3vec_dir -rnavec /path/to/rnavec_dir -p /path/to/custom_model.pth
This project is licensed under the MIT License. See the LICENSE file for details.