
aaronrockmenezes/LS-RPI


LSRPI

LSRPI (Lightweight Statistical and RPI-based Inference) is a Python package for predicting RNA-protein interactions with machine learning models. It supports both classification (interaction vs. no interaction) and regression (distance prediction). The package provides tools for data handling, model definition, training, and testing, plus a command-line interface for common tasks.


Features

  • Modular Design: Clearly separated modules for data utilities, model definitions, and training/testing logic.
  • Classification & Regression Support: Built-in functionalities for both common machine learning paradigms.
  • Command-Line Interface (CLI): Easy interaction with the package for common tasks like testing.
  • Pretrained Models: Access to pretrained models for quick inference.
  • Custom Model Support: Ability to test custom models with specified configurations.

Installation

To install LSRPI, follow these steps:

  1. Clone the repository:
    git clone <repository_url_here>
    cd lsrpi
  2. Install in editable mode (for development):
    pip install -e .
    This will install the package and its dependencies, making the lsrpi command available in your terminal.

Testing installation

To test if the installation was successful, you can run the following command in your terminal:

lsrpi test --mode both

This will run the tests for both classification and regression modes. If everything is set up correctly, you should see output indicating that the tests passed successfully. If you encounter any issues, please check the installation steps and ensure all dependencies are correctly installed.

You can run the tests for classification and regression modes separately by using:

lsrpi test --mode classification
lsrpi test --mode regression

Usage

The lsrpi command-line utility provides two subcommands: test, which runs the package's tests, and run, which runs inference. Inference works with both pretrained models and custom models.

lsrpi -h
usage: lsrpi [-h] {test,run} ...

LSRPI: A tool for predicting RNA-Protein interactions.

What it does:
- Creates maps showing how RNA and proteins interact with each other
- Works in two modes: classification (yes/no interaction) or regression (distance prediction)

What you need:
- MP3Vec files for proteins
- RNAVec files for RNA
- A CSV file listing your protein and RNA pairs (with columns 'prot_id' and 'rna_id')

The file names in your MP3Vec and RNAVec folders should match the IDs in your CSV.

Results:
- Files will be saved as .npy (numpy) arrays
- 2 sub directories will be created in the specified directory:
    A. 'npy_files' for the numpy arrays of outputs (thresholding is not applied)
    B. 'prediction_images' for visualizations of interaction or distance predictions
- Classification mode: files named 'imat_[protein]_[rna].npy', 'prot_y_[protein].npy', 'rna_y_[rna].npy'
- Regression mode: files named 'dist_[protein]_[rna].npy'

Usage example:
lsrpi -len 320 -m classification -o /path/to/save_dir -p /path/to/pretrained_model.pt -c /path/to/csv_file.csv -mp3vec /path/to/mp3vec_dir -rnavec /path/to/rnavec_dir

You can use our pre-trained models or specify your own with the -p option.

positional arguments:
  {test,run}  Available commands
    test      Run tests for LSRPI
    run       Run LSRPI inference

options:
  -h, --help  show this help message and exit

Generating MP3Vec and RNAVec files

To generate MP3Vec and RNAVec files, first prepare your protein and RNA sequences in the input format each tool expects; the tools then convert these sequences into vector representations. To generate MP3Vec vectors, use the mp3vec module as described in the text. Note that MP3Vec is a separate package and must be installed separately.

For RNAVec, use the rnavec module as described in the text. Like MP3Vec, RNAVec is a separate package and must be installed separately.

Once you have the MP3Vec and RNAVec files ready, you can proceed to test your models using the lsrpi command.

CSV File Format

The CSV file should contain two columns: prot_id and rna_id, which represent the identifiers for the protein and RNA sequences, respectively. The file should look like this:

prot_id,rna_id
PROT1,RNA1
PROT2,RNA2
PROT3,RNA8
PROT1,RNA3
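If your pairs are generated by a script, a short sketch like the one below can write this file. It uses only Python's standard csv module; the helper names pairs_to_csv_text and write_pairs_csv are hypothetical and not part of LSRPI.

```python
import csv
import io
from pathlib import Path
from typing import Iterable, Tuple


def pairs_to_csv_text(pairs: Iterable[Tuple[str, str]]) -> str:
    """Render (prot_id, rna_id) pairs as CSV text with the required header."""
    buffer = io.StringIO()
    # Use "\n" so the output matches the example above (csv defaults to "\r\n").
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["prot_id", "rna_id"])
    writer.writerows(pairs)
    return buffer.getvalue()


def write_pairs_csv(pairs: Iterable[Tuple[str, str]], csv_path: str) -> Path:
    """Write the pairs CSV to csv_path and return the path."""
    path = Path(csv_path)
    path.write_text(pairs_to_csv_text(pairs))
    return path
```

For example, write_pairs_csv([("PROT1", "RNA1"), ("PROT1", "RNA3")], "pairs.csv") writes the header plus two data rows. A protein ID may appear in several pairs, as PROT1 does in the example above.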

Ensure that the IDs in the CSV file match the filenames of the MP3Vec and RNAVec files in their respective directories. The filenames should be in the format prot_id.npy for protein vectors and rna_id.npy for RNA vectors.
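Before a run, it can help to check that every ID has a matching vector file. The sketch below assumes exactly the layout described above (<mp3vec_dir>/<prot_id>.npy and <rnavec_dir>/<rna_id>.npy); read_pairs and find_missing_vectors are hypothetical helpers, not LSRPI functions.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable, List, Set, Tuple

REQUIRED_COLUMNS = ("prot_id", "rna_id")


def read_pairs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse CSV lines (an open file or a list of strings) into (prot_id, rna_id) pairs."""
    reader = csv.DictReader(lines)
    absent = [col for col in REQUIRED_COLUMNS if col not in (reader.fieldnames or [])]
    if absent:
        raise ValueError(f"CSV is missing required column(s): {absent}")
    return [(row["prot_id"], row["rna_id"]) for row in reader]


def find_missing_vectors(
    pairs: Iterable[Tuple[str, str]], mp3vec_dir: str, rnavec_dir: str
) -> Dict[str, Set[str]]:
    """Return the IDs whose <id>.npy file is absent from the corresponding directory."""
    prot_dir, rna_dir = Path(mp3vec_dir), Path(rnavec_dir)
    missing: Dict[str, Set[str]] = {"prot_id": set(), "rna_id": set()}
    for prot_id, rna_id in pairs:
        if not (prot_dir / f"{prot_id}.npy").is_file():
            missing["prot_id"].add(prot_id)
        if not (rna_dir / f"{rna_id}.npy").is_file():
            missing["rna_id"].add(rna_id)
    return missing
```

Typical use: open the CSV with open(csv_path, newline=""), pass the handle to read_pairs, then call find_missing_vectors(pairs, mp3vec_dir, rnavec_dir). Two empty sets mean every ID has a matching vector file.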

Testing a pretrained model

To test a pretrained model, you can use the lsrpi command without the -p option. The package will automatically load the default pretrained model for the specified mode (classification or regression).

lsrpi run -len 320 -m classification -o /path/to/save_dir -c /path/to/csv_file.csv -mp3vec /path/to/mp3vec_dir -rnavec /path/to/rnavec_dir

This command saves the results in the directory given by -o, creating two subdirectories: npy_files for the raw numpy arrays of outputs (no thresholding applied) and prediction_images for visualizations of interaction or distance predictions.
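To post-process results, each pair's arrays can be located from the file-naming scheme listed in the lsrpi -h output (imat_/prot_y_/rna_y_ files for classification, dist_ files for regression). This is a minimal sketch, assuming numpy is installed and that IDs appear verbatim in the filenames; the helper names and the dictionary keys (interaction_map, prot_y, rna_y, distance_map) are hypothetical labels, not part of LSRPI.

```python
from pathlib import Path
from typing import Dict

import numpy as np


def expected_output_files(save_dir: str, mode: str, prot_id: str, rna_id: str) -> Dict[str, Path]:
    """Map hypothetical output labels to the .npy paths written for one pair."""
    npy_dir = Path(save_dir) / "npy_files"
    if mode == "classification":
        return {
            "interaction_map": npy_dir / f"imat_{prot_id}_{rna_id}.npy",
            "prot_y": npy_dir / f"prot_y_{prot_id}.npy",
            "rna_y": npy_dir / f"rna_y_{rna_id}.npy",
        }
    if mode == "regression":
        return {"distance_map": npy_dir / f"dist_{prot_id}_{rna_id}.npy"}
    raise ValueError(f"mode must be 'classification' or 'regression', got {mode!r}")


def load_outputs(save_dir: str, mode: str, prot_id: str, rna_id: str) -> Dict[str, np.ndarray]:
    """Load the raw (unthresholded) arrays saved for one protein-RNA pair."""
    return {
        name: np.load(path)
        for name, path in expected_output_files(save_dir, mode, prot_id, rna_id).items()
    }
```

Because the arrays are stored without thresholding, any cutoff for turning classification scores into yes/no calls is left to you.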

Testing a custom model

To test a custom model, pass the path to your own model file with the -p option. Ensure that your model is compatible with the expected input format.

lsrpi run -len 320 -m classification -o /path/to/save_dir -c /path/to/csv_file.csv -mp3vec /path/to/mp3vec_dir -rnavec /path/to/rnavec_dir -p /path/to/custom_model.pth
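The pretrained and custom invocations differ only in whether -p is given. If you drive lsrpi from a Python script, a sketch like the following builds the argument list from the flags shown in this README and runs it with subprocess. The helper names are hypothetical, and placing run as the subcommand is an assumption based on the {test,run} usage line rather than a confirmed invocation.

```python
import subprocess
from typing import List, Optional


def build_lsrpi_run_command(
    csv_path: str,
    mp3vec_dir: str,
    rnavec_dir: str,
    save_dir: str,
    mode: str = "classification",
    length: int = 320,
    model_path: Optional[str] = None,
) -> List[str]:
    """Assemble an lsrpi inference command from the flags documented in this README."""
    if mode not in ("classification", "regression"):
        raise ValueError(f"mode must be 'classification' or 'regression', got {mode!r}")
    # Assumption: inference is invoked through the `run` subcommand.
    cmd = [
        "lsrpi", "run",
        "-len", str(length),  # value passed to -len (320 in the examples above)
        "-m", mode,
        "-o", save_dir,
        "-c", csv_path,
        "-mp3vec", mp3vec_dir,
        "-rnavec", rnavec_dir,
    ]
    if model_path is not None:
        # Custom model; leave model_path as None to use the default pretrained model.
        cmd += ["-p", model_path]
    return cmd


def run_lsrpi(cmd: List[str]) -> subprocess.CompletedProcess:
    """Run the command, raising CalledProcessError on a non-zero exit status."""
    return subprocess.run(cmd, check=True)
```

Returning the argument list rather than a shell string avoids quoting problems with paths that contain spaces, since subprocess.run passes each element as a separate argument.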

License

This project is licensed under the MIT License. See the LICENSE file for details.
