Wang-Bioinformatics-Lab/spec_align_index_rust

Python to Rust: Indexing-Based Spectral Alignment

This repository ports a Python-based spectral alignment algorithm to Rust to improve speed and memory efficiency when processing mass spectrometry (MS) data.

Overview

The original Python implementation (located in src/python/) reads an .mgf file containing multiple spectra and uses an indexing-based approach to efficiently compute pairwise alignments and cosine similarity scores between spectra. This process is essential for clustering, searching, or comparing MS/MS data.
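As a rough illustration of what an indexing-based approach can look like, the sketch below builds an inverted index from m/z bins to peaks and scores a query spectrum by visiting only the bins it populates. Everything here is an assumption made for illustration: the `Spectrum` alias, the `bin_width` parameter, and the fixed-width binning are not taken from the code in src/python/, which may match peaks differently (for example with a tolerance window or shifted peaks). The sketch also assumes each spectrum has at most one peak per bin.

```rust
use std::collections::HashMap;

/// One spectrum as (m/z, intensity) peaks. Hypothetical type for this sketch.
type Spectrum = Vec<(f64, f64)>;

/// Inverted index: m/z bin -> list of (spectrum id, normalized intensity).
type Index = HashMap<i64, Vec<(usize, f64)>>;

/// Scale intensities so the spectrum has unit Euclidean norm.
fn normalize(spec: &[(f64, f64)]) -> Spectrum {
    let norm = spec.iter().map(|&(_, i)| i * i).sum::<f64>().sqrt();
    if norm == 0.0 {
        return spec.to_vec();
    }
    spec.iter().map(|&(mz, i)| (mz, i / norm)).collect()
}

/// Map an m/z value to an integer bin of width `bin_width` (assumed binning scheme).
fn bin_of(mz: f64, bin_width: f64) -> i64 {
    (mz / bin_width).floor() as i64
}

/// Build the inverted index over all (already normalized) spectra.
fn build_index(specs: &[Spectrum], bin_width: f64) -> Index {
    let mut index: Index = HashMap::new();
    for (id, spec) in specs.iter().enumerate() {
        for &(mz, inten) in spec {
            index.entry(bin_of(mz, bin_width)).or_default().push((id, inten));
        }
    }
    index
}

/// Accumulate dot products against every indexed spectrum,
/// touching only the bins that the query actually populates.
fn score_query(query: &[(f64, f64)], index: &Index, n_specs: usize, bin_width: f64) -> Vec<f64> {
    let mut scores = vec![0.0; n_specs];
    for &(mz, inten) in query {
        if let Some(postings) = index.get(&bin_of(mz, bin_width)) {
            for &(id, other) in postings {
                scores[id] += inten * other;
            }
        }
    }
    scores
}
```

Because intensities are normalized first, the accumulated dot product is the cosine similarity under the one-peak-per-bin assumption. Spectra that share no bin with the query are never visited, which is where an index saves work over comparing every pair peak by peak.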

To improve performance further, this project re-implements the same logic in Rust, taking advantage of Rust's speed and memory safety.

Project Structure

  • src/python/: Contains the original Python source code.
  • data/input_spectra/specs_ms_test.mgf: Example input file in MGF format.
  • data/output/: Contains example outputs generated by the Python implementation.

Input

Example input is provided in data/input_spectra/specs_ms_test.mgf, containing MS/MS spectra in the common MGF format.
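For orientation, an MGF file is a sequence of `BEGIN IONS` / `END IONS` blocks, each holding `KEY=VALUE` header lines followed by one `m/z intensity` pair per line. A minimal reader might look like the sketch below. The `MgfSpectrum` struct and `parse_mgf` function are hypothetical names for this example; it keeps only `TITLE` and `PEPMASS`, skips blank lines and `#` comments, and silently ignores malformed peak lines, which a real reader would probably want to report.

```rust
/// One parsed MGF entry. Hypothetical struct for this sketch.
#[derive(Debug, Default, Clone, PartialEq)]
struct MgfSpectrum {
    title: Option<String>,
    pepmass: Option<f64>,
    peaks: Vec<(f64, f64)>,
}

/// Parse MGF text into spectra, keeping only the TITLE and PEPMASS headers.
fn parse_mgf(text: &str) -> Vec<MgfSpectrum> {
    let mut spectra = Vec::new();
    let mut current: Option<MgfSpectrum> = None;

    for line in text.lines().map(str::trim) {
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        if line == "BEGIN IONS" {
            current = Some(MgfSpectrum::default());
        } else if line == "END IONS" {
            // Close the current block and store it.
            if let Some(spec) = current.take() {
                spectra.push(spec);
            }
        } else if let Some(spec) = current.as_mut() {
            if let Some((key, value)) = line.split_once('=') {
                match key {
                    "TITLE" => spec.title = Some(value.to_string()),
                    // PEPMASS may carry an optional intensity after the m/z value.
                    "PEPMASS" => {
                        spec.pepmass = value
                            .split_whitespace()
                            .next()
                            .and_then(|v| v.parse().ok());
                    }
                    _ => {} // Other headers are ignored in this sketch.
                }
            } else {
                // Peak line: "<m/z> <intensity>", whitespace separated.
                let mut parts = line.split_whitespace();
                if let (Some(mz), Some(inten)) = (parts.next(), parts.next()) {
                    if let (Ok(mz), Ok(inten)) = (mz.parse::<f64>(), inten.parse::<f64>()) {
                        spec.peaks.push((mz, inten));
                    }
                }
            }
        }
    }
    spectra
}
```

Reading the whole file with `std::fs::read_to_string` and passing the result to `parse_mgf` is enough for small test files; a streaming reader would be a natural next step for large inputs.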

Output

Example outputs are located in the data/output/ folder. These include the alignment results and cosine similarity scores produced by the Python implementation.

Getting Started

  1. Take a look at the Python code in src/python/ to understand the logic.
  2. Review the example input and output files to understand the expected functionality.
  3. Begin translating the core alignment and scoring logic into Rust.
  4. Feel free to structure your Rust code however you see fit.

How to Run the Python Version

You can run the original Python implementation with the following command:

python PATH_TO_PYTHON_SOURCE_CODE/spec_align_index.py -t PATH_TO_INPUT_FILES -o P

Replace the placeholders as needed:

  • PATH_TO_PYTHON_SOURCE_CODE: Path to the src/python/ folder
  • PATH_TO_INPUT_FILES: Path to the .mgf input file (e.g., data/input_spectra/specs_ms_test.mgf)
  • PATH_TO_OUTPUT_FILE: Desired path for saving the output results
  • NUM_OF_THREADS: Number of threads to use for parallel computation
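If the Rust version is meant to accept the same arguments, a dependency-free argument parser could start from the sketch below. Only the `-t` and `-o` flags are taken from the command above; the flag for the thread count is not visible there, so it is deliberately left out. The `Args` struct and `parse_args` function are hypothetical names for this example.

```rust
/// Parsed command-line options. Hypothetical struct for this sketch.
#[derive(Debug, PartialEq)]
struct Args {
    input: String,  // value of -t: path to the .mgf input file
    output: String, // value of -o: path for the output results
}

/// Parse `-t <input>` and `-o <output>` from an iterator of arguments
/// (the program name must already be skipped).
fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<Args, String> {
    let mut input = None;
    let mut output = None;
    let mut iter = args.into_iter();

    while let Some(flag) = iter.next() {
        match flag.as_str() {
            "-t" => input = Some(iter.next().ok_or("missing value for -t")?),
            "-o" => output = Some(iter.next().ok_or("missing value for -o")?),
            other => return Err(format!("unknown argument: {other}")),
        }
    }

    Ok(Args {
        input: input.ok_or("-t is required")?,
        output: output.ok_or("-o is required")?,
    })
}
```

In a binary this would be called as `parse_args(std::env::args().skip(1))`. Taking an iterator rather than reading `std::env::args()` directly keeps the function easy to unit test.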

Goal

Go crazy and see how far you can get.

The goal of this project is to reproduce and optimize the alignment and scoring algorithm in Rust. While maintaining correctness and consistency with the Python results, you are encouraged to:

  • Explore performance improvements (e.g., faster execution, lower memory usage)
  • Implement multi-threading or parallelism where appropriate
  • Improve code structure and reusability (e.g., modular design, a CLI, library support)
  • Extend functionality or robustness where feasible
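One possible starting point for the parallelism item, using only the standard library, is to split the rows of the pairwise comparison across scoped threads. This is a sketch under assumptions: `dot` over dense vectors stands in for the real alignment score, and `pairwise_parallel` is a hypothetical name. A crate such as rayon would be a reasonable alternative.

```rust
use std::thread;

/// Dot product of two dense vectors; a stand-in for the real scoring function.
fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Score every pair (i, j) with i < j, splitting rows across `n_threads` workers.
/// Results are returned sorted by (i, j) so the output is deterministic.
fn pairwise_parallel(vectors: &[Vec<f64>], n_threads: usize) -> Vec<(usize, usize, f64)> {
    let n = vectors.len();
    let n_threads = n_threads.max(1);

    let mut results: Vec<(usize, usize, f64)> = thread::scope(|s| {
        let handles: Vec<_> = (0..n_threads)
            .map(|t| {
                s.spawn(move || {
                    let mut local = Vec::new();
                    // Interleave rows: row i has n - i - 1 pairs, so striding
                    // gives each worker a similar amount of work.
                    for i in (t..n).step_by(n_threads) {
                        for j in (i + 1)..n {
                            local.push((i, j, dot(&vectors[i], &vectors[j])));
                        }
                    }
                    local
                })
            })
            .collect();

        // Join all workers and concatenate their local results.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    });

    results.sort_by_key(|r| (r.0, r.1));
    results
}
```

Scoped threads let the workers borrow `vectors` without cloning or reference counting. Sorting the merged results makes it easier to compare runs with different thread counts against the Python output.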

This is an open-ended task: use your best judgment and creativity to push the Rust implementation as far as possible.

Good luck 🚀
