This repository is dedicated to converting a Python-based spectral alignment algorithm into Rust for improved efficiency and performance when working with mass spectrometry (MS) data.
The original Python implementation (located in src/python/) reads an .mgf file containing multiple spectra and uses an indexing-based approach to efficiently compute pairwise alignments and cosine similarity scores between spectra. This process is essential for clustering, searching, or comparing MS/MS data.
To push the performance boundary further, this project focuses on re-implementing the same logic in Rust, leveraging Rust's speed and memory safety.
- src/python/: Contains the original Python source code.
- data/input_spectra/specs_ms_test.mgf: Example input file in MGF format.
- data/output/: Contains example outputs generated by the Python implementation.
Example input is provided in data/input_spectra/specs_ms_test.mgf, containing MS/MS spectra in the common MGF format.
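For orientation, here is a minimal sketch of reading MGF text in Rust. It assumes the usual MGF layout: each spectrum sits between BEGIN IONS and END IONS, header lines have the form KEY=VALUE, and peak lines hold an m/z value followed by an intensity. The names Spectrum and parse_mgf are hypothetical and not taken from this repository, and only the TITLE and PEPMASS headers are kept.

```rust
/// One MS/MS spectrum as read from an MGF block (hypothetical type).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Spectrum {
    pub title: Option<String>,
    /// Precursor m/z: the first number on the PEPMASS line.
    pub pepmass: Option<f64>,
    /// (m/z, intensity) pairs in file order.
    pub peaks: Vec<(f64, f64)>,
}

/// Parse MGF-formatted text into spectra.
/// Unknown headers and malformed peak lines are skipped silently.
pub fn parse_mgf(text: &str) -> Vec<Spectrum> {
    let mut spectra = Vec::new();
    let mut current: Option<Spectrum> = None;

    for line in text.lines().map(str::trim) {
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        if line == "BEGIN IONS" {
            current = Some(Spectrum::default());
        } else if line == "END IONS" {
            if let Some(s) = current.take() {
                spectra.push(s);
            }
        } else if let Some(s) = current.as_mut() {
            if let Some((key, value)) = line.split_once('=') {
                // Header line inside a spectrum block.
                match key {
                    "TITLE" => s.title = Some(value.to_string()),
                    "PEPMASS" => {
                        s.pepmass = value
                            .split_whitespace()
                            .next()
                            .and_then(|v| v.parse().ok());
                    }
                    _ => {} // other headers (e.g. CHARGE) are ignored here
                }
            } else {
                // Peak line: "<m/z> <intensity>".
                let mut parts = line.split_whitespace();
                if let (Some(mz), Some(int)) = (parts.next(), parts.next()) {
                    if let (Ok(mz), Ok(int)) = (mz.parse::<f64>(), int.parse::<f64>()) {
                        s.peaks.push((mz, int));
                    }
                }
            }
        }
    }
    spectra
}
```

To use it on the example file, the contents could be loaded with std::fs::read_to_string and passed to parse_mgf. A real implementation would likely stream the file and report parse errors instead of skipping them.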
Example outputs are located in the data/output/ folder. These include the alignment results and cosine similarity scores produced by the Python implementation.
- Take a look at the Python code in src/python/ to understand the logic.
- Review the example input and output files to understand the expected functionality.
- Begin translating the core alignment and scoring logic into Rust.
- Feel free to structure your Rust code however you see fit.
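As a possible starting point for the translation step, the sketch below computes a cosine score between two peak lists with a simple greedy matching. It is a brute-force baseline, not the indexing-based algorithm in src/python/: peaks are paired when their m/z values differ by at most a tolerance, pairs are taken in order of decreasing intensity product, and each peak is used at most once. The names Peak, greedy_cosine, and tol are hypothetical, and precursor-shifted matches are not considered.

```rust
/// A fragment peak (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Peak {
    pub mz: f64,
    pub intensity: f64,
}

/// Greedy cosine similarity between two spectra.
/// `tol` is the absolute m/z tolerance for pairing two peaks.
pub fn greedy_cosine(a: &[Peak], b: &[Peak], tol: f64) -> f64 {
    // Euclidean norm of the intensity vector.
    let norm = |s: &[Peak]| s.iter().map(|p| p.intensity * p.intensity).sum::<f64>().sqrt();
    let (na, nb) = (norm(a), norm(b));
    if na == 0.0 || nb == 0.0 {
        return 0.0;
    }

    // All candidate pairs within tolerance: (intensity product, index in a, index in b).
    let mut candidates: Vec<(f64, usize, usize)> = Vec::new();
    for (i, pa) in a.iter().enumerate() {
        for (j, pb) in b.iter().enumerate() {
            if (pa.mz - pb.mz).abs() <= tol {
                candidates.push((pa.intensity * pb.intensity, i, j));
            }
        }
    }

    // Largest products first, then accept pairs whose peaks are still unused.
    candidates.sort_by(|x, y| y.0.total_cmp(&x.0));
    let mut used_a = vec![false; a.len()];
    let mut used_b = vec![false; b.len()];
    let mut dot = 0.0;
    for (product, i, j) in candidates {
        if !used_a[i] && !used_b[j] {
            used_a[i] = true;
            used_b[j] = true;
            dot += product;
        }
    }
    dot / (na * nb)
}
```

The nested loop is quadratic in the number of peaks. Sorting peaks by m/z or bucketing them in an index would avoid most comparisons, which is presumably where the Python implementation's indexing comes in. Scores from this sketch should be checked against the files in data/output/ before relying on it, since the reference matching rules may differ.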
You can run the original Python implementation with the following command:
python PATH_TO_PYTHON_SOURCE_CODE/spec_align_index.py -t PATH_TO_INPUT_FILES -o P
Replace the placeholders as needed:
- PATH_TO_PYTHON_SOURCE_CODE: Path to the src/python/ folder
- PATH_TO_INPUT_FILES: Path to the .mgf input file (e.g., data/input_spectra/specs_ms_test.mgf)
- PATH_TO_OUTPUT_FILE: Desired path for saving the output results
- NUM_OF_THREADS: Number of threads to use for parallel computation
Go crazy and see how far you can get.
The goal of this project is to reproduce and optimize the alignment and scoring algorithm in Rust. While maintaining correctness and consistency with the Python results, you are encouraged to:
- Explore performance improvements (e.g., faster execution, lower memory usage)
- Implement multi-threading or parallelism where appropriate
- Improve code structure and reusability (e.g., modular design, CLI interface, library support)
- Extend functionality or robustness where feasible
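One way to approach the parallelism item using only the standard library is sketched below. It scores every unordered pair of items across a fixed number of scoped threads. The function name pairwise_parallel and its parameters are hypothetical, and the scoring function is passed in as a closure, so any spectrum type and similarity measure can be plugged in. A crate such as rayon would be a reasonable alternative. This sketch only illustrates the idea of splitting the work.

```rust
use std::thread;

/// Score all pairs (i, j) with i < j using `n_threads` worker threads.
/// Returns (i, j, score) triples sorted by (i, j).
pub fn pairwise_parallel<T, F>(items: &[T], n_threads: usize, score: F) -> Vec<(usize, usize, f64)>
where
    T: Sync,
    F: Fn(&T, &T) -> f64 + Sync,
{
    let n_threads = n_threads.max(1);
    let score = &score; // share the closure by reference across workers

    let mut results: Vec<(usize, usize, f64)> = thread::scope(|scope| {
        let handles: Vec<_> = (0..n_threads)
            .map(|t| {
                scope.spawn(move || {
                    let mut local = Vec::new();
                    // Rows are dealt out round-robin so that long rows (small i)
                    // and short rows (large i) are mixed within each worker.
                    for i in (t..items.len()).step_by(n_threads) {
                        for j in (i + 1)..items.len() {
                            local.push((i, j, score(&items[i], &items[j])));
                        }
                    }
                    local
                })
            })
            .collect();

        // Gather the per-thread results once every worker has finished.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    });

    // Make the output order independent of thread scheduling.
    results.sort_by_key(|&(i, j, _)| (i, j));
    results
}
```

Each worker collects into its own vector, so no locking is needed while scoring. The final sort makes the output deterministic, which helps when comparing against the Python results.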
This is an open-ended task — use your best judgment and creativity to push the Rust implementation as far as possible.
Good luck 🚀