DemuxHMM

DemuxHMM is a highly performant demultiplexing method for scRNA-seq and other omics. Under the hood, it uses a Hidden Markov Model to capture structure on SNPs when individuals are created with a breeding scheme. DemuxHMM is most powerful with this structure, but it can also work on unstructured datasets. For fast computation, we leverage parallelization on GPU to achieve significantly faster performance than competing methods on large datasets. The manuscript for the method can be seen on bioRxiv. Please cite us if you use the code in your work.

Installation

We offer two installation paths, depending on whether you want to run the benchmarks presented in the paper or not. For simple installation, clone this repo and install it with pip:

git clone https://github.com/antonafana/DemuxHMM && cd DemuxHMM
pip install .

If you want to run benchmarks from the paper, install the optional dependencies with:

pip install .[paper]

Usage

As input, the method expects an array of single-cell variant call matrices A and D. Currently, these should be formatted as an array of length corresponding to the number of chromosomes. A[i] and D[i] should numpy matrices of shape (n_cells, n_snps_chromosome_i) containing allele counts and overall depth for each cell and snp position combination on chromosome i. Cells should already be quality controlled and cleaned of doublets. We suggest CellSNP-lite or similar for variant calling. The method also requires an upper bound on the number of individuals, and an estimate on the number of state changes per chromosome (ex. homozygous reference -> heterozygous). We suggest repeating inference multiple times and using the run with the best model probability, like the below example:

import numpy as np
from demuxHMM import HMMModel

num_repeats = 30
tolerance = 1
num_individuals = 100 
mean_transitions = 10
best_model_score = -np.inf
best_model = None

for _ in range(num_repeats):
    model = HMMModel(A, D, num_individuals, mean_transitions)
    model.solve(tolerance, num_inits=0)
    model_out = model.get_output()

    if model_out['probability'] > best_model_score:
        best_model_score = model_out['probability']
        best_model = model_out

For more complex examples and practical applications, please see the "paper" directory.

Requirements and Future Changes

Currently, the model's GPU mode requires a CUDA capable GPU with 9GB+ of VRAM. It is a high priority for us to release a VRAM-scalable version. The model can be run on CPU, but is slower. We also plan to release several quality-of-life improvements to the API to simplify usage. If you have any suggestions, or notice any problems, please open an issue.

Name		Name	Last commit message	Last commit date
Latest commit History 39 Commits
demuxHMM		demuxHMM
paper		paper
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DemuxHMM

Installation

Usage

Requirements and Future Changes

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

DemuxHMM

Installation

Usage

Requirements and Future Changes

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages