MutClust is a Python package for RNA-seq gene coexpression analysis. It performs mutual rank (MR)-based clustering of coexpressed genes and identifies enriched Gene Ontology (GO) terms in the resulting clusters. The package is optimized for speed and can run a whole-genome coexpression analysis in minutes.
Key features:
- Mutual Rank Analysis: Calculates MR from Pearson correlation coefficients (PCC) to identify coexpressed genes.
- Leiden Clustering: Groups genes into clusters based on mutual rank and exponential decay weights.
- Gene Annotations: Merges cluster members with gene annotations, if provided.
- GO Enrichment Analysis: Identifies enriched GO terms for each cluster using GOATOOLS.
- Highly Configurable: Supports adjustable thresholds, resolution parameters, and multi-threading for faster correlation calculation.

Analysis steps:
- Calculate the correlation matrix and mutual rank from RNA-seq data
- Filter mutual rank values and apply exponential decay to them
- Perform Leiden clustering to identify coexpressed gene clusters
- Calculate eigen-genes (first principal component) for each cluster
- Perform GO enrichment analysis on gene clusters
- Annotate clusters with gene information
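The mutual rank step can be illustrated with a small, dependency-free sketch. It assumes the common MR definition, the geometric mean of two ranks, MR(A, B) = sqrt(rank of B among A's correlates × rank of A among B's correlates), with rank 1 for the most positively correlated partner. This is not MutClust's implementation (which relies on PyNetCor for speed), and the tie handling here is a simplifying assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mutual_rank(expr):
    """Return {(gene1, gene2): MR} for every ordered pair of distinct genes.

    expr maps gene ID -> list of expression values (one per sample).
    Rank 1 = the partner with the highest PCC; ties keep input order
    (a simplifying assumption).
    """
    genes = list(expr)
    pcc = {(a, b): pearson(expr[a], expr[b])
           for a in genes for b in genes if a != b}
    rank = {}
    for a in genes:
        # Sort a's partners from most to least positively correlated.
        partners = sorted((b for b in genes if b != a), key=lambda b: -pcc[(a, b)])
        for r, b in enumerate(partners, start=1):
            rank[(a, b)] = r
    # MR is the geometric mean of A's rank of B and B's rank of A.
    return {(a, b): math.sqrt(rank[(a, b)] * rank[(b, a)]) for (a, b) in rank}


expr = {
    "GeneA": [1.0, 2.0, 3.0, 4.0],
    "GeneB": [2.0, 4.1, 6.0, 8.2],
    "GeneC": [4.0, 3.0, 2.5, 1.0],
}
mr = mutual_rank(expr)
print(mr[("GeneA", "GeneB")])  # 1.0: each gene is the other's top partner
```

Because MR combines both directions, a pair scores well only if each gene ranks the other highly, which is why MR is symmetric and less sensitive to genes that correlate moderately with everything.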
You can install MutClust directly from PyPI:
```bash
pip install mutclust
```
Note: Because of a known dependency issue with PyNetCor, MutClust currently cannot be installed from PyPI on macOS; it installs correctly on Linux.
Alternatively, you can clone the repository and install it locally:
```bash
git clone https://github.com/eporetsky/mutclust.git
cd mutclust
pip install .
```
You can optionally use a conda environment for easier dependency management. This is especially useful for installing `clusterone` (required for some workflows) from bioconda:
```bash
conda env create -f environment.yml
conda activate mutclust
```
This installs all core dependencies and `bioconda::clusterone`, and sets up MutClust in editable mode, so changes to the source code take effect without reinstalling and the CLI is available immediately.
For users who prefer containerized deployment, MutClust can be built and run as a Docker container:
```bash
# Build the container image (from the repository directory)
docker build -t mutclust .
# Run MutClust with your data
docker run -v /path/to/your/data:/data mutclust mutclust mr -i /data/your_expression.tsv -o /data/results
```
The container is based on Ubuntu 20.04 and includes all necessary dependencies. Mount your data directory to `/data` inside the container to access your files.
MutClust now provides a Click-based command-line interface (CLI) with three main subcommands:
- `mutclust mr`: Calculate mutual rank from an expression dataset
- `mutclust cls`: Run clustering analysis on a given MR table
- `mutclust enr`: Run GO enrichment analysis on clusters
```bash
# Calculate mutual rank from expression data
mutclust mr -i input.tsv -o output_prefix
# Run clustering analysis on a mutual rank table
mutclust cls -i output_prefix.mrs.tsv -o output_prefix
# Run GO enrichment analysis on clusters
mutclust enr -c output_prefix.clusters.tsv -go go-basic.obo -gf tair.gaf -o output_prefix
```
`mutclust mr` options:

Argument | Short | Description | Default |
---|---|---|---|
`--input` | `-i` | Path to the RNA-seq dataset (TSV format). | Required |
`--output` | `-o` | Output prefix for the results. | Required |
`--mr-threshold` | `-m` | Mutual rank threshold for filtering. | 100 |
`--e-value` | `-e` | Exponential decay constant. | 10 |
`--threads` | `-t` | Number of threads for correlation calculation. | 4 |
`--save-intermediate` | | Save intermediate files (PCC, MR, filtered pairs). | Optional |
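The `--mr-threshold` and `--e-value` options can be illustrated with a short sketch. It assumes that pairs with MR at or below the threshold are kept and that the decay weight is ED = exp(-(MR - 1) / E), a transform used in earlier MR-based coexpression studies; MutClust's exact formula and threshold comparison may differ, and `exp_decay` and `filter_pairs` are hypothetical helpers.

```python
import math


def exp_decay(mr, e_value=10.0):
    """Map an MR value to an edge weight in (0, 1]; MR = 1 gives weight 1.0.

    Assumed formula: exp(-(MR - 1) / E). A smaller E makes weights fall off faster.
    """
    return math.exp(-(mr - 1.0) / e_value)


def filter_pairs(mr_pairs, mr_threshold=100, e_value=10.0):
    """Keep unordered pairs with MR <= mr_threshold and attach a decay weight.

    mr_pairs maps (gene1, gene2) -> MR; both orderings may be present,
    so each unordered pair is emitted only once.
    """
    rows, seen = [], set()
    for (g1, g2), mr in mr_pairs.items():
        key = frozenset((g1, g2))
        if key in seen or mr > mr_threshold:
            continue
        seen.add(key)
        rows.append((g1, g2, mr, exp_decay(mr, e_value)))
    return rows


pairs = {("GeneA", "GeneB"): 1.0, ("GeneB", "GeneA"): 1.0, ("GeneA", "GeneC"): 150.0}
for row in filter_pairs(pairs):
    print(row)  # ('GeneA', 'GeneB', 1.0, 1.0)
```

The threshold bounds the size of the graph handed to clustering, while the decay constant controls how quickly edge weights shrink as MR grows.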
`mutclust cls` options:

Argument | Short | Description | Default |
---|---|---|---|
`--input` | `-i` | Path to the mutual rank (MR) table (TSV format). | Required |
`--output` | `-o` | Output prefix for the results. | Required |
`--annotations` | `-a` | Path to the gene annotation file. | Optional |
`--resolution` | `-r` | Resolution parameter for Leiden clustering. | 0.1 |
`--eigengene/--no-eigengene` | | Calculate eigen-genes for clusters. | True |
`--expression` | | Path to the RNA-seq dataset for eigen-gene calculation. | Required if `--eigengene` |
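The eigen-gene is described as the first principal component of a cluster. The dependency-free sketch below assumes each gene is z-scored across samples and finds PC1 by power iteration on the gene-by-gene covariance matrix; MutClust's own implementation (it depends on scikit-learn) may standardize or orient the component differently, and the sign of a principal component is arbitrary in any implementation.

```python
import math


def zscore(values):
    """Center a profile and scale it to unit sample standard deviation.

    Assumes the profile is not constant (otherwise sd would be zero).
    """
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def eigengene(profiles, iterations=200):
    """PC1 scores (one value per sample) for one cluster's expression profiles.

    profiles: genes x samples. Genes are the variables, samples the observations.
    """
    X = [zscore(p) for p in profiles]
    g, n = len(X), len(X[0])
    # Gene-by-gene covariance of standardized data (i.e. the correlation matrix).
    C = [[sum(X[i][k] * X[j][k] for k in range(n)) / (n - 1) for j in range(g)]
         for i in range(g)]
    # Power iteration converges to the leading eigenvector (the PC1 loadings).
    v = [1.0 + 0.01 * i for i in range(g)]  # generic, mostly positive start vector
    for _ in range(iterations):
        w = [sum(C[i][j] * v[j] for j in range(g)) for i in range(g)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate start vector; keep the previous estimate
        v = [x / norm for x in w]
    # Project every sample onto the loadings to get the eigen-gene profile.
    return [sum(v[i] * X[i][k] for i in range(g)) for k in range(n)]


eg = eigengene([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]])
print([round(x, 4) for x in eg])  # [-1.4142, 0.0, 1.4142]
```

For a tightly coexpressed cluster, the eigen-gene tracks the shared expression pattern, so it can stand in for the whole cluster in downstream comparisons.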
`mutclust enr` options:

Argument | Short | Description | Default |
---|---|---|---|
`--clusters` | `-c` | Path to the clusters file (TSV format). | Required |
`--go-obo` | `-go` | Path to the Gene Ontology (GO) OBO file. | Required |
`--go-gaf` | `-gf` | Path to the GO annotation file (GAF format). | Required |
`--output` | `-o` | Output prefix for the results. | Required |
`--expression` | | Path to the RNA-seq dataset used as the background gene set. | Optional |
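Each cluster/GO-term row in the enrichment output carries a p-value and a fold change (FC). The sketch below shows one common way to compute both: a one-sided hypergeometric test for over-representation and fold enrichment relative to the background gene set. It is an illustrative assumption; GOATOOLS's actual test and multiple-testing correction may differ, and `enrichment` is a hypothetical helper.

```python
from math import comb


def enrichment(k, n, K, N):
    """One-sided hypergeometric test for over-representation of one GO term.

    k: cluster genes annotated with the term    n: cluster size
    K: background genes annotated with the term N: background size
    Returns (p_value, fold_change) with p = P(X >= k) and
    fold_change = (k / n) / (K / N).
    """
    total = comb(N, n)
    # Sum the hypergeometric probabilities of seeing k or more annotated genes.
    p = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)) / total
    fold_change = (k / n) / (K / N)
    return p, fold_change


# Hypothetical numbers: 5 of 20 cluster genes vs 100 of 20,000 background genes.
p, fc = enrichment(k=5, n=20, K=100, N=20000)
print(f"p = {p:.3g}, FC = {fc:.1f}")
```

Supplying `--expression` restricts the background to genes present in the dataset, which changes `K` and `N` and therefore both the p-value and the fold change.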
Example workflow:
```bash
# Step 1: Calculate mutual rank
tab="data/AtCol-0.cpm.tsv"
mutclust mr -i "$tab" -o results/atcol0
# Step 2: Cluster genes
mutclust cls -i results/atcol0.mrs.tsv -o results/atcol0 --annotations annotations/AtCol-0.annot.tsv --expression "$tab"
# Step 3: GO enrichment
mutclust enr -c results/atcol0.clusters.tsv -go go-basic.obo -gf tair.gaf -o results/atcol0 --expression "$tab"
```
RNA-seq expression dataset:
- Format: Tab-separated values (TSV).
- Columns: gene IDs in the first column (one row per gene) and one column per sample.
- Example:
```
geneID	Sample1	Sample2	Sample3
GeneA	1.23	2.34	3.45
GeneB	4.56	5.67	6.78
```
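A minimal standard-library reader for the expression format above can help sanity-check an input file before running `mutclust mr`. It assumes a header row whose first field names the gene ID column and purely numeric values in the remaining columns; `load_expression` is a hypothetical helper, not part of MutClust.

```python
import csv
import io


def load_expression(handle):
    """Parse a tab-separated expression matrix into (samples, {gene: values})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # the first column holds gene IDs
    expr = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene, values = row[0], [float(v) for v in row[1:]]
        if len(values) != len(samples):
            raise ValueError(f"{gene}: expected {len(samples)} values, got {len(values)}")
        if gene in expr:
            raise ValueError(f"duplicate gene ID: {gene}")
        expr[gene] = values
    return samples, expr


example = "geneID\tSample1\tSample2\tSample3\nGeneA\t1.23\t2.34\t3.45\nGeneB\t4.56\t5.67\t6.78\n"
samples, expr = load_expression(io.StringIO(example))
print(samples)        # ['Sample1', 'Sample2', 'Sample3']
print(expr["GeneB"])  # [4.56, 5.67, 6.78]
```

For a real file, pass a handle opened with `open(path, newline="")`; a `ValueError` points to ragged rows or duplicated gene IDs.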
Gene annotation file (optional):
- Format: Tab-separated values (TSV).
- Columns: `geneID` and additional annotation fields.
- Example:
```
geneID	description
GeneA	Photosynthesis-related protein
GeneB	Transcription factor
```
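Merging cluster members with an annotation table amounts to a left join on `geneID`. The sketch below assumes both tables are already loaded as lists of dicts keyed by column name and that genes without an annotation get empty fields; `annotate_clusters` is a hypothetical name, not MutClust's API.

```python
def annotate_clusters(cluster_rows, annotation_rows):
    """Left-join cluster memberships with annotations on the geneID column.

    cluster_rows: [{"clusterID": ..., "geneID": ...}, ...]
    annotation_rows: [{"geneID": ..., <other annotation fields>}, ...]
    Genes missing from the annotation table get empty strings.
    """
    by_gene = {row["geneID"]: row for row in annotation_rows}
    fields = sorted({k for row in annotation_rows for k in row if k != "geneID"})
    merged = []
    for row in cluster_rows:
        annot = by_gene.get(row["geneID"], {})
        merged.append({**row, **{f: annot.get(f, "") for f in fields}})
    return merged


clusters = [{"clusterID": "c1", "geneID": "GeneA"}, {"clusterID": "c1", "geneID": "GeneC"}]
annotations = [
    {"geneID": "GeneA", "description": "Photosynthesis-related protein"},
    {"geneID": "GeneB", "description": "Transcription factor"},
]
for row in annotate_clusters(clusters, annotations):
    print(row)
```

A left join keeps every cluster member even when the annotation file is incomplete, which is usually preferable to silently dropping unannotated genes.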
GO OBO file:
- Description: The Gene Ontology (GO) OBO file contains the ontology structure.
- Source: Download from Gene Ontology.

GO annotation file:
- Description: The GO Annotation File (GAF) maps genes to GO terms.
- Source: Download from Gene Ontology.
Output files:

- Filtered MR and ED values (`<output_prefix>.mrs.tsv`):
  - Coexpressed gene pairs that pass the MR threshold, with their MR and exponential decay (ED) values.
  - Columns: `Gene1`, `Gene2`, `MR`, `ED`.
  - Example:
```
Gene1	Gene2	MR	ED
GeneA	GeneB	10.2	0.39
GeneB	GeneC	6	0.6
```
- Clustered genes (`<output_prefix>.clusters.tsv`):
  - Lists the genes in each cluster.
  - Columns: `clusterID`, `geneID`, followed by annotation columns if an annotation file was provided.
  - Example:
```
clusterID	geneID	Annotations
c1	GeneA	...
c1	GeneB	...
```
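Downstream analyses often need the clusters file as a mapping from cluster ID to member genes. This standard-library sketch assumes the tab-separated layout described above (a header with `clusterID` and `geneID`, possibly followed by annotation columns, which it ignores); `read_clusters` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict


def read_clusters(handle):
    """Group geneIDs by clusterID from a tab-separated clusters table."""
    members = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        members[row["clusterID"]].append(row["geneID"])
    return dict(members)


example = "clusterID\tgeneID\tdescription\nc1\tGeneA\t...\nc1\tGeneB\t...\nc2\tGeneC\t...\n"
clusters = read_clusters(io.StringIO(example))
print({cid: len(genes) for cid, genes in clusters.items()})  # {'c1': 2, 'c2': 1}
```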
- GO enrichment results (`<output_prefix>_go_enrichment_results.tsv`):
  - Contains enriched GO terms for each cluster.
  - Columns: `cluster`, `type`, `size`, `term`, `p-val`, `FC`, `desc`.
  - Example:
```
cluster	type	size	term	p-val	FC	desc
c1	BP	25	GO:0008150	0.00123	3.5	Biological Process
```
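To pull the strongest terms per cluster out of the enrichment table, a small standard-library filter is enough. It assumes a tab-separated file with a header using the column names listed above; the 0.05 p-value cutoff, the FC cutoff and the helper name `significant_terms` are illustrative choices, not MutClust defaults.

```python
import csv
import io


def significant_terms(handle, max_p=0.05, min_fc=1.0):
    """Return {cluster: [(term, p, FC), ...]} sorted by ascending p-value."""
    hits = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        p, fc = float(row["p-val"]), float(row["FC"])
        if p <= max_p and fc >= min_fc:
            hits.setdefault(row["cluster"], []).append((row["term"], p, fc))
    for terms in hits.values():
        terms.sort(key=lambda t: t[1])  # most significant first
    return hits


example = (
    "cluster\ttype\tsize\tterm\tp-val\tFC\tdesc\n"
    "c1\tBP\t25\tGO:0008150\t0.00123\t3.5\tBiological Process\n"
    "c1\tBP\t25\tGO:XXXXXXX\t0.2\t1.2\tplaceholder term\n"  # made-up second row
)
print(significant_terms(io.StringIO(example)))  # {'c1': [('GO:0008150', 0.00123, 3.5)]}
```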
- Eigen-gene values (`<output_prefix>.eigen.tsv`):
  - Eigen-gene values for each cluster, one row per cluster.
  - Columns: `geneID` (holding the cluster ID) and one column per sample.
  - Example:
```
geneID	Sample1	Sample2	Sample3
c1	0.707107	0.707107	0.707107
c2	0.577350	0.577350	0.577350
c3	0.500000	0.500000	0.500000
```
The following Python libraries are required and will be installed automatically:
- `numpy`
- `pandas`
- `pynetcor`
- `python-igraph`
- `goatools`
- `scikit-learn`
- `click`

Other dependencies (such as `clusterone`) can be installed via conda/bioconda as needed.
This project is licensed under the MIT License. See the LICENSE file for details.
Contributions, suggestions and issues are welcome!