MutClust: Mutual Rank-Based Clustering and GO Enrichment Analysis

MutClust is a Python package designed for RNA-seq gene coexpression analyses. It performs mutual rank (MR)-based clustering of coexpressed genes and identifies enriched Gene Ontology (GO) terms for the resulting clusters. The package is optimized for speed and can run a whole-genome coexpression analysis in minutes.

Features

Mutual Rank Analysis: Calculates MR from Pearson correlation coefficients to identify coexpressed genes.
Leiden Clustering: Groups genes into clusters based on mutual rank and exponential decay weights.
Gene Annotations: Merges cluster members with gene annotations, if provided.
GO Enrichment Analysis: Identifies enriched GO terms for each cluster using GOATOOLS.
Highly Configurable: Supports adjustable thresholds, resolution parameters, and multi-threading for performance optimization.
Calculate correlation matrix and mutual rank from RNA-seq data
Filter and apply exponential decay to mutual rank values
Perform Leiden clustering to identify coexpressed gene clusters
Calculate eigen-genes for each cluster (first principal component)
Perform GO enrichment analysis on gene clusters
Annotate clusters with gene information
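
The mutual rank step listed above can be illustrated with a small pure-Python sketch. It assumes that each gene's partners are ranked by decreasing signed Pearson correlation (rank 1 is the strongest partner) and that MR is the geometric mean of the two reciprocal ranks. The function names `pearson` and `mutual_rank` are made up for this example and are not part of the MutClust API, and the quadratic loop is only meant for toy inputs.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def mutual_rank(expr):
    """expr: dict gene -> list of expression values. Returns {(g1, g2): MR}."""
    genes = sorted(expr)
    # rank[a][b] = position of b when all other genes are sorted by
    # decreasing PCC with a (1 = most strongly correlated partner).
    rank = {}
    for a in genes:
        others = [g for g in genes if g != a]
        others.sort(key=lambda g: pearson(expr[a], expr[g]), reverse=True)
        rank[a] = {g: i + 1 for i, g in enumerate(others)}
    # MR is symmetric, so each unordered pair is stored once.
    return {
        (a, b): math.sqrt(rank[a][b] * rank[b][a])
        for i, a in enumerate(genes)
        for b in genes[i + 1:]
    }

# Toy expression values (illustrative only).
expr = {
    "GeneA": [1.0, 2.0, 3.0, 4.0],
    "GeneB": [2.0, 4.0, 6.0, 8.5],
    "GeneC": [4.0, 3.0, 2.0, 1.0],
    "GeneD": [1.0, 3.0, 2.0, 4.0],
}
mr = mutual_rank(expr)
for pair, value in sorted(mr.items(), key=lambda kv: kv[1]):
    print(pair, round(value, 3))
```

On this toy input GeneA and GeneB are each other's top-ranked partner, so their MR is 1, the lowest (strongest) possible value.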

Installation

You can install MutClust directly from PyPI:

pip install mutclust

Note: Because of a known dependency issue with PyNetCor, MutClust cannot currently be installed from PyPI on macOS. It installs properly on Linux.

Alternatively, you can clone the repository and install it locally:

git clone https://github.com/eporetsky/mutclust.git
cd mutclust
pip install .

Conda Environment (Recommended)

You can optionally use a conda environment for easier dependency management. This is especially useful for installing clusterone (required for some workflows) from bioconda:

conda env create -f environment.yml
conda activate mutclust

This will install all core dependencies, bioconda::clusterone, and set up MutClust in editable mode. Because the install is editable, changes you make to the source code take effect in the CLI immediately, without reinstalling.

Docker Installation

For users who prefer containerized deployment, MutClust is available as a Docker container:

# Build the container
docker build -t mutclust .

# Run MutClust with your data
docker run -v /path/to/your/data:/data mutclust mutclust mr -i /data/your_expression.tsv -o /data/results

The container uses Ubuntu 20.04 and includes all necessary dependencies. Mount your data directory to /data inside the container to access your files.

Usage

MutClust now provides a Click-based command-line interface (CLI) with three main subcommands:

mutclust mr: Calculate mutual rank from an expression dataset
mutclust cls: Run clustering analysis on a given MR table
mutclust enr: Run GO enrichment analysis on clusters

Basic Usage

# Calculate mutual rank from expression data
mutclust mr -i input.tsv -o output_prefix

# Run clustering analysis on a mutual rank table
mutclust cls -i output_prefix.mrs.tsv -o output_prefix

# Run GO enrichment analysis on clusters
mutclust enr -c output_prefix.clusters.tsv -go go-basic.obo -gf tair.gaf -o output_prefix

Subcommand Arguments

`mutclust mr`

| Argument | Short | Description | Default |
|---|---|---|---|
| `--input` | `-i` | Path to the RNA-seq dataset (TSV format). | Required |
| `--output` | `-o` | Output prefix for the results. | Required |
| `--mr-threshold` | `-m` | Mutual rank threshold for filtering. | `100` |
| `--e-value` | `-e` | Exponential decay constant. | `10` |
| `--threads` | `-t` | Number of threads for correlation calculation. | `4` |
| `--save-intermediate` | | Save intermediate files (PCC, MR, filtered pairs). | Optional |
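
To make the roles of `--mr-threshold` and `--e-value` more concrete, here is a minimal sketch of filtering and weighting MR values. It assumes the decay form `exp(-(MR - 1) / e)`, which is common in the mutual rank literature, and an inclusive threshold; neither detail is confirmed by this README, and `decay_weights` is a hypothetical helper, not a MutClust function.

```python
import math

def decay_weights(mr_pairs, mr_threshold=100, e_value=10):
    """Keep pairs with MR <= threshold and attach exp(-(MR - 1) / e)."""
    rows = []
    for (g1, g2), mr in mr_pairs.items():
        if mr <= mr_threshold:  # assumption: the threshold is inclusive
            ed = math.exp(-(mr - 1) / e_value)
            rows.append((g1, g2, mr, ed))
    return rows

# Toy MR values; the third pair exceeds the default threshold and is dropped.
pairs = {
    ("GeneA", "GeneB"): 10.2,
    ("GeneB", "GeneC"): 6,
    ("GeneA", "GeneC"): 250,
}
rows = decay_weights(pairs)
for g1, g2, mr, ed in rows:
    print(f"{g1}\t{g2}\t{mr}\t{ed:.3f}")
```

Under this assumed formula a smaller decay constant makes weights fall off faster with increasing MR, so only the tightest pairs keep a substantial edge weight.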

`mutclust cls`

| Argument | Short | Description | Default |
|---|---|---|---|
| `--input` | `-i` | Path to Mutual Rank (MR) table (TSV format). | Required |
| `--output` | `-o` | Output prefix for the results. | Required |
| `--annotations` | `-a` | Path to the gene annotation file. | Optional |
| `--resolution` | `-r` | Resolution parameter for Leiden clustering. | `0.1` |
| `--eigengene/--no-eigengene` | | Calculate eigen-genes for clusters. | `True` |
| `--expression` | | Path to RNA-seq dataset for eigen-gene calculation. | Required if `--eigengene` |

`mutclust enr`

| Argument | Short | Description | Default |
|---|---|---|---|
| `--clusters` | `-c` | Path to clusters file (TSV format). | Required |
| `--go-obo` | `-go` | Path to the Gene Ontology (GO) OBO file. | Required |
| `--go-gaf` | `-gf` | Path to the GO annotation file (GAF format). | Required |
| `--output` | `-o` | Output prefix for the results. | Required |
| `--expression` | | Path to RNA-seq dataset for background gene set. | Optional |
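
The idea behind the enrichment step, and why the background gene set matters, can be sketched with a one-sided hypergeometric test. This is a simplified stand-in written for illustration: MutClust delegates enrichment to GOATOOLS, which also handles the ontology structure and multiple-testing correction that this sketch omits. The names `hypergeom_sf` and `enrich` and the term labels are hypothetical.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K carry the term."""
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

def enrich(cluster, background, term2genes):
    """Return [(term, p_value, fold_change)] sorted by p-value."""
    bg = set(background)
    members = set(cluster) & bg          # ignore genes outside the background
    N, n = len(bg), len(members)
    if n == 0:
        return []
    results = []
    for term, genes in term2genes.items():
        annotated = set(genes) & bg
        K, k = len(annotated), len(annotated & members)
        if k == 0:
            continue
        # Fold change: fraction annotated in the cluster vs. in the background.
        fc = (k / n) / (K / N)
        results.append((term, hypergeom_sf(k, N, K, n), fc))
    return sorted(results, key=lambda r: r[1])

# Toy data: 20 background genes and two made-up term labels.
background = [f"g{i}" for i in range(20)]
term2genes = {
    "TERM_A": [f"g{i}" for i in range(5)],
    "TERM_B": [f"g{i}" for i in range(10, 20)],
}
cluster = ["g0", "g1", "g2", "g10"]
results = enrich(cluster, background, term2genes)
for term, p, fc in results:
    print(f"{term}\tp={p:.4f}\tFC={fc:.2f}")
```

Because both the p-value and the fold change depend on the background size `N`, restricting the background to genes present in the expression dataset (as `--expression` allows) changes the results compared with using every annotated gene.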

Example Workflow

# Step 1: Calculate mutual rank
tab="data/AtCol-0.cpm.tsv"
mutclust mr -i $tab -o results/atcol0

# Step 2: Cluster genes
mutclust cls -i results/atcol0.mrs.tsv -o results/atcol0 --annotations annotations/AtCol-0.annot.tsv --expression $tab

# Step 3: GO enrichment
mutclust enr -c results/atcol0.clusters.tsv -go go-basic.obo -gf tair.gaf -o results/atcol0 --expression $tab

Input File Formats

RNA-seq Dataset

Format: Tab-separated values (TSV).
Columns: Gene IDs as row indices and samples as columns.
Example:

geneID    Sample1    Sample2    Sample3
GeneA     1.23       2.34       3.45
GeneB     4.56       5.67       6.78
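
As a quick way to check that a file matches this layout before running the pipeline, the following sketch reads a tab-separated expression table with the standard library. `read_expression` is a hypothetical helper and the strictness of the checks (duplicate IDs, ragged rows) is an assumption, not MutClust's own validation.

```python
import csv
import io

def read_expression(handle):
    """Parse a TSV with gene IDs in the first column and samples as columns."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    expr = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene, values = row[0], row[1:]
        if len(values) != len(samples):
            raise ValueError(f"{gene}: expected {len(samples)} values, got {len(values)}")
        if gene in expr:
            raise ValueError(f"duplicate gene ID: {gene}")
        expr[gene] = [float(v) for v in values]
    return samples, expr

# In-memory stand-in for a real file; for a file on disk use
# open(path, newline="") and pass the handle instead.
text = (
    "geneID\tSample1\tSample2\tSample3\n"
    "GeneA\t1.23\t2.34\t3.45\n"
    "GeneB\t4.56\t5.67\t6.78\n"
)
samples, expr = read_expression(io.StringIO(text))
print(samples, expr)
```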

Gene Annotation File

Format: Tab-separated values (TSV).
Columns: geneID and additional annotation fields.
Example:

geneID    description
GeneA     Photosynthesis-related protein
GeneB     Transcription factor
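
Conceptually, annotating clusters is a join on geneID. The sketch below shows one way that join could behave, assuming genes without an annotation are kept and given a placeholder; whether MutClust keeps or drops such genes is not stated here, and `annotate_clusters` is a hypothetical name.

```python
def annotate_clusters(clusters, annotations, missing="NA"):
    """Left-join (clusterID, geneID) rows with a geneID -> description map."""
    return [
        (cluster_id, gene_id, annotations.get(gene_id, missing))
        for cluster_id, gene_id in clusters
    ]

clusters = [("c1", "GeneA"), ("c1", "GeneB"), ("c2", "GeneC")]
annotations = {
    "GeneA": "Photosynthesis-related protein",
    "GeneB": "Transcription factor",
}
annotated = annotate_clusters(clusters, annotations)
for row in annotated:
    print("\t".join(row))
```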

GO OBO File

Description: The Gene Ontology (GO) OBO file contains the ontology structure.
Source: Download from Gene Ontology.

GO GAF File

Description: The Gene Annotation File (GAF) maps genes to GO terms.
Source: Download from Gene Ontology.

Output Files

Filtered MR and exponential decay values (<output_prefix>.mrs.tsv):
- Lists coexpressed gene pairs with their MR and exponential decay (ED) values.
- Columns: Gene1, Gene2, MR, ED.
Example:
```
Gene1    Gene2    MR    ED
GeneA    GeneB    10.2  0.39
GeneB    GeneC    6     0.6
```
Clustered Genes (<output_prefix>.clusters.tsv):
- Lists the genes in each cluster.
- Columns: clusterID, geneID, plus annotation columns if an annotation file is provided.
Example:
```
clusterID    geneID    Annotations
c1           GeneA     ...
c1           GeneB     ...
```
GO Enrichment Results (<output_prefix>_go_enrichment_results.tsv):
- Contains enriched GO terms for each cluster.
- Columns: cluster, type, size, term, p-val, FC, desc.
Example:
```
cluster    type    size    term       p-val       FC    desc
c1         BP      25      GO:0008150 0.00123     3.5   Biological Process
```

Eigen-gene values (<output_prefix>.eigen.tsv):
- Contains the eigen-gene value of each cluster in every sample.
- Columns: geneID (holding the cluster ID) and one column per sample.
Example:

geneID    Sample1    Sample2    Sample3
c1        0.707107   0.707107   0.707107
c2        0.577350   0.577350   0.577350
c3        0.500000   0.500000   0.500000
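
For readers unfamiliar with eigen-genes, the sketch below approximates the first principal component of a cluster's expression profiles using plain power iteration. It is an illustration under stated assumptions: each gene is standardized across samples first, the sign is oriented along the mean profile, and no third-party library is used, although the package itself lists scikit-learn as a dependency. The names `standardize` and `eigengene` are hypothetical.

```python
import math

def standardize(values):
    """Scale one gene's profile to zero mean and unit variance."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd if sd else 0.0 for v in values]

def eigengene(profiles, n_iter=200):
    """Approximate the first principal component across samples.

    profiles: list of per-gene expression lists (genes x samples).
    Returns one unit-length value per sample.
    """
    X = [standardize(p) for p in profiles]
    n_samples = len(X[0])
    # Start from the mean standardized profile so the sign of the result
    # follows the majority of genes (assumption: PCA signs are arbitrary).
    v = [sum(row[j] for row in X) / len(X) for j in range(n_samples)]
    for _ in range(n_iter):
        # One power-iteration step: v <- X^T (X v), then renormalize.
        scores = [sum(a * b for a, b in zip(row, v)) for row in X]
        w = [sum(s * row[j] for s, row in zip(scores, X)) for j in range(n_samples)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0:  # degenerate cluster, e.g. all genes constant
            break
        v = [c / norm for c in w]
    return v

# Toy cluster of three genes measured in four samples.
cluster = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.0, 3.0, 2.0, 4.0],
]
eg = eigengene(cluster)
print([round(x, 3) for x in eg])
```

The result is a single profile per cluster that rises and falls with the dominant expression trend of its member genes, which is why one row per cluster appears in the eigen-gene table.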

Dependencies

The following Python libraries are required and will be installed automatically:

numpy
pandas
pynetcor
python-igraph
goatools
scikit-learn
click

Other dependencies (such as clusterone) can be installed via conda/bioconda as needed.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contributing

Contributions, suggestions and issues are welcome!
