🔬 Single-cell RNA-seq Analysis Pipeline

📌 Version History

v2.0 – Modular Python pipeline + legacy R (current main branch)
v1.0 – Original Python + R pipeline (non-modular)

Version 2 – Modular Python + Legacy R

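This project provides a reproducible single-cell RNA-seq (scRNA-seq) analysis pipeline, implemented in both Python and R, that runs from data download through quality control, normalization, clustering, differential expression, and pathway enrichment.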

Version 1 (feature_qc branch): Original pipeline in Python and R, without modular design.
Version 2 (main branch): Modular, robust Python pipeline with R scripts unchanged from Version 1.

Datasets processed in this pipeline

CROP-seq data (CRISPRi + 10x Genomics) from A549 lung cancer cells – GSE149383
Retina datasets:
- SRA559821 (from PanglaoDB)
- GSE137537 – from "Single-cell Transcriptomic Atlas of the Human Retina Identifies Cell Types Associated with Age-Related Macular Degeneration"

📊 Datasets

1. CROP-seq A549 Perturbation

Study: Replogle et al. (2020). Direct capture of CRISPR guides enables scalable, multiplexed, and multi-omic Perturb-seq. Cell
GEO Accession: GSE149383
Cell line: A549 (lung adenocarcinoma)
Technology: CRISPRi + 10x Genomics
Platform: CROP-seq
Objective: Identify transcriptional changes in response to gene knockdowns

2. Retina scRNA-seq Datasets

SRA559821 (PanglaoDB) – Reference retina dataset for cell type annotation
GSE137537 – Human Retina Transcriptomic Atlas (Age-related Macular Degeneration)
Objective: Identify and compare retina cell populations and disease-associated transcriptional signatures

🧰 Tech Stack

Python (Version 2 – Modular)

Python 3.13.3
Scanpy for scRNA-seq analysis
gseapy for pathway enrichment
pandas, numpy, matplotlib, seaborn, anndata
python-igraph, leidenalg
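
Before running the Python steps, it can help to confirm that the packages listed above are importable in the active environment. The following is a minimal sketch using only the standard library; the function name missing_modules is hypothetical and not part of this repository, and the only non-obvious mapping is that python-igraph is imported as igraph.

```python
from importlib.util import find_spec

# Import names for the Python packages listed above.
# python-igraph is installed under that name but imported as "igraph".
REQUIRED_MODULES = [
    "scanpy", "gseapy", "pandas", "numpy", "matplotlib",
    "seaborn", "anndata", "igraph", "leidenalg",
]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment.

    Hypothetical helper for illustration; it only checks that a module can be
    located, not that its version matches requirements.txt.
    """
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

An empty result only means the modules were found; pinned versions still come from requirements.txt.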

R (Unchanged from Version 1)

R 4.5.0
Seurat, SeuratObject
dplyr, ggplot2, patchwork, readr, tibble, Matrix
fgsea, msigdbr, pheatmap, knitr

Note: The R scripts are carried over from Version 1. They are fully functional but not yet modularized; future updates will align the R workflow with the Python structure.

🚀 How to Run the Pipelines

The Python and R pipelines can be run independently. Output folders and filenames are standardized.

🔷 Python Pipeline

python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

Then run each step:

./python_scripts/01_download_data_cropseq.sh                                              # Download CROP-seq data
python python_scripts/01_download_GEOretina.py                                            # Download retina GSE137537 data
python python_scripts/01_convert_panglao_to_10x.py                                        # Convert PanglaoDB data to 10x format
python python_scripts/02_preprocessing.py cropseq python_scripts/config.yaml              # Load, filter, and merge datasets
python python_scripts/03_qc.py cropseq python_scripts/config.yaml                         # Perform quality control
python python_scripts/04_normalization_dimred.py cropseq python_scripts/config.yaml       # Normalize and run PCA/UMAP
python python_scripts/05_clustering.py cropseq python_scripts/config.yaml                 # Clustering (Leiden)
python python_scripts/06_DE.py cropseq python_scripts/config.yaml                         # Differential expression
python python_scripts/07_GSEA.py cropseq python_scripts/config.yaml                       # Pathway enrichment (GO/KEGG)
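
Steps 02 to 07 above share the same call pattern (script, dataset name, config path), so they can be chained from a small driver. This is a sketch under assumptions, not a script that ships with the repository: the names STEPS, build_commands, and run_pipeline are hypothetical, and the only dataset name used is cropseq, as in the commands above. The download and conversion steps (01) are left out because they take different arguments.

```python
import subprocess
import sys

# Script names copied from the step list above (steps 02 to 07).
STEPS = [
    "02_preprocessing.py",
    "03_qc.py",
    "04_normalization_dimred.py",
    "05_clustering.py",
    "06_DE.py",
    "07_GSEA.py",
]


def build_commands(dataset, config="python_scripts/config.yaml",
                   scripts_dir="python_scripts"):
    """Build one argument list per step: interpreter, script, dataset, config."""
    return [
        [sys.executable, f"{scripts_dir}/{step}", dataset, config]
        for step in STEPS
    ]


def run_pipeline(dataset, config="python_scripts/config.yaml"):
    """Run the steps in order and stop at the first one that fails."""
    for cmd in build_commands(dataset, config):
        print("Running:", " ".join(cmd))
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run on incomplete outputs.
        subprocess.run(cmd, check=True)
```

Calling run_pipeline("cropseq") from the repository root would then be equivalent to running the six commands one after another.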

🟣 R Pipeline

source("install_packages.R")

Run the two shell scripts from a terminal, then source the R scripts from an R session (RStudio or VS Code):

./R_scripts/00_setup.sh                           # Set up directories
./R_scripts/01_download_data.sh                   # Download data
source("R_scripts/02_preprocessing.R")            # Merge datasets with metadata
source("R_scripts/03_qc.R")                       # Perform quality control
source("R_scripts/04_normalization_dimred.R")     # Normalize and run PCA/UMAP
source("R_scripts/05_clustering.R")               # Clustering
source("R_scripts/06_DE.R")                       # DE analysis using Seurat
source("R_scripts/07_GSEA.R")                     # Enrichment analysis using fgsea

📂 Folder Structure

scRNAseq_pipeline/
├── README.md              # This file
├── .gitignore             # Ignored files/folders
├── requirements.txt       # Python packages
├── install_packages.R     # R packages
│
├── figures/               # Output visualizations
├── results/               # Output data files
├── data/                  # Input data files
├── R_scripts/             # R scripts for each pipeline step
└── python_scripts/        # Python scripts for each pipeline step
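
The layout above can be checked, and the data and output folders created, with a few lines of Python. This is a hedged sketch: the helper names missing_dirs and ensure_output_dirs are hypothetical, and the README does not say whether the Python scripts create these folders themselves (on the R side, R_scripts/00_setup.sh is described as setting up directories).

```python
from pathlib import Path

# Folder names taken from the tree above.
OUTPUT_DIRS = ["figures", "results", "data"]
SCRIPT_DIRS = ["R_scripts", "python_scripts"]


def missing_dirs(root="."):
    """Return the expected top-level folders that do not exist under root."""
    root = Path(root)
    return [name for name in OUTPUT_DIRS + SCRIPT_DIRS
            if not (root / name).is_dir()]


def ensure_output_dirs(root="."):
    """Create any missing data/output folders and return the names created.

    Script folders are deliberately not created: if they are missing,
    the checkout itself is incomplete.
    """
    root = Path(root)
    created = []
    for name in OUTPUT_DIRS:
        path = root / name
        if not path.is_dir():
            path.mkdir(parents=True)
            created.append(name)
    return created
```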

🧪 Key Results

UMAP visualization of perturbation and retina cell states
Identification of differentially expressed genes (DEGs) across multiple datasets
Functional enrichment (GO/KEGG) of DEGs
Modular, maintainable design in Python (Version 2)
Legacy R scripts kept for reproducibility (Version 1)

📘 License

MIT License – feel free to use, adapt, and share.

XuejianXiong/scRNAseq_pipeline
