- workflow was implemented and last executed successfully with R v4.4.1 with Bioc 3.20, and Python v3.11.3 with Snakemake v7.26.0
- R version and library have to be specified in the `config.yaml` file (e.g., `R: "R_LIBS_USER=/path/to/library /path/to/R/executable"`)
- `.Rprofile` is used for handling and printing command line arguments
- `logs/` captures `.Rout` files from `R CMD BATCH` executions
- `data/` contains any synthetic and real data
- intermediate results are generated in `outs/`
- visualizations are generated in `plts/`
- `<x>` denotes a wildcard, namely: `t`ype, `s`tate, `b`atch, `sim`ulation, `sco`re, `sel`ection, `sta`tistic, `das` = differential state analysis method
- `00-get_sim/dat.R`
  - out: for simulations, `data/sim/00-raw/t<t>,s<s>,b<b>.rds`; for real data, `data/dat/00-raw/<did>.rds` (`<did>` = dataset identifier)
  - synthetic data generation (`splatter::splatPopSimulate()`)
  - hereafter, `t<t>,s<s>,b<b>` = `<sim>`
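The naming scheme for simulated data can be illustrated with a short sketch. The workflow's own scripts are written in R; the Python below is only an illustration of how `t<t>,s<s>,b<b>` combine into `<sim>` and into a raw-data path. The wildcard values and the helper names `sim_id` and `raw_paths` are hypothetical and not taken from the workflow.

```python
from itertools import product

# Hypothetical wildcard values; the real grid is defined by the workflow.
T = [0, 50, 100]
S = [0, 50, 100]
B = [0]

# Path pattern from the list above; <t>, <s>, <b> become format fields.
RAW = "data/sim/00-raw/t{t},s{s},b{b}.rds"


def sim_id(t, s, b):
    """Compose the <sim> identifier from its three wildcards."""
    return f"t{t},s{s},b{b}"


def raw_paths(ts, ss, bs):
    """List one raw-data path per (t, s, b) combination."""
    return [RAW.format(t=t, s=s, b=b) for t, s, b in product(ts, ss, bs)]


paths = raw_paths(T, S, B)
print(len(paths))  # 9 combinations (3 x 3 x 1)
print(paths[0])    # data/sim/00-raw/t0,s0,b0.rds
```

Downstream paths reuse the same identifier, so a single `<sim>` string is enough to locate every intermediate file for one simulation.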
- `01-pro_sim/dat.R`
  - in: `data/sim|dat/00-raw/<sim|dat>.rds`
  - out: `data/sim|dat/01-fil/<sim|dat>.rds`
  - minimal filtering keeping genes with count > 1 in ≥ 10 cells, and cells with ≥ 10 detected genes
  - log-library size normalization (`scater::logNormCounts()`)
  - highly variable gene (HVG) selection (`scran::modelGeneVar()`)
  - principal component analysis (PCA) using HVGs (`scater::runPCA()`)
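The filtering rule above can be sketched on a plain genes-by-cells count matrix. This is a Python illustration only, not the R code in `01-pro_sim/dat.R`. Two details are assumptions, since the README does not state them: genes are filtered before cells, and a gene counts as "detected" in a cell when its count is above zero. The function name `filter_counts` is hypothetical.

```python
def filter_counts(counts, min_count=1, min_cells=10, min_genes=10):
    """Filter a genes-by-cells count matrix given as a list of rows.

    Returns the indices of kept genes and kept cells.
    """
    n_cells = len(counts[0]) if counts else 0

    # Keep genes with count > min_count in at least min_cells cells.
    genes = [
        g for g, row in enumerate(counts)
        if sum(x > min_count for x in row) >= min_cells
    ]

    # Assumption: "detected" means count > 0, evaluated on kept genes only.
    cells = [
        c for c in range(n_cells)
        if sum(counts[g][c] > 0 for g in genes) >= min_genes
    ]
    return genes, cells


# Toy matrix with lowered thresholds so the effect is visible.
counts = [
    [2, 3, 0, 5],  # gene 0: count > 1 in three cells
    [1, 1, 1, 1],  # gene 1: never above 1
    [4, 0, 0, 2],  # gene 2: count > 1 in two cells
]
genes, cells = filter_counts(counts, min_count=1, min_cells=2, min_genes=2)
print(genes)  # [0, 2]
print(cells)  # [0, 3]
```

With the defaults (`min_cells=10`, `min_genes=10`) the function applies the thresholds listed above.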
- `02-sco.R`
  - in: `data/sim|dat/01-fil/<sim|dat>.rds`
  - out: `outs/sim|dat/sco-<sim|dat>,<sco>.rds`
  - source method from one of `02-sco-<sco>.R`
  - compute gene-level metrics to quantify type-/state-specificity
- `03-sel.R`
  - in: `outs/sim|dat/sco-<sim|dat>,<sco>.rds`
  - out: `outs/sim|dat/sel-<sim|dat>,<sco>.rds`
  - source method from one of `03-sel-<sel>.R`
  - select genes for reprocessing
- `04-rep.R`
  - in: `outs/sim|dat/sco-<sim|dat>,<sco>.rds`
  - out: `data/sim|dat/02-rep/<sim|dat>,<sel>.rds`
  - data reprocessing (PCA, clustering, reduction)
- `05-sta.R`
  - in: `data/sim|dat/02-rep/<sim|dat>,<sel>.rds`
  - out: `outs/sim|dat/sta-<sim|dat>,<sel>,<sta>.rds`
  - source method from one of `05-sta-<sta>.R`
  - compute evaluation statistics
- `06-das.R`
  - in: `data/sim|dat/02-rep/<sim|dat>,<sel>.rds`
  - out: `outs/sim|dat/das-<sim|dat>,<sel>,<das>.rds`
  - source method from one of `06-das-<das>.R`
  - perform differential state analysis (DSA)
- `07-eva.R`
  - standalone script applied to experimental data only
  - collects results across all feature selection strategies, selects the top 10, 20, ..., 90% of ranked features, and recomputes evaluation statistics for the accordingly reprocessed data (PCA, clustering)
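The percentage-based selection in `07-eva.R` can be sketched as follows. This is a Python illustration, not the script's R code. The feature names, the helper `top_percent`, and the choice to round the cutoff up are assumptions; the README only states that the top 10 to 90% of ranked features are selected.

```python
def top_percent(ranked, percent):
    """Return the leading share of an already ranked feature list."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    # Assumption: round up, using integer arithmetic to avoid float error.
    n = -(-len(ranked) * percent // 100)
    return ranked[:n]


# Hypothetical ranking, best feature first.
ranked = [f"gene{i}" for i in range(1, 21)]

# One subset per cutoff: 10%, 20%, ..., 90%.
subsets = {p: top_percent(ranked, p) for p in range(10, 100, 10)}

print(subsets[10])       # ['gene1', 'gene2']
print(len(subsets[90]))  # 18
```

Each subset would then be passed through reprocessing (PCA, clustering) before the evaluation statistics are recomputed.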
- `08-plt_<out>-<plt>.R`
  - in: `outs/sim/<out>.rds`
  - out: `plt/sim/<out>-<plt>.pdf`
  - e.g., `08-plt_das-F1.pdf` collects all DSA results (`outs/sim/das-<sim>,<sel>,<das>.rds`) and plots F1 scores
  - visualization of synthetic data analysis results
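For readers unfamiliar with the F1 score mentioned above, here is a minimal Python sketch of the standard definition (the harmonic mean of precision and recall). How the workflow defines true and called positives is not stated in this README. The sketch assumes a set of genes simulated as differential and a set of genes a DSA method calls differential. The gene names and the function `f1_score` are hypothetical.

```python
def f1_score(truth, called):
    """F1 = harmonic mean of precision and recall over two gene sets."""
    tp = len(truth & called)   # called and truly differential
    fp = len(called - truth)   # called but not differential
    fn = len(truth - called)   # differential but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: 2 hits, 1 false call, 2 misses.
truth = {"g1", "g2", "g3", "g4"}
called = {"g2", "g3", "g5"}
print(round(f1_score(truth, called), 3))  # 0.571
```

Here precision is 2/3 and recall is 1/2, which gives F1 = 4/7.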
- `08-qlt_<out>-<qlt>.R`
  - in: `outs/dat/<out>.rds`
  - out: `plt/dat/<out>-<qlt>.pdf`
  - visualization of experimental data analysis results
- `09-aes.R`
  - sourced to fix the order of feature scores (`SCO`), ground truth-based (`DES`) and other selections (`SEL`), and differential state analysis methods (`DAS`) across plots
- `10-session_info.R`
  - lists and may be used to install all R packages used (across CRAN, GitHub, and Bioconductor), and writes the corresponding `sessionInfo()` output to `session_info.txt`