Setting up chrom-seek pipeline

OpenOmics · Jun 20, 2023 · 235f242
1 parent 4df90ae
commit 235f242

Showing 13 changed files with 135 additions and 108 deletions.
diff --git a/.github/workflows/main.yaml b/.github/workflows/main.yaml
@@ -18,7 +18,7 @@ jobs:
     - name: Dry Run with test data
       run: |
         docker run -v $PWD:/opt2 snakemake/snakemake:v5.24.2 \
-        /opt2/baseline run --input \
+        /opt2/chrom-seek run --input \
         /opt2/.tests/WT_S1.R1.fastq.gz /opt2/.tests/WT_S1.R2.fastq.gz \
         /opt2/.tests/WT_S2_R1.fastq.gz /opt2/.tests/WT_S2_R2.fastq.gz \
         /opt2/.tests/WT_S3_1.fastq.gz /opt2/.tests/WT_S3_2.fastq.gz \
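
Note on the test inputs: the dry run above lists three FastQ naming styles explicitly (`.R1`, `_R1`, and `_1`), while the usage examples elsewhere in this commit pass the glob `.tests/*.R?.fastq.gz`. As a rough check of which of the listed files that glob would pick up, here is a minimal sketch using Python's `fnmatch`, which only approximates shell globbing; the file names are copied from the command above and nothing here is part of the pipeline itself.

```python
from fnmatch import fnmatchcase

# File names copied from the dry-run command in .github/workflows/main.yaml
test_files = [
    "WT_S1.R1.fastq.gz", "WT_S1.R2.fastq.gz",
    "WT_S2_R1.fastq.gz", "WT_S2_R2.fastq.gz",
    "WT_S3_1.fastq.gz", "WT_S3_2.fastq.gz",
]

# Glob used in the usage examples (without the leading directory)
pattern = "*.R?.fastq.gz"

# Case-sensitive match: '*' matches any run of characters, '?' exactly one
matched = [name for name in test_files if fnmatchcase(name, pattern)]
print(matched)  # ['WT_S1.R1.fastq.gz', 'WT_S1.R2.fastq.gz']
```

Only the `.R1`/`.R2` pair matches, which is presumably why the workflow names the other two pairs explicitly.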

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -4,6 +4,6 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## [0.1.0] - 2022-08-22
+## [0.1.0] - 2023-06-20
 ### Added
   - Recommended [scaffold](https://github.com/OpenOmics/baseline) for building a snakemake pipeline
diff --git a/README.md b/README.md
@@ -1,33 +1,33 @@
 <div align="center">
 
-  <h1>baseline 🔬</h1>
+  <h1>chrom-seek 🔬</h1>
 
-  **_long pipeline name_**
+  **_An awesome set of epigenetic pipelines_**
 
-  [![tests](https://github.com/OpenOmics/baseline/workflows/tests/badge.svg)](https://github.com/OpenOmics/baseline/actions/workflows/main.yaml) [![docs](https://github.com/OpenOmics/baseline/workflows/docs/badge.svg)](https://github.com/OpenOmics/baseline/actions/workflows/docs.yml) [![GitHub issues](https://img.shields.io/github/issues/OpenOmics/baseline?color=brightgreen)](https://github.com/OpenOmics/baseline/issues)  [![GitHub license](https://img.shields.io/github/license/OpenOmics/baseline)](https://github.com/OpenOmics/baseline/blob/main/LICENSE) 
+  [![tests](https://github.com/OpenOmics/chrom-seek/workflows/tests/badge.svg)](https://github.com/OpenOmics/chrom-seek/actions/workflows/main.yaml) [![docs](https://github.com/OpenOmics/chrom-seek/workflows/docs/badge.svg)](https://github.com/OpenOmics/chrom-seek/actions/workflows/docs.yml) [![GitHub issues](https://img.shields.io/github/issues/OpenOmics/chrom-seek?color=brightgreen)](https://github.com/OpenOmics/chrom-seek/issues)  [![GitHub license](https://img.shields.io/github/license/OpenOmics/chrom-seek)](https://github.com/OpenOmics/chrom-seek/blob/main/LICENSE) 
 
   <i>
-    This is the home of the pipeline, baseline. Its long-term goals: to accurately ...insert goal, to infer ...insert goal, and to boldly ...insert goal like no pipeline before!
+    This is the home of the pipeline, chrom-seek. Its long-term goals: to accurately call and annotate peaks, to infer cell types in cell-free samples, and to boldly quantify differential binding or accessibility like no pipeline before!
   </i>
 </div>
 
 ## Overview
-Welcome to baseline! Before getting started, we highly recommend reading through [baseline's documentation](https://openomics.github.io/baseline/).
+Welcome to chrom-seek! Before getting started, we highly recommend reading through [chrom-seek's documentation](https://openomics.github.io/chrom-seek/).
 
-The **`./baseline`** pipeline is composed several inter-related sub commands to setup and run the pipeline across different systems. Each of the available sub commands perform different functions: 
+The **`./chrom-seek`** pipeline is composed of several inter-related sub commands to set up and run the pipeline across different systems. Each of the available sub commands performs a different function:
 
- * [<code>baseline <b>run</b></code>](https://openomics.github.io/baseline/usage/run/): Run the baseline pipeline with your input files.
- * [<code>baseline <b>unlock</b></code>](https://openomics.github.io/baseline/usage/unlock/): Unlocks a previous runs output directory.
- * [<code>baseline <b>install</b></code>](https://openomics.github.io/baseline/usage/install/): Download reference files locally.
- * [<code>baseline <b>cache</b></code>](https://openomics.github.io/baseline/usage/cache/): Cache remote resources locally, coming soon!
+ * [<code>chrom-seek <b>run</b></code>](https://openomics.github.io/chrom-seek/usage/run/): Run the chrom-seek pipeline with your input files.
+ * [<code>chrom-seek <b>unlock</b></code>](https://openomics.github.io/chrom-seek/usage/unlock/): Unlocks a previous run's output directory.
+ * [<code>chrom-seek <b>install</b></code>](https://openomics.github.io/chrom-seek/usage/install/): Download reference files locally.
+ * [<code>chrom-seek <b>cache</b></code>](https://openomics.github.io/chrom-seek/usage/cache/): Cache remote resources locally, coming soon!
 
-**baseline** is a comprehensive ...insert long description. It relies on technologies like [Singularity<sup>1</sup>](https://singularity.lbl.gov/) to maintain the highest-level of reproducibility. The pipeline consists of a series of data processing and quality-control steps orchestrated by [Snakemake<sup>2</sup>](https://snakemake.readthedocs.io/en/stable/), a flexible and scalable workflow management system, to submit jobs to a cluster.
+**chrom-seek** is an awesome set of pipelines designed specifically for cell-free ChIP-seq, bulk ChIP-seq, and bulk ATAC-seq sequencing data. It relies on technologies like [Singularity<sup>1</sup>](https://singularity.lbl.gov/) to maintain the highest level of reproducibility. The pipeline consists of a series of data processing and quality-control steps orchestrated by [Snakemake<sup>2</sup>](https://snakemake.readthedocs.io/en/stable/), a flexible and scalable workflow management system, to submit jobs to a cluster.
 
 The pipeline is compatible with data generated from Illumina short-read sequencing technologies. As input, it accepts a set of FastQ files and can be run locally on a compute instance or on-premise using a cluster. A user can define the method or mode of execution. The pipeline can submit jobs to a cluster using a job scheduler like SLURM (more coming soon!). A hybrid approach ensures the pipeline is accessible to all users.
 
-Before getting started, we highly recommend reading through the [usage](https://openomics.github.io/baseline/usage/run/) section of each available sub command.
+Before getting started, we highly recommend reading through the [usage](https://openomics.github.io/chrom-seek/usage/run/) section of each available sub command.
 
-For more information about issues or trouble-shooting a problem, please checkout our [FAQ](https://openomics.github.io/baseline/faq/questions/) prior to [opening an issue on Github](https://github.com/OpenOmics/baseline/issues).
+For more information about issues or trouble-shooting a problem, please check out our [FAQ](https://openomics.github.io/chrom-seek/faq/questions/) prior to [opening an issue on GitHub](https://github.com/OpenOmics/chrom-seek/issues).
 
 ## Dependencies
 **Requires:** `singularity>=3.5`  `snakemake>=6.0`
@@ -38,18 +38,18 @@ At the current moment, the pipeline uses a mixture of enviroment modules and doc
 Please clone this repository to your local filesystem using the following command:
 ```bash
 # Clone Repository from Github
-git clone https://github.com/OpenOmics/baseline.git
+git clone https://github.com/OpenOmics/chrom-seek.git
 # Change your working directory
-cd baseline/
+cd chrom-seek/
 # Add dependencies to $PATH
 # Biowulf users should run
 module load snakemake singularity
 # Get usage information
-./baseline -h
+./chrom-seek -h
 ```
 
 ## Contribute 
-This site is a living document, created for and by members like you. baseline is maintained by the members of OpenOmics and is improved by continous feedback! We encourage you to contribute new content and make improvements to existing content via pull request to our [GitHub repository](https://github.com/OpenOmics/baseline).
+This site is a living document, created for and by members like you. chrom-seek is maintained by the members of OpenOmics and is improved by continuous feedback! We encourage you to contribute new content and make improvements to existing content via pull request to our [GitHub repository](https://github.com/OpenOmics/chrom-seek).
 
 
 ## Cite

diff --git a/VERSION b/VERSION
@@ -1 +1 @@
-0.1.0
+0.1.0-beta
diff --git a/baseline → chrom-seek b/baseline → chrom-seek
@@ -25,9 +25,9 @@ merchantability or fitness for any particular purpose.
 Please cite the author and NIH resources like the "Biowulf Cluster" 
 in any work or product based on this material.
 USAGE:
-  $ baseline <command> [OPTIONS]
+  $ chrom-seek <command> [OPTIONS]
 EXAMPLE:
-  $ baseline run --input *.R?.fastq.gz --output output/
+  $ chrom-seek run --input *.R?.fastq.gz --output output/
 """
 
 # Python standard library
@@ -49,16 +49,17 @@ from src.utils import (
     hashed,
     permissions,
     check_cache,
-    require)
+    require
+)
 
 
 # Pipeline Metadata
 __version__ = version
-__authors__ = 'Skyler Kuhn'
-__email__ = '[email protected]'
+__authors__ = 'Skyler Kuhn, Tovah Markowitz'
+__email__ = '[email protected], [email protected]'
 __home__  =  os.path.dirname(os.path.abspath(__file__))
 _name = os.path.basename(sys.argv[0])
-_description = 'An awesome baseline pipeline'
+_description = 'An awesome set of epigenetic pipelines'
 
 
 def unlock(sub_args):
@@ -217,7 +218,7 @@ def parsed_arguments(name, description):
     """
     # Add styled name and description
     c = Colors
-    styled_name = "{0}{1}{2}baseline{3}".format(c.bold, c.bg_black, c.cyan, c.end)
+    styled_name = "{0}{1}{2}chrom-seek{3}".format(c.bold, c.bg_black, c.cyan, c.end)
     description = "{0}{1}{2}".format(c.bold, description, c.end)
 
     # Create a top-level parser
@@ -243,17 +244,39 @@ def parsed_arguments(name, description):
                 [--dry-run] [--job-name JOB_NAME] [--mode {{slurm,local}}] \\
                 [--sif-cache SIF_CACHE] [--singularity-cache SINGULARITY_CACHE] \\
                 [--silent] [--threads THREADS] [--tmp-dir TMP_DIR] \\
+                --assay {{cfChIP,ChIP,ATAC}} \\
+                --genome GENOME \\
                 --input INPUT [INPUT ...] \\
-                --output OUTPUT
+                --output OUTPUT 
 
         Optional arguments are shown in square brackets above.
 
         {3}{4}Description:{5}
-          To run the ...long pipeline name with your data raw data, please
-        provide a space seperated list of FastQ (globbing is supported) and an output 
-        directory to store results.
+          To run an available pipeline with your raw data, please provide a space 
+        separated list of FastQ files (globbing is supported), an output directory to store 
+        results, a reference genome for alignment and annotation, and an assay type to 
+        call a specific data-processing pipeline.
 
         {3}{4}Required arguments:{5}
+          --assay {{cfChIP,ChIP,ATAC}}
+                                Assay type or data-processing pipeline. This option
+                                defines which pipeline will be run. chrom-seek supports
+                                the processing of bulk ChIP-seq (ChIP), cell-free DNA 
+                                ChIP-seq (cfChIP), and ATAC-seq (ATAC) samples. Select 
+                                from one of the following data-processing pipelines:
+                                    • ChIP
+                                    • cfChIP
+                                    • ATAC
+                                  Example: --assay ChIP
+          --genome GENOME       
+                                Reference genome. This option defines the reference
+                                genome of the samples. chrom-seek comes bundled with
+                                prebuilt reference files from GENCODE for human and 
+                                mouse samples. Select one of the following options:
+                                    • hg19
+                                    • hg38
+                                    • mm10
+                                  Example: --genome hg19
           --input INPUT [INPUT ...]
                                 Input FastQ file(s) to process. The pipeline does NOT
                                 support single-end data. FastQ files for one or more  
@@ -356,7 +379,9 @@ def parsed_arguments(name, description):
           module load singularity snakemake
 
           # Step 2A.) Dry-run the pipeline
-          ./{0} run --input .tests/*.R?.fastq.gz \\
+          ./{0} run --assay ChIP \\
+                         --genome hg19 \\
+                         --input .tests/*.R?.fastq.gz \\
                          --output /data/$USER/output \\
                          --mode slurm \\
                          --dry-run
@@ -365,9 +390,11 @@ def parsed_arguments(name, description):
           # The slurm mode will submit jobs to 
           # the cluster. It is recommended running 
           # the pipeline in this mode.
-          ./{0} run --input .tests/*.R?.fastq.gz \\
+          ./{0} run --assay ChIP \\
+                         --genome hg19 \\
+                         --input .tests/*.R?.fastq.gz \\
                          --output /data/$USER/output \\
-                         --mode slurm
+                         --mode slurm 
 
         {2}{3}Version:{4}
           {1}
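
Note on the new required options: the help text above adds `--assay` (restricted to `cfChIP`, `ChIP`, or `ATAC`) and `--genome` to the `run` sub command. The diff shows only the usage and help strings, not the `add_argument` calls, so the following is a minimal sketch of how such options could be declared with `argparse`. The function name `build_parser` is hypothetical. Leaving `--genome` free-form rather than restricting it to `hg19`, `hg38`, and `mm10` is an assumption taken from the usage line, which shows `GENOME` as a plain metavar.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the usage text; not the pipeline's actual code."""
    parser = argparse.ArgumentParser(prog="chrom-seek")
    subparsers = parser.add_subparsers(dest="command", required=True)

    run = subparsers.add_parser("run", help="Run a pipeline with your input files")
    # Assay type selects the data-processing pipeline; choices come from the usage line
    run.add_argument("--assay", required=True, choices=["cfChIP", "ChIP", "ATAC"])
    # Reference genome; assumed free-form here (help text lists hg19, hg38, mm10)
    run.add_argument("--genome", required=True)
    # One or more FastQ files
    run.add_argument("--input", required=True, nargs="+")
    run.add_argument("--output", required=True)
    return parser


args = build_parser().parse_args([
    "run", "--assay", "ChIP", "--genome", "hg19",
    "--input", "WT_S1.R1.fastq.gz", "WT_S1.R2.fastq.gz",
    "--output", "output/",
])
print(args.assay, args.genome, args.input)
```

With `choices` set, `argparse` rejects an unsupported assay (for example `--assay RNA`) with a usage error before any pipeline code runs.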

diff --git a/docs/README.md b/docs/README.md
@@ -1,12 +1,12 @@
 # Build documentation  
 
-> **Please Note:** When a commit is pushed to the `docs/` directory, it triggers a [github actions workflow](https://github.com/OpenOmics/baseline/actions) to build the static-site and push it to the gh-pages branch.
+> **Please Note:** When a commit is pushed to the `docs/` directory, it triggers a [github actions workflow](https://github.com/OpenOmics/chrom-seek/actions) to build the static-site and push it to the gh-pages branch.
 
 ### Installation
 ```bash
 # Clone the Repository
-git clone https://github.com/OpenOmics/baseline.git
-cd baseline/
+git clone https://github.com/OpenOmics/chrom-seek.git
+cd chrom-seek/
 # Create a virtual environment
 python3 -m venv .venv
 # Activate the virtual environment

diff --git a/docs/faq/questions.md b/docs/faq/questions.md
@@ -1,4 +1,4 @@
 # Frequently Asked Questions
 
-This page is still under construction. If you need immediate help, please [open an issue](https://github.com/OpenOmics/baseline/issues) on Github!
+This page is still under construction. If you need immediate help, please [open an issue](https://github.com/OpenOmics/chrom-seek/issues) on Github!