Setting up chrom-seek pipeline
skchronicles committed Jun 20, 2023
1 parent 4df90ae commit 235f242
Showing 13 changed files with 135 additions and 108 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/main.yaml
@@ -18,7 +18,7 @@ jobs:
- name: Dry Run with test data
run: |
docker run -v $PWD:/opt2 snakemake/snakemake:v5.24.2 \
/opt2/baseline run --input \
/opt2/chrom-seek run --input \
/opt2/.tests/WT_S1.R1.fastq.gz /opt2/.tests/WT_S1.R2.fastq.gz \
/opt2/.tests/WT_S2_R1.fastq.gz /opt2/.tests/WT_S2_R2.fastq.gz \
/opt2/.tests/WT_S3_1.fastq.gz /opt2/.tests/WT_S3_2.fastq.gz \
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -4,6 +4,6 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.1.0] - 2022-08-22
## [0.1.0] - 2023-06-20
### Added
- Recommended [scaffold](https://github.com/OpenOmics/baseline) for building a snakemake pipeline
34 changes: 17 additions & 17 deletions README.md
@@ -1,33 +1,33 @@
<div align="center">

<h1>baseline 🔬</h1>
<h1>chrom-seek 🔬</h1>

**_long pipeline name_**
**_An awesome set of epigenetic pipelines_**

[![tests](https://github.com/OpenOmics/baseline/workflows/tests/badge.svg)](https://github.com/OpenOmics/baseline/actions/workflows/main.yaml) [![docs](https://github.com/OpenOmics/baseline/workflows/docs/badge.svg)](https://github.com/OpenOmics/baseline/actions/workflows/docs.yml) [![GitHub issues](https://img.shields.io/github/issues/OpenOmics/baseline?color=brightgreen)](https://github.com/OpenOmics/baseline/issues) [![GitHub license](https://img.shields.io/github/license/OpenOmics/baseline)](https://github.com/OpenOmics/baseline/blob/main/LICENSE)
[![tests](https://github.com/OpenOmics/chrom-seek/workflows/tests/badge.svg)](https://github.com/OpenOmics/chrom-seek/actions/workflows/main.yaml) [![docs](https://github.com/OpenOmics/chrom-seek/workflows/docs/badge.svg)](https://github.com/OpenOmics/chrom-seek/actions/workflows/docs.yml) [![GitHub issues](https://img.shields.io/github/issues/OpenOmics/chrom-seek?color=brightgreen)](https://github.com/OpenOmics/chrom-seek/issues) [![GitHub license](https://img.shields.io/github/license/OpenOmics/chrom-seek)](https://github.com/OpenOmics/chrom-seek/blob/main/LICENSE)

<i>
This is the home of the pipeline, baseline. Its long-term goals: to accurately ...insert goal, to infer ...insert goal, and to boldly ...insert goal like no pipeline before!
This is the home of the pipeline, chrom-seek. Its long-term goals: to accurately call and annotate peaks, to infer cell types in cell-free samples, and to boldly quantify differential binding or accessibility like no pipeline before!
</i>
</div>

## Overview
Welcome to baseline! Before getting started, we highly recommend reading through [baseline's documentation](https://openomics.github.io/baseline/).
Welcome to chrom-seek! Before getting started, we highly recommend reading through [chrom-seek's documentation](https://openomics.github.io/chrom-seek/).

The **`./baseline`** pipeline is composed several inter-related sub commands to setup and run the pipeline across different systems. Each of the available sub commands perform different functions:
The **`./chrom-seek`** pipeline is composed of several inter-related sub commands to set up and run the pipeline across different systems. Each of the available sub commands performs a different function:

* [<code>baseline <b>run</b></code>](https://openomics.github.io/baseline/usage/run/): Run the baseline pipeline with your input files.
* [<code>baseline <b>unlock</b></code>](https://openomics.github.io/baseline/usage/unlock/): Unlocks a previous runs output directory.
* [<code>baseline <b>install</b></code>](https://openomics.github.io/baseline/usage/install/): Download reference files locally.
* [<code>baseline <b>cache</b></code>](https://openomics.github.io/baseline/usage/cache/): Cache remote resources locally, coming soon!
* [<code>chrom-seek <b>run</b></code>](https://openomics.github.io/chrom-seek/usage/run/): Run the chrom-seek pipeline with your input files.
* [<code>chrom-seek <b>unlock</b></code>](https://openomics.github.io/chrom-seek/usage/unlock/): Unlock a previous run's output directory.
* [<code>chrom-seek <b>install</b></code>](https://openomics.github.io/chrom-seek/usage/install/): Download reference files locally.
* [<code>chrom-seek <b>cache</b></code>](https://openomics.github.io/chrom-seek/usage/cache/): Cache remote resources locally, coming soon!

**baseline** is a comprehensive ...insert long description. It relies on technologies like [Singularity<sup>1</sup>](https://singularity.lbl.gov/) to maintain the highest-level of reproducibility. The pipeline consists of a series of data processing and quality-control steps orchestrated by [Snakemake<sup>2</sup>](https://snakemake.readthedocs.io/en/stable/), a flexible and scalable workflow management system, to submit jobs to a cluster.
**chrom-seek** is an awesome set of pipelines designed specifically for cell-free ChIP-seq, bulk ChIP-seq, and bulk ATAC-seq sequencing data. It relies on technologies like [Singularity<sup>1</sup>](https://singularity.lbl.gov/) to maintain the highest level of reproducibility. The pipeline consists of a series of data processing and quality-control steps orchestrated by [Snakemake<sup>2</sup>](https://snakemake.readthedocs.io/en/stable/), a flexible and scalable workflow management system, to submit jobs to a cluster.

The pipeline is compatible with data generated from Illumina short-read sequencing technologies. As input, it accepts a set of FastQ files and can be run locally on a compute instance or on-premise using a cluster. A user can define the method or mode of execution. The pipeline can submit jobs to a cluster using a job scheduler like SLURM (more coming soon!). A hybrid approach ensures the pipeline is accessible to all users.

Before getting started, we highly recommend reading through the [usage](https://openomics.github.io/baseline/usage/run/) section of each available sub command.
Before getting started, we highly recommend reading through the [usage](https://openomics.github.io/chrom-seek/usage/run/) section of each available sub command.

For more information about issues or trouble-shooting a problem, please checkout our [FAQ](https://openomics.github.io/baseline/faq/questions/) prior to [opening an issue on Github](https://github.com/OpenOmics/baseline/issues).
For more information about issues or trouble-shooting a problem, please check out our [FAQ](https://openomics.github.io/chrom-seek/faq/questions/) prior to [opening an issue on GitHub](https://github.com/OpenOmics/chrom-seek/issues).

## Dependencies
**Requires:** `singularity>=3.5` `snakemake>=6.0`
@@ -38,18 +38,18 @@ At the current moment, the pipeline uses a mixture of enviroment modules and doc
Please clone this repository to your local filesystem using the following command:
```bash
# Clone Repository from Github
git clone https://github.com/OpenOmics/baseline.git
git clone https://github.com/OpenOmics/chrom-seek.git
# Change your working directory
cd baseline/
cd chrom-seek/
# Add dependencies to $PATH
# Biowulf users should run
module load snakemake singularity
# Get usage information
./baseline -h
./chrom-seek -h
```

## Contribute
This site is a living document, created for and by members like you. baseline is maintained by the members of OpenOmics and is improved by continous feedback! We encourage you to contribute new content and make improvements to existing content via pull request to our [GitHub repository](https://github.com/OpenOmics/baseline).
This site is a living document, created for and by members like you. chrom-seek is maintained by the members of OpenOmics and is improved by continuous feedback! We encourage you to contribute new content and make improvements to existing content via pull request to our [GitHub repository](https://github.com/OpenOmics/chrom-seek).


## Cite
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
0.1.0
0.1.0-beta
55 changes: 41 additions & 14 deletions baseline → chrom-seek
@@ -25,9 +25,9 @@ merchantability or fitness for any particular purpose.
Please cite the author and NIH resources like the "Biowulf Cluster"
in any work or product based on this material.
USAGE:
$ baseline <command> [OPTIONS]
$ chrom-seek <command> [OPTIONS]
EXAMPLE:
$ baseline run --input *.R?.fastq.gz --output output/
$ chrom-seek run --input *.R?.fastq.gz --output output/
"""

# Python standard library
@@ -49,16 +49,17 @@ from src.utils import (
hashed,
permissions,
check_cache,
require)
require
)


# Pipeline Metadata
__version__ = version
__authors__ = 'Skyler Kuhn'
__email__ = '[email protected]'
__authors__ = 'Skyler Kuhn, Tovah Markowitz'
__email__ = '[email protected], [email protected]'
__home__ = os.path.dirname(os.path.abspath(__file__))
_name = os.path.basename(sys.argv[0])
_description = 'An awesome baseline pipeline'
_description = 'An awesome set of epigenetic pipelines'


def unlock(sub_args):
@@ -217,7 +218,7 @@ def parsed_arguments(name, description):
"""
# Add styled name and description
c = Colors
styled_name = "{0}{1}{2}baseline{3}".format(c.bold, c.bg_black, c.cyan, c.end)
styled_name = "{0}{1}{2}chrom-seek{3}".format(c.bold, c.bg_black, c.cyan, c.end)
description = "{0}{1}{2}".format(c.bold, description, c.end)

# Create a top-level parser
@@ -243,17 +244,39 @@ def parsed_arguments(name, description):
[--dry-run] [--job-name JOB_NAME] [--mode {{slurm,local}}] \\
[--sif-cache SIF_CACHE] [--singularity-cache SINGULARITY_CACHE] \\
[--silent] [--threads THREADS] [--tmp-dir TMP_DIR] \\
--assay {{cfChIP,ChIP,ATAC}} \\
--genome GENOME \\
--input INPUT [INPUT ...] \\
--output OUTPUT
--output OUTPUT
Optional arguments are shown in square brackets above.
{3}{4}Description:{5}
To run the ...long pipeline name with your data raw data, please
provide a space seperated list of FastQ (globbing is supported) and an output
directory to store results.
To run an available pipeline with your raw data, please provide a space
separated list of FastQ files (globbing is supported), an output directory to
store results, a reference genome for alignment and annotation, and an assay
type to call a specific data-processing pipeline.
{3}{4}Required arguments:{5}
--assay {{cfChIP,ChIP,ATAC}}
Assay type or data-processing pipeline. This option
defines which pipeline will be run. chrom-seek supports
the processing of bulk ChIP-seq (ChIP), cell-free DNA
ChIP-seq (cfChIP), and ATAC-seq (ATAC) samples. Select
from one of the following data-processing pipelines:
• ChIP
• cfChIP
• ATAC
Example: --assay ChIP
--genome GENOME
Reference genome. This option defines the reference
genome of the samples. chrom-seek comes bundled with
prebuilt reference files from GENCODE for human and
mouse samples. Select one of the following options:
• hg19
• hg38
• mm10
Example: --genome hg19
--input INPUT [INPUT ...]
Input FastQ file(s) to process. The pipeline does NOT
support single-end data. FastQ files for one or more
@@ -356,7 +379,9 @@ def parsed_arguments(name, description):
module load singularity snakemake
# Step 2A.) Dry-run the pipeline
./{0} run --input .tests/*.R?.fastq.gz \\
./{0} run --assay ChIP \\
--genome hg19 \\
--input .tests/*.R?.fastq.gz \\
--output /data/$USER/output \\
--mode slurm \\
--dry-run
@@ -365,9 +390,11 @@ def parsed_arguments(name, description):
# The slurm mode will submit jobs to
# the cluster. Running the pipeline in
# this mode is recommended.
./{0} run --input .tests/*.R?.fastq.gz \\
./{0} run --assay ChIP \\
--genome hg19 \\
--input .tests/*.R?.fastq.gz \\
--output /data/$USER/output \\
--mode slurm
--mode slurm
{2}{3}Version:{4}
{1}
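
The usage text in this hunk now lists `--assay` and `--genome` as required arguments, but the parser code that enforces them is not shown in the diff. A minimal sketch of how such options could be declared with `argparse` follows. The helper name `build_run_parser` is hypothetical, and restricting `--genome` to the three listed builds is an assumption based on the help text, not on the actual chrom-seek source.

```python
import argparse


def build_run_parser():
    """Hypothetical stand-in for the 'run' sub command parser.

    Mirrors only the options named in the usage text of this commit;
    the real chrom-seek parser may be structured differently.
    """
    parser = argparse.ArgumentParser(prog="chrom-seek run")
    # Assay type selects which data-processing pipeline is run
    parser.add_argument(
        "--assay", required=True, choices=["cfChIP", "ChIP", "ATAC"]
    )
    # Assumption: only the genomes listed in the help text are accepted
    parser.add_argument(
        "--genome", required=True, choices=["hg19", "hg38", "mm10"]
    )
    # One or more FastQ files (the shell expands any globs beforehand)
    parser.add_argument("--input", required=True, nargs="+")
    parser.add_argument("--output", required=True)
    # Optional arguments shown in square brackets in the usage text
    parser.add_argument("--mode", choices=["slurm", "local"])
    parser.add_argument("--dry-run", action="store_true")
    return parser


if __name__ == "__main__":
    example = build_run_parser().parse_args(
        [
            "--assay", "ChIP",
            "--genome", "hg19",
            "--input", "WT_S1.R1.fastq.gz", "WT_S1.R2.fastq.gz",
            "--output", "output/",
            "--dry-run",
        ]
    )
    print(example.assay, example.genome, example.input, example.dry_run)
```

Declaring the allowed values through `choices` lets `argparse` reject an unsupported assay or genome before any Snakemake job is submitted.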
6 changes: 3 additions & 3 deletions docs/README.md
@@ -1,12 +1,12 @@
# Build documentation

> **Please Note:** When a commit is pushed to the `docs/` directory, it triggers a [github actions workflow](https://github.com/OpenOmics/baseline/actions) to build the static-site and push it to the gh-pages branch.
> **Please Note:** When a commit is pushed to the `docs/` directory, it triggers a [github actions workflow](https://github.com/OpenOmics/chrom-seek/actions) to build the static-site and push it to the gh-pages branch.
### Installation
```bash
# Clone the Repository
git clone https://github.com/OpenOmics/baseline.git
cd baseline/
git clone https://github.com/OpenOmics/chrom-seek.git
cd chrom-seek/
# Create a virtual environment
python3 -m venv .venv
# Activate the virtual environment
2 changes: 1 addition & 1 deletion docs/faq/questions.md
@@ -1,4 +1,4 @@
# Frequently Asked Questions

This page is still under construction. If you need immediate help, please [open an issue](https://github.com/OpenOmics/baseline/issues) on Github!
This page is still under construction. If you need immediate help, please [open an issue](https://github.com/OpenOmics/chrom-seek/issues) on Github!
