GitHub - zjohnson001/RNAseq

# RNA Sequencing Environment
The RNAseq_env folder serves as the working environment for all RNAseq processing and storage: raw reads, reference sequences, and metadata.

## 3 core commands:
gen_refseqs.sh
initialize.sh
run_pipeline.sh

## gen_refseqs.sh <folder_name> <organism_name> <database>
Required inputs
folder name: the name of the reference sequence folder to create; the same name is passed to initialize.sh when the directory structure for a new RNAseq project is initialized
organism name: used as the prefix for the indexed .fasta files
database: the source of the sequences; a date is appended so that all metadata is retained, which improves reproducibility
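
As a rough illustration of the inputs above, the argument check and the dated database label could be sketched as follows. The function names `check_refseq_args` and `dated_db_label` are hypothetical, and the ISO date format (YYYY-MM-DD) is an assumption; the README does not state which format the script uses.

```shell
# Hypothetical helper: verify that exactly three arguments were supplied,
# matching the documented inputs of gen_refseqs.sh.
check_refseq_args() {
    if [ "$#" -ne 3 ]; then
        echo "usage: gen_refseqs.sh <folder_name> <organism_name> <database>" >&2
        return 1
    fi
}

# Hypothetical helper: append today's date to the database name.
# Assumption: an ISO 8601 date (YYYY-MM-DD); the real script may differ.
dated_db_label() {
    local database="$1"
    printf '%s_%s\n' "$database" "$(date +%Y-%m-%d)"
}
```

Calling `dated_db_label ensembl` would print something of the form `ensembl_YYYY-MM-DD`, which could then serve as the prefix of the mapped-sequences subdirectory.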

### The user must add the reference sequences:
A gene transfer format file (.gtf/.gff) is added to the _gtf_file directory
One or more .fasta files containing the reference sequences that the RNAseq reads will be aligned against are added to the __fasta_map_sequences directory

### A directory named after folder_name is created in __metadata/ref_seqs/
The reference sequences are moved into the new directory, with the .fasta sequences placed in the subdirectory database_date_mapped_sequences.
The .fasta sequences are concatenated and indexed, and the index files are saved in a subdirectory labelled bowtie2_organism.
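
The concatenate-and-index step described above might look roughly like the sketch below. This is not the repository's actual script: `concat_fasta` and `build_index` are hypothetical names, and using the organism name as the Bowtie2 index prefix inside a `bowtie2_<organism>` directory is an assumption drawn from the description. `bowtie2-build <reference_in> <bt2_base>` is the standard Bowtie2 indexing command.

```shell
# Concatenate every .fasta file in a directory into one reference file.
# Arguments: <fasta_dir> <combined_fasta>
concat_fasta() {
    local fasta_dir="$1" combined="$2" f
    : > "$combined"                          # start from an empty output file
    for f in "$fasta_dir"/*.fasta; do
        if [ ! -e "$f" ]; then continue; fi  # the glob matched nothing
        if [ "$f" = "$combined" ]; then continue; fi
        cat "$f" >> "$combined"
        # If a file lacks a trailing newline, add one so that the next
        # FASTA header starts on its own line.
        if [ -n "$(tail -c 1 "$f")" ]; then
            echo >> "$combined"
        fi
    done
}

# Build a Bowtie2 index from the combined reference.
# Arguments: <combined_fasta> <parent_dir> <organism>
# Assumption: index files go to <parent_dir>/bowtie2_<organism>/<organism>.*
build_index() {
    local combined="$1" parent_dir="$2" organism="$3"
    local index_dir="$parent_dir/bowtie2_$organism"
    mkdir -p "$index_dir"
    bowtie2-build "$combined" "$index_dir/$organism"
}
```

Concatenating first keeps a single index per reference set, so reads are mapped against all supplied sequences in one Bowtie2 run.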

## initialize.sh <project_name> <ref_seqs>
Required inputs
project name: The name of the project directory that will be created
ref_seqs: The name of a reference sequence folder generated by the gen_refseqs.sh

### Additional inputs:
fastq files are read from the _unprocessed_fastq_files directory

A new directory is created in the RNAseq_env directory under the project name provided.
It provides the appropriate directory structure to run the pipeline, and all necessary reference sequences are transferred into it.
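
A minimal sketch of this initialization step is shown below, under several assumptions: the function name `init_project` is hypothetical, the three `Output/` subdirectories are taken from the pipeline description and assumed to sit directly under the project root, and the destination name `ref_seqs` for the copied reference folder is invented for illustration. The real script may create a different layout.

```shell
# Create a project directory and copy a reference sequence folder into it.
# Arguments: <RNAseq_env_dir> <project_name> <ref_seqs>
init_project() {
    local env_dir="$1" project="$2" ref_seqs="$3"
    local root="$env_dir/$project"
    local refs="$env_dir/__metadata/ref_seqs/$ref_seqs"

    # Refuse to overwrite an existing project.
    if [ -e "$root" ]; then
        echo "error: $root already exists" >&2
        return 1
    fi
    # The reference folder must have been generated beforehand.
    if [ ! -d "$refs" ]; then
        echo "error: reference folder $refs not found" >&2
        return 1
    fi

    # Assumed output layout, based on the paths named in the pipeline steps.
    mkdir -p "$root/Output/rawQC_report" \
             "$root/Output/gene_counts" \
             "$root/Output/finalQC_report"

    # Hypothetical destination name for the copied references.
    cp -r "$refs" "$root/ref_seqs"
}
```

Failing early when the project already exists or the reference folder is missing avoids leaving a half-initialized project behind.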

## run_pipeline.sh <SE/PE> <feature>
Required inputs
SE/PE: specifies whether the RNAseq reads were generated by single-end (SE) or paired-end (PE) sequencing
feature: the genomic feature to be counted; it must be a feature type present in the GTF file (e.g. transcript, gene, CDS)
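
The two inputs above lend themselves to a quick sanity check before the pipeline starts. The sketch below is illustrative only: `validate_layout` and `feature_in_gtf` are hypothetical names, and the check relies on the standard GTF layout in which the third tab-separated column holds the feature type.

```shell
# Accept only the two documented read layouts.
validate_layout() {
    case "$1" in
        SE|PE) return 0 ;;
        *)
            echo "error: first argument must be SE or PE" >&2
            return 1
            ;;
    esac
}

# Succeed if at least one GTF record has the requested feature type
# in column 3; comment lines starting with '#' are ignored.
# Arguments: <feature> <gtf_file>
feature_in_gtf() {
    local feature="$1" gtf="$2"
    awk -F '\t' -v feat="$feature" '
        /^#/        { next }
        $3 == feat  { found = 1; exit }
        END         { exit found ? 0 : 1 }
    ' "$gtf"
}
```

Checking the feature up front avoids running the whole mapping stage only to obtain all-zero counts from a feature type that the annotation does not contain.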

### Pipeline implementation
1. Generate a pre-processing QC report using FastQC and MultiQC; the report is output to Output/rawQC_report
1.1 Check for adapter content and trim adapters with Trimmomatic if adapter content is present
2. Map reads to the reference sequences using Bowtie2
2.1 Convert SAM files to sorted BAM files with Samtools
2.2 Generate mapping-specific QC using Qualimap
3. Assign and count reads using HTSeq; read counts are output to Output/gene_counts
4. Generate a MultiQC report for all processing steps to assess the quality of the reads; the report is output to Output/finalQC_report
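
For orientation, the steps above could be chained for a single single-end sample roughly as sketched here. This is not the repository's run_pipeline.sh: the function names, the `Output/mapped` directory, the `_counts.txt` suffix, and the `DRY_RUN` switch are all assumptions made for illustration, and the conditional Trimmomatic step (1.1) is left out because the README does not say how adapter content is detected. The tool invocations use commonly documented options (`bowtie2 -x/-U/-S`, `samtools sort -o`, `qualimap bamqc -bam/-outdir`, `htseq-count -f/-r/-t`).

```shell
# Print a command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Same as run, but redirect standard output to the file given first.
run_to() {
    local out="$1"
    shift
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$* > $out"
    else
        "$@" > "$out"
    fi
}

# Arguments: <sample.fastq> <bowtie2_index_prefix> <annotation.gtf> <feature>
process_se_sample() {
    local fastq="$1" index="$2" gtf="$3" feature="$4" name
    name="$(basename "$fastq" .fastq)"

    run mkdir -p Output/rawQC_report Output/mapped Output/gene_counts Output/finalQC_report

    # 1. Raw-read QC with FastQC, summarized by MultiQC
    run fastqc -o Output/rawQC_report "$fastq"
    run multiqc Output/rawQC_report -o Output/rawQC_report

    # 2. Map single-end reads against the Bowtie2 index
    run bowtie2 -x "$index" -U "$fastq" -S "Output/mapped/$name.sam"

    # 2.1 Convert SAM to a coordinate-sorted BAM
    run samtools sort -o "Output/mapped/$name.bam" "Output/mapped/$name.sam"

    # 2.2 Mapping-specific QC
    run qualimap bamqc -bam "Output/mapped/$name.bam" -outdir "Output/mapped/${name}_qualimap"

    # 3. Count reads per feature; htseq-count writes counts to stdout
    run_to "Output/gene_counts/${name}_counts.txt" \
        htseq-count -f bam -r pos -t "$feature" "Output/mapped/$name.bam" "$gtf"

    # 4. Final MultiQC report across all processing steps
    run multiqc Output -o Output/finalQC_report
}
```

Setting `DRY_RUN=1` before calling `process_se_sample` prints the commands instead of running them, which is a convenient way to review the plan before committing to a long mapping job.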