zjohnson001/RNAseq
# RNA Sequencing Environment

The RNAseq_env folder acts as an environment for all RNAseq processing and storage: raw reads, reference sequences, and metadata.

## 3 core commands

- gen_refseqs.sh
- initialize.sh
- run_pipeline.sh

## gen_refseqs.sh <folder_name> <organism_name> <database>

Required inputs:

- folder name: the name given to the reference sequence folder; the same name is specified when the directory structure for a new RNAseq project is initialized
- organism name: used as a prefix to call the indexed .fasta files
- database: where the sequences came from; a date is added so that all metadata is retained, improving reproducibility

The user must add the reference sequences before running the script:

- A gene transfer format file (.gtf/.gff) is added to the _gtf_file directory
- One or more .fasta files containing the reference sequences the RNAseq reads will be aligned against are added to the __fasta_map_sequences directory

What the script does:

- A directory is created in __metadata/ref_seqs/ under the folder_name
- Reference sequences are moved into the new directory, with the .fasta sequences placed in the subdirectory database_date_mapped_sequences
- The .fasta sequences are concatenated and indexed, and the indexed files are saved in a subdirectory labelled bowtie2_organism

## initialize.sh <project_name> <ref_seqs>

Required inputs:

- project name: the name of the project directory that will be created
- ref_seqs: the name of a reference sequence folder generated by gen_refseqs.sh

Additional files required: fastq files located in the _unprocessed_fastq_files directory.

What the script does:

- A new directory is created in the RNAseq_env directory under the project name provided
- The appropriate directory structure to run the pipeline is set up, and all necessary reference sequences are transferred into it

## run_pipeline.sh <SE/PE> <feature>

Required inputs:

- SE/PE: specify whether the RNA seq was generated using single-end (SE) or paired-end (PE) adapters
- feature: the genetic feature to be counted; it must be a feature in the GTF file (transcript/gene/CDS)

Pipeline implementation:

1. Generate a pre-report using FastQC and MultiQC; the report is output to Output/rawQC_report
   - 1.1 Check for adapter content and trim adapters with Trimmomatic if adapter content is present
2. Map reads to the reference sequences using Bowtie2
   - 2.1 Convert SAM files to sorted BAM files with Samtools
   - 2.2 Generate mapping-specific QC using Qualimap
3. Assign and count reads using HTSeq; read counts are output to Output/gene_counts
4. Generate a MultiQC report for all processing steps to assess the quality of reads; the report is output to Output/finalQC_report
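
The mapping and counting steps (2 and 3) could look roughly like the dry-run sketch below. It only prints the commands for one sample; it is not the contents of run_pipeline.sh. The function names (`read_args`, `plan_sample`), the sample file names, the index prefix, and the annotation file name are all hypothetical and chosen for illustration only.

```shell
#!/bin/sh
# Hypothetical sketch: prints (does not run) the commands for one sample.
# Sample names, file suffixes, the index prefix, and the GTF name are assumptions.

# Choose the Bowtie2 read arguments from the SE/PE mode.
read_args() {
    mode=$1
    sample=$2
    case "$mode" in
        SE) echo "-U ${sample}.fastq" ;;
        PE) echo "-1 ${sample}_1.fastq -2 ${sample}_2.fastq" ;;
        *)  echo "mode must be SE or PE" >&2; return 1 ;;
    esac
}

# Print the mapping, sorting, QC, and counting commands for one sample.
plan_sample() {
    mode=$1
    feature=$2
    sample=$3
    index=$4
    gtf=$5
    reads=$(read_args "$mode" "$sample") || return 1
    echo "bowtie2 -x ${index} ${reads} -S ${sample}.sam"
    echo "samtools sort -o ${sample}.sorted.bam ${sample}.sam"
    echo "qualimap bamqc -bam ${sample}.sorted.bam -outdir qualimap_${sample}"
    echo "htseq-count -f bam -r pos -t ${feature} ${sample}.sorted.bam ${gtf} > Output/gene_counts/${sample}.txt"
}

# Example: single-end reads, counting the "gene" feature.
plan_sample SE gene sampleA bowtie2_organism/organism annotation.gtf
```

Printing the commands first makes it easy to confirm that the SE/PE choice and the feature name end up in the right place before any reads are processed.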