-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathongoing_updates
101 lines (81 loc) · 4.65 KB
/
ongoing_updates
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
Tutorial Future Commits.
Computational assumptions:
- you already have singularity and virtualenv available on your system
- you are using a computational cluster that can submit ~100 jobs simultaneously. They don't have to all run at the same time but can at least be scheduled at once.
- Access to at least one node with >100Gb of ram that you can use for >150h runtime. This is needed for EarlGrey and Braker
LiftOff reference file:
- If LiftOff reference is a gff made from mikado, convert "ncRNA_gene" to "gene" so that lncRNA lift over
- R script to exctract ncRNA models from liftOff
- Add extra documentation for preparing a gtf file for liftoff.
- LiftOff is able to transfer all gene models from a reference to a target (coding, noncoding, etc.)
- Our tutorial pulls the "biotype" column of the output of liftOff to seed additional ncRNA models.
- The "biotype" column will have different syntax depending on the reference gtf file, however we have accounted for RefSeq and NCBI indputs.
- Some custom gtf files (e.g., GRC39) in NCBI have enough syntax differences that liftoff can have trouble using it as a reference. For these model organisms, we reccomend using ENSEMBL annotations.
- If you are having trouble with your reference gtf file, we have a script to clean it while keeping biotype in /utils
1. extract biotype from reference.gff
2. run gffread on reference.gff
3. add biotype back into reference.gffread.gff
Final annotation:
- Add biotype column & convert "type" to the NCBI/ENSEMBL format?
- TCR/IGG genes
- lncRNA gene symbols
Communication:
- IGV examples and mammalian regions to check
ISO-seq:
- workflow for when --mixed, --long, and --short are all viable data options
- Add regtools to required tools to get junctions from ISOseq.
Pipeline flexibility
- replace cactus binary + venv with singularity
Future directions:
- multiple reference species
- add custom gtf & reference mode option
TOGA installation:
- Add to the readme that we need to update next flow instructions for different servers. The TOGA github highlights this too but TOGA won't run and won't have a clear error message if the config isnt fixed.
- TOGA uses slurm by default.
This is our labs (sge) with the addtional parameter that we need to submit the job origin `-P`
You will probably need to see how nextflow works on your system.
- extract_chain_features_config.nf
process {
executor = "sge"
penv = "smp"
memory = '10G'
cpus = '1'
time = '10h'
clusterOptions = { "-V -l h_vmem=10G -V -P simpsonlab -l h_stack=32M -l h_rt=10:00:00" }
}
executor {
name = "sge"
queueSize = 1000
queueStatInterval = "10s"
}
- call_cesar_config_template.nf
process {
executor = "sge"
penv = "smp"
memory = "${_MEMORY_}G"
cpus = '1'
time = '24h'
clusterOptions = { "-V -l h_vmem=64G -V -P simpsonlab -l h_stack=32M -l h_rt=48:00:00" }
}
executor {
name = "sge"
queueSize = 1000
queueStatInterval = "10s"
}
Mikado installation: We found instances where the singularity image provided by mikado will not work depending on pandas and SQLAlchemy compatibility. This shows that the sif file relies on your python setup, which is unsustainable.
risserlin's mikado is stable regardless of your conda env. python install. etc.
singularity pull mikado_tutorial.sif docker://baderlab/mikado:latest
Snakemake earl-grey: add `-r` parameter for EarlGrey into config options
Add long-read into the step-by-step educational tutorial (it's already in the pipelines):.
- We use braker and stringtie to incororpate ISO-seq data.
- Stringite runs in two modes
- If there is ISOseq for the tissue, but no RNA-seq, then run stringtie with the `-L` flag, otherwise it's basically the same.
- e.g., `$externalDir/stringtie/stringtie $outDir/ISOseq_alignment/$i -l $b -L -o $outDir/stringtie_out/$i".lr.gtf" -p 8 --conservative`
- If there is ISO-seq and RNA-seq for the tissue, run stringtie with the --mix flag.
e.g., `$externalDir/stringtie/stringtie --mix $outDir/RNAseq_alignment/$i $outDir/ISOseq_alignment/$i -l $b -o $outDir/stringtie_out/$i".mix.gtf" -p 8 --conservative`
- All three versions are in run_stringtie_flexible.sh
- Braker runs in an extra mode for long-reads, and is identical for the script.
- The actual braker script is identical; the only change you need is to use the long-read docker (singularity) image instead.
- BRAKER_SIF=$externalDir/singularity_images/braker3_lr.sif
- the output will be braker_lr.gffread.gff.
- braker_sr and braker_lr can be integrated as their own items in mikado. We tested mikado vs. TSEBRA for integrating braker_sr and braker_lr and the resulting gff file was nearly identical, which is why we just use mikado here.