Add support for Cell Ranger #101
base: master
a9daf79 to 89ec252
run: |
conda env update --file environment.yml --name base
activate-environment: rnaseq-pipeline
environment-file: environment.yml
This should be included immediately in the trunk.
9e01957 to f250d94
Parse SRA metadata from its XML format so that we can infer the role that each file plays in fastq-dump output.
Add typing and fix many bugs.
Retrieve the SRA public dir from a configuration.
Improve layout detection from SRA metadata.
Detect bcl2fastq standard filenames and also commonly used names. Add a fallback that checks for the presence of I1/I2/R1/R2, but warns since this is very unreliable.
Track issues encountered in runs using an enumerated flag.
Make resolution of test resources relative.
Allow some of the parameters for filtering cells to be overwritten if needed.
Use CellRangerCount task from bioluigi.
Fix unpacking of singleton for single-run experiments.
Remove cell_ranger_bin from config, it's declared in bioluigi.
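To make the filename-based detection and issue tracking described in these commits concrete, here is a minimal sketch. It is not the code from this PR: `RunIssue`, `detect_read_type`, and both regular expressions are hypothetical names chosen for illustration. The strict pattern assumes the usual bcl2fastq naming scheme (`<sample>_S<n>_L<lane>_<R1|R2|I1|I2>_001.fastq.gz`); the fallback only looks for an isolated I1/I2/R1/R2 token and warns, since that match is unreliable.

```python
import enum
import re
import warnings
from typing import Optional, Tuple


class RunIssue(enum.IntFlag):
    """Hypothetical flag type; several issues can be combined with `|`."""
    NONE = 0
    UNRELIABLE_FILENAME_MATCH = 1
    UNKNOWN_READ_TYPE = 2


# Assumed bcl2fastq naming: <sample>_S<n>_L<lane>_<read>_001.fastq[.gz]
BCL2FASTQ_PATTERN = re.compile(
    r"^.+_S\d+_L\d{3}_(?P<read>[RI][12])_001\.fastq(\.gz)?$"
)

# Fallback: an I1/I2/R1/R2 token that is not part of a longer alphanumeric word.
FALLBACK_PATTERN = re.compile(r"(?<![A-Za-z0-9])(?P<read>[RI][12])(?![A-Za-z0-9])")


def detect_read_type(filename: str) -> Tuple[Optional[str], RunIssue]:
    """Return the detected read type and any issue raised while detecting it."""
    match = BCL2FASTQ_PATTERN.match(filename)
    if match:
        return match.group("read"), RunIssue.NONE
    match = FALLBACK_PATTERN.search(filename)
    if match:
        warnings.warn(f"Read type of {filename} was guessed from a loose match.")
        return match.group("read"), RunIssue.UNRELIABLE_FILENAME_MATCH
    return None, RunIssue.UNKNOWN_READ_TYPE
```

Issues from several files of the same run could then be accumulated with `issues |= issue`, which is the main reason for using a flag rather than a plain enumeration in this sketch.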
43c04ed to a987102
Add more metadata.
a987102 to 1d0932d
Rename fastq_file_types to read_types and add an enumerated type for possible values.
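As an illustration of what an enumerated type for read types buys over raw strings, here is a small sketch. The class name `ReadType`, its members, and the helper `parse_read_types` are assumptions made for this example; the values actually defined in the PR may differ.

```python
from enum import Enum
from typing import List


class ReadType(Enum):
    """Hypothetical set of read types; the real enumeration may have more members."""
    R1 = "R1"
    R2 = "R2"
    I1 = "I1"
    I2 = "I2"


def parse_read_types(names: List[str]) -> List[ReadType]:
    """Convert raw strings into ReadType members; unknown names raise ValueError."""
    return [ReadType(name) for name in names]
```

With this shape, a typo such as `"R3"` fails loudly at parse time instead of silently propagating through the pipeline.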
Detect which pipeline branch to take by looking up the assay type of a dataset. Add a special case for FAC-sorted single-cell datasets that should be treated as bulk.
Add support for 10x BAM SRA submissions. This is done by looking up the header of the BAM files to infer the sequencing layout and calling bamtofastq downstream on the original submission.
Temporarily use the branch of bioluigi with improved sratools support and Cell Ranger.
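The branch selection described in the first commit above could look roughly like the sketch below. Everything here is hypothetical: the `PipelineBranch` enumeration, the assay-type labels in `SINGLE_CELL_ASSAY_TYPES`, and the `is_fac_sorted` parameter are stand-ins, since the PR page does not show how the assay type is actually represented.

```python
from enum import Enum


class PipelineBranch(Enum):
    BULK = "bulk"
    SINGLE_CELL = "single-cell"


# Hypothetical assay type labels; the real lookup values are not shown in this PR.
SINGLE_CELL_ASSAY_TYPES = {"scRNA-seq", "snRNA-seq"}


def choose_branch(assay_type: str, is_fac_sorted: bool = False) -> PipelineBranch:
    """Pick a branch from the assay type, treating FAC-sorted single-cell data as bulk."""
    if assay_type in SINGLE_CELL_ASSAY_TYPES and not is_fac_sorted:
        return PipelineBranch.SINGLE_CELL
    return PipelineBranch.BULK
```

Defaulting to the bulk branch for unknown assay types is a choice made for this sketch only; raising an error for unrecognized values would be an equally reasonable design.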
[rnaseq_pipeline.sources.sra]
# location where tools like prefetch and fastq-dump will store downloaded SRA files
# you can get this value with vdb-config -p
ncbi_public_dir=/cosmos/scratch/ncbi/public
I've encountered issues with parsing the output of vdb-config, so this is a more robust solution overall.
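For readers unfamiliar with the approach, reading the directory from configuration rather than from `vdb-config` output might look like the following sketch. It uses the standard-library `configparser` purely for illustration; the pipeline itself may read its configuration through a different mechanism, and `read_ncbi_public_dir` and `SAMPLE_CONFIG` are hypothetical names.

```python
import configparser
from pathlib import Path

# Same section and key as in the diff above, inlined so the sketch is self-contained.
SAMPLE_CONFIG = """
[rnaseq_pipeline.sources.sra]
ncbi_public_dir=/cosmos/scratch/ncbi/public
"""


def read_ncbi_public_dir(config_text: str) -> Path:
    """Return the configured SRA public directory as a Path."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return Path(parser["rnaseq_pipeline.sources.sra"]["ncbi_public_dir"])
```

A missing section or key raises `KeyError` here, which surfaces a misconfiguration immediately instead of depending on the output format of an external tool.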
is_single_end: bool = False, is_paired: bool = False):
"""Detects the layout of the sequencing run files based on their names and various additional information.
:param run_id: Identifier for the run
Mention here that a run is akin to a lane.
TODO
--output-dir option and change the directory for the execution
bamtofastq for BAM-file SRA submissions