@arteymix arteymix commented Sep 13, 2025

TODO

  • integrate CellRanger in bioluigi
  • detect layout of runs in SRA
  • prevent Cell Ranger from creating MRO files in the current directory; we should probably drop the --output-dir option and change the working directory for the execution instead
  • run bamtofastq for BAM-file SRA submissions
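For the MRO item above, one option is to launch the process with its working directory set to the desired output location instead of passing an output option. A minimal sketch, assuming a hypothetical helper named `run_in_workdir`; it is not how bioluigi currently runs Cell Ranger:

```python
import subprocess
from pathlib import Path
from typing import Sequence


def run_in_workdir(cmd: Sequence[str], workdir: Path) -> subprocess.CompletedProcess:
    """Run cmd with workdir as its current directory (hypothetical helper).

    Files that the tool writes relative to its current directory then land in
    workdir instead of wherever the pipeline itself was started.
    """
    workdir.mkdir(parents=True, exist_ok=True)
    # cwd only affects the child process; the caller's directory is untouched.
    return subprocess.run(list(cmd), cwd=workdir, check=True)
```

The Cell Ranger command line would be passed as `cmd`, with the run's output directory as `workdir`, so stray files end up next to the results.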

@arteymix arteymix force-pushed the feature-cell-ranger branch 3 times, most recently from a9daf79 to 89ec252 Compare September 15, 2025 22:31
run: |
conda env update --file environment.yml --name base
activate-environment: rnaseq-pipeline
environment-file: environment.yml
Member Author

This should be included immediately in the trunk.

Parse SRA metadata from its XML format so that we can infer the role
that each file plays in fastq-dump output.
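As a rough illustration of the idea, the sketch below pulls the originally submitted FASTQ filenames for each run out of an XML record using the standard library. The sample document is hand-written and heavily simplified, and the element and attribute names it uses (`RUN`, `SRAFile`, `filename`, `semantic_name`) are assumptions about the metadata layout, not the parser added in this PR:

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hand-written, heavily simplified sample; real SRA records carry many more
# elements and attributes, and the names used here are assumptions.
SAMPLE_XML = """
<EXPERIMENT_PACKAGE_SET>
  <EXPERIMENT_PACKAGE>
    <RUN_SET>
      <RUN accession="SRR0000001">
        <SRAFiles>
          <SRAFile filename="sample_S1_L001_R1_001.fastq.gz" semantic_name="fastq"/>
          <SRAFile filename="sample_S1_L001_R2_001.fastq.gz" semantic_name="fastq"/>
          <SRAFile filename="SRR0000001" semantic_name="run"/>
        </SRAFiles>
      </RUN>
    </RUN_SET>
  </EXPERIMENT_PACKAGE>
</EXPERIMENT_PACKAGE_SET>
"""


def original_files_by_run(xml_text: str) -> Dict[str, List[str]]:
    """Map each run accession to the names of its submitted FASTQ files."""
    root = ET.fromstring(xml_text)
    result: Dict[str, List[str]] = {}
    for run in root.iter("RUN"):
        # Keep only entries flagged as FASTQ; skip the archive entry itself.
        files = [
            f.get("filename")
            for f in run.iter("SRAFile")
            if f.get("semantic_name") == "fastq"
        ]
        result[run.get("accession")] = files
    return result
```

The filenames recovered this way can then be matched against naming conventions to decide which file holds which read.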

Add typing and fix many bugs.

Retrieve the SRA public directory from the configuration

Improve layout detection from SRA metadata

Detect bcl2fastq standard filenames and also commonly used names. Add a
fallback that checks for the presence of I1/I2/R1/R2, but warns since
this is very unreliable.
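The detection order described here could look roughly like the sketch below: try the bcl2fastq naming scheme first, then fall back to a bare I1/I2/R1/R2 token and emit a warning. The function name `detect_read_type` and both patterns are illustrative assumptions, and the "commonly used names" step is left out:

```python
import re
import warnings
from typing import Optional

# bcl2fastq naming: <sample>_S<n>_L<lane>_<read>_001.fastq.gz
BCL2FASTQ = re.compile(r"^.+_S\d+_L\d{3}_([IR][12])_001\.fastq(\.gz)?$")
# Last resort: an I1/I2/R1/R2 token not surrounded by letters or digits.
FALLBACK = re.compile(r"(?<![A-Za-z0-9])([IR][12])(?![A-Za-z0-9])")


def detect_read_type(filename: str) -> Optional[str]:
    """Return I1, I2, R1 or R2 for a FASTQ filename, or None if unknown."""
    m = BCL2FASTQ.match(filename)
    if m:
        return m.group(1)
    m = FALLBACK.search(filename)
    if m:
        # The fallback is only a guess, so make it visible to the caller.
        warnings.warn(
            f"{filename}: read type guessed from a bare {m.group(1)} token; "
            "this is unreliable."
        )
        return m.group(1)
    return None
```

Keeping the strict pattern first means the warning only fires for files that do not follow the standard naming.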

Track issues encountered in runs using an enumerated flag.
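An enumerated flag lets several issues be combined into a single value per run. A minimal sketch using `enum.Flag`; the class name `RunIssue` and its members are hypothetical and do not reflect the names used in the pipeline:

```python
from enum import Flag, auto


class RunIssue(Flag):
    """Issues found while inspecting a run (hypothetical member names)."""
    NONE = 0
    NO_FASTQ_FILENAMES = auto()
    AMBIGUOUS_READ_TYPES = auto()
    LAYOUT_GUESSED = auto()


# Issues accumulate with bitwise OR as the run is inspected.
issues = RunIssue.NONE
issues |= RunIssue.LAYOUT_GUESSED
issues |= RunIssue.AMBIGUOUS_READ_TYPES

# Membership tests work on combined values.
has_guess = RunIssue.LAYOUT_GUESSED in issues
is_clean = issues == RunIssue.NONE
```

A single combined value is easy to store alongside the run and to filter on later.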

Make resolution of test resources relative

Allow some of the cell-filtering parameters to be overridden if
needed.

Use CellRangerCount task from bioluigi

Fix unpacking of singleton for single-run experiments

Remove cell_ranger_bin from the config; it is declared in bioluigi

Add more metadata.
arteymix and others added 4 commits September 24, 2025 11:09
Rename fastq_file_types to read_types and add an enumerated type for
possible values.
Detect which pipeline branch to take by looking up the assay type of a
dataset. Add a special case for FAC-sorted single-cell datasets that
should be treated as bulk.

Add support for 10x BAM SRA submissions. This is done by looking up the
header of the BAM files to infer the sequencing layout and calling
bamtofastq downstream on the original submission.
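To make the header lookup concrete, the sketch below reads read specifications from header text that has already been extracted (for example with `samtools view -H`). It assumes that 10x BAM files carry `@CO` comment lines of the form `10x_bam_to_fastq:R1(...)`; the sample header is hand-written, and the function name `layout_from_header` is hypothetical:

```python
import re
from typing import Dict

# Comment lines such as "@CO\t10x_bam_to_fastq:R1(CR:CY,UR:UY)" describe how
# each read is rebuilt from BAM tags (format assumed, see the text above).
READ_SPEC = re.compile(r"^@CO\t10x_bam_to_fastq:([IR][12])\((.+)\)$")


def layout_from_header(header_text: str) -> Dict[str, str]:
    """Map each read type found in the header to its tag specification."""
    layout: Dict[str, str] = {}
    for line in header_text.splitlines():
        m = READ_SPEC.match(line)
        if m:
            layout[m.group(1)] = m.group(2)
    return layout


# Hand-written sample header, for illustration only.
SAMPLE_HEADER = "\n".join([
    "@HD\tVN:1.4\tSO:coordinate",
    "@CO\t10x_bam_to_fastq:R1(CR:CY,UR:UY)",
    "@CO\t10x_bam_to_fastq:R2(SEQ:QUAL)",
    "@CO\t10x_bam_to_fastq:I1(BC:QT)",
])
```

The set of keys in the result (here R1, R2 and I1) indicates which FASTQ files a later bamtofastq call should be expected to produce.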

Temporarily use the branch of bioluigi with improved sratools support
and Cell Ranger.
[rnaseq_pipeline.sources.sra]
# location where tools like prefetch and fastq-dump will store downloaded SRA files
# you can get this value with vdb-config -p
ncbi_public_dir=/cosmos/scratch/ncbi/public
Member Author

I've encountered issues with parsing the output of vdb-config, so this is a more robust solution overall.
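For illustration, reading that value back needs nothing more than an INI parser. The sketch below uses the standard library `configparser` on an inline copy of the section; the pipeline itself may go through its own configuration classes instead:

```python
import configparser

# Inline copy of the section shown in the diff above.
CONFIG_TEXT = """
[rnaseq_pipeline.sources.sra]
ncbi_public_dir=/cosmos/scratch/ncbi/public
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

# A plain lookup; no external tool output has to be parsed.
ncbi_public_dir = parser.get("rnaseq_pipeline.sources.sra", "ncbi_public_dir")
```

The trade-off is that the value has to be kept in sync by hand with what `vdb-config -p` reports.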

is_single_end: bool = False, is_paired: bool = False):
"""Detects the layout of the sequencing run files based on their names and various additional information.
:param run_id: Identifier for the run
Member Author

Mention here that a run is akin to a lane.

@arteymix arteymix self-assigned this Oct 9, 2025
@arteymix arteymix added this to the 2.2.0 milestone Oct 9, 2025