
Optimized compare_ref function from loop logic to dataframe logic #226

Open
TrangNg-Th wants to merge 1 commit into adamewing:master from TrangNg-Th:improve_replace_reads

@TrangNg-Th

Description

This pull request optimizes the compare_ref function so that the BAM merging step, which runs after the mutated BAM file is created, finishes faster. In this version, I:

  • Modified the function to use dataframe logic instead of loop logic, using the pandas package to handle and compare references faster.
  • Added a bamsurgeon.def Singularity definition file, allowing users to build a Singularity image for executing the code.
  • The command to build the Singularity image is: singularity build bamsurgeon.sif bamsurgeon.def
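To illustrate the loop-versus-dataframe idea described above, here is a minimal sketch. It is not the code in this PR: the function names `compare_ref_loop` and `compare_ref_df` are hypothetical, and it assumes the check is "every contig in the target must appear at the same index in the donor", working on plain lists of contig names rather than on open BAM handles.

```python
import pandas as pd


def compare_ref_loop(target_refs, donor_refs):
    """Loop-style check (hypothetical): each target contig must exist in the
    donor list at the same position. Membership and index lookups on a list
    are linear, so this is quadratic in the number of contigs."""
    donor_refs = list(donor_refs)
    for idx, name in enumerate(target_refs):
        if name not in donor_refs or donor_refs.index(name) != idx:
            return False
    return True


def compare_ref_df(target_refs, donor_refs):
    """Dataframe-style check (hypothetical): join the two contig tables on
    name and compare the positions column-wise in one vectorized step."""
    target = pd.DataFrame({
        "name": pd.Series(list(target_refs), dtype="object"),
        "target_tid": range(len(target_refs)),
    })
    donor = pd.DataFrame({
        "name": pd.Series(list(donor_refs), dtype="object"),
        "donor_tid": range(len(donor_refs)),
    })
    # A left join keeps every target contig; contigs missing from the donor
    # get NaN in donor_tid, which never compares equal to target_tid.
    merged = target.merge(donor, on="name", how="left")
    return bool((merged["target_tid"] == merged["donor_tid"]).all())


if __name__ == "__main__":
    same = ["chr1", "chr2", "chrM"]
    swapped = ["chr2", "chr1", "chrM"]
    print(compare_ref_df(same, same))
    print(compare_ref_df(same, swapped))
```

The join replaces one linear scan per contig with a single hash-based merge. That is where a speedup would be expected on a reference with hundreds of thousands of contigs, assuming contig names are unique within each file.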

Additional Dependencies

  • pandas
  • (optional) Singularity

Benefits

  • Significantly reduced runtime by processing references more efficiently.
  • Improved code readability and maintainability by using dataframe operations.

Benchmark

Test Setup:

  • Donor BAM file size: 3 GB
  • Reference FASTA file: Mmul 8.0.1, consisting of 284,727 contigs (including 22 chromosomes)
  • Single SNP

Performance Improvement:

  • The addsnv.py script execution time has been reduced from approximately 4 hours to 2-3 minutes with the optimized compare_ref function.

@TrangNg-Th changed the title from "Optimized compare_ref function from loop logic to dataframe logic. Also" to "Optimized compare_ref function from loop logic to dataframe logic." on May 23, 2024
@TrangNg-Th changed the title from "Optimized compare_ref function from loop logic to dataframe logic." to "Optimized compare_ref function from loop logic to dataframe logic" on May 23, 2024