metaspades.py run long time #1436

Open
1 task done
Niohuruzh opened this issue Jan 9, 2025 · 4 comments

@Niohuruzh

Niohuruzh commented Jan 9, 2025

Description of bug

Sample 296 has been running for more than 24 hours, whereas other samples typically finish in a few hours. I would like to know whether something is causing the program to get stuck.

spades.log

296
Command line: /anaconda3/envs/assemble/bin/metaspades.py -1 /MyBookDuo/clean_data/296_1.rd.fastq -2 /MyBookDuo/clean_data/296_2.rd.fastq -o MyBookDuo/spades/296

System information:
SPAdes version: 4.0.0
Python version: 3.12.3
OS: Linux-6.8.0-51-generic-x86_64-with-glibc2.39

Output dir: MyBookDuo/spades/296
Mode: read error correction and assembling
Debug mode is turned OFF

Dataset parameters:
Metagenomic mode
Reads:
Library number: 1, library type: paired-end
orientation: fr
left reads: ['/MyBookDuo/clean_data/296_1.rd.fastq']
right reads: ['/MyBookDuo/clean_data/296_2.rd.fastq']
interlaced reads: not specified
single reads: not specified
merged reads: not specified
Read error correction parameters:
Iterations: 1
PHRED offset will be auto-detected
Corrected reads will be compressed
Assembly parameters:
k: [21, 33, 55]
Repeat resolution is enabled
Mismatch careful mode is turned OFF
MismatchCorrector will be SKIPPED
Coverage cutoff is turned OFF
Assembly graph output will use GFA v1.2 format
Other parameters:
Dir for temp files: /media/leigod/MyBookDuo/szh/endomicrobiome/spades/296/tmp
Threads: 16
Memory limit (in Gb): 125

======= SPAdes pipeline started. Log can be found here: /MyBookDuo/spades/296/spades.log

/MyBookDuo/clean_data/296_1.rd.fastq: max reads length: 135
/MyBookDuo/clean_data/296_2.rd.fastq: max reads length: 135

Reads length: 135

===== Before start started.

===== Read error correction started.

===== Read error correction started.

== Running: /anaconda3/envs/assemble/bin/spades-hammer /MyBookDuo/spades/296/corrected/configs/config.info

0:00:00.000 1M / 19M INFO General (main.cpp : 76) Starting BayesHammer, built from N/A, git revision N/A
0:00:00.059 1M / 19M INFO General (main.cpp : 77) Loading config from "/MyBookDuo/spades/296/corrected/configs/config.info"
0:00:00.104 1M / 19M INFO General (main.cpp : 79) Maximum # of threads to use (adjusted due to OMP capabilities): 16
0:00:00.105 1M / 19M INFO General (memory_limit.cpp : 55) Memory limit set to 125 Gb
0:00:00.105 1M / 19M INFO General (main.cpp : 87) Trying to determine PHRED offset
0:00:00.113 1M / 19M INFO General (main.cpp : 93) Determined value is 33
0:00:00.114 1M / 19M INFO General (hammer_tools.cpp : 40) Hamming graph threshold tau=1, k=21, subkmer positions = [ 0 10 ]
0:00:00.114 1M / 19M INFO General (main.cpp : 114) Size of aux. kmer data 24 bytes
=== ITERATION 0 begins ===
0:00:00.114 1M / 19M INFO K-mer Counting (kmer_data.cpp : 284) Estimating k-mer count
0:00:00.399 257M / 261M INFO K-mer Counting (kmer_data.cpp : 289) Processing "/MyBookDuo/clean_data/296_1.rd.fastq"
mimalloc: warning: thread 0x7ffa79c006c0: unable to allocate aligned OS memory directly, fall back to over-allocation (67108864 bytes, address: 0x7ffa6b400000, alignment: 67108864, commit: 0)
mimalloc: warning: thread 0x7ffa756006c0: unable to allocate aligned OS memory directly, fall back to over-allocation (67108864 bytes, address: 0x7ffa65a00000, alignment: 67108864, commit: 0)
mimalloc: warning: thread 0x7ffa76a006c0: unable to allocate aligned OS memory directly, fall back to over-allocation (67108864 bytes, address: 0x7ffa5d800000, alignment: 67108864, commit: 0)
mimalloc: warning: thread 0x7ffa788006c0: unable to allocate aligned OS memory directly, fall back to over-allocation (67108864 bytes, address: 0x7ffa59800000, alignment: 67108864, commit: 0)
mimalloc: warning: thread 0x7ffa774006c0: unable to allocate aligned OS memory directly, fall back to over-allocation (67108864 bytes, address: 0x7ffa55800000, alignment: 67108864, commit: 0)
0:00:53.203 257M / 261M INFO K-mer Counting (kmer_data.cpp : 298) Processed 12422937 reads
0:00:53.204 257M / 261M INFO K-mer Counting (kmer_data.cpp : 289) Processing "/MyBookDuo/clean_data/296_2.rd.fastq"
0:01:39.585 257M / 261M INFO K-mer Counting (kmer_data.cpp : 298) Processed 24845874 reads
0:01:39.598 257M / 261M INFO K-mer Counting (kmer_data.cpp : 303) Total 24845874 reads processed
0:01:40.150 257M / 261M INFO K-mer Counting (kmer_data.cpp : 306) Estimated 254609252 distinct kmers
0:01:40.154 1M / 261M INFO K-mer Counting (kmer_data.cpp : 310) Filtering singleton k-mers
0:01:40.865 653M / 705M INFO K-mer Counting (kmer_data.cpp : 316) Processing "/endomicrobiome/clean_data/296_1.rd.fastq"
0:09:00.481 654M / 705M INFO K-mer Counting (kmer_data.cpp : 325) Processed 12422937 reads
0:09:00.482 653M / 705M INFO K-mer Counting (kmer_data.cpp : 316) Processing "/MyBookDuo/clean_data/296_2.rd.fastq"

params.txt

metaspades.py -1 $data_path/${a}_1.rd.fastq -2 $data_path/${a}_2.rd.fastq -o ${a}

SPAdes version

4.0.0

Operating System

Ubuntu24

Python Version

3.12.3

Method of SPAdes installation

conda

No errors reported in spades.log

  • Yes
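
A rough way to tell a stalled run from a slow one is to check how long ago spades.log was last written. The sketch below is only a heuristic: the helper name `log_is_stale` and the 60-minute threshold are assumptions chosen for illustration, and a long step can legitimately run without logging anything, so a stale log suggests a stall but does not prove one.

```shell
# Succeed (exit status 0) if the file in $1 was last modified more than
# $2 minutes ago. A missing file is treated as "not stale".
log_is_stale() {
  [ -n "$(find "$1" -mmin +"$2" 2>/dev/null)" ]
}

# Log path taken from the command line above; adjust for your own layout.
log=MyBookDuo/spades/296/spades.log
if log_is_stale "$log" 60; then
  echo "No log output for over 60 minutes; last lines:"
  tail -n 3 "$log"
fi
```

If the log is stale, checking whether the spades-hammer process is still using CPU (for example with `top`) helps separate a busy step from a hung one.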
@asl
Member

asl commented Jan 9, 2025

This looks like a known issue with the CQF implementation. The only workaround is to skip read error correction via --only-assembler.
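
As a minimal sketch of that workaround, the original command for sample 296 can be rerun with `--only-assembler` added. The input paths and output directory are copied from the log in the report; the variable names are only illustrative. The command is built as a string and printed first so it can be checked before starting a long run.

```shell
# Sample ID and paths follow the log in the report; adjust for your layout.
sample=296
data_path=/MyBookDuo/clean_data
out_dir=MyBookDuo/spades/${sample}

# --only-assembler skips the read error correction stage (BayesHammer).
cmd="metaspades.py --only-assembler -1 ${data_path}/${sample}_1.rd.fastq -2 ${data_path}/${sample}_2.rd.fastq -o ${out_dir}"

echo "$cmd"
# $cmd   # uncomment to actually start the assembly
```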

@Niohuruzh
Author

Thank you for your quick reply. I was wondering if skipping read correction would lead to any mistakes in the final results?

@asl
Member

asl commented Jan 9, 2025

> Thank you for your quick reply. I was wondering if skipping read correction would lead to any mistakes in the final results?

Genome assembly is always heuristic and imprecise, so you should always expect a non-negligible number of issues in the results. Plus, the whole process is quite unstable: a small change in the input can lead to very different output. So yes, skipping read error correction might influence the results.
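
One way to gauge the effect on your own data is to assemble a sample that does finish in both modes and compare basic contig statistics. The sketch below assumes two hypothetical output directories, `run_with_correction` and `run_only_assembler`, each containing the `contigs.fasta` that SPAdes writes; the helper `fasta_stats` is illustrative and not part of SPAdes. Matching summary numbers do not show that the assemblies are identical; they are only a first sanity check.

```shell
# Print contig count, total length, and longest contig for a FASTA file.
fasta_stats() {
  awk '
    /^>/ {
      # A header line closes the previous record, if there was one.
      if (len > 0) { n++; total += len; if (len > max) max = len }
      len = 0
      next
    }
    { len += length($0) }
    END {
      # Account for the final record in the file.
      if (len > 0) { n++; total += len; if (len > max) max = len }
      printf "contigs=%d total_bp=%d longest_bp=%d\n", n, total, max
    }
  ' "$1"
}

# Hypothetical output directories for two runs of the same sample.
for dir in run_with_correction run_only_assembler; do
  if [ -f "$dir/contigs.fasta" ]; then
    printf '%s: ' "$dir"
    fasta_stats "$dir/contigs.fasta"
  fi
done
```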

@Niohuruzh
Author

Thanks for your quick reply.
