Given a set
A minimal (
>original_header 20 6.13
TGGATAAAAAGGCTGACGAAAGGTCTAGCTAAAATTGTCAGGTGCTCTCAGATAAAGCAGTAAGCGAGTTGGTGTTCGCTGAGCGTCGACTAGGCAACGTTAAAGCTATTTTAGGC...
In this case 20 kmers are shared with the indexed kmers. This represents 6.13% of the kmers in the sequence.
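The two numbers in the header can be illustrated with a minimal sketch. It is not the tool's implementation: it assumes exact string matching on the forward strand only, it counts every kmer position in the sequence, and the function name `shared_kmer_stats` is hypothetical. The real tool may differ in how it handles reverse complements, for example.

```python
def shared_kmer_stats(sequence, indexed_kmers, k):
    """Return (shared_count, percent) for one sequence against a set of indexed kmers.

    Assumption: forward-strand exact matches only, one count per kmer position.
    """
    total = len(sequence) - k + 1  # number of kmer positions in the sequence
    if total <= 0:
        return 0, 0.0
    shared = sum(1 for i in range(total) if sequence[i:i + k] in indexed_kmers)
    return shared, round(100.0 * shared / total, 2)


# Toy data (hypothetical): 3 kmers of length 4 in the sequence, 2 of them indexed.
indexed = {"ACGT", "CGTA"}
count, percent = shared_kmer_stats("ACGTAC", indexed, k=4)
print(count, percent)
```

In this toy case two of the three kmers are shared, which plays the same role as the `20 6.13` pair in the example header.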
For installation instructions, please see https://b2s-doc.readthedocs.io/en/latest/usage.html#installation
back_to_sequences --in-kmers kmers.fasta --in-sequences reads.fasta --out-sequences filtered_reads.fasta --out-kmers counted_kmers.txt

The filtered_reads.fasta file contains the original sequences (here, reads) from reads.fasta that contain at least one of the kmers from kmers.fasta. The header of each read is the same as in reads.fasta, plus the number of shared kmers and the estimated ratio of shared kmers.
As the --out-kmers option is used, the file counted_kmers.txt contains for each kmer in kmers.fasta the number of times it was found in filtered_reads.fasta.
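For downstream filtering, the annotated headers can be parsed with a short sketch like the one below. It assumes the layout shown in the example header (`>original_header 20 6.13`): the last two whitespace-separated fields are the shared kmer count and the percentage. The function name `parse_filtered_header` is hypothetical, and the layout should be checked against the actual output before relying on it.

```python
def parse_filtered_header(header_line):
    """Split an annotated FASTA header into (original_header, shared_count, percent).

    Assumption: the two trailing fields are the shared kmer count and the
    percentage of shared kmers, as in '>original_header 20 6.13'.
    """
    # rsplit from the right so that spaces inside the original header are kept.
    fields = header_line.lstrip(">").rstrip().rsplit(None, 2)
    if len(fields) != 3:
        raise ValueError(f"unexpected header layout: {header_line!r}")
    original, count, percent = fields
    return original, int(count), float(percent)


print(parse_filtered_header(">original_header 20 6.13"))
```

Splitting from the right keeps any description text in the original header intact, so a header such as `>read 1 desc 20 6.13` still yields `read 1 desc` as the original part.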
Example results obtained on
- the GenOuest platform on a node with 32 threads Xeon 2.2 GHz, denoted by "genouest" in the table below.
- a MacBook, Apple M2 pro, 16 GB RAM, with 10 threads, denoted by "mac" in the table below.
- AMD Ryzen 7 4.2 GHz 5800X 64 GB RAM, with 16 threads, denoted by "AMD" in the table below.
Indexed: one million kmers, each of length 31. Queried: from 10,000 to 200 million reads, each of length 100.
| Number of reads | Time genouest | Time mac | Time AMD | max RAM |
|---|---|---|---|---|
| 10,000 | 0.7s | 0.54s | 0.4s | 0.13 GB |
| 100,000 | 0.8s | 0.8s | 1.2s | 0.13 GB |
| 1,000,000 | 3.0s | 3.5s | 7.1s | 0.13 GB |
| 10,000,000 | 7.1s | 11s | 16s | 0.13 GB |
| 100,000,000 | 32s | 58s | 48s | 0.13 GB |
| 200,000,000 | 1m01s | 1m52s | 1m44s | 0.13 GB |
See this page for details
Please refer to the specific documentation for
Please check out How to contribute
Baire et al., (2024). Back to sequences: Find the origin of k-mers. Journal of Open Source Software, 9(101), 7066, https://doi.org/10.21105/joss.07066
bibtex:
@article{Baire2024,
author = {Anthony Baire and Pierre Marijon and Francesco Andreace and Pierre Peterlongo},
title = {Back to sequences: Find the origin of k-mers},
journal = {Journal of Open Source Software},
doi = {10.21105/joss.07066},
url = {https://doi.org/10.21105/joss.07066},
year = {2024},
publisher = {The Open Journal},
volume = {9},
number = {101},
pages = {7066}
}

Full documentation is available at https://b2s-doc.readthedocs.io/en/latest/
