can genmap map run with very large values #19
Comments
I got |
Thanks for checking out GenMap! It was originally developed for short reads and up to 4 mismatches. I would like to continue the work and implement an algorithm for more mismatches (or even arbitrary error rates), but I am not sure whether there are applications for more mismatches without considering indels. Supporting indels is another point on my todo list, but I don't think I can find time to implement it anytime soon, and the performance might drop significantly with indels. If you want to know how mappable a read of 1'000 bp might be (by computing the mappability of 1'000-mers, although mapping and mappability are different concepts), an error rate lower than the error profile of the sequencing technology could be sufficient, but (in your case) 0.4 % (4 mismatches per 1'000 bp) while considering only mismatches is probably not enough. Indexing a 1 GB genome and computing the mappability for 1'000-mers should still be quite fast, so you can just run it and see what the results look like. But I'm afraid a lot of regions of the genome will be "unique" (due to the low error rate).
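To put the numbers in this thread side by side, here is a small arithmetic sketch. It only divides the mismatch budget E by the k-mer length K; the function name `mismatch_rate` is made up for illustration and is not part of GenMap.

```python
def mismatch_rate(k: int, e: int) -> float:
    """Fraction of positions in a k-mer that may mismatch (E divided by K)."""
    return e / k

# Up to 4 mismatches (the limit mentioned above) on 1'000-mers.
supported = mismatch_rate(1000, 4)

# The values asked about in this issue: K 1000, E 150.
requested = mismatch_rate(1000, 150)

print(f"4 mismatches in 1000 bp:   {supported:.1%}")
print(f"150 mismatches in 1000 bp: {requested:.1%}")
```

The gap between the two rates is the reason the reply expects many regions to look "unique" at 4 mismatches.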
Thank you Christopher for the great tool and the support.
I don't have access to a binary of gem at the moment with a working mappability algorithm. Differences can occur due to the approximation of gem (you can turn it off explicitly to set the threshold parameter |
I am using an old version since the newer ones have lost the mappability feature and the author is not very active in re-introducing it. I have used gem-mappability following an old report by Derrien et al., and I documented my commands on our wiki a few years ago (I added a link to your page today :-). Below is the help; I used default settings for all non-obligatory options ("default: first multiple bin" is not a very clear value to me!). If you tell me GenMap does something similar to gem-mappability, I trust you on that; this is an accessory analysis and is of no vital importance for my work. Thanks for your nice support and feedback.
You can disable the approximation with Roughly speaking: if the algorithm chooses Here's an example. Let's assume you compute the mappability for 3-mers with up to 1 mismatch. The first position has a mappability of 0.333 (ACG occurs 3 times with up to 1 mismatch). gem's approximation could set the mappability at the locations of ACC and AGG also to 0.333 (if the threshold parameter is
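The exact (non-approximated) values for an example like the one above can be checked with a brute-force sketch. The sequence in the original comment did not survive on this page, so `ACGTACCTAGG` below is a hypothetical stand-in chosen so that ACG, ACC and AGG all occur. The sketch assumes forward strand only, Hamming distance, and mappability defined as 1 / frequency; it is not GenMap's or gem's actual algorithm, which are far more efficient.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def mappability(seq: str, k: int, e: int) -> list[float]:
    """Brute-force (k, e)-mappability: 1 / number of k-mers within e mismatches."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    result = []
    for query in kmers:
        # Count every k-mer in the sequence (including the query itself)
        # that matches the query with at most e mismatches.
        freq = sum(1 for other in kmers if hamming(query, other) <= e)
        result.append(1.0 / freq)
    return result

seq = "ACGTACCTAGG"  # hypothetical sequence, not the one from the comment
values = mappability(seq, k=3, e=1)
for pos, value in enumerate(values):
    print(pos, seq[pos:pos + 3], round(value, 3))
```

In this made-up sequence, ACG at position 0 matches ACG, ACC and AGG with up to 1 mismatch, so its mappability is 1/3. ACC and AGG differ from each other by 2 mismatches, so their exact values are 1/2 each; an approximation that copies 0.333 to those positions would differ from the exact result there.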
Thanks! :-) Edit: Unfortunately I don't remember anymore what they meant with |
I would like to compute the mappability of long reads on a repetitive plant genome to evaluate the efficiency of long reads (PacBio or ONT).
Can I use large values like
K 1000 E 150
on a genome of 1 GB, or will this kill genmap or take ages and TBs to compute? Any advice on 'index' command options that would improve my later 'map' usage is welcome.
Thanks for your advice