Computing only binary mappability #7
Comments
The other nice side effect is that the files will be much smaller, as a BED/wig file with values of just 1's can be written much more succinctly than a file with decimal values ranging from 0 to 1.
Hi Josh, theoretically speaking you could replace the If RAM is not a limitation, the easiest solution would be to create a binary vector before writing it to disk. Wig and BED files should work out of the box; dumping it into a binary format might have to be adjusted. If it is not urgent, I would rather consider including it when porting to SeqAn3.
Not urgent at all. Interesting point about bitvectors not being able to be read/written in parallel. I didn't know that! Is that a SeqAn3 limitation or a hardware thing?
The C++ standard says the following about containers (§ 23.2.2):
Most implementations of std::vector<bool> store the bit-vector packed into an array of integers. If you try to set/unset bits concurrently that are stored in the same integer value, you might run into a data race.
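To illustrate the workaround mentioned above (build the binary vector in memory first), here is a minimal sketch. It is not GenMap's code: `markUniqueParallel` and `packBits` are hypothetical names, and the assumption that a position counts as "unique" when its mappability is at least 1 is mine. The idea is to let threads write to a `std::vector<std::uint8_t>`, where every element is its own memory location, and only afterwards pack the flags into a `std::vector<bool>` on a single thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper: flags every position whose mappability is >= 1.
// Each thread writes to a disjoint range of a byte vector. Distinct bytes
// are distinct memory locations, so concurrent writes do not race.
std::vector<std::uint8_t> markUniqueParallel(std::vector<float> const & mappability,
                                             unsigned numThreads)
{
    assert(numThreads > 0);
    std::vector<std::uint8_t> flags(mappability.size(), 0);
    std::size_t const chunk = (mappability.size() + numThreads - 1) / numThreads;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < numThreads; ++t)
    {
        std::size_t const begin = std::min(mappability.size(), t * chunk);
        std::size_t const end = std::min(mappability.size(), begin + chunk);
        workers.emplace_back([&flags, &mappability, begin, end]() {
            for (std::size_t i = begin; i < end; ++i)
                flags[i] = (mappability[i] >= 1.0f) ? 1 : 0;
        });
    }
    for (auto & w : workers)
        w.join();
    return flags;
}

// Packs the byte flags into a std::vector<bool> sequentially, which avoids
// concurrent writes to bits that share the same underlying integer.
std::vector<bool> packBits(std::vector<std::uint8_t> const & flags)
{
    return std::vector<bool>(flags.begin(), flags.end());
}
```

The cost is one byte per position while computing instead of one bit, which matches the "if RAM is not a limitation" caveat above.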
I think it is oftentimes useful to have a binary mappability of data, when you are only interested in completely unique regions. Up until now, I have been just calculating the mappability and then using another tool to floor the values, or cast them to ints, to obtain only 1's and 0's. However, it would probably be more efficient if this were possible inherently in GenMap.
I would imagine this is a fairly straightforward thing to implement? And I guess it would also make GenMap run a bit faster?
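For reference, the post-processing step I currently do externally looks roughly like the sketch below. It is only an illustration under my own assumptions: `uniqueRegionsAsBed` is a hypothetical name, the input is a plain vector of per-position mappability values for one sequence, and a position counts as unique when its value is at least 1. Runs of unique positions are collapsed into BED intervals (0-based start, exclusive end), which is also why the output gets so much smaller.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns BED lines covering maximal runs of positions
// whose mappability is >= 1. Positions with lower values are skipped.
std::string uniqueRegionsAsBed(std::string const & chrom,
                               std::vector<float> const & mappability)
{
    assert(!chrom.empty());
    std::ostringstream out;
    std::size_t i = 0;
    while (i < mappability.size())
    {
        if (mappability[i] < 1.0f)
        {
            ++i;
            continue;
        }
        // Start of a run of unique positions; extend it as far as possible.
        std::size_t const start = i;
        while (i < mappability.size() && mappability[i] >= 1.0f)
            ++i;
        out << chrom << '\t' << start << '\t' << i << '\n';
    }
    return out.str();
}
```

For the values `{1, 1, 0.5, 0.25, 1}` on a sequence named `chr1`, this yields two intervals: `chr1 0 2` and `chr1 4 5` (tab-separated).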