Computing only binary mappability #7

Open
joshuak94 opened this issue Jul 22, 2019 · 4 comments
Labels
enhancement New feature or request

Comments

@joshuak94
Contributor

I think it is oftentimes useful to have a binary mappability of data, when you are only interested in completely unique regions. Up until now, I have been just calculating the mappability and then using another tool to floor the values, or cast them to ints, to obtain only 1's and 0's. However, it would probably be more efficient if this were possible inherently in GenMap.

I would imagine this is a fairly straightforward thing to implement? And I guess it would also make GenMap run a bit faster?

@joshuak94
Contributor Author

The other nice side effect is that the files will be much smaller, as a BED/wig file with values of just 1's can be written much more succinctly than a file with decimal values ranging from 0 to 1.
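As a rough illustration of the post-processing described above, here is a minimal sketch that floors per-position mappability values and collapses the positions with value 1 into half-open intervals, which is what makes a BED-style file compact. The function name `unique_intervals` and the assumption that the values are available as a `std::vector<float>` are hypothetical and not part of GenMap.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper, not part of GenMap: collapse per-position mappability
// values into half-open [start, end) intervals of positions whose value
// floors to 1 (i.e. completely unique positions).
std::vector<std::pair<std::size_t, std::size_t>>
unique_intervals(const std::vector<float>& mappability)
{
    std::vector<std::pair<std::size_t, std::size_t>> intervals;
    std::size_t i = 0;
    while (i < mappability.size())
    {
        if (mappability[i] < 1.0f) // floors to 0, not unique
        {
            ++i;
            continue;
        }
        const std::size_t start = i;
        while (i < mappability.size() && mappability[i] >= 1.0f)
            ++i;
        intervals.emplace_back(start, i); // one BED-like record per run of 1's
    }
    return intervals;
}
```

Each returned pair would correspond to one line in a BED-like output, so long unique stretches cost one record instead of one value per position.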

@cpockrandt
Owner

Hi Josh,

Theoretically speaking, you could replace the std::vector<uint8_t> with a bit vector. While distinct elements of a regular vector can be read and written in parallel, the bits of a bit vector can't. If two threads write to bits stored in the same machine word (e.g. the same 64-bit integer), you get a data race unless you use locking. I don't think you will see any speedup.

If RAM is not a limitation, the easiest solution would be to create a binary vector before writing it to disk. Wig and bed files should work out of the box, dumping it into a binary format might have to be adjusted.
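A minimal sketch of that "convert before writing" step, done single-threaded after the parallel computation has finished. It assumes the existing vector holds one occurrence count per position and that a count of exactly 1 means "unique"; the function name `pack_unique` and the 64-bit word layout are assumptions for illustration, not GenMap code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: pack "occurs exactly once" flags into 64-bit words.
// Assumes counts[i] is the occurrence count for position i.
// Bit (i % 64) of word (i / 64) is set iff counts[i] == 1.
std::vector<std::uint64_t> pack_unique(const std::vector<std::uint8_t>& counts)
{
    std::vector<std::uint64_t> words((counts.size() + 63) / 64, 0);
    for (std::size_t i = 0; i < counts.size(); ++i)
    {
        if (counts[i] == 1)
            words[i / 64] |= std::uint64_t{1} << (i % 64);
    }
    return words;
}
```

Because this runs after all threads are done, no two writers ever touch the same word, at the cost of holding both vectors in RAM at once.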

If it is not urgent, I would rather consider including it when porting to SeqAn3.

@joshuak94
Contributor Author

Not urgent at all.

Interesting point about bit vectors not being able to be read/written in parallel. I didn't know that! Is that a SeqAn3 limitation or a hardware thing?

@cpockrandt
Owner

cpockrandt commented Aug 1, 2019

The C++ standard says the following about containers (§ 23.2.2):

Notwithstanding (17.6.5.9), implementations are required to avoid data races when the contents of the contained object in different elements in the same container, excepting vector<bool>, are modified concurrently.

Most implementations of std::vector<bool> will store the bit vector in an array of integers. If you try to set/unset bits concurrently that are stored in the same integer value, you might run into problems.
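For illustration, one way to sidestep the race on shared words is to make each word atomic, so that setting a bit becomes a single atomic read-modify-write. This is only a sketch: the class `AtomicBitVector` is hypothetical, it is not how GenMap or SeqAn store their data, and the atomic operations add overhead of their own, so it should not be read as a promise of a speedup.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bit vector with atomic 64-bit words. Several threads may call
// set() concurrently, even for bits that live in the same word.
class AtomicBitVector
{
public:
    explicit AtomicBitVector(std::size_t num_bits)
        : words_((num_bits + 63) / 64)
    {
        for (auto& w : words_)
            w.store(0, std::memory_order_relaxed); // start with all bits cleared
    }

    // Atomically OR the bit into its word; other bits in the word are preserved.
    void set(std::size_t i)
    {
        words_[i / 64].fetch_or(std::uint64_t{1} << (i % 64),
                                std::memory_order_relaxed);
    }

    bool test(std::size_t i) const
    {
        return (words_[i / 64].load(std::memory_order_relaxed) >> (i % 64)) & 1u;
    }

private:
    std::vector<std::atomic<std::uint64_t>> words_;
};
```

With relaxed ordering, results should only be read after the writer threads have been joined. An alternative that needs no atomics is to give each thread a range of positions aligned to 64-bit word boundaries, so that no word is shared between threads.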

@cpockrandt cpockrandt added the enhancement New feature or request label May 8, 2020