Open
Labels: enhancement (New feature or request)
Description
I have a major concern about the design decision to use only byte alignment. IMO, this is bad in several ways and would slow down many applications:
- On many architectures, aligned access is much faster than misaligned access (word-aligned vs. byte-aligned).
- Even on platforms such as x86, where misaligned memory access is reasonably fast, many operations are still much faster if data is aligned at least to word size (cache-line alignment is better still, but that is a separate issue).
Why should we bother with data alignment when we're only talking about a storage format? The short answer is: memory-mapped files. Proper padding and alignment would make it practical to memory-map the file, which is useful for many things, but mostly for reducing I/O overhead and even for async I/O.
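To illustrate the memory-mapping point, here is a minimal Python sketch. It is not based on the actual file format: `write_padded` and `map_section` are hypothetical helper names, and the "header" and "section" are placeholder byte strings. It relies on the fact that Python's `mmap` requires the mapping offset to be a multiple of `mmap.ALLOCATIONGRANULARITY` (the page size on Unix), so a section can only be mapped directly if it starts on such a boundary.

```python
import mmap


def write_padded(path, header, section, align=mmap.ALLOCATIONGRANULARITY):
    """Write header, zero padding up to the next multiple of `align`, then section.

    Returns the file offset at which the section starts.
    (Hypothetical layout for illustration only.)
    """
    pad = -len(header) % align  # bytes needed to reach the next boundary
    with open(path, "wb") as f:
        f.write(header)
        f.write(b"\0" * pad)
        f.write(section)
    return len(header) + pad


def map_section(path, offset, length):
    """Map `length` bytes starting at `offset` read-only, without copying.

    `offset` must be a multiple of mmap.ALLOCATIONGRANULARITY,
    otherwise mmap raises an error.
    """
    with open(path, "rb") as f:
        # mmap keeps its own handle, so the file object can be closed afterwards.
        return mmap.mmap(f.fileno(), length, access=mmap.ACCESS_READ, offset=offset)
```

Without the padding, the reader would have to map from an earlier boundary and skip bytes by hand, or fall back to copying the section with ordinary reads.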
I propose considering two additions / changes:
- Allow additional padding, so that the start of a section can be aligned to, say, a page boundary. This would allow section contents to be mmap'ed directly.
- Increase the alignment of k-mers from a single byte to, say, 8 bytes (or use a dynamic scheme, e.g. 4 bytes of storage for k<=16 and 8-byte chunks for everything else). I think the speed / overhead tradeoff is quite clear here.
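The arithmetic behind both proposals can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the 2-bits-per-nucleotide packing is an assumption (it is consistent with "4 bytes for k<=16", since 16 × 2 = 32 bits, but the issue does not state it).

```python
def padding_to(offset, alignment):
    """Number of padding bytes needed so that `offset` becomes a multiple of `alignment`."""
    return -offset % alignment


def kmer_storage_bytes(k, bits_per_base=2):
    """Storage size of one k-mer under the proposed dynamic scheme.

    Assumes `bits_per_base` bits per nucleotide (2 by default, an assumption).
    K-mers that fit in 4 bytes use 4 bytes; everything else is rounded up
    to a whole number of 8-byte chunks.
    """
    packed = (k * bits_per_base + 7) // 8  # size with plain byte alignment
    if packed <= 4:
        return 4
    return (packed + 7) // 8 * 8


# Overhead compared with byte alignment for a few values of k.
for k in (16, 17, 31, 32, 33):
    byte_aligned = (k * 2 + 7) // 8
    print(k, byte_aligned, kmer_storage_bytes(k))
```

For example, under these assumptions a 31-mer occupies 8 bytes either way, while a 17-mer grows from 5 bytes to 8, which gives a concrete sense of the overhead side of the tradeoff.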