Byte-oriented format causes lots of misaligned access #12

@asl

I have a major concern about the design decision to use only byte alignment. IMO, this is bad in several ways and would slow down many applications:

  • On many architectures, aligned access is much faster than misaligned access, so word alignment beats byte alignment.
  • Even on platforms such as x86, where misaligned memory access is reasonably fast, many operations are still much faster when data is aligned at least to the word size (cache-line alignment is better still, but that is a separate issue).

Why bother with data alignment when we're only talking about a storage format? The short answer is memory-mapped files. Proper padding and alignment would make it possible to memory-map the file directly, which is useful for many things, but mostly for reducing I/O overhead and even for async I/O.
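To illustrate the mmap point, here is a minimal sketch using only the Python standard library. The file layout (a header, zero padding, then one section) and the helper names `write_padded_file` and `map_section` are hypothetical and are not part of the format under discussion. The one hard constraint it demonstrates is real: the offset passed to `mmap` must be a multiple of `mmap.ALLOCATIONGRANULARITY`, so a section that starts at an arbitrary byte offset cannot be mapped on its own.

```python
import mmap
import os
import tempfile

# Offsets passed to mmap must be a multiple of this value
# (it equals the page size on most Unix systems).
GRANULARITY = mmap.ALLOCATIONGRANULARITY


def write_padded_file(path, header, section):
    """Write header, zero padding up to the next boundary, then the section.

    Hypothetical layout for illustration only. Returns the section offset.
    """
    section_offset = -(-len(header) // GRANULARITY) * GRANULARITY  # round up
    with open(path, "wb") as f:
        f.write(header)
        f.write(b"\0" * (section_offset - len(header)))
        f.write(section)
    return section_offset


def map_section(path, offset, length):
    """Map `length` bytes starting at `offset`, read-only, without copying."""
    if offset % GRANULARITY != 0:
        raise ValueError("section start is not aligned; it cannot be mapped directly")
    with open(path, "rb") as f:
        # mmap keeps its own duplicate of the file handle, so closing f is safe.
        return mmap.mmap(f.fileno(), length, access=mmap.ACCESS_READ, offset=offset)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.bin")
        offset = write_padded_file(path, b"HDR", b"ACGTACGT")
        view = map_section(path, offset, 8)
        print(offset, bytes(view))
        view.close()
```

With byte-only alignment, the reader would instead have to map from an earlier aligned offset and skip forward, or fall back to `read()` and copy the data.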

I propose considering two additions / changes:

  • Allow additional padding so that the start of a section can be aligned to, say, a page boundary. This would allow section contents to be mmap'ed directly.
  • Increase the alignment of k-mers from a single byte to, say, 8 bytes (or use a dynamic scheme, e.g. 4 bytes of storage for k<=16 and 8-byte chunks for everything else). I think the speed / overhead tradeoff is quite clear here.
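To make the overhead side of that tradeoff concrete, here is a rough sketch of the arithmetic behind both proposals. It assumes a 2-bit-per-nucleotide packed encoding, which this issue does not state, and all function names are hypothetical.

```python
def align_up(n, alignment):
    """Round n up to the next multiple of alignment."""
    return -(-n // alignment) * alignment


def packed_bytes(k, bits_per_base=2):
    """Bytes needed for one k-mer with byte alignment (assumes 2 bits per base)."""
    return (k * bits_per_base + 7) // 8


def slot_bytes_fixed(k):
    """Storage per k-mer when every k-mer is padded to a multiple of 8 bytes."""
    return align_up(packed_bytes(k), 8)


def slot_bytes_dynamic(k):
    """Dynamic scheme from the proposal: 4 bytes for k<=16, 8-byte chunks otherwise."""
    return 4 if k <= 16 else align_up(packed_bytes(k), 8)


if __name__ == "__main__":
    # Padding needed so a section that would start at byte 4100 starts on a 4096-byte page.
    print("section padding:", align_up(4100, 4096) - 4100)
    for k in (12, 16, 31, 33):
        print(k, packed_bytes(k), slot_bytes_fixed(k), slot_bytes_dynamic(k))
```

Under these assumptions, k=31 costs nothing extra (8 bytes either way), while k=33 grows from 9 to 16 bytes per k-mer, so the overhead depends heavily on the k values in use.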

Labels: enhancement (New feature or request)
