Observed that raising block-device readahead from the Linux default (128 kB) to 4 MB cut the runtime of fgumi group on a large BAM by ~31% (33.96s → 23.38s, mean of 10 trials each, same binary, 6.8 GB input, page cache dropped before each run).
Consider either (a) documenting read_ahead_kb = 4096 as a recommended setting for BAM-heavy runs, or (b) implementing user-space prefetch in the BAM read path so users get the benefit without needing root to tune the device.
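
To make option (b) concrete, here is a minimal sketch of user-space prefetch: a background thread reads the file sequentially in large chunks and hands them to the consumer through a bounded queue, so I/O overlaps with processing. It is illustrative only and is not fgumi's code. The names `prefetch_chunks`, `CHUNK_SIZE`, and `QUEUE_DEPTH` are hypothetical, and the 4 MiB chunk size is simply chosen to mirror the read_ahead_kb = 4096 setting above. Whether this recovers the full ~31% would need to be measured.

```python
import os
import queue
import threading

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB; assumption, mirrors read_ahead_kb = 4096
QUEUE_DEPTH = 4               # hypothetical: max chunks buffered ahead of the consumer


def prefetch_chunks(path, chunk_size=CHUNK_SIZE, depth=QUEUE_DEPTH):
    """Yield the file's bytes in order while a background thread reads ahead."""
    chunks = queue.Queue(maxsize=depth)

    def reader():
        try:
            with open(path, "rb", buffering=0) as fh:
                # Advisory hint to the kernel; only present on some platforms.
                if hasattr(os, "posix_fadvise"):
                    os.posix_fadvise(fh.fileno(), 0, 0, os.POSIX_FADV_SEQUENTIAL)
                while True:
                    data = fh.read(chunk_size)  # may return a short chunk
                    if not data:
                        break
                    chunks.put(data)  # blocks once `depth` chunks are waiting
            chunks.put(None)  # sentinel: end of file
        except Exception as exc:
            chunks.put(exc)  # hand I/O errors to the consumer

    # Daemon thread so an abandoned generator cannot keep the process alive.
    threading.Thread(target=reader, daemon=True).start()

    while True:
        item = chunks.get()
        if item is None:
            return
        if isinstance(item, Exception):
            raise item
        yield item
```

A consumer would iterate `for chunk in prefetch_chunks("input.bam"): ...` and feed each chunk to the decompressor. The bounded queue caps memory at roughly `depth * chunk_size` bytes, and the approach needs no root access, unlike changing the device setting.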