

@kpd-daemon kpd-daemon bot commented Aug 17, 2025

Pull request for series with
subject: block: ignore underlying non-stack devices io_opt
version: 1
url: https://patchwork.kernel.org/project/linux-raid/list/?series=992286

Coly Li added 2 commits August 17, 2025 15:56
This patch adds a new BLK_FLAG_STACK_IO_OPT flag for stacked block devices. If a
stacked block device such as md raid5 declares its own io_opt and does not want
blk_stack_limits() to change it based on the io_opt of the underlying non-stacked
block devices, BLK_FLAG_STACK_IO_OPT can be set in limits.flags. Then in
blk_stack_limits(), the lcm_not_zero(t->io_opt, b->io_opt) calculation is skipped.

For md raid5, keeping a proper io_opt size is necessary for better
I/O throughput.

Signed-off-by: Coly Li <[email protected]>
Currently in md_submit_bio() the incoming bio is split by
bio_split_to_limits(), which makes sure the bio won't exceed
max_hw_sectors of a specific raid level before being sent into its
.make_request method.

For raid levels 4/5/6 such a split can be problematic and hurt
large read/write performance. Because limits.max_hw_sectors is not
always aligned to the limits.io_opt size, the split bio won't cover
full stripes on all data disks, which introduces extra read-in I/O.
Even if the bio's bi_sector is aligned to the limits.io_opt size and the
bio is large enough, the resulting split bio is not size-friendly to the
corresponding raid456 level.

This patch introduces bio_split_by_io_opt() to solve the above issue:
1. If the incoming bio is not limits.io_opt aligned, split off the non-
  aligned head part. The next bio will then be aligned.
2. If the incoming bio is limits.io_opt aligned and a split is necessary,
  split by a multiple of limits.io_opt that does not exceed
  limits.max_hw_sectors.

Then for a large bio, the aligned split parts will cover full stripes on
all data disks, with no extra read-in I/O when rmw_level is 0. And for
rmw_level > 0 conditions, the limits.io_opt aligned bios are welcome
for performance as well.

This patch was only tested on an 8-disk raid5 array with 64KiB chunk size.
With this patch, on that 8-disk raid5 array with 64KiB chunk size, sequential
write performance increases from 900MiB/s to 1.1GiB/s with fio bs=10M.
With fio bs=448K (the exact limits.io_opt size) the peak sequential write
throughput can reach 1.51GiB/s.

Signed-off-by: Coly Li <[email protected]>

kpd-daemon bot commented Aug 17, 2025

Upstream branch: c17fb54
series: https://patchwork.kernel.org/project/linux-raid/list/?series=992286
version: 1
