Inspired by this blog post and related discussion in r/rust
When writing code which needs to be heavily optimized, people usually write their Rust/C/whatever code to try to allow the compiler to use several machine-level tricks, including SIMD instructions and partial loop unrolling. Naturally, it wouldn't be possible within clippy to ensure that these compile-time optimizations do happen, but there are a few common pitfalls that people run into that we could make lints for to help people trying to perform these optimizations.
One such pitfall mentioned in the linked blog post is indexing into a dynamically sized slice whose length the compiler cannot prove before the loop. Here's an example from the blog post that runs afoul of that:

    pub fn mix_mono_to_stereo(dst: &mut [f32], src: &[f32], gain_l: f32, gain_r: f32) {
        for i in 0..src.len() {
            dst[i * 2 + 0] = src[i] * gain_l;
            dst[i * 2 + 1] = src[i] * gain_r;
        }
    }

In each pass of the for-loop, the code has to check whether the index is still within dst, and panic if the write would pass the end of the slice, twice (once for each write). By re-slicing dst before the loop so that its length is known to be sufficient, it is possible to let the resulting binary check once at the beginning instead of on every write.
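As a minimal sketch of that idea applied to the function above (the name mix_mono_to_stereo_presliced is made up for illustration), the re-slice moves the length check, and the possible panic, to a single point before the loop. Whether the optimizer then drops the per-write checks depends on the compiler version and on how the index is computed, so this should be read as the shape of the fix rather than a guarantee:

```rust
/// Hypothetical variant of `mix_mono_to_stereo`; the name is illustrative only.
/// Panics once, before the loop, if `dst` holds fewer than `2 * src.len()` samples.
pub fn mix_mono_to_stereo_presliced(dst: &mut [f32], src: &[f32], gain_l: f32, gain_r: f32) {
    // Re-slice up front: this is the single place where the length is checked.
    let dst = &mut dst[..src.len() * 2];
    for i in 0..src.len() {
        // These writes are still bounds-checked in the source; the hope is that
        // the optimizer can prove them in range from the re-slice above.
        dst[i * 2] = src[i] * gain_l;
        dst[i * 2 + 1] = src[i] * gain_r;
    }
}
```

Any elements of dst beyond 2 * src.len() are left untouched, and a dst that is too short now fails immediately instead of partway through the loop.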
In the blog post, the code ends up being optimized to this:

    #[repr(C)]
    pub struct StereoSample {
        l: f32,
        r: f32,
    }

    #[repr(transparent)]
    pub struct MonoSample(f32);

    pub fn mix_mono_to_stereo_3(dst: &mut [StereoSample], src: &[MonoSample], gain_l: f32, gain_r: f32) {
        let dst_known_bounds = &mut dst[0..src.len()];
        for i in 0..src.len() {
            dst_known_bounds[i].l = src[i].0 * gain_l;
            dst_known_bounds[i].r = src[i].0 * gain_r;
        }
    }

We could produce a lint that suggests something like the let dst_known_bounds = &mut dst[0..src.len()]; line in the final code, which is what lets the compiler perform the bounds check once before the loop. This lint alone doesn't guarantee the optimization (as demonstrated by the "Second Attempt" code in the blog post), but it could help people trying to do these optimizations.
Because accesses like these contribute little to performance when they appear in code that runs only a few times, or in code that is bottlenecked by something other than CPU cycles, I'd think this would be better as a lint that is allowed by default but can be enabled on a specific segment of code that needs heavier optimization.
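To make the opt-in idea concrete, here is a rough sketch of how a hot function might enable such a lint. The lint name clippy::unhoisted_slice_bounds_check and the function scale_into are both invented for this example; no such lint is assumed to exist, and Clippy would presumably report the name as unknown until one does:

```rust
// Hypothetical lint name, used only to illustrate per-item opt-in.
#[warn(clippy::unhoisted_slice_bounds_check)]
pub fn scale_into(dst: &mut [f32], src: &[f32], gain: f32) {
    // The kind of line the proposed lint would suggest adding:
    // one length check here instead of one per iteration.
    let dst = &mut dst[..src.len()];
    for i in 0..src.len() {
        dst[i] = src[i] * gain;
    }
}
```

The same attribute could be placed on a module instead of a single function, so that only the performance-sensitive part of a crate is checked.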