Add SIMD optimization for int_to_float conversion #580
base: main
Conversation
Add SIMD fast paths for converting custom bit-depth floats to f32:

- 32-bit float passthrough: simple bitcast using SIMD.
- 16-bit float (f16, half-precision): SIMD conversion with a scalar fallback for subnormal values.

The 16-bit float SIMD path handles normal, zero, and inf/NaN cases directly, falling back to scalar for the rare subnormal case, which requires variable-iteration normalization.

Also adds a `BitDepth::f16()` test helper and comprehensive unit tests for the conversion functions.
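As context for the fallback described above, here is a minimal scalar sketch of the binary16 → f32 conversion, including the variable-iteration normalization loop for subnormals that the SIMD path avoids. This is an illustrative standalone function, not the PR's actual implementation:

```rust
/// Convert IEEE 754 binary16 bits (1 sign, 5 exponent, 10 mantissa bits)
/// to f32. Illustrative sketch, not the PR's code.
fn f16_bits_to_f32(bits: u16) -> f32 {
    let sign = ((bits as u32) >> 15) << 31;
    let exp = ((bits >> 10) & 0x1f) as u32;
    let frac = (bits & 0x03ff) as u32;
    let bits32 = match (exp, frac) {
        (0, 0) => sign,                                 // signed zero
        (0x1f, 0) => sign | 0x7f80_0000,                // infinity
        (0x1f, _) => sign | 0x7f80_0000 | (frac << 13), // NaN (payload kept)
        (0, _) => {
            // Subnormal: renormalize by shifting the mantissa left until the
            // implicit leading bit appears -- a variable number of iterations,
            // which is why the SIMD path punts this case to scalar code.
            let mut e: i32 = -14; // exponent of all f16 subnormals
            let mut m = frac;
            while m & 0x0400 == 0 {
                m <<= 1;
                e -= 1;
            }
            m &= 0x03ff; // drop the now-implicit leading bit
            sign | (((e + 127) as u32) << 23) | (m << 13)
        }
        // Normal value: rebias the exponent from 15 (f16) to 127 (f32).
        _ => sign | ((exp + 112) << 23) | (frac << 13),
    };
    f32::from_bits(bits32)
}
```

All other cases (zero, inf/NaN, normal) are fixed bit manipulation and vectorize cleanly; only the subnormal arm contains a data-dependent loop.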
Benchmark @ fca2520
```rust
// SIMD 16-bit float (half-precision) to 32-bit float conversion
// This handles IEEE 754 binary16 format: 1 sign bit, 5 exponent bits, 10 mantissa bits
simd_function!(
```
I think I would prefer to have a pair of functions I32Vec::store_u16() and F32Vec::load_f16_bits() instead. Those functions can use _mm256_cvtph_ps on AVX2 (by also requiring the F16C target feature, which is common) and vcvt_f32_f16 on NEON (although the Rust definition erroneously requires the f16 target feature, so we'd have to use inline assembly for now -- fixed in rust-lang/stdarch#1978), and fall back to scalar on SSE4.2.
We could then add store_f16() -- implemented in a similar way -- and use that to speed up the f16 conversion code.
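A sketch of what the proposed F32Vec::load_f16_bits could look like on x86-64, written here as a free function for illustration (the name comes from the comment above; the signature and the scalar fallback are assumptions). It uses the real `_mm256_cvtph_ps` intrinsic behind runtime F16C detection and falls back to a scalar bit-level conversion elsewhere; the NEON path is omitted:

```rust
/// Scalar fallback: convert one binary16 value to f32 (subnormals included).
fn f16_bits_to_f32(bits: u16) -> f32 {
    let sign = ((bits as u32) >> 15) << 31;
    let exp = ((bits >> 10) & 0x1f) as u32;
    let frac = (bits & 0x03ff) as u32;
    let bits32 = match (exp, frac) {
        (0, 0) => sign,
        (0x1f, 0) => sign | 0x7f80_0000,
        (0x1f, _) => sign | 0x7f80_0000 | (frac << 13),
        (0, _) => {
            // Subnormal: variable-iteration renormalization.
            let (mut e, mut m) = (-14i32, frac);
            while m & 0x0400 == 0 {
                m <<= 1;
                e -= 1;
            }
            sign | (((e + 127) as u32) << 23) | ((m & 0x03ff) << 13)
        }
        _ => sign | ((exp + 112) << 23) | (frac << 13),
    };
    f32::from_bits(bits32)
}

/// Sketch of an F32Vec::load_f16_bits-style helper (name taken from the
/// review comment): eight f16 bit patterns -> eight f32 values.
fn load_f16_bits(src: &[u16; 8]) -> [f32; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("f16c") && is_x86_feature_detected!("avx") {
            // SAFETY: F16C and AVX availability checked at runtime.
            unsafe {
                use std::arch::x86_64::{
                    __m128i, _mm256_cvtph_ps, _mm256_storeu_ps, _mm_loadu_si128,
                };
                let halves = _mm_loadu_si128(src.as_ptr() as *const __m128i);
                let floats = _mm256_cvtph_ps(halves); // hardware f16 -> f32
                let mut out = [0.0f32; 8];
                _mm256_storeu_ps(out.as_mut_ptr(), floats);
                return out;
            }
        }
    }
    // Scalar fallback (e.g. SSE4.2-only targets).
    let mut out = [0.0f32; 8];
    for (o, &h) in out.iter_mut().zip(src.iter()) {
        *o = f16_bits_to_f32(h);
    }
    out
}
```

Note that `_mm256_cvtph_ps` handles subnormal inputs in hardware, so the F16C path needs no separate subnormal fallback; only the scalar path reproduces that logic explicitly.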
SIMD fast paths for the `int_to_float` function, which converts custom bit-depth floats stored as i32 back to f32.

- 32-bit float: straightforward bitcast via SIMD.
- 16-bit float (f16): SIMD handles normal values, zeros, and inf/NaN. Subnormals fall back to scalar since they need a variable-iteration normalization loop.
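The SIMD/scalar dispatch hinges on spotting subnormal lanes. A portable sketch of that check (the function name is illustrative, not from the PR): a binary16 value is subnormal exactly when its exponent field is zero and its mantissa is nonzero, so a chunk can take the all-SIMD path only if no lane matches that pattern.

```rust
/// True if any lane in a chunk of binary16 bit patterns is subnormal
/// (exponent field == 0, mantissa != 0) -- the case the SIMD fast path
/// hands off to scalar code. Illustrative helper, not the PR's code.
fn chunk_has_subnormal(chunk: &[u16]) -> bool {
    chunk
        .iter()
        .any(|&b| (b & 0x7c00) == 0 && (b & 0x03ff) != 0)
}
```

In a vectorized implementation the same test is a pair of masked compares over the whole register, with a single branch on the combined mask deciding between the fast path and the scalar loop.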
Waiting for perf CI to see the impact.