Conversation
Hi, thanks for your work on this PR. I'm wondering what the motivation was (beyond image-rs)? I would also prefer smaller PRs, but let me go through this one before anything else.
Yeah, absolutely! Some of the changes might not fit the style of the project, so I'm happy to change or discard elements. Likewise, I'm happy to split off smaller PRs, whatever works best.
Let's start with the changes that introduce no new dependencies (libm) first. Then we can see whether we can implement the functionality we get from libm ourselves, as adding a crate to get one function seems like too much to me.
If anything, MUSL's libm has some of the worst accuracy among system math libraries, though this is almost always negligible for media applications. However, it's also extremely slow (all methods at least 2-3x slower than system libraries, pow/powf 5-10x); if there's heavy math inside tight loops, it can drive execution time to insane levels.
This may be a concern, particularly for HDR images, which need to call the math in zune-image/crates/zune-hdr/src/encoder.rs (line 340 at 6d54a3d). For PNG the perf hit is negligible, as we use lookup tables, but it may still be significant enough.
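For context, the lookup-table approach mentioned above amounts to paying for powf once per table entry instead of once per pixel. A minimal sketch, not zune-png's actual code (gamma_lut and apply_gamma are hypothetical helpers):

```rust
/// Minimal gamma-LUT sketch (illustrative, not zune-png's implementation):
/// powf runs only 256 times to build the table, then every pixel is a lookup.
fn gamma_lut(gamma: f32) -> [u8; 256] {
    let mut lut = [0u8; 256];
    for (i, v) in lut.iter_mut().enumerate() {
        // Normalize to [0, 1], apply gamma, rescale and round.
        *v = ((i as f32 / 255.0).powf(gamma) * 255.0 + 0.5) as u8;
    }
    lut
}

/// Apply the table to an 8-bit buffer in place.
fn apply_gamma(pixels: &mut [u8], lut: &[u8; 256]) {
    for p in pixels.iter_mut() {
        *p = lut[*p as usize];
    }
}
```

With this shape, the cost of the transcendental function is independent of image size, which is why the LUT only loses on tiny images where table construction dominates.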
For exp2f specifically, I could help. Technically, it's an implementation competitive with glibc in both precision and performance. IMHO, for PNG gamma it is definitely better to stick to a LUT rather than depend on a math library: in practice, gamma LUTs lose only on very small images like 5x5, where everything is fast anyway.

exp2f benches:

x86_64, without FMA:
libm::exp2f    time: [37.275 µs 37.285 µs 37.295 µs]
system::exp2f  time: [30.890 µs 30.895 µs 30.901 µs]
moxcms::exp2f  time: [27.584 µs 27.600 µs 27.630 µs]

x86_64, with FMA enabled:
libm::exp2f    time: [107.39 µs 107.40 µs 107.41 µs]
               change: [+187.77% +188.12% +188.37%] (p = 0.00 < 0.05)
               Performance has regressed.
system::exp2f  time: [28.074 µs 28.077 µs 28.081 µs]
               change: [−9.4598% −9.1601% −8.9084%] (p = 0.00 < 0.05)
               Performance has improved.
moxcms::exp2f  time: [20.787 µs 20.801 µs 20.814 µs]
               change: [−24.767% −24.607% −24.456%] (p = 0.00 < 0.05)
               Performance has improved.

aarch64:
libm::exp2f    time: [19.988 µs 20.210 µs 20.464 µs]
system::exp2f  time: [14.126 µs 14.276 µs 14.427 µs]
moxcms::exp2f  time: [10.119 µs 10.163 µs 10.217 µs]
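For readers curious how a fast exp2f is typically structured, here is a rough sketch of the standard technique (this is NOT moxcms's actual implementation; exp2f_sketch is a hypothetical name, and the Taylor coefficients are illustrative rather than the minimax-optimized ones a real library would use): split x into integer and fractional parts, approximate 2^frac with a polynomial, and apply 2^int by constructing the exponent bits directly.

```rust
/// Illustrative exp2f sketch (not any library's real code):
/// 2^x = 2^floor(x) * 2^frac(x). The fractional part is approximated by a
/// degree-5 Taylor polynomial of e^(frac * ln 2); the integer part is applied
/// by building the f32 exponent field directly, valid for normal-range
/// exponents (roughly -126..=127) and ignoring overflow/NaN handling.
fn exp2f_sketch(x: f32) -> f32 {
    let xi = x.floor();                          // integer part
    let t = (x - xi) * core::f32::consts::LN_2;  // t in [0, ln 2)
    // Taylor series of e^t up to t^5 (illustrative, not minimax).
    let p = 1.0
        + t * (1.0
        + t * (0.5
        + t * (1.0 / 6.0
        + t * (1.0 / 24.0
        + t * (1.0 / 120.0)))));
    // 2^xi as a float, by placing the biased exponent into the bit pattern.
    let scale = f32::from_bits(((xi as i32 + 127) << 23) as u32);
    scale * p
}
```

Real implementations replace the Taylor coefficients with minimax ones (and often use FMA), which is where the sub-ULP accuracy and the speed differences benchmarked above come from.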
You pump out some amazing libraries :)
Marking as draft as work will continue in separate PRs based on this. See #264 for the first follow-up.
First off, thanks for the information here! Looking at the paper myself though, it appears that MUSL's
The outlier here is
As for performance,
Yep, and LLVM's is also not really fast. There is actually no magic here: IEEE math has been a well-investigated topic for the last 40 years, and implementations can be accurate or fast, but not both. The issue with MUSL's, especially the Rust port of it, is that it is neither accurate nor fast.
In math terms, 0.511 ULP versus 0.5 ULP is a huge difference, because 0.511 ULP can produce 154345310000000000000000000000000 instead of 154345300000000000000000000000000 (see the difference?): those are two adjacent numbers that f32 can represent. But as I wrote, media applications usually don't operate on numbers of that order, so such precision isn't really required. Additionally, there's the concept of error density: one library might have a worst-case error of 0.511 ULP but stay below 0.5 ULP in 99.9% of cases, while another implementation with the same 0.511 ULP peak hits that maximum almost every time. MUSL's is closer to the second one :) That's just a little theory; again, as I noted in the first place, such accuracy is not critical for media applications. Media applications are sensitive to speed and can easily tolerate some inaccuracy.
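The "two adjacent f32 numbers" point is easy to verify: near 1.5e32, one ULP of f32 is about 1e25, so an error just over half a ULP is enough to land on the neighboring representable value. A quick sketch (next_up here is a hand-rolled bit-increment helper, assuming a positive, finite input):

```rust
/// Next representable f32 above a positive, finite x
/// (adjacent f32 values have adjacent bit patterns).
fn next_up(x: f32) -> f32 {
    f32::from_bits(x.to_bits() + 1)
}

fn main() {
    let x = 1.54345e32_f32;
    let gap = next_up(x) - x;
    // Near 1.5e32 the gap between adjacent f32 values is roughly 1e25,
    // which is exactly the kind of digit change a >0.5 ULP error causes.
    println!("one ULP near {x:e} is {gap:e}");
}
```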
That's not for me to decide. But honestly, I don't completely get the point of no_std plus IEEE math, because in most places where no_std is required, an FPU is simply not available (or prohibited, because IEEE math is neither deterministic nor finite), and a completely different level of work is needed there. And there's the opposite point: where an FPU is available, std is usually available too, along with the math libraries.
To phrase this more clearly: if these image processing algorithms, which rely heavily on IEEE f32/f64, are truly intended to run on something like a Raspberry Pi Zero 2 W, they need to be redesigned case by case with really dirty math, because those devices are not designed for that kind of workload (and the devices that truly need no_std are the ones without an OS, where the program has to be flashed). On the other hand, if they are not intended for such devices, it might be more practical to simply disable them in no_std builds, or to use some other replacement, since they will still be running on powerful systems.
Objective

zune-image has no_std support in many of its subcrates, but there's room for it to be extended to more of these crates. Most notably missing is zune-image itself.

Solution

- zune-core: serde feature in no_std
- zune-inflate: std feature (only used for std::error::Error), opting for increasing MSRV to 1.81 for core::error::Error
- zune-jpeg: std feature (only used for std::error::Error), opting for increasing MSRV to 1.81 for core::error::Error
- zune-png: libm feature to allow more functionality in no_std
- zune-qoi: std feature (only used for std::error::Error), opting for increasing MSRV to 1.81 for core::error::Error
- zune-bmp: std feature (only used for documentation), opting for adding a std feature in dev-dependencies instead
- zune-gif: no_std (only minor changes required)
- zune-hdr: libm feature to allow no_std
- zune-jpegxl: use core::error::Error
- zune-image: std and libm features for the same reasons as above; no_std, with the exception of jpeg-xl, as jxl-oxide is not no_std compatible at this time
- zune-imageprocs: std and libm features for the same reasons as above

Notes
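On the MSRV bump: since Rust 1.81, the Error trait is available in core, so error types no longer need std just to implement it. A minimal sketch of the pattern (DecodeError is a hypothetical type, not zune's actual error enum):

```rust
use core::fmt;

// Hypothetical error type illustrating the pattern; zune's real errors differ.
#[derive(Debug)]
struct DecodeError;

impl fmt::Display for DecodeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "decode failed")
    }
}

// Requires Rust 1.81+; works in no_std because the trait lives in core.
impl core::error::Error for DecodeError {}
```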