
Extend no_std Support#263

Draft
bushrat011899 wants to merge 5 commits into etemesi254:dev from bushrat011899:no_std


@bushrat011899

Objective

zune-image has no_std support in many of its subcrates, but there's room for it to be extended to more of these crates. Most notably missing is zune-image itself.

Solution

  • Updated zune-core
    • Allowed serde feature in no_std
  • Updated zune-inflate
    • Removed std feature (only used for std::error::Error), opting for increasing MSRV to 1.81 for core::error::Error
  • Updated zune-jpeg
    • Removed std feature (only used for std::error::Error), opting for increasing MSRV to 1.81 for core::error::Error
  • Updated zune-png
    • Added optional libm feature to allow more functionality in no_std.
  • Updated zune-qoi
    • Removed std feature (only used for std::error::Error), opting for increasing MSRV to 1.81 for core::error::Error
  • Updated zune-bmp
    • Removed std feature (only used for documentation), opting for adding std feature in dev-dependencies instead
  • Made zune-gif no_std (only minor changes required)
  • Updated zune-hdr
    • Added optional libm feature to allow no_std.
  • Updated zune-jpegxl to use core::error::Error
  • Updated zune-image
    • Added std and libm features for the same reasons as above
    • Ensured all formats worked in no_std, with the exception of jpeg-xl, as jxl-oxide is not no_std compatible at this time.
  • Updated zune-imageprocs
    • Added std and libm features for the same reasons as above
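As an illustration of the `core::error::Error` change listed above, here is a minimal sketch of an error type that needs no `std` feature gate on Rust 1.81 or later. The `DecodeError` type and the `check_magic` helper are hypothetical and are not taken from any zune crate.

```rust
use core::fmt;

/// Hypothetical decoder error, for illustration only (not a zune type).
#[derive(Debug, PartialEq, Eq)]
pub enum DecodeError {
    UnexpectedEof,
    BadMagic([u8; 2]),
}

impl fmt::Display for DecodeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DecodeError::UnexpectedEof => f.write_str("unexpected end of input"),
            DecodeError::BadMagic(m) => {
                write!(f, "bad magic bytes: {:02x} {:02x}", m[0], m[1])
            }
        }
    }
}

// `core::error::Error` is stable since Rust 1.81, so this impl compiles
// in a `#![no_std]` crate without any `#[cfg(feature = "std")]` guard.
impl core::error::Error for DecodeError {}

/// Hypothetical helper: checks for a two-byte "BM" signature.
pub fn check_magic(data: &[u8]) -> Result<(), DecodeError> {
    match data {
        [b'B', b'M', ..] => Ok(()),
        [a, b, ..] => Err(DecodeError::BadMagic([*a, *b])),
        _ => Err(DecodeError::UnexpectedEof),
    }
}
```

The trade-off is the one stated in the list: the feature flag goes away, at the cost of raising the MSRV to 1.81.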

Notes

  • This PR is pretty monolithic and may be a pain to review. I'm happy to split this up into a couple of smaller PRs if the maintainers would prefer!
  • This effort was inspired by this comment

@etemesi254
Owner

Hi, thanks for your work on this PR. I'm wondering what the motivation was (beyond image-rs)?

And I would prefer smaller PRs, but let me go through this one before anything else.

@bushrat011899
Author

> Hi, thanks for your work on this PR. I'm wondering what the motivation was (beyond image-rs)?

I've been working on no_std support for Bevy for the last 9 or so months, and I'm now at a point where the next major item involves finding a way around std::io usage. To get a better grasp on the state of the art, I'm having a look at some of our dependencies and how they handle no_std IO, which led me to image-rs and now zune-image. I would say this contribution is largely a hobbyist curiosity rather than a personal need for this functionality.

> And I would prefer smaller PRs, but let me go through this one before anything else.

Yeah absolutely! Some of the changes might not fit the style of the project, so I'm happy to change/discard elements. Likewise, I'm also happy to split off smaller PRs, whatever works best.

@etemesi254
Owner

Let's start with the changes that introduce no new dependencies (libm) first.

Then we can see if we can implement the functionality we get from libm ourselves, as adding a crate to get one function seems like too much to me.

@awxkee
Contributor

awxkee commented May 19, 2025

If anything, MUSL's libm has some of the worst accuracy among system math libraries, though this is almost always negligible for media applications. However, it's also extremely slow (all methods are at least 2-3x slower than system libraries, pow/powf 5-10x). If there's heavy math inside tight loops, it can drive execution time to insane levels.

@etemesi254
Owner

> If anything, MUSL's libm has some of the worst accuracy among system math libraries, though this is almost always negligible for media applications. However, it's also extremely slow (all methods are at least 2-3x slower than system libraries, pow/powf 5-10x). If there's heavy math inside tight loops, it can drive execution time to insane levels.

This may be a concern, particularly for HDR images, which need to call exp2 for every RGB pixel; see this comment in the source:

// Note: This is the only reason we need the standard library
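One way such a per-pixel exp2 call could avoid a math library at all is sketched below. It assumes the exponent is an integer, as in the Radiance RGBE encoding, so the power of two can be built directly from the IEEE 754 bit pattern. `exp2i` and `rgbe_to_f32` are hypothetical names, this is not zune-hdr's actual code, and the conversion uses the common `mantissa * 2^(e - 136)` convention (some decoders add 0.5 to the mantissa first).

```rust
/// Computes 2^e exactly for an integer exponent by constructing the
/// f32 bit pattern. Uses only `core` functionality.
pub fn exp2i(e: i32) -> f32 {
    if e > 127 {
        // Too large for f32.
        f32::INFINITY
    } else if e >= -126 {
        // Normal range: biased exponent in bits 23..=30, zero mantissa.
        f32::from_bits(((e + 127) as u32) << 23)
    } else if e >= -149 {
        // Subnormal range: a single mantissa bit, where bit 0 is 2^-149.
        f32::from_bits(1u32 << ((e + 149) as u32))
    } else {
        // Underflows to zero.
        0.0
    }
}

/// Hypothetical RGBE-to-linear conversion for one pixel.
/// 128 is the exponent bias and 8 accounts for the 8-bit mantissa.
pub fn rgbe_to_f32(rgbe: [u8; 4]) -> [f32; 3] {
    if rgbe[3] == 0 {
        // A zero exponent byte encodes black.
        return [0.0; 3];
    }
    let scale = exp2i(i32::from(rgbe[3]) - 136);
    [
        f32::from(rgbe[0]) * scale,
        f32::from(rgbe[1]) * scale,
        f32::from(rgbe[2]) * scale,
    ]
}
```

This only covers integer exponents. Any code path that needs exp2 of a fractional value would still require a real implementation from std, libm, or elsewhere.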

For PNG the performance impact should be negligible as we use lookup tables, but it may still be significant enough.

@awxkee
Contributor

awxkee commented May 19, 2025

Just exactly for exp2f I could help.
You could take it from here.

Technically, this implementation is competitive with glibc's in terms of high precision and performance.
The project includes plenty of other methods, and most implementations offer competitive performance. However, not all of them adhere to the high-precision rule of less than 1 ULP error, and not everything is truly faster without FMA (at the very least they are not slower).

IMHO for PNG gamma it is definitely better to stick to a LUT and not depend on a math library. In practice, gamma LUTs are worse only for extra small images like 5x5, where everything is fast anyway.
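To make the LUT argument concrete, here is a minimal sketch of an 8-bit gamma lookup table. `build_gamma_lut` and `apply_lut` are hypothetical names and this is not zune-png's implementation. The sketch uses std's `f32::powf` to fill the table; in a no_std build that single call would have to come from somewhere else (libm, another implementation, or a precomputed table).

```rust
/// Builds a 256-entry table mapping each 8-bit sample to its
/// gamma-adjusted value. `powf` runs 256 times in total, regardless
/// of how many pixels the image has.
pub fn build_gamma_lut(gamma: f32) -> [u8; 256] {
    let mut lut = [0u8; 256];
    for (i, out) in lut.iter_mut().enumerate() {
        let normalized = i as f32 / 255.0;
        // Scale back to 0..=255 and round to nearest.
        *out = (normalized.powf(gamma) * 255.0 + 0.5) as u8;
    }
    lut
}

/// Applies the table in place: one array index per sample, no math calls.
pub fn apply_lut(pixels: &mut [u8], lut: &[u8; 256]) {
    for p in pixels.iter_mut() {
        *p = lut[usize::from(*p)];
    }
}
```

The fixed cost of 256 `powf` calls is why a LUT only loses on very small images: a 5x5 single-channel image has 25 samples, fewer than the table has entries.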

exp2f benches:

x86_64 bench without FMA

libm::exp2f             time:   [37.275 µs 37.285 µs 37.295 µs]

system::exp2f           time:   [30.890 µs 30.895 µs 30.901 µs]

moxcms::exp2f           time:   [27.584 µs 27.600 µs 27.630 µs]

x86_64 bench with FMA enabled

libm::exp2f             time:   [107.39 µs 107.40 µs 107.41 µs]
                        change: [+187.77% +188.12% +188.37%] (p = 0.00 < 0.05)
                        Performance has regressed.

system::exp2f           time:   [28.074 µs 28.077 µs 28.081 µs]
                        change: [−9.4598% −9.1601% −8.9084%] (p = 0.00 < 0.05)
                        Performance has improved.

moxcms::exp2f           time:   [20.787 µs 20.801 µs 20.814 µs]
                        change: [−24.767% −24.607% −24.456%] (p = 0.00 < 0.05)
                        Performance has improved.

aarch64 bench

libm::exp2f             time:   [19.988 µs 20.210 µs 20.464 µs]

system::exp2f           time:   [14.126 µs 14.276 µs 14.427 µs]

moxcms::exp2f           time:   [10.119 µs 10.163 µs 10.217 µs]

@etemesi254
Owner

> Just exactly for exp2f I could help. You could take it from here.

You pump out some amazing libraries :)

@bushrat011899 bushrat011899 mentioned this pull request May 19, 2025
@bushrat011899 bushrat011899 marked this pull request as draft May 19, 2025 22:33
@bushrat011899
Author

Marking as draft as work will continue in separate PRs based on this. See #264 for the first follow-up.

@bushrat011899
Author

> If anything, MUSL's libm has some of the worst accuracy among system math libraries, though this is almost always negligible for media applications. However, it's also extremely slow (all methods are at least 2-3x slower than system libraries, pow/powf 5-10x). If there's heavy math inside tight loops, it can drive execution time to insane levels.

First off, thanks for the information here! Looking at the paper myself though, it appears that MUSL's libm is actually quite competitive in accuracy for the subset of functions this PR would need for no_std compatibility:

Function   MUSL (ULP)   Best (ULP)
cosf       0.501        0.500
exp2f      0.502        0.500
floorf     -            -
powf       0.817        0.500
roundf     -            -
sinf       0.501        0.500
sqrtf      0.500        0.500
truncf     -            -
exp        0.511        0.500
sqrt       0.500        0.500

The outlier here is powf, but all others are very close to LLVM (which had the most accurate implementation for all of the relevant functions).

As for performance, libm may be slower by an order of magnitude in the worst case as you've identified, but that is preferable to the inability to run at all in no_std environments that is the status quo (at least in my opinion). I am perfectly happy to use an alternative to libm if the tradeoff between performance, accuracy, and reliability (libm is maintained by the Rust language team after all) is better met by something else though! All I'm trying to do here is bring compatibility without sacrificing performance for existing users.

@awxkee
Contributor

awxkee commented May 20, 2025

> The outlier here is powf, but all others are very close to LLVM (which had the most accurate implementation for all of the relevant functions).

Yep, and LLVM is also not really fast.

There is actually no magic here. IEEE math has been a well-investigated topic for the last 40 years: implementations can be accurate or fast, but not both. The issue with MUSL's, especially the Rust port of it, is that it is neither accurate nor fast.

> The outlier here is powf, but all others are very close to LLVM (which had the most accurate implementation for all of the relevant functions).

In math terms, the difference between 0.511 ULP and 0.5 ULP is huge, because an error of 0.511 ULP can produce 154345310000000000000000000000000 instead of 154345300000000000000000000000000 (see the difference?), which are the 2 nearest numbers that f32 can represent. But as I wrote, media applications usually don't operate on numbers of that order, so such precision isn't really required. Additionally, there's the concept of error density. A library might have a worst-case error of 0.511 ULP, but in 99.9% of cases the error could remain below 0.5 ULP. Another implementation might also peak at 0.511 ULP but hit that maximum error almost every time. And MUSL's is closer to the second one :) That's just a little theory; again, as I noted in the first place, such accuracy is not critical for media applications.
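For readers unfamiliar with the unit, the sketch below shows one common way to measure how many representable f32 values lie between two results, and how large one ULP is at a given magnitude. `ordered_key`, `ulp_distance`, and `ulp_size` are hypothetical helper names, and NaN inputs are not handled.

```rust
/// Maps an f32 to an integer so that adjacent floats map to adjacent
/// integers across the whole number line (both zeros map to 0).
fn ordered_key(x: f32) -> i32 {
    let bits = x.to_bits() as i32;
    // Negative floats have the sign bit set; mirror them below zero.
    if bits < 0 { i32::MIN - bits } else { bits }
}

/// Number of representable f32 steps between `a` and `b`.
/// NaN inputs give meaningless results in this sketch.
pub fn ulp_distance(a: f32, b: f32) -> u32 {
    ordered_key(a).abs_diff(ordered_key(b))
}

/// Gap between `x` and the next larger f32.
/// Assumes `x` is positive, finite, and below `f32::MAX`.
pub fn ulp_size(x: f32) -> f32 {
    let next = f32::from_bits(x.to_bits() + 1);
    next - x
}
```

Near 1.0 one ULP equals `f32::EPSILON`, while around 1.5e32 it is 2^83, roughly 1e25, which is why neighbouring values at that magnitude differ only in the seventh significant digit.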

Media applications are sensitive to speed and can easily tolerate some loss of accuracy.

> As for performance, libm may be slower by an order of magnitude in the worst case as you've identified, but that is preferable to the inability to run at all in no_std environments that is the status quo (at least in my opinion). I am perfectly happy to use an alternative to libm if the tradeoff between performance, accuracy, and reliability (libm is maintained by the Rust language team after all) is better met by something else though! All I'm trying to do here is bring compatibility without sacrificing performance for existing users.

That's not for me to decide.

But, speaking honestly, I don't completely get the point of combining no_std and IEEE math, because in most places where no_std is required an FPU is just not available (or prohibited from use because IEEE math is not deterministic and not finite), and that requires a completely different level of work. The opposite also holds: where an FPU is available, std is usually available too, along with the math libraries.

@awxkee
Contributor

awxkee commented May 21, 2025

> But, speaking honestly, I don't completely get the point of combining no_std and IEEE math, because in most places where no_std is required an FPU is just not available (or prohibited from use because IEEE math is not deterministic and not finite), and that requires a completely different level of work. The opposite also holds: where an FPU is available, std is usually available too, along with the math libraries.

I'd like to phrase this more clearly: if these image processing algorithms, which rely heavily on IEEE f32/f64, are truly intended to run on something like a Raspberry Pi Zero 2 W, they need to be redesigned case by case with really dirty math, because those devices are not designed for this kind of workload (and those are the devices that truly need no_std, since they run without an OS and the program needs to be flashed). On the other hand, if they are not intended for such devices, it might be more practical to simply disable them in no_std builds, or to use any other replacement, since they will still be running on powerful systems.
