feat!: detect SHA‐1 collision attacks #1915

Merged
merged 22 commits into from
Apr 3, 2025

Conversation

emilazy
Contributor

@emilazy emilazy commented Mar 31, 2025

Fix GHSA-2frx-2596-x5r6.

See commit messages for details, especially the last one, which has some background on the choices made here and benchmark numbers. This is best reviewed commit‐by‐commit.

Closes: #585

Tasks for @emilazy

  • split up `change!: move the hashing API to gix_hash` into a `feat` for `gix-hash`, a `change!` for `gix-features` afterwards, and add an `adapt to changes in gix-features and gix-hash` commit with all the fixes to the now-broken crates.
    • Conventional commit messages drive cargo smart-release and the changelog generation, so the changelog messages need to be precise. The idea is to not taint crates with breaking changes unnecessarily.
    • Yes, doing so means that once `gix-features` sees its functions removed, the tree will be broken until the next commit that fixes it.
  • `change!: make the hashing API return ObjectId` should be split up into the breaking change and an 'adapt' commit right after.
    • Very much the same as above.

More notes:

  • 'change!: migrate hash verification to the common interface' would also have been a candidate to split, but I thought these crates are tainted anyway by gix-hash, and the change is indeed breaking for all (or most) of them. So it's fine also in the changelogs.
    • This also applies to 'change!: use separate error type for I/O hashing operations'.

Byron's Tasks

  • refactor gix-hash tests
  • look at every line in detail

@emilazy emilazy force-pushed the push-qvyqmopsoltr branch 3 times, most recently from 47ad297 to cd6674e Compare March 31, 2025 01:48
@Byron
Member

Byron commented Mar 31, 2025

Thanks so much for tackling this!

It's a happy and a sad day at the same time. Without a review, I thought I'd give it a quick performance comparison with the operation I know stresses the hasher the most.

❯ hyperfine -w 1 -M 1 'gix --verbose no-repo pack verify ./tests/fixtures/repos/linux.git/objects/pack/pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.idx' './target/release/gix --verbose no-repo pack verify ./tests/fixtures/repos/linux.git/objects/pack/pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.idx'
Benchmark 1: gix --verbose no-repo pack verify ./tests/fixtures/repos/linux.git/objects/pack/pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.idx
  Time (abs ≡):        11.399 s               [User: 70.087 s, System: 2.946 s]

Benchmark 2: ./target/release/gix --verbose no-repo pack verify ./tests/fixtures/repos/linux.git/objects/pack/pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.idx
  Time (abs ≡):        27.211 s               [User: 229.958 s, System: 2.932 s]

Summary
  gix --verbose no-repo pack verify ./tests/fixtures/repos/linux.git/objects/pack/pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.idx ran
    2.39 times faster than ./target/release/gix --verbose no-repo pack verify ./tests/fixtures/repos/linux.git/objects/pack/pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.idx

And here are single runs, in comparison:

❯ cargo run --release --bin gix -- --verbose no-repo pack verify ./tests/fixtures/repos/linux.git/objects/pack/pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.idx
    Finished `release` profile [optimized] target(s) in 0.67s
     Running `target/release/gix --verbose no-repo pack verify ./tests/fixtures/repos/linux.git/objects/pack/pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.idx`
 10:54:09 Hash of index 'pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.idx' done 212.8MB in 0.42s (504.3MB/s)
 10:54:09                                           collecting sorted index done 7.6M entries in 0.48s (15.7M entries/s)
 10:54:10                                                          indexing done 7.6M objects in 1.77s (4.3M objects/s)
 10:54:11 Hash of pack 'pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.pack' done 1.4GB in 2.73s (498.8MB/s)
 10:54:35                                                         Resolving done 7.6M objects in 24.47s (310.5K objects/s)
 10:54:35                                                          Decoding done 96.0GB in 24.47s (3.9GB/s)
❯ gix --verbose no-repo pack verify ./tests/fixtures/repos/linux.git/objects/pack/pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.idx
 10:54:55 Hash of index 'pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.idx' done 212.8MB in 0.12s (1.8GB/s)
 10:54:56                                           collecting sorted index done 7.6M entries in 0.48s (15.8M entries/s)
 10:54:56 Hash of pack 'pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.pack' done 1.4GB in 0.75s (1.8GB/s)
 10:54:58                                                          indexing done 7.6M objects in 1.90s (4.0M objects/s)
 10:55:05                                                         Resolving done 7.6M objects in 7.69s (988.2K objects/s)
 10:55:05                                                          Decoding done 96.0GB in 7.69s (12.5GB/s)

Judging by a bare hash-object call to Git, it's so ridiculously slow that I have a feeling hash-object is not a good way to evaluate the actual hashing performance.

❯ time git hash-object -t blob ./tests/fixtures/repos/linux.git/objects/pack/pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.pack
f9e03f3b35b34bb0c992414a94e7e14e1a30e63e
git hash-object -t blob   28.65s user 0.35s system 99% cpu 29.054 total

gitoxide ( push-qvyqmopsoltr) [$?] took 29s
❯ l ./tests/fixtures/repos/linux.git/objects/pack/pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.pack
.rw-r--r--@ 1.3Gi byron staff 12 Aug  2020 ./tests/fixtures/repos/linux.git/objects/pack/pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.pack

Because this speed would put it at about ~50MB/s which is nonsense.
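
The arithmetic behind that estimate can be written out as a quick sanity check, using the pack size from the `l` listing and the wall time from `time` above; the helper function is purely illustrative:

```rust
// Sanity check of the throughput estimate above: a 1.3 GiB pack hashed
// in ~29 s comes out to roughly 48 MB/s, consistent with the "~50MB/s"
// figure quoted in the comment.
fn throughput_mb_per_s(bytes: f64, seconds: f64) -> f64 {
    bytes / seconds / 1_000_000.0
}

fn main() {
    let pack_bytes = 1.3 * 1024.0 * 1024.0 * 1024.0; // 1.3 GiB, per the listing
    let mb_s = throughput_mb_per_s(pack_bytes, 29.054); // wall time from `time`
    println!("{mb_s:.0} MB/s");
}
```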

I will take a closer look in the coming days.

@emilazy emilazy force-pushed the push-qvyqmopsoltr branch 6 times, most recently from c8f2692 to 34e72be Compare March 31, 2025 14:26
@emilazy
Contributor Author

emilazy commented Mar 31, 2025

This is now passing CI.

The performance impact is definitely sad, though hopefully the performance of most operations is less directly bound by SHA‐1 hashing speed. As I detailed in the commit message, I believe that significant gains could be made with a SIMD‐accelerated implementation, possibly even using the SHA instruction set extensions for some of the computation where available. I’m no SIMD expert, but may be increasingly on the path to nerd‐sniping myself into trying to implement that, especially if this impacts Jujutsu performance too much…

@Byron
Member

Byron commented Mar 31, 2025

Thanks so much!

I’m no SIMD expert, but may be increasingly on the path to nerd‐sniping myself into trying to implement that, especially if this impacts Jujutsu performance too much…

I don't think it will except for when cloning repositories, assuming gitoxide is used for that at all. Of course, creating a bunch of objects now is slower as well, but the time for that is probably still dominated by using DEFLATE for compression.

But there is an elephant in the room which has been camouflaged quite well: git2 will verify each loaded object to assure its hash actually matches what's advertised. Thus, if you already had a shattered object, it would not let you have it. Also, bitrot won't be an issue there (if it can be an issue at all, given that everything is INFLATEd before use).
However, gitoxide doesn't currently do that and has no option to turn it on; git2 allows turning it off, though.

Now that I've mentioned this, I think a viable next step is to make verification at least optional; once that is enabled, SHA1 will be in the path of every decoded object as well. Would that impact performance? Probably not much, but it will certainly be measurable (`ein t hours` reads all commits for analysis; 145k/s is my baseline, which would surely go down a bit then), but we'll get there when we get there.
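
Sketched very roughly, such opt-in verification could look like the following. The names, the `verify` flag, and the pluggable toy hash function are assumptions for illustration only, not gitoxide's actual API; the real thing would recompute the hardened SHA-1 of the decoded object:

```rust
// Hypothetical sketch of opt-in object verification: after decoding an
// object, recompute its id and compare it with the id it was looked up
// under. The hash function is pluggable here so the example stays
// self-contained.
#[derive(Debug, PartialEq, Clone, Copy)]
struct ObjectId([u8; 20]);

#[derive(Debug)]
struct HashMismatch {
    expected: ObjectId,
    actual: ObjectId,
}

fn verify_decoded(
    advertised: ObjectId,
    decoded: &[u8],
    hash: impl Fn(&[u8]) -> ObjectId,
    verify: bool, // gitoxide has no such switch today; git2 verifies by default
) -> Result<(), HashMismatch> {
    if !verify {
        return Ok(());
    }
    let actual = hash(decoded);
    if actual == advertised {
        Ok(())
    } else {
        Err(HashMismatch { expected: advertised, actual })
    }
}

fn main() {
    // Stand-in "hash": first bytes of the input, zero-padded.
    let toy_hash = |data: &[u8]| {
        let mut id = [0u8; 20];
        for (i, b) in data.iter().take(20).enumerate() {
            id[i] = *b;
        }
        ObjectId(id)
    };
    let data: &[u8] = b"some decoded object";
    assert!(verify_decoded(toy_hash(data), data, toy_hash, true).is_ok());
    assert!(verify_decoded(ObjectId([0xaa; 20]), data, toy_hash, true).is_err());
    println!("verification sketch ok");
}
```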

@emilazy
Contributor Author

emilazy commented Mar 31, 2025

I don't think it will except for when cloning repositories, assuming gitoxide is used for that at all.

Right. We shell out to git(1) for fetches and pushes for maximum compatibility currently; previously we were using git2. So no impact there. I was more thinking about snapshotting large repositories or doing large rebases (or both, if you edit a commit some way down the tree), since those could potentially create a bunch of objects. But it’s probably not too significant compared to other overhead. Still, if I end up cooking up something with SIMD and it seems to bring back some of the performance I’ll let you know :)

(Does git(1) do that verification? git2 fetch and push performance was always very bad for us, and I wonder if that’s part of why.)

@Byron
Member

Byron commented Mar 31, 2025

Yes, I will be looking forward to that SIMD implementation of a collision-proof SHA1 :).

❯ hyperfine 'sha256 ./tests/fixtures/repos/linux.git/objects/pack/pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.pack' 'sha1 ./tests/fixtures/repos/linux.git/objects/pack/pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.pack'
Benchmark 1: sha256 ./tests/fixtures/repos/linux.git/objects/pack/pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.pack
  Time (mean ± σ):     810.4 ms ±   0.6 ms    [User: 613.6 ms, System: 195.6 ms]
  Range (min … max):   809.3 ms … 811.2 ms    10 runs

Benchmark 2: sha1 ./tests/fixtures/repos/linux.git/objects/pack/pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.pack
  Time (mean ± σ):     835.5 ms ±   4.1 ms    [User: 638.5 ms, System: 195.7 ms]
  Range (min … max):   832.5 ms … 846.7 ms    10 runs

(Does git(1) do that verification? git2 fetch and push performance was always very bad for us, and I wonder if that’s part of why.)

No, it does not.

❯ cp .git/objects/02/bfae32bb6647df0f892205edf660a2dffb421a .git/objects/02/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

git ( master) [?] via 🐍
❯ git cat-file -p 02aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
tree d124b9de1a1b1a243f85312003b7fd3d2ee7bfe6
parent a3998d426d4868eb6d49d443ad4298c0f16c2ab4
author Sebastian Thiel <[email protected]> 1691561336 +0200
committer Sebastian Thiel <[email protected]> 1691561336 +0200

refresh fix-git-mv-existing-dir-non%

Probably this is delegated to git fsck which is expected to run from time to time.

Regarding git2, you can probably experiment with some options:

Member

@Byron Byron left a comment


I took a closer look now, while still focussing on the bigger picture, commit by commit, and liked it very much.

The only reason for requesting changes is splitting commits, and that I am happy to do for you if the instructions I left in the PR description aren't clear or you think I can do it faster. Since this touches on retaining authorship, and creating new commits would force me to write messages 'under your name', I thought I shouldn't just do it though 😁.

Looking at the last commit I thought for a moment that maybe it's not worth splitting commits, because ultimately everything gets touched in one commit anyway (and it's good to do that with one commit message to spread the news into the changelogs of many crates), but then again I think it should be done just to get better changelog messages.
Again, it's kind of a detail and I am happy to do it.

Once that is decided I think I can finish the review and make remaining changes myself on top of the existing commits.

Assorted notes
  • verify(checksum) is gold
  • the performance tests are great, and I don't feel too bad about it
  • overall I am glad this is done now

@emilazy emilazy force-pushed the push-qvyqmopsoltr branch from 34e72be to bac5c78 Compare April 2, 2025 16:36
emilazy added 11 commits April 2, 2025 17:37
Fix [GHSA-2frx-2596-x5r6].

[GHSA-2frx-2596-x5r6]: GHSA-2frx-2596-x5r6

This uses the `sha1-checked` crate from the RustCrypto project. It’s
a pure Rust implementation, with no SIMD or assembly code.

The hashing implementation moves to `gix-hash`, as it no longer
depends on any feature configuration. I wasn’t sure the ideal
crate to put this in, but after checking reverse dependencies on
crates.io, it seems like there’s essentially no user of `gix-hash`
that wouldn’t be pulling in a hashing implementation anyway, so I
think this is a fine and logical place for it to be.

A fallible API seems better than killing the process as Git does,
since we’re in a library context and it would be bad if you could
perform denial‐of‐service attacks on a server by sending it hash
collisions. (Although there are probably cheaper ways to mount a
denial‐of‐service attack.)

The new API also returns an `ObjectId` rather than `[u8; 20]`; the
vast majority of `Hasher::digest()` users immediately convert the
result to `ObjectId`, so this will help eliminate a lot of cruft
across the tree. `ObjectId` also has nicer `Debug` and `Display`
instances than `[u8; 20]`, and should theoretically make supporting
the hash function transition easier, although I suspect further API
changes will be required for that anyway. I wasn’t sure whether
this would be a good change, as not every digest identifies an
entry in the Git object database, but even many of the existing
uses for non‐object digests across the tree used the `ObjectId`
API anyway. Perhaps it would be best to have a separate non‐alias
`Digest` type that `ObjectId` wraps, but this seems like the pragmatic
choice for now that sticks with current practice.
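
The shape argued for in the two paragraphs above, fallible finalization that returns an `ObjectId`, can be sketched with a toy stand-in. The mixing step and names here are illustrative assumptions, not the actual `sha1-checked`-backed implementation:

```rust
// Toy sketch of a fallible digest API: fail instead of aborting the
// process (as Git does), and return an ObjectId rather than [u8; 20].
#[derive(Debug, PartialEq)]
struct ObjectId([u8; 20]);

#[derive(Debug)]
struct CollisionError;

struct Hasher {
    state: [u8; 20],
    collision_detected: bool,
}

impl Hasher {
    fn new() -> Self {
        Hasher { state: [0; 20], collision_detected: false }
    }

    fn update(&mut self, data: &[u8]) {
        // Toy mixing step; the real hasher runs hardened SHA-1 rounds and
        // records whether a known disturbance vector matched.
        for (i, b) in data.iter().enumerate() {
            self.state[i % 20] ^= b.wrapping_add(i as u8);
        }
    }

    fn try_finalize(self) -> Result<ObjectId, CollisionError> {
        if self.collision_detected {
            Err(CollisionError)
        } else {
            Ok(ObjectId(self.state))
        }
    }
}

fn main() {
    let mut h = Hasher::new();
    h.update(b"blob 4\0test");
    match h.try_finalize() {
        Ok(id) => println!("digest prefix: {:?}", &id.0[..4]),
        Err(CollisionError) => eprintln!("collision attack detected"),
    }
}
```

A server processing untrusted input can surface the `Err` case as a protocol error instead of dying, which is the denial-of-service concern described above.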

The old API remains in this commit, as well as a temporary
non‐fallible but `ObjectId`‐returning `Hasher::finalize()`,
pending the migration of all in‐tree callers.

I named the module `gix_hash::hasher` since `gix_hash::hash` seemed
like it would be confusing. This does mean that there is a function
and module with the same name, which is permitted but perhaps a
little strange.

Everything is re‐exported directly other than
`gix_features::hash::Write`, which moves along with the I/O
convenience functions into a new public submodule and becomes
`gix_hash::hasher::io::Write`, as that seems like a clearer name
to me, being akin to the `gix_hash::hasher` function but as an
`std::io::Write` wrapper.

Raw hashing is somewhere around 0.25× to 0.65× the speed of the
previous implementation, depending on the feature configuration
and whether the CPU supports hardware‐accelerated hashing. (The
more portable assembly in `sha1-asm` that doesn’t require the SHA
instruction set doesn’t seem to speed things up that much; in fact,
`sha1_smol` somehow regularly beats the assembly code used by `sha1`
on my i9‐9880H MacBook Pro! Presumably this is why that path was
removed in newer versions of the `sha1` crate.)

Performance on an end‐to‐end `gix no-repo pack verify` benchmark
using pack files from the Linux kernel Git server measures around
0.41× to 0.44× compared to the base commit on an M2 Max and a
Ryzen 7 5800X, both of which have hardware instructions for SHA‐1
acceleration that the previous implementation uses but this one does
not. On the i9‐9880H, it’s around 0.58× to 0.60× the speed;
the slowdown is reduced by the older hardware’s lack of SHA‐1
instructions.

The `sha1collisiondetection` crate from the Sequoia PGP project,
based on a modified C2Rust translation of the library used by Git,
was also considered; although its raw hashing performance seems
to measure around 1.12–1.15× the speed of `sha1-checked` on
x86, it’s indistinguishable from noise on the end‐to‐end
benchmark, and on an M2 Max `sha1-checked` is consistently
around 1.03× the speed of `sha1collisiondetection` on that
benchmark. The `sha1collisiondetection` crate has also had a
soundness issue in the past due to the automatic C translation,
whereas `sha1-checked` has only one trivial `unsafe` block. On the
other hand, `sha1collisiondetection` is used by both Sequoia itself
and the `gitoid` crate, whereas rPGP is the only major user of
`sha1-checked`. I don’t think there’s a clear winner here.

The performance regression is very unfortunate, but the [SHAttered]
attack demonstrated a collision back in 2017, and the 2020 [SHA‐1 is
a Shambles] attack demonstrated a practical chosen‐prefix collision
that broke the use of SHA‐1 in OpenPGP, costing $75k to perform,
with an estimate of $45k to replicate at the time of publication and
$11k for a classical collision.

[SHAttered]: https://shattered.io/
[SHA‐1 is a Shambles]: https://sha-mbles.github.io/

Given the increase in GPU performance and production since then,
that puts the Git object format squarely at risk. Git mitigated this
attack in 2017; the algorithm is fairly general and detects all the
existing public collisions. My understanding is that an entirely new
cryptanalytic approach would be required to develop a collision attack
for SHA‐1 that would not be detected with very high probability.

I believe that the speed penalty could be mitigated, although not
fully eliminated, by implementing a version of the hardened SHA‐1
function that makes use of SIMD. For instance, the assembly code used
by `openssl speed sha1` on my i9‐9880H measures around 830 MiB/s,
compared to the winning 580 MiB/s of `sha1_smol`; adding collision
detection support to that would surely incur a performance penalty,
but it is likely that it could be much more competitive with
the performance before this commit than the 310 MiB/s I get with
`sha1-checked`. I haven’t been able to find any existing work on
this; it seems that more or less everyone just uses the original
C library that Git does, presumably because nothing except Git and
OpenPGP is still relying on SHA‐1 anyway…

The performance will never compete with the >2 GiB/s that can
be achieved with the x86 SHA instruction set extension, as the
`SHA1RNDS4` instruction sadly runs four rounds at a time while the
collision detection algorithm requires checks after every round,
but I believe SIMD would still offer a significant improvement,
and the AArch64 extension seems like it may be more flexible.

I know that these days the Git codebase has an additional faster
unsafe API without these checks that it tries to carefully use only
for operations that do not depend on hashing results for correctness
or safety. I personally believe that’s not a terribly good idea,
as it seems easy to misuse in a case where correctness actually does
matter, but maybe that’s just my Rust safety bias talking. I think
it would be better to focus on improving the performance of the safer
algorithm, as I think that many of the operations where the performance
penalty is the most painful are dealing with untrusted input anyway.

The `Hasher` struct gets a lot bigger; I don’t know if this is
an issue or not, but if it is, it could potentially be boxed.

Closes: GitoxideLabs#585
The hashing API has moved to `gix_hash::hasher`, and we now use
`sha1-checked` unconditionally.
Complete the transition to `ObjectId` returns.
@emilazy emilazy force-pushed the push-qvyqmopsoltr branch from bac5c78 to fdcc33b Compare April 2, 2025 16:38
@emilazy
Contributor Author

emilazy commented Apr 2, 2025

Makes sense about the changelogs. I really hate to break bisection though, so I’ve reorganized the commits here to hopefully achieve the same goal of no inapplicable changelog entries while attempting to keep the tree green on every commit. The final tree is almost identical save a few more touch‐ups I noticed, but the path there is different, and probably better overall, although there was a bit of ugly contortion around avoiding a cyclic dependency when moving the hashing API and a temporary rename for migration purposes; I find myself really wishing that cargo-smart-release let you mark something as breaking for one sub‐tree a commit touches but as something else for others, but maybe this isn’t a desire you run into often. Let me know if you hate it more than breaking bisection and I can swap things around again to be temporarily broken :)

I don’t mind you pushing to the PR if you want, although if you have other feedback I’d be happy to address it myself to get a better understanding of what’s expected for future contributions.

@emilazy emilazy changed the title fix!: detect SHA‐1 collision attacks feat!: detect SHA‐1 collision attacks Apr 2, 2025
/// Return the actual checksum on success or [`checksum::Error`] if there is a mismatch.
pub fn verify_checksum(&self) -> Result<gix_hash::ObjectId, checksum::Error> {
// Even though we could use gix_hash::bytes_of_file(…), this would require extending our
// Error type to support io::Error. As we only gain progress, there probably isn't much value // as these files are usually small enough to process them in less than a second, even for the large ones.
Contributor


missing line break?

Contributor Author


Good catch, thanks! Fixed.

emilazy added 9 commits April 2, 2025 21:15
This mostly just affects return types – using
`git_hash::verify::Error` instead of bespoke duplicated versions
thereof, and occasionally returning an `ObjectId` instead of `()`
for convenience.
Prepare for hashing becoming fallible.
This does mean a lot of churn across the tree, but the change is
usually just an adjustment to variants of an existing error type,
so I expect that most downstream users will require little to no
adaption for this change.
`compute_stream_hash` is already fallible, so we don’t want to keep
the `try_*` prefix on the fallible API.
Since the APIs were already adjusted and all callers migrated, we
only need to drop the migration shims.
@emilazy emilazy force-pushed the push-qvyqmopsoltr branch from fdcc33b to a68f115 Compare April 2, 2025 20:15
@Byron
Member

Byron commented Apr 3, 2025

Thanks a lot!

Makes sense about the changelogs. I really hate to break bisection though, so I’ve reorganized the commits here to hopefully achieve the same goal of no inapplicable changelog entries while attempting to keep the tree green on every commit. The final tree is almost identical save a few more touch‐ups I noticed, but the path there is different, and probably better overall, although there was a bit of ugly contortion around avoiding a cyclic dependency when moving the hashing API and a temporary rename for migration purposes; I find myself really wishing that cargo-smart-release let you mark something as breaking for one sub‐tree a commit touches but as something else for others, but maybe this isn’t a desire you run into often.

I have many desires when it comes to cargo smart-release, but they are all overruled by not wanting to spend any more time than I already have on tooling that should be part of the toolchain :D. cargo smart-release was very costly, and now it's time for it to pay off, probably for another 10 years 😅.

Let me know if you hate it more than breaking bisection and I can swap things around again to be temporarily broken :)

I sense some pain in the paragraph above and don't want to inflict more. From a versioning perspective this changes nothing and I think it's fine to just continue as is, focussing on getting this merged today which I think it will.

I don’t mind you pushing to the PR if you want, although if you have other feedback I’d be happy to address it myself to get a better understanding of what’s expected for future contributions.

That's fair - if there should be major changes I will leave them to you, but small ones I will just do myself on top of yours. My feeling is the PR will be merged when these are done.

Byron added 2 commits April 3, 2025 13:06
- align `gix-hash` integration tests with 'new' style.
- reorganize `hasher` module to avoid duplicate paths to the same types/functions.
- use shorter paths to `gix_hash::hasher|io` everywhere.
@Byron
Member

Byron commented Apr 3, 2025

Thanks a million, this is it!

The good times with incredibly high hashing performance are over, at least until SHA256 lands (#281). If that's no motivation then I don't know what is :).

More notes
  • I now realize the size of the hasher changed from ~100 bytes to ~800; an 8× increase is surprising. It turns out that the DetectionState is 724 bytes! Not much to do about that, apparently.
  • I really like the numerous fly-by improvements, and tried to add my own in the case of gix-index.

@Byron Byron enabled auto-merge April 3, 2025 05:27
@Byron Byron merged commit 4660f7a into GitoxideLabs:main Apr 3, 2025
21 checks passed
@emilazy
Contributor Author

emilazy commented Apr 3, 2025

Yeah, we could maybe box the hasher? Not sure if it’d help or hurt performance.

FWIW, exporting gix_hash::hasher::io as gix_hash::io actually wasn’t intended. I think that use gix_hash::hasher; and hasher::io::{Write, Error} make more sense, because they’re specifically for I/O operations that do hashing, and gix_hash::io::Error has no relation to gix_hash::Error, but does relate to gix_hash::hasher::Error. Of course the API design is up to your judgement here, but I would personally recommend making gix_hash::hasher::io the canonical path and not exposing gix_hash::io.
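
A minimal sketch of the layout recommended here, with `gix_hash::hasher::io` as the canonical path and the three error types kept distinct; the module and type names follow the comment, while the bodies are stand-ins:

```rust
// Hypothetical module layout: no top-level `gix_hash::io` alias.
mod gix_hash {
    /// Hash-parsing error; unrelated to I/O hashing, per the comment above.
    #[derive(Debug)]
    pub struct Error;

    pub mod hasher {
        /// Collision-detection failure during hashing.
        #[derive(Debug)]
        pub struct Error;

        pub mod io {
            /// Error for I/O operations that also hash: either an I/O
            /// failure or a hashing failure (`hasher::Error`), with no
            /// relation to the top-level `gix_hash::Error`.
            #[derive(Debug)]
            pub enum Error {
                Io(std::io::Error),
                Hasher(super::Error),
            }
        }
    }
}

fn main() {
    // Callers spell the I/O error with its full, canonical path:
    let e = gix_hash::hasher::io::Error::Hasher(gix_hash::hasher::Error);
    println!("{e:?}");
}
```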

@emilazy emilazy deleted the push-qvyqmopsoltr branch April 3, 2025 16:11
@EliahKagan
Member

I've opened rustsec/advisory-db#2268 to add a RUSTSEC advisory.

(Also, an entry in the GitHub Advisory Database -- i.e. the global version of GHSA-2frx-2596-x5r6 -- will likely be added in the next few days, along with a National Vulnerability Database entry for the associated CVE-2025-31130. These things are taken care of by the GitHub security team.)

@EliahKagan
Member

The vulnerability fixed here has been assigned RUSTSEC-2025-0021.

EliahKagan added a commit to EliahKagan/gitoxide that referenced this pull request Apr 5, 2025
Due to ABI differences between different 32-bit targets, the
`size_of_hasher` test wrongly failed on `i686-pc-windows-msvc`.

Although the test case with that name was introduced in GitoxideLabs#1915, the
failure is actually long-standing, in that an analogous failure
occurred in the old `size_of_sha1` test that preceded it and on
which it is based. That failure only happened when the old
`fast-sha1` feature was enabled, and not with the old `rustsha1`
feature. It was not detected earlier as that target is irregularly
tested, and built with `--no-default-features --features max-pure`
more often than other targets due to difficulties building some
other non-Rust dependencies for it. Since GitoxideLabs#1915, the failure
happens more often, since we now use only one SHA-1 implementation,
`sha1-checked`, so the test always fails on `i686-pc-windows-msvc`.

This changes the test to use `gix_testtools::size_ok`, which makes
a `==` comparison on 64-bit targets but a `<=` comparison on 32-bit
targets where there tends to be more variation in data structures'
sizes. This is similar to the fixes in GitoxideLabs#1687 (77c3c59, fc13fc3).
EliahKagan added a commit to EliahKagan/gitoxide that referenced this pull request Apr 5, 2025
Due to ABI differences between different 32-bit targets, the
`size_of_hasher` test wrongly failed on `i686-pc-windows-msvc`.

Although the test case with that name was introduced in GitoxideLabs#1915, the
failure is actually long-standing, in that an analogous failure
occurred in the old `size_of_sha1` test that preceded it and on
which it is based. That failure only happened when the old
`fast-sha1` feature was enabled, and not with the old `rustsha1`
feature. It was not detected earlier as that target is irregularly
tested, and built with `--no-default-features --features max-pure`
more often than other targets due to difficulties building some
other non-Rust dependencies on it (when not cross-compiling).

Since GitoxideLabs#1915, the failure is easier to detect, since we now use only
one SHA-1 implementation, `sha1-checked`, so the test always fails
on `i686-pc-windows-msvc`. This is only a bug in the test itself,
not in any of the code under test.

This commit changes the test to use `gix_testtools::size_ok`, which
makes a `==` comparison on 64-bit targets but a `<=` comparison on
32-bit targets where there tends to be more variation in data
structures' sizes. This is similar to some of the size assertion
fixes in GitoxideLabs#1687 (77c3c59, fc13fc3).