Commit cd6674e

fix!: detect SHA‐1 collision attacks
Fix [GHSA-2frx-2596-x5r6].

[GHSA-2frx-2596-x5r6]: GHSA-2frx-2596-x5r6

This uses the `sha1-checked` crate from the RustCrypto project. It’s a pure Rust implementation, with no SIMD or assembly code. Raw hashing is somewhere around 0.25× to 0.65× the speed of the previous implementation, depending on the feature configuration and whether the CPU supports hardware‐accelerated hashing. (The more portable assembly in `sha1-asm` that doesn’t require the SHA instruction set doesn’t seem to speed things up that much; in fact, `sha1_smol` somehow regularly beats the assembly code used by `sha1` on my i9‐9880H MacBook Pro! Presumably this is why that path was removed in newer versions of the `sha1` crate.)

Performance on an end‐to‐end `gix no-repo pack verify` benchmark using pack files from the Linux kernel Git server measures around 0.41× to 0.44× compared to the base commit on an M2 Max and a Ryzen 7 5800X, both of which have hardware instructions for SHA‐1 acceleration that the previous implementation uses but this one does not. On the i9‐9880H, it’s around 0.58× to 0.60× the speed; the slowdown is reduced by the older hardware’s lack of SHA‐1 instructions.

The `sha1collisiondetection` crate from the Sequoia PGP project, based on a modified C2Rust translation of the library used by Git, was also considered; although its raw hashing performance seems to measure around 1.12–1.15× the speed of `sha1-checked` on x86, it’s indistinguishable from noise on the end‐to‐end benchmark, and on an M2 Max `sha1-checked` is consistently around 1.03× the speed of `sha1collisiondetection` on that benchmark. The `sha1collisiondetection` crate has also had a soundness issue in the past due to the automatic C translation, whereas `sha1-checked` has only one trivial `unsafe` block. On the other hand, `sha1collisiondetection` is used by both Sequoia itself and the `gitoid` crate, whereas rPGP is the only major user of `sha1-checked`. I don’t think there’s a clear winner here.

The performance regression is very unfortunate, but the [SHAttered] attack demonstrated a collision back in 2017, and the 2020 [SHA‐1 is a Shambles] attack demonstrated a practical chosen‐prefix collision that broke the use of SHA‐1 in OpenPGP, costing $75k to perform, with an estimate of $45k to replicate at the time of publication and $11k for a classical collision.

[SHAttered]: https://shattered.io/
[SHA‐1 is a Shambles]: https://sha-mbles.github.io/

Given the increase in GPU performance and production since then, that puts the Git object format squarely at risk. Git mitigated this attack in 2017; the algorithm is fairly general and detects all the existing public collisions. My understanding is that an entirely new cryptanalytic approach would be required to develop a collision attack for SHA‐1 that would not be detected with very high probability.

I believe that the speed penalty could be mitigated, although not fully eliminated, by implementing a version of the hardened SHA‐1 function that makes use of SIMD. For instance, the assembly code used by `openssl speed sha1` on my i9‐9880H measures around 830 MiB/s, compared to the winning 580 MiB/s of `sha1_smol`; adding collision detection support to that would surely incur a performance penalty, but it is likely that it could be much more competitive with the performance before this commit than the 310 MiB/s I get with `sha1-checked`.

I haven’t been able to find any existing work on this; it seems that more or less everyone just uses the original C library that Git does, presumably because nothing except Git and OpenPGP is still relying on SHA‐1 anyway… The performance will never compete with the >2 GiB/s that can be achieved with the x86 SHA instruction set extension, as the `SHA1RNDS4` instruction sadly runs four rounds at a time while the collision detection algorithm requires checks after every round, but I believe SIMD would still offer a significant improvement, and the AArch64 extension seems like it may be more flexible.

I know that these days the Git codebase has an additional faster unsafe API without these checks that it tries to carefully use only for operations that do not depend on hashing results for correctness or safety. I personally believe that’s not a terribly good idea, as it seems easy to misuse in a case where correctness actually does matter, but maybe that’s just my Rust safety bias talking. I think it would be better to focus on improving the performance of the safer algorithm, as I think that many of the operations where the performance penalty is the most painful are dealing with untrusted input anyway.

The `Hasher` struct gets a lot bigger; I don’t know if this is an issue or not, but if it is, it could potentially be boxed.

Closes: #585
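
The throughput figures quoted in the commit message for the i9‐9880H can be related to one another with a little arithmetic. The sketch below is only illustrative: the constant and function names are made up for this example, and the numbers are the rounded MiB/s values from the message, not new measurements.

```rust
/// Rounded throughput figures (MiB/s) quoted in the commit message for an i9-9880H.
/// The constant names are hypothetical and exist only for this sketch.
const OPENSSL_ASM_MIB_S: f64 = 830.0;
const SHA1_SMOL_MIB_S: f64 = 580.0;
const SHA1_CHECKED_MIB_S: f64 = 310.0;

/// Speed of `candidate` relative to `baseline`; 1.0 means equal throughput.
fn relative_speed(candidate: f64, baseline: f64) -> f64 {
    candidate / baseline
}

/// Seconds needed to hash `mib` mebibytes at `mib_per_s` MiB/s.
fn seconds_to_hash(mib: f64, mib_per_s: f64) -> f64 {
    mib / mib_per_s
}

/// Print the ratios and the time each implementation would need for 1 GiB.
fn print_summary() {
    let vs_smol = relative_speed(SHA1_CHECKED_MIB_S, SHA1_SMOL_MIB_S);
    let vs_openssl = relative_speed(SHA1_CHECKED_MIB_S, OPENSSL_ASM_MIB_S);
    println!("sha1-checked vs sha1_smol:   {vs_smol:.2}x");
    println!("sha1-checked vs openssl asm: {vs_openssl:.2}x");
    for (name, rate) in [
        ("openssl asm", OPENSSL_ASM_MIB_S),
        ("sha1_smol", SHA1_SMOL_MIB_S),
        ("sha1-checked", SHA1_CHECKED_MIB_S),
    ] {
        println!("{name}: {:.2} s per GiB", seconds_to_hash(1024.0, rate));
    }
}
```

Dividing 310 by 580 gives roughly 0.53, which sits inside the 0.25× to 0.65× range the message reports for raw hashing.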
1 parent 6c46a07 commit cd6674e

22 files changed: +90 -149 lines changed
.github/workflows/ci.yml

+1-1
@@ -391,7 +391,7 @@ jobs:
       - name: features of gix-features
         run: |
           set +x
-          for feature in progress fs-walkdir-parallel parallel io-pipe crc32 zlib zlib-rust-backend fast-sha1 rustsha1 cache-efficiency-debug; do
+          for feature in progress fs-walkdir-parallel parallel io-pipe crc32 zlib zlib-rust-backend cache-efficiency-debug; do
             (cd gix-features && cargo build --features "$feature" --target "$TARGET")
           done
       - name: crates with 'wasm' feature

.github/workflows/release.yml

+1-2
@@ -137,8 +137,7 @@ jobs:
             os: windows-latest
           - target: aarch64-pc-windows-msvc
             os: windows-latest
-        # on linux we build with musl which causes trouble with open-ssl. For now, just build max-pure there
-        # even though we could also build with `--features max-control,http-client-reqwest,gitoxide-core-blocking-client,gix-features/fast-sha1` for fast hashing.
+        # on linux we build with musl which causes trouble with open-ssl. For now, just build max-pure there.
         # It's a TODO.
         exclude:
           - target: x86_64-unknown-linux-musl

Cargo.lock

+6-7
Some generated files are not rendered by default.

Cargo.toml

+8-12
@@ -45,7 +45,7 @@ max = ["max-control", "fast", "gitoxide-core-tools-query", "gitoxide-core-tools-
 ## transports as it uses Rust's HTTP implementation.
 ##
 ## As fast as possible, with TUI progress, progress line rendering with auto-configuration, all transports available but less mature pure Rust HTTP implementation, all `ein` tools, CLI colors and local-time support, JSON output, regex support for rev-specs.
-max-pure = ["max-control", "gix-features/rustsha1", "gix-features/zlib-rust-backend", "http-client-reqwest", "gitoxide-core-blocking-client"]
+max-pure = ["max-control", "gix-features/zlib-rust-backend", "http-client-reqwest", "gitoxide-core-blocking-client"]

 ## Like `max`, but with more control for configuration. See the *Package Maintainers* headline for more information.
 max-control = ["tracing", "fast-safe", "pretty-cli", "gitoxide-core-tools", "prodash-render-line", "prodash-render-tui", "prodash/render-line-autoconfigure", "gix/revparse-regex"]
@@ -60,7 +60,7 @@ lean = ["fast", "tracing", "pretty-cli", "http-client-curl", "gitoxide-core-tool
 ## This build is essentially limited to local operations without any fanciness.
 ##
 ## Optimized for size, no parallelism thus much slower, progress line rendering.
-small = ["pretty-cli", "gix-features/rustsha1", "gix-features/zlib-rust-backend", "prodash-render-line", "is-terminal"]
+small = ["pretty-cli", "gix-features/zlib-rust-backend", "prodash-render-line", "is-terminal"]

 ## Like lean, but uses Rusts async implementations for networking.
 ##
@@ -74,7 +74,7 @@ small = ["pretty-cli", "gix-features/rustsha1", "gix-features/zlib-rust-backend"
 lean-async = ["fast", "tracing", "pretty-cli", "gitoxide-core-tools", "gitoxide-core-tools-query", "gitoxide-core-tools-corpus", "gitoxide-core-async-client", "prodash-render-line"]

 #! ### Package Maintainers
-#! `*-control` features leave it to you to configure C libraries, involving choices for `zlib`, ! hashing and transport implementation.
+#! `*-control` features leave it to you to configure C libraries, involving choices for `zlib` and transport implementation.
 #!
 #! Additional features *can* be provided with `--features` and are handled by the [`gix-features` crate](https://docs.rs/gix-features/latest).
 #! If nothing else is specified, the Rust implementation is used. ! Note that only one feature of each section can be enabled at a time.
@@ -84,28 +84,25 @@ lean-async = ["fast", "tracing", "pretty-cli", "gitoxide-core-tools", "gitoxide-
 #! - `gix-features/zlib-ng-compat`
 #! - `gix-features/zlib-stock`
 #! - `gix-features/zlib-rust-backend` (*default if no choice is made*)
-#! * **sha1**
-#! - `gix-features/fast-sha1`
-#! - `gix-features/rustsha1` (*default if no choice is made*)
 #! * **HTTP** - see the *Building Blocks for mutually exclusive networking* headline
 #!
 #! #### Examples
 #!
 #! * `cargo build --release --no-default-features --features max-control,gix-features/zlib-stock,gitoxide-core-blocking-client,http-client-curl`
 #! - Create a build just like `max`, but using the stock `zlib` library instead of `zlib-ng`
-#! * `cargo build --release --no-default-features --features max-control,http-client-reqwest,gitoxide-core-blocking-client,gix-features/fast-sha1`
-#! - Create a build just like `max-pure`, but with faster hashing due to `fast-sha1`.
+#! * `cargo build --release --no-default-features --features max-control,http-client-reqwest,gitoxide-core-blocking-client,gix-features/zlib-ng`
+#! - Create a build just like `max-pure`, but with faster compression due to `zlib-ng`.

 #! ### Building Blocks
 #! Typical combinations of features of our dependencies, some of which are referred to in the `gitoxide` crate's code for conditional compilation.

 ## Makes the crate execute as fast as possible by supporting parallel computation of otherwise long-running functions
-## as well as fast, hardware accelerated hashing, along with a faster zlib backend.
+## as well as a faster zlib backend.
 ## If disabled, the binary will be visibly smaller.
 fast = ["gix/max-performance", "gix/comfort"]

 ## Makes the crate execute as fast as possible by supporting parallel computation of otherwise long-running functions
-## as well as fast, hardware accelerated hashing, along with a faster zlib backend.
+## as well as a faster zlib backend.
 ## If disabled, the binary will be visibly smaller.
 fast-safe = ["gix/max-performance-safe", "gix/comfort"]

@@ -205,8 +202,7 @@ gix-hash = { opt-level = 3 }
 gix-actor = { opt-level = 3 }
 gix-config = { opt-level = 3 }
 miniz_oxide = { opt-level = 3 }
-sha1 = { opt-level = 3 }
-sha1_smol = { opt-level = 3 }
+sha1-checked = { opt-level = 3 }

 [profile.release]
 overflow-checks = false

SHORTCOMINGS.md

-4
@@ -35,7 +35,3 @@ This file is for tracking features that are less well implemented or less powerf
 * **gix-url** _might_ be more restrictive than what git allows as for the most part, it uses a browser grade URL parser.
 * Thus far there is no proof for this, and as _potential remedy_ we could certainly re-implement exactly what git does
 to handle its URLs.
-
-### `gix-features`
-
-* **sha1** isn't hardened (i.e. doesn't have collision detection). Needs [to be contributed](https://github.com/GitoxideLabs/gitoxide/issues/585).

crate-status.md

-2
@@ -894,8 +894,6 @@ See its [README.md](https://github.com/GitoxideLabs/gitoxide/blob/main/gix-lock/
 * `in_parallel`
 * `join`
 * _When off all functions execute serially_
-* **fast-sha1**
-* provides a faster SHA1 implementation using CPU intrinsics
 * [x] API documentation

 ### gix-tui

gix-commitgraph/Cargo.toml

+1-1
@@ -20,7 +20,7 @@ doctest = false
 serde = ["dep:serde", "gix-hash/serde", "bstr/serde"]

 [dependencies]
-gix-features = { version = "^0.40.0", path = "../gix-features", features = ["rustsha1"] }
+gix-features = { version = "^0.40.0", path = "../gix-features" }
 gix-hash = { version = "^0.16.0", path = "../gix-hash" }
 gix-chunk = { version = "^0.4.11", path = "../gix-chunk" }

gix-features/Cargo.toml

+3-26
@@ -80,40 +80,25 @@ zlib-stock = ["zlib", "flate2?/zlib"]
 ## may build in environments where other backends don't.
 zlib-rust-backend = ["zlib", "flate2?/rust_backend"]

-#! ### Mutually Exclusive SHA1
-## A fast SHA1 implementation is critical to `gitoxide's` object database performance
-## A multi-crate implementation that can use hardware acceleration, thus bearing the potential for up to 2Gb/s throughput on
-## CPUs that support it, like AMD Ryzen or Intel Core i3, as well as Apple Silicon like M1.
-## Takes precedence over `rustsha1` if both are specified.
-fast-sha1 = ["dep:sha1"]
-## A standard and well performing pure Rust implementation of Sha1. Will significantly slow down various git operations.
-rustsha1 = ["dep:sha1_smol"]
-
 #! ### Other

 ## Count cache hits and misses and print that debug information on drop.
 ## Caches implement this by default, which costs nothing unless this feature is enabled
 cache-efficiency-debug = []

-[[test]]
-name = "hash"
-path = "tests/hash.rs"
-required-features = ["rustsha1"]
-
 [[test]]
 name = "parallel"
 path = "tests/parallel_threaded.rs"
-required-features = ["parallel", "rustsha1"]
+required-features = ["parallel"]

 [[test]]
 name = "multi-threaded"
 path = "tests/parallel_shared_threaded.rs"
-required-features = ["parallel", "rustsha1"]
+required-features = ["parallel"]

 [[test]]
 name = "single-threaded"
 path = "tests/parallel_shared.rs"
-required-features = ["rustsha1"]

 [[test]]
 name = "pipe"
@@ -133,10 +118,8 @@ parking_lot = { version = "0.12.0", default-features = false, optional = true }
 jwalk = { version = "0.8.1", optional = true }
 walkdir = { version = "2.3.2", optional = true } # used when parallel is off

-# hashing and 'fast-sha1' feature
-sha1_smol = { version = "1.0.0", optional = true }
+# hashing
 crc32fast = { version = "1.2.1", optional = true }
-sha1 = { version = "0.10.0", optional = true }

 # progress
 prodash = { version = "29.0.1", optional = true }
@@ -159,12 +142,6 @@ libc = { version = "0.2.119" }
 [dev-dependencies]
 bstr = { version = "1.3.0", default-features = false }

-
-# Assembly doesn't yet compile on MSVC on windows, but does on GNU, see https://github.com/RustCrypto/asm-hashes/issues/17
-# At this time, only aarch64, x86 and x86_64 are supported.
-[target.'cfg(all(any(target_arch = "aarch64", target_arch = "x86", target_arch = "x86_64"), not(target_os = "windows")))'.dependencies]
-sha1 = { version = "0.10.0", optional = true, features = ["asm"] }
-
 [package.metadata.docs.rs]
 all-features = true
 features = ["document-features"]

gix-features/src/hash.rs

-50
@@ -1,54 +1,4 @@
 //! Hash functions and hash utilities
-//!
-//! With the `fast-sha1` feature, the `Sha1` hash type will use a more elaborate implementation utilizing hardware support
-//! in case it is available. Otherwise the `rustsha1` feature should be set. `fast-sha1` will take precedence.
-//! Otherwise, a minimal yet performant implementation is used instead for a decent trade-off between compile times and run-time performance.
-#[cfg(all(feature = "rustsha1", not(feature = "fast-sha1")))]
-mod _impl {
-    use super::Digest;
-
-    /// A implementation of the Sha1 hash, which can be used once.
-    #[derive(Default, Clone)]
-    pub struct Sha1(sha1_smol::Sha1);
-
-    impl Sha1 {
-        /// Digest the given `bytes`.
-        pub fn update(&mut self, bytes: &[u8]) {
-            self.0.update(bytes);
-        }
-        /// Finalize the hash and produce a digest.
-        pub fn digest(self) -> Digest {
-            self.0.digest().bytes()
-        }
-    }
-}
-
-/// A hash-digest produced by a [`Hasher`] hash implementation.
-#[cfg(any(feature = "fast-sha1", feature = "rustsha1"))]
-pub type Digest = [u8; 20];
-
-#[cfg(feature = "fast-sha1")]
-mod _impl {
-    use sha1::Digest;
-
-    /// A implementation of the Sha1 hash, which can be used once.
-    #[derive(Default, Clone)]
-    pub struct Sha1(sha1::Sha1);
-
-    impl Sha1 {
-        /// Digest the given `bytes`.
-        pub fn update(&mut self, bytes: &[u8]) {
-            self.0.update(bytes);
-        }
-        /// Finalize the hash and produce a digest.
-        pub fn digest(self) -> super::Digest {
-            self.0.finalize().into()
-        }
-    }
-}
-
-#[cfg(any(feature = "rustsha1", feature = "fast-sha1"))]
-pub use _impl::Sha1 as Hasher;

 /// Compute a CRC32 hash from the given `bytes`, returning the CRC32 hash.
 ///

gix-features/tests/hash.rs

-16
This file was deleted.

gix-hash/Cargo.toml

+3-2
@@ -20,17 +20,18 @@ test = false
 serde = ["dep:serde"]

 [dependencies]
-gix-features = { version = "^0.40.0", path = "../gix-features", features = ["rustsha1", "progress"] }
+gix-features = { version = "^0.40.0", path = "../gix-features", features = ["progress"] }

 thiserror = "2.0.0"
 faster-hex = { version = "0.9.0" }
 serde = { version = "1.0.114", optional = true, default-features = false, features = ["derive"] }
+sha1-checked = { version = "0.10.0", default-features = false }

 document-features = { version = "0.2.0", optional = true }

 [dev-dependencies]
 gix-testtools = { path = "../tests/tools" }
-gix-features = { path = "../gix-features", features = ["rustsha1"] }
+gix-features = { path = "../gix-features" }

 [package.metadata.docs.rs]
 all-features = true

gix-hash/src/hasher/mod.rs

+45-4
@@ -1,24 +1,65 @@
+use sha1_checked::{CollisionResult, Digest};
+
 /// The error returned by [`Hasher::try_finalize()`].
 #[derive(Debug, thiserror::Error)]
 #[allow(missing_docs)]
-pub enum Error {}
+pub enum Error {
+    #[error("Detected SHA-1 collision attack with digest {digest}")]
+    CollisionAttack { digest: crate::ObjectId },
+}

 /// A implementation of the Sha1 hash, which can be used once.
-#[derive(Default, Clone)]
-pub struct Hasher(gix_features::hash::Hasher);
+///
+/// We use [`sha1_checked`] to implement the same collision detection
+/// algorithm as Git.
+#[derive(Clone)]
+pub struct Hasher(sha1_checked::Sha1);
+
+impl Default for Hasher {
+    #[inline]
+    fn default() -> Self {
+        // This matches the configuration used by Git, which only uses
+        // the collision detection to bail out, rather than computing
+        // alternate “safe hashes” for inputs where a collision attack
+        // was detected.
+        Self(sha1_checked::Builder::default().safe_hash(false).build())
+    }
+}

 impl Hasher {
     /// Digest the given `bytes`.
     pub fn update(&mut self, bytes: &[u8]) {
         self.0.update(bytes);
     }
     /// Finalize the hash and produce an object ID.
+    ///
+    /// Returns [`Error`] if a collision attack is detected.
+    #[inline]
     pub fn try_finalize(self) -> Result<crate::ObjectId, Error> {
-        Ok(self.0.digest().into())
+        match self.0.try_finalize() {
+            CollisionResult::Ok(digest) => Ok(crate::ObjectId::Sha1(digest.into())),
+            CollisionResult::Mitigated(_) => {
+                // SAFETY: `CollisionResult::Mitigated` is only
+                // returned when `safe_hash()` is on. `Hasher`’s field
+                // is private, and we only construct it in the
+                // `Default` instance, which turns `safe_hash()` off.
+                //
+                // As of Rust 1.84.1, the compiler can’t figure out
+                // this function cannot panic without this.
+                #[allow(unsafe_code)]
+                unsafe {
+                    std::hint::unreachable_unchecked()
+                }
+            }
+            CollisionResult::Collision(digest) => Err(Error::CollisionAttack {
+                digest: crate::ObjectId::Sha1(digest.into()),
+            }),
+        }
     }
 }

 /// Produce a hasher suitable for the given kind of hash.
+#[inline]
 pub fn hasher(kind: crate::Kind) -> Hasher {
     match kind {
         crate::Kind::Sha1 => Hasher::default(),
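
The three-way match in `try_finalize()` can be sketched in isolation, without the real crate. Everything below is a stand-in defined locally for illustration: `CollisionResult`, `Error`, and `to_result` only mirror the shape of the names in the diff and are not the `sha1_checked` or `gix-hash` API. The sketch also swaps in the safe `unreachable!()` macro where the commit uses `unreachable_unchecked()`, which keeps a panic path in exchange for needing no `unsafe`.

```rust
/// Stand-in for the three outcomes a collision-detecting hasher can report.
/// Hypothetical local type; it only mirrors the variant names in the diff.
#[derive(Debug, PartialEq)]
enum CollisionResult {
    /// No collision attack was detected.
    Ok([u8; 20]),
    /// An attack was detected and an alternate "safe hash" was produced.
    Mitigated([u8; 20]),
    /// An attack was detected and no mitigation was attempted.
    Collision([u8; 20]),
}

/// Stand-in error type, analogous in shape to `Error::CollisionAttack`.
#[derive(Debug, PartialEq)]
enum Error {
    CollisionAttack { digest: [u8; 20] },
}

/// Map the three-way outcome to a `Result`, assuming safe hashing is
/// always disabled so that `Mitigated` can never be produced.
fn to_result(outcome: CollisionResult) -> Result<[u8; 20], Error> {
    match outcome {
        CollisionResult::Ok(digest) => Ok(digest),
        // Safe alternative to `unreachable_unchecked()`: panics instead of
        // invoking undefined behaviour if the assumption is ever violated.
        CollisionResult::Mitigated(_) => unreachable!("safe hashing is disabled in this sketch"),
        CollisionResult::Collision(digest) => Err(Error::CollisionAttack { digest }),
    }
}
```

A caller would then propagate the error with `?`, so an object whose hash triggers detection is rejected rather than silently accepted.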

gix-hash/tests/hash.rs

+1
@@ -1,5 +1,6 @@
 use gix_hash::ObjectId;

+mod hasher;
 mod kind;
 mod object_id;
 mod oid;

gix-hash/tests/hasher/mod.rs

+6
@@ -0,0 +1,6 @@
+use gix_hash::Hasher;
+
+#[test]
+fn size_of_sha1() {
+    assert_eq!(std::mem::size_of::<Hasher>(), 824);
+}
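
The commit message notes that the larger `Hasher` "could potentially be boxed". A minimal sketch of what that would mean for the handle's size, using hypothetical stand-in types (`InlineHasher` and `BoxedHasher` are not part of `gix-hash`; the 824-byte array only imitates the size asserted in the test):

```rust
use std::mem::size_of;

/// Stand-in for a large hasher state stored inline.
/// Hypothetical type; 824 bytes imitates the size asserted in the test.
#[derive(Clone)]
struct InlineHasher([u8; 824]);

/// The same state behind a heap allocation; the handle is pointer-sized.
#[derive(Clone)]
struct BoxedHasher(Box<InlineHasher>);

impl Default for BoxedHasher {
    fn default() -> Self {
        // One heap allocation per hasher is the price for the small handle.
        BoxedHasher(Box::new(InlineHasher([0; 824])))
    }
}

/// Size in bytes of the inline variant.
fn inline_size() -> usize {
    size_of::<InlineHasher>()
}

/// Size in bytes of the boxed handle (not counting the heap allocation).
fn boxed_size() -> usize {
    size_of::<BoxedHasher>()
}
```

Boxing makes moves and by-value passing cheaper but adds an allocation per hasher, so whether it helps would depend on how often hashers are created versus moved.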
