soundness#235
Draft
a10y wants to merge 1 commit into
Draft
Conversation
Merging this PR will degrade performance by 13.73%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ❌ | Simulation | decompress-into-reuse |
834.4 ns | 967.2 ns | -13.73% |
Tip
Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.
Comparing aduffy/soundness (2caf476) with develop (240b7b0)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fix OOB read in
Decompressor::decompress_intoThe bug
Decompression maps each code byte to a symbol with
self.symbols.get_unchecked(code)/self.lengths.get_unchecked(code). Those slices are onlyn_symbolslong, but nothingguaranteed the incoming code was in range. The escape code is
255, so any non-escapecode in
[n_symbols, 254]— from corrupt input, a truncated stream, or a mismatchedsymbol table — sailed past the escape handling and indexed the tables out of bounds.
That's an out-of-bounds read, i.e. UB, reachable from safe callers passing attacker- or
corruption-controlled bytes.
The fix
A code is valid iff
code < n_symbols. The key idea: validate before every tableaccess, so
get_uncheckedis only ever reached with an in-bounds index. No tablepadding, no per-call copy, no change to the
Decompressortype.escape_mask == 0): a single branchless SWAR check,any_byte_ge(next_block, n_symbols), validates all eight codes at once before theunchecked stores. This is the vectorized check.
0..first_escape_pos) are validated inone masked SWAR call. The raw escaped bytes are written directly (not table lookups),
so they need no check.
code >= n_symbolscheck before the access.On a violation we set a flag,
break 'decode, and panic after the decode regionrather than returning a length derived from invalid data. We never index out of bounds on
the way there. (The optimizer collapses the flag + final assert into a direct branch to a
single cold panic site — there is no per-iteration flag overhead.)
Decompressor::newnow also assertssymbols.len() == lengths.len(). Both tables areindexed by the same code, so equal length is what lets the single
code < n_symbolsbound make both
get_uncheckeds sound.Why
any_byte_geis rightIt's a branchless SWAR primitive using the
hasmore/haslessbit-twiddling identities,split at threshold 128 so each arm needs only one loop-invariant broadcast constant
(keeping register pressure low). It is covered by an exhaustive unit test that compares
it against a scalar reference over every threshold in
0..=257crossed with uniform blocksfor all 256 byte values and single-notable-byte blocks in every lane position. If the SWAR
ever disagreed with "does any byte reach the threshold", that test fails — which is exactly
the property the soundness argument relies on (it must never report a block as valid when it
contains an out-of-range code).
Verification
cargo asm(ARM64/Apple Silicon) — the hot-loop check vectorizes as intended: alleight codes in ~3 ALU ops + a test, one hoisted constant, then a branch to the cold panic
site:
previously UB now panic with no UB reported.
any_byte_geunit test; four integration tests that drive the panicthrough each decode route (fast loop, escape-prefix, byte loop, tail loop); all pre-existing
roundtrip tests; clippy (
-D warnings) and rustfmt clean.Performance (clean A/B vs
develop, criterion baselines)cf8): within noise (~+1–3%). This is the genuinecost of adding a per-block validation to the hot path.
The all-escape regression is not the check's runtime cost — that regime never executes the
check (every byte is an escape). The larger loop body leads LLVM to factor the loop latch into
a shared block (one extra unconditional jump per iteration) instead of
develop's inline,fused
cmp/ccmp/b.lolatch. It's a code-layout artifact. I tried four ways to recover it(drop the prefix check,
#[inline(never)], a 1-constant SWAR, inlinepanic!in place of theflag); all land at the same numbers, so it's inherent to growing the loop body. Accepted as
the cost of soundness on a degenerate regime — real, trained-table decompression is unaffected.
Reviewing the diff
src/lib.rsshows a large line count, but most of it is whitespace: the new'decode { … }block re-indents the decode region. Usegit diff -wto see the ~40 lines ofactual logic change.