soundness by a10y · Pull Request #235 · spiraldb/fsst

a10y · 2026-06-24T16:16:26Z

Fix OOB read in `Decompressor::decompress_into`

The bug

Decompression maps each code byte to a symbol with `self.symbols.get_unchecked(code)` /
`self.lengths.get_unchecked(code)`. Those slices are only `n_symbols` long, but nothing
guaranteed the incoming code was in range. The escape code is 255, so any non-escape
code in `[n_symbols, 254]` (from corrupt input, a truncated stream, or a mismatched
symbol table) sailed past the escape handling and indexed the tables out of bounds.
That's an out-of-bounds read, i.e. UB, reachable from safe callers passing attacker- or
corruption-controlled bytes.

The fix

A code is valid iff `code < n_symbols`. The key idea: validate before every table
access, so `get_unchecked` is only ever reached with an in-bounds index. No table
padding, no per-call copy, no change to the `Decompressor` type.

- Fast 8-byte loop (`escape_mask == 0`): a single branchless SWAR check,
  `any_byte_ge(next_block, n_symbols)`, validates all eight codes at once before the
  unchecked stores. This is the vectorized check.
- Escape path: the leading codes (positions `0..first_escape_pos`) are validated in
  one masked SWAR call. The raw escaped bytes are written directly (not table lookups),
  so they need no check.
- Byte-at-a-time fallbacks: a scalar `code >= n_symbols` check before the access.
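The escape-path item can be illustrated with a small sketch. Everything here is an assumption made for illustration rather than the code in this PR: the block is assumed to be loaded little-endian, `escape_mask`, `first_escape_pos`, `keep_prefix`, and `prefix_is_valid` are hypothetical helper names, and a scalar stand-in replaces the SWAR `any_byte_ge` to keep the example short.

```rust
const HIGH: u64 = 0x8080_8080_8080_8080;
const LOW7: u64 = 0x7F7F_7F7F_7F7F_7F7F;

/// Sets the high bit of every lane whose byte equals 0xFF (the escape code).
/// Hypothetical helper; the per-lane sums never exceed 0xFE, so nothing carries.
fn escape_mask(block: u64) -> u64 {
    (block & HIGH) & (((!block & LOW7) + LOW7) ^ HIGH)
}

/// Lane index (0..8) of the first escape byte, assuming a little-endian load.
fn first_escape_pos(block: u64) -> Option<usize> {
    let mask = escape_mask(block);
    if mask == 0 {
        None
    } else {
        Some((mask.trailing_zeros() >> 3) as usize)
    }
}

/// Zeroes every lane at or after `pos`, keeping only the leading codes.
fn keep_prefix(block: u64, pos: usize) -> u64 {
    debug_assert!(pos < 8);
    block & ((1u64 << (8 * pos)) - 1)
}

/// Scalar stand-in for the SWAR check described in the PR.
fn any_byte_ge_scalar(block: u64, threshold: usize) -> bool {
    block.to_le_bytes().iter().any(|&b| b as usize >= threshold)
}

/// True when every code before the first escape is a valid table index.
/// An empty prefix is trivially valid; for a non-empty prefix the zeroed
/// lanes can only trip the check when `n_symbols == 0`, where the prefix
/// is invalid anyway.
fn prefix_is_valid(block: u64, pos: usize, n_symbols: usize) -> bool {
    pos == 0 || !any_byte_ge_scalar(keep_prefix(block, pos), n_symbols)
}
```

In this sketch the escape byte and the raw byte that follows it sit at or after `pos`, so they are masked out and never compared against `n_symbols`, which matches the observation that directly written bytes need no check.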

On a violation we set a flag, `break 'decode`, and panic after the decode region
rather than returning a length derived from invalid data. We never index out of bounds on
the way there. (The optimizer collapses the flag + final assert into a direct branch to a
single cold panic site, so there is no per-iteration flag overhead.)
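The control flow described above can be sketched on a deliberately simplified decoder. This is an illustration under stated assumptions, not the PR's code: `decode_codes` is a hypothetical function, symbols are single bytes, and there is no escape handling.

```rust
/// Decodes `codes` against a one-byte-per-symbol table (hypothetical,
/// simplified stand-in for the real decoder).
fn decode_codes(codes: &[u8], symbols: &[u8]) -> Vec<u8> {
    let n_symbols = symbols.len();
    let mut out = Vec::with_capacity(codes.len());
    let mut invalid = false;

    'decode: {
        for &code in codes {
            let idx = code as usize;
            // Validate before the unchecked access; leave the whole
            // decode region on the first bad code.
            if idx >= n_symbols {
                invalid = true;
                break 'decode;
            }
            // SAFETY: idx < n_symbols == symbols.len(), checked just above.
            out.push(unsafe { *symbols.get_unchecked(idx) });
        }
    }

    // Single panic site after the decode region; no length derived from
    // invalid data is ever returned.
    assert!(!invalid, "invalid code in compressed input");
    out
}
```

With the table `b"ab"`, the codes `[0, 1, 0]` decode to `b"aba"`, while `[0, 2]` panics without ever indexing past the table.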

`Decompressor::new` now also asserts `symbols.len() == lengths.len()`. Both tables are
indexed by the same code, so equal length is what lets the single `code < n_symbols`
bound make both `get_unchecked` calls sound.
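A minimal sketch of why the equal-length assertion matters, assuming a hypothetical `Tables` type with `u64` symbols and `u8` lengths (the real `Decompressor` fields and types may differ). It returns `None` for an invalid code instead of panicking, purely to keep the example small.

```rust
/// Hypothetical stand-in for the two lookup tables held by the decoder.
struct Tables {
    symbols: Vec<u64>,
    lengths: Vec<u8>,
}

impl Tables {
    /// Rejects mismatched tables up front, so one bound covers both.
    fn new(symbols: Vec<u64>, lengths: Vec<u8>) -> Self {
        assert_eq!(
            symbols.len(),
            lengths.len(),
            "symbol and length tables must have the same length"
        );
        Self { symbols, lengths }
    }

    /// Looks up a code with a single bounds check guarding both accesses.
    fn lookup(&self, code: u8) -> Option<(u64, u8)> {
        let idx = code as usize;
        if idx >= self.symbols.len() {
            return None;
        }
        // SAFETY: idx < symbols.len(), and `new` asserted that
        // symbols.len() == lengths.len(), so idx is in bounds for both.
        unsafe {
            Some((
                *self.symbols.get_unchecked(idx),
                *self.lengths.get_unchecked(idx),
            ))
        }
    }
}
```

Without the assertion in `new`, the comment on the `unsafe` block would only be true for `symbols`, and the second access could still read out of bounds.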

Why `any_byte_ge` is right

It's a branchless SWAR primitive using the hasmore/hasless bit-twiddling identities,
split at threshold 128 so each arm needs only one loop-invariant broadcast constant
(keeping register pressure low). It is covered by an exhaustive unit test that compares
it against a scalar reference over every threshold in `0..=257` crossed with uniform blocks
for all 256 byte values and single-notable-byte blocks in every lane position. If the SWAR
ever disagreed with "does any byte reach the threshold", that test fails, which is exactly
the property the soundness argument relies on (it must never report a block as valid when it
contains an out-of-range code).
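For readers who want to see the shape of such a primitive, here is one way to build it from the hasmore/hasless identities. It is a sketch written for this description, not a copy of the function in the diff: the `usize` threshold type, the constant names, and the explicit handling of thresholds 0 and 256 and above are assumptions.

```rust
const LO: u64 = 0x0101_0101_0101_0101; // broadcast multiplier
const HIGH: u64 = 0x8080_8080_8080_8080; // high bit of every lane

/// Returns true if any of the eight bytes in `block` is >= `threshold`.
fn any_byte_ge(block: u64, threshold: usize) -> bool {
    match threshold {
        // Every byte is >= 0.
        0 => true,
        // hasmore form: byte > threshold - 1. Adding (128 - threshold)
        // pushes qualifying bytes below 128 into the high bit; bytes
        // >= 128 are caught by the OR. A carry can only leave a lane
        // that already qualifies, so the any-lane answer stays exact.
        1..=128 => {
            let bias = LO * (128 - threshold as u64);
            (block.wrapping_add(bias) | block) & HIGH != 0
        }
        // hasless form on the complement: byte >= threshold is the same
        // as (255 - byte) < (256 - threshold). Borrows only start in a
        // lane that already qualifies.
        129..=255 => {
            let k = LO * (256 - threshold as u64);
            (!block).wrapping_sub(k) & block & HIGH != 0
        }
        // No byte can reach 256 or more.
        _ => false,
    }
}

/// Scalar reference used to cross-check the SWAR version.
fn any_byte_ge_scalar(block: u64, threshold: usize) -> bool {
    block.to_le_bytes().iter().any(|&b| b as usize >= threshold)
}
```

Each arm uses a single broadcast constant (`bias` or `k`) that depends only on the threshold, so a caller with a fixed `n_symbols` can hoist it out of the decode loop. Comparing the two functions over all thresholds and lane positions mirrors the unit test described above.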

Verification

- `cargo asm` (ARM64/Apple Silicon): the hot-loop check vectorizes as intended, covering all
  eight codes in ~3 ALU ops + a test, with one hoisted constant, then a branch to the cold panic
  site:

add x2, x17, x11            ; block + bias   (bias hoisted out of the loop)
orr x2, x2, x17             ; | block
tst x2, #0x8080808080808080 ; & HIGH
b.eq <stores>               ; valid -> proceed, else -> panic

- Miri: clean on all four invalid-code routes plus the roundtrips. The cases that were
  previously UB now panic with no UB reported.
- Tests: exhaustive `any_byte_ge` unit test; four integration tests that drive the panic
  through each decode route (fast loop, escape-prefix, byte loop, tail loop); all pre-existing
  roundtrip tests; clippy (`-D warnings`) and rustfmt clean.

Performance (clean A/B vs `develop`, criterion baselines)

- Normal trained-table decompression (cf8): within noise (~+1–3%). This is the genuine
  cost of adding a per-block validation to the hot path.
- All-escape (empty symbol table, pathological): ~+13%.

The all-escape regression is not the check's runtime cost: that regime never executes the
check (every byte is an escape). The larger loop body leads LLVM to factor the loop latch into
a shared block (one extra unconditional jump per iteration) instead of `develop`'s inline,
fused `cmp`/`ccmp`/`b.lo` latch. It's a code-layout artifact. I tried four ways to recover it
(drop the prefix check, `#[inline(never)]`, a 1-constant SWAR, inline `panic!` in place of the
flag); all land at the same numbers, so it's inherent to growing the loop body. Accepted as
the cost of soundness on a degenerate regime; real, trained-table decompression is unaffected.

Reviewing the diff

`src/lib.rs` shows a large line count, but most of it is whitespace: the new
`'decode { … }` block re-indents the decode region. Use `git diff -w` to see the ~40 lines of
actual logic change.

Signed-off-by: Andrew Duffy <andrew@a10y.dev>

codspeed-hq · 2026-06-24T16:18:33Z

Merging this PR will degrade performance by 13.73%

⚠️

Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.


❌ 1 regressed benchmark
✅ 29 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Status | Mode | Benchmark | `BASE` | `HEAD` | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | `decompress-into-reuse` | 834.4 ns | 967.2 ns | -13.73% |


Comparing aduffy/soundness (2caf476) with develop (240b7b0)


a10y wants to merge 1 commit into develop from aduffy/soundness
