fix: port three upstream hardening follow-ups to Rust core #3

Merged
quangdang46 merged 2 commits into main from devin/1778415000-tier2a-harden-followups
May 9, 2026

@quangdang46
Owner

Summary

Three small, isolated upstream Python fixes ported to the Rust core. All three are defensive/hardening — they sit on top of (or alongside) the four fixes already shipped in PR #2.

| # | Upstream | Rust file | Bug it prevents |
|---|----------|-----------|-----------------|
| 1 | 5488e7b (upstream) | crates/core/src/miner.rs | Mining a generated artifact (CSV/JSON dump, lockfile not in SKIP_FILES, build output) emits thousands of chunks in one batch. Triggers ONNX bad_alloc on Windows; otherwise just silently bloats the palace. |
| 2 | 2ff6283 (upstream) | crates/core/src/diary_ingest.rs | Pre-Fix-#3 state file (PR #2) had size only. The legacy size-skip path correctly avoided a re-ingest, but never wrote the missing content_hash back, so post-upgrade same-size edits would slip through the legacy path indefinitely. |
| 3 | 7545238 (upstream) | crates/core/src/exporter.rs | TOCTOU window between the directory-level reject_symlink check (PR #2 Fix #2) and the per-file open. A symlink swapped in during that window would still redirect writes. |

What changed

1. Miner Windows hardening (5488e7b)

  • SKIP_FILES gains pnpm-lock.yaml and yarn.lock (the JS/TS lockfiles upstream now skips).
  • MAX_CHUNKS_PER_FILE = 500 constant. mine_file() checks the chunk count and refuses to embed any single file that exceeds the cap, printing a concrete skip message naming the file. This catches the broader class of generated artifacts that a named-file list cannot fully cover.
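
The two checks above can be sketched roughly as follows. This is a minimal illustration, not the code in miner.rs: the helper name `check_minable`, its signature, and the message wording are assumptions, and `SKIP_FILES` here holds only the two entries this PR adds.

```rust
use std::path::Path;

/// Cap quoted in the PR description.
const MAX_CHUNKS_PER_FILE: usize = 500;

/// Illustrative subset only: the two lockfiles this PR adds.
const SKIP_FILES: &[&str] = &["pnpm-lock.yaml", "yarn.lock"];

/// Hypothetical helper: returns Err with a skip message naming the file
/// when the file should not be embedded, Ok(()) otherwise.
fn check_minable(path: &Path, chunk_count: usize) -> Result<(), String> {
    let name = path.file_name().and_then(|n| n.to_str()).unwrap_or("");
    if SKIP_FILES.contains(&name) {
        return Err(format!("skip {}: listed in SKIP_FILES", path.display()));
    }
    // Only files that produce MORE than the cap are refused.
    if chunk_count > MAX_CHUNKS_PER_FILE {
        return Err(format!(
            "skip {}: {} chunks exceeds MAX_CHUNKS_PER_FILE ({})",
            path.display(),
            chunk_count,
            MAX_CHUNKS_PER_FILE
        ));
    }
    Ok(())
}
```

Checking the chunk count rather than the file name is what lets the cap catch artifacts nobody thought to list.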

2. Diary state backfill (2ff6283)

  • When the legacy size-skip path triggers (no content_hash in prior state, size unchanged), ingest_diaries now writes back a state entry with the freshly-computed hash and persists the state file even if no drawer was rewritten. Subsequent runs use the strict hash check.
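
The backfill decision can be pictured as a small three-way match. The sketch below is an assumption-laden illustration: `DiaryState`, `Decision`, and `decide` are hypothetical names, and the hash is passed in as a string rather than computed, so it does not reflect how diary_ingest.rs is actually organized.

```rust
/// Hypothetical shape of one state-file entry. `content_hash` is None
/// for entries written before the hash was introduced.
#[derive(Debug, Clone, PartialEq)]
struct DiaryState {
    size: u64,
    content_hash: Option<String>,
}

#[derive(Debug, PartialEq)]
enum Decision {
    /// Hash already recorded and unchanged: nothing to do.
    Skip,
    /// Legacy entry with matching size: skip the ingest but persist this state.
    SkipAndBackfill(DiaryState),
    /// Anything else: re-ingest the diary.
    Reingest,
}

/// `size` and `hash` describe the file as it currently is on disk.
fn decide(prior: Option<&DiaryState>, size: u64, hash: &str) -> Decision {
    match prior {
        // Strict path: a recorded hash decides on its own.
        Some(DiaryState { content_hash: Some(h), .. }) if h.as_str() == hash => Decision::Skip,
        // Legacy path: no hash recorded, size unchanged. Write the hash back.
        Some(DiaryState { content_hash: None, size: s }) if *s == size => {
            Decision::SkipAndBackfill(DiaryState {
                size,
                content_hash: Some(hash.to_string()),
            })
        }
        _ => Decision::Reingest,
    }
}
```

Once the backfilled entry is persisted, a later same-size edit produces a different hash and falls through to `Reingest` instead of being skipped.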

3. Exporter per-file symlink check (7545238)

  • New safe_open_for_write(path, append) helper used for both room files and index.md. The POSIX path uses O_NOFOLLOW (OpenOptions::custom_flags(libc::O_NOFOLLOW)) so the open itself fails with ELOOP on a symlinked target. The Windows path falls back to a symlink_metadata pre-check, which narrows the race window relative to the no-check baseline.
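
For illustration, the symlink_metadata pre-check fallback might look like the sketch below, which uses only the standard library. The function name and parameters follow the description above, but the body, the error kind, and the message are assumptions; the POSIX variant in the PR passes O_NOFOLLOW to the open call instead of checking beforehand.

```rust
use std::fs::{self, File, OpenOptions};
use std::io;
use std::path::Path;

/// Sketch of the pre-check fallback: refuse to open `path` for writing
/// if it is currently a symlink. A small window remains between the
/// check and the open, which is why O_NOFOLLOW is preferred where available.
fn safe_open_for_write(path: &Path, append: bool) -> io::Result<File> {
    // symlink_metadata inspects the link itself; fs::metadata would follow it.
    match fs::symlink_metadata(path) {
        Ok(meta) if meta.file_type().is_symlink() => {
            return Err(io::Error::new(
                io::ErrorKind::InvalidInput,
                format!("refusing to write through symlink: {}", path.display()),
            ));
        }
        Ok(_) => {}
        // A missing file is fine: the open below creates it.
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        Err(e) => return Err(e),
    }
    let mut opts = OpenOptions::new();
    opts.create(true);
    if append {
        opts.append(true);
    } else {
        opts.write(true).truncate(true);
    }
    opts.open(path)
}
```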

What was NOT ported, and why

These four upstream changes (five commits, two of them grouped in the last row) were inspected and intentionally skipped because the underlying mechanism either does not exist in the Rust port or is already implemented there:

| Upstream | Reason it doesn't apply |
|----------|-------------------------|
| 5134a63 SQLite integrity preflight | Rust palace uses HNSW (embedvec), not chromadb/SQLite, so there is no equivalent failure mode. |
| d5ce97c palace lock byte-0 sentinel | Rust uses an atomic-create file-existence lock, not msvcrt.locking byte-range locks. The Windows bugs the sentinel addresses (current-position locking, byte-0 read-block) do not exist here. |
| ef8d83c lock holder + non-zero exit | Already implemented: MineAlreadyRunning in mine_palace_lock.rs carries the holder PID and cli.rs calls std::process::exit(1) on contention. |
| 71804c0 / 3a76360 hooks Popen detach + per-target PID guard | Rust hooks_cli.rs only emits JSON for the harness; it does not spawn background mining processes. The parent-blocked-on-child symptom and the per-target guard around Popen have no surface here. |
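
As background for the two lock rows, an atomic-create file-existence lock that reports the holder PID can be sketched as below. This is not the code in mine_palace_lock.rs: `LockError`, `LockGuard`, and `acquire` are hypothetical names, and storing the PID as plain text in the lock file is an assumption made for the example.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

#[derive(Debug)]
enum LockError {
    /// Another process holds the lock; the PID is None if it could not be read.
    AlreadyRunning { holder_pid: Option<u32> },
    Io(io::Error),
}

/// Removes the lock file when dropped.
struct LockGuard {
    path: PathBuf,
}

impl Drop for LockGuard {
    fn drop(&mut self) {
        let _ = fs::remove_file(&self.path);
    }
}

fn acquire(path: &Path) -> Result<LockGuard, LockError> {
    // create_new fails with AlreadyExists if the file is present, so the
    // existence check and the creation happen in one atomic step.
    match OpenOptions::new().write(true).create_new(true).open(path) {
        Ok(mut file) => {
            write!(file, "{}", std::process::id()).map_err(LockError::Io)?;
            Ok(LockGuard { path: path.to_path_buf() })
        }
        Err(e) if e.kind() == io::ErrorKind::AlreadyExists => {
            let holder_pid = fs::read_to_string(path)
                .ok()
                .and_then(|s| s.trim().parse().ok());
            Err(LockError::AlreadyRunning { holder_pid })
        }
        Err(e) => Err(LockError::Io(e)),
    }
}
```

A caller would typically print the holder PID and exit with a non-zero status when it receives `AlreadyRunning`.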

Review & Testing Checklist for Human

  • Read the new miner skip-with-warning message format to make sure it's the diagnostic you want users to see when a file gets capped (lines 710-727 of miner.rs).
  • Spot-check the diary backfill path: a state file with {"size": N} but no content_hash, content unchanged on disk, second run should write the hash back to disk while still printing nothing.
  • Spot-check the exporter symlink rejection: pre-place a symlink at <output>/<wing>/<room>.md pointing at e.g. /etc/passwd, run mpr export, verify it errors out before writing instead of redirecting.
  • CI must pass on all four jobs (ubuntu, macos, windows, GitGuardian), just like PR #2 (fix: bug fixes to Rust core).

Notes

  • 386/386 tests pass locally (was 380 in PR #2, fix: bug fixes to Rust core; +6 new regression tests).
  • No new dependencies. libc was already a workspace dep used elsewhere in the crate.
  • The O_NOFOLLOW path sets the flag via OpenOptions::custom_flags and detects the symlink rejection by checking raw_os_error() == Some(libc::ELOOP), rather than matching on the unstable io::ErrorKind::FilesystemLoop.
  • Each fix has an inline comment citing the upstream commit hash so the next sync knows what's already in.

Mirrors three small, isolated upstream Python commits to the Rust port.
Each fix is gated by regression tests that fail on the pre-fix code.

1. miner Windows hardening (upstream 5488e7b)
   - SKIP_FILES gains pnpm-lock.yaml + yarn.lock so JS/TS lockfiles
     don't get embedded.
   - mine_file() refuses to embed any file that produces more than
     MAX_CHUNKS_PER_FILE chunks (500) and prints a concrete skip
     message naming the file. Catches the broader class of generated
     artifacts (CSV/JSON dumps, build outputs, lockfiles not yet in
     SKIP_FILES) that the named-file list cannot fully cover.
     Upstream cited Windows ONNX 'bad_alloc' as the symptom this
     prevents.

2. diary state backfill (upstream 2ff6283, follow-up to PR #2 fix #3)
   - When a pre-Fix-#3 state file has size set but no content_hash,
     the size-skip path correctly avoids re-ingesting the diary, but
     also writes the missing hash back so subsequent runs use the
     strict hash check. Without the backfill, post-upgrade same-size
     edits would still slip through the legacy code path forever.

3. exporter per-file symlink check (upstream 7545238, follow-up to
   PR #2 fix #2)
   - safe_open_for_write() opens room files and index.md with
     O_NOFOLLOW on POSIX so the open itself fails ELOOP if a symlink
     was swapped in between create-dir and file-open. Closes the
     TOCTOU window the directory-level reject_symlink leaves open.
     Windows path falls back to a symlink_metadata pre-check.

Tests: 386/386 passing locally (+6 new regression tests).

Out of scope (analyzed, do not apply to Rust):
- 5134a63 SQLite preflight: Rust palace uses HNSW, not chromadb.
- d5ce97c palace lock byte-0 sentinel: Rust uses file-existence locks.
- ef8d83c lock holder + non-zero exit: already implemented.
- 71804c0 / 3a76360 hooks Popen detach + PID guard: Rust hooks emit
  JSON only, do not spawn background processes.

ingest_diaries canonicalizes diary_dir internally (resolve_path).
On macOS /tmp resolves to /private/tmp, and on Windows the path takes
the verbatim UNC form. Both differ from the raw
temp.path().join('diaries') used in the regression test, so
state_file_for produced a different SHA256 key and read_to_string
returned ENOENT.

Look up the state file via the same canonical form that ingest_diaries
actually used. On Linux /tmp is not a symlink, which is why the local
run passed while the macOS and Windows CI jobs did not.

Test result: 386/386 passing locally on Linux; macOS/Windows
expected to pass on next CI run.
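
The mismatch described above comes down to deriving a key from two spellings of the same directory. A minimal sketch of the remedy is shown here, assuming a hypothetical `state_key` helper; it uses the standard library's DefaultHasher purely as a stand-in for the SHA256 that state_file_for is described as using.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::io;
use std::path::Path;

/// Hypothetical helper: derive a state-file key from the canonical form
/// of `diary_dir`, so every spelling of the same directory maps to one key.
/// DefaultHasher is a stand-in here, not the hash the real code uses.
fn state_key(diary_dir: &Path) -> io::Result<String> {
    // canonicalize resolves symlinks as well as `.` and `..` components;
    // it fails if the directory does not exist.
    let canonical = diary_dir.canonicalize()?;
    let mut hasher = DefaultHasher::new();
    canonical.hash(&mut hasher);
    Ok(format!("{:016x}", hasher.finish()))
}
```

A test that computes the key from the raw temp path without canonicalizing it first would look for a different file than the one the ingest step wrote.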
@quangdang46 quangdang46 merged commit 8ee1dde into main May 9, 2026
4 checks passed
@quangdang46 quangdang46 deleted the devin/1778415000-tier2a-harden-followups branch May 9, 2026 15:42