feat: V3 chunk-splitter alignment + burst-load resilience by jagajaga · Pull Request #49 · vrtmrz/livesync-bridge

jagajaga · 2026-05-31T18:33:55Z

Summary

Makes livesync-bridge work cleanly as a CouchDB↔filesystem relay against
LiveSync V3 (Fine Deduplication) vaults, and hardens it for the burst /
high-volume workloads that exposed several failure modes.

This supersedes #48: it adds the V3 write-path alignment and the trystero
removal on top of the hardening commits from that PR.

What I found about "V3 decryption failures"

Many users (myself included) saw a storm of OperationError: Decryption failed
from SubtleCrypto.decrypt after enabling V3 on a client. I investigated against
a live ~78k-doc vault and found:

- V3 reading already works with the pinned commonlib. V3 chunks use the same
  %= HKDF (E2EE V2) envelope as before; the pinned lib decrypts them fine.
  Verified three ways: 30/30 sampled chunks decrypt, note reassembly succeeds, and
  the bridge itself wrote 1326 files (markdown + binary, incl. post-V3 docs) with
  zero decrypt/corruption errors.
- The decryption storm appears to be a transient salt-negotiation / stale-state
  issue during the V3 migration window, not a missing-format bug. Once the
  _local/...sync_parameters salt is settled, decryption is clean.

So this PR does not claim a decryption fix — reading already works.

What actually needed fixing for V3: the write path

When the bridge writes chunks back to CouchDB it used the old Rabin-Karp
splitter, producing boundaries that don't deduplicate against chunks written by
V3 clients. This PR bumps the commonlib submodule to a single-function backport
of splitPiecesRabinKarp from vrtmrz/livesync-commonlib@6abcea69 ("fixed: fine
deduplication now correctly fine") onto the pinned 0.25.25.

Validated against live data: the backported splitter reproduces stored chunk
IDs exactly (in order) for 8/8 real post-V3 notes, and correctly does not
match older V2 notes.
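
Why the splitter has to match can be seen from how content-defined chunking works: a boundary is placed wherever a rolling hash of the last few characters hits a chosen pattern, so two writers only produce identical chunks (and therefore identical chunk IDs) if they agree on the hash, the window, and the cut rule. The following is a generic Rabin-Karp-style sketch, not the commonlib splitPiecesRabinKarp; the function name, base, window, mask, and size limits are all assumptions chosen for illustration.

```typescript
// Illustrative content-defined splitter. All parameters are hypothetical and
// do not reflect the values used by livesync-commonlib.
function splitContentDefined(
  text: string,
  windowSize = 16, // characters covered by the rolling hash
  mask = 0x3f, // cut when the low 6 bits of the hash are zero
  minSize = 32, // never emit a chunk shorter than this (except the last)
  maxSize = 1024, // always cut once a chunk reaches this length
): string[] {
  const BASE = 257;
  // BASE^(windowSize - 1) in 32-bit arithmetic, used to drop the oldest char.
  let pow = 1;
  for (let i = 1; i < windowSize; i++) pow = Math.imul(pow, BASE);

  const chunks: string[] = [];
  let hash = 0;
  let start = 0;
  for (let i = 0; i < text.length; i++) {
    if (i >= windowSize) {
      // Remove the contribution of the character leaving the window.
      hash = (hash - Math.imul(text.charCodeAt(i - windowSize), pow)) | 0;
    }
    // Shift the window and add the incoming character.
    hash = (Math.imul(hash, BASE) + text.charCodeAt(i)) | 0;

    const size = i + 1 - start;
    const hashSaysCut = i + 1 >= windowSize && (hash & mask) === 0;
    if ((hashSaysCut && size >= minSize) || size >= maxSize) {
      chunks.push(text.slice(start, i + 1));
      start = i + 1;
    }
  }
  if (start < text.length) chunks.push(text.slice(start));
  return chunks;
}

const note = Array.from({ length: 200 }, (_, i) => `line ${i} of the note\n`).join("");
const pieces = splitContentDefined(note);
console.log(pieces.length, pieces.join("") === note);
```

Changing any of the parameters (or the cut rule) moves the boundaries, which is the situation described above: a bridge running the old splitter writes chunks that a V3 client would never produce for the same note, so nothing deduplicates.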

Changes

- main.ts: survive pouchdb-adapter-http's "expected AsyncWrap" node-fetch
  rejections under burst load; a sliding-window circuit breaker exits for a clean
  Docker restart if the changes-watch becomes permanently wedged.
- PeerCouchDB.ts: data integrity: chunk hash verification + dedup + a
  persisted since checkpoint + compareDate conflict handling.
- deno.jsonc: drop the trystero P2P dependency (the relay never uses it;
  previously stripped via a Dockerfile sed).
- .gitmodules / lib: bump commonlib to the V3 Rabin-Karp backport.
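
The chunk hash verification listed for PeerCouchDB.ts can be pictured roughly as below: recompute each chunk's hash and compare it with the ID the metadata points at. This is a sketch under loud assumptions: the Entry shape, the function names, and especially the hash are hypothetical. FNV-1a is used only so the example has no dependencies; it is not the hash LiveSync uses for chunk IDs.

```typescript
// Stand-in hash (32-bit FNV-1a over UTF-16 code units). NOT LiveSync's hash.
function standInHash(data: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < data.length; i++) {
    h ^= data.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0).toString(36);
}

// Hypothetical shape: a document's metadata lists its chunk IDs in order.
type Entry = { path: string; children: string[] };

// Returns the IDs whose fetched content is missing or does not hash back to
// the ID. An empty result means the entry is safe to dispatch.
function findMismatchedChunks(
  entry: Entry,
  fetched: Map<string, string>,
  hash: (data: string) => string = standInHash,
): string[] {
  return entry.children.filter((id) => {
    const data = fetched.get(id);
    return data === undefined || hash(data) !== id;
  });
}

const idA = standInHash("hello");
const idB = standInHash("world");
const entry: Entry = { path: "notes/a.md", children: [idA, idB] };

const good = new Map([[idA, "hello"], [idB, "world"]]);
console.log(findMismatchedChunks(entry, good)); // []

// Simulate a cross-wired response: chunk A's read returns chunk B's payload.
const crossWired = new Map([[idA, "world"], [idB, "world"]]);
console.log(findMismatchedChunks(entry, crossWired).length); // 1
```

On a non-empty result the caller would log and drop the change rather than write the bytes to disk.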

Submodule note for maintainers

The splitter lives in livesync-commonlib. Upstream commonlib has moved ~191
commits ahead (service-locator refactor) since the bridge's pinned 0.25.25, so a
straight submodule bump breaks the bridge's API. I backported just the one
function and temporarily repointed the submodule URL to a personal fork
(jagajaga/livesync-commonlib@v3-rabin-karp-backport). You will likely prefer to
land the equivalent splitPiecesRabinKarp change in vrtmrz/livesync-commonlib
and repoint here. The upstream bridge has been dormant since 2025-09-17 while
commonlib moved on, which is why this is a targeted backport rather than a full
bump.

Test plan

- deno install + deno run -A main.ts boots with no import errors
- Reads a live V3 vault: 1326 files written, 0 Decryption failed / 0
  Corrupted document
- Backported splitter reproduces stored chunk IDs 8/8 on real post-V3 notes
- deno check lib/src/string_and_binary/chunks.ts adds no new type errors
- Fresh git clone --recursive resolves the submodule + V3 splitter

The bridge crashed with "Uncaught (in promise) TypeError: expected AsyncWrap" when handling many filesystem events in rapid succession (e.g. bulk imports, folder renames, or vault wipes via scanOfflineChanges).

Root cause: pouchdb-adapter-http uses node-fetch internally. Deno's Node-compatibility shim for node-fetch occasionally produces socket-handle state that fails AsyncWrap validation when concurrent HTTP requests hit CouchDB. This is an upstream bug in the Deno/Node interop layer.

Without a global unhandled-rejection handler the error terminates the Deno process; Docker's restart policy then brings the bridge back up, but it begins watching "from now" and loses all in-flight filesystem events. The result is CouchDB and disk drifting out of sync: docs present on one side but missing from the other.

Two-part fix:

1. main.ts: install a globalThis.unhandledrejection handler that swallows the known AsyncWrap rejection (and logs any others) without exiting. A surviving process is strictly better than a restarted one for sync convergence, because chokidar/scanOfflineChanges will re-deliver missed events.
2. Hub.ts: wrap peer.put/peer.delete in dispatch with bounded exponential backoff (100/200/400/800 ms, up to 4 attempts) for the transient HTTP errors that surface this bug (expected AsyncWrap, socket hang up, ECONNRESET, ETIMEDOUT). Most retries succeed on attempt 2 because the underlying socket is fresh. Non-transient errors still throw immediately.

Reproduces against the vanilla bridge by importing ~1200 small markdown files in <1 second via filesystem writes. The patched bridge processes the same burst without restarting.
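
The retry half of the fix above might look something like the following. It is a minimal sketch, not the code in Hub.ts: withRetry, isTransient, and the options object are hypothetical names, and the sketch assumes the four delays mean one wait before each of up to four retries (the commit message could also be read as four attempts in total). Matching is done on the error text, which is an assumption about how the bridge classifies these errors.

```typescript
// Substrings of the transient errors named in the commit message.
const TRANSIENT_MARKERS = ["expected AsyncWrap", "socket hang up", "ECONNRESET", "ETIMEDOUT"];

function isTransient(err: unknown): boolean {
  const text = err instanceof Error ? `${err.name}: ${err.message}` : String(err);
  return TRANSIENT_MARKERS.some((marker) => text.includes(marker));
}

type RetryOptions = {
  delaysMs?: number[]; // wait before retry 1, 2, 3, ...
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
};

const defaultSleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `op`, retrying only transient failures, with a bounded list of delays.
async function withRetry<T>(op: () => Promise<T>, opts: RetryOptions = {}): Promise<T> {
  const delaysMs = opts.delaysMs ?? [100, 200, 400, 800];
  const sleep = opts.sleep ?? defaultSleep;
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      // Non-transient errors, or running out of delays, rethrow immediately.
      if (!isTransient(err) || attempt >= delaysMs.length) throw err;
      await sleep(delaysMs[attempt]);
    }
  }
}
```

A call site would wrap the write, for example `await withRetry(() => peer.put(path, data))`, where `peer.put` stands for whatever operation is being dispatched. Keeping the delay list bounded matters here: an unbounded retry would hide the permanently wedged state that the later circuit breaker is meant to detect.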

Four related fixes prompted by investigating an "A's body got overwritten with B's content" report. None of these is by itself a smoking gun for the reported symptom, but together they harden the bridge against the most plausible remaining mechanism (HTTP-level response cross-wire from the node-fetch AsyncWrap bug under burst) and fix three latent issues that surfaced during the audit.

PeerCouchDB: verify chunk integrity before dispatching. After getByMeta materializes an entry, recompute each chunk's hash and confirm it matches the ID stored in meta.children. This catches the case where a chunk read got cross-wired to another chunk's response payload. Without this check, the bridge would write the wrong bytes to disk under doc A's path and then push A's metadata pointing to chunks whose content no longer hashes to their ID. On mismatch we log NOTICE and drop the change; the next change for the same doc will retry after the chunk cache is refreshed.

PeerCouchDB: persist a real `since` checkpoint. beginWatch's checkIsInterested side-effected setSetting("since", this.man.since), but nothing ever advanced this.man.since past the value set at constructor time. On a watch error the .on("error") handler reconnects after 10 s using the same stale since, causing a full replay from process start each time. Now we update this.man.since (and persist it) from change.seq after each entry is processed, so reconnects resume where we left off.

PeerStorage: fix dedup-cache key mismatch. put()/delete() keyed isRepeating() by `lp` (baseDir-prefixed) while dispatch()/dispatchDeleted() used the relative path. With baseDir != "" the keys never collided, so the LRU never short-circuited the echo of our own writes. The echo was still caught by isChanged()/the CouchDB content check, but the cache was effectively dead. Use the global pathSrc on both sides.

Peer: fix compareDate int32 truncation. `~~(a?.mtime ?? 0 / 1000)` parses as `~~(a?.mtime ?? (0 / 1000))` because `/` binds tighter than `??`, so the division never reaches mtime and `~~` truncates the raw millisecond value to int32. 2026-era millisecond timestamps are far larger than 2^31, so the truncated value wraps and the comparison becomes effectively random, breaking the "same-content within an hour, skip the write" optimization in PeerCouchDB.put. Return the delta in whole seconds via Math.floor((mtime ?? 0) / 1000), matching the caller's 3600-second threshold.

Verified against a local CouchDB: chunk poisoning is detected and the bridge refuses the write rather than corrupting /vault. Normal sync (storage→couchdb and couchdb→storage) is unaffected.
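
The compareDate truncation is easy to reproduce in isolation. The sketch below assumes the comparison takes two objects with a millisecond mtime and returns their difference; the Stat type, the function names, and the sample timestamps are illustrative and not taken from the bridge.

```typescript
type Stat = { mtime?: number }; // mtime in milliseconds (assumed shape)

// Reconstruction of the buggy expression from the commit message. `/` binds
// tighter than `??`, so this is `~~(a?.mtime ?? (0 / 1000))`: no division
// reaches mtime, and `~~` truncates the millisecond value to a signed int32.
function buggyDelta(a?: Stat, b?: Stat): number {
  return ~~(a?.mtime ?? 0 / 1000) - ~~(b?.mtime ?? 0 / 1000);
}

// The form described as the fix: whole seconds, no int32 truncation.
function fixedDelta(a?: Stat, b?: Stat): number {
  return Math.floor((a?.mtime ?? 0) / 1000) - Math.floor((b?.mtime ?? 0) / 1000);
}

// Two hypothetical mid-2026 timestamps exactly one second apart, chosen so
// that the int32 wrap point falls between them.
const older: Stat = { mtime: 1_780_263_944_000 };
const newer: Stat = { mtime: 1_780_263_945_000 };

console.log(fixedDelta(older, newer)); // -1
console.log(buggyDelta(older, newer)); // billions, far beyond a 3600 s threshold
```

For most timestamp pairs the buggy form merely returns the difference in milliseconds instead of seconds, which already defeats a 3600-second threshold; pairs that straddle a wrap point, as here, additionally get the wrong sign and magnitude.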

Two production-only fixes that the previous commit's `since` advance depends on to actually work in the user's Docker setup.

main.ts: circuit breaker for the persistent AsyncWrap watch loop. Observed in production: pouchdb-adapter-http's changes-feed retry chain (the lib's .on("error") → setTimeout(10s) → beginWatch) hits AsyncWrap on every single reconnect attempt, indefinitely. The previous patch's unhandled-rejection swallow keeps the process alive, but the watch never recovers in-process because the node-fetch socket pool stays broken. Only a fresh Deno process gets a clean state. Count AsyncWrap rejections in a 5-minute sliding window and exit when the threshold (30) is reached. That's roughly "errors firing every 10s for 5 minutes," which is the signature of the broken-loop state and unambiguously not a transient burst. Docker's restart policy brings us back clean; the since checkpoint from the previous commit ensures we resume mid-stream instead of starting over from "now".

PeerCouchDB: persist `since` and `remote-created` to a JSON file in dat/. The since fix from the previous commit wrote to localStorage, which under the user's compose lives at /deno-dir/location_data/<hash>/local_storage, inside the container's ephemeral fs. Every Docker restart wipes it, so the checkpoint never survived across the kind of process exit the new circuit breaker triggers. The /app/dat volume IS mounted (the bridge-state named volume), so we shadow the two checkpoints that matter for resume correctness into dat/state-<peer-name>.json there. localStorage is still written as a legacy shadow but is no longer authoritative. `remote-created` had to come along because without it, the start() path ("Remote database looks like rebuilt. fetch from the first again.") fires on every restart with a wiped localStorage and resets since=0, undoing the file-based checkpoint. Persisting both fixes that.
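
The sliding-window circuit breaker described in this commit can be sketched as a small counter that forgets failures older than the window. The class and method names are hypothetical; only the threshold (30) and the window (5 minutes) come from the commit message.

```typescript
// Counts failures inside a sliding time window and reports when the count
// reaches the threshold. Timestamps are passed in so the logic is testable.
class SlidingWindowBreaker {
  private readonly threshold: number;
  private readonly windowMs: number;
  private hits: number[] = [];

  constructor(threshold = 30, windowMs = 5 * 60 * 1000) {
    this.threshold = threshold;
    this.windowMs = windowMs;
  }

  // Record one failure at time `now` (ms). Returns true once the window
  // holds at least `threshold` failures.
  record(now: number = Date.now()): boolean {
    this.hits.push(now);
    const cutoff = now - this.windowMs;
    // Drop failures that have aged out of the window.
    while (this.hits.length > 0 && this.hits[0] <= cutoff) this.hits.shift();
    return this.hits.length >= this.threshold;
  }
}

// One failure every 10 s, as in the broken reconnect loop.
const breaker = new SlidingWindowBreaker();
let tripped = false;
for (let i = 0; i < 30; i++) tripped = breaker.record(i * 10_000);
console.log(tripped); // true: the 30th failure still sees the other 29 in-window
```

When record() returns true the caller would presumably log and exit the process (in Deno, for example via Deno.exit(1)) so the container restart policy can start a fresh one. A short burst followed by silence never trips, because its entries age out before the count can reach the threshold.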
Also: localStorage reads/writes in start() are now wrapped (tryGetSetting / trySetSetting) so a broken/partially-wiped backing store can't crash the peer before beginWatch is even reached. State writes use a trailing-edge 500ms debounce so a burst of changes turns into one volume write per window, not one per change.

Verified locally: state-server.json is written on each change, survives a process restart with localStorage fully wiped, and the bridge resumes with "Watch starting from <persisted seq>" instead of "looks like rebuilt".
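
A trailing-edge debounce of the kind mentioned for state writes can be sketched as follows. The names debounceTrailing and Scheduler are hypothetical, and the injectable scheduler exists only so the behaviour can be shown without waiting; the real code presumably uses timers directly.

```typescript
// Minimal timer abstraction so the debounce can be driven by hand in examples.
type Scheduler = {
  set: (fn: () => void, ms: number) => unknown;
  clear: (handle: unknown) => void;
};

const timerScheduler: Scheduler = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

// Trailing-edge debounce: every call restarts the timer, and only the most
// recent value is written once the calls stop for `waitMs`.
function debounceTrailing<T>(
  write: (latest: T) => void,
  waitMs = 500,
  scheduler: Scheduler = timerScheduler,
): (value: T) => void {
  let handle: unknown;
  let pending = false;
  let latest: T | undefined;
  return (value: T) => {
    latest = value;
    if (pending) scheduler.clear(handle);
    pending = true;
    handle = scheduler.set(() => {
      pending = false;
      write(latest as T);
    }, waitMs);
  };
}

// Hand-cranked scheduler: holds at most one queued callback.
const queued: Array<() => void> = [];
const manual: Scheduler = {
  set: (fn) => { queued.length = 0; queued.push(fn); return fn; },
  clear: () => { queued.length = 0; },
};

const writes: number[] = [];
const saveSince = debounceTrailing<number>((seq) => { writes.push(seq); }, 500, manual);
saveSince(101);
saveSince(102);
saveSince(103);
queued.pop()?.(); // the 500 ms window elapses
console.log(writes); // [103]
```

The trade-off is that a crash inside the window loses at most the last unsaved value, which only means a slightly older `since` and a small replay on restart.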

The bridge runs as a CouchDB<->filesystem relay in Docker and never uses the trystero P2P transport. It was previously stripped at image-build time via `sed -i '/"trystero":/d' deno.jsonc` in the Dockerfile; move that into the repo so a plain `deno install` / `deno task run` works without the GitHub-hosted trystero import (which also avoids a needless network dependency during install).

Point the commonlib submodule at the V3 Rabin-Karp backport (jagajaga/livesync-commonlib@b354ef5, splitPiecesRabinKarp from vrtmrz/livesync-commonlib@6abcea69 on top of the pinned 0.25.25). This makes the chunks the bridge writes back to CouchDB use the same content-defined boundaries as LiveSync V3 (Fine Deduplication) clients, so write-back deduplicates against chunks produced by v0.25.65+.

Verified against live data: the new splitter reproduces stored chunk IDs exactly (in order) for real post-V3 notes, and the bridge already reads/decrypts V3 chunks correctly with the pinned library.

Note for upstream: the submodule URL is temporarily repointed to a personal fork because the splitter change is a single-function backport onto the pinned commonlib. Maintainers may prefer to land the equivalent splitPiecesRabinKarp change in vrtmrz/livesync-commonlib and repoint here.

jagajaga and others added 5 commits May 26, 2026 04:25

jagajaga mentioned this pull request May 31, 2026

Bridge hardening: AsyncWrap survival, chunk integrity, recovery checkpoints #48

Closed

jagajaga wants to merge 5 commits into vrtmrz:main from jagajaga:feat/v3-chunk-splitter
