fix(sdk): bound browser decrypt memory for large chunker downloads #922
bnrobinson93 merged 7 commits into main
Conversation
📝 Walkthrough

This PR refactors the AES-GCM decryption pipeline to accept both `ArrayBuffer` and `Uint8Array` ciphertext inputs, normalizing them to `Uint8Array` views to avoid intermediate copies.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Caller
    participant AesGcmCipher
    participant BrowserNativeCryptoService
    participant DefaultCryptoService
    participant SubtleCrypto
    Caller->>AesGcmCipher: decrypt(ArrayBuffer | Uint8Array, key, iv)
    AesGcmCipher->>AesGcmCipher: normalize input to Uint8Array
    AesGcmCipher->>AesGcmCipher: processGcmPayload (IV + ciphertext+tag)
    alt cryptoService.name == BrowserNativeCryptoService
        AesGcmCipher->>BrowserNativeCryptoService: decryptBufferSource(subarray(12), key, subarray(0,12))
        BrowserNativeCryptoService->>SubtleCrypto: decrypt(params, key, ciphertext+tag)
        SubtleCrypto-->>BrowserNativeCryptoService: plaintext
        BrowserNativeCryptoService-->>AesGcmCipher: DecryptResult
    else other CryptoService
        AesGcmCipher->>DefaultCryptoService: decrypt(payload, key, payloadIv, AES_256_GCM)
        DefaultCryptoService->>SubtleCrypto: decrypt(params, key, ciphertext+tag)
        SubtleCrypto-->>DefaultCryptoService: plaintext
        DefaultCryptoService-->>AesGcmCipher: DecryptResult
    end
    AesGcmCipher-->>Caller: DecryptResult
```
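The normalization step in the diagram can be sketched as follows. This is an illustrative shape, not the SDK's verbatim code: the helper names (`toUint8Array`, `splitGcmPayload`) are assumptions; the 12-byte IV split matches the `subarray(0,12)` / `subarray(12)` calls shown above.

```typescript
// Sketch (assumed shape, not the exact SDK code): normalize the widened
// input type once, then operate on zero-copy views of the payload.
const IV_LENGTH = 12; // AES-GCM IV length in bytes, per the diagram's subarray bounds

function toUint8Array(data: ArrayBuffer | Uint8Array): Uint8Array {
  // A Uint8Array passes through untouched; an ArrayBuffer gets a view
  // created over it. Neither branch copies the underlying bytes.
  return data instanceof Uint8Array ? data : new Uint8Array(data);
}

function splitGcmPayload(data: ArrayBuffer | Uint8Array): {
  iv: Uint8Array;
  ciphertextWithTag: Uint8Array;
} {
  const input = toUint8Array(data);
  // subarray() returns views sharing the same ArrayBuffer, avoiding the
  // intermediate copies this PR is trying to eliminate.
  return {
    iv: input.subarray(0, IV_LENGTH),
    ciphertextWithTag: input.subarray(IV_LENGTH),
  };
}
```

Either view can then be handed to `SubtleCrypto.decrypt`, which accepts any `BufferSource`.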
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Code Review
This pull request optimizes AES-GCM decryption by supporting Uint8Array and BufferSource directly, reducing buffer copies, and implements memory management for streaming decryption by clearing consumed segments. It also introduces comprehensive debug logging for the decryption process. Feedback identifies a redundant IV assignment in the symmetric decryption logic and recommends removing the diagnostic logging and associated array mappings to prevent performance overhead in production.
Force-pushed 016f46f to 92696f7
Force-pushed 8f7174c to 1d6e356
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lib/tdf3/src/tdf.ts (1)
1271-1301: ⚠️ Potential issue | 🟠 Major

Use subtraction instead of modulo for slice-relative offsets.

Line 1276 computes the in-buffer offset with `% slice[0].encryptedOffset`, which can point to the wrong bytes when encrypted segment sizes vary. The fetched buffer starts at the first chunk's encrypted offset, so each segment offset should be `encryptedOffset - firstChunk.encryptedOffset`.

🐛 Proposed fix

```diff
 export async function sliceAndDecrypt({
   buffer,
   reconstructedKey,
   slice,
@@
   specVersion: string;
 }) {
-  for (const index in slice) {
-    const chunk = slice[index];
+  const firstChunk = slice[0];
+  if (!firstChunk) {
+    return;
+  }
+
+  for (const chunk of slice) {
     const { encryptedOffset, encryptedSegmentSize, plainSegmentSize } = chunk;
-    const offset =
-      slice[0].encryptedOffset === 0 ? encryptedOffset : encryptedOffset % slice[0].encryptedOffset;
+    const offset = encryptedOffset - firstChunk.encryptedOffset;
     const encryptedChunk = buffer.subarray(offset, offset + (encryptedSegmentSize as number));
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/tdf3/src/tdf.ts` around lines 1271 - 1301, The offset calculation for reading each chunk from the buffer is wrong: replace the modulo-based computation with a slice-relative subtraction so each chunk's offset is computed as encryptedOffset - slice[0].encryptedOffset before slicing the buffer; update the code around the loop that computes offset (referencing slice, encryptedOffset, slice[0].encryptedOffset and where encryptedChunk is assigned) to use subtraction instead of `%`, ensuring the encryptedChunk subarray start/end use that new offset when passed to decryptChunk and subsequent size checks/rejects.
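The subtraction-based offset math in that proposed fix can be distilled into a small helper. This is a sketch with hypothetical names (`Chunk`, `chunkView`), not the actual `sliceAndDecrypt` code; it only demonstrates why subtraction is correct where modulo is not.

```typescript
// Hypothetical chunk shape, trimmed to the fields the offset math needs.
interface Chunk {
  encryptedOffset: number;      // absolute offset within the whole payload
  encryptedSegmentSize: number; // bytes of ciphertext + tag for this segment
}

// The fetched buffer begins at the first chunk's encryptedOffset, so each
// chunk's position inside it is a plain subtraction. Modulo only happens to
// agree when offsets stay below 2x the base, e.g. it fails for offset 250
// with base 100 (subtraction: 150, modulo: 50).
function chunkView(buffer: Uint8Array, slice: Chunk[], index: number): Uint8Array {
  const base = slice[0].encryptedOffset;
  const { encryptedOffset, encryptedSegmentSize } = slice[index];
  const offset = encryptedOffset - base;
  return buffer.subarray(offset, offset + encryptedSegmentSize);
}
```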
🧹 Nitpick comments (1)
lib/tests/mocha/unit/crypto/crypto-service.spec.ts (1)
248-263: Add coverage for the browser-native decrypt branch.

This test uses `DefaultCryptoService`, so it primarily covers the `processGcmPayload` fallback path. Please add a case for the `BrowserNativeCryptoService` path in `AesGcmCipher.decrypt`, ideally with a non-zero `byteOffset` `Uint8Array` view, to lock in the subarray handling this PR depends on.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/tests/mocha/unit/crypto/crypto-service.spec.ts` around lines 248 - 263, Add a new unit test that exercises the BrowserNativeCryptoService branch in AesGcmCipher.decrypt by instantiating AesGcmCipher with BrowserNativeCryptoService (not DefaultCryptoService), encrypting a payload, then decrypting using a Uint8Array view that has a non-zero byteOffset (e.g., a subarray or a larger buffer slice) to ensure the BrowserNativeCryptoService path and subarray handling are tested; reuse importSymmetricKey, Binary and the existing encrypt call to produce ciphertext and assert decrypted.payload.asString() equals the original string.
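The view geometry such a test needs to pin down can be sketched independently of the cipher wiring. `makeOffsetView` is a hypothetical test helper, not part of the SDK: it builds a `Uint8Array` whose `byteOffset` is non-zero, so any decrypt code that wrongly does `new Uint8Array(view.buffer)` instead of honoring `byteOffset`/`byteLength` would read the padding bytes.

```typescript
// Hypothetical helper: wrap a payload in a larger backing buffer so the
// returned view has byteOffset !== 0. The 0xff padding makes accidental
// whole-buffer reads detectable in assertions.
function makeOffsetView(payload: Uint8Array, pad = 5): Uint8Array {
  const backing = new Uint8Array(pad + payload.length + pad).fill(0xff);
  backing.set(payload, pad);
  // A view into the middle of the larger buffer, sharing its memory.
  return backing.subarray(pad, pad + payload.length);
}
```

A spec could feed such a view to `AesGcmCipher.decrypt` under `BrowserNativeCryptoService` and assert the round-tripped plaintext matches the original.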
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/tdf3/src/ciphers/aes-gcm-cipher.ts`:
- Around line 64-82: The decrypt method of AesGcmCipher currently types its
first parameter as Uint8Array which breaks when callers pass an ArrayBuffer;
change the decrypt signature to accept ArrayBuffer | Uint8Array and normalize
the input to a Uint8Array at the top of AesGcmCipher.decrypt (e.g. replace
usages of buffer with a local const like input = new Uint8Array(buffer) or
similar) before calling buffer.subarray(), decryptBufferSource,
processGcmPayload, or this.cryptoService.decrypt so all downstream calls work
regardless of the original binary type.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5df44342-46c1-436a-ab29-e0f0aede5e39
📒 Files selected for processing (5)
- lib/tdf3/src/ciphers/aes-gcm-cipher.ts
- lib/tdf3/src/crypto/core/symmetric.ts
- lib/tdf3/src/tdf.ts
- lib/tdf3/src/utils/zip-reader.ts
- lib/tests/mocha/unit/crypto/crypto-service.spec.ts
Not a blocker, but a suggestion.
The existing huge.spec.ts test always passes because the CI runner has enough RAM to hold 3 GB of leaked plaintext without the browser crashing. It tests correctness (output size) but not the memory invariant.
A simple way to make it catch this class of bug: cap the browser's heap size via Playwright launch args. If you limit V8's old-space to 512 MB, then 3 GB of retained plaintext will OOM but a properly streaming decrypt that releases segments will fit fine.
```ts
test.use({
  launchOptions: {
    args: ['--js-flags=--max-old-space-size=512'],
  },
});
```
This directly enforces the invariant that peak heap scales with segment size and concurrency, not total file size. It would have caught the original bug without needing a bigger file or longer CI time.
I've created a PR against main with that update; without the fix, we expect it to fail:
#925
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.github/workflows/reusable_build-and-test.yaml (1)
285-293: ⚠️ Potential issue | 🟡 Minor

Verify `ci` aggregation semantics with the added `platform-huge` dependency.

The success gate `contains(needs.*.result, 'success')` (line 300) passes if any need succeeded, and the failure gate `contains(needs.*.result, 'failure')` (line 305) only trips on `failure`, not `cancelled` or `skipped`. With `platform-huge` added, if it is ever skipped (e.g., via a future `if:` gate) or cancelled due to timeout, `ci` may still report success as long as at least one other job passed. This isn't new to this PR, but adding a long-running, potentially-cancelled job makes the existing logic more fragile. Consider tightening to something like `!contains(needs.*.result, 'failure') && !contains(needs.*.result, 'cancelled')` and requiring all needed jobs to have succeeded.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/reusable_build-and-test.yaml around lines 285 - 293, The ci job's current if-guards use contains(needs.*.result, 'success') and only check for 'failure', which lets ci succeed even if some needs were cancelled or skipped (especially now with platform-huge). Update the ci job's if conditions (the expressions referencing contains(needs.*.result, ...)) to require all needed jobs be successful by ensuring there are no non-success results; for example replace the success gate with a predicate that asserts there are no 'failure', 'cancelled' or 'skipped' statuses (e.g. !contains(needs.*.result,'failure') && !contains(needs.*.result,'cancelled') && !contains(needs.*.result,'skipped')) or otherwise implement an “all succeeded” check so ci only runs when every need is success.
🧹 Nitpick comments (1)
.github/workflows/reusable_build-and-test.yaml (1)
223-273: Consider deduplicating `platform-huge` with `platform-roundtrip` via a matrix.

`platform-huge` is an almost verbatim copy of `platform-roundtrip` (lines 171-221), differing only in `timeout-minutes` (45 vs 90) and `PLAYWRIGHT_TESTS_TO_RUN` (`roundtrip` vs `huge`). Any future change to the setup (checkout refs, Go/Node versions, compose steps) will have to be made in two places and will drift. Collapsing the two into a single job with a `strategy.matrix` over `{ name, tests, timeout }` would remove ~50 lines and eliminate the drift risk.

♻️ Sketch of a matrix-based consolidation

```yaml
  platform-roundtrip:
    needs: [cli, lib, web-app]
    runs-on: ubuntu-22.04
    strategy:
      fail-fast: false
      matrix:
        include:
          - name: roundtrip
            tests: roundtrip
            timeout: 45
          - name: huge
            tests: huge
            timeout: 90
    name: platform-${{ matrix.name }}
    timeout-minutes: ${{ matrix.timeout }}
    defaults:
      run:
        working-directory: .github/workflows/roundtrip
    steps:
      # ... existing steps unchanged ...
      - env:
          PLAYWRIGHT_TESTS_TO_RUN: ${{ matrix.tests }}
        run: ./wait-and-test.sh platform
```

Then in the `ci` job `needs:` list, a single `platform-roundtrip` entry covers both matrix legs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/reusable_build-and-test.yaml around lines 223 - 273, The two nearly identical GitHub Actions jobs platform-huge and platform-roundtrip should be consolidated into one job using a strategy.matrix to avoid duplication; edit the workflow to replace the separate platform-huge job with a single job (e.g., platform-roundtrip) that declares strategy.matrix include entries for the two variants (name/tests/timeout), set name to include matrix.name, set timeout-minutes to use matrix.timeout, and replace the hard-coded PLAYWRIGHT_TESTS_TO_RUN env with matrix.tests so the existing steps (checkout, setup-node, setup-go, docker compose, ./wait-and-test.sh platform) are reused for both legs without duplicating the steps.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/reusable_build-and-test.yaml:
- Around line 223-232: The platform-huge job currently runs on every workflow
and blocks the ci job; update the platform-huge job (job name "platform-huge")
to gate it with an if condition (e.g., only run on push, scheduled/nightly,
workflow_dispatch, or when a PR has a specific label like "run-huge") so it does
not execute on every PR, and then remove "platform-huge" from the ci job's needs
list (or change the ci job success logic) so skipped runs no longer block
merges; ensure the gating condition is added to the platform-huge job and
references the same job name so downstream dependencies remain correct.
In `@web-app/tests/tests/huge.spec.ts`:
- Around line 53-56: The test awaits page.waitForEvent('download') too early:
remove the leading await so that plainDownloadPromise is the unresolved promise
returned by page.waitForEvent('download', { timeout: 60000 }) before triggering
the download; then perform page.locator('#fileSink').click() and
page.locator('#decryptButton').click(), and finally await plainDownloadPromise
to capture the decrypt-triggered download. Locate the occurrences of
plainDownloadPromise, page.waitForEvent, and the '#decryptButton'/'#fileSink'
locators in huge.spec.ts and change the assignment to store the promise (no
await) prior to clicking.
- Around line 22-24: The test references appUrl in the Large File test which is
undefined and causes page.goto(`${appUrl}?...`) to navigate to an invalid URL;
fix by either (A) defining a constant appUrl (e.g., const appUrl =
'http://localhost:65432') at the top of the test file where authorize and
test('Large File') are declared, (B) export appUrl from acts.ts and import it
into this spec, or (C) avoid appUrl entirely and call page.goto with a relative
path leveraging Playwright's baseURL; update the reference used in page.goto to
use the chosen valid appUrl or relative path.
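The ordering described in the `huge.spec.ts` download comment is the standard register-then-trigger pattern: obtain the waiter promise before the action that fires the event, and only await it afterward. A minimal sketch with a stand-in one-shot emitter (Playwright's real `page.waitForEvent('download')` behaves analogously); `OneShot` and `downloadFlow` are hypothetical names for illustration:

```typescript
// Minimal one-shot emitter standing in for Playwright's page object.
class OneShot<T> {
  private resolve?: (value: T) => void;
  waitForEvent(): Promise<T> {
    // Returns an unresolved promise; do NOT await it yet.
    return new Promise((res) => (this.resolve = res));
  }
  emit(value: T): void {
    this.resolve?.(value);
  }
}

async function downloadFlow(page: OneShot<string>): Promise<string> {
  // No `await` here: we only register interest in the event, so a download
  // fired by the upcoming click cannot be missed.
  const downloadPromise = page.waitForEvent();
  page.emit('file.bin'); // stands in for clicking #fileSink / #decryptButton
  return await downloadPromise; // resolves with the captured download
}
```

Awaiting `page.waitForEvent` before the click instead makes the test hang until the timeout, since no download has been triggered yet.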
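For option (C) in the `appUrl` comment, the navigation can lean on Playwright's `baseURL` instead of a hand-rolled constant. A sketch of the config change; the port is an assumption carried over from the comment's example:

```typescript
// playwright.config.ts -- sketch: set a baseURL so specs can use relative
// paths in page.goto() instead of an appUrl constant. Port 65432 is the
// example value from the review comment, not a confirmed project setting.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    baseURL: 'http://localhost:65432',
  },
});
```

The spec then navigates with a relative path, e.g. `await page.goto('/')` plus whatever query parameters the test needs, and Playwright resolves it against `baseURL`.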
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0a76d048-faef-4e3b-800b-18377a8b8663
📒 Files selected for processing (7)
- .github/workflows/build-and-test.yaml
- .github/workflows/reusable_build-and-test.yaml
- lib/tdf3/src/ciphers/aes-gcm-cipher.ts
- lib/tdf3/src/ciphers/symmetric-cipher-base.ts
- lib/tdf3/src/models/encryption-information.ts
- lib/tests/mocha/unit/crypto/crypto-service.spec.ts
- web-app/tests/tests/huge.spec.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- lib/tests/mocha/unit/crypto/crypto-service.spec.ts
X-Test Failure Report: ✅ java-main
Force-pushed 1d03e62 to bfb9692
Summary
Fixes large browser decrypts from chunker sources that failed near EOF due to memory pressure in @opentdf/sdk.
Main bug upstream: consumed decrypted segments stayed referenced in the decrypt queue after enqueue, so long-running browser decrypts retained plaintext history and eventually killed the tab. This change releases consumed plaintext promptly and also removes a few avoidable decrypt-path copies that were amplifying heap pressure.
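The release step can be sketched as follows. `Segment` and `drainDecryptQueue` are hypothetical names illustrating the invariant, not the SDK's actual types:

```typescript
// Sketch: once a decrypted segment has been handed downstream, drop the
// queue's reference so the plaintext becomes garbage-collectable instead of
// accumulating for the lifetime of the download.
type Segment = { index: number; plaintext: Uint8Array | undefined };

function drainDecryptQueue(
  queue: Segment[],
  enqueue: (bytes: Uint8Array) => void
): void {
  for (const segment of queue) {
    if (!segment.plaintext) continue; // already released
    enqueue(segment.plaintext);
    // Release the consumed plaintext; peak memory now scales with segment
    // size and decrypt concurrency, not with total file size.
    segment.plaintext = undefined;
  }
}
```

Without the final assignment, every decrypted segment stays reachable through the queue, which is the retention pattern this PR removes.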
Validated with a browser-local harness using local @opentdf/sdk:
Root Cause
Browser decrypt path for chunker sources was not truly bounded in resident memory.
Most important issue:
Additional pressure came from extra buffer copies in the AES-GCM/browser decrypt path.
What Changed
Core fix
Memory / copy reductions
Validation
Reproduced in standalone browser harness outside downstream app/authz/transport path.
Observed before:
Observed after:
Risk / Notes
How To Test
Summary by CodeRabbit
Bug Fixes
Tests
Chores