fix: persist symmetric keys + byte-identical metadata; stop wiping keys on keybox edits#22
Conversation
…ys on keybox edits
Five issues that together caused keystore-pinned apps to be silently
logged out across reboots and config changes. All flow from the same
root cause: GeneratedKeyPersistence loses information on save -> reload.
1. Symmetric keys (AES, HMAC, 3DES) were never persisted at all
- GeneratedKeyPersistence.save only accepted KeyPair, ignoring SecretKey
- AndroidX security MasterKey (AES-GCM-256) regenerated on every
reboot, making EncryptedSharedPreferences undecryptable
- Apps that wrap session tokens in EncryptedSharedPreferences
interpret this as session expiry and force a relogin
2. Restored KeyMetadata authorizations differed from generation-time bytes
- loadPersistedKeys rebuilt KeyMintAttestation with mostly null/empty
fields, so toAuthorizations emitted a different tag set after
reboot vs. at generateKey time
- Apps that fingerprint metadata across keystore calls saw a
"changed key"
3. certificate / certificateChain split could shift after restore
- buildKeyEntryResponse called updateCertificateChain on the rebuilt
metadata, which is allowed to repartition leaf vs. chain bytes
- Apps with strict leaf fingerprint checks saw a "changed cert"
4. Touching ANY .xml under /data/adb/tricky_store wiped every cached key
- ConfigObserver called clearAllGeneratedKeys() which also calls
GeneratedKeyPersistence.deleteAll()
- Editing keybox.xml (or any unrelated .xml) thus deleted every
persisted key on disk
- Even the keybox-cache argument does not justify wiping per-app keys:
patched chains alone are stale, raw keypairs are not
5. SoftwareOperation NPE when restored keyParams missed PURPOSE tag
- Init dereferenced keyPair!! before checking purpose, so a
half-restored record crashed instead of producing a clean error
Single on-disk format (FORMAT_VERSION = 3) covers everything: PKCS8
private key bytes for asymmetric, raw secret bytes for symmetric, plus
the byte-identical KeyMetadata parcel snapshot so authorizations
restore exactly. Earlier dev-only formats are silently skipped by the
loader; the next generateKey for those aliases re-creates them in v3.
ConfigObserver now calls invalidatePatchedChains() instead of
clearAllGeneratedKeys() on .xml edits - only the chain cache is
stale, not the underlying keypairs.
Tested on OnePlus 13 (Android 16, KSU 3.2.4):
- Apps survive force-stop + cold reboot without losing keystore state
- Apps survive keybox.xml edits / replacements (touch, sed, cp -mv)
- Tamper score still 4 (CONSISTENT) on Duck Detector
- KeyAttestation chain output unchanged
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughThis PR enhances key persistence to capture byte-identical metadata snapshots and symmetric key material, enabling loss-less key restoration across reboots. The persistence format is bumped to v3, interception points snapshot metadata during generation, restore paths unmarshal metadata with fallback rebuilding, authorization security levels are corrected, and operation initialization becomes defensive against missing authorizations. ChangesMetadata-preserving key persistence
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR updates the software-keystore path to persist both asymmetric and symmetric key material across reboots, and to restore KeyMetadata byte-identically (via a marshalled parcel snapshot) so apps that fingerprint keystore metadata don’t treat restored aliases as “changed”. It also stops wiping all persisted keys on keybox .xml edits by only invalidating patched certificate chains.
Changes:
- Persist symmetric keys (AES/HMAC/3DES) alongside algorithm + metadata snapshot, and restore them on boot.
- Persist a byte-identical
KeyMetadataparcel snapshot and prefer restoring from it; fall back to rebuilding from primitive fields when unavailable. - Avoid deleting all generated/persisted keys on keybox edits; only clear patched chains. Add KEY_ID-domain lookup for
getKeyEntry.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| app/src/main/java/org/matrix/TEESimulator/interception/keystore/shim/SoftwareOperation.kt | Adds defensive handling for missing PURPOSE / missing key material during operation init. |
| app/src/main/java/org/matrix/TEESimulator/interception/keystore/shim/KeyMintSecurityLevelInterceptor.kt | Persists symmetric keys + metadata snapshots; restores persisted keys using metadata bytes; adds patched-chain invalidation helper; adjusts keystore-enforced auth tagging. |
| app/src/main/java/org/matrix/TEESimulator/interception/keystore/shim/GeneratedKeyPersistence.kt | Bumps persistence format to v3 and adds fields for metadata snapshot + symmetric key block. |
| app/src/main/java/org/matrix/TEESimulator/interception/keystore/Keystore2Interceptor.kt | Handles KEY_ID-based getKeyEntry lookup from the in-memory generated-key cache; persists metadata snapshot for generated attest keys. |
| app/src/main/java/org/matrix/TEESimulator/config/ConfigurationManager.kt | Stops wiping generated/persisted keys on .xml edits; only invalidates patched chains. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
app/src/main/java/org/matrix/TEESimulator/interception/keystore/shim/GeneratedKeyPersistence.kt (1)
134-156:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAlign the v3 writer and reader bounds.
Line 134 onward can write any
certChain.sizeandmetadataBytes.size, but Line 283 and Line 457 rejectcertCount > 10, and Line 291 and Line 465 rejectmetaLen > 256 KiB. A large chain fromupdateSubcomponent()or a larger metadata snapshot will therefore persist successfully and then be skipped as "corrupted" on the next boot.Possible fix
object GeneratedKeyPersistence { + private const val MAX_CERT_COUNT = 10 + private const val MAX_METADATA_BYTES = 1024 * 1024 + fun save( keyId: KeyIdentifier, keyPair: KeyPair?, secretKey: javax.crypto.SecretKey?, @@ metadataBytes: ByteArray? = null, ) { require(keyPair != null || secretKey != null) { "Either keyPair or secretKey must be provided" } + require(certChain.size <= MAX_CERT_COUNT) { + "certChain too long: ${certChain.size} (max $MAX_CERT_COUNT)" + } + require((metadataBytes?.size ?: 0) <= MAX_METADATA_BYTES) { + "metadata snapshot too large: ${metadataBytes?.size ?: 0} (max $MAX_METADATA_BYTES)" + } @@ - val certCount = requireBounds(input.readInt(), 10, "certCount") + val certCount = requireBounds(input.readInt(), MAX_CERT_COUNT, "certCount") @@ - val metaLen = requireBounds(input.readInt(), 256 * 1024, "metaLen") + val metaLen = requireBounds(input.readInt(), MAX_METADATA_BYTES, "metaLen") @@ - val certCount = requireBounds(input.readInt(), 10, "certCount") + val certCount = requireBounds(input.readInt(), MAX_CERT_COUNT, "certCount") @@ - val metaLen = requireBounds(input.readInt(), 256 * 1024, "metaLen") + val metaLen = requireBounds(input.readInt(), MAX_METADATA_BYTES, "metaLen")Also applies to: 283-294, 457-474
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/main/java/org/matrix/TEESimulator/interception/keystore/shim/GeneratedKeyPersistence.kt` around lines 134 - 156, The writer in GeneratedKeyPersistence currently writes arbitrary certChain.size and metadataBytes sizes, but the v3 reader later rejects certCount > 10 and metaLen > 256 KiB; update the writer (the v3 persistence branch in GeneratedKeyPersistence where it writes certChain and metadata snapshot) to enforce the same bounds as the reader: ensure certChain.size is <= 10 (either throw an IOException or trim the chain before writing) and ensure metadataBytes.size <= 256 * 1024 (either throw or truncate) and then write the limited sizes/bytes; use the same error handling semantics as the reader so data written will not be considered "corrupted" on read.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@app/src/main/java/org/matrix/TEESimulator/interception/keystore/shim/KeyMintSecurityLevelInterceptor.kt`:
- Around line 582-598: The async persistExecutor.execute {
GeneratedKeyPersistence.save(...) } used in KeyMintSecurityLevelInterceptor
returns success before the first save completes, risking lost or resurrected
aliases; change the first persistence (the GeneratedKeyPersistence.save call
inside the persistExecutor block that runs when creating/importing an alias) to
be performed synchronously (either call GeneratedKeyPersistence.save directly on
the current thread or submit a Callable/Runnable and wait for completion) and
ensure the save is durably flushed (use the persistence API's
synchronous/transactional/flush/fsync facility if available) before returning
the alias; apply the same change to the other occurrence of
GeneratedKeyPersistence.save in the class (the block around lines 654-669) so
both initial saves complete durably before reporting success.
In
`@app/src/main/java/org/matrix/TEESimulator/interception/keystore/shim/SoftwareOperation.kt`:
- Around line 244-255: Replace the requireNotNull checks in the KeyPurpose.SIGN,
KeyPurpose.VERIFY and KeyPurpose.AGREE_KEY branches so they throw the same
ServiceSpecificException used elsewhere instead of an IllegalArgumentException;
use the Elvis operator on keyPair (e.g. keyPair ?: throw
ServiceSpecificException(KeystoreErrorCodes.unsupportedPurpose)) before
constructing Signer(keyPair, params), Verifier(keyPair, params) or the AGREE_KEY
counterpart, and include txId in the exception message if desired to match
existing logging/formatting conventions.
---
Outside diff comments:
In
`@app/src/main/java/org/matrix/TEESimulator/interception/keystore/shim/GeneratedKeyPersistence.kt`:
- Around line 134-156: The writer in GeneratedKeyPersistence currently writes
arbitrary certChain.size and metadataBytes sizes, but the v3 reader later
rejects certCount > 10 and metaLen > 256 KiB; update the writer (the v3
persistence branch in GeneratedKeyPersistence where it writes certChain and
metadata snapshot) to enforce the same bounds as the reader: ensure
certChain.size is <= 10 (either throw an IOException or trim the chain before
writing) and ensure metadataBytes.size <= 256 * 1024 (either throw or truncate)
and then write the limited sizes/bytes; use the same error handling semantics as
the reader so data written will not be considered "corrupted" on read.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a13829e1-c5d8-429a-bd79-caa4594d46e2
📒 Files selected for processing (5)
app/src/main/java/org/matrix/TEESimulator/config/ConfigurationManager.ktapp/src/main/java/org/matrix/TEESimulator/interception/keystore/Keystore2Interceptor.ktapp/src/main/java/org/matrix/TEESimulator/interception/keystore/shim/GeneratedKeyPersistence.ktapp/src/main/java/org/matrix/TEESimulator/interception/keystore/shim/KeyMintSecurityLevelInterceptor.ktapp/src/main/java/org/matrix/TEESimulator/interception/keystore/shim/SoftwareOperation.kt
| persistExecutor.execute { | ||
| GeneratedKeyPersistence.save( | ||
| keyId = keyId, | ||
| keyPair = null, | ||
| secretKey = secretKey, | ||
| nspace = keyDescriptor.nspace, | ||
| securityLevel = securityLevel, | ||
| certChain = emptyList(), | ||
| algorithm = parsedParams.algorithm, | ||
| keySize = parsedParams.keySize, | ||
| ecCurve = parsedParams.ecCurve ?: 0, | ||
| purposes = parsedParams.purpose, | ||
| digests = parsedParams.digest, | ||
| isAttestationKey = false, | ||
| metadataBytes = metadataBytesForSymmetric, | ||
| ) | ||
| } |
There was a problem hiding this comment.
Make the first persist durable before returning the alias.
Line 582 and Line 654 enqueue save() and return success immediately. If the process is force-stopped/rebooted, or the alias is deleted/imported before that task runs, the record can be lost or a late save can resurrect a deleted key on disk. That leaves the reboot/delete semantics nondeterministic again.
Also applies to: 654-669
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@app/src/main/java/org/matrix/TEESimulator/interception/keystore/shim/KeyMintSecurityLevelInterceptor.kt`
around lines 582 - 598, The async persistExecutor.execute {
GeneratedKeyPersistence.save(...) } used in KeyMintSecurityLevelInterceptor
returns success before the first save completes, risking lost or resurrected
aliases; change the first persistence (the GeneratedKeyPersistence.save call
inside the persistExecutor block that runs when creating/importing an alias) to
be performed synchronously (either call GeneratedKeyPersistence.save directly on
the current thread or submit a Callable/Runnable and wait for completion) and
ensure the save is durably flushed (use the persistence API's
synchronous/transactional/flush/fsync facility if available) before returning
the alias; apply the same change to the other occurrence of
GeneratedKeyPersistence.save in the class (the block around lines 654-669) so
both initial saves complete durably before reporting success.
There was a problem hiding this comment.
The async persistExecutor.execute { save() } pattern was inherited from the existing asymmetric path in doSoftwareKeyGen — making generateKey synchronous on the persistence I/O would add fsync latency to every keystore call (5-50ms typical, more under load). I think durability vs. throughput here is a deliberate architectural trade-off that's broader than this PR's scope: persistence is best-effort on crash, but cleanupKeyData still calls delete() synchronously so the delete-then-resurrect race you describe is bounded to "process killed within ~1ms of generate returning".
There was a problem hiding this comment.
@Andrea-lyz, that's a well-reasoned trade-off. The key safety argument — that cleanupKeyData calls delete() synchronously, so the only remaining window for a "late save resurrects a deleted alias" race is the ~1ms between generate returning and the executor task landing on disk — is convincing enough that blocking the caller for an fsync round-trip isn't justified here.
Would you like me to open a follow-up GitHub issue to track the durability concern (e.g. adding an optional awaitPersisted() / flush() API on GeneratedKeyPersistence for callers that need stronger guarantees without penalising the common path)? That way it stays on the radar without coupling it to this PR.
✏️ Learnings added
Learnt from: Andrea-lyz
Repo: Enginex0/TEESimulator-RS PR: 22
File: app/src/main/java/org/matrix/TEESimulator/interception/keystore/shim/KeyMintSecurityLevelInterceptor.kt:582-598
Timestamp: 2026-05-16T17:16:00.892Z
Learning: In `KeyMintSecurityLevelInterceptor.kt` (Kotlin, Android), the `GeneratedKeyPersistence.save(...)` calls inside `doSoftwareKeyGen` (both symmetric and asymmetric paths) are intentionally executed asynchronously via `persistExecutor.execute { ... }` rather than synchronously. This is a deliberate architectural trade-off: synchronous save would add 5–50 ms of fsync latency to every keystore generateKey call. Persistence is best-effort on crash; the delete-then-resurrect race is considered acceptable because `cleanupKeyData` calls `delete()` synchronously, bounding the problematic window to ~1 ms. Do not suggest making these saves synchronous.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
Address Copilot/CodeRabbit review feedback on the persistence PR. 1. SoftwareOperation: replace requireNotNull(keyPair) in SIGN/VERIFY/AGREE_KEY branches with ServiceSpecificException(invalidArgument). The original requireNotNull throws IllegalArgumentException, which the binder layer wraps as KEYMINT_UNKNOWN_ERROR — defeating the goal of surfacing a clean keystore-style error. Aligns with how ENCRYPT/DECRYPT already handle missing key material in the same when block. 2. loadPersistedKeys: when a symmetric record has empty metadataBytes (e.g. a save where Parcel.marshall() was empty for any reason), rebuild a minimal KeyMetadata from PersistedKeyData primitive fields instead of skipping the record. Skipping silently dropped the AES key, which is the same 'logged out after reboot' behavior the PR is trying to fix. The rebuilt metadata is structurally minimal but preserves the secret material, which is the dominant correctness concern. 3. Comment fix: rebuildResponseFromRecord docs referred to 'v2 metadata snapshot' — the format in this PR is v3.
|
Note: occasional "Register timer timing side-channel" Positive on Duck Detector Flagging this in case it shows up in review. On my device the probe sometimes reports: Register timer • bound_cpu0 • attested 0.143ms • non-attested 0.130ms • diff 0.013ms • ratio 1.100x • threshold > 1.1x • Positive A few notes: Verdict stays CONSISTENT (tier STRONGBOX, trust root Google). The signal is supplementary, attestation and trust-path checks remain aligned. |
PR #22's KEY_ID lookup at Keystore2Interceptor.onPreTransact only scanned generatedKeys, which is populated exclusively by doSoftwareKeyGen (GENERATE mode and the attest-key override path). For AUTO packages on TEE-good devices the real TEE handles generateKey and the response lands in teeResponses via the existing GENERATE_KEY_TRANSACTION post-hook, so getKeyEntry(KEY_ID) by TimingSideChannelProbe missed and the call leaked SSE. Add findTeeResponseByKeyId companion helper that mirrors findGeneratedKeyByKeyId's shape but scans teeResponses keyed by response.metadata.key.nspace. Wire it as a fallback after the existing PR #22 lookup. Behavior unchanged for GENERATE packages.
After PR #22 and the AUTO-mode extension started caching attested generateKey responses in teeResponses, KEY_ID getKeyEntry lookups for attested keys returned from memory in ~1ms while non-attested keys forwarded to real keystore2 took ~1.5ms. TimingSideChannelProbe measured the 1.55x ratio against its 1.1x threshold and flagged the asymmetry. Forward non-attested generateKey to real keystore2 with post-hook enabled (Continue instead of ContinueAndSkipPost), and extend the GENERATE_KEY post-hook to cache no-chain responses into teeResponses. The KEY_ID lookup added in the previous commit now resolves both paths from memory at matched latency. Cert-chain patching is skipped for the no-chain branch because there is no attestation extension to rewrite.
Bump OTA pointer to v6.0.0-224 and document the 59-commit delta from v6.0.0-162. Highlights: - Duck Detector TamperScore-4 cleared on Xiaomi A16 - Self-sufficient spoofing (PatchLevelManager, BulletinPoller, PIF FileObserver, vbmeta complement props) - Persistent symmetric key storage (PR #22) - Action button hardened: vol+ confirm + 22-language i18n - Build: JVM 21, gradle auto-rewrites update.json
This PR fixes five intertwined issues in
GeneratedKeyPersistencethattogether cause keystore-pinned apps to be silently logged out across
reboots and config edits. All five trace to the same root cause: the
on-disk format loses information that callers depend on.
Reproduction (any app that pins a keystore alias)
target.txtto include the appam force-stopthe app)commonly shows the login screen as if the session expired
The same regression appears when editing or replacing keybox.xml (any
write to
/data/adb/tricky_store/*.xml) — fixed by issue 4 below.Root causes & fixes
1. Symmetric keys (AES, HMAC, 3DES) were never persisted
GeneratedKeyPersistence.saveonly acceptedKeyPair. The symmetricpath in
KeyMintSecurityLevelInterceptor(isSymmetricbranch) cachedthe
SecretKeyin memory but never called save. Every reboot regenerateda fresh AES key, making AndroidX security MasterKey unable to decrypt
EncryptedSharedPreferences. That's the proximate cause of the "logged
out after reboot" symptom for any app that wraps session state in
EncryptedSharedPreferences.
Fix: add
secretKey: SecretKey?to save signature; persist rawsecret bytes + algorithm in the new format; restore via
SecretKeySpecin
loadPersistedKeys.2. Restored authorizations differed from generation-time
loadPersistedKeysrebuiltKeyMintAttestationfrom primitive fieldsonly —
origin,blockMode,padding,attestationChallenge, expirytimestamps, etc. were all
null/emptyList()after reboot.toAuthorizations()then emitted a different tag set than the onecaptured at
generateKeytime. Apps that fingerprint metadata acrosskeystore calls observe this as "key changed".
Fix: snapshot the live
KeyMetadataparcel bytes at save time(KeyMetadata is binder-free so
marshall()is safe), restore viaParcel.unmarshall+KeyMetadata.CREATOR.createFromParcel. Falls backto the old rebuild path only for legacy v1 records that have no snapshot.
3. certificate / certificateChain split could shift after restore
buildKeyEntryResponsecallsCertificateHelper.updateCertificateChainwhich is allowed to repartition the leaf vs. intermediates differently
from how the original parcel laid them out. Apps with strict leaf
fingerprint checks observe this as "cert changed".
Fix: byte-identical metadata snapshot from issue 2 also fixes this —
the original split is preserved.
4. Editing any .xml in /data/adb/tricky_store wiped every cached key
ConfigObservercalledclearAllGeneratedKeys()on any.xmlchange.That clears the in-memory map and
GeneratedKeyPersistence.deleteAll(),deleting every persisted key on disk. So legitimate keybox edits (or
even unrelated
.xmledits in the same directory) wiped per-app keysthat have nothing to do with the keybox cache.
The original justification (stale patched chains) only requires
patchedChains.clear()— the underlying keypairs are still valid andstill bind to apps' fingerprints.
Fix: switch to
invalidatePatchedChains(). The next attestationre-signs with the new keybox; previously-registered aliases keep working.
5. SoftwareOperation NPE when restored record missed PURPOSE
The
initblock didkeyPair!!before checking purpose. If ahalf-restored record produced an empty purpose list, the result was a
silent NPE in the binder reply rather than a clean keystore error.
Fix: explicit null/empty checks; surface
ServiceSpecificException(unsupportedPurpose)so the caller can recover.On-disk format
Bumped
FORMAT_VERSIONto 3. v3 always includes:v1/v2 records are silently skipped by the loader; the next
generateKeyfor those aliases re-creates them in v3. Affected apps re-login once
after upgrade, then keystore state is stable across reboots.
Testing
OnePlus 13 (Snapdragon 8 Elite, Android 16, KSU 3.2.4):
am force-stop+ cold opentouch keybox.xmlcp valid_keybox.xml > keybox.xml(Known limitation: when a user changes the keybox between first
registration of an alias and the first re-attestation for that alias,
apps that pin the chain root at registration may see the new root as a
mismatch and force a single relogin. Subsequent edits don't trigger
this, since the second registration fixes the new root.)
Summary by CodeRabbit
Bug Fixes
New Features