Skip to content

feat(logs): secret-redaction core for the Logs app#1558

Merged
jaylfc merged 3 commits into
devfrom
feat/logs-redaction
Jul 2, 2026
Merged

feat(logs): secret-redaction core for the Logs app#1558
jaylfc merged 3 commits into
devfrom
feat/logs-redaction

Conversation

@jaylfc

@jaylfc jaylfc commented Jul 2, 2026

Copy link
Copy Markdown
Owner

First piece of the Logs app (#1548 remaining halves; design in docs/design/logs-app.md). A pure-function redaction layer that every log line will pass through before it leaves the box via the coming system-logs API, so a bug-report bundle pasted into a public issue cannot leak a credential.

Covers key=value / --flag value / Authorization: Bearer / PEM private-key blocks / connection-string passwords / bare provider-key shapes (sk-, ghp_, xoxb-, AKIA), plus masking of known literal secret values from the secrets store. Fails closed (ignores <6-char known values to avoid runaway redaction). 21 tests; fixtures are deliberately synthetic so they exercise the regexes without tripping secret scanning.

Security-critical foundation, kept with the orchestrator since nothing in the logs path may bypass it. The system-logs routes (paged read + SSE tail + bug-report bundle) and the LogsApp frontend build on this next.

Summary by CodeRabbit

  • New Features

    • Added log redaction to automatically hide sensitive values in output, including passwords, bearer tokens, API keys, private key blocks, and common secret formats.
    • Redaction now also works across multiple log line inputs and handles non-text values more safely.
  • Tests

    • Added coverage for redaction behavior across key/value pairs, JSON, command-line flags, connection strings, and known secret values.

The security foundation of the Logs app (docs/design/logs-app.md): every log
line that leaves the box through the coming system-logs API passes through
redact() first, so an operator copying a bug-report bundle into a public issue
cannot leak a credential a dependency or stack trace happened to log.

Redacts by pattern (key=value, --flag value, Authorization: Bearer, PEM
private-key blocks, connection-string passwords, and bare provider-key shapes
sk-/ghp_/xoxb-/AKIA) and by known literal secret value (the exact strings from
the secrets store). Fails closed: known values under 6 chars are ignored to
avoid runaway redaction. Pure functions, 21 tests. Test fixtures are
deliberately synthetic (EXAMPLE-shaped) so they exercise the regexes without
tripping secret scanning. Routes + frontend build on this next.
@qodo-code-review

Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@jaylfc, you've reached your PR review limit, so we couldn't start this review.

Next review available in: 50 minutes

Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available.
You're only billed for reviews past your plan's rate limits ($0.25/file).

How can I continue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews.

How do review limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please refer docs for additional details.

Review details
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 3c51e505-dac7-4bcb-8f8b-6933b6bb0c21

📥 Commits

Reviewing files that changed from the base of the PR and between 06d42a3 and a748673.

📒 Files selected for processing (2)
  • tests/test_log_redaction.py
  • tinyagentos/log_redaction.py
📝 Walkthrough

Walkthrough

This PR introduces a new tinyagentos/log_redaction.py module implementing regex-based redaction of secrets (key/value pairs, bearer tokens, PEM keys, connection strings, token shapes, and known values) via redact() and redact_lines(), along with a corresponding test suite.

Changes

Log Redaction Module

Layer / File(s) Summary
Redaction configuration and regex rules
tinyagentos/log_redaction.py
Defines PLACEHOLDER constant, sensitive-key list, and compiled regexes for key/value pairs, bearer tokens, secret shapes, PEM blocks, and connection strings.
Core redact() and redact_lines() functions
tinyagentos/log_redaction.py
Implements ordered pattern substitution in redact() and batch processing with type coercion in redact_lines().
Redaction test suite
tests/test_log_redaction.py
Adds tests covering key/value, bearer, token shape, PEM, connection string, known_values, safety, and regression cases.

Estimated code review effort: 3 (Moderate) | ~25 minutes

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant redact_lines
    participant redact
    participant RegexRules

    Caller->>redact_lines: lines, known_values
    loop each line
        redact_lines->>redact: text, known_values
        redact->>RegexRules: apply PEM, connection string, bearer, flag, key/value, token-shape rules
        RegexRules-->>redact: masked text
        redact->>redact: mask surviving known_values tokens
        redact-->>redact_lines: redacted line
    end
    redact_lines-->>Caller: redacted lines
Loading

Related Issues: None found in the provided context.

Related PRs: None found in the provided context.

Suggested labels: security, enhancement, tests

Suggested reviewers: None specified.

🐰 A whisker twitch, a careful sweep,
Secrets hidden, safely asleep,
Tokens masked and keys concealed,
No password ever revealed,
Redaction hops through every line!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: introducing secret redaction for log output in the Logs app.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/logs-redaction

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@gitar-bot

gitar-bot Bot commented Jul 2, 2026

Copy link
Copy Markdown

Gitar is working

Gitar

Comment thread tinyagentos/log_redaction.py Outdated

# Authorization: Bearer <token> (header form; the key-value rule catches the
# "authorization=" form, this catches the header " Bearer <tok>" shape).
_BEARER_RE = re.compile(r'(?P<pre>bearer\s+)(?P<val>[A-Za-z0-9._\-]{8,})', re.IGNORECASE)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CRITICAL: Bearer-token value class misses characters that commonly appear in real tokens.

The value character class is [A-Za-z0-9._\-]{8,}. It does not allow +, /, or =, which are the trailing three characters of standard base64/base64url encodings. A Bearer header like Authorization: Bearer abc123def+456ghi= would split at the first illegal character: the value class would only see abc123def (and, depending on its length, may not even satisfy {8,}), so the token would either be partially masked or not masked at all.

This is exactly the shape used by AWS session tokens, many OAuth2 access tokens, and other application/x-www-form-urlencoded-friendly credentials. A bypass here defeats the purpose of the header form.

Suggested change
_BEARER_RE = re.compile(r'(?P<pre>bearer\s+)(?P<val>[A-Za-z0-9._\-]{8,})', re.IGNORECASE)
_BEARER_RE = re.compile(r'(?P<pre>bearer\s+)(?P<val>[A-Za-z0-9._\-+/=]{8,})', re.IGNORECASE)

Reply with @kilocode-bot fix it to have Kilo Code address this issue.


# Provider-key shapes that are secrets on their own with no key= prefix:
# sk-..., sk-taos-..., ghp_/gho_/ghs_ (GitHub), xoxb-/xoxp- (Slack), AKIA... (AWS).
_TOKEN_SHAPE_RE = re.compile(

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: No detection for JWT-style tokens or other common bare-secret shapes.

_TOKEN_SHAPE_RE only covers sk-, gh[posru]_, xox[baprs]-, and AKIA…. It misses several categories of secrets that frequently end up in logs without a key= prefix:

  • JWTs: many services log Authorization: eyJhbGciOi… (no Bearer scheme word) or emit the token as a bare value. The KV rule may catch some of these when a sensitive key precedes them, but a JWT inside a JSON body field that is not in _SENSITIVE_KEYS (e.g. id_token, refresh_token, assertion, saml, idp_token) would slip through.
  • Stripe live keys (sk_live_…), Google API keys (AIza…), Slack legacy (xoxs- is covered but xoxa-/xoxp- may over-match short identifiers), npm tokens (npm_…), PyPI tokens (pypi-…), SendGrid (SG.…), Mailgun (key-…).

At minimum, consider adding eyJ[A-Za-z0-9_\-]{10,}\.eyJ[A-Za-z0-9_\-]{10,}\.[A-Za-z0-9_\-]{10,} for JWTs and an AIza[0-9A-Za-z_\-]{35} rule for Google keys, plus expanding _SENSITIVE_KEYS to include id_token, refresh_token, assertion, jwt.


Reply with @kilocode-bot fix it to have Kilo Code address this issue.

Comment thread tinyagentos/log_redaction.py Outdated
for val in sorted((v for v in known_values if v), key=len, reverse=True):
if len(val) < _MIN_KNOWN_VALUE_LEN:
continue
text = text.replace(val, PLACEHOLDER)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: known_values are replaced as raw substrings, which can corrupt unrelated text and cause partial leaks.

text.replace(val, PLACEHOLDER) will mask the secret wherever it appears, including inside longer unrelated strings. Two concrete risks:

  1. Partial-leak context. If the known secret is abcdef123456 and a log line contains request_id=abcdef1234567890, the result becomes request_id=[REDACTED]7890, leaking both a 6-char prefix (which the rest of the system may consider identifying) and the trailing digits.
  2. False positives in hashes/paths. A 6-char-or-longer known value that happens to appear in a hex hash, URL path, or UUID fragment will be masked, producing garbled output ([REDACTED]7890-style artefacts) and degrading the diagnostic value of the logs without actually removing the secret.

Consider matching on word boundaries (re.sub(r'\\b' + re.escape(val) + r'\\b', PLACEHOLDER, text)) or, better, masking the whole surrounding token (e.g. \S* on either side) so partial matches either don't redact or redact the entire context that contains the secret.


Reply with @kilocode-bot fix it to have Kilo Code address this issue.

Comment thread tinyagentos/log_redaction.py Outdated
# whitespace, comma, or line end).
_KV_RE = re.compile(
r'(?P<pre>["\']?(?:' + _KEY_ALT + r')["\']?\s*[:=]\s*)'
r'(?P<quote>["\']?)(?P<val>[^\s,"\'}{]+)(?P=quote)',

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SUGGESTION: _KV_RE value character class is too narrow for some legitimate secret formats.

[^\s,"'}]{]+ stops at <, >, ), ], and ]. A value such as password=<redacted-by-caller> (e.g. an HTML-escaped or shell-quoted token), a URL like redirect_uri=https://app.example.com/cb?x=1&y=2, or anything containing brackets will be truncated mid-value. While the rule itself isn't a security bypass (the start of the secret still gets masked), it means a non-empty tail of the secret can be left in the log.

Consider broadening to [^\s"'}{]+ (or even \S+) and trusting the closing quote/comma heuristics via the existing (?P=quote) group, with a length cap to avoid runaway matches on pathological input.


Reply with @kilocode-bot fix it to have Kilo Code address this issue.

Comment thread tinyagentos/log_redaction.py Outdated


def redact_lines(lines: "list[str]", known_values: "list[str] | None" = None) -> "list[str]":
"""Redact a list of log lines (convenience for the paged log reader)."""

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SUGGESTION: redact_lines will crash on None or non-string elements.

The list comprehension [redact(line, known_values) for line in lines] passes each element straight to redact, which does text = _PEM_RE.sub(...). re.Pattern.sub on None raises TypeError. Since the docstring advertises this as the entry point for the paged log reader, a single malformed line (e.g. from a buggy upstream parser emitting None) will take down the whole page response rather than producing a partially-redacted result. Consider coercing or guarding: for line in lines if isinstance(line, str).


Reply with @kilocode-bot fix it to have Kilo Code address this issue.

Comment thread tinyagentos/log_redaction.py Outdated

# PEM private-key blocks (SSH keys materialized on deploy, TLS keys).
_PEM_RE = re.compile(
r'-----BEGIN [A-Z0-9 ]*PRIVATE KEY-----.*?-----END [A-Z0-9 ]*PRIVATE KEY-----',

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SUGGESTION: PEM rule swallows surrounding log context, not just the key block.

_PEM_RE is replaced with a single PLACEHOLDER, but the rule's match is the entire span from -----BEGIN … to -----END … (with .*? and DOTALL). That is correct for masking, but the loss is greater than necessary when a stack trace logs the key material inline — the exception type, file, line number, and the message framing around the key are all gone. Consider preserving the framing markers and masking only the base64 body, e.g. replace -----BEGIN …-----\n.*?-----END …----- so -----BEGIN OPENSSH PRIVATE KEY----- and -----END …----- remain visible and only the bytes between them are replaced.


Reply with @kilocode-bot fix it to have Kilo Code address this issue.


class TestBearer:
def test_header(self):
assert redact("Authorization: Bearer abc123def456ghi") == \

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SUGGESTION: Test coverage is light for the documented threat model.

For security-critical code that is described as "nothing in the logs path may bypass this", the suite is missing negative tests that would catch the most likely regressions:

  • JWT in an Authorization header without the word Bearer (e.g. Authorization: eyJhbGciOi…).
  • Base64-shaped bearer token containing +, /, = (the exact bypass flagged on log_redaction.py:55).
  • A known secret value that appears as a substring of a longer unrelated identifier (the partial-leak case flagged on log_redaction.py:116).
  • Connection string with a percent-encoded password or with @ in the userinfo.
  • Multi-line PEM block embedded in a stack trace (the test uses a clean block, but real-world logs interleave it with exception frames).
  • redact_lines called with a list containing None or empty strings.

Adding these would lock in the intended behaviour against the regression modes that are cheapest to introduce and hardest to spot in code review.


Reply with @kilocode-bot fix it to have Kilo Code address this issue.

@kilo-code-bot

kilo-code-bot Bot commented Jul 2, 2026

Copy link
Copy Markdown

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Files Reviewed (2 files)
  • tinyagentos/log_redaction.py
  • tests/test_log_redaction.py
Previous Review Summaries (2 snapshots, latest commit 06d42a3)

Current summary above is authoritative. Previous snapshots are kept for context only.

Previous review (commit 06d42a3)

Status: 3 Issues Found | Recommendation: Address before merge

Overview

Severity Count
CRITICAL 0
WARNING 0
SUGGESTION 3
Issue Details (click to expand)

SUGGESTION

File Line Issue
tinyagentos/log_redaction.py 73 JWT segment character class is missing +, /, and =, which can truncate a real JWT that uses non-strict base64
tinyagentos/log_redaction.py 72 SendGrid branch is missing \b anchors and could over-match inside a longer identifier
tests/test_log_redaction.py 118 test_bare_jwt_masked does not actually exercise the new bare-JWT branch of _TOKEN_SHAPE_RE
Files Reviewed (2 files)
  • tinyagentos/log_redaction.py - 2 issues
  • tests/test_log_redaction.py - 1 issue

Fix these issues in Kilo Cloud

Previous review (commit 44cfcd5)

Status: 7 Issues Found | Recommendation: Address before merge

Overview

Severity Count
CRITICAL 1
WARNING 2
SUGGESTION 4
Issue Details (click to expand)

CRITICAL

File Line Issue
tinyagentos/log_redaction.py 55 Bearer-token value class misses +, /, =; base64-shaped tokens bypass the header rule entirely or are only partially masked

WARNING

File Line Issue
tinyagentos/log_redaction.py 59 No detection for JWT-style tokens or other common bare-secret shapes (Stripe, Google, npm, PyPI, SendGrid, Mailgun)
tinyagentos/log_redaction.py 116 known_values plain substring replacement can partially leak secret prefixes/suffixes inside longer unrelated identifiers

SUGGESTION

File Line Issue
tinyagentos/log_redaction.py 41 _KV_RE value character class is too narrow — stops at <, >, ), ], leaving non-empty tails
tinyagentos/log_redaction.py 122 redact_lines crashes on None or non-string elements instead of skipping them
tinyagentos/log_redaction.py 70 PEM rule masks surrounding log context along with the key body; preserve framing markers
tests/test_log_redaction.py 35 Test suite is missing negative cases for JWTs, base64 bearer tokens, substring leaks, encoded connection strings, and PEM embedded in stack traces
Files Reviewed (2 files)
  • tinyagentos/log_redaction.py - 6 issues
  • tests/test_log_redaction.py - 1 issue

Fix these issues in Kilo Cloud


Reviewed by minimax-m3 · Input: 36.8K · Output: 1.9K · Cached: 167K

…token known-value mask)

- CRITICAL: the Bearer value class dropped + / = so base64/base64url tokens
  (AWS session, OAuth2) were truncated at the first illegal char and leaked
  their tail; the class now includes the base64 set
- add bare-token shapes commonly logged without a key= prefix: JWT, Google
  AIza, Stripe sk_live/test, npm_, SendGrid SG.; add id_token/refresh_token/
  access_token/jwt/assertion to the sensitive-key list
- known-value masking now replaces the WHOLE surrounding token, so a secret
  that is a prefix of a longer identifier cannot leak the trailing part or
  produce garbled output
- PEM rule keeps the BEGIN/END markers, masking only the key body, so a key
  in a stack trace still shows what was redacted
- broaden the key-value value class (keep URLs/brackets, cap length) so a
  token is not truncated mid-string
- redact_lines coerces non-string elements so one bad line cannot crash a page

7 fold-regression tests added; 28 total.
Comment thread tinyagentos/log_redaction.py Outdated
r'|\bAIza[0-9A-Za-z_\-]{35}\b' # Google API key
r'|\bnpm_[A-Za-z0-9]{20,}' # npm token
r'|\bSG\.[A-Za-z0-9_\-]{16,}\.[A-Za-z0-9_\-]{16,}' # SendGrid
r'|\beyJ[A-Za-z0-9_\-]{8,}\.eyJ[A-Za-z0-9_\-]{6,}\.[A-Za-z0-9_\-]{6,}' # JWT

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SUGGESTION: JWT segment character class is missing +, /, and =, which can truncate a real JWT that uses non-strict base64.

The class [A-Za-z0-9_\-] is exactly the base64url alphabet, so a spec-compliant RFC 7519 JWT is matched in full. But a number of real-world issuers (legacy OAuth2 servers, some IdPs, hand-rolled libraries) emit JWTs that use standard base64 with +, /, and = padding in the signature segment. A token like eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIxIn0.abc+def/ghi= would match the header and payload but stop at the first + in the signature, leaking the tail. The rule above already widened the Bearer class to include +/= for the same reason — apply the same reasoning here.

Suggested change
r'|\beyJ[A-Za-z0-9_\-]{8,}\.eyJ[A-Za-z0-9_\-]{6,}\.[A-Za-z0-9_\-]{6,}' # JWT
r'|\beyJ[A-Za-z0-9_\-+/=]{8,}\.eyJ[A-Za-z0-9_\-+/=]{6,}\.[A-Za-z0-9_\-+/=]{6,}' # JWT

Reply with @kilocode-bot fix it to have Kilo Code address this issue.

Comment thread tinyagentos/log_redaction.py Outdated
r'|\bAKIA[0-9A-Z]{16}\b' # AWS access key id
r'|\bAIza[0-9A-Za-z_\-]{35}\b' # Google API key
r'|\bnpm_[A-Za-z0-9]{20,}' # npm token
r'|\bSG\.[A-Za-z0-9_\-]{16,}\.[A-Za-z0-9_\-]{16,}' # SendGrid

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SUGGESTION: SendGrid branch is missing \b anchors and could over-match inside a longer identifier.

Every other branch in _TOKEN_SHAPE_RE is wrapped in \b…\b (or \b… at the start when the suffix already provides a boundary). SG\.[A-Za-z0-9_\-]{16,}\.[A-Za-z0-9_\-]{16,} has no \b at either end, so a string like MYAPP_SG.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.bbbbbbbbbbbbbbbbbbbbbbbbbbbb will be redacted even though the surrounding MYAPP_ makes it clearly not a SendGrid key. Add \b at the start (between word boundary and SG) and at the end of the regex.

Suggested change
r'|\bSG\.[A-Za-z0-9_\-]{16,}\.[A-Za-z0-9_\-]{16,}' # SendGrid
r'|\bSG\.[A-Za-z0-9_\-]{16,}\.[A-Za-z0-9_\-]{16,}\b' # SendGrid

Reply with @kilocode-bot fix it to have Kilo Code address this issue.

Comment thread tests/test_log_redaction.py Outdated

def test_bare_jwt_masked(self):
jwt = "eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIxMjM0NTY3ODkwIn0.SflKxwRJSMeKKF2QT4"
out = redact(f"id_token={jwt}")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SUGGESTION: test_bare_jwt_masked does not actually exercise the new bare-JWT branch of _TOKEN_SHAPE_RE.

The input is f"id_token={jwt}". Because id_token is in _SENSITIVE_KEYS, the key-value rule fires first and the whole JWT is consumed as the KV value — the test would still pass even if the JWT branch of _TOKEN_SHAPE_RE were deleted. The whole point of the new JWT rule is to catch tokens logged without any key= prefix (e.g. a service logging received: eyJhbGciOi…); the current test does not cover that path. Add an assertion with a bare JWT and no sensitive key prefix to lock in the new behavior.


Reply with @kilocode-bot fix it to have Kilo Code address this issue.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/test_log_redaction.py (1)

55-65: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Synthetic fixtures still trip the secret scanner, contradicting the stated test design goal.

Betterleaks flags the JWT (line 117), both PEM blocks (lines 57-59, 137-139), and the Stripe key (line 125) as real secret patterns. This file's own ghp_/xoxb-/AKIA fixtures already avoid this by using literal EXAMPLE placeholder text instead of realistic-looking random data — the same approach should be applied here to actually meet the PR's goal of exercising these regexes "without triggering secret scanning."

♻️ Example rewrite using placeholder text instead of realistic-looking payloads
-        jwt = "eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIxMjM0NTY3ODkwIn0.SflKxwRJSMeKKF2QT4"
+        jwt = "eyEXAMPLE.eyEXAMPLEPAYLOAD.EXAMPLESIGNATURE"
...
-    def test_stripe_live_key(self):
-        assert PLACEHOLDER in redact("sk_live_0000000000000000abcdef")
+    def test_stripe_live_key(self):
+        assert PLACEHOLDER in redact("sk_live_EXAMPLEEXAMPLEEXAMPLE")

The PEM blocks are trickier since the base64 body shape itself is what triggers detection regardless of content realism — worth confirming with the secret-scanning tool's allowlist/nosec-style annotation support if the body must remain base64-shaped.

Also applies to: 116-125, 135-145

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_log_redaction.py` around lines 55 - 65, The synthetic secret
fixtures in test_log_redaction still look like real secrets, causing the scanner
to flag the JWT, PEM blocks, and Stripe key despite the test’s goal; update the
fixtures in test_private_key_block and the related secret test cases to use
obvious placeholder text like the existing ghp_/xoxb_/AKIA examples instead of
realistic payloads, and for the PEM-style cases either replace the base64-shaped
body with non-secret placeholder content or add an appropriate test-only
allowlist/no-scan annotation if that shape must be preserved.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tinyagentos/log_redaction.py`:
- Line 87: The connection-string redaction regex in _CONN_STR_RE only matches
userinfo when a username is present, so URLs like redis://:password@host bypass
masking. Update the pattern to make the username portion optional while still
capturing the password and @ delimiter, and verify the redaction path in
log_redaction.py continues to use _CONN_STR_RE for these cases.
- Around line 51-59: The Authorization redaction only handles Bearer tokens, so
schemes in _AUTH_SCHEMES like basic, digest, token, and negotiate can still leak
credentials after _kv_repl leaves the scheme word untouched. Update _BEARER_RE
in tinyagentos/log_redaction.py (and its redaction call site if needed) so the
follow-up mask covers any Authorization scheme listed in _AUTH_SCHEMES, not just
bearer, while keeping the existing scheme-detection logic in _kv_repl aligned
with that behavior.

---

Nitpick comments:
In `@tests/test_log_redaction.py`:
- Around line 55-65: The synthetic secret fixtures in test_log_redaction still
look like real secrets, causing the scanner to flag the JWT, PEM blocks, and
Stripe key despite the test’s goal; update the fixtures in
test_private_key_block and the related secret test cases to use obvious
placeholder text like the existing ghp_/xoxb_/AKIA examples instead of realistic
payloads, and for the PEM-style cases either replace the base64-shaped body with
non-secret placeholder content or add an appropriate test-only allowlist/no-scan
annotation if that shape must be preserved.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: ebe56ea0-6063-4a50-89a0-a55492a16fa8

📥 Commits

Reviewing files that changed from the base of the PR and between 8aa0af7 and 06d42a3.

📒 Files selected for processing (2)
  • tests/test_log_redaction.py
  • tinyagentos/log_redaction.py

Comment on lines +51 to +59
# Auth SCHEME words that legitimately follow "authorization:"; the real secret
# is the NEXT token (handled by the bearer rule), so the KV rule must not treat
# the scheme word itself as the value and stop there, leaving the token exposed.
_AUTH_SCHEMES = {"bearer", "basic", "digest", "token", "negotiate"}

# Authorization: Bearer <token> (header form). The value class includes the
# base64/base64url trailing set (+ / =) so AWS session tokens and OAuth2
# access tokens are masked whole, not truncated at the first + or /.
_BEARER_RE = re.compile(r'(?P<pre>bearer\s+)(?P<val>[A-Za-z0-9._\-+/=]{8,})', re.IGNORECASE)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Non-Bearer auth schemes leak their credential.

_kv_repl deliberately returns the match unchanged when the KV value is an auth scheme word (Lines 113-114), trusting a follow-up rule to mask the real token. But only _BEARER_RE exists, so basic, digest, token, and negotiate have no follow-up. Authorization: Basic dXNlcjpwYXNzd29yZA== therefore survives every rule and the base64 user:password is logged verbatim — a direct redaction bypass for HTTP Basic auth (and GitHub-style Authorization: token …).

Broaden the header rule to cover all schemes in _AUTH_SCHEMES rather than only bearer:

🔒 Proposed fix: mask the token after any auth scheme
-# Authorization: Bearer <token>  (header form). The value class includes the
-# base64/base64url trailing set (+ / =) so AWS session tokens and OAuth2
-# access tokens are masked whole, not truncated at the first + or /.
-_BEARER_RE = re.compile(r'(?P<pre>bearer\s+)(?P<val>[A-Za-z0-9._\-+/=]{8,})', re.IGNORECASE)
+# Authorization: <scheme> <token>  (header form). Covers every scheme in
+# _AUTH_SCHEMES so Basic/Digest/Token/Negotiate credentials are masked too,
+# not just Bearer. The value class includes the base64/base64url trailing set
+# (+ / =) so AWS session tokens and OAuth2 access tokens are masked whole.
+_AUTH_TOKEN_RE = re.compile(
+    r'(?P<pre>(?:bearer|basic|digest|token|negotiate)\s+)(?P<val>[A-Za-z0-9._\-+/=]{8,})',
+    re.IGNORECASE,
+)

And update the application site:

-    text = _BEARER_RE.sub(lambda m: m.group("pre") + PLACEHOLDER, text)
+    text = _AUTH_TOKEN_RE.sub(lambda m: m.group("pre") + PLACEHOLDER, text)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tinyagentos/log_redaction.py` around lines 51 - 59, The Authorization
redaction only handles Bearer tokens, so schemes in _AUTH_SCHEMES like basic,
digest, token, and negotiate can still leak credentials after _kv_repl leaves
the scheme word untouched. Update _BEARER_RE in tinyagentos/log_redaction.py
(and its redaction call site if needed) so the follow-up mask covers any
Authorization scheme listed in _AUTH_SCHEMES, not just bearer, while keeping the
existing scheme-detection logic in _kv_repl aligned with that behavior.


# postgres://user:pass@host, mysql://..., redis://..., amqp:// -- mask the
# password component only.
_CONN_STR_RE = re.compile(r'(?P<pre>[a-zA-Z][a-zA-Z0-9+.\-]*://[^:/\s]+:)(?P<val>[^@\s]+)(?P<post>@)')

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🟡 Minor | ⚡ Quick win

Connection strings with an empty username leak the password.

[^:/\s]+ requires at least one character before the :password@, so URLs of the form redis://:s3cretpass@host (empty user, common for Redis/AMQP) never match and the password survives. Make the userinfo prefix optional.

🔒 Proposed fix
-_CONN_STR_RE = re.compile(r'(?P<pre>[a-zA-Z][a-zA-Z0-9+.\-]*://[^:/\s]+:)(?P<val>[^@\s]+)(?P<post>@)')
+_CONN_STR_RE = re.compile(r'(?P<pre>[a-zA-Z][a-zA-Z0-9+.\-]*://[^:/\s]*:)(?P<val>[^@\s]+)(?P<post>@)')
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
_CONN_STR_RE = re.compile(r'(?P<pre>[a-zA-Z][a-zA-Z0-9+.\-]*://[^:/\s]+:)(?P<val>[^@\s]+)(?P<post>@)')
_CONN_STR_RE = re.compile(r'(?P<pre>[a-zA-Z][a-zA-Z0-9+.\-]*://[^:/\s]*:)(?P<val>[^@\s]+)(?P<post>@)')
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tinyagentos/log_redaction.py` at line 87, The connection-string redaction
regex in _CONN_STR_RE only matches userinfo when a username is present, so URLs
like redis://:password@host bypass masking. Update the pattern to make the
username portion optional while still capturing the password and @ delimiter,
and verify the redaction path in log_redaction.py continues to use _CONN_STR_RE
for these cases.

- JWT shape now tolerates standard-base64 (+ / =) segments so a non-strict
  encoder's token is masked whole instead of truncated at the first + or /
- SendGrid branch gets a trailing word boundary to match the sibling AWS/Google
  patterns and avoid over-matching into a longer identifier
- test_bare_jwt_masked now feeds a bare JWT in prose (no key= prefix, no Bearer
  word) so it actually exercises the standalone JWT branch, and asserts the
  placeholder is present
@jaylfc jaylfc merged commit 0568f32 into dev Jul 2, 2026
9 checks passed
@jaylfc jaylfc deleted the feat/logs-redaction branch July 2, 2026 22:41
@github-project-automation github-project-automation Bot moved this from Todo to Done in TinyAgentOS Roadmap Jul 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Development

Successfully merging this pull request may close these issues.

1 participant