Fix/restore readable investigation output after masking 479 #807

Open
Ade20boss wants to merge 4 commits into Tracer-Cloud:main from Ade20boss:fix/restore-readable-investigation-output-after-masking-479

Conversation

@Ade20boss

Fixes #479

Describe the changes you have made in this PR -

This PR closes the remaining gaps in the masking pipeline by ensuring that planning and diagnosis prompts never expose raw infrastructure identifiers to the LLM, completing the three requirements from the issue.

1. Planning prompts now use placeholders (app/nodes/plan_actions/node.py)

Previously, build_plan_actions() received unmasked input_data directly from state. A MaskingContext is now constructed from state and applied to mask input_data fields before they are passed into the planning LLM call. This is a no-op when masking is disabled.
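
A minimal sketch of the idea in item 1, under stated assumptions: `ToyMaskingContext`, `NAMESPACE_PATTERN`, and the `ns-...` identifier shape are hypothetical stand-ins for illustration and are not the project's `MaskingContext`. It only shows the shape of the change: mask every input field before it reaches the LLM call, and do nothing when masking is disabled.

```python
import re
from dataclasses import dataclass, field
from typing import Any

# Assumed identifier shape for this sketch only: namespaces named "ns-<something>".
NAMESPACE_PATTERN = re.compile(r"\bns-[a-z0-9-]+\b")


@dataclass
class ToyMaskingContext:
    """Hypothetical stand-in for MaskingContext; not the project's implementation."""

    enabled: bool = True
    mapping: dict[str, str] = field(default_factory=dict)  # placeholder -> original

    def mask(self, text: str) -> str:
        if not self.enabled:
            return text  # no-op when masking is disabled

        def replace(match: re.Match) -> str:
            original = match.group(0)
            # Reuse the placeholder if this identifier was seen before.
            for placeholder, value in self.mapping.items():
                if value == original:
                    return placeholder
            placeholder = f"<NS_{len(self.mapping) + 1}>"
            self.mapping[placeholder] = original
            return placeholder

        return NAMESPACE_PATTERN.sub(replace, text)

    def mask_value(self, value: Any) -> Any:
        # Walk strings, dicts, and lists; leave other types untouched.
        if isinstance(value, str):
            return self.mask(value)
        if isinstance(value, dict):
            return {key: self.mask_value(item) for key, item in value.items()}
        if isinstance(value, list):
            return [self.mask_value(item) for item in value]
        return value


input_data = {
    "alert_name": "CrashLoop in ns-payments",
    "tags": ["ns-payments", "ns-billing"],
    "severity": 2,
}
ctx = ToyMaskingContext()
masked_input = ctx.mask_value(input_data)  # this is what the planning prompt would see
```

With masking enabled, the same namespace maps to the same placeholder in every field; with `enabled=False`, `mask_value` returns the data unchanged.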

2. Diagnosis prompts now use placeholders (app/nodes/root_cause_diagnosis/prompt_builder.py)

The evidence dict was already masked upstream in investigate/node.py, but problem_md, hypotheses, and raw_alert were pulled directly from state and injected into the prompt unmasked. A MaskingContext is now constructed at the top of build_diagnosis_prompt and applied to these fields before they reach the prompt string.

3. Final output already restores identifiers — no change needed

Both publish_findings/node.py and root_cause_diagnosis/node.py already call masking_ctx.unmask() before any user-facing output. The prefix collision bug (<NS_1> vs <NS_10>) referenced in #639 is also already fixed in context.py via longest-first sort in unmask().
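
To illustrate why a longest-first sort matters when unmasking, here is a small sketch. It assumes placeholders are matched as bare tokens (`NS_1`, `NS_10`) so that one is a prefix of the other; that matching detail, the function names, and the sample values are assumptions for illustration, since the actual code in context.py is not shown in this thread.

```python
def unmask_naive(text: str, mapping: dict[str, str]) -> str:
    """Replace placeholders in insertion order (prone to prefix collisions)."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


def unmask_longest_first(text: str, mapping: dict[str, str]) -> str:
    """Replace longer placeholders first so NS_10 is handled before NS_1."""
    for placeholder in sorted(mapping, key=len, reverse=True):
        text = text.replace(placeholder, mapping[placeholder])
    return text


# Hypothetical mapping: placeholder -> original identifier.
mapping = {"NS_1": "payments", "NS_10": "billing"}
llm_output = "NS_10 depends on NS_1"

naive_result = unmask_naive(llm_output, mapping)
fixed_result = unmask_longest_first(llm_output, mapping)
```

In the naive version, `NS_1` is replaced inside `NS_10` first, leaving a stray `0` behind; sorting by length avoids that.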

Screenshots of the UI changes (If any) -

N/A — backend masking pipeline only, no UI changes.


Code Understanding and AI Usage

Did you use AI assistance (ChatGPT, Claude, Copilot, etc.) to write any part of this code?

  • No, I wrote all the code myself
  • [] Yes, I used AI assistance (continue below)

If you used AI assistance:

  • [] I have reviewed every single line of the AI-generated code
  • [] I can explain the purpose and logic of each function/component I added
  • [] I have tested edge cases and understand how the code handles them
  • [] I have modified the AI output to follow this project's coding standards and conventions

Explain your implementation approach:

The masking system already had a solid foundation: MaskingContext handles placeholder assignment, stability across nodes, and safe unmasking. The gap was that two upstream nodes were feeding raw state fields into LLM prompts without going through the masking layer first. The fix follows the same pattern already used in investigate/node.py: construct a MaskingContext from state, apply mask() or mask_value() to the relevant fields, and pass the masked version downstream. No new abstractions were introduced; the existing API was sufficient.
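
The "stability across nodes" point can be sketched as follows. This is an illustrative example only: `mask_with_map`, the `<ID_n>` placeholder format, and the sample identifiers are hypothetical, and the real MaskingContext may work differently. The sketch shows why the mapping has to be carried in state: a later node that starts from the persisted map reuses the same placeholder for the same identifier.

```python
def mask_with_map(
    text: str, identifiers: list[str], masking_map: dict[str, str]
) -> tuple[str, dict[str, str]]:
    """Mask known identifiers, reusing placeholders already present in masking_map."""
    updated = dict(masking_map)  # placeholder -> original; copy to avoid mutating input
    reverse = {original: placeholder for placeholder, original in updated.items()}
    # Longest identifiers first, so a shorter identifier never clobbers a longer one.
    for identifier in sorted(identifiers, key=len, reverse=True):
        if identifier not in text:
            continue
        placeholder = reverse.get(identifier)
        if placeholder is None:
            placeholder = f"<ID_{len(updated) + 1}>"
            updated[placeholder] = identifier
            reverse[identifier] = placeholder
        text = text.replace(identifier, placeholder)
    return text, updated


identifiers = ["prod-cluster-7", "checkout-db"]
state = {"masking_map": {}}

# Node A (planning): masks its input and persists the map back to state.
plan_text, state["masking_map"] = mask_with_map(
    "Inspect checkout-db on prod-cluster-7", identifiers, state["masking_map"]
)

# Node B (diagnosis): starts from the persisted map, so placeholders stay stable.
diag_text, state["masking_map"] = mask_with_map(
    "checkout-db is saturated", identifiers, state["masking_map"]
)

# Without persistence, node B would assign a conflicting placeholder.
unstable_text, _ = mask_with_map("checkout-db is saturated", identifiers, {})
```

If node A did not write the map back, `checkout-db` would become `<ID_1>` in node B while `<ID_1>` already meant `prod-cluster-7` in node A, and unmasking the final output would be ambiguous.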


Checklist before requesting a review

  • I have added proper PR title and linked to the issue
  • I have performed a self-review of my code
  • I can explain the purpose of every function, class, and logic block I added
  • I understand why my changes work and have tested them thoroughly
  • I have considered potential edge cases and how my code handles them
  • If it is a core feature, I have added thorough tests
  • My code follows the project's style guidelines and conventions

Note: Please check Allow edits from maintainers if you would like us to assist in the PR.

@greptile-apps
Contributor

greptile-apps Bot commented Apr 24, 2026

Greptile Summary

This PR completes the masking pipeline by ensuring plan_actions and root_cause_diagnosis prompts never expose raw infrastructure identifiers to the LLM, following the same MaskingContext pattern already used in investigate/node.py. Both previously flagged blockers (the NameError from _masking_ctx being out of scope and the missing masking_map persistence in node_plan_actions) are now resolved.

Confidence Score: 5/5

Safe to merge — all previously flagged blockers are resolved; only cosmetic P2 findings remain.

Both P1 issues from the prior review round (NameError scope bug and missing masking_map persistence) are now correctly fixed. The only remaining findings are a stray extra-whitespace lint nit and a module-level import style suggestion, neither of which affects runtime behaviour.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| app/nodes/plan_actions/node.py | Constructs a MaskingContext from state and masks all InvestigateInput fields before the planning LLM call; persists masking_map back to each return branch. Previously flagged issues (NameError, missing masking_map persistence) are resolved. Minor: in-function import style. |
| app/nodes/root_cause_diagnosis/prompt_builder.py | Adds MaskingContext to mask problem_md, hypotheses, and raw_alert before they reach the diagnosis prompt; passes masking_ctx into _build_evidence_sections. Previously flagged NameError is fixed. One stray multi-space on line 368 was introduced by the diff. |
| Makefile | Adds indentation inside ifeq/ifneq blocks for readability, adds a Windows fallback when the venv is absent, and simplifies the python3-check redirect to 2>/dev/null (safe since Windows is handled earlier). No functional regressions on Linux/macOS. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[InvestigationState] --> B[node_plan_actions]
    B --> B1[MaskingContext.from_state]
    B1 --> B2[mask_value each InvestigateInput field]
    B2 --> B3[build_plan_actions LLM call masked input_data]
    B3 --> B4[persist masking_map to state]

    A --> C[build_diagnosis_prompt]
    C --> C1[MaskingContext.from_state]
    C1 --> C2[mask problem_md]
    C1 --> C3[mask hypotheses list]
    C1 --> C4[_build_evidence_sections masking_ctx passed in]
    C4 --> C5[mask raw_alert str branch]
    C2 & C3 & C5 --> C6[LLM receives placeholder-only prompt]

    B4 & C6 --> D[publish_findings / rca node masking_ctx.unmask to user output]

Reviews (2): Last reviewed commit: "fix: address review comments — scope mas..." | Re-trigger Greptile

raw_alert_text: str = ""
if isinstance(raw_alert, str):
    raw_alert_text = raw_alert
    raw_alert_text = _masking_ctx.mask(raw_alert)
Contributor

P0 NameError: _masking_ctx is not in scope of _build_evidence_sections

_masking_ctx is a local variable defined in build_diagnosis_prompt (line 52), but _build_evidence_sections is a module-level function, not a nested closure. Python will raise NameError: name '_masking_ctx' is not defined whenever raw_alert is a str, crashing every diagnosis prompt build for string-type alerts.

The fix is to pass the MaskingContext as an argument into _build_evidence_sections:

# In _build_evidence_sections signature:
def _build_evidence_sections(
    state: InvestigationState,
    evidence: dict[str, Any],
    masking_ctx: "MaskingContext | None" = None,
) -> str:
    ...
    if isinstance(raw_alert, str):
        raw_alert_text = masking_ctx.mask(raw_alert) if masking_ctx else raw_alert

And in build_diagnosis_prompt:

evidence_text = _build_evidence_sections(state, evidence, _masking_ctx)

Comment thread app/nodes/plan_actions/node.py

@jlalbdalghnyalhlaly-afk jlalbdalghnyalhlaly-afk left a comment

I don't know.

@Ade20boss
Author

@greptile-apps re-trigger

Development

Successfully merging this pull request may close these issues.

P0 Restore readable investigation output after masking

2 participants