Fix/restore readable investigation output after masking 479#807
Fix/restore readable investigation output after masking 479#807Ade20boss wants to merge 4 commits intoTracer-Cloud:mainfrom
Conversation
Greptile SummaryThis PR completes the masking pipeline by ensuring Confidence Score: 5/5Safe to merge — all previously flagged blockers are resolved; only cosmetic P2 findings remain. Both P1 issues from the prior review round (NameError scope bug and missing masking_map persistence) are now correctly fixed. The only remaining findings are a stray extra-whitespace lint nit and a module-level import style suggestion, neither of which affects runtime behaviour. No files require special attention. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[InvestigationState] --> B[node_plan_actions]
B --> B1[MaskingContext.from_state]
B1 --> B2[mask_value each InvestigateInput field]
B2 --> B3[build_plan_actions LLM call masked input_data]
B3 --> B4[persist masking_map to state]
A --> C[build_diagnosis_prompt]
C --> C1[MaskingContext.from_state]
C1 --> C2[mask problem_md]
C1 --> C3[mask hypotheses list]
C1 --> C4[_build_evidence_sections masking_ctx passed in]
C4 --> C5[mask raw_alert str branch]
C2 & C3 & C5 --> C6[LLM receives placeholder-only prompt]
B4 & C6 --> D[publish_findings / rca node masking_ctx.unmask to user output]
Reviews (2): Last reviewed commit: "fix: address review comments — scope mas..." | Re-trigger Greptile |
| raw_alert_text: str = "" | ||
| if isinstance(raw_alert, str): | ||
| raw_alert_text = raw_alert | ||
| raw_alert_text = _masking_ctx.mask(raw_alert) |
There was a problem hiding this comment.
NameError: _masking_ctx is not in scope of _build_evidence_sections
_masking_ctx is a local variable defined in build_diagnosis_prompt (line 52), but _build_evidence_sections is a module-level function, not a nested closure. Python will raise NameError: name '_masking_ctx' is not defined whenever raw_alert is a str, crashing every diagnosis prompt build for string-type alerts.
The fix is to pass the MaskingContext as an argument into _build_evidence_sections:
# In _build_evidence_sections signature:
def _build_evidence_sections(
state: InvestigationState,
evidence: dict[str, Any],
masking_ctx: "MaskingContext | None" = None,
) -> str:
...
if isinstance(raw_alert, str):
raw_alert_text = masking_ctx.mask(raw_alert) if masking_ctx else raw_alertAnd in build_diagnosis_prompt:
evidence_text = _build_evidence_sections(state, evidence, _masking_ctx)There was a problem hiding this comment.
![]()
NameError:_masking_ctxis not in scope of_build_evidence_sections
_masking_ctxis a local variable defined inbuild_diagnosis_prompt(line 52), but_build_evidence_sectionsis a module-level function, not a nested closure. Python will raiseNameError: name '_masking_ctx' is not definedwheneverraw_alertis astr, crashing every diagnosis prompt build for string-type alerts.The fix is to pass the
MaskingContextas an argument into_build_evidence_sections:# In _build_evidence_sections signature: def _build_evidence_sections( state: InvestigationState, evidence: dict[str, Any], masking_ctx: "MaskingContext | None" = None, ) -> str: ... if isinstance(raw_alert, str): raw_alert_text = masking_ctx.mask(raw_alert) if masking_ctx else raw_alertAnd in
build_diagnosis_prompt:evidence_text = _build_evidence_sections(state, evidence, _masking_ctx)
|
@greptile-apps re-trigger |
Fixes #479
Describe the changes you have made in this PR -
This PR closes the remaining gaps in the masking pipeline by ensuring that planning and diagnosis prompts never expose raw infrastructure identifiers to the LLM, completing the three requirements from the issue.
1. Planning prompts now use placeholders (
app/nodes/plan_actions/node.py)Previously,
build_plan_actions()received unmaskedinput_datadirectly from state. AMaskingContextis now constructed from state and applied to maskinput_datafields before they are passed into the planning LLM call. This is a no-op when masking is disabled.2. Diagnosis prompts now use placeholders (
app/nodes/root_cause_diagnosis/prompt_builder.py)The
evidencedict was already masked upstream ininvestigate/node.py, butproblem_md,hypotheses, andraw_alertwere pulled directly from state and injected into the prompt unmasked. AMaskingContextis now constructed at the top ofbuild_diagnosis_promptand applied to these fields before they reach the prompt string.3. Final output already restores identifiers — no change needed
Both
publish_findings/node.pyandroot_cause_diagnosis/node.pyalready callmasking_ctx.unmask()before any user-facing output. The prefix collision bug (<NS_1>vs<NS_10>) referenced in #639 is also already fixed incontext.pyvia longest-first sort inunmask().Screenshots of the UI changes (If any) -
N/A — backend masking pipeline only, no UI changes.
Code Understanding and AI Usage
Did you use AI assistance (ChatGPT, Claude, Copilot, etc.) to write any part of this code?
If you used AI assistance:
Explain your implementation approach:
The masking system already had a solid foundation —
MaskingContexthandles placeholder assignment, stability across nodes, and safe unmasking. The gap was that two upstream nodes were feeding raw state fields into LLM prompts without going through the masking layer first. The fix follows the same pattern already used ininvestigate/node.py: construct aMaskingContextfrom state, applymask()ormask_value()to the relevant fields, and pass the masked version downstream. No new abstractions were introduced, the existing API was sufficient.Checklist before requesting a review
Note: Please check Allow edits from maintainers if you would like us to assist in the PR.