Skip to content

feat(piipolicy): add policy-based automatic PII detection and dynamic…#3506

Open
Deeven-Seru wants to merge 2 commits into
googleapis:mainfrom
Deeven-Seru:feat/3373-pii-masking
Open

feat(piipolicy): add policy-based automatic PII detection and dynamic…#3506
Deeven-Seru wants to merge 2 commits into
googleapis:mainfrom
Deeven-Seru:feat/3373-pii-masking

Conversation

@Deeven-Seru

Copy link
Copy Markdown
Contributor

1. Description
This PR implements #3373 by adding a policy-based automatic PII detection and dynamic masking layer. It intercepts tool results and dynamically masks PII based on role-based policies defined in tools.yaml.

2. PR Checklist

  • Make sure to open an issue as a bug/issue before writing your code!
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)
  • Make sure to add ! if this involves a breaking change

3. Issue Reference
Fixes #3373

@Deeven-Seru Deeven-Seru requested review from a team as code owners June 23, 2026 15:21

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a PII policy masking feature, allowing the server to mask or deny sensitive fields in tool execution results based on user claims. The changes include adding a new piipolicy package, updating configuration parsing and merging to support piiPolicies, and integrating policy application into the MCP tool execution handlers. The review feedback suggests several high-value improvements: converting strings to rune slices in applyActionToString to correctly handle multi-byte UTF-8 characters, recursively applying PII policies to nested maps and slices within applyToMap to prevent nested data from bypassing masking, validating and pre-compiling regex patterns during configuration parsing to fail fast, and adding corresponding test cases for nested structures.

Comment thread internal/piipolicy/engine.go
Comment thread internal/piipolicy/engine.go Outdated
Comment thread internal/server/config.go
Comment thread internal/piipolicy/engine_test.go
@Deeven-Seru

Copy link
Copy Markdown
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a PII (Personally Identifiable Information) policy engine to the MCP Toolbox, allowing the masking or denying of sensitive fields in tool execution results based on user claims and configured rules. The changes include adding PII policy configurations, updating the resource manager, and applying the policy during tool invocation across various MCP protocol versions. The review feedback highlights several critical correctness and performance improvements for the new piipolicy engine. Specifically, it recommends caching compiled regular expressions to avoid expensive on-the-fly compilation, handling primitive types directly in the policy switch to prevent serialization overhead and type mutation, hoisting regex compilation and action checks out of loops, and correctly handling multi-byte UTF-8 characters during string masking to prevent invalid UTF-8 sequences and incorrect length calculations.

Comment on lines +17 to +23
import (
"context"
"encoding/json"
"fmt"
"regexp"
"strings"
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Performance & Correctness: Thread-Safe Regex Cache

Regex compilation via regexp.Compile is an expensive operation. Currently, regexes are compiled on the fly during every string evaluation. Introducing a thread-safe cache using sync.Map avoids redundant compilation and significantly improves performance, especially when processing large datasets.

import (
	"context"
	"encoding/json"
	"fmt"
	"regexp"
	"strings"
	"sync"
	"unicode/utf8"
)

var regexCache sync.Map // map[string]*regexp.Regexp

func getRegexp(pattern string) (*regexp.Regexp, error) {
	if val, ok := regexCache.Load(pattern); ok {
		return val.(*regexp.Regexp), nil
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	regexCache.Store(pattern, re)
	return re, nil
}

Comment on lines +57 to +58
return out, nil
default:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Correctness & Efficiency: Prevent Type Mutation of Primitives

Currently, any primitive type other than string (such as int, bool, float64, or nil) falls into the default case of the type switch, triggering a full json.Marshal and json.Unmarshal cycle. For integer types, this decodes them as float64, causing type mutation that can break downstream code expecting concrete integer types. Handling primitive types directly in the switch avoids this mutation and eliminates serialization overhead.

Suggested change
return out, nil
default:
return out, nil
case int, int8, int16, int32, int64, uint, uint8, uint16, uint32, uint64, float32, float64, bool, nil:
return data, nil
default:

Comment on lines +97 to +100
re, err := regexp.Compile(rule.Pattern)
if err != nil {
return "", fmt.Errorf("invalid pattern in rule %q: %w", rule.Name, err)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Performance: Use Cached Regex

Use the newly introduced getRegexp helper to retrieve the compiled regex from the cache instead of compiling it on every string evaluation.

Suggested change
re, err := regexp.Compile(rule.Pattern)
if err != nil {
return "", fmt.Errorf("invalid pattern in rule %q: %w", rule.Name, err)
}
re, err := getRegexp(rule.Pattern)
if err != nil {
return "", fmt.Errorf("invalid pattern in rule %q: %w", rule.Name, err)
}

Comment on lines +177 to +192
func applyActionToString(action Action, val string) string {
switch action {
case MaskFull:
return strings.Repeat("*", len(val))
case MaskPartial:
if len(val) <= 2 {
return strings.Repeat("*", len(val))
}
visibleLen := len(val) / 2
return val[:visibleLen] + strings.Repeat("*", len(val)-visibleLen)
case DenyField:
return "[DENIED]"
default:
return val // Unmask or unknown
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Correctness: Handle Multi-Byte UTF-8 Characters Correctly

In Go, len(val) returns the number of bytes, not characters (runes). For multi-byte UTF-8 characters (e.g., accented characters, non-ASCII names, or emojis), strings.Repeat("*", len(val)) will produce more asterisks than the actual character count, leaking byte-length information.

Additionally, slicing a string using byte indices (val[:visibleLen]) can slice in the middle of a multi-byte UTF-8 character, producing invalid UTF-8 sequences. Converting the string to []rune and using utf8.RuneCountInString ensures correct character-based masking and slicing.

Suggested change
func applyActionToString(action Action, val string) string {
switch action {
case MaskFull:
return strings.Repeat("*", len(val))
case MaskPartial:
if len(val) <= 2 {
return strings.Repeat("*", len(val))
}
visibleLen := len(val) / 2
return val[:visibleLen] + strings.Repeat("*", len(val)-visibleLen)
case DenyField:
return "[DENIED]"
default:
return val // Unmask or unknown
}
}
func applyActionToString(action Action, val string) string {
switch action {
case MaskFull:
return strings.Repeat("*", utf8.RuneCountInString(val))
case MaskPartial:
runes := []rune(val)
if len(runes) <= 2 {
return strings.Repeat("*", len(runes))
}
visibleLen := len(runes) / 2
return string(runes[:visibleLen]) + strings.Repeat("*", len(runes)-visibleLen)
case DenyField:
return "[DENIED]"
default:
return val // Unmask or unknown
}
}

Comment on lines +134 to +151
if rule.Pattern != "" {
for k, v := range out {
if strVal, ok := v.(string); ok {
action := getActionForRule(rule, tier)
if action == Unmask {
continue
}
re, err := regexp.Compile(rule.Pattern)
if err != nil {
return nil, fmt.Errorf("invalid pattern in rule %q: %w", rule.Name, err)
}
out[k] = re.ReplaceAllStringFunc(strVal, func(match string) string {
return applyActionToString(action, match)
})
}
}
}
continue

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Performance: Hoist Action Check and Regex Compilation

In applyToMap, the action and regex pattern are constant for the entire rule. Hoisting the action check and regex compilation outside of the map key iteration loop avoids redundant lookups and regex retrievals for every key in the map.

Suggested change
if rule.Pattern != "" {
for k, v := range out {
if strVal, ok := v.(string); ok {
action := getActionForRule(rule, tier)
if action == Unmask {
continue
}
re, err := regexp.Compile(rule.Pattern)
if err != nil {
return nil, fmt.Errorf("invalid pattern in rule %q: %w", rule.Name, err)
}
out[k] = re.ReplaceAllStringFunc(strVal, func(match string) string {
return applyActionToString(action, match)
})
}
}
}
continue
if rule.Pattern != "" {
action := getActionForRule(rule, tier)
if action != Unmask {
re, err := getRegexp(rule.Pattern)
if err != nil {
return nil, fmt.Errorf("invalid pattern in rule %q: %w", rule.Name, err)
}
for k, v := range out {
if strVal, ok := v.(string); ok {
out[k] = re.ReplaceAllStringFunc(strVal, func(match string) string {
return applyActionToString(action, match)
})
}
}
}
}
continue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: Add policy-based automatic PII detection and dynamic masking

3 participants