feat(piipolicy): add policy-based automatic PII detection and dynamic…#3506
feat(piipolicy): add policy-based automatic PII detection and dynamic…#3506Deeven-Seru wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a PII policy masking feature, allowing the server to mask or deny sensitive fields in tool execution results based on user claims. The changes include adding a new piipolicy package, updating configuration parsing and merging to support piiPolicies, and integrating policy application into the MCP tool execution handlers. The review feedback suggests several high-value improvements: converting strings to rune slices in applyActionToString to correctly handle multi-byte UTF-8 characters, recursively applying PII policies to nested maps and slices within applyToMap to prevent nested data from bypassing masking, validating and pre-compiling regex patterns during configuration parsing to fail fast, and adding corresponding test cases for nested structures.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a PII (Personally Identifiable Information) policy engine to the MCP Toolbox, allowing the masking or denying of sensitive fields in tool execution results based on user claims and configured rules. The changes include adding PII policy configurations, updating the resource manager, and applying the policy during tool invocation across various MCP protocol versions. The review feedback highlights several critical correctness and performance improvements for the new piipolicy engine. Specifically, it recommends caching compiled regular expressions to avoid expensive on-the-fly compilation, handling primitive types directly in the policy switch to prevent serialization overhead and type mutation, hoisting regex compilation and action checks out of loops, and correctly handling multi-byte UTF-8 characters during string masking to prevent invalid UTF-8 sequences and incorrect length calculations.
| import ( | ||
| "context" | ||
| "encoding/json" | ||
| "fmt" | ||
| "regexp" | ||
| "strings" | ||
| ) |
There was a problem hiding this comment.
Performance & Correctness: Thread-Safe Regex Cache
Regex compilation via regexp.Compile is an expensive operation. Currently, regexes are compiled on the fly during every string evaluation. Introducing a thread-safe cache using sync.Map avoids redundant compilation and significantly improves performance, especially when processing large datasets.
import (
"context"
"encoding/json"
"fmt"
"regexp"
"strings"
"sync"
"unicode/utf8"
)
var regexCache sync.Map // map[string]*regexp.Regexp
func getRegexp(pattern string) (*regexp.Regexp, error) {
if val, ok := regexCache.Load(pattern); ok {
return val.(*regexp.Regexp), nil
}
re, err := regexp.Compile(pattern)
if err != nil {
return nil, err
}
regexCache.Store(pattern, re)
return re, nil
}| return out, nil | ||
| default: |
There was a problem hiding this comment.
Correctness & Efficiency: Prevent Type Mutation of Primitives
Currently, any primitive type other than string (such as int, bool, float64, or nil) falls into the default case of the type switch, triggering a full json.Marshal and json.Unmarshal cycle. For integer types, this decodes them as float64, causing type mutation that can break downstream code expecting concrete integer types. Handling primitive types directly in the switch avoids this mutation and eliminates serialization overhead.
| return out, nil | |
| default: | |
| return out, nil | |
| case int, int8, int16, int32, int64, uint, uint8, uint16, uint32, uint64, float32, float64, bool, nil: | |
| return data, nil | |
| default: |
| re, err := regexp.Compile(rule.Pattern) | ||
| if err != nil { | ||
| return "", fmt.Errorf("invalid pattern in rule %q: %w", rule.Name, err) | ||
| } |
There was a problem hiding this comment.
Performance: Use Cached Regex
Use the newly introduced getRegexp helper to retrieve the compiled regex from the cache instead of compiling it on every string evaluation.
| re, err := regexp.Compile(rule.Pattern) | |
| if err != nil { | |
| return "", fmt.Errorf("invalid pattern in rule %q: %w", rule.Name, err) | |
| } | |
| re, err := getRegexp(rule.Pattern) | |
| if err != nil { | |
| return "", fmt.Errorf("invalid pattern in rule %q: %w", rule.Name, err) | |
| } |
| func applyActionToString(action Action, val string) string { | ||
| switch action { | ||
| case MaskFull: | ||
| return strings.Repeat("*", len(val)) | ||
| case MaskPartial: | ||
| if len(val) <= 2 { | ||
| return strings.Repeat("*", len(val)) | ||
| } | ||
| visibleLen := len(val) / 2 | ||
| return val[:visibleLen] + strings.Repeat("*", len(val)-visibleLen) | ||
| case DenyField: | ||
| return "[DENIED]" | ||
| default: | ||
| return val // Unmask or unknown | ||
| } | ||
| } |
There was a problem hiding this comment.
Correctness: Handle Multi-Byte UTF-8 Characters Correctly
In Go, len(val) returns the number of bytes, not characters (runes). For multi-byte UTF-8 characters (e.g., accented characters, non-ASCII names, or emojis), strings.Repeat("*", len(val)) will produce more asterisks than the actual character count, leaking byte-length information.
Additionally, slicing a string using byte indices (val[:visibleLen]) can slice in the middle of a multi-byte UTF-8 character, producing invalid UTF-8 sequences. Converting the string to []rune and using utf8.RuneCountInString ensures correct character-based masking and slicing.
| func applyActionToString(action Action, val string) string { | |
| switch action { | |
| case MaskFull: | |
| return strings.Repeat("*", len(val)) | |
| case MaskPartial: | |
| if len(val) <= 2 { | |
| return strings.Repeat("*", len(val)) | |
| } | |
| visibleLen := len(val) / 2 | |
| return val[:visibleLen] + strings.Repeat("*", len(val)-visibleLen) | |
| case DenyField: | |
| return "[DENIED]" | |
| default: | |
| return val // Unmask or unknown | |
| } | |
| } | |
| func applyActionToString(action Action, val string) string { | |
| switch action { | |
| case MaskFull: | |
| return strings.Repeat("*", utf8.RuneCountInString(val)) | |
| case MaskPartial: | |
| runes := []rune(val) | |
| if len(runes) <= 2 { | |
| return strings.Repeat("*", len(runes)) | |
| } | |
| visibleLen := len(runes) / 2 | |
| return string(runes[:visibleLen]) + strings.Repeat("*", len(runes)-visibleLen) | |
| case DenyField: | |
| return "[DENIED]" | |
| default: | |
| return val // Unmask or unknown | |
| } | |
| } |
| if rule.Pattern != "" { | ||
| for k, v := range out { | ||
| if strVal, ok := v.(string); ok { | ||
| action := getActionForRule(rule, tier) | ||
| if action == Unmask { | ||
| continue | ||
| } | ||
| re, err := regexp.Compile(rule.Pattern) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("invalid pattern in rule %q: %w", rule.Name, err) | ||
| } | ||
| out[k] = re.ReplaceAllStringFunc(strVal, func(match string) string { | ||
| return applyActionToString(action, match) | ||
| }) | ||
| } | ||
| } | ||
| } | ||
| continue |
There was a problem hiding this comment.
Performance: Hoist Action Check and Regex Compilation
In applyToMap, the action and regex pattern are constant for the entire rule. Hoisting the action check and regex compilation outside of the map key iteration loop avoids redundant lookups and regex retrievals for every key in the map.
| if rule.Pattern != "" { | |
| for k, v := range out { | |
| if strVal, ok := v.(string); ok { | |
| action := getActionForRule(rule, tier) | |
| if action == Unmask { | |
| continue | |
| } | |
| re, err := regexp.Compile(rule.Pattern) | |
| if err != nil { | |
| return nil, fmt.Errorf("invalid pattern in rule %q: %w", rule.Name, err) | |
| } | |
| out[k] = re.ReplaceAllStringFunc(strVal, func(match string) string { | |
| return applyActionToString(action, match) | |
| }) | |
| } | |
| } | |
| } | |
| continue | |
| if rule.Pattern != "" { | |
| action := getActionForRule(rule, tier) | |
| if action != Unmask { | |
| re, err := getRegexp(rule.Pattern) | |
| if err != nil { | |
| return nil, fmt.Errorf("invalid pattern in rule %q: %w", rule.Name, err) | |
| } | |
| for k, v := range out { | |
| if strVal, ok := v.(string); ok { | |
| out[k] = re.ReplaceAllStringFunc(strVal, func(match string) string { | |
| return applyActionToString(action, match) | |
| }) | |
| } | |
| } | |
| } | |
| } | |
| continue |
1. Description
This PR implements #3373 by adding a policy-based automatic PII detection and dynamic masking layer. It intercepts tool results and dynamically masks PII based on role-based policies defined in
tools.yaml.2. PR Checklist
!if this involves a breaking change3. Issue Reference
Fixes #3373