feat: add XSS context analyzer for fuzzing engine #7076

dejan-v1007 wants to merge 1 commit into projectdiscovery:dev
Conversation
Implements an XSS context analyzer that integrates with nuclei's fuzzing pipeline. The analyzer:

- Injects a canary string with XSS-critical characters (`<>"'/`) to detect reflection and character survival in HTTP responses
- Uses the golang.org/x/net/html tokenizer to classify reflection context into 8 types: HTMLText, Attribute, AttributeUnquoted, Script, ScriptString, Style, HTMLComment, and None
- Selects context-appropriate XSS payloads filtered by which special characters survive server-side encoding
- Replays payloads and verifies unencoded reflection to confirm XSS
- Reports CSP header presence as a note on exploitability

New files:
- pkg/fuzz/analyzers/xss/types.go - context types and constants
- pkg/fuzz/analyzers/xss/context.go - HTML tokenizer context detection
- pkg/fuzz/analyzers/xss/analyzer.go - main analyzer with replay logic
- pkg/fuzz/analyzers/xss/context_test.go - comprehensive tests

Modified files:
- pkg/fuzz/analyzers/analyzers.go - add response fields to Options
- pkg/protocols/http/http.go - register xss analyzer via import
- pkg/protocols/http/request.go - pass response data to analyzer
- pkg/protocols/http/request_fuzz.go - init nil Parameters map

Closes projectdiscovery#5838
Neo - PR Security Review (High: 1)

High (1)
Security Impact: Unbounded Memory Consumption in XSS Analyzer Replay (attack example, suggested fix, and hardening notes in the inline comment below)
Walkthrough

This PR introduces an XSS context analyzer that detects cross-site scripting vulnerabilities through canary injection and context-aware payload replay. It extends the fuzzing framework with concurrency-safe random utilities and HTTP response capture capabilities, while adding comprehensive HTML parsing and reflection context detection.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Fuzzer as Fuzzer Engine
    participant XSSAnalyzer as XSS Analyzer
    participant Parser as HTML Parser
    participant HTTPClient as HTTP Client
    participant Detector as Context Detector
    Fuzzer->>XSSAnalyzer: ApplyInitialTransformation(data, params)
    XSSAnalyzer->>XSSAnalyzer: Inject canary [XSS_CANARY]
    XSSAnalyzer-->>Fuzzer: Return transformed data
    Fuzzer->>XSSAnalyzer: Analyze(options)
    XSSAnalyzer->>HTTPClient: Get HTTP response body & headers
    HTTPClient-->>XSSAnalyzer: Response data
    XSSAnalyzer->>Parser: Parse HTML response
    Parser->>Detector: DetectReflections(body, canary)
    Detector-->>Parser: []ReflectionInfo with contexts
    XSSAnalyzer->>Detector: BestReflection(reflections)
    Detector-->>XSSAnalyzer: Priority-ranked reflection
    XSSAnalyzer->>XSSAnalyzer: Select context-appropriate payloads
    loop For each payload
        XSSAnalyzer->>XSSAnalyzer: replayAndVerify(payload, reflection)
        XSSAnalyzer->>HTTPClient: Inject payload & send request
        HTTPClient-->>XSSAnalyzer: Response with payload
        XSSAnalyzer->>XSSAnalyzer: Check if payload reflected unencoded
    end
    XSSAnalyzer-->>Fuzzer: (vulnerability detected, context, details)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 4
🧹 Nitpick comments (1)
pkg/fuzz/analyzers/xss/analyzer.go (1)
60-67: Prefer deriving the canary from `options.FuzzGenerated.Value` for per-request robustness. Fetching the canary from mutable analyzer parameters is more fragile than extracting it from the generated fuzz value, especially across fuzzing modes and concurrent executions.
Based on learnings: In Go code related to fuzz analyzers (e.g., XSS analyzer in pkg/fuzz/analyzers), extract the canary from FuzzGenerated.Value (the final reflected string) rather than from GeneratedRequest.OriginalPayload. This ensures compatibility across all fuzzing modes, including KV-mode, and makes the canary source consistent and robust for all fuzz runs.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/fuzz/analyzers/xss/analyzer.go` around lines 60 - 67, Replace the fragile lookup of the XSS canary from options.AnalyzerParameters with a derivation from options.FuzzGenerated.Value (the final reflected string); specifically, in the XSS analyzer code around the canary determination (the block that currently reads options.AnalyzerParameters["xss_canary"]), attempt to read and assert options.FuzzGenerated.Value as a string and use that as the canary, fall back to the AnalyzerParameters value only if FuzzGenerated.Value is empty or not a string, and keep the early return if the resulting canary is empty so behavior remains unchanged.
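The fallback order the prompt describes can be sketched as follows. The field and key names (`FuzzGenerated.Value`, `"xss_canary"`) come from the review text; the parameter shapes below are simplified stand-ins, not nuclei's real types:

```go
package main

import "fmt"

// deriveCanary prefers the final reflected fuzz value and falls back to
// the analyzer parameters, mirroring the fix described in the prompt.
// The interface{}-typed inputs are illustrative stand-ins.
func deriveCanary(fuzzGeneratedValue interface{}, analyzerParams map[string]interface{}) string {
	if s, ok := fuzzGeneratedValue.(string); ok && s != "" {
		return s // preferred: the final reflected string
	}
	if s, ok := analyzerParams["xss_canary"].(string); ok {
		return s // fallback: mutable analyzer parameters
	}
	return "" // caller keeps its early return on an empty canary
}

func main() {
	params := map[string]interface{}{"xss_canary": "fallback-canary"}
	fmt.Println(deriveCanary("zx9qcanary", params))
	fmt.Println(deriveCanary(nil, params))
}
```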
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- pkg/fuzz/analyzers/analyzers.go
- pkg/fuzz/analyzers/xss/analyzer.go
- pkg/fuzz/analyzers/xss/context.go
- pkg/fuzz/analyzers/xss/context_test.go
- pkg/fuzz/analyzers/xss/types.go
- pkg/protocols/http/http.go
- pkg/protocols/http/request.go
- pkg/protocols/http/request_fuzz.go
```go
func detectCharacterSurvival(body string, canary string) CharacterSet {
	return CharacterSet{
		LessThan:     strings.Contains(body, canary+"<"),
		GreaterThan:  strings.Contains(body, canary+"<>") || strings.Contains(body, canary+">"),
		DoubleQuote:  strings.Contains(body, canary+`<>"`),
		SingleQuote:  strings.Contains(body, canary+`<>"'`),
		ForwardSlash: strings.Contains(body, canary+canaryChars), // full canary+chars survived
	}
}
```
Character survival detection is order-dependent and misses partial-encoding cases.
Current checks require earlier chars to survive for later chars to be marked. Example: if < is encoded but > survives, GreaterThan can still evaluate false.
💡 Suggested fix
```diff
 func detectCharacterSurvival(body string, canary string) CharacterSet {
+	idx := strings.Index(body, canary)
+	if idx == -1 {
+		return CharacterSet{}
+	}
+	suffix := body[idx+len(canary):]
+	// Keep analysis local to the injected canary tail.
+	if len(suffix) > 32 {
+		suffix = suffix[:32]
+	}
+
 	return CharacterSet{
-		LessThan:     strings.Contains(body, canary+"<"),
-		GreaterThan:  strings.Contains(body, canary+"<>") || strings.Contains(body, canary+">"),
-		DoubleQuote:  strings.Contains(body, canary+`<>"`),
-		SingleQuote:  strings.Contains(body, canary+`<>"'`),
-		ForwardSlash: strings.Contains(body, canary+canaryChars), // full canary+chars survived
+		LessThan:     strings.Contains(suffix, "<"),
+		GreaterThan:  strings.Contains(suffix, ">"),
+		DoubleQuote:  strings.Contains(suffix, `"`),
+		SingleQuote:  strings.Contains(suffix, `'`),
+		ForwardSlash: strings.Contains(suffix, `/`),
 	}
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/fuzz/analyzers/xss/analyzer.go` around lines 219 - 226,
detectCharacterSurvival is currently checking for sequences like canary+"<>" or
canary+`<>"'` which makes detection order-dependent and misses cases where an
earlier char is encoded but a later one survives; update detectCharacterSurvival
to test each target character independently using strings.Contains(body,
canary+targetChar) (e.g. check canary+"<", canary+">", canary+`"`, canary+"'",
canary+"/") instead of compound concatenations, and for robustness also check
common percent-encoded variants (e.g. canary+"%3C", canary+"%3E", etc.) when
setting the CharacterSet fields (refer to detectCharacterSurvival and the
CharacterSet fields LessThan, GreaterThan, DoubleQuote, SingleQuote,
ForwardSlash).
```go
if !strings.Contains(body, marker) {
	return nil
}

var reflections []ReflectionInfo
markerLower := strings.ToLower(marker)
```
Fix the pre-check so mixed-case reflections are not dropped.
The early strings.Contains(body, marker) is case-sensitive and can return nil before the later case-insensitive checks run.
💡 Suggested fix
func DetectReflections(body string, marker string) []ReflectionInfo {
- if !strings.Contains(body, marker) {
+ if body == "" || marker == "" {
+ return nil
+ }
+ markerLower := strings.ToLower(marker)
+ if !strings.Contains(strings.ToLower(body), markerLower) {
return nil
}
var reflections []ReflectionInfo
- markerLower := strings.ToLower(marker)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/fuzz/analyzers/xss/context.go` around lines 12 - 18, The early
case-sensitive pre-check using strings.Contains(body, marker) can short-circuit
mixed-case reflections; change it to a case-insensitive check (e.g., compare
strings.ToLower(body) against markerLower) or remove the pre-check entirely so
subsequent case-insensitive logic runs; update the code around body, marker,
markerLower, and reflections so mixed-case markers are not dropped.
```go
func detectAttrQuoting(rawToken, attrName string) (byte, bool) {
	attrAssign := attrName + "="
	rawLower := strings.ToLower(rawToken)
	idx := strings.Index(rawLower, attrAssign)
	if idx < 0 {
		return '"', false // default to double-quoted
	}
	afterEq := idx + len(attrAssign)
	if afterEq >= len(rawToken) {
		return '"', false
	}
	switch rawToken[afterEq] {
	case '"':
		return '"', false
	case '\'':
		return '\'', false
	default:
		return 0, true
	}
}
```
detectAttrQuoting can classify the wrong attribute token.
Matching attrName + "=" via raw substring is fragile: it can hit substrings (e.g. data-value vs value) and misses valid forms with spaces around =. That can misclassify quoted/unquoted context and pick wrong payload sets.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/fuzz/analyzers/xss/context.go` around lines 236 - 253, detectAttrQuoting
is matching attrName+"=" as a raw substring which can hit partial names (e.g.
data-value vs value) and won't handle spaces around '='; update
detectAttrQuoting to find the actual attribute token instead of a blind
substring: locate a case-insensitive whole-attribute match for attrName as a
token (use a regex like case-insensitive word-boundary or parse attributes) that
allows optional whitespace before/after '=' and then examine the first
non-whitespace character after '=' to decide between '"' '\'' or unquoted (0);
keep the function signature and return semantics (byte, bool) and ensure you do
not treat partial matches as valid.
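The token-based matching the prompt asks for could look like the sketch below. This is an illustrative rewrite under the prompt's assumptions, not the merged code: it anchors `attrName` as a whole attribute token, tolerates whitespace around `=`, and inspects the first character of the value.

```go
package main

import (
	"fmt"
	"regexp"
)

// detectAttrQuotingSketch mirrors the review's suggested behavior:
// match attrName as a whole token (preceded by start-of-token, space,
// quote, or '<'), allow optional whitespace around '=', then classify
// the value's first character. Name and details are illustrative.
func detectAttrQuotingSketch(rawToken, attrName string) (byte, bool) {
	re := regexp.MustCompile(`(?i)(?:^|[\s"'<])` + regexp.QuoteMeta(attrName) + `\s*=\s*(.)`)
	m := re.FindStringSubmatch(rawToken)
	if m == nil {
		return '"', false // default to double-quoted
	}
	switch m[1][0] {
	case '"':
		return '"', false
	case '\'':
		return '\'', false
	default:
		return 0, true // unquoted value
	}
}

func main() {
	// Matches the real "value" attribute, not the "data-value" prefix hit.
	q, unquoted := detectAttrQuotingSketch(`<a data-value="x" value=y>`, "value")
	fmt.Println(q, unquoted)
}
```

A regex keeps the `(byte, bool)` signature intact; fully parsing the attribute list would be more robust but is heavier than this excerpt needs.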
```go
if request.Analyzer != nil {
	analyzer := analyzers.GetAnalyzer(request.Analyzer.Name)
	input.ApplyPayloadInitialTransformation = analyzer.ApplyInitialTransformation
	if request.Analyzer.Parameters == nil {
		request.Analyzer.Parameters = make(map[string]interface{})
	}
	input.AnalyzerParams = request.Analyzer.Parameters
}
```
Isolate analyzer params per execution to prevent mutable state bleed.
input.AnalyzerParams currently points to request.Analyzer.Parameters. Since analyzers can mutate params during transformation, this shares mutable state across executions and can overwrite per-request analyzer state.
💡 Suggested fix
```diff
 if request.Analyzer != nil {
 	analyzer := analyzers.GetAnalyzer(request.Analyzer.Name)
 	input.ApplyPayloadInitialTransformation = analyzer.ApplyInitialTransformation
-	if request.Analyzer.Parameters == nil {
-		request.Analyzer.Parameters = make(map[string]interface{})
-	}
-	input.AnalyzerParams = request.Analyzer.Parameters
+	params := make(map[string]interface{}, len(request.Analyzer.Parameters))
+	for k, v := range request.Analyzer.Parameters {
+		params[k] = v
+	}
+	input.AnalyzerParams = params
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
if request.Analyzer != nil {
	analyzer := analyzers.GetAnalyzer(request.Analyzer.Name)
	input.ApplyPayloadInitialTransformation = analyzer.ApplyInitialTransformation
	params := make(map[string]interface{}, len(request.Analyzer.Parameters))
	for k, v := range request.Analyzer.Parameters {
		params[k] = v
	}
	input.AnalyzerParams = params
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/protocols/http/request_fuzz.go` around lines 150 - 157,
input.AnalyzerParams is set to the same map instance as
request.Analyzer.Parameters which allows analyzers (via analyzers.GetAnalyzer
and analyzer.ApplyInitialTransformation) to mutate shared request state; change
this to assign a new map copy (clone) into input.AnalyzerParams after ensuring
request.Analyzer.Parameters is non-nil so each execution gets its own map
instance and mutations won't bleed back into request.Analyzer.Parameters or
other executions.
```go
}
defer resp.Body.Close()

respBody, err := io.ReadAll(resp.Body)
```
🟠 Unbounded Memory Consumption in XSS Analyzer Replay (CWE-770) — The replayAndVerify function reads HTTP response bodies using io.ReadAll without any size limit. An attacker controlling a malicious web server can return multi-gigabyte responses, causing Nuclei to consume unbounded memory and crash.
Attack Example
1. Attacker sets up a malicious web server that reflects the XSS canary
2. Nuclei's XSS analyzer detects reflection and initiates payload replay
3. Malicious server returns a 10GB HTML response with the payload reflected
4. io.ReadAll(resp.Body) attempts to allocate 10GB of memory
5. Nuclei crashes with OOM error, scanner stops functioning
Suggested Fix
Use io.LimitReader to cap the response body size. Example:

```go
maxReplayBodySize := 10 * 1024 * 1024 // 10MB
respBody, err := io.ReadAll(io.LimitReader(resp.Body, int64(maxReplayBodySize)))
```

This matches the pattern used elsewhere in the codebase (see pkg/protocols/http/request.go:946, where MaxBodyRead = 10MB).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/fuzz/analyzers/xss/analyzer.go` at line 153, replace the unbounded
`io.ReadAll(resp.Body)` with a size-limited read using `io.LimitReader`. Add a
constant `maxReplayBodySize = 10 * 1024 * 1024` at the package level, then
change line 153 to: `respBody, err := io.ReadAll(io.LimitReader(resp.Body,
int64(maxReplayBodySize)))` to prevent unbounded memory allocation from
attacker-controlled response sizes.
Summary
Implements an XSS context analyzer (`xss_context`) that integrates with nuclei's existing fuzzing pipeline, following the same pattern as the `time_delay` analyzer.

How it works

- Replaces `[XSS_CANARY]` with a unique canary string appended with XSS-critical characters (`<>"'/`) to detect reflection and character survival
- Uses the `golang.org/x/net/html` tokenizer to classify where the canary is reflected into one of 8 context types:
  - `HTMLText` — between HTML tags
  - `Attribute` — inside a quoted attribute value
  - `AttributeUnquoted` — inside an unquoted attribute value
  - `Script` — inside a `<script>` block (not in a string)
  - `ScriptString` — inside a JavaScript string literal
  - `Style` — inside a `<style>` block
  - `HTMLComment` — inside an HTML comment
  - `None` — no actionable context
- Selects context-appropriate XSS payloads filtered by which special characters (`<`, `>`, `"`, `'`, `/`) survive server-side encoding

Files

New files:

- `pkg/fuzz/analyzers/xss/types.go` — context types, character set, event handler map
- `pkg/fuzz/analyzers/xss/context.go` — HTML tokenizer-based context detection
- `pkg/fuzz/analyzers/xss/analyzer.go` — main analyzer with canary injection, replay, and verification
- `pkg/fuzz/analyzers/xss/context_test.go` — 30+ test cases and 2 benchmarks

Modified files:

- `pkg/fuzz/analyzers/analyzers.go` — added `ResponseBody`, `ResponseHeaders`, `ResponseStatusCode` to `Options`; made random thread-safe
- `pkg/protocols/http/http.go` — blank import to register the xss analyzer
- `pkg/protocols/http/request.go` — pass HTTP response data to analyzer
- `pkg/protocols/http/request_fuzz.go` — initialize nil `Parameters` map for canary storage
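The 8 context types listed under "How it works" can be modeled as a simple enumeration. This is a hedged sketch: only the 8 context names come from the PR description; the Go type name, value ordering, and String method are assumptions, not the contents of `types.go`:

```go
package main

import "fmt"

// ReflectionContext enumerates where a canary was found reflected.
// Illustrative type; the PR's actual declarations may differ.
type ReflectionContext int

const (
	ContextNone ReflectionContext = iota
	ContextHTMLText
	ContextAttribute
	ContextAttributeUnquoted
	ContextScript
	ContextScriptString
	ContextStyle
	ContextHTMLComment
)

func (c ReflectionContext) String() string {
	return [...]string{
		"None", "HTMLText", "Attribute", "AttributeUnquoted",
		"Script", "ScriptString", "Style", "HTMLComment",
	}[c]
}

func main() {
	fmt.Println(ContextScriptString)
}
```

An explicit enum like this lets BestReflection rank contexts by exploitability and lets payload selection switch on a single typed value.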
Test plan
/claim #5838
🤖 Generated with Claude Code