Add XSS context analyzer with AST-based detection#7063

Open
odvcencio wants to merge 4 commits into projectdiscovery:dev from odvcencio:feat/xss-context-analyzer
Conversation


@odvcencio odvcencio commented Feb 27, 2026

Summary

Adds an XSS context analyzer that detects the precise injection context of reflected input in HTTP responses. This enables context-aware XSS payload generation in the fuzzer.

  • 17 distinct context types detected: HTML text, comments, attribute values (double/single/unquoted), event handlers (onclick/onload/etc.), URL attributes (href/src/action), JavaScript strings (double/single/template literal), JS expressions, JS comments (line/block), CSS values, CSS url(), and style attributes
  • Hybrid parsing approach: golang.org/x/net/html tokenizer for HTML-level context + gotreesitter AST parsing for JavaScript and CSS sub-context classification
  • Canary-based detection: Injects a unique random marker, sends the request, then locates all reflection points in the response body
  • Full test coverage: 31 tests covering all context types, real-world scenarios, edge cases, and the analyzer interface

How it works

  1. Generates a unique canary string (gtss + 8 random alphanum chars)
  2. Sets canary as the fuzz parameter value and sends the request
  3. Scans the response body for canary reflections
  4. For each reflection: classifies the HTML context, then sub-parses with JS or CSS grammars if inside <script> or <style> blocks
  5. Returns the detected context(s) in analyzer_details
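
The canary and reflection-scan steps above can be sketched in a few lines of Go. This is an illustrative reconstruction, not the PR's actual code: generateCanary and findReflectionOffsets are hypothetical names, though the "gtss" prefix and 8-character random suffix match the description.

```go
package main

import (
	"bytes"
	"crypto/rand"
	"fmt"
)

// generateCanary builds a marker like "gtss" + 8 random alphanumeric
// characters (12 chars total), per steps 1-2 of the workflow.
func generateCanary() string {
	const alphanum = "abcdefghijklmnopqrstuvwxyz0123456789"
	buf := make([]byte, 8)
	if _, err := rand.Read(buf); err != nil {
		panic(err)
	}
	for i, b := range buf {
		buf[i] = alphanum[int(b)%len(alphanum)]
	}
	return "gtss" + string(buf)
}

// findReflectionOffsets returns the byte offset of every occurrence of
// the canary in a response body (step 3), ready for per-offset context
// classification (step 4).
func findReflectionOffsets(body []byte, canary string) []int {
	var offsets []int
	needle := []byte(canary)
	start := 0
	for {
		idx := bytes.Index(body[start:], needle)
		if idx < 0 {
			return offsets
		}
		offsets = append(offsets, start+idx)
		start += idx + len(needle)
	}
}

func main() {
	c := generateCanary()
	body := []byte("<p>" + c + "</p><input value=\"" + c + "\">")
	fmt.Println(len(c), len(findReflectionOffsets(body, c))) // → 12 2
}
```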

Why gotreesitter for JS/CSS

Previous attempts at this feature used regex or flat tokenizers, which can't distinguish between:

  • <script>var x = "REFLECTED"</script> (string context — needs quote escape)
  • <script>var x = REFLECTED</script> (expression context — direct injection)
  • <script>// REFLECTED</script> (comment context — needs newline)

AST parsing with tree-sitter grammars handles these correctly via structural analysis rather than pattern matching.
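
To show why the distinction matters downstream, here is a hedged sketch of how a payload generator might branch on the detected context. The context strings follow the PR's naming where the PR lists them; the name for the expression context and the payload fragments themselves are assumptions for illustration only.

```go
package main

import "fmt"

// breakoutFor maps a detected context name (as emitted in
// analyzer_details) to a breakout strategy. Hypothetical mapping:
// each case corresponds to one of the three <script> examples above.
func breakoutFor(context string) string {
	switch context {
	case "script_string_double":
		// Inside var x = "REFLECTED": close the string first.
		return `";alert(1);//`
	case "script_expression":
		// Inside var x = REFLECTED: already in expression position.
		return `alert(1)`
	case "script_comment_line":
		// Inside // REFLECTED: a newline terminates the comment.
		return "\nalert(1)"
	default:
		// Plain HTML text context: inject a full script element.
		return `<script>alert(1)</script>`
	}
}

func main() {
	for _, ctx := range []string{"script_string_double", "script_expression", "script_comment_line"} {
		fmt.Printf("%s -> %q\n", ctx, breakoutFor(ctx))
	}
}
```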

Files

File                                      Purpose
pkg/fuzz/analyzers/xss/context.go         Context enum types and string mappings
pkg/fuzz/analyzers/xss/html_context.go    HTML tokenizer-based context detection
pkg/fuzz/analyzers/xss/js_context.go      JS sub-context via gotreesitter AST
pkg/fuzz/analyzers/xss/css_context.go     CSS sub-context via gotreesitter AST
pkg/fuzz/analyzers/xss/analyzer.go        Analyzer interface + registration
pkg/fuzz/analyzers/xss/analyzer_test.go   31 table-driven tests + benchmarks
pkg/protocols/http/http.go                Blank import for auto-registration

Build

Requires the grammar_set_core build tag for gotreesitter grammars (includes HTML/JS/CSS, ~1MB):

go build -tags grammar_set_core ./...
go test -tags grammar_set_core ./pkg/fuzz/analyzers/xss/ -v

Test Results

All 31 tests pass:

=== RUN   TestDetermineContext (22 sub-tests: all 17 context types)          PASS
=== RUN   TestNoReflection                                                   PASS
=== RUN   TestMultipleReflections                                            PASS
=== RUN   TestGenerateCanary                                                 PASS
=== RUN   TestRealWorldReflections (6 sub-tests)                             PASS
=== RUN   TestContextStrings (9 sub-tests)                                   PASS

Real-World Test Scenarios

TestRealWorldReflections validates context detection against realistic HTML from actual web applications:

Scenario                                         Reflections   Contexts Detected
Search results page (query in heading + input)   2             html_text, attr_value_double_quoted
Error page (param in script var + message)       2             script_string_double, html_text
Profile page (username in 6 places)              6             css_url, url_attribute, html_text, url_attribute, attr_value_double_quoted, event_handler
SPA boot page (config in JSON init)              1             script_string_double
Comment form (input in textarea + hidden)        2             attr_value_double_quoted, html_text
Template literal + event handler                 2             script_template_literal, event_handler

Benchmarks

goos: linux
goarch: amd64
cpu: Intel(R) Core(TM) Ultra 9 285

BenchmarkFindReflections/html_text-20            1371783       961 ns/op      4424 B/op       8 allocs/op
BenchmarkFindReflections/attribute-20            1108882      1074 ns/op      4592 B/op      13 allocs/op
BenchmarkFindReflections/event_handler-20        1238641       961 ns/op      4528 B/op      12 allocs/op
BenchmarkFindReflections/script_string-20           1953    613302 ns/op    397377 B/op    4491 allocs/op
BenchmarkFindReflections/script_template-20         1845    624138 ns/op    397694 B/op    4491 allocs/op
BenchmarkFindReflections/css_value-20              10000    101334 ns/op    111953 B/op    1357 allocs/op
BenchmarkFindReflections/css_url-20                11804    100782 ns/op    112031 B/op    1359 allocs/op
BenchmarkFindReflections/multi_reflect-20           1551    750427 ns/op    504590 B/op    5844 allocs/op

HTML-only contexts (text, attribute, event handler) run in ~1μs. JS/CSS sub-parsing adds ~100-600μs for grammar loading — negligible for network-bound fuzzing.

Test plan

  • All 31 context detection tests pass (22 unit + 6 real-world + 3 utility)
  • Canary generation produces unique 12-char strings
  • Multiple reflections in single response detected correctly
  • No-reflection case returns false correctly
  • Real-world HTML scenarios with multiple mixed contexts all classified correctly
  • Full nuclei binary builds cleanly with grammar_set_core tag

/claim #5838

Summary by CodeRabbit

Release Notes

  • New Features

    • Added XSS context detection analyzer to identify injection points and classify contexts (HTML text, attributes, JavaScript, CSS, etc.) during analysis.
  • Tests

    • Added comprehensive test suite for XSS context detection.

- Add XSS context analyzer for the fuzzer that detects precise injection contexts
- Implement HTML tokenizer-based detection for element text, comments, and attributes
- Add JavaScript AST parsing using gotreesitter for script string/comment/expression context classification
- Add CSS AST parsing for style value and url() function detection
- Support detection of event handlers, URL attributes, and style attributes
- Implement unique canary generation for reliable reflection detection
- Register analyzer with fuzz framework via blank import in http protocol
- Include comprehensive tests covering all supported context types

coderabbitai bot commented Feb 27, 2026

Walkthrough

Adds a new XSS context analyzer for the fuzzer: it generates deterministic canaries, injects them into requests, sends requests, parses responses (HTML/JS/CSS) with Tree-sitter to find reflections, and classifies reflection contexts. Also adds tests, a dependency, and a side-effect import to register the analyzer.

Changes

Cohort / File(s) Summary
Dependency
go.mod
Added indirect dependency: github.com/odvcencio/gotreesitter v0.5.3-0.20260227083844-016686287e7c.
Analyzer core
pkg/fuzz/analyzers/xss/analyzer.go
New Analyzer implementing deterministic canary generation, request injection/rebuild, sending via HTTP client, response read (capped 10MB), reflection detection workflow, and analyzer registration via init.
Context types
pkg/fuzz/analyzers/xss/context.go
New XSSContext enum and string mappings for HTML, attribute, event, JS, CSS subcontexts and unknown.
HTML classification
pkg/fuzz/analyzers/xss/html_context.go
HTML tokenizer-based reflection finder that detects canary in text, comments, and attributes; infers quoting, identifies event/URL/style attributes, and delegates JS/CSS sub-classification.
JS classification
pkg/fuzz/analyzers/xss/js_context.go
Tree-sitter-based JS classifier (classifyJSContext) locating canary AST node and classifying string/template/comment/expression contexts with parsing guards and nesting limits.
CSS classification
pkg/fuzz/analyzers/xss/css_context.go
Tree-sitter-based CSS classifier (classifyCSSContext) that detects url(...) usage vs. plain CSS values for canary reflections.
Tests
pkg/fuzz/analyzers/xss/analyzer_test.go
Comprehensive table-driven unit tests and benchmarks covering HTML/attribute/JS/CSS contexts, multiple reflections, encoded canaries, and performance.
Integration
pkg/protocols/http/http.go
Added side-effect import to register the XSS analyzer: _ "github.com/projectdiscovery/nuclei/v3/pkg/fuzz/analyzers/xss".

Sequence Diagram

sequenceDiagram
    participant Fuzzer as Fuzzer
    participant Analyzer as XSS Analyzer
    participant HTTPClient as HTTP Client
    participant Response as HTTP Response
    participant Parser as Tree-sitter Parser

    Fuzzer->>Analyzer: Analyze(options)
    Analyzer->>Analyzer: generateCanary()
    Analyzer->>Analyzer: inject canary into request
    Analyzer->>HTTPClient: send rebuilt request
    HTTPClient->>Response: receive response
    Analyzer->>Response: read body (<=10MB)

    alt canary reflected
        Analyzer->>Parser: parse HTML (tokenizer)
        Parser-->>Analyzer: tokens / reflection positions
        Analyzer->>Analyzer: classify HTML contexts (text, comment, attr)
        Analyzer->>Parser: parse JS/CSS fragments as needed
        Parser-->>Analyzer: AST nodes for JS/CSS
        Analyzer-->>Fuzzer: (true, comma-separated contexts, nil)
    else not reflected
        Analyzer-->>Fuzzer: (false, "", nil)
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hid a canary, bright and sly,
gtss peeks where tokens lie,
Parsers hop through HTML and script,
CSS and JS give their cryptic tip,
A rabbit cheers — contexts found, oh my!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 68.42% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and concisely describes the main change: introducing an XSS context analyzer that uses AST-based detection methods.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
pkg/fuzz/analyzers/xss/js_context.go (1)

10-13: Unused parameter scriptOffset.

The scriptOffset parameter is documented as "not currently needed" but is still part of the function signature. Consider removing it or prefixing with _ to clarify it's intentionally unused.

♻️ Suggested fix
-// classifyJSContext parses JavaScript source and determines the sub-context
-// of a canary reflection. scriptOffset is the byte offset of the script content
-// within the original HTML document (used for error context, not currently needed).
-func classifyJSContext(jsSource []byte, canary string, scriptOffset uint32) XSSContext {
+// classifyJSContext parses JavaScript source and determines the sub-context
+// of a canary reflection.
+func classifyJSContext(jsSource []byte, canary string) XSSContext {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/fuzz/analyzers/xss/js_context.go` around lines 10 - 13, The parameter
scriptOffset on function classifyJSContext is unused; either remove it from the
signature and all call sites (update callers of classifyJSContext) or mark it as
intentionally unused by renaming to _scriptOffset (or prefixing with _) in the
classifyJSContext declaration and any implementations to silence linter
warnings; update function comments to reflect the change and ensure references
to scriptOffset inside classifyJSContext (if any) are removed or adjusted.
pkg/fuzz/analyzers/xss/html_context.go (1)

162-200: Edge case: findQuoteContext may match wrong attribute occurrence.

If the same attribute key appears multiple times in the HTML (e.g., multiple value= assignments), this function may return the quoting context of the wrong occurrence. However, given the canary's uniqueness, this is unlikely to cause incorrect results in practice.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/fuzz/analyzers/xss/html_context.go` around lines 162 - 200,
findQuoteContext can pick the wrong occurrence when the same attribute key
appears multiple times; update the search to only accept a match if the found
key is a true attribute name and the specific occurrence's value matches
attr.Val. Concretely, in findQuoteContext(bodyStr, attr) ensure the character
before the found key is not an identifier character (so you match whole
attribute names), parse from the '=' to the attribute value boundary (handling
quotes) and only return the context when that parsed raw value contains
attr.Val; otherwise continue searching after the end of that value (not just
after '=') so you advance to the next occurrence. Use the existing symbols key,
pos, rawVal and attr.Val to implement these checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/fuzz/analyzers/xss/analyzer.go`:
- Around line 50-58: The HTTP response body returned by
options.HttpClient.Do(rebuilt) is not being closed, causing a resource leak;
update the logic in analyzer.go around the call to options.HttpClient.Do (where
resp, err := options.HttpClient.Do(rebuilt) and body, err :=
io.ReadAll(resp.Body) live) to ensure resp.Body.Close() is always
called—preferably by adding a defer resp.Body.Close() immediately after
confirming err == nil from the Do call so the body is closed whether ReadAll
succeeds or fails.


ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9142eae and 538753f.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (8)
  • go.mod
  • pkg/fuzz/analyzers/xss/analyzer.go
  • pkg/fuzz/analyzers/xss/analyzer_test.go
  • pkg/fuzz/analyzers/xss/context.go
  • pkg/fuzz/analyzers/xss/css_context.go
  • pkg/fuzz/analyzers/xss/html_context.go
  • pkg/fuzz/analyzers/xss/js_context.go
  • pkg/protocols/http/http.go

- Add benchmark test covering HTML text, attribute, event handler contexts
- Include script string/template, CSS value/URL, and multi-reflection scenarios
- Enable performance measurement across all XSS context detection patterns
- Add comprehensive test cases covering realistic HTML scenarios from actual web applications
- Test various XSS contexts including HTML text, attributes, URLs, CSS, scripts, and event handlers
- Validate context detection on complex patterns like SPAs, search results, error pages, and user profiles
- Ensure analyzer correctly identifies multiple reflection points within single responses
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/fuzz/analyzers/xss/analyzer_test.go (1)

152-158: Add a regression test for encoded-canary behavior.

Please add one case asserting that fully/heavily encoded canary forms are not treated as exploitable reflections in detection, so this contract stays protected.

Based on learnings: in pkg/fuzz/analyzers/xss/, detection should use exact-canary matching as phase 1, and not treat heavily encoded reflections as exploitable detection hits.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/fuzz/analyzers/xss/analyzer_test.go` around lines 152 - 158, Add a
regression test alongside TestNoReflection that verifies heavily/fully encoded
canary forms are not treated as exploitable reflections: create a new test
(e.g., TestEncodedCanaryNotMatched) that constructs HTML containing encoded
variants of testCanary (use findReflections([]byte(...), testCanary)) and assert
len(points) == 0; this ensures the detection in findReflections uses
exact-canary matching and does not flag encoded canary payloads as hits.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/fuzz/analyzers/xss/analyzer_test.go`:
- Around line 190-192: The test slices c1[:4] directly which can panic if the
canary is shorter; update the assertion in
pkg/fuzz/analyzers/xss/analyzer_test.go to first guard the length (e.g., assert
len(c1) >= 4 and fail the test with a clear message if not) or replace the slice
check with a safe strings.HasPrefix(c1, "gtss") call so the test fails cleanly
instead of panicking; reference the variable c1 and the existing prefix check to
locate where to change.



📥 Commits

Reviewing files that changed from the base of the PR and between 02b04d1 and 1884b13.

📒 Files selected for processing (1)
  • pkg/fuzz/analyzers/xss/analyzer_test.go


neo-by-projectdiscovery-dev bot commented Feb 27, 2026

Neo - PR Security Review

High: 1 · Medium: 1

Highlights

  • Adds XSS context analyzer with AST-based detection using gotreesitter for HTML/JS/CSS parsing
  • Implements 17 distinct context types including HTML text, attributes, event handlers, script strings, and CSS values
  • Uses canary-based reflection detection with unique random markers to locate injection points in HTTP responses
  • Includes comprehensive test coverage with 25 tests covering all context types and edge cases
High (1)
  • Unbounded response body read enables memory exhaustion DoS (pkg/fuzz/analyzers/xss/analyzer.go:55)
    The analyzer reads the entire HTTP response body into memory using io.ReadAll without verifying that response size limits are enforced. While nuclei has MaxBodyRead protections at the protocol layer (default 10MB), the analyzer makes its own HTTP request via options.HttpClient.Do() and directly calls ReadAll on the response body. If the HTTP client is misconfigured or lacks MaxRespBodySize enforcement, a malicious server could return gigabytes of data causing memory exhaustion.
Medium (1)
  • Gotreesitter parser DoS from malicious HTML/JS/CSS structures (pkg/fuzz/analyzers/xss/js_context.go:16)
    The gotreesitter parser processes untrusted HTML/JS/CSS content from HTTP responses without timeout or resource limits. The parser.Parse() calls in js_context.go:16 and css_context.go:15 can hang indefinitely or consume excessive CPU/memory when processing specially crafted deeply nested structures. Known tree-sitter issues include indefinite hangs on malicious JavaScript (tree-sitter-javascript #322) and timeout enforcement not applying during tree balancing (tree-sitter #4019).
Security Impact

Unbounded response body read enables memory exhaustion DoS (pkg/fuzz/analyzers/xss/analyzer.go:55):
An attacker controlling a target web server can cause nuclei to allocate unbounded memory by returning a multi-gigabyte response when the XSS analyzer sends its canary request. This leads to out-of-memory conditions and denial of service for the nuclei process. The attack requires the attacker to control the HTTP server being scanned, which is a realistic scenario for XSS detection workflows.

Gotreesitter parser DoS from malicious HTML/JS/CSS structures (pkg/fuzz/analyzers/xss/js_context.go:16):
An attacker can cause a nuclei worker goroutine to hang or consume excessive CPU by returning HTTP responses containing deeply nested JS/CSS (e.g., 50,000 levels of nested arrays or function calls). While error handling returns default contexts on parse failures, a hang would block that worker indefinitely. This reduces nuclei's scanning throughput and can lead to resource exhaustion if many workers encounter such payloads.

Attack Examples

Unbounded response body read enables memory exhaustion DoS (pkg/fuzz/analyzers/xss/analyzer.go:55):

Attacker's malicious server responds to the canary request with Transfer-Encoding: chunked and sends 10GB of data in chunks. The io.ReadAll call attempts to buffer all 10GB in memory, causing the nuclei process to crash with OOM.

Gotreesitter parser DoS from malicious HTML/JS/CSS structures (pkg/fuzz/analyzers/xss/js_context.go:16):

Attacker returns response with <script> tag containing 50,000 levels of nested arrays: [[[[...]]]] or deeply nested function calls. The gotreesitter parser attempts to build an AST with 50,000 depth, exhausting stack space or hanging in tree balancing operations that don't respect timeouts.
Suggested Fixes

Unbounded response body read enables memory exhaustion DoS (pkg/fuzz/analyzers/xss/analyzer.go:55):

Wrap resp.Body with io.LimitReader before calling ReadAll: body, err := io.ReadAll(io.LimitReader(resp.Body, maxBodyLimit)) where maxBodyLimit is derived from nuclei's MaxBodyRead constant or a configurable analyzer parameter. This ensures the analyzer respects size limits regardless of HTTP client configuration.

Gotreesitter parser DoS from malicious HTML/JS/CSS structures (pkg/fuzz/analyzers/xss/js_context.go:16):

Implement a timeout mechanism for parser.Parse() calls using context.WithTimeout and recover from panics: ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second); defer cancel(); use a goroutine with select on ctx.Done() to abort parsing. Alternatively, add depth/size pre-validation: reject JS/CSS content >100KB or with nesting depth >500 before parsing.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/fuzz/analyzers/xss/analyzer.go` at line 55, replace the unbounded
`io.ReadAll(resp.Body)` call with a size-limited read: `body, err :=
io.ReadAll(io.LimitReader(resp.Body, 10*1024*1024))` to prevent memory
exhaustion from malicious servers returning gigabyte-sized responses. Import the
io package if not already imported. Consider making the 10MB limit configurable
via analyzer parameters.

In `@pkg/fuzz/analyzers/xss/js_context.go` around lines 14-19, add timeout
protection for gotreesitter parsing: wrap the parser.Parse(jsSource) call in a
goroutine with context.WithTimeout(5*time.Second), and return
ContextScriptExpression if the timeout is exceeded. Also add size validation
before parsing: if len(jsSource) > 100*1024, return ContextScriptExpression
immediately. Apply the same pattern to css_context.go:15.
Hardening Notes
  • Add explicit response size validation before io.ReadAll at analyzer.go:55 using io.LimitReader with MaxBodyRead limit (10MB) to prevent memory exhaustion from malicious servers
  • Implement parser timeout mechanism for gotreesitter Parse() calls in js_context.go:16 and css_context.go:15 using context.WithTimeout to prevent indefinite hangs on malicious nested structures
  • Consider adding pre-parse validation in js_context.go:13 and css_context.go:12 to reject excessively large inputs (>100KB) before passing to gotreesitter
  • Add depth limit checks in HTML tokenizer at html_context.go:48 to reject responses with excessive tag nesting (>1000 depth) before context classification
  • Document the required grammar_set_core build tag more prominently in analyzer.go package comment to prevent silent failures when compiled without grammars


return false, "", errors.Wrap(err, "could not send canary request")
}

body, err := io.ReadAll(resp.Body)


🟠 Unbounded response body read enables memory exhaustion DoS (CWE-770) — The analyzer reads the entire HTTP response body into memory using io.ReadAll without verifying that response size limits are enforced. While nuclei has MaxBodyRead protections at the protocol layer (default 10MB), the analyzer makes its own HTTP request via options.HttpClient.Do() and directly calls ReadAll on the response body. If the HTTP client is misconfigured or lacks MaxRespBodySize enforcement, a malicious server could return gigabytes of data causing memory exhaustion.

Attack Example
Attacker's malicious server responds to the canary request with Transfer-Encoding: chunked and sends 10GB of data in chunks. The io.ReadAll call attempts to buffer all 10GB in memory, causing the nuclei process to crash with OOM.
Suggested Fix
Wrap resp.Body with io.LimitReader before calling ReadAll: body, err := io.ReadAll(io.LimitReader(resp.Body, maxBodyLimit)) where maxBodyLimit is derived from nuclei's MaxBodyRead constant or a configurable analyzer parameter. This ensures the analyzer respects size limits regardless of HTTP client configuration.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/fuzz/analyzers/xss/analyzer.go` at line 55, replace the unbounded
`io.ReadAll(resp.Body)` call with a size-limited read: `body, err :=
io.ReadAll(io.LimitReader(resp.Body, 10*1024*1024))` to prevent memory
exhaustion from malicious servers returning gigabyte-sized responses. Import the
io package if not already imported. Consider making the 10MB limit configurable
via analyzer parameters.

func classifyJSContext(jsSource []byte, canary string, scriptOffset uint32) XSSContext {
lang := grammars.JavascriptLanguage()
parser := gotreesitter.NewParser(lang)
tree, err := parser.Parse(jsSource)


🟡 Gotreesitter parser DoS from malicious HTML/JS/CSS structures (CWE-400) — The gotreesitter parser processes untrusted HTML/JS/CSS content from HTTP responses without timeout or resource limits. The parser.Parse() calls in js_context.go:16 and css_context.go:15 can hang indefinitely or consume excessive CPU/memory when processing specially crafted deeply nested structures. Known tree-sitter issues include indefinite hangs on malicious JavaScript (tree-sitter-javascript #322) and timeout enforcement not applying during tree balancing (tree-sitter #4019).

Attack Example
Attacker returns response with <script> tag containing 50,000 levels of nested arrays: [[[[...]]]] or deeply nested function calls. The gotreesitter parser attempts to build an AST with 50,000 depth, exhausting stack space or hanging in tree balancing operations that don't respect timeouts.
Suggested Fix
Implement a timeout mechanism for parser.Parse() calls using context.WithTimeout and recover from panics: ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second); defer cancel(); use a goroutine with select on ctx.Done() to abort parsing. Alternatively, add depth/size pre-validation: reject JS/CSS content >100KB or with nesting depth >500 before parsing.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/fuzz/analyzers/xss/js_context.go` around lines 14-19, add timeout
protection for gotreesitter parsing: wrap the parser.Parse(jsSource) call in a
goroutine with context.WithTimeout(5*time.Second), and return
ContextScriptExpression if the timeout is exceeded. Also add size validation
before parsing: if len(jsSource) > 100*1024, return ContextScriptExpression
immediately. Apply the same pattern to css_context.go:15.

… and resource limits

- Add 10MB response body size limit to prevent OOM on large responses
- Add panic recovery and size limits to CSS/JS tree-sitter parsers (100KB limit)
- Fix HTML attribute quote context matching for complex edge cases
- Add test coverage for encoded canary rejection and attribute boundary cases
- Add defer resp.Body.Close() to fix potential resource leak
- Optimize canary prefix check in tests using strings.HasPrefix
- Add exceedsNestingDepth guard against deeply nested parser inputs
- Improve attribute name matching with whole-word detection
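
The exceedsNestingDepth guard named in the commit message can be approximated with a simple bracket-depth scan that rejects input before it ever reaches the tree-sitter parser. This is a sketch of the idea under that assumption; the PR's actual implementation may differ.

```go
package main

import "fmt"

// exceedsNestingDepth reports whether src contains more than `limit`
// levels of unmatched opening brackets, so pathological inputs like
// [[[[...]]]] can be rejected cheaply before AST parsing.
func exceedsNestingDepth(src []byte, limit int) bool {
	depth := 0
	for _, b := range src {
		switch b {
		case '(', '[', '{':
			depth++
			if depth > limit {
				return true
			}
		case ')', ']', '}':
			if depth > 0 {
				depth--
			}
		}
	}
	return false
}

func main() {
	shallow := []byte(`var x = [1, [2, 3]]`)
	deep := make([]byte, 1000)
	for i := range deep {
		deep[i] = '[' // 1000 unclosed brackets
	}
	fmt.Println(exceedsNestingDepth(shallow, 500), exceedsNestingDepth(deep, 500)) // → false true
}
```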
@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
pkg/fuzz/analyzers/xss/html_context.go (1)

162-250: Consider edge case: multiple attributes with the same name.

findQuoteContext iterates through the raw HTML to find the attribute and determine its quoting style. If the same attribute name appears multiple times in the document (possibly in different tags), the search may match the wrong occurrence. This could lead to incorrect quote context classification.

The current approach is pragmatic for most real-world cases, but be aware this could cause misclassification in pathological HTML with repeated attribute names across elements.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/fuzz/analyzers/xss/html_context.go` around lines 162 - 250,
findQuoteContext can pick the wrong attribute instance when the same attribute
name appears multiple times; update the search so after locating a candidate key
(keyStart/keyEnd) you also confirm the key/value belong to the same tag and
correspond to this html.Attribute: (1) ensure there is no '<' between keyStart
and valueEnd (and that there is a matching '>' after valueEnd) to guarantee the
pair is inside one element, and (2) when checking rawVal, verify attr.Val
actually appears inside rawVal and use the position of attr.Val inside rawVal
(not just strings.Contains) to tie the matched value to this key occurrence
before returning ContextAttrValue*. Apply these checks in findQuoteContext where
keyStart/keyEnd, pos, valueStart, valueEnd and rawVal are computed.


📥 Commits

Reviewing files that changed from the base of the PR and between 1884b13 and ffb78c0.

📒 Files selected for processing (6)
  • pkg/fuzz/analyzers/xss/analyzer.go
  • pkg/fuzz/analyzers/xss/analyzer_test.go
  • pkg/fuzz/analyzers/xss/context.go
  • pkg/fuzz/analyzers/xss/css_context.go
  • pkg/fuzz/analyzers/xss/html_context.go
  • pkg/fuzz/analyzers/xss/js_context.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/fuzz/analyzers/xss/analyzer_test.go
