feat: switch to pure-go javascript endpoint extraction using goja #1529
t6harsh wants to merge 2 commits into projectdiscovery:dev
Walkthrough
Removed platform-specific parser file and build constraints; replaced external jsluice analyzer with a pure-Go AST-based JavaScript endpoint extractor using goja and added tests; updated go.mod to add goja and sourcemap and remove several jsluice-related dependencies.
Sequence Diagram
sequenceDiagram
participant Code as "JavaScript Code"
participant Preprocess as "Preprocess ES6"
participant Engine as "Goja Engine"
participant AST as "AST Walker"
participant Classifier as "Type Classifier"
participant Fallback as "Regex Fallback"
participant Result as "JSLuiceEndpoint"
Code->>Preprocess: Raw JS (may include imports/exports)
Preprocess->>Engine: Cleaned JS source
Engine->>AST: Parse into AST and traverse
alt AST parse success
AST->>Classifier: Emit URL-like strings with context
Classifier->>Result: Endpoint + Type (fetch/xhr/axios/import/...)
else AST parse failure
Engine->>Fallback: Provide raw JS for regex matching
Fallback->>Result: Endpoint + Type "regex"
end
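The control flow in the diagram can be sketched in a few lines of Go. This is a minimal illustration under stated assumptions, not the PR's code: `extract`, `parseFunc`, `okParser`, `failingParser`, and `stubFallback` are hypothetical names, the `Endpoint` struct only mirrors the `Endpoint` and `Type` fields of `JSLuiceEndpoint`, and the preprocessing step is omitted.

```go
package main

import (
	"errors"
	"fmt"
)

// Endpoint mirrors the two JSLuiceEndpoint fields referenced in this PR.
type Endpoint struct{ Endpoint, Type string }

// parseFunc stands in for the AST stage (hypothetical signature).
type parseFunc func(src string) ([]Endpoint, error)

// extract returns the AST results when parsing succeeds and only
// consults the regex fallback when the parser reports an error.
func extract(src string, parse parseFunc, fallback func(string) []Endpoint) []Endpoint {
	if eps, err := parse(src); err == nil {
		return eps
	}
	return fallback(src)
}

// Stub stages used purely for demonstration.
func okParser(src string) ([]Endpoint, error) {
	return []Endpoint{{Endpoint: "/api/users", Type: "fetch"}}, nil
}

func failingParser(src string) ([]Endpoint, error) {
	return nil, errors.New("parse error")
}

func stubFallback(src string) []Endpoint {
	return []Endpoint{{Endpoint: "/api/users", Type: "regex"}}
}

func main() {
	fmt.Println(extract(`fetch("/api/users")`, okParser, stubFallback))
	fmt.Println(extract(`fetch("/api/users"`, failingParser, stubFallback))
}
```

Passing the stages in as functions keeps the sketch free of third-party imports; a real extractor would call the goja parser directly.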
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
No actionable comments were generated in the recent review. 🎉
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@pkg/utils/jsluice.go`:
- Around line 396-400: The isXHROpen function mixes a case-insensitive suffix
check (using lower) with a case-sensitive exclusion (funcName != "window.open"),
causing names like "Window.open" to slip through; fix by comparing the exclusion
against the lowercased name (e.g., use lower != "window.open") so both the
suffix check and the "window.open" exclusion are performed case-insensitively
while keeping the existing logic in isXHROpen.
- Around line 377-394: classifyCallType currently treats any function name
containing "open" as "xhr", which misclassifies "window.open"; update
classifyCallType to check for "window.open" (e.g., strings.Contains(lower,
"window.open") or exact match) before the generic strings.Contains(lower,
"open") case and return a distinct type like "window_open" (so downstream
fmt.Sprintf("jsluice-%s", item.Type) labels it correctly); modify the switch in
classifyCallType (and keep isXHROpen logic unchanged) so "window.open" is
handled first and does not fall through to the "xhr" branch.
- Around line 84-161: The expression walker is missing handling for optional
chaining wrappers; update the walkExpression function to add cases for
*ast.Optional and *ast.OptionalChain and recursively call walkExpression on
their contained/wrapped expression(s) (i.e., unwrap the optional node and
traverse its inner expression/chain) so optional chaining like obj?.foo() or
arr?.[i] is visited; this complements the existing walkStatement and ensures
URL-like strings inside optional chains are not skipped.
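The first two items above can be sketched together. This is a hedged reconstruction, not the file's actual code: the `.open` suffix, the `fetch`/`axios`/`ajax` branches, and the `"call"` default are assumptions made for illustration; only the `window.open` ordering, the `"window_open"` label, and the lowercased exclusion come from the review comments.

```go
package main

import (
	"fmt"
	"strings"
)

// isXHROpen reports whether funcName looks like an XHR open call.
// Both the suffix check and the window.open exclusion use the
// lowercased name, so "Window.open" is excluded as well.
func isXHROpen(funcName string) bool {
	lower := strings.ToLower(funcName)
	return strings.HasSuffix(lower, ".open") && lower != "window.open"
}

// classifyCallType handles "window.open" before the generic "open"
// case so it never falls through to the "xhr" branch.
func classifyCallType(funcName string) string {
	lower := strings.ToLower(funcName)
	switch {
	case strings.Contains(lower, "window.open"):
		return "window_open"
	case strings.Contains(lower, "fetch"):
		return "fetch"
	case strings.Contains(lower, "axios"):
		return "axios"
	case strings.Contains(lower, "ajax"):
		return "ajax"
	case strings.Contains(lower, "open"):
		return "xhr"
	default:
		return "call" // assumed catch-all label
	}
}

func main() {
	for _, name := range []string{"xhr.open", "window.open", "Window.open", "fetch"} {
		fmt.Printf("%-12s isXHROpen=%-5v type=%s\n", name, isXHROpen(name), classifyCallType(name))
	}
}
```

Because Go's expression-less `switch` evaluates cases top to bottom, placing the `window.open` case first is enough to guarantee the more specific label wins.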
🧹 Nitpick comments (4)
pkg/utils/jsluice.go (4)
16-17: `urlLikeStringRegex` misses protocol-relative URLs and could over-match path segments.

The regex only matches `https?://...` or `/...` paths. It will miss:
- Protocol-relative URLs like `//cdn.example.com/resource.js`
- Relative paths like `./api/data` or `../api/data` (these are common in bundled JS)

Additionally, the path branch `/[a-zA-Z0-9_\-./]+` allows consecutive dots and slashes (e.g., `/foo/..//bar`), which may produce noisy results.

Suggested regex improvement
- urlLikeStringRegex = regexp.MustCompile(`^(?:https?://[^\s'"` + "`" + `]+|/[a-zA-Z0-9_\-./]+(?:\?[^\s'"` + "`" + `]*)?)$`)
+ urlLikeStringRegex = regexp.MustCompile(`^(?:https?://[^\s'"` + "`" + `]+|//[^\s'"` + "`" + `]+|\.{0,2}/[a-zA-Z0-9_\-./]+(?:\?[^\s'"` + "`" + `]*)?)$`)
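A quick way to compare the two patterns is to run both against a handful of sample strings. The sketch below copies the patterns from this comment; the variable names `originalRegex` and `suggestedRegex` and the sample inputs are chosen here for illustration. One detail worth checking: because `/` is itself in the path character class, a plain protocol-relative URL appears to match the original path branch already, so the practical gain is mostly for relative paths and for protocol-relative URLs containing characters outside that class.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Pattern currently in the PR, as quoted in the review comment.
	originalRegex = regexp.MustCompile(`^(?:https?://[^\s'"` + "`" + `]+|/[a-zA-Z0-9_\-./]+(?:\?[^\s'"` + "`" + `]*)?)$`)
	// Pattern proposed in the review comment.
	suggestedRegex = regexp.MustCompile(`^(?:https?://[^\s'"` + "`" + `]+|//[^\s'"` + "`" + `]+|\.{0,2}/[a-zA-Z0-9_\-./]+(?:\?[^\s'"` + "`" + `]*)?)$`)
)

func main() {
	// Illustrative inputs, not taken from the PR's test suite.
	samples := []string{
		"https://example.com/api",
		"/api/v1/users?id=1",
		"//cdn.example.com/resource.js",
		"//cdn.example.com/a%20b.js",
		"./api/data",
		"../api/data",
		"/foo/..//bar",
	}
	for _, s := range samples {
		fmt.Printf("%-34q original=%-5v suggested=%v\n",
			s, originalRegex.MatchString(s), suggestedRegex.MatchString(s))
	}
}
```

Note that `/foo/..//bar` is accepted by both patterns, so the suggested change does not address the noise concern raised above.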
66-74: ES6 preprocessing regex has limitations with multi-line and complex import forms.

The regex works line-by-line (`(?m)`), which means multi-line imports like:

    import {
      foo,
      bar
    } from 'baz';

won't be fully stripped, potentially leaving `} from 'baz';`, which could cause a parse error. Also, `export * from 'module';` and `export { default } from 'module';` (re-exports) aren't matched by the export branch.

This is acceptable as a best-effort preprocessor since the regex fallback catches parse failures, but worth documenting these known limitations.
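The multi-line limitation is easy to reproduce in isolation. The sketch below does not use the PR's actual preprocessing regex, which is not shown in this comment; `lineImportRegex` is a hypothetical, simplified line-anchored pattern and `stripImports` is an illustrative helper. It shows how a `(?m)` pattern removes a single-line import cleanly but leaves the tail of a multi-line import behind.

```go
package main

import (
	"fmt"
	"regexp"
)

// lineImportRegex is an assumed stand-in for the real pattern: with (?m),
// ^ and $ anchor to line boundaries and . does not cross newlines,
// so each match is confined to one line.
var lineImportRegex = regexp.MustCompile(`(?m)^\s*import\s.*$`)

// stripImports deletes every line-level match of lineImportRegex.
func stripImports(src string) string {
	return lineImportRegex.ReplaceAllString(src, "")
}

func main() {
	single := "import foo from 'baz';\nfetch('/api/data');"
	multi := "import {\n  foo,\n  bar\n} from 'baz';\nfetch('/api/data');"

	// The single-line import disappears entirely.
	fmt.Printf("%q\n", stripImports(single))
	// Only the "import {" line is removed; "} from 'baz';" survives.
	fmt.Printf("%q\n", stripImports(multi))
}
```

The leftover fragment is not valid JavaScript on its own, which is the situation where the regex fallback described in the walkthrough would take over.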
313-335: `walkCallExpression` walks the callee, which may cause unintended URL emission from nested expressions in the callee chain.

Line 315 unconditionally walks the callee via `walkExpression`. For simple cases (`fetch`, `$.ajax`), the callee is an Identifier or DotExpression with no URL strings. However, for computed callees like `getConfig("/api/base").fetch("/api/endpoint")`, the inner call's string argument `"/api/base"` would be emitted as a generic `"string"` (from `checkAndEmitURL` via `walkExpression`) rather than with proper call-context typing.

This is a minor accuracy concern, not a correctness bug; the URL is still extracted.
434-451: Redundant deduplication in `regexFallbackExtract`.

`ExtractRelativeEndpoints` (from `pkg/utils/regex.go:50-67`) already deduplicates results using its own `unique` map. The `seen` map in `regexFallbackExtract` performs the same deduplication again.

This is harmless (defensive), but if you want to keep the code minimal:

Simplified version
 func regexFallbackExtract(data string) []JSLuiceEndpoint {
 	matches := ExtractRelativeEndpoints(data)
-	seen := make(map[string]struct{})
 	var endpoints []JSLuiceEndpoint
-
 	for _, match := range matches {
-		if _, ok := seen[match]; ok {
-			continue
-		}
-		seen[match] = struct{}{}
 		endpoints = append(endpoints, JSLuiceEndpoint{
 			Endpoint: match,
 			Type:     "regex",
 		})
 	}
 	return endpoints
 }
…nsensitive XHR exclusion
Proposed changes
This PR replaces the `github.com/BishopFox/jsluice` dependency (and its CGO requirement `github.com/smacker/go-tree-sitter`) with a pure-Go JavaScript endpoint extractor using `github.com/dop251/goja`'s AST parser.
Impact
- The `jsluice` parser was previously guarded by `//go:build !(386 || windows)`, meaning Windows/386 users had reduced functionality. This PR enables full JavaScript analysis on all platforms.
- Removes `jsluice` and `go-tree-sitter` from the dependency graph.
Implementation Details
- Uses `dop251/goja/parser` and `dop251/goja/ast`.
- Extracts endpoints from:
  - `fetch()` calls
  - `XMLHttpRequest.open()`
  - `window.open()`
  - `location.href` / `img.src` assignments
  - jQuery (`$.ajax`) and axios calls
- Updates `.goreleaser/*.yml` to remove `CGO_ENABLED=1` and cross-compiler requirements.
Proof
I have added a comprehensive test suite in pkg/utils/jsluice_test.go covering 25+ scenarios including all supported extraction patterns and edge cases.
New Tests Passing:
Cross-Platform Build Verification:
Builds now succeed without CGO on previously problematic platforms:
Checklist
/claim #1367
Summary by CodeRabbit
New Features
Chores
Platform
Tests