feat: switch to pure-go javascript endpoint extraction using goja by t6harsh · Pull Request #1529 · projectdiscovery/katana

t6harsh · 2026-02-11T09:07:27Z

Proposed changes

This PR replaces the github.com/BishopFox/jsluice dependency (and its CGO requirement github.com/smacker/go-tree-sitter) with a pure-Go JavaScript endpoint extractor using github.com/dop251/goja's AST parser.

Impact

Eliminates CGO: Removes all CGO requirements, significantly simplifying cross-platform builds (especially for Windows and Linux ARM/386).
Cross-Platform Consistency: The jsluice parser was previously guarded by //go:build !(386 || windows), meaning Windows/386 users had reduced functionality. This PR enables full JavaScript analysis on all platforms.
Dependency Cleanup: Removes jsluice and go-tree-sitter from the dependency graph.

Implementation Details

Rewrote pkg/utils/jsluice.go to use dop251/goja/parser and dop251/goja/ast.
Implemented a robust AST walker that detects URLs in:
- fetch() calls
- XMLHttpRequest.open()
- window.open()
- location.href / img.src assignments
- Object literals, arrays, and template literals
- jQuery ($.ajax) and axios calls
Added a regex fallback that runs when the AST parse fails, so malformed JavaScript still yields endpoints (graceful degradation).
Updated .goreleaser/*.yml to remove CGO_ENABLED=1 and cross-compiler requirements.

Proof

I have added a comprehensive test suite in pkg/utils/jsluice_test.go covering 25+ scenarios including all supported extraction patterns and edge cases.

New Tests Passing:

=== RUN   TestExtractJsluiceEndpoints
--- PASS: TestExtractJsluiceEndpoints (0.01s)
    --- PASS: TestExtractJsluiceEndpoints/fetch_call (0.00s)
    --- PASS: TestExtractJsluiceEndpoints/XMLHttpRequest_open (0.00s)
    --- PASS: TestExtractJsluiceEndpoints/window.open (0.00s)
    --- PASS: TestExtractJsluiceEndpoints/location.href_assignment (0.00s)
    --- PASS: TestExtractJsluiceEndpoints/malformed_JS_falls_back_to_regex (0.00s)
    ...

Cross-Platform Build Verification:
Builds now succeed without CGO on previously problematic platforms:

CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build ./cmd/katana/
CGO_ENABLED=0 GOOS=darwin GOARCH=arm64 go build ./cmd/katana/
CGO_ENABLED=0 GOOS=windows GOARCH=386 go build ./cmd/katana/

Checklist

Pull request is created against the dev branch
All checks passed (lint, unit/integration/regression tests etc.) with my changes
I have added tests that prove my fix is effective or that my feature works
I have added necessary documentation (if appropriate)

/claim #1367

Summary by CodeRabbit

New Features
- Improved JavaScript endpoint extraction — more accurate detection across diverse JS patterns with robust fallback handling.
Chores
- Updated project dependencies for maintainability.
Platform
- Parser compatibility extended to all OS/architecture combinations.
Tests
- Added comprehensive tests validating extraction accuracy and preprocessing behavior.

coderabbitai · 2026-02-11T09:07:49Z

Walkthrough

Removed platform-specific parser file and build constraints; replaced external jsluice analyzer with a pure-Go AST-based JavaScript endpoint extractor using goja and added tests; updated go.mod to add goja and sourcemap and remove several jsluice-related dependencies.

Changes

Cohort / File(s)	Summary
Dependency Management `go.mod`	Removed `BishopFox/jsluice`, `smacker/go-tree-sitter`, `ditashi/jsbeautifier-go`; added `dop251/goja` and `go-sourcemap/sourcemap` as indirect deps.
Build Constraint Removal `pkg/engine/parser/parser_generic.go`, `pkg/utils/jsluice_test.go`	Removed the `//go:build !(386 || windows)` build constraint.
Platform-Specific Deletion `pkg/engine/parser/parser_nojs.go`	Deleted Windows/386-conditional file that defined public `Options` type and `(*Parser).InitWithOptions`; related conditional parser registration logic removed.
AST-Based JavaScript Extraction `pkg/utils/jsluice.go`	Replaced external jsluice analyzer with a pure-Go AST-based extractor using `goja`. Added `JSLuiceEndpoint` type, ES6 preprocessing, AST-walking and classification functions, URL-detection heuristics, and a regex fallback when parsing fails.
Tests — Extraction & Helpers `pkg/utils/jsluice_test.go`	Added extensive tests for endpoint extraction, URL-like detection, ES6 preprocessing, deduplication, and regex fallback; asserts endpoint values and inferred types across many JS patterns.

Sequence Diagram

sequenceDiagram
    participant Code as "JavaScript Code"
    participant Preprocess as "Preprocess ES6"
    participant Engine as "Goja Engine"
    participant AST as "AST Walker"
    participant Classifier as "Type Classifier"
    participant Fallback as "Regex Fallback"
    participant Result as "JSLuiceEndpoint"

    Code->>Preprocess: Raw JS (may include imports/exports)
    Preprocess->>Engine: Cleaned JS source
    Engine->>AST: Parse into AST and traverse
    alt AST parse success
        AST->>Classifier: Emit URL-like strings with context
        Classifier->>Result: Endpoint + Type (fetch/xhr/axios/import/...)
    else AST parse failure
        Engine->>Fallback: Provide raw JS for regex matching
        Fallback->>Result: Endpoint + Type "regex"
    end
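
The parse-then-fallback flow in the diagram can be sketched in plain Go. This is a minimal illustration, not the PR's code: `Endpoint`, `parseFunc`, `pathRegex`, and `failingParser` are hypothetical names, and the parser is a stub standing in for the goja-based extractor so the control flow can be shown without the dependency.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// Endpoint pairs a URL-like value with a label saying how it was found.
// The type name is hypothetical; the PR calls its type JSLuiceEndpoint.
type Endpoint struct {
	Value string
	Type  string
}

// parseFunc stands in for the AST-based extractor. It is a placeholder,
// not goja's API: it returns an error when the source cannot be parsed.
type parseFunc func(src string) ([]Endpoint, error)

// pathRegex is a deliberately simple stand-in for the regex fallback.
var pathRegex = regexp.MustCompile(`/[a-zA-Z0-9_\-./]+`)

// extract tries the AST path first and falls back to regex matching
// only when parsing fails, tagging fallback results with Type "regex".
func extract(src string, parse parseFunc) []Endpoint {
	if eps, err := parse(src); err == nil {
		return eps
	}
	var out []Endpoint
	for _, m := range pathRegex.FindAllString(src, -1) {
		out = append(out, Endpoint{Value: m, Type: "regex"})
	}
	return out
}

// failingParser simulates a parse failure on malformed JavaScript.
func failingParser(string) ([]Endpoint, error) {
	return nil, errors.New("parse error")
}

func main() {
	// The call below is missing its closing parenthesis on purpose.
	for _, ep := range extract(`fetch("/api/users"`, failingParser) {
		fmt.Println(ep.Type, ep.Value)
	}
}
```

Passing the parser in as a function value keeps the fallback decision testable on its own, which is the branch the `malformed_JS_falls_back_to_regex` test case exercises.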

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 Hopping through code with a twitchy nose,
I trimmed old walls where platform-flagged code froze.
I parsed wild JS with a nimble paw,
Found endpoints aplenty—oh what a haul!
A tiny rabbit cheers: new paths, new prose. 🥕🐇

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 79.17% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title clearly and concisely describes the main change: switching from jsluice to a pure-Go goja-based JavaScript endpoint extraction, which is the primary objective across all modified files.


No actionable comments were generated in the recent review. 🎉

🧹 Recent nitpick comments

pkg/utils/jsluice.go (2)
66-73: ES6 preprocessing regex won't match multiline imports or re-exports like export * from 'mod'.

This is acceptable since failed preprocessing → failed parse → regex fallback. Just noting that patterns like multiline destructured imports or export * from '...' / export { x } from '...' will cause fallback to regex extraction.

193-197: Method parameter defaults not walked in walkClass.

MethodDefinition.Body is a *FunctionLiteral — its ParameterList defaults aren't walked here, unlike FunctionLiteral and ArrowFunctionLiteral in walkExpression (lines 251-254, 258-261). Low likelihood of URLs in parameter defaults, but worth keeping consistent.
Proposed fix
 		case *ast.MethodDefinition:
 			walkExpression(ce.Key, emit)
 			if ce.Body != nil {
 				walkStatement(ce.Body.Body, emit)
+				if ce.Body.ParameterList != nil {
+					for _, p := range ce.Body.ParameterList.List {
+						walkExpression(p.Initializer, emit)
+					}
+				}
 			}


coderabbitai

Actionable comments posted: 3

🤖 Fix all issues with AI agents

In `@pkg/utils/jsluice.go`:
- Around line 396-400: The isXHROpen function mixes a case-insensitive suffix
check (using lower) with a case-sensitive exclusion (funcName != "window.open"),
causing names like "Window.open" to slip through; fix by comparing the exclusion
against the lowercased name (e.g., use lower != "window.open") so both the
suffix check and the "window.open" exclusion are performed case-insensitively
while keeping the existing logic in isXHROpen.
- Around line 377-394: classifyCallType currently treats any function name
containing "open" as "xhr", which misclassifies "window.open"; update
classifyCallType to check for "window.open" (e.g., strings.Contains(lower,
"window.open") or exact match) before the generic strings.Contains(lower,
"open") case and return a distinct type like "window_open" (so downstream
fmt.Sprintf("jsluice-%s", item.Type) labels it correctly); modify the switch in
classifyCallType (and keep isXHROpen logic unchanged) so "window.open" is
handled first and does not fall through to the "xhr" branch.
- Around line 84-161: The expression walker is missing handling for optional
chaining wrappers; update the walkExpression function to add cases for
*ast.Optional and *ast.OptionalChain and recursively call walkExpression on
their contained/wrapped expression(s) (i.e., unwrap the optional node and
traverse its inner expression/chain) so optional chaining like obj?.foo() or
arr?.[i] is visited; this complements the existing walkStatement and ensures
URL-like strings inside optional chains are not skipped.
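
The first two items can be sketched as below. The function bodies in `pkg/utils/jsluice.go` are not shown in this thread, so this is an assumed shape based only on the descriptions above: the `"fetch"` and `"call"` labels and the exact ordering of the other cases are illustrative, while `"xhr"` and `"window_open"` come from the review text.

```go
package main

import (
	"fmt"
	"strings"
)

// isXHROpen reports whether funcName looks like an XHR-style ".open" call.
// Both the suffix check and the window.open exclusion use the lowercased
// name, so "Window.open" is excluded just like "window.open".
func isXHROpen(funcName string) bool {
	lower := strings.ToLower(funcName)
	return strings.HasSuffix(lower, ".open") && lower != "window.open"
}

// classifyCallType maps a callee name to an endpoint type label.
// The window.open case must come before the generic "open" case,
// otherwise it would fall into the "xhr" branch.
func classifyCallType(funcName string) string {
	lower := strings.ToLower(funcName)
	switch {
	case strings.Contains(lower, "fetch"):
		return "fetch"
	case strings.Contains(lower, "window.open"):
		return "window_open"
	case strings.Contains(lower, "open"):
		return "xhr"
	default:
		return "call" // hypothetical default label
	}
}

func main() {
	for _, name := range []string{"xhr.open", "Window.open", "window.open", "fetch"} {
		fmt.Printf("%-12s isXHROpen=%-5v type=%s\n", name, isXHROpen(name), classifyCallType(name))
	}
}
```

Because a Go `switch` with no tag evaluates cases top to bottom and stops at the first match, case order alone is enough to keep `window.open` out of the `xhr` branch.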

🧹 Nitpick comments (4)

pkg/utils/jsluice.go (4)
16-17: urlLikeStringRegex misses protocol-relative URLs and could over-match path segments.

The regex only matches https?://... or /... paths. It will miss:

- Protocol-relative URLs like //cdn.example.com/resource.js
- Relative paths like ./api/data or ../api/data (these are common in bundled JS)

Additionally, the path branch /[a-zA-Z0-9_\-./]+ allows consecutive dots and slashes (e.g., /foo/..//bar), which may produce noisy results.
Suggested regex improvement
-	urlLikeStringRegex = regexp.MustCompile(`^(?:https?://[^\s'"` + "`" + `]+|/[a-zA-Z0-9_\-./]+(?:\?[^\s'"` + "`" + `]*)?)$`)
+	urlLikeStringRegex = regexp.MustCompile(`^(?:https?://[^\s'"` + "`" + `]+|//[^\s'"` + "`" + `]+|\.{0,2}/[a-zA-Z0-9_\-./]+(?:\?[^\s'"` + "`" + `]*)?)$`)
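
A quick way to compare the two patterns is to run both against a few sample strings. The sketch below copies the regexes from the diff verbatim; `oldRegex`, `newRegex`, and the sample inputs are names and values chosen here for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// oldRegex is the pattern on the removed (-) line of the diff.
	oldRegex = regexp.MustCompile(`^(?:https?://[^\s'"` + "`" + `]+|/[a-zA-Z0-9_\-./]+(?:\?[^\s'"` + "`" + `]*)?)$`)
	// newRegex is the pattern on the added (+) line of the diff.
	newRegex = regexp.MustCompile(`^(?:https?://[^\s'"` + "`" + `]+|//[^\s'"` + "`" + `]+|\.{0,2}/[a-zA-Z0-9_\-./]+(?:\?[^\s'"` + "`" + `]*)?)$`)
)

func main() {
	samples := []string{
		"https://example.com/api",
		"//cdn.example.com/resource.js",
		"//cdn.example.com:8080/lib.js",
		"./api/data",
		"../api/data",
		"/foo/..//bar",
	}
	for _, s := range samples {
		fmt.Printf("%-32s old=%-5v new=%v\n", s, oldRegex.MatchString(s), newRegex.MatchString(s))
	}
}
```

One detail worth checking when reading the results: the path branch's character class already contains `/` and `.`, so `//cdn.example.com/resource.js` matches the old pattern too. The added `//` branch makes a difference for protocol-relative URLs with characters outside that class, such as the port colon in `//cdn.example.com:8080/lib.js`. The new pattern still accepts `/foo/..//bar`, so the noise concern applies to both versions.
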
66-74: ES6 preprocessing regex has limitations with multi-line and complex import forms.

The regex works line-by-line ((?m)) which means multi-line imports like:
import {
  foo,
  bar
} from 'baz';
won't be fully stripped, potentially leaving } from 'baz'; which could cause a parse error. Also, export * from 'module'; and export { default } from 'module'; (re-exports) aren't matched by the export branch.

This is acceptable as a best-effort preprocessor since the regex fallback catches parse failures, but worth documenting these known limitations.
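
The multi-line limitation is easy to reproduce with any line-oriented pattern. The regex below is a hypothetical stand-in, not the one in the PR (which is not shown in this thread); it only illustrates why a `(?m)` pattern anchored per line leaves the tail of a multi-line import behind.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// importLine is a hypothetical line-oriented pattern, not the PR's regex:
// it blanks any line that starts with the import keyword.
var importLine = regexp.MustCompile(`(?m)^[ \t]*import\b.*$`)

// stripImports removes import statements one line at a time.
func stripImports(src string) string {
	return importLine.ReplaceAllString(src, "")
}

// hasDanglingTail reports whether the closing half of a multi-line
// import survived preprocessing.
func hasDanglingTail(out string) bool {
	return strings.Contains(out, "} from 'baz';")
}

func main() {
	single := "import x from 'y';\nfetch('/api/data');\n"
	multi := "import {\n  foo,\n  bar\n} from 'baz';\nfetch('/api/data');\n"

	// The single-line import is removed cleanly.
	fmt.Printf("%q\n", stripImports(single))

	// Only the first line of the multi-line import is removed.
	out := stripImports(multi)
	fmt.Printf("%q\n", out)
	fmt.Println("dangling tail left behind:", hasDanglingTail(out))
}
```

The leftover `} from 'baz';` is what would make the subsequent parse fail and push the input onto the regex fallback path.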

313-335: walkCallExpression walks the callee, which may cause unintended URL emission from nested expressions in the callee chain.

Line 315 unconditionally walks the callee via walkExpression. For simple cases (fetch, $.ajax), the callee is an Identifier or DotExpression with no URL strings. However, for computed callees like getConfig("/api/base").fetch("/api/endpoint"), the inner call's string argument "/api/base" would be emitted as a generic "string" (from checkAndEmitURL via walkExpression) rather than with proper call-context typing.

This is a minor accuracy concern, not a correctness bug — the URL is still extracted.

434-451: Redundant deduplication in regexFallbackExtract.

ExtractRelativeEndpoints (from pkg/utils/regex.go:50-67) already deduplicates results using its own unique map. The seen map in regexFallbackExtract performs the same deduplication again.

This is harmless (defensive), but if you want to keep the code minimal:
Simplified version
 func regexFallbackExtract(data string) []JSLuiceEndpoint {
 	matches := ExtractRelativeEndpoints(data)
-	seen := make(map[string]struct{})
 	var endpoints []JSLuiceEndpoint
-
 	for _, match := range matches {
-		if _, ok := seen[match]; ok {
-			continue
-		}
-		seen[match] = struct{}{}
 		endpoints = append(endpoints, JSLuiceEndpoint{
 			Endpoint: match,
 			Type:     "regex",
 		})
 	}
 	return endpoints
 }


feat: switch to pure-go javascript endpoint extraction using goja

81a1606

algora-pbc bot added the 🙋 Bounty claim label Feb 11, 2026

algora-pbc bot mentioned this pull request Feb 11, 2026

Feature / Question: go-tree-sitter dependency #1367

Open

coderabbitai bot reviewed Feb 11, 2026


fix: handle optional chaining, window.open classification, and case-insensitive XHR exclusion

2176301

t6harsh wants to merge 2 commits into projectdiscovery:dev from t6harsh:feat/pure-go-js-extractor
