fix(deps): replace go-tree-sitter with pure-Go goja parser #1510

Open
dalledajay-coder wants to merge 2 commits into projectdiscovery:dev from dalledajay-coder:fix/remove-cgo-dependency

Conversation

@dalledajay-coder

@dalledajay-coder dalledajay-coder commented Feb 2, 2026

Proposed changes

This commit removes the CGO dependency on go-tree-sitter by replacing BishopFox's jsluice with a pure-Go implementation using dop251/goja parser.

Changes:

  • Replace jsluice dependency with dop251/goja (pure-Go JavaScript parser)
  • Rewrite ExtractJsluiceEndpoints using goja's AST walker
  • Remove platform-specific build constraints (parser_nojs.go deleted)
  • Enable jsluice functionality on all platforms (Windows, 32-bit, darwin/arm64)

Benefits:

  • No CGO required (CGO_ENABLED=0 builds work)
  • Simplified cross-platform compilation
  • Works on darwin/arm64 without cross-compilers
  • All existing tests pass

/claim #1367

Proof

Checklist

  • Pull request is created against the dev branch
  • All checks passed (lint, unit/integration/regression tests etc.) with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Summary by CodeRabbit

  • New Features

    • Extended platform support to Windows and 386.
  • Improvements

    • Enhanced JavaScript endpoint extraction with a complete parser-based approach and regex fallback for greater accuracy and reliability.
  • Removals

    • Removed some parser configuration options for JS scraping, automatic form fill, and redirect handling.
  • Tests

    • Added new tests covering the JavaScript endpoint extractor.
  • Chores

    • Updated internal module dependencies.

Fixes projectdiscovery#1367
@coderabbitai
Contributor

coderabbitai bot commented Feb 2, 2026

Walkthrough

Replaces the jsluice dependency with a goja/parser-based JavaScript AST extractor (with a regex fallback), removes the Windows/386-specific parser file along with the Options type and the InitWithOptions API, and updates go.mod to add goja and sourcemap while removing several previously indirect dependencies.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Dependency Management: go.mod | Removed github.com/BishopFox/jsluice, github.com/ditashi/jsbeautifier-go, github.com/smacker/go-tree-sitter; added github.com/dop251/goja and github.com/go-sourcemap/sourcemap (indirect). |
| Parser Platform Unification: pkg/engine/parser/parser_generic.go, pkg/engine/parser/parser_nojs.go | Removed file-level build constraint and deleted Windows/386-specific parser implementation; removed Options type and (*Parser).InitWithOptions, consolidating initialization surface. |
| JS Endpoint Extraction Refactor: pkg/utils/jsluice.go, pkg/utils/jsluice_test.go | Replaced jsluice-based extraction with a pure-Go goja AST traversal extractor plus regex fallback; added comprehensive AST handlers for strings, templates, concatenation, calls (fetch/open/etc.), constructors (WebSocket/URL/Request), deduplication, and new tests. |

Sequence Diagram

sequenceDiagram
    participant Input as JavaScript Input
    participant Parser as goja Parser
    participant AST as AST Traverser
    participant Extractor as Endpoint Extractor
    participant Regex as Regex Fallback
    participant Output as Endpoints

    Input->>Parser: Parse JavaScript code
    alt Parse Success
        Parser->>AST: Provide AST nodes
        AST->>Extractor: Traverse nodes (strings, templates, calls, constructors)
        Extractor->>Extractor: Extract, synthesize, deduplicate endpoints
        Extractor->>Output: Return endpoints
    else Parse Failure
        Parser->>Regex: Fallback regex scan
        Regex->>Output: Return regex-extracted endpoints
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I swapped the jsluice for goja's keen sight,
I hop through AST branches, day and night.
When parsing trips, regex saves the race,
Cross-platform strides with a lighter pace,
Endpoints gathered — a joyful, nimble chase!

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main change: replacing a C-dependent go-tree-sitter dependency with a pure-Go goja parser for JavaScript extraction. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@pkg/utils/jsluice.go`:
- Around line 679-697: The regexes in endpointExtractor.extractWithRegex are
compiled on every call; move their compilation to package initialization by
creating a package-level variable (e.g., fallbackURLPatterns []*regexp.Regexp)
containing the four regexp.MustCompile(...) entries, and then update
extractWithRegex to iterate over fallbackURLPatterns instead of recompiling
patterns; ensure the new var name (fallbackURLPatterns) is used in
extractWithRegex and that the package imports regexp remains intact.
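A minimal sketch of the suggested shape, assuming a free function rather than the endpointExtractor method and using two illustrative patterns in place of the four in pkg/utils/jsluice.go (which are not shown in this thread):

```go
package main

import (
	"fmt"
	"regexp"
)

// fallbackURLPatterns is compiled once, at package initialization.
// These two patterns are placeholders for illustration only.
var fallbackURLPatterns = []*regexp.Regexp{
	// Absolute http(s) URLs up to the next quote, space, or angle bracket.
	regexp.MustCompile(`https?://[^\s"'<>]+`),
	// Quoted root-relative paths; the path is capture group 1.
	regexp.MustCompile(`["'](/[A-Za-z0-9_\-./]+)["']`),
}

// extractWithRegex scans src with every precompiled pattern. When a
// pattern has a capture group, the group is returned instead of the
// full match.
func extractWithRegex(src string) []string {
	var found []string
	for _, re := range fallbackURLPatterns {
		for _, m := range re.FindAllStringSubmatch(src, -1) {
			if len(m) > 1 {
				found = append(found, m[1])
			} else {
				found = append(found, m[0])
			}
		}
	}
	return found
}

func main() {
	src := `fetch("https://example.com/api"); load('/v1/users');`
	fmt.Println(extractWithRegex(src))
}
```

Because regexp.MustCompile panics on an invalid pattern, a typo in any entry surfaces at program start instead of on the first fallback call.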
🧹 Nitpick comments (1)
pkg/utils/jsluice_test.go (1)

103-119: Consider verifying endpoint count to catch false positives.

The current test only verifies that expected URLs are present but doesn't check for unexpected URLs being extracted. This could mask regressions where the extractor starts extracting spurious endpoints.

♻️ Suggested improvement
 	for _, tt := range tests {
 		t.Run(tt.name, func(t *testing.T) {
 			endpoints := ExtractJsluiceEndpoints(tt.input)

 			// Create a map of found URLs for easier checking
 			foundURLs := make(map[string]bool)
 			for _, ep := range endpoints {
 				foundURLs[ep.Endpoint] = true
 			}

 			// Check that all expected URLs are found
 			for _, wantURL := range tt.wantURLs {
 				if !foundURLs[wantURL] {
 					t.Errorf("ExtractJsluiceEndpoints() missing expected URL %q, got %v", wantURL, endpoints)
 				}
 			}
+
+			// Check that no unexpected URLs were extracted
+			if len(endpoints) != len(tt.wantURLs) {
+				t.Errorf("ExtractJsluiceEndpoints() returned %d endpoints, want %d", len(endpoints), len(tt.wantURLs))
+			}
 		})
 	}

- Move fallback regex patterns to package-level initialization
- Add endpoint count verification in tests to catch false positives
- Fix window.open extraction to not capture HTTP methods from xhr.open

Co-Authored-By: Claude Opus 4.5 <[email protected]>
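The window.open fix addresses an ambiguity: window.open(url, ...) takes the URL first, while xhr.open(method, url, ...) takes an HTTP method first. One way to tell them apart is sketched below. It is a hypothetical heuristic that works on already-extracted string arguments and is not necessarily how this PR implements the fix; the names httpMethods and urlFromOpenArgs are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// httpMethods lists the verbs that mark an open(...) call as
// XMLHttpRequest-style rather than window.open-style.
var httpMethods = map[string]bool{
	"GET": true, "POST": true, "PUT": true, "PATCH": true,
	"DELETE": true, "HEAD": true, "OPTIONS": true,
}

// urlFromOpenArgs picks the URL out of the string arguments of an
// open(...) call. If the first argument is an HTTP method, the URL is
// taken from the second argument; otherwise the first argument is used.
func urlFromOpenArgs(args []string) (string, bool) {
	if len(args) == 0 {
		return "", false
	}
	if httpMethods[strings.ToUpper(args[0])] {
		if len(args) < 2 {
			return "", false
		}
		return args[1], true
	}
	return args[0], true
}

func main() {
	fmt.Println(urlFromOpenArgs([]string{"GET", "/api/users"}))
	fmt.Println(urlFromOpenArgs([]string{"/popup.html", "_blank"}))
}
```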
@dogancanbakir
Member

@dalledajay-coder Please make sure to add the necessary details. This is the PR template:

Proposed changes

Proof

Checklist

  • Pull request is created against the dev branch
  • All checks passed (lint, unit/integration/regression tests etc.) with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

I've updated the description.

@dogancanbakir
Member

@dalledajay-coder Reminder to update the PR description according to the PR template.

Member

@dogancanbakir dogancanbakir left a comment

  • merge conflict
  • UPDATE docs
