Fix cgo dependency 1367 #1564

Open
Wesley7711 wants to merge 5 commits into projectdiscovery:dev from Wesley7711:fix-cgo-dependency-1367

Conversation

Wesley7711 commented Mar 4, 2026

Changes

Files Added

  • pkg/utils/jsluice_stub.go - Pure Go fallback implementation
  • pkg/utils/jsluice_stub_test.go - Tests for fallback implementation

Files Modified

  • pkg/utils/jsluice.go - Updated build tag to make jsluice optional

Note

go.mod will be automatically cleaned up by go mod tidy during CI, which will remove unused indirect dependencies.

Summary by CodeRabbit

  • New Features

    • JavaScript endpoint extraction with URL and API-like detection, deduplication, and classification.
    • Primary extraction path now requires an optional component to be enabled; a portable fallback implementation is included for broader compatibility.
  • Tests

    • Added comprehensive tests covering endpoint extraction and validation behavior.

coderabbitai bot (Contributor) commented Mar 4, 2026

Walkthrough

Tightened the build constraint on the CGO jsluice file so that it also requires the jsluice tag, and added a pure-Go, regex-based fallback implementation with unit tests for extracting and validating JavaScript endpoints when the tag or CGO is absent (still excluding 386 and windows).

Changes

Cohort / File(s) | Summary
JSluice Build Configuration: pkg/utils/jsluice.go | Changed build constraint from `//go:build !(386
JSluice Fallback Implementation & Tests: pkg/utils/jsluice_stub.go, pkg/utils/jsluice_stub_test.go | Added pure-Go extractor using urlPattern and apiPattern regexes, new JSLuiceEndpoint type, ExtractJsluiceEndpoints(data string) []JSLuiceEndpoint, isValidEndpoint helper with deduplication and validation, and table-driven tests covering extraction and validity.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I hop through scripts with regex nose,
Sniffing endpoints where the JavaScript goes.
If jsluice naps behind a build-tag door,
My Go-made hops still fetch each API shore,
I trim, dedupe, and leave prints on the floor.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title 'Fix cgo dependency 1367' directly addresses the main objective: making the jsluice CGO dependency optional by adding a pure Go fallback implementation.


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/utils/jsluice_stub_test.go (1)

93-103: Expand isValidEndpoint tests for scheme boundary/case variants.

Please add cases for exact data: and mixed-case JavaScript: so the filter behavior is locked down for edge inputs.

Suggested test additions
 		{
 			name:  "data URI",
 			input: "data:image/png;base64,iVBORw0KGgo...",
 			want:  false,
 		},
+		{
+			name:  "exact data scheme",
+			input: "data:",
+			want:  false,
+		},
 		{
 			name:  "javascript URI",
 			input: "javascript:void(0)",
 			want:  false,
 		},
+		{
+			name:  "mixed-case javascript URI",
+			input: "JavaScript:void(0)",
+			want:  false,
+		},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/utils/jsluice_stub_test.go` around lines 93 - 103, Add two test cases to
the isValidEndpoint table-driven tests in jsluice_stub_test.go: one where input
is exactly "data:" (to verify a bare data scheme is rejected) and another with
mixed-case scheme "JavaScript:alert(1)" (or similar) to verify scheme matching
is case-insensitive/correctly normalized; update the test table (the slice of
test structs near the existing "data URI" and "javascript URI" entries) with
entries named e.g. "data scheme only" and "mixed-case JavaScript scheme" and set
want to false so isValidEndpoint is exercised for these boundary/case variants.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/utils/jsluice_stub.go`:
- Around line 72-76: In isValidEndpoint, the current prefix logic is fragile
(checks len>5 and "javas") and misses exact or mixed-case schemes like "data:"
and "JavaScript:". Change it to normalize the input (strings.ToLower(s)) and use
explicit full-scheme checks (e.g., strings.HasPrefix(lower, "data:") ||
strings.HasPrefix(lower, "javascript:")) and remove the brittle length-based
branch so these schemes reliably return false; update references in
jsluice_stub.go's isValidEndpoint accordingly.
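
The normalization described in this comment could look roughly like the sketch below. It is an illustration only, not the code in the PR: the two scheme checks come from the review comment, while the empty-string check is an assumption added so the example is complete.

```go
package main

import (
	"fmt"
	"strings"
)

// isValidEndpoint reports whether s looks like a usable endpoint.
// Sketch only: the real helper in jsluice_stub.go may apply further rules.
func isValidEndpoint(s string) bool {
	if s == "" { // assumption: empty candidates are rejected
		return false
	}
	// Lowercase once so "JavaScript:" and "DATA:" match the same checks.
	lower := strings.ToLower(s)
	if strings.HasPrefix(lower, "data:") || strings.HasPrefix(lower, "javascript:") {
		return false
	}
	return true
}

func main() {
	for _, in := range []string{"data:", "JavaScript:void(0)", "/api/v1/users"} {
		fmt.Printf("%q -> %v\n", in, isValidEndpoint(in))
	}
}
```

Because the check is a full-scheme prefix match on the lowercased input, no separate length guard is needed: a bare "data:" and a mixed-case "JavaScript:" are both rejected by the same branch.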


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a3f4bf17-a77b-449e-a275-682a761e951e

📥 Commits

Reviewing files that changed from the base of the PR and between 3689151 and ddad4c8.

📒 Files selected for processing (3)
  • pkg/utils/jsluice.go
  • pkg/utils/jsluice_stub.go
  • pkg/utils/jsluice_stub_test.go

neo-by-projectdiscovery-dev bot commented Mar 4, 2026

🔧 Hit a snag — please try again.

coderabbitai bot (Contributor) left a comment

♻️ Duplicate comments (1)
pkg/utils/jsluice_stub.go (1)

5-7: ⚠️ Potential issue | 🔴 Critical

Missing strings import causes a build break.

Lines 72 and 75 use strings.ToLower and strings.HasPrefix, but strings is not imported. The import block contains only "regexp", so this file fails to compile with undefined: strings.

Proposed fix
 import (
 	"regexp"
+	"strings"
 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/utils/jsluice_stub.go` around lines 5 - 7, The build fails due to missing
strings import in jsluice_stub.go: add "strings" to the existing import block
(alongside "regexp") so calls to strings.ToLower and strings.HasPrefix compile;
locate the functions in this file that call strings (the references around where
strings.ToLower and strings.HasPrefix are used) and update the import list
accordingly, then run go build to verify the fix.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 485f88af-e125-47be-8f1b-a5fa3e1b817d

📥 Commits

Reviewing files that changed from the base of the PR and between ddad4c8 and 3aa8efc.

📒 Files selected for processing (1)
  • pkg/utils/jsluice_stub.go

Wesley7711 (Author)

Thanks for catching that! I've added the missing strings import to fix the compilation error.

Neo - PR Security Review

No security issues found

Highlights

  • Added case-insensitive validation for data: and javascript: URI schemes in isValidEndpoint
  • Improved endpoint filtering to normalize URLs before validation

Hardening Notes

  • The strings package import is missing from jsluice_stub.go:5-7, causing compilation failure for strings.ToLower (line 72) and strings.HasPrefix (line 75). While not a security vulnerability, this prevents the code from building.
  • Consider validating extracted URLs against an allowlist of schemes (http/https only) in isValidEndpoint to prevent potential SSRF if katana processes malicious JavaScript containing protocol-relative URLs like '//127.0.0.1/api' or '//169.254.169.254/metadata'
  • Add explicit validation to reject URLs with IPv4 address literals in private/link-local ranges (127.x, 169.254.x, 10.x, 192.168.x) if appropriate for katana's threat model
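
The two hardening notes on scheme allowlisting and private-range addresses could be combined along the lines of the sketch below. It is not part of the PR: isSafeEndpoint is a hypothetical helper name, and treating scheme-less (relative or protocol-relative) URLs as acceptable when the host is not an internal address is an assumption about katana's threat model.

```go
package main

import (
	"fmt"
	"net/netip"
	"net/url"
)

// isSafeEndpoint is a hypothetical helper: it allows only http/https (or no
// scheme at all) and rejects hosts that are loopback, private, or link-local
// IP literals.
func isSafeEndpoint(raw string) bool {
	u, err := url.Parse(raw)
	if err != nil {
		return false
	}
	// url.Parse lowercases the scheme, so a plain comparison is enough.
	switch u.Scheme {
	case "", "http", "https":
	default:
		return false
	}
	// Protocol-relative URLs such as //127.0.0.1/api have an empty scheme
	// but a non-empty host, so the host is checked in every case.
	if host := u.Hostname(); host != "" {
		if addr, err := netip.ParseAddr(host); err == nil {
			if addr.IsLoopback() || addr.IsPrivate() || addr.IsLinkLocalUnicast() {
				return false
			}
		}
	}
	return true
}

func main() {
	for _, in := range []string{
		"//127.0.0.1/api",
		"//169.254.169.254/metadata",
		"https://example.com/api",
		"/api/users",
	} {
		fmt.Printf("%-30s safe=%v\n", in, isSafeEndpoint(in))
	}
}
```

This only inspects IP literals. A hostname that resolves to an internal address would pass, so the sketch narrows the problem rather than closing it.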

Comment @neo help for available commands. · Open in Neo

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/utils/jsluice_stub.go (1)

10-16: Consider enforcing matched quote pairs for stricter parsing.

The current patterns allow mismatched quotes (e.g., "url'). For a fallback implementation this is likely acceptable, but stricter parsing would reduce false positives.

♻️ Alternative pattern with matched quotes
 var (
 	// URL patterns for extracting endpoints from JavaScript
-	urlPattern = regexp.MustCompile(`(?i)(?:"|'|` + "`" + `)((?:https?:)?//[^\s"'` + "`" + `<>]+|/[^\s"'` + "`" + `<>]+)(?:"|'|` + "`" + `)`)
+	urlPatternDQ = regexp.MustCompile(`(?i)"((?:https?:)?//[^\s"<>]+|/[^\s"<>]+)"`)
+	urlPatternSQ = regexp.MustCompile(`(?i)'((?:https?:)?//[^\s'<>]+|/[^\s'<>]+)'`)
+	urlPatternBT = regexp.MustCompile("(?i)`((?:https?:)?//[^\\s`<>]+|/[^\\s`<>]+)`")

Then union the match results in ExtractJsluiceEndpoints.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/utils/jsluice_stub.go` around lines 10 - 16, Update the regexes to
enforce matching quote pairs by capturing the opening quote and using a
backreference instead of allowing any closing quote: modify urlPattern and
apiPattern to capture the quote char (e.g., (?P<q>["'`])) and reference it at
the end so mismatched quotes like "url' are not accepted, then in
ExtractJsluiceEndpoints ensure you union the results from both patterns
(urlPattern matches and apiPattern matches) into the final endpoint set to avoid
duplicates and preserve stricter parsing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ff1243d3-a0a7-4d18-a369-4b57bc9dfea5

📥 Commits

Reviewing files that changed from the base of the PR and between 3aa8efc and e64df2f.

📒 Files selected for processing (1)
  • pkg/utils/jsluice_stub.go

Comment on lines +28 to +63
func ExtractJsluiceEndpoints(data string) []JSLuiceEndpoint {
	var endpoints []JSLuiceEndpoint
	seen := make(map[string]bool)

	// Extract URLs using URL pattern
	urlMatches := urlPattern.FindAllStringSubmatch(data, -1)
	for _, match := range urlMatches {
		if len(match) > 1 {
			url := match[1]
			if !seen[url] && isValidEndpoint(url) {
				seen[url] = true
				endpoints = append(endpoints, JSLuiceEndpoint{
					Endpoint: url,
					Type:     "url",
				})
			}
		}
	}

	// Extract API endpoints
	apiMatches := apiPattern.FindAllStringSubmatch(data, -1)
	for _, match := range apiMatches {
		if len(match) > 1 {
			endpoint := match[1]
			if !seen[endpoint] && isValidEndpoint(endpoint) {
				seen[endpoint] = true
				endpoints = append(endpoints, JSLuiceEndpoint{
					Endpoint: endpoint,
					Type:     "api",
				})
			}
		}
	}

	return endpoints
}

⚠️ Potential issue | 🟡 Minor

Type classification differs from CGO version semantics.

The stub returns only "url" or "api", while the jsluice library returns specific types like "linkHref", "fetch", "xhr", etc. Downstream code constructs attributes via fmt.Sprintf("jsluice-%s", item.Type) (see parser_generic.go), so endpoints will be labeled differently:

  • CGO: jsluice-linkHref, jsluice-fetch, etc.
  • Stub: jsluice-url, jsluice-api

This may affect filtering, output formatting, or any logic that depends on specific type values. If this is intentional fallback behavior, consider documenting it in the function's comments.

📝 Suggested documentation addition
 // ExtractJsluiceEndpoints extracts endpoints from JavaScript using pure Go regex.
 // This is a fallback implementation when jsluice (which requires CGO) is not available.
 //
 // Note: This implementation uses regex patterns and may not be as accurate as jsluice,
 // but it eliminates the CGO dependency for cross-platform compilation.
+//
+// Type classification: This stub returns generic types ("url" or "api") rather than
+// the specific types returned by jsluice (e.g., "linkHref", "fetch", "xhr").
 func ExtractJsluiceEndpoints(data string) []JSLuiceEndpoint {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/utils/jsluice_stub.go` around lines 28 - 63, The ExtractJsluiceEndpoints
stub currently returns generic types "url" and "api" which differ from the real
jsluice CGO semantics (e.g., "linkHref", "fetch", "xhr") and breaks downstream
formatting that builds attributes with fmt.Sprintf("jsluice-%s", item.Type);
update ExtractJsluiceEndpoints (and JSLuiceEndpoint.Type values it creates) to
map URL and API regex matches to the concrete jsluice type strings used by the
real library (e.g., map link href matches to "linkHref", fetch calls to "fetch",
XHR to "xhr", etc.), or if you intentionally want a fallback behavior, add a
clear comment in the function declaring that it returns coarse-grained
"url"/"api" types and list the expected CGO-specific type names so callers know
the difference; ensure references in parser_generic.go that format "jsluice-%s"
will receive the expected type strings.
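
To make the labeling difference concrete, the sketch below reproduces only the fmt.Sprintf("jsluice-%s", item.Type) formatting quoted in the comment. The JSLuiceEndpoint fields follow the snippet under review; attributeFor is a hypothetical helper, and the sample endpoints are made up for illustration.

```go
package main

import "fmt"

// JSLuiceEndpoint carries the two fields used in the snippet under review.
type JSLuiceEndpoint struct {
	Endpoint string
	Type     string
}

// attributeFor is a hypothetical helper mirroring the downstream formatting
// described in the review comment.
func attributeFor(item JSLuiceEndpoint) string {
	return fmt.Sprintf("jsluice-%s", item.Type)
}

func main() {
	// Type values the review comment attributes to the CGO-backed build.
	cgo := []JSLuiceEndpoint{
		{Endpoint: "/api/users", Type: "fetch"},
		{Endpoint: "/about", Type: "linkHref"},
	}
	// Coarse-grained types produced by the pure-Go stub.
	stub := []JSLuiceEndpoint{
		{Endpoint: "/api/users", Type: "api"},
		{Endpoint: "/about", Type: "url"},
	}
	for i := range cgo {
		fmt.Printf("%-12s cgo=%-18s stub=%s\n",
			cgo[i].Endpoint, attributeFor(cgo[i]), attributeFor(stub[i]))
	}
}
```

Any consumer that filters on an exact attribute such as jsluice-fetch would see no results from a stub build, which is why documenting the coarse types matters.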
