@jmagar jmagar commented Nov 4, 2025

Description

This PR adds support for configuring a custom Firecrawl API base URL via the FIRECRAWL_API_BASE_URL environment variable.

Motivation

Currently, the Firecrawl API URL is hardcoded to https://api.firecrawl.dev/v1/scrape. This prevents users from:

  • Using self-hosted Firecrawl instances
  • Routing requests through custom proxy endpoints
  • Testing against development/staging environments

Changes

  1. Modified firecrawl-scrape.ts: Updated the scrapeWithFirecrawl function to read the base URL from process.env.FIRECRAWL_API_BASE_URL, defaulting to https://api.firecrawl.dev when not set.

  2. Updated .env.example: Added documentation for the new FIRECRAWL_API_BASE_URL environment variable.
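As a rough illustration of the change described above, the endpoint could be assembled from the environment roughly as follows. This is a minimal sketch, not the code from the PR: the helper name `resolveScrapeEndpoint`, the `Env` type, and the trailing-slash handling are assumptions made for the example. Only the variable name, the default URL, and the `/v1/scrape` path come from the description.

```typescript
// Hypothetical helper; the PR itself inlines this logic in scrapeWithFirecrawl.
const DEFAULT_FIRECRAWL_BASE_URL = 'https://api.firecrawl.dev';

// Shape of process.env, passed in explicitly so the helper is easy to test.
type Env = Record<string, string | undefined>;

function resolveScrapeEndpoint(env: Env): string {
  // `||` falls back to the default for both an unset and an empty variable.
  const baseUrl = env.FIRECRAWL_API_BASE_URL || DEFAULT_FIRECRAWL_BASE_URL;
  // Assumption: strip trailing slashes so "https://host/" does not
  // produce "https://host//v1/scrape".
  return `${baseUrl.replace(/\/+$/, '')}/v1/scrape`;
}
```

In application code this would be called as `resolveScrapeEndpoint(process.env)` and the result passed to `fetch`.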

Usage

Users can now set a custom Firecrawl API base URL:

# .env
FIRECRAWL_API_BASE_URL=https://my-custom-firecrawl.example.com

If not set, the default (https://api.firecrawl.dev) is used, maintaining backward compatibility.

Testing

  • ✅ Backward compatible (defaults to original URL)
  • ✅ Allows custom URL configuration
  • ✅ No breaking changes to existing functionality

Allow users to configure a custom Firecrawl API base URL via the
FIRECRAWL_API_BASE_URL environment variable. This is useful for:
- Self-hosted Firecrawl instances
- Custom proxy endpoints
- Development/testing environments

Defaults to https://api.firecrawl.dev when not set.
Document the new FIRECRAWL_API_BASE_URL environment variable in the
example configuration file.
Copilot AI review requested due to automatic review settings November 4, 2025 23:11
Copilot AI left a comment

Pull Request Overview

This PR adds support for configurable Firecrawl API base URLs, enabling the use of self-hosted Firecrawl instances instead of hardcoding the production API endpoint.

  • Introduced FIRECRAWL_API_BASE_URL environment variable with fallback to the default production URL
  • Updated .env.example with documentation for the new configuration option

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File | Description
productionized/pulse-fetch/shared/src/scraping-client/lib/firecrawl-scrape.ts | Replaced hardcoded Firecrawl API URL with configurable base URL from environment variable
productionized/pulse-fetch/.env.example | Added documentation and example configuration for the new FIRECRAWL_API_BASE_URL environment variable

  }> {
    try {
-     const response = await fetch('https://api.firecrawl.dev/v1/scrape', {
+     const baseUrl = process.env.FIRECRAWL_API_BASE_URL || 'https://api.firecrawl.dev';
Copilot AI Nov 4, 2025

The base URL should be validated to prevent injection attacks. Consider validating that it starts with 'http://' or 'https://' and doesn't contain path traversal characters. For example: if (baseUrl && !/^https?:\/\/.+$/.test(baseUrl)) { throw new Error('Invalid FIRECRAWL_API_BASE_URL'); }

Suggested change
- const baseUrl = process.env.FIRECRAWL_API_BASE_URL || 'https://api.firecrawl.dev';
+ const baseUrl = process.env.FIRECRAWL_API_BASE_URL || 'https://api.firecrawl.dev';
+ // Validate baseUrl to prevent injection attacks
+ if (
+   baseUrl &&
+   (!/^https?:\/\/[^\\]+$/.test(baseUrl) || baseUrl.includes('..'))
+ ) {
+   throw new Error('Invalid FIRECRAWL_API_BASE_URL');
+ }
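
For comparison with the regex check suggested above, one alternative is to let the standard `URL` constructor do the parsing and then check the scheme. This is only a sketch under stated assumptions: the function name `validateBaseUrl` is hypothetical, and dropping the query string and fragment is a choice made for this example, not something the PR does.

```typescript
// Hypothetical validator; not part of the PR.
function validateBaseUrl(raw: string): string {
  let parsed: URL;
  try {
    // The URL constructor throws a TypeError on input it cannot parse.
    parsed = new URL(raw);
  } catch (_err) {
    throw new Error('Invalid FIRECRAWL_API_BASE_URL');
  }
  // Accept only http and https, rejecting schemes such as ftp: or javascript:.
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new Error('Invalid FIRECRAWL_API_BASE_URL');
  }
  // Parsing already resolves "." and ".." path segments, so no separate
  // substring check is needed. Query and fragment are dropped (assumption).
  return parsed.origin + parsed.pathname.replace(/\/+$/, '');
}
```

With this approach a value such as `https://proxy.example.com/firecrawl/` is returned without its trailing slash, while `ftp://example.com` or a string that is not a URL raises the same error as the suggested change.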

  }> {
    try {
-     const response = await fetch('https://api.firecrawl.dev/v1/scrape', {
+     const baseUrl = process.env.FIRECRAWL_API_BASE_URL || 'https://api.firecrawl.dev';
Copilot AI Nov 4, 2025

The base URL is resolved on every function call. Consider moving this to a module-level constant to avoid repeated environment variable lookups and improve performance.
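
One trade-off with a module-level constant is that it is fixed at import time, so a value set later (for example by a test, or by configuration loaded after the module) would not be picked up. A possible middle ground is to resolve the value lazily on first use and cache it. The sketch below is an assumption-laden illustration, not the PR's code: `makeBaseUrlGetter` and `EnvMap` are hypothetical names.

```typescript
// Shape of process.env, passed in explicitly to keep the sketch testable.
type EnvMap = Record<string, string | undefined>;

// Hypothetical factory: returns a getter that reads the variable once,
// on first call, and reuses the cached value afterwards.
function makeBaseUrlGetter(env: EnvMap): () => string {
  let cached: string | undefined;
  return () => {
    if (cached === undefined) {
      cached = env.FIRECRAWL_API_BASE_URL || 'https://api.firecrawl.dev';
    }
    return cached;
  };
}
```

A module could create the getter once with `makeBaseUrlGetter(process.env)` and call it inside the scrape function. The environment lookup then happens a single time, but not before the first request is made.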

- Add validation to prevent URL injection attacks
- Move baseUrl resolution to module-level constant for performance
- Validate that URL starts with http:// or https://
- Prevent path traversal attacks

Co-authored-by: GitHub Copilot
Contributor

@tadasant tadasant left a comment

Looks great, thank you!

If you want this in a release, feel free to prep a patch version bump in accordance with how it's been done in the past.

Edit: Sorry, if you could get CI green, I will merge :)
