Add HTTP proxy support for docs crawler and indexer #308

Open
adeelehsan wants to merge 1 commit into main from add_proxy


Conversation

adeelehsan (Contributor) commented Dec 19, 2025

Summary

  • Add HTTP/HTTPS proxy support for docs_crawler and indexer to resolve connection issues in restricted network environments
  • Add configure_session_for_proxy() utility function for configuring requests sessions with proxy settings
  • Update documentation with proxy configuration examples

Context

Users running vectara-ingest in Kubernetes environments were encountering DNS resolution errors when trying to crawl external URLs:
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='external-host', port=443):
Max retries exceeded... Failed to establish a new connection: [Errno -2] Name or service not known
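
The `[Errno -2] Name or service not known` part of that message is the form a failed hostname lookup usually takes on Linux. As a rough diagnostic sketch (the helper name `can_resolve` is hypothetical and not part of this PR), one could check from inside the pod whether a host resolves at all before concluding that a proxy is needed:

```python
import socket


def can_resolve(host: str, port: int = 443) -> bool:
    """Return True if the local resolver can look up `host`, False otherwise.

    Hypothetical helper for illustration; it only tests DNS resolution,
    not whether the host is reachable through a proxy.
    """
    try:
        socket.getaddrinfo(host, port)
        return True
    except socket.gaierror:
        # Lookup failures such as "Name or service not known" land here.
        return False
```

If this returns False for an external host while internal names resolve, the pod probably has no direct route to external DNS, which is the situation the proxy settings are meant to address.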

Add configure_session_for_proxy() function to support environments
that require a proxy to access external URLs (e.g., Kubernetes pods
in corporate networks).

Proxy settings can be configured via:
- Environment variables: HTTP_PROXY, HTTPS_PROXY, NO_PROXY
- Config file: http_proxy, https_proxy, no_proxy in crawler config

Config file settings take precedence over environment variables.
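
The precedence rule above can be sketched as follows. This is a minimal illustration, not the PR's implementation: the helper names `resolve_proxy_settings` and `to_requests_proxies` are hypothetical, and the assumption that the config is a flat mapping holding the `http_proxy`, `https_proxy`, and `no_proxy` keys is mine.

```python
import os
from typing import Dict, Mapping, Optional

PROXY_KEYS = ("http_proxy", "https_proxy", "no_proxy")


def resolve_proxy_settings(
    crawler_cfg: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical helper: config-file values win over environment variables."""
    env = os.environ if env is None else env
    resolved: Dict[str, str] = {}
    for key in PROXY_KEYS:
        # Config file first, then the upper-case env var, then the lower-case one.
        value = crawler_cfg.get(key) or env.get(key.upper()) or env.get(key)
        if value:
            resolved[key] = value
    return resolved


def to_requests_proxies(settings: Mapping[str, str]) -> Dict[str, str]:
    """Map resolved settings onto the scheme keys of a requests-style proxies dict."""
    proxies: Dict[str, str] = {}
    if "http_proxy" in settings:
        proxies["http"] = settings["http_proxy"]
    if "https_proxy" in settings:
        proxies["https"] = settings["https_proxy"]
    return proxies
```

With the `requests` library, the resulting mapping could be applied with `session.proxies.update(...)`. How `configure_session_for_proxy()` itself applies the settings, including `no_proxy`, is not shown in this conversation.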

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
2. **Configuration File** - Add proxy settings directly in your crawler config:

```yaml
# For docs_crawler
```

Collaborator

Don't we want to document this in the crawlers.MD under docs_crawler?

ofermend (Collaborator) left a comment

LGTM. Didn't test locally.
