Add HTTP proxy support for docs crawler and indexer #308
Open
adeelehsan wants to merge 1 commit into main from
Add configure_session_for_proxy() function to support environments that require a proxy to access external URLs (e.g., Kubernetes pods in corporate networks).

Proxy settings can be configured via:
- Environment variables: HTTP_PROXY, HTTPS_PROXY, NO_PROXY
- Config file: http_proxy, https_proxy, no_proxy in crawler config

Config file settings take precedence over environment variables.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
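The precedence rule described in this commit message (config file first, environment variables as fallback) can be sketched roughly as follows. This is not the code from the PR: the helper name `resolve_proxy_settings`, its signature, and the example proxy URLs are hypothetical, and only the key and variable names (`http_proxy`, `https_proxy`, `no_proxy`, `HTTP_PROXY`, `HTTPS_PROXY`, `NO_PROXY`) come from the description above.

```python
import os
from typing import Mapping, Optional


def resolve_proxy_settings(
    crawler_config: Mapping[str, str],
    environ: Optional[Mapping[str, str]] = None,
) -> dict:
    """Hypothetical helper: pick proxy values, preferring config over environment."""
    env = os.environ if environ is None else environ
    # (config-file key, environment variable, key in the returned dict)
    sources = [
        ("http_proxy", "HTTP_PROXY", "http"),
        ("https_proxy", "HTTPS_PROXY", "https"),
        ("no_proxy", "NO_PROXY", "no_proxy"),
    ]
    resolved = {}
    for config_key, env_var, out_key in sources:
        # A non-empty config value wins; otherwise fall back to the environment.
        value = crawler_config.get(config_key) or env.get(env_var)
        if value:
            resolved[out_key] = value
    return resolved


# Example with made-up proxy URLs: the config overrides HTTPS_PROXY only.
settings = resolve_proxy_settings(
    {"https_proxy": "http://proxy.example.internal:3128"},
    environ={
        "HTTP_PROXY": "http://env-proxy.example.internal:8080",
        "HTTPS_PROXY": "http://env-proxy.example.internal:8080",
    },
)
print(settings)
```

The `http` and `https` entries map a URL scheme to a proxy URL, which is the shape that the `proxies` attribute of a `requests.Session` accepts. How the actual `configure_session_for_proxy()` applies `no_proxy` is not shown on this page, so the sketch only resolves the value.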
8fc46db to 282f336
ofermend reviewed Dec 19, 2025
2. **Configuration File** - Add proxy settings directly in your crawler config:

```yaml
# For docs_crawler
```
Collaborator
Don't we want to document this in the crawlers.MD under docs_crawler?
ofermend approved these changes Dec 19, 2025
Collaborator
LGTM. Didn't test locally.
Summary
configure_session_for_proxy() utility function for configuring requests sessions with proxy settings
Context
Users running vectara-ingest in Kubernetes environments were encountering DNS resolution errors when trying to crawl external URLs:
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='external-host', port=443):
Max retries exceeded... Failed to establish a new connection: [Errno -2] Name or service not known