Add flexible embedding provider configuration to support both OpenAI
and custom embedding endpoints (e.g., LiteLLM, local models). This
enables users to use alternative embedding services while maintaining
backward compatibility with OpenAI.
Signed-off-by: ER Hapal <[email protected]>
@@ -84,27 +96,32 @@ Configuration is managed through two files:
2. **`config.yaml` file:**
This file defines the sources to process and how to handle them. Create a `config.yaml` file (or use a different name and pass it as an argument).
+ **Embedding Provider Configuration:**
+
+ Embedding providers are now configured via environment variables:
+ - `OPENAI_API_KEY`: API key used for both providers
+ - `PROVIDER`: Set to "openai" (default) or "custom"
+ - `EMBEDDING_MODEL`: Model to use (default: "text-embedding-3-large")
+ - `EMBEDDING_VECTOR_SIZE`: Vector size of the custom embedding model (default: 3072)
+ - `CUSTOM_ENDPOINT`: Required when using custom provider (e.g., "http://localhost:8000/v1/embeddings")
+
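The environment variables listed above could be resolved roughly as follows. This is a minimal sketch, not the project's actual loader: `EmbeddingConfig` and `load_embedding_config` are hypothetical names, and ignoring `CUSTOM_ENDPOINT` when the provider is "openai" is an assumption. Only the variable names and defaults come from the change itself.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EmbeddingConfig:
    """Hypothetical container for the resolved embedding settings."""
    provider: str
    api_key: Optional[str]
    model: str
    vector_size: int
    endpoint: Optional[str]


def load_embedding_config(env: Mapping[str, str] = os.environ) -> EmbeddingConfig:
    """Read the documented variables and apply the documented defaults."""
    provider = env.get("PROVIDER", "openai")
    if provider not in ("openai", "custom"):
        raise ValueError(f"PROVIDER must be 'openai' or 'custom', got {provider!r}")

    endpoint = env.get("CUSTOM_ENDPOINT")
    if provider == "custom" and not endpoint:
        # The change states CUSTOM_ENDPOINT is required for the custom provider.
        raise ValueError("CUSTOM_ENDPOINT is required when PROVIDER is 'custom'")

    return EmbeddingConfig(
        provider=provider,
        api_key=env.get("OPENAI_API_KEY"),  # used for both providers
        model=env.get("EMBEDDING_MODEL", "text-embedding-3-large"),
        vector_size=int(env.get("EMBEDDING_VECTOR_SIZE", "3072")),
        # Assumption: the endpoint is only meaningful for the custom provider.
        endpoint=endpoint if provider == "custom" else None,
    )
```

Passing a plain dictionary instead of `os.environ` makes the defaults easy to check without touching the real environment, for example `load_embedding_config({"PROVIDER": "custom", "CUSTOM_ENDPOINT": "http://localhost:8000/v1/embeddings"})`.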
**Structure:**

* `sources`: An array of source configurations.
* `type`: Either `'website'`, `'github'`, `'local_directory'`, or `'zendesk'`
-
For websites (`type: 'website'`):
* `url`: The starting URL for crawling the documentation site.
* `sitemap_url`: (Optional) URL to the site's XML sitemap for discovering additional pages not linked in navigation.
-
For GitHub repositories (`type: 'github'`):
* `repo`: Repository name in the format `'owner/repo'` (e.g., `'istio/istio'`).
* `start_date`: (Optional) Starting date to fetch issues from (e.g., `'2025-01-01'`).
-
For local directories (`type: 'local_directory'`):
* `path`: Path to the local directory to process.
* `include_extensions`: (Optional) Array of file extensions to include (e.g., `['.md', '.txt', '.pdf']`). Defaults to `['.md', '.txt', '.html', '.htm', '.pdf']`.
* `exclude_extensions`: (Optional) Array of file extensions to exclude.
* `recursive`: (Optional) Whether to traverse subdirectories (defaults to `true`).
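To make the per-type fields concrete, here is a minimal sketch that checks one entry of `sources` and fills in the documented defaults for local directories. The names `REQUIRED_FIELDS` and `normalize_source` are hypothetical and not part of the project. Treating every field not marked "(Optional)" as required is an assumption, and the fields for `'zendesk'` are not listed in this excerpt, so none are checked.

```python
from typing import Any, Dict, List

# Assumption: fields not marked "(Optional)" in the README are required.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "website": ["url"],
    "github": ["repo"],
    "local_directory": ["path"],
    "zendesk": [],  # fields not shown in this excerpt
}

# Default taken from the `include_extensions` description.
DEFAULT_INCLUDE_EXTENSIONS = [".md", ".txt", ".html", ".htm", ".pdf"]


def normalize_source(source: Dict[str, Any]) -> Dict[str, Any]:
    """Validate one source entry and apply documented defaults."""
    source_type = source.get("type")
    if source_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown source type: {source_type!r}")

    missing = [key for key in REQUIRED_FIELDS[source_type] if key not in source]
    if missing:
        raise ValueError(f"{source_type} source is missing: {', '.join(missing)}")

    normalized = dict(source)  # leave the caller's dict untouched
    if source_type == "local_directory":
        normalized.setdefault("include_extensions", list(DEFAULT_INCLUDE_EXTENSIONS))
        normalized.setdefault("recursive", True)
    return normalized
```

For example, `normalize_source({"type": "local_directory", "path": "./docs"})` returns the entry with `recursive` set to `True` and the default extension list filled in, while a `website` entry without `url` raises `ValueError`.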