Commit e4cea91

Merge pull request #107 from thushan/fix/proxy-path
fix: Proxy Path issue & sensible defaults
2 parents 1256949 + 85729c4 commit e4cea91

19 files changed

Lines changed: 1132 additions & 54 deletions

docs/content/concepts/health-checking.md

Lines changed: 6 additions & 4 deletions
@@ -10,15 +10,17 @@ keywords: ["health checking", "endpoint monitoring", "circuit breaker", "olla he
 > ```yaml
 > endpoints:
 >   - url: "http://localhost:11434"
->     check_interval: 30s
->     check_timeout: 5s
+>     check_interval: 5s
+>     check_timeout: 2s
 > ```
 > **Supported Settings**:
 >
-> - `check_interval` _(default: 30s)_ - Time between health checks
-> - `check_timeout` _(default: 5s)_ - Maximum time to wait for response
+> - `check_interval` _(default: 5s)_ - Time between health checks
+> - `check_timeout` _(default: 2s)_ - Maximum time to wait for response
 > - `check_path` _(auto-detected)_ - Health check endpoint path
 >
+> **Note**: Both `check_interval` and `check_timeout` are optional with sensible defaults (5s and 2s respectively), so you don't need to specify them for basic setups.
+>
 > **Environment Variables**: Per-endpoint settings not supported via env vars
 
 Olla continuously monitors the health of all configured endpoints to ensure requests are only routed to available backends. The health checking system is automatic and requires minimal configuration.

docs/content/concepts/model-unification.md

Lines changed: 2 additions & 2 deletions
@@ -12,13 +12,13 @@ keywords: model unification, model catalogue, ollama models, lm studio models, m
 > model_discovery:
 >   enabled: true
 >   interval: 5m
->   concurrent_workers: 3
+>   concurrent_workers: 5
 > ```
 > **Supported Settings**:
 >
 > - `enabled` _(default: true)_ - Enable automatic model discovery
 > - `interval` _(default: 5m)_ - How often to refresh model lists
-> - `concurrent_workers` _(default: 3)_ - Parallel discovery workers
+> - `concurrent_workers` _(default: 5)_ - Parallel discovery workers
 >
 > **Environment Variables**:
 > - `OLLA_DISCOVERY_MODEL_DISCOVERY_ENABLED`

docs/content/configuration/overview.md

Lines changed: 2 additions & 2 deletions
@@ -232,14 +232,14 @@ discovery:
 
 ### Endpoint Configuration
 
-Each endpoint requires:
+Each endpoint requires `url`, `name`, and `type`. The `priority` field is optional:
 
 | Field | Description | Example |
 |-------|-------------|---------|
 | **url** | Base URL of the endpoint | `http://localhost:11434` |
 | **name** | Unique identifier | `local-ollama` |
 | **type** | Platform type | `llamacpp`, `vllm`, `openai` (See [integrations](../integrations/overview.md#backend-endpoints)) |
-| **priority** | Selection priority (higher = preferred) | `100` |
+| **priority** | Selection priority (higher = preferred, default: `100`) | `100` |
 
 Current list of supported types can be found in [integrations](../integrations/overview.md#backend-endpoints).
 
docs/content/configuration/practices/performance.md

Lines changed: 4 additions & 4 deletions
@@ -128,18 +128,18 @@ proxy:
 
 ### Health Check Optimisation
 
-Balance detection speed vs overhead:
+Balance detection speed vs overhead (the default `check_interval` is `5s`):
 
 ```yaml
 endpoints:
   - url: "http://localhost:11434"
-    check_interval: 30s # Not too frequent
+    check_interval: 30s # Increase from 5s default to reduce overhead
     check_timeout: 2s # Fast failure detection
 ```
 
 Too frequent checks waste resources:
 
-- 5s interval = 12 checks/minute/endpoint
+- 5s interval (default) = 12 checks/minute/endpoint
 - 30s interval = 2 checks/minute/endpoint
 - With 10 endpoints, that's 120 vs 20 checks/minute
 
@@ -162,7 +162,7 @@ Typical memory usage:
 # Memory-conscious configuration
 server:
   request_limits:
-    max_body_size: 5242880 # 5MB instead of 50MB
+    max_body_size: 5242880 # 5MB instead of default 100MB
     max_header_size: 65536 # 64KB instead of 512KB
 
 model_registry:

docs/content/configuration/reference.md

Lines changed: 16 additions & 8 deletions
@@ -237,12 +237,12 @@ discovery:
 | `static.endpoints[].url` | string | Yes | Endpoint base URL |
 | `static.endpoints[].name` | string | Yes | Unique endpoint name |
 | `static.endpoints[].type` | string | Yes | Backend type (`ollama`, `lm-studio`, `llamacpp`, `vllm`, `sglang`, `lemonade`, `litellm`, `openai`) |
-| `static.endpoints[].priority` | int | No | Selection priority (higher=preferred) |
+| `static.endpoints[].priority` | int | No | Selection priority (higher=preferred, default: `100`) |
 | `static.endpoints[].preserve_path` | bool | No | Preserve base path in URL when proxying (default: `false`) |
 | `static.endpoints[].health_check_url` | string | No | Health check path (optional, uses profile default if not specified) |
 | `static.endpoints[].model_url` | string | No | Model discovery path (optional, uses profile default if not specified) |
-| `static.endpoints[].check_interval` | duration | No | Health check interval |
-| `static.endpoints[].check_timeout` | duration | No | Health check timeout |
+| `static.endpoints[].check_interval` | duration | No | Health check interval (default: `5s`) |
+| `static.endpoints[].check_timeout` | duration | No | Health check timeout (default: `2s`) |
 | `static.endpoints[].model_filter` | object | No | Model filtering for this endpoint |
 
 #### URL Configuration
@@ -377,7 +377,7 @@ discovery:
 | `model_discovery.timeout` | duration | `30s` | Discovery timeout |
 | `model_discovery.concurrent_workers` | int | `5` | Parallel workers |
 | `model_discovery.retry_attempts` | int | `3` | Retry attempts |
-| `model_discovery.retry_backoff` | duration | `5s` | Retry backoff |
+| `model_discovery.retry_backoff` | duration | `1s` | Retry backoff |
 
 Example:
 
@@ -389,7 +389,7 @@ discovery:
     timeout: 30s
     concurrent_workers: 10
     retry_attempts: 3
-    retry_backoff: 5s
+    retry_backoff: 1s
 ```
 
 ## Model Registry Configuration
@@ -455,7 +455,7 @@ model_registry:
 |-------|------|---------|-------------|
 | `unification.enabled` | bool | `true` | Enable unification |
 | `unification.stale_threshold` | duration | `24h` | Model retention time |
-| `unification.cleanup_interval` | duration | `10m` | Cleanup frequency |
+| `unification.cleanup_interval` | duration | `5m` | Cleanup frequency |
 | `unification.cache_ttl` | duration | `10m` | Cache TTL |
 
 Example:
@@ -764,7 +764,7 @@ discovery:
     timeout: 30s
     concurrent_workers: 5
     retry_attempts: 3
-    retry_backoff: 5s
+    retry_backoff: 1s
   static:
     endpoints: []
 
@@ -780,7 +780,7 @@ model_registry:
   unification:
     enabled: true
     stale_threshold: 24h
-    cleanup_interval: 10m
+    cleanup_interval: 5m
     cache_ttl: 10m
     custom_rules: []
 
@@ -814,6 +814,14 @@ Olla validates configuration on startup:
 - Ports must be in valid range (1-65535)
 - CIDR blocks must be valid
 
+Additionally, Olla's `Validate()` method catches dangerous zero or empty configuration values that would cause panics or silent failures at runtime. It runs after all config sources (file, environment overrides) have been merged, so the final state is what gets checked. The following conditions produce clear error messages at startup:
+
+- `proxy.engine` is empty
+- `proxy.load_balancer` is empty
+- `discovery.type` is empty
+- `server.port` is zero or negative
+- When `model_discovery.enabled` is `true`: `interval`, `concurrent_workers`, or `timeout` is zero
+
 ## Next Steps
 
 - [Configuration Examples](examples.md) - Common configurations

docs/content/faq.md

Lines changed: 4 additions & 4 deletions
@@ -117,18 +117,18 @@ proxy:
 
 server:
   request_limits:
-    max_body_size: 5242880 # 5MB instead of default 50MB
+    max_body_size: 5242880 # 5MB instead of default 100MB
 ```
 
 ### Models not appearing
 
-If models aren't being discovered:
+Model discovery is enabled by default. If models aren't being discovered:
 
-1. Check model discovery is enabled:
+1. Verify it hasn't been explicitly disabled in your configuration:
    ```yaml
    discovery:
      model_discovery:
-       enabled: true
+       enabled: false # Remove this line or set to true
    ```
 
 2. Verify endpoints are healthy:

docs/content/getting-started/quickstart.md

Lines changed: 2 additions & 1 deletion
@@ -58,13 +58,14 @@ discovery:
         name: "local-ollama"
         type: "ollama"
         priority: 100
-        health_check_url: "/"
 
 logging:
   level: "info"
   format: "json"
 ```
 
+Settings like `check_interval`, `check_timeout`, and `priority` are optional -- Olla provides sensible defaults for each backend type via its profile system.
+
 The rest will be from the shipped defaults.
 
 ### 2. Start Olla

docs/content/integrations/backend/ollama.md

Lines changed: 4 additions & 4 deletions
@@ -88,10 +88,10 @@ discovery:
         name: "local-ollama"
         type: "ollama"
         priority: 100
-        model_url: "/api/tags"
-        health_check_url: "/"
-        check_interval: 2s
-        check_timeout: 1s
+        model_url: "/api/tags" # optional, profile default: /api/tags
+        health_check_url: "/" # optional, profile default: /
+        check_interval: 2s # optional, default: 5s
+        check_timeout: 1s # optional, default: 2s
 ```
 
 ### Multiple Ollama Instances
