Merged
Changes from 1 commit
3 changes: 3 additions & 0 deletions inference-platforms/archgw/README.md
@@ -64,6 +64,9 @@ and anything added in Arch Gateway's [wasm filter][archgw-wasm].
instructions to run from Docker (to avoid nested docker).
* Traces come from Envoy, whose configuration is written by `archgw`. At the
moment, this hard-codes aspects including default ports.
+ * Prometheus metrics show the cluster as "openai_host" - the provider_interface
+ plus the first segment of the hostname (dots truncate the rest). The "host"
+ comes from "host.docker.internal".
* Until [this][openai-responses] resolves, don't use `--use-responses-api`.

The chat prompt was designed to be idempotent, but the results are not. You may
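The cluster naming rule described in the README hunk above can be sketched as follows. This is a minimal illustration of the rule as the README states it (provider_interface, then the first dot-separated segment of the hostname), not archgw's actual implementation. The function name `described_cluster_name` and the underscore separator logic are assumptions made for the example, inferred only from the "openai_host" value quoted in the diff.

```python
from urllib.parse import urlsplit


def described_cluster_name(provider_interface: str, base_url: str) -> str:
    """Hypothetical helper: reproduce the cluster name the README describes.

    Assumption: the name is '<provider_interface>_<first hostname segment>',
    where everything after the first dot in the hostname is dropped.
    """
    hostname = urlsplit(base_url).hostname or ""
    # "host.docker.internal" -> "host": dots truncate the rest.
    first_segment = hostname.split(".", 1)[0]
    return f"{provider_interface}_{first_segment}"


# Values taken from arch_config.yaml in this change.
print(described_cluster_name("openai", "http://host.docker.internal:11434"))
```

With the values from `arch_config.yaml`, this yields the "openai_host" label that the README and the docker-compose comment mention.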
3 changes: 2 additions & 1 deletion inference-platforms/archgw/arch_config.yaml
@@ -8,7 +8,8 @@ listeners:
timeout: 30s

llm_providers:
- - model: ollama/qwen3:0.6b
+ # We don't use 'ollama', as we want to use 'openai' provider_interface.
+ - model: local/qwen3:0.6b
provider_interface: openai
# This configuration is converted to Envoy and run inside Docker.
base_url: http://host.docker.internal:11434
3 changes: 1 addition & 2 deletions inference-platforms/archgw/docker-compose-elastic.yml
@@ -2,8 +2,7 @@ configs:
# Configuration is simplified from archgw here:
# https://github.com/katanemo/archgw/blob/main/docs/source/guides/observability/monitoring.rst
#
- # Note: The prometheus cluster name for qwen3:0.65b will shows up as '6b'
- # See https://github.com/katanemo/archgw/issues/504
+ # Note: The cluster name for openai + host.docker.internal = openai_host
prometheus-pump-config:
content: |
receivers: