Description:
AIGatewayRoute requests fail with response_payload_too_large when backend responses are large.
After some troubleshooting, this appears to be related to the ext_proc filter configuration: the generated filter does not set max_message_size, which may cause Envoy to fall back to a small default buffer limit when response_body_mode: BUFFERED is configured.
I came across #1213, which addressed the gRPC server-side receive limit (-maxRecvMsgSize), but I believe there is a separate limit on the Envoy filter side that is not being configured.
Expected behavior: Embedding responses (and other large AI responses) should pass through the gateway successfully.
Repro steps:
- Deploy an AIGatewayRoute pointing to an embedding model backend (e.g., local llama.cpp server with qwen3-embedding-8b, or text-embedding-large)
curl "https://<gateway-host>/v1/embeddings" \
-H "Content-Type: application/json" \
-d '{"input": "Hello world", "model": "qwen3-embedding-8b"}'
# Returns: Internal Server Error
- Going direct to the pod, this works:
kubectl run curl-test --rm -it --image=curlimages/curl --restart=Never -- \
curl -s http://<backend-service>:8080/v1/embeddings \
-H "Content-Type: application/json" \
-d '{"input": "test", "model": "qwen3-embedding-8b"}' | wc -c
# Returns: 88171
- With smaller embedding models, this works:
curl "https://<gateway-host>/v1/embeddings" \
-H "Content-Type: application/json" \
-d '{"input": "Hello world", "model": "text-embedding-3-small"}'
....
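For scale, a back-of-envelope sketch of why the large model's response blows past a small buffer while the small one doesn't (the dimension counts, 4096 for qwen3-embedding-8b and 1536 for text-embedding-3-small, are my assumptions):

```python
import json

def embedding_response_size(dims: int) -> int:
    # Mimic an OpenAI-style /v1/embeddings response body with a single
    # vector of `dims` float components and measure its serialized size.
    body = {
        "object": "list",
        "data": [{
            "object": "embedding",
            "index": 0,
            "embedding": [0.0123456789012345] * dims,
        }],
        "model": "qwen3-embedding-8b",
        "usage": {"prompt_tokens": 2, "total_tokens": 2},
    }
    return len(json.dumps(body).encode())

print(embedding_response_size(4096))  # ~82 kB, in line with the 88171 bytes observed
print(embedding_response_size(1536))  # ~31 kB
```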
This is the current AIGatewayRoute; I have also tried adjusting BackendTrafficPolicy and ClientTrafficPolicy resources:
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: qwen3-embedding
  namespace: ai
spec:
  parentRefs:
    - name: envoy
      kind: Gateway
      group: gateway.networking.k8s.io
      namespace: default
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: qwen3-embedding-8b
      backendRefs:
        - name: qwen3-embedding

Environment:
Envoy AI Gateway: v0.4.0 (docker.io/envoyproxy/ai-gateway-extproc:v0.4.0)
Envoy Gateway: v1.6.1
Envoy Proxy: v1.36.3
Kubernetes: Talos Linux
Logs:
envoy {":authority":"qwen3-embedding-8b-gguf.ai.svc.cluster.local","bytes_received":55,"bytes_sent":21,"connection_termination_details":null,"downstream_local_address":"10.158.1.92:10443","downstream_remote_address":"10.4.16.75:55637","duration":100,"method":"POST","protocol":"HTTP/2","requested_server_name":"ai.juanah.net","response_code":500,"response_code_details":"response_payload_too_large","response_flags":"-","route_name":"httproute/ai/qwen3-embedding/rule/0/match/0/*_juanah_net","start_time":"2025-12-16T11:56:37.259Z","upstream_cluster":"httproute/ai/qwen3-embedding/rule/0","upstream_host":"10.242.112.4:8080","upstream_local_address":"10.158.1.92:43580","upstream_transport_failure_reason":null,"user-agent":"curl/8.7.1","x-envoy-origin-path":"/v1/embeddings","x-envoy-upstream-service-time":null,"x-forwarded-for":"10.4.16.75","x-request-id":"dd9fc6d6-c557-4729-bff8-107306ed85f8"}
envoy {":authority":"api.openai.com","bytes_received":59,"bytes_sent":21,"connection_termination_details":null,"downstream_local_address":"10.158.1.92:10443","downstream_remote_address":"10.4.16.75:55653","duration":1018,"method":"POST","protocol":"HTTP/2","requested_server_name":"ai.juanah.net","response_code":500,"response_code_details":"response_payload_too_large","response_flags":"-","route_name":"httproute/openai/openai-reasoning/rule/6/match/0/*_juanah_net","start_time":"2025-12-16T11:56:48.116Z","upstream_cluster":"httproute/openai/openai-reasoning/rule/6","upstream_host":"162.159.140.245:443","upstream_local_address":"10.158.1.92:49948","upstream_transport_failure_reason":null,"user-agent":"curl/8.7.1","x-envoy-origin-path":"/v1/embeddings","x-envoy-upstream-service-time":null,"x-forwarded-for":"10.4.16.75","x-request-id":"7e4f4f34-ae20-4fd3-857c-04f3c16ea242"}
ai-gateway-extproc 2025/12/16 11:56:57 traces export: Post "http://phoenix-svc.phoenix.svc.cluster.local:6006/v1/traces": dial tcp: lookup phoenix-svc.phoenix.svc.cluster.local on 10.242.0.10:53: no such host
envoy {":authority":"api.openai.com","bytes_received":59,"bytes_sent":33244,"connection_termination_details":null,"downstream_local_address":"10.158.1.92:10443","downstream_remote_address":"10.4.16.75:55655","duration":991,"method":"POST","protocol":"HTTP/2","requested_server_name":"ai.juanah.net","response_code":200,"response_code_details":"via_upstream","response_flags":"-","route_name":"httproute/openai/openai-reasoning/rule/5/match/0/*_juanah_net","start_time":"2025-12-16T11:56:53.357Z","upstream_cluster":"httproute/openai/openai-reasoning/rule/5","upstream_host":"172.66.0.243:443","upstream_local_address":"10.158.1.92:47964","upstream_transport_failure_reason":null,"user-agent":"curl/8.7.1","x-envoy-origin-path":"/v1/embeddings","x-envoy-upstream-service-time":"430","x-forwarded-for":"10.4.16.75","x-request-id":"ea649416-819c-4c45-9269-6b7178bb5db2"}
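Both failing entries above share response_code_details: response_payload_too_large. A small helper (my own, not part of any tooling) to pull those entries out of the JSON access logs:

```python
import json

def payload_too_large_entries(lines):
    """Yield (route_name, upstream_host, bytes_sent) for access-log
    entries that failed with response_payload_too_large."""
    for line in lines:
        # Lines here are one JSON object each, prefixed with the
        # container name (e.g. "envoy {...}"), so skip to the first brace.
        start = line.find("{")
        if start == -1:
            continue
        try:
            entry = json.loads(line[start:])
        except json.JSONDecodeError:
            continue
        if entry.get("response_code_details") == "response_payload_too_large":
            yield entry["route_name"], entry["upstream_host"], entry["bytes_sent"]

sample = ['envoy {"response_code_details": "response_payload_too_large", '
          '"route_name": "httproute/ai/qwen3-embedding/rule/0/match/0", '
          '"upstream_host": "10.242.112.4:8080", "bytes_sent": 21}']
for route, host, sent in payload_too_large_entries(sample):
    print(route, host, sent)
```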
For reference, the generated ext_proc filter config (note that max_message_size is not set):
{
"name": "envoy.filters.http.ext_proc/aigateway",
"typed_config": {
"@type": "type.googleapis.com/envoy.extensions.filters.http.ext_proc.v3.ExternalProcessor",
"grpc_service": {
"envoy_grpc": {
"cluster_name": "ai-gateway-extproc-uds"
}
},
"processing_mode": {
"request_header_mode": "SEND",
"response_header_mode": "SEND",
"request_body_mode": "BUFFERED",
"response_body_mode": "BUFFERED",
"request_trailer_mode": "SKIP",
"response_trailer_mode": "SKIP"
},
"message_timeout": "10s",
"allow_mode_override": true
}
}
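For completeness, this is the shape of the ClientTrafficPolicy buffer-limit tweak I tried (field names per my reading of the Envoy Gateway API; spec.connection.bufferLimit may not be the relevant knob here if the ext_proc buffering limit is separate):

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: ClientTrafficPolicy
metadata:
  name: large-buffer
  namespace: default
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: envoy
  connection:
    bufferLimit: 2Mi   # intended to raise the listener's per-connection buffer limit
```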