
AIGatewayRoute requests fail with "response_payload_too_large" for large embeddings/responses #1662

@ganawaj

Description:

AIGatewayRoute requests fail with response_payload_too_large when backend responses are large.

After some troubleshooting, this appears to be related to the ext_proc filter configuration: the generated filter does not set max_message_size, which may be causing Envoy to fall back to a small default buffer limit when response_body_mode: BUFFERED is configured.

I came across #1213, which addressed the gRPC server-side receive limit (-maxRecvMsgSize), but I think there is a separate limit on the Envoy filter side that isn't being configured.
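
For anyone reproducing this, the generated filter config shown under Logs below can be pulled from the running proxy along these lines (a sketch; the namespace, pod label selector, and admin port 19000 are Envoy Gateway defaults as far as I know, so adjust as needed):

# Find the Envoy proxy pod for the Gateway and port-forward its admin port.
# The namespace and label are Envoy Gateway defaults -- adjust for your install.
ENVOY_POD=$(kubectl -n envoy-gateway-system get pods \
  -l gateway.envoyproxy.io/owning-gateway-name=envoy \
  -o jsonpath='{.items[0].metadata.name}')
kubectl -n envoy-gateway-system port-forward "$ENVOY_POD" 19000:19000 &

# Pull the ext_proc filter out of the config dump (19000 is the default admin port).
curl -s http://127.0.0.1:19000/config_dump | \
  jq '.. | objects | select(.name? == "envoy.filters.http.ext_proc/aigateway")'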

Expected behavior: Embedding responses (and other large AI responses) should pass through the gateway successfully

Repro steps:

  1. Deploy an AIGatewayRoute pointing to an embedding model backend (e.g., a local llama.cpp server with qwen3-embedding-8b, or text-embedding-large) and send a request through the gateway:

curl "https://<gateway-host>/v1/embeddings" \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello world", "model": "qwen3-embedding-8b"}'

Internal Server Error%

  2. Hitting the backend pod directly works:

kubectl run curl-test --rm -it --image=curlimages/curl --restart=Never -- \
  curl -s http://<backend-service>:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": "test", "model": "qwen3-embedding-8b"}' | wc -c
# Returns: 88171

  3. The same request through the gateway works with smaller embedding models:

curl "https://<gateway-host>/v1/embeddings" \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello world", "model": "text-embedding-3-small"}'

....

This is the current AIGatewayRoute; I've also tried adjusting Backend and Client Traffic Policies (a sketch of those attempts follows the route below):

apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: qwen3-embedding
  namespace: ai
spec:
  parentRefs:
    - name: envoy
      kind: Gateway
      group: gateway.networking.k8s.io
      namespace: default
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: qwen3-embedding-8b
      backendRefs:
        - name: qwen3-embedding
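
Concretely, the policy adjustments I tried were along these lines (policy names and the 4Mi value here are illustrative; as I understand it, connection.bufferLimit is what Envoy Gateway maps to Envoy's per_connection_buffer_limit_bytes on the listener and the upstream cluster, respectively):

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: ClientTrafficPolicy
metadata:
  name: envoy-buffer-limit
  namespace: default            # ClientTrafficPolicy must live in the Gateway's namespace
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: envoy
  connection:
    bufferLimit: 4Mi            # illustrative value, well above the ~88 KB embedding response
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: qwen3-embedding-buffer-limit
  namespace: ai
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: qwen3-embedding     # HTTPRoute generated from the AIGatewayRoute (name assumed to match)
  connection:
    bufferLimit: 4Mi

ClientTrafficPolicy covers the downstream/listener side and BackendTrafficPolicy the upstream/cluster side; neither variant changed the behavior for me.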

Environment:

Envoy AI Gateway: v0.4.0 (docker.io/envoyproxy/ai-gateway-extproc:v0.4.0)
Envoy Gateway: v1.6.1
Envoy Proxy: v1.36.3
Kubernetes: Talos Linux

Logs:

envoy {":authority":"qwen3-embedding-8b-gguf.ai.svc.cluster.local","bytes_received":55,"bytes_sent":21,"connection_termination_details":null,"downstream_local_address":"10.158.1.92:10443","downstream_remote_address":"10.4.16.75:55637","duration":100,"method":"POST","protocol":"HTTP/2","requested_server_name":"ai.juanah.net","response_code":500,"response_code_details":"response_payload_too_large","response_flags":"-","route_name":"httproute/ai/qwen3-embedding/rule/0/match/0/*_juanah_net","start_time":"2025-12-16T11:56:37.259Z","upstream_cluster":"httproute/ai/qwen3-embedding/rule/0","upstream_host":"10.242.112.4:8080","upstream_local_address":"10.158.1.92:43580","upstream_transport_failure_reason":null,"user-agent":"curl/8.7.1","x-envoy-origin-path":"/v1/embeddings","x-envoy-upstream-service-time":null,"x-forwarded-for":"10.4.16.75","x-request-id":"dd9fc6d6-c557-4729-bff8-107306ed85f8"}
envoy {":authority":"api.openai.com","bytes_received":59,"bytes_sent":21,"connection_termination_details":null,"downstream_local_address":"10.158.1.92:10443","downstream_remote_address":"10.4.16.75:55653","duration":1018,"method":"POST","protocol":"HTTP/2","requested_server_name":"ai.juanah.net","response_code":500,"response_code_details":"response_payload_too_large","response_flags":"-","route_name":"httproute/openai/openai-reasoning/rule/6/match/0/*_juanah_net","start_time":"2025-12-16T11:56:48.116Z","upstream_cluster":"httproute/openai/openai-reasoning/rule/6","upstream_host":"162.159.140.245:443","upstream_local_address":"10.158.1.92:49948","upstream_transport_failure_reason":null,"user-agent":"curl/8.7.1","x-envoy-origin-path":"/v1/embeddings","x-envoy-upstream-service-time":null,"x-forwarded-for":"10.4.16.75","x-request-id":"7e4f4f34-ae20-4fd3-857c-04f3c16ea242"}
ai-gateway-extproc 2025/12/16 11:56:57 traces export: Post "http://phoenix-svc.phoenix.svc.cluster.local:6006/v1/traces": dial tcp: lookup phoenix-svc.phoenix.svc.cluster.local on 10.242.0.10:53: no such host
envoy {":authority":"api.openai.com","bytes_received":59,"bytes_sent":33244,"connection_termination_details":null,"downstream_local_address":"10.158.1.92:10443","downstream_remote_address":"10.4.16.75:55655","duration":991,"method":"POST","protocol":"HTTP/2","requested_server_name":"ai.juanah.net","response_code":200,"response_code_details":"via_upstream","response_flags":"-","route_name":"httproute/openai/openai-reasoning/rule/5/match/0/*_juanah_net","start_time":"2025-12-16T11:56:53.357Z","upstream_cluster":"httproute/openai/openai-reasoning/rule/5","upstream_host":"172.66.0.243:443","upstream_local_address":"10.158.1.92:47964","upstream_transport_failure_reason":null,"user-agent":"curl/8.7.1","x-envoy-origin-path":"/v1/embeddings","x-envoy-upstream-service-time":"430","x-forwarded-for":"10.4.16.75","x-request-id":"ea649416-819c-4c45-9269-6b7178bb5db2"}

Generated ext_proc filter configuration (note that max_message_size is not set):

{
  "name": "envoy.filters.http.ext_proc/aigateway",
  "typed_config": {
    "@type": "type.googleapis.com/envoy.extensions.filters.http.ext_proc.v3.ExternalProcessor",
    "grpc_service": {
      "envoy_grpc": {
        "cluster_name": "ai-gateway-extproc-uds"
      }
    },
    "processing_mode": {
      "request_header_mode": "SEND",
      "response_header_mode": "SEND",
      "request_body_mode": "BUFFERED",
      "response_body_mode": "BUFFERED",
      "request_trailer_mode": "SKIP",
      "response_trailer_mode": "SKIP"
    },
    "message_timeout": "10s",
    "allow_mode_override": true
  }
}
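
Related to the buffer-limit hypothesis above, the effective per-connection buffer limits on the generated listener and cluster can be checked from the same config_dump (a rough sketch; the jq query assumes the field appears wherever it is explicitly set, and as far as I can tell these are the limits that BUFFERED mode buffers against):

# Reusing the admin port-forward from above: list every object in the config
# dump that sets per_connection_buffer_limit_bytes (listeners and clusters).
# If nothing shows up, the limit was left at Envoy's built-in default.
curl -s http://127.0.0.1:19000/config_dump | \
  jq '[.. | objects | select(has("per_connection_buffer_limit_bytes"))
       | {name: (.name? // "unnamed"), per_connection_buffer_limit_bytes}]'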

Labels: configuration (Envoy Proxy Configuration Related), question (Further information is requested)
