
AIGatewayRoute requests fail with "response_payload_too_large" for large embeddings/responses #1662

@ganawaj

Description:

AIGatewayRoute requests fail with response_payload_too_large when backend responses are large.

After some troubleshooting, this appears to be related to the ext_proc filter configuration: the generated filter does not set max_message_size, which may be causing Envoy to fall back to a small default buffer limit when response_body_mode: BUFFERED is configured.

I came across #1213, which addressed the gRPC server-side receive limit (-maxRecvMsgSize), but I think there is a separate limit on the Envoy filter side that isn't being configured.
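
For anyone reproducing this, the generated filter config shown under Logs below can be pulled from the running proxy along these lines (a sketch; the namespace, pod label selector, and admin port 19000 are Envoy Gateway defaults as far as I know, so adjust as needed):

# Find the Envoy proxy pod for the Gateway and port-forward its admin port.
# The namespace and label are Envoy Gateway defaults -- adjust for your install.
ENVOY_POD=$(kubectl -n envoy-gateway-system get pods \
  -l gateway.envoyproxy.io/owning-gateway-name=envoy \
  -o jsonpath='{.items[0].metadata.name}')
kubectl -n envoy-gateway-system port-forward "$ENVOY_POD" 19000:19000 &

# Pull the ext_proc filter out of the config dump (19000 is the default admin port).
curl -s http://127.0.0.1:19000/config_dump | \
  jq '.. | objects | select(.name? == "envoy.filters.http.ext_proc/aigateway")'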

Expected behavior: Embedding responses (and other large AI responses) should pass through the gateway successfully

Repro steps:

  1. Deploy an AIGatewayRoute pointing to an embedding model backend (e.g., a local llama.cpp server with qwen3-embedding-8b, or text-embedding-large) and send a request through the gateway:

curl "https://<gateway-host>/v1/embeddings" \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello world", "model": "qwen3-embedding-8b"}'

Internal Server Error%

  2. Hitting the backend pod directly works:

kubectl run curl-test --rm -it --image=curlimages/curl --restart=Never -- \
  curl -s http://<backend-service>:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": "test", "model": "qwen3-embedding-8b"}' | wc -c
# Returns: 88171

  3. The same request through the gateway works with smaller embedding models:

curl "https://<gateway-host>/v1/embeddings" \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello world", "model": "text-embedding-3-small"}'

....

This is the current AIGatewayRoute; I've also tried adjusting Backend and Client Traffic Policies (a sketch of those attempts follows the route below):

apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: qwen3-embedding
  namespace: ai
spec:
  parentRefs:
    - name: envoy
      kind: Gateway
      group: gateway.networking.k8s.io
      namespace: default
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: qwen3-embedding-8b
      backendRefs:
        - name: qwen3-embedding
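
Concretely, the policy adjustments I tried were along these lines (policy names and the 4Mi value here are illustrative; as I understand it, connection.bufferLimit is what Envoy Gateway maps to Envoy's per_connection_buffer_limit_bytes on the listener and the upstream cluster, respectively):

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: ClientTrafficPolicy
metadata:
  name: envoy-buffer-limit
  namespace: default            # ClientTrafficPolicy must live in the Gateway's namespace
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: envoy
  connection:
    bufferLimit: 4Mi            # illustrative value, well above the ~88 KB embedding response
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: qwen3-embedding-buffer-limit
  namespace: ai
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: qwen3-embedding     # HTTPRoute generated from the AIGatewayRoute (name assumed to match)
  connection:
    bufferLimit: 4Mi

ClientTrafficPolicy covers the downstream/listener side and BackendTrafficPolicy the upstream/cluster side; neither variant changed the behavior for me.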

Environment:

Envoy AI Gateway: v0.4.0 (docker.io/envoyproxy/ai-gateway-extproc:v0.4.0)
Envoy Gateway: v1.6.1
Envoy Proxy: v1.36.3
Kubernetes: Talos Linux

Logs:

envoy {":authority":"qwen3-embedding-8b-gguf.ai.svc.cluster.local","bytes_received":55,"bytes_sent":21,"connection_termination_details":null,"downstream_local_address":"10.158.1.92:10443","downstream_remote_address":"10.4.16.75:55637","duration":100,"method":"POST","protocol":"HTTP/2","requested_server_name":"ai.juanah.net","response_code":500,"response_code_details":"response_payload_too_large","response_flags":"-","route_name":"httproute/ai/qwen3-embedding/rule/0/match/0/*_juanah_net","start_time":"2025-12-16T11:56:37.259Z","upstream_cluster":"httproute/ai/qwen3-embedding/rule/0","upstream_host":"10.242.112.4:8080","upstream_local_address":"10.158.1.92:43580","upstream_transport_failure_reason":null,"user-agent":"curl/8.7.1","x-envoy-origin-path":"/v1/embeddings","x-envoy-upstream-service-time":null,"x-forwarded-for":"10.4.16.75","x-request-id":"dd9fc6d6-c557-4729-bff8-107306ed85f8"}
envoy {":authority":"api.openai.com","bytes_received":59,"bytes_sent":21,"connection_termination_details":null,"downstream_local_address":"10.158.1.92:10443","downstream_remote_address":"10.4.16.75:55653","duration":1018,"method":"POST","protocol":"HTTP/2","requested_server_name":"ai.juanah.net","response_code":500,"response_code_details":"response_payload_too_large","response_flags":"-","route_name":"httproute/openai/openai-reasoning/rule/6/match/0/*_juanah_net","start_time":"2025-12-16T11:56:48.116Z","upstream_cluster":"httproute/openai/openai-reasoning/rule/6","upstream_host":"162.159.140.245:443","upstream_local_address":"10.158.1.92:49948","upstream_transport_failure_reason":null,"user-agent":"curl/8.7.1","x-envoy-origin-path":"/v1/embeddings","x-envoy-upstream-service-time":null,"x-forwarded-for":"10.4.16.75","x-request-id":"7e4f4f34-ae20-4fd3-857c-04f3c16ea242"}
ai-gateway-extproc 2025/12/16 11:56:57 traces export: Post "http://phoenix-svc.phoenix.svc.cluster.local:6006/v1/traces": dial tcp: lookup phoenix-svc.phoenix.svc.cluster.local on 10.242.0.10:53: no such host
envoy {":authority":"api.openai.com","bytes_received":59,"bytes_sent":33244,"connection_termination_details":null,"downstream_local_address":"10.158.1.92:10443","downstream_remote_address":"10.4.16.75:55655","duration":991,"method":"POST","protocol":"HTTP/2","requested_server_name":"ai.juanah.net","response_code":200,"response_code_details":"via_upstream","response_flags":"-","route_name":"httproute/openai/openai-reasoning/rule/5/match/0/*_juanah_net","start_time":"2025-12-16T11:56:53.357Z","upstream_cluster":"httproute/openai/openai-reasoning/rule/5","upstream_host":"172.66.0.243:443","upstream_local_address":"10.158.1.92:47964","upstream_transport_failure_reason":null,"user-agent":"curl/8.7.1","x-envoy-origin-path":"/v1/embeddings","x-envoy-upstream-service-time":"430","x-forwarded-for":"10.4.16.75","x-request-id":"ea649416-819c-4c45-9269-6b7178bb5db2"}

Generated ext_proc filter configuration (note that max_message_size is not set):

{
  "name": "envoy.filters.http.ext_proc/aigateway",
  "typed_config": {
    "@type": "type.googleapis.com/envoy.extensions.filters.http.ext_proc.v3.ExternalProcessor",
    "grpc_service": {
      "envoy_grpc": {
        "cluster_name": "ai-gateway-extproc-uds"
      }
    },
    "processing_mode": {
      "request_header_mode": "SEND",
      "response_header_mode": "SEND",
      "request_body_mode": "BUFFERED",
      "response_body_mode": "BUFFERED",
      "request_trailer_mode": "SKIP",
      "response_trailer_mode": "SKIP"
    },
    "message_timeout": "10s",
    "allow_mode_override": true
  }
}
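
Related to the buffer-limit hypothesis above, the effective per-connection buffer limits on the generated listener and cluster can be checked from the same config_dump (a rough sketch; the jq query assumes the field appears wherever it is explicitly set, and as far as I can tell these are the limits that BUFFERED mode buffers against):

# Reusing the admin port-forward from above: list every object in the config
# dump that sets per_connection_buffer_limit_bytes (listeners and clusters).
# If nothing shows up, the limit was left at Envoy's built-in default.
curl -s http://127.0.0.1:19000/config_dump | \
  jq '[.. | objects | select(has("per_connection_buffer_limit_bytes"))
       | {name: (.name? // "unnamed"), per_connection_buffer_limit_bytes}]'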

Labels: configuration (Envoy Proxy Configuration Related), question (Further information is requested)
