-
Notifications
You must be signed in to change notification settings - Fork 521
Description
Description:
We noticed that envoy CPU spikes whenever a pod restarts. This worsens when multiple deployments roll out together. The perf output shows most of these CPU spent on parsing and compiling CORS regex. This spike persists even zero traffic sent to the envoy.

We can re-produce the same issue in a test environment with a gateway and multiple Httproutes configured in different namespaces (see details below). After digging into a little deeper, we identified the following issues/facts.
- Every pod, with at least an Httproute, restarts triggers a full route configuration reload. It also re-compiles paths and cores domains with regex patterns.
- We can confirm this by watching
Envoy::Regex::Utility::parseRegex
calls. see screenshots. - We also noticed that, after every pod restarts, the order of the route list changes. See attached screenshots; 1 = before first restart, 2 = after first pod restart, 3 = after second pod restarts.
- Applying Patchpolicy consistently results in this behavior (See the attached sample config).
- I can see this behaviour without patch policy when envoy proxy is newly started. But it is reproducible only one or two times. Not sure if too many pod churn can produce this.
I think this is the reason envoy spending too much CPU in the CORS re-compile. In our setup, we have 400+ routes and most of them are prefix or domain match. But we have 15+ regex based global cors domains. This means, 6000+ regex compilation on every pod restarts.

Repro steps:
- Apply a patch policy on
type.googleapis.com/envoy.config.listener.v3.Listener
- Create a single Gateway config and allow cross-namespace reference.
- Create multiple http routes in multiple namespaces with no domain names(i.e. *). Note: It should have regex pattern in few routes(or a global cors with wildcard match) to watch via bpftrace.
- Take config dump
- Restart one of the pod from any namespaces.
- Wait till new pod is back and endpoint is updated.
- Take second config dump
- Compare route order in config dumps.
How to watch via bpftrace?
You can run following onliner in host machine to list total Envoy::Regex::Utility::parseRegex
call made after a pod restarts.
bpftrace --unsafe -e ‘BEGIN{print(“Probing Envoy::Regex::Utility::parseRegex call”)} uprobe:/proc/<pid>/root/usr/local/bin/envoy:0x2087a00 { @total = count();print(“call”)}’ 2>/dev/null
Note that, the production image of envoy has a stripped binary and you need to calculate offset of fucntion manually from corresponding debug image. The offset 0x2087a00
may only work with envoyproxy/envoy:contrib-v1.32.5
.
Alternatively, you can use debug image itself as proxy image and run bpftrace -l <path>|grep parseRegex
to list the mangled symbol.
Environment
Gateway version: v1.2.7
Proxy version/image: envoyproxy/envoy:contrib-v1.32.5
Sample config: