
ext_authz cache #37953

Closed
wants to merge 15 commits into from

Conversation

naoitoi-v

Commit Message: This is a draft PR to discuss the design of the response cache in the ext_authz filter.

We are seeking feedback on the design to make sure this is headed in the right direction.

Similar requests were discussed in #37449 and #3023

Additional Description:

Purpose:
We would like to cache the response from an external authorization server in the ext_authz filter for scalability. Depending on the use case, if you do not have to check the authentication token on every request, you can reduce the number of calls to the server by caching the response from it.

Functionality:
To use this functionality, you need to configure the name(s) of the header(s) in which the auth token is carried, e.g., "Authorization" or "x-auth-token." You can also configure the max cache size and TTL. When this is configured, ext_authz will cache the response from the external authorization server.

Implementation:
We implemented a simple cache. Responses are stored in a plain unordered_map. We chose a simple FIFO eviction policy over LRU to keep the cache's memory footprint small. In our benchmark, each cache key (35 bytes) needs ~100 bytes of memory.

We store a response in the cache in the onComplete() method.
We read a cached response in the decodeHeaders() method.
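The store/read flow above can be sketched as a minimal, single-threaded model. All names here (ResponseCache, insert, lookup) are illustrative, not the PR's actual API, and the real filter additionally needs thread-safe access:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Toy model of the cache described above: statuses keyed by the auth-token
// header value, each with an absolute expiry computed from a configured TTL.
class ResponseCache {
public:
  ResponseCache(size_t max_size, std::chrono::seconds ttl)
      : max_size_(max_size), ttl_(ttl) {}

  // Called on the onComplete() path: remember the authz status for this token.
  void insert(const std::string& token, uint16_t status,
              std::chrono::steady_clock::time_point now) {
    if (entries_.size() >= max_size_ && entries_.find(token) == entries_.end()) {
      // Simplest possible eviction: drop whatever bucket iteration yields
      // first. (As the review notes below, this is not FIFO.)
      entries_.erase(entries_.begin());
    }
    entries_[token] = Entry{status, now + ttl_};
  }

  // Called on the decodeHeaders() path: return the status if still fresh.
  std::optional<uint16_t> lookup(const std::string& token,
                                 std::chrono::steady_clock::time_point now) {
    auto it = entries_.find(token);
    if (it == entries_.end() || now >= it->second.expiry) {
      return std::nullopt; // miss or expired: fall through to the side-call
    }
    return it->second.status;
  }

private:
  struct Entry {
    uint16_t status;
    std::chrono::steady_clock::time_point expiry;
  };
  size_t max_size_;
  std::chrono::seconds ttl_;
  std::unordered_map<std::string, Entry> entries_;
};
```

On a hit, the filter can short-circuit the external check; on a miss or expiry, it performs the normal side-call and inserts the result.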

Next Steps:

  • Write tests.
  • Make sure the code complies with the coding style.
  • Do pre-commit checks.
  • May add an option to NOT cache 4xx errors to avoid cache poisoning.
  • May add an option to inject randomness in TTL to avoid synchronization.

Risk Level: Low. This functionality would be turned off by default.

Testing: TBD
Docs Changes:
Release Notes:
Platform Specific Features:


Hi @naoitoi-v, welcome and thank you for your contribution.

We will try to review your Pull Request as quickly as possible.

In the meantime, please take a look at the contribution guidelines if you have not done so already.

🐱

Caused by: #37953 was opened by naoitoi-v.



As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

🐱

Caused by: #37953 was opened by naoitoi-v.



CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @markdroth
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).

🐱

Caused by: #37953 was opened by naoitoi-v.


@naoitoi-v naoitoi-v changed the title Ext authz cache ext_authz cache Jan 9, 2025
@ggreenway
Member

Some early feedback:

  • This doesn't do FIFO eviction. I think the evicted element is the one with the lowest hash value. You'll need additional bookkeeping to know the order in which the elements were inserted.
  • The semantics of having multiple possible headers that contain the cache key are confusing, and possibly insecure.
  • You'll need to refactor all of this to conform to Envoy style and norms.

@adisuissa
Contributor

A drive-by suggestion: it may be beneficial to generalize the caching for both ext_authz and ext_proc.
I suggest creating the cache configuration (and implementation) separately, and using it only for ext_authz in the first step.

@ggreenway
Member

Related to what Adi said, I think you need to cache the response message and replay it on each request. It may add or remove headers, etc. Just looking at the status code is insufficient.

@naoitoi-v
Author

Thank you for your early and insightful feedback, @ggreenway, @adisuissa.

  1. This cache eviction implementation is not FIFO.

Great point!

In our (Verkada's) use case, it's important to keep the memory footprint as small as possible. We expect few evictions. We'll change this implementation to random eviction.

In the future, there may be users who want more sophisticated eviction policies (e.g., LRU, random sampling-based LRU). We'll restructure the code so that one can provide a different cache implementation.

  2. Looking for auth tokens in multiple headers

Good point. For now, we'll make it look at only one header.

Our use case may require it to look at multiple headers for legacy reasons. If we have to go that route, we'll use header + auth token as the cache key, so we won't mix auth tokens from multiple headers.

  3. Compliance with the Envoy coding style

Yes, we started this effort, fixing the issues pointed out by //tools/code_format:check_format and //tools/code:check .

  4. Detaching the cache config and code from ext_authz so that it can be reused in ext_proc

This makes sense. We'll try it.

  5. Storing the response message and headers

This makes sense.

In our use case, we'd only need the status code. It's important to minimize the memory footprint.

We'll write the code to store and replay the response messages and all the headers. We'll make this configurable so that the user can decide whether or not to store the response & headers.


Thanks for the great discussion, again. We'll keep you updated on the progress.

Nao

@naoitoi-v
Author

I took care of (3) coding style.

@naoitoi-v
Author

I took care of (2) Looking for auth tokens in multiple headers.
Now it reads only one header.

@naoitoi-v
Author

I added an eviction mechanism to address (1).

It emulates LRU by:

  1. Advancing the iterator over the cache randomly
  2. Reading 1000 items to determine the average TTL
  3. Looking at some items (by default, 1% of the whole cache) and deleting those whose TTLs are below a fraction (by default, 50%) of the average

With these defaults, it was able to complete eviction in around 1ms in my testing (running in Docker, 12 CPUs and 24GB memory, on a MacBook Pro).
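The three steps above can be sketched roughly as follows. This is a hypothetical model over a map of key → absolute expiry; parameter names mirror the ones in the discussion, but the real code also holds a mutex, and this sketch stops at end() instead of wrapping around the map during the eviction scan:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <iterator>
#include <random>
#include <string>
#include <unordered_map>

using Clock = std::chrono::steady_clock;
using Expiry = Clock::time_point;

// Sampling-based eviction sketch: estimate the average remaining TTL from a
// sample, then delete scanned entries below a fraction of that average.
size_t evict(std::unordered_map<std::string, Expiry>& cache, Clock::time_point now,
             double candidate_ratio = 0.01, double threshold_ratio = 0.5,
             size_t ttl_sample = 1000) {
  if (cache.empty()) {
    return 0;
  }

  // Step 1: advance the iterator to a random position, so eviction does not
  // always start at begin() (the bias pointed out in the review).
  std::mt19937 rng{std::random_device{}()};
  auto it = cache.begin();
  std::advance(it, std::uniform_int_distribution<size_t>(0, cache.size() - 1)(rng));

  // Step 2: sample up to ttl_sample entries to estimate the average remaining TTL.
  Clock::duration total{0};
  size_t sampled = 0;
  for (auto probe = it; sampled < std::min(ttl_sample, cache.size()); ++sampled) {
    total += probe->second - now;
    if (++probe == cache.end()) {
      probe = cache.begin();
    }
  }
  const auto threshold = (total / static_cast<int64_t>(sampled)) * threshold_ratio;

  // Step 3: scan a fraction of the cache and delete entries whose remaining
  // TTL is below threshold_ratio of the estimated average.
  const size_t candidates =
      std::max<size_t>(1, static_cast<size_t>(cache.size() * candidate_ratio));
  size_t evicted = 0;
  for (size_t i = 0; i < candidates && it != cache.end(); ++i) {
    if (it->second - now < threshold) {
      it = cache.erase(it); // erase() returns the next valid iterator
      ++evicted;
    } else {
      ++it;
    }
  }
  return evicted;
}
```

Note that both std::advance and the sampling pass are O(n) walks over the hash map, which is the CPU-cache cost raised later in this thread.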

@naoitoi-v
Author

Moved the cache code to a common location to address (4).

uint32 response_cache_ttl = 31;

// Array of header names. ext_authz will use the first header that appears in the HTTP request as the cache key.
string response_cache_header_name = 32;
Contributor

This seems to imply that in order for the ext_authz caching to function, the data plane request needs to have a particular header. This seems sub-optimal, because it basically requires the sending application to have all of the logic needed to set this header.

In practice, I think the cache key will wind up needing to be a combination of various attributes of the data plane request, but the exact set of those attributes is going to depend primarily on the ext_authz filter config, which determines which attributes of the data plane request to include in the ext_authz request. (It may also depend on the ext_authz server itself, which might only use a subset of the fields in the ext_authz request, but that will probably be reflected in the ext_authz filter config in most cases.) So this would basically mean that the application would need logic to determine which attributes of the data plane request to take into account as part of the cache key, thus duplicating the logic in the ext_authz filter config -- which would mean that the ext_authz filter config can't be changed later without also changing the application logic.

I think it would make more sense to have some matching rules here that dictate the cache key as a function of the attributes of the data plane request. That way, all of the logic is self-contained in the ext_authz filter config and can be changed as needed, without breaking the application.

Author

Thank you, @markdroth

I'd like to make sure I understand you correctly.

I believe the problem you are raising is that, currently, the cache is configured by specifying one header.

I have looked into Matcher: https://www.envoyproxy.io/docs/envoy/latest/xds/type/matcher/v3/matcher.proto#envoy-v3-api-msg-xds-type-matcher-v3-matcher

Matcher can express the choice of cache key very richly. For example, we can change the proto file like this:

message ExtAuthz {
  // Other fields...

  // Configuration for the response cache matcher.
  envoy.type.matcher.v3.Matcher response_cache_matcher = 32;
}

Then, we can give a configuration like this:

resp_cache:
  enabled: true
  response_cache_matcher:
    matcher_tree:
      input:
        name: request-headers
        typed_config:
          "@type": type.googleapis.com/envoy.type.matcher.v3.HttpRequestHeaderMatchInput
          header_name: "auth-token"
      exact_match_map:
        map:
          ".*":
            action:
              name: "auth-token"
This would let ext_authz use a header named "auth-token" as the cache key.

In addition, configuration can be changed to use combinations of a lot more parameters - path, query parameters, metadata, etc.

Would this address your concern?

Thank you,

Nao

@duckya Jan 21, 2025

I agree with Mark; this is a much more flexible design, which allows us to build the cache key from more than just a header field. E.g., we can define a cache policy like the one below:

message RequestCachingPolicy {
  // Specifies how the cache key should be built.
  CacheKeyBuilder key_builder = 1;

  // Specifies the default expiration policy for the cache entries.
  // It can be overridden by an individual policy service by setting the
  // TTLs field in the ext_authz response.
  CacheExpirationPolicy expiration_policy = 2;
}

message CacheKeyBuilder {
  // A list of CEL expressions used to build a composite cache key.
  // An empty list indicates no caching.
  // The CEL expressions will be evaluated against the Envoy attributes.
  repeated string cel_expressions = 1
      [(datapol.semantic_type) = ST_NOT_REQUIRED];
}

message CacheExpirationPolicy {
  // Specifies the cache staleness TTL for all cache entries unless it is
  // overwritten by the CacheExpirationPolicy in the ext_authz response.
  google.protobuf.Duration max_age = 1;
}

A sample policy would look like:

caching_policy_matchers:
  matcher_list:
    matchers:
    # Requests with header["x-goog-policy-set-name"]='something' will
    # have a cache key built from the "x-goog-proxy-auth-bin" and
    # "x-goog-request-path" headers.
    - predicate:
        single_predicate:
          input:
            typed_config:
              '@type': type.googleapis.com/envoy.type.matcher.v3.HttpRequestHeaderMatchInput
              header_name: x-goog-policy-set-name
          value_match:
            exact: something
      on_match:
        action:
          typed_config:
            '@type': type.googleapis.com/envoy.extensions.filters.http.ext_authz.v3.CachingPolicy
            key_builder:
              cel_expressions:
              - request.headers["x-goog-request-path"]
              - request.headers["x-goog-proxy-auth-bin"]
            expiration_policy:
              max_age:
                seconds: 300

Author

@markdroth

I have spent a couple of days trying to do this with CEL (Common Expression Language). It turns out to be a significant amount of work. It also made us rethink the request.

We agree that there may be different requirements for cache key creation. For example, we can imagine:

  • If header A exists, use that as cache key. If not, try header B.
  • Use regex to extract some part of header value.
  • Combine some parts of request, e.g., URI, path, query parameters.

Script-based filters (e.g., the Lua filter or WASM filters) are capable of this type of modification.

Our recommendation is to keep ext_authz's cache key simple in this PR: one configurable header. For use cases that require a custom cache key, we'd recommend using the Lua filter or other WASM filters to construct the cache key and set it in a header, which ext_authz would read from.

If this approach becomes unacceptable in some cases (e.g., for performance reasons), we can consider implementing optimization to solve that problem. Keep it simple, and revisit it when we know more about the problem.

This PR is getting bigger and I still have some ways to go, so it would be really helpful if I can limit the scope and focus on getting it done well.

Please let us know if this would work.

Thank you,

Nao

Member

@naoitoi-v it's a hassle, both to implement and for users, to change configuration after it is released. I understand that you don't want to make this PR too big, but it is important that we think through what the config will look like, both immediately and for anticipated future changes, and do something sensible.

The more we discuss this and I understand your very-specific requirements, I think the entire thing should be an extension point. I think the interface in the ext_authz filter should be kept very simple: a function call that passes in the entire request, which returns either a message CheckResponse (which can be constructed from some cached data; it doesn't need to be a copy of the response from the original ext_authz service), or a response indicating "this wasn't in the cache; do the normal side-call". And a function call to add items to cache after a successful side-call.

Then the only configuration in the ext_authz filter will be an optional core.v3.TypedExtensionConfig cache_config.

WDYT?
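The interface described above might look roughly like this. All type names here are simplified stand-ins, not Envoy's actual CheckRequest/CheckResponse protos, and HeaderKeyedCache is just one hypothetical implementation behind the extension point:

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Stand-ins for the request/response types the filter would pass through.
struct CheckRequest {
  std::map<std::string, std::string> headers;
};
struct CheckResponse {
  int status;
};

// The minimal interface sketched in the comment above: the filter only sees
// lookup()/store(); key derivation, storage, and eviction live behind it.
class ExtAuthzCache {
public:
  virtual ~ExtAuthzCache() = default;
  // Returns a response to replay, or nullopt meaning "not cached; do the
  // normal side-call".
  virtual std::optional<CheckResponse> lookup(const CheckRequest& req) = 0;
  // Called after a successful side-call to populate the cache.
  virtual void store(const CheckRequest& req, const CheckResponse& resp) = 0;
};

// One possible implementation: key on a single configured header.
class HeaderKeyedCache : public ExtAuthzCache {
public:
  explicit HeaderKeyedCache(std::string header) : header_(std::move(header)) {}

  std::optional<CheckResponse> lookup(const CheckRequest& req) override {
    auto key = req.headers.find(header_);
    if (key == req.headers.end()) return std::nullopt;
    auto hit = entries_.find(key->second);
    if (hit == entries_.end()) return std::nullopt;
    return hit->second;
  }

  void store(const CheckRequest& req, const CheckResponse& resp) override {
    auto key = req.headers.find(header_);
    if (key != req.headers.end()) entries_[key->second] = resp;
  }

private:
  std::string header_;
  std::map<std::string, CheckResponse> entries_;
};
```

With this shape, the Verkada status-only cache and a full-response cache are just different implementations behind the same two calls.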

Author

@ggreenway

I think we are thinking on similar tracks. I have just pushed a commit to split the cache interface (RespCache) from this particular implementation (SimpleCache).

I believe what you are suggesting re: core.v3.TypedExtensionConfig cache_config is -

cache_config specifies which type of cache implementation the user wants to use (e.g., SimpleCache), and provides all the configuration necessary for that implementation. We just need to add one message to ext_authz.proto.

Is this correct?
I'll work on this based on my understanding. Please LMK if this is not consistent with your understanding.

Member

Yes, that's correct.

Author

@ggreenway

I pushed a commit to take care of this.

As discussed, ext_authz.proto has this:

optional envoy.config.core.v3.TypedExtensionConfig response_cache_config = 30;

And it's up to the cache implementation to interpret the config. I've created simple_cache.proto for this.

I left only one other config, response_cache_remember_body_headers, in ext_authz. I considered moving this to simple_cache.proto as well. However, doing so would require SimpleCache to be aware of what's stored in the cache (e.g., Filters::Common::ExtAuthz::Response, which is an ext_authz-specific data structure), which would make the cache less generic. I kept what's stored in the cache a template type; that way, the cache implementation doesn't have to know the type of the stored data.

// Configuration parameters for ext_authz response cache
uint32 response_cache_max_size = 30;

uint32 response_cache_ttl = 31;

We will need more than one TTL: at minimum, one TTL for when the ext_authz server is healthy and one for when the server is not available. https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control describes more TTLs that control cache behavior.

I would recommend grouping TTLs into a separate message for extensibility.

Author

Thank you, @duckya

I believe the motivation for your suggestion of multiple TTLs is: if the external authz server is unavailable, we want to rely on the cached decision for longer. This would enhance the system's availability. Is this correct?

To do this, having multiple TTLs wouldn't be enough. We'd have to implement a new feature, that is, "if ext_authz detects the authz service to be down, extend the TTL of the currently cached objects."

This makes sense, but as we still have some way to go in this PR (cache headers & response body, use matcher, create a new proto message, write tests, rebase, etc.), we'd like to consider this a future feature request instead of including it in this PR. Would this work?

Thanks,

Nao


I am not asking you to implement the multiple TTLs for now, and I understand we could file it as a new feature request. But I think we should design the configuration in an extensible way to accommodate future improvements. We could have a proto message called CacheExpirationPolicy and move the response_cache_ttl field there; we could even rename response_cache_ttl to max_age to match the generic cache-control semantics.
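As a sketch of that grouping (field names and numbers here are illustrative, not a proposed final API):

```proto
// Groups cache-expiration settings so new TTL knobs (e.g., a longer TTL used
// while the authz server is unreachable) can be added later without touching
// the filter message again.
message CacheExpirationPolicy {
  // Renamed from response_cache_ttl to match generic cache-control semantics.
  google.protobuf.Duration max_age = 1;
}

message ExtAuthz {
  // ...
  CacheExpirationPolicy expiration_policy = 31;
}
```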

@@ -310,6 +310,21 @@ message ExtAuthz {
// Field ``latency_us`` is exposed for CEL and logging when using gRPC or HTTP service.
// Fields ``bytesSent`` and ``bytesReceived`` are exposed for CEL and logging only when using gRPC service.
bool emit_filter_state_stats = 29;

// Configuration parameters for ext_authz response cache
uint32 response_cache_max_size = 30;

I think it probably makes more sense to group cache related settings into one message.

Author

Thank you, @duckya

I agree. We can create a new proto file, resp_cache.proto, and move all these cache-related settings in there as a message RespCacheSettings.

Then import resp_cache.proto from ext_authz.proto.

This would open a possibility for another filter, e.g., ext_proc, to use the response cache.

Does this address your concern?

Thanks,

Nao


Yes, that's exactly what we want. Maybe we don't need the "Resp" in the name; just go with CacheSettings.

…e() so that part of it can be called by onDecodeHeaders() when cache is hit.

Signed-off-by: Nao Itoi <[email protected]>
@naoitoi-v
Author

We'd like to summarize where we are. (Especially for @markdroth , who is the assigned reviewer.)

Thank you all for the great feedback.

These are the remaining tasks for this PR:

  1. Cache the response body and headers from ext_authz, and apply them on a cache hit.
  2. Make the cache key more configurable, perhaps by using a matcher.
  3. Make the configuration more contained and extensible, e.g., by introducing new proto messages like CacheSettings.
  4. Write tests and provide coverage.
  5. Rebase; it may take a meaningful amount of work to rebase this PR.

We'd like to complete and merge this PR ASAP, so that we can start running this in our environments to get more data points (that may help us improve it further).

We'd like to consider further enhancements as feature requests post-PR.

Please let us know if this makes sense.

Thank you,

Nao

// than a fraction (by default, 50%) of the average. Eviction takes two configuration parameters:
// - Eviction candidate ratio: The fraction of the cache to read for potential eviction
// - Eviction threshold ratio: Evict items with TTL lower than X% of the average TTL
void Evict() {
Member

This eviction function is problematic. You're always starting at begin() and iterating forward, so the removed items are going to be biased towards certain hash values.

Also, since you're doing a bunch of them at once, while holding a mutex, you'll block all envoy workers during the eviction, which isn't acceptable.

If you really don't want to do a traditional LRU, you'll need to come up with something better than this.

Author

Please see the code starting on line 117, "//Step 1: Advance pointer randomly". It advances the iterator randomly to avoid the bias.

I understand your perspective on this eviction policy.

We believe this eviction policy will work well for our use case. Let me walk through the thought process:

  1. We want to minimize the memory footprint of the cache because we run Envoy on multiple Kubernetes pods.
  2. Our cache key space is predictable. We plan to make cache large enough to fit all cache keys. Therefore, eviction should be rare.
  3. We are storing each item's TTL, so we can use this information.

We measured this eviction mechanism. It seems fast enough for our use case (~1ms to delete 2500 items out of 1 million cached items). It is configurable, so we can lower the number of deletions if we need eviction to take a shorter amount of time.

I suspect the best eviction policy would depend on the use case.

We will make eviction policy configurable. Hopefully, different policies will be implemented, and users will be able to pick the right one for their use case.

Do you have a particular policy in mind? If we have a policy that can achieve better results without increasing memory footprint, we'd be very interested.

Member

We believe this eviction policy will work well for our use case.

I'm trying to make sure that we end up with something that will work for the general case, not just your specific use case.

If you make the policy (and thus the data structure) configurable, I think that will be fine. You could add an extension point for ext_authz caching, and then it would be easy to create any policy you want.

Member

Please see the code starting on line 117, "//Step 1: Advance pointer randomly". It advances the iterator randomly to avoid the bias.

I missed that. That's a very expensive operation on a large cache. It will likely blow out all levels of the CPU cache.

Author

I have measured eviction.

Environment: a Linux Docker container running on my Mac (Apple M3 Pro). I took the average CPU time of 3 tries. I fixed advance_steps to 500,000 to avoid randomness in the measurement.

Cache max size is 1,000,000. eviction_candidate_ratio is 1%, and eviction_threshold_ratio is 50%. It would result in 1M * 0.01 * 0.25 = ~2500 items getting evicted.

  1. Advancing iterator: 593 microseconds
  2. Eviction (including (1)): 6774 microseconds

Advancing the iterator is a small part of the whole Evict(). On average, eviction overall takes ~7ms.
If the user needs a shorter eviction time, they can decrease eviction_candidate_ratio or increase eviction_threshold_ratio.

As we discussed earlier, we will make cache replaceable. We suggest that we move forward with this algorithm in this PR. We can adjust as we receive performance data.

Member

It would be useful to build a speed_test (see other Envoy test examples) for this benchmark, so it's easy for others to run it, run it on different hardware, and run it with different cache config parameters.
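For illustration only, a chrono-based timing helper in the spirit of that request. Envoy's actual speed tests are built on the google/benchmark library, which this sketch does not use; the function name and cache type are placeholders:

```cpp
#include <chrono>
#include <string>
#include <unordered_map>

// Times how long it takes to populate a cache-like map with n entries,
// returning elapsed wall time in milliseconds. A real speed_test would
// parameterize the cache size, TTLs, and eviction ratios the same way.
double time_populate(std::unordered_map<std::string, int>& cache, size_t n) {
  auto start = std::chrono::steady_clock::now();
  for (size_t i = 0; i < n; ++i) {
    cache.emplace("key-" + std::to_string(i), 200);
  }
  auto end = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(end - start).count();
}
```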


bool Insert(const char* key, uint16_t value, int ttl_seconds = -1) {
Thread::LockGuard lock{mutex_};
const char* c_key = strdup(key);
Member

Make the hashmap key be std::string and get rid of all the c-style string handling.

Author

When we had std::string, it increased the memory footprint. We tried C-style strings, and that helped reduce the footprint. We could revisit this (avoid C-style strings while making sure the memory footprint doesn't increase), but we would prefer to do it after the initial PR.

Member

No, I won't accept this PR with this style of string/memory management. At a minimum, it all needs to be wrapped in RAII-style objects.
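A minimal sketch of what the RAII version of the quoted Insert() might look like, with std::string keys and std::lock_guard standing in for Envoy's Thread::LockGuard (class and method names are illustrative):

```cpp
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>

// std::string owns and frees the key storage automatically, so there is no
// strdup()/free() pairing to get wrong.
class SimpleCache {
public:
  // Returns true if a new entry was inserted, false if an existing one was
  // overwritten.
  bool insert(const std::string& key, uint16_t value) {
    std::lock_guard<std::mutex> lock(mutex_);
    // unordered_map copies the key into the node; the node's destructor
    // releases it.
    return entries_.insert_or_assign(key, value).second;
  }

  bool contains(const std::string& key) {
    std::lock_guard<std::mutex> lock(mutex_);
    return entries_.find(key) != entries_.end();
  }

private:
  std::mutex mutex_;
  std::unordered_map<std::string, uint16_t> entries_;
};
```

On footprint: for the 35-byte keys mentioned earlier, std::string makes one heap allocation just as strdup does; the per-entry difference is roughly the size of the string object itself versus a raw pointer.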

Author

Understood, will work on this.

Author

I took care of this (convert usages of char* to std::string) in the latest commit.

@@ -0,0 +1,209 @@
#pragma once
Member

This only has a single use right now, in ext_authz. Move this file into that extension.

Author

We had the cache code inside ext_authz initially.

We moved it to the common location in response to this comment by @adisuissa :

A drive-by suggestion: it may be beneficial to generalize the caching for both ext_authz and ext_proc.
I suggest creating the cache configuration (and implementation) separately, and using it only for ext_authz in the first step.

I think this feedback makes sense.

Contributor

I don't think we know enough yet to know whether we'll be able to use the same cache structure for ext_authz and ext_proc. I think that at the xDS layer, we should have a separate API for each one. If it turns out that the underlying implementations can actually share code, they can do so easily enough.

@naoitoi-v
Author

Pushed a commit to take care of "Cache response body and headers from ext_authz, and apply them on cache hit."

…mplementation can be replaced without modifying ext_authz.

Signed-off-by: Nao Itoi <[email protected]>

bool response_cache_remember_body_headers = 35;
// Cache configuration to store response from external auth server.
optional envoy.config.core.v3.TypedExtensionConfig response_cache_config = 30;
Member

Suggested change
optional envoy.config.core.v3.TypedExtensionConfig response_cache_config = 30;
envoy.config.core.v3.TypedExtensionConfig response_cache_config = 30;

optional is the default in proto3

Author

Good point, thank you. I'll fix it in the next commit.

// Cache configuration to store response from external auth server.
optional envoy.config.core.v3.TypedExtensionConfig response_cache_config = 30;
// Tells the cache whether to remember the whole response from the authz server (response body, header modification, metadata, status) or just the status.
optional bool response_cache_remember_body_headers = 31;
Member

This should be in the extension's configuration; it's likely that some cache types wouldn't support this option.

Author

We put a lot of thought into this choice. Let me explain the pros & cons.

  • Option 1: the "remember_body_headers" config is in SimpleCache:
    Pro: All cache-related configurations are in one place (inside response_cache_config).
    Con: This would make SimpleCache specific to ext_authz. The full response is defined by Filters::Common::ExtAuthz::Response, which is an ext_authz data structure. If SimpleCache uses this, then it can only be used for ext_authz. Earlier, we received feedback to try to make the cache independent of the use case, so that it can be used by another use case (e.g., ext_proc).

  • Option 2: the "remember_body_headers" config is in ExtAuthz:
    Pro: Keeps RespCache and SimpleCache agnostic to the type of stored data. We think this is generally a good design; it makes the cache more flexible.
    Con: ExtAuthz has two cache-related configurations, instead of one.

The way we think about it, the choice of caching the full response or just the status code is a functionality of ExtAuthz. ExtAuthz does different things depending on this configuration.

I do agree with the earlier feedback that suggested decoupling the cache from ExtAuthz, so that it can be used in other contexts.

Member

You'll need another layer that is the implementation behind the typed_config. It will have this boolean setting. It will instantiate a cache (however that is done) and return a pointer to an interface that is implementation-agnostic.

This setting definitely belongs in the typed_config, not here.

Contributor

I agree that if we have this field, it belongs in the cache config extension, not directly here in the ext_authz filter config.

That having been said, I don't think we should actually have this field at all. See my comment elsewhere about performance considerations: I don't think we want to have to construct a CheckRequest proto or decode a CheckResponse proto in the cache-hit case. Given that, I don't think we want to be caching the CheckResponse proto or its associated headers in the first place.

@@ -0,0 +1,20 @@
syntax = "proto3";
Member

This file needs to be under api/. This is a new type of extension. It could either go in api/envoy/extensions/ext_authz_cache/simple, or maybe it belongs under api/envoy/extensions/filters/http/ext_authz/cache/simple. @envoyproxy/api-shepherds can you provide guidance on the correct location?

Contributor

I think api/envoy/extensions/ext_authz_cache/simple/v3 is probably the right place.
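For illustration, a relocated config proto under that package path might start like the following. The field names are guesses based on the PR description (token header names, max cache size, TTL), not the actual file contents:

```proto
syntax = "proto3";

package envoy.extensions.ext_authz_cache.simple.v3;

import "google/protobuf/duration.proto";

// Hypothetical shape for the relocated cache config; all field names below
// are illustrative.
message SimpleCacheConfig {
  // Names of request headers carrying the auth token,
  // e.g. "authorization" or "x-auth-token".
  repeated string token_headers = 1;

  // Maximum number of cached entries before FIFO eviction kicks in.
  uint32 max_cache_size = 2;

  // Time-to-live for each cached entry.
  google.protobuf.Duration ttl = 3;
}
```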

const std::string auth_header_str(auth_header[0]->value().getStringView());

if (config_->responseCacheConfig().has_value()) {
#ifdef CACHE_PERF_TEST
Member
This type of code should be moved to somewhere under /test. Search around for *_speed_test.cc to see many examples of speed tests and how they're set up, parameterized, etc.

Author

Will do.


if (response_cache_remember_body_headers_) {
// Initialize the cache that stores full response (body, header modification, metadata, status).
auto cache_result = Envoy::Common::RespCache::createCache<Filters::Common::ExtAuthz::Response>(
Member
With an extension config, you'll declare a factory interface, and then call something like

Config::Utility::getAndCheckFactory<ExtAuthzFactory>(
          config)

And each extension will register its factory with that type.
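The registration pattern being described can be modeled in plain C++ as below. Envoy's real mechanism (REGISTER_FACTORY plus Config::Utility::getAndCheckFactory) differs in detail; everything here is an illustrative stand-in.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Toy interface standing in for the implementation-agnostic cache type.
struct Cache {
  virtual ~Cache() = default;
  virtual std::string kind() const = 0;
};

using CacheFactoryFn = std::function<std::unique_ptr<Cache>()>;

// Global registry mapping extension names to factories.
std::map<std::string, CacheFactoryFn>& cacheFactoryRegistry() {
  static std::map<std::string, CacheFactoryFn> registry;
  return registry;
}

bool registerCacheFactory(const std::string& name, CacheFactoryFn fn) {
  return cacheFactoryRegistry().emplace(name, std::move(fn)).second;
}

// Lookup in the spirit of getAndCheckFactory: resolve the factory named in
// the typed_config, failing loudly if none is registered.
std::unique_ptr<Cache> getAndCheckCacheFactory(const std::string& name) {
  auto it = cacheFactoryRegistry().find(name);
  if (it == cacheFactoryRegistry().end()) {
    throw std::runtime_error("no registered cache factory named " + name);
  }
  return it->second();
}

// An example extension implementation.
struct SimpleCache : Cache {
  std::string kind() const override { return "simple"; }
};

// Each extension registers itself, typically at static-init time.
const bool simple_cache_registered =
    registerCacheFactory("envoy.ext_authz_cache.simple",
                         [] { return std::make_unique<SimpleCache>(); });
```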

Comment on lines +332 to +356
if (config_->responseCacheRememberBodyHeaders()) {
auto cached_response =
config_->responseCache<Filters::Common::ExtAuthz::Response>().Get(headers);
if (cached_response.has_value()) {
onCompleteSub(std::make_unique<Filters::Common::ExtAuthz::Response>(
std::move(cached_response.value())));
return Http::FilterHeadersStatus::StopIteration;
}
} else {
// Retrieve the HTTP status code from the cache
auto cached_status_code = config_->responseCache<uint16_t>().Get(headers);
if (cached_status_code.has_value()) {
if (*cached_status_code >= 200 && *cached_status_code < 300) {
// Any 2xx response is a success: let the request proceed
return Http::FilterHeadersStatus::Continue;
} else {
// Non-2xx response: reject the request
decoder_callbacks_->streamInfo().setResponseFlag(
StreamInfo::CoreResponseFlag::UnauthorizedExternalService);
decoder_callbacks_->sendLocalReply(
static_cast<Http::Code>(*cached_status_code), "Unauthorized", nullptr, absl::nullopt,
Filters::Common::ExtAuthz::ResponseCodeDetails::get().AuthzDenied);

return Http::FilterHeadersStatus::StopIteration;
}
Member
All of this logic belongs in the cache extension implementation.

The changes here (in the core ext_authz filter) should be the following:

Right before ext_authz gRPC call is made, pass the generated CheckRequest to the cache via absl::optional<CheckResponse> cached = config_->responseCacheConfig().getCachedResponse(check_request); If this has a value, onComplete(cached.value()); return;

When receiving a response from the ext_authz server, call config_->responseCacheConfig().cacheResponse(check_response);

All work to extract various headers, response codes, etc, should be done in your cache extension. The code here should be extremely minimal.

Contributor

I don't think that's actually the right layer at which to be doing this caching. This would mean that even in the cache-hit case, we'd still need to construct the request to the ext_authz server, and we'd still need to decode the cached response from the ext_authz server. For performance reasons, at least in gRPC, I don't think that's going to be feasible.

I think what we actually want here is to use the attributes of the data plane request (see here; I'm not sure what data structure Envoy's implementation uses for this internally) to determine the cache key, and then to have the contents of the cache entry be the already-decoded authz decision. In other words, when we see the data plane request, we do something like this:

absl::optional<AuthzDecision> cached = config_->cache().getCachedResponse(data_plane_request_properties);
if (cached.has_value()) {
  // ...use *cached to determine how to handle data plane request...
} else {
  // ...construct CheckRequest and send it to the ext_authz server...
}

Then when receiving a response from the ext_authz server:

AuthzDecision authz_decision = decodeCheckResponse(check_response);
config_->cache().cacheAuthzDecision(authz_decision);
// ...use authz_decision to determine how to handle data plane request...
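A standalone model of that flow, keeping only the already-decoded decision in the cache so that a cache hit involves no proto work at all. The types and the string key are stand-ins for real data plane request attributes:

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Illustrative already-decoded authz decision; the real one would carry
// whatever the filter needs to act without re-decoding a CheckResponse.
struct AuthzDecision {
  bool allowed;
  uint16_t status_code; // consulted only when denied
};

class AuthzDecisionCache {
public:
  // The real key would be derived from data plane request attributes (path,
  // token header, etc.); a plain string stands in for that here.
  void cacheAuthzDecision(const std::string& key, AuthzDecision decision) {
    decisions_[key] = decision;
  }

  std::optional<AuthzDecision> getCachedResponse(const std::string& key) const {
    auto it = decisions_.find(key);
    if (it == decisions_.end()) {
      return std::nullopt;
    }
    return it->second;
  }

private:
  std::unordered_map<std::string, AuthzDecision> decisions_;
};

// Cache-hit path (sketch): no CheckRequest is built, no CheckResponse decoded.
//   if (auto cached = cache.getCachedResponse(key)) { /* act on *cached */ }
//   else { /* build CheckRequest, call ext_authz, then cacheAuthzDecision */ }
```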

@ggreenway ggreenway self-assigned this Jan 30, 2025
Contributor
@markdroth markdroth left a comment

Just to be clear, I am the assigned reviewer for the xDS API, not for the Envoy implementation. It looks like @ggreenway is the reviewer for the Envoy implementation. The reason for this separation is that the xDS API is not just for Envoy, it is also used by other data planes, such as gRPC, so we need to consider more than just what works for Envoy before we approve API changes.

Please let me know if you have any questions about that.



@@ -0,0 +1,209 @@
#pragma once
Contributor

I don't think we know enough yet to know whether we'll be able to use the same cache structure for ext_authz and ext_proc. I think that at the xDS layer, we should have a separate API for each one. If it turns out that the underlying implementations can actually share code, they can do so easily enough.


@andrii-korotkov-verkada

Hey, Andrii from Verkada and the Argo CD community here. I'm curious to see what's going on here, as it seems like more and more feature requests keep getting added. @markdroth, @ggreenway, for the sake of @naoitoi-v's time, is it possible to define clear goals and steps for what's needed to get the proposal and changes accepted, please? @naoitoi-v is excited about open source contributions, but this effort might just die out if he's pushed back too much. Can I help somehow?

@ggreenway
Member

Hey, Andrii from Verkada and the Argo CD community here. I'm curious to see what's going on here, as it seems like more and more feature requests keep getting added. @markdroth, @ggreenway, for the sake of @naoitoi-v's time, is it possible to define clear goals and steps for what's needed to get the proposal and changes accepted, please? @naoitoi-v is excited about open source contributions, but this effort might just die out if he's pushed back too much. Can I help somehow?

We're having a separate discussion on slack about this.

The reason for most of the additional work is that the initial proposal was a solution optimized for a specific use case, not a general solution, and we are asking for additional work so that adding a general solution, or other specifically-optimized solutions, makes sense in the way envoy is configured. It is painful to change the configuration after it has been released, so it's important to get it right the first time, but that does involve some extra work.

@naoitoi-v
Author

Remaining tasks identified so far:

  1. Move simple_cache.proto to api/envoy/extensions/ext_authz_cache/simple/v3
  2. Move all cache related configurations to simple_cache.proto.
  3. Make cache an extension point.
  4. Write speed test for eviction.
  5. Write unit tests.
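For task 4, Envoy's *_speed_test.cc files are built on the Google Benchmark library. As a rough standalone stand-in for what such a test measures, an eviction-heavy timing loop over a minimal FIFO cache could look like the following; all names are illustrative:

```cpp
#include <chrono>
#include <cstddef>
#include <queue>
#include <string>
#include <unordered_map>

// Minimal FIFO-evicting map standing in for the cache under test.
struct FifoCache {
  explicit FifoCache(std::size_t max) : max_(max) {}

  void insert(const std::string& key, int value) {
    if (map_.count(key) == 0) {
      if (map_.size() >= max_) {
        map_.erase(order_.front()); // evict oldest-inserted key
        order_.pop();
      }
      order_.push(key);
    }
    map_[key] = value;
  }

  std::size_t size() const { return map_.size(); }

  std::size_t max_;
  std::unordered_map<std::string, int> map_;
  std::queue<std::string> order_;
};

// Inserts `n` distinct keys into a cache of capacity `cap`, forcing
// n - cap evictions, and returns the wall time in microseconds.
long long timeEvictions(std::size_t cap, int n) {
  FifoCache cache(cap);
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < n; ++i) {
    cache.insert("key-" + std::to_string(i), i);
  }
  auto end = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::microseconds>(end - start)
      .count();
}
```

A real speed test would instead register a BENCHMARK() function and let the benchmark framework handle iteration counts and reporting.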


This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@github-actions github-actions bot added the stale label on Mar 22, 2025

This pull request has been automatically closed because it has not had activity in the last 37 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@github-actions github-actions bot closed this Mar 30, 2025
Labels: api, stale (stalebot believes this issue/PR has not been touched recently), waiting
7 participants