ext_authz cache #37953
Conversation
Hi @naoitoi-v, welcome and thank you for your contribution. We will try to review your Pull Request as quickly as possible. In the meantime, please take a look at the contribution guidelines if you have not done so already.
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to
Some early feedback:

A drive-by suggestion: it may be beneficial to generalize the caching for both ext_authz and ext_proc.
Related to what Adi said, I think you need to cache the response message and replay it on each request. It may add or remove headers, etc. Just looking at the status code is insufficient.
Thank you for your early and insightful feedback, @ggreenway, @adisuissa.
Great point! In our (Verkada's) use case, it's important to keep the memory footprint as small as possible. We expect few evictions. We'll change this implementation to random eviction. In the future, there may be users who want more sophisticated eviction policies (e.g., LRU, random sampling-based LRU). We'll restructure the code so that one can provide a different cache implementation.
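A rough sketch of what such a random-eviction cache could look like (all names and the `int` value type here are illustrative assumptions, not the PR's code):

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <random>
#include <string>
#include <unordered_map>

// Hypothetical sketch: a bounded map that, when full, drops one entry chosen
// uniformly at random. No per-entry recency metadata is kept, which is the
// memory-footprint argument made above; it relies on evictions being rare.
class RandomEvictCache {
public:
  explicit RandomEvictCache(std::size_t max_size) : max_size_(max_size) {}

  void insert(const std::string& key, int value) {
    if (map_.find(key) == map_.end() && map_.size() >= max_size_) {
      // Pick a victim by advancing a fresh iterator a random number of steps.
      std::uniform_int_distribution<std::size_t> dist(0, map_.size() - 1);
      auto it = map_.begin();
      std::advance(it, dist(rng_));
      map_.erase(it);
    }
    map_[key] = value;
  }

  bool contains(const std::string& key) const { return map_.count(key) > 0; }
  std::size_t size() const { return map_.size(); }

private:
  std::size_t max_size_;
  std::unordered_map<std::string, int> map_;
  std::mt19937 rng_{0};  // fixed seed for reproducibility in this sketch
};
```

Swapping out the eviction step inside `insert` is exactly where a more sophisticated policy (LRU, sampled LRU) could later be plugged in.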
Good point. For now we'll make it look at only one header. Our use case may require it to look at multiple headers for legacy reasons. If we have to go that route, we'll use header + auth token as the cache key, so we wouldn't mix auth tokens from multiple headers.
Yes, we started this effort. Fixing things pointed out by //tools/code_format:check_format and //tools/code:check.
This makes sense. We'll try it.
This makes sense. In our use case, we'd only need the status code, and it's important to minimize the memory footprint. We'll write the code to store and replay the response messages and all the headers, and make this configurable so that the user can decide whether or not to store the response and headers.
Thanks for the great discussion, again. We'll keep you updated on the progress. Nao
Signed-off-by: Nao Itoi <[email protected]>
I took care of (3) coding style.
Signed-off-by: Nao Itoi <[email protected]>
…his is from review feedback. Signed-off-by: Nao Itoi <[email protected]>
I took care of (2) Looking for auth tokens in multiple headers.
42e5e8d to 53c15c9
I added an eviction mechanism to address (1). It emulates LRU by:
With these defaults, it was able to complete eviction in around 1ms in my testing (running in Docker, 12 CPUs and 24GB of memory, on a MacBook Pro).
Signed-off-by: Nao Itoi <[email protected]>
53c15c9 to 5f8b7e4
… ext_authz and ext_proc. Signed-off-by: Nao Itoi <[email protected]>
Moved the cache code to a common location to address (4).
uint32 response_cache_ttl = 31;

// Array of header names. ext_authz will use the first header that appears in the HTTP request as the cache key.
string response_cache_header_name = 32;
This seems to imply that in order for the ext_authz caching to function, the data plane request needs to have a particular header. This seems sub-optimal, because it basically requires the sending application to have all of the logic needed to set this header.
In practice, I think the cache key will wind up needing to be a combination of various attributes of the data plane request, but the exact set of those attributes is going to depend primarily on the ext_authz filter config, which determines which attributes of the data plane request to include in the ext_authz request. (It may also depend on the ext_authz server itself, which might only use a subset of the fields in the ext_authz request, but that will probably be reflected in the ext_authz filter config in most cases.) So this would basically mean that the application would need logic to determine which attributes of the data plane request to take into account as part of the cache key, thus duplicating the logic in the ext_authz filter config -- which would mean that the ext_authz filter config can't be changed later without also changing the application logic.
I think it would make more sense to have some matching rules here that dictate the cache key as a function of the attributes of the data plane request. That way, all of the logic is self-contained in the ext_authz filter config and can be changed as needed, without breaking the application.
Thank you, @markdroth
I'd like to make sure I understand you correctly.
I believe the problem you are raising is that, currently, the cache is configured by specifying one header.
I have looked into Matcher: https://www.envoyproxy.io/docs/envoy/latest/xds/type/matcher/v3/matcher.proto#envoy-v3-api-msg-xds-type-matcher-v3-matcher
Matcher can express the choice of cache key very richly. For example, we can change the proto file like this:
message ExtAuthz {
// Other fields...
// Configuration for the response cache matcher
envoy.type.matcher.v3.Matcher response_cache_matcher = 32;
}
Then, we can give a configuration like this:
resp_cache:
enabled: true
response_cache_matcher:
matcher_tree:
input:
name: request-headers
typed_config:
"@type": type.googleapis.com/envoy.type.matcher.v3.HttpRequestHeaderMatchInput
header_name: "auth-token"
exact_match_map:
map:
".*":
action:
name: "auth-token"
This would let ext_authz use a header named "auth-token" as the cache key.
In addition, configuration can be changed to use combinations of a lot more parameters - path, query parameters, metadata, etc.
Would this address your concern?
Thank you,
Nao
I agree with Mark; this is a much more flexible design, which allows us to build the cache key with more than just a header field. E.g., we can define a cache policy like the one below:
message RequestCachingPolicy {
// Specifies how the cache key should be built.
CacheKeyBuilder key_builder = 1;
// Specifies default expiration policy for the cache entries.
// It can be overridden by individual policy service by setting the
// TTLs field in ext_authz response.
CacheExpirationPolicy expiration_policy = 2;
}
message CacheKeyBuilder {
// A list of CEL expressions used to build a composite cache key.
// An empty list indicates no caching.
// The CEL expression will be evaluated against the envoy attributes
repeated string cel_expressions = 1
[(datapol.semantic_type) = ST_NOT_REQUIRED];
}
message CacheExpirationPolicy {
// Specifies the cache staleness TTL for all cache entries unless it is
// overwritten by the CacheExpirationPolicy in the ext_authz response.
google.protobuf.Duration max_age = 1;
}
A sample policy would look like:
caching_policy_matchers:
matcher_list:
matchers:
// Requests with header["x-goog-policy-set-name"]='something' will
// have cache key on "x-goog-proxy-auth-bin" and "x-goog-request-path"
// headers.
- predicate:
single_predicate:
input:
typed_config:
'@type':
type.googleapis.com/envoy.type.matcher.v3.HttpRequestHeaderMatchInput
header_name: x-goog-policy-set-name
value_match:
exact: something
on_match:
action:
typed_config:
'@type': type.googleapis.com/envoy.extensions.filters.http.ext_authz.v3.CachingPolicy
key_builder:
- cel_expression: request.headers["x-goog-request-path"]
- cel_expression: request.headers["x-goog-proxy-auth-bin"]
expiration_policy:
max_age:
seconds: 300
I have spent a couple of days trying to do this with CEL (Common Expression Language). It turns out to be a significant amount of work. It also made us re-think the request.
We agree that there may be different requirements for cache key creation. For example, we can imagine:
- If header A exists, use that as cache key. If not, try header B.
- Use regex to extract some part of header value.
- Combine some parts of request, e.g., URI, path, query parameters.
WASM filters (e.g. Lua filter) are capable of this type of modification.
Our recommendation is to keep ext_authz's cache key simple in this PR: configure one header. For use cases that require a custom cache key, we'd recommend using the Lua filter or other WASM filters to construct the cache key and set it in a header, which ext_authz would read.
If this approach becomes unacceptable in some cases (e.g., for performance reasons), we can consider implementing optimization to solve that problem. Keep it simple, and revisit it when we know more about the problem.
This PR is getting bigger and I still have some ways to go, so it would be really helpful if I can limit the scope and focus on getting it done well.
Please let us know if this would work.
Thank you,
Nao
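The "first header that appears wins" keying scheme discussed in this thread, with a header-name + token composite so tokens from different headers never collide, could be sketched like this (hypothetical helper; a plain `std::map` stands in for Envoy's header map):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: given an ordered list of candidate header names,
// build the cache key "<header-name>=<value>" from the first header present
// on the request. Prefixing the header name keeps auth tokens that arrive
// via different headers in separate cache entries.
std::string cacheKeyFromHeaders(const std::vector<std::string>& candidates,
                                const std::map<std::string, std::string>& headers) {
  for (const auto& name : candidates) {
    auto it = headers.find(name);
    if (it != headers.end()) {
      return name + "=" + it->second;
    }
  }
  return "";  // no candidate header present: caller skips the cache
}
```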
@naoitoi-v it's a hassle, both to implement and for users, to change configuration after it is released. I understand that you don't want to make this PR too big, but it is important that we think through what the config will look like, both immediately and for anticipated future changes, and do something sensible.
The more we discuss this and I understand your very specific requirements, the more I think the entire thing should be an extension point. I think the interface in the ext_authz filter should be kept very simple: a function call that passes in the entire request, which returns either a CheckResponse message (which can be constructed from some cached data; it doesn't need to be a copy of the response from the original ext_authz service), or a response indicating "this wasn't in the cache; do the normal side-call". And a function call to add items to the cache after a successful side-call.
Then the only configuration in the ext_authz filter will be an optional core.v3.TypedExtensionConfig cache_config.
WDYT?
I think we are thinking on similar tracks. I have just pushed a commit to split the cache interface (RespCache) from this particular implementation (SimpleCache).
I believe what you are suggesting re: core.v3.TypedExtensionConfig cache_config is this: cache_config specifies which type of cache implementation the user wants to use (e.g. SimpleCache), and provides all the configuration necessary for that implementation. We just need to add one message to ext_authz.proto.
Is this correct?
I'll work on this based on my understanding. Please LMK if this is not consistent with your understanding.
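The extension-point idea could work roughly like the following sketch: a registry maps a configured type name to a factory, and ext_authz only ever sees the abstract interface. All names here are illustrative assumptions, not Envoy's actual factory machinery:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <set>
#include <string>

// Hypothetical cache interface the filter would hold; the concrete type is
// chosen at config time, never referenced directly by the filter.
class RespCacheInterface {
public:
  virtual ~RespCacheInterface() = default;
  virtual bool contains(const std::string& key) const = 0;
  virtual void insert(const std::string& key) = 0;
};

using CacheFactory = std::function<std::unique_ptr<RespCacheInterface>()>;

// Registry mapping a type name (as it would appear in cache_config) to a
// factory that builds the implementation.
std::map<std::string, CacheFactory>& cacheRegistry() {
  static std::map<std::string, CacheFactory> registry;
  return registry;
}

// One concrete implementation, standing in for the PR's SimpleCache.
class SimpleCache : public RespCacheInterface {
public:
  bool contains(const std::string& key) const override { return keys_.count(key) > 0; }
  void insert(const std::string& key) override { keys_.insert(key); }

private:
  std::set<std::string> keys_;
};

// What the filter would call with the name from cache_config.
std::unique_ptr<RespCacheInterface> createCache(const std::string& type_name) {
  auto it = cacheRegistry().find(type_name);
  return it == cacheRegistry().end() ? nullptr : it->second();
}
```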
Yes, that's correct.
I pushed a commit to take care of this.
As discussed, ext_authz.proto has this:
optional envoy.config.core.v3.TypedExtensionConfig response_cache_config = 30;
And it's up to the cache implementation to interpret the config. I've created simple_cache.proto for this.
I left only one other config, response_cache_remember_body_headers, in ext_authz. I considered moving this to simple_cache.proto as well. However, doing so would require SimpleCache to be aware of what's stored in the cache (e.g., Filters::Common::ExtAuthz::Response, which is an ext_authz-specific data structure), which makes the cache less generic. I kept what's stored in the cache a template type; that way the cache implementation doesn't have to know the type of the stored data.
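The template approach described here might look like this minimal sketch (assumed names, not the PR's RespCache code): the same container can hold a bare status code or a full response struct without depending on ext_authz types.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical sketch: the stored value is a template parameter, so the
// cache stays agnostic to what the filter chooses to remember.
template <typename T>
class GenericCache {
public:
  void insert(const std::string& key, T value) { map_[key] = std::move(value); }

  std::optional<T> get(const std::string& key) const {
    auto it = map_.find(key);
    if (it == map_.end()) {
      return std::nullopt;
    }
    return it->second;
  }

private:
  std::unordered_map<std::string, T> map_;
};
```

Instantiating `GenericCache<uint16_t>` covers the status-code-only mode, while `GenericCache<SomeFullResponse>` would cover the remember-body-headers mode, without the cache knowing either type.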
// Configuration parameters for ext_authz response cache
uint32 response_cache_max_size = 30;

uint32 response_cache_ttl = 31;
We will need more than one TTL: at minimum, one TTL for when the ext_authz server is healthy and one for when the server is not available. https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control has more TTLs that control the cache behavior.
I would recommend grouping TTLs into a separate message for extensibility.
Thank you, @duckya
I believe the motivation for your suggestion of multiple TTLs is: if the external authz server is unavailable, we want to rely on the cached decision for longer. This would enhance the system's availability. Is this correct?
To do this, having multiple TTLs wouldn't be enough. We'd have to implement a new feature: if ext_authz detects that the authz service is down, extend the TTL of the currently cached objects.
This makes sense, but, as we still have some ways to go in this PR (cache headers & response body, use matcher, create a new proto message, write tests, rebase, etc.), we'd like to consider this as a feature request in the future, instead of including it in this PR. Would this work?
Thanks,
Nao
I am not asking you to implement the multiple TTLs for now, and I understand we could file it as a new feature request. But I think we should design the configuration in an extensible way to accommodate future improvements. We could have a proto message called CacheExpirationPolicy and move the response_cache_ttl field there; we could even rename response_cache_ttl to max_age to match the generic cache-control semantics.
@@ -310,6 +310,21 @@ message ExtAuthz {
// Field ``latency_us`` is exposed for CEL and logging when using gRPC or HTTP service.
// Fields ``bytesSent`` and ``bytesReceived`` are exposed for CEL and logging only when using gRPC service.
bool emit_filter_state_stats = 29;

// Configuration parameters for ext_authz response cache
uint32 response_cache_max_size = 30;
I think it probably makes more sense to group cache related settings into one message.
Thank you, @duckya
I agree. We can create a new proto file, resp_cache.proto, and move all these cache-related settings there, as a message "RespCacheSettings".
Then import resp_cache.proto from ext_authz.proto.
This would open a possibility for another filter, e.g., ext_proc, to use the response cache.
Does this address your concern?
Thanks,
Nao
Yes, that's exactly what we want. Maybe we don't put "Resp" in the name, and just go with CacheSettings.
…e() so that part of it can be called by onDecodeHeaders() when cache is hit. Signed-off-by: Nao Itoi <[email protected]>
We'd like to summarize where we are. (Especially for @markdroth , who is the assigned reviewer.) Thank you all for the great feedback. These are the remaining tasks for this PR:
We'd like to complete and merge this PR ASAP, so that we can start running this in our environments to get more data points (that may help us improve it further). We'd like to consider further enhancements as feature requests post-PR. Please let us know if this makes sense. Thank you, Nao
// than a fraction (by default, 50%) of the average. Eviction takes two configuration parameters:
// - Eviction candidate ratio: The fraction of the cache to read for potential eviction
// - Eviction threshold ratio: Evict items with TTL lower than X% of the average TTL
void Evict() {
This eviction function is problematic. You're always starting at begin() and iterating forward, so the removed items are going to be biased towards certain hash values.
Also, since you're doing a bunch of them at once, while holding a mutex, you'll block all envoy workers during the eviction, which isn't acceptable.
If you really don't want to do a traditional LRU, you'll need to come up with something better than this.
Please see the code starting on line 117, "// Step 1: Advance pointer randomly". It advances the iterator randomly to avoid the bias.
I understand your perspective on this eviction policy.
We believe this eviction policy will work well for our use case. Let me walk through the thought process:
- We want to minimize the memory footprint of the cache because we run Envoy on multiple K8s pods.
- Our cache key space is predictable. We plan to make cache large enough to fit all cache keys. Therefore, eviction should be rare.
- We are storing each item's TTL, so we can use this information.
We measured this eviction mechanism. It seems fast enough for our use case (it takes ~1ms to delete 2500 items out of 1 million cached items). It is configurable, so we can lower the number of items to delete if we need eviction to take a shorter amount of time.
I suspect the best eviction policy would depend on the use case.
We will make eviction policy configurable. Hopefully, different policies will be implemented, and users will be able to pick the right one for their use case.
Do you have a particular policy in mind? If we have a policy that can achieve better results without increasing memory footprint, we'd be very interested.
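For concreteness, the sampling scheme described in this thread (advance the iterator by a random offset, examine a fraction of the cache, evict entries whose TTL falls below a fraction of the sample's average) might look roughly like this. It is an illustrative sketch with assumed types (an `int` remaining-TTL value), not the PR's implementation:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <random>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch of sampled-TTL eviction. The map value stands in for
// an entry's remaining TTL in seconds; parameter names follow the discussion.
void evictByTtl(std::unordered_map<std::string, int>& cache,
                double eviction_candidate_ratio, double eviction_threshold_ratio,
                std::mt19937& rng) {
  if (cache.empty()) {
    return;
  }
  const std::size_t sample_size = std::max<std::size_t>(
      1, static_cast<std::size_t>(cache.size() * eviction_candidate_ratio));

  // Step 1: advance the iterator by a random offset so eviction is not
  // biased toward particular hash buckets.
  std::uniform_int_distribution<std::size_t> dist(0, cache.size() - sample_size);
  auto it = cache.begin();
  std::advance(it, dist(rng));

  // Step 2: compute the average TTL over the sampled entries.
  std::vector<std::string> sampled;
  double ttl_sum = 0;
  for (std::size_t i = 0; i < sample_size; ++i, ++it) {
    ttl_sum += it->second;
    sampled.push_back(it->first);
  }
  const double threshold = (ttl_sum / sample_size) * eviction_threshold_ratio;

  // Step 3: evict sampled entries whose TTL is below the threshold.
  for (const auto& key : sampled) {
    auto found = cache.find(key);
    if (found != cache.end() && found->second < threshold) {
      cache.erase(found);
    }
  }
}
```

Note the trade-off debated above is visible here: `std::advance` on an `unordered_map` iterator is a linear walk, so the random offset costs O(n) in the worst case even before any entry is examined.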
We believe this eviction policy will work well for our use case.
I'm trying to make sure that we end up with something that will work for the general case, not just your specific use case.
If you make the policy (and thus the datastructure) configurable, I think that will be fine. You could add an extension point ext_authz caching, and then it would be easy to create any policy you want.
Please see the code starting on line 117, "//Step 1: Advance pointer randomly". It advances the iterator randomly to avoid the bias.
I missed that. That's a very expensive operation on a large cache. It will likely blow out all levels of the CPU cache.
I have measured eviction.
Environment: Linux Docker container running on my Mac (Apple M3 Pro). I took the average of 3 tries of CPU time. I fixed advance_steps to 500,000 to avoid randomness in the measurement.
Cache max size is 1,000,000. eviction_candidate_ratio is 1%, and eviction_threshold_ratio is 50%. This would result in 1M * 0.01 * 0.25 = ~2,500 items getting evicted.
- Advancing iterator: 593 microseconds
- Eviction (including (1)): 6774 microseconds
Advancing the iterator is a small part of the whole Evict(). On average, eviction overall takes ~7ms.
If the user needs shorter eviction time, they can decrease eviction_candidate_ratio / increase eviction_threshold_ratio.
As we discussed earlier, we will make cache replaceable. We suggest that we move forward with this algorithm in this PR. We can adjust as we receive performance data.
It would be useful to build a speed_test (see other Envoy test examples) for this benchmark so it's easy for others to run, run on different hardware, and run with different cache config parameters.
bool Insert(const char* key, uint16_t value, int ttl_seconds = -1) {
  Thread::LockGuard lock{mutex_};
  const char* c_key = strdup(key);
Make the hashmap key be std::string and get rid of all the C-style string handling.
When we had std::string, it increased the memory footprint. We tried C-style strings, and that helped reduce the memory footprint. We could revisit this (avoid C-style strings while making sure the memory footprint doesn't increase), but would prefer to do it after the initial PR.
No, I won't accept this PR with this style of string/memory management. At a minimum, it all needs to be wrapped in RAII-style objects.
Understood, will work on this.
I took care of this (convert usages of char* to std::string) in the latest commit.
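A small illustration of the RAII point being made here: with `std::string` keys, the container owns and frees its copies, so there is no `strdup`/`free` pairing to get wrong on the erase and destructor paths (hypothetical sketch, not the PR's code):

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical sketch: the map copies the key on insert and releases it on
// erase or destruction automatically; no manual memory management anywhere.
class StringKeyCache {
public:
  void insert(const std::string& key, uint16_t value) {
    map_.emplace(key, value);  // map takes its own owned copy of the key
  }

  bool erase(const std::string& key) {
    return map_.erase(key) > 0;  // frees the stored key along with the entry
  }

private:
  std::unordered_map<std::string, uint16_t> map_;
};
```

For short keys, `std::string`'s small-string optimization typically avoids a separate heap allocation entirely, which may narrow the memory gap measured against the C-string version.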
@@ -0,0 +1,209 @@
#pragma once
This only has a single use right now, in ext_authz. Move this file into that extension.
We had the cache code inside ext_authz initially.
We moved it to the common location in response to this comment by @adisuissa :
A drive-by suggestion: it may be beneficial to generalize the caching for both ext_authz and ext_proc.
I suggest creating the cache configuration (an implementation) separate, and use it only for ext_authz in the first step.
I think this feedback makes sense.
I don't think we know enough yet to know whether we'll be able to use the same cache structure for ext_authz and ext_proc. I think that at the xDS layer, we should have a separate API for each one. If it turns out that the underlying implementations can actually share code, they can do so easily enough.
…fications, metadata). Signed-off-by: Nao Itoi <[email protected]>
Pushed a commit to take care of "Cache response body and headers from ext_authz, and apply them on cache hit."
Signed-off-by: Nao Itoi <[email protected]>
Signed-off-by: Nao Itoi <[email protected]>
0b8b33f to 207e82c
Signed-off-by: Nao Itoi <[email protected]>
…mplementation can be replaced without modifying ext_authz. Signed-off-by: Nao Itoi <[email protected]>
bool response_cache_remember_body_headers = 35;
// Cache configuration to store response from external auth server.
optional envoy.config.core.v3.TypedExtensionConfig response_cache_config = 30;
- optional envoy.config.core.v3.TypedExtensionConfig response_cache_config = 30;
+ envoy.config.core.v3.TypedExtensionConfig response_cache_config = 30;
optional is the default in proto3.
Good point, thank you. I'll fix it in the next commit.
// Cache configuration to store response from external auth server.
optional envoy.config.core.v3.TypedExtensionConfig response_cache_config = 30;
// Tell cache whether to remember the whole response from authz server (response body, header modification, meta data, status) or just status.
optional bool response_cache_remember_body_headers = 31;
This should be in the extension's configuration; it's likely that some cache types wouldn't support this option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We put a lot of thought into this choice. Let me explain the pros and cons.

Option 1: "remember_body_headers" config is in SimpleCache:
Pro: All cache-related configurations are in one place (inside response_cache_config).
Con: This would make SimpleCache specific to ext_authz. The full response is defined by Filters::Common::ExtAuthz::Response, which is an ext_authz data structure. If SimpleCache uses this, then it can only be used for ext_authz. Earlier, we received feedback suggesting we make the cache independent of the use case, so that it can be used elsewhere (e.g., ext_proc).

Option 2: "remember_body_headers" config is in ExtAuthz:
Pro: Keeps RespCache and SimpleCache agnostic to the type of stored data. We think this is generally a good design - it makes the cache more flexible.
Con: ExtAuthz has two cache-related configurations, instead of one.

The way we think about it, the choice of caching the full response or just the status code is a functionality of ExtAuthz: ExtAuthz does different things depending on this configuration. I do agree with the earlier feedback that suggested decoupling the cache from ExtAuthz, so that it can be used in other contexts.
You'll need another layer that is the implementation behind the typed_config. It will have this boolean setting. It will instantiate a cache (however that is done) and return a pointer to an interface that is implementation-agnostic.
This setting definitely belongs in the typed_config, not here.
I agree that if we have this field, it belongs in the cache config extension, not directly here in the ext_authz filter config.
That having been said, I don't think we should actually have this field at all. See my comment elsewhere about performance considerations: I don't think we want to have to construct a CheckRequest proto or decode a CheckResponse proto in the cache-hit case. Given that, I don't think we want to be caching the CheckResponse proto or its associated headers in the first place.
@@ -0,0 +1,20 @@
syntax = "proto3";
This file needs to be under api/. This is a new type of extension. It could either go in api/envoy/extensions/ext_authz_cache/simple, or maybe it belongs under api/envoy/extensions/filters/http/ext_authz/cache/simple. @envoyproxy/api-shepherds can you provide guidance on the correct location?
I think api/envoy/extensions/ext_authz_cache/simple/v3 is probably the right place.
const std::string auth_header_str(auth_header[0]->value().getStringView());

if (config_->responseCacheConfig().has_value()) {
#ifdef CACHE_PERF_TEST
This type of code should be moved to somewhere under /test. Search around for *_speed_test.cc to see many examples of speed tests and how they're set up, parameterized, etc.
Will do.
if (response_cache_remember_body_headers_) {
  // Initialize the cache that stores full response (body, header modification, metadata, status).
  auto cache_result = Envoy::Common::RespCache::createCache<Filters::Common::ExtAuthz::Response>(
With an extension config, you'll declare a factory interface, and then call something like Config::Utility::getAndCheckFactory<ExtAuthzFactory>(config). And each extension will register its factory with that type.
if (config_->responseCacheRememberBodyHeaders()) {
  auto cached_response =
      config_->responseCache<Filters::Common::ExtAuthz::Response>().Get(headers);
  if (cached_response.has_value()) {
    onCompleteSub(std::make_unique<Filters::Common::ExtAuthz::Response>(
        std::move(cached_response.value())));
    return Http::FilterHeadersStatus::StopIteration;
  }
} else {
  // Retrieve the HTTP status code from the cache
  auto cached_status_code = config_->responseCache<uint16_t>().Get(headers);
  if (cached_status_code.has_value()) {
    if (*cached_status_code >= 200 && *cached_status_code < 300) {
      // Any 2xx response is a success: let the request proceed
      return Http::FilterHeadersStatus::Continue;
    } else {
      // Non-2xx response: reject the request
      decoder_callbacks_->streamInfo().setResponseFlag(
          StreamInfo::CoreResponseFlag::UnauthorizedExternalService);
      decoder_callbacks_->sendLocalReply(
          static_cast<Http::Code>(*cached_status_code), "Unauthorized", nullptr, absl::nullopt,
          Filters::Common::ExtAuthz::ResponseCodeDetails::get().AuthzDenied);

      return Http::FilterHeadersStatus::StopIteration;
    }
All of this logic belongs in the cache extension implementation.
The changes here (in the core ext_authz filter) should be the following:
- Right before the ext_authz gRPC call is made, pass the generated CheckRequest to the cache via absl::optional<CheckResponse> cached = config_->responseCacheConfig().getCachedResponse(check_request);
- If this has a value: onComplete(cached.value()); return;
- When receiving a response from the ext_authz server, call config_->responseCacheConfig().cacheResponse(check_response);
All work to extract various headers, response codes, etc., should be done in your cache extension. The code here should be extremely minimal.
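A minimal, hypothetical sketch of the cache interface this comment describes (the struct fields, class names, and the in-memory implementation are all illustrative, not the PR's actual code; CheckRequest/CheckResponse stand in for the envoy.service.auth.v3 protos):

```cpp
#include <optional>
#include <string>
#include <unordered_map>

// Stand-ins for the envoy.service.auth.v3 protos (illustrative only).
struct CheckRequest {
  std::string token; // e.g. the value of the Authorization header
};
struct CheckResponse {
  int status; // HTTP status returned by the authz server
};

// Hypothetical cache extension interface: the filter only calls these two
// methods; all key extraction and policy lives behind the interface.
class ResponseCache {
public:
  virtual ~ResponseCache() = default;
  // Called right before the ext_authz gRPC call is made.
  virtual std::optional<CheckResponse> getCachedResponse(const CheckRequest& request) = 0;
  // Called when a response is received from the ext_authz server.
  virtual void cacheResponse(const CheckRequest& request, const CheckResponse& response) = 0;
};

// Trivial in-memory implementation, for illustration only.
class MapResponseCache : public ResponseCache {
public:
  std::optional<CheckResponse> getCachedResponse(const CheckRequest& request) override {
    auto it = map_.find(request.token);
    if (it == map_.end()) {
      return std::nullopt;
    }
    return it->second;
  }
  void cacheResponse(const CheckRequest& request, const CheckResponse& response) override {
    map_[request.token] = response;
  }

private:
  std::unordered_map<std::string, CheckResponse> map_;
};
```

With this shape, the filter-side change reduces to a lookup before the gRPC call and a store in the response callback, as the comment suggests.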
I don't think that's actually the right layer at which to be doing this caching. This would mean that even in the cache-hit case, we'd still need to construct the request to the ext_authz server, and we'd still need to decode the cached response from the ext_authz server. For performance reasons, at least in gRPC, I don't think that's going to be feasible.
I think what we actually want here is to use the attributes of the data plane request (see here; I'm not sure what data structure Envoy's implementation uses for this internally) to determine the cache key, and then to have the contents of the cache entry be the already-decoded authz decision. In other words, when we see the data plane request, we do something like this:
absl::optional<AuthzDecision> cached = config_->cache().getCachedResponse(data_plane_request_properties);
if (cached.has_value()) {
// ...use *cached to determine how to handle data plane request...
} else {
// ...construct CheckRequest and send it to the ext_authz server...
}
Then when receiving a response from the ext_authz server:
AuthzDecision authz_decision = decodeCheckResponse(check_response);
config_->cache().cacheAuthzDecision(authz_decision);
// ...use authz_decision to determine how to handle data plane request...
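A minimal, hypothetical sketch of the decision-level cache this comment proposes (AuthzDecision, the cache-key derivation, and the method names are illustrative assumptions; a real implementation would key on a richer set of request attributes):

```cpp
#include <optional>
#include <string>
#include <unordered_map>

// Illustrative decoded authz decision: the cache stores this, not the raw
// CheckResponse proto, so a cache hit requires no proto decoding.
struct AuthzDecision {
  bool allowed;
  int status_code;
};

// Derive the cache key from data-plane request attributes. Here we use only
// the auth header value; a real implementation might combine more attributes.
std::string cacheKey(const std::unordered_map<std::string, std::string>& headers) {
  auto it = headers.find("authorization");
  return it == headers.end() ? std::string() : it->second;
}

class AuthzDecisionCache {
public:
  std::optional<AuthzDecision> getCachedDecision(const std::string& key) const {
    auto it = map_.find(key);
    if (it == map_.end()) {
      return std::nullopt;
    }
    return it->second;
  }
  void cacheAuthzDecision(const std::string& key, AuthzDecision decision) {
    map_[key] = decision;
  }

private:
  std::unordered_map<std::string, AuthzDecision> map_;
};
```

On a miss, the filter would send the CheckRequest as usual, decode the CheckResponse once into an AuthzDecision, and cache that; hits then bypass proto construction and decoding entirely.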
Just to be clear, I am the assigned reviewer for the xDS API, not for the Envoy implementation. It looks like @ggreenway is the reviewer for the Envoy implementation. The reason for this separation is that the xDS API is not just for Envoy, it is also used by other data planes, such as gRPC, so we need to consider more than just what works for Envoy before we approve API changes.
Please let me know if you have any questions about that.
@@ -0,0 +1,20 @@
syntax = "proto3"; |
I think api/envoy/extensions/ext_authz_cache/simple/v3 is probably the right place.
@@ -0,0 +1,209 @@
#pragma once |
I don't think we know enough yet to know whether we'll be able to use the same cache structure for ext_authz and ext_proc. I think that at the xDS layer, we should have a separate API for each one. If it turns out that the underlying implementations can actually share code, they can do so easily enough.
// Cache configuration to store responses from the external auth server.
optional envoy.config.core.v3.TypedExtensionConfig response_cache_config = 30;
// Tells the cache whether to remember the whole response from the authz server
// (response body, header modifications, metadata, status) or just the status.
optional bool response_cache_remember_body_headers = 31;
I agree that if we have this field, it belongs in the cache config extension, not directly here in the ext_authz filter config.
That having been said, I don't think we should actually have this field at all. See my comment elsewhere about performance considerations: I don't think we want to have to construct a CheckRequest proto or decode a CheckResponse proto in the cache-hit case. Given that, I don't think we want to be caching the CheckResponse proto or its associated headers in the first place.
Hey, Andrii from Verkada and the Argo CD community here. I'm curious to see what's going on here, as it seems like more and more feature requests keep getting added. @markdroth, @ggreenway, for the sake of @naoitoi-v's time, is it possible to define clear goals and steps for what's needed to get the proposal and changes accepted, please? @naoitoi-v is excited about open source contributions, but this effort might just die out if he's pushed back too much. Can I help somehow? |
We're having a separate discussion on Slack about this. The reason for most of the additional work is that the initial proposal was a solution optimized for a specific use case, not a general solution, and we are asking for additional work so that adding a general solution, or other specifically-optimized solutions, makes sense in the way Envoy is configured. It is painful to change the configuration after it has been released, so it's important to get it right the first time, but that does involve some extra work. |
Remaining tasks identified so far:
This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
This pull request has been automatically closed because it has not had activity in the last 37 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
Commit Message: This is a draft PR to discuss the design of the response cache in the ext_authz filter.
We are seeking feedback on the design to make sure this is headed in the right direction.
Similar requests were discussed in #37449 and #3023
Additional Description:
Purpose:
We would like to cache the response from an external authorization server in the ext_authz filter for scalability. Depending on the use case, if you do not have to check the authentication token on every request, you can reduce the number of calls to the server by caching the response from it.
Functionality:
To use this functionality, you need to configure the name(s) of the header(s) in which the auth token is carried, e.g., "Authorization" or "x-auth-token." You can also configure the max cache size and TTL. When this is configured, ext_authz will cache the response from the external authorization server.
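A hypothetical configuration sketch of how this might look (the `response_cache_config` field matches the proto added in this PR; the extension type URL and the `header_name`/`max_cache_size`/`ttl` field names are illustrative assumptions, not the PR's final API):

```yaml
http_filters:
- name: envoy.filters.http.ext_authz
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ext_authz.v3.ExtAuthz
    grpc_service:
      envoy_grpc:
        cluster_name: ext_authz_server
    # Hypothetical cache extension config (names are illustrative).
    response_cache_config:
      name: envoy.ext_authz_cache.simple
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.ext_authz_cache.simple.v3.SimpleCacheConfig
        header_name: authorization   # header carrying the auth token (cache key)
        max_cache_size: 10000
        ttl: 300s
```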
Implementation:
We implemented a simple cache: responses are stored in a plain unordered_map, with a simple FIFO eviction policy chosen over LRU to keep the memory footprint small. In our benchmark, each cache key (35 bytes) requires ~100 bytes of memory.
We store a response in the cache in the onComplete() method.
We read a cached response in the decoderHeaders() method.
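A self-contained sketch of the kind of FIFO cache described above (class and method names are illustrative, not the PR's actual code; expired entries are dropped lazily on lookup):

```cpp
#include <chrono>
#include <deque>
#include <optional>
#include <string>
#include <unordered_map>

// Illustrative fixed-capacity cache with FIFO eviction and per-entry TTL,
// as described in the PR: entries live in an unordered_map, and the oldest
// inserted key is evicted first when the cache is full.
template <typename Value> class SimpleFifoCache {
public:
  SimpleFifoCache(size_t max_size, std::chrono::seconds ttl)
      : max_size_(max_size), ttl_(ttl) {}

  void Put(const std::string& key, Value value) {
    const auto now = std::chrono::steady_clock::now();
    if (map_.find(key) == map_.end()) {
      // Evict in insertion order until there is room. Keys already removed
      // (e.g. lazily expired) make these erase calls harmless no-ops.
      while (map_.size() >= max_size_ && !insertion_order_.empty()) {
        map_.erase(insertion_order_.front());
        insertion_order_.pop_front();
      }
      insertion_order_.push_back(key);
    }
    map_.insert_or_assign(key, Entry{std::move(value), now + ttl_});
  }

  std::optional<Value> Get(const std::string& key) {
    auto it = map_.find(key);
    if (it == map_.end()) {
      return std::nullopt;
    }
    if (std::chrono::steady_clock::now() >= it->second.expiry) {
      map_.erase(it); // Lazily drop expired entries.
      return std::nullopt;
    }
    return it->second.value;
  }

private:
  struct Entry {
    Value value;
    std::chrono::steady_clock::time_point expiry;
  };
  size_t max_size_;
  std::chrono::seconds ttl_;
  std::unordered_map<std::string, Entry> map_;
  std::deque<std::string> insertion_order_;
};
```

FIFO trades hit rate for bookkeeping: unlike LRU, it needs no reordering on reads, only a deque of keys in insertion order, which keeps per-entry overhead small.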
Next Steps:
Risk Level: Low. This functionality would be turned off by default.
Testing: TBD
Docs Changes:
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]