feat: add metric pgrst_jwt_cache_size_bytes in admin server #3802
Conversation
Don't forget the feedback about the actual size in bytes #3801 (comment). Ultimately, I believe we're going to need an LRU cache for the JWT cache to be production ready. So having this metric in bytes will be useful now and later.
Force-pushed from 1c8fef6 to 9a23b43 (compare).
Calculating the "actual" byte size of the cache is sort of tricky (Haskell laziness is a lot to deal with sometimes) and I still haven't figured it out yet. In the meantime, I have written some code to approximate the cache size in bytes. It works as follows:
Now, we can use
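To illustrate the idea, here is a minimal sketch of what such an approximation could look like, assuming the cache maps the raw token bytes to the decoded claims; the function names are hypothetical and not the PR's actual code:

```haskell
import qualified Data.Aeson as JSON
import qualified Data.Aeson.KeyMap as KM
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as LBS

-- Hypothetical helper: approximate one entry's footprint from the size of
-- the raw token (the key) plus the re-encoded claims (a proxy for the value).
approxEntrySizeBytes :: BS.ByteString -> KM.KeyMap JSON.Value -> Int
approxEntrySizeBytes rawToken claims =
  BS.length rawToken
    + fromIntegral (LBS.length (JSON.encode (JSON.Object claims)))

-- Summing over all entries yields the approximate total cache size.
approxCacheSizeBytes :: [(BS.ByteString, KM.KeyMap JSON.Value)] -> Int
approxCacheSizeBytes = sum . map (uncurry approxEntrySizeBytes)
```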
Force-pushed from 1254ed5 to 11727ab (compare).
This is pretty cool, so I do see the size starting at 0 then increasing as I do requests:

```
# $ nix-shell
# $ PGRST_ADMIN_SERVER_PORT=3001 PGRST_JWT_CACHE_MAX_LIFETIME=30000 postgrest-with-postgresql-16 -f test/spec/fixtures/load.sql postgrest-run
$ curl localhost:3001/metrics
# HELP pgrst_jwt_cache_size The number of cached JWTs
# TYPE pgrst_jwt_cache_size gauge
pgrst_jwt_cache_size 0.0
$ curl localhost:3000/authors_only -H "Authorization: Bearer $(postgrest-gen-jwt --exp 10 postgrest_test_author)"
[]
$ curl localhost:3001/metrics
..
pgrst_jwt_cache_size 72.0
$ curl localhost:3000/authors_only -H "Authorization: Bearer $(postgrest-gen-jwt --exp 10 postgrest_test_author)"
[]
$ curl localhost:3001/metrics
..
pgrst_jwt_cache_size 144.0
$ curl localhost:3000/authors_only -H "Authorization: Bearer $(postgrest-gen-jwt --exp 10 postgrest_test_author)"
[]
$ curl localhost:3001/metrics
..
pgrst_jwt_cache_size 216.0
```

Of course this doesn't drop down after a while because we need #3801 for that. One issue that I've noticed is that we're printing empty log lines for each request:
This is due to the addition of the new Observation (postgrest/src/PostgREST/Observation.hs, line 60 in 9a23b43) and the following line here: postgrest/src/PostgREST/Observation.hs, line 146 in 9a23b43.
This is surprising behavior, I'll try to refactor this in a new PR. For now, how about printing a message like:
This should only happen for a log level greater than debug; check how this is done in postgrest/src/PostgREST/Logger.hs, lines 79 to 96 in 9a23b43.
The above is understandable. What we really need is a good enough approximation so we have an order-of-magnitude understanding to see if the cache size is in KB, MB or GB. So far this seems enough. We should definitely document why we do an approximation, though.
I've just noticed that perf drops badly on this PR:
https://github.com/PostgREST/postgrest/pull/3802/checks?check_run_id=34517395528
The recursiveSize function says:
@taimoorzaeem What could we do to avoid this drop? Maybe calculate the cache size periodically on a background thread? Any thoughts?
Force-pushed from 11727ab to 145c236 (compare).
Now that I have gotten better at Haskell 🚀, I solved the issue and we don't need an "approximation". We CAN calculate the full cache size in bytes.
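The exact measurement builds on the ghc-datasize package (see the overlay change below). A minimal sketch of the idea, assuming the cached value has an NFData instance so it can be forced before measuring; the wrapper name is illustrative:

```haskell
import Control.DeepSeq (NFData)
import GHC.DataSize (recursiveSizeNF)

-- Illustrative wrapper: recursiveSizeNF forces the value to normal form and
-- then walks its heap representation, returning the total size in bytes.
-- Forcing first is what sidesteps the laziness problem: otherwise
-- unevaluated thunks would be measured instead of the cached data itself.
cacheSizeInBytes :: NFData cache => cache -> IO Word
cacheSizeInBytes = recursiveSizeNF
```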
Force-pushed from 4b1afd1 to d9de43a (compare).
Load test and memory test are failing with:
Is there any additional configuration that needs to be added for profiling?
nix/overlays/haskell-packages.nix (outdated):

```nix
# nixpkgs have ghc-datasize-0.2.7 marked as broken
ghc-datasize = lib.markUnbroken prev.ghc-datasize;
```
Suggested change:

```nix
# TODO: Remove this once https://github.com/NixOS/nixpkgs/pull/375121
# has made it to us.
ghc-datasize = lib.markUnbroken prev.ghc-datasize;
```
Does it happen locally, too? I am confused, can't exactly spot what's wrong right now.
Yes, it does happen locally.
Force-pushed from 7ff02db to 6eb7d3c (compare).
Need some way to run
Yeah, running the loadtest in CI is not working when dependencies are changed, because it needs to run against the base branch, which doesn't have the dependencies yet... and then cabal just breaks it somehow. You can run something like this locally to get the same markdown output:
(but it's likely that it fails the same way... :D)
The thing I don't understand about this error message is that it appears in contexts where we don't use profiling libs. We do for the memory test, so the error message makes "kind of sense" (I still don't understand why it happens, though). But for the loadtest and the regular build on MacOS... those don't use profiling. Hm... actually, we don't do a regular dynamically linked linux build via nix, I think. So in fact it fails for every nix build except the static build. Still don't know what's happening, though.
I get a similar error when trying this locally too.
To unblock the PR, how about only adding the dependency in another PR and then merging it? Then I assume this PR would run the loadtest?
No, I don't think so. It seems the dependency issue was fixed a while ago in 0c5d2e5. Also, the fact that all nix builds fail (the memory test, on darwin, etc.) indicates that there is something else going on.
Force-pushed from ada4cf3 to 6122044 (compare).
Force-pushed from 6122044 to 7763527 (compare).
@steve-chavez I think the PR is ready. Could you review/bench this?
Force-pushed from 7763527 to feb9c9d (compare).
The above remarks from #3802 (comment) are no longer true. They look fixed. There's still a drop in perf when lots of new JWTs arrive:

```
# $ export PGRSTBENCH_EC2_CLIENT_INSTANCE_TYPE="m5a.4xlarge"
# $ postgrest-bench-deploy
# using v12.2.8
$ postgrest-bench-k6 10 k6/GETSingleDifferentJWT.js
Running k6 with 10 vus
Pseudo-terminal will not be allocated because stdin is not a terminal.
Pseudo-terminal will not be allocated because stdin is not a terminal.
data_received..................: 2.2 MB 73 kB/s
data_sent......................: 3.5 MB 116 kB/s
http_req_blocked...............: avg=3.98µs min=1.14µs med=2.4µs max=1.34ms p(90)=2.95µs p(95)=3.25µs
http_req_connecting............: avg=802ns min=0s med=0s max=908.62µs p(90)=0s p(95)=0s
✓ http_req_duration..............: avg=43.05ms min=2.94ms med=32.66ms max=379.5ms p(90)=84.76ms p(95)=115.8ms
{ expected_response:true }...: avg=43.05ms min=2.94ms med=32.66ms max=379.5ms p(90)=84.76ms p(95)=115.8ms
✓ http_req_failed................: 0.00% ✓ 0 ✗ 6950
http_req_receiving.............: avg=45.35µs min=22.56µs med=44.76µs max=532.59µs p(90)=52.06µs p(95)=63.82µs
http_req_sending...............: avg=19.47µs min=7.81µs med=13.14µs max=757.21µs p(90)=28.87µs p(95)=30.25µs
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=42.99ms min=2.83ms med=32.6ms max=379.42ms p(90)=84.71ms p(95)=115.71ms
http_reqs......................: 6950 231.201961/s
iteration_duration.............: avg=43.2ms min=3.13ms med=32.82ms max=379.62ms p(90)=84.89ms p(95)=115.93ms
iterations.....................: 6950 231.201961/s
vus............................: 10 min=10 max=10
vus_max........................: 10 min=10 max=10
# Using this PR
$ postgrest-bench-k6 10 k6/GETSingleDifferentJWT.js
Running k6 with 10 vus
Pseudo-terminal will not be allocated because stdin is not a terminal.
Pseudo-terminal will not be allocated because stdin is not a terminal.
data_received..................: 392 kB 13 kB/s
data_sent......................: 625 kB 21 kB/s
http_req_blocked...............: avg=10.13µs min=1.81µs med=2.36µs max=1.09ms p(90)=2.87µs p(95)=3.23µs
http_req_connecting............: avg=4.29µs min=0s med=0s max=610.92µs p(90)=0s p(95)=0s
✓ http_req_duration..............: avg=239.85ms min=33.81ms med=226.4ms max=685.51ms p(90)=374.79ms p(95)=423.53ms
{ expected_response:true }...: avg=239.85ms min=33.81ms med=226.4ms max=685.51ms p(90)=374.79ms p(95)=423.53ms
✓ http_req_failed................: 0.00% ✓ 0 ✗ 1252
http_req_receiving.............: avg=48.18µs min=24.22µs med=46.93µs max=155.42µs p(90)=54.81µs p(95)=72.56µs
http_req_sending...............: avg=19.26µs min=8.57µs med=12.78µs max=193.15µs p(90)=29.16µs p(95)=30.92µs
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=239.79ms min=33.75ms med=226.34ms max=685.42ms p(90)=374.73ms p(95)=423.45ms
http_reqs......................: 1252 41.592355/s
iteration_duration.............: avg=240.02ms min=33.96ms med=226.53ms max=685.66ms p(90)=374.95ms p(95)=423.66ms
iterations.....................: 1252 41.592355/s
vus............................: 10 min=10 max=10
vus_max........................: 10 min=10 max=10
```

So from ~231 req/s to 41 req/s, ~82% loss. @taimoorzaeem Any idea why this happens?
@steve-chavez This happened because of a lot of new JWTs, which means the expensive operation of cache entry size calculation (essentially several system calls in the background) happens with every request, which makes it slow. If we save all the JWTs in the first benchmark run and then re-use the saved JWTs in the second benchmark run, I think the performance would be good (given an appropriate expiration time in the test script, which is 30s for now).
I thought the debounce logic we now have would prevent this, though? Shouldn't the expensive calculation only happen at most twice per second, regardless of the number of requests? Edit: right, so maybe we need to increase the debounce frequency.
```haskell
-- debounceFreq is set to default 1 second
{ debounceAction = calculateSizeThenLog
, debounceEdge = leadingEdge -- logs at the start and the end
}
```
This sets it to 1 second. Maybe we need to increase it for better perf?
@taimoorzaeem How about adjusting debounceFreq and seeing how it affects performance? Maybe try 2, 5 and 10 seconds.
Also, you were able to run the benchmark locally before (#3802 (comment)), could you try that here again? (It's going to take me longer to deploy AWS machines with postgrest-benchmark, and we don't really need to do that anymore.)
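For reference, the field names in the quoted snippet match the auto-update package's Control.Debounce. A sketch of what raising the frequency could look like (the 5-second value is just one of the candidates suggested above, and mkCacheSizeLogger / calculateSizeThenLog are illustrative stand-ins):

```haskell
import Control.Debounce
  (mkDebounce, defaultDebounceSettings,
   debounceAction, debounceFreq, debounceEdge, leadingEdge)

-- Returns a debounced trigger: calling it many times per period still runs
-- the expensive size calculation at most once per debounceFreq interval.
mkCacheSizeLogger :: IO () -> IO (IO ())
mkCacheSizeLogger calculateSizeThenLog =
  mkDebounce defaultDebounceSettings
    { debounceAction = calculateSizeThenLog
    , debounceFreq   = 5000000     -- in microseconds, i.e. 5 seconds
    , debounceEdge   = leadingEdge -- fire at the start of a burst
    }
```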
The debounce logic would have prevented it if we were calculating the size of the cache in one go, i.e. one atomic operation, but we are not. As discussed in #3802 (comment), we need to calculate the size at the time of storing the entry in the cache (which has its own tradeoffs), which is why debouncing isn't helping much (except for only logging the size once).
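To make that trade-off concrete, here is a sketch (illustrative names only, not the PR's code) of measuring each entry at insertion time and keeping a running total; since the heap walk happens inside the insert path, a burst of distinct JWTs pays the cost on every request no matter how the logging is debounced:

```haskell
import Control.DeepSeq (NFData)
import Data.IORef (IORef, atomicModifyIORef')
import GHC.DataSize (recursiveSizeNF)

-- Hypothetical insert path: measure the new value, add it to the running
-- total, then store it with whatever insert function the cache provides.
insertWithSize :: NFData v => IORef Word -> (k -> v -> IO ()) -> k -> v -> IO ()
insertWithSize totalBytes insert key value = do
  size <- recursiveSizeNF value
  atomicModifyIORef' totalBytes (\total -> (total + size, ()))
  insert key value
```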
Ah, right. That is bad news. There's no way around it.
At this point it's not looking like this perf loss is worth it for new requests, and doing a syscall every second is definitely looking bad. It seems that GHC/Haskell is just not good at this 😿. Maybe we should just concede to an LRU cache where we can only limit the number of entries, but not the memory consumption. Notes:
Agreed.
Yes, this seems like our best bet for now.
@steve-chavez Where should we go from here? If this metric is very useful, maybe we can run this calculation on the admin server endpoint
@steve-chavez @taimoorzaeem @wolfgangwalther Maybe I am missing something, but it should be OK to estimate the size of the cache entry as roughly q times the raw token length,
where q > 2. There is no decompression step when converting from the raw token to its decoded form.
IIUC, the goal is to calculate the size of the Haskell structures. The token is decoded, the payload is extracted and then it ends up as this:

```haskell
data AuthResult = AuthResult
  { authClaims :: KM.KeyMap JSON.Value
  , authRole   :: BS.ByteString
  }
```

Depending on whether we are dealing, for example, with many small claims or a few bigger claims, I assume the actual memory this structure needs would be different. Edit: Whether the difference between extrapolating from the raw token length and the exact measurement would be big enough to justify this exact calculation, I don't know.
One thing about the JWT cache is not clear to me: we are storing the final … This would require much less memory, I guess. But we'd very likely lose performance.
The size of the cache entry should be O(length raw_token), so it should be fine to just multiply it by a constant, I think (we could measure the constant to make it more exact, but my guess is that 3 is a safe bet).
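A tiny sketch of that estimate, with the multiplier kept as a parameter so it could be measured instead of guessed (3 is just the guess from this thread; the function name is illustrative):

```haskell
import qualified Data.ByteString as BS

-- Estimate the decoded entry's memory as a constant multiple of the raw
-- token length; no heap traversal, so it is essentially free per request.
estimatedEntrySizeBytes :: Int -> BS.ByteString -> Int
estimatedEntrySizeBytes q rawToken = q * BS.length rawToken

-- e.g. estimatedEntrySizeBytes 3 rawToken
```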
I remember I reported the issue describing the need for the cache, because asymmetric cryptography is CPU intensive and slow. I would say caching just the validation results would be fine. On the other hand, I am not sure if JOSE provides an API that would allow you to decode a token without validating it.
That's not too hard to do manually. Split the JWT by "."
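A sketch of that manual decode, assuming the base64-bytestring and aeson packages; it only inspects the payload and performs no signature validation (the function name is illustrative):

```haskell
import qualified Data.Aeson as JSON
import qualified Data.ByteString.Base64.URL as B64URL
import qualified Data.ByteString.Char8 as BS
import qualified Data.ByteString.Lazy as LBS

-- Split the compact JWT on '.', base64url-decode the middle (payload) part
-- and parse it as JSON. The signature is ignored entirely.
decodeClaimsUnverified :: BS.ByteString -> Maybe JSON.Value
decodeClaimsUnverified jwt =
  case BS.split '.' jwt of
    [_header, payload, _sig] ->
      JSON.decode (LBS.fromStrict (B64URL.decodeLenient payload))
    _ -> Nothing
```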
@taimoorzaeem I think we should not continue pursuing this PR and instead go right away to the LRU cache solution, limited by entries. It's clear from the research here that Haskell is not suited for memory usage calculation. It's complicated to implement, has heavy perf implications, and overall I see too much that can go wrong.
Closing this PR as we've decided to not pursue it anymore.
Add metric pgrst_jwt_cache_size_bytes in the admin server, which shows the cache size in bytes.