feat: add metric pgrst_jwt_cache_size_bytes in admin server #3802

Closed

Conversation

@taimoorzaeem
Collaborator

taimoorzaeem commented Nov 26, 2024

Add metric pgrst_jwt_cache_size_bytes in admin server which shows the cache size in bytes.

@steve-chavez
Member

Don't forget the feedback about the actual size in bytes #3801 (comment)

Ultimately, I believe we're going to need an LRU cache for the JWT cache to be production ready. So having this metric in bytes will be useful now and later.

@taimoorzaeem
Collaborator Author

Don't forget the feedback about the actual size in bytes #3801 (comment)

Calculating "actual" byte size of cache is sort of tricky (haskell laziness is a lot to deal with sometimes) and I still haven't figured it out YET. In the meantime, I have written some code to approximate the cache size in bytes.

It works as follows:

Data.Cache provides a toList function which returns the cache entries as a list of tuples ([(ByteString, AuthResult, Maybe TimeSpec)] in our case).

Now, we can use the ghc-datasize library to calculate the byte size of the ByteString and AuthResult, but not the Maybe TimeSpec (recursiveSizeNF only works on types that are instances of the NFData typeclass), hence I am calling it an "approximation".
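
In code, that approach amounts to something like the sketch below. The helper name is hypothetical; it assumes PostgREST's AuthResult type has the NFData instance that recursiveSizeNF requires, and it simply skips the Maybe TimeSpec field, which is what makes the result an approximation.

import           Control.Monad (forM)
import           Data.ByteString (ByteString)
import qualified Data.Cache as C
import           GHC.DataSize (recursiveSizeNF)

-- Sum the fully-evaluated sizes of keys and values; the expiry
-- (Maybe TimeSpec) is left out of the total.
approxJwtCacheSize :: C.Cache ByteString AuthResult -> IO Word
approxJwtCacheSize cache = do
  entries <- C.toList cache
  sizes   <- forM entries $ \(key, result, _expiry) -> do
    keySize    <- recursiveSizeNF key
    resultSize <- recursiveSizeNF result
    pure (keySize + resultSize)
  pure (sum sizes)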

@taimoorzaeem force-pushed the metric/jwt-cache-size branch 2 times, most recently from 1254ed5 to 11727ab on December 17, 2024 06:50
@steve-chavez
Member

This is pretty cool; I do see the size starting at 0 and then increasing as I make requests:

# $ nix-shell
# $ PGRST_ADMIN_SERVER_PORT=3001  PGRST_JWT_CACHE_MAX_LIFETIME=30000 postgrest-with-postgresql-16  -f test/spec/fixtures/load.sql postgrest-run

$ curl localhost:3001/metrics
# HELP pgrst_jwt_cache_size The number of cached JWTs
# TYPE pgrst_jwt_cache_size gauge
pgrst_jwt_cache_size 0.0

$ curl localhost:3000/authors_only -H "Authorization: Bearer $(postgrest-gen-jwt --exp 10 postgrest_test_author)"
[]
$ curl localhost:3001/metrics
..
pgrst_jwt_cache_size 72.0

$ curl localhost:3000/authors_only -H "Authorization: Bearer $(postgrest-gen-jwt --exp 10 postgrest_test_author)"
[]
$ curl localhost:3001/metrics
..
pgrst_jwt_cache_size 144.0

$ curl localhost:3000/authors_only -H "Authorization: Bearer $(postgrest-gen-jwt --exp 10 postgrest_test_author)"
[]
$ curl localhost:3001/metrics
..
pgrst_jwt_cache_size 216.0

Of course this doesn't drop down after a while because we need #3801 for that.

One issue that I've noticed is that we're printing empty log lines for each request:

[nix-shell:~/Projects/postgrest]$ PGRST_ADMIN_SERVER_PORT=3001  PGRST_JWT_CACHE_MAX_LIFETIME=30000 postgrest-with-postgresql-16  -f test/spec/fixtures/load.sql postgrest-run
...
18/Jan/2025:18:16:32 -0500: Schema cache loaded in 17.1 milliseconds
18/Jan/2025:18:16:34 -0500: 
18/Jan/2025:18:16:38 -0500: 
18/Jan/2025:18:16:42 -0500: 

This is due to the addition of the new Observation

And the following line here:

This is surprising behavior, I'll try to refactor this in a new PR.

For now, how about printing a message like:

JWTCache sz -> "The JWT Cache size increased to " <> sz <> " bytes"

This should only happen for a log level of debug or higher; check how this is done in:

observationLogger :: LoggerState -> LogLevel -> ObservationHandler
observationLogger loggerState logLevel obs = case obs of
  o@(PoolAcqTimeoutObs _) -> do
    when (logLevel >= LogError) $ do
      logWithDebounce loggerState $
        logWithZTime loggerState $ observationMessage o
  o@(QueryErrorCodeHighObs _) -> do
    when (logLevel >= LogError) $ do
      logWithZTime loggerState $ observationMessage o
  o@(HasqlPoolObs _) -> do
    when (logLevel >= LogDebug) $ do
      logWithZTime loggerState $ observationMessage o
  PoolRequest ->
    pure ()
  PoolRequestFullfilled ->
    pure ()
  o ->
    logWithZTime loggerState $ observationMessage o
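
A case for the new observation could plausibly follow the same LogDebug-gated pattern; this is only a sketch, and it assumes a constructor named JWTCache carrying the size, matching the suggestion above:

  o@(JWTCache _) -> do
    when (logLevel >= LogDebug) $ do
      logWithZTime loggerState $ observationMessage o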

@steve-chavez
Member

Calculating "actual" byte size of cache is sort of tricky (haskell laziness is a lot to deal with sometimes) and I still haven't figured it out YET. In the meantime, I have written some code to approximate the cache size in bytes.

The above is understandable. What we really need is a good enough approximation so we have an order-of-magnitude understanding to see if the cache size is in KB, MB or GB. So far this seems enough. We should definitely document why we do an approximation though.

@steve-chavez
Member

I've just noticed that perf drops badly on this PR:

param        v12.2.3   head   main
throughput   448       121    399

https://github.com/PostgREST/postgrest/pull/3802/checks?check_run_id=34517395528

The recursiveSize function says:

This function works very quickly on small data structures, but can be slow on large and complex ones. If speed is an issue it's probably possible to get the exact size of a small portion of the data structure and then estimate the total size from that.

@taimoorzaeem What could we do to avoid this drop? Maybe calculate the cache size periodically on a background thread? Any thoughts?
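
For reference, one way the background-thread idea could look (hypothetical names): recompute the size every few seconds and publish it through an IORef that the metrics endpoint reads, instead of measuring on every request. The measuring action could be the toList-based approximation sketched earlier.

import Control.Concurrent (forkIO, threadDelay)
import Control.Monad (forever, void)
import Data.IORef (IORef, writeIORef)

startSizeSampler :: IO Word -> IORef Word -> IO ()
startSizeSampler measureCacheSize sizeRef =
  void . forkIO . forever $ do
    sz <- measureCacheSize      -- e.g. walk the cache with toList
    writeIORef sizeRef sz       -- publish for the metrics endpoint
    threadDelay 5000000         -- sample every 5 seconds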

@taimoorzaeem force-pushed the metric/jwt-cache-size branch from 11727ab to 145c236 on January 19, 2025 14:41
@taimoorzaeem
Collaborator Author

Calculating "actual" byte size of cache is sort of tricky (haskell laziness is a lot to deal with sometimes) and I still haven't figured it out YET. In the meantime, I have written some code to approximate the cache size in bytes.

The above is understandable. What we really need is a good enough approximation so we have an order-of-magnitude understanding to see if the cache size is in KB, MB or GB. So far this seems enough. We should definitely document why we do an approximation though.

Now that I have gotten better at Haskell 🚀, I solved the issue and we don't need an "approximation". We CAN calculate the full cache size in bytes.

@taimoorzaeem force-pushed the metric/jwt-cache-size branch 3 times, most recently from 4b1afd1 to d9de43a on January 19, 2025 18:11
@taimoorzaeem
Collaborator Author

taimoorzaeem commented Jan 19, 2025

Load test and memory test failing with:

src/PostgREST/Auth.hs:44:1: error:
    Could not find module ‘GHC.DataSize’
    Perhaps you haven't installed the profiling libraries for package ‘ghc-datasize-0.2.7’?
    Use -v (or `:set -v` in ghci) to see a list of the files searched for.
   |
44 | import GHC.DataSize            (recursiveSizeNF)

Is there any additional configuration that needs to be added for profiling?

Comment on lines 53 to 55
# nixpkgs have ghc-datasize-0.2.7 marked as broken
ghc-datasize = lib.markUnbroken prev.ghc-datasize;

Suggested change
# nixpkgs have ghc-datasize-0.2.7 marked as broken
ghc-datasize = lib.markUnbroken prev.ghc-datasize;
# TODO: Remove this once https://github.com/NixOS/nixpkgs/pull/375121
# has made it to us.
ghc-datasize = lib.markUnbroken prev.ghc-datasize;

@wolfgangwalther
Member

Load test and memory test failing with:

Does it happen locally, too?

I am confused, can't exactly spot what's wrong right now.

@taimoorzaeem
Collaborator Author

Does it happen locally, too?

I am confused, can't exactly spot what's wrong right now.

Yes, it does happen locally.

[nix-shell]$ postgrest-loadtest
...
src/PostgREST/Auth.hs:44:1: error:
    Could not find module ‘GHC.DataSize’
    Perhaps you haven't installed the profiling libraries for package ‘ghc-datasize-0.2.7’?
    Use -v (or `:set -v` in ghci) to see a list of the files searched for.
   |
44 | import GHC.DataSize            (recursiveSizeNF)
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@taimoorzaeem force-pushed the metric/jwt-cache-size branch 2 times, most recently from 7ff02db to 6eb7d3c on January 20, 2025 17:19
@taimoorzaeem
Collaborator Author

I need some way to run postgrest-loadtest on the CI. Currently it is failing because of building with Nix. Running PGRST_BUILD_CABAL=1 postgrest-loadtest works locally, but I am not sure how to set this up on CI temporarily to check the loadtest results. Is running the loadtest on CI equivalent to running it locally? Would I get the same results?

@wolfgangwalther
Member

wolfgangwalther commented Jan 20, 2025

Yeah, running the loadtest in CI is not working when dependencies are changed, because it needs to run against the base branch, which doesn't have the dependencies, yet... and then cabal just breaks it somehow.

You can run something like this locally to get the same markdown output:

postgrest-loadtest-against main v12.2.5
postgrest-loadtest-report

(but it's likely that it fails the same way... :D)

@wolfgangwalther
Member

Perhaps you haven't installed the profiling libraries for package ‘ghc-datasize-0.2.7’?

The thing I don't understand about this error message is that it appears in contexts where we don't use profiling libs. We do for the memory test, so there the error message makes "kind of sense" (I still don't understand why it happens, though). But for the loadtest and the regular build on MacOS... those don't use profiling.

Hm... actually, we don't do a regular dynamically linked Linux build via Nix, I think. So in fact it fails for every Nix build except the static build. Still don't know what's happening, though.

@steve-chavez
Member

[nix-shell]$ postgrest-loadtest
...
src/PostgREST/Auth.hs:44:1: error:

I get a similar error when trying this locally too.

Yeah, running the loadtest in CI is not working when dependencies are changed, because it needs to run against the base branch, which doesn't have the dependencies, yet... and then cabal just breaks it somehow.

To unblock the PR, how about only adding the dependency in another PR and then merging it? Then I assume this PR would run the loadtest?

@wolfgangwalther
Member

To unblock the PR, how about only adding the dependency in another PR and then merging it? Then I assume this PR would run the loadtest?

No, I don't think so. It seems the dependency issue was fixed a while ago in 0c5d2e5. Also, the fact that all Nix builds fail (the memory test, on Darwin, etc.) indicates that there is something else going on.

@taimoorzaeem force-pushed the metric/jwt-cache-size branch 3 times, most recently from ada4cf3 to 6122044 on March 23, 2025 17:28
@taimoorzaeem marked this pull request as draft on March 23, 2025 17:29
@taimoorzaeem force-pushed the metric/jwt-cache-size branch from 6122044 to 7763527 on March 24, 2025 07:10
@taimoorzaeem marked this pull request as ready for review on March 24, 2025 07:13
@taimoorzaeem
Collaborator Author

@steve-chavez I think the PR is ready. Could you review/bench this?

@taimoorzaeem force-pushed the metric/jwt-cache-size branch from 7763527 to feb9c9d on March 26, 2025 14:10
@steve-chavez
Member

steve-chavez commented Mar 26, 2025

The CPU usage stays high, even when the load test is finished. This doesn't happen on v12.2.8.
And the JWT cache size stays fixed at some number:

The above remarks from #3802 (comment) are no longer true. They look fixed.

There's still a drop in perf when lots of new JWTs arrive:

# $ export PGRSTBENCH_EC2_CLIENT_INSTANCE_TYPE="m5a.4xlarge"
# $ postgrest-bench-deploy

# using v12.2.8
$ postgrest-bench-k6 10 k6/GETSingleDifferentJWT.js 

Running k6 with 10 vus
Pseudo-terminal will not be allocated because stdin is not a terminal.
Pseudo-terminal will not be allocated because stdin is not a terminal.

     data_received..................: 2.2 MB 73 kB/s
     data_sent......................: 3.5 MB 116 kB/s
     http_req_blocked...............: avg=3.98µs  min=1.14µs  med=2.4µs   max=1.34ms   p(90)=2.95µs  p(95)=3.25µs  
     http_req_connecting............: avg=802ns   min=0s      med=0s      max=908.62µs p(90)=0s      p(95)=0s      
   ✓ http_req_duration..............: avg=43.05ms min=2.94ms  med=32.66ms max=379.5ms  p(90)=84.76ms p(95)=115.8ms 
       { expected_response:true }...: avg=43.05ms min=2.94ms  med=32.66ms max=379.5ms  p(90)=84.76ms p(95)=115.8ms 
   ✓ http_req_failed................: 0.00%  ✓ 0          ✗ 6950
     http_req_receiving.............: avg=45.35µs min=22.56µs med=44.76µs max=532.59µs p(90)=52.06µs p(95)=63.82µs 
     http_req_sending...............: avg=19.47µs min=7.81µs  med=13.14µs max=757.21µs p(90)=28.87µs p(95)=30.25µs 
     http_req_tls_handshaking.......: avg=0s      min=0s      med=0s      max=0s       p(90)=0s      p(95)=0s      
     http_req_waiting...............: avg=42.99ms min=2.83ms  med=32.6ms  max=379.42ms p(90)=84.71ms p(95)=115.71ms
     http_reqs......................: 6950   231.201961/s
     iteration_duration.............: avg=43.2ms  min=3.13ms  med=32.82ms max=379.62ms p(90)=84.89ms p(95)=115.93ms
     iterations.....................: 6950   231.201961/s
     vus............................: 10     min=10       max=10
     vus_max........................: 10     min=10       max=10

# Using this PR
$ postgrest-bench-k6 10 k6/GETSingleDifferentJWT.js 

Running k6 with 10 vus
Pseudo-terminal will not be allocated because stdin is not a terminal.
Pseudo-terminal will not be allocated because stdin is not a terminal.

     data_received..................: 392 kB 13 kB/s
     data_sent......................: 625 kB 21 kB/s
     http_req_blocked...............: avg=10.13µs  min=1.81µs  med=2.36µs   max=1.09ms   p(90)=2.87µs   p(95)=3.23µs  
     http_req_connecting............: avg=4.29µs   min=0s      med=0s       max=610.92µs p(90)=0s       p(95)=0s      
   ✓ http_req_duration..............: avg=239.85ms min=33.81ms med=226.4ms  max=685.51ms p(90)=374.79ms p(95)=423.53ms
       { expected_response:true }...: avg=239.85ms min=33.81ms med=226.4ms  max=685.51ms p(90)=374.79ms p(95)=423.53ms
   ✓ http_req_failed................: 0.00%  ✓ 0         ✗ 1252
     http_req_receiving.............: avg=48.18µs  min=24.22µs med=46.93µs  max=155.42µs p(90)=54.81µs  p(95)=72.56µs 
     http_req_sending...............: avg=19.26µs  min=8.57µs  med=12.78µs  max=193.15µs p(90)=29.16µs  p(95)=30.92µs 
     http_req_tls_handshaking.......: avg=0s       min=0s      med=0s       max=0s       p(90)=0s       p(95)=0s      
     http_req_waiting...............: avg=239.79ms min=33.75ms med=226.34ms max=685.42ms p(90)=374.73ms p(95)=423.45ms
     http_reqs......................: 1252   41.592355/s
     iteration_duration.............: avg=240.02ms min=33.96ms med=226.53ms max=685.66ms p(90)=374.95ms p(95)=423.66ms
     iterations.....................: 1252   41.592355/s
     vus............................: 10     min=10      max=10
     vus_max........................: 10     m

So from ~231 req/s to 41 req/s, ~82% loss. @taimoorzaeem Any idea why this happens?

@taimoorzaeem
Collaborator Author

taimoorzaeem commented Mar 26, 2025

So from ~231 req/s to 41 req/s, ~82% loss. @taimoorzaeem Any idea why this happens?

@steve-chavez This happened because of the large number of new JWTs, which means the expensive cache entry size calculation happens on every request (essentially several system calls in the background), which makes it slow.

If we save all the JWTs in the first benchmark run and then re-use the saved JWTs in the second benchmark run, I think the performance would be good (given an appropriate expiration time in the test script, which is 30s for now).

@steve-chavez
Member

steve-chavez commented Mar 26, 2025

This happened because of the large number of new JWTs, which means the expensive cache entry size calculation happens on every request (essentially several system calls in the background), which makes it slow.

I thought the debounce logic we now have would prevent this, though? Shouldn't the expensive calculation only happen at most twice per second, regardless of the number of requests?

Edit: right, so maybe we need to increase the debounce frequency.

Comment on lines +119 to +122
-- debounceFreq is set to default 1 second
{ debounceAction = calculateSizeThenLog
, debounceEdge = leadingEdge -- logs at the start and the end
}

This sets it to 1 second. Maybe we need to increase it for better perf?


@taimoorzaeem How about adjusting debounceFreq and seeing how it affects performance? Maybe try 2, 5 and 10 seconds.

Also, you were able to run the benchmark locally before (#3802 (comment)); could you try that here again? (It's going to take me longer to deploy AWS machines with postgrest-benchmark, and we don't really need to do that anymore.)
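
For reference, a sketch of what bumping that interval could look like with Control.Debounce from auto-update (debounceFreq is in microseconds; calculateSizeThenLog stands in for the existing action):

import Control.Debounce (mkDebounce, defaultDebounceSettings,
                         debounceAction, debounceFreq, debounceEdge, leadingEdge)

mkSizeLogger :: IO () -> IO (IO ())
mkSizeLogger calculateSizeThenLog =
  mkDebounce defaultDebounceSettings
    { debounceAction = calculateSizeThenLog
    , debounceFreq   = 5000000     -- 5 seconds instead of the 1-second default
    , debounceEdge   = leadingEdge
    }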

@taimoorzaeem
Collaborator Author

I thought the debounce logic we now have would prevent this, though? Shouldn't the expensive calculation only happen at most twice per second, regardless of the number of requests?

The debounce logic would have prevented it if we were calculating the size of the cache in one go, i.e. one atomic operation, but we are not. As discussed in #3802 (comment), we need to calculate the size at the time of storing the entry in the cache (which has its own tradeoffs), which is why debouncing isn't helping much (except for only logging the size once).
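
A sketch of what measuring at insertion time could look like, reusing the imports from the earlier sketch (the helper name and the running-total IORef are hypothetical): the total is updated per insert, so reading the metric is cheap, but every insert pays for a measurement.

import Data.IORef (IORef, modifyIORef')

insertWithSize :: IORef Word -> C.Cache ByteString AuthResult
               -> ByteString -> AuthResult -> IO ()
insertWithSize totalRef cache key result = do
  keySize    <- recursiveSizeNF key     -- forces and walks the key
  resultSize <- recursiveSizeNF result  -- forces and walks the claims
  C.insert cache key result
  modifyIORef' totalRef (+ (keySize + resultSize))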

@steve-chavez
Member

As discussed in #3802 (comment), we need to calculate the size at the time of storing the entry in the cache (which has its own tradeoffs), which is why debouncing isn't helping much (except for only logging the size once).

Ah, right. That is bad news. There's no way around it.

#3802 (comment)
So from ~231 req/s to 41 req/s, ~82% loss
calculation happens on every request (essentially several system calls in the background)

At this point it's not looking like this perf loss is worth it for new requests, and doing a syscall every second is definitely looking bad. It seems that GHC/Haskell is just not good at this 😿

Maybe we should just concede to an LRU cache where we can only limit the number of entries, but not the memory consumption.


Notes:

  • Was hoping for a "size" claim, but there isn't one: https://www.iana.org/assignments/jwt/jwt.xhtml#claims
  • Discussions around JWT size mention that it could reach 11KB or 8KB (ref1, ref2)
    • So given that, we could try to default to a maximum of 1000 entries (see the sketch below).
    • And then note in the docs that it's up to the user to estimate and increase this limit, and possibly limit the size of the JWT at the proxy level.
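
For illustration, an entry-bounded cache along those lines could look like the sketch below. It uses Data.Cache.LRU from the lrucache package, which is an assumption for the example, not what PostgREST ships; AuthResult is PostgREST's existing type.

import qualified Data.Cache.LRU as LRU
import           Data.ByteString (ByteString)

-- Cap the JWT cache at 1000 entries; once the limit is reached the least
-- recently used entry is evicted, so memory is bounded indirectly by
-- entry count rather than measured in bytes.
emptyJwtCache :: LRU.LRU ByteString AuthResult
emptyJwtCache = LRU.newLRU (Just 1000)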

@taimoorzaeem marked this pull request as draft on March 27, 2025 12:56
@taimoorzaeem
Collaborator Author

At this point it's not looking like this perf loss is worth it for new requests, and doing a syscall every second is definitely looking bad. It seems that GHC/Haskell is just not good at this 😿

Agreed.

Maybe we should just concede to an LRU cache where we can only limit the number of entries, but not the memory consumption

Yes, this seems like our best bet for now.

@taimoorzaeem
Collaborator Author

@steve-chavez Where should we go from here? If this metric is very useful, maybe we can run this calculation on the admin server /metrics endpoint? But again, having no upper bound, this cache will keep growing, so it will take longer and longer to finish the calculation.

@mkleczek
Contributor

mkleczek commented Mar 28, 2025

@steve-chavez @taimoorzaeem @wolfgangwalther

Maybe I am missing something, but it should be OK to estimate the size of the cache entry as

q * (length token)

where q > 2.

There is no decompression step when converting from the raw token to its decoded form.
We know the length of the raw token, so what is there to calculate?
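
As a concrete form of that estimate (names are illustrative; q would be the measured or guessed constant):

import qualified Data.ByteString as BS

-- Estimate the decoded entry's size as a small multiple of the raw
-- token length, avoiding any traversal of the Haskell structures.
estimateEntrySize :: Int -> BS.ByteString -> Int
estimateEntrySize q rawToken = q * BS.length rawToken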

@wolfgangwalther
Member

wolfgangwalther commented Mar 28, 2025

We know the length of the raw token, so what is there to calculate?

IIUC, the goal is to calculate the size of the Haskell structures. The token is decoded, the payload is extracted, and then it ends up as this:

data AuthResult = AuthResult
  { authClaims :: KM.KeyMap JSON.Value
  , authRole   :: BS.ByteString
  }

Depending on whether we are dealing, for example, with many small claims or a few bigger claims, I assume the actual memory this structure needs would be different.

Edit: Whether the difference between extrapolating from the raw token length and the exact measurement would be big enough to justify this exact calculation, I don't know.

@wolfgangwalther
Member

One thing about the JWT cache is not clear to me: we are storing the final AuthResult and this gives us a performance boost. OK. But do we actually know whether "validating the signature" or "decoding the payload" is the part that the cache helps with? If it were only the signature validation, another approach could be to store only a hash map of booleans indicating the result of the verification. The payload could then be decoded on every request.

This would require much less memory, I guess. But we'd very likely lose performance.
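
A minimal sketch of that alternative, assuming the same Data.Cache store already used for the JWT cache (the names and the verify action are hypothetical): only the verification outcome is cached, keyed by the raw token, and the claims would be decoded again on every request.

import qualified Data.Cache as C
import           Data.ByteString (ByteString)

type SignatureCache = C.Cache ByteString Bool

checkSignatureCached :: SignatureCache -> ByteString -> IO Bool -> IO Bool
checkSignatureCached cache token verify = do
  cached <- C.lookup cache token
  case cached of
    Just ok -> pure ok            -- signature already checked before
    Nothing -> do
      ok <- verify                -- the expensive asymmetric-crypto check
      C.insert cache token ok
      pure ok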

@mkleczek
Contributor

The size of the cache entry should be O(length raw_token), so it should be fine to just multiply it by a constant, I think (we could measure the constant to make it more exact, but my guess is that 3 is a safe bet).

@mkleczek
Contributor

mkleczek commented Mar 28, 2025

One thing about the JWT cache is not clear to me: we are storing the final AuthResult and this gives us a performance boost. OK. But do we actually know whether "validating the signature" or "decoding the payload" is the part that the cache helps with?

I remember I reported the issue describing the need for the cache, because asymmetric cryptography is CPU intensive and slow:
#2698

I would say caching just validation results would be fine.

On the other hand, I am not sure if JOSE provides an API that would allow you to decode a token without validating it.

@wolfgangwalther
Member

On the other hand, I am not sure if JOSE provides an API that would allow you to decode a token without validating it.

That's not too hard to do manually. Split the JWT by . -> you get 3 items. Take the 2nd one and decode it from base64. You have JSON now.
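
A minimal sketch of that manual decoding, using base64-bytestring and aeson (the function name is illustrative, error handling kept to a minimum):

import qualified Data.Aeson as JSON
import qualified Data.ByteString.Base64.URL as B64
import qualified Data.ByteString.Char8 as BS

decodePayload :: BS.ByteString -> Maybe JSON.Value
decodePayload jwt =
  case BS.split '.' jwt of
    [_header, payload, _signature] ->
      JSON.decodeStrict (B64.decodeLenient payload)  -- 2nd segment holds the claims JSON
    _ -> Nothing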

@steve-chavez
Member

@steve-chavez Where should we go from here? If this metric is very useful, maybe we can run this calculation on the admin server /metrics endpoint? But again, having no upper bound, this cache will keep growing, so it will take longer and longer to finish the calculation.

@taimoorzaeem I think we should not continue pursuing this PR and instead go right away to the LRU cache solution, limited by entries. It's clear from the research here that Haskell is not suited for memory usage calculation. It's complicated to implement, has heavy perf implications, and overall I see too much that can go wrong.

@taimoorzaeem
Collaborator Author

Closing this PR as we've decided not to pursue it any further.
