perf: add ghc 9.12 options -fexpose-overloaded-unfoldings -fspecialise-aggressively#5008
Conversation
| ghc-options: -O2 | ||
| if impl(ghc >= 9.12) | ||
| ghc-options: -fexpose-overloaded-unfoldings -fspecialise-aggressively |
There was a problem hiding this comment.
Hmm, I am not sure if we should do this globally like this. I think losing -O2 could cause unexpected performance degradation.
There was a problem hiding this comment.
Hmm... this should be additive (ie. these are additional flags added to -O2), from what I understand.
There was a problem hiding this comment.
Oh yeah, it is additive. My bad.
I am still skeptical if we should introduce this. I think we should do INLINABLE explicitly only on functions where considerable performance benefits are seen. Doing this globally will incrementally increase binary size as well.
There was a problem hiding this comment.
Q: If this does bring a perf improvement, we should be able to see the results on the loadtest? So far I don't see an improvement.
There was a problem hiding this comment.
I see quite an improvement across the board. It's most notable on the jwt loadtests, but also the mixed loadtest has a tendency of having many more negative numbers than positive. Sure - there is still noise on top of that, so the highest numbers hover closely around 0%. But that's not what you expect on a no-change PR - on a no-change PR you'd expect these to be more evenly distributed around 0%.
I think we should do
INLINABLEexplicitly only on functions where considerable performance benefits are seen. Doing this globally will incrementally increase binary size as well.
I think the small increase in binary size is more than worth this optimization. Not having to put INLINABLE in places where we actually don't want to inline, but to specialize across modules (did I get that right?) is a positive for me...
There was a problem hiding this comment.
@wolfgangwalther Could you elaborate on how you read the loadtests to see a perf improvement?
The load tests on #4880 leave me puzzled, they show some negatives across the board but that PR shouldn't have impacted the perf at all.
There was a problem hiding this comment.
Could you elaborate on how you read the loadtests to see a perf improvement?
Sure, I'm looking at https://github.com/PostgREST/postgrest/actions/runs/27569147229?pr=5008, and the "change" column for all loadtests, but only at P50 to make this a bit shorter - you could do the same with all the others and it wouldn't change the outcome:
- mixed: no positive number at all, From looking at it, the average is probably around -2%.
- jwt-hs: -6.3%
- jwt-hs-cache: -3.7%
- jwt-hs-cache-worst: -4.2%
- jwt-rsa: -2.1%
- jwt-rsa-cache: -3.6%
- jwt-rsa-cache-worst: -3.3%
Literally every test says there is a small improvement.
Compare that to a recent run on main (note I am looking at something from two days ago, because I don't think we can trust results from today / the last couple of hours at all - just look at the crazy numbers on https://github.com/PostgREST/postgrest/actions/runs/27638369318):
- mixed shows roughly ~1-2%
- jwt-hs: -0.8%
- jwt-hs-cache: -0.4%
- jwt-hs-cache-worst: 2.1%
- jwt-rsa: -1.3%
- jwt-rsa-cache: 0.5%
- jwt-rsa-cache-worst: -1.2%
Some positive, some negative. Noisy, that's all.
The load tests on #4880 leave me puzzled, they show some negatives across the board but that PR shouldn't have impacted the perf at all.
I'll respond in that PR about that.
wolfgangwalther
left a comment
There was a problem hiding this comment.
This is a great change to have, because it not only gives us performance improvements, but it eventually makes the code easier to maintain as well: We won't need to add INLINABLE in places where we need cross-module specialisation.
Binary size increases, yes. Let's see how much, by looking at the artifacts of the Build workflow compared to main: The difference is less than 100k...
Yes, this is compressed, but I really doubt that any of our users care more about executable size than performance when handling request. And if they really do: They can always build themselves, just like they can do already.
I really see no downside to this.
I was going to say that we should mention this increase in size in the CHANGELOG but comparing both main and PR the change does look negligible. So should be fine to not mention this. |
…e-aggressively Before 9.12.1 it was necessary to mark functions as INLINABLE or INLINE to make GHC consider cross-module specialization of polymorphic functions. 9.12 added a new -fexpose-overloaded-unfoldings flag that exposes optimized polymorphic functions in interface files. -fspecialise-aggressively then makes GHC apply aggresive specialization.
f6bc215 to
1c35fe3
Compare
Before 9.12.1 it was necessary to mark functions as INLINABLE or INLINE to make GHC consider cross-module specialization of polymorphic functions.
9.12 added a new -fexpose-overloaded-unfoldings flag that exposes optimized polymorphic functions in interface files. -fspecialise-aggressively then makes GHC apply aggresive specialization.
Looking at load test results it gives some performance improvements across the board.
main--fspecialise-aggressivelycan increase it quite badly