Skip to content

perf: add ghc 9.12 options -fexpose-overloaded-unfoldings -fspecialise-aggressively#5008

Merged
taimoorzaeem merged 1 commit into
PostgREST:mainfrom
mkleczek:perf/expose-overloaded-unfoldings
Jun 17, 2026
Merged

perf: add ghc 9.12 options -fexpose-overloaded-unfoldings -fspecialise-aggressively#5008
taimoorzaeem merged 1 commit into
PostgREST:mainfrom
mkleczek:perf/expose-overloaded-unfoldings

Conversation

@mkleczek

@mkleczek mkleczek commented Jun 15, 2026

Copy link
Copy Markdown
Collaborator

Before 9.12.1 it was necessary to mark functions as INLINABLE or INLINE to make GHC consider cross-module specialization of polymorphic functions.
9.12 added a new -fexpose-overloaded-unfoldings flag that exposes optimized polymorphic functions in interface files. -fspecialise-aggressively then makes GHC apply aggresive specialization.

Looking at load test results it gives some performance improvements across the board.

  • compare executable size with main - -fspecialise-aggressively can increase it quite badly

@mkleczek mkleczek marked this pull request as draft June 15, 2026 19:02
@mkleczek mkleczek marked this pull request as ready for review June 15, 2026 20:06
Comment thread postgrest.cabal
Comment on lines +180 to +182
ghc-options: -O2
if impl(ghc >= 9.12)
ghc-options: -fexpose-overloaded-unfoldings -fspecialise-aggressively

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I am not sure if we should do this globally like this. I think losing -O2 could cause unexpected performance degradation.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... this should be additive (ie. these are additional flags added to -O2), from what I understand.

@taimoorzaeem taimoorzaeem Jun 16, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, it is additive. My bad.

I am still skeptical if we should introduce this. I think we should do INLINABLE explicitly only on functions where considerable performance benefits are seen. Doing this globally will incrementally increase binary size as well.

@steve-chavez steve-chavez Jun 16, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: If this does bring a perf improvement, we should be able to see the results on the loadtest? So far I don't see an improvement.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see quite an improvement across the board. It's most notable on the jwt loadtests, but also the mixed loadtest has a tendency of having many more negative numbers than positive. Sure - there is still noise on top of that, so the highest numbers hover closely around 0%. But that's not what you expect on a no-change PR - on a no-change PR you'd expect these to be more evenly distributed around 0%.

I think we should do INLINABLE explicitly only on functions where considerable performance benefits are seen. Doing this globally will incrementally increase binary size as well.

I think the small increase in binary size is more than worth this optimization. Not having to put INLINABLE in places where we actually don't want to inline, but to specialize across modules (did I get that right?) is a positive for me...

@steve-chavez steve-chavez Jun 16, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wolfgangwalther Could you elaborate on how you read the loadtests to see a perf improvement?

The load tests on #4880 leave me puzzled, they show some negatives across the board but that PR shouldn't have impacted the perf at all.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate on how you read the loadtests to see a perf improvement?

Sure, I'm looking at https://github.com/PostgREST/postgrest/actions/runs/27569147229?pr=5008, and the "change" column for all loadtests, but only at P50 to make this a bit shorter - you could do the same with all the others and it wouldn't change the outcome:

  • mixed: no positive number at all, From looking at it, the average is probably around -2%.
  • jwt-hs: -6.3%
  • jwt-hs-cache: -3.7%
  • jwt-hs-cache-worst: -4.2%
  • jwt-rsa: -2.1%
  • jwt-rsa-cache: -3.6%
  • jwt-rsa-cache-worst: -3.3%

Literally every test says there is a small improvement.

Compare that to a recent run on main (note I am looking at something from two days ago, because I don't think we can trust results from today / the last couple of hours at all - just look at the crazy numbers on https://github.com/PostgREST/postgrest/actions/runs/27638369318):

  • mixed shows roughly ~1-2%
  • jwt-hs: -0.8%
  • jwt-hs-cache: -0.4%
  • jwt-hs-cache-worst: 2.1%
  • jwt-rsa: -1.3%
  • jwt-rsa-cache: 0.5%
  • jwt-rsa-cache-worst: -1.2%

Some positive, some negative. Noisy, that's all.

The load tests on #4880 leave me puzzled, they show some negatives across the board but that PR shouldn't have impacted the perf at all.

I'll respond in that PR about that.

@wolfgangwalther wolfgangwalther left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great change to have, because it not only gives us performance improvements, but it eventually makes the code easier to maintain as well: We won't need to add INLINABLE in places where we need cross-module specialisation.

Binary size increases, yes. Let's see how much, by looking at the artifacts of the Build workflow compared to main: The difference is less than 100k...

Yes, this is compressed, but I really doubt that any of our users care more about executable size than performance when handling request. And if they really do: They can always build themselves, just like they can do already.

I really see no downside to this.

@steve-chavez

Copy link
Copy Markdown
Member

Binary size increases, yes. Let's see how much, by looking at the artifacts of the Build workflow compared to main: The difference is less than 100k...

I was going to say that we should mention this increase in size in the CHANGELOG but comparing both main and PR the change does look negligible. So should be fine to not mention this.

…e-aggressively

Before 9.12.1 it was necessary to mark functions as INLINABLE or INLINE to make GHC consider cross-module specialization of polymorphic functions.
9.12 added a new -fexpose-overloaded-unfoldings flag that exposes optimized polymorphic functions in interface files. -fspecialise-aggressively then makes GHC apply aggresive specialization.
@mkleczek mkleczek force-pushed the perf/expose-overloaded-unfoldings branch from f6bc215 to 1c35fe3 Compare June 17, 2026 05:25

@taimoorzaeem taimoorzaeem left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@taimoorzaeem taimoorzaeem merged commit 9f97147 into PostgREST:main Jun 17, 2026
31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

4 participants