Skip to content

Put area light cluster iterations behind a spec constant.#119970

Merged
Repiteo merged 1 commit into
godotengine:masterfrom
blueskythlikesclouds:area-light-cluster-spec-constant
Jun 2, 2026
Merged

Put area light cluster iterations behind a spec constant.#119970
Repiteo merged 1 commit into
godotengine:masterfrom
blueskythlikesclouds:area-light-cluster-spec-constant

Conversation

@blueskythlikesclouds
Copy link
Copy Markdown
Member

@blueskythlikesclouds blueskythlikesclouds commented Jun 2, 2026

What problem(s) does this PR solve?

Additional information

Area lights have a massive VGPR footprint, the amount of registers go from 85~ to 150~ and the shader becomes twice as long. This PR makes area light code not get compiled in unless there is one visible in the frustum.

Mobile does not need this since area light count is already a specialization constant there.
Compatibility does not need it either as it enables DISABLE_LIGHT_AREA spec constant if there are no area lights.

Copy link
Copy Markdown
Member

@bruvzg bruvzg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to fix FPS drop back to 4.6.3 level.

Tested on RTX 2080 Super on Windows 11, GTX 970M on Windows 10, and M4 Pro Mac:

Windows 11, RTX 2080, Vulkan, 3840×2160 Windows 11, RTX 2080, D3D12, 3840×2160 M4, Metal, 5120×2880 M4, Vulkan, 5120×2880 Windows 10, GTX 970M, Vulkan, 1920×1080
4.6.3 (official build) 188 fps 167 fps 76 fps 69 fps 75 fps
4.7 beta 4 (official build) 135 fps 139 fps 73 fps 67 fps 54 fps
4.7 current (custom production=yes build) 135 fps 138 fps 73 fps 67 fps 54 fps
this PR (custom production=yes build) 187 fps 170 fps 79 fps 72 fps 83 fps
∆ 4.6.3 → 4.7 beta 4 -28% -17% -4% -3% -28%
∆ 4.6.3 → this PR -0.6% +1.5% +4% +4% +11%

@Calinou
Copy link
Copy Markdown
Member

Calinou commented Jun 2, 2026

Area lights have a massive VGPR footprint, the amount of registers go from 85~ to 150~ and the shader becomes twice as long.

We should probably document this in the area light documentation, in the sense that the first visible AreaLight3D in a scene will have a much more significant performance cost than subsequent AreaLight3Ds.

Copy link
Copy Markdown
Member

@Calinou Calinou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested locally, it works as expected. Code looks good to me.

Performance is now up compared to 4.6.3, similar to how master performs better in Mobile and Compatibility compared to 4.6.3.

All tests in 3840×2160.

Version Forward+
4.6.3 953 FPS (1.05 mspf)
4.7.beta4 848 FPS (1.18 mspf)
This PR 1053 FPS (0.95 mspf)
PC specifications
  • CPU: AMD Ryzen 9 9950X3D
  • GPU: NVIDIA GeForce RTX 5090
  • RAM: 64 GB (2×32 GB DDR5-6000 CL30)
  • SSD: Solidigm P44 Pro 2 TB
  • OS: Linux (Fedora 44)

@akien-mga akien-mga requested a review from clayjohn June 2, 2026 16:43
@Image-unavailable
Copy link
Copy Markdown
Contributor

  • 4.7 beta 4: 75fps
  • 4.6.3: 189fps
  • this PR: 210fps
  • pre-arealights commit: 212fps (somehow)

So yes, seems to be working fine on my end, save for a negligible performance loss of ~2fps between the pre-arealight commit and this one. I have no idea why that difference is there, but it doesn't seem like enough of an issue to really be worth looking into.

@AThousandShips
Copy link
Copy Markdown
Member

AThousandShips commented Jun 2, 2026

How did you measure the performance? A 2 fps variation would be tiny and well within natural variability of any testing (it represents a difference of about 45 us)

@Image-unavailable
Copy link
Copy Markdown
Contributor

I just run the test scene as specified in the issue. The PR gets 210-211, pre-light commit gets 212 with the occasional blip to 213. I reran them several times and I keep getting the same numbers.

I'm building by using scons with no parameters. Thinking about it, it could be completely unrelated to GPU workload (CPU side?) since it's such a small difference.

Copy link
Copy Markdown
Member

@clayjohn clayjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the right solution for 4.7. Its not ideal since just adding a single area light to the scene will bring back this performance loss when it is in view.

Overall this leaves us with some open questions:

  1. Is this really a big deal? It seems most modern hardware has a much more acceptable performance tradeoff that is more in line with expectations (10% or so). For most users, this solution might be totally fine.
  2. Should we prioritize a deferred renderer now? With deferred rendering we can have a dedicated shader pass for area lights, so they will be waaaay faster and won't impact the performance of other light types
  3. Should we kick area lights into a hybrid pass? We could calculate area lights in a hybrid pass before the opaque pass. But then they would be more limited (no custom shader support as now, and would support fewer shader features)

For 4.7, this PR is the right approach. We just also need to update the docs (fine to do in a separate PR) to warn that area lights are very expensive and are intended for cinematics.

@Image-unavailable
Copy link
Copy Markdown
Contributor

It seems most modern hardware has a much more acceptable performance tradeoff that is more in line with expectations (10% or so)

That's when compared to 4.6.3, but compared to this PR, the loss is a bit larger than that. Reusing numbers from the tests posted here:

  • Windows 10, GTX 970M, Vulkan, 1920×1080: 83 -> 54 (-35%)
  • Windows 11, RTX 2080, D3D12, 3840×2160: 170 -> 139 (-18%)
  • Fedora 44, RTX 5090: 1053 -> 848 (-19%)
  • M4, Metal, 5120×2880: 79 -> 73 (-8%)

Even disregarding the special case of my RX570, it seems like it's more like ~20% fps loss. More tests on more hardware are probably needed in this regard.

Though of course this drop is also in a fairly barebone scene. In real use-cases, things like higher-res shadows, point shadows, GI, SSAO, SSR, bloom, and so on, would make this less perceivable.

...That is, with the exception of some cards like the RX570 (and who knows which others) which really dislike area lights.

@Repiteo Repiteo merged commit cbc3aa3 into godotengine:master Jun 2, 2026
20 checks passed
@Repiteo
Copy link
Copy Markdown
Contributor

Repiteo commented Jun 2, 2026

Thanks!

@blueskythlikesclouds blueskythlikesclouds deleted the area-light-cluster-spec-constant branch June 3, 2026 14:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

60%+ FPS loss in Forward+ rendering performance compared to 4.6.3

7 participants