Put area light cluster iterations behind a spec constant.#119970
Conversation
There was a problem hiding this comment.
Seems to fix FPS drop back to 4.6.3 level.
Tested on RTX 2080 Super on Windows 11, GTX 970M on Windows 10, and M4 Pro Mac:
| Windows 11, RTX 2080, Vulkan, 3840×2160 | Windows 11, RTX 2080, D3D12, 3840×2160 | M4, Metal, 5120×2880 | M4, Vulkan, 5120×2880 | Windows 10, GTX 970M, Vulkan, 1920×1080 | |
|---|---|---|---|---|---|
| 4.6.3 (official build) | 188 fps | 167 fps | 76 fps | 69 fps | 75 fps |
| 4.7 beta 4 (official build) | 135 fps | 139 fps | 73 fps | 67 fps | 54 fps |
4.7 current (custom production=yes build) |
135 fps | 138 fps | 73 fps | 67 fps | 54 fps |
this PR (custom production=yes build) |
187 fps | 170 fps | 79 fps | 72 fps | 83 fps |
| ∆ 4.6.3 → 4.7 beta 4 | -28% | -17% | -4% | -3% | -28% |
| ∆ 4.6.3 → this PR | -0.6% | +1.5% | +4% | +4% | +11% |
We should probably document this in the area light documentation, in the sense that the first visible AreaLight3D in a scene will have a much more significant performance cost than subsequent AreaLight3Ds. |
There was a problem hiding this comment.
Tested locally, it works as expected. Code looks good to me.
Performance is now up compared to 4.6.3, similar to how master performs better in Mobile and Compatibility compared to 4.6.3.
All tests in 3840×2160.
| Version | Forward+ |
|---|---|
| 4.6.3 | 953 FPS (1.05 mspf) |
| 4.7.beta4 | 848 FPS (1.18 mspf) |
| This PR | 1053 FPS (0.95 mspf) |
PC specifications
- CPU: AMD Ryzen 9 9950X3D
- GPU: NVIDIA GeForce RTX 5090
- RAM: 64 GB (2×32 GB DDR5-6000 CL30)
- SSD: Solidigm P44 Pro 2 TB
- OS: Linux (Fedora 44)
So yes, seems to be working fine on my end, save for a negligible performance loss of ~2fps between the pre-arealight commit and this one. I have no idea why that difference is there, but it doesn't seem like enough of an issue to really be worth looking into. |
|
How did you measure the performance? A 2 fps variation would be tiny and well within natural variability of any testing (it represents a difference of about 45 us) |
|
I just run the test scene as specified in the issue. The PR gets 210-211, pre-light commit gets 212 with the occasional blip to 213. I reran them several times and I keep getting the same numbers. I'm building by using |
clayjohn
left a comment
There was a problem hiding this comment.
This is the right solution for 4.7. Its not ideal since just adding a single area light to the scene will bring back this performance loss when it is in view.
Overall this leaves us with some open questions:
- Is this really a big deal? It seems most modern hardware has a much more acceptable performance tradeoff that is more in line with expectations (10% or so). For most users, this solution might be totally fine.
- Should we prioritize a deferred renderer now? With deferred rendering we can have a dedicated shader pass for area lights, so they will be waaaay faster and won't impact the performance of other light types
- Should we kick area lights into a hybrid pass? We could calculate area lights in a hybrid pass before the opaque pass. But then they would be more limited (no custom shader support as now, and would support fewer shader features)
For 4.7, this PR is the right approach. We just also need to update the docs (fine to do in a separate PR) to warn that area lights are very expensive and are intended for cinematics.
That's when compared to 4.6.3, but compared to this PR, the loss is a bit larger than that. Reusing numbers from the tests posted here:
Even disregarding the special case of my RX570, it seems like it's more like ~20% fps loss. More tests on more hardware are probably needed in this regard. Though of course this drop is also in a fairly barebone scene. In real use-cases, things like higher-res shadows, point shadows, GI, SSAO, SSR, bloom, and so on, would make this less perceivable. ...That is, with the exception of some cards like the RX570 (and who knows which others) which really dislike area lights. |
|
Thanks! |
What problem(s) does this PR solve?
Additional information
Area lights have a massive VGPR footprint, the amount of registers go from 85~ to 150~ and the shader becomes twice as long. This PR makes area light code not get compiled in unless there is one visible in the frustum.
Mobile does not need this since area light count is already a specialization constant there.
Compatibility does not need it either as it enables
DISABLE_LIGHT_AREAspec constant if there are no area lights.