Significantly optimize updating RenderChunk positions in ViewFrustum #782

DaMatrix · 2025-03-11T15:42:00Z

Rather than simply performing the updates asynchronously on a different thread (which introduces potential race conditions and just offloads the work to another core), this PR makes the RenderChunk position updates themselves significantly faster in the most common cases.

We observe that under normal gameplay circumstances, the camera rarely moves more than one cube per frame. By detecting this common case, we can efficiently skip RenderChunks whose position hasn't changed: when the camera moves by one cube in a given direction, only one 2D slice/plane of RenderChunks actually changes position.
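The one-cube fast path can be illustrated with a simplified model. This is a hypothetical sketch, not the code merged in this PR. It assumes each slot of a fixed 3D grid holds the cube whose coordinate is congruent to the slot index modulo the grid size (a ring buffer centred on the camera). The class `SlicedPositionUpdate` and its methods are invented for illustration, and it tracks only cube coordinates, not real RenderChunks.

```java
/**
 * Hypothetical sketch (not this PR's code): a ring-buffered 3D grid of slots.
 * A slot keeps its cube position until the camera window slides past it,
 * so a one-cube camera move only re-positions a single plane of slots.
 */
final class SlicedPositionUpdate {
    private final int sizeX, sizeY, sizeZ;
    // positions[x][y][z] = {cubeX, cubeY, cubeZ} currently assigned to that slot
    private final int[][][][] positions;
    private int camX, camY, camZ;
    private boolean initialized;
    private int lastUpdateCount; // slots re-positioned by the most recent update()

    SlicedPositionUpdate(int sizeX, int sizeY, int sizeZ) {
        this.sizeX = sizeX;
        this.sizeY = sizeY;
        this.sizeZ = sizeZ;
        this.positions = new int[sizeX][sizeY][sizeZ][3];
    }

    /** Cube coordinate held by {@code slot} when the camera is at cube {@code cam}. */
    static int coord(int slot, int cam, int size) {
        int base = cam - size / 2; // lowest cube coordinate inside the window
        return base + Math.floorMod(slot - base, size);
    }

    /** Slot that receives the newly entering coordinate after a one-cube step. */
    static int enteringSlot(int cam, int step, int size) {
        int base = cam - size / 2;
        int entering = step > 0 ? base + size - 1 : base;
        return Math.floorMod(entering, size);
    }

    private void reposition(int x, int y, int z) {
        int[] p = positions[x][y][z];
        p[0] = coord(x, camX, sizeX);
        p[1] = coord(y, camY, sizeY);
        p[2] = coord(z, camZ, sizeZ);
        lastUpdateCount++;
    }

    void update(int cx, int cy, int cz) {
        int dx = cx - camX, dy = cy - camY, dz = cz - camZ;
        int manhattan = Math.abs(dx) + Math.abs(dy) + Math.abs(dz);
        camX = cx; camY = cy; camZ = cz;
        lastUpdateCount = 0;
        if (initialized && manhattan == 0) {
            return; // camera is still in the same cube: nothing to do
        }
        if (!initialized || manhattan > 1) {
            // Fallback for first use or larger jumps: recompute every slot.
            for (int x = 0; x < sizeX; x++)
                for (int y = 0; y < sizeY; y++)
                    for (int z = 0; z < sizeZ; z++) reposition(x, y, z);
            initialized = true;
            return;
        }
        // Common case: one cube along one axis, so only one plane changes.
        if (dx != 0) {
            int x = enteringSlot(cx, dx, sizeX);
            for (int y = 0; y < sizeY; y++)
                for (int z = 0; z < sizeZ; z++) reposition(x, y, z);
        } else if (dy != 0) {
            int y = enteringSlot(cy, dy, sizeY);
            for (int x = 0; x < sizeX; x++)
                for (int z = 0; z < sizeZ; z++) reposition(x, y, z);
        } else {
            int z = enteringSlot(cz, dz, sizeZ);
            for (int x = 0; x < sizeX; x++)
                for (int y = 0; y < sizeY; y++) reposition(x, y, z);
        }
    }

    int lastUpdateCount() {
        return lastUpdateCount;
    }

    /** True if every slot matches what a full recomputation would assign. */
    boolean matchesFullRecompute() {
        for (int x = 0; x < sizeX; x++)
            for (int y = 0; y < sizeY; y++)
                for (int z = 0; z < sizeZ; z++) {
                    int[] p = positions[x][y][z];
                    if (p[0] != coord(x, camX, sizeX) || p[1] != coord(y, camY, sizeY)
                            || p[2] != coord(z, camZ, sizeZ)) return false;
                }
        return true;
    }
}
```

With this mapping, a one-cube step along X re-positions `sizeY * sizeZ` slots instead of all `sizeX * sizeY * sizeZ`; any larger jump, such as a teleport, falls back to recomputing every slot. `matchesFullRecompute()` exists only to check that the fast path agrees with the slow one.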

On my machine, with a horizontal render distance of 48 chunks and a vertical render distance of 16 cubes, flying around at maximum speed in spectator mode, this change reduces ViewFrustum#updateChunkPositions() from ~22% of the total client thread CPU time to ~2.4%, an improvement of nearly an order of magnitude.

Barteks2x · 2025-03-11T16:05:49Z

Can you measure the exact time in microseconds for various render distance values, including the maximum possible horizontal and vertical ones?

DaMatrix · 2025-03-11T16:19:19Z

Do you want the exact time for a single update in a known direction, the time over a fixed sample duration with a known movement pattern, or the average time over many updates with a random movement pattern? This is hard to microbenchmark, since the duration of an update depends on the direction the player is moving in and on where the player is relative to the origin. The actual time is also affected by how many RenderChunks are actually built when their position changes, or, worse, by having to sleep while acquiring lockCompileTask if a RenderChunk is currently being built.

Barteks2x · 2025-03-11T16:20:09Z

The average, minimum and maximum time per invocation when moving normally. This basically tells me how much stutter it's actually going to cause.

DaMatrix · 2025-03-11T17:14:07Z

I couldn't go higher than 20 vertical with a horizontal render distance of 64: the client takes so long to allocate all the buffers that it gets timed out. All durations are in milliseconds.

Original:
- 8 horizontal, 8 vertical: {count=510, sum=204.274381, min=0.055093, average=0.400538, max=1.630065}
- 16 horizontal, 16 vertical: {count=596, sum=1686.617977, min=1.588502, average=2.829896, max=6.958128}
- 48 horizontal, 16 vertical: {count=360, sum=9309.496855, min=17.222589, average=25.859713, max=419.388198}
- 64 horizontal, 20 vertical: {count=317, sum=19034.859821, min=41.213958, average=60.046876, max=621.479877}
This PR:
- 8 horizontal, 8 vertical: {count=425, sum=46.763603, min=0.001328, average=0.110032, max=2.314520}
- 16 horizontal, 16 vertical: {count=543, sum=176.860643, min=0.000544, average=0.325710, max=12.860030}
- 48 horizontal, 16 vertical: {count=429, sum=645.727444, min=0.001609, average=1.505192, max=33.976273}
- 64 horizontal, 20 vertical: {count=606, sum=1800.563056, min=0.002199, average=2.971226, max=65.247747}
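The `{count=..., sum=..., min=..., average=..., max=...}` layout above is the same one that `java.util.DoubleSummaryStatistics#toString` produces after the class name. A minimal sketch of collecting per-invocation times in milliseconds that way is shown below; the `InvocationTimer` class is hypothetical and is not the harness that produced these numbers.

```java
import java.util.DoubleSummaryStatistics;

/** Hypothetical timing wrapper: records the duration of each call in milliseconds. */
final class InvocationTimer {
    private final DoubleSummaryStatistics stats = new DoubleSummaryStatistics();

    /** Runs {@code task} once and records how long it took. */
    void time(Runnable task) {
        long start = System.nanoTime(); // monotonic clock, suitable for intervals
        task.run();
        long elapsedNanos = System.nanoTime() - start;
        stats.accept(elapsedNanos / 1_000_000.0); // nanoseconds -> milliseconds
    }

    DoubleSummaryStatistics stats() {
        return stats;
    }

    /** The "{count=..., sum=..., min=..., average=..., max=...}" part of toString(). */
    String summary() {
        String s = stats.toString();
        return s.substring(s.indexOf('{'));
    }
}
```

In a client, each invocation of `ViewFrustum#updateChunkPositions()` would be wrapped in `time(...)` while flying the same route, with `summary()` printed at the end. `System.nanoTime()` is used rather than wall-clock time because only its differences are meaningful and it is not affected by system clock adjustments.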

This PR is clearly much faster than the original code at high render distances, and also outperforms it on average at low render distances.
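Dividing the reported averages (original / this PR) puts numbers on that comparison. The snippet below does nothing more than that arithmetic on the figures listed above; the class and field names are illustrative.

```java
/** Ratio of average per-invocation time (original / this PR), using the reported figures. */
final class SpeedupFromAverages {
    static final String[] CONFIGS = {"8h/8v", "16h/16v", "48h/16v", "64h/20v"};
    static final double[] ORIGINAL_AVG_MS = {0.400538, 2.829896, 25.859713, 60.046876};
    static final double[] PR_AVG_MS = {0.110032, 0.325710, 1.505192, 2.971226};

    /** How many times faster this PR is on average for configuration {@code i}. */
    static double speedup(int i) {
        return ORIGINAL_AVG_MS[i] / PR_AVG_MS[i];
    }

    public static void main(String[] args) {
        for (int i = 0; i < CONFIGS.length; i++) {
            System.out.printf("%s: %.1fx faster on average%n", CONFIGS[i], speedup(i));
        }
    }
}
```

This gives roughly 3.6x, 8.7x, 17.2x and 20.2x for the four configurations, so the relative gain grows with render distance.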

At low render distances this PR tends to have longer spikes than the original code (about 2x longer), despite having a lower average duration. My guess is that this is caused by the VM occasionally having to deoptimize and recompile the code when one of the conditions is reached for the first time.

Barteks2x · 2025-03-11T17:24:34Z

I would like to figure out further improvements, but ~3 ms in the max render distance case looks mostly good enough.

DaMatrix added 2 commits March 11, 2025 16:40

Update GitHub PR workflow step versions (5e56b61)

Barteks2x merged commit 7165b37 into OpenCubicChunks:MC_1.12 Mar 11, 2025
1 check passed

DaMatrix deleted the optimized-viewfrustum-position-updates branch March 11, 2025 17:26
