Significantly optimize updating RenderChunk positions in ViewFrustum #782

DaMatrix
Member

Rather than simply performing the updates asynchronously in a different thread (which introduces potential race conditions and just offloads the issue to another core), this actually makes the RenderChunk position updates significantly faster in the most common cases.

We observe that under normal gameplay circumstances, the camera rarely moves more than one cube per frame. By detecting this common case, we can efficiently skip RenderChunks whose position hasn't changed: when the camera moves by one cube in a given direction, only one 2D slice/plane of RenderChunks actually changes.
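
As a rough illustration of the slice idea, here is a minimal, self-contained sketch. It is not the code from this PR: the class `SliceUpdateSketch`, its fields, and the wrap-around slot layout (slot index = coordinate modulo the grid size, in chunk/cube units) are assumptions made for the example. It shows that a one-unit camera step along one axis only needs to rewrite the single slice whose slot wraps around the edge of the window, with a full update as the fallback for larger jumps.

```java
// Sketch only: a wrap-around grid of slots, an assumed stand-in for the RenderChunk array.
final class SliceUpdateSketch {
    final int nx, ny, nz;            // slots per axis (hypothetical sizes)
    final int[] posX, posY, posZ;    // coordinate currently assigned to each slot
    int camX, camY, camZ;            // camera position in chunk/cube units
    int lastTouched;                 // slots rewritten by the most recent update

    SliceUpdateSketch(int nx, int ny, int nz) {
        this.nx = nx; this.ny = ny; this.nz = nz;
        posX = new int[nx * ny * nz];
        posY = new int[nx * ny * nz];
        posZ = new int[nx * ny * nz];
        fullUpdate(0, 0, 0);
    }

    // Coordinate inside the window starting at center - count/2 that maps to this slot.
    static int slotCoord(int center, int count, int slot) {
        int base = center - count / 2;
        return base + Math.floorMod(slot - base, count);
    }

    private void set(int sx, int sy, int sz) {
        int i = (sz * ny + sy) * nx + sx;
        posX[i] = slotCoord(camX, nx, sx);
        posY[i] = slotCoord(camY, ny, sy);
        posZ[i] = slotCoord(camZ, nz, sz);
        lastTouched++;
    }

    // Slow path: recompute every slot.
    void fullUpdate(int cx, int cy, int cz) {
        camX = cx; camY = cy; camZ = cz;
        lastTouched = 0;
        for (int sz = 0; sz < nz; sz++)
            for (int sy = 0; sy < ny; sy++)
                for (int sx = 0; sx < nx; sx++)
                    set(sx, sy, sz);
    }

    // Fast path: a one-unit step along a single axis rewrites only one slice.
    void update(int cx, int cy, int cz) {
        int dx = cx - camX, dy = cy - camY, dz = cz - camZ;
        int steps = Math.abs(dx) + Math.abs(dy) + Math.abs(dz);
        if (steps == 0) { lastTouched = 0; return; }          // nothing moved
        if (steps != 1) { fullUpdate(cx, cy, cz); return; }   // uncommon case: fall back
        // The only slot index that changes is the one wrapping around the window edge.
        int wrapX = Math.floorMod(Math.min(camX, cx) - nx / 2, nx);
        int wrapY = Math.floorMod(Math.min(camY, cy) - ny / 2, ny);
        int wrapZ = Math.floorMod(Math.min(camZ, cz) - nz / 2, nz);
        camX = cx; camY = cy; camZ = cz;
        lastTouched = 0;
        if (dx != 0) {
            for (int sz = 0; sz < nz; sz++)
                for (int sy = 0; sy < ny; sy++) set(wrapX, sy, sz);
        } else if (dy != 0) {
            for (int sz = 0; sz < nz; sz++)
                for (int sx = 0; sx < nx; sx++) set(sx, wrapY, sz);
        } else {
            for (int sy = 0; sy < ny; sy++)
                for (int sx = 0; sx < nx; sx++) set(sx, sy, wrapZ);
        }
    }

    public static void main(String[] args) {
        // 97 x 33 x 97 assumes 2 * distance + 1 slots per axis for 48 horizontal / 16 vertical.
        SliceUpdateSketch grid = new SliceUpdateSketch(97, 33, 97);
        System.out.println("full update slots: " + grid.lastTouched);
        grid.update(1, 0, 0);
        System.out.println("one-step update slots: " + grid.lastTouched);
    }
}
```

Under the assumed 97 x 33 x 97 grid, a single step along X rewrites one slice of 33 * 97 = 3,201 slots instead of all 310,497, which is the kind of saving the description refers to. The real RenderChunk updates also involve rebuild scheduling and locking, which this sketch leaves out.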

On my machine, with a horizontal render distance of 48 chunks and a vertical render distance of 16 cubes, and while flying around at maximum speed in spectator mode, this change reduces ViewFrustum#updateChunkPositions() from ~22% of the total client thread CPU time to ~2.4%, nearly an order of magnitude performance improvement.

@Barteks2x
Member

Can you measure exact time in microseconds, with various render distance values, including max possible horizontal+vertical?

@DaMatrix
Member Author

Do you want the exact time for one update in a known direction, over a fixed sample duration with a known movement pattern, or average time for many updates with a random movement pattern? This is hard to microbenchmark since the exact update durations are going to depend on the direction which a player is moving in, as well as where the player is relative to the origin point (not to mention that the actual time is going to be affected by the number of RenderChunks which are actually built when their position is changed, or worse - if the RenderChunk is being built we may have to sleep while acquiring lockCompileTask).

@Barteks2x
Member

Average/minimum/maximum time per invocation when moving normally. This basically tells me how much stutter it's actually going to cause.

@DaMatrix
Member Author

I couldn't go higher than 20 vertical with 64 horizontal render distance, because the client takes so long to allocate all the buffers that it gets timed out. All durations are in milliseconds.

  • Original:
    • 8 horizontal, 8 vertical: {count=510, sum=204.274381, min=0.055093, average=0.400538, max=1.630065}
    • 16 horizontal, 16 vertical: {count=596, sum=1686.617977, min=1.588502, average=2.829896, max=6.958128}
    • 48 horizontal, 16 vertical: {count=360, sum=9309.496855, min=17.222589, average=25.859713, max=419.388198}
    • 64 horizontal, 20 vertical: {count=317, sum=19034.859821, min=41.213958, average=60.046876, max=621.479877}
  • This PR:
    • 8 horizontal, 8 vertical: {count=425, sum=46.763603, min=0.001328, average=0.110032, max=2.314520}
    • 16 horizontal, 16 vertical: {count=543, sum=176.860643, min=0.000544, average=0.325710, max=12.860030}
    • 48 horizontal, 16 vertical: {count=429, sum=645.727444, min=0.001609, average=1.505192, max=33.976273}
    • 64 horizontal, 20 vertical: {count=606, sum=1800.563056, min=0.002199, average=2.971226, max=65.247747}

This PR is clearly significantly faster than the original code at high render distances (the average time per update drops by roughly 17x at 48/16 and 20x at 64/20), and it also outperforms the original code on average at low render distances.

At low render distances this PR tends to have longer spikes than the original code (the maximum is up to about 2x longer), despite having a lower average duration. My guess is that this is caused by the VM occasionally having to deoptimize and recompile the code when one of the conditions is reached for the first time.
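
For anyone wanting to reproduce this kind of per-invocation measurement, here is a minimal sketch. The `{count=..., sum=..., min=..., average=..., max=...}` layout above resembles what `java.util.DoubleSummaryStatistics` prints, but the thread does not say how the numbers were collected, so treat that as an assumption. The class `UpdateTimingSketch` and the placeholder workload are hypothetical; in practice the timed call would be the position update itself.

```java
import java.util.DoubleSummaryStatistics;

// Sketch only: accumulates per-invocation durations in milliseconds.
final class UpdateTimingSketch {
    private final DoubleSummaryStatistics millis = new DoubleSummaryStatistics();

    // Times a single invocation and records its duration.
    void time(Runnable update) {
        long start = System.nanoTime();
        update.run();
        millis.accept((System.nanoTime() - start) / 1_000_000.0);
    }

    DoubleSummaryStatistics stats() {
        return millis;
    }

    public static void main(String[] args) {
        UpdateTimingSketch timer = new UpdateTimingSketch();
        long[] sink = new long[1]; // keeps the placeholder work observable
        for (int i = 0; i < 500; i++) {
            timer.time(() -> {
                // Placeholder workload standing in for the real update call.
                for (int j = 0; j < 100_000; j++) sink[0] += j;
            });
        }
        // Prints count, sum, min, average and max of the recorded durations.
        System.out.println(timer.stats());
    }
}
```

Measured this way, early invocations include JIT warm-up, so the maximum can be dominated by a few early samples.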

@Barteks2x
Member

I would like to figure out further improvements, but 3 ms in the max render distance case looks mostly good enough.

@Barteks2x Barteks2x merged commit 7165b37 into OpenCubicChunks:MC_1.12 Mar 11, 2025
1 check passed
@DaMatrix DaMatrix deleted the optimized-viewfrustum-position-updates branch March 11, 2025 17:26