Reuse Sprite3D meshes across nodes when possible. #103312
Can we check this the first time a Sprite3D is instanced and cache it in a static variable? The rendering method can't change at runtime.
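A minimal sketch of the caching idea suggested here, assuming the rendering method really is fixed for the lifetime of the process. `query_rendering_method()` and `is_compatibility_renderer()` are hypothetical names used only for illustration; they stand in for the engine query and are not Godot's API.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the (comparatively expensive) engine query.
// The counter only exists so the caching effect can be observed.
static int query_count = 0;

std::string query_rendering_method() {
    ++query_count;
    return "forward_plus";
}

// The function-local static is initialized once, on the first call.
// Every later call returns the cached bool without repeating the
// query or the string comparison.
bool is_compatibility_renderer() {
    static const bool cached = (query_rendering_method() == "gl_compatibility");
    return cached;
}
```

However many sprites call `is_compatibility_renderer()`, the query itself runs a single time. Since C++11, initialization of a function-local static is also thread-safe.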
Calinou
left a comment
Tested locally, it works as expected.
Testing project: https://github.com/Calinou/godot-sprite3d-vs-meshinstance3d
PC specifications
- CPU: Intel Core i9-13900K
- GPU: NVIDIA GeForce RTX 4090
- RAM: 64 GB (2×32 GB DDR5-5800 C30)
- SSD: Solidigm P44 Pro 2 TB
- OS: Linux (Fedora 41)
Using an optimized release export template build (production=yes lto=full) and the default window size from the testing project.
| Amount | Mesh | Sprite before | Sprite after |
|---|---|---|---|
| 0 | 8480 FPS (0.12 mspf) | 8480 FPS (0.12 mspf) | 8480 FPS (0.12 mspf) |
| 1,000 | 2896 FPS (0.35 mspf) | 979 FPS (1.02 mspf) | 2790 FPS (0.36 mspf) |
| 5,000 | 762 FPS (1.31 mspf) | 143 FPS (6.99 mspf) | 709 FPS (1.41 mspf) |
| 10,000 | 465 FPS (2.15 mspf) | 64 FPS (15.63 mspf) | 442 FPS (2.26 mspf) |
I also quickly tested the Mobile rendering method and the speedup is more modest there, but MeshInstance3D is already significantly slower than it is in Forward+. From 64 FPS with 10,000 Sprite3Ds, you go to 266 FPS with this PR. For comparison, 10,000 MeshInstance3Ds is 281 FPS. This may be chalked up to the lack of depth prepass in Mobile though, since meshes are opaque in this project (and Sprite3D uses alpha-cut transparency).
Got it. The new version also included some extra fixes to handle what happens when a sprite gets deleted while using another sprite. (Basically making the …
Rebased to fix a minor conflict with #105785.
Just as a heads up: I think this is something that is wanted. I've added it to the rendering team meeting agenda to get more eyes on it.
Ansraer
left a comment
Ok, I took a glance at this PR just now. I like the idea, but am not sure I am happy with the implementation.
As far as I can tell, right now one of the Sprite3Ds (called A for the sake of this discussion) stores the entire mesh, and all the other Sprite3Ds that want to use the same mesh access A's mesh.
This means we have special cases for Sprite3Ds sharing their own mesh, accessing another Sprite3D's mesh, and what happens when the Sprite3D that owns the mesh gets deleted (with some kind of handover procedure to let another Sprite3D "own" the mesh).
IMO it would be far simpler if the mesh management was pulled out of Sprite3D entirely (maybe into "Sprite3DMeshes"), which would simply count how many Sprite3Ds need a given mesh.
This way we would remove the separation between mesh-owning and mesh-subscribing Sprite3Ds.
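To make the suggestion concrete, here is a rough sketch of what such a counting registry could look like. Everything in it is an assumption for illustration: `SpriteMeshRegistry`, `MeshKey`, and `MeshHandle` are hypothetical names, the key is reduced to a single precomputed hash, and a real version would create and free meshes through the rendering server rather than hand out integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical mesh handle; in the engine this would be an RID.
using MeshHandle = uint64_t;
// Hypothetical key: a hash of everything that determines the quad.
using MeshKey = uint64_t;

class SpriteMeshRegistry {
    struct Entry {
        MeshHandle mesh;
        int users;
    };
    std::unordered_map<MeshKey, Entry> entries;
    MeshHandle next_handle = 1;

public:
    // Returns the mesh for `key`, creating it on first use,
    // and records one more user of that mesh.
    MeshHandle acquire(MeshKey key) {
        auto it = entries.find(key);
        if (it == entries.end()) {
            it = entries.emplace(key, Entry{ next_handle++, 0 }).first;
        }
        it->second.users++;
        return it->second.mesh;
    }

    // Drops one user; the entry is removed once nobody needs the mesh.
    void release(MeshKey key) {
        auto it = entries.find(key);
        assert(it != entries.end());
        if (--it->second.users == 0) {
            entries.erase(it);
        }
    }

    std::size_t mesh_count() const { return entries.size(); }
};
```

With this shape, no sprite owns a mesh: each one calls `acquire()` when its configuration is known and `release()` when it changes or is freed. The cost is that any sprite whose key becomes unique forces a new mesh to be created.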
Billboards don't work well. These all happen after enabling and disabling the billboard flag:
Yep, that's basically it. I think the alternative is actually flawed in practice. I assume that you're suggesting something like this:
While it is simpler in terms of implementation, I believe it will perform badly in practice. A common use case for Sprite3Ds is to change their color dynamically with modulate. Every time this happens, the sprite will almost certainly not find an existing mesh to reuse.
Other frequent changes to the mesh data (UVs for AnimatedSprite3Ds, for example) will have similar issues, but modulate blending is likely the most serious case here. Reusing the mesh already created in Sprite3D's constructor (as demonstrated in this implementation) avoids this pitfall by basically providing a perfect upper bound for allocated meshes. A Sprite3D can keep sharing its mesh with others until it wants to change its mesh, in which case it picks a successor, copies its mesh data over, and tells everyone else to share with that successor instead. No new meshes need to be allocated and no old meshes need to be freed aggressively.
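The handover described above can be sketched roughly as follows. This is a simplified illustration, not the PR's code: `Sprite` is a hypothetical stand-in for SpriteBase3D, mesh data is reduced to a single `int`, and names such as `change_mesh()` and `hand_over()` are made up for the example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for SpriteBase3D. Every sprite has its own
// preallocated mesh (here just `mesh_data`), whether it uses it or not.
struct Sprite {
    int mesh_data = 0;              // contents of this sprite's own mesh
    Sprite *using_sprite = nullptr; // non-null while borrowing another mesh
    std::vector<Sprite *> users;    // sprites borrowing this sprite's mesh

    // The mesh data this sprite is currently rendered with.
    int visible_data() const {
        return using_sprite ? using_sprite->mesh_data : mesh_data;
    }

    void start_using(Sprite *owner) {
        using_sprite = owner;
        owner->users.push_back(this);
    }

    // Called when this sprite needs different mesh data (e.g. a new modulate).
    void change_mesh(int new_data) {
        if (using_sprite) {
            using_sprite = nullptr; // stop borrowing; stale list entry is skipped later
        } else {
            hand_over();
        }
        mesh_data = new_data;
    }

    // Pick the first valid user as successor, copy the data into its
    // existing mesh, and redirect all remaining users to it.
    void hand_over() {
        Sprite *successor = nullptr;
        for (Sprite *u : users) {
            if (u->using_sprite != this) {
                continue; // stale entry: this user already moved on
            }
            if (!successor) {
                successor = u;
                successor->mesh_data = mesh_data;
                successor->using_sprite = nullptr;
            } else {
                u->start_using(successor);
            }
        }
        users.clear();
    }
};
```

The number of meshes never exceeds the number of sprites, because the successor reuses the mesh it already had. For brevity, the sketch omits the `shared_sprites` lookup and deletion handling.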
Just noticing now that the material (including the shader and the texture used) also needs to be batched alongside the mesh. This is an oversight on my part, sorry! I've updated the implementation to reflect this.
It's much better now. A problem remaining is that two nodes, once they share the same configuration, become …
To reproduce:
Fixed. It was a very sneaky bug.
```diff
 if (last_sprite_mesh_key.alpha_cut_disabled) {
-	RS::get_singleton()->material_set_render_priority(get_material(), get_render_priority());
-	RS::get_singleton()->mesh_surface_set_material(successor->mesh, 0, get_material());
+	RS::get_singleton()->material_set_render_priority(successor->get_material(), get_render_priority());
+	RS::get_singleton()->mesh_surface_set_material(successor->mesh, 0, successor->get_material());
 }
```

(Basically, I accidentally assigned the same material to the successor's mesh, causing the material to be shared between two meshes.)
I think having bookkeeping variables here is worth it for minimizing overhead. That is, avoiding … The actual oversight here is forgetting that every …

```diff
 if (users[i] && users[i]->using_sprite == this) {
 	successor = users[i];
 	shared_sprites.insert(last_sprite_mesh_key, successor);
+	successor->sharing_own_mesh = true;
```

This logic has been refactored into …

```cpp
void SpriteBase3D::_start_sharing_sprite() {
	shared_sprites.insert(last_sprite_mesh_key, this);
	sharing_own_mesh = true;
}
```

With some extra refactoring I also added the …

```cpp
void SpriteBase3D::_start_using_sprite(SpriteBase3D *p_using_sprite) {
	using_sprite = p_using_sprite;
	using_sprite_user_index = using_sprite->users.size();
	set_base(using_sprite->mesh);
	set_aabb(using_sprite->aabb);
	using_sprite->users.push_back(this);
	// We don't need to remove this sprite from the previous shared sprite's users list,
	// as setting using_sprite means they will be detected and filtered out later.
}
```
Update: Added unit tests for completeness.
Finally got the unit tests passing on every CI build. Using … I've also changed the hashing algorithm to be identical to …
Rebased due to …
Closes godotengine/godot-proposals#9947
This PR implements the proposal above, making Sprite3Ds (SpriteBase3Ds to be precise) reuse meshes whenever possible.
There's still some overhead associated with this reuse system (on top of Sprite3D's existing overhead), so the performance isn't completely identical to using MeshInstance3Ds, but the performance gain is still very noticeable. See the results attached below.
Benchmarking was performed using the project provided in the proposal, though in this case the RenderDoc captures should be enough to confirm the performance increase.
10000 Quad MeshInstance3Ds (which already use instancing) (~40 FPS)
10000 Sprite3Ds without instancing (< 1 FPS)
10000 Sprite3Ds with instancing (~30 FPS)
Implementation details:
- … draw_texture_rect is called.
- … shared_sprites so other SpriteBase3Ds can share its mesh.

Remaining issues:
- … RS::get_singleton()->get_current_rendering_method() == "gl_compatibility", which would add significant overhead on its own)
- … rendering/3d/optimization/automatic_sprite_3d_instancing?