Skip to content

Optimize GDScriptInstance::notification for better performance#94118

Merged
Repiteo merged 1 commit into
godotengine:masterfrom
CrazyRoka:optimize-gdscript-notifications
Oct 1, 2025
Merged

Optimize GDScriptInstance::notification for better performance#94118
Repiteo merged 1 commit into
godotengine:masterfrom
CrazyRoka:optimize-gdscript-notifications

Conversation

@CrazyRoka
Copy link
Copy Markdown
Contributor

@CrazyRoka CrazyRoka commented Jul 9, 2024

Optimize GDScriptInstance::notification for better performance

This PR optimizes the GDScriptInstance::notification function to improve performance and reduce CPU usage.

Changes

  • Replace List<GDScript *> with a LocalVector<GDScript *> to reduce allocations and deallocations.
  • Minimize List push_front() and push_back() operations by reusing the LocalVector across calls.
  • Add unlikely() hint for vector growth condition to optimize the common case.
  • Reduce overall CPU usage in notification traversal.

Performance Impact

  • Eliminates the need for frequent List operations, which were causing significant CPU usage.
  • Reduces memory allocations by reusing the vector across calls.
  • Improves cache efficiency by using a contiguous memory structure (Vector) instead of a linked list.

Testing

  • Tested locally with a scene that contains 20K cubes, all of them were moving around.
  • Profiled application before and after my change. Initial implementation was causing 3.35% of CPU usage on my laptop. Optimized version is using 0.5% of CPU.

Profiling data with current mainline

image

Profiling data with my changes

image

This optimization should significantly reduce the CPU time spent in GDScriptInstance::notification, particularly for projects with frequent notifications or deep GDScript inheritance hierarchies.

@CrazyRoka CrazyRoka requested a review from a team as a code owner July 9, 2024 08:34
@Mickeon Mickeon added this to the 4.x milestone Jul 9, 2024
Comment thread modules/gdscript/gdscript.cpp
@adamscott adamscott requested a review from RandomShaper July 9, 2024 16:22
@adamscott
Copy link
Copy Markdown
Member

I see thread_local, I tag @RandomShaper. Just to be sure.

@adamscott adamscott modified the milestones: 4.x, 4.3 Jul 9, 2024
@adamscott
Copy link
Copy Markdown
Member

@CrazyRoka Can you share your test project? I'm curious to run the profiler on my machine before approving your PR.

@CrazyRoka
Copy link
Copy Markdown
Contributor Author

@CrazyRoka Can you share your test project? I'm curious to run the profiler on my machine before approving your PR.

I've just created a new repository with my test project. Have a look: https://github.com/CrazyRoka/godot-benchmark-moving-cubes/tree/main

Copy link
Copy Markdown
Member

@adamscott adamscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite an improvement. I tested out @CrazyRoka's test repo and I've seen an improvement of ~61% over one minute of benchmark testing. I've only tested once, so don't take my results too seriously.

Results
Master branch This PR
Capture d’écran, le 2024-07-09 à 14 35 20 Capture d’écran, le 2024-07-09 à 14 35 27

I would wait though for @RandomShaper's input though.

@akien-mga akien-mga changed the title Optimize GDScriptInstance::notification for better performance Optimize GDScriptInstance::notification for better performance Jul 17, 2024
@akien-mga akien-mga modified the milestones: 4.3, 4.x Jul 17, 2024
@RandomShaper
Copy link
Copy Markdown
Member

First, as a disclaimer, note that I haven't reviewed if the logic is exactly equivalent, but it has been reviewed by others and any failure I guess would be pretty notorious.

Now, I think the changes are very good. If anything, I'd use LocalVector instead of Vector for the thread-local optimization. By the way, the thread_local LocalVector is a pattern we're already using in some other areas of the engine (i.e., rendering). In the future we may want to merge all of them into some kind of arena allocators or something (to try to squeeze even a bit more perf), but at this point, this is already an excellent optimization.

@akien-mga akien-mga modified the milestones: 4.x, 3.x, 4.4 Nov 14, 2024
@KoBeWi
Copy link
Copy Markdown
Member

KoBeWi commented Nov 14, 2024

Yeah this should use LocalVector.

@Ivorforce
Copy link
Copy Markdown
Member

Ivorforce commented Nov 14, 2024

I'd be interested which of the changes cause how much of a difference. Specifically, I'm somewhat skeptical of the thread_local use - the rest is harmless i suppose - but if it contributes a significant margin to the speed, it's of course worth it.

@Repiteo Repiteo modified the milestones: 4.4, 4.x Jan 20, 2025
Copy link
Copy Markdown
Contributor

@aaronp64 aaronp64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think we can use thread_local here, as one notification could cause another to be called before it's finished - the second call would overwrite the values in the thread_local vector from the first call while it's still iterating through them. For example, the code below prints "Base1 Base2 Child2 Child1" on master, but "Base1 Base2 Child2 Child2" with this change - the second inheritance chain is overwriting the first in between the Base1 and Child1 calls, so Child1 is replaced with Child2:

func _ready():
	var child1 := Child1.new()
	child1.notification(NOTIFICATION_ENABLED)

class Base1 extends Node:
	func _notification(what: int) -> void:
		if what == NOTIFICATION_ENABLED:
			print("Base1")
			var child2 := Child2.new()
			child2.notification(NOTIFICATION_ENABLED)

class Child1 extends Base1:
	func _notification(what: int) -> void:
		if what == NOTIFICATION_ENABLED:
			print("Child1")
		
class Base2 extends Node:
	func _notification(what: int) -> void:
		if what == NOTIFICATION_ENABLED:
			print("Base2")

class Child2 extends Base2:
	func _notification(what: int) -> void:
		if what == NOTIFICATION_ENABLED:
			print("Child2")

Maybe we could loop through the _base pointers to get the count, then use alloca instead of the List to get a similar speedup?

@DeeJayLSP
Copy link
Copy Markdown
Contributor

DeeJayLSP commented Apr 12, 2025

Out of curiosity I tested both removing thread_local and using LocalVector. The profiler results in my machine:

Build cycles:Pu
Original 0.105%
PR 0.0299%
PR (no thread_local) 0.022%
PR (no thread_local, LocalVector) 0.0209%

This was my first time using a profiler, so please take this with a grain of salt.

Edit: tested again using the test project above:

Build cycles:Pu Improvement
Original 0.0355% -
PR 0.0258% 37.59%
PR (no thread_local) 0.0258% 37.59%
PR (no thread_local, LocalVector) 0.0103% 244.66%
PR (LocalVector) 0.00694% 411.52%

Given the issue with thread_local above, I'd go without it just to be safe.

Comment thread modules/gdscript/gdscript.cpp Outdated
Comment thread modules/gdscript/gdscript.cpp Outdated
@DeeJayLSP

This comment was marked as outdated.

@CrazyRoka CrazyRoka force-pushed the optimize-gdscript-notifications branch from afaa94c to c3f8998 Compare September 29, 2025 21:51
@CrazyRoka
Copy link
Copy Markdown
Contributor Author

@DeeJayLSP Thanks for the advice and deep investigations. I applied your suggestion and used LocalVector.

Copy link
Copy Markdown
Member

@Ivorforce Ivorforce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes still don't address Aaron's concerns (#94118 (review)). This needs to be fixed by removing static.
While not using a static might limit the performance somewhat, replacing List is the most important part, so this should still be faster than master in either case.
If we wanted to improve this further, we should use reserve or alloca, but that would require a cached depth on the GDScript, so this is for another time.

Comment thread modules/gdscript/gdscript.cpp Outdated
Comment thread modules/gdscript/gdscript.cpp Outdated
@CrazyRoka CrazyRoka force-pushed the optimize-gdscript-notifications branch from c3f8998 to bcc0922 Compare September 30, 2025 18:59
@CrazyRoka CrazyRoka requested a review from Ivorforce September 30, 2025 19:00
Copy link
Copy Markdown
Member

@Ivorforce Ivorforce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While there are more ways to improve this, this should be a straight upgrade on all fronts. Thanks for keeping up with the reviews! Let's finally get this merged.

@Repiteo Repiteo modified the milestones: 4.x, 4.6 Oct 1, 2025
@Repiteo Repiteo merged commit 7b5ee98 into godotengine:master Oct 1, 2025
20 checks passed
@Repiteo
Copy link
Copy Markdown
Contributor

Repiteo commented Oct 1, 2025

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.