Description
EventPipe sessions that aren't file or IPC sessions never garbage collect their "thread state session list" for exited threads. This results in a memory leak proportional to the number of threads that have ever started since the EventPipe session began, and a per-event CPU leak on the dispatch thread, caused by iterating over the ever-growing thread session state linked list when looking for the next event.
This behavior has been present since at least .NET 6.0, when EventPipe was rewritten into CoreCLR in C, and is still present in the latest main branch as of this writing.
Reproduction Steps
Create an EventListener subclass that enables CLR events, something like this:
```csharp
using System.Diagnostics.Tracing;

class ClrEventListener : EventListener
{
    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        if (eventSource.Name.Equals("Microsoft-Windows-DotNETRuntime"))
        {
            // 0x8000 enables exception events in this case
            EnableEvents(eventSource, EventLevel.Informational, (EventKeywords)0x8000);
        }
    }
}
```
Instantiate the event listener subclass, spawn thousands of short-lived threads, and then do something that creates CLR events (e.g. throw and catch exceptions).
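For concreteness, a minimal driver along these lines reproduces it. This is a sketch matching the description above, not the code from my repro repo; the thread and exception counts are arbitrary:

```csharp
using System;
using System.Threading;

class Program
{
    static void Main()
    {
        // The ClrEventListener subclass from above; constructing it starts
        // a (non-file, non-IPC) EventPipe session for this listener.
        using var listener = new ClrEventListener();

        // Spawn thousands of short-lived threads. Each one adds an entry to
        // the session's thread session state list that is never removed.
        for (int i = 0; i < 5_000; i++)
        {
            var t = new Thread(() => { });
            t.Start();
            t.Join();
        }

        // Generate CLR exception events. Each event dispatch now has to walk
        // the 5,000+ entry thread session state list.
        for (int i = 0; i < 5_000_000; i++)
        {
            try { throw new InvalidOperationException(); }
            catch { /* swallowed; we only want the CLR exception event */ }
        }
    }
}
```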
As the number of threads ever spawned increases, the `.NET Long Runni` thread that processes these CLR events will get slower and slower and tend towards 100% CPU time, even if the number of currently running threads stays constant. I made a GitHub repo to fully demonstrate and reproduce this issue.
Expected behavior
EventPipe session performance should scale at worst with the number of currently running threads, not with the number of threads that have ever existed during the session.
EventPipe "thread session states" should be garbage collected in all EventPipe session types, so that the length of the thread session state linked list approximates the number of currently running threads.
Actual behavior
After a non-file, non-IPC EventPipe session is created, any new threads created will cause the `.NET Long Runni` thread to consume more and more CPU for each event it collects.
Regression?
I haven't tried this on .NET 5.0, so I'm not sure, but it's been around since at least .NET 6.0 when EventPipe was rewritten in C.
Known Workarounds
If the EventPipe session is closed and reopened, the new session starts with an empty "thread state session list", and tearing down the old session cleans up its list. The same happens if an active EventPipe session is reconfigured, since reconfiguring ends up closing the session and opening a new one anyway.
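Here is a sketch of that workaround, assuming the session was created through an EventListener: disposing a listener closes its EventPipe session, so recycling the listener periodically resets the list. The host type and interval are mine, not from the runtime:

```csharp
using System;
using System.Threading;

// Periodically replaces the listener so its EventPipe session is closed and
// a fresh one (with an empty thread state session list) is opened.
class RecyclingListenerHost : IDisposable
{
    private ClrEventListener _listener = new ClrEventListener();
    private readonly Timer _timer;

    public RecyclingListenerHost(TimeSpan interval)
    {
        _timer = new Timer(_ =>
        {
            // Start the new session before closing the old one; note that
            // events arriving during the overlap may be seen by both listeners.
            var old = Interlocked.Exchange(ref _listener, new ClrEventListener());
            old.Dispose(); // closes the old session, freeing its list
        }, null, interval, interval);
    }

    public void Dispose()
    {
        _timer.Dispose();
        _listener.Dispose();
    }
}
```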
Another workaround is to use only file- or IPC-type EventPipe sessions where possible, as these session types do run garbage collection on the "thread session state list".
Configuration
Tested locally on .NET 8.0.404 and .NET 9.0.101, on a Linux laptop running Ubuntu 24.04 (kernel 6.8.0) with an Intel i7-13800H processor.
I have seen this in production in containers on both Intel and AMD processors, all x86, on various Linux kernels. I don't believe this issue is specific to any particular system or architecture.
Other information
When I test creating 5,000 synchronous, short-lived threads and then throwing 5,000,000 exceptions on .NET 9.0.101 on my Linux laptop, `perf` reports the following "Self" times with `libcoreclr.so` offsets:
```
26.23%  .NET Long Runni  libcoreclr.so  [.] 0x00000000004f9eda
13.61%  .NET Long Runni  libcoreclr.so  [.] 0x00000000004f9ed6
 7.88%  .NET Long Runni  libcoreclr.so  [.] 0x00000000004f9ed2
```
For some reason, on my laptop, the `perf` addresses are off by exactly 0x1000. I couldn't tell you why, but I have validated this behavior with .NET 8.0.404 on my laptop, and I've seen this exact same issue in at least 5 different production systems, all on different .NET versions with correct `perf` addresses.
Unfortunately, `libcoreclr.so` isn't built with debug symbols, and no debuginfo is published for it, so I've had to reverse-engineer `libcoreclr.so` to match the disassembly against the source code to figure out what is going on here. The hot instructions above fall in the section of `buffer_manager_move_next_event_any_thread` that iterates each thread session state to find a single event, which is then sent back through to the EventListener.
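To make the shape of that hot path concrete, here is a rough managed-pseudocode sketch of the scan. The types and names are illustrative only; the real implementation is native C inside EventPipe:

```csharp
using System;
using System.Collections.Generic;

// Illustrative stand-ins for the native EventPipe structures.
record BufferedEvent(long Timestamp);

class ThreadSessionState
{
    public Queue<BufferedEvent> Buffer { get; } = new();
}

static class BufferManagerSketch
{
    // Sketch of the hot path in buffer_manager_move_next_event_any_thread:
    // every dispatched event requires a full walk of the thread session state
    // list to find the oldest buffered event. Because exited threads are never
    // pruned in non-file, non-IPC sessions, the list (and therefore this walk)
    // grows with every thread ever started during the session.
    public static BufferedEvent? MoveNextEventAnyThread(List<ThreadSessionState> states)
    {
        ThreadSessionState? oldestOwner = null;
        long oldestTimestamp = long.MaxValue;

        foreach (var state in states) // O(threads ever started), per event
        {
            if (state.Buffer.TryPeek(out var evt) && evt.Timestamp < oldestTimestamp)
            {
                oldestTimestamp = evt.Timestamp;
                oldestOwner = state;
            }
        }

        return oldestOwner?.Buffer.Dequeue();
    }
}
```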