Add size-segregated buckets in UnlockedLoaderHeap #129203
Pull request overview
This PR restructures UnlockedLoaderHeap’s free list to reduce allocation-time overhead in backout-heavy scenarios by replacing a single linear-scanned free list with size-segregated buckets for common small block sizes plus an overflow list for larger blocks.
Changes:
- Replaces `m_pFirstFreeBlock` with 32 size buckets (pointer-size increments) and a separate “large/overflow” free list.
- Updates free-block insertion/allocation logic to use bucketed O(1) reuse for small sizes, and retains linear scanning only for the overflow list (including a stress-log warning on long scans).
- Adjusts debug-only free-list dumping and validation to iterate across all buckets and the overflow list.
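As a rough illustration of the overflow path described above, here is a minimal sketch of a linear scan that also reports when the scan ran long. It assumes a simple first-fit rule; the matching rule the PR actually uses is not shown in this excerpt. `FreeBlock`, `ScanResult`, `TakeFirstFit`, and `kLongScanThreshold` are hypothetical names, and the `longScan` flag stands in for the stress-log warning.

```cpp
#include <cstddef>

// Hypothetical node type; the real LoaderHeapFreeBlock layout is not shown here.
struct FreeBlock {
    FreeBlock* next;
    std::size_t size;
};

// Assumed threshold, chosen only for illustration.
constexpr std::size_t kLongScanThreshold = 64;

struct ScanResult {
    FreeBlock* block;     // first block large enough, or nullptr if none
    std::size_t visited;  // number of nodes inspected during the scan
    bool longScan;        // true when visited exceeded kLongScanThreshold
};

// First-fit search over a singly linked overflow list.
// On a hit, the block is unlinked from the list before it is returned.
inline ScanResult TakeFirstFit(FreeBlock*& head, std::size_t wanted) {
    std::size_t visited = 0;
    for (FreeBlock** link = &head; *link != nullptr; link = &(*link)->next) {
        ++visited;
        if ((*link)->size >= wanted) {
            FreeBlock* found = *link;
            *link = found->next;  // unlink the matching node
            found->next = nullptr;
            return {found, visited, visited > kLongScanThreshold};
        }
    }
    return {nullptr, visited, visited > kLongScanThreshold};
}
```

A caller would log a warning whenever `longScan` is set, which makes an unexpectedly long overflow list visible without changing allocation behavior.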
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/coreclr/utilcode/loaderheap.cpp | Implements bucket initialization, bucket-aware allocation/insertion, overflow scan warning, and updates debug dump/validation to traverse buckets. |
| src/coreclr/utilcode/loaderheap_shared.h | Updates LoaderHeapFreeBlock API to no longer take an explicit head pointer (heap chooses bucket internally). |
| src/coreclr/inc/loaderheap.h | Adds bucket/overflow free list fields and related constants to UnlockedLoaderHeap. |
size_t bucket = dwSize / UnlockedLoaderHeap::FreeListBucketSize - 1;
_ASSERTE(bucket >= 0);
if (bucket < UnlockedLoaderHeap::NumFreeListBuckets)
    return &pHeap->m_freeListBuckets[bucket];
return &pHeap->m_pLargeFreeBlock;

size_t bucket = dwSize / UnlockedLoaderHeap::FreeListBucketSize - 1;
_ASSERTE(bucket >= 0);
bool blockFound = false;

  #ifndef DACCESS_COMPILE
- static void InsertFreeBlock(LoaderHeapFreeBlock **ppHead, void *pMem, size_t dwTotalSize, UnlockedLoaderHeap *pHeap);
- static void *AllocFromFreeList(LoaderHeapFreeBlock **ppHead, size_t dwSize, UnlockedLoaderHeap *pHeap);
+ static void InsertFreeBlock(void *pMem, size_t dwTotalSize, UnlockedLoaderHeap *pHeap);
+ static void *AllocFromFreeList(size_t dwSize, UnlockedLoaderHeap *pHeap);
Where is this race condition exactly? Can we add a lock there instead? The freelist in LoaderHeap is meant to be used only in error conditions, to back out types that failed to load, or to deal with rare race conditions. If you see the freelist growing this much, it means that the loader heap is not used correctly. We should fix that instead.
I'll take a look. This is the part where many threads lose the race: runtime/src/coreclr/vm/genmeth.cpp, lines 493 to 542 in 24547a7
This reverts commit 96f0039.
I was working with the test in https://github.com/korchak-aleksandr/net10-regression-repro and found out that the free list in UnlockedLoaderHeap grows to thousands of elements, which makes allocations very slow since we do a linear scan of this free list for each one of them.
In these scenarios, multiple threads might need the same generic instantiation simultaneously, and they all race to create and publish it. Several threads can lose the race and immediately add their blocks to the free list, since they no longer need that memory. Subsequent calls that need generic instantiations do a linear scan of the free list to find a memory block to reuse. This takes a long time because the list can grow to tens of thousands of entries.
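The following toy model shows how losing a publish race feeds the free list. It is only a sketch under stated assumptions: `ToyHeap`, `Instantiation`, and `PublishOrBackout` are hypothetical names, the race is replayed sequentially so the outcome is deterministic, and the real code in genmeth.cpp uses its own publication mechanism rather than a single atomic slot.

```cpp
#include <atomic>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical payload standing in for a generic instantiation.
struct Instantiation {
    int typeId;
};

// Hypothetical stand-in for a heap that keeps a backout free list.
struct ToyHeap {
    std::deque<Instantiation> storage;     // owns the memory; addresses stay stable
    std::vector<Instantiation*> freeList;  // blocks returned by threads that lost

    Instantiation* Alloc(int typeId) {
        if (!freeList.empty()) {           // reuse a backed-out block first
            Instantiation* p = freeList.back();
            freeList.pop_back();
            p->typeId = typeId;
            return p;
        }
        storage.push_back(Instantiation{typeId});
        return &storage.back();
    }

    void Backout(Instantiation* p) { freeList.push_back(p); }
};

// Try to publish a candidate. Exactly one caller wins; every other caller
// returns its block to the heap and uses the winner's instance instead.
inline Instantiation* PublishOrBackout(std::atomic<Instantiation*>& slot,
                                       Instantiation* candidate,
                                       ToyHeap& heap) {
    Instantiation* expected = nullptr;
    if (slot.compare_exchange_strong(expected, candidate)) {
        return candidate;                  // this caller won the race
    }
    heap.Backout(candidate);               // lost: the block goes to the free list
    return expected;                       // now holds the winner's pointer
}
```

If N callers allocate before any of them publishes, N - 1 blocks end up on the free list for that single instantiation, which is how the list can grow so quickly under contention.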
This PR splits the single list into 32 lists, each corresponding to free blocks of 8 bytes, 16 bytes, ..., 256 bytes (which are the most common sizes). We have another one for blocks bigger than 256 bytes.
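The bucket layout can be sketched as follows. This is an illustrative model rather than the PR's implementation: `kBucketSize`, `kNumBuckets`, `BucketIndex`, and `BucketedFreeList` are hypothetical names, the bucket width is fixed at 8 bytes to match the sizes listed above, sizes are assumed to be multiples of 8, and the overflow list uses a simple first-fit rule.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kBucketSize = 8;   // assumed: pointer size on a 64-bit target
constexpr std::size_t kNumBuckets = 32;  // buckets for 8, 16, ..., 256 bytes

struct FreeBlock {
    FreeBlock* next;
    std::size_t size;
};

// Maps a block size to its bucket, or to kNumBuckets to mean "overflow list".
// The zero check comes first because size / kBucketSize - 1 would wrap around
// in unsigned arithmetic when size < kBucketSize.
constexpr std::size_t BucketIndex(std::size_t size) {
    std::size_t units = size / kBucketSize;
    if (units == 0 || units > kNumBuckets) {
        return kNumBuckets;
    }
    return units - 1;
}

struct BucketedFreeList {
    std::array<FreeBlock*, kNumBuckets> buckets{};  // all heads start as nullptr
    FreeBlock* overflow = nullptr;

    FreeBlock*& HeadFor(std::size_t size) {
        std::size_t i = BucketIndex(size);
        return i < kNumBuckets ? buckets[i] : overflow;
    }

    // Push onto the front of the matching list: O(1) in every case.
    void Insert(FreeBlock* block) {
        FreeBlock*& head = HeadFor(block->size);
        block->next = head;
        head = block;
    }

    // Bucketed sizes pop the head of their exact-size list in O(1).
    // Larger sizes fall back to a first-fit scan of the overflow list.
    FreeBlock* Take(std::size_t size) {
        FreeBlock** link = &HeadFor(size);
        bool exactBucket = BucketIndex(size) < kNumBuckets;
        while (*link != nullptr) {
            if (exactBucket || (*link)->size >= size) {
                FreeBlock* found = *link;
                *link = found->next;
                found->next = nullptr;
                return found;
            }
            link = &(*link)->next;
        }
        return nullptr;
    }
};
```

With this layout, a burst of same-sized backouts lands in a single bucket, so reuse of those blocks never has to walk past entries of other sizes.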