Add `Memory::alloc_static_zeroed` to allocate memory that's filled with zeroes. #104124
#91633 looks similar, adding … For naming, I'm leaning more toward …
This is the same idea indeed! I think calling the function … I instead opted to call it …
I found uses for this in these files (all of them have occurrences of …)
Thanks a lot @DeeJayLSP!
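I benchmarked a ~4% improvement for (large) … with the following test:

```cpp
#include <chrono>
#include <iostream>

#include "core/templates/hash_map.h"

// Setup (the statements below run inside a function body).
HashMap<int, int> hash_map;
auto t0 = std::chrono::high_resolution_clock::now();
for (int i = 0; i < 20000000; i++) {
	// Test: 20M sequential inserts into a single, growing map.
	hash_map.insert(i, i);
}
auto t1 = std::chrono::high_resolution_clock::now();
std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count() << "ms\n";
```

This printed: …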
I rebased (to include the #106020 changes) and ran a slightly more complicated test to cover more cases. It varies the map size while keeping the total number of inserts at roughly 20M:

```cpp
#include <chrono>
#include <cstdint>
#include <iostream>

#include "core/templates/hash_map.h"

// The statements below run inside a function body.
for (int size : { 1, 2, 8, 64, 1024, 4096 }) {
	auto t0 = std::chrono::high_resolution_clock::now();
	// Build (20M / size) fresh maps of `size` entries each.
	for (int run = 0; run < 20000000 / size; run++) {
		HashMap<int64_t, int64_t> dictionary;
		for (int idx = 0; idx < size; idx++) {
			// Test
			dictionary.insert(idx, idx);
		}
	}
	auto t1 = std::chrono::high_resolution_clock::now();
	std::cout << "size:" << size << std::endl;
	std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count() << "ms\n";
}
```

This printed on master (db66343): …

On this PR: …

This new test suggests that, at least for fundamental types in key/value, … Admittedly, some of the speed difference may come from the new version using …
…th zeroes. This is generally faster than `malloc` followed by `memset` / loop-set to 0.
Out of interest, I just tested again. Using master: … Using `memset`: … It's the same test, but with quite different results from last time for …
Thanks!
This is generally faster than `malloc` followed by `memset`. It's essentially the `Memory` equivalent of `calloc`.

The reason is that pages handed out by the OS are usually zeroed out already. This is for security purposes: the OS cannot risk handing out traces of memory used by another application. The OS takes care of this zeroing itself, either in the background or lazily when a page is first touched.
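To make the "pages from the OS are already zeroed" point concrete, here is a minimal POSIX sketch of how a `calloc`-style function can exploit it. Everything here is hypothetical (`toy_alloc_zeroed`, `toy_free_zeroed`, the 128 KiB threshold) and it is only an illustration, not Godot's `Memory::alloc_static_zeroed`. Large requests get a fresh anonymous `mmap` mapping, which the OS guarantees is zero-filled, so no `memset` is needed; small requests may reuse memory this process freed earlier, so they fall back to `malloc` plus `memset`.

```cpp
#include <sys/mman.h>

#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical cut-over point. 128 KiB matches glibc's default M_MMAP_THRESHOLD,
// but real allocators tune this value (glibc adjusts it dynamically).
constexpr std::size_t LARGE_ALLOC_THRESHOLD = 128 * 1024;

// Hypothetical calloc-like helper: returns `size` bytes that all read as zero.
void *toy_alloc_zeroed(std::size_t size) {
	if (size >= LARGE_ALLOC_THRESHOLD) {
		// Fresh anonymous mappings come straight from the OS and are guaranteed
		// to be zero-filled, so no memset is needed here.
		void *ptr = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		return ptr == MAP_FAILED ? nullptr : ptr;
	}
	// Small blocks may reuse memory this process freed earlier, so clear them explicitly.
	void *ptr = std::malloc(size);
	if (ptr != nullptr) {
		std::memset(ptr, 0, size);
	}
	return ptr;
}

// Releases memory obtained from toy_alloc_zeroed; the caller must pass the same size.
void toy_free_zeroed(void *ptr, std::size_t size) {
	if (ptr == nullptr) {
		return;
	}
	if (size >= LARGE_ALLOC_THRESHOLD) {
		munmap(ptr, size);
	} else {
		std::free(ptr);
	}
}
```

The size has to be passed back on free because `munmap` needs it; a real allocator would record the size (and which path was taken) in its own bookkeeping instead.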
Most allocators make use of this in `calloc`. If a large amount of memory is requested, the allocator will just request zeroed pages from the OS and can hand them back immediately. If a small amount is requested, the allocator will just use `memset` to clear out the requested memory, since it may be reusing memory previously freed by the same process.

This can be exploited for a performance benefit, e.g. for `resize_zeroed` and `is_zero_constructible` structs: if we request zeroed pages for these calls, we can omit our own `memset` calls, which can make the allocation itself near instantaneous.
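The `is_zero_constructible` idea can be sketched in C++17 as follows. The trait `zero_constructible_v` and the helpers `alloc_default_array` / `free_default_array` are hypothetical stand-ins, not Godot's API. When an all-zero byte pattern is already a valid default value, a single `calloc` replaces both the per-element constructor loop and any `memset`; other types still go through their real default constructor. The sketch assumes `T` has fundamental alignment and a non-throwing default constructor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <new>
#include <type_traits>

// Hypothetical trait: true when an all-zero byte pattern is a valid default-constructed T.
// Arithmetic types qualify, assuming IEEE 754 floats (all-zero bits means +0.0).
template <typename T>
inline constexpr bool zero_constructible_v = std::is_arithmetic_v<T>;

// Hypothetical helper: allocates `count` default-valued elements of T.
template <typename T>
T *alloc_default_array(std::size_t count) {
	if constexpr (zero_constructible_v<T>) {
		// Zeroed memory already holds valid default values: no constructor loop, no memset.
		return static_cast<T *>(std::calloc(count, sizeof(T)));
	} else {
		if (count > std::numeric_limits<std::size_t>::max() / sizeof(T)) {
			return nullptr; // count * sizeof(T) would overflow.
		}
		T *ptr = static_cast<T *>(std::malloc(count * sizeof(T)));
		if (ptr == nullptr) {
			return nullptr;
		}
		for (std::size_t i = 0; i < count; i++) {
			new (ptr + i) T(); // Run the real default constructor.
		}
		return ptr;
	}
}

// Destroys (if needed) and frees an array from alloc_default_array.
template <typename T>
void free_default_array(T *ptr, std::size_t count) {
	if (ptr == nullptr) {
		return;
	}
	if constexpr (!std::is_trivially_destructible_v<T>) {
		for (std::size_t i = 0; i < count; i++) {
			ptr[i].~T();
		}
	}
	std::free(ptr);
}
```

For example, `alloc_default_array<int>(n)` is a single `calloc` call, while a struct with a default member initializer (such as `int value = 7;`) still gets each element constructed.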