@mohamedelkony mohamedelkony commented Sep 7, 2025

Issue:

Iterated CUs (along with each CU's DIE cache) are kept in the cache for as long as the DWARF object is alive. While the cache can be useful for hitting DIEs shared between CUs and for other use cases, it also means that a simple loop over CUs and their DIEs keeps all DIEs of the ELF file in memory until the DWARF object is no longer referenced.

For a big ELF file (ours is ~100 MB), all DIEs of all CUs stay in memory even though we only need the DIEs of the current CU while parsing it. Once a CU has been processed we never use its DIEs again (except in the interleaved DIE case), so retaining them is effectively a memory leak.

This leads to excessive memory usage: it reaches 7 GB. We parse 5 files in parallel, which brings the total memory usage to 35 GB.
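For illustration, the streaming access pattern described here could look like the sketch below. The helper name `count_dies_per_cu` is hypothetical; it only assumes an object that offers `iter_CUs()`, with each CU offering `iter_DIEs()`, as pyelftools' DWARF objects do. The loop itself stores nothing but counters, so any DIEs that stay in memory are retained by caches rather than by the caller.

```python
def count_dies_per_cu(dwarf_info):
    """Stream over every CU and DIE without storing any DIE objects.

    `dwarf_info` is expected to behave like a pyelftools DWARF object:
    iter_CUs() yields CUs, and each CU offers iter_DIEs().
    """
    counts = []
    for cu in dwarf_info.iter_CUs():
        n = 0
        for die in cu.iter_DIEs():
            # Only a counter is kept; `die` is not stored anywhere by this loop.
            n += 1
        counts.append(n)
    return counts


# With pyelftools installed, usage would look like this (the path is a placeholder):
#
#   from elftools.elf.elffile import ELFFile
#   with open("path/to/binary.elf", "rb") as f:
#       print(count_dies_per_cu(ELFFile(f).get_dwarf_info()))
```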

Fix:

Add the possibility to set a maximum size for the CU cache. On reaching the maximum size, the cache is purged in FIFO order.
The default behavior is the same as before: the cache size is unlimited.
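As a rough sketch of the idea (not the code in this PR), a FIFO-bounded cache can be built on `collections.OrderedDict`. The class name `FifoCache` and its methods are hypothetical; the only convention borrowed from the PR is that `-1` means unlimited.

```python
from collections import OrderedDict


class FifoCache:
    """Hypothetical bounded cache: evicts the oldest inserted entry first."""

    def __init__(self, max_size=-1):
        # -1 means unlimited, mirroring the default described above.
        self.max_size = max_size
        self._entries = OrderedDict()

    def get(self, key):
        return self._entries.get(key)

    def put(self, key, value):
        if self.max_size == 0:
            return  # a limit of zero disables caching entirely
        if key not in self._entries and self.max_size > 0:
            # Make room by dropping entries in insertion (FIFO) order.
            while len(self._entries) >= self.max_size:
                self._entries.popitem(last=False)
        self._entries[key] = value

    def __len__(self):
        return len(self._entries)
```

Unlike an LRU cache, a lookup does not refresh an entry's position, so eviction order depends only on insertion order.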

Results:

Reduced memory usage by 90%

Memory usage for our 100 MB ELF file:

Before: 4 GB

image

After: 350 MB

image

Used test case:

image

@mohamedelkony mohamedelkony marked this pull request as ready for review September 7, 2025 20:13
@mohamedelkony
Author

Hi @eliben Could you please have a look? Thanks!

keyword syntax.
max_cu_cache_size:
Enforces a limit on CU cache size. For unlimited cache size, set to -1.
Owner

The size limit is on the number of entries/CUs? The comment should say this in more detail.

@eliben
Owner

eliben commented Sep 8, 2025

@sevaa WDYT?

@sevaa
Contributor

sevaa commented Sep 8, 2025

Only helps with the firehose parsing scenario with no storing of results, and only if you have many small CUs as opposed to several large ones. Also, keeping the cache alive (and the memory usage up) by accident is really easy; holding a reference to even a single DIE in an evicted CU keeps the whole CU (with its caches) in place.

This patch may alleviate some OOM-on-large-binary scenarios that users are complaining about, but I can see OOM scenarios it won't address.

By the time the OOM condition gets to us, it's usually boiled down to a minimal example, but we don't get to see the users' real life code that crashes.
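The point about a single DIE reference pinning its CU can be shown with a small, library-free sketch. `FakeCU` and `FakeDIE` are hypothetical stand-ins; the only assumption is that each DIE holds a back-reference to its CU and the CU holds its DIEs.

```python
import gc
import weakref


class FakeCU:
    """Hypothetical stand-in for a CU that owns a list of DIEs."""

    def __init__(self, n_dies):
        self.dies = [FakeDIE(self) for _ in range(n_dies)]


class FakeDIE:
    """Hypothetical stand-in for a DIE that points back at its CU."""

    def __init__(self, cu):
        self.cu = cu


cache = {0: FakeCU(n_dies=1000)}
cu_ref = weakref.ref(cache[0])   # lets us observe whether the CU is still alive
kept_die = cache[0].dies[0]      # user code holds on to a single DIE

del cache[0]                     # "evict" the CU from the cache
gc.collect()
alive_while_die_held = cu_ref() is not None

del kept_die                     # drop the last outside reference
gc.collect()                     # CU and DIEs form a cycle, so the GC reclaims them
alive_after_release = cu_ref() is not None

print(alive_while_die_held, alive_after_release)
```

In this sketch the evicted CU, together with all 1000 of its DIEs, stays alive until the one retained DIE is released.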

@eliben
Owner

eliben commented Sep 8, 2025

Thanks, @sevaa

@mohamedelkony in addition to the other comment, could you add a test that exercises this functionality?

@sevaa
Contributor

sevaa commented Sep 8, 2025

Also, I don't like adding extra parameters to the DWARFInfo constructor :) It's effectively a public API on the DWARF-not-in-ELF side of things - gratuitous breaking change and all that. As we (and the DWARF committee, and the compiler vendors) add more sections into the mix, an optional final parameter will break the constructor invocations in a way that's no fun to debug. A section object will end up taking the place of the cache limit, and they will get a bogus error of "cannot compare a section descriptor to a number" way downstream... Not cool.
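One possible way to sidestep the positional-shift problem, sketched here as an assumption rather than something proposed in this thread, is a keyword-only argument. The class `Info` and its section parameters are hypothetical and do not reflect the real DWARFInfo signature.

```python
class Info:
    """Hypothetical constructor, not the real DWARFInfo signature."""

    def __init__(self, debug_info_sec, debug_str_sec=None, *, max_cu_cache_size=-1):
        # Everything after `*` must be passed by name, so sections added to the
        # positional list later cannot be mistaken for the cache limit.
        self.debug_info_sec = debug_info_sec
        self.debug_str_sec = debug_str_sec
        self.max_cu_cache_size = max_cu_cache_size


info = Info("info-section", "str-section", max_cu_cache_size=16)

try:
    Info("info-section", "str-section", 16)  # positional cache limit is rejected
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

With this shape, a misplaced positional argument fails immediately at the call site instead of surfacing as a confusing comparison error far downstream.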

Also, my preferred approach would be a plain no-cache mode rather than a limit on cache size. A limit needs to be pretty much hand-tuned to a specific binary or corpus; note how in the OP's own example the cache limit is zero, which is effectively a no-cache mode anyway. The CU cache is a relatively recent addition, so making its maintenance optional won't be too much of a lift IMHO. The DIE cache in the CU would be slightly trickier to tackle. Finally, DIEs have a way of storing references to one another (_parent and _terminator) that also ends up eating memory if DIE references are long-lived.
