Limit compilation units cache size #626
base: main
Conversation
Hi @eliben, could you please have a look? Thanks!
keyword syntax.
max_cu_cache_size:
    Enforces a limit on CU cache size. For unlimited cache size, set to -1.
The size limit is on the number of entries/CUs? The comment should say this in more detail.
@sevaa WDYT?
Only helps with the firehose parsing scenario with no storing of results, and only if you have many small CUs as opposed to several large ones. Also, keeping the cache alive (and the memory usage up) by accident is really easy; holding a reference to as much as one DIE in an evicted CU keeps the whole CU (with caches) in place. This patch may alleviate some OOM-on-large-binary scenarios that users are complaining about, but I can see OOM scenarios it won't address. By the time the OOM condition gets to us, it's usually boiled down to a minimal example, but we don't get to see the users' real life code that crashes.
Thanks, @sevaa. @mohamedelkony, in addition to the other comment, could you add a test that exercises this functionality?
Also, I don't like adding extra parameters to the […]. Also, my preferred approach would be a plain no-cache mode rather than a limit on cache size. A limit needs to be pretty much hand-tuned to a specific binary or corpus - note how in the OP's own example, the cache limit is zero, effectively no-cache mode anyway. The CU cache is a relatively recent addition - making its maintenance optional won't be too much of a lift IMHO. The DIE cache in the CU would be slightly trickier to tackle, and finally, DIEs have a way of storing references to one another ([…]).
Issue:
Iterated CUs (along with each CU's DIE cache) are kept in cache for as long as the DWARF object is alive. The cache is useful for hitting DIEs shared between CUs and for other use cases, but a simple loop over CUs and their DIEs keeps every DIE of the ELF file in memory until the DWARF object is no longer referenced.
For a big ELF file (ours is ~100 MB) this causes all DIEs of all CUs to stay in memory, even though we only need to parse the DIEs of the current CU. Once we move on, those DIEs are never used again (except in the interleaved-DIE case), so the retained memory is effectively leaked.
This leads to excessive memory usage - up to 7 GB per file. Since we parse 5 files in parallel, total usage reaches 35 GB.
Fix:
Add the ability to set a max cache size for cached CUs; when the limit is reached, purge the cache in FIFO order.
Default behavior is unchanged: the cache size is unlimited.
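The eviction idea can be sketched as a small FIFO-bounded cache. This is a hypothetical `FIFOCache` illustration, not the actual patch; the `-1` (unlimited) and `0` (no caching) conventions follow the parameter described in this PR:

```python
from collections import OrderedDict

class FIFOCache:
    """Size-limited cache that evicts entries in insertion (FIFO) order.

    max_size == -1 keeps the old behavior: unlimited cache.
    max_size == 0 disables caching entirely.
    """
    def __init__(self, max_size=-1):
        self.max_size = max_size
        self._entries = OrderedDict()

    def put(self, key, value):
        if self.max_size == 0:
            return  # caching disabled
        self._entries[key] = value
        if self.max_size > 0 and len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # drop the oldest entry

    def get(self, key):
        return self._entries.get(key)
```

With `max_size=2`, inserting a third CU evicts the first one inserted, so at most two CUs (and their DIE caches) stay reachable through the cache at any time.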
Results:
Reduced memory usage by 90%
Memory usage for our 100 MB ELF file:
Before: 4 GB
After: 350 MB
Used test case: