Speeds up module.add_debug_info and add_metadata by 2x #887
bslatkin wants to merge 11 commits into numba:main from
Conversation
Looks like I need to run flake8 and address some style issues. Will do that shortly.
Thanks for the PR! I'm curious about
Is there any public info about your compiler?
It's not public yet! Hopefully early next year I'll publish it for people to see. There are a couple of sneak peeks here: https://twitter.com/haxor/status/1580959518984241158

I will also share the private project with you, Graham, so you can take a look for context. The part that this PR specifically improves is how I assign

Thank you for looking and the attention on this PR!
Is there anything else you need from me on this PR? Just making sure. Thank you for the reviews!
Thanks for checking in - it's taking me a little while to review, as I'm unfamiliar with most of the parts changed in this PR - apologies for the delay. I have some review notes drafted, but I'd like to finish the review before I post them.
gmarkall left a comment
Many thanks for the PR, and apologies for the delay in reviewing.
The performance measurement reported by test_debug_info_caching is more than doubled (execution time is less than half) when using this patch on my machine, but I can see how it has even more performance benefit for real-world use cases (as you report in the description). I think this is a good change overall.
I have a few questions / comments, which are mainly on the diff.
In addition, I think the debug metadata performance measurement is valuable to have because it'll enable us to work out where a performance regression occurred (if we notice one) but I don't think we can keep it in the unit test suite itself - generally the test suite should not output anything, and the test more than doubles the testsuite execution time on my system. Could you please:
- Copy the code from `test_debug_info_caching` into a standalone example in the `examples` directory, complete with timing and output?
- Remove the timing and printing from `test_debug_info_metadata`, and have the test code only execute a single time (as opposed to 10) so that the test takes less time? I think it should be possible to simplify the test and make it much more "straight-line code", since separate functions won't be needed for `timeit` in the context of the test.
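For context, here is a minimal sketch of the two shapes being discussed. The function name and numbers are hypothetical stand-ins, not the actual test code; the point is only the split between a timed example script and a run-once unit test:

```python
import timeit

def build_debug_info():
    # Hypothetical stand-in for the metadata-heavy module construction
    # performed by the real test.
    return sum(range(1000))

# Example-script shape: timed repetitions with printed output, which
# belongs in a standalone script under examples/.
total_time = timeit.timeit(build_debug_info, number=10)
print('building debug info took', total_time, 'seconds')

# Unit-test shape: run once, assert on the result, print nothing.
result = build_debug_info()
assert result == 499500
```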
I hope the comments are clear and helpful - please let me know if any clarification or additional information would be helpful.
```diff
     def __init__(self):
         self._useset = set([''])
-        self._basenamemap = defaultdict(int)
+        self._basenamemap = {}
```
Is a defaultdict slower than a normal dict? It seems that making this change here requires you to then emulate a default dict below - is there a reason not to keep it as a defaultdict?
(Follow-up - I tried reverting the changes in this file and it seemed to make no performance difference in test_debug_info_caching)
```python
        try:
            ident = self._basenamemap[basename]
        except KeyError:
            ident = 0
```
This is the emulation of a defaultdict I mentioned in the comment above - could it just be

```diff
-        try:
-            ident = self._basenamemap[basename]
-        except KeyError:
-            ident = 0
+        ident = self._basenamemap[basename]
```

with _basenamemap remaining a defaultdict?
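To make the trade-off concrete, here is a small sketch (not llvmlite's actual code; function names are invented) showing that the plain-dict try/except and the defaultdict lookup hand out the same per-basename sequence numbers:

```python
from collections import defaultdict

def next_ident_plain(basenamemap, basename):
    # Plain dict: emulate defaultdict(int) by treating a miss as 0.
    try:
        ident = basenamemap[basename]
    except KeyError:
        ident = 0
    basenamemap[basename] = ident + 1
    return ident

def next_ident_default(basenamemap, basename):
    # defaultdict(int): a missing key silently materializes as 0.
    ident = basenamemap[basename]
    basenamemap[basename] = ident + 1
    return ident

plain, dd = {}, defaultdict(int)
for name in ['a', 'a', 'b']:
    assert next_ident_plain(plain, name) == next_ident_default(dd, name)
```

Whether one is measurably faster than the other depends on the hit/miss ratio, which is presumably why reverting this file showed no difference in test_debug_info_caching.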
```diff
             types.MetaDataType(),
             name=name)
         self.operands = tuple(values)
+        self.hash_cache = None
```
It feels like hash_cache is an implementation detail rather than an attribute we'd want to expose - should this be _hash_cache instead?
```python
        # Ensure that the hash is not cached between Python invocations
        # due to pickling or other serialization. The hash seed changes
        # which will cause these not to match.
        self.hash_cache = None
```
It seems a bit odd to mutate state inside __getstate__ - do you see any downside to removing the hash cache from a copy of the dict?
```python
        self.hash_cache = None
        return self.__dict__
```
e.g.

```diff
-        self.hash_cache = None
-        return self.__dict__
+        state = self.__dict__.copy()
+        state['hash_cache'] = None
+        return state
```
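A hypothetical sketch combining the two review suggestions - a leading underscore on the attribute name, and clearing the cache on a copy of `__dict__` so `__getstate__` never mutates the live object. `MDNode` here is an invented stand-in, not llvmlite's class:

```python
class MDNode:
    def __init__(self, operands):
        self.operands = tuple(operands)
        self._hash_cache = None  # computed lazily by __hash__

    def __hash__(self):
        if self._hash_cache is None:
            self._hash_cache = hash(self.operands)
        return self._hash_cache

    def __getstate__(self):
        # The hash seed changes between interpreter runs, so a serialized
        # cached hash would be stale; drop it from the copied state only,
        # leaving the in-memory object's cache intact.
        state = self.__dict__.copy()
        state['_hash_cache'] = None
        return state

node = MDNode(('a', 'b'))
hash(node)                            # populates the cache
state = node.__getstate__()
assert state['_hash_cache'] is None   # serialized state has no cache
assert node._hash_cache is not None   # live object is untouched
```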
```diff
         self.is_distinct = is_distinct
         self.kind = kind
         self.operands = tuple(operands)
+        self.hash_cache = None
```
Similar comment on the name to above, i.e. should it be _hash_cache?
```python
        # Ensure that the hash is not cached between Python invocations
        # due to pickling or other serialization. The hash seed changes
        # which will cause these not to match.
        self.hash_cache = None
```
Also should this invalidate the hash cache on a copy?
```python
        print('test_debug_info_performance took', total_time, 'to finish')
```
```python
        self.assertEqual(100004, len(mod._metadatacache))
```
I think the expected result could do with a comment explaining its derivation - it took me a while to work out that it's 100000 locations, plus one file, one compile unit, one subprogram, and [di_subprogram, di_subprogram].
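The derivation worked out in the comment above can be written as a quick sum (the variable names are just labels for the components the comment identifies):

```python
locations = 100_000   # one DILocation per generated source location
file = 1              # the DIFile node
compile_unit = 1      # the DICompileUnit node
subprogram = 1        # the DISubprogram node
retained = 1          # the [di_subprogram, di_subprogram] node
assert locations + file + compile_unit + subprogram + retained == 100004
```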
```python
        md = self._metadatacache[key]
        ...
        fixed_operands = self._fix_metadata_operands(operands)
        key = hash(tuple(fixed_operands))
```
In the PR description, I see:

> Only generates the key for module._metadatacache a single time instead of twice on cache miss
I'm not sure I can understand where it was generated twice before - am I missing that somewhere?
In any case, I do think the changes here are an improvement - reverting the specific changes to this file does slow down the performance test in test_debug_info_caching below.
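For illustration, here is a sketch of the caching pattern under discussion: build the (potentially expensive) cache key once and reuse it for both the lookup and the insert on a miss. The names mirror the diff, but this is an invented stand-in, not llvmlite's actual implementation:

```python
_metadatacache = {}

def _fix_metadata_operands(operands):
    # Hypothetical stand-in for the real normalization of operands.
    return [str(op) for op in operands]

def add_metadata(operands):
    fixed_operands = _fix_metadata_operands(operands)
    key = hash(tuple(fixed_operands))   # computed a single time
    try:
        md = _metadatacache[key]        # hit: nothing recomputed
    except KeyError:
        md = object()                   # stand-in for building an MDValue
        _metadatacache[key] = md
    return md

first = add_metadata([1, 2])
assert add_metadata([1, 2]) is first    # second call is a cache hit
```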
Hi @gmarkall! My apologies for the delay. This is great feedback. I'll start working through it and update the PR once I've addressed everything. Thank you
@bslatkin Hi! and thank you for your initial work on this. I am the release manager for 0.41.x and am checking on the status of this PR. I see that it has been in progress for quite some time. 0.41.x is now scheduled to be tagged during the week of 7th August 2023, which is in 4 weeks' time. What do you think, will this PR make it by then?
Thank you for the encouragement! I'm behind where I wanted to be on this, obviously :) I'll try to turn it around before then. But I'm not sure how much additional review it might take, especially during the summer. So perhaps I should aim for the release after that?

It's up to you, just let me know what you prefer. Reviewers are the most scarce resource in the Numba ecosystem, so I share your gut feeling that perhaps the 0.42 release is a more realistic goal. Here is the Gantt chart for the next few months, which shows the approximate tag dates for Numba 0.59 and 0.60 -- these will coincide with llvmlite releases, as the two libraries are released in lockstep: Effectively, the next releases will be in the Nov 2023 (0.42.x) and Feb 2024 (0.43.x) time-frame.
@bslatkin Hope you are well - it's been a while! I'm having a bit of a tidy-up, and checking in - do you see a future for this PR?
In my own compiler's codebase, this change reduces the time it takes for my debug info regression test to complete from 180 seconds down to 20 seconds. The more nesting of metadata there is, the larger the effect.