⚡️ Speed up method Cube.add_aux_factory by 5,758%
#26
📄 5,758% (57.58x) speedup for `Cube.add_aux_factory` in `lib/iris/cube.py`
⏱️ Runtime: 43.1 milliseconds → 735 microseconds (best of 12 runs)
📝 Explanation and details
The key optimization targets the performance bottleneck in the `add_aux_factory` method. The original code ran an expensive O(N) membership check, `ref_coord not in cube_coords`, inside a loop, giving O(N*M) time complexity, where N is the number of coordinates and M is the number of factory dependencies.

What was optimized:
- Replaced list-based membership checks with set-based identity checks: instead of using `ref_coord not in cube_coords` (which requires a linear scan), the code now builds a set of coordinate IDs with `cube_coords_id_set = {id(coord) for coord in cube_coords}` and checks `id(ref_coord) not in cube_coords_id_set`.
- Eliminated the nested helper function: the original `coordsonly` helper created temporary lists and was called twice. It is replaced with direct list building in a single pass.

Why this leads to speedup:
- Comparing object identity via `id()` is faster than Python's default equality checks.

Performance characteristics:
The line profiler shows that the critical line `if ref_coord is not None and ref_coord not in cube_coords:` took 98.5% of execution time in the original (167ms), whereas the optimized check `id(ref_coord) not in cube_coords_id_set` takes only 8.7% (254μs), a ~650x improvement on this specific operation. The overall method speedup of 5758% shows how much a single algorithmic improvement can matter once the bottleneck is correctly identified.

This optimization is particularly effective for test cases with many coordinates or multiple factory dependencies, where the quadratic behavior of the original code becomes pronounced.
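The difference between the two membership checks can be sketched in isolation. The snippet below is a minimal illustration and not the actual `lib/iris/cube.py` code: the `Coord` class and both function names are hypothetical stand-ins, and the equality counter exists only to make the cost of the list scan visible. Note that the two checks agree only when "present" means "the same object"; a distinct object that compares equal is found by the list scan but not by the ID set.

```python
class Coord:
    """Hypothetical stand-in for a coordinate object (not the iris class)."""

    eq_calls = 0  # counts how often __eq__ runs, to expose the scan cost

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        Coord.eq_calls += 1
        return isinstance(other, Coord) and self.name == other.name


def missing_by_list_scan(dependencies, cube_coords):
    # `in` on a list walks the list and falls back to __eq__ for each
    # element that is not the very same object: O(N) per dependency.
    return [dep for dep in dependencies
            if dep is not None and dep not in cube_coords]


def missing_by_id_set(dependencies, cube_coords):
    # Build the ID set once (O(N)); each lookup is then an average O(1)
    # integer hash lookup that never calls __eq__ on the coordinates.
    cube_coords_id_set = {id(coord) for coord in cube_coords}
    return [dep for dep in dependencies
            if dep is not None and id(dep) not in cube_coords_id_set]


cube_coords = [Coord(f"coord_{i}") for i in range(1000)]
outsider = Coord("outsider")
dependencies = [cube_coords[0], None, outsider]

Coord.eq_calls = 0
slow_missing = missing_by_list_scan(dependencies, cube_coords)
list_scan_eq_calls = Coord.eq_calls

Coord.eq_calls = 0
fast_missing = missing_by_id_set(dependencies, cube_coords)
id_set_eq_calls = Coord.eq_calls

print("equality calls, list scan:", list_scan_eq_calls)
print("equality calls, id set:", id_set_eq_calls)
```

Because the set holds plain integers from `id()`, this approach also works for objects that are not hashable themselves, which is the case for the hypothetical `Coord` above since it defines `__eq__` without `__hash__`.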
✅ Correctness verification report:
⚙️ Existing Unit Tests and Runtime
- integration/test_pp.py::TestVertical.test_hybrid_height_with_non_standard_coords
- integration/test_pp.py::TestVertical.test_hybrid_pressure_with_non_standard_coords
- unit/cube/test_Cube.py::Test_add_metadata.test_add_valid_aux_factory
- unit/cube/test_Cube.py::Test_add_metadata.test_error_for_add_invalid_aux_factory

⏪ Replay Tests and Runtime
- test_pytest_libiristestsintegrationtest_netcdf__loadsaveattrs_py_libiristestsunitlazy_datatest_non_lazy_p__replay_test_0.py::test_iris_cube_Cube_add_aux_factory

To edit these changes, `git checkout codeflash/optimize-Cube.add_aux_factory-mh52lg9j` and push.