Increased memory consumption in CMSSW_15_0_X #47470
Comments
A new Issue was created by @mandrenguyen. @Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
assign core, reconstruction, simulation |
New categories assigned: core,reconstruction,simulation @civanch,@Dr15Jones,@jfernan2,@kpedro88,@makortel,@mandrenguyen,@mdhildreth,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks |
Note that for 15.0.0 we are using ROOT 6.32, whereas the 14.1 and 14.2 release cycles are using ROOT 6.30. |
I ran the workflow 1000.0 in 14_2_0 and 15_0_0 with MaxMemoryPreload, and can confirm the ~300 MB increase (269 MB, to be more precise) in step 2 (RAW2DIGI_L1Reco_RECO_ALCAPRODUCER_DQM).
Other steps of the workflow:
|
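For reference, a measurement like the one above is typically taken by preloading the max-memory tool into the cmsRun process. A minimal sketch of driving that from Python is below; the preload library name, the config file name, and the wording of the summary line are assumptions, not taken from this thread.

```python
# Sketch: run a cmsRun job with a max-memory preload tool and pull out the
# reported peak. Library name, config name and log format are assumptions.
import os
import re
import subprocess

def max_memory(config="step2.py", preload="libPerfToolsMaxMemoryPreload.so"):
    env = dict(os.environ, LD_PRELOAD=preload)
    proc = subprocess.run(["cmsRun", config], env=env,
                          capture_output=True, text=True, check=True)
    # Look for the summary line printed by the preload tool (format assumed).
    match = re.search(r"max memory used\D*([\d.]+)", proc.stderr + proc.stdout)
    return float(match.group(1)) if match else None

if __name__ == "__main__":
    print("max memory (as reported by the tool):", max_memory())
```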
I ran the same job as Matti at FNAL on an el8 VM. For step2:
For step3:
These are consistent with what Matti sees. I was surprised that step3 takes considerably more memory than step2. |
I ran workflow 1000.0 for the 15_0_X prereleases. In order to reduce the impact of the fluctuations (in ONNXRuntime) discussed in #46966, I ran step2 four times in each release (including 14_2_0 and 15_0_0). The results on "max memory used" were stable within the precision I report below (so maybe this Run 1 workflow does not use ONNX)
We updated ROOT in pre2, so apparently it doesn't play a big role in this puzzle. |
I ran the step2 of 1000.0 in CMSSW_14_2_0_pre2 and CMSSW_14_2_0_pre2_ROOT632. The "max memory used" increased by 16 MB with ROOT 6.32 (from 2511 MB to 2527 MB). So it seems the memory increase comes mostly from elsewhere. |
So I ran the ModuleAllocMonitor service in the 1000.0 workflow step2 for both releases. That service watches all allocations/deallocations for all modules and outputs that data to a file. I then created a simple script to compare the results of the two files. In the following output a prefix of '1' denotes CMSSW_14_2 output and '2' denotes CMSSW_15_0. The script found some differences in the configurations (to be expected):
as well as some modules which ran differently (probably because they were deleted early):
Then for the cases where there was agreement, I made a summary which totaled up the 'added' memory and also found the maximum of 'maxTemp'. The source shows minimal difference:
The memory changes when we clear the event are:
So it looks like CMSSW_15 holds onto more memory in the Event (i.e. it deletes more at the end of the Event); it seems to be about 15 MB. The modules with the largest change in their memory use are:
So we added about 80 MB in CMSSW_15 with these largest ones, but lost 20 MB for topSingleMuonMediumDQM. |
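A rough sketch of the kind of two-file comparison described above is shown below. It assumes each ModuleAllocMonitor output has already been parsed into a per-module dictionary with 'added' and 'maxTemp' entries; the parsing itself is omitted because the file format is not reproduced in this thread.

```python
# Assumed input: each monitor output already parsed into
#   {module_label: {"added": retained_bytes, "maxTemp": peak_temporary_bytes}}

def summarize(per_module):
    """Total retained ('added') memory and the largest temporary peak."""
    total_added = sum(m["added"] for m in per_module.values())
    max_temp = max(m["maxTemp"] for m in per_module.values())
    return total_added, max_temp

def largest_changes(modules_14_2, modules_15_0, top=10):
    """Modules present in both jobs with the biggest change in retained memory."""
    common = modules_14_2.keys() & modules_15_0.keys()
    deltas = {name: modules_15_0[name]["added"] - modules_14_2[name]["added"]
              for name in common}
    return sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top]
```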
I modified the script to break out memory usage during construction separately from the rest. It had limited effect except for the two largest outliers:
|
To try to better understand the memory usage, I plotted the amount of outstanding allocations as a function of time (using PeriodicAllocMonitor) for each release for the step2 job. It looks like the memory increase happens at the very beginning of the job and then the difference stays pretty constant for the rest of the job. |
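A minimal sketch of producing such a plot is below. The PeriodicAllocMonitor output format is an assumption here (two columns per line: time and outstanding bytes), as are the file names.

```python
# Plot outstanding allocations vs time for two releases (file format assumed).
import matplotlib.pyplot as plt

def load(path):
    times, outstanding = [], []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) < 2:
                continue
            times.append(float(fields[0]))
            outstanding.append(float(fields[1]) / 1e6)  # bytes -> MB
    return times, outstanding

for label, path in [("CMSSW_14_2_0", "periodic_14_2_0.txt"),
                    ("CMSSW_15_0_0", "periodic_15_0_0.txt")]:
    t, mem = load(path)
    plt.plot(t, mem, label=label)

plt.xlabel("time [s]")
plt.ylabel("outstanding allocations [MB]")
plt.legend()
plt.savefig("outstanding_vs_time.png")
```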
Do these comments
suggest most of the ~260 MB increase would be in the framework (or somewhere else not covered by the ActivityRegistry signals)? |
So I used ModuleEventAllocMonitor (which tracks individual allocations/deallocations within a full event to separate module allocations related to the algorithm from those for the data products put into the event). This only tracks ED modules, and only for work done during the event, not during construction, Run, or LuminosityBlock processing. Doing a comparison of the two releases using the step2 job used earlier, I see that on average the CMSSW_15_0 modules retain less data (about 110 kB) compared to CMSSW_14_2. As this is the average retained, if the vast majority of the data is acquired at the beginning of the job and nothing else is added afterwards, then this number needs to be multiplied by 100 (since that is the number of events over which it was averaged). So at most CMSSW_15 would be saving about 11 MB. As for data products, CMSSW_15_0 is a bit larger (110 kB), with the vast majority coming from an increase in memory used in data products from particleFlowTmp/PFProducer (158 kB). So it looks like the memory increase does not come from work which happens while processing Events with ED modules. |
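The upper-bound arithmetic in that comment, written out under the stated assumption that essentially all of the retained memory is acquired on the first event:

```python
# Numbers quoted above: ~110 kB less retained per event on average in
# CMSSW_15_0, averaged over 100 events.
avg_retained_diff_kb = 110
n_events = 100

# If the retention all happened once, at the start of the job, the per-event
# average understates that one-time difference by a factor of n_events.
max_total_diff_mb = avg_retained_diff_kb * n_events / 1000.0
print(f"at most ~{max_total_diff_mb:.0f} MB from ED-module event-time work")  # ~11 MB
```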
I ran IgProf for the step2 of 1000.0 in 14_2_0 and 15_0_0_pre1 (that one had the largest increase, and for pre1 we still had an slc7 build); full profiles here: 14_2_0 and 15_0_0_pre1. The profiles show a 158 MB difference via the following call chain (skipping some stack frames):
So it looks like a case of cling autoparsing on a type that is consumed and does not have a dictionary.
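As an illustration of that autoparsing behaviour (not CMSSW code), the sketch below uses PyROOT: asking TClass for a type with a pre-built dictionary is cheap, while looking up a type without one sends the request through the interpreter, which is the kind of path where cling may parse headers. The second class name is a placeholder.

```python
import ROOT

# A type whose dictionary ships with ROOT: resolved without header parsing.
cls = ROOT.TClass.GetClass("std::vector<int>")
print(cls.GetName(), "has dictionary:", bool(cls.HasDictionary()))

# A placeholder type with no dictionary anywhere: the lookup has to go through
# the interpreter, the kind of path where cling may end up parsing headers.
missing = ROOT.TClass.GetClass("my::UndeclaredType")
print("lookup without a dictionary returned:", bool(missing))
```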
So the code associated with @makortel's traceback is in the loop in cmssw/DataFormats/Provenance/src/ProductRegistry.cc, lines 310 to 312 (commit 059eb10);
there we have the call to cmssw/DataFormats/Provenance/src/ProductRegistry.cc, lines 474 to 475 (commit 059eb10),
which is defined here
|
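Conceptually, the loop referenced above amounts to asking ROOT for a TClass for each registered (wrapped) product type. The sketch below shows that idea in PyROOT rather than the actual C++, with a hypothetical type list; each TClass::GetClass call is where the lookup can fall into the header-parsing path.

```python
import ROOT

def check_dictionaries(type_names):
    """Return the type names for which no compiled dictionary is available."""
    missing = []
    for name in type_names:
        cls = ROOT.TClass.GetClass(name)  # this lookup is where autoparsing can kick in
        if not cls or not cls.HasDictionary():
            missing.append(name)
    return missing

# Hypothetical input; in ProductRegistry the names come from the registered products.
print(check_dictionaries(["std::vector<int>", "mytest::MissingType"]))
```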
With @Dr15Jones we have determined that the culprit for the ~160 MB increase is in cmssw/DataFormats/Provenance/src/ProductRegistry.cc, lines 573 to 577 (commit 059eb10).
In 14_2_0, the … In 15_0_0_pre1, the same call requests a total of 2.3 GB(!), of which 159 MB is left at the end of the call (the largest single allocation being exactly 2 MB). |
We have determined
In this case the missing dictionary call chain was processing … Another type that appears to lead to large memory allocations is …
With The |
I verified with the AllocMonitors that with the aforementioned change
|
@pcanal it appears that IF the shared library containing a pre-built dictionary is loaded as part of a TFile open (in this case it is brought in as a dependent of a library where the original library has a dictionary we are searching for), it sets up the TClass with enough information so that when we call … But IF the shared library has not yet been loaded in the job and it happens during the … So the question is: is there a different code path within ROOT between how TFile::Open triggers a TClass to be read vs the call to …? |
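One way to probe the two situations contrasted above is sketched below with PyROOT. The library and class names are placeholders rather than the ones from this job, and the second path is meant to be tried in a fresh process without the explicit Load.

```python
import ROOT

NAME = "reco::Track"                  # placeholder: some wrapped product type
DICT_LIB = "libDataFormatsTrackReco"  # placeholder: its dictionary library

# Path 1: dictionary library already in memory (as after a TFile::Open that
# pulled it in as a dependency) -> GetClass can complete from the dictionary.
if ROOT.gSystem.Load(DICT_LIB) >= 0:
    cls = ROOT.TClass.GetClass(NAME)
    print("with library loaded:", bool(cls) and bool(cls.HasDictionary()))

# Path 2 (run in a fresh process, without the Load above): the same GetClass
# call has to discover the type some other way, which is where the expensive
# header parsing was observed.
```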
So in the 15_0_0_pre1 header-parsing case, the following shared libraries are loaded during the
In addition, a
|
During the same
In addition, headers from the following externals were opened:
|
The code paths are quite different with … |
There are code paths in … |
Here is a partial stack trace corresponding to #47470 (comment) in a ROOT debug build:
The
|
Here is a stack trace for a "breakpoint" in a 2 MB allocation (during the same …
On stack frame 75 it seems like the header parsing is done for type … On stack frame 81 the … |
My two cents: from the RECO profiling, we started to see the following memory messages since 15_0_0_pre2, not in pre1:
MemoryReport> EndJob: virtual size 5262.04 Mbytes, RSS 2575.98 Mbytes, PSS 1878.62 MBytes, Private 1878.62
Full details are in the Reco profiling web page:
Changes in RECO from pre1 to pre2 were numerous: |
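For anyone tracking these numbers across releases, a small parser for that MemoryReport line is sketched below; the regular expression is based only on the line quoted above, so other variants of the report may need adjusting.

```python
# Extract the numbers from a MemoryReport EndJob line (format assumed from
# the quoted example above).
import re

LINE = ("MemoryReport> EndJob: virtual size 5262.04 Mbytes, "
        "RSS 2575.98 Mbytes, PSS 1878.62 MBytes, Private 1878.62")

def parse_memory_report(line):
    pattern = (r"virtual size ([\d.]+) \w+, RSS ([\d.]+) \w+, "
               r"PSS ([\d.]+) \w+, Private ([\d.]+)")
    m = re.search(pattern, line)
    if not m:
        return None
    keys = ("vsize_mb", "rss_mb", "pss_mb", "private_mb")
    return dict(zip(keys, map(float, m.groups())))

print(parse_memory_report(LINE))
# {'vsize_mb': 5262.04, 'rss_mb': 2575.98, 'pss_mb': 1878.62, 'private_mb': 1878.62}
```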
Those new messages are from #46859, which first appeared in pre2. |
If a library was loaded, the aliases for a class were also loaded, and a non-normalized name might lead to a match. If we skip this check, then the next step will normalize the name and check for typedefs inside the name (if it is a template instance name), and if the name is not found as-is, it may lead to loading the header file. I.e. this fixes root-project#17992 and cms-sw/cmssw#47470.
The problem should be solved by root-project/root#17994 |
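To see the name-normalization point from the fix description, the PyROOT sketch below checks that several spellings of the same template instance resolve to a single TClass with one normalized name; the fix is about matching such already-loaded aliases before falling back to the normalization step that can trigger autoparsing.

```python
import ROOT

# Several spellings of the same template instance resolve to one TClass
# whose GetName() gives the normalized name ("vector<int>").
for spelling in ("vector<int>",
                 "std::vector<int>",
                 "std::vector<int, std::allocator<int> >"):
    cls = ROOT.TClass.GetClass(spelling)
    print(f"{spelling:45s} -> {cls.GetName() if cls else 'not found'}")
```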
Thanks @pcanal for root-project/root#17994. @makortel I have tested this change for our ROOT-master-based IBs and the max memory allocation is decreased (see details here for the cms-sw/root#220 tests). @pcanal, do you have a backport of root-project/root#17994 for the root 6.32 branch? |
The backport will be done as soon as the main PR has been reviewed. |
While building 15_0_0 it was noticed that the memory footprint of standard workflows has increased significantly since 14_2_0, as pointed out here:
#47468 (comment)
With data taking around the corner, it's important to understand the origin of this increase and potentially mitigate it.