Uproot version 5.7.1
For my analysis I construct many intermediate histogram products in Python and want to save them to a temporary file so that I don't have to re-run the time-consuming pre-processing stage. For some reason, though, partway through my processing the ROOT file becomes corrupted and uproot.update("./intermediate.root") fails on the file due to the assert on this line. At that point uproot is still able to read the histograms in the file, and ROOT can still open the file without reporting issues.
Here is a download link (~360MB) to the corrupted root file: Google Drive - intermediate.root
Basically the preprocessing consists of:
# We will store intermediate results in "intermediate.root"
with uproot.create("intermediate.root") as file:
    pass
for raw_data_file in raw_data_files:
    # Read in data...
    # Construct hist Histograms...
    directory = 'rawroot_' + str(raw_data_file_number) + '/'
    with uproot.update("intermediate.root") as file:
        output_hist: hist.Hist
        for output_hist in output_hist_list:
            file[directory + output_hist.name] = output_hist
I don't see any obvious issues with this approach, but the pre-processing fails partway through: once the file is corrupted, every further write to the intermediate result file is blocked.
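To pin down which iteration of the loop above first leaves the file in a state that uproot.update rejects, one option is to test a scratch copy of the file after each write. The following is only a minimal sketch: `survives_update` and its `opener` parameter are hypothetical names, and it assumes the failure surfaces as an AssertionError, as reported above. The check runs on a copy so that the original file is never touched by the probe itself.

```python
import shutil
import tempfile
from pathlib import Path


def survives_update(path, opener=None):
    """Return True if a scratch copy of `path` can be opened for updating.

    `opener` is a hypothetical hook: any callable that takes a file path and
    returns a context manager. It defaults to uproot.update.
    """
    if opener is None:
        import uproot  # imported lazily; only needed for the default opener

        opener = uproot.update

    with tempfile.TemporaryDirectory() as scratch_dir:
        scratch_copy = Path(scratch_dir) / Path(path).name
        shutil.copy(path, scratch_copy)  # never probe the original file
        try:
            with opener(str(scratch_copy)):
                pass
        except AssertionError:
            # Assumption: the corruption shows up as the assert described above.
            return False
    return True
```

Calling `survives_update("intermediate.root")` at the end of each loop iteration would show the first raw data file after which the check fails. Copying a file of several hundred MB on every iteration is slow, so this is meant for debugging only.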
Here are some bug-testing steps I have taken:
- Maybe the issue is because of the large number of histograms I am writing to the file?
  It didn't seem likely to me, but I ran a toy example where I wrote 100,000 histograms to a file successfully using the same directory structure I have in my sample file. No issues here.
- Maybe the last histograms I wrote into the file during pre-processing are corrupt for some reason?
  I checked the possibly suspicious histograms in Python before they get written and they seem fine.
- What if I try writing each rawroot raw data file's intermediate output into a separate new intermediate ROOT file instead of using directories like in my sample file?
  This works! Pre-processing completes and I can continue with analysis. So, for now I am using this workaround.
- Since I can read the intermediate.root file just fine, I tried re-reading all written histograms back in and immediately copying them to another ROOT file.
  So long as I perform this all inside a single with block, there is no issue with writing more histograms:
# This works
with uproot.open('intermediate.root') as in_file:
    with uproot.create('intermediate_copy.root') as out_file:
        # Copy all histograms from in to out
        # Now write more histograms to out_file, this succeeds
        pass
    pass
# Now if I try opening with uproot.update('intermediate_copy.root') I have no issues
However, just making a copy and re-opening the copy does not work:
# This doesn't work
with uproot.open('intermediate.root') as in_file:
    with uproot.create('intermediate_copy.root') as out_file:
        # Copy all histograms from in to out
        pass
    pass
# Fails with same assert error
with uproot.update('intermediate_copy.root') as file:
    pass
So, I'm at a loss as to what the problem might be. My best (only?) guess at this point is that repeatedly re-opening a ROOT file, adding a few histograms, and then closing it is somehow causing this corruption. Then again, performing a deep copy of all histograms in my file using just a single with block also results in a file that fails on the next uproot.update. I have a workaround for now (using multiple intermediate files, one for each raw data file), but I'm curious what's causing this.
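For reference, the multiple-file workaround can be sketched roughly as follows. This is an illustration, not the exact code used: the helper names and the intermediate_rawroot_N.root naming scheme are hypothetical, and output_hist_list is assumed to hold hist.Hist objects with a name set, as in the loop above. Each file is written once in a single session and never re-opened with uproot.update.

```python
from pathlib import Path


def intermediate_path(raw_data_file_number, out_dir="."):
    """Path of the intermediate file for one raw data file (hypothetical naming)."""
    return Path(out_dir) / f"intermediate_rawroot_{raw_data_file_number}.root"


def write_intermediate(raw_data_file_number, output_hist_list, out_dir="."):
    """Write all histograms for one raw data file in a single write session."""
    import uproot  # imported lazily so the path helper works without uproot

    path = intermediate_path(raw_data_file_number, out_dir)
    # recreate() overwrites any stale file left over from a previous run.
    with uproot.recreate(str(path)) as file:
        for output_hist in output_hist_list:
            file[output_hist.name] = output_hist
    return path
```

Inside the pre-processing loop this would be called as `write_intermediate(raw_data_file_number, output_hist_list)` in place of the uproot.update block.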