Allow `fix_file` to return dataset objects #2579
- Enable lock sharing between ncdata and Iris
- Suppress load warnings
- Restore lat/lon coord units after load
- All fixes in `fix_file`
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

    @@            Coverage Diff             @@
    ##             main    #2579      +/-   ##
    ==========================================
    + Coverage   95.14%   95.16%   +0.02%
    ==========================================
      Files         259      259
      Lines       15113    15157      +44
    ==========================================
    + Hits        14379    14424      +45
    + Misses        734      733       -1
One more incentive why it would be great to have this feature:

    path = "/path/to/real_EMAC_file_with_855_variables.nc"

    %%timeit
    iris.load(path)
    # 1min 47s ± 446 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    %%timeit
    iris.load_raw(path)
    # 1.98 s ± 7.38 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    %%timeit
    xr.open_dataset(path, chunks="auto")
    # 277 ms ± 8.14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    %%timeit
    netCDF4.Dataset(path, mode="r")
    # 5.2 ms ± 23.7 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

This uses the most up-to-date versions of all these packages (fresh install from 2025-05-09).

Since we sometimes need to read hundreds of those files (this example only contains a single time step), ESMValTool can spend hours just loading these datasets.
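For anyone who wants to repeat this comparison outside a notebook, here is a minimal sketch using only the standard library. The helper name `time_loaders` and the stand-in loaders are hypothetical and not part of this PR; the real calls from the measurements above would be passed in as zero-argument callables. It reports mean and standard deviation, mirroring what `%%timeit` prints, but it does not reproduce `%%timeit`'s automatic loop-count selection.

```python
import statistics
import time
import timeit
from typing import Callable, Dict, Tuple


def time_loaders(
    loaders: Dict[str, Callable[[], object]],
    repeat: int = 7,
) -> Dict[str, Tuple[float, float]]:
    """Return (mean, std. dev.) wall-clock seconds for each loader.

    Each loader is called once per measurement and measured `repeat` times.
    """
    if repeat < 2:
        raise ValueError("repeat must be >= 2 to compute a standard deviation")
    results = {}
    for name, func in loaders.items():
        # number=1: one call per measurement, `repeat` measurements in total
        timings = timeit.repeat(func, repeat=repeat, number=1)
        results[name] = (statistics.mean(timings), statistics.stdev(timings))
    return results


# Stand-in loaders so the sketch runs without any data file. For a real
# comparison one would pass e.g. `lambda: iris.load(path)` instead.
demo = time_loaders(
    {"no-op": lambda: None, "sleep 10 ms": lambda: time.sleep(0.01)},
    repeat=3,
)
for name, (mean, std) in demo.items():
    print(f"{name}: {mean:.4f} s ± {std:.4f} s")
```

File-system caching can affect the first measurement of each loader, so the order in which loaders are timed may matter for large files.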
Title changed from "`fix_file` to return `Cube` and `CubeList` objects" to "`fix_file` to return dataset objects"
Description
Currently, if a file cannot be properly read with Iris, we use `fix_file` to create a copy of that file and modify it using `netCDF4.Dataset` (example). This is very inefficient and slow.

A much better way to deal with this is to read the file with ncdata or xarray and then use ncdata to convert that object to an Iris object. However, for this to work, we need to allow `fix_file` to return dataset objects (instead of paths) and `load` to read dataset objects (instead of paths). This PR does that.

Closes #2129
Related to #674
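
To illustrate the idea in the description, here is a minimal, library-free sketch of the control flow: `fix_file` may return either the original path or an in-memory dataset, and `load` dispatches on what it receives. Everything here is hypothetical and simplified: `InMemoryDataset` stands in for an xarray or ncdata object, the `needs_fix` flag and the string return values are placeholders, and the actual signatures in this PR may differ.

```python
from pathlib import Path
from typing import Union


class InMemoryDataset:
    """Hypothetical stand-in for a dataset object (xarray or ncdata)."""

    def __init__(self, source: Path, attrs: dict):
        self.source = source
        self.attrs = attrs


FixResult = Union[Path, InMemoryDataset]


def fix_file(path: Path, needs_fix: bool) -> FixResult:
    """Return the path unchanged, or a corrected in-memory dataset.

    No copy of the file is written to disk; the fix is applied to the
    in-memory object instead.
    """
    if not needs_fix:
        return path
    return InMemoryDataset(source=path, attrs={"fixed": True})


def load(source: Union[str, Path, InMemoryDataset]) -> str:
    """Dispatch on the input type (placeholder return values)."""
    if isinstance(source, (str, Path)):
        # Real code would read the file with Iris here.
        return f"loaded from path {Path(source).name}"
    if isinstance(source, InMemoryDataset):
        # Real code would convert the dataset to Iris cubes via ncdata here.
        return f"converted dataset from {source.source.name}"
    raise TypeError(f"Unsupported input type: {type(source).__name__}")


print(load(fix_file(Path("/data/model_output.nc"), needs_fix=False)))
print(load(fix_file(Path("/data/model_output.nc"), needs_fix=True)))
```

The point of the dispatch is that callers of `load` do not need to know whether a fix was applied: both branches end in the same kind of result.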
Link to documentation:
Before you get started
Checklist
It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.
- All checks below this pull request were successful
- Remaining Codacy issues cannot be fixed

To help with the number of pull requests: