Fix dims and coords order issue, solved time order in NetCDF file #231
rhaegar325 wants to merge 2 commits into main
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #231 +/- ##
==========================================
+ Coverage 49.02% 49.49% +0.47%
==========================================
Files 22 22
Lines 4241 4271 +30
==========================================
+ Hits 2079 2114 +35
+ Misses 2162 2157 -5
a7aba3d to 5e42d97
@rbeucher I have combined our work and this PR can ensure
Thanks @rhaegar325, I have been reading a bit more about NetCDF. The order of dimensions is important for variables, but the order of the coordinate variables themselves is not important and is not enforced by CMIP.

This bit:

`self.ds = xr.Dataset(ordered_vars, attrs=self.ds.attrs)`

This can:

I will take a closer look tomorrow, but I am inclined to keep things simple and rely on the changes I have already made.
This violates the CMIP6 / NetCDF convention that `time`, as the unlimited record dimension, must come first: `(time, [lev], lat, lon)`. Ocean data was unaffected; see the root cause analysis below.
Root Cause
The problem exists at three independent levels:
Level 1: Data variable dimension order

CMIP tables define most variable dimensions as `longitude latitude time` (time last). `select_and_process_variables()` transposes the data variable to match, producing `tas.dims = (lon, lat, time)`.

Level 2: Per-variable `transpose()` cannot fix dataset-level dimension ordering

`xr.Dataset.sizes` key order is determined by the order in which dimensions are first encountered across all variables when the Dataset is constructed. Assigning `self.ds[var] = self.ds[var].transpose(...)` only fixes that variable's own `.dims`; it does not change the `ds.sizes` key order.

Level 3: Why atmosphere differs from ocean
| | Atmosphere | Ocean |
|---|---|---|
| `lat`/`lon` type | | |
| `lat`, `lon` dimensions | | |
| `ds.sizes` result | `{lat, bnds, lon, time}`: time last ❌ | `{time, lev, j, i}`: time first ✓ |

The old `reorder()` used `core=("lat", "lon", "time", "height")`, causing the `lat`/`lon` coordinate variables to be written to the NetCDF file before `time`, which made xarray place `time` last in `ds.coords` on read-back.

Changes
`src/access_moppy/base.py`

`ensure_time_first_dimension()`: add Dataset reconstruction step

The original implementation only transposed variables one by one, which fixed per-variable `.dims` but left the `ds.sizes` / `ds.coords` ordering unchanged. A second step is added: if `ds.sizes` does not already start with `time`, reconstruct the Dataset, placing `time`/`time_bnds` first in the variable dict so that xarray encounters the `time` dimension first during construction.

Solved #221
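For reviewers, here is a minimal plain-Python sketch of the first-encounter rule and the reconstruction step described above. It does not call xarray; it only models the ordering behaviour as this description states it. The helper names `dimension_order` and `time_first` and the variable layout are hypothetical and are not taken from `access_moppy`.

```python
def dimension_order(variables):
    """Return dimension sizes in the order each dimension is first seen.

    `variables` maps a variable name to (dims, shape). This is a simplified,
    hypothetical stand-in for the variables held by a dataset.
    """
    sizes = {}
    for dims, shape in variables.values():
        for dim, size in zip(dims, shape):
            # Keep the position of the first encounter; later ones are ignored.
            sizes.setdefault(dim, size)
    return sizes


def time_first(variables, first=("time", "time_bnds")):
    """Rebuild the mapping with the time variables at the front."""
    ordered = {name: variables[name] for name in first if name in variables}
    ordered.update((n, v) for n, v in variables.items() if n not in ordered)
    return ordered


# Assumed atmosphere-style layout: lat/lon coordinates are stored before time.
# Note that tas itself is already time-first.
variables = {
    "lat": (("lat",), (3,)),
    "lat_bnds": (("lat", "bnds"), (3, 2)),
    "lon": (("lon",), (4,)),
    "time": (("time",), (5,)),
    "time_bnds": (("time", "bnds"), (5, 2)),
    "tas": (("time", "lat", "lon"), (5, 3, 4)),
}

before = list(dimension_order(variables))
after = list(dimension_order(time_first(variables)))
print(before)  # ['lat', 'bnds', 'lon', 'time']
print(after)   # ['time', 'bnds', 'lat', 'lon']
```

In this model `tas` already has `time` first in its own dims, yet the dataset-level order still ends with `time` until the mapping is rebuilt with `time`/`time_bnds` at the front.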