Skip to content

fix dims and coords order issue,Solved time order in NETcdf file#231

Open
rhaegar325 wants to merge 2 commits intomainfrom
fix_dimention_order_time_first
Open

fix dims and coords order issue,Solved time order in NETcdf file#231
rhaegar325 wants to merge 2 commits intomainfrom
fix_dimention_order_time_first

Conversation

@rhaegar325
Copy link
Collaborator

This violates the CMIP6 / NetCDF convention that time, as the unlimited record dimension, must come first: (time, [lev], lat, lon).

Ocean data was unaffected — see root cause analysis below.

Root Cause

The problem exists at three independent levels:

Level 1 — Data variable dimension order
CMIP tables define most variable dimensions as longitude latitude time (time last). select_and_process_variables() transposes the data variable to match, producing tas.dims = (lon, lat, time).

Level 2 — Per-variable transpose() cannot fix dataset-level dimension ordering
xr.Dataset.sizes key order is determined by the order dimensions are first encountered across all variables when the Dataset is constructed. Assigning self.ds[var] = self.ds[var].transpose(...) only fixes that variable's own .dims; it does not change ds.sizes key order.

Level 3 — Why atmosphere differs from ocean

Atmosphere Ocean
lat / lon type 1-D dimension coordinates — introduce lat, lon dimensions 2-D auxiliary coordinates — do not introduce dimension coordinates
ds.sizes result {lat, bnds, lon, time} — time last ❌ {time, lev, j, i} — time first ✓

The old reorder() used core=("lat", "lon", "time", "height"), causing lat/lon coordinate variables to be written to the NetCDF file before time, which made xarray place time last in ds.coords on read-back.

Changes

src/access_moppy/base.py

ensure_time_first_dimension() — add Dataset reconstruction step

The original implementation only transposed variables one by one, which fixed per-variable .dims but left ds.sizes/ds.coords ordering unchanged. A second step is added: if ds.sizes does not already start with time, reconstruct the Dataset placing time/time_bnds first in the variable dict, so xarray encounters the time dimension first during construction.

# Step 2: rebuild dataset so 'time' is first in ds.sizes / ds.coords
current_sizes = list(self.ds.sizes.keys())
if current_sizes and current_sizes[0] == "time":
    return  # already correct

ordered_vars = {}
for name in ("time", "time_bnds"):
    if name in self.ds.variables:
        ordered_vars[name] = self.ds[name]
for name in self.ds.variables:
    if name not in ordered_vars:
        ordered_vars[name] = self.ds[name]

self.ds = xr.Dataset(ordered_vars, attrs=self.ds.attrs)

#221 Solved

@rhaegar325 rhaegar325 changed the title fix dims and coords order issue fix dims and coords order issue,Solved #221 Mar 19, 2026
@rhaegar325 rhaegar325 changed the title fix dims and coords order issue,Solved #221 fix dims and coords order issue,Solved time order in NETcdf file Mar 19, 2026
@rhaegar325 rhaegar325 requested a review from rbeucher March 19, 2026 04:06
@codecov
Copy link

codecov bot commented Mar 19, 2026

Codecov Report

❌ Patch coverage is 80.00000% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 49.49%. Comparing base (5bd7f01) to head (5e42d97).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
src/access_moppy/base.py 80.00% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #231      +/-   ##
==========================================
+ Coverage   49.02%   49.49%   +0.47%     
==========================================
  Files          22       22              
  Lines        4241     4271      +30     
==========================================
+ Hits         2079     2114      +35     
+ Misses       2162     2157       -5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@rhaegar325 rhaegar325 force-pushed the fix_dimention_order_time_first branch from a7aba3d to 5e42d97 Compare March 19, 2026 09:49
@rhaegar325
Copy link
Collaborator Author

@rbeucher I have combined our work and this pr can ensure time be in the first in both variables and dataset.

@rbeucher
Copy link
Member

rbeucher commented Mar 19, 2026

Thanks @rhaegar325, I have been reading a bit more about NetCDF. The order of dimensions is important for variables, but the order of coordinate variables themselves is not important and is not enforced by CMIP.
Your code works fine, but I am not fully convinced it is lazy.

This bit:

self.ds = xr.Dataset(ordered_vars, attrs=self.ds.attrs)

This can:

  • Trigger reconstruction of the Dataset
  • Potentially drop encoding / chunk metadata
  • Not strictly “eager” in computation, but:
  • It breaks references
  • Can invalidate optimisations
  • May lead to unexpected loads later

I will take a closer look tomorrow, but I am inclined to keep things simple and rely on the changes I have already made.
Sorry @rhaegar325, I am still getting up to speed on what is actually required versus what is not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants