fix[next]: properly ignore ndarray embedded caches#2536
fix[next]: properly ignore ndarray embedded caches#2536egparedes merged 15 commits intoGridTools:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Fixes a bug in the embedded NdArrayConnectivityField implementation so its runtime inverse-image cache is excluded from object state serialization (and thus won’t leak into serialized artifacts).
Changes:
- Added
__getstate__/__setstate__toNdArrayConnectivityFieldto omit the_cachecached-property from serialized state. - Added unit tests verifying
_cacheis excluded from__getstate__and not reintroduced by__setstate__.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
src/gt4py/next/embedded/nd_array_field.py |
Implements state (de)serialization hooks to drop the _cache runtime cache from serialization. |
tests/next_tests/unit_tests/embedded_tests/test_nd_array_field.py |
Adds tests covering __getstate__/__setstate__ behavior around _cache. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
tests/next_tests/unit_tests/embedded_tests/test_nd_array_field.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Pull request overview
Fixes serialization/fingerprinting behavior for embedded NdArrayField/NdArrayConnectivityField by ensuring runtime-only caches aren’t included in saved state.
Changes:
- Added
__getstate__/__setstate__to serialize only dataclass-defined state (excluding cached/runtime attributes). - Moved/reintroduced an embedded runtime cache (
_cache) for connectivity inverse-image/restrict acceleration while keeping it out of serialization. - Added unit tests asserting cached properties and runtime caches are excluded from state/restoration.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/next_tests/unit_tests/embedded_tests/test_nd_array_field.py | Adds tests for __getstate__ / __setstate__ behavior and cache exclusion for ndarray + connectivity fields |
| src/gt4py/next/embedded/nd_array_field.py | Implements __getstate__ / __setstate__ and clarifies _cache as a runtime-only cached property |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
tests/next_tests/unit_tests/embedded_tests/test_nd_array_field.py
Outdated
Show resolved
Hide resolved
tests/next_tests/unit_tests/embedded_tests/test_nd_array_field.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Pull request overview
This PR fixes a serialization/fingerprinting bug in the embedded NdArrayField (and NdArrayConnectivityField) by ensuring runtime-only caches (e.g., cached properties) are excluded from pickling state.
Changes:
- Add
NdArrayField.__getstate__/__setstate__to serialize only dataclass instance fields (excluding cached attributes). - Rework the embedded connectivity runtime cache (
_cache) to remain a non-serialized cached property. - Add unit tests covering
__getstate__,__setstate__, and a pickle roundtrip for both ndarray fields and connectivity fields; add a utility helper for dataclass field name lookup.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
tests/next_tests/unit_tests/embedded_tests/test_nd_array_field.py |
Adds tests asserting cached attributes are excluded from state and that pickle roundtrips restore data correctly. |
src/gt4py/next/embedded/nd_array_field.py |
Implements __getstate__/__setstate__ to drop runtime caches from serialization and keeps connectivity _cache non-serialized. |
src/gt4py/eve/utils.py |
Adds a cached helper to retrieve dataclass field names for use by serialization logic. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
There was a problem hiding this comment.
Pull request overview
Fixes serialization/fingerprinting of embedded NdArrayField (and subclasses) by ensuring runtime-only cached attributes are excluded from pickled state, improving determinism of content_hash-based fingerprints.
Changes:
- Add
NdArrayField.__getstate__/__setstate__to serialize only dataclass instance fields (excluding cached properties stored in__dict__). - Reintroduce
NdArrayConnectivityField._cacheas an explicit runtime-only cache and add tests ensuring it is excluded from serialization. - Add unit tests covering
__getstate__/__setstate__behavior and pickle roundtrips.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
src/gt4py/next/embedded/nd_array_field.py |
Implements custom pickling state handling to exclude runtime caches and introduces helper logic for selecting serializable dataclass fields. |
tests/next_tests/unit_tests/embedded_tests/test_nd_array_field.py |
Adds regression tests for excluding cached properties/runtime caches from serialization and for pickle roundtrips. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
There was a problem hiding this comment.
Pull request overview
This PR introduces a metadata-driven pickling mechanism for dataclass-like objects in gt4py.next, and applies it to embedded NdArrayField serialization so runtime caches don’t affect fingerprinting (restoring proper cache reuse).
Changes:
- Add
utils.MetadataBasedPickling+utils.gt4py_metadata(...)to control which dataclass/datamodel fields participate in pickling. - Update embedded
NdArrayField(andNdArrayConnectivityField) to use metadata-based pickling and to exclude runtime-only derived/cached data. - Add unit tests validating
__getstate__output and pickle roundtrips for dataclasses/datamodels and embedded ndarray fields.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
src/gt4py/next/utils.py |
Adds the metadata namespace helper and the MetadataBasedPickling mixin that generates __getstate__ based on field metadata. |
src/gt4py/next/embedded/nd_array_field.py |
Adopts the mixin on NdArrayField and marks _kind as non-pickled to avoid fingerprint instability from derived state. |
tests/next_tests/unit_tests/test_utils.py |
Adds tests for the mixin/state generator and pickle roundtrips for dataclass/datamodel variants. |
tests/next_tests/unit_tests/embedded_tests/test_nd_array_field.py |
Adds regression tests ensuring embedded field runtime caches aren’t serialized and pickle roundtrips work. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
havogt
left a comment
There was a problem hiding this comment.
lgtm, just a few optional comments
src/gt4py/next/utils.py
Outdated
| # - for class instances with __dict__ and no __slots__ -> state = self.__dict__. | ||
| # - for class instances with __slots__ and no __dict__ -> state = (None, {slot.name: slot.value for all slots}). | ||
| # - for class instances with __dict__ and __slots__ -> state = (self.__dict__, {slot.name: slot.value for all slots}). | ||
| if not has_slots: |
There was a problem hiding this comment.
I find this if else change surprisingly hard to read. I think I'd prefer (although slightly more checks and feel free to ignore):
if has_slots and has_dict:
...
else:
if has_slots:
...
else: # has_dict
...
There was a problem hiding this comment.
I've now inlined the comments with the cases so it should be easier to read.
| return ( | ||
| {name: getattr(self, name) for name in dict_names}, | ||
| {name: getattr(self, name) for name in slot_names}, | ||
| ) |
There was a problem hiding this comment.
nit: we could add a test for empty datamodels and dataclasses. I don't see why it should break though...
| _codomain: common.DimT | ||
| _skip_value: Optional[core_defs.IntegralScalar] | ||
| _kind: Optional[common.ConnectivityKind] = None | ||
| _kind: Optional[common.ConnectivityKind] = dataclasses.field( |
There was a problem hiding this comment.
Why is it skipped, because it is also a property? I think for caches it is clear, but for other cases a docstring would be helpful.
There was a problem hiding this comment.
kind has been converted to a cached_property so this is not longer necessary.
Add a general mechanism to deal with serialization of dataclass-like instances using a new mixin class:
utils.MetadataBasedPickling. By default this mechanism ignores run-time only data added to the instance (e.g. caches) and allows skipping some specific fields from pickle using field metadata.Additionally, this fixes a bug in the embedded
NdArrayField(and subclasses), where because of cache data not being skipped properly during fingerprinting, the gt4py caches are not used properly. This bug was the actual trigger of this implementation.