
Export Function for (Integrated) Datasets? #97

Open
BijalBPatel opened this issue Aug 4, 2023 · 3 comments · May be fixed by #173
Assignees
Labels
enhancement New feature or request

Comments

@BijalBPatel
Collaborator

Occasionally I find it useful to export the scattering dataset after integration, but the built-in xarray.to_netcdf() raises clunky errors on datetime.datetime attributes and on attributes containing nested dicts.

Would it be useful to build in export/load functions? Where should it go?

@BijalBPatel BijalBPatel added the enhancement New feature or request label Aug 4, 2023
@BijalBPatel BijalBPatel self-assigned this Aug 4, 2023
@pbeaucage
Collaborator

This already exists in some form: see PyHyperScattering.util.FileIO and its methods saveNexus, savePickle, loadNexus, and loadPickle.

A function in FileIO that sanitized the attributes to allow NetCDF serialization would be very useful, as would documentation improvements around the existing save/load functionality.
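For illustration, a standalone sanitizer along those lines might look like the sketch below. The name sanitize_attrs and the example attrs dict are hypothetical, not existing FileIO members; netCDF attributes may only be strings, numbers, or flat arrays, which is why the datetime and nested-dict entries need conversion.

```python
import datetime
import json

# Hypothetical attrs typical of an integrated scan; the last two entries
# would make xarray.to_netcdf() fail as-is.
attrs = {
    "sample_name": "film_A",
    "start_time": datetime.datetime(2023, 8, 4, 12, 30),
    "motors": {"x": 1.0, "y": 2.5},
}


def sanitize_attrs(attrs: dict) -> dict:
    """One-way sanitization: stringify datetimes, JSON-encode nested dicts."""
    clean = {}
    for key, val in attrs.items():
        if isinstance(val, datetime.datetime):
            clean[key] = str(val)
        elif isinstance(val, dict):
            # The "json_" prefix marks the key so a loader can revert it
            clean["json_" + key] = json.dumps(val, default=str)
        else:
            clean[key] = val
    return clean


print(sanitize_attrs(attrs))
```

A matching loader would look for the "json_" prefix and json.loads those values back into dicts.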

@BijalBPatel
Collaborator Author

BijalBPatel commented Aug 4, 2023

I have a messy stub for netCDF; I can take this on during the hackathon. Pardon the formatting below:

import copy
import datetime
import json

import xarray as xr


def saveScan(int_scans: xr.DataArray, outPath: str):
    """Saves an xr.DataArray containing scattering data to a netCDF file.

    Converts datetime attributes to strings (one-way conversion) and uses
    json.dumps() to convert nested dicts to str (reversed on load with
    loadScan()).

    Parameters
    ----------
    int_scans : xr.DataArray
        xarray DataArray containing scattering data
    outPath : str
        target output path (including filename and extension)
    """
    # Work on a copy so the caller's attrs are left untouched
    int_scans_out = copy.deepcopy(int_scans)

    # Convert problematic (non-serializable) attributes
    for attr in list(int_scans_out.attrs.keys()):
        # Convert datetime to str
        if isinstance(int_scans_out.attrs[attr], datetime.datetime):
            int_scans_out.attrs[attr] = str(int_scans_out.attrs[attr])
        # Serialize nested dicts
        elif isinstance(int_scans_out.attrs[attr], dict):
            # Mark the attribute as JSON-encoded by prefixing its name
            newKey = "json_" + attr
            # TODO: handle errors on unserializable keys/values; for now
            # default=str falls back to string conversion
            int_scans_out.attrs[newKey] = json.dumps(
                int_scans_out.attrs[attr], default=str
            )
            del int_scans_out.attrs[attr]

    # Save integrated data
    int_scans_out.to_netcdf(outPath)


def loadScan(inPath: str):
    """Loads an xr.DataArray from a netCDF file generated by saveScan().

    Attempts to revert JSON-encoded nested dict attributes. Note: this
    probably does not preserve the original data types.

    Parameters
    ----------
    inPath : str
        path to the netCDF file (including filename and extension)

    Returns
    -------
    xr.DataArray containing scattering data
    """
    # Load from file
    scans_in = xr.load_dataarray(inPath)

    # Revert JSON-encoded attributes
    for attr in list(scans_in.attrs.keys()):
        # Identify JSON-encoded keys by their prefix
        if str(attr).startswith("json_"):
            scans_in.attrs[attr[5:]] = json.loads(scans_in.attrs[attr])
            del scans_in.attrs[attr]

    # Return loaded data
    return scans_in

@pbeaucage
Collaborator

Looks good!

Worth considering orjson (https://github.com/ijl/orjson) which correctly serializes numpy types.
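For context, the stdlib json module rejects numpy scalar types outright, and the default=str fallback used in the stub stringifies them (losing the numeric type on round-trip), which is where orjson's OPT_SERIALIZE_NUMPY option would help. A quick sketch with hypothetical attribute values:

```python
import json
import numpy as np

attrs = {"exposure_s": np.float32(0.5), "num_points": np.int64(500)}

# Stdlib json raises TypeError on numpy scalars...
try:
    json.dumps(attrs)
except TypeError as err:
    print(err)

# ...and default=str only stringifies them, so types are lost on round-trip
print(json.dumps(attrs, default=str))

# With orjson (third-party), numpy types serialize natively:
#   import orjson
#   orjson.dumps(attrs, option=orjson.OPT_SERIALIZE_NUMPY)
```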
