
[Talk Suggestion]: Speed up access to your NetCDF data using VirtualiZarr #57

@VeckoTheGecko

Description

Describe the topic for the talk
Zarr is an "Analysis-Ready, Cloud-Optimized" (ARCO) data format that allows very fast data access for analysis. Many datasets, however, are stored as NetCDF, which can significantly slow down analysis. Converting the NetCDF data to Zarr is a monstrous task: it duplicates your data, and it has to be redone every time the data are updated. Thankfully, there's a win-win-win solution! Enter VirtualiZarr, a project that maps out the data inside your NetCDF files to create virtual Zarr stores. This gives you Zarr-like access to NetCDF data with no data duplication, in a way that makes it easy to update the datasets.

VirtualiZarr Docs
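To make the "virtual store" idea a bit more concrete, here is a minimal, stdlib-only sketch of the concept as I understand it: the virtual store keeps only a manifest of byte ranges (path, offset, length) pointing at chunks inside the original files, so reads go straight to the source data and a new file just adds manifest entries. Everything here is hypothetical and for illustration only: the file names, the fake header, and the helpers `write_chunks`, `read_chunk` and `read_all` are not VirtualiZarr's API. Real NetCDF4/HDF5 chunks are located by parsing the file's internal metadata (and are often compressed), which this toy skips, and the manifest layout is only loosely similar to VirtualiZarr's chunk manifests.

```python
import os
import tempfile

def write_chunks(path, chunks, start_index=0):
    """Hypothetical helper: write raw chunks after a fake header, return manifest entries."""
    entries = {}
    with open(path, "wb") as f:
        f.write(b"FAKEHDR\n")  # 8-byte stand-in for format metadata before the data
        for i, chunk in enumerate(chunks, start=start_index):
            # Record where this chunk lives in the original file; no copy is made.
            entries[str(i)] = {"path": path, "offset": f.tell(), "length": len(chunk)}
            f.write(chunk)
    return entries

def read_chunk(manifest, key):
    """Fetch one chunk with a byte-range read from the original file."""
    ref = manifest[key]
    with open(ref["path"], "rb") as f:
        f.seek(ref["offset"])
        return f.read(ref["length"])

def read_all(manifest):
    """Concatenate chunks in index order, as a reader of the virtual array would."""
    return b"".join(read_chunk(manifest, k) for k in sorted(manifest, key=int))

workdir = tempfile.mkdtemp()
file_2023 = os.path.join(workdir, "obs_2023.bin")  # hypothetical existing data file
file_2024 = os.path.join(workdir, "obs_2024.bin")  # hypothetical newly arrived file

manifest = write_chunks(file_2023, [b"aaaa", b"bbbb"])
# New data arrives in a new file: only the manifest grows, nothing is rewritten.
manifest.update(write_chunks(file_2024, [b"cccc"], start_index=2))

data = read_all(manifest)
print(data)  # b'aaaabbbbcccc'
```

Because the manifest holds only a few numbers per chunk, updating the virtual dataset when a new file arrives means appending entries rather than rewriting any data. In practice VirtualiZarr builds these references for you from the NetCDF metadata (see the project docs).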

Describe the benefit

I think this talk should be pitched at a higher level, focussing on the benefits from a scientist's point of view (i.e., tailored to the audience). The exact implementation details would interest RSEs and data engineers, but they are also well documented in the project docs.

Would you be capable/willing to give the talk?
Yes. I want to do this for our Lorenz data anyway (it would also be a good opportunity to explore interactions with Icechunk, etc.). I'm also open to others taking over or getting involved if interested.

Additional comments

None
