Describe the topic for the talk
Zarr is an "Analysis Ready Cloud Optimized" (ARCO) data format that allows very fast data access for analysis. Many datasets, however, exist in NetCDF formats, which can significantly slow down analysis. Rewriting NetCDF data to Zarr is a monstrous task that also duplicates your data and has to be repeated every time the data is updated. Thankfully there's a win-win-win solution! Enter VirtualiZarr, a project that maps out your NetCDF data to create virtual Zarr stores. This gives you Zarr-like access to NetCDF data with no data duplication, and in a way that makes it easy to update the datasets.
VirtualiZarr Docs
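To make the "maps out your data" idea concrete, here is a toy sketch in plain Python. It is not VirtualiZarr's API: the functions `build_manifest` and `read_chunk`, the fake file layout, and the chunk names are all hypothetical, chosen only to illustrate the general idea of recording where each chunk sits in an existing file (path, byte offset, length) and reading just that byte range on demand, rather than rewriting the data.

```python
import os
import tempfile


def build_manifest(path, chunk_sizes, header_size):
    """Record where each chunk lives inside an existing file, without copying it."""
    manifest = {}
    offset = header_size
    for index, length in enumerate(chunk_sizes):
        manifest[str(index)] = {"path": path, "offset": offset, "length": length}
        offset += length
    return manifest


def read_chunk(manifest, key):
    """Fetch one chunk by reading only its byte range from the original file."""
    entry = manifest[key]
    with open(entry["path"], "rb") as f:
        f.seek(entry["offset"])
        return f.read(entry["length"])


# Stand-in for a legacy file (hypothetical layout): a header, then two chunks.
header, chunk_a, chunk_b = b"HDR!", b"temperature", b"salinity"
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(header + chunk_a + chunk_b)
    legacy_path = tmp.name

# The manifest is tiny compared with the data and can be rebuilt cheaply
# whenever the underlying file changes.
manifest = build_manifest(legacy_path, [len(chunk_a), len(chunk_b)], len(header))
first = read_chunk(manifest, "0")
second = read_chunk(manifest, "1")
os.remove(legacy_path)
```

The original file is never rewritten or duplicated; only the small lookup table is new. For the real workflow, the project docs linked above are the authoritative reference.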
Describe the benefit
I think that this talk should be done at a higher level, focussing on the benefits from the scientists' POV (i.e., tailored to the audience). The exact implementation details would be of interest to RSEs and Data Engineers, but are also well documented in the project docs.
Would you be capable/willing to give the talk?
Yes - I want to do this for our Lorenz data anyway (it would also be a good chance to explore interactions with Icechunk etc.). Also open to others taking over/being involved if interested.
Additional comments
None