Closed
82 changes: 67 additions & 15 deletions docs/high-level.ipynb
@@ -63,7 +63,7 @@
"xbeam_ds"
],
"outputs": [],
"execution_count": 2
"execution_count": 1
},
{
"metadata": {
@@ -83,7 +83,7 @@
"xarray_ds.chunk(chunks).to_zarr('example_data.zarr', mode='w')"
],
"outputs": [],
"execution_count": 3
"execution_count": 2
},
{
"metadata": {
@@ -186,7 +186,7 @@
"xarray.open_zarr('example_climatology.zarr')"
],
"outputs": [],
"execution_count": 6
"execution_count": 3
},
{
"metadata": {
@@ -215,7 +215,7 @@
"xarray.open_zarr('example_regrid.zarr')"
],
"outputs": [],
"execution_count": 7
"execution_count": 4
},
{
"metadata": {
@@ -245,7 +245,7 @@
" print(f'{type(e).__name__}: {e}')"
],
"outputs": [],
"execution_count": 8
"execution_count": 5
},
{
"metadata": {
@@ -262,19 +262,75 @@
},
"cell_type": "code",
"source": [
"ds_beam = xbeam.Dataset.from_zarr('example_data.zarr')\n",
"ds_beam.map_blocks(lambda ds: ds.compute(), template=ds_beam.template)"
"(\n",
" xbeam.Dataset.from_zarr('example_data.zarr')\n",
" .map_blocks(lambda ds: ds.compute(), template=ds_beam.template)\n",
")"
],
"outputs": [],
"execution_count": 9
"execution_count": 6
},
{
"metadata": {
"id": "-U4t0kKIkDvb"
},
"cell_type": "markdown",
"source": [
"## Interfacing with low-level transforms"
]
},
{
"metadata": {
"id": "75IG-22cKcuE"
},
"cell_type": "markdown",
"source": [
"Sometimes, your computation doesn't fit into the ``map_blocks`` paradigm because you don't want to create `xarray.Dataset` objects. For these cases, you can switch to the lower-level Xarray-Beam [data model](data-model), and use raw Beam operations:"
"`Dataset` is a thin wrapper around Xarray-Beam transformations, so you can always drop into the lower-level Xarray-Beam [data model](data-model) and use raw Beam operations. This is especially useful for reading or writing data.\n",
"\n",
"```{warning}\n",
"The `Dataset` constructor currently performs **no validation** on its inputs!\n",
"```\n",
"\n",
"For example, here's how you could manually recreate a `Dataset`, using the common pattern of evaluating a single example in-memory to create a template with {py:func}`~xarray_beam.make_template` and {py:func}`~xarray_beam.replace_template_dims`:"
]
},
{
"metadata": {
"id": "l9pHS1QDlMd-"
},
"cell_type": "code",
"source": [
"all_times = pd.date_range('2025-01-01', freq='1D', periods=365)\n",
"source_dataset = xarray.open_zarr('example_data.zarr', chunks=None)\n",
"\n",
"def load_chunk(time: pd.Timestamp) -\u003e tuple[xbeam.Key, xarray.Dataset]:\n",
" key = xbeam.Key({'time': (time - all_times[0]).days})\n",
" dataset = source_dataset.sel(time=[time])\n",
" return key, dataset\n",
"\n",
"_, example = load_chunk(all_times[0])\n",
"\n",
"template = xbeam.make_template(example)\n",
"template = xbeam.replace_template_dims(template, time=all_times)\n",
"\n",
"ds_beam = xbeam.Dataset(\n",
" template=template,\n",
" chunks=xbeam.normalize_chunks({'time': 1}, template),\n",
" split_vars=False,\n",
" ptransform=(beam.Create(all_times) | beam.Map(load_chunk)),\n",
")\n",
"ds_beam"
],
"outputs": [],
"execution_count": 12
},
{
"metadata": {
"id": "1qjeY5mwlLGJ"
},
"cell_type": "markdown",
"source": [
"You can also pull out the underlying Beam `ptransform` from a dataset to append new transformations, e.g., to write each element of the pipeline to disk as a separate file:"
]
},
{
@@ -288,16 +344,12 @@
" chunk.to_netcdf(path)\n",
"\n",
"with beam.Pipeline() as p:\n",
" p | (\n",
" xbeam.Dataset.from_zarr('example_data.zarr')\n",
" .rechunk({'latitude': -1, 'longitude': -1})\n",
" .ptransform\n",
" ) | beam.MapTuple(to_netcdf)\n",
" p | ds_beam.rechunk('50MB').ptransform | beam.MapTuple(to_netcdf)\n",
"\n",
"%ls *.nc"
],
"outputs": [],
"execution_count": 10
"execution_count": 13
}
],
"metadata": {
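Because the new markdown cell warns that the `Dataset` constructor performs no validation, a quick check of the key offsets produced by a loader like `load_chunk` may be worth running before building the dataset. The following is a minimal sketch of that check using only the standard library. `all_days` and `time_offset` are hypothetical stand-ins for the notebook's `all_times` and the offset expression inside `load_chunk`; nothing here calls Xarray-Beam itself.

```python
from datetime import date, timedelta

# Hypothetical stand-in for the notebook's `all_times`: 365 daily steps from 2025-01-01.
start = date(2025, 1, 1)
all_days = [start + timedelta(days=i) for i in range(365)]


def time_offset(day: date) -> int:
    """Integer offset of `day` along the time axis, mirroring
    `(time - all_times[0]).days` in the notebook's `load_chunk`."""
    return (day - all_days[0]).days


offsets = [time_offset(d) for d in all_days]

# With chunks of size 1 along time, the offsets should be exactly
# 0, 1, ..., len(all_days) - 1, with no gaps or duplicates.
if offsets != list(range(len(all_days))):
    raise ValueError("time offsets are not contiguous; keys would not match the template")
```

If the source data used a different frequency than the template (for example, hourly data with a daily template), a check along these lines would fail early instead of producing keys that silently disagree with the template.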