|
63 | 63 | "xbeam_ds"
|
64 | 64 | ],
|
65 | 65 | "outputs": [],
|
66 |
| - "execution_count": 2 |
| 66 | + "execution_count": 1 |
67 | 67 | },
|
68 | 68 | {
|
69 | 69 | "metadata": {
|
|
83 | 83 | "xarray_ds.chunk(chunks).to_zarr('example_data.zarr', mode='w')"
|
84 | 84 | ],
|
85 | 85 | "outputs": [],
|
86 |
| - "execution_count": 3 |
| 86 | + "execution_count": 2 |
87 | 87 | },
|
88 | 88 | {
|
89 | 89 | "metadata": {
|
|
186 | 186 | "xarray.open_zarr('example_climatology.zarr')"
|
187 | 187 | ],
|
188 | 188 | "outputs": [],
|
189 |
| - "execution_count": 6 |
| 189 | + "execution_count": 3 |
190 | 190 | },
|
191 | 191 | {
|
192 | 192 | "metadata": {
|
|
215 | 215 | "xarray.open_zarr('example_regrid.zarr')"
|
216 | 216 | ],
|
217 | 217 | "outputs": [],
|
218 |
| - "execution_count": 7 |
| 218 | + "execution_count": 4 |
219 | 219 | },
|
220 | 220 | {
|
221 | 221 | "metadata": {
|
|
245 | 245 | " print(f'{type(e).__name__}: {e}')"
|
246 | 246 | ],
|
247 | 247 | "outputs": [],
|
248 |
| - "execution_count": 8 |
| 248 | + "execution_count": 5 |
249 | 249 | },
|
250 | 250 | {
|
251 | 251 | "metadata": {
|
|
262 | 262 | },
|
263 | 263 | "cell_type": "code",
|
264 | 264 | "source": [
|
265 |
| - "ds_beam = xbeam.Dataset.from_zarr('example_data.zarr')\n", |
266 |
| - "ds_beam.map_blocks(lambda ds: ds.compute(), template=ds_beam.template)" |
| 265 | + "(\n", |
| 266 | + " xbeam.Dataset.from_zarr('example_data.zarr')\n", |
| 267 | + " .map_blocks(lambda ds: ds.compute(), template=ds_beam.template)\n", |
| 268 | + ")" |
267 | 269 | ],
|
268 | 270 | "outputs": [],
|
269 |
| - "execution_count": 9 |
| 271 | + "execution_count": 6 |
| 272 | + }, |
| 273 | + { |
| 274 | + "metadata": { |
| 275 | + "id": "-U4t0kKIkDvb" |
| 276 | + }, |
| 277 | + "cell_type": "markdown", |
| 278 | + "source": [ |
| 279 | + "## Interfacing with Beam transforms" |
| 280 | + ] |
270 | 281 | },
|
271 | 282 | {
|
272 | 283 | "metadata": {
|
273 | 284 | "id": "75IG-22cKcuE"
|
274 | 285 | },
|
275 | 286 | "cell_type": "markdown",
|
276 | 287 | "source": [
|
277 |
| - "Sometimes, your computation doesn't fit into the ``map_blocks`` paradigm because you don't want to create `xarray.Dataset` objects. For these cases, you can switch to the lower-level Xarray-Beam [data model](data-model), and use raw Beam operations:" |
| 288 | + "`Dataset` is a thin wrapper around Xarray-Beam transformations, so you can always drop into the lower-level Xarray-Beam [data model](data-model) and use raw Beam operations. This is especially useful for the reading or writing data.\n", |
| 289 | + "\n", |
| 290 | + "For example, here's how you could manually recreate a `Dataset`, using the common pattern of evaluating a single example in-memory to create a template with {py:func}`~xarray_beam.make_template` and {py:func}`~xarray_beam.replace_template_dims`:" |
| 291 | + ] |
| 292 | + }, |
| 293 | + { |
| 294 | + "metadata": { |
| 295 | + "id": "l9pHS1QDlMd-" |
| 296 | + }, |
| 297 | + "cell_type": "code", |
| 298 | + "source": [ |
| 299 | + "all_times = pd.date_range('2025-01-01', freq='1D', periods=365)\n", |
| 300 | + "source_dataset = xarray.open_zarr('example_data.zarr', chunks=None)\n", |
| 301 | + "\n", |
| 302 | + "def load_chunk(time: pd.Timestamp) -\u003e tuple[xbeam.Key, xarray.Dataset]:\n", |
| 303 | + " key = xbeam.Key({'time': (time - all_times[0]).days})\n", |
| 304 | + " dataset = source_dataset.sel(time=[time])\n", |
| 305 | + " return key, dataset\n", |
| 306 | + "\n", |
| 307 | + "_, example = load_chunk(all_times[0])\n", |
| 308 | + "\n", |
| 309 | + "template = xbeam.make_template(example)\n", |
| 310 | + "template = xbeam.replace_template_dims(template, time=all_times)\n", |
| 311 | + "\n", |
| 312 | + "ds_beam = xbeam.Dataset(\n", |
| 313 | + " template=template,\n", |
| 314 | + " chunks=xbeam.normalize_chunks({'time': 1}, template),\n", |
| 315 | + " split_vars=False,\n", |
| 316 | + " ptransform=(beam.Create(all_times) | beam.Map(load_chunk)),\n", |
| 317 | + ")\n", |
| 318 | + "ds_beam" |
| 319 | + ], |
| 320 | + "outputs": [], |
| 321 | + "execution_count": 12 |
| 322 | + }, |
| 323 | + { |
| 324 | + "metadata": { |
| 325 | + "id": "1qjeY5mwlLGJ" |
| 326 | + }, |
| 327 | + "cell_type": "markdown", |
| 328 | + "source": [ |
| 329 | + "You can also pull-out the underlying Beam `ptransform` from a dataset to append new transformations, e.g., to write each element of the pipeline to disk as a separate file:" |
278 | 330 | ]
|
279 | 331 | },
|
280 | 332 | {
|
|
288 | 340 | " chunk.to_netcdf(path)\n",
|
289 | 341 | "\n",
|
290 | 342 | "with beam.Pipeline() as p:\n",
|
291 |
| - " p | (\n", |
292 |
| - " xbeam.Dataset.from_zarr('example_data.zarr')\n", |
293 |
| - " .rechunk({'latitude': -1, 'longitude': -1})\n", |
294 |
| - " .ptransform\n", |
295 |
| - " ) | beam.MapTuple(to_netcdf)\n", |
| 343 | + " p | ds_beam.rechunk('50MB').ptransform | beam.MapTuple(to_netcdf)\n", |
296 | 344 | "\n",
|
297 | 345 | "%ls *.nc"
|
298 | 346 | ],
|
299 | 347 | "outputs": [],
|
300 |
| - "execution_count": 10 |
| 348 | + "execution_count": 13 |
301 | 349 | }
|
302 | 350 | ],
|
303 | 351 | "metadata": {
|
|