1
+ .. _data structures:
2
+
1
3
Data Structures
2
4
===============
3
5
4
6
.. ipython:: python
5
7
:suppress:
6
8
7
- import numpy as np
8
- np.random.seed(123456)
9
- np.set_printoptions(threshold=10)
10
-
11
- To get started, we will import numpy, pandas and xray:
12
-
13
- .. ipython:: python
14
-
15
9
import numpy as np
16
10
import pandas as pd
17
11
import xray
12
+ np.random.seed(123456)
13
+ np.set_printoptions(threshold=10)
18
14
19
15
DataArray
20
16
---------
@@ -31,10 +27,9 @@ multi-dimensional array. It has several key properties:
31
27
32
28
xray uses ``dims`` and ``coords`` to enable its core metadata aware operations.
33
29
Dimensions provide names that xray uses instead of the ``axis`` argument found
34
- in many numpy functions. Coordinates (particularly "index coordinates") enable
35
- fast label based indexing and alignment, building on the functionality of the
36
- ``index`` found on a pandas :py:class:`~pandas.DataFrame` or
37
- :py:class:`~pandas.Series`.
30
+ in many numpy functions. Coordinates enable fast label based indexing and
31
+ alignment, building on the functionality of the ``index`` found on a pandas
32
+ :py:class:`~pandas.DataFrame` or :py:class:`~pandas.Series`.
38
33
39
34
DataArray objects also can have a ``name`` and can hold arbitrary metadata in
40
35
the form of their ``attrs`` property (an ordered dictionary). Names and
@@ -66,9 +61,9 @@ in with default values:
66
61
67
62
xray.DataArray(data)
68
63
69
- As you can see, dimension names and index coordinates, which label tick marks
70
- along each dimension, are always present. This behavior is similar to pandas,
71
- which fills in index values in the same way.
64
+ As you can see, dimensions and coordinate arrays corresponding to each
65
+ dimension are always present. This behavior is similar to pandas, which fills
66
+ in index values in the same way.
72
67
73
68
The data array constructor also supports supplying ``coords`` as a list of
74
69
``(dim, ticks[, attrs])`` pairs with length equal to the number of dimensions:
@@ -80,7 +75,7 @@ The data array constructor also supports supplying ``coords`` as a list of
80
75
Yet another option is to supply ``coords`` in the form of a dictionary where
81
76
the values are scalar values, 1D arrays or tuples (in the same form as the
82
77
`dataarray constructor`_). This form lets you supply other coordinates than
83
- those used for indexing (more on these later):
78
+ those corresponding to dimensions (more on these later):
84
79
85
80
.. ipython:: python
86
81
@@ -214,16 +209,14 @@ variables. Dictionary like access on a dataset will supply arrays found in
214
209
either category. However, the distinction does have important implications for
215
210
indexing and computation.
216
211
217
- Here is an example how we might structure a dataset for a weather forecast:
212
+ Here is an example of how we might structure a dataset for a weather forecast:
218
213
219
214
.. image:: _static/dataset-diagram.png
220
215
221
216
In this example, it would be natural to call ``temperature`` and
222
217
``precipitation`` "variables" and all the other arrays "coordinates" because
223
- they label the points along the dimensions. ``x``, ``y`` and ``time`` are
224
- index coordinates (used for alignment purposes), and ``latitude``,
225
- ``longitude`` and ``reference_time`` are other coordinates, not used for
226
- indexing (see [1]_ for more background on this example).
218
+ they label the points along the dimensions (see [1]_ for more background on
219
+ this example).
227
220
228
221
.. _dataarray constructor:
229
222
@@ -383,40 +376,46 @@ Another useful option is the ability to rename the variables in a dataset:
383
376
384
377
ds.rename({'temperature': 'temp', 'precipitation': 'precip'})
385
378
379
+ .. _coordinates:
380
+
386
381
Coordinates
387
382
-----------
388
383
389
- ``DataArray`` and ``Dataset`` objects store two types of arrays in their
390
- ``coords`` attribute:
384
+ Coordinates are ancillary arrays stored for ``DataArray`` and ``Dataset``
385
+ objects in the ``coords`` attribute:
386
+
387
+ .. ipython:: python
388
+
389
+ ds.coords
391
390
392
- * "Index" coordinates are used for label based indexing and alignment, like the
393
- ``index`` found on a pandas :py:class:`~pandas.DataFrame` or
394
- :py:class:`~pandas.Series`. Index coordinates must be one-dimensional, and
395
- are (automatically) identified by arrays with a name equal to their (single)
396
- dimension.
397
- * "Other" coordinates are also intended to be descriptive of points along
398
- dimensions, but xray makes no any direct use of them, beyond persisting
399
- through operations when it can be done unambiguously. These coordinates can
400
- have any number of dimensions.
391
+ Unlike attributes, xray *does* interpret and persist coordinates in
392
+ operations that transform xray objects.
401
393
402
- .. note::
394
+ One dimensional coordinates with a name equal to their sole dimension (marked
395
+ by ``*`` when printing a dataset or data array) take on a special meaning in
396
+ xray. They are used for label based indexing and alignment,
397
+ like the ``index`` found on a pandas :py:class:`~pandas.DataFrame` or
398
+ :py:class:`~pandas.Series`. Indeed, these "dimension" coordinates use a
399
+ :py:class:`pandas.Index` internally to store their values.
403
400
404
- You cannot yet use a :py:class:`pandas.MultiIndex` as a xray index
405
- coordinate (:issue:`164`).
401
+ Other than for indexing, xray does not make any direct use of the values
402
+ associated with coordinates. Coordinates with names not matching a dimension
403
+ are not used for alignment or indexing, nor are they required to match when
404
+ doing arithmetic (see :ref:`alignment and coordinates`).
406
405
407
406
Converting to ``pandas.Index``
408
407
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
409
408
410
- To convert an index coordinate into an actual :py:class:`pandas.Index`, use
411
- the :py:meth:`~xray.DataArray.to_index` method:
409
+ To convert a coordinate (or any ``DataArray``) into an actual
410
+ :py:class:`pandas.Index`, use the :py:meth:`~xray.DataArray.to_index` method:
412
411
413
412
.. ipython:: python
414
413
415
414
ds['time'].to_index()
416
415
417
416
A useful shortcut is the ``indexes`` property (on both ``DataArray`` and
418
- ``Dataset``), which lazily constructs a dictionary where the values are
419
- ``Index`` objects:
417
+ ``Dataset``), which lazily constructs a dictionary whose keys are given by each
418
+ dimension and whose values are ``Index`` objects:
420
419
421
420
.. ipython:: python
422
421
@@ -436,18 +435,17 @@ variables, use the the :py:meth:`~xray.Dataset.set_coords` and
436
435
ds.set_coords(['temperature', 'precipitation'])
437
436
ds['temperature'].reset_coords(drop=True)
438
437
439
- Notice that these operations skip index coordinates.
440
-
441
- .. note::
442
-
443
- We do not yet have a ``set_index`` method like pandas for manipulating
444
- indexes. This is planned.
438
+ Notice that these operations skip coordinates with names given by dimensions,
439
+ as used for indexing. This is mostly because we are not entirely sure how to
440
+ design the interface around the fact that xray cannot store a coordinate and
441
+ variable with the same name but different values in the same dictionary. But we do
442
+ recognize that supporting something like this would be useful.
445
443
446
444
Converting into datasets
447
445
~~~~~~~~~~~~~~~~~~~~~~~~
448
446
449
- Coordinate objects also have a few useful methods, mostly for converting them
450
- into dataset objects:
447
+ ``Coordinates`` objects also have a few useful methods, mostly for converting
448
+ them into dataset objects:
451
449
452
450
.. ipython:: python
453
451