Commit 5df6bdd

Merge pull request #235 from shoyer/better-format
Better formatting for coordinates, getting rid of "index coordinates" (and assorted doc improvements)
2 parents fbff4a7 + 4a06024 commit 5df6bdd

12 files changed (+329, -174 lines)

doc/combining.rst (+18, -1)

@@ -21,16 +21,33 @@ that dimension:
 
     arr = xray.DataArray(np.random.randn(2, 3),
                          [('x', ['a', 'b']), ('y', [10, 20, 30])])
+    arr[:, :1]
+    # this resembles how you would use np.concatenate
     xray.concat([arr[:, :1], arr[:, 1:]], dim='y')
 
 In addition to combining along an existing dimension, ``concat`` can create a
-new dimension by stacking lower dimension arrays together:
+new dimension by stacking lower dimensional arrays together:
 
 .. ipython:: python
 
     arr[0]
+    # to combine these 1d arrays into a 2d array in numpy, you would use np.array
+    xray.concat([arr[0], arr[1]], 'x')
+
+If the second argument to ``concat`` is a new dimension name, the arrays will
+be concatenated along that new dimension, which is always inserted as the first
+dimension:
+
+.. ipython:: python
+
     xray.concat([arr[0], arr[1]], 'new_dim')
 
+This is actually the default behavior for ``concat``:
+
+.. ipython:: python
+
+    xray.concat([arr[0], arr[1]])
+
 The second argument to ``concat`` can also be an :py:class:`~pandas.Index` or
 :py:class:`~xray.DataArray` object as well as a string, in which case it is
 used to label the values along the new dimension:

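The comments added to ``doc/combining.rst`` relate ``xray.concat`` to numpy. Below is a numpy-only sketch of that analogy. It assumes nothing beyond numpy and drops the labels and coordinates that ``concat`` actually carries along, so it only illustrates how the array shapes behave. Concatenating along an existing dimension resembles ``np.concatenate``. Concatenating along a new dimension resembles ``np.stack(..., axis=0)`` (or ``np.array`` on a list of equally shaped arrays), with the new axis placed first.

```python
import numpy as np

# Stand-in for the values of ``arr`` (dims 'x' and 'y'); labels are ignored.
data = np.random.randn(2, 3)

# Existing dimension: like xray.concat([arr[:, :1], arr[:, 1:]], dim='y'),
# joining the pieces along axis 1 restores the original (2, 3) array.
along_existing = np.concatenate([data[:, :1], data[:, 1:]], axis=1)

# New dimension: like xray.concat([arr[0], arr[1]], 'new_dim'), the two
# 1D rows are stacked along a new leading axis; np.array([...]) gives the same.
along_new = np.stack([data[0], data[1]], axis=0)

# Stacking the three columns also puts the new axis first, so the result
# has shape (3, 2), i.e. the transpose of ``data``.
columns_stacked = np.stack([data[:, j] for j in range(3)], axis=0)

print(along_existing.shape)   # (2, 3)
print(along_new.shape)        # (2, 3)
print(columns_stacked.shape)  # (3, 2)
```

The last case shows why "inserted as the first dimension" matters. The same 1D pieces can produce a transposed result depending on which axis the new dimension occupies.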
doc/computation.rst (+19, -9)

@@ -127,11 +127,14 @@ This means, for example, that you always subtract an array from its transpose:
 
     c - c.T
 
+.. _alignment and coordinates:
+
 Alignment and coordinates
 =========================
 
 For now, performing most binary operations on xray objects requires that the
-all *index* coordinates have the same values:
+*index* :ref:`coordinates` (that is, coordinates with the same name as a
+dimension) all have the same values:
 
 .. ipython::
 
@@ -157,18 +160,25 @@ See :ref:`align and reindex` for more details.
 expect to default to ``join='inner'``.
 
 Although index coordinates are required to match exactly, other coordinates are
-not. Still, xray will persist other coordinates in arithmetic, as long as there
+not, and if their values conflict, they will be dropped. This is necessary,
+for example, because indexing turns 1D coordinates into scalars:
+
+.. ipython:: python
+
+    arr[0]
+    arr[1]
+    # notice that the scalar coordinate 'x' is silently dropped
+    arr[1] - arr[0]
+
+Still, xray will persist other coordinates in arithmetic, as long as there
 are no conflicting values:
 
 .. ipython:: python
 
-    a.coords['z'] = -1
-    b.coords['z'] = 999
-    # notice that 'z' is silently dropped
-    a + b
-    b.coords['z'] = -1
-    # now 'z' is persisted, because it has a unique value
-    a + b
+    # only one argument has the 'x' coordinate
+    arr[0] + 1
+    # both arguments have the same 'x' coordinate
+    arr[0] - arr[0]
 
 Math with Datasets
 ==================

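The rule added to ``doc/computation.rst`` for non-dimension coordinates in arithmetic can be sketched in plain Python. Under that rule, a coordinate is kept if only one operand has it or if both operands agree, and it is dropped if their values conflict. The helper name ``merge_scalar_coords`` and its dict-of-scalars representation are hypothetical and for illustration only. This is not xray's internal implementation, and it only handles scalar coordinates.

```python
def merge_scalar_coords(left, right):
    """Combine scalar coordinates from two operands (hypothetical helper).

    A coordinate survives if only one operand has it, or if both operands
    carry the same value; conflicting values cause it to be dropped.
    """
    merged = {}
    for name in set(left) | set(right):
        if name in left and name in right:
            if left[name] == right[name]:
                merged[name] = left[name]
            # otherwise the values conflict, so the coordinate is dropped
        else:
            merged[name] = left[name] if name in left else right[name]
    return merged


# arr[1] - arr[0]: 'x' is 'b' on one side and 'a' on the other, so it is dropped
print(merge_scalar_coords({'x': 'b'}, {'x': 'a'}))  # {}
# arr[0] + 1: only one argument has 'x'
print(merge_scalar_coords({'x': 'a'}, {}))          # {'x': 'a'}
# arr[0] - arr[0]: both arguments agree on 'x'
print(merge_scalar_coords({'x': 'a'}, {'x': 'a'}))  # {'x': 'a'}
```

The three calls mirror the three ``ipython`` examples in the hunk above, using the ``'a'``/``'b'`` labels of ``arr``'s ``x`` dimension.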
doc/data-structures.rst (+45, -47)

@@ -1,20 +1,16 @@
+.. _data structures:
+
 Data Structures
 ===============
 
 .. ipython:: python
     :suppress:
 
-    import numpy as np
-    np.random.seed(123456)
-    np.set_printoptions(threshold=10)
-
-To get started, we will import numpy, pandas and xray:
-
-.. ipython:: python
-
     import numpy as np
     import pandas as pd
     import xray
+    np.random.seed(123456)
+    np.set_printoptions(threshold=10)
 
 DataArray
 ---------
@@ -31,10 +27,9 @@ multi-dimensional array. It has several key properties:
 
 xray uses ``dims`` and ``coords`` to enable its core metadata aware operations.
 Dimensions provide names that xray uses instead of the ``axis`` argument found
-in many numpy functions. Coordinates (particularly "index coordinates") enable
-fast label based indexing and alignment, building on the functionality of the
-``index`` found on a pandas :py:class:`~pandas.DataFrame` or
-:py:class:`~pandas.Series`.
+in many numpy functions. Coordinates enable fast label based indexing and
+alignment, building on the functionality of the ``index`` found on a pandas
+:py:class:`~pandas.DataFrame` or :py:class:`~pandas.Series`.
 
 DataArray objects also can have a ``name`` and can hold arbitrary metadata in
 the form of their ``attrs`` property (an ordered dictionary). Names and
@@ -66,9 +61,9 @@ in with default values:
 
     xray.DataArray(data)
 
-As you can see, dimension names and index coordinates, which label tick marks
-along each dimension, are always present. This behavior is similar to pandas,
-which fills in index values in the same way.
+As you can see, dimensions and coordinate arrays corresponding to each
+dimension are always present. This behavior is similar to pandas, which fills
+in index values in the same way.
 
 The data array constructor also supports supplying ``coords`` as a list of
 ``(dim, ticks[, attrs])`` pairs with length equal to the number of dimensions:
@@ -80,7 +75,7 @@ The data array constructor also supports supplying ``coords`` as a list of
 Yet another option is to supply ``coords`` in the form of a dictionary where
 the values are scalar values, 1D arrays or tuples (in the same form as the
 `dataarray constructor`_). This form lets you supply other coordinates than
-those used for indexing (more on these later):
+those corresponding to dimensions (more on these later):
 
 .. ipython:: python
 
@@ -214,16 +209,14 @@ variables. Dictionary like access on a dataset will supply arrays found in
 either category. However, the distinction does have important implications for
 indexing and computation.
 
-Here is an example how we might structure a dataset for a weather forecast:
+Here is an example of how we might structure a dataset for a weather forecast:
 
 .. image:: _static/dataset-diagram.png
 
 In this example, it would be natural to call ``temperature`` and
 ``precipitation`` "variables" and all the other arrays "coordinates" because
-they label the points along the dimensions. ``x``, ``y`` and ``time`` are
-index coordinates (used for alignment purposes), and ``latitude``,
-``longitude`` and ``reference_time`` are other coordinates, not used for
-indexing (see [1]_ for more background on this example).
+they label the points along the dimensions (see [1]_ for more background on
+this example).
 
 .. _dataarray constructor:
 
@@ -383,40 +376,46 @@ Another useful option is the ability to rename the variables in a dataset:
 
     ds.rename({'temperature': 'temp', 'precipitation': 'precip'})
 
+.. _coordinates:
+
 Coordinates
 -----------
 
-``DataArray`` and ``Dataset`` objects store two types of arrays in their
-``coords`` attribute:
+Coordinates are ancillary arrays stored for ``DataArray`` and ``Dataset``
+objects in the ``coords`` attribute:
+
+.. ipython:: python
+
+    ds.coords
 
-* "Index" coordinates are used for label based indexing and alignment, like the
-  ``index`` found on a pandas :py:class:`~pandas.DataFrame` or
-  :py:class:`~pandas.Series`. Index coordinates must be one-dimensional, and
-  are (automatically) identified by arrays with a name equal to their (single)
-  dimension.
-* "Other" coordinates are also intended to be descriptive of points along
-  dimensions, but xray makes no any direct use of them, beyond persisting
-  through operations when it can be done unambiguously. These coordinates can
-  have any number of dimensions.
+Unlike attributes, xray *does* interpret and persist coordinates in
+operations that transform xray objects.
 
-.. note::
+One-dimensional coordinates with a name equal to their sole dimension (marked
+by ``*`` when printing a dataset or data array) take on a special meaning in
+xray. They are used for label based indexing and alignment,
+like the ``index`` found on a pandas :py:class:`~pandas.DataFrame` or
+:py:class:`~pandas.Series`. Indeed, these "dimension" coordinates use a
+:py:class:`pandas.Index` internally to store their values.
 
-    You cannot yet use a :py:class:`pandas.MultiIndex` as a xray index
-    coordinate (:issue:`164`).
+Other than for indexing, xray does not make any direct use of the values
+associated with coordinates. Coordinates with names not matching a dimension
+are not used for alignment or indexing, nor are they required to match when
+doing arithmetic (see :ref:`alignment and coordinates`).
 
 Converting to ``pandas.Index``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-To convert an index coordinate into an actual :py:class:`pandas.Index`, use
-the :py:meth:`~xray.DataArray.to_index` method:
+To convert a coordinate (or any ``DataArray``) into an actual
+:py:class:`pandas.Index`, use the :py:meth:`~xray.DataArray.to_index` method:
 
 .. ipython:: python
 
     ds['time'].to_index()
 
 A useful shortcut is the ``indexes`` property (on both ``DataArray`` and
-``Dataset``), which lazily constructs a dictionary where the values are
-``Index`` objects:
+``Dataset``), which lazily constructs a dictionary whose keys are given by each
+dimension and whose values are ``Index`` objects:
 
 .. ipython:: python
 
@@ -436,18 +435,17 @@ variables, use the the :py:meth:`~xray.Dataset.set_coords` and
     ds.set_coords(['temperature', 'precipitation'])
     ds['temperature'].reset_coords(drop=True)
 
-Notice that these operations skip index coordinates.
-
-.. note::
-
-    We do not yet have a ``set_index`` method like pandas for manipulating
-    indexes. This is planned.
+Notice that these operations skip coordinates with names given by dimensions,
+as used for indexing. This is mostly because we are not entirely sure how to
+design the interface around the fact that xray cannot store a coordinate and a
+variable with the same name but different values in the same dictionary. But we
+do recognize that supporting something like this would be useful.
 
 Converting into datasets
 ~~~~~~~~~~~~~~~~~~~~~~~~
 
-Coordinate objects also have a few useful methods, mostly for converting them
-into dataset objects:
+``Coordinates`` objects also have a few useful methods, mostly for converting
+them into dataset objects:
 
 .. ipython:: python
 

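The new ``Coordinates`` section singles out one-dimensional coordinates whose name equals their sole dimension (the ones marked with ``*``). The selection rule can be sketched in plain Python. Here each coordinate is represented only by its tuple of dimension names. The helper ``dimension_coord_names`` and the ``forecast_coords`` mapping are hypothetical, modeled loosely on the weather-forecast example (``x``, ``y`` and ``time`` versus ``latitude``, ``longitude`` and ``reference_time``). The dimensions assumed for ``latitude``/``longitude`` (``('x', 'y')``) and for ``reference_time`` (scalar) are illustrative guesses.

```python
def dimension_coord_names(coords):
    """Return names of coordinates that act like pandas indexes (hypothetical helper).

    ``coords`` maps each coordinate name to the tuple of dimensions it spans.
    A coordinate qualifies when it is one-dimensional and its name equals
    that single dimension.
    """
    return sorted(name for name, dims in coords.items()
                  if len(dims) == 1 and dims[0] == name)


# Assumed layout, loosely following the weather-forecast dataset diagram.
forecast_coords = {
    'x': ('x',),
    'y': ('y',),
    'time': ('time',),
    'latitude': ('x', 'y'),
    'longitude': ('x', 'y'),
    'reference_time': (),
}
print(dimension_coord_names(forecast_coords))  # ['time', 'x', 'y']
```

Only these names would appear as keys of the ``indexes`` dictionary described above. The remaining coordinates are persisted through operations but not used for alignment.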
doc/groupby.rst (+2)

@@ -1,3 +1,5 @@
+.. _groupby:
+
 GroupBy: split-apply-combine
 ----------------------------
 

doc/index.rst (+1)

@@ -26,6 +26,7 @@ Documentation
 
    why-xray
    installing
+   quickstart
    data-structures
    indexing
    computation

doc/indexing.rst (+9, -6)

@@ -1,3 +1,5 @@
+.. _indexing:
+
 Indexing and selecting data
 ===========================
 
@@ -76,12 +78,12 @@ and :py:meth:`~xray.DataArray.isel` methods:
     # index by integer array indices
     arr.isel(space=0, time=slice(None, 2))
 
-    # index by index coordinate labels
+    # index by dimension coordinate labels
     arr.sel(time=slice('2000-01-01', '2000-01-02'))
 
 The arguments to these methods can be any objects that could index the array
-along that dimension, e.g., labels for an individual value, Python ``slice``
-objects or 1-dimensional arrays.
+along the dimension given by the keyword, e.g., labels for an individual value,
+Python :py:func:`slice` objects or 1-dimensional arrays.
 
 .. note::
 
@@ -170,9 +172,10 @@ Align and reindex
 -----------------
 
 xray's ``reindex``, ``reindex_like`` and ``align`` impose a ``DataArray`` or
-``Dataset`` onto a new set of index coordinates. The original values are subset
-to the index labels still found in the new labels, and values corresponding to
-new labels not found in the original object are in-filled with `NaN`.
+``Dataset`` onto a new set of coordinates corresponding to dimensions. The
+original values are subset to the index labels still found in the new labels,
+and values corresponding to new labels not found in the original object are
+in-filled with `NaN`.
 
 To reindex a particular dimension, use :py:meth:`~xray.DataArray.reindex`:
 

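The reindexing behavior described in the updated "Align and reindex" text can be mimicked on a plain one-dimensional label-to-value mapping. Under that behavior, values whose labels survive are kept, labels dropped from the new set disappear, and new labels are filled with NaN. The ``reindex_values`` helper is hypothetical. It ignores everything xray layers on top, such as multiple dimensions, coordinates and attributes.

```python
import math


def reindex_values(values_by_label, new_labels):
    """Impose ``new_labels`` onto a 1D label -> value mapping (hypothetical helper).

    Labels present in both keep their value; labels only in ``new_labels``
    are filled with NaN; labels missing from ``new_labels`` are dropped.
    """
    return [values_by_label.get(label, math.nan) for label in new_labels]


original = {'a': 1.0, 'b': 2.0, 'c': 3.0}
print(reindex_values(original, ['b', 'c', 'd']))  # [2.0, 3.0, nan]
```

Here ``'a'`` is dropped because it is absent from the new labels, and ``'d'`` is in-filled with NaN because the original has no value for it.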
doc/installing.rst (+4, -4)

@@ -16,13 +16,13 @@ Optional dependencies:
 The easiest way to get all these dependencies installed is to use the
 `Anaconda python distribution <https://store.continuum.io/cshop/anaconda/>`__.
 
-To install xray, use pip:
-
-::
+To install xray, use pip::
 
     pip install xray
 
 .. warning::
 
     If you don't already have recent versions of numpy and pandas installed,
-    installing xray will automatically update them.
+    installing xray will attempt to automatically update them. This may or may
+    not succeed: you probably want to ensure you have up-to-date installations
+    of numpy and pandas before attempting to install xray.

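Following the updated warning, it can help to check which numpy and pandas versions are already installed before running ``pip install xray``. The small sketch below uses the standard library's ``importlib.metadata`` (Python 3.8+). It only reports versions and deliberately does not assume any minimum version, since none is stated in the diff. The helper name ``installed_version`` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of ``package``, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report what is currently installed; compare against the versions you need.
for name in ('numpy', 'pandas'):
    print(name, installed_version(name) or 'not installed')
```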