boolean_indexing_nd.Rmd

---
jupyter:
  jupytext:
    notebook_metadata_filter: all,-language_info
    split_at_heading: true
    text_representation:
      extension: .Rmd
      format_name: rmarkdown
      format_version: '1.2'
      jupytext_version: 1.13.7
  kernelspec:
    display_name: Python 3
    language: python
    name: python3
  orphan: true
---

# Boolean indexing with more than one dimension


```{python}
# import common modules
import numpy as np  # the Python array package
import matplotlib.pyplot as plt  # the Python plotting package
```

First we make a 3D array of shape (4, 3, 2), just as we did in [the 3D array
page](arrays_3d.Rmd):

```{python}
# create an array of numbers from 0 to 11 inclusive, reshape this into a 4 rows by 3 columns array
plane0 = np.reshape(np.arange(12), (4, 3))
plane0
```

```{python}
# create an array of numbers from 100 to 111 inclusive, reshape this into a 4 rows by 3 columns array
plane1 = np.reshape(np.arange(100, 112), (4, 3))
plane1
```

We can use `np.stack` to stack the two 2D arrays into a 3D array, by stacking
over the third axis (`axis=2`):

```{python}
# Make a 3D array by stacking the two 2D arrays.
arr_3d = np.stack([plane0, plane1], axis=2)
arr_3d
```

Here is the array pictured in 3D space:

![](images/arr_3d_planes.jpg)


## Boolean indexing with more than one dimension

Below you will see that we use Boolean arrays of one or two or three dimensions
to index into a three-dimensional array.

This is a little hard to visualize, so we will go through it slowly.  But
first, we will give you the rule for what you get back from indexing with a
Boolean array.

Let's say you have an array `A`, of N dimensions - in our case N=3.

Let's say you are indexing with a Boolean array `B` of P dimensions.  P can be
1 or 2 or 3.  Indexing `A` with `B` gives a result `R`, as in:

```python
R = A[B]
```

**Rule**: The result `R` of Indexing an N-dimensional array `A` with a
P-dimensional Boolean array `B` gives you one *row* for every `True` element in
`B`, and as many extra dimension as are left in `A` — specifically, `N - P`.
Let's say there are `T` True elements in `B`.  Then the shape of `R` is `(T,) +
A.shape[(N - P):]`

That is a hard to follow in the abstract, but you will see the rule playing out
in the examples below.  We will rephrase the rule again a couple of times at
the end.


## Boolean indexing in two dimensions

Let's see how the rule plays out in two dimensions.  Remember:

```{python}
plane0
```

Now imagine we index into this 2D array with a one-dimensional array:

```{python}
B_1D = np.array([False, True, False, True])
B_1D
```

```{python}
res_1d = plane0[B_1D]
res_1d
```

Notice that the Boolean array `B_1D` has one element for each *row* in
`plane0`, and it has selected only the rows of `plane0` where there is a `True`
in the Boolean array. There are 2 `True` values in `B_1D` (`T`, above = 2)
`B_1D` has one dimension, leaving N - P = 2-1 = 1 dimension over, so the output
array is two dimensions, shape `(2,) + plane0.shape[1:]` == `(2, 3)`.

```{python}
res_1d.shape
```

Now lets try a two dimensional Boolean array.

```{python}
B_2D = np.array([[True, False, True],
                 [False, True, False],
                 [True, False, True],
                 [True, True, False]])
B_2D
```

```{python}
res_2d = plane0[B_2D]
res_2d
```

Notice that the Boolean array `B_2D` has one element for each *element* in
`plane0`, and it has selected *elements* from `plane0` where there is a `True`
in the Boolean array. There are 7 `True` values in `B_2D` (`T`, above = 7)
`B_2D` has two dimensions, leaving N - P = 2-2 = 0 dimension over, so the output
array is one dimensions, shape `(7,)`.

```{python}
res_2d.shape
```

## Indexing a three-dimensional array with a 1D Boolean

We can index `arr_3d` with a one-dimensional Boolean array. This selects
elements from the *first* axis.

```{python}
bool_1d = np.array([False, True, True, False])
arr_3d[bool_1d]
```

Pictured in 3D space, the first axis is the rows, across each plane. So, using
`[False, True, True, False]` to index `arr_3d` is equivalent to stating "give
me the 2nd and 3rd rows, across both remaining planes, from `arr_3d`".

Put another way, the 1D Boolean index is saying, "Give me only the *rows* with True, and all the columns and all the planes".

Remember that the `True` and `False` values in the Boolean array act as
'switches': only the elements corresponding to `True` make it into the final
array:

![](images/arr_3d_1d_bool.jpg)

```{python}
# Show the final array, from indexing arr_3d with bool_1d
arr_3d[bool_1d]
```

Notice that the Boolean array `bool_1d` has one element for each *row* in
`arr_3d`, and it has selected only the rows of `arr_3d` where there is a `True`
in the Boolean array. There are 2 `True` values in `bool_1D`,  `T = 2`,
`bool_1D` has one dimension, leaving N - P = 3 - 1 = 2 dimensions over, so the
output array is three dimensions, shape `(2,) + arr_3d.shape[1:]` == `(2, 3,
2)`.

```{python}
# Show the shape of the final array.
arr_3d[bool_1d].shape
```

## Indexing a three-dimensional array with a 2D Boolean

We can also index with a two-dimensional Boolean array.

Put another way, we want "Only the rows, columns with a True value, and all the planes".

```{python}
bool_2d = np.array([[False, True, False],
                    [True, False, True],
                    [True, False, False],
                    [False, False, True],
                   ])
bool_2d
```

If we index with this array, it selects elements from the first *two* axes.

Put another way, we want "Only the rows, columns with a True value, and all the planes".

```{python}
# index arr_3d with bool_2d
arr_3d[bool_2d]
```

You can think of the `bool_2d` array as 'switching' on or off individual columns of the `arr_3d` array. If we show this as a sequence, pictured in 3D space, it looks as follows:

![](images/arr_3d_2d_bool.jpg)


In this case, using `bool_2d` to index `arr_3d` yields an array with one plane, with 5 rows and 2 columns:

```{python}
arr_3d[bool_2d]
```

The Boolean array `bool_2d` has one element for each *row, column* pair in
`arr_3d`, and it has selected only the row, column pairs of `arr_3d` where
there is a `True` in the Boolean array. There are 5 `True` values in `bool_2d`,
`T = 5`, `bool_2d` has two dimensions, leaving N - P = 3 - 2 = 1 dimensions left
over, so the output array is two dimensions, shape `(5,) + arr_3d.shape[2:]` ==
`(5, 2)`.

```{python}
# show the shape of the arr_3d array, when indexed with the bool_2d boolean array
arr_3d[bool_2d].shape
```

## Indexing a three-dimensional array with a 3D Boolean

We can even index with a 3D array, this selects elements over all three
dimensions.  In which order does it get the elements?

```{python}
# A zero (=False) array of all Booleans (dtype=bool)
bool_3d = np.zeros((4, 3, 2), dtype=bool)
bool_3d
```

```{python}
# Fill in the first plane with the original 2D array
bool_3d[:, :, 0] = bool_2d
# Fill in the second plane with another 2D array.
another_bool_2d = np.array([[True, True, False],
                            [False, False, False],
                            [True, True, False],
                            [True, False, False],
                           ])
bool_3d[:, :, 1] = another_bool_2d
bool_3d
```

```{python}
# Use bool_3d to index arr_3d
arr_3d[bool_3d]
```

![](images/arr_3d_3d_bool.jpg)

The Boolean array `bool_3d` has one element for each *row, column, plane*
triple in `arr_3d`.  Each triple defines an element.  `bool_3d` has selected
only the elements of `arr_3d` where there is a `True` in the Boolean array.
There are 10 `True` values in `bool_3d`.

```{python}
np.count_nonzero(bool_3d)
```

Therefore `T = 10`, `bool_3d` has three dimensions
leaving N - P = 3 - 3 = 0 dimensions left over, so the output array is one
dimension, shape `(10,)`.

```{python}
# Show the shape
arr_3d[bool_3d].shape
```

## Another summary of the rule

Now we have done the indexing in one, two and three dimensions, let us think
again about the rule for the shape.

The Boolean index selects the items on the corresponding dimension(s).
Therefore:

* A 1D index on its own selects rows (and leaves columns and planes intact).
  The output has one *row* for each True in the array, and the original number
  of columns and planes.  It is therefore 3D.
* A 2D index on its own selects row, column elements (and leaves planes
  intact).  The output has one *row* for each True in the 2D array (each
  selected row, column pair), and the original number of planes.  It is
  therefore 2D.
* A 3D index selects row, column, plane elements.  The output has one *row* for
  each True in the 3D array (each selected row, column, plane element).  It is
  therefore 1D.

What's the rule again?

The dimensions in the indexing Boolean array get collapsed to a single
dimension.  Call that the Boolean dimension.  The dimension has the same number
of elements are there are True values in the Boolean array.  The remaining
dimensions persist. In our case:

* 1D Boolean collapses the single dimension to a single dimension - giving
  Boolean dimension x columns x planes.
* 2D Boolean collapses two dimensions to a single dimension - giving Boolean
  dimension x planes.
* 3D Boolean collapses all three dimensions to a single dimension - giving
  Boolean dimension.

In each of the cases above, the Boolean dimension has the same length as the
number of True elements in the indexing Boolean array.


## Using 1D Boolean arrays on other axes


We can also mix 1D Boolean arrays with ordinary slicing to select elements on
a single axis. E.g. the code below returns the entire second plane of `arr_3d`:

```{python}
bool_1d_dim3 = np.array([False, True])
arr_3d[:, :, bool_1d_dim3]
```