-
Notifications
You must be signed in to change notification settings - Fork 13
/
Copy pathboolean_indexing_nd.Rmd
321 lines (241 loc) · 9.54 KB
/
boolean_indexing_nd.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
---
jupyter:
jupytext:
notebook_metadata_filter: all,-language_info
split_at_heading: true
text_representation:
extension: .Rmd
format_name: rmarkdown
format_version: '1.2'
jupytext_version: 1.13.7
kernelspec:
display_name: Python 3
language: python
name: python3
orphan: true
---
# Boolean indexing with more than one dimension
```{python}
# import common modules
import numpy as np # the Python array package
import matplotlib.pyplot as plt # the Python plotting package
```
First we make a 3D array of shape (4, 3, 2), just as we did in [the 3D array
page](arrays_3d.Rmd):
```{python}
# create an array of numbers from 0 to 11 inclusive, reshape this into a 4 rows by 3 columns array
plane0 = np.reshape(np.arange(12), (4, 3))
plane0
```
```{python}
# create an array of numbers from 100 to 111 inclusive, reshape this into a 4 rows by 3 columns array
plane1 = np.reshape(np.arange(100, 112), (4, 3))
plane1
```
We can use `np.stack` to stack the two 2D arrays into a 3D array, by stacking
over the third axis (`axis=2`):
```{python}
# Make a 3D array by stacking the two 2D arrays.
arr_3d = np.stack([plane0, plane1], axis=2)
arr_3d
```
Here is the array pictured in 3D space:
![](images/arr_3d_planes.jpg)
## Boolean indexing with more than one dimension
Below you will see that we use Boolean arrays of one or two or three dimensions
to index into a three-dimensional array.
This is a little hard to visualize, so we will go through it slowly. But
first, we will give you the rule for what you get back from indexing with a
Boolean array.
Let's say you have an array `A`, of N dimensions - in our case N=3.
Let's say you are indexing with a Boolean array `B` of P dimensions. P can be
1 or 2 or 3. Indexing `A` with `B` gives a result `R`, as in:
```python
R = A[B]
```
**Rule**: The result `R` of Indexing an N-dimensional array `A` with a
P-dimensional Boolean array `B` gives you one *row* for every `True` element in
`B`, and as many extra dimension as are left in `A` — specifically, `N - P`.
Let's say there are `T` True elements in `B`. Then the shape of `R` is `(T,) +
A.shape[(N - P):]`
That is a hard to follow in the abstract, but you will see the rule playing out
in the examples below. We will rephrase the rule again a couple of times at
the end.
## Boolean indexing in two dimensions
Let's see how the rule plays out in two dimensions. Remember:
```{python}
plane0
```
Now imagine we index into this 2D array with a one-dimensional array:
```{python}
B_1D = np.array([False, True, False, True])
B_1D
```
```{python}
res_1d = plane0[B_1D]
res_1d
```
Notice that the Boolean array `B_1D` has one element for each *row* in
`plane0`, and it has selected only the rows of `plane0` where there is a `True`
in the Boolean array. There are 2 `True` values in `B_1D` (`T`, above = 2)
`B_1D` has one dimension, leaving N - P = 2-1 = 1 dimension over, so the output
array is two dimensions, shape `(2,) + plane0.shape[1:]` == `(2, 3)`.
```{python}
res_1d.shape
```
Now lets try a two dimensional Boolean array.
```{python}
B_2D = np.array([[True, False, True],
[False, True, False],
[True, False, True],
[True, True, False]])
B_2D
```
```{python}
res_2d = plane0[B_2D]
res_2d
```
Notice that the Boolean array `B_2D` has one element for each *element* in
`plane0`, and it has selected *elements* from `plane0` where there is a `True`
in the Boolean array. There are 7 `True` values in `B_2D` (`T`, above = 7)
`B_2D` has two dimensions, leaving N - P = 2-2 = 0 dimension over, so the output
array is one dimensions, shape `(7,)`.
```{python}
res_2d.shape
```
## Indexing a three-dimensional array with a 1D Boolean
We can index `arr_3d` with a one-dimensional Boolean array. This selects
elements from the *first* axis.
```{python}
bool_1d = np.array([False, True, True, False])
arr_3d[bool_1d]
```
Pictured in 3D space, the first axis is the rows, across each plane. So, using
`[False, True, True, False]` to index `arr_3d` is equivalent to stating "give
me the 2nd and 3rd rows, across both remaining planes, from `arr_3d`".
Put another way, the 1D Boolean index is saying, "Give me only the *rows* with True, and all the columns and all the planes".
Remember that the `True` and `False` values in the Boolean array act as
'switches': only the elements corresponding to `True` make it into the final
array:
![](images/arr_3d_1d_bool.jpg)
```{python}
# Show the final array, from indexing arr_3d with bool_1d
arr_3d[bool_1d]
```
Notice that the Boolean array `bool_1d` has one element for each *row* in
`arr_3d`, and it has selected only the rows of `arr_3d` where there is a `True`
in the Boolean array. There are 2 `True` values in `bool_1D`, `T = 2`,
`bool_1D` has one dimension, leaving N - P = 3 - 1 = 2 dimensions over, so the
output array is three dimensions, shape `(2,) + arr_3d.shape[1:]` == `(2, 3,
2)`.
```{python}
# Show the shape of the final array.
arr_3d[bool_1d].shape
```
## Indexing a three-dimensional array with a 2D Boolean
We can also index with a two-dimensional Boolean array.
Put another way, we want "Only the rows, columns with a True value, and all the planes".
```{python}
bool_2d = np.array([[False, True, False],
[True, False, True],
[True, False, False],
[False, False, True],
])
bool_2d
```
If we index with this array, it selects elements from the first *two* axes.
Put another way, we want "Only the rows, columns with a True value, and all the planes".
```{python}
# index arr_3d with bool_2d
arr_3d[bool_2d]
```
You can think of the `bool_2d` array as 'switching' on or off individual columns of the `arr_3d` array. If we show this as a sequence, pictured in 3D space, it looks as follows:
![](images/arr_3d_2d_bool.jpg)
In this case, using `bool_2d` to index `arr_3d` yields an array with one plane, with 5 rows and 2 columns:
```{python}
arr_3d[bool_2d]
```
The Boolean array `bool_2d` has one element for each *row, column* pair in
`arr_3d`, and it has selected only the row, column pairs of `arr_3d` where
there is a `True` in the Boolean array. There are 5 `True` values in `bool_2d`,
`T = 5`, `bool_2d` has two dimensions, leaving N - P = 3 - 2 = 1 dimensions left
over, so the output array is two dimensions, shape `(5,) + arr_3d.shape[2:]` ==
`(5, 2)`.
```{python}
# show the shape of the arr_3d array, when indexed with the bool_2d boolean array
arr_3d[bool_2d].shape
```
## Indexing a three-dimensional array with a 3D Boolean
We can even index with a 3D array, this selects elements over all three
dimensions. In which order does it get the elements?
```{python}
# A zero (=False) array of all Booleans (dtype=bool)
bool_3d = np.zeros((4, 3, 2), dtype=bool)
bool_3d
```
```{python}
# Fill in the first plane with the original 2D array
bool_3d[:, :, 0] = bool_2d
# Fill in the second plane with another 2D array.
another_bool_2d = np.array([[True, True, False],
[False, False, False],
[True, True, False],
[True, False, False],
])
bool_3d[:, :, 1] = another_bool_2d
bool_3d
```
```{python}
# Use bool_3d to index arr_3d
arr_3d[bool_3d]
```
![](images/arr_3d_3d_bool.jpg)
The Boolean array `bool_3d` has one element for each *row, column, plane*
triple in `arr_3d`. Each triple defines an element. `bool_3d` has selected
only the elements of `arr_3d` where there is a `True` in the Boolean array.
There are 10 `True` values in `bool_3d`.
```{python}
np.count_nonzero(bool_3d)
```
Therefore `T = 10`, `bool_3d` has three dimensions
leaving N - P = 3 - 3 = 0 dimensions left over, so the output array is one
dimension, shape `(10,)`.
```{python}
# Show the shape
arr_3d[bool_3d].shape
```
## Another summary of the rule
Now we have done the indexing in one, two and three dimensions, let us think
again about the rule for the shape.
The Boolean index selects the items on the corresponding dimension(s).
Therefore:
* A 1D index on its own selects rows (and leaves columns and planes intact).
The output has one *row* for each True in the array, and the original number
of columns and planes. It is therefore 3D.
* A 2D index on its own selects row, column elements (and leaves planes
intact). The output has one *row* for each True in the 2D array (each
selected row, column pair), and the original number of planes. It is
therefore 2D.
* A 3D index selects row, column, plane elements. The output has one *row* for
each True in the 3D array (each selected row, column, plane element). It is
therefore 1D.
What's the rule again?
The dimensions in the indexing Boolean array get collapsed to a single
dimension. Call that the Boolean dimension. The dimension has the same number
of elements are there are True values in the Boolean array. The remaining
dimensions persist. In our case:
* 1D Boolean collapses the single dimension to a single dimension - giving
Boolean dimension x columns x planes.
* 2D Boolean collapses two dimensions to a single dimension - giving Boolean
dimension x planes.
* 3D Boolean collapses all three dimensions to a single dimension - giving
Boolean dimension.
In each of the cases above, the Boolean dimension has the same length as the
number of True elements in the indexing Boolean array.
## Using 1D Boolean arrays on other axes
We can also mix 1D Boolean arrays with ordinary slicing to select elements on
a single axis. E.g. the code below returns the entire second plane of `arr_3d`:
```{python}
bool_1d_dim3 = np.array([False, True])
arr_3d[:, :, bool_1d_dim3]
```