---
jupyter:
  jupytext:
    notebook_metadata_filter: all,-language_info
    split_at_heading: true
    text_representation:
      extension: .Rmd
      format_name: rmarkdown
      format_version: '1.2'
      jupytext_version: 1.13.7
  kernelspec:
    display_name: Python 3
    language: python
    name: python3
---
# Indexing with Boolean arrays
As usual with arrays, we need the Numpy library:
```{python}
import numpy as np
```
Remember {ref}`Boolean values <true-and-false>`, and
{ref}`comparison-operators` from {doc}`brisk_python`. We will be using these
values and operators with arrays.
## Select values with Boolean arrays
Here we are using Boolean arrays to *index* into other arrays. You will see
what we mean by that by the end of this section.
We often want to select several elements from an array according to some
criterion.
The most common way to do this is array indexing, using a Boolean array
between the square brackets.
It can be easier to understand this by example than by description.
We are going to use some example data from [student ratings of their
professors](https://github.com/odsti/datasets/tree/master/good_and_easy).
You can go to the link for the long story, but the short story is that the
dataset is a table where the rows are academic disciplines, and the columns
contain the average student rating values for the corresponding discipline.
Here we have extracted the ratings for the six largest subjects — the subjects with the largest number of rated professors.
This is the array of discipline names for those six largest subjects:
```{python}
disciplines = np.array(
    ['English', 'Mathematics', 'Biology',
     'Psychology', 'History', 'Chemistry'])
disciplines
```
One of the ratings the students gave was of how easy the course was, on a
five-point scale from 1 (hard) to 5 (easy).
These are the average "Easiness" scores for the six largest disciplines named
above:
```{python}
easiness = np.array([3.16, 3.06, 2.71, 3.32, 3.05, 2.65])
```
The top (largest) discipline is:
```{python}
disciplines[0]
```
The Easiness rating for that course is:
```{python}
easiness[0]
```
and so on.
## Boolean arrays
Boolean arrays are arrays in which every value is either `True` or `False`.
Here is a Boolean array, created from applying a comparison to an array:
```{python}
greater_than_3 = easiness > 3
greater_than_3
```
This has a `True` value at the positions of elements > 3, and `False`
otherwise.
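To make the element-by-element nature of the comparison concrete, here is a small sketch. It redefines `easiness` so the block stands on its own, and `by_hand` is a name introduced here only for illustration:

```python
import numpy as np

easiness = np.array([3.16, 3.06, 2.71, 3.32, 3.05, 2.65])
greater_than_3 = easiness > 3

# The result has one Boolean value for each element of easiness.
print(greater_than_3.dtype)
print(greater_than_3.shape == easiness.shape)

# Doing the comparison one element at a time gives the same values.
by_hand = np.array([value > 3 for value in easiness])
print(np.array_equal(by_hand, greater_than_3))
```

The array comparison is a shorter way of writing the loop.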
We can do things like count the number of `True` values in the Boolean array:
```{python}
np.count_nonzero(greater_than_3)
```
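Numpy treats `True` as 1 and `False` as 0 in arithmetic, so summing a Boolean array should give the same count. A minimal sketch, redefining the arrays so the block is self-contained; `n_true` and `n_true_by_sum` are names introduced here for illustration:

```python
import numpy as np

easiness = np.array([3.16, 3.06, 2.71, 3.32, 3.05, 2.65])
greater_than_3 = easiness > 3

# count_nonzero counts the True values.
n_true = np.count_nonzero(greater_than_3)

# Summing also works, because True counts as 1 and False as 0.
n_true_by_sum = np.sum(greater_than_3)

print(n_true, n_true_by_sum)
```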
Now let us say that we wanted to get the elements from `easiness`
that are greater than 3. That is, we want to get the elements in `easiness`
for which the corresponding element in `greater_than_3` is `True`.
We can do this with *Boolean array indexing*. The Boolean array goes between
the square brackets, after the array name. As a reminder:
```{python}
# The easiness array
easiness
```
```{python}
# The greater_than_3 Boolean array
greater_than_3
```
We put the Boolean array between square brackets, after the array we want to get values from, like this:
```{python}
# Boolean indexing into the easiness array.
easiness[greater_than_3]
```
We have selected the numbers in `easiness` that are greater than 3.
See the picture below for an illustration of what is happening:
![](images/easiness_values.png)
We can use this same Boolean array to index into another array. For example,
here we show the discipline *names* corresponding to the courses with Easiness
scores greater than 3:
```{python}
disciplines[greater_than_3]
```
See the picture below for an illustration of how this works:
![](images/easiness_reused.png)
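The Boolean array does not need a name of its own; the comparison can go directly between the square brackets. The sketch below repeats the data so it runs by itself, and `easy_names` and `mean_easy` are names introduced here for illustration:

```python
import numpy as np

disciplines = np.array(
    ['English', 'Mathematics', 'Biology',
     'Psychology', 'History', 'Chemistry'])
easiness = np.array([3.16, 3.06, 2.71, 3.32, 3.05, 2.65])

# Build the Boolean array in place and index with it straight away.
easy_names = disciplines[easiness > 3]
print(easy_names)

# The selected values form an ordinary array, so we can do
# calculations on them, such as taking the mean.
mean_easy = np.mean(easiness[easiness > 3])
print(mean_easy)
```

Naming the Boolean array first, as above, is often clearer when you want to reuse it on more than one array.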
## Setting values with Boolean arrays
You have seen, above, that Boolean indexing can select values from an array:
```{python}
# Create the Boolean array
another_array = np.array([2, 3, 4, 2, 1, 5, 1, 0, 3])
are_gt_2 = another_array > 2
are_gt_2
```
```{python}
# Get the values by indexing with the Boolean array.
# Return only the values of 'another_array' where the Boolean array has True.
another_array[are_gt_2]
```
Given what you know, what do you think would happen with:
```
another_array[are_gt_2] = 10
another_array
```
Try it.
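Once you have tried it, you could compare your result with this sketch on a separate, made-up array. The names `scores` and `is_big` are hypothetical, and not part of the data above. The sketch relies on the standard Numpy behavior that assigning through a Boolean index changes only the elements at the `True` positions:

```python
import numpy as np

# A made-up array, for illustration only.
scores = np.array([4, 9, 1, 7])
is_big = scores > 5

# Set the elements at the True positions to 0; the rest are unchanged.
scores[is_big] = 0
print(scores)
```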