---
jupyter:
  jupytext:
    notebook_metadata_filter: all,-language_info
    split_at_heading: true
    text_representation:
      extension: .Rmd
      format_name: rmarkdown
      format_version: '1.2'
      jupytext_version: 1.10.3
  kernelspec:
    display_name: Python 3
    language: python
    name: python3
---
# Boolean arrays
```{python}
import numpy as np
```
Remember the problem of the [onsets and reaction times](numpy_intro.Rmd).
We had the task of calculating the onset times of trials, given a file of trial
inter-stimulus intervals, and response times.
```{python}
import nipraxis
# Fetch the file.
stim_fname = nipraxis.fetch_file('24719.f3_beh_CHYM.csv')
# Show the filename.
stim_fname
```
We got the data using the Pandas library:
```{python}
# Get the Pandas module, rename as "pd"
import pandas as pd
# Read the data file into a data frame.
data = pd.read_csv(stim_fname)
# Show the result
data
```
There is one row for each trial. The columns we are interested in are:
* `response_time` — the reaction time for their response (milliseconds after
the stimulus, 0 if no response)
* `trial_ISI` — the time between the *previous* stimulus and this one (the
Interstimulus Interval). For the first stimulus this is the time from the
start of the experimental software.
```{python}
response_times = np.array(data['response_time'])
trial_isis = np.array(data['trial_ISI'])
```
We then calculated the onset times of each trial relative to the start of the
scanning run. The scanning run started 4000 milliseconds before the
experimental software.
```{python}
exp_onsets = np.cumsum(trial_isis)
scanner_onsets = exp_onsets + 4000
scanner_onsets[:15]
```
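As a small self-contained illustration of the cumulative sum step, here is a sketch using made-up interval values. The numbers and the `example_` names are invented for illustration and do not come from the data file.

```python
import numpy as np

# Hypothetical inter-stimulus intervals in milliseconds (invented values).
example_isis = np.array([2000, 1000, 3000, 1500])
# Each onset is the sum of all intervals up to and including that trial.
example_exp_onsets = np.cumsum(example_isis)
# Shift by the 4000 ms between the scanner start and the software start.
example_scanner_onsets = example_exp_onsets + 4000
print(example_exp_onsets)
print(example_scanner_onsets)
```

With these invented intervals, the first trial starts 2000 ms after the software start, and so 6000 ms after the scanner start.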
We then wanted to calculate the onset times of each response, relative to the
scanner start. The response times for each trial are relative to the start of
the trial, so we can add the response times to the trial onset times:
```{python}
# Same result from adding the two arrays with the same shape.
scanner_response_onsets = scanner_onsets + response_times
scanner_response_onsets[:15]
```
## Boolean arrays
As you remember, many of the response time values are 0, indicating no response:
```{python}
first_15_rts = response_times[:15]
first_15_rts
```
We would like to select the response onsets corresponding to non-zero
`response_times`.
We can use Boolean arrays to do this.
This is just a taster of selecting with Boolean arrays. See [Boolean
indexing](boolean_indexing) for more.
Boolean arrays are arrays that contain values that are one of the two Boolean
values `True` or `False`.
Remember {ref}`Boolean values <true-and-false>`, and
{ref}`comparison-operators` from {doc}`brisk_python`. We can use comparison
operators on arrays to create Boolean arrays.
Let's start by looking at the first 15 reaction times:
```{python}
first_15_rts
```
Remember that comparisons are operators that give answers to a *comparison
question*. This is how comparisons work on individual values:
```{python}
first_15_rts[0] > 0
```
What do you think will happen if we do the comparison on the whole array, like this?
```python
first_15_rts > 0
```
You have seen how Numpy works when adding a single number to an array — it
takes this to mean that you want to add that number *to every element in the
array*.
Comparisons work the same way:
```{python}
first_15_rts_not_zero = first_15_rts > 0
first_15_rts_not_zero
```
This is the result of asking the comparison question `> 0` of *every element in
the array*.
So the values that end up in the `first_15_rts_not_zero` array come from these
comparisons:
```{python}
print('Position 0:', first_15_rts[0] > 0)
print('Position 1:', first_15_rts[1] > 0)
print(' ... and so on, up to ...')
print('Position 13:', first_15_rts[13] > 0)
print('Position 14:', first_15_rts[14] > 0)
```
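The same idea can be sketched on a small made-up array. The values and the `example_` names below are invented for illustration. The sketch also checks that the result holds Boolean values and has the same shape as the input, and that another comparison operator (`==`) works elementwise in the same way.

```python
import numpy as np

# Hypothetical reaction times in milliseconds (invented values).
example_rts = np.array([0, 427, 0, 369, 501])
# Ask "> 0" of every element; the result has one Boolean per element.
example_not_zero = example_rts > 0
print(example_not_zero)
# The array holds Boolean values, and matches the input shape.
print(example_not_zero.dtype)
print(example_not_zero.shape == example_rts.shape)
# Other comparison operators also apply to every element.
example_is_zero = example_rts == 0
print(example_is_zero)
```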
Here is the equivalent array for all the reaction times:
```{python}
rts_not_zero = response_times > 0
# Show the first 50 values.
rts_not_zero[:50]
```
We will [soon see](boolean_indexing) that we can use these arrays to select
elements from other arrays.
Specifically, if we put a Boolean array like `rts_not_zero` between square
brackets for another array, that will have the effect of selecting the elements
at positions where `rts_not_zero` has True, and throwing away elements where
`rts_not_zero` has False.
For example, rushing ahead, we can select the values in `response_times`
corresponding to reaction times greater than zero with:
```{python}
response_times[rts_not_zero]
```
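The task we started with was to select the response onsets for trials that had a response. Here is a minimal sketch of that selection, using small invented arrays rather than the real data. All values and `example_` names are assumptions made for illustration.

```python
import numpy as np

# Hypothetical values, invented for illustration (milliseconds).
example_rts = np.array([0, 427, 0, 369])
example_onsets = np.array([6000, 7000, 10000, 11500])
# Response onsets: trial onset plus response time, element by element.
example_response_onsets = example_onsets + example_rts
# True at positions where there was a response.
example_responded = example_rts > 0
# Keep only the response onsets at positions where the Boolean array has True.
example_selected = example_response_onsets[example_responded]
print(example_selected)
```

The Boolean array and the array being indexed need to have the same length, so that each True or False lines up with one element.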