
Commit 04948e0

Merge pull request #253 from Smit-create/ar1-bayes-1
AR1 Bayes Fixes
2 parents 411cd08 + 8033652 commit 04948e0

1 file changed: +49 −48 lines changed


lectures/ar1_bayes.md

Lines changed: 49 additions & 48 deletions
@@ -43,96 +43,96 @@ logger.setLevel(logging.CRITICAL)
```

This lecture uses Bayesian methods offered by [pymc](https://www.pymc.io/projects/docs/en/stable/) and [numpyro](https://num.pyro.ai/en/stable/) to make statistical inferences about two parameters of a univariate first-order autoregression.

The model is a good laboratory for illustrating consequences of alternative ways of modeling the distribution of the initial $y_0$:

- As a fixed number

- As a random variable drawn from the stationary distribution of the $\{y_t\}$ stochastic process

The first component of the statistical model is

$$
y_{t+1} = \rho y_t + \sigma_x \epsilon_{t+1}, \quad t \geq 0
$$ (eq:themodel)

where the scalars $\rho$ and $\sigma_x$ satisfy $|\rho| < 1$ and $\sigma_x > 0$; $\{\epsilon_{t+1}\}$ is a sequence of i.i.d. normal random variables with mean $0$ and variance $1$.

The second component of the statistical model is

$$
y_0 \sim {\cal N}(\mu_0, \sigma_0^2)
$$ (eq:themodel_2)

Consider a sample $\{y_t\}_{t=0}^T$ governed by this statistical model.

The model implies that the likelihood function of $\{y_t\}_{t=0}^T$ can be **factored**:

$$
f(y_T, y_{T-1}, \ldots, y_0) = f(y_T| y_{T-1}) f(y_{T-1}| y_{T-2}) \cdots f(y_1 | y_0 ) f(y_0)
$$

where we use $f$ to denote a generic probability density.

The statistical model {eq}`eq:themodel`-{eq}`eq:themodel_2` implies

$$
\begin{aligned}
f(y_t | y_{t-1}) & \sim {\mathcal N}(\rho y_{t-1}, \sigma_x^2) \\
f(y_0) & \sim {\mathcal N}(\mu_0, \sigma_0^2)
\end{aligned}
$$
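To make the factorization concrete, here is a minimal sketch (not a cell from the lecture) of how the log likelihood could be evaluated under either treatment of $y_0$; the function name, the flag `condition_on_y0`, and the toy sample are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def ar1_log_likelihood(y, rho, sigma_x, condition_on_y0=True):
    # Conditional terms: f(y_t | y_{t-1}) = N(rho * y_{t-1}, sigma_x^2)
    ll = norm.logpdf(y[1:], loc=rho * y[:-1], scale=sigma_x).sum()
    if not condition_on_y0:
        # Add f(y_0) from the stationary distribution N(0, sigma_x^2 / (1 - rho^2))
        ll += norm.logpdf(y[0], loc=0., scale=sigma_x / np.sqrt(1 - rho**2))
    # If we condition on y_0, f(y_0) is in effect 1 and contributes nothing
    return ll

y_toy = np.array([10., 5.5, 2.4, 1.9, 0.3])   # illustrative sample only
print(ar1_log_likelihood(y_toy, rho=0.5, sigma_x=1.))
print(ar1_log_likelihood(y_toy, rho=0.5, sigma_x=1., condition_on_y0=False))
```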
We want to study how inferences about the unknown parameters $(\rho, \sigma_x)$ depend on what is assumed about the parameters $\mu_0, \sigma_0$ of the distribution of $y_0$.

Below, we study two widely used alternative assumptions:

- $(\mu_0,\sigma_0) = (y_0, 0)$, which means that $y_0$ is drawn from the distribution ${\mathcal N}(y_0, 0)$; in effect, we are **conditioning on an observed initial value**.

- $\mu_0,\sigma_0$ are functions of $\rho, \sigma_x$ because $y_0$ is drawn from the stationary distribution implied by $\rho, \sigma_x$.

**Note:** We do **not** treat a third possible case in which $\mu_0,\sigma_0$ are free parameters to be estimated.

Unknown parameters are $\rho, \sigma_x$.

We have independent **prior probability distributions** for $\rho, \sigma_x$ and want to compute a posterior probability distribution after observing a sample $\{y_{t}\}_{t=0}^T$.

The notebook uses `pymc4` and `numpyro` to compute a posterior distribution of $\rho, \sigma_x$. We will use NUTS samplers to generate samples from the posterior in a chain. Both of these libraries support NUTS samplers.

NUTS is a form of Markov chain Monte Carlo (MCMC) algorithm that bypasses random-walk behaviour and converges to a target distribution more quickly. Besides speed, this allows complex models to be fitted without requiring specialised knowledge of the theory underlying those fitting methods.
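As a point of reference, a minimal, self-contained sketch of driving a NUTS sampler in `numpyro` is below; the toy model (estimating a single mean) and the sample sizes are illustrative assumptions, not part of the lecture. In `pymc` the analogous step is simply calling `pmc.sample` inside the model context, which selects NUTS automatically for continuous parameters.

```python
import jax.numpy as jnp
from jax import random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS

def toy_model(data):
    # Prior on an unknown mean and a normal likelihood for the observations
    mu = numpyro.sample('mu', dist.Normal(0., 10.))
    numpyro.sample('obs', dist.Normal(mu, 1.), obs=data)

data = jnp.array([0.3, -0.1, 0.8, 0.5])
mcmc = MCMC(NUTS(toy_model), num_warmup=500, num_samples=1000, progress_bar=False)
mcmc.run(random.PRNGKey(0), data=data)
mcmc.print_summary()
```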
Thus, we explore consequences of making these alternative assumptions about the distribution of $y_0$:

- A first procedure is to condition on whatever value of $y_0$ is observed. This amounts to assuming that the probability distribution of the random variable $y_0$ is a Dirac delta function that puts probability one on the observed value of $y_0$.

- A second procedure assumes that $y_0$ is drawn from the stationary distribution of a process described by {eq}`eq:themodel`, so that $y_0 \sim {\cal N} \left(0, {\sigma_x^2 \over 1-\rho^2} \right)$.

When the initial value $y_0$ is far out in a tail of the stationary distribution, conditioning on an initial value gives a posterior that is **more accurate** in a sense that we'll explain.

Basically, when $y_0$ happens to be in a tail of the stationary distribution and we **don't condition on $y_0$**, the likelihood function for $\{y_t\}_{t=0}^T$ adjusts the posterior distribution of the parameter pair $\rho, \sigma_x$ to make the observed value of $y_0$ more likely than it really is under the stationary distribution, thereby adversely twisting the posterior in short samples.

An example below shows how not conditioning on $y_0$ adversely shifts the posterior probability distribution of $\rho$ toward larger values.

We begin by solving a **direct problem** that simulates an AR(1) process.

How we select the initial value $y_0$ matters.

* If we think $y_0$ is drawn from the stationary distribution ${\mathcal N}(0, \frac{\sigma_x^{2}}{1-\rho^2})$, then it is a good idea to use this distribution as $f(y_0)$. Why? Because $y_0$ contains information about $\rho, \sigma_x$.

* If we suspect that $y_0$ is far in the tails of the stationary distribution -- so that variation in early observations in the sample has a significant **transient component** -- it is better to condition on $y_0$ by setting $f(y_0) = 1$.

To illustrate the issue, we'll begin by choosing an initial $y_0$ that is far out in a tail of the stationary distribution.
@@ -148,7 +148,7 @@ def ar1_simulate(rho, sigma, y0, T):
    y[0] = y0
    for t in range(1, T):
        y[t] = rho*y[t-1] + eps[t]

    return y

sigma = 1.
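The hunk above shows only part of the simulation cell. A self-contained sketch of such a simulator is below; how the innovations `eps` are drawn and the parameter values are assumptions made for illustration, not necessarily the lecture's exact choices.

```python
import numpy as np

def ar1_simulate_sketch(rho, sigma, y0, T, seed=0):
    # Simulate y_{t+1} = rho * y_t + eps_{t+1} from a fixed initial value y0
    rng = np.random.default_rng(seed)
    eps = rng.normal(0., sigma, size=T)   # i.i.d. N(0, sigma^2) innovations
    y = np.empty(T)
    y[0] = y0
    for t in range(1, T):
        y[t] = rho * y[t-1] + eps[t]
    return y

# An initial condition far out in a tail of the stationary distribution
y_sketch = ar1_simulate_sketch(rho=0.5, sigma=1., y0=10., T=100)
```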
@@ -164,23 +164,23 @@ plt.plot(y)
plt.plot(y)
plt.tight_layout()
```

Now we shall use Bayes' law to construct a posterior distribution, conditioning on the initial value of $y_0$.

(Later we'll assume that $y_0$ is drawn from the stationary distribution, but not now.)

First we'll use **pymc4**.

## PyMC Implementation

For a normal distribution in `pymc`, $var = 1/\tau = \sigma^{2}$.
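As a reminder of that parameterization, `pmc.Normal` accepts either `sigma` or the precision `tau`; here is a small sketch (a toy model, not part of the lecture) showing the equivalence:

```python
import pymc as pmc

with pmc.Model():
    # sigma = 2 and tau = 1 / sigma**2 = 0.25 describe the same distribution
    x_sigma = pmc.Normal('x_sigma', mu=0., sigma=2.)
    x_tau = pmc.Normal('x_tau', mu=0., tau=0.25)
```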
```{code-cell} ipython3

AR1_model = pmc.Model()

with AR1_model:

    # Start with priors
    rho = pmc.Uniform('rho', lower=-1., upper=1.)    # Assume stable rho
    sigma = pmc.HalfNormal('sigma', sigma=np.sqrt(10))
@@ -192,6 +192,8 @@ with AR1_model:
    y_like = pmc.Normal('y_obs', mu=yhat, sigma=sigma, observed=y[1:])
```

[pmc.sample](https://www.pymc.io/projects/docs/en/latest/api/generated/pymc.sample.html?highlight=sample#pymc.sample) by default uses a NUTS sampler to generate samples, as shown in the cell below:

```{code-cell} ipython3
:tag: [hide-output]
@@ -204,31 +206,31 @@ with AR1_model:
    az.plot_trace(trace, figsize=(17,6))
```

Evidently, the posteriors aren't centered on the true values of $.5, 1$ that we used to generate the data.

This is a symptom of the classic **Hurwicz bias** for first-order autoregressive processes (see Leonid Hurwicz {cite}`hurwicz1950least`).

The Hurwicz bias is worse the smaller is the sample (see {cite}`Orcutt_Winokur_69`).

Be that as it may, here is more information about the posterior.

```{code-cell} ipython3
with AR1_model:
    summary = az.summary(trace, round_to=4)

summary
```

Now we shall compute a posterior distribution after seeing the same data but instead assuming that $y_0$ is drawn from the stationary distribution.

This means that

$$
y_0 \sim N \left(0, \frac{\sigma_x^{2}}{1 - \rho^{2}} \right)
$$

We alter the code as follows:

```{code-cell} ipython3
AR1_model_y0 = pmc.Model()
@@ -238,10 +240,10 @@ with AR1_model_y0:
    # Start with priors
    rho = pmc.Uniform('rho', lower=-1., upper=1.)    # Assume stable rho
    sigma = pmc.HalfNormal('sigma', sigma=np.sqrt(10))

    # Standard deviation of ergodic y
    y_sd = sigma / np.sqrt(1 - rho**2)

    # yhat
    yhat = rho * y[:-1]
    y_data = pmc.Normal('y_obs', mu=yhat, sigma=sigma, observed=y[1:])
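The hunk ends before the term that handles $y_0$. A self-contained sketch of how the extra density for the first observation can enter a `pymc` model under the stationary-distribution assumption is below; the model name, the toy data vector, and the variable name `y0_obs` are assumptions for illustration, not the lecture's exact cell.

```python
import numpy as np
import pymc as pmc

y_toy = np.array([10., 5.5, 2.4, 1.9, 0.3, -0.8, 0.1])   # stand-in for the simulated sample

with pmc.Model() as AR1_sketch_y0:
    rho = pmc.Uniform('rho', lower=-1., upper=1.)        # prior, as above
    sigma = pmc.HalfNormal('sigma', sigma=np.sqrt(10))   # prior, as above

    # Standard deviation of the stationary (ergodic) distribution of y
    y_sd = sigma / np.sqrt(1 - rho**2)

    # Conditional likelihood f(y_t | y_{t-1})
    pmc.Normal('y_obs', mu=rho * y_toy[:-1], sigma=sigma, observed=y_toy[1:])

    # Extra term for the first observation: y_0 ~ N(0, sigma^2 / (1 - rho^2))
    pmc.Normal('y0_obs', mu=0., sigma=y_sd, observed=y_toy[0])
```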
@@ -265,7 +267,7 @@ with AR1_model_y0:
```{code-cell} ipython3
with AR1_model:
    summary_y0 = az.summary(trace_y0, round_to=4)

summary_y0
```
@@ -278,7 +280,7 @@ that make observations more likely.
We'll return to this issue after we use `numpyro` to compute posteriors under our two alternative assumptions about the distribution of $y_0$.

We'll now repeat the calculations using `numpyro`.

## Numpyro Implementation
@@ -319,10 +321,10 @@ def AR1_model(data):
    # Expected value of y at the next period (rho * y)
    yhat = rho * data[:-1]

    # Likelihood of the actual realization.
    y_data = numpyro.sample('y_obs', dist.Normal(loc=yhat, scale=sigma), obs=data[1:])
```

```{code-cell} ipython3
@@ -347,7 +349,7 @@ plot_posterior(mcmc.get_samples())
mcmc.print_summary()
```

Next, we again compute the posterior under the assumption that $y_0$ is drawn from the stationary distribution, so that

$$
y_0 \sim N \left(0, \frac{\sigma_x^{2}}{1 - \rho^{2}} \right)
@@ -366,7 +368,7 @@ def AR1_model_y0(data):
    # Expected value of y at the next period (rho * y)
    yhat = rho * data[:-1]

    # Likelihood of the actual realization.
    y_data = numpyro.sample('y_obs', dist.Normal(loc=yhat, scale=sigma), obs=data[1:])
    y0_data = numpyro.sample('y0_obs', dist.Normal(loc=0., scale=y_sd), obs=data[0])
@@ -402,4 +404,3 @@ is telling `numpyro` to explain what it interprets as "explosive" observations
Bayes' Law is able to generate a plausible likelihood for the first observation by driving $\rho \rightarrow 1$ and $\sigma \uparrow$ in order to raise the variance of the stationary distribution.
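A quick numerical check of this mechanism (illustrative numbers only, not from the lecture): the stationary distribution assigns far more probability to an extreme initial value such as $y_0 = 10$ when $\rho$ is close to $1$.

```python
import numpy as np
from scipy.stats import norm

y0, sigma_x = 10., 1.
for rho in (0.5, 0.9, 0.99):
    sd_stationary = sigma_x / np.sqrt(1 - rho**2)             # std dev of the stationary distribution
    print(rho, norm.logpdf(y0, loc=0., scale=sd_stationary))  # log f(y_0) rises as rho -> 1
```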
Our example illustrates the importance of what you assume about the distribution of initial conditions.