This lecture uses Bayesian methods offered by [pymc](https://www.pymc.io/projects/docs/en/stable/) and [numpyro](https://num.pyro.ai/en/stable/) to make statistical inferences about two parameters of a univariate first-order autoregression.
The model is a good laboratory for illustrating
consequences of alternative ways of modeling the distribution of the initial $y_0$:
- As a fixed number
- As a random variable drawn from the stationary distribution of the $\{y_t\}$ stochastic process
We want to study how inferences about the unknown parameters $(\rho, \sigma_x)$ depend on what is assumed about the parameters $\mu_0, \sigma_0$ of the distribution of $y_0$.
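For reference, the first-order autoregression {eq}`eq:themodel` has the form

$$
y_{t+1} = \rho y_t + \sigma_x \epsilon_{t+1}, \quad t \geq 0, \qquad y_0 \sim {\mathcal N}(\mu_0, \sigma_0^2),
$$

where $\{\epsilon_{t+1}\}$ is a sequence of i.i.d. standardized normal random variables, and $\mu_0, \sigma_0$ are the mean and standard deviation of the distribution of $y_0$.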
Below, we study two widely used alternative assumptions:
- $(\mu_0,\sigma_0) = (y_0, 0)$ which means that $y_0$ is drawn from the distribution ${\mathcal N}(y_0, 0)$; in effect, we are **conditioning on an observed initial value**.
- $\mu_0,\sigma_0$ are functions of $\rho, \sigma_x$ because $y_0$ is drawn from the stationary distribution implied by $\rho, \sigma_x$.
**Note:** We do **not** treat a third possible case in which $\mu_0,\sigma_0$ are free parameters to be estimated.
Unknown parameters are $\rho, \sigma_x$.
We have independent **prior probability distributions** for $\rho, \sigma_x$ and want to compute a posterior probability distribution after observing a sample $\{y_{t}\}_{t=0}^T$.
The notebook uses `pymc4` and `numpyro` to compute a posterior distribution of $\rho, \sigma_x$. Both libraries support the NUTS sampler, which we will use to generate draws from the posterior in a Markov chain.
NUTS is a Markov chain Monte Carlo (MCMC) algorithm that avoids random-walk behaviour and therefore converges to a target distribution more quickly. Besides speed, this lets us fit complex models without specialised knowledge of the theory underlying the fitting method.
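To make the mechanics concrete, here is a minimal sketch of invoking a NUTS sampler in `numpyro` on a toy location model; the model, data, and draw counts are illustrative assumptions, not the lecture's AR(1) specification.

```{code-cell} ipython3
import jax.numpy as jnp
from jax import random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS

def toy_model(data):
    # diffuse normal prior on a scalar location parameter
    mu = numpyro.sample('mu', dist.Normal(0., 10.))
    # likelihood: observations are normal around mu with unit standard deviation
    numpyro.sample('obs', dist.Normal(mu, 1.), obs=data)

data = jnp.array([0.3, -0.1, 0.8, 0.4])    # toy data, purely illustrative
mcmc = MCMC(NUTS(toy_model), num_warmup=500, num_samples=1000, progress_bar=False)
mcmc.run(random.PRNGKey(0), data=data)
mcmc.print_summary()
```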
Thus, we explore consequences of making these alternative assumptions about the distribution of $y_0$:
- A first procedure is to condition on whatever value of $y_0$ is observed. This amounts to assuming that the probability distribution of the random variable $y_0$ is a Dirac delta function that puts probability one on the observed value of $y_0$.
- A second procedure assumes that $y_0$ is drawn from the stationary distribution of a process described by {eq}`eq:themodel`
so that $y_0 \sim {\cal N} \left(0, {\sigma_x^2 \over 1-\rho^2} \right)$
When the initial value $y_0$ is far out in a tail of the stationary distribution, conditioning on an initial value gives a posterior that is **more accurate** in a sense that we'll explain.
Basically, when $y_0$ happens to be in a tail of the stationary distribution and we **don't condition on $y_0$**, the likelihood function for $\{y_t\}_{t=0}^T$ adjusts the posterior distribution of the parameter pair $\rho, \sigma_x $ to make the observed value of $y_0$ more likely than it really is under the stationary distribution, thereby adversely twisting the posterior in short samples.
An example below shows how not conditioning on $y_0$ adversely shifts the posterior probability distribution of $\rho$ toward larger values.
We begin by solving a **direct problem** that simulates an AR(1) process.
How we select the initial value $y_0$ matters.
* If we think $y_0$ is drawn from the stationary distribution ${\mathcal N}(0, \frac{\sigma_x^{2}}{1-\rho^2})$, then it is a good idea to use this distribution as $f(y_0)$. Why? Because $y_0$ contains information about $\rho, \sigma_x$.
* If we suspect that $y_0$ is far in the tails of the stationary distribution -- so that variation in early observations in the sample has a significant **transient component** -- it is better to condition on $y_0$ by setting $f(y_0) = 1$.
To illustrate the issue, we'll begin by choosing an initial $y_0$ that is far out in a tail of the stationary distribution.
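The lecture's simulation and model-definition cells are elided from this excerpt. As a hedged sketch of both steps -- the true values $\rho = .5$, $\sigma_x = 1$ come from the discussion below, while the sample size, seed, initial value, and priors are illustrative assumptions -- the direct problem and a `pymc` model that conditions on the observed $y_0$ might look like:

```{code-cell} ipython3
import numpy as np
import pymc as pmc
import arviz as az

# true parameters (rho=.5, sigma_x=1 per the text); T, seed, and y0 are illustrative
rho_true, sigma_true, T = .5, 1., 100
y0 = 10.          # far out in a tail of the stationary distribution N(0, 4/3)

np.random.seed(0)
y = np.empty(T + 1)
y[0] = y0
for t in range(T):
    y[t + 1] = rho_true * y[t] + sigma_true * np.random.normal()

# AR(1) likelihood that conditions on the observed initial value y[0]
with pmc.Model() as AR1_model:
    rho = pmc.Uniform('rho', lower=-1., upper=1.)    # flat prior on rho
    sigma = pmc.HalfNormal('sigma', sigma=10)        # weakly informative prior
    pmc.Normal('y_obs', mu=rho * y[:-1], sigma=sigma, observed=y[1:])
```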
[pmc.sample](https://www.pymc.io/projects/docs/en/latest/api/generated/pymc.sample.html?highlight=sample#pymc.sample) uses the NUTS sampler by default to generate samples, as shown in the cell below:
```{code-cell} ipython3
:tags: [hide-output]

with AR1_model:
    # draw posterior samples with the default NUTS sampler
    # (the lecture's draw counts are elided from this excerpt; these are illustrative)
    trace = pmc.sample(tune=1000, draws=2000)

with AR1_model:
    az.plot_trace(trace, figsize=(17,6))
```
Evidently, the posteriors aren't centered on the true values of $.5, 1$ that we used to generate the data.
This is a symptom of the classic **Hurwicz bias** for first-order autoregressive processes (see Leonid Hurwicz {cite}`hurwicz1950least`).
The Hurwicz bias is worse the smaller is the sample (see {cite}`Orcutt_Winokur_69`).
Be that as it may, here is more information about the posterior.
```{code-cell} ipython3
with AR1_model:
    summary = az.summary(trace, round_to=4)

summary
```
Now we shall compute a posterior distribution after seeing the same data but instead assuming that $y_0$ is drawn from the stationary distribution.
This means that
$$
y_0 \sim N \left(0, \frac{\sigma_x^{2}}{1 - \rho^{2}} \right)
$$
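A hedged sketch of how this assumption might be encoded in `pymc` (illustrative, not necessarily the lecture's exact cell; it reuses the simulated series `y` and the priors from the sketch above): the standard deviation of $y_0$ is constructed from `rho` and `sigma`, so the first observation now carries information about both parameters.

```{code-cell} ipython3
with pmc.Model() as AR1_model_y0:
    rho = pmc.Uniform('rho', lower=-1., upper=1.)
    sigma = pmc.HalfNormal('sigma', sigma=10)

    # standard deviation of the stationary distribution, a function of rho and sigma
    y_sd = sigma / pmc.math.sqrt(1 - rho**2)

    # y0 is a draw from the stationary distribution rather than a value we condition on
    pmc.Normal('y0_obs', mu=0., sigma=y_sd, observed=y[0])
    pmc.Normal('y_obs', mu=rho * y[:-1], sigma=sigma, observed=y[1:])
```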
Bayes' Law is able to generate a plausible likelihood for the first observation by driving $\rho \rightarrow 1$ and $\sigma_x \uparrow$ in order to raise the variance of the stationary distribution.
Our example illustrates the importance of what you assume about the distribution of initial conditions.