@@ -131,7 +131,10 @@ What vector within a linear subspace of $\mathbb R^n$ best approximates a given

The next theorem answers this question.

- **Theorem** (OPT) Given $y \in \mathbb R^n$ and linear subspace $S \subset \mathbb R^n$,
+ ```{prf:theorem} Orthogonal Projection Theorem
+ :label: opt
+
+ Given $y \in \mathbb R^n$ and linear subspace $S \subset \mathbb R^n$,
there exists a unique solution to the minimization problem

$$
@@ -144,6 +147,7 @@ The minimizer $\hat y$ is the unique vector in $\mathbb R^n$ that satisfies
* $y - \hat y \perp S$

The vector $\hat y$ is called the **orthogonal projection** of $y$ onto $S$.
+ ```

The next figure provides some intuition

$$
y \in Y\; \mapsto \text{ its orthogonal projection } \hat y \in S
$$

- By the OPT, this is a well-defined mapping or *operator* from $\mathbb R^n$ to $\mathbb R^n$.
+ By the {prf:ref}`opt`, this is a well-defined mapping or *operator* from $\mathbb R^n$ to $\mathbb R^n$.

In what follows we denote this operator by a matrix $P$
@@ -192,7 +196,7 @@ The operator $P$ is called the **orthogonal projection mapping onto** $S$.
```

- It is immediate from the OPT that for any $y \in \mathbb R^n$
+ It is immediate from the {prf:ref}`opt` that for any $y \in \mathbb R^n$

1. $P y \in S$ and
1. $y - P y \perp S$
@@ -224,16 +228,20 @@ such that $y = x_1 + x_2$.

Moreover, $x_1 = \hat E_S y$ and $x_2 = y - \hat E_S y$.

- This amounts to another version of the OPT:
+ This amounts to another version of the {prf:ref}`opt`:

- **Theorem**. If $S$ is a linear subspace of $\mathbb R^n$, $\hat E_S y = P y$ and $\hat E_{S^{\perp}} y = M y$, then
+ ```{prf:theorem} Orthogonal Projection Theorem (another version)
+ :label: opt_another
+
+ If $S$ is a linear subspace of $\mathbb R^n$, $\hat E_S y = P y$ and $\hat E_{S^{\perp}} y = M y$, then

$$
P y \perp M y
\quad \text{and} \quad
y = P y + M y
\quad \text{for all } \, y \in \mathbb R^n
$$
+ ```

The next figure illustrates
@@ -285,7 +293,7 @@ Combining this result with {eq}`pob` verifies the claim.

When the subspace onto which we project has an orthonormal basis, computing the projection simplifies:

- **Theorem** If $\{ u_1, \ldots, u_k\}$ is an orthonormal basis for $S$, then
+ ````{prf:theorem} If $\{ u_1, \ldots, u_k\}$ is an orthonormal basis for $S$, then

```{math}
:label: exp_for_op
@@ -294,8 +302,9 @@ P y = \sum_{i=1}^k \langle y, u_i \rangle u_i,
\quad
\forall \; y \in \mathbb R^n
```
+ ````

- Proof: Fix $y \in \mathbb R^n$ and let $P y$ be defined as in {eq}`exp_for_op`.
+ ```{prf:proof} Fix $y \in \mathbb R^n$ and let $P y$ be defined as in {eq}`exp_for_op`.

Clearly, $P y \in S$.

$$

(Why is this sufficient to establish the claim that $y - P y \perp S$?)
+ ```
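
For readers who want to see {eq}`exp_for_op` in action, here is a minimal NumPy sketch (not part of the lecture; the orthonormal pair `u1`, `u2` and the vector `y` are made up for illustration):

```python
import numpy as np

# A made-up orthonormal basis {u1, u2} for a plane S in R^3
u1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
u2 = np.array([0.0, 0.0, 1.0])

y = np.array([1.0, 3.0, -2.0])

# P y = <y, u1> u1 + <y, u2> u2, as in the formula above
Py = (y @ u1) * u1 + (y @ u2) * u2

print(Py)                            # [ 2.  2. -2.]
print((y - Py) @ u1, (y - Py) @ u2)  # both ~0: y - Py is orthogonal to S
```

Since the inner product of `y - Py` with each basis vector vanishes, `y - Py` is orthogonal to all of $S$, which is exactly the property the proof's closing question points to.
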
## Projection Via Matrix Algebra
@@ -327,13 +337,17 @@ Evidently $Py$ is a linear function from $y \in \mathbb R^n$ to $P y \in \mathb

[This reference](https://en.wikipedia.org/wiki/Linear_map#Matrices) is useful.

- **Theorem.** Let the columns of $n \times k$ matrix $X$ form a basis of $S$. Then
+ ```{prf:theorem}
+ :label: proj_matrix
+
+ Let the columns of $n \times k$ matrix $X$ form a basis of $S$. Then

$$
P = X (X'X)^{-1} X'
$$
+ ```

- Proof: Given arbitrary $y \in \mathbb R^n$ and $P = X (X'X)^{-1} X'$, our claim is that
+ ```{prf:proof} Given arbitrary $y \in \mathbb R^n$ and $P = X (X'X)^{-1} X'$, our claim is that

1. $P y \in S$, and
2. $y - P y \perp S$

$$

The proof is now complete.
+ ```
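
As a quick numerical illustration of the formula $P = X(X'X)^{-1}X'$ (a sketch only, not part of the lecture; `X` and `y` below are made up), one can form $P$ directly and check the two properties claimed in the proof:

```python
import numpy as np

# The columns of X are a made-up basis for a 2-dimensional subspace S of R^4
X = np.array([[1.0,  0.0],
              [1.0,  1.0],
              [0.0,  2.0],
              [1.0, -1.0]])
y = np.array([1.0, 3.0, -2.0, 0.5])

P = X @ np.linalg.inv(X.T @ X) @ X.T   # P = X (X'X)^{-1} X'
Py = P @ y

print(np.allclose(X.T @ (y - Py), 0))  # True: y - P y is orthogonal to every column of X
print(np.allclose(P @ P, P))           # True: applying P twice changes nothing
```
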
### Starting with the Basis

Then the columns of $X$ form a basis of $S$.

- From the preceding theorem, $P = X (X' X)^{-1} X' y$ projects $y$ onto $S$.
+ From {prf:ref}`proj_matrix`, $P y = X (X' X)^{-1} X' y$ is the projection of $y$ onto $S$.

In this context, $P$ is often called the **projection matrix**
@@ -428,15 +443,16 @@ By approximate solution, we mean a $b \in \mathbb R^k$ such that $X b$ is close

The next theorem shows that a best approximation is well defined and unique.

- The proof uses the OPT.
+ The proof uses the {prf:ref}`opt`.

- **Theorem** The unique minimizer of $\| y - X b \|$ over $b \in \mathbb R^K$ is
+ ```{prf:theorem} The unique minimizer of $\| y - X b \|$ over $b \in \mathbb R^k$ is

$$
\hat \beta := (X' X)^{-1} X' y
$$
+ ```

- Proof: Note that
+ ```{prf:proof} Note that

$$
X \hat \beta = X (X' X)^{-1} X' y = P y
$$

This is what we aimed to show.
+ ```
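
A rough sanity check of the closed form for $\hat \beta$ (not from the lecture; the data below are randomly generated) is to compare it with NumPy's built-in least-squares routine:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))   # made-up regressors
y = rng.standard_normal(20)        # made-up observations

# hat beta = (X'X)^{-1} X' y, computed via a linear solve rather than an explicit inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))   # True
```
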
## Least Squares Regression
@@ -594,9 +611,9 @@ Here are some more standard definitions:
> TSS = ESS + SSR

- We can prove this easily using the OPT.
+ We can prove this easily using the {prf:ref}`opt`.

- From the OPT we have $y = \hat y + \hat u$ and $\hat u \perp \hat y$.
+ From the {prf:ref}`opt` we have $y = \hat y + \hat u$ and $\hat u \perp \hat y$.

Applying the Pythagorean law completes the proof.
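
Here is a small numerical check of the decomposition (a sketch under the assumption that TSS, ESS and SSR are the squared norms $\|y\|^2$, $\|\hat y\|^2$ and $\|\hat u\|^2$ from the definitions above; the data are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 4))   # made-up data
y = rng.standard_normal(30)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat               # projection of y onto the column space of X
u_hat = y - y_hat                  # residual, orthogonal to y_hat by the OPT

TSS = y @ y          # assumed definition: ||y||^2
ESS = y_hat @ y_hat  # assumed definition: ||y_hat||^2
SSR = u_hat @ u_hat  # assumed definition: ||u_hat||^2

print(np.allclose(TSS, ESS + SSR))   # True, by the Pythagorean law
```
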

@@ -611,7 +628,7 @@ The next section gives details.

(gram_schmidt)=
### Gram-Schmidt Orthogonalization

- **Theorem** For each linearly independent set $\{ x_1, \ldots, x_k\} \subset \mathbb R^n$, there exists an
+ ```{prf:theorem} For each linearly independent set $\{ x_1, \ldots, x_k\} \subset \mathbb R^n$, there exists an
orthonormal set $\{u_1, \ldots, u_k\}$ with

$$
\mathrm{span} \{x_1, \ldots, x_i\} = \mathrm{span} \{u_1, \ldots, u_i\}
\quad \text{for} \quad
i = 1, \ldots, k
$$
+ ```

The **Gram-Schmidt orthogonalization** procedure constructs an orthogonal set $\{ u_1, u_2, \ldots, u_k\}$.
@@ -639,12 +657,13 @@ In some exercises below, you are asked to implement this algorithm and test it u

The following result uses the preceding algorithm to produce a useful decomposition.

- **Theorem** If $X$ is $n \times k$ with linearly independent columns, then there exists a factorization $X = Q R$ where
+ ```{prf:theorem} If $X$ is $n \times k$ with linearly independent columns, then there exists a factorization $X = Q R$ where

* $R$ is $k \times k$, upper triangular, and nonsingular
* $Q$ is $n \times k$ with orthonormal columns
+ ```

- Proof sketch: Let
+ ```{prf:proof} Let

* $x_j := \col_j (X)$
* $\{u_1, \ldots, u_k\}$ be orthonormal with the same span as $\{x_1, \ldots, x_k\}$ (to be constructed using Gram--Schmidt)
@@ -658,6 +677,7 @@ x_j = \sum_{i=1}^j \langle u_i, x_j \rangle u_i
$$

Some rearranging gives $X = Q R$.
+ ```
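
As an illustration (a sketch, not part of the lecture), `numpy.linalg.qr` in its default reduced mode returns a factorization with exactly the shapes described in the theorem, which can be checked on a made-up matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((6, 3))    # made-up n x k matrix, full column rank with probability 1

Q, R = np.linalg.qr(X)             # reduced QR: Q is 6 x 3, R is 3 x 3

print(np.allclose(Q.T @ Q, np.eye(3)))   # True: columns of Q are orthonormal
print(np.allclose(np.triu(R), R))        # True: R is upper triangular
print(np.allclose(Q @ R, X))             # True: X = Q R
```
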
### Linear Regression via QR Decomposition