@@ -60,7 +60,7 @@ For an advanced treatment of projection in the context of least squares predicti
## Key Definitions
- Assume $x, z \in \mathbb R^n$.
+ Assume $x, z \in \mathbb R^n$.
Define $\langle x, z\rangle = \sum_i x_i z_i$.
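As a quick numerical illustration of this definition (the vectors below are arbitrary examples), the inner product and the norm $\|x\| := \sqrt{\langle x, x \rangle}$ it induces can be computed directly:

```python
import numpy as np

# Arbitrary example vectors in R^3
x = np.array([1.0, 2.0, 3.0])
z = np.array([4.0, 0.0, -1.0])

inner = np.sum(x * z)             # <x, z> = sum_i x_i z_i
norm_x = np.sqrt(np.sum(x * x))   # ||x|| = sqrt(<x, x>)

print(inner)                                   # 1.0
print(np.isclose(norm_x, np.linalg.norm(x)))   # True
```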
@@ -86,7 +86,7 @@ The **orthogonal complement** of linear subspace $S \subset \mathbb R^n$ is the
```
- $S^\perp$ is a linear subspace of $\mathbb R^n$
+ $S^\perp$ is a linear subspace of $\mathbb R^n$
* To see this, fix $x, y \in S^{\perp}$ and $\alpha, \beta \in \mathbb R$.
* Observe that if $z \in S$, then
@@ -312,7 +312,7 @@ Clearly, $P y \in S$.
We claim that $y - P y \perp S$ also holds.
- It suffices to show that $y - P y \perp$ any basis vector $u_i$.
+ It suffices to show that $y - P y \perp u_i$ for any basis vector $u_i$.
This is true because
$$
\hat E_S y = P y
$$
- Evidently $Py$ is a linear function from $y \in \mathbb R^n$ to $P y \in \mathbb R^n$.
+ Evidently $y \mapsto P y$ is a linear function from $\mathbb R^n$ to $\mathbb R^n$.
[This reference](https://en.wikipedia.org/wiki/Linear_map#Matrices) is useful.
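A small numerical check of these claims may help. The sketch below (illustrative vectors only) assumes, as in the surrounding discussion, that $P$ is built from an orthonormal basis $\{u_1, u_2\}$ of $S$ via $P y = \sum_i \langle y, u_i \rangle u_i$; it verifies that the residual $y - P y$ is orthogonal to each $u_i$ and exhibits the matrix $U U'$ that represents the linear map $y \mapsto P y$:

```python
import numpy as np

# Orthonormal basis of a 2-dimensional subspace S of R^3 (illustrative choice)
u1 = np.array([1.0, 0.0, 0.0])
u2 = np.array([0.0, 1.0, 0.0])
U = np.column_stack([u1, u2])

y = np.array([1.0, 3.0, -2.0])

# P y = sum_i <y, u_i> u_i
Py = (y @ u1) * u1 + (y @ u2) * u2

# The residual is orthogonal to each basis vector of S
print(np.isclose((y - Py) @ u1, 0.0), np.isclose((y - Py) @ u2, 0.0))   # True True

# As a linear map, y -> P y is represented by the matrix U U'
P = U @ U.T
print(np.allclose(P @ y, Py))   # True
```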
@@ -391,7 +391,7 @@ The proof is now complete.
It is common in applications to start with $n \times k$ matrix $X$ with linearly independent columns and let
$$
- S := \mathop{\mathrm{span}} X := \mathop{\mathrm{span}} \{\mathop{\mathrm{col}}_i X, \ldots, \mathop{\mathrm{col}}_k X \}
+ S := \mathop{\mathrm{span}} X := \mathop{\mathrm{span}} \{\mathop{\mathrm{col}}_1 X, \ldots, \mathop{\mathrm{col}}_k X \}
$$
Then the columns of $X$ form a basis of $S$.
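As a concrete (made-up) illustration: with such an $X$, membership in $S$ is exactly representability as $X b$ for some coefficient vector $b$, and linear independence of the columns can be checked via the rank.

```python
import numpy as np

# An illustrative n x k matrix with linearly independent columns
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
n, k = X.shape

# The columns form a basis of S = span(X) precisely when rank X = k
print(np.linalg.matrix_rank(X) == k)   # True

# Every element of S is X @ b for some b in R^k
b = np.array([2.0, -1.0])
print(X @ b)                           # a point in S, written in the column basis
```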
@@ -433,7 +433,7 @@ Let $y \in \mathbb R^n$ and let $X$ be $n \times k$ with linearly independent co
Given $X$ and $y$, we seek $b \in \mathbb R^k$ that satisfies the system of linear equations $X b = y$.
- If $n > k$ (more equations than unknowns), then $b$ is said to be **overdetermined**.
+ If $n > k$ (more equations than unknowns), then the system is said to be **overdetermined**.
Intuitively, we may not be able to find a $b$ that satisfies all $n$ equations.
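To make this concrete (with invented numbers), here is an overdetermined system that has no exact solution, so the best one can do is make $\|y - X b\|$ small:

```python
import numpy as np

# Illustrative overdetermined system: n = 3 equations, k = 2 unknowns
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = np.array([1.0, 1.0, 3.0])

# y is not in span(X): the best least squares fit has a nonzero residual,
# so no b solves X b = y exactly
b, residual, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print(b)          # the closest we can get in the least squares sense
print(residual)   # positive -> the system X b = y has no exact solution
```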
@@ -450,7 +450,7 @@ The proof uses the {prf:ref}`opt`.
```{prf:theorem}
- The unique minimizer of $\| y - X b \|$ over $b \in \mathbb R^K $ is
+ The unique minimizer of $\| y - X b \|$ over $b \in \mathbb R^k $ is
$$
\hat \beta := (X' X)^{-1} X' y
@@ -475,7 +475,7 @@ Because $Xb \in \mathop{\mathrm{span}}(X)$
$$
\| y - X \hat \beta \|
- \leq \| y - X b \| \text{ for any } b \in \mathbb R^K
+ \leq \| y - X b \| \text{ for any } b \in \mathbb R^k
$$
This is what we aimed to show.
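A quick numerical sanity check of this theorem (with randomly generated, purely illustrative data): compute $\hat\beta$ from the normal equations, compare it with a library solver, and confirm that random alternatives never do better.

```python
import numpy as np

np.random.seed(0)
n, k = 6, 3
X = np.random.randn(n, k)   # random design matrix; full column rank almost surely
y = np.random.randn(n)

# beta_hat = (X'X)^{-1} X'y, obtained by solving the normal equations X'X b = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Agrees with the library least squares routine
print(np.allclose(beta_hat, np.linalg.lstsq(X, y, rcond=None)[0]))   # True

# No alternative b does better: ||y - X beta_hat|| <= ||y - X b||
best = np.linalg.norm(y - X @ beta_hat)
trial_b = np.random.randn(1000, k)
print(all(best <= np.linalg.norm(y - X @ b) for b in trial_b))       # True
```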
@@ -485,7 +485,7 @@ This is what we aimed to show.
Let's apply the theory of orthogonal projection to least squares regression.
- This approach provides insights about many geometric properties of linear regression.
+ This approach provides insights about many geometric properties of linear regression.
We treat only some examples.
$$
\begin{aligned}
\hat \beta
& = (R'Q' Q R)^{-1} R' Q' y \\
& = (R' R)^{-1} R' Q' y \\
- & = R^{-1} (R')^{-1} R' Q' y
- = R^{-1} Q' y
+ & = R^{-1} Q' y
\end{aligned}
$$
+ where the last step uses the fact that $(R' R)^{-1} R' = R^{-1}$ since $R$ is nonsingular.
+
Numerical routines would in this case use the alternative form $R \hat \beta = Q' y$ and back substitution.
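A minimal sketch of that computational point, assuming a reduced QR factorization from `np.linalg.qr` and SciPy's triangular solver for the back substitution (SciPy and the random data are introduced only for this illustration):

```python
import numpy as np
from scipy.linalg import solve_triangular

np.random.seed(1)
n, k = 8, 3
X = np.random.randn(n, k)   # illustrative design matrix with independent columns
y = np.random.randn(n)

# Reduced QR factorization: X = Q R, Q (n x k) orthonormal, R (k x k) upper triangular
Q, R = np.linalg.qr(X)

# Solve R beta_hat = Q'y by back substitution rather than inverting X'X
beta_hat = solve_triangular(R, Q.T @ y, lower=False)

# Same answer as the normal equations (X'X)^{-1} X'y
print(np.allclose(beta_hat, np.linalg.solve(X.T @ X, X.T @ y)))   # True
```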
## Exercises
@@ -817,14 +818,14 @@ def gram_schmidt(X):
U = np.empty((n, k))
I = np.eye(n)
- # The first columns of U is just the normalized first columns of X
- v1 = X[:,0]
+ # The first column of U is just the normalized first column of X
+ v1 = X[:, 0]
U[:, 0] = v1 / np.sqrt(np.sum(v1 * v1))
for i in range(1, k):
# Set up
b = X[:, i] # The vector we're going to project
- Z = X[:, 0 :i] # First i-1 columns of X
+ Z = X[:, :i] # First i columns of X
# Project onto the orthogonal complement of the column span of Z
M = I - Z @ np.linalg.inv(Z.T @ Z) @ Z.T