---
jupyter:
  jupytext:
    notebook_metadata_filter: all,-language_info
    split_at_heading: true
    text_representation:
      extension: .Rmd
      format_name: rmarkdown
      format_version: '1.2'
      jupytext_version: 1.13.7
  kernelspec:
    display_name: Python 3
    language: python
    name: python3
  orphan: true
---
# Matrix rank
The *rank* of a matrix is the number of independent rows or columns of the
matrix; these two counts are always equal.

We will soon define what we mean by the word *independent*.

For a matrix with more columns than rows, the rank is the number of
independent rows.

For a matrix with more rows than columns, like a design matrix, the rank is
the number of independent columns.

In fact, linear algebra tells us that it is impossible to have more
independent columns than there are rows, or more independent rows than there
are columns. Try it with some test matrices.
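
One way to try this is sketched below. It builds a few random test matrices and checks that the rank never exceeds the smaller of the number of rows and the number of columns. The shapes, the seed and the names `rng` and `ranks` are arbitrary choices for illustration.

```python
import numpy as np

# Fixed seed so the example is repeatable; the value is an arbitrary choice.
rng = np.random.default_rng(0)

ranks = {}
# Try a wide matrix, a tall matrix and a square matrix.
for n_rows, n_cols in [(3, 7), (7, 3), (5, 5)]:
    test_matrix = rng.normal(size=(n_rows, n_cols))
    ranks[(n_rows, n_cols)] = np.linalg.matrix_rank(test_matrix)

for shape, rank in ranks.items():
    # The rank is never larger than the smaller dimension.
    assert rank <= min(shape)
    print(shape, rank)
```
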
A column is *dependent* on other columns if the values in the column can
be generated by a weighted sum of one or more other columns.

To put this more formally, say we have a matrix $\mathbf{X}$ with
$M$ rows and $N$ columns. Write column $i$ of
$\mathbf{X}$ as $X_{:,i}$. Column $i$ is *independent* of
the rest of $\mathbf{X}$ if there is no length $N$ column vector
of weights $\vec{c}$, with $c_i = 0$, such that
$\mathbf{X} \cdot \vec{c} = X_{:,i}$.
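
The definition can be checked numerically. The sketch below is one possible approach, not the only one: it removes column $i$ (which has the same effect as forcing $c_i = 0$), finds the least-squares weights for the remaining columns, and asks whether the weighted sum reproduces column $i$. The helper `column_is_independent` and its tolerance `tol` are hypothetical, chosen for illustration.

```python
import numpy as np


def column_is_independent(X, i, tol=1e-8):
    """Return True if column `i` of `X` is not a weighted sum of the others.

    Hypothetical helper for illustration; `tol` is an arbitrary threshold.
    """
    X = np.asarray(X, dtype=float)
    target = X[:, i]
    # Dropping column i is equivalent to fixing its weight c_i at zero.
    others = np.delete(X, i, axis=1)
    # Least squares gives the closest weighted sum of the other columns.
    weights, *_ = np.linalg.lstsq(others, target, rcond=None)
    residual = target - others @ weights
    # A (near) zero residual means the others can generate the column.
    return np.linalg.norm(residual) > tol * max(np.linalg.norm(target), 1.0)


# Third column is the sum of the first two, so it is dependent.
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [0., 0., 0.],
              [1., 1., 2.]])
print(column_is_independent(A, 2))

# Neither column of B is a multiple of the other.
B = np.array([[1., 0.],
              [0., 1.],
              [0., 0.]])
print(column_is_independent(B, 0))
```
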
Let’s make a design with independent columns:
```{python}
#: Standard imports
import numpy as np
# Make numpy print 4 significant digits for prettiness
np.set_printoptions(precision=4, suppress=True)
import matplotlib.pyplot as plt
# Default to gray colormap
import matplotlib
matplotlib.rcParams['image.cmap'] = 'gray'
```
```{python}
trend = np.linspace(0, 1, 10)
X = np.ones((10, 3))
X[:, 0] = trend
X[:, 1] = trend ** 2
plt.imshow(X)
```
In this case, no column can be generated by a weighted sum of the other two.
We can test this with `np.linalg.matrix_rank`:
```{python}
import numpy.linalg as npl
npl.matrix_rank(X)
```
This does not mean the columns are orthogonal:
```{python}
# Orthogonal columns have dot products of zero
X.T @ X
```
Nor does it mean that the columns have zero correlation (see
[Correlation and projection](https://matthew-brett.github.io/teaching/correlation_projection.html) for the relationship between correlation and the
vector dot product):
```{python}
np.corrcoef(X[:,0], X[:, 1])
```
As long as a column cannot be *fully* predicted by the others, that column
is independent.
Now let’s add a fourth column that is a weighted sum of the first three:
```{python}
X_not_full_rank = np.zeros((10, 4))
X_not_full_rank[:, :3] = X
X_not_full_rank[:, 3] = X @ [-1, 0.5, 0.5]
plt.imshow(X_not_full_rank)
```
`matrix_rank` is up to the job:
```{python}
npl.matrix_rank(X_not_full_rank)
```
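
For the curious, `matrix_rank` gets its answer from the singular values of the matrix: it counts how many are larger than a small tolerance. The sketch below rebuilds `X_not_full_rank` so that it runs on its own, and applies the default tolerance described in the NumPy documentation (largest singular value, times the larger matrix dimension, times floating point epsilon). The names `singular_values`, `tol` and `rank_by_hand` are for illustration only.

```python
import numpy as np

# Rebuild the matrices from above so this block is self-contained.
trend = np.linspace(0, 1, 10)
X = np.column_stack([trend, trend ** 2, np.ones(10)])
X_not_full_rank = np.column_stack([X, X @ [-1, 0.5, 0.5]])

# Singular values only; we do not need the U and V matrices.
singular_values = np.linalg.svd(X_not_full_rank, compute_uv=False)

# Tolerance following the documented default of np.linalg.matrix_rank.
tol = singular_values.max() * max(X_not_full_rank.shape) * np.finfo(float).eps

# Count the singular values that are meaningfully greater than zero.
rank_by_hand = int(np.sum(singular_values > tol))
print(singular_values)
print(rank_by_hand, np.linalg.matrix_rank(X_not_full_rank))
```

The last singular value is not exactly zero because of floating point error, which is why a tolerance is needed.
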
A more typical situation with design matrices is that we have some dummy
variable columns coding for group membership, which sum to a column of ones.
```{python}
dummies = np.kron(np.eye(3), np.ones((4, 1)))
plt.imshow(dummies)
```
So far, so good:
```{python}
npl.matrix_rank(dummies)
```
If we add a column of ones to model the mean, we now have an extra column that
is a linear combination of other columns in the model:
```{python}
dummies_with_mean = np.hstack((dummies, np.ones((12, 1))))
plt.imshow(dummies_with_mean)
```
```{python}
npl.matrix_rank(dummies_with_mean)
```
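
To see where the dependence comes from, the sketch below confirms that the dummy columns add up to the column of ones. As a further illustration that goes beyond the text above, it also shows that dropping one dummy column leaves a matrix whose rank matches its number of columns. The names `ones` and `reduced` are assumptions for this example.

```python
import numpy as np

# Rebuild the matrices from above so this block is self-contained.
dummies = np.kron(np.eye(3), np.ones((4, 1)))
ones = np.ones((12, 1))
dummies_with_mean = np.hstack((dummies, ones))

# Each row belongs to exactly one group, so the dummy columns sum to one.
print(np.allclose(dummies.sum(axis=1, keepdims=True), ones))

# Illustration only: drop the first dummy column and keep the column of ones.
reduced = np.hstack((dummies[:, 1:], ones))

rank_with_mean = np.linalg.matrix_rank(dummies_with_mean)
rank_reduced = np.linalg.matrix_rank(reduced)
print(rank_with_mean, dummies_with_mean.shape[1])
print(rank_reduced, reduced.shape[1])
```
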
A matrix is *full rank* if its rank is the same as the smaller of its number
of rows and its number of columns. That is, a matrix is full rank if all the
columns (or all the rows, whichever are fewer) are independent.

If a matrix is not full rank then it is *rank deficient*.
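
A minimal sketch of a full-rank check follows, taking "full rank" to mean that the rank equals the smaller of the number of rows and the number of columns. The helper `is_full_rank` is hypothetical and not part of NumPy.

```python
import numpy as np


def is_full_rank(matrix):
    """Hypothetical helper: True if the rank equals the smaller dimension."""
    matrix = np.asarray(matrix)
    return bool(np.linalg.matrix_rank(matrix) == min(matrix.shape))


# Rebuild the dummy variable designs so this block is self-contained.
dummies = np.kron(np.eye(3), np.ones((4, 1)))
dummies_with_mean = np.hstack((dummies, np.ones((12, 1))))

print(is_full_rank(dummies))
print(is_full_rank(dummies_with_mean))
```
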