Commit 8655c10

Update ml-pca.md
1 parent 8abb152 commit 8655c10

1 file changed

+9
-9
lines changed

doc/python/ml-pca.md

+9-9
@@ -6,9 +6,9 @@ jupyter:
       extension: .md
       format_name: markdown
       format_version: '1.3'
-      jupytext_version: 1.14.1
+      jupytext_version: 1.16.1
   kernelspec:
-    display_name: Python 3
+    display_name: Python 3 (ipykernel)
     language: python
     name: python3
   language_info:
@@ -20,7 +20,7 @@ jupyter:
     name: python
     nbconvert_exporter: python
     pygments_lexer: ipython3
-    version: 3.8.8
+    version: 3.10.11
   plotly:
     description: Visualize Principle Component Analysis (PCA) of your high-dimensional
       data in Python with Plotly.
@@ -105,17 +105,17 @@ fig.show()
 
 When you will have too many features to visualize, you might be interested in only visualizing the most relevant components. Those components often capture a majority of the [explained variance](https://en.wikipedia.org/wiki/Explained_variation), which is a good way to tell if those components are sufficient for modelling this dataset.
 
-In the example below, our dataset contains 10 features, but we only select the first 4 components, since they explain over 99% of the total variance.
+In the example below, our dataset contains 8 features, but we only select the first 2 components.
 
 ```python
 import pandas as pd
 import plotly.express as px
 from sklearn.decomposition import PCA
-from sklearn.datasets import load_boston
+from sklearn.datasets import fetch_california_housing
 
-boston = load_boston()
-df = pd.DataFrame(boston.data, columns=boston.feature_names)
-n_components = 4
+housing = fetch_california_housing(as_frame=True)
+df = housing.data
+n_components = 2
 
 pca = PCA(n_components=n_components)
 components = pca.fit_transform(df)
@@ -127,7 +127,7 @@ labels['color'] = 'Median Price'
 
 fig = px.scatter_matrix(
     components,
-    color=boston.target,
+    color=housing.target,
     dimensions=range(n_components),
     labels=labels,
     title=f'Total Explained Variance: {total_var:.2f}%',
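
The sentence removed by this commit justified the component count by its explained variance ("over 99%"), while the replacement sentence gives no such figure. As a minimal sketch of how that check could be run, the snippet below computes per-component explained-variance ratios with NumPy's SVD. It uses synthetic 8-feature data (an assumption, standing in for the housing data so that nothing has to be downloaded), and the helper names `explained_variance_ratio` and `components_needed` are hypothetical, not part of scikit-learn or of this commit.

```python
import numpy as np


def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Squared singular values of the centered data are proportional to
    # the variance along each principal axis, in descending order.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


def components_needed(ratios, threshold=0.99):
    """Smallest number of leading components whose cumulative ratio reaches threshold."""
    cumulative = np.cumsum(ratios)
    count = int(np.searchsorted(cumulative, threshold)) + 1
    # Clamp in case rounding keeps the cumulative sum just below the threshold.
    return min(count, len(cumulative))


# Synthetic stand-in data (assumption): 8 observed features driven by
# 2 latent factors plus a small amount of independent noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 8))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 8))

ratios = explained_variance_ratio(X)
n_needed = components_needed(ratios, threshold=0.99)

print("explained variance ratios:", np.round(ratios, 4))
print("components needed for 99%:", n_needed)
```

With scikit-learn, the same ratios should be available from `PCA().fit(df).explained_variance_ratio_`. Because PCA is sensitive to feature scale, ratios computed on unscaled columns can be dominated by the features with the largest magnitudes, so standardizing first may change how many components look sufficient.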
