
Commit 40e322f

Add code examples
1 parent 53c2052 commit 40e322f

5 files changed: +145 -0 lines changed

sklearn-train-test-split/README.md

+33
@@ -0,0 +1,33 @@
# Split Your Dataset With scikit-learn's `train_test_split()`

The `train_test_split()` function in `sklearn` is a useful tool to prepare your dataset for machine learning tasks. This folder contains the code examples from the tutorial on [splitting your dataset with scikit-learn's `train_test_split()`]().

## Installation

1. Create a Python virtual environment

```sh
$ python -m venv ./venv
$ source venv/bin/activate
(venv) $
```

2. Install the requirements

```sh
(venv) $ pip install -r requirements.txt
```

## Run the Scripts

```sh
(venv) $ python script_name.py
```

## About the Author

Martin Breuss - Email: [email protected]

## License

Distributed under the MIT license. See ``LICENSE`` for more information.
@@ -0,0 +1,26 @@
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

x, y = fetch_california_housing(return_X_y=True)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.4, random_state=0
)


model = LinearRegression().fit(x_train, y_train)
print("LinearRegression:")
print(model.score(x_train, y_train))
print(model.score(x_test, y_test), end="\n\n")

model = GradientBoostingRegressor(random_state=0).fit(x_train, y_train)
print("GradientBoostingRegressor:")
print(model.score(x_train, y_train))
print(model.score(x_test, y_test), end="\n\n")

model = RandomForestRegressor(random_state=0).fit(x_train, y_train)
print("RandomForestRegressor:")
print(model.score(x_train, y_train))
print(model.score(x_test, y_test), end="\n\n")
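The script above prints `model.score()` for the training and test sets. For scikit-learn regressors, `.score()` returns the coefficient of determination (R²), which is why comparing the two numbers hints at overfitting. The sketch below checks that equivalence against `sklearn.metrics.r2_score`. It uses synthetic data from `make_regression` instead of the California housing dataset, which is an assumption made here to avoid a download; the sample count, feature count, and noise level are arbitrary.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption for this sketch; the script above
# uses the California housing dataset instead).
x, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.4, random_state=0
)

model = LinearRegression().fit(x_train, y_train)

# For regressors, .score() is the R² of the predictions on the given data.
score_from_model = model.score(x_test, y_test)
score_from_metric = r2_score(y_test, model.predict(x_test))

print(score_from_model)
print(score_from_metric)
```

Both printed values should agree, since `.score()` computes R² from the model's own predictions.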
@@ -0,0 +1,40 @@
import numpy as np
from sklearn.model_selection import train_test_split

x = np.arange(1, 25).reshape(12, 2)
y = np.array([0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0])


x_train, x_test, y_train, y_test = train_test_split(x, y)
print(x_train)
print(x_test)
print(y_train)
print(y_test)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=4, random_state=4
)
# Uncomment to view output
# print(x_train)
# print(x_test)
# print(y_train)
# print(y_test)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.33, random_state=4, stratify=y
)
# Uncomment to view output
# print(x_train)
# print(x_test)
# print(y_train)
# print(y_test)


x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.33, shuffle=False
)
# Uncomment to view output
# print(x_train)
# print(x_test)
# print(y_train)
# print(y_test)
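The four calls in the script above differ only in their keyword arguments, so it can help to compare what each one changes rather than reading the raw arrays. The sketch below reuses the same `x` and `y` and inspects subset sizes, class balance, and ordering. The variable names with suffixes such as `_default` and `_strat` are introduced here for illustration, and `random_state=4` is added to the first call only to make the sketch repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

x = np.arange(1, 25).reshape(12, 2)
y = np.array([0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0])

# Default: test_size falls back to 0.25 when neither size is given.
_, x_test_default, _, _ = train_test_split(x, y, random_state=4)

# Integer test_size: an absolute number of test samples.
_, x_test_int, _, _ = train_test_split(x, y, test_size=4, random_state=4)

# stratify=y: keep the 50/50 class balance of y in both subsets.
_, _, y_train_strat, y_test_strat = train_test_split(
    x, y, test_size=0.33, random_state=4, stratify=y
)

# shuffle=False: no shuffling, so the test set is the tail of the data.
x_train_ordered, x_test_ordered, _, _ = train_test_split(
    x, y, test_size=0.33, shuffle=False
)

print(len(x_test_default), len(x_test_int))
print(np.bincount(y_train_strat), np.bincount(y_test_strat))
print(x_test_ordered)
```

With 12 samples, a float `test_size` is rounded up to a whole number of test rows, so `test_size=0.33` yields 4 test samples and 8 training samples.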
@@ -0,0 +1,41 @@
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

x = np.arange(20).reshape(-1, 1)
y = np.array(
    [
        5,
        12,
        11,
        19,
        30,
        29,
        23,
        40,
        51,
        54,
        74,
        62,
        68,
        73,
        89,
        84,
        89,
        101,
        99,
        106,
    ]
)


x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=8, random_state=0
)

model = LinearRegression().fit(x_train, y_train)
print(model.intercept_)
print(model.coef_)

print(model.score(x_train, y_train))
print(model.score(x_test, y_test))
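The script above prints `model.intercept_` and `model.coef_` after fitting. With a single feature, these two values describe the fitted line, and the predictions can be rebuilt from them by hand. The sketch below does that for the test set and compares the result with `model.predict()`. The name `manual_predictions` is introduced here for illustration, and the `y` array is the same data written on two lines.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

x = np.arange(20).reshape(-1, 1)
y = np.array([5, 12, 11, 19, 30, 29, 23, 40, 51, 54,
              74, 62, 68, 73, 89, 84, 89, 101, 99, 106])

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=8, random_state=0
)
model = LinearRegression().fit(x_train, y_train)

# Rebuild the predictions by hand from the fitted line:
# y = intercept + slope * x
manual_predictions = model.intercept_ + model.coef_[0] * x_test[:, 0]

print(manual_predictions)
print(model.predict(x_test))
```

The two printed arrays should match up to floating-point rounding, which shows that the test-set score depends only on the line learned from the training rows.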
@@ -0,0 +1,5 @@
joblib==1.4.2
numpy==2.0.0
scikit-learn==1.5.0
scipy==1.14.0
threadpoolctl==3.5.0
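After installing the pinned requirements above, it may be useful to confirm that the active environment actually matches them. The sketch below uses the standard library's `importlib.metadata` to compare installed versions against the pins. The `PINNED` dictionary and the `check_pins()` helper are hypothetical names introduced for this example and are not part of the commit.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the requirements file above.
PINNED = {
    "joblib": "1.4.2",
    "numpy": "2.0.0",
    "scikit-learn": "1.5.0",
    "scipy": "1.14.0",
    "threadpoolctl": "3.5.0",
}


def check_pins(pinned):
    """Return {package: (pinned, installed or None)} for every pinned package."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            # The package is not installed in the active environment.
            installed = None
        report[name] = (wanted, installed)
    return report


report = check_pins(PINNED)
for name, (wanted, installed) in report.items():
    status = "ok" if installed == wanted else "differs"
    print(f"{name}: pinned {wanted}, installed {installed} ({status})")
```

A package reported as `differs` does not necessarily break the scripts; it only means the environment is not the one the pins describe.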
