Commit 3312356

add section permutation feature importance (#5)
* add section permutation feature importance
* add assignments
* add pdp nbs
* add link to course, rerun to remove warnings
* add link to course, rerun to remove warnings
* add link to course, rerun to remove warnings
* add link to course, rerun to remove warnings
* add link to course, rerun to remove warnings
* add link to course, unify assignments
* add pdp notebooks
* rename folder
* tidy code
1 parent ab1f093 commit 3312356

45 files changed, +10725 -1717 lines changed

Diff for: 04-linear-regression/1-linear-regression-model.ipynb

+6 -12
@@ -7,6 +7,9 @@
 "source": [
 "# Linear regression model - OLS\n",
 "\n",
+"\n",
+"[Machine Learning Interpretability Course](https://www.trainindata.com/p/machine-learning-interpretability)\n",
+"\n",
 "In this notebook, we will compare the linear regression (OLS) implementations of `scikit-learn` and `statsmodels`.\n",
 "\n",
 "We will evaluate:\n",
@@ -27,16 +30,7 @@
 "execution_count": 1,
 "id": "b619bd11",
 "metadata": {},
-"outputs": [
-{
-"name": "stderr",
-"output_type": "stream",
-"text": [
-"C:\\Users\\Sole\\Documents\\Repositories\\envs\\fsml\\lib\\site-packages\\scipy\\__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.25.1\n",
-" warnings.warn(f\"A NumPy version >={np_minversion} and <{np_maxversion}\"\n"
-]
-}
-],
+"outputs": [],
 "source": [
 "import numpy as np\n",
 "import matplotlib.pyplot as plt\n",
@@ -700,8 +694,8 @@
 "Dep. Variable: MedHouseVal R-squared: 0.550\n",
 "Model: OLS Adj. R-squared: 0.550\n",
 "Method: Least Squares F-statistic: 2942.\n",
-"Date: Wed, 02 Aug 2023 Prob (F-statistic): 0.00\n",
-"Time: 11:26:14 Log-Likelihood: -16796.\n",
+"Date: Tue, 14 Nov 2023 Prob (F-statistic): 0.00\n",
+"Time: 10:45:43 Log-Likelihood: -16796.\n",
 "No. Observations: 14448 AIC: 3.361e+04\n",
 "Df Residuals: 14441 BIC: 3.366e+04\n",
 "Df Model: 6 \n",

Diff for: 04-linear-regression/2-feature-importance-sklearn.ipynb

+6 -12
@@ -7,6 +7,9 @@
 "source": [
 "# Linear regression coefficients\n",
 "\n",
+"\n",
+"[Machine Learning Interpretability course](https://www.trainindata.com/p/machine-learning-interpretability)\n",
+"\n",
 "In this notebook we will understand the information provided by the coefficients of the linear regression model and calculate its variability."
 ]
 },
@@ -15,16 +18,7 @@
 "execution_count": 1,
 "id": "0da331f9",
 "metadata": {},
-"outputs": [
-{
-"name": "stderr",
-"output_type": "stream",
-"text": [
-"C:\\Users\\Sole\\Documents\\Repositories\\envs\\fsml\\lib\\site-packages\\scipy\\__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.25.1\n",
-" warnings.warn(f\"A NumPy version >={np_minversion} and <{np_maxversion}\"\n"
-]
-}
-],
+"outputs": [],
 "source": [
 "import numpy as np\n",
 "import matplotlib.pyplot as plt\n",
@@ -556,8 +550,8 @@
 {
 "data": {
 "text/plain": [
-"{'fit_time': array([0.00599432, 0.00399733, 0.00898552, 0.00499678, 0.00399804]),\n",
-" 'score_time': array([0.002002 , 0.00201988, 0.00301433, 0.00201917, 0.00401211]),\n",
+"{'fit_time': array([0.0089972 , 0.00499773, 0.00349975, 0.0099988 , 0.00399756]),\n",
+" 'score_time': array([0.00600815, 0.00161481, 0.002002 , 0.00299954, 0.0019989 ]),\n",
 " 'estimator': [LinearRegression(),\n",
 " LinearRegression(),\n",
 " LinearRegression(),\n",

Diff for: 04-linear-regression/3-feature-importance-statsmodels.ipynb

+5 -12
@@ -7,6 +7,8 @@
 "source": [
 "# Linear regression coefficients\n",
 "\n",
+"[Machine Learning Interpretability course](https://www.trainindata.com/p/machine-learning-interpretability)\n",
+"\n",
 "In this notebook we will evaluate the coefficients of the linear regression models provided by statsmodels."
 ]
 },
@@ -15,16 +17,7 @@
 "execution_count": 1,
 "id": "90e85181",
 "metadata": {},
-"outputs": [
-{
-"name": "stderr",
-"output_type": "stream",
-"text": [
-"C:\\Users\\Sole\\Documents\\Repositories\\envs\\fsml\\lib\\site-packages\\scipy\\__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.25.1\n",
-" warnings.warn(f\"A NumPy version >={np_minversion} and <{np_maxversion}\"\n"
-]
-}
-],
+"outputs": [],
 "source": [
 "import numpy as np\n",
 "import pandas as pd\n",
@@ -106,8 +99,8 @@
 "Dep. Variable: MedHouseVal R-squared: 0.550\n",
 "Model: OLS Adj. R-squared: 0.550\n",
 "Method: Least Squares F-statistic: 2942.\n",
-"Date: Tue, 01 Aug 2023 Prob (F-statistic): 0.00\n",
-"Time: 16:08:42 Log-Likelihood: -16796.\n",
+"Date: Tue, 14 Nov 2023 Prob (F-statistic): 0.00\n",
+"Time: 10:47:31 Log-Likelihood: -16796.\n",
 "No. Observations: 14448 AIC: 3.361e+04\n",
 "Df Residuals: 14441 BIC: 3.366e+04\n",
 "Df Model: 6 \n",

Diff for: 04-linear-regression/4-local-interpretability-sklearn.ipynb

+3 -10
@@ -7,6 +7,8 @@
 "source": [
 "# Local interpretability\n",
 "\n",
+"[Machine Learning Interpretability course](https://www.trainindata.com/p/machine-learning-interpretability)\n",
+"\n",
 "In this notebook, we will evaluate the contribution of each feature towards the target value, for a single observation.\n",
 "\n",
 "Based on [Christoph Molnar's book](https://christophm.github.io/interpretable-ml-book/limo.html#effect-plot)"
@@ -17,16 +19,7 @@
 "execution_count": 1,
 "id": "0da331f9",
 "metadata": {},
-"outputs": [
-{
-"name": "stderr",
-"output_type": "stream",
-"text": [
-"C:\\Users\\Sole\\Documents\\Repositories\\envs\\fsml\\lib\\site-packages\\scipy\\__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.25.1\n",
-" warnings.warn(f\"A NumPy version >={np_minversion} and <{np_maxversion}\"\n"
-]
-}
-],
+"outputs": [],
 "source": [
 "import matplotlib.pyplot as plt\n",
 "import pandas as pd\n",

Diff for: 04-linear-regression/5-local-interpretability-statsmodels.ipynb

+5 -12
@@ -7,6 +7,8 @@
 "source": [
 "# Local interpretability\n",
 "\n",
+"[Machine Learning Interpretability course](https://www.trainindata.com/p/machine-learning-interpretability)\n",
+"\n",
 "In this notebook, we will evaluate the contribution of each feature towards the target value, for a single observation.\n",
 "\n",
 "Based on [Christoph Molnar's book](https://christophm.github.io/interpretable-ml-book/limo.html#effect-plot)"
@@ -17,16 +19,7 @@
 "execution_count": 1,
 "id": "90e85181",
 "metadata": {},
-"outputs": [
-{
-"name": "stderr",
-"output_type": "stream",
-"text": [
-"C:\\Users\\Sole\\Documents\\Repositories\\envs\\fsml\\lib\\site-packages\\scipy\\__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.25.1\n",
-" warnings.warn(f\"A NumPy version >={np_minversion} and <{np_maxversion}\"\n"
-]
-}
-],
+"outputs": [],
 "source": [
 "import pandas as pd\n",
 "import matplotlib.pyplot as plt\n",
@@ -104,8 +97,8 @@
 "Dep. Variable: MedHouseVal R-squared: 0.523\n",
 "Model: OLS Adj. R-squared: 0.523\n",
 "Method: Least Squares F-statistic: 3168.\n",
-"Date: Tue, 01 Aug 2023 Prob (F-statistic): 0.00\n",
-"Time: 14:17:04 Log-Likelihood: -17216.\n",
+"Date: Tue, 14 Nov 2023 Prob (F-statistic): 0.00\n",
+"Time: 10:48:52 Log-Likelihood: -17216.\n",
 "No. Observations: 14448 AIC: 3.444e+04\n",
 "Df Residuals: 14442 BIC: 3.449e+04\n",
 "Df Model: 5 \n",

Diff for: 04-linear-regression/6-multicolinearity.ipynb

+56 -17
@@ -7,6 +7,8 @@
 "source": [
 "# Multicolinearity\n",
 "\n",
+"[Machine Learning Interpretability course](https://www.trainindata.com/p/machine-learning-interpretability)\n",
+"\n",
 "In this notebook we will check the impact of colinearity on the model coefficients."
 ]
 },
@@ -15,16 +17,7 @@
 "execution_count": 1,
 "id": "90e85181",
 "metadata": {},
-"outputs": [
-{
-"name": "stderr",
-"output_type": "stream",
-"text": [
-"C:\\Users\\Sole\\Documents\\Repositories\\envs\\fsml\\lib\\site-packages\\scipy\\__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.25.1\n",
-" warnings.warn(f\"A NumPy version >={np_minversion} and <{np_maxversion}\"\n"
-]
-}
-],
+"outputs": [],
 "source": [
 "import pandas as pd\n",
 "import matplotlib.pyplot as plt\n",
@@ -161,8 +154,8 @@
 "Dep. Variable: MedHouseVal R-squared: 0.550\n",
 "Model: OLS Adj. R-squared: 0.550\n",
 "Method: Least Squares F-statistic: 2942.\n",
-"Date: Tue, 01 Aug 2023 Prob (F-statistic): 0.00\n",
-"Time: 18:06:41 Log-Likelihood: -16796.\n",
+"Date: Tue, 14 Nov 2023 Prob (F-statistic): 0.00\n",
+"Time: 10:49:32 Log-Likelihood: -16796.\n",
 "No. Observations: 14448 AIC: 3.361e+04\n",
 "Df Residuals: 14441 BIC: 3.366e+04\n",
 "Df Model: 6 \n",
@@ -396,8 +389,8 @@
 "Dep. Variable: MedHouseVal R-squared: 0.523\n",
 "Model: OLS Adj. R-squared: 0.523\n",
 "Method: Least Squares F-statistic: 3168.\n",
-"Date: Tue, 01 Aug 2023 Prob (F-statistic): 0.00\n",
-"Time: 18:06:42 Log-Likelihood: -17216.\n",
+"Date: Tue, 14 Nov 2023 Prob (F-statistic): 0.00\n",
+"Time: 10:49:33 Log-Likelihood: -17216.\n",
 "No. Observations: 14448 AIC: 3.444e+04\n",
 "Df Residuals: 14442 BIC: 3.449e+04\n",
 "Df Model: 5 \n",
@@ -693,8 +686,8 @@
 "Dep. Variable: MedHouseVal R-squared: 0.521\n",
 "Model: OLS Adj. R-squared: 0.521\n",
 "Method: Least Squares F-statistic: 5246.\n",
-"Date: Tue, 01 Aug 2023 Prob (F-statistic): 0.00\n",
-"Time: 18:06:43 Log-Likelihood: -17241.\n",
+"Date: Tue, 14 Nov 2023 Prob (F-statistic): 0.00\n",
+"Time: 10:49:35 Log-Likelihood: -17241.\n",
 "No. Observations: 14448 AIC: 3.449e+04\n",
 "Df Residuals: 14444 BIC: 3.452e+04\n",
 "Df Model: 3 \n",
@@ -814,6 +807,18 @@
 "id": "6ee1081e",
 "metadata": {},
 "outputs": [
+{
+"name": "stderr",
+"output_type": "stream",
+"text": [
+"C:\\Users\\Sole\\Documents\\Repositories\\envs\\fsml\\lib\\site-packages\\seaborn\\_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead\n",
+" if pd.api.types.is_categorical_dtype(vector):\n",
+"C:\\Users\\Sole\\Documents\\Repositories\\envs\\fsml\\lib\\site-packages\\seaborn\\_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead\n",
+" if pd.api.types.is_categorical_dtype(vector):\n",
+"C:\\Users\\Sole\\Documents\\Repositories\\envs\\fsml\\lib\\site-packages\\seaborn\\_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead\n",
+" if pd.api.types.is_categorical_dtype(vector):\n"
+]
+},
 {
 "data": {
 "text/plain": [
@@ -853,10 +858,44 @@
 "id": "fc8b100f",
 "metadata": {},
 "outputs": [
+{
+"name": "stderr",
+"output_type": "stream",
+"text": [
+"C:\\Users\\Sole\\Documents\\Repositories\\envs\\fsml\\lib\\site-packages\\seaborn\\_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead\n",
+" if pd.api.types.is_categorical_dtype(vector):\n",
+"C:\\Users\\Sole\\Documents\\Repositories\\envs\\fsml\\lib\\site-packages\\seaborn\\_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead\n",
+" if pd.api.types.is_categorical_dtype(vector):\n",
+"C:\\Users\\Sole\\Documents\\Repositories\\envs\\fsml\\lib\\site-packages\\seaborn\\_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead\n",
+" if pd.api.types.is_categorical_dtype(vector):\n",
+"C:\\Users\\Sole\\Documents\\Repositories\\envs\\fsml\\lib\\site-packages\\seaborn\\_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead\n",
+" if pd.api.types.is_categorical_dtype(vector):\n",
+"C:\\Users\\Sole\\Documents\\Repositories\\envs\\fsml\\lib\\site-packages\\seaborn\\_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead\n",
+" if pd.api.types.is_categorical_dtype(vector):\n",
+"C:\\Users\\Sole\\Documents\\Repositories\\envs\\fsml\\lib\\site-packages\\seaborn\\_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead\n",
+" if pd.api.types.is_categorical_dtype(vector):\n",
+"C:\\Users\\Sole\\Documents\\Repositories\\envs\\fsml\\lib\\site-packages\\seaborn\\_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead\n",
+" if pd.api.types.is_categorical_dtype(vector):\n",
+"C:\\Users\\Sole\\Documents\\Repositories\\envs\\fsml\\lib\\site-packages\\seaborn\\_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead\n",
+" if pd.api.types.is_categorical_dtype(vector):\n",
+"C:\\Users\\Sole\\Documents\\Repositories\\envs\\fsml\\lib\\site-packages\\seaborn\\_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.\n",
+" with pd.option_context('mode.use_inf_as_na', True):\n",
+"C:\\Users\\Sole\\Documents\\Repositories\\envs\\fsml\\lib\\site-packages\\seaborn\\_oldcore.py:1057: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.\n",
+" grouped_data = data.groupby(\n",
+"C:\\Users\\Sole\\Documents\\Repositories\\envs\\fsml\\lib\\site-packages\\seaborn\\_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead\n",
+" if pd.api.types.is_categorical_dtype(vector):\n",
+"C:\\Users\\Sole\\Documents\\Repositories\\envs\\fsml\\lib\\site-packages\\seaborn\\_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead\n",
+" if pd.api.types.is_categorical_dtype(vector):\n",
+"C:\\Users\\Sole\\Documents\\Repositories\\envs\\fsml\\lib\\site-packages\\seaborn\\_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.\n",
+" with pd.option_context('mode.use_inf_as_na', True):\n",
+"C:\\Users\\Sole\\Documents\\Repositories\\envs\\fsml\\lib\\site-packages\\seaborn\\_oldcore.py:1057: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.\n",
+" grouped_data = data.groupby(\n"
+]
+},
 {
 "data": {
 "text/plain": [
-"<seaborn.axisgrid.JointGrid at 0x1341f5be9e0>"
+"<seaborn.axisgrid.JointGrid at 0x29750e466b0>"
 ]
 },
 "execution_count": 21,
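
Most hunks in this commit replace a cell's stderr stream output with `"outputs": []`, which the commit message attributes to rerunning the notebooks. As an illustration only, the same cleanup could probably be scripted on the notebook JSON with the standard library. The sketch below assumes the nbformat 4 layout visible in the diff (a `cells` list whose code cells carry an `outputs` list); `strip_stderr_outputs` and the in-memory `example` notebook are hypothetical names and data, not part of this commit.

```python
import json


def strip_stderr_outputs(notebook: dict) -> int:
    """Remove stderr stream outputs from every code cell, in place.

    Returns the number of outputs removed. Assumes the nbformat 4 layout,
    where each code cell has an "outputs" list of output dicts.
    """
    removed = 0
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells have no outputs
        outputs = cell.get("outputs", [])
        kept = [
            out
            for out in outputs
            if not (
                out.get("output_type") == "stream"
                and out.get("name") == "stderr"
            )
        ]
        removed += len(outputs) - len(kept)
        cell["outputs"] = kept
    return removed


# Hypothetical in-memory notebook, used only to show the effect.
example = {
    "cells": [
        {
            "cell_type": "code",
            "outputs": [
                {
                    "name": "stderr",
                    "output_type": "stream",
                    "text": ["UserWarning: ...\n"],
                },
                {
                    "output_type": "execute_result",
                    "data": {"text/plain": ["42"]},
                },
            ],
            "source": ["import numpy as np\n"],
        },
        {"cell_type": "markdown", "source": ["# Title\n"]},
    ],
    "nbformat": 4,
}

removed_count = strip_stderr_outputs(example)
print(removed_count)
print(json.dumps(example["cells"][0]["outputs"]))
```

To apply it to a file, one would load the `.ipynb` with `json.load`, call the function, and write the result back with `json.dump`. The same filter would also drop the seaborn `FutureWarning` streams that this commit adds to `6-multicolinearity.ipynb`, since they are stderr stream outputs too.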
