From 25db95c3aac6309bf6904869948da2d389c4b058 Mon Sep 17 00:00:00 2001
From: Patrick Chong <20418855+patrickxchong@users.noreply.github.com>
Date: Fri, 9 Sep 2022 15:46:22 +0800
Subject: [PATCH 1/3] Rename RUS typo to ENN

Paragraph is referring to EditedNearestNeighbours
---
 Chapter_6_ImbalancedLearning/Resampling.ipynb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Chapter_6_ImbalancedLearning/Resampling.ipynb b/Chapter_6_ImbalancedLearning/Resampling.ipynb
index 68dadd8..89623aa 100644
--- a/Chapter_6_ImbalancedLearning/Resampling.ipynb
+++ b/Chapter_6_ImbalancedLearning/Resampling.ipynb
@@ -986,7 +986,7 @@
     "\n",
     "The number of neighbors $k$ is by default set to $k=3$. It is worth noting that, contrary to RUS, the number of majority class samples that are removed depends on the degree of overlap between the two classes. The method does not allow to specify an imbalanced ratio. \n",
     "\n",
-    "The `imblearn` sampler for RUS is [`imblearn.under_sampling.EditedNearestNeighbours`](https://imbalanced-learn.org/stable/references/generated/imblearn.under_sampling.EditedNearestNeighbours.html). Let us illustrate its use and its impact on the classifier decision boundary and the classification performances. \n"
+    "The `imblearn` sampler for ENN is [`imblearn.under_sampling.EditedNearestNeighbours`](https://imbalanced-learn.org/stable/references/generated/imblearn.under_sampling.EditedNearestNeighbours.html). Let us illustrate its use and its impact on the classifier decision boundary and the classification performances. \n"
    ]
   },
   {

From 84c781f8031623ef8102b6808fc10d6dc770cb95 Mon Sep 17 00:00:00 2001
From: Patrick Chong <20418855+patrickxchong@users.noreply.github.com>
Date: Fri, 9 Sep 2022 15:47:25 +0800
Subject: [PATCH 2/3] Fix previously typo

---
 Chapter_6_ImbalancedLearning/Resampling.ipynb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Chapter_6_ImbalancedLearning/Resampling.ipynb b/Chapter_6_ImbalancedLearning/Resampling.ipynb
index 89623aa..33c7f70 100644
--- a/Chapter_6_ImbalancedLearning/Resampling.ipynb
+++ b/Chapter_6_ImbalancedLearning/Resampling.ipynb
@@ -1071,7 +1071,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "On this dataset, the performances of ENN are poor compared to the previsouly tested techniques. The balanced accuracy was slightly improved compared to the baseline classifier. The performance in terms of AP is however lower than the baseline, and the AUC ROC is the worst of all tested tecniques (and on par with ROS). "
+    "On this dataset, the performances of ENN are poor compared to the previously tested techniques. The balanced accuracy was slightly improved compared to the baseline classifier. The performance in terms of AP is however lower than the baseline, and the AUC ROC is the worst of all tested tecniques (and on par with ROS). "
    ]
   },
   {

From 081cc400adba92ec5d9a30bc6815c146fc7f916b Mon Sep 17 00:00:00 2001
From: Patrick Chong <20418855+patrickxchong@users.noreply.github.com>
Date: Fri, 9 Sep 2022 15:50:11 +0800
Subject: [PATCH 3/3] Fix citation typo

---
 Chapter_6_ImbalancedLearning/Resampling.ipynb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Chapter_6_ImbalancedLearning/Resampling.ipynb b/Chapter_6_ImbalancedLearning/Resampling.ipynb
index 33c7f70..da6fb68 100644
--- a/Chapter_6_ImbalancedLearning/Resampling.ipynb
+++ b/Chapter_6_ImbalancedLearning/Resampling.ipynb
@@ -1196,7 +1196,7 @@
   "source": [
     "### Combining over and undersampling\n",
     "\n",
-    "Oversampling and undersampling are often complementary. On the one hand, oversampling techniques allow to generate synthetic samples from the minority class, and help a classifier in identifying more precisely the decision boundary between the two classes. On the other hand, undersampling techniques reduce the size of the training set, and allow to speed-up the classifier training time. Combining over and undersampling techniques has often been reported to successfully improve the classifier performances (Chapter 5, Section 6 in {cite}fernandez2018learning).\n",
+    "Oversampling and undersampling are often complementary. On the one hand, oversampling techniques allow to generate synthetic samples from the minority class, and help a classifier in identifying more precisely the decision boundary between the two classes. On the other hand, undersampling techniques reduce the size of the training set, and allow to speed-up the classifier training time. Combining over and undersampling techniques has often been reported to successfully improve the classifier performances (Chapter 5, Section 6 in {cite}`fernandez2018learning`).\n",
     "\n",
     "In terms of implementation, the combination of samplers is obtained by chaining the samplers in a `pipeline`. The samplers can then be chained to a classifer. We illustrate below the chaining of an SMOTE oversampling to a random undersampling to a decision tree classifier. \n",
     "\n",