Potential issues in Classification I

Two more potential issues were noted by Hossein Saiedian from KU. The first arises because we deliberately avoid getting into the splitting procedure in Classification I, but it could probably be fixed with a minor edit, given some careful thought. The second I'll need to investigate further; it may just be due to updated software versions.
 
1.  In Section 5.3, the explanation of splitting data into training and test sets does a great job of setting expectations. However, in Section 5.6, the KNN classifier is trained on `cancer_train` (the full, filtered cancer dataset) without explicitly showing or mentioning a split. Beginners may find this confusing and wonder when the split occurred. A brief note stating that the split is skipped here for simplicity would make things clearer.
2.  In Section 5.6, the text says that `set_config(transform_output="pandas")` ensures scikit-learn outputs are pandas DataFrames. However, the output of `knn.predict(new_obs)` remains a NumPy array, not a DataFrame. This may be misleading, since `set_config` only affects transformer outputs, not predictions from estimators. I clarified with my students that this setting applies to preprocessing steps, not to `.predict()`. I've attached my classroom slides with some notes to help illustrate these points.
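For point 1, here is a minimal sketch of where an explicit split would normally sit before fitting, in case it helps when wording the note. It assumes scikit-learn's `train_test_split`; the toy data frame, the 25% test size, `stratify`, and `random_state=1` are illustrative choices, not the book's code.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data standing in for the book's (already standardized) cancer data
cancer = pd.DataFrame({
    "Perimeter": [0.2, 1.5, -0.8, 2.1, -1.2, 0.9, -0.3, 1.1],
    "Concavity": [3.1, 0.4, -0.5, 1.8, -0.9, 0.1, -0.2, 2.0],
    "Class": ["Malignant", "Benign", "Benign", "Malignant",
              "Benign", "Benign", "Benign", "Malignant"],
})

# Hold out 25% of the rows as a test set, keeping class proportions similar
cancer_train, cancer_test = train_test_split(
    cancer, test_size=0.25, stratify=cancer["Class"], random_state=1
)

# Fit only on the training portion; cancer_test stays untouched for evaluation
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X=cancer_train[["Perimeter", "Concavity"]], y=cancer_train["Class"])
```

In Section 5.6 the fit is instead done on all the rows, which is why a one-line note that this step is skipped for simplicity may be enough.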

As usual, we should sync with the R version if we change anything here.

Potential issues in Classification I #361

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Potential issues in Classification I #361

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions