Incorrect feature history #1

Open
natss opened this issue Jul 13, 2022 · 0 comments

natss commented Jul 13, 2022

All feature history entries written by the Selector steps (metrics, correlation and L1) are overwritten by the last model refit.

Example:

import pandas as pd
from sklearn.datasets import load_breast_cancer
from autowoe import AutoWoE

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

test_aw = AutoWoE(n_jobs=1, debug=True)
test_aw.fit(pd.concat([X, y], axis=1), 'target')

If we look at test_aw.feature_history, we see many entries with the reason 'Pruned during regression refit' and none that mention the selectors. But the selectors do record their own drop reasons. Here is how I checked:

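# Import path assumed from the package layout; adjust it if Selector lives elsewhere.
from autowoe.lib.selectors.selector import Selector

# Clear the refit reason so that those features are passed to the selector again.
history = {k: None if v == 'Pruned during regression refit' else v for k, v in test_aw.feature_history.items()}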

selector = Selector(
    interpreted_model=test_aw.params["interpreted_model"],
    task=test_aw.params["task"],
    train=test_aw.train_df,
    target=test_aw.target,
    features_type=test_aw.private_features_type,
    n_jobs=test_aw.params["n_jobs"],
    cv_split=test_aw._cv_split,
    features_mark_values=None,
)

best_features, _sel_result = selector(
    history,
    pearson_th=test_aw.params["pearson_th"],
    metric_th=test_aw.params["metric_th"],
    vif_th=test_aw.params["vif_th"],
    l1_grid_size=test_aw.params["l1_grid_size"],
    l1_exp_scale=test_aw.params['l1_exp_scale'],
    metric_tol=test_aw.params["metric_tol"],
)

If we look at history now, we can see several different drop reasons.
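
To see the mix of reasons at a glance, the history dict can be tallied by reason. This is a minimal sketch: tally_reasons is a hypothetical helper, and example_history uses made-up feature names and reason strings (only 'Pruned during regression refit' comes from the real output). In practice the history dict from the check above would be passed in instead.

```python
from collections import Counter


def tally_reasons(feature_history):
    """Count how many features share each drop reason (None means 'not dropped')."""
    return Counter(feature_history.values())


# Made-up history for illustration only; real keys are feature names.
example_history = {
    "f1": None,
    "f2": "Pruned during regression refit",
    "f3": "hypothetical selector reason A",
    "f4": "hypothetical selector reason A",
}

reason_counts = tally_reasons(example_history)
for reason, count in reason_counts.most_common():
    print(f"{reason!r}: {count}")
```

If the selector reasons are preserved, this tally shows more than one distinct non-None reason.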

My suggestion is to change the third argument of the last feature_changing() call in AutoWoE.fit() from self._private_features_type to best_features. As it stands, feature_changing() treats every feature that was passed into the selectors as features_before for the last refit, so features already dropped by a selector get relabelled with the refit reason.
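
The effect can be reproduced with a toy model. This is only a sketch under assumptions: feature_changing_sketch is a hypothetical stand-in that marks every feature present in features_before but absent from features_after, which is how the behaviour described above appears to work; it is not the library's actual code. The feature names and the selector reason string are made up.

```python
REFIT_REASON = "Pruned during regression refit"
SELECTOR_REASON = "dropped by selector (hypothetical)"


def feature_changing_sketch(feature_history, step_name, features_before, features_after):
    """Toy stand-in: label each feature that is in features_before but not in features_after."""
    for feature in set(features_before) - set(features_after):
        feature_history[feature] = step_name


all_features = ["f1", "f2", "f3", "f4"]  # everything that entered the selectors
best_features = ["f1", "f2", "f3"]       # what the selectors kept (f4 was dropped)
final_features = ["f1", "f2"]            # what survived the last refit (f3 was pruned)


def run(features_before):
    # f4 already carries a selector reason before the refit step runs.
    history = {"f1": None, "f2": None, "f3": None, "f4": SELECTOR_REASON}
    feature_changing_sketch(history, REFIT_REASON, features_before, final_features)
    return history


current = run(all_features)    # mirrors passing all input features
proposed = run(best_features)  # mirrors the suggested change

print(current["f4"])   # selector reason is overwritten by the refit reason
print(proposed["f4"])  # selector reason is preserved
```

In both runs f3 is correctly attributed to the refit; only the label of f4 differs.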
