Running classification task throws errors #2

Open
ArturDev42 opened this issue Feb 3, 2025 · 0 comments

Thanks for this cool paper! I tried to run the classification task using python -m classification.classification_main on the datasets TelcoCustomerChurn.csv and covid_data_pre_processed.csv, and both runs fail.

In the case of TelcoCustomerChurn.csv, the log shows that the polluted versions of the train and test sets were created and persisted:

[DEBUG] Polluted version of train_TelcoCustomerChurn.csv with parameter has was already persisted at data/polluted/ConsistentRepresentationPolluter/42/train_TelcoCustomerChurn_c216c4f4bf04699525f3591e76182994.csv.
[DEBUG] Polluted version of test_TelcoCustomerChurn.csv with parameter has was already persisted at data/polluted/ConsistentRepresentationPolluter/42/test_TelcoCustomerChurn_c216c4f4bf04699525f3591e76182994.csv.

However, I am getting the following error afterwards:

2025-02-03 14:28:13,852 [INFO ] Starting experiment <class 'classification.experiments.GradientBoostingClassifierExperiment'> for scenario train_clean_test_clean and dataset TelcoCustomerChurn.csv with ConsistentRepresentationPolluter
  0%|          | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/tommaso/repos/DQ4AI/classification/classification_main.py", line 149, in <module>
    main()
  File "/home/tommaso/repos/DQ4AI/classification/classification_main.py", line 128, in main
    results = exp.run()
  File "/home/tommaso/repos/DQ4AI/classification/experiments.py", line 398, in run
    self.model.fit(X_train, y_train)
  File "/home/tommaso/repos/DQ4AI/env/lib/python3.10/site-packages/sklearn/base.py", line 1152, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "/home/tommaso/repos/DQ4AI/env/lib/python3.10/site-packages/sklearn/ensemble/_gb.py", line 416, in fit
    X, y = self._validate_data(
  File "/home/tommaso/repos/DQ4AI/env/lib/python3.10/site-packages/sklearn/base.py", line 622, in _validate_data
    X, y = check_X_y(X, y, **check_params)
  File "/home/tommaso/repos/DQ4AI/env/lib/python3.10/site-packages/sklearn/utils/validation.py", line 1146, in check_X_y
    X = check_array(
  File "/home/tommaso/repos/DQ4AI/env/lib/python3.10/site-packages/sklearn/utils/validation.py", line 915, in check_array
    array = _asarray_with_order(array, order=order, dtype=dtype, xp=xp)
  File "/home/tommaso/repos/DQ4AI/env/lib/python3.10/site-packages/sklearn/utils/_array_api.py", line 380, in _asarray_with_order
    array = numpy.asarray(array, order=order, dtype=dtype)
  File "/home/tommaso/repos/DQ4AI/env/lib/python3.10/site-packages/pandas/core/generic.py", line 2070, in __array__
    return np.asarray(self._values, dtype=dtype)
ValueError: could not convert string to float: '7181-BQYBV'
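
The traceback ends in a plain string-to-float conversion of an ID-like value, which suggests (but does not confirm) that an identifier column reaches the model without being dropped or encoded. Below is a minimal stdlib-only sketch for checking which columns of a CSV are not fully numeric. The helper name, the sample rows, and the column names are hypothetical; only the value 7181-BQYBV is taken from the traceback.

```python
import csv
import io


def non_numeric_columns(csv_text):
    """Map each column that is not fully numeric to its first offending value."""
    offenders = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, value in row.items():
            if column in offenders:
                continue  # one example per column is enough
            try:
                float(value)
            except (TypeError, ValueError):
                # Same conversion that fails inside the fit call in the traceback.
                offenders[column] = value
    return offenders


# Hypothetical excerpt: column names and the second row are assumptions.
sample = (
    "customerID,tenure,MonthlyCharges\n"
    "7181-BQYBV,12,29.85\n"
    "0000-AAAAA,3,56.95\n"
)
print(non_numeric_columns(sample))  # {'customerID': '7181-BQYBV'}
```

Running the same check on the persisted train_TelcoCustomerChurn_*.csv file (by passing its contents as text) would show whether the ID column is the only non-numeric one or whether other categorical columns are also left unencoded.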

In the case of covid_data_pre_processed.csv, I receive the following error:

  0%|          | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/tommaso/repos/DQ4AI/env/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'DIED'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/tommaso/repos/DQ4AI/classification/classification_main.py", line 149, in <module>
    main()
  File "/home/tommaso/repos/DQ4AI/classification/classification_main.py", line 56, in main
    stratify=df[metadata[ds_name]['target']])
  File "/home/tommaso/repos/DQ4AI/env/lib/python3.10/site-packages/pandas/core/frame.py", line 3807, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/tommaso/repos/DQ4AI/env/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3804, in get_loc
    raise KeyError(key) from err
KeyError: 'DIED'
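
The KeyError is raised at the stratify=df[metadata[ds_name]['target']] lookup, so the loaded file apparently has no column spelled exactly DIED. Here is a small stdlib-only sketch for inspecting a CSV header and listing near matches for the expected target name. The helper name and the sample header are hypothetical, and the actual header of covid_data_pre_processed.csv is not known from this report.

```python
import csv
import difflib
import io


def check_target_column(csv_text, target):
    """Return [] if `target` is a column, else the closest header names."""
    header = next(csv.reader(io.StringIO(csv_text)))
    if target in header:
        return []
    # Compare case-insensitively, then map back to the original spelling.
    lowered = {name.strip().lower(): name for name in header}
    close = difflib.get_close_matches(target.lower(), list(lowered), n=3, cutoff=0.6)
    return [lowered[name] for name in close]


# Hypothetical header in which the target differs only by letter case.
sample = "AGE,SEX,Died\n54,1,0\n"
print(check_target_column(sample, "DIED"))
```

An empty result means the column exists as expected. A non-empty result points at a naming mismatch between the file and the metadata entry, while no near match at all would suggest that a different or unprocessed version of the dataset was loaded.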

Am I doing anything wrong? Any help would be appreciated, thanks!
