Running classification task throws errors #2

ArturDev42 · 2025-02-03T14:37:42Z

Thanks for this cool paper! I was trying to run the classification task using python -m classification.classification_main on the datasets TelcoCustomerChurn.csv and covid_data_pre_processed.csv.

For TelcoCustomerChurn.csv, the logs show that the polluted versions of the train and test splits already exist on disk:

[DEBUG] Polluted version of train_TelcoCustomerChurn.csv with parameter has was already persisted at data/polluted/ConsistentRepresentationPolluter/42/train_TelcoCustomerChurn_c216c4f4bf04699525f3591e76182994.csv.
[DEBUG] Polluted version of test_TelcoCustomerChurn.csv with parameter has was already persisted at data/polluted/ConsistentRepresentationPolluter/42/test_TelcoCustomerChurn_c216c4f4bf04699525f3591e76182994.csv.

However, the run then fails with the following error:

2025-02-03 14:28:13,852 [INFO ] Starting experiment <class 'classification.experiments.GradientBoostingClassifierExperiment'> for scenario train_clean_test_clean and dataset TelcoCustomerChurn.csv with ConsistentRepresentationPolluter
  0%|          | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/tommaso/repos/DQ4AI/classification/classification_main.py", line 149, in <module>
    main()
  File "/home/tommaso/repos/DQ4AI/classification/classification_main.py", line 128, in main
    results = exp.run()
  File "/home/tommaso/repos/DQ4AI/classification/experiments.py", line 398, in run
    self.model.fit(X_train, y_train)
  File "/home/tommaso/repos/DQ4AI/env/lib/python3.10/site-packages/sklearn/base.py", line 1152, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "/home/tommaso/repos/DQ4AI/env/lib/python3.10/site-packages/sklearn/ensemble/_gb.py", line 416, in fit
    X, y = self._validate_data(
  File "/home/tommaso/repos/DQ4AI/env/lib/python3.10/site-packages/sklearn/base.py", line 622, in _validate_data
    X, y = check_X_y(X, y, **check_params)
  File "/home/tommaso/repos/DQ4AI/env/lib/python3.10/site-packages/sklearn/utils/validation.py", line 1146, in check_X_y
    X = check_array(
  File "/home/tommaso/repos/DQ4AI/env/lib/python3.10/site-packages/sklearn/utils/validation.py", line 915, in check_array
    array = _asarray_with_order(array, order=order, dtype=dtype, xp=xp)
  File "/home/tommaso/repos/DQ4AI/env/lib/python3.10/site-packages/sklearn/utils/_array_api.py", line 380, in _asarray_with_order
    array = numpy.asarray(array, order=order, dtype=dtype)
  File "/home/tommaso/repos/DQ4AI/env/lib/python3.10/site-packages/pandas/core/generic.py", line 2070, in __array__
    return np.asarray(self._values, dtype=dtype)
ValueError: could not convert string to float: '7181-BQYBV'
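
The value '7181-BQYBV' looks like an identifier rather than a feature, so my guess is that a string column reaches model.fit without being dropped or encoded. Below is a minimal sketch for checking which columns of a CSV contain values that float() rejects. The helper name non_numeric_columns and the sample rows are my own illustration, not part of the DQ4AI code, and the sample column names are made up.

```python
import csv
import io


def non_numeric_columns(csv_text):
    """Return the names of columns holding at least one value float() rejects.

    Note: empty cells also count as non-numeric here, because float("") fails.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    bad = []
    for row in reader:
        for name, value in row.items():
            if name in bad:
                continue
            try:
                float(value)
            except (TypeError, ValueError):
                bad.append(name)
    return bad


# Made-up rows for illustration only; not taken from the real dataset.
sample = (
    "customerID,tenure,MonthlyCharges\n"
    "0000-AAAAA,1,29.85\n"
    "1111-BBBBB,34,56.95\n"
)
print(non_numeric_columns(sample))
```

Running this against the persisted train file should show whether an ID-like column is still present in the features.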

For covid_data_pre_processed.csv, I receive the following error:

  0%|          | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/tommaso/repos/DQ4AI/env/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'DIED'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/tommaso/repos/DQ4AI/classification/classification_main.py", line 149, in <module>
    main()
  File "/home/tommaso/repos/DQ4AI/classification/classification_main.py", line 56, in main
    stratify=df[metadata[ds_name]['target']])
  File "/home/tommaso/repos/DQ4AI/env/lib/python3.10/site-packages/pandas/core/frame.py", line 3807, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/tommaso/repos/DQ4AI/env/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3804, in get_loc
    raise KeyError(key) from err
KeyError: 'DIED'
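
The traceback shows the target name coming from metadata[ds_name]['target'], so the KeyError suggests that my CSV has no column named exactly 'DIED'. Below is a minimal sketch for comparing a target name against a CSV header, including near matches that differ only in case or surrounding whitespace. The helper name find_target_column and the sample header are hypothetical; I have not confirmed what the real header of covid_data_pre_processed.csv looks like.

```python
import csv
import io


def find_target_column(csv_text, target):
    """Return (exact_match_or_None, near_matches) for a target column name.

    near_matches lists header entries equal to the target after ignoring
    case and surrounding whitespace; it is empty when an exact match exists.
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    if target in header:
        return target, []
    wanted = target.strip().lower()
    near = [col for col in header if col.strip().lower() == wanted]
    return None, near


# Made-up header for illustration only; the real file may differ.
sample = "AGE,died ,SEX\n40,0,1\n"
print(find_target_column(sample, "DIED"))
```

An empty near-match list would mean the column is missing altogether, for example because the file was preprocessed differently from what the metadata expects.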

Am I doing anything wrong? Any help would be appreciated, thanks!
