Thanks for this cool paper! I was trying to run the classification task with `python -m classification.classification_main` on the datasets `TelcoCustomerChurn.csv` and `covid_data_pre_processed.csv`.
In the case of `TelcoCustomerChurn.csv`, I see that the polluted versions of the train and test sets have been created successfully:
[DEBUG] Polluted version of train_TelcoCustomerChurn.csv with parameter has was already persisted at data/polluted/ConsistentRepresentationPolluter/42/train_TelcoCustomerChurn_c216c4f4bf04699525f3591e76182994.csv.
[DEBUG] Polluted version of test_TelcoCustomerChurn.csv with parameter has was already persisted at data/polluted/ConsistentRepresentationPolluter/42/test_TelcoCustomerChurn_c216c4f4bf04699525f3591e76182994.csv.
However, I then get the following error:
025-02-03 14:28:13,852 [INFO ] Starting experiment <class 'classification.experiments.GradientBoostingClassifierExperiment'> for scenario train_clean_test_clean and dataset TelcoCustomerChurn.csv with ConsistentRepresentationPolluter
0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/tommaso/repos/DQ4AI/classification/classification_main.py", line 149, in <module>
main()
File "/home/tommaso/repos/DQ4AI/classification/classification_main.py", line 128, in main
results = exp.run()
File "/home/tommaso/repos/DQ4AI/classification/experiments.py", line 398, in run
self.model.fit(X_train, y_train)
File "/home/tommaso/repos/DQ4AI/env/lib/python3.10/site-packages/sklearn/base.py", line 1152, in wrapper
return fit_method(estimator, *args, **kwargs)
File "/home/tommaso/repos/DQ4AI/env/lib/python3.10/site-packages/sklearn/ensemble/_gb.py", line 416, in fit
X, y = self._validate_data(
File "/home/tommaso/repos/DQ4AI/env/lib/python3.10/site-packages/sklearn/base.py", line 622, in _validate_data
X, y = check_X_y(X, y, **check_params)
File "/home/tommaso/repos/DQ4AI/env/lib/python3.10/site-packages/sklearn/utils/validation.py", line 1146, in check_X_y
X = check_array(
File "/home/tommaso/repos/DQ4AI/env/lib/python3.10/site-packages/sklearn/utils/validation.py", line 915, in check_array
array = _asarray_with_order(array, order=order, dtype=dtype, xp=xp)
File "/home/tommaso/repos/DQ4AI/env/lib/python3.10/site-packages/sklearn/utils/_array_api.py", line 380, in _asarray_with_order
array = numpy.asarray(array, order=order, dtype=dtype)
File "/home/tommaso/repos/DQ4AI/env/lib/python3.10/site-packages/pandas/core/generic.py", line 2070, in __array__
return np.asarray(self._values, dtype=dtype)
ValueError: could not convert string to float: '7181-BQYBV'
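To narrow this down, here is a minimal standard-library sketch of how one could list the columns whose values cannot be parsed as floats. Only the value `7181-BQYBV` comes from the traceback above. The column names and the other sample values are made up for illustration, and it is only my unverified guess that the value belongs to an identifier column that is neither dropped nor encoded before `fit` is called.

```python
import csv
import io

# Toy stand-in for the CSV. Only '7181-BQYBV' is taken from the traceback;
# the column names and all other values are hypothetical.
SAMPLE = """customerID,tenure,MonthlyCharges
7181-BQYBV,12,29.85
0000-AAAAA,34,56.95
"""


def non_numeric_columns(csv_text):
    """Return the names of columns holding at least one value float() rejects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    bad = []
    for row in reader:
        for name, value in row.items():
            if name in bad:
                continue  # already flagged, no need to re-check
            try:
                float(value)
            except ValueError:
                bad.append(name)
    return bad


print(non_numeric_columns(SAMPLE))
```

Any column reported by such a check would presumably have to be dropped or encoded before the data reaches the scikit-learn estimator.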
For `covid_data_pre_processed.csv`, I get the following error:
0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/tommaso/repos/DQ4AI/env/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'DIED'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/tommaso/repos/DQ4AI/classification/classification_main.py", line 149, in <module>
main()
File "/home/tommaso/repos/DQ4AI/classification/classification_main.py", line 56, in main
stratify=df[metadata[ds_name]['target']])
File "/home/tommaso/repos/DQ4AI/env/lib/python3.10/site-packages/pandas/core/frame.py", line 3807, in __getitem__
indexer = self.columns.get_loc(key)
File "/home/tommaso/repos/DQ4AI/env/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3804, in get_loc
raise KeyError(key) from err
KeyError: 'DIED'
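The `KeyError` suggests that the column the metadata names as the target (`DIED`) is not in the header of the CSV I am loading. A minimal sketch of a check for this, using only the standard library, could look as follows. The sample header below is made up for illustration and is not the actual header of my file.

```python
import csv
import io


def missing_columns(csv_text, required):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty list if the file has no rows
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Hypothetical header and row, for illustration only.
SAMPLE = "SEX,AGE,DATE_DIED\n1,45,9999-99-99\n"

print(missing_columns(SAMPLE, ["DIED"]))
```

If the target column is reported as missing, the file may be a different or differently pre-processed version than the one the metadata expects.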
Am I doing anything wrong? Any help would be appreciated, thanks!