Description
Technical issues in Tox21MolNet:
Issue 1: Missing group key
I've encountered an issue with the setup_processed method when working with the Tox21MolNet and its data (tox21.csv file). It appears that the file does not include a header or key named "group", which is causing a KeyError in the line:
groups = np.array([d["group"] for d in data])
Additionally, the _load_data_from_file method does not seem to use any Reader to create or handle a "group" key in the data. As a result, the "group" key does not exist in the dictionaries produced by _load_data_from_file, leading to the observed error.
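The failure can be reproduced in isolation. The following is a minimal sketch, assuming records shaped like the loader's output (features, labels, ident, and no "group"). The sample values and the collect_groups helper are hypothetical and not part of the project; the tolerant lookup is shown only as one possible workaround, not as the intended fix. np.array is left out so the sketch needs only the standard library.

```python
# Hypothetical records shaped like the loader's output: note there is no "group" key.
data = [
    dict(features="CCO", labels=[0, 1], ident="TOX1"),
    dict(features="c1ccccc1", labels=[1, 0], ident="TOX2"),
]


def collect_groups(records):
    """Hypothetical helper: return each record's group, or None where the key is absent."""
    return [d.get("group") for d in records]


# Same access pattern as in setup_processed: fails on the first record.
try:
    groups = [d["group"] for d in data]
except KeyError as err:
    missing_key = err.args[0]
    print(f"KeyError: {missing_key!r}")

# Tolerant lookup: lets the caller detect that no group information exists.
groups = collect_groups(data)
has_groups = any(g is not None for g in groups)
print(groups, has_groups)
```

Whether setup_processed should fall back to an ungrouped split when has_groups is false, or whether the loader should supply a "group" value, is a design decision for the maintainers.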
The _load_data_from_file method only yields three keys: features, labels, and ident:
yield dict(features=smiles, labels=labels, ident=row["mol_id"])
Issue 2: Generator Issue with train_test_split
Another issue arises from the use of a generator in the _load_data_from_file method. The generator object cannot be directly passed to train_test_split, as it expects a collection (e.g., a list or array). This causes the following error:
TypeError: Singleton array array(<generator object Tox21MolNet._load_data_from_file at 0x000001FD068AB1B0>,
dtype=object) cannot be considered a valid collection.
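The likely root cause is that splitting utilities need a sized, indexable collection, while a generator has no length and can be consumed only once. The sketch below illustrates this with the standard library only; it does not call train_test_split itself, and load_rows is a hypothetical stand-in for _load_data_from_file.

```python
def load_rows():
    """Hypothetical stand-in for _load_data_from_file: yields one dict per row."""
    for i in range(4):
        yield dict(features=f"smiles_{i}", labels=[i % 2], ident=f"id_{i}")


rows = load_rows()

# A generator defines no __len__, so it cannot be treated as a collection.
try:
    len(rows)
    generator_has_len = True
except TypeError:
    generator_has_len = False

# Materializing the generator once gives a list that is sized and indexable.
data = list(load_rows())
n_rows = len(data)
first_ident = data[0]["ident"]
print(generator_has_len, n_rows, first_ident)
```

Materializing the rows holds the whole dataset in memory, which should be acceptable for a CSV file that is already split in memory afterwards.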
Solution: To fix this, the generator output should be converted to a list before using it for splitting:
data = list(self._load_data_from_file(os.path.join(self.raw_dir, "tox21.csv")))
Tests
- Tox21MolNet:
- Write unit tests for setup_processed() with mock data.
  - Check that the output format is correct: the collator expects a dict with features, labels, and ident keys, and features must be convertible to a tensor.
- Write unit tests for _load_data_from_file() using mock file operations.
- Write unit tests for