Hello there,
It appears that, because `IterableDatasetShard` pads the last "distributed batch" with enough samples to make it divisible by the number of GPUs in use, whenever the actual dataset size is not divisible by the number of GPUs you end up with a few samples that are repeated both in the final `.json` output and in the accuracy computation. Since there are only a few of them, the accuracy is not significantly affected, but this can definitely create problems for downstream result analysis in code that asserts the correctness of dataset sizes (which is how I found this bug). I propose a rather simple fix, all in the code of `eval_mvbench.py`
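To make the duplication concrete, here is a minimal sketch of the padding behavior (an illustrative stand-in, not the actual `transformers` implementation): with 10 samples on 4 GPUs, the shard is padded to 12 by wrapping around to the start of the dataset, so two samples get evaluated twice.

```python
def shard_with_padding(dataset, num_gpus):
    """Round the dataset up to a multiple of num_gpus by wrapping around,
    then give worker `rank` every num_gpus-th sample (as sharding does)."""
    total = len(dataset)
    padded_total = ((total + num_gpus - 1) // num_gpus) * num_gpus
    padded = [dataset[i % total] for i in range(padded_total)]
    return [padded[rank::num_gpus] for rank in range(num_gpus)]

shards = shard_with_padding(list(range(10)), 4)  # 10 samples, 4 GPUs
merged = [s for shard in shards for s in shard]
print(sorted(merged))  # samples 0 and 1 appear twice: 12 results for 10 samples
```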
:

1. change `__iter__` of `EvalDataset` so that it also yields each sample's index;
2. change the main loop to `for sample_idx, line in tqdm(shard_dataset):`;
3. deduplicate the collected results by sample index into a `deduped_output`, and after that point always operate on `deduped_output` instead (i.e. for dumping to the JSON and computing the accuracy).