Hello there,
It appears that, because `IterableDatasetShard` pads the last "distributed batch" with enough samples to make it divisible by the number of GPUs in use, whenever the actual dataset size is not divisible by the number of GPUs you end up with a few samples that are repeated both in the final `.json` output and in the accuracy computation. Since there are only a few of them, the accuracy is not significantly affected, but this can definitely create problems for downstream result analysis in code that asserts the correctness of dataset sizes (which is how I found this bug). I propose a rather simple fix, all in the code of `eval_mvbench.py`
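To make the duplication concrete, here is a minimal sketch of the padding behavior (an illustrative stand-in, not the actual `transformers` implementation): with 10 samples on 4 GPUs, the shard is padded to 12 by wrapping around to the start of the dataset, so two samples get evaluated twice.

```python
def shard_with_padding(dataset, num_gpus):
    """Round the dataset up to a multiple of num_gpus by wrapping around,
    then give worker `rank` every num_gpus-th sample (as sharding does)."""
    total = len(dataset)
    padded_total = ((total + num_gpus - 1) // num_gpus) * num_gpus
    padded = [dataset[i % total] for i in range(padded_total)]
    return [padded[rank::num_gpus] for rank in range(num_gpus)]

shards = shard_with_padding(list(range(10)), 4)  # 10 samples, 4 GPUs
merged = [s for shard in shards for s in shard]
print(sorted(merged))  # samples 0 and 1 appear twice: 12 results for 10 samples
```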
:

1. change `__iter__` of `EvalDataset` so that it also yields each sample's index;
2. change the main loop to `for sample_idx, line in tqdm(shard_dataset):`;
3. deduplicate the collected results by sample index into a `deduped_output`, and after that point always operate on `deduped_output` instead (i.e. for dumping to the JSON and computing the accuracy).