
Minor bug in MVBench style evaluation when the dataset size isn't divisible by the number of GPUs #35

Open
geomlyd opened this issue Dec 18, 2024 · 1 comment


geomlyd commented Dec 18, 2024

Hello there,

It appears that IterableDatasetShard pads the last "distributed batch" with extra samples so that its size is divisible by the number of GPUs. As a result, whenever the actual dataset size is not divisible by the number of GPUs, a few samples end up repeated in the final .json output and in the accuracy computation. Since only a few samples are affected, the accuracy does not change significantly, but this can definitely create problems further down the line for analysis code that asserts the correctness of dataset sizes (which is how I found this bug). I propose a fairly simple fix, entirely within the code of eval_mvbench.py:
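For anyone who wants to see the duplication without spinning up multiple GPUs, here is a pure-Python simulation of the padding behavior described above (it does not use transformers; the wrap-around padding and round-robin sharding are assumptions that mimic what IterableDatasetShard does, not its actual implementation):

```python
# Simulate shard padding: with 10 samples on 4 GPUs, the tail is padded
# with samples recycled from the start of the dataset so every process
# receives the same number of items.

def shard_with_padding(data, num_processes):
    """Return per-process shards, padding the tail by wrapping around."""
    n = len(data)
    padded_len = -(-n // num_processes) * num_processes  # round up to a multiple
    padded = list(data) + [data[i % n] for i in range(padded_len - n)]
    return [padded[rank::num_processes] for rank in range(num_processes)]

shards = shard_with_padding(list(range(10)), num_processes=4)
merged = [x for shard in shards for x in shard]
# 12 items come back for a 10-item dataset: two samples are duplicated.
```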

1. Change `__iter__` of `EvalDataset` to:

```python
def __iter__(self):
    return iter(enumerate(self.data))
```

2. Change line 212 to `for sample_idx, line in tqdm(shard_dataset):`
3. Change line 357 to:

```python
output.append(
    (sample_idx, {
        "question": line["question"],
        "prompt": prompt,
        "answer": answer,
        "pred": pred_idx,
        "task_type": task_type,
        "answer_id": str(ans_id),
        "model_id": model_name,
        "video_name": video_name,
        "metadata": {},
    })
)
```
4. Before line 384, do:

```python
deduped_output = []
seen_indices = set()
for sample_idx, sample_output in all_output:
    if sample_idx in seen_indices:
        continue
    deduped_output.append(sample_output)
    seen_indices.add(sample_idx)
assert len(deduped_output) == len(dataset)
```

After that point, always operate on deduped_output instead (i.e., for dumping to the .json and computing the accuracy).
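Putting steps 1 and 4 together, here is a minimal runnable sketch of the dedup logic (the fake gathered list below stands in for the real cross-GPU gather; the variable names mirror the snippets above, and the duplicated indices are an assumed example):

```python
# Tag each sample with its dataset index via enumerate, then drop repeats
# after gathering results from all ranks.

def dedup_by_index(all_output):
    """Keep the first result seen for each dataset index."""
    deduped_output, seen_indices = [], set()
    for sample_idx, sample_output in all_output:
        if sample_idx in seen_indices:
            continue
        deduped_output.append(sample_output)
        seen_indices.add(sample_idx)
    return deduped_output

# Two padded duplicates (indices 0 and 1) sneak into the gathered output:
all_output = [(i, {"pred": i}) for i in range(10)] + [(0, {"pred": 0}), (1, {"pred": 1})]
results = dedup_by_index(all_output)
# len(results) == 10, matching the true dataset size.
```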

@fahadskhan

@geomlyd May I ask if you are able to reproduce the paper numbers of any model on MVBench?
