number of entries in vggsound.csv do not match the test and train split files #19

Open
carandraug opened this issue Mar 27, 2024 · 1 comment


The file vggsound.csv lists 199467 entries. That number does not match the sum of the entries in the test and train files. See

$ wc -l data/train.csv data/test.csv 
 183730 data/train.csv
  15446 data/test.csv
 199176 total
$ wc -l data/vggsound.csv 
199467 data/vggsound.csv

The vggsound.csv file has an extra 291 entries (199467 - 199176). The extra entries are in both the train and test splits:

$ python3 -c 'import csv; [print(x[3]) for x in csv.reader(open("data/vggsound.csv"))]' | sort | uniq -c
  15496 test
 183971 train
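
The per-split excess can also be computed directly instead of comparing the numbers by eye. A minimal Python sketch, assuming the split label sits in the fourth column of vggsound.csv (as the one-liner above implies) and that train.csv and test.csv hold one entry per line with no header row; the function names are hypothetical:

```python
import csv
from collections import Counter


def count_by_split(combined_path, split_col=3):
    """Count rows of the combined CSV per split label (assumed column index 3)."""
    with open(combined_path, newline="") as f:
        return Counter(row[split_col] for row in csv.reader(f) if row)


def count_rows(path):
    """Count non-empty CSV rows; assumes the file has no header line."""
    with open(path, newline="") as f:
        return sum(1 for row in csv.reader(f) if row)


def excess_per_split(combined_path, split_paths):
    """Return {split: rows in the combined file minus rows in that split's file}."""
    combined = count_by_split(combined_path)
    return {
        name: combined[name] - count_rows(path)
        for name, path in split_paths.items()
    }


# Example call (paths as used in this issue):
# excess_per_split("data/vggsound.csv",
#                  {"train": "data/train.csv", "test": "data/test.csv"})
```

If those assumptions hold, the counts reported here would give 183971 - 183730 = 241 extra train entries and 15496 - 15446 = 50 extra test entries, which together account for the 291.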

I happen to have a copy of the file vggsound.csv as downloaded from the VGG website and these numbers matched.

ppx-hub commented May 2, 2024

I checked the full compressed video archive provided by the author here, and the total number of videos after decompression is 199,176, which is consistent with the combined number of entries in the train and test files. I think vggsound.csv does list an extra 291 entries.
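
To see which entries are the extra ones rather than only how many there are, the combined file could be compared against the split files row by row. A sketch, assuming every row in all three files starts with the same two identifying columns (video ID and start time); the layout of train.csv and test.csv is not shown in this thread, so the hypothetical `row_key` helper would need adjusting to the real format:

```python
import csv


def row_key(row):
    """Hypothetical key: the first two columns (assumed video ID and start time)."""
    return (row[0].strip(), row[1].strip())


def load_keys(path):
    """Collect the keys of all non-empty rows in one CSV file."""
    with open(path, newline="") as f:
        return {row_key(row) for row in csv.reader(f) if row}


def extra_rows(combined_path, split_paths):
    """Return rows of the combined CSV whose key is absent from every split file."""
    known = set()
    for path in split_paths:
        known |= load_keys(path)
    with open(combined_path, newline="") as f:
        return [row for row in csv.reader(f) if row and row_key(row) not in known]


# Example call (paths as used in this issue):
# extra_rows("data/vggsound.csv", ["data/train.csv", "data/test.csv"])
```

Note that this set difference only finds keys missing from the split files. If the surplus came from duplicated rows in vggsound.csv instead, it would return nothing, and counting keys (for example with `collections.Counter`) would be the check to run.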
