number of entries in vggsound.csv do not match the test and train split files #19

carandraug · 2024-03-27T15:41:28Z

The vggsound.csv file lists 199467 entries. That number does not match the sum of the test and train files:

$ wc -l data/train.csv data/test.csv 
 183730 data/train.csv
  15446 data/test.csv
 199176 total
$ wc -l data/vggsound.csv 
199467 data/vggsound.csv

The vggsound.csv file has 291 extra entries. The extra entries are spread across both the train and test splits:

$ python3 -c 'import csv; [print(x[3]) for x in csv.reader(open("data/vggsound.csv"))]' | sort | uniq -c
  15496 test
 183971 train

I happen to have a copy of the file vggsound.csv as downloaded from the VGG website and these numbers matched.
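
The per-split arithmetic behind the 291 figure can be checked with a short Python sketch. It only uses the counts reported above; the helper names `split_counts` and `surplus` are made up for illustration, and the only layout assumption is the one the one-liner already makes (the split label is the fourth column).

```python
import csv
from collections import Counter


def split_counts(rows):
    """Count rows per split label, assumed to sit in the 4th column (index 3)."""
    return Counter(row[3] for row in rows)


def surplus(listed, expected):
    """Per-split difference between what the master file lists and the split files."""
    return {split: listed[split] - expected.get(split, 0) for split in listed}


# Counts reported above: per-split rows in vggsound.csv vs. `wc -l` on the split files.
listed = {"train": 183971, "test": 15496}
expected = {"train": 183730, "test": 15446}

extra = surplus(listed, expected)
print(extra)                # {'train': 241, 'test': 50}
print(sum(extra.values()))  # 291

# To recompute `listed` from a local copy (path is an assumption):
# with open("data/vggsound.csv", newline="") as f:
#     listed = split_counts(csv.reader(f))
```

So the surplus is 241 entries in train and 50 in test, which adds up to the 291 extra lines.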


ppx-hub · 2024-05-02T10:40:33Z

I checked the full compressed video package that the author provides here, and the total number of videos after decompression is 199,176, which matches the combined count of the train and test files. So I think vggsound.csv does list 291 extra entries.
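
One way to pin down which rows are the extra ones is a set difference between the master list and whatever is known to exist (the split files or the decompressed videos). The sketch below is only an illustration under assumptions: it treats the first two columns of vggsound.csv as a video ID and a start time, the function name `extra_rows` is hypothetical, and the sample rows are made up rather than taken from the dataset.

```python
import csv
import io


def extra_rows(master_rows, known_keys):
    """Return master rows whose (video id, start time) key is not in known_keys."""
    return [row for row in master_rows if (row[0], row[1]) not in known_keys]


# Tiny made-up sample standing in for data/vggsound.csv (IDs are hypothetical).
sample = io.StringIO(
    "vidA,10,dog barking,train\n"
    "vidB,30,cat meowing,test\n"
    "vidC,5,rain,train\n"
)
master = list(csv.reader(sample))

# Keys known to exist, however they were collected (split files, files on disk, ...).
known = {("vidA", "10"), ("vidB", "30")}

missing = extra_rows(master, known)
print(missing)  # [['vidC', '5', 'rain', 'train']]
```

How `known` gets built depends on the actual layout of train.csv, test.csv, and the video file names, none of which is shown in this thread.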
