After dataset generation, I expect the output to contain a total of 32 TFRecord files in train and 8 in val.
But in reality there are only 25 train shards for some reason, and some shards are missing (see below). What could be the cause?
"splits": [
  {
    "filepathTemplate": "{DATASET}-{SPLIT}.{FILEFORMAT}-{SHARD_X_OF_Y}",
    "name": "train",
    "numBytes": "3124634980",
    "shardLengths": [
      "49", "65", "51", "71", "70", "54", "67", "66", "61", "63",
      "56", "53", "60", "61", "71", "62", "68", "71", "58", "68",
      "64", "71", "71", "70", "58"
    ]
  },
  {
    "filepathTemplate": "{DATASET}-{SPLIT}.{FILEFORMAT}-{SHARD_X_OF_Y}",
    "name": "val",
    "numBytes": "758231159",
    "shardLengths": [
      "72", "67", "58", "72", "82", "68", "70", "61"
    ]
  }
dataset folder: (missing some shards)

The "splits" excerpt above is from dataset_info.json.
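
To make the mismatch concrete, a check along these lines could compare the shard count recorded in dataset_info.json with the shard files actually present in the dataset folder. This is only a rough sketch: the function name `check_shards` is made up for illustration, and it assumes that dataset_info.json sits in the same folder as the shard files and that the `{SHARD_X_OF_Y}` part of each filename looks like `-00003-of-00032`.

```python
import json
import re
from pathlib import Path

# Assumption: the {SHARD_X_OF_Y} suffix is "-XXXXX-of-YYYYY" (five digits each).
SHARD_RE = re.compile(r"-(\d{5})-of-(\d{5})$")


def check_shards(dataset_dir):
    """Compare shards listed in dataset_info.json with files on disk.

    Hypothetical helper, not part of any library.
    """
    dataset_dir = Path(dataset_dir)
    info = json.loads((dataset_dir / "dataset_info.json").read_text())
    report = {}
    for split in info.get("splits", []):
        name = split["name"]
        # Number of shards the metadata claims for this split.
        declared = len(split.get("shardLengths", []))
        found = set()   # shard indices seen in filenames
        totals = set()  # the "of-YYYYY" totals seen in filenames
        for path in dataset_dir.iterdir():
            match = SHARD_RE.search(path.name)
            # Assumption: filenames contain "-<split>." before the format.
            if match and f"-{name}." in path.name:
                found.add(int(match.group(1)))
                totals.add(int(match.group(2)))
        report[name] = {
            "declared": declared,
            "on_disk": len(found),
            "totals_in_names": sorted(totals),
            # Indices below the declared count that have no file.
            "missing": sorted(set(range(declared)) - found),
        }
    return report
```

Calling `check_shards("path/to/dataset_folder")` (placeholder path) returns, per split, the declared shard count, the number of shard files found, the totals embedded in the filenames, and the indices with no file. If `declared` is 25 while the filenames say `of-00032`, the metadata and the files probably come from different generation runs.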