BDD Dataset Benchmark

This repository contains the code used to benchmark data loading on the BDD (Berkeley DeepDrive) dataset. We compare four access methods: direct file system access with TorchCodec, HDF5 files via h5py, PyArrow, and NVIDIA DALI.

Data Preprocessing Steps

Weather and Time Filtering:
We select only videos recorded during daytime and under clear weather conditions.
Frame Extraction:
From each selected video, every 5th frame is extracted.
Data Shuffling:
The extracted frames are shuffled randomly.
Sliding Window Approach:
A non-overlapping sliding window returns 4 frames per iteration, so no frame is repeated within a single epoch.

After preprocessing, the dataset shrinks from 100k videos to around 2k videos, for a total of 396,839 frames.
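The preprocessing steps above can be sketched in Python as follows. This is a minimal illustration, not the repository's code: the `VideoMeta` record and its fields are hypothetical, the `timeofday`/`weather` values mirror BDD100K label attributes but are assumptions here, and the literal reading of the last two steps (shuffle individual frames, then cut the shuffled list into non-overlapping windows of 4) is also an assumption.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

FrameId = Tuple[str, int]  # (video name, frame index)


@dataclass
class VideoMeta:
    """Hypothetical per-video metadata record (field names are assumptions)."""
    name: str
    timeofday: str
    weather: str
    num_frames: int


def filter_videos(videos: List[VideoMeta]) -> List[VideoMeta]:
    """Keep only videos recorded during daytime in clear weather."""
    return [v for v in videos if v.timeofday == "daytime" and v.weather == "clear"]


def extract_frames(videos: List[VideoMeta], step: int = 5) -> List[FrameId]:
    """Select every `step`-th frame of each video, starting at frame 0."""
    return [(v.name, i) for v in videos for i in range(0, v.num_frames, step)]


def make_windows(frames: List[FrameId], size: int = 4, seed: int = 0) -> List[List[FrameId]]:
    """Shuffle the frames, then cut them into non-overlapping windows of `size`.

    The stride equals the window size, so no frame appears twice in one epoch;
    leftover frames that cannot fill a full window are dropped.
    """
    shuffled = list(frames)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + size] for i in range(0, len(shuffled) - size + 1, size)]


if __name__ == "__main__":
    videos = [
        VideoMeta("a", "daytime", "clear", 20),
        VideoMeta("b", "night", "clear", 20),
        VideoMeta("c", "daytime", "clear", 12),
    ]
    frames = extract_frames(filter_videos(videos))
    windows = make_windows(frames)
    print(len(frames), len(windows))  # 7 frames -> 1 full window of 4
```

Under the same non-overlapping assumption, 396,839 frames would yield 396,839 // 4 = 99,209 windows per epoch, with 3 frames left over.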

Libraries Used

File System Access with TorchCodec: see the torchcodec GitHub repository
HDF5 File Handling with h5py: see the h5py documentation
Data Loading with NVIDIA DALI: see the DALI PyTorch example
Columnar Data Format with PyArrow: see the PyArrow documentation

Benchmarking

All benchmarks are run on 4 GPUs in parallel.
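How the work is split across the 4 GPUs is not described here. One common pattern, similar in spirit to PyTorch's `torch.utils.data.DistributedSampler`, gives each GPU (rank) a disjoint, strided slice of the sample list. The sketch below assumes that pattern; `shard_for_rank` is a hypothetical helper, and unlike `DistributedSampler` (which pads by default) it drops the remainder so every rank runs the same number of steps.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shard_for_rank(items: Sequence[T], rank: int, world_size: int) -> List[T]:
    """Return the strided slice of `items` assigned to `rank`.

    The tail is trimmed to a multiple of `world_size`, so every rank gets
    exactly len(items) // world_size items and no item is shared between ranks.
    """
    if not 0 <= rank < world_size:
        raise ValueError(f"rank must be in [0, {world_size}), got {rank}")
    usable = (len(items) // world_size) * world_size
    return list(items[rank:usable:world_size])


if __name__ == "__main__":
    windows = list(range(10))  # stand-ins for 10 four-frame windows
    for rank in range(4):
        # Windows 8 and 9 are dropped so all 4 ranks get 2 windows each.
        print(rank, shard_for_rank(windows, rank, world_size=4))
```

For example, with 99,209 windows (396,839 frames in non-overlapping groups of 4, an assumption), each of the 4 ranks would process 99,209 // 4 = 24,802 windows per epoch.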

Setup

To set up the environment, clone the repository and run:

bash env/setup.sh

Results

The benchmarking results are available at this link.

Results for PyArrow are not included because loading the entire dataset with PyArrow took several hours.
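Loaders are easiest to compare by throughput (frames per second) over one full pass. The repository's exact timing method is not described here; the helper below is a minimal sketch that assumes each loader is a Python iterable yielding batches whose `len()` equals the number of frames in the batch. `measure_throughput` is a hypothetical name.

```python
import time
from typing import Iterable, Sized, Tuple


def measure_throughput(loader: Iterable[Sized]) -> Tuple[int, float, float]:
    """Iterate over `loader` once; return (frames, seconds, frames_per_second)."""
    frames = 0
    start = time.perf_counter()
    for batch in loader:
        frames += len(batch)  # assumes len(batch) == number of frames in the batch
    elapsed = time.perf_counter() - start
    fps = frames / elapsed if elapsed > 0 else float("inf")
    return frames, elapsed, fps


if __name__ == "__main__":
    # Stand-in loader: 1,000 batches of 4 dummy frames each.
    fake_loader = ([0] * 4 for _ in range(1000))
    n, secs, fps = measure_throughput(fake_loader)
    print(f"{n} frames in {secs:.4f} s ({fps:.0f} frames/s)")
```

For GPU-side loaders such as DALI, which can return data asynchronously, calling a device synchronization (for example `torch.cuda.synchronize()`) before reading the clock would make the measurement more faithful. In a multi-GPU run, each rank would time its own shard.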

Repository Files

env/
README.md
bdd_arrow_loading.py
bdd_dali_loading.py
bdd_file_access_loading.py
bdd_h5_loading.py
run_bdd_dali_loading.sh
run_bdd_file_access_loading.sh
run_bdd_h5_loading.sh

Repository: nxtAIM/BDD-data-loading-benchmark