
nxtAIM/BDD-data-loading-benchmark


BDD Dataset Benchmark

This repository contains the code used to benchmark data loading on the BDD dataset. We compare four ways of accessing the data: direct file system access with torchcodec, h5py, PyArrow, and NVIDIA DALI.

Data Preprocessing Steps

  1. Weather and Time Filtering:
    We select only videos recorded during daytime and under clear weather conditions.

  2. Frame Extraction:
    From each selected video, every 5th frame is extracted.

  3. Data Shuffling:
    The extracted frames are shuffled randomly.

  4. Sliding Window Approach:
    A sliding window is used to return 4 frames per iteration, ensuring no frame repetition within a single epoch.
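The four preprocessing steps can be sketched in plain Python as below. This is a minimal illustration, not the repository's code: the video record layout, the function names, and the "weather"/"timeofday" attribute keys (modeled on the BDD100K label files) are assumptions. The steps are applied in the order listed, and "no frame repetition" is read as a window whose stride equals its size, with a trailing group of fewer than 4 frames dropped. The README does not say whether windows are built from consecutive frames of a single video, so the sketch does not try to preserve temporal order.

```python
import random
from typing import Dict, Iterator, List, Sequence, Tuple

# Hypothetical record layout: (video_id, attributes, num_frames).
# The "weather"/"timeofday" keys and the "clear"/"daytime" values are assumed
# from the BDD100K label format, not taken from this repository.
Video = Tuple[str, Dict[str, str], int]
Frame = Tuple[str, int]  # (video_id, frame_index)

FRAME_STRIDE = 5  # step 2: keep every 5th frame
WINDOW_SIZE = 4   # step 4: frames returned per iteration


def filter_videos(videos: Sequence[Video]) -> List[Video]:
    """Step 1: keep only daytime videos recorded in clear weather."""
    return [
        v for v in videos
        if v[1].get("timeofday") == "daytime" and v[1].get("weather") == "clear"
    ]


def extract_frames(videos: Sequence[Video]) -> List[Frame]:
    """Step 2: list every 5th frame index of each selected video."""
    return [
        (video_id, idx)
        for video_id, _, num_frames in videos
        for idx in range(0, num_frames, FRAME_STRIDE)
    ]


def iter_windows(frames: Sequence[Frame], seed: int = 0) -> Iterator[List[Frame]]:
    """Steps 3-4: shuffle the frames, then yield windows of 4.

    The window advances by WINDOW_SIZE, so no frame appears twice in an
    epoch; a trailing group of fewer than 4 frames is dropped.
    """
    shuffled = list(frames)
    random.Random(seed).shuffle(shuffled)
    for start in range(0, len(shuffled) - WINDOW_SIZE + 1, WINDOW_SIZE):
        yield shuffled[start:start + WINDOW_SIZE]
```

For example, a daytime, clear-weather video with 60 frames contributes 12 sampled frames (indices 0, 5, ..., 55), which form 3 windows.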

After preprocessing, the dataset is reduced from 100k videos to around 2k videos, for a total of 396,839 frames.
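As a rough worked example, assuming each sample is 4 frames with no frame reused in an epoch and leftover frames dropped, the frame count bounds the number of samples per epoch. It is an upper bound because windows formed within individual videos would drop more leftovers; the per-GPU line further assumes an even split across the 4 GPUs used for benchmarking.

```latex
\begin{aligned}
N_{\text{win}} &\le \left\lfloor \frac{396{,}839}{4} \right\rfloor = 99{,}209
  \qquad (4 \times 99{,}209 = 396{,}836,\ \text{3 frames left over}) \\
N_{\text{win/GPU}} &\le \left\lfloor \frac{99{,}209}{4} \right\rfloor = 24{,}802
\end{aligned}
```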

Libraries Used

  - torchcodec (direct file system access to the videos)
  - h5py
  - PyArrow
  - NVIDIA DALI

Benchmarking

The benchmarks run on 4 GPUs in parallel.
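With 4 GPUs, each process typically reads a disjoint part of the data. The sketch below assumes one process per GPU and an interleaved split, similar in spirit to PyTorch's `DistributedSampler` with `drop_last=True`, plus a simple frames-per-second timer. `shard` and `measure_throughput` are hypothetical helpers for illustration, not functions from this repository, and the actual benchmark may partition and time the data differently.

```python
import time
from typing import Iterable, List, Sequence, TypeVar

T = TypeVar("T")

WORLD_SIZE = 4  # assumed: one process per GPU, matching the 4-GPU setup


def shard(samples: Sequence[T], rank: int, world_size: int = WORLD_SIZE) -> List[T]:
    """Hypothetical helper: return this rank's disjoint, interleaved share.

    Samples left over after an even split are dropped so that every rank runs
    the same number of iterations.
    """
    if not 0 <= rank < world_size:
        raise ValueError(f"rank must be in [0, {world_size})")
    usable = len(samples) - len(samples) % world_size
    return list(samples[rank:usable:world_size])


def measure_throughput(loader: Iterable[Sequence[T]]) -> float:
    """Hypothetical helper: consume a loader once and return frames per second.

    Each item yielded by the loader is one sample (e.g. a window of frames),
    and its length is counted as the number of frames it contains.
    """
    start = time.perf_counter()
    num_frames = 0
    for sample in loader:
        num_frames += len(sample)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against zero
    return num_frames / elapsed
```

In a real run, each process would pass its own rank (for example from `torch.distributed.get_rank()`) and time a loader built over `shard(windows, rank)`.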

Setup

To set up the environment, clone the repository and run:

bash env/setup.sh

Results

The benchmarking results are available at this link.

Note that PyArrow results are not included, because loading the entire dataset with PyArrow took several hours.
