Skip to content

basaanithanaveenkumar/object-detection-BBD

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

37 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Object Detection on BDD100K Dataset

Dataset Download

This project focuses on object detection using the Berkeley DeepDrive (BDD100K) dataset, featuring analysis of class distributions, occlusion patterns, and model development.

🛠️ Setup

Open In Colab

# clone 
git clone https://github.com/basaanithanaveenkumar/object-detection-BBD.git
cd object-detection-BBD

Docker Build and run

sudo docker build -t object-detection .
#run the docker
sudo docker run -it --rm --gpus all object-detection /bin/bash

Dataset Preparation

# Download and extract dataset
mkdir -p data
python scripts/download_dataset.py

Organize directory structure

mv data/100k/val data/100k/valid

Project Overview

Installation

Install the required dependencies using:

pip install -r requirements.txt

Class-wise Statistics

To get class-wise statistics, run the following script:

python scripts/get_stats_classwise.py

scene-wise Statistics

To get scene-wise statistics, run the following script:

python scripts/get_stats_scenewise.py

Class distribution Statistics

To get Class distribution, run the following script:

python scripts/get_stats_dist.py

training distribution

Train dataset distribution

validation distribution

validation dataset distribution

Training dataset Analysis

Train dataset Analusis

Use Case Specific Classes (10 Classes)

Class Count
Car 714,121
Traffic Sign 239,961
Traffic Light 186,301
Person 91,435
Truck 30,012
Bus 11,688
Bike 7,227
Rider 4,522
Motor 3,002
Train 136

All Classes Distribution

Class Count
Car 714,121
Lane/Single White 247,108
Traffic Sign 239,961
Traffic Light 186,301
Lane/Road Curb 109,868
Lane/Crosswalk 108,284
Person 91,435
Area/Drivable 64,050
Area/Alternative 61,799
Lane/Double Yellow 37,519
Truck 30,012
Lane/Single Yellow 20,220
Bus 11,688
Bike 7,227
Lane/Double White 5,674
Rider 4,522
Motor 3,002
Lane/Single Other 249
Train 136

Occlusion Analysis

An analysis was performed on all classes with respect to the occluded attribute. Results show that nearly half of the objects are occluded.

Occlusion Statistics

Occlusion Status Count
False 678,981
True 609,424

⚠️ Observation: A significant portion of the dataset contains occluded objects, which may impact detection accuracy and should be considered during model training and evaluation.

Additional training Dataset Analysis

Statistics on BBOX size and aspect ratio

To get BBOX distribution statstics, run the following script:

python scripts/get_additional_analysis.py

Bounding Box Size Distribution

Size Category Percentage Interpretation
Small (<2% area) 92.2% Dominates dataset - consider higher resolution or small object detection techniques
Medium (2-10%) 6.3% Underrepresented - may need augmentation
Large (>10%) 1.5% Very rare - ensure model can handle context

Recommendations:

  • Implement multi-scale training
  • Use Feature Pyramid Networks (FPN)

Bounding Box Aspect Ratios

Aspect Ratio Percentage Interpretation
Tall (<0.5) 12.3% Vertical objects present - adjust anchor boxes
**Square & rectangle ** (0.5-2) 76.9% Majority class - standard anchors should work well
Wide (≥2) 10.8% Horizontal objects - may need custom anchors

Scene-wise Statistics

To generate scene-wise statistics, run the following script:

python scripts/get_stats_sceneiwise.py

convert the dataset into microsoft coco

To train the model using RF-DETR, it need to be converted into coco:

python scripts/convert_to_coco.py

script to visualize the GT based on ID (COCO based)

To visualize the bboxes, run the following script:

# please modify your script according to your needs
python scripts/visualize_image.py 

sample

Training using RF-DETR

To train the model using RF-DETR, run the following script:

python scripts/train_rfdetr.py

Visualizations & Key Concepts

1. Vision Transformer (ViT) Adaptations

Modified Architecture

ViT

Modified ViT
Description: Custom ViT architecture with hierarchical features through global and windowed attention

Patch Projection

ViT Projection

2. DINO (Self-Supervised Learning Framework)

DINO
Description: Architecture of the DINO self-supervised learning framework, which uses knowledge distillation with a teacher-student network to learn robust visual representations without labeled data.


3. Deformable Convolution

Standard vs. Deformable Comparison

Deformable Convolution
Description: Comparison between standard convolution (fixed grid) and deformable convolution (adaptive sampling locations). Enhances CNNs for irregular object shapes.

Deformable Convolution in Action

Deformable Conv Visualization
Description: Visualization of deformable convolution offsets dynamically adjusting to object geometry.


5. Deformable DETR (Object Detection)

Deformable-DETR
Description: Combines deformable convolutions with Transformer attention for efficient object detection, reducing training complexity of vanilla DETR.


5. GIoU (Generalized Intersection over Union)

GIoU
Description: Improves bounding box regression by accounting for both overlap and enclosure, addressing limitations of standard IoU in non-overlapping cases.


Model Evaluation Results

Below are the key evaluation metrics and visualizations from from RF-DETR model:

1. Confusion Matrix

Confusion Matrix
Description: The confusion matrix shows the model's classification performance across different classes. High diagonal values indicate correct predictions, while off-diagonal values represent misclassifications.

2. Detection Grid

Detection Grid

3. Predictions vs. Annotations

Detections vs. Annotations
Description: Side-by-side overlay of predicted bounding boxes (red) and ground truth (green).

4. mAP Scores

mAP Scores
Description: Mean Average Precision (mAP) across IoU thresholds. Higher mAP (close to 1.0) indicates better detection accuracy.

Notes

Model training
sampling techniques to balance dataset
   --- as the dataset is imbalanced
tweak the focal loss
data aggumentation
   -- 
Hyper parameter tunning
change the backbone
bapatite matching based architecture

Key metrics

Mean Average Precision (mAP) - Standard for COCO/PASCAL VOC

Precision-Recall Curves - Trade-off analysis

F1 Score - Balanced precision/recall

Inference Speed (FPS) - Critical for real-time applications (self-driving)

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors