PyTorch implementation of the YOLO object detection algorithm from scratch, trained on the PASCAL VOC dataset to detect and localize objects in images in real-time.

YOLOv1 from Scratch in PyTorch

This repository contains a complete YOLOv1 (You Only Look Once) object detection model implemented from scratch using PyTorch. It follows the original YOLOv1 paper and is trained on a subset of the PASCAL VOC 2005 dataset.

YOLO Model Architecture


Sample Predictions (Trained for 50 Epochs with a ResNet50 Backbone)

[Sample prediction images showing motorbike, bicycle, person, and car detections]

Abstract

Object detection is a cornerstone of computer vision, and YOLO stands out for its real-time performance. In this project, I implement YOLOv1 from scratch in PyTorch to understand its core mechanics and architecture. The model uses a ResNet50 backbone with a custom YOLO loss function, and is trained and evaluated on the PASCAL VOC 2005 dataset.


Requirements

Install dependencies with:

pip install -r requirements.txt

Dataset

  • PASCAL VOC 2005 Dataset: Download here

  • Supported Classes:

    • bicycle
    • car
    • motorbike
    • person

Model Architecture

Base Network

  • ResNet50 pretrained on ImageNet is used as the feature extractor.
  • Fully connected layers are removed.
  • Parameters are frozen during training to leverage transfer learning.

YOLO Head

  • Custom convolutional layers are added on top of the ResNet backbone.

  • Final output is reshaped to predict:

    • Bounding box coordinates
    • Confidence score
    • Class probabilities per grid cell, in a format compatible with YOLOv1.

Data Processing Pipeline

  • Images are resized to 224×224.
  • Annotations are parsed from .txt files using a custom preprocess_txt function.
  • Output labels are encoded into YOLOv1’s S×S grid format using a generate_output function.
  • Normalization and augmentation are applied using torchvision.transforms.
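
A minimal sketch of what a generate_output-style encoder might look like, assuming normalized (x, y, w, h) box centers, a target tensor of shape S×S×(B·5 + C) with box channels first and a one-hot class vector last, and at most one object per cell as in YOLOv1. The actual function in the repository may use a different channel layout:

```python
import torch

def generate_output(boxes, labels, S=7, B=2, C=4):
    """Encode ground-truth boxes into a YOLOv1 S×S target grid (sketch).

    boxes:  list of (x, y, w, h), centers and sizes normalized to [0, 1]
    labels: list of class indices in [0, C)
    """
    target = torch.zeros(S, S, B * 5 + C)
    for (x, y, w, h), cls in zip(boxes, labels):
        # Grid cell containing the box center (clamped to the last cell).
        i, j = min(int(y * S), S - 1), min(int(x * S), S - 1)
        if target[i, j, 4] == 1:
            continue  # YOLOv1 assigns at most one object per cell
        # Center offset within the cell, in cell-relative coordinates.
        x_cell, y_cell = x * S - j, y * S - i
        target[i, j, 0:5] = torch.tensor([x_cell, y_cell, w, h, 1.0])
        target[i, j, B * 5 + cls] = 1.0  # one-hot class
    return target
```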

Loss Function

A custom loss function is implemented following the YOLOv1 paper, combining four components:

  • Localization Loss: Penalizes bounding box prediction errors.
  • Confidence Loss (Object): Penalizes confidence score errors when an object is present.
  • Confidence Loss (No Object): Penalizes false positives.
  • Classification Loss: Ensures correct class prediction.
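
A simplified sketch of how these four terms combine, assuming one predicted box per cell (5 + C channels), sum-squared error throughout, and the paper's weights λ_coord = 5 and λ_noobj = 0.5. The repository's actual loss additionally handles B boxes per cell with best-box (highest-IoU) matching:

```python
import torch
import torch.nn.functional as F

def yolo_loss(pred, target, l_coord=5.0, l_noobj=0.5):
    """YOLOv1-style loss sketch for tensors of shape (N, S, S, 5 + C)."""
    obj = target[..., 4:5]   # 1 where a cell contains an object
    noobj = 1.0 - obj
    # Localization: x, y plus sqrt(w), sqrt(h) errors in object cells.
    loc = F.mse_loss(obj * pred[..., 0:2], obj * target[..., 0:2],
                     reduction="sum")
    loc += F.mse_loss(obj * pred[..., 2:4].clamp(min=0).sqrt(),
                      obj * target[..., 2:4].sqrt(), reduction="sum")
    # Confidence: separate object / no-object terms.
    conf_obj = F.mse_loss(obj * pred[..., 4:5], obj, reduction="sum")
    conf_noobj = F.mse_loss(noobj * pred[..., 4:5],
                            torch.zeros_like(obj), reduction="sum")
    # Classification in object cells only.
    cls = F.mse_loss(obj * pred[..., 5:], obj * target[..., 5:],
                     reduction="sum")
    return l_coord * loc + conf_obj + l_noobj * conf_noobj + cls
```

Taking square roots of width and height, as in the paper, makes the same absolute size error count for more in small boxes than in large ones.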

Training Details

  • Backbone: ResNet50 (pretrained)
  • Epochs: 50
  • Batch Size: 32
  • Optimizer: Adam
  • Final Loss: 6.9583
  • Average Inference Time: < 300 ms/image
  • Hardware Used: NVIDIA RTX 4060 GPU

Evaluation

Metric Used: IoU (Intersection over Union)

$$ IoU = \frac{\text{Area of Overlap}}{\text{Area of Union}} $$
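
For axis-aligned boxes in (x1, y1, x2, y2) corner format, this metric can be computed directly:

```python
def iou(box1, box2):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2) corners."""
    # Corners of the intersection rectangle.
    x1 = max(box1[0], box2[0])
    y1 = max(box1[1], box2[1])
    x2 = min(box1[2], box2[2])
    y2 = min(box1[3], box2[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    return inter / (area1 + area2 - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1 / 7 ≈ 0.1429
```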

Results

  • Overall IoU: 58%
  • Best performance: Motorbike, Bicycle
  • Worst performance: Car (due to limited car samples in the dataset)

References

