YOLOv1 from Scratch in PyTorch

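This repository contains a complete YOLOv1 (You Only Look Once) object detection model implemented from scratch in PyTorch. It follows the original YOLOv1 paper's grid-based detection formulation and loss function, uses a pretrained ResNet50 backbone in place of the paper's custom convolutional network, and is trained on a subset of the PASCAL VOC 2005 dataset.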

Sample Predictions (ResNet50 Backbone, Trained for 50 Epochs)

Abstract

Object detection is a cornerstone of computer vision, and YOLO stands out for its real-time performance. In this project, I implement YOLOv1 from scratch in PyTorch to understand its core mechanics and architecture. The model uses a pretrained ResNet50 backbone and a custom YOLO loss function, and it is trained and evaluated on the PASCAL VOC 2005 dataset.

Requirements

Install dependencies with:

pip install -r requirements.txt

Dataset

PASCAL VOC 2005 Dataset: Download here
Supported Classes:
- bicycle
- car
- motorbike
- person

Model Architecture

Base Network

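ResNet50 pretrained on ImageNet is used as the feature extractor.
The fully connected classification layer is removed, so the backbone outputs a spatial feature map instead of ImageNet class scores.
Backbone parameters are frozen during training, so only the YOLO head is trained (transfer learning).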

YOLO Head

Custom convolutional layers are added on top of the ResNet backbone.
The final output is reshaped into YOLOv1's S×S grid format, where each grid cell predicts:
- Bounding box coordinates
- A confidence score for each predicted box
- Class probabilities for the four supported classes
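One possible shape for such a head, as a sketch only: the layer widths, the LeakyReLU slope of 0.1, and B = 2 boxes per cell are assumptions (B = 2 and S = 7 are the original paper's settings), and `YoloHead` is a hypothetical name. With C = 4 classes, each cell carries B × 5 + C = 14 values: B boxes of (x, y, w, h, confidence) followed by C class scores.

```python
import torch
import torch.nn as nn


class YoloHead(nn.Module):
    """Hypothetical YOLOv1-style head on top of a 2048x7x7 ResNet50 feature map."""

    def __init__(self, in_channels: int = 2048, S: int = 7, B: int = 2, C: int = 4):
        super().__init__()
        self.S, self.B, self.C = S, B, C
        self.layers = nn.Sequential(
            nn.Conv2d(in_channels, 1024, kernel_size=3, padding=1),
            nn.BatchNorm2d(1024),
            nn.LeakyReLU(0.1),
            nn.Conv2d(1024, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.LeakyReLU(0.1),
            # 1x1 conv: B * 5 + C numbers per cell (B boxes of x, y, w, h, conf + C classes).
            nn.Conv2d(512, B * 5 + C, kernel_size=1),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        out = self.layers(features)      # (N, B*5 + C, S, S)
        return out.permute(0, 2, 3, 1)   # (N, S, S, B*5 + C): one vector per grid cell
```

For example, `YoloHead()(torch.randn(1, 2048, 7, 7))` returns a tensor of shape `(1, 7, 7, 14)`. Using a 1×1 convolution for the final projection, rather than the paper's fully connected layers, keeps the parameter count small while still producing one prediction vector per grid cell.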

Data Processing Pipeline

Images are resized to 224×224.
Annotations are parsed from .txt files using a custom preprocess_txt function.
Output labels are encoded into YOLOv1’s S×S grid format using a generate_output function.
Normalization and augmentation are applied using torchvision.transforms.
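The parsing and encoding steps could look roughly like the sketch below. `parse_voc2005` and `encode_target` are hypothetical stand-ins for `preprocess_txt` and `generate_output`, whose actual code is not reproduced here, and the regular expressions assume the PASCAL Annotation Version 1.00 text format used by VOC 2005 (lines such as `Image size (X x Y x C) : ...` and `Bounding box for object 1 "PAScarSide" ...`). Each object is assigned to the grid cell containing its center; x and y are stored relative to that cell, while w and h are relative to the whole image.

```python
import re

CLASSES = ["bicycle", "car", "motorbike", "person"]

# Assumed VOC 2005 annotation lines, e.g.
#   Image size (X x Y x C) : 640 x 480 x 3
#   Bounding box for object 1 "PAScarSide" (Xmin, Ymin) - (Xmax, Ymax) : (13, 144) - (640, 480)
SIZE_RE = re.compile(r"Image size \(X x Y x C\) : (\d+) x (\d+) x \d+")
BOX_RE = re.compile(
    r'Bounding box for object \d+ "PAS(\w+)" \(Xmin, Ymin\) - \(Xmax, Ymax\) : '
    r"\((\d+), (\d+)\) - \((\d+), (\d+)\)"
)


def parse_voc2005(text):
    """Hypothetical parser: returns (width, height, [(cls, xmin, ymin, xmax, ymax), ...])."""
    width = height = None
    boxes = []
    for line in text.splitlines():
        if m := SIZE_RE.search(line):
            width, height = int(m.group(1)), int(m.group(2))
        elif m := BOX_RE.search(line):
            label = m.group(1).lower()
            # Map labels such as "carSide" or "personWalking" onto the four classes.
            cls = next((i for i, c in enumerate(CLASSES) if label.startswith(c)), None)
            if cls is not None:
                xmin, ymin, xmax, ymax = map(int, m.groups()[1:])
                boxes.append((cls, xmin, ymin, xmax, ymax))
    return width, height, boxes


def encode_target(boxes, width, height, S=7, C=len(CLASSES)):
    """Hypothetical encoder: S x S grid of [x, y, w, h, conf, one-hot classes]."""
    target = [[[0.0] * (5 + C) for _ in range(S)] for _ in range(S)]
    for cls, xmin, ymin, xmax, ymax in boxes:
        # Normalize to [0, 1] using the original image size.
        cx, cy = (xmin + xmax) / 2 / width, (ymin + ymax) / 2 / height
        w, h = (xmax - xmin) / width, (ymax - ymin) / height
        col, row = min(int(cx * S), S - 1), min(int(cy * S), S - 1)
        cell = target[row][col]
        if cell[4] == 1.0:
            continue  # YOLOv1 lets each cell own at most one object; keep the first.
        cell[0:5] = [cx * S - col, cy * S - row, w, h, 1.0]
        cell[5 + cls] = 1.0
    return target
```

Because coordinates are normalized by the original image size, a plain resize to 224×224 (without cropping) leaves the encoded targets unchanged. Geometric augmentations such as flips or crops, by contrast, must be applied to the boxes as well as to the images.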

Loss Function

A custom loss function is implemented following the YOLOv1 paper, combining four components:

Localization Loss: Penalizes errors in the predicted box center and size.
Confidence Loss (Object): Penalizes confidence score errors when an object is present.
Confidence Loss (No Object): Penalizes confidence predicted for cells that contain no object (false positives).
Classification Loss: Penalizes class probability errors in cells that contain an object.
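For reference, the YOLOv1 paper writes the full loss over an S×S grid with B boxes per cell as follows, where the indicator selects the box predictor responsible for an object and the paper sets λ_coord = 5 and λ_noobj = 0.5:

$$ \mathcal{L} = \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] + \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (\sqrt{w_i} - \sqrt{\hat{w}_i})^2 + (\sqrt{h_i} - \sqrt{\hat{h}_i})^2 \right] + \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} (C_i - \hat{C}_i)^2 + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} (C_i - \hat{C}_i)^2 + \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in \text{classes}} (p_i(c) - \hat{p}_i(c))^2 $$

The PyTorch sketch below implements a simplified version with one box per cell (B = 1), so the paper's selection of the responsible predictor by IoU drops out. `yolo_v1_loss` is a hypothetical name, and the per-cell layout `[x, y, w, h, conf, class scores...]` is an assumption, not necessarily this repository's format.

```python
import torch


def yolo_v1_loss(pred, target, lambda_coord=5.0, lambda_noobj=0.5):
    """Simplified YOLOv1 loss with one box per cell (B = 1).

    pred, target: (N, S, S, 5 + C) laid out as [x, y, w, h, conf, class scores...].
    target[..., 4] is 1 where a cell contains an object, else 0.
    """
    obj = target[..., 4]   # object indicator per cell, shape (N, S, S)
    noobj = 1.0 - obj      # no-object indicator

    # Localization: centers directly, sizes through square roots (as in the paper).
    xy_loss = ((pred[..., 0:2] - target[..., 0:2]) ** 2).sum(-1)
    # Clamp before sqrt so negative raw predictions do not produce NaNs.
    pred_wh = torch.sqrt(pred[..., 2:4].clamp(min=1e-6))
    wh_loss = ((pred_wh - torch.sqrt(target[..., 2:4])) ** 2).sum(-1)
    loc_loss = lambda_coord * (obj * (xy_loss + wh_loss)).sum()

    # Confidence: the target is 1 in object cells and 0 elsewhere. The paper uses
    # IoU(pred, truth) as the object target; 1 is a common simplification.
    conf_err = (pred[..., 4] - target[..., 4]) ** 2
    obj_conf_loss = (obj * conf_err).sum()
    noobj_conf_loss = lambda_noobj * (noobj * conf_err).sum()

    # Classification: squared error on class scores, only in cells with an object.
    cls_err = ((pred[..., 5:] - target[..., 5:]) ** 2).sum(-1)
    cls_loss = (obj * cls_err).sum()

    # Averaging over the batch is an implementation choice; the paper writes a plain sum.
    return (loc_loss + obj_conf_loss + noobj_conf_loss + cls_loss) / pred.shape[0]
```

The λ_noobj down-weighting matters because most of the S×S cells contain no object; without it, the no-object confidence term would dominate the gradient.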

Training Details

Backbone: ResNet50 (pretrained)
Epochs: 50
Batch Size: 32
Optimizer: Adam
Final Loss: 6.9583
Average Inference Time: under 300 ms per image
Hardware Used: NVIDIA RTX 4060 GPU
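A training step consistent with these settings might look like the sketch below. Only parameters with `requires_grad=True` are passed to Adam, which matches the frozen backbone; the learning rate of 1e-4 and the names `make_optimizer` and `train_one_epoch` are assumptions, since the README does not state them. The `__main__` block uses random tensors and a toy two-layer model purely so the loop runs end to end with a batch size of 32.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def make_optimizer(model: nn.Module, lr: float = 1e-4) -> torch.optim.Adam:
    # Pass only trainable parameters (the head); frozen backbone weights are skipped.
    # lr=1e-4 is an assumed value, not taken from the repository.
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.Adam(trainable, lr=lr)


def train_one_epoch(model, loader, loss_fn, optimizer, device="cpu"):
    model.train()
    total = 0.0
    for images, targets in loader:
        images, targets = images.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(images), targets)
        loss.backward()
        optimizer.step()
        total += loss.item()
    return total / len(loader)  # mean loss per batch


if __name__ == "__main__":
    # Toy stand-ins: a frozen layer that maps 224x224 -> 7x7, plus a trainable 1x1 "head".
    frozen = nn.Conv2d(3, 8, kernel_size=32, stride=32)
    for p in frozen.parameters():
        p.requires_grad = False
    model = nn.Sequential(frozen, nn.Conv2d(8, 9, kernel_size=1))
    data = TensorDataset(torch.randn(64, 3, 224, 224), torch.rand(64, 9, 7, 7))
    loader = DataLoader(data, batch_size=32, shuffle=True)
    opt = make_optimizer(model)
    for epoch in range(2):
        print(epoch, train_one_epoch(model, loader, nn.MSELoss(), opt))
```

In the real pipeline, `nn.MSELoss()` would be replaced by the YOLO loss and the toy model by the ResNet50 backbone plus YOLO head.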

Evaluation

Metric Used: IoU (Intersection over Union)

$$ IoU = \frac{\text{Area of Overlap}}{\text{Area of Union}} $$
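A plain-Python sketch of the IoU formula above, assuming boxes in corner format `(xmin, ymin, xmax, ymax)`; `iou` and `xywh_to_xyxy` are hypothetical helper names. The conversion helper covers YOLO-style center/size boxes.

```python
def xywh_to_xyxy(box):
    """Convert (x_center, y_center, width, height) to (xmin, ymin, xmax, ymax)."""
    x, y, w, h = box
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


def iou(box_a, box_b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    # Intersection rectangle; width or height clamps to 0 when the boxes do not overlap.
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    overlap = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - overlap
    return overlap / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two 2x2 squares overlapping in a 1x1 region: overlap 1, union 4 + 4 - 1 = 7.
    print(round(iou((0, 0, 2, 2), (1, 1, 3, 3)), 3))  # 0.143
```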

Results

Overall IoU: 58%
Best performance: Motorbike, Bicycle
Worst performance: Car (attributed to the limited number of car samples in the dataset)

Repository: Saran416/YOLO