
Enhancing CNN Training on CIFAR-10 Through MPI Parallelization

This repository contains the source code for training a Convolutional Neural Network (CNN) on CIFAR-10 using different parallelization strategies; Project_Report summarizes the experiments conducted during this project. Below is a brief overview of the key components:

Models

This file contains the implementation of the CNN architecture used by all of the training scripts.
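The repository does not spell out the architecture here, so as a hypothetical sketch only: a small CNN for CIFAR-10 (3x32x32 inputs, 10 classes) might look like the following in PyTorch. The class name `SimpleCNN` and all layer sizes are illustrative assumptions, not the repository's actual model.

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Illustrative CNN for CIFAR-10; not the repository's actual model."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 32 channels, 16x16 spatial
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 64 channels, 8x8 spatial
        )
        self.classifier = nn.Linear(64 * 8 * 8, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# One forward pass on a dummy batch of 4 CIFAR-sized images
out = SimpleCNN()(torch.zeros(4, 3, 32, 32))
print(out.shape)  # torch.Size([4, 10])
```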

Training Scripts

This script trains the model without any parallelization. It serves as the baseline for performance comparison with the parallelized approaches.

In this script, the model is only replicated across processes, without data parallelism. It is designed to isolate the cost and behavior of model replication on its own.
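Model replication typically amounts to one rank initializing the parameters and broadcasting them so every process starts from an identical copy. A minimal sketch of that step, assuming `mpi4py` and NumPy-array parameters (with a serial fallback so it also runs outside `mpiexec`); the shapes are arbitrary:

```python
import numpy as np

try:
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
except ImportError:  # serial fallback when mpi4py is unavailable
    comm, rank = None, 0

# Rank 0 initializes the weights; other ranks allocate an empty buffer.
if rank == 0:
    params = np.random.default_rng(0).normal(size=(4, 3))
else:
    params = np.empty((4, 3))

# Broadcast replicates rank 0's parameters to every process.
if comm is not None:
    comm.Bcast(params, root=0)
# After this point, every rank holds an identical copy of the model.
```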

The core script implementing the data-parallel approach: it distributes the training workload across multiple processes to improve training efficiency, and additionally includes time-measurement utilities and a fault-tolerance simulation.
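The key property behind data parallelism is that, for a loss averaged over samples, averaging the per-worker gradients of equal-sized shards (what an MPI allreduce would do) reproduces the full-batch gradient. A sketch of that idea, simulated serially with NumPy on a toy linear model; the data, model, and shard count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))   # toy batch: 8 samples, 3 features
y = rng.normal(size=8)
w = rng.normal(size=3)

def grad(Xb, yb, w):
    # Gradient of 0.5 * mean((Xb @ w - yb) ** 2) with respect to w
    return Xb.T @ (Xb @ w - yb) / len(yb)

# Full-batch gradient: what a single, non-parallel process computes.
g_full = grad(X, y, w)

# Data parallelism: split the batch across 4 "workers", each computes a
# local gradient, then averaging (an allreduce in MPI) combines them.
shards = np.split(np.arange(8), 4)
g_avg = np.mean([grad(X[i], y[i], w) for i in shards], axis=0)

print(np.allclose(g_full, g_avg))  # True: equal shards give an identical update
```

Because the averaged gradient matches the full-batch one, every replica applies the same update and the models stay synchronized across processes.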

Usage

mpiexec -n {number of processes} python data_parallelism_train.py --nb-proc {number of processes}