This repository contains a step-by-step guide and code to run your first GPU job on an HPC (High Performance Computing) cluster using Python. The demo uses the MNIST dataset and a simple edge-detection convolution to help beginners get familiar with HPC, GPU acceleration, SLURM job scheduling, and file transfers.
High Performance Computing (HPC) clusters can seem intimidating at first, with GPUs, job schedulers, and command-line interfaces. This project provides a beginner-friendly tutorial for running a Python script on a GPU node. You'll learn how to:
- Log in to your HPC cluster
- Set up your environment and Python virtual environment
- Transfer scripts to the HPC using SFTP
- Run Python code with GPU acceleration
- Submit and manage jobs with SLURM
- Retrieve and verify results locally
The demo uses a single MNIST image for edge detection as a minimal example to ensure everything works before scaling up.
- Access to an HPC cluster with GPU nodes
- Python 3.10+
- PyTorch, torchvision, Pillow, matplotlib
- SLURM or equivalent job scheduler
Optional:
- Familiarity with SSH and command-line navigation
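As a quick way to confirm the requirements listed above, the following sketch checks the Python version and reports which packages cannot be imported. It is an illustration rather than part of the repository: the helper names `missing_modules` and `python_is_recent` are hypothetical, and `PIL` is the import name under which Pillow is installed. It can be run again after the install step to confirm nothing is missing.

```python
import importlib.util
import sys

# Import names for the packages listed above (Pillow is imported as "PIL").
REQUIRED_MODULES = ("torch", "torchvision", "PIL", "matplotlib")


def missing_modules(names):
    """Return the module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_recent(minimum=(3, 10)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info >= minimum


print("Python 3.10+:", python_is_recent())
print("Missing packages:", missing_modules(REQUIRED_MODULES) or "none")
```

Running this inside the activated virtual environment, both locally and on the cluster, helps catch a missing package before a job is queued.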
- Clone the repository:
git clone <repo-url>
cd my_hpc_project
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install torch torchvision pillow matplotlib
Run the Python script locally first:
python HPC_sample_code.py
This will:
- Download a single MNIST digit
- Apply a GPU-based edge-detection convolution
- Save sample_digit.png and output_edge_digit.png
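The convolution step described above can be sketched as follows. This is a minimal illustration, not the contents of HPC_sample_code.py: the 3x3 Laplacian-style kernel and the function name `detect_edges` are assumptions, and a synthetic 28x28 square stands in for the MNIST digit so that the example runs without a download. The code falls back to the CPU when no GPU is visible.

```python
import torch
import torch.nn.functional as F


def detect_edges(image: torch.Tensor) -> torch.Tensor:
    """Convolve a 2-D grayscale image with a 3x3 Laplacian-style kernel."""
    # Use the GPU when one is available, otherwise fall back to the CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # Assumed kernel: responds to intensity changes, gives 0 on flat regions.
    kernel = torch.tensor(
        [[-1.0, -1.0, -1.0],
         [-1.0, 8.0, -1.0],
         [-1.0, -1.0, -1.0]],
        device=device,
    ).reshape(1, 1, 3, 3)
    # conv2d expects (batch, channels, height, width).
    batch = image.to(device=device, dtype=torch.float32).reshape(1, 1, *image.shape)
    edges = F.conv2d(batch, kernel, padding=1)  # padding=1 keeps the 28x28 size
    return edges.squeeze().cpu()


# Synthetic stand-in for an MNIST digit: a bright square on a dark background.
digit = torch.zeros(28, 28)
digit[8:20, 8:20] = 1.0

edges = detect_edges(digit)
print("output shape:", tuple(edges.shape))
print("max response:", edges.abs().max().item())
```

Flat regions produce a zero response and only the square's border lights up, which is the behavior to look for when comparing the two saved images.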
Use SFTP to upload files to your HPC cluster:
sftp username@hpc_address
cd my_hpc_project
put -r /path/to/local/project/*
bye
- HPC_sample_code.py – Python script to download MNIST, apply edge-detection on GPU, and save images.
- submit.sh – SLURM job script to allocate GPU, CPU, and memory resources for running the Python demo.
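To make the role of submit.sh more concrete, the sketch below renders the kind of SLURM batch script such a file might contain and extracts the job ID from the line that `sbatch` prints on submission. The actual submit.sh in the repository may differ. The job name and the resource values (1 GPU, 4 CPUs, 8G of memory, 10 minutes) are placeholder assumptions, the function names are hypothetical, and some clusters also require a partition or account directive.

```python
import re


def render_submit_script(gpus=1, cpus=4, mem="8G", time_limit="00:10:00"):
    """Return the text of a minimal SLURM batch script (all values assumed)."""
    lines = [
        "#!/bin/bash",
        "#SBATCH --job-name=hpc_demo",
        f"#SBATCH --gres=gpu:{gpus}",          # number of GPUs requested
        f"#SBATCH --cpus-per-task={cpus}",     # CPU cores for the task
        f"#SBATCH --mem={mem}",                # memory per node
        f"#SBATCH --time={time_limit}",        # wall-clock limit
        "#SBATCH --output=HPC_output_%j.log",  # %j expands to the job ID
        "",
        "module load python/3.10",
        "module load cuda/11.8",
        "source venv/bin/activate",
        "python HPC_sample_code.py",
    ]
    return "\n".join(lines) + "\n"


def parse_job_id(sbatch_stdout):
    """Extract the job ID from sbatch's 'Submitted batch job <id>' line."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_stdout)
    return match.group(1) if match else None


print(render_submit_script())
job_id = parse_job_id("Submitted batch job 123456\n")
print(f"Log file to follow: HPC_output_{job_id}.log")
```

The `--output` pattern is chosen to match the HPC_output_<JOB_ID>.log file name used later in this guide.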
ssh username@hpc_address
- ~/ – Home directory
- /scratch/$USER – High-speed temporary storage
module avail
module load python/3.10
module load cuda/11.8
module list
sbatch submit.sh
squeue -u $USER
tail -f HPC_output_<JOB_ID>.log
sftp username@hpc_address
get sample_digit.png
get output_edge_digit.png
bye
After running the demo, you will have:
- sample_digit.png – original MNIST digit
- output_edge_digit.png – GPU edge-detected output

Compare the two images to verify your GPU job ran successfully.
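Before comparing the images by eye, a small check can confirm that both files arrived intact. The sketch below tests only that each file exists and begins with the standard 8-byte PNG signature, so it says nothing about the image content. The helper name `looks_like_png` is hypothetical, and the script assumes it is run from the directory that holds the downloaded files.

```python
from pathlib import Path

# Every valid PNG file starts with these 8 bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png(path):
    """Return True if `path` is an existing file that starts with the PNG signature."""
    path = Path(path)
    if not path.is_file():
        return False
    with path.open("rb") as handle:
        return handle.read(8) == PNG_SIGNATURE


for name in ("sample_digit.png", "output_edge_digit.png"):
    status = "ok" if looks_like_png(name) else "missing or not a PNG"
    print(f"{name}: {status}")
```

A file reported as missing or invalid usually points to an interrupted transfer or a job that failed before saving its output, in which case the job log is the next place to look.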
This project is licensed under the MIT License – see the LICENSE file for details.