Get Started Training Llama 2, Mixtral 8x7B, and Mistral Mathstral with PyTorch FSDP in 5 Minutes

This content provides a quickstart for multi-node PyTorch FSDP training on Slurm and Kubernetes. It is designed to be simple: there is no data preparation and no tokenizer to download, and it uses a Python virtual environment.
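Multi-node training matters for models of this size because FSDP shards model state (parameters, gradients, and optimizer states) across every GPU in the job, so per-GPU model-state memory shrinks roughly in proportion to the GPU count. The sketch below illustrates that arithmetic. It assumes a common rule of thumb of about 16 bytes of model state per parameter for mixed-precision training with Adam; that figure, the function name `sharded_state_gb_per_gpu`, and the example sizes are illustrative assumptions, not values taken from these examples. Activations and framework overhead are not counted.

```python
"""Rough per-GPU model-state memory under full sharding (illustrative sketch).

Assumption (not from this README): mixed-precision training with Adam keeps
about 16 bytes of model state per parameter, e.g.
  2 (bf16/fp16 params) + 2 (grads) + 4 (fp32 master params)
  + 4 (Adam first moment) + 4 (Adam second moment).
Activations, buffers, and framework overhead are NOT included.
"""

BYTES_PER_PARAM_MIXED_ADAM = 16  # assumed rule of thumb, see docstring above


def sharded_state_gb_per_gpu(num_params: float, num_gpus: int,
                             bytes_per_param: int = BYTES_PER_PARAM_MIXED_ADAM) -> float:
    """Hypothetical helper: model-state memory per GPU in GB (1e9 bytes).

    With full sharding, each rank holds roughly 1/num_gpus of the
    parameters, gradients, and optimizer states.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be >= 1")
    total_bytes = num_params * bytes_per_param
    return total_bytes / num_gpus / 1e9


if __name__ == "__main__":
    # Example: a 70B-parameter model spread over 1, 2, and 4 nodes of 8 GPUs.
    for gpus in (8, 16, 32):
        gb = sharded_state_gb_per_gpu(70e9, gpus)
        print(f"70B params on {gpus:>2} GPUs: ~{gb:.0f} GB of model state per GPU")
```

Under these assumptions a 70B-parameter model needs about 140 GB of model state per GPU on a single 8-GPU node but about 35 GB across four nodes, which is why the examples target multi-node clusters. Activation memory still has to be budgeted separately.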

Prerequisites

To run FSDP training, you need a training cluster based on Slurm or Kubernetes with an Amazon FSx for Lustre file system. Instructions are available for creating an Amazon SageMaker HyperPod cluster with Slurm or Kubernetes, or a cluster in Amazon EKS.

FSDP Training

This folder provides examples of how to train with PyTorch FSDP using Slurm or Kubernetes. Instructions for each are in the corresponding subdirectories.