Resource-aware Federated Foundation Models (RaFFM): Bridging the Gap Between Foundation Models and Heterogeneous Federated Learning
This is the official implementation for the paper:
Bridging the Gap Between Foundation Models and Heterogeneous Federated Learning
- [10/30/2023] Scalable ViT checkpoints released for heterogeneous-resource edge clients
- [11/02/2023] ViT-base CIFAR-100 checkpoints released, trained on large-budget edge-FL settings with 100 clients.
- [11/05/2023] Resource-aware FMs with adapters published in the adapter branch
- [11/07/2023] High-level API for real edge-FL
- [11/10/2023] Experiments - Comparison with baselines
- [11/12/2023] Experiments - Post-federated learning model deployment
- [11/12/2023] Scripts for GLUE benchmark
- [11/16/2023] Experiments - Effectiveness of Salient Parameter Prioritization
- [12/04/2023] 💥 APIs for Segment Anything (SAM) released
- [12/10/2023] Scalable FM checkpoints and demo for BERT on SST-2
First, create and activate a conda environment, then install PyTorch:
conda create -n raffm python=3.10
conda activate raffm
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
Next, install the RaFFM package:
cd RaFFM
pip install .
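You can verify the installation with a quick import check; the class below is the one used throughout this README:

```python
# The import should succeed without errors if RaFFM installed correctly
from raffm import RaFFM
print("RaFFM imported successfully:", RaFFM)
```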
RaFFM enables resource-aware foundation model deployment in edge-FL based on clients' local resources: it dynamically scales FMs down to fit heterogeneous local clients, enabling efficient and fair local resource utilization.
We provide resource-aware FM checkpoints trained via FL. You can download them here:
- ViT-base CIFAR-100 large-budget [trained on large-budget system-heterogeneity edge-FL with 100 clients]
- ViT-base CIFAR-10 large-budget [trained on large-budget system-heterogeneity edge-FL with 100 clients]
- ViT-base CIFAR-10 small-budget [trained on small-budget system-heterogeneity edge-FL with 100 clients]
- BERT-base SST-2 small-budget [trained on small-budget system-heterogeneity edge-FL with 100 clients]
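The ViT example below walks through loading these checkpoints; the BERT SST-2 checkpoint presumably follows the same pattern with a sequence-classification head. A minimal sketch, assuming the standard Hugging Face folder layout (the folder name is a placeholder):

```python
from transformers import AutoModelForSequenceClassification
from raffm import RaFFM

# Placeholder: path to the downloaded and unzipped BERT SST-2 checkpoint
ckpt_path = "bert_sst2_ckpt_folder"
model = AutoModelForSequenceClassification.from_pretrained(
    ckpt_path,
    num_labels=2,  # SST-2 is binary sentiment classification
)
raffm_model = RaFFM(model.to("cpu"))
print("Original FM number of parameters:", raffm_model.total_params)
```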
from transformers import ViTImageProcessor, ViTForImageClassification, TrainingArguments, Trainer
from raffm import RaFFM
# Generate resource-aware models for cifar10
ckpt_path = "ckpt_folder_path" # the downloaded and unzipped ckpt folder path
model = ViTForImageClassification.from_pretrained(
ckpt_path,
num_labels=10,
ignore_mismatched_sizes=True,
)
raffm_model = RaFFM(model.to("cpu"))
print("Original FM number of parameters:",raffm_model.total_params)
#Random sample a scaled FM
submodel, params, config = raffm_model.random_resource_aware_model()
print("subnetwork params",params)
Detailed instructions for generating resource-aware FMs can be found in the Demo.
RaFFM can scale down a given FM based on edge resource constraints, hence enabling resource-aware federated learning.
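For intuition, here is a minimal sketch of one simulated communication round using only the API shown above; the client IDs and loop are illustrative, not the packaged `fl_vit.py` trainer:

```python
# Illustrative round: sample one scaled FM per participating client so each
# subnetwork fits that client's (heterogeneous) resource budget.
participating_clients = [3, 17, 42, 88]  # hypothetical 4-client sample
for client_id in participating_clients:
    submodel, params, config = raffm_model.random_resource_aware_model()
    print(f"client {client_id}: subnetwork with {params} parameters")
```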
Here we provide scripts to reproduce the experimental results we reported in the paper.
To train ViT in an edge-FL setting with 100 clients and a 10% participation rate per communication round, simply run:
python fl_vit.py --method raffm --spp --model vit --save_dir log/vit --dataset cifar10 --num_clients 100 --lr 3e-5
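The checkpoint list above suggests the same script also covers CIFAR-100; assuming `cifar100` is an accepted `--dataset` value (unverified), the corresponding run would be:

```bash
python fl_vit.py --method raffm --spp --model vit --save_dir log/vit_cifar100 --dataset cifar100 --num_clients 100 --lr 3e-5
```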
To check the results, you can:
- Check the output information from the terminal console
- Use TensorBoard:
tensorboard --logdir log/vit
[Note]: More APIs and scripts will be posted; please check the Updates.
The above scripts simulate FL on a central device for reproducibility of RaFFM. If you want to deploy RaFFM on real edge-FL, please see EDGE-FL.md for detailed training instructions.
- ViT pre-trained ckpts
- ViT FL simulation scripts
- Tensorboard logger
- Elastic space APIs for system heterogeneity
- Load ckpt high-level APIs
- Simulation scripts on GLUE
- ViT CIFAR-100 ckpts
- High-level API for real edge-FL
- API for segment anything (SAM)
- Evaluate Scripts for resource-aware models
- BERT-large, FLAN-T5 ckpts
- Simulation scripts on SQUAD
- ONNX and TensorRT APIs for edge
- Tiny fedlib
If you find our work helpful, please kindly support our efforts by citing our paper:
under review
The experiments in this work were sponsored by [anonymous institution] and [anonymous institution].