Commit

Initialize develop
github-classroom[bot] committed Apr 28, 2021
0 parents commit 95a71aa
Showing 61 changed files with 3,998 additions and 0 deletions.
9 changes: 9 additions & 0 deletions .dockerignore
@@ -0,0 +1,9 @@
venv
default_models
data_loaders
data/cifar-10-batches-py
data/cifar-100-python.tar.gz
data/FashionMNIST
data/cifar-100-python
data/cifar-10-python.tar.gz
simple_example
142 changes: 142 additions & 0 deletions .gitignore
@@ -0,0 +1,142 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/


venv
venv-*
default_models
data
data_loaders
simple_example
output
docker_data
.idea
*.tmp.txt
docker-compose.yml
41 changes: 41 additions & 0 deletions Dockerfile
@@ -0,0 +1,41 @@
# Base image to start with
FROM ubuntu:20.04

# Who maintains this Dockerfile
LABEL maintainer="Bart Cox <[email protected]>"

# Run build without interactive dialogue
ARG DEBIAN_FRONTEND=noninteractive

ENV GLOO_SOCKET_IFNAME=eth0
ENV TP_SOCKET_IFNAME=eth0

# Define the working directory of the current Docker container
WORKDIR /opt/federation-lab

# Update the Ubuntu package index and install the system dependencies
RUN apt-get update \
&& apt-get install -y vim curl python3 python3-pip net-tools iproute2

# Copy setup.py to the working directory
COPY setup.py ./

# Install the Python packages declared in setup.py
RUN python3 setup.py install

#RUN mkdir -p ./data/MNIST
#COPY ./data/MNIST ../data/MNIST
COPY fltk ./fedsim
#RUN ls -la
COPY federated_learning.py ./
COPY custom_mnist.py ./
#RUN ls -la ./fedsim

# Document the port the container listens on (publish it on the host with -p or a compose `ports:` entry)
EXPOSE 5000

# Default command run when the container starts
# CMD ["python3", "/opt/Generatrix/rpc_parameter_server.py", "--world_size=2", "--rank=0", "--master_addr=192.168.144.2"]

#CMD python3 /opt/federation-lab/rpc_parameter_server.py --world_size=$WORLD_SIZE --rank=$RANK --master_addr=10.5.0.11
CMD python3 /opt/federation-lab/federated_learning.py $RANK $WORLD_SIZE 10.5.0.11
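
The `CMD` above uses the shell form, so `$RANK` and `$WORLD_SIZE` are expanded from the container's environment when it starts. `federated_learning.py` is not part of this excerpt, so how it actually reads these positional arguments is unknown. The following is a hypothetical sketch (the name `parse_launch_args` is illustrative) of parsing `<rank> <world_size> <master_addr>` with `argparse` and rejecting a rank outside `0 .. world_size - 1`:

```python
import argparse
import ipaddress
from typing import List, Optional


def parse_launch_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse `<rank> <world_size> <master_addr>`, the positional arguments passed by the CMD (hypothetical)."""
    parser = argparse.ArgumentParser(description="Start one federated learning process.")
    parser.add_argument("rank", type=int, help="0 for the federator, 1..world_size-1 for clients")
    parser.add_argument("world_size", type=int, help="total number of processes")
    # ipaddress.ip_address raises ValueError on bad input, which argparse reports as a usage error
    parser.add_argument("master_addr", type=ipaddress.ip_address, help="IP address of the federator")
    args = parser.parse_args(argv)
    if not 0 <= args.rank < args.world_size:
        # parser.error prints the message and exits with status 2
        parser.error(f"rank must be between 0 and {args.world_size - 1}")
    return args
```

For example, `parse_launch_args(["1", "5", "10.5.0.11"])` yields `rank=1`, `world_size=5` and the federator address as an `IPv4Address`.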
25 changes: 25 additions & 0 deletions LICENSE
@@ -0,0 +1,25 @@
BSD 2-Clause License

Copyright (c) 2021, Bart Cox
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
138 changes: 138 additions & 0 deletions README.md
@@ -0,0 +1,138 @@
# FLTK - Federation Learning Toolkit
[![License](https://img.shields.io/badge/license-BSD-blue.svg)](LICENSE)
[![Python 3.7](https://img.shields.io/badge/python-3.7-blue.svg)](https://www.python.org/downloads/release/python-370/)
[![Python 3.8](https://img.shields.io/badge/python-3.8-blue.svg)](https://www.python.org/downloads/release/python-380/)

This toolkit can be used to run Federated Learning experiments.
It is built on PyTorch Distributed ([docs](https://pytorch.org/tutorials/beginner/dist_overview.html)).
The goal of this project is to launch Federated Learning nodes in a truly distributed fashion.

This project is tested with Ubuntu 20.04 and Python {3.7, 3.8}.

## Global idea
PyTorch Distributed is built around a `world_size` (the total number of processes) and a `rank` that identifies each process; ranks range from 0 to `world_size - 1`.
Generally, the federator has rank 0 and the clients have ranks 1 to `world_size - 1`.
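
The rank convention above can be captured in two small helpers. This is a minimal sketch in plain Python; `role_for_rank` and `client_ranks` are hypothetical names, not part of FLTK:

```python
from typing import List


def role_for_rank(rank: int, world_size: int) -> str:
    """Return the role of a process, following the convention that rank 0 is the federator."""
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} must be in [0, {world_size - 1}]")
    return "federator" if rank == 0 else "client"


def client_ranks(world_size: int) -> List[int]:
    """All client ranks: 1 .. world_size - 1."""
    return list(range(1, world_size))
```

With `world_size = 4`, rank 0 is the federator and ranks 1, 2 and 3 are clients.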

General protocol:

1. The federator selects the clients for the round
2. The selected clients download the current model
3. The selected clients train locally for X epochs
4. The weights/gradients of the trained models are sent to the federator
5. The federator aggregates the weights/gradients to create a new, improved model
6. The updated model is shared with the clients
7. Repeat steps 1 to 6 until convergence
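
The client-selection and aggregation steps of a round can be sketched as follows, assuming FedAvg-style aggregation (an average of client weights, weighted by each client's number of training samples). This is an illustration only: the selection and aggregation FLTK actually uses live in `fltk/strategy` and may differ, and `select_clients`, `fed_avg` and the `Weights` layout are hypothetical. Plain lists stand in for model tensors so the sketch runs without PyTorch.

```python
import random
from typing import Dict, List, Sequence

# Hypothetical model representation: layer name -> flattened parameters
Weights = Dict[str, List[float]]


def select_clients(ranks: Sequence[int], clients_per_round: int, rng: random.Random) -> List[int]:
    """Client selection: pick a random subset of client ranks for this round."""
    return sorted(rng.sample(list(ranks), clients_per_round))


def fed_avg(updates: List[Weights], num_samples: List[int]) -> Weights:
    """Aggregation: average client weights, weighting each client by its sample count."""
    total = sum(num_samples)
    averaged: Weights = {}
    for name in updates[0]:
        length = len(updates[0][name])
        averaged[name] = [
            sum(update[name][i] * n for update, n in zip(updates, num_samples)) / total
            for i in range(length)
        ]
    return averaged
```

For example, averaging `{"w": [1.0, 2.0]}` (1 sample) with `{"w": [3.0, 4.0]}` (3 samples) gives `{"w": [2.5, 3.5]}`. In a real run, `clients_per_round` would come from the experiment configuration in `configs/experiment.yaml`.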

Important notes:

* Clients do not share their data with each other
* The data is non-IID (not independent and identically distributed across clients)
* Hardware can be heterogeneous
* The location of devices matters (network latency and bandwidth)
* Communication can be costly
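
One common way to simulate non-IID data is label-sorted sharding: sort the samples by label, cut them into equal shards, and give each client a few shards so that it only sees a few classes. The sketch below assumes that scheme. It is not necessarily how the samplers in `fltk/datasets/data_distribution` split data, and `shard_partition` is a hypothetical helper.

```python
import random
from typing import Dict, List, Sequence


def shard_partition(labels: Sequence[int], num_clients: int, shards_per_client: int,
                    seed: int = 0) -> Dict[int, List[int]]:
    """Split sample indices into non-IID client partitions via label-sorted shards."""
    num_shards = num_clients * shards_per_client
    shard_size = len(labels) // num_shards
    if shard_size == 0:
        raise ValueError("not enough samples for the requested number of shards")
    # Sort indices by label (stable), so each shard covers as few classes as possible
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    shards = [order[s * shard_size:(s + 1) * shard_size] for s in range(num_shards)]
    # Shuffle shard assignment so clients get a random handful of classes
    random.Random(seed).shuffle(shards)
    return {
        client: [idx
                 for shard in shards[client * shards_per_client:(client + 1) * shards_per_client]
                 for idx in shard]
        for client in range(num_clients)
    }
```

With 4 classes of 10 samples each, 4 clients and one shard per client, every client ends up with exactly one class. Samples left over when `len(labels)` is not divisible by the number of shards are dropped in this sketch.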

## Project structure
Structure with important folders and files explained:
```
project
├── configs
│ └── experiment.yaml # Example of an experiment configuration
├── deploy # Templates for automatic deployment
│ └── templates
│ ├── client_stub_default.yml
│ ├── client_stub_medium.yml
│ ├── client_stub_slow.yml
│ └── system_stub.yml # Describes the federator and the network
├── fltk # Source code
│ ├── datasets # Different dataset definitions
│ │ ├── data_distribution # Datasets with distributed sampler
│ │ └── distributed # "regular" datasets for centralized use
│ ├── nets # Available networks
│ ├── schedulers # Learning Rate Schedulers
│ ├── strategy # Client selection and model aggregation algorithms
│ └── util
│ └── generate_docker_compose.py # Generates a docker-compose.yml for a containerized run
├── Dockerfile # Dockerfile to run in containers
├── LICENSE
├── README.md
└── setup.py
```

## Models

* Cifar10-CNN
* Cifar10-ResNet
* Cifar100-ResNet
* Cifar100-VGG
* Fashion-MNIST-CNN
* Fashion-MNIST-ResNet
* Reddit-LSTM

## Datasets

* Cifar10
* Cifar100
* Fashion-MNIST

## Prerequisites

When running in Docker containers, the following dependencies need to be installed:

* Docker
* Docker Compose

## Install
```bash
python3 setup.py install
```

## Examples
<details><summary>Show Examples</summary>

<p>

### Single machine (Native)

#### Launch single client
Launch Federator
```bash
python3 -m fltk single configs/experiment.yaml --rank=0
```
Launch Client
```bash
python3 -m fltk single configs/experiment.yaml --rank=1
```

#### Spawn FL system
```bash
python3 -m fltk spawn configs/experiment.yaml
```
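
For reference, a full local system amounts to one `single` launch per rank, like the two commands above. The sketch below only reproduces that manual launch with `subprocess`; it is an assumption about the behaviour, not how `fltk spawn` is implemented, and `single_commands` / `launch_all` are hypothetical names.

```python
import subprocess
import sys
from typing import List


def single_commands(config: str, world_size: int) -> List[List[str]]:
    """One `fltk single` command per rank: rank 0 is the federator, the rest are clients."""
    return [
        [sys.executable, "-m", "fltk", "single", config, f"--rank={rank}"]
        for rank in range(world_size)
    ]


def launch_all(config: str, world_size: int) -> List[int]:
    """Start every rank as a local process, wait for all of them and return their exit codes."""
    processes = [subprocess.Popen(cmd) for cmd in single_commands(config, world_size)]
    return [p.wait() for p in processes]
```

Calling `launch_all("configs/experiment.yaml", world_size=2)` would start one federator and one client and return a list with one exit code per process.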

### Two machines (Native)
To start a cross-machine FL system, you have to configure the network interface that connects each machine to the network.
For example, if your machine is connected to the network via its Wi-Fi interface (for example, one named `wlo1`), configure it as shown below:
```python
import os

os.environ['GLOO_SOCKET_IFNAME'] = 'wlo1'
os.environ['TP_SOCKET_IFNAME'] = 'wlo1'
```
Set these variables before PyTorch Distributed is initialized. Use `ifconfig` (or `ip addr`) to find the name of the interface on your machine.
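
A slightly more defensive variant checks that the interface exists before setting both variables. `configure_nic` is a hypothetical helper that uses only the standard library (`socket.if_nameindex()` lists the machine's interfaces); it should likewise run before PyTorch Distributed is initialized. The interface name could, for instance, be taken from the `nic` field in `configs/experiment.yaml`.

```python
import os
import socket


def configure_nic(nic: str, validate: bool = True) -> None:
    """Point the Gloo and TensorPipe backends at one network interface (hypothetical helper)."""
    if validate:
        # socket.if_nameindex() returns (index, name) pairs for every local interface
        available = {name for _, name in socket.if_nameindex()}
        if nic not in available:
            raise ValueError(f"interface {nic!r} not found; available: {sorted(available)}")
    os.environ["GLOO_SOCKET_IFNAME"] = nic
    os.environ["TP_SOCKET_IFNAME"] = nic
```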

### Docker Compose
1. Make sure Docker and Docker Compose are installed.
2. Generate a `docker-compose.yml` file for your experiment. You can use the script `generate_docker_compose.py` for this.
From the root folder, run `python3 fltk/util/generate_docker_compose.py 4` to generate a system with 4 clients.
Feel free to change or extend `generate_docker_compose.py` for your own needs.
A `docker-compose.yml` file is created in the root folder.
3. Run docker-compose to start the system:
```bash
docker-compose up
```
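
The contents of `generate_docker_compose.py` are not shown here, so the following is only a hypothetical sketch of what such a generator could emit. It assumes one federator (rank 0) at the fixed address `10.5.0.11` used by the Dockerfile's `CMD`, an assumed `10.5.0.0/16` bridge subnet, and passes `RANK` and `WORLD_SIZE` as environment variables, which that `CMD` reads. The service and network names are illustrative, and the keys the real script writes may differ.

```python
import json
from typing import Any, Dict


def compose_config(num_clients: int, federator_ip: str = "10.5.0.11",
                   subnet: str = "10.5.0.0/16") -> Dict[str, Any]:
    """Describe one federator (rank 0) and `num_clients` clients on a shared bridge network."""
    world_size = num_clients + 1
    services: Dict[str, Any] = {}
    for rank in range(world_size):
        name = "federator" if rank == 0 else f"client{rank}"
        # Only the federator needs a fixed address; clients get one from the subnet
        network = {"ipv4_address": federator_ip} if rank == 0 else None
        services[name] = {
            "build": ".",  # the Dockerfile in the project root
            # The Dockerfile's CMD reads RANK and WORLD_SIZE from the environment
            "environment": {"RANK": str(rank), "WORLD_SIZE": str(world_size)},
            "networks": {"fl_network": network},
        }
    return {
        "version": "3.3",
        "services": services,
        "networks": {"fl_network": {"driver": "bridge", "ipam": {"config": [{"subnet": subnet}]}}},
    }


if __name__ == "__main__":
    print(json.dumps(compose_config(4), indent=2))
```

Because JSON is, in practice, valid YAML, the printed output could be redirected to a `docker-compose.yml` file.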

### Google Cloud Platform
TBD

</p>
</details>

## Known issues

* Currently, there is no GPU support in Docker containers (or Docker Compose)
* The first epoch can be much slower than later ones (6x - 8x slower)
19 changes: 19 additions & 0 deletions configs/experiment.yaml
@@ -0,0 +1,19 @@
---
# Experiment configuration
total_epochs: 5
epochs_per_cycle: 1
wait_for_clients: true
net: Cifar10CNN
dataset: cifar10
# Use CUDA if available; setting this to false forces CPU
cuda: true
experiment_prefix: 'experiment_sample'
output_location: 'output'
tensor_board_active: true
clients_per_round: 1
system:
federator:
hostname: '131.180.40.72'
nic: 'wlo1'
clients:
amount: 1
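
Code that consumes this file can turn it into a typed object and validate it early. The sketch below is hypothetical (`ExperimentConfig` is not an FLTK class). It takes the nested dict that a YAML loader such as PyYAML's `yaml.safe_load` would return, and derives `world_size` as the number of clients plus one federator, matching the rank convention described in the README.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class ExperimentConfig:
    """Typed view of selected fields from configs/experiment.yaml (hypothetical)."""

    total_epochs: int
    epochs_per_cycle: int
    clients_per_round: int
    num_clients: int
    net: str
    dataset: str
    federator_nic: str
    cuda: bool = True

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "ExperimentConfig":
        """Build and validate a config from the nested dict produced by a YAML loader."""
        cfg = cls(
            total_epochs=int(raw["total_epochs"]),
            epochs_per_cycle=int(raw["epochs_per_cycle"]),
            clients_per_round=int(raw["clients_per_round"]),
            num_clients=int(raw["system"]["clients"]["amount"]),
            net=str(raw["net"]),
            dataset=str(raw["dataset"]),
            federator_nic=str(raw["system"]["federator"]["nic"]),
            cuda=bool(raw.get("cuda", True)),
        )
        cfg.validate()
        return cfg

    def validate(self) -> None:
        """Reject settings that cannot work, such as selecting more clients than exist."""
        for name in ("total_epochs", "epochs_per_cycle", "clients_per_round", "num_clients"):
            if getattr(self, name) < 1:
                raise ValueError(f"{name} must be at least 1")
        if self.clients_per_round > self.num_clients:
            raise ValueError(
                f"clients_per_round ({self.clients_per_round}) exceeds "
                f"the number of clients ({self.num_clients})"
            )

    @property
    def world_size(self) -> int:
        """One federator (rank 0) plus every client."""
        return self.num_clients + 1
```

For the configuration above (one client), `world_size` is 2.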