forked from tudelft-eemcs-dml/fltk-testbed-group-3
0 parents
commit 95a71aa
Showing 61 changed files with 3,998 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
venv | ||
default_models | ||
data_loaders | ||
data/cifar-10-batches-py | ||
data/cifar-100-python.tar.gz | ||
data/FashionMNIST | ||
data/cifar-100-python | ||
data/cifar-10-python.tar.gz | ||
simple_example |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,142 @@ | ||
# Byte-compiled / optimized / DLL files | ||
__pycache__/ | ||
*.py[cod] | ||
*$py.class | ||
|
||
# C extensions | ||
*.so | ||
|
||
# Distribution / packaging | ||
.Python | ||
build/ | ||
develop-eggs/ | ||
dist/ | ||
downloads/ | ||
eggs/ | ||
.eggs/ | ||
lib/ | ||
lib64/ | ||
parts/ | ||
sdist/ | ||
var/ | ||
wheels/ | ||
pip-wheel-metadata/ | ||
share/python-wheels/ | ||
*.egg-info/ | ||
.installed.cfg | ||
*.egg | ||
MANIFEST | ||
|
||
# PyInstaller | ||
# Usually these files are written by a python script from a template | ||
# before PyInstaller builds the exe, so as to inject date/other infos into it. | ||
*.manifest | ||
*.spec | ||
|
||
# Installer logs | ||
pip-log.txt | ||
pip-delete-this-directory.txt | ||
|
||
# Unit test / coverage reports | ||
htmlcov/ | ||
.tox/ | ||
.nox/ | ||
.coverage | ||
.coverage.* | ||
.cache | ||
nosetests.xml | ||
coverage.xml | ||
*.cover | ||
*.py,cover | ||
.hypothesis/ | ||
.pytest_cache/ | ||
|
||
# Translations | ||
*.mo | ||
*.pot | ||
|
||
# Django stuff: | ||
*.log | ||
local_settings.py | ||
db.sqlite3 | ||
db.sqlite3-journal | ||
|
||
# Flask stuff: | ||
instance/ | ||
.webassets-cache | ||
|
||
# Scrapy stuff: | ||
.scrapy | ||
|
||
# Sphinx documentation | ||
docs/_build/ | ||
|
||
# PyBuilder | ||
target/ | ||
|
||
# Jupyter Notebook | ||
.ipynb_checkpoints | ||
|
||
# IPython | ||
profile_default/ | ||
ipython_config.py | ||
|
||
# pyenv | ||
.python-version | ||
|
||
# pipenv | ||
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. | ||
# However, in case of collaboration, if having platform-specific dependencies or dependencies | ||
# having no cross-platform support, pipenv may install dependencies that don't work, or not | ||
# install all needed dependencies. | ||
#Pipfile.lock | ||
|
||
# PEP 582; used by e.g. github.com/David-OConnor/pyflow | ||
__pypackages__/ | ||
|
||
# Celery stuff | ||
celerybeat-schedule | ||
celerybeat.pid | ||
|
||
# SageMath parsed files | ||
*.sage.py | ||
|
||
# Environments | ||
.env | ||
.venv | ||
env/ | ||
venv/ | ||
ENV/ | ||
env.bak/ | ||
venv.bak/ | ||
|
||
# Spyder project settings | ||
.spyderproject | ||
.spyproject | ||
|
||
# Rope project settings | ||
.ropeproject | ||
|
||
# mkdocs documentation | ||
/site | ||
|
||
# mypy | ||
.mypy_cache/ | ||
.dmypy.json | ||
dmypy.json | ||
|
||
# Pyre type checker | ||
.pyre/ | ||
|
||
|
||
venv | ||
venv-* | ||
default_models | ||
data | ||
data_loaders | ||
simple_example | ||
output | ||
docker_data | ||
.idea | ||
*.tmp.txt | ||
docker-compose.yml |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
# Base image to start with | ||
FROM ubuntu:20.04 | ||
|
||
# Who maintains this DockerFile | ||
MAINTAINER Bart Cox <[email protected]> | ||
|
||
# Run build without interactive dialogue | ||
ARG DEBIAN_FRONTEND=noninteractive | ||
|
||
ENV GLOO_SOCKET_IFNAME=eth0 | ||
ENV TP_SOCKET_IFNAME=eth0 | ||
|
||
# Define the working directory of the current Docker container | ||
WORKDIR /opt/federation-lab | ||
|
||
# Update the Ubuntu software repository | ||
RUN apt-get update \ | ||
&& apt-get install -y vim curl python3 python3-pip net-tools iproute2 | ||
|
||
# Copy setup.py to the working directory | ||
COPY setup.py ./ | ||
|
||
# Install all required packages for the generator | ||
RUN python3 setup.py install | ||
|
||
#RUN mkdir -p ./data/MNIST | ||
#COPY ./data/MNIST ../data/MNIST | ||
ADD fltk ./fedsim | ||
#RUN ls -la | ||
COPY federated_learning.py ./ | ||
COPY custom_mnist.py ./ | ||
#RUN ls -la ./fedsim | ||
|
||
# Expose the container's port to the host OS | ||
EXPOSE 5000 | ||
|
||
# Run command by default for the executing container | ||
# CMD ["python3", "/opt/Generatrix/rpc_parameter_server.py", "--world_size=2", "--rank=0", "--master_addr=192.168.144.2"] | ||
|
||
#CMD python3 /opt/federation-lab/rpc_parameter_server.py --world_size=$WORLD_SIZE --rank=$RANK --master_addr=10.5.0.11 | ||
CMD python3 /opt/federation-lab/federated_learning.py $RANK $WORLD_SIZE 10.5.0.11 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
BSD 2-Clause License | ||
|
||
Copyright (c) 2021, Bart Cox | ||
All rights reserved. | ||
|
||
Redistribution and use in source and binary forms, with or without | ||
modification, are permitted provided that the following conditions are met: | ||
|
||
1. Redistributions of source code must retain the above copyright notice, this | ||
list of conditions and the following disclaimer. | ||
|
||
2. Redistributions in binary form must reproduce the above copyright notice, | ||
this list of conditions and the following disclaimer in the documentation | ||
and/or other materials provided with the distribution. | ||
|
||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" | ||
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE | ||
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE | ||
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE | ||
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | ||
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR | ||
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER | ||
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, | ||
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE | ||
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,138 @@ | ||
# FLTK - Federation Learning Toolkit | ||
[License](LICENSE) | ||
[Python 3.7](https://www.python.org/downloads/release/python-370/) | ||
[Python 3.8](https://www.python.org/downloads/release/python-380/) | ||
|
||
This toolkit can be used to run Federated Learning experiments. | ||
PyTorch Distributed ([docs](https://pytorch.org/tutorials/beginner/dist_overview.html)) is used in this project. | ||
The goal of this project is to launch Federated Learning nodes in a truly distributed fashion. | ||
|
||
This project is tested with Ubuntu 20.04 and Python 3.7 and 3.8. | ||
### Global idea | ||
PyTorch Distributed is organized around a `world_size` and ranks. Each process has a rank between 0 and world_size-1. | ||
Generally, the federator has rank 0 and the clients have ranks between 1 and world_size-1. | ||
|
||
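A minimal sketch of the rank convention described above, in plain Python. The helper name `role_for_rank` is hypothetical and is not part of FLTK; it only illustrates that rank 0 is the federator and every other valid rank is a client.

```python
def role_for_rank(rank: int, world_size: int) -> str:
    """Map a rank to its role: rank 0 is the federator, the rest are clients."""
    if world_size < 2:
        raise ValueError("need at least one federator and one client")
    if not 0 <= rank < world_size:
        raise ValueError(f"rank must be in [0, {world_size - 1}], got {rank}")
    return "federator" if rank == 0 else "client"


# A system with one federator and three clients has world_size = 4.
roles = [role_for_rank(rank, world_size=4) for rank in range(4)]
print(roles)
```

Rejecting out-of-range ranks early is a design choice of this sketch: a mismatched rank or world size is easier to diagnose here than as a stalled connection later.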
General protocol: | ||
|
||
1. The federator selects clients. | ||
2. The selected clients download the model. | ||
3. The clients train locally for X epochs. | ||
4. The weights/gradients of the trained model are sent to the federator. | ||
5. The federator aggregates the weights/gradients to create a new and improved model. | ||
6. The updated model is shared with the clients. | ||
7. Repeat steps 1 to 6 until convergence. | ||
|
||
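The protocol above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not FLTK's implementation: the README does not name the selection or aggregation algorithm, so uniform random selection and an element-wise mean are assumed here, and `run_round`, `average_weights`, and `toy_local_train` are hypothetical names. Weights are plain lists of floats to keep the example dependency-free.

```python
import random
from typing import Callable, List

Weights = List[float]


def average_weights(updates: List[Weights]) -> Weights:
    """Element-wise mean of the client weight vectors (assumed aggregation rule)."""
    n = len(updates)
    return [sum(column) / n for column in zip(*updates)]


def run_round(
    global_weights: Weights,
    client_ids: List[int],
    clients_per_round: int,
    local_train: Callable[[int, Weights], Weights],
    rng: random.Random,
) -> Weights:
    # Client selection by the federator (uniformly at random in this sketch).
    selected = rng.sample(client_ids, clients_per_round)
    # Each selected client gets a copy of the model, trains locally,
    # and returns its updated weights to the federator.
    updates = [local_train(cid, list(global_weights)) for cid in selected]
    # The federator aggregates the updates into the next global model.
    return average_weights(updates)


def toy_local_train(client_id: int, weights: Weights) -> Weights:
    """Hypothetical stand-in for local training: move halfway toward the client id."""
    return [w + 0.5 * (client_id - w) for w in weights]


rng = random.Random(0)
weights = [0.0, 0.0]
# A fixed number of rounds stands in for a real convergence check.
for _ in range(3):
    weights = run_round(
        weights,
        client_ids=[1, 2, 3],
        clients_per_round=3,
        local_train=toy_local_train,
        rng=rng,
    )
print(weights)
```

Passing `local_train` as a callable keeps the round logic independent of the model and dataset, which makes it straightforward to swap in a different selection or aggregation rule.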
Important notes: | ||
|
||
* Clients do not share data with each other | ||
* The data is non-IID | ||
* Hardware can be heterogeneous | ||
* The location of devices matters (network latency and bandwidth) | ||
* Communication can be costly | ||
|
||
## Project structure | ||
Structure with important folders and files explained: | ||
``` | ||
project | ||
├── configs | ||
│ └── experiment.yaml # Example of an experiment configuration | ||
├── deploy # Templates for automatic deployment | ||
│ └── templates | ||
│ ├── client_stub_default.yml | ||
│ ├── client_stub_medium.yml | ||
│ ├── client_stub_slow.yml | ||
│ └── system_stub.yml # Describes the federator and the network | ||
├── fltk # Source code | ||
│ ├── datasets # Different dataset definitions | ||
│ │ ├── data_distribution # Datasets with distributed sampler | ||
│ │ └── distributed # "regular" datasets for centralized use | ||
│ ├── nets # Available networks | ||
│ ├── schedulers # Learning Rate Schedulers | ||
│ ├── strategy # Client selection and model aggregation algorithms | ||
│ └── util | ||
│ └── generate_docker_compose.py # Generates a docker-compose.yml for a containerized run | ||
├── Dockerfile # Dockerfile to run in containers | ||
├── LICENSE | ||
├── README.md | ||
└── setup.py | ||
``` | ||
|
||
## Models | ||
|
||
* Cifar10-CNN | ||
* Cifar10-ResNet | ||
* Cifar100-ResNet | ||
* Cifar100-VGG | ||
* Fashion-MNIST-CNN | ||
* Fashion-MNIST-ResNet | ||
* Reddit-LSTM | ||
|
||
## Datasets | ||
|
||
* Cifar10 | ||
* Cifar100 | ||
* Fashion-MNIST | ||
|
||
## Prerequisites | ||
|
||
When running in docker containers the following dependencies need to be installed: | ||
|
||
* Docker | ||
* Docker-compose | ||
|
||
## Install | ||
```bash | ||
python3 setup.py install | ||
``` | ||
|
||
## Examples | ||
<details><summary>Show Examples</summary> | ||
|
||
<p> | ||
|
||
### Single machine (Native) | ||
|
||
#### Launch single client | ||
Launch Federator | ||
```bash | ||
python3 -m fltk single configs/experiment.yaml --rank=0 | ||
``` | ||
Launch Client | ||
```bash | ||
python3 -m fltk single configs/experiment.yaml --rank=1 | ||
``` | ||
|
||
#### Spawn FL system | ||
```bash | ||
python3 -m fltk spawn configs/experiment.yaml | ||
``` | ||
|
||
### Two machines (Native) | ||
To start a cross-machine FL system you have to configure the network interface that connects your machine to the network. | ||
For example, if your machine is connected via a Wi-Fi interface named `wlo1`, configure it as shown below: | ||
```python | ||
import os | ||
os.environ['GLOO_SOCKET_IFNAME'] = 'wlo1' | ||
os.environ['TP_SOCKET_IFNAME'] = 'wlo1' | ||
``` | ||
Use `ifconfig` to find the name of the network interface on your machine. | ||
|
||
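As a small sketch, the two variables can be set together from a single interface name, for example the `nic` value in `configs/experiment.yaml`. The helper `configure_network_interface` is hypothetical and not part of FLTK. It assumes the variables are set before PyTorch Distributed is initialized, since they are expected to be read at initialization time.

```python
import os


def configure_network_interface(nic: str) -> None:
    """Point both the Gloo and TensorPipe backends at the same interface."""
    if not nic:
        raise ValueError("network interface name must not be empty")
    os.environ["GLOO_SOCKET_IFNAME"] = nic
    os.environ["TP_SOCKET_IFNAME"] = nic


# 'wlo1' is the example interface name used in this README.
configure_network_interface("wlo1")
print(os.environ["GLOO_SOCKET_IFNAME"], os.environ["TP_SOCKET_IFNAME"])
```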
### Docker Compose | ||
1. Make sure docker and docker-compose are installed. | ||
2. Generate a `docker-compose.yml` file for your experiment. You can use the script `generate_docker_compose.py` for this. | ||
From the root folder, run `python3 fltk/util/generate_docker_compose.py 4` to generate a system with 4 clients. | ||
Feel free to change/extend `generate_docker_compose.py` for your own needs. | ||
A `docker-compose.yml` file is created in the root folder. | ||
3. Run docker-compose to start the system: | ||
```bash | ||
docker-compose up | ||
``` | ||
### Google Cloud Platform | ||
TBD | ||
|
||
</p> | ||
</details> | ||
|
||
## Known issues | ||
|
||
* Currently, there is no GPU support in Docker containers (or Docker Compose) | ||
* The first epoch can be slow (6x - 8x slower) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
--- | ||
# Experiment configuration | ||
total_epochs: 5 | ||
epochs_per_cycle: 1 | ||
wait_for_clients: true | ||
net: Cifar10CNN | ||
dataset: cifar10 | ||
# Use CUDA if available; setting to false will force CPU | ||
cuda: true | ||
experiment_prefix: 'experiment_sample' | ||
output_location: 'output' | ||
tensor_board_active: true | ||
clients_per_round: 1 | ||
system: | ||
federator: | ||
hostname: '131.180.40.72' | ||
nic: 'wlo1' | ||
clients: | ||
amount: 1 |
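A hedged sketch of how a configuration like this could be checked before a run. Everything here is illustrative: `ExperimentConfig` and `from_mapping` are hypothetical names, the validation rules are assumptions rather than FLTK behavior, and the dictionary stands in for the parsed YAML so the example needs no YAML library. The derived `world_size` follows the README convention of one federator (rank 0) plus the clients.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    total_epochs: int
    epochs_per_cycle: int
    clients_per_round: int
    num_clients: int

    @property
    def world_size(self) -> int:
        # One federator (rank 0) plus the clients (ranks 1..num_clients).
        return self.num_clients + 1

    def validate(self) -> None:
        # Assumed sanity checks; the real toolkit may enforce different rules.
        if not 1 <= self.clients_per_round <= self.num_clients:
            raise ValueError("expected 1 <= clients_per_round <= number of clients")
        if not 1 <= self.epochs_per_cycle <= self.total_epochs:
            raise ValueError("expected 1 <= epochs_per_cycle <= total_epochs")


def from_mapping(raw: dict) -> ExperimentConfig:
    """Build a config from a mapping shaped like the YAML file above."""
    return ExperimentConfig(
        total_epochs=raw["total_epochs"],
        epochs_per_cycle=raw["epochs_per_cycle"],
        clients_per_round=raw["clients_per_round"],
        num_clients=raw["system"]["clients"]["amount"],
    )


# Values copied from the example configuration above.
raw = {
    "total_epochs": 5,
    "epochs_per_cycle": 1,
    "clients_per_round": 1,
    "system": {"clients": {"amount": 1}},
}
config = from_mapping(raw)
config.validate()
print(config.world_size)
```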