Commit b1deb65

Md files linting fixes (#245)
1 parent 21c7317 commit b1deb65

15 files changed: +363 −213 lines

README.md

Lines changed: 13 additions & 15 deletions
@@ -2,22 +2,22 @@
Colearn is a library that enables privacy-preserving decentralized machine learning tasks on the [FET network](https://fetch.ai/).

This blockchain-mediated collective learning system enables multiple stakeholders to build a shared
machine learning model without needing to rely on a central authority.
This library is currently in development.

The collective learning protocol allows learners to collaborate on training a model without requiring trust between the participants. Learners vote on updates to the model, and only updates which pass the quality threshold are accepted. This makes the system robust to attempts to interfere with the model by providing bad updates. For more details on the collective learning system see [here](https://fetchai.github.io/colearn/about/).

## Current Version

We have released *v0.2.8* of the Colearn Machine Learning Interface, the first version of an interface that will
allow developers to prepare for future releases.
Together with the interface we provide a simple backend for local experiments. This is the first backend; blockchain-ledger-based backends will follow.
Future releases will use similar interfaces so that learners built with the current system will work on a different backend that integrates a distributed ledger and provides other improvements.
The current framework will then be used mainly for model development and debugging.
We invite all users to experiment with the framework, develop their own models, and provide feedback!

See the most up-to-date documentation at [fetchai.github.io/colearn](https://fetchai.github.io/colearn/)
or the documentation for the latest release at [docs.fetch.ai/colearn](https://docs.fetch.ai/colearn/).
## Installation
@@ -27,9 +27,11 @@ Currently we only support macos and unix systems.
To use the latest stable release we recommend installing the [package from PyPI](https://pypi.org/project/colearn/).

To install with support for Keras and Pytorch:

```bash
pip install colearn[all]
```

To install with just support for Keras or Pytorch:
@@ -40,22 +42,18 @@ To install with just support for Keras or Pytorch:
## Running the examples

Examples are available in the colearn_examples module. To run the Mnist demo in Keras or Pytorch run:

```bash
python -m colearn_examples.ml_interface.keras_mnist
python -m colearn_examples.ml_interface.pytorch_mnist
```

Alternatively, the examples can be accessed from colearn/colearn_examples by cloning the colearn repo.

Please note that although all the examples are always available, which ones you can run will depend on your installation.
If you installed only `colearn[keras]` or `colearn[pytorch]` then only the respective examples will work.

For more instructions see the documentation at [fetchai.github.io/colearn/installation](https://fetchai.github.io/colearn/installation/).

After installation we recommend [running a demo](https://fetchai.github.io/colearn/demo/),
or seeing [the examples](https://fetchai.github.io/colearn/examples/).

docs/about.md

Lines changed: 32 additions & 29 deletions
@@ -1,16 +1,17 @@
# How collective learning works

A Colearn experiment begins when a group of entities, referred to as *learners*, decide on a model architecture and
begin learning. Together they will train a single global model. The goal is to train a model that performs better
than any of the learners can produce by training on their private data set.

## How Training Works

Training occurs in rounds; during each round the learners attempt to improve the performance of the global shared
model.
To do so, each round an **update** of the global model (for example, a new set of weights in a neural network) is proposed.
The learners then **validate** the update and decide if the new model is better than the current global model.
If enough learners *approve* the update then the global model is updated. After an update is approved or rejected a
new round begins.

The detailed steps of a round updating a global model *M* are as follows:

@@ -19,33 +20,33 @@ The detailed steps of a round updating a global model *M* are as follows:
    - If *M'* has better performance than *M* against their private data set then the learner votes to approve
    - If not, the learner votes to reject
3. The total votes are tallied
    - If more than some threshold (typically 50%) of learners approve then *M'* becomes the new global model. If not,
      *M* continues to be the global model
4. A new round begins.
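
As a rough illustration, the round logic in these steps can be sketched in a few lines of Python. This is a simplified stand-in, not colearn's implementation; `Learner` and its `propose`/`vote`/`accept` methods are hypothetical:

```python
import random
from typing import List

def run_round(learners: List["Learner"], threshold: float = 0.5) -> bool:
    proposer = random.choice(learners)            # one learner is selected
    update = proposer.propose()                   # step 1: propose M'
    votes = [l.vote(update) for l in learners]    # step 2: each learner validates
    approved = sum(votes) / len(votes) > threshold  # step 3: tally the votes
    if approved:
        for l in learners:                        # M' becomes the new global model
            l.accept(update)
    return approved                               # step 4: a new round begins
```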

By using a decentralized ledger (a blockchain) this learning process can be run in a completely decentralized,
secure and auditable way. Further security can be provided by using
[differential privacy](https://en.wikipedia.org/wiki/Differential_privacy) to avoid exposing your private data
set when generating an update.

## Learning algorithms that work for collective learning

Collective learning is not just for neural networks; any learning algorithm that can be trained on subsets of the
data and which can use the results of previous training rounds as the basis for subsequent rounds can be used.
Neural networks fit both these constraints: training can be done on mini-batches of data and each training step uses
the weights of the previous training step as its starting point.
More generally, any model that is trained using mini-batch stochastic gradient descent is fine.
Other algorithms can be made to work with collective learning as well.
For example, a random forest can be trained iteratively by having each learner add new trees, as sketched below
(see example in [mli_random_forest_iris.py]({{ repo_root }}/examples/mli_random_forest_iris.py)).
For more discussion, see [here](./intro_tutorial_mli.md).
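
A hedged sketch of that iterative growth — this assumes scikit-learn's `warm_start` mechanism and is not the repo's mli_random_forest_iris.py:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=10, warm_start=True)
forest.fit(X, y)  # initial shared model: 10 trees

# Each round, the proposing learner grows the shared forest by a few
# trees fitted on its own data; warm_start keeps the existing trees.
for _ in range(3):
    forest.n_estimators += 5
    forest.fit(X, y)  # trains only the 5 new trees
```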

## The driver

The driver implements the voting protocol, so it handles selecting a learner to train,
sending the update out for voting, calculating the vote and accepting or declining the update.
Here we have a very minimal driver that doesn't use networking or a blockchain. Eventually the driver will be a
smart contract.
This is the code that implements one round of voting:

```python
def run_one_round(round_index: int, learners: Sequence[MachineLearningInterface]
    ...
    return prop_weights_list, vote
```

The driver has a list of learners, and each round it selects one learner to be the proposer.
The proposer does some training and proposes an updated set of weights.
The driver then sends the proposed weights to each of the learners, and they each vote on whether this is
an improvement.
If the number of approving votes is greater than the vote threshold the proposed weights are accepted, and if not
they're rejected.

## The Machine Learning Interface

```Python
{!../colearn/ml_interface.py!}
```

There are four methods that need to be implemented (a toy implementation is sketched after this list):

1. `propose_weights` causes the model to do some training and then return a
   new set of weights that are proposed to the other learners.
   This method shouldn't change the current weights of the model - that
   only happens when `accept_weights` is called.
2. `test_weights` - the model takes some new weights and returns a vote on whether the new weights are an improvement.
   As with `propose_weights`, this shouldn't change the current weights of the model -
   that only happens when `accept_weights` is called.
3. `accept_weights` - the model accepts some weights that have been voted on and approved by the set of learners.
   The old weights of the model are discarded and replaced by the new weights.
4. `current_weights` should return the current weights of the model.
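
A minimal toy implementation of these four methods, as a hedged sketch only — the real interface lives in `colearn/ml_interface.py`, and this standalone class and its `Weights` holder are hypothetical stand-ins:

```python
from dataclasses import dataclass
from typing import Any, List

@dataclass
class Weights:  # hypothetical stand-in for colearn's weights container
    weights: Any

class MeanLearner:
    """Toy 'model' whose single parameter is the mean of its private data."""

    def __init__(self, data: List[float]):
        self.data = data
        self._current = Weights(weights=0.0)

    def propose_weights(self) -> Weights:
        # Train, but don't touch the current weights
        return Weights(weights=sum(self.data) / len(self.data))

    def test_weights(self, weights: Weights) -> bool:
        # Vote: do the new weights fit our private data better?
        def error(w: float) -> float:
            return sum((x - w) ** 2 for x in self.data)
        return error(weights.weights) < error(self._current.weights)

    def accept_weights(self, weights: Weights) -> None:
        # Discard the old weights and replace them with the approved ones
        self._current = weights

    def current_weights(self) -> Weights:
        return self._current
```

Here the vote is simply "does the proposed value fit my private data better than the current one?", mirroring the validate-then-vote step above.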

docs/demo.md

Lines changed: 18 additions & 8 deletions
@@ -1,27 +1,29 @@
# How to run the demo

You can try collective learning for yourself using the simple demo in [run_demo]({{ repo_root }}/colearn_examples/ml_interface/run_demo.py).
This demo creates n learners for one of six learning tasks and co-ordinates the collective learning between them.

There are six potential models for the demo:

* KERAS_MNIST is the Tensorflow implementation of a small model for the standard handwritten digits recognition dataset
* KERAS_MNIST_RESNET is the Tensorflow implementation of a Resnet model for the standard handwritten digits recognition dataset
* KERAS_CIFAR10 is the Tensorflow implementation of the classical image recognition dataset
* PYTORCH_XRAY is a Pytorch implementation of a binary classification task that requires predicting pneumonia from images of chest X-rays.
  The data need to be downloaded from [Kaggle](https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia)
* PYTORCH_COVID_XRAY is a Pytorch implementation of a 3-class classification task that requires predicting no finding, covid or pneumonia from images of chest X-rays.
  This dataset is not currently publicly available.
* FRAUD is a task on the fraud dataset, which consists of information about credit card transactions; the task is to predict whether
  transactions are fraudulent or not.
  The data need to be downloaded from [Kaggle](https://www.kaggle.com/c/ieee-fraud-detection)

Use the -h flag to see the options:

```bash
python -m colearn_examples.ml_interface.run_demo -h
```

Arguments to run the demo:

```
--data_dir: Directory containing training data, not required for MNIST and CIFAR10
--test_dir: Optional directory containing test data. A fraction of the training set will be used as a test set when not specified
@@ -36,32 +38,40 @@ Arguments to run the demo:
```

## Running MNIST

The simplest task to run is MNIST because the data are downloaded automatically from `tensorflow_datasets`.
The command below runs the MNIST task with five learners for 15 rounds.

```bash
python -m colearn_examples.ml_interface.run_demo --model KERAS_MNIST --n_learners 5 --n_rounds 15
```

You should see a graph of the vote score and the test score (the score used here is categorical accuracy).
The new model is accepted if the fraction of positive votes (shown in green) is higher than 0.5,
and rejected if it is not (negative votes are shown in red).

![Alt text](images/mnist_plot.png?raw=true "Collective learning graph")

As you can see, there are five learners, and initially they perform poorly.
In round one, learner 0 is selected to propose a new set of weights.

## Other datasets

To run the CIFAR10 dataset:

```bash
python -m colearn_examples.ml_interface.run_demo --model KERAS_CIFAR10 --n_learners 5 --n_rounds 15
```

The Fraud and X-ray datasets need to be downloaded from Kaggle (this requires a Kaggle account).
To run the fraud dataset:

```bash
python -m colearn_examples.ml_interface.run_demo --model FRAUD --n_learners 5 --n_rounds 15 --data_dir ./data/fraud
```

To run the X-ray dataset:

```bash
python -m colearn_examples.ml_interface.run_demo --model PYTORCH_XRAY --n_learners 5 --n_rounds 15 --data_dir ./data/xray
```

docs/dev_notes.md

Lines changed: 3 additions & 2 deletions
@@ -3,17 +3,18 @@
These are some notes for developers working on the colearn code repo.

## Google Cloud Storage

To have access to Google Cloud Storage you need to set up your Google authentication and
have `$GOOGLE_APPLICATION_CREDENTIALS` set up correctly.
For more details ask or see the contract-learn documentation.

## Build image

To build the ML server image and push it to Google Cloud use the following commands:

```bash
cd docker
python3 ./build.py --publish --allow_dirty
# Check this worked correctly
docker images
```

docs/differential_privacy.md

Lines changed: 13 additions & 10 deletions
@@ -1,20 +1,22 @@
# What is differential privacy?

To make a machine learning system that protects privacy we first need to have a definition of what privacy is.
Differential privacy (DP) is one such definition.
First we need to have three concepts: the _database_ is a collection of data about _individuals_ (for example, their medical records), and we want to make a _query_ about that data (for example, "How much does smoking increase someone's risk of cancer?").
DP says that privacy is preserved if the result of the query cannot be used to determine if any particular individual is present in the database.

So if person A has their medical data in a database, and the query that we want to make on that database is
"How much does smoking increase someone's risk of cancer?" then the result of that query shouldn't disclose whether or not person A's details are in the database.

From this comes the idea of _sensitivity_ of a query.
The _sensitivity_ of a query determines how much the result of the query depends on an individual's data.
For example, the query "How much does smoking increase the risk of cancer for adults in the UK?" is less sensitive than the query "How much does smoking increase the risk of cancer for men aged 50-55 in Cambridge?" because the second query uses a smaller set of individuals.

## Epsilon-differential privacy

Epsilon-differential privacy (EDP) is a scheme for preserving differential privacy.
In EDP all queries have random noise added to them, so they are no longer deterministic.
So if the query was "What fraction of people in the database are male?", and the true result is 0.5, then the results of calling this query three times might be 0.53, 0.49 and 0.51.
This makes it harder to tell if an individual's data is in the database, because the effect of adding a person can't be distinguished from the effect of the random noise.
Intuitively this is a bit like blurring an image: adding noise obscures personal information.
The amount of personal information that is revealed isn't zero, but it is guaranteed to be below a certain threshold.
@@ -24,14 +26,15 @@ Queries that are more sensitive have more noise added, because they reveal more
It is important to add as little noise as possible, because adding more noise obscures the patterns that you want to extract from the data.
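
As a rough illustration of this sensitivity/noise trade-off (a hedged sketch, not part of colearn), the classic Laplace mechanism answers a numeric query with noise scaled by sensitivity divided by epsilon:

```python
import numpy as np

def private_query(database, query, sensitivity: float, epsilon: float) -> float:
    """Answer a numeric query with Laplace noise (illustration only)."""
    true_answer = query(database)
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_answer + noise

# "What fraction of people in the database are male?"
db = ["m", "f", "m", "f"]
fraction_male = lambda d: sum(1 for x in d if x == "m") / len(d)

# Changing one record moves the fraction by at most 1/len(db), so that
# is the query's sensitivity. A smaller epsilon means more noise.
print(private_query(db, fraction_male, sensitivity=1 / len(db), epsilon=0.5))
```
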
## Differential privacy when training neural networks

Each training step for a neural network can be thought of as a complicated query on a database of training data.
Differential privacy mechanisms tell you how much noise you need to add to guarantee a certain level of privacy.
The `opacus` and `tensorflow-privacy` libraries implement epsilon-differential privacy for training neural networks for pytorch and keras respectively.

# How to use differential privacy with colearn

By using `opacus` and `tensorflow-privacy` we can make collective learning use differential privacy.
The learner that is proposing weights does so using a DP-enabled optimiser.
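
A minimal sketch of what wiring up a DP-enabled optimiser might look like with `opacus` — this assumes the opacus 1.x `PrivacyEngine.make_private` API, and the model, data and hyperparameters here are placeholders rather than colearn's setup:

```python
import torch
from opacus import PrivacyEngine

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data_loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(
        torch.randn(64, 10), torch.randint(0, 2, (64,))
    ),
    batch_size=8,
)

privacy_engine = PrivacyEngine()
model, optimizer, data_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    noise_multiplier=1.0,  # noise added to the clipped per-sample gradients
    max_grad_norm=1.0,     # per-sample gradient clipping bound
)
# Steps taken with this optimizer now add calibrated noise, so the
# proposed weights leak less about any individual training record.
```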

To see an example of using this see [dp_pytorch]({{ repo_root }}/colearn_examples/ml_interface/pytorch_mnist_diffpriv.py)
and [dp_keras]({{ repo_root }}/colearn_examples/ml_interface/keras_mnist_diffpriv.py).
