**README.md**

Colearn is a library that enables privacy-preserving decentralized machine learning tasks on the [FET network](https://fetch.ai/).

This blockchain-mediated collective learning system enables multiple stakeholders to build a shared
machine learning model without needing to rely on a central authority.
This library is currently in development.

The collective learning protocol allows learners to collaborate on training a model without requiring trust between the participants. Learners vote on updates to the model, and only updates which pass the quality threshold are accepted. This makes the system robust to attempts to interfere with the model by providing bad updates. For more details on the collective learning system, see [here](https://fetchai.github.io/colearn/about/).
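
In rough illustrative Python (invented names and a toy scalar "model"; this is a sketch of the round structure described above, not the colearn implementation):

```python
import random

class Learner:
    """Hypothetical learner interface, for illustration only."""

    def propose_update(self, model):
        # Stand-in for a local training step on this learner's private data
        return model + random.uniform(-0.1, 0.1)

    def vote(self, old_model, new_model):
        # Approve only if the proposed model scores better by a local quality check
        return abs(new_model) < abs(old_model)

def collective_learning_round(learners, model, proposer_index):
    proposed = learners[proposer_index].propose_update(model)
    approvals = sum(l.vote(model, proposed) for l in learners)
    # Only updates that pass the quality vote are accepted
    return proposed if approvals > len(learners) / 2 else model
```
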

## Current Version

We have released *v0.2.8* of the Colearn Machine Learning Interface, the first version of an interface that will
allow developers to prepare for future releases.
Together with the interface we provide a simple backend for local experiments. This is the first backend; blockchain-ledger-based backends will follow.
Future releases will use similar interfaces so that learners built with the current system will work on a different backend that integrates a distributed ledger and provides other improvements.
The current framework will then be used mainly for model development and debugging.
We invite all users to experiment with the framework, develop their own models, and provide feedback!

See the most up-to-date documentation at [fetchai.github.io/colearn](https://fetchai.github.io/colearn/)
or the documentation for the latest release at [docs.fetch.ai/colearn](https://docs.fetch.ai/colearn/).

## Installation

Currently we only support macOS and Unix systems.

To use the latest stable release we recommend installing the [package from PyPi](https://pypi.org/project/colearn/).

To install with support for Keras and Pytorch:

```bash
pip install colearn[all]
```

To install with just support for Keras or Pytorch:

```bash
# Extras names below are assumed to mirror the `[all]` extra shown above
pip install colearn[keras]
pip install colearn[pytorch]
```

## Running the examples

Examples are available in the colearn_examples module. To run the Mnist demo in Keras or Pytorch run:
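
(The invocation below is an assumption: it pairs the run_demo script from the demo documentation with a hypothetical `--model` flag, so check the script for its real interface.)

```bash
# Hypothetical invocation; see colearn_examples/ml_interface/run_demo.py for the actual flags
python colearn_examples/ml_interface/run_demo.py --model KERAS_MNIST
```
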

**docs/demo.md**

# How to run the demo

You can try collective learning for yourself using the simple demo in [run_demo]({{ repo_root }}/colearn_examples/ml_interface/run_demo.py).
This demo creates n learners for one of six learning tasks and co-ordinates the collective learning between them.

There are six potential models for the demo:

* KERAS_MNIST is the Tensorflow implementation of a small model for the standard handwritten digits recognition dataset
* KERAS_MNIST_RESNET is the Tensorflow implementation of a Resnet model for the standard handwritten digits recognition dataset
* KERAS_CIFAR10 is the Tensorflow implementation of the classical image recognition dataset
* PYTORCH_XRAY is the Pytorch implementation of a binary classification task that requires predicting pneumonia from images of chest X-rays.
  The data need to be downloaded from [Kaggle](https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia)
* PYTORCH_COVID_XRAY is the Pytorch implementation of a 3-class classification task that requires predicting no finding, covid or pneumonia from images of chest X-rays.
  This dataset is not currently publicly available.
* FRAUD is a classification task on a dataset of credit card transactions; the task is to predict whether transactions are fraudulent or not.
  The data need to be downloaded from [Kaggle](https://www.kaggle.com/c/ieee-fraud-detection)
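
For instance, running the fraud task with five learners might look like the line below; the `--model` and `--n_learners` flags are hypothetical stand-ins for whatever interface run_demo.py actually exposes:

```bash
# Hypothetical flags, for illustration only
python colearn_examples/ml_interface/run_demo.py --model FRAUD --n_learners 5
```
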

---

To make a machine learning system that protects privacy, we first need a definition of what privacy is.
Differential privacy (DP) is one such definition.
First we need three concepts: the _database_ is a collection of data about _individuals_ (for example, their medical records), and we want to make a _query_ about that data (for example "How much does smoking increase someone's risk of cancer?").
DP says that privacy is preserved if the result of the query cannot be used to determine if any particular individual is present in the database.

So if person A has their medical data in a database, and the query that we want to make on that database is
"How much does smoking increase someone's risk of cancer" then the result of that query shouldn't disclose whether or not person A's details are in the database.

From this comes the idea of _sensitivity_ of a query.
The _sensitivity_ of a query determines how much the result of the query depends on an individual's data.
For example, the query "How much does smoking increase the risk of cancer for adults in the UK?" is less sensitive than the query "How much does smoking increase the risk of cancer for men aged 50-55 in Cambridge?" because the second query uses a smaller set of individuals.

## Epsilon-differential privacy

EDP is a scheme for preserving differential privacy.
In EDP all queries have random noise added to them, so they are no longer deterministic.
So if the query was "What fraction of people in the database are male", and the true result is 0.5, then the results of calling this query three times might be 0.53, 0.49 and 0.51.
This makes it harder to tell if an individual's data is in the database, because the effect of adding a person can't be distinguished from the effect of the random noise.
Intuitively this is a bit like blurring an image: adding noise obscures personal information.
The amount of personal information that is revealed isn't zero, but it is guaranteed to be below a certain threshold.
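
That threshold can be stated precisely. As a reference (the standard definition, not text from these docs), a randomised mechanism M is epsilon-differentially private if, for every pair of databases D and D' differing in a single individual and every set S of possible outputs,

$$\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D') \in S]$$

Smaller values of epsilon force the two output distributions closer together, which means stronger privacy and, in practice, more noise.
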
Queries that are more sensitive have more noise added, because they reveal more information about individuals.
It is important to add as little noise as possible, because adding more noise obscures the patterns that you want to extract from the data.
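
As a toy illustration of this trade-off (invented code and data, not part of colearn), the "fraction male" query above can be released with epsilon-DP using the Laplace mechanism, where the noise scale is the query's sensitivity divided by epsilon:

```python
import numpy as np

def noisy_fraction_male(database, epsilon):
    """Release the fraction of males with epsilon-DP via the Laplace mechanism."""
    true_result = sum(1 for person in database if person["sex"] == "male") / len(database)
    sensitivity = 1.0 / len(database)  # one record moves the fraction by at most 1/n
    # Laplace noise with scale sensitivity/epsilon makes this query epsilon-DP
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_result + noise

db = [{"sex": "male" if i % 2 == 0 else "female"} for i in range(100)]
print([round(noisy_fraction_male(db, epsilon=0.5), 2) for _ in range(3)])
# e.g. [0.53, 0.49, 0.51] -- close to the true 0.5, but never exact
```
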

## Differential privacy when training neural networks

Each training step for a neural network can be thought of as a complicated query on a database of training data.
Differential privacy mechanisms tell you how much noise you need to add to guarantee a certain level of privacy.
The `opacus` and `tensorflow-privacy` libraries implement epsilon-differential privacy for training neural networks in Pytorch and Keras respectively.
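
For instance, wrapping a Pytorch model with `opacus` might look like the sketch below (assuming the opacus 1.x `PrivacyEngine.make_private` API; the model, data and hyperparameters are invented):

```python
import torch
from opacus import PrivacyEngine

# Toy data and model, for illustration only
train_dataset = torch.utils.data.TensorDataset(
    torch.randn(256, 28 * 28), torch.randint(0, 10, (256,))
)
model = torch.nn.Linear(28 * 28, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
data_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64)

privacy_engine = PrivacyEngine()
model, optimizer, data_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    noise_multiplier=1.0,  # scale of the noise added at each step
    max_grad_norm=1.0,     # per-sample gradient clipping bound
)
# Training then proceeds as usual: each optimiser step clips and noises gradients
```
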
# How to use differential privacy with colearn

By using `opacus` and `tensorflow-privacy` we can make collective learning use differential privacy.
The learner that is proposing weights does so using a DP-enabled optimiser.
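
On the Keras side, a DP-enabled optimiser from `tensorflow-privacy` might be built as in the sketch below (illustrative rather than colearn's actual wiring; hyperparameters are invented):

```python
import tensorflow as tf
from tensorflow_privacy.privacy.optimizers.dp_optimizer_keras import DPKerasSGDOptimizer

optimizer = DPKerasSGDOptimizer(
    l2_norm_clip=1.0,      # clip each per-example gradient to this L2 norm
    noise_multiplier=1.1,  # noise scale relative to the clipping norm
    num_microbatches=32,   # must evenly divide the batch size
    learning_rate=0.15,
)
# The loss must stay per-example so gradients can be clipped individually
loss = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction=tf.keras.losses.Reduction.NONE
)
model = tf.keras.Sequential([tf.keras.layers.Dense(10, input_shape=(784,))])
model.compile(optimizer=optimizer, loss=loss, metrics=["accuracy"])
```
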
To see an example of using this, see [dp_pytorch]({{ repo_root }}/colearn_examples/ml_interface/pytorch_mnist_diffpriv.py)
and [dp_keras]({{ repo_root }}/colearn_examples/ml_interface/keras_mnist_diffpriv.py).