Commit b1deb65

Md files linting fixes (#245)
1 parent 21c7317 commit b1deb65

15 files changed: +363 −213 lines

README.md

Lines changed: 13 additions & 15 deletions
@@ -2,22 +2,22 @@
Colearn is a library that enables privacy-preserving decentralized machine learning tasks on the [FET network](https://fetch.ai/).

This blockchain-mediated collective learning system enables multiple stakeholders to build a shared
machine learning model without needing to rely on a central authority.
This library is currently in development.

The collective learning protocol allows learners to collaborate on training a model without requiring trust between the participants. Learners vote on updates to the model, and only updates which pass the quality threshold are accepted. This makes the system robust to attempts to interfere with the model by providing bad updates. For more details on the collective learning system see [here](https://fetchai.github.io/colearn/about/).

## Current Version

We have released *v0.2.8* of the Colearn Machine Learning Interface, the first version of an interface that will
allow developers to prepare for future releases.
Together with the interface we provide a simple backend for local experiments. This is the first backend; blockchain-ledger-based backends will follow.
Future releases will use similar interfaces so that learners built with the current system will work on a different backend that integrates a distributed ledger and provides other improvements.
The current framework will then be used mainly for model development and debugging.
We invite all users to experiment with the framework, develop their own models, and provide feedback!

See the most up-to-date documentation at [fetchai.github.io/colearn](https://fetchai.github.io/colearn/)
or the documentation for the latest release at [docs.fetch.ai/colearn](https://docs.fetch.ai/colearn/).
## Installation
@@ -27,9 +27,11 @@ Currently we only support macos and unix systems.
To use the latest stable release we recommend installing the [package from PyPI](https://pypi.org/project/colearn/).

To install with support for Keras and Pytorch:

```bash
pip install colearn[all]
```

To install with just support for Keras or Pytorch:
@@ -40,22 +42,18 @@ To install with just support for Keras or Pytorch:
## Running the examples

Examples are available in the colearn_examples module. To run the Mnist demo in Keras or Pytorch run:

```bash
python -m colearn_examples.ml_interface.keras_mnist
python -m colearn_examples.ml_interface.pytorch_mnist
```

Alternatively, the examples can be accessed from colearn/colearn_examples by cloning the colearn repo.

Please note that although all the examples are always available, which ones you can run will depend on your installation.
If you installed only `colearn[keras]` or `colearn[pytorch]` then only the respective examples will work.

For more instructions see the documentation at [fetchai.github.io/colearn/installation](https://fetchai.github.io/colearn/installation/).

After installation we recommend [running a demo](https://fetchai.github.io/colearn/demo/),
or seeing [the examples](https://fetchai.github.io/colearn/examples/).

docs/about.md

Lines changed: 32 additions & 29 deletions
@@ -1,16 +1,17 @@
# How collective learning works

A Colearn experiment begins when a group of entities, referred to as *learners*, decide on a model architecture and
begin learning. Together they will train a single global model. The goal is to train a model that performs better
than any of the learners can produce by training on their private data set.

## How Training Works

Training occurs in rounds; during each round the learners attempt to improve the performance of the global shared
model.
To do so, each round an **update** of the global model (for example, a new set of weights in a neural network) is proposed.
The learners then **validate** the update and decide if the new model is better than the current global model.
If enough learners *approve* the update then the global model is updated. After an update is approved or rejected a
new round begins.

The detailed steps of a round updating a global model *M* are as follows:

@@ -19,33 +20,33 @@ The detailed steps of a round updating a global model *M* are as follows:
    - If *M'* has better performance than *M* against their private data set then the learner votes to approve
    - If not, the learner votes to reject
3. The total votes are tallied
    - If more than some threshold (typically 50%) of learners approve then *M'* becomes the new global model. If not,
      *M* continues to be the global model
4. A new round begins.
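
As a rough illustration, the round logic in these steps can be sketched in a few lines of Python. This is a simplified stand-in, not colearn's implementation; `Learner` and its `propose`/`vote`/`accept` methods are hypothetical:

```python
import random
from typing import List

def run_round(learners: List["Learner"], threshold: float = 0.5) -> bool:
    proposer = random.choice(learners)            # one learner is selected
    update = proposer.propose()                   # step 1: propose M'
    votes = [l.vote(update) for l in learners]    # step 2: each learner validates
    approved = sum(votes) / len(votes) > threshold  # step 3: tally the votes
    if approved:
        for l in learners:                        # M' becomes the new global model
            l.accept(update)
    return approved                               # step 4: a new round begins
```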

By using a decentralized ledger (a blockchain) this learning process can be run in a completely decentralized,
secure and auditable way. Further security can be provided by using
[differential privacy](https://en.wikipedia.org/wiki/Differential_privacy) to avoid exposing your private data
set when generating an update.

## Learning algorithms that work for collective learning

Collective learning is not just for neural networks; any learning algorithm that can be trained on subsets of the
data and which can use the results of previous training rounds as the basis for subsequent rounds can be used.
Neural networks fit both these constraints: training can be done on mini-batches of data and each training step uses
the weights of the previous training step as its starting point.
More generally, any model that is trained using mini-batch stochastic gradient descent is fine.
Other algorithms can be made to work with collective learning as well.
For example, a random forest can be trained iteratively by having each learner add new trees, as sketched below
(see example in [mli_random_forest_iris.py]({{ repo_root }}/examples/mli_random_forest_iris.py)).
For more discussion, see [here](./intro_tutorial_mli.md).
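
A hedged sketch of that iterative growth — this assumes scikit-learn's `warm_start` mechanism and is not the repo's mli_random_forest_iris.py:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=10, warm_start=True)
forest.fit(X, y)  # initial shared model: 10 trees

# Each round, the proposing learner grows the shared forest by a few
# trees fitted on its own data; warm_start keeps the existing trees.
for _ in range(3):
    forest.n_estimators += 5
    forest.fit(X, y)  # trains only the 5 new trees
```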

## The driver

The driver implements the voting protocol, so it handles selecting a learner to train,
sending the update out for voting, calculating the vote and accepting or declining the update.
Here we have a very minimal driver that doesn't use networking or a blockchain. Eventually the driver will be a
smart contract.
This is the code that implements one round of voting:

```python
def run_one_round(round_index: int, learners: Sequence[MachineLearningInterface]
    ...
    return prop_weights_list, vote
```

The driver has a list of learners, and each round it selects one learner to be the proposer.
The proposer does some training and proposes an updated set of weights.
The driver then sends the proposed weights to each of the learners, and they each vote on whether this is
an improvement.
If the number of approving votes is greater than the vote threshold the proposed weights are accepted, and if not
they're rejected.

## The Machine Learning Interface

```Python
{!../colearn/ml_interface.py!}
```

There are four methods that need to be implemented (a toy implementation is sketched after this list):

1. `propose_weights` causes the model to do some training and then return a
   new set of weights that are proposed to the other learners.
   This method shouldn't change the current weights of the model - that
   only happens when `accept_weights` is called.
2. `test_weights` - the model takes some new weights and returns a vote on whether the new weights are an improvement.
   As with `propose_weights`, this shouldn't change the current weights of the model -
   that only happens when `accept_weights` is called.
3. `accept_weights` - the model accepts some weights that have been voted on and approved by the set of learners.
   The old weights of the model are discarded and replaced by the new weights.
4. `current_weights` should return the current weights of the model.
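
A minimal toy implementation of these four methods, as a hedged sketch only — the real interface lives in `colearn/ml_interface.py`, and this standalone class and its `Weights` holder are hypothetical stand-ins:

```python
from dataclasses import dataclass
from typing import Any, List

@dataclass
class Weights:  # hypothetical stand-in for colearn's weights container
    weights: Any

class MeanLearner:
    """Toy 'model' whose single parameter is the mean of its private data."""

    def __init__(self, data: List[float]):
        self.data = data
        self._current = Weights(weights=0.0)

    def propose_weights(self) -> Weights:
        # Train, but don't touch the current weights
        return Weights(weights=sum(self.data) / len(self.data))

    def test_weights(self, weights: Weights) -> bool:
        # Vote: do the new weights fit our private data better?
        def error(w: float) -> float:
            return sum((x - w) ** 2 for x in self.data)
        return error(weights.weights) < error(self._current.weights)

    def accept_weights(self, weights: Weights) -> None:
        # Discard the old weights and replace them with the approved ones
        self._current = weights

    def current_weights(self) -> Weights:
        return self._current
```

Here the vote is simply "does the proposed value fit my private data better than the current one?", mirroring the validate-then-vote step above.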

docs/demo.md

Lines changed: 18 additions & 8 deletions
@@ -1,27 +1,29 @@
# How to run the demo

You can try collective learning for yourself using the simple demo in [run_demo]({{ repo_root }}/colearn_examples/ml_interface/run_demo.py).
This demo creates n learners for one of six learning tasks and co-ordinates the collective learning between them.

There are six potential models for the demo:

* KERAS_MNIST is the Tensorflow implementation of a small model for the standard handwritten digits recognition dataset
* KERAS_MNIST_RESNET is the Tensorflow implementation of a Resnet model for the standard handwritten digits recognition dataset
* KERAS_CIFAR10 is the Tensorflow implementation of the classical image recognition dataset
* PYTORCH_XRAY is a Pytorch implementation of a binary classification task that requires predicting pneumonia from images of chest X-rays.
  The data need to be downloaded from [Kaggle](https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia)
* PYTORCH_COVID_XRAY is a Pytorch implementation of a 3-class classification task that requires predicting no finding, covid or pneumonia from images of chest X-rays.
  This dataset is not currently publicly available.
* FRAUD is a task on the fraud dataset, which consists of information about credit card transactions; the task is to predict whether
  transactions are fraudulent or not.
  The data need to be downloaded from [Kaggle](https://www.kaggle.com/c/ieee-fraud-detection)

Use the -h flag to see the options:

```bash
python -m colearn_examples.ml_interface.run_demo -h
```

Arguments to run the demo:

```
--data_dir: Directory containing training data, not required for MNIST and CIFAR10
--test_dir: Optional directory containing test data. A fraction of the training set will be used as a test set when not specified
@@ -36,32 +38,40 @@ Arguments to run the demo:
```

## Running MNIST

The simplest task to run is MNIST because the data are downloaded automatically from `tensorflow_datasets`.
The command below runs the MNIST task with five learners for 15 rounds.

```bash
python -m colearn_examples.ml_interface.run_demo --model KERAS_MNIST --n_learners 5 --n_rounds 15
```

You should see a graph of the vote score and the test score (the score used here is categorical accuracy).
The new model is accepted if the fraction of positive votes (shown in green) is higher than 0.5,
and rejected if it is not (negative votes are shown in red).

![Alt text](images/mnist_plot.png?raw=true "Collective learning graph")

As you can see, there are five learners, and initially they perform poorly.
In round one, learner 0 is selected to propose a new set of weights.

## Other datasets

To run the CIFAR10 dataset:

```bash
python -m colearn_examples.ml_interface.run_demo --model KERAS_CIFAR10 --n_learners 5 --n_rounds 15
```

The Fraud and X-ray datasets need to be downloaded from Kaggle (this requires a Kaggle account).
To run the fraud dataset:

```bash
python -m colearn_examples.ml_interface.run_demo --model FRAUD --n_learners 5 --n_rounds 15 --data_dir ./data/fraud
```

To run the X-ray dataset:

```bash
python -m colearn_examples.ml_interface.run_demo --model PYTORCH_XRAY --n_learners 5 --n_rounds 15 --data_dir ./data/xray
```

docs/dev_notes.md

Lines changed: 3 additions & 2 deletions
@@ -3,17 +3,18 @@
These are some notes for developers working on the colearn code repo.

## Google Cloud Storage

To have access to Google Cloud Storage you need to set up your Google authentication and
have `$GOOGLE_APPLICATION_CREDENTIALS` set up correctly.
For more details ask or see the contract-learn documentation.

## Build image

To build the ML server image and push it to Google Cloud use the following commands:

```bash
cd docker
python3 ./build.py --publish --allow_dirty
# Check this worked correctly
docker images
```

docs/differential_privacy.md

Lines changed: 13 additions & 10 deletions
@@ -1,20 +1,22 @@
# What is differential privacy?

To make a machine learning system that protects privacy we first need to have a definition of what privacy is.
Differential privacy (DP) is one such definition.
First we need to have three concepts: the _database_ is a collection of data about _individuals_ (for example, their medical records), and we want to make a _query_ about that data (for example, "How much does smoking increase someone's risk of cancer?").
DP says that privacy is preserved if the result of the query cannot be used to determine if any particular individual is present in the database.

So if person A has their medical data in a database, and the query that we want to make on that database is
"How much does smoking increase someone's risk of cancer?" then the result of that query shouldn't disclose whether or not person A's details are in the database.

From this comes the idea of _sensitivity_ of a query.
The _sensitivity_ of a query determines how much the result of the query depends on an individual's data.
For example, the query "How much does smoking increase the risk of cancer for adults in the UK?" is less sensitive than the query "How much does smoking increase the risk of cancer for men aged 50-55 in Cambridge?" because the second query uses a smaller set of individuals.

## Epsilon-differential privacy

Epsilon-differential privacy (EDP) is a scheme for preserving differential privacy.
In EDP all queries have random noise added to them, so they are no longer deterministic.
So if the query was "What fraction of people in the database are male?", and the true result is 0.5, then the results of calling this query three times might be 0.53, 0.49 and 0.51.
This makes it harder to tell if an individual's data is in the database, because the effect of adding a person can't be distinguished from the effect of the random noise.
Intuitively this is a bit like blurring an image: adding noise obscures personal information.
The amount of personal information that is revealed isn't zero, but it is guaranteed to be below a certain threshold.
@@ -24,14 +26,15 @@ Queries that are more sensitive have more noise added, because they reveal more
It is important to add as little noise as possible, because adding more noise obscures the patterns that you want to extract from the data.
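
As a rough illustration of this sensitivity/noise trade-off (a hedged sketch, not part of colearn), the classic Laplace mechanism answers a numeric query with noise scaled by sensitivity divided by epsilon:

```python
import numpy as np

def private_query(database, query, sensitivity: float, epsilon: float) -> float:
    """Answer a numeric query with Laplace noise (illustration only)."""
    true_answer = query(database)
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_answer + noise

# "What fraction of people in the database are male?"
db = ["m", "f", "m", "f"]
fraction_male = lambda d: sum(1 for x in d if x == "m") / len(d)

# Changing one record moves the fraction by at most 1/len(db), so that
# is the query's sensitivity. A smaller epsilon means more noise.
print(private_query(db, fraction_male, sensitivity=1 / len(db), epsilon=0.5))
```
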
## Differential privacy when training neural networks

Each training step for a neural network can be thought of as a complicated query on a database of training data.
Differential privacy mechanisms tell you how much noise you need to add to guarantee a certain level of privacy.
The `opacus` and `tensorflow-privacy` libraries implement epsilon-differential privacy for training neural networks for pytorch and keras respectively.

# How to use differential privacy with colearn

By using `opacus` and `tensorflow-privacy` we can make collective learning use differential privacy.
The learner that is proposing weights does so using a DP-enabled optimiser.
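
A minimal sketch of what wiring up a DP-enabled optimiser might look like with `opacus` — this assumes the opacus 1.x `PrivacyEngine.make_private` API, and the model, data and hyperparameters here are placeholders rather than colearn's setup:

```python
import torch
from opacus import PrivacyEngine

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data_loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(
        torch.randn(64, 10), torch.randint(0, 2, (64,))
    ),
    batch_size=8,
)

privacy_engine = PrivacyEngine()
model, optimizer, data_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    noise_multiplier=1.0,  # noise added to the clipped per-sample gradients
    max_grad_norm=1.0,     # per-sample gradient clipping bound
)
# Steps taken with this optimizer now add calibrated noise, so the
# proposed weights leak less about any individual training record.
```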

To see an example of using this see [dp_pytorch]({{ repo_root }}/colearn_examples/ml_interface/pytorch_mnist_diffpriv.py)
and [dp_keras]({{ repo_root }}/colearn_examples/ml_interface/keras_mnist_diffpriv.py).
