|
1 | 1 | {
|
2 | 2 | "cells": [
|
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "metadata": {}, |
| 6 | + "source": [ |
| 7 | + "[](https://github.com/lab-ml/python_autocomplete)\n", |
| 8 | + "[](https://colab.research.google.com/github/lab-ml/python_autocomplete/blob/master/notebooks/evaluate.ipynb)\n", |
| 9 | + "\n", |
| 10 | + "# Evaluate a model trained on predicting Python code\n", |
| 11 | + "\n", |
| 12 | + "This notebook evaluates a model trained on Python code.\n", |
| 13 | + "\n", |
| 14 | + "Here's a link to [training notebook](https://github.com/lab-ml/python_autocomplete/blob/master/notebooks/train.ipynb)\n", |
| 15 | + "[](https://colab.research.google.com/github/lab-ml/python_autocomplete/blob/master/notebooks/train.ipynb)" |
| 16 | + ] |
| 17 | + }, |
| 18 | + { |
| 19 | + "cell_type": "markdown", |
| 20 | + "metadata": {}, |
| 21 | + "source": [ |
| 22 | + "### Install dependencies" |
| 23 | + ] |
| 24 | + }, |
| 25 | + { |
| 26 | + "cell_type": "code", |
| 27 | + "execution_count": null, |
| 28 | + "metadata": {}, |
| 29 | + "outputs": [], |
| 30 | + "source": [ |
| 31 | + "%%capture\n", |
| 32 | + "!pip install labml labml_python_autocomplete" |
| 33 | + ] |
| 34 | + }, |
| 35 | + { |
| 36 | + "cell_type": "markdown", |
| 37 | + "metadata": {}, |
| 38 | + "source": [ |
| 39 | + "Imports" |
| 40 | + ] |
| 41 | + }, |
3 | 42 | {
|
4 | 43 | "cell_type": "code",
|
5 | 44 | "execution_count": 1,
|
|
22 | 61 | "from python_autocomplete.evaluate import evaluate, anomalies, complete, Predictor"
|
23 | 62 | ]
|
24 | 63 | },
|
| 64 | + { |
| 65 | + "cell_type": "markdown", |
| 66 | + "metadata": {}, |
| 67 | + "source": [ |
| 68 | + "We load the model from a training run. For this demo I'm loading from a run I trained at home.\n", |
| 69 | + "\n", |
| 70 | + "[](https://web.lab-ml.com/run?uuid=39b03a1e454011ebbaff2b26e3148b3d)\n", |
| 71 | + "\n", |
| 72 | + "*If you want to try this on Colab you need to run this on the same space where you run the training, because models are saved locally.*" |
| 73 | + ] |
| 74 | + }, |
25 | 75 | {
|
26 | 76 | "cell_type": "code",
|
27 |
| - "execution_count": 2, |
| 77 | + "execution_count": 1, |
28 | 78 | "metadata": {},
|
29 |
| - "outputs": [ |
30 |
| - { |
31 |
| - "data": { |
32 |
| - "text/plain": [ |
33 |
| - "'39b03a1e454011ebbaff2b26e3148b3d'" |
34 |
| - ] |
35 |
| - }, |
36 |
| - "execution_count": 2, |
37 |
| - "metadata": {}, |
38 |
| - "output_type": "execute_result" |
39 |
| - } |
40 |
| - ], |
| 79 | + "outputs": [], |
41 | 80 | "source": [
|
42 |
| - "TRAINING_RUN_UUID = '39b03a1e454011ebbaff2b26e3148b3d'\n", |
43 |
| - "TRAINING_RUN_UUID" |
| 81 | + "TRAINING_RUN_UUID = '39b03a1e454011ebbaff2b26e3148b3d'" |
| 82 | + ] |
| 83 | + }, |
| 84 | + { |
| 85 | + "cell_type": "markdown", |
| 86 | + "metadata": {}, |
| 87 | + "source": [ |
| 88 | + "We initialize `Configs` object defined in [`train.py`](https://github.com/lab-ml/python_autocomplete/blob/master/python_autocomplete/train.py)." |
44 | 89 | ]
|
45 | 90 | },
|
46 | 91 | {
|
|
49 | 94 | "metadata": {},
|
50 | 95 | "outputs": [],
|
51 | 96 | "source": [
|
52 |
| - "conf = Configs()\n", |
| 97 | + "conf = Configs()" |
| 98 | + ] |
| 99 | + }, |
| 100 | + { |
| 101 | + "cell_type": "markdown", |
| 102 | + "metadata": {}, |
| 103 | + "source": [ |
| 104 | + "Create a new experiment in evaluation mode. In evaluation mode a new training run is not created. " |
| 105 | + ] |
| 106 | + }, |
| 107 | + { |
| 108 | + "cell_type": "code", |
| 109 | + "execution_count": null, |
| 110 | + "metadata": {}, |
| 111 | + "outputs": [], |
| 112 | + "source": [ |
53 | 113 | "experiment.evaluate()"
|
54 | 114 | ]
|
55 | 115 | },
|
| 116 | + { |
| 117 | + "cell_type": "markdown", |
| 118 | + "metadata": {}, |
| 119 | + "source": [ |
| 120 | + "Load custom configurations/hyper-parameters used in the training run." |
| 121 | + ] |
| 122 | + }, |
56 | 123 | {
|
57 | 124 | "cell_type": "code",
|
58 | 125 | "execution_count": 4,
|
|
78 | 145 | }
|
79 | 146 | ],
|
80 | 147 | "source": [
|
81 |
| - "conf_dict = experiment.load_configs(TRAINING_RUN_UUID)\n", |
82 |
| - "conf_dict" |
| 148 | + "custom_conf = experiment.load_configs(TRAINING_RUN_UUID)\n", |
| 149 | + "custom_conf" |
83 | 150 | ]
|
84 | 151 | },
|
85 | 152 | {
|
86 |
| - "cell_type": "code", |
87 |
| - "execution_count": 5, |
| 153 | + "cell_type": "markdown", |
88 | 154 | "metadata": {},
|
89 |
| - "outputs": [], |
90 | 155 | "source": [
|
91 |
| - "conf_dict['device.cuda_device'] = 1\n", |
92 |
| - "# conf_dict['device.use_cuda'] = False" |
| 156 | + "Set the custom configurations" |
93 | 157 | ]
|
94 | 158 | },
|
95 | 159 | {
|
|
111 | 175 | }
|
112 | 176 | ],
|
113 | 177 | "source": [
|
114 |
| - "experiment.configs(conf, conf_dict)" |
| 178 | + "experiment.configs(conf, custom_conf)" |
| 179 | + ] |
| 180 | + }, |
| 181 | + { |
| 182 | + "cell_type": "markdown", |
| 183 | + "metadata": {}, |
| 184 | + "source": [ |
| 185 | + "Set models for saving and loading. This will load `conf.model` from the specified run." |
115 | 186 | ]
|
116 | 187 | },
|
117 | 188 | {
|
|
150 | 221 | "experiment.add_pytorch_models({'model': conf.model})"
|
151 | 222 | ]
|
152 | 223 | },
|
| 224 | + { |
| 225 | + "cell_type": "markdown", |
| 226 | + "metadata": {}, |
| 227 | + "source": [ |
| 228 | + "Specify which run to load from" |
| 229 | + ] |
| 230 | + }, |
153 | 231 | {
|
154 | 232 | "cell_type": "code",
|
155 | 233 | "execution_count": 8,
|
|
159 | 237 | "experiment.load(TRAINING_RUN_UUID)"
|
160 | 238 | ]
|
161 | 239 | },
|
| 240 | + { |
| 241 | + "cell_type": "markdown", |
| 242 | + "metadata": {}, |
| 243 | + "source": [ |
| 244 | + "Start the experiment" |
| 245 | + ] |
| 246 | + }, |
162 | 247 | {
|
163 | 248 | "cell_type": "code",
|
164 | 249 | "execution_count": 9,
|
|
198 | 283 | "experiment.start()"
|
199 | 284 | ]
|
200 | 285 | },
|
| 286 | + { |
| 287 | + "cell_type": "markdown", |
| 288 | + "metadata": {}, |
| 289 | + "source": [ |
| 290 | + "Initialize the `Predictor` defined in [`evaluate.py`](https://github.com/lab-ml/python_autocomplete/blob/master/python_autocomplete/evaluate.py).\n", |
| 291 | + "\n", |
| 292 | + "We load `stoi` and `itos` from cache, so that we don't have to read the dataset to generate them. `stoi` is the map for character to an integer index and `itos` is the map of integer to character map. These indexes are used in the model embeddings for each character." |
| 293 | + ] |
| 294 | + }, |
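As a rough illustration of what those two maps contain (a sketch only; the real maps come from `conf.text` and the cached training vocabulary, and the variable names below are just for the example), character/index vocabularies of this kind are typically built like this:

```python
# Illustrative sketch only: how stoi/itos-style character vocabularies are usually built.
text = "def train(model):\n    pass\n"
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}    # character -> integer index
itos = {i: c for c, i in stoi.items()}        # integer index -> character
encoded = [stoi[c] for c in text]             # indexes the embedding layer looks up
assert ''.join(itos[i] for i in encoded) == text  # round-trips back to the text
```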
201 | 295 | {
|
202 | 296 | "cell_type": "code",
|
203 | 297 | "execution_count": 10,
|
204 | 298 | "metadata": {},
|
205 | 299 | "outputs": [],
|
206 | 300 | "source": [
|
207 |
| - "p = Predictor(conf.model, cache('stoi', lambda: conf.text.stoi), cache('itos', lambda: conf.text.itos))\n", |
| 301 | + "p = Predictor(conf.model, cache('stoi', lambda: conf.text.stoi), cache('itos', lambda: conf.text.itos))" |
| 302 | + ] |
| 303 | + }, |
| 304 | + { |
| 305 | + "cell_type": "markdown", |
| 306 | + "metadata": {}, |
| 307 | + "source": [ |
| 308 | + "Set model to evaluation mode" |
| 309 | + ] |
| 310 | + }, |
| 311 | + { |
| 312 | + "cell_type": "code", |
| 313 | + "execution_count": null, |
| 314 | + "metadata": {}, |
| 315 | + "outputs": [], |
| 316 | + "source": [ |
208 | 317 | "_ = conf.model.eval()"
|
209 | 318 | ]
|
210 | 319 | },
|
| 320 | + { |
| 321 | + "cell_type": "markdown", |
| 322 | + "metadata": {}, |
| 323 | + "source": [ |
| 324 | + "A python prompt to test completion." |
| 325 | + ] |
| 326 | + }, |
211 | 327 | {
|
212 | 328 | "cell_type": "code",
|
213 | 329 | "execution_count": 11,
|
|
228 | 344 | " n_layers int):\"\"\""
|
229 | 345 | ]
|
230 | 346 | },
|
| 347 | + { |
| 348 | + "cell_type": "markdown", |
| 349 | + "metadata": {}, |
| 350 | + "source": [ |
| 351 | + "Get a token. `get_token` predicts character by character greedily (no beam search) until it find and end of token character (non alpha-numeric character)." |
| 352 | + ] |
| 353 | + }, |
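A minimal sketch of the greedy loop described above, not the library's implementation: `next_char_probs` is a hypothetical stand-in for the model's next-character distribution.

```python
# Sketch of greedy character-by-character decoding: append the most probable
# next character until a non-alphanumeric (end-of-token) character is predicted.
def get_token_greedy(prompt, next_char_probs):
    token = ''
    while True:
        probs = next_char_probs(prompt + token)   # {char: probability}
        c = max(probs, key=probs.get)             # greedy pick, no beam search
        if not c.isalnum():                       # non-alphanumeric ends the token
            return token
        token += c

# Toy usage with a fake distribution that emits "size" followed by a space
script = iter('size ')
print(get_token_greedy('batch_', lambda _: {next(script): 1.0}))  # prints: size
```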
231 | 354 | {
|
232 | 355 | "cell_type": "code",
|
233 | 356 | "execution_count": 12,
|
|
250 | 373 | "print('\"' + res + '\"')"
|
251 | 374 | ]
|
252 | 375 | },
|
| 376 | + { |
| 377 | + "cell_type": "markdown", |
| 378 | + "metadata": {}, |
| 379 | + "source": [ |
| 380 | + "Try another token" |
| 381 | + ] |
| 382 | + }, |
253 | 383 | {
|
254 | 384 | "cell_type": "code",
|
255 | 385 | "execution_count": 13,
|
|
264 | 394 | }
|
265 | 395 | ],
|
266 | 396 | "source": [
|
267 |
| - "PROMPT += res\n", |
268 |
| - "res = p.get_token(PROMPT)\n", |
| 397 | + "res = p.get_token(PROMPT + res)\n", |
269 | 398 | "print('\"' + res + '\"')"
|
270 | 399 | ]
|
271 | 400 | },
|
| 401 | + { |
| 402 | + "cell_type": "markdown", |
| 403 | + "metadata": {}, |
| 404 | + "source": [ |
| 405 | + "Load a sample python file to test our model" |
| 406 | + ] |
| 407 | + }, |
272 | 408 | {
|
273 | 409 | "cell_type": "code",
|
274 | 410 | "execution_count": 14,
|
|
293 | 429 | "print(sample[-50:])"
|
294 | 430 | ]
|
295 | 431 | },
|
| 432 | + { |
| 433 | + "cell_type": "markdown", |
| 434 | + "metadata": {}, |
| 435 | + "source": [ |
| 436 | + "## Test the model on a sample python file\n", |
| 437 | + "\n", |
| 438 | + "`evaluate` function defined in\n", |
| 439 | + "[`evaluate.py`](https://github.com/lab-ml/python_autocomplete/blob/master/python_autocomplete/evaluate.py)\n", |
| 440 | + "will predict token by token using the `Predictor`, and simulates an editor autocompletion.\n", |
| 441 | + "\n", |
| 442 | + "Colors:\n", |
| 443 | + "* <span style=\"color:yellow\">yellow</span>: the token predicted is wrong and the user needs to type that character.\n", |
| 444 | + "* <span style=\"color:blue\">blue</span>: the token predicted is correct and the user selects it with a special key press, such as TAB or ENTER.\n", |
| 445 | + "* <span style=\"color:green\">green</span>: autocompleted characters based on the prediction" |
| 446 | + ] |
| 447 | + }, |
296 | 448 | {
|
297 | 449 | "cell_type": "code",
|
298 | 450 | "execution_count": 15,
|
|
434 | 586 | "evaluate(p, sample)"
|
435 | 587 | ]
|
436 | 588 | },
|
| 589 | + { |
| 590 | + "cell_type": "markdown", |
| 591 | + "metadata": {}, |
| 592 | + "source": [ |
| 593 | + "`accuracy` is the fraction of charactors predicted correctly. `key_strokes` is the number of key strokes required to write the code with help of the model and `length` is the number of characters in the code, that is the number of key strokes required without the model.\n", |
| 594 | + "\n", |
| 595 | + "*Note that this sample is a classic MNIST example, and the model must have overfitted to similar codes (exept for it's use of [LabML](https://github.com/lab-ml/labml) 😛).*" |
| 596 | + ] |
| 597 | + }, |
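For intuition, here is a rough sketch of how such keystroke accounting can be done when simulating autocompletion; `predict_next` is a hypothetical helper, and this is not the actual `evaluate` implementation.

```python
# Rough sketch of keystroke accounting in an autocompletion simulation.
# predict_next(prefix) stands in for the model's suggested completion of the text so far.
def count_keystrokes(text, predict_next):
    key_strokes, correct, i = 0, 0, 0
    while i < len(text):
        suggestion = predict_next(text[:i])
        if suggestion and text.startswith(suggestion, i):
            key_strokes += 1                  # one TAB/ENTER accepts the suggestion
            correct += len(suggestion)
            i += len(suggestion)
        else:
            key_strokes += 1                  # user types this character manually
            i += 1
    accuracy = correct / len(text)            # fraction of characters predicted
    return accuracy, key_strokes, len(text)   # length = keystrokes without the model
```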
| 598 | + { |
| 599 | + "cell_type": "markdown", |
| 600 | + "metadata": {}, |
| 601 | + "source": [ |
| 602 | + "## Test anomalies in code\n", |
| 603 | + "\n", |
| 604 | + "We run the model through the same sample code and visualize the probabilty of predicting each character.\n", |
| 605 | + "<span style=\"color:green\">green</span> means the probabilty of that character is high and \n", |
| 606 | + "<span style=\"color:red\">red</span> means the probability is low." |
| 607 | + ] |
| 608 | + }, |
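The underlying idea can be sketched as follows (assumed names, not the library's code): score each character by the probability the model assigned to it given the preceding text, then color by that score.

```python
# Sketch: per-character probability scoring for anomaly highlighting.
# char_prob(prefix, ch) is a hypothetical stand-in for P(ch | prefix) under the model.
def char_colors(sample, char_prob, threshold=0.2):
    colors = []
    for i in range(1, len(sample)):
        p = char_prob(sample[:i], sample[i])
        colors.append('green' if p >= threshold else 'red')  # low probability -> red
    return colors
```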
437 | 609 | {
|
438 | 610 | "cell_type": "code",
|
439 | 611 | "execution_count": 16,
|
|
563 | 735 | "anomalies(p, sample)"
|
564 | 736 | ]
|
565 | 737 | },
|
| 738 | + { |
| 739 | + "cell_type": "markdown", |
| 740 | + "metadata": {}, |
| 741 | + "source": [ |
| 742 | + "Here we try to autocomplete 100 characters" |
| 743 | + ] |
| 744 | + }, |
566 | 745 | {
|
567 | 746 | "cell_type": "code",
|
568 | 747 | "execution_count": 17,
|
|
633 | 812 | "name": "python",
|
634 | 813 | "nbconvert_exporter": "python",
|
635 | 814 | "pygments_lexer": "ipython3",
|
636 |
| - "version": "3.8.5" |
| 815 | + "version": "3.7.5" |
637 | 816 | }
|
638 | 817 | },
|
639 | 818 | "nbformat": 4,
|
|