|
1 | 1 | {
|
2 | 2 | "cells": [
|
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "metadata": {}, |
| 6 | + "source": [ |
| 7 | + "[](https://github.com/lab-ml/python_autocomplete)\n", |
| 8 | + "[](https://colab.research.google.com/github/lab-ml/python_autocomplete/blob/master/notebooks/evaluate.ipynb)\n", |
| 9 | + "\n", |
| 10 | + "# Evaluate a model trained on predicting Python code\n", |
| 11 | + "\n", |
| 12 | + "This notebook evaluates a model trained on Python code.\n", |
| 13 | + "\n", |
| 14 | + "Here's a link to [training notebook](https://github.com/lab-ml/python_autocomplete/blob/master/notebooks/train.ipynb)\n", |
| 15 | + "[](https://colab.research.google.com/github/lab-ml/python_autocomplete/blob/master/notebooks/train.ipynb)" |
| 16 | + ] |
| 17 | + }, |
| 18 | + { |
| 19 | + "cell_type": "markdown", |
| 20 | + "metadata": {}, |
| 21 | + "source": [ |
| 22 | + "### Install dependencies" |
| 23 | + ] |
| 24 | + }, |
| 25 | + { |
| 26 | + "cell_type": "code", |
| 27 | + "execution_count": null, |
| 28 | + "metadata": {}, |
| 29 | + "outputs": [], |
| 30 | + "source": [ |
| 31 | + "%%capture\n", |
| 32 | + "!pip install labml labml_python_autocomplete" |
| 33 | + ] |
| 34 | + }, |
| 35 | + { |
| 36 | + "cell_type": "markdown", |
| 37 | + "metadata": {}, |
| 38 | + "source": [ |
| 39 | + "Imports" |
| 40 | + ] |
| 41 | + }, |
3 | 42 | {
|
4 | 43 | "cell_type": "code",
|
5 | 44 | "execution_count": 1,
|
|
22 | 61 | "from python_autocomplete.evaluate import evaluate, anomalies, complete, Predictor"
|
23 | 62 | ]
|
24 | 63 | },
|
| 64 | + { |
| 65 | + "cell_type": "markdown", |
| 66 | + "metadata": {}, |
| 67 | + "source": [ |
| 68 | + "We load the model from a training run. For this demo I'm loading from a run I trained at home.\n", |
| 69 | + "\n", |
| 70 | + "[](https://web.lab-ml.com/run?uuid=39b03a1e454011ebbaff2b26e3148b3d)\n", |
| 71 | + "\n", |
| 72 | + "*If you want to try this on Colab you need to run this on the same space where you run the training, because models are saved locally.*" |
| 73 | + ] |
| 74 | + }, |
25 | 75 | {
|
26 | 76 | "cell_type": "code",
|
27 |
| - "execution_count": 2, |
| 77 | + "execution_count": 1, |
28 | 78 | "metadata": {},
|
29 |
| - "outputs": [ |
30 |
| - { |
31 |
| - "data": { |
32 |
| - "text/plain": [ |
33 |
| - "'39b03a1e454011ebbaff2b26e3148b3d'" |
34 |
| - ] |
35 |
| - }, |
36 |
| - "execution_count": 2, |
37 |
| - "metadata": {}, |
38 |
| - "output_type": "execute_result" |
39 |
| - } |
40 |
| - ], |
| 79 | + "outputs": [], |
41 | 80 | "source": [
|
42 |
| - "TRAINING_RUN_UUID = '39b03a1e454011ebbaff2b26e3148b3d'\n", |
43 |
| - "TRAINING_RUN_UUID" |
| 81 | + "TRAINING_RUN_UUID = '39b03a1e454011ebbaff2b26e3148b3d'" |
| 82 | + ] |
| 83 | + }, |
| 84 | + { |
| 85 | + "cell_type": "markdown", |
| 86 | + "metadata": {}, |
| 87 | + "source": [ |
| 88 | + "We initialize `Configs` object defined in [`train.py`](https://github.com/lab-ml/python_autocomplete/blob/master/python_autocomplete/train.py)." |
44 | 89 | ]
|
45 | 90 | },
|
46 | 91 | {
|
|
49 | 94 | "metadata": {},
|
50 | 95 | "outputs": [],
|
51 | 96 | "source": [
|
52 |
| - "conf = Configs()\n", |
| 97 | + "conf = Configs()" |
| 98 | + ] |
| 99 | + }, |
| 100 | + { |
| 101 | + "cell_type": "markdown", |
| 102 | + "metadata": {}, |
| 103 | + "source": [ |
| 104 | + "Create a new experiment in evaluation mode. In evaluation mode a new training run is not created. " |
| 105 | + ] |
| 106 | + }, |
| 107 | + { |
| 108 | + "cell_type": "code", |
| 109 | + "execution_count": null, |
| 110 | + "metadata": {}, |
| 111 | + "outputs": [], |
| 112 | + "source": [ |
53 | 113 | "experiment.evaluate()"
|
54 | 114 | ]
|
55 | 115 | },
|
| 116 | + { |
| 117 | + "cell_type": "markdown", |
| 118 | + "metadata": {}, |
| 119 | + "source": [ |
| 120 | + "Load custom configurations/hyper-parameters used in the training run." |
| 121 | + ] |
| 122 | + }, |
56 | 123 | {
|
57 | 124 | "cell_type": "code",
|
58 | 125 | "execution_count": 4,
|
|
78 | 145 | }
|
79 | 146 | ],
|
80 | 147 | "source": [
|
81 |
| - "conf_dict = experiment.load_configs(TRAINING_RUN_UUID)\n", |
82 |
| - "conf_dict" |
| 148 | + "custom_conf = experiment.load_configs(TRAINING_RUN_UUID)\n", |
| 149 | + "custom_conf" |
83 | 150 | ]
|
84 | 151 | },
|
85 | 152 | {
|
86 |
| - "cell_type": "code", |
87 |
| - "execution_count": 5, |
| 153 | + "cell_type": "markdown", |
88 | 154 | "metadata": {},
|
89 |
| - "outputs": [], |
90 | 155 | "source": [
|
91 |
| - "conf_dict['device.cuda_device'] = 1\n", |
92 |
| - "# conf_dict['device.use_cuda'] = False" |
| 156 | + "Set the custom configurations" |
93 | 157 | ]
|
94 | 158 | },
|
95 | 159 | {
|
|
111 | 175 | }
|
112 | 176 | ],
|
113 | 177 | "source": [
|
114 |
| - "experiment.configs(conf, conf_dict)" |
| 178 | + "experiment.configs(conf, custom_conf)" |
| 179 | + ] |
| 180 | + }, |
| 181 | + { |
| 182 | + "cell_type": "markdown", |
| 183 | + "metadata": {}, |
| 184 | + "source": [ |
| 185 | + "Set models for saving and loading. This will load `conf.model` from the specified run." |
115 | 186 | ]
|
116 | 187 | },
|
117 | 188 | {
|
|
150 | 221 | "experiment.add_pytorch_models({'model': conf.model})"
|
151 | 222 | ]
|
152 | 223 | },
|
| 224 | + { |
| 225 | + "cell_type": "markdown", |
| 226 | + "metadata": {}, |
| 227 | + "source": [ |
| 228 | + "Specify which run to load from" |
| 229 | + ] |
| 230 | + }, |
153 | 231 | {
|
154 | 232 | "cell_type": "code",
|
155 | 233 | "execution_count": 8,
|
|
159 | 237 | "experiment.load(TRAINING_RUN_UUID)"
|
160 | 238 | ]
|
161 | 239 | },
|
| 240 | + { |
| 241 | + "cell_type": "markdown", |
| 242 | + "metadata": {}, |
| 243 | + "source": [ |
| 244 | + "Start the experiment" |
| 245 | + ] |
| 246 | + }, |
162 | 247 | {
|
163 | 248 | "cell_type": "code",
|
164 | 249 | "execution_count": 9,
|
|
198 | 283 | "experiment.start()"
|
199 | 284 | ]
|
200 | 285 | },
|
| 286 | + { |
| 287 | + "cell_type": "markdown", |
| 288 | + "metadata": {}, |
| 289 | + "source": [ |
| 290 | + "Initialize the `Predictor` defined in [`evaluate.py`](https://github.com/lab-ml/python_autocomplete/blob/master/python_autocomplete/evaluate.py).\n", |
| 291 | + "\n", |
| 292 | + "We load `stoi` and `itos` from cache, so that we don't have to read the dataset to generate them. `stoi` is the map for character to an integer index and `itos` is the map of integer to character map. These indexes are used in the model embeddings for each character." |
| 293 | + ] |
| 294 | + }, |
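As a rough illustration of what those two maps contain (a sketch only; the real maps come from `conf.text` and the cached training vocabulary, and the variable names below are just for the example), character/index vocabularies of this kind are typically built like this:

```python
# Illustrative sketch only: how stoi/itos-style character vocabularies are usually built.
text = "def train(model):\n    pass\n"
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}    # character -> integer index
itos = {i: c for c, i in stoi.items()}        # integer index -> character
encoded = [stoi[c] for c in text]             # indexes the embedding layer looks up
assert ''.join(itos[i] for i in encoded) == text  # round-trips back to the text
```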
201 | 295 | {
|
202 | 296 | "cell_type": "code",
|
203 | 297 | "execution_count": 10,
|
204 | 298 | "metadata": {},
|
205 | 299 | "outputs": [],
|
206 | 300 | "source": [
|
207 |
| - "p = Predictor(conf.model, cache('stoi', lambda: conf.text.stoi), cache('itos', lambda: conf.text.itos))\n", |
| 301 | + "p = Predictor(conf.model, cache('stoi', lambda: conf.text.stoi), cache('itos', lambda: conf.text.itos))" |
| 302 | + ] |
| 303 | + }, |
| 304 | + { |
| 305 | + "cell_type": "markdown", |
| 306 | + "metadata": {}, |
| 307 | + "source": [ |
| 308 | + "Set model to evaluation mode" |
| 309 | + ] |
| 310 | + }, |
| 311 | + { |
| 312 | + "cell_type": "code", |
| 313 | + "execution_count": null, |
| 314 | + "metadata": {}, |
| 315 | + "outputs": [], |
| 316 | + "source": [ |
208 | 317 | "_ = conf.model.eval()"
|
209 | 318 | ]
|
210 | 319 | },
|
| 320 | + { |
| 321 | + "cell_type": "markdown", |
| 322 | + "metadata": {}, |
| 323 | + "source": [ |
| 324 | + "A python prompt to test completion." |
| 325 | + ] |
| 326 | + }, |
211 | 327 | {
|
212 | 328 | "cell_type": "code",
|
213 | 329 | "execution_count": 11,
|
|
228 | 344 | " n_layers int):\"\"\""
|
229 | 345 | ]
|
230 | 346 | },
|
| 347 | + { |
| 348 | + "cell_type": "markdown", |
| 349 | + "metadata": {}, |
| 350 | + "source": [ |
| 351 | + "Get a token. `get_token` predicts character by character greedily (no beam search) until it find and end of token character (non alpha-numeric character)." |
| 352 | + ] |
| 353 | + }, |
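A minimal sketch of the greedy loop described above, not the library's implementation: `next_char_probs` is a hypothetical stand-in for the model's next-character distribution.

```python
# Sketch of greedy character-by-character decoding: append the most probable
# next character until a non-alphanumeric (end-of-token) character is predicted.
def get_token_greedy(prompt, next_char_probs):
    token = ''
    while True:
        probs = next_char_probs(prompt + token)   # {char: probability}
        c = max(probs, key=probs.get)             # greedy pick, no beam search
        if not c.isalnum():                       # non-alphanumeric ends the token
            return token
        token += c

# Toy usage with a fake distribution that emits "size" followed by a space
script = iter('size ')
print(get_token_greedy('batch_', lambda _: {next(script): 1.0}))  # prints: size
```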
231 | 354 | {
|
232 | 355 | "cell_type": "code",
|
233 | 356 | "execution_count": 12,
|
|
250 | 373 | "print('\"' + res + '\"')"
|
251 | 374 | ]
|
252 | 375 | },
|
| 376 | + { |
| 377 | + "cell_type": "markdown", |
| 378 | + "metadata": {}, |
| 379 | + "source": [ |
| 380 | + "Try another token" |
| 381 | + ] |
| 382 | + }, |
253 | 383 | {
|
254 | 384 | "cell_type": "code",
|
255 | 385 | "execution_count": 13,
|
|
264 | 394 | }
|
265 | 395 | ],
|
266 | 396 | "source": [
|
267 |
| - "PROMPT += res\n", |
268 |
| - "res = p.get_token(PROMPT)\n", |
| 397 | + "res = p.get_token(PROMPT + res)\n", |
269 | 398 | "print('\"' + res + '\"')"
|
270 | 399 | ]
|
271 | 400 | },
|
| 401 | + { |
| 402 | + "cell_type": "markdown", |
| 403 | + "metadata": {}, |
| 404 | + "source": [ |
| 405 | + "Load a sample python file to test our model" |
| 406 | + ] |
| 407 | + }, |
272 | 408 | {
|
273 | 409 | "cell_type": "code",
|
274 | 410 | "execution_count": 14,
|
|
293 | 429 | "print(sample[-50:])"
|
294 | 430 | ]
|
295 | 431 | },
|
| 432 | + { |
| 433 | + "cell_type": "markdown", |
| 434 | + "metadata": {}, |
| 435 | + "source": [ |
| 436 | + "## Test the model on a sample python file\n", |
| 437 | + "\n", |
| 438 | + "`evaluate` function defined in\n", |
| 439 | + "[`evaluate.py`](https://github.com/lab-ml/python_autocomplete/blob/master/python_autocomplete/evaluate.py)\n", |
| 440 | + "will predict token by token using the `Predictor`, and simulates an editor autocompletion.\n", |
| 441 | + "\n", |
| 442 | + "Colors:\n", |
| 443 | + "* <span style=\"color:yellow\">yellow</span>: the token predicted is wrong and the user needs to type that character.\n", |
| 444 | + "* <span style=\"color:blue\">blue</span>: the token predicted is correct and the user selects it with a special key press, such as TAB or ENTER.\n", |
| 445 | + "* <span style=\"color:green\">green</span>: autocompleted characters based on the prediction" |
| 446 | + ] |
| 447 | + }, |
296 | 448 | {
|
297 | 449 | "cell_type": "code",
|
298 | 450 | "execution_count": 15,
|
|
434 | 586 | "evaluate(p, sample)"
|
435 | 587 | ]
|
436 | 588 | },
|
| 589 | + { |
| 590 | + "cell_type": "markdown", |
| 591 | + "metadata": {}, |
| 592 | + "source": [ |
| 593 | + "`accuracy` is the fraction of charactors predicted correctly. `key_strokes` is the number of key strokes required to write the code with help of the model and `length` is the number of characters in the code, that is the number of key strokes required without the model.\n", |
| 594 | + "\n", |
| 595 | + "*Note that this sample is a classic MNIST example, and the model must have overfitted to similar codes (exept for it's use of [LabML](https://github.com/lab-ml/labml) 😛).*" |
| 596 | + ] |
| 597 | + }, |
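For intuition, here is a rough sketch of how such keystroke accounting can be done when simulating autocompletion; `predict_next` is a hypothetical helper, and this is not the actual `evaluate` implementation.

```python
# Rough sketch of keystroke accounting in an autocompletion simulation.
# predict_next(prefix) stands in for the model's suggested completion of the text so far.
def count_keystrokes(text, predict_next):
    key_strokes, correct, i = 0, 0, 0
    while i < len(text):
        suggestion = predict_next(text[:i])
        if suggestion and text.startswith(suggestion, i):
            key_strokes += 1                  # one TAB/ENTER accepts the suggestion
            correct += len(suggestion)
            i += len(suggestion)
        else:
            key_strokes += 1                  # user types this character manually
            i += 1
    accuracy = correct / len(text)            # fraction of characters predicted
    return accuracy, key_strokes, len(text)   # length = keystrokes without the model
```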
| 598 | + { |
| 599 | + "cell_type": "markdown", |
| 600 | + "metadata": {}, |
| 601 | + "source": [ |
| 602 | + "## Test anomalies in code\n", |
| 603 | + "\n", |
| 604 | + "We run the model through the same sample code and visualize the probabilty of predicting each character.\n", |
| 605 | + "<span style=\"color:green\">green</span> means the probabilty of that character is high and \n", |
| 606 | + "<span style=\"color:red\">red</span> means the probability is low." |
| 607 | + ] |
| 608 | + }, |
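The underlying idea can be sketched as follows (assumed names, not the library's code): score each character by the probability the model assigned to it given the preceding text, then color by that score.

```python
# Sketch: per-character probability scoring for anomaly highlighting.
# char_prob(prefix, ch) is a hypothetical stand-in for P(ch | prefix) under the model.
def char_colors(sample, char_prob, threshold=0.2):
    colors = []
    for i in range(1, len(sample)):
        p = char_prob(sample[:i], sample[i])
        colors.append('green' if p >= threshold else 'red')  # low probability -> red
    return colors
```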
437 | 609 | {
|
438 | 610 | "cell_type": "code",
|
439 | 611 | "execution_count": 16,
|
|
563 | 735 | "anomalies(p, sample)"
|
564 | 736 | ]
|
565 | 737 | },
|
| 738 | + { |
| 739 | + "cell_type": "markdown", |
| 740 | + "metadata": {}, |
| 741 | + "source": [ |
| 742 | + "Here we try to autocomplete 100 characters" |
| 743 | + ] |
| 744 | + }, |
566 | 745 | {
|
567 | 746 | "cell_type": "code",
|
568 | 747 | "execution_count": 17,
|
|
633 | 812 | "name": "python",
|
634 | 813 | "nbconvert_exporter": "python",
|
635 | 814 | "pygments_lexer": "ipython3",
|
636 |
| - "version": "3.8.5" |
| 815 | + "version": "3.7.5" |
637 | 816 | }
|
638 | 817 | },
|
639 | 818 | "nbformat": 4,
|
|