Lesson8 #8

Open
wants to merge 8 commits into base: develop
Binary file added images/neural-networks.png
10,131 changes: 10,131 additions & 0 deletions images/neural-networks.svg
259 changes: 177 additions & 82 deletions notebooks_en/5_Multiple_Logistic_Regression.ipynb

Large diffs are not rendered by default.

854 changes: 854 additions & 0 deletions notebooks_en/7_The_First_Deep_Neural_Network.ipynb

Large diffs are not rendered by default.

978 changes: 978 additions & 0 deletions notebooks_en/8_More_about_optimization.ipynb

Large diffs are not rendered by default.

175 changes: 175 additions & 0 deletions notebooks_en/Appendix.ipynb
@@ -0,0 +1,175 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "cc66e939",
"metadata": {},
"source": [
"## Beyond the math: all the fancy terms"
]
},
{
"cell_type": "markdown",
"id": "ecaec976",
"metadata": {},
"source": [
"Though knowing math is enough for us to implement and use neural network models, understanding the jargon helps communicate with people from different backgrounds. In this section, we would like to fill in the gap between math and the terminology commonly used in the community of machine learning and deep learning.\n",
"\n",
"Let's use the math from a fully connected neural network as an example:\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\mathbf{z}^1 &= \\sigma_0\\left(\\left(W^0\\right)^\\mathsf{T}\\mathbf{x} + \\mathbf{b}^0\\right) \\\\\n",
"\\mathbf{z}^2 &= \\sigma_1\\left(\\left(W^1\\right)^\\mathsf{T}\\mathbf{z}^1 + \\mathbf{b}^1\\right) \\\\\n",
"&\\vdots \\\\\n",
"\\hat{\\mathbf{y}} &= \\sigma_L\\left(\\left(W^L\\right)^\\mathsf{T}\\mathbf{z}^L + \\mathbf{b}^L\\right)\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"Often you see people trying to use this kind of graph to explain a fully connected neural network:\n",
"\n",
"<img src=\"../images/neural-networks.png\" style=\"width: 400px;\"/> \n",
"\n",
"This graphical illustration may come from the fact that neural network models are inspired by the true neural networks in human bodies. However, this figure does not really tell us how exactly a model should be implemented.\n",
"\n",
"Nevertheless, the graphical illustration makes it's easier to understand why the deep learning community names the math components with the following terms."
]
},
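As a quick aside (this sketch is not part of the original notebook), the equations above translate almost line by line into NumPy. The layer sizes, the random parameters, and the use of the logistic function for every activation $\sigma_i$ below are assumptions made purely for illustration.

```python
# A minimal sketch (not from the lesson) of the forward pass, written directly
# from the equations above.  Layer sizes, random weights, and the choice of the
# logistic function for every sigma_i are assumptions.
import numpy


def logistic(x):
    """Logistic function, used here as every activation sigma_i."""
    return 1.0 / (1.0 + numpy.exp(-x))


rng = numpy.random.default_rng(0)
layer_sizes = [4, 8, 6, 1]  # input size n_0, hidden sizes n_1 and n_2, output size

# W^l has shape (n_l, n_{l+1}); b^l has length n_{l+1}
weights = [rng.normal(size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [rng.normal(size=n) for n in layer_sizes[1:]]

x = rng.normal(size=layer_sizes[0])  # one input sample
z = x
for W, b in zip(weights, biases):    # z^{l+1} = sigma_l((W^l)^T z^l + b^l)
    z = logistic(W.T @ z + b)
y_hat = z
print(y_hat)
```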
{
"cell_type": "markdown",
"id": "fde3fb39",
"metadata": {},
"source": [
"##### Features\n",
"\n",
"The elements in an input vector are called features. Recall in lesson 3, the vector $x$ of each sample has some characteristics of a car. It's why the elements in $x$ are called *features*."
]
},
{
"cell_type": "markdown",
"id": "13d31e91",
"metadata": {},
"source": [
"##### Neurons\n",
"\n",
"Each value in the intermediate and final results is called a neuron. Though we use vectors $\\mathbf{z}^1$, $\\dots$, $\\hat{\\mathbf{y}}$ to denote the calculation results, they are composed of many elements. In other words, elements in vectors are called neurons. For example, if $\\mathbf{z}^1$ has $12$ elements, i.e., $n_1=12$, then we say $\\mathbf{z}^1$ has 12 neurons.\n",
"\n",
"A neuron in a neural network means it is a signal processing node. It receives values/signals from other neurons, does some calculations, and then provides the calculation result (or says processed signal) to other neurons. For example, in the above fully connected neural network model, if we expand the vector-matrix form to explicit equations, the element $z_1^2$ from the intermediate vector $\\mathbf{z}^2$ is obtained through:\n",
"\n",
"$$\n",
"z_1^2 = \\sigma_0\\left(W_{1,1}^{1}z_1^{1}+W_{2,1}^{1}z_2^1+\\cdots+W_{n_1,1}^1 z_{n_1}^1 + b_1^1\\right)\n",
"$$\n",
"\n",
"So we say the neuron $z_1^2$ receives signals from neurons $z_1^1$, $z_2^1$, $\\cdots$, $z_{n_1}^1$. And because the value of neuron $z_1^2$ is needed to calculate the intermediate vector $\\mathbf{z}^3$, we also say the neuron $z_1^2$ provides the signal to neurons $z_1^3$, $z_2^3$, $\\cdots$, $z_{n_3}^3$ for further signal processing."
]
},
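For illustration only (not from the lesson), the element-level equation above can be checked numerically; the layer sizes and the arrays `W1`, `b1`, and `z1` below are made up, and the logistic function stands in for the activation.

```python
# Checking the element-level equation above numerically; all sizes and arrays
# here are made up for illustration only.
import numpy


def logistic(x):
    return 1.0 / (1.0 + numpy.exp(-x))


rng = numpy.random.default_rng(1)
n1, n2 = 12, 5                          # assumed sizes of z^1 and z^2
W1 = rng.normal(size=(n1, n2))          # W^1
b1 = rng.normal(size=n2)                # b^1
z1 = rng.normal(size=n1)                # z^1: signals from the previous layer

# the neuron z_1^2 receives all n1 signals from z^1
z2_first = logistic(numpy.dot(W1[:, 0], z1) + b1[0])

# the whole layer z^2 at once; its first element equals z2_first
z2 = logistic(numpy.dot(W1.T, z1) + b1)
assert numpy.isclose(z2[0], z2_first)
```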
{
"cell_type": "markdown",
"id": "ce6812fe",
"metadata": {},
"source": [
"##### Layers\n",
"\n",
"Neurons that are independent of each other are put together and becomes a layer. For example, to calculate $z_1^2$ from the input $\\mathbf{x}$, we don't need the value of $z_2^2$, and vice versa. We say $z_1^2$ and $z_2^2$ belong to the same layer. In fact, if we look at the vector-matrix format of the model, we can see that each vector is a layer, e.g., $\\mathbf{z}^1$ is a layer because all elements in the vector $\\mathbf{z}^1$ are independent of each other. The same applies to $\\mathbf{z}^2$, $\\dots$, $\\hat{\\mathbf{y}}$.\n",
"\n",
"We call the vector $\\hat{\\mathbf{y}}$ the output layer because it's the final output of a neural network. Some people extend the naming system to the input vector $\\mathbf{x}$ and call it an input layer. "
]
},
{
"cell_type": "markdown",
"id": "860d9943",
"metadata": {},
"source": [
"##### Hidden layers\n",
"\n",
"Intermediate vectors $\\mathbf{z}^1$, $\\dots$, $\\mathbf{z}^L$ are called hidden layers because we don't see them if we treat a neural network as a black box. The variable $L$ hence denotes the number of hidden layers."
]
},
{
"cell_type": "markdown",
"id": "cf50d014",
"metadata": {},
"source": [
"##### Forward propagation\n",
"\n",
"Forward propagation means the procedure of the output $\\hat{\\mathbf{y}}$, starting from the input $\\mathbf{x}$, then $\\mathbf{z}^1$, $\\mathbf{z}^2$, $\\dots$, and so on. The word propagation may come from the propagation of signals. Each neuron layer receives signals from the previous layer, does some processing, and then passes the processed signal to the next layer."
]
},
{
"cell_type": "markdown",
"id": "241403b9",
"metadata": {},
"source": [
"##### Backward propagation\n",
"\n",
"Backward propagation is related to calculating the gradients of parameters. We didn't see it in our teaching material as we rely on `autograd` to do the job. However, in a nutshell, backward propagation is a technique for obtaining the gradients of parameters. It is an application of the chain rule from calculus. Nowadays, it's more common to use third-party libraries for calculating gradients, and the backward propagation is usually how these libraries work under the hood."
]
},
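As a toy sketch of what this means in practice (the one-parameter model and the data below are made up, not the lesson's), `autograd` returns a function that evaluates the gradient of a loss with respect to a parameter; internally it applies the chain rule, i.e., backward propagation.

```python
# A toy example of obtaining a gradient with autograd; the model and data are
# made up for illustration only.
from autograd import grad
from autograd import numpy


def loss(w, x, y_true):
    """Squared error of a one-parameter linear model y = w * x."""
    y_pred = w * x
    return numpy.sum((y_pred - y_true) ** 2)


x = numpy.array([1.0, 2.0, 3.0])
y_true = numpy.array([2.0, 4.0, 6.0])

dloss_dw = grad(loss)             # d(loss)/dw, w.r.t. the first argument by default
print(dloss_dw(1.5, x, y_true))   # negative, so increasing w lowers the loss
```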
{
"cell_type": "markdown",
"id": "e3032d86",
"metadata": {},
"source": [
"##### Training and learning\n",
"\n",
"You may notice we use the term *optimization* most of the time, though sometimes the terms *training* and *learning* slip through and are present in the teaching material. They all mean the same thing in machine learning: finding the best set of parameters that makes a model best fits given data. In other words, if you have taken any courses in numerical methods or numerical analysis, they are synonyms of *model fitting*.\n",
"\n",
"In lesson 5, we used the example of take-home exercises, quizzes, and final exams to explain the concepts of training, validation, and test datasets. We can see the optimization of a model is indeed similar to training a student to some degree."
]
},
{
"cell_type": "markdown",
"id": "66f3e93d",
"metadata": {},
"source": [
"##### Learning rate\n",
"\n",
"The *step size* in gradient-descent-based optimization methods is called learning rate."
]
},
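For instance, in a plain gradient-descent update the learning rate is simply the factor that scales the gradient step; the numbers below are arbitrary and only illustrate the arithmetic.

```python
# One plain gradient-descent update; the learning rate scales how far we move
# against the gradient.  All numbers are arbitrary.
learning_rate = 0.1
w = 0.8          # current value of some parameter
grad_w = -14.0   # gradient of the loss w.r.t. that parameter
w = w - learning_rate * grad_w
print(w)         # 0.8 - 0.1 * (-14.0) = 2.2
```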
{
"cell_type": "markdown",
"id": "dbdcb6fa",
"metadata": {},
"source": [
"##### Hyperparameters\n",
"\n",
"Matrices and vectors, $W^0$, $\\mathbf{b}^0$, $W^1$, $\\mathbf{b}^1$, etc., are called *model parameters* or simply *parameters*. However, we also have other parameters such as the coefficient used in gradient descent (i.e., learning rate), the coefficients for regularization, etc. These parameters are not part of a model and are called hyperparameters."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e97a8795",
"metadata": {},
"outputs": [],
"source": [
"# Execute this cell to load the notebook's style sheet, then ignore it\\n\",\n",
"from IPython.core.display import HTML\n",
"css_file = '../style/custom.css'\n",
"HTML(open(css_file, \"r\").read())"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
111 changes: 111 additions & 0 deletions scripts/lesson_5_functions.py
@@ -0,0 +1,111 @@
#! /usr/bin/env python3
# -*- coding: utf-8 -*-
# vim:fenc=utf-8
"""Functions from lesson 5.

Usage:
from lesson_5_functions import *

This will make the following functions available:
- logistic
- classify
- performance
"""
from autograd import numpy


# doing `from lesson_5_functions import *` will import these objects
__all__ = ["logistic", "classify", "performance"]


def logistic(x):
"""Logistic/sigmoid function.

Arguments
---------
x : numpy.ndarray
The input to the logistic function.

Returns
-------
numpy.ndarray
The output.

Notes
-----
The function does not restrict the shape of the input array. The output
has the same shape as the input.
"""
x = numpy.clip(x, -300., 300.)
return 1. / (1. + numpy.exp(-x))


def classify(x, params, model):
"""Use a logistic model to label data with 0 or/and 1.

Arguments
---------
x : numpy.ndarray
The input of the model. The shape should be (n_images, n_total_pixels).
params : a tuple/list of two elements
The first element is a 2D array with shape (n_total_pixels, 1). The
second element is a scalar.
model : a callable object
The model that takes in `x` and `params` and then returns the probabilities.

Returns
-------
labels : numpy.ndarray
The labels (0.0 or 1.0); they have the same shape as the probabilities returned by `model`.

Notes
-----
This function only works with multiple images, i.e., x has a shape of
(n_images, n_total_pixels).
"""
probabilities = model(x, params)
labels = (probabilities >= 0.5).astype(float)
return labels


def performance(predictions, answers, beta=1.0):
"""Calculate precision, recall, and F-score.

Arguments
---------
predictions : numpy.ndarray of integers
The predicted labels.
answers : numpy.ndarray of integers
The true labels.
beta : float
A coefficient representing the weight of recall.

Returns
-------
precision, recall, score : float
Precision, recall, and F-score, respectively.
"""
true_idx = (answers == 1) # the location where the answers are 1
false_idx = (answers == 0) # the location where the answers are 0

# true positive: answers are 1 and predictions are also 1
n_tp = numpy.count_nonzero(predictions[true_idx] == 1)

# false positive: answers are 0 but predictions are 1
n_fp = numpy.count_nonzero(predictions[false_idx] == 1)

# true negative: answers are 0 and predictions are also 0
n_tn = numpy.count_nonzero(predictions[false_idx] == 0)

# false negative: answers are 1 but predictions are 0
n_fn = numpy.count_nonzero(predictions[true_idx] == 0)

# precision, recall, and f-score
precision = n_tp / (n_tp + n_fp)
recall = n_tp / (n_tp + n_fn)
score = (
(1.0 + beta**2) * precision * recall /
(beta**2 * precision + recall)
)

return precision, recall, score
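
A hypothetical usage sketch for this module (not part of the pull request): the simple logistic-regression model, the random data, and all shapes below are made up purely to show how `classify` and `performance` fit together.

```python
# Hypothetical usage of lesson_5_functions; the logistic-regression model,
# the random data, and all shapes are made up for illustration only.
import numpy as np
from lesson_5_functions import logistic, classify, performance


def logistic_model(x, params):
    """Return the probability of the positive class for each row of x."""
    weights, bias = params
    return logistic(np.dot(x, weights) + bias)


rng = np.random.default_rng(0)
x = rng.random((20, 64))                   # 20 fake "images", 64 pixels each
answers = rng.integers(0, 2, size=20)      # fake true labels (0 or 1)
params = (rng.normal(size=(64, 1)), 0.0)   # weights and bias

predictions = classify(x, params, logistic_model).flatten()
precision, recall, score = performance(predictions, answers)
print(precision, recall, score)
```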
100 changes: 100 additions & 0 deletions scripts/lesson_7_functions.py
@@ -0,0 +1,100 @@
#! /usr/bin/env python3
# -*- coding: utf-8 -*-
# vim:fenc=utf-8

"""The functions used in lesson 7.

Usage:
from lesson_7_functions import *

This will make the following functions available:
- classify
- performance
- neural_network_model
- model_loss
"""
from autograd import numpy
from lesson_5_functions import logistic, classify, performance


# doing `from lesson_7_functions import *` will import these objects
__all__ = ["neural_network_model", "model_loss", "regularized_loss", "classify", "performance"]


def neural_network_model(x, params):
"""A fully-connected neural network with L=1.

Arguments
---------
x : numpy.ndarray
The input of the model. Its shape should be (n_images, n_total_pixels).
params : a tuple/list of four elements
- The first element is W0, a 2D array with shape (n_total_pixels, n_z1).
- The second element is b0, a 1D array with length n_z1.
- The third element is W1, a 1D array with length n_z1.
- The fourth element is b1, a scalar.

Returns
-------
yhat : numpy.ndarray
The predicted values obtained from the model. It's a 1D array with
length n_images.
"""
z1 = logistic(numpy.dot(x, params[0])+params[1])
yhat = logistic(numpy.dot(z1, params[2])+params[3])
return yhat


def model_loss(x, true_labels, params):
"""Calculate the predictions and the loss w.r.t. the true values.

Arguments
---------
x : numpy.ndarray
The input of the model. The shape should be (n_images, n_total_pixels).
true_labels : numpy.ndarray
The true labels of the input images. Should be 1D and have length of
n_images.
params : a tuple/list of four elements
- The first element is W0, a 2D array with shape (n_total_pixels, n_z1).
- The second element is b0, a 1D array with length n_z1.
- The third element is W1, a 1D array with length n_z1.
- The fourth element is b1, a scalar.

Returns
-------
loss : a scalar
The binary cross-entropy loss averaged over the images.
"""
pred = neural_network_model(x, params)

n_images = x.shape[0]

# binary cross-entropy loss, averaged over the images
loss = - (
numpy.dot(true_labels, numpy.log(pred+1e-15)) +
numpy.dot(1.-true_labels, numpy.log(1.-pred+1e-15))
) / n_images

return loss


def regularized_loss(x, true_labels, params, _lambda=1.):
"""Return the loss with regularization.

Arguments
---------
x, true_labels, params :
Parameters for function `model_loss`.
_lambda : float
The weight of the regularization term. Default: 1.0

Returns
-------
loss : a scalar
The averaged cross-entropy loss plus the regularization term.
"""
loss = model_loss(x, true_labels, params)
Nw = params[0].shape[0] * params[0].shape[1] + params[2].size
reg = ((params[0]**2).sum() + (params[2]**2).sum()) / Nw
return loss + _lambda * reg
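
A hypothetical usage sketch for this module (not part of the pull request): it shows one gradient-descent step on `regularized_loss`, with the gradients supplied by `autograd`. The shapes, data, regularization weight, and learning rate are made up for illustration only.

```python
# Hypothetical usage of lesson_7_functions: one gradient-descent step on the
# regularized loss, with the gradients supplied by autograd.  All shapes,
# data, and coefficients are made up for illustration only.
import numpy as np
from autograd import grad
from lesson_7_functions import regularized_loss

n_images, n_total_pixels, n_z1 = 20, 64, 8
rng = np.random.default_rng(0)

x = rng.random((n_images, n_total_pixels))
true_labels = rng.integers(0, 2, size=n_images).astype(float)
params = [
    rng.normal(size=(n_total_pixels, n_z1)),  # W0
    np.zeros(n_z1),                           # b0
    rng.normal(size=n_z1),                    # W1
    0.0,                                      # b1
]

# gradients of the loss w.r.t. params (the third positional argument)
grad_fn = grad(regularized_loss, 2)
grads = grad_fn(x, true_labels, params, 0.01)

learning_rate = 0.5  # made-up learning rate
params = [p - learning_rate * g for p, g in zip(params, grads)]
print(regularized_loss(x, true_labels, params, 0.01))
```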