Lesson8 #8

Open
wants to merge 8 commits into base: develop
Binary file added images/neural-networks.png
10,131 changes: 10,131 additions & 0 deletions images/neural-networks.svg
259 changes: 177 additions & 82 deletions notebooks_en/5_Multiple_Logistic_Regression.ipynb

Large diffs are not rendered by default.

854 changes: 854 additions & 0 deletions notebooks_en/7_The_First_Deep_Neural_Network.ipynb

Large diffs are not rendered by default.

978 changes: 978 additions & 0 deletions notebooks_en/8_More_about_optimization.ipynb

Large diffs are not rendered by default.

175 changes: 175 additions & 0 deletions notebooks_en/Appendix.ipynb
@@ -0,0 +1,175 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "cc66e939",
"metadata": {},
"source": [
"## Beyond the math: all the fancy terms"
]
},
{
"cell_type": "markdown",
"id": "ecaec976",
"metadata": {},
"source": [
"Though knowing math is enough for us to implement and use neural network models, understanding the jargon helps communicate with people from different backgrounds. In this section, we would like to fill in the gap between math and the terminology commonly used in the community of machine learning and deep learning.\n",
"\n",
"Let's use the math from a fully connected neural network as an example:\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\mathbf{z}^1 &= \\sigma_0\\left(\\left(W^0\\right)^\\mathsf{T}\\mathbf{x} + \\mathbf{b}^0\\right) \\\\\n",
"\\mathbf{z}^2 &= \\sigma_1\\left(\\left(W^1\\right)^\\mathsf{T}\\mathbf{z}^1 + \\mathbf{b}^1\\right) \\\\\n",
"&\\vdots \\\\\n",
"\\hat{\\mathbf{y}} &= \\sigma_L\\left(\\left(W^L\\right)^\\mathsf{T}\\mathbf{z}^L + \\mathbf{b}^L\\right)\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"Often you see people trying to use this kind of graph to explain a fully connected neural network:\n",
"\n",
"<img src=\"../images/neural-networks.png\" style=\"width: 400px;\"/> \n",
"\n",
"This graphical illustration may come from the fact that neural network models are inspired by the true neural networks in human bodies. However, this figure does not really tell us how exactly a model should be implemented.\n",
"\n",
"Nevertheless, the graphical illustration makes it's easier to understand why the deep learning community names the math components with the following terms."
]
},
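As a quick aside (this sketch is not part of the original notebook), the equations above translate almost line by line into NumPy. The layer sizes, the random parameters, and the use of the logistic function for every activation $\sigma_i$ below are assumptions made purely for illustration.

```python
# A minimal sketch (not from the lesson) of the forward pass, written directly
# from the equations above.  Layer sizes, random weights, and the choice of the
# logistic function for every sigma_i are assumptions.
import numpy


def logistic(x):
    """Logistic function, used here as every activation sigma_i."""
    return 1.0 / (1.0 + numpy.exp(-x))


rng = numpy.random.default_rng(0)
layer_sizes = [4, 8, 6, 1]  # input size n_0, hidden sizes n_1 and n_2, output size

# W^l has shape (n_l, n_{l+1}); b^l has length n_{l+1}
weights = [rng.normal(size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [rng.normal(size=n) for n in layer_sizes[1:]]

x = rng.normal(size=layer_sizes[0])  # one input sample
z = x
for W, b in zip(weights, biases):    # z^{l+1} = sigma_l((W^l)^T z^l + b^l)
    z = logistic(W.T @ z + b)
y_hat = z
print(y_hat)
```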
{
"cell_type": "markdown",
"id": "fde3fb39",
"metadata": {},
"source": [
"##### Features\n",
"\n",
"The elements in an input vector are called features. Recall in lesson 3, the vector $x$ of each sample has some characteristics of a car. It's why the elements in $x$ are called *features*."
]
},
{
"cell_type": "markdown",
"id": "13d31e91",
"metadata": {},
"source": [
"##### Neurons\n",
"\n",
"Each value in the intermediate and final results is called a neuron. Though we use vectors $\\mathbf{z}^1$, $\\dots$, $\\hat{\\mathbf{y}}$ to denote the calculation results, they are composed of many elements. In other words, elements in vectors are called neurons. For example, if $\\mathbf{z}^1$ has $12$ elements, i.e., $n_1=12$, then we say $\\mathbf{z}^1$ has 12 neurons.\n",
"\n",
"A neuron in a neural network means it is a signal processing node. It receives values/signals from other neurons, does some calculations, and then provides the calculation result (or says processed signal) to other neurons. For example, in the above fully connected neural network model, if we expand the vector-matrix form to explicit equations, the element $z_1^2$ from the intermediate vector $\\mathbf{z}^2$ is obtained through:\n",
"\n",
"$$\n",
"z_1^2 = \\sigma_0\\left(W_{1,1}^{1}z_1^{1}+W_{2,1}^{1}z_2^1+\\cdots+W_{n_1,1}^1 z_{n_1}^1 + b_1^1\\right)\n",
"$$\n",
"\n",
"So we say the neuron $z_1^2$ receives signals from neurons $z_1^1$, $z_2^1$, $\\cdots$, $z_{n_1}^1$. And because the value of neuron $z_1^2$ is needed to calculate the intermediate vector $\\mathbf{z}^3$, we also say the neuron $z_1^2$ provides the signal to neurons $z_1^3$, $z_2^3$, $\\cdots$, $z_{n_3}^3$ for further signal processing."
]
},
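For illustration only (not from the lesson), the element-level equation above can be checked numerically; the layer sizes and the arrays `W1`, `b1`, and `z1` below are made up, and the logistic function stands in for the activation.

```python
# Checking the element-level equation above numerically; all sizes and arrays
# here are made up for illustration only.
import numpy


def logistic(x):
    return 1.0 / (1.0 + numpy.exp(-x))


rng = numpy.random.default_rng(1)
n1, n2 = 12, 5                          # assumed sizes of z^1 and z^2
W1 = rng.normal(size=(n1, n2))          # W^1
b1 = rng.normal(size=n2)                # b^1
z1 = rng.normal(size=n1)                # z^1: signals from the previous layer

# the neuron z_1^2 receives all n1 signals from z^1
z2_first = logistic(numpy.dot(W1[:, 0], z1) + b1[0])

# the whole layer z^2 at once; its first element equals z2_first
z2 = logistic(numpy.dot(W1.T, z1) + b1)
assert numpy.isclose(z2[0], z2_first)
```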
{
"cell_type": "markdown",
"id": "ce6812fe",
"metadata": {},
"source": [
"##### Layers\n",
"\n",
"Neurons that are independent of each other are put together and becomes a layer. For example, to calculate $z_1^2$ from the input $\\mathbf{x}$, we don't need the value of $z_2^2$, and vice versa. We say $z_1^2$ and $z_2^2$ belong to the same layer. In fact, if we look at the vector-matrix format of the model, we can see that each vector is a layer, e.g., $\\mathbf{z}^1$ is a layer because all elements in the vector $\\mathbf{z}^1$ are independent of each other. The same applies to $\\mathbf{z}^2$, $\\dots$, $\\hat{\\mathbf{y}}$.\n",
"\n",
"We call the vector $\\hat{\\mathbf{y}}$ the output layer because it's the final output of a neural network. Some people extend the naming system to the input vector $\\mathbf{x}$ and call it an input layer. "
]
},
{
"cell_type": "markdown",
"id": "860d9943",
"metadata": {},
"source": [
"##### Hidden layers\n",
"\n",
"Intermediate vectors $\\mathbf{z}^1$, $\\dots$, $\\mathbf{z}^L$ are called hidden layers because we don't see them if we treat a neural network as a black box. The variable $L$ hence denotes the number of hidden layers."
]
},
{
"cell_type": "markdown",
"id": "cf50d014",
"metadata": {},
"source": [
"##### Forward propagation\n",
"\n",
"Forward propagation means the procedure of the output $\\hat{\\mathbf{y}}$, starting from the input $\\mathbf{x}$, then $\\mathbf{z}^1$, $\\mathbf{z}^2$, $\\dots$, and so on. The word propagation may come from the propagation of signals. Each neuron layer receives signals from the previous layer, does some processing, and then passes the processed signal to the next layer."
]
},
{
"cell_type": "markdown",
"id": "241403b9",
"metadata": {},
"source": [
"##### Backward propagation\n",
"\n",
"Backward propagation is related to calculating the gradients of parameters. We didn't see it in our teaching material as we rely on `autograd` to do the job. However, in a nutshell, backward propagation is a technique for obtaining the gradients of parameters. It is an application of the chain rule from calculus. Nowadays, it's more common to use third-party libraries for calculating gradients, and the backward propagation is usually how these libraries work under the hood."
]
},
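As a toy sketch of what this means in practice (the one-parameter model and the data below are made up, not the lesson's), `autograd` returns a function that evaluates the gradient of a loss with respect to a parameter; internally it applies the chain rule, i.e., backward propagation.

```python
# A toy example of obtaining a gradient with autograd; the model and data are
# made up for illustration only.
from autograd import grad
from autograd import numpy


def loss(w, x, y_true):
    """Squared error of a one-parameter linear model y = w * x."""
    y_pred = w * x
    return numpy.sum((y_pred - y_true) ** 2)


x = numpy.array([1.0, 2.0, 3.0])
y_true = numpy.array([2.0, 4.0, 6.0])

dloss_dw = grad(loss)             # d(loss)/dw, w.r.t. the first argument by default
print(dloss_dw(1.5, x, y_true))   # negative, so increasing w lowers the loss
```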
{
"cell_type": "markdown",
"id": "e3032d86",
"metadata": {},
"source": [
"##### Training and learning\n",
"\n",
"You may notice we use the term *optimization* most of the time, though sometimes the terms *training* and *learning* slip through and are present in the teaching material. They all mean the same thing in machine learning: finding the best set of parameters that makes a model best fits given data. In other words, if you have taken any courses in numerical methods or numerical analysis, they are synonyms of *model fitting*.\n",
"\n",
"In lesson 5, we used the example of take-home exercises, quizzes, and final exams to explain the concepts of training, validation, and test datasets. We can see the optimization of a model is indeed similar to training a student to some degree."
]
},
{
"cell_type": "markdown",
"id": "66f3e93d",
"metadata": {},
"source": [
"##### Learning rate\n",
"\n",
"The *step size* in gradient-descent-based optimization methods is called learning rate."
]
},
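For instance, in a plain gradient-descent update the learning rate is simply the factor that scales the gradient step; the numbers below are arbitrary and only illustrate the arithmetic.

```python
# One plain gradient-descent update; the learning rate scales how far we move
# against the gradient.  All numbers are arbitrary.
learning_rate = 0.1
w = 0.8          # current value of some parameter
grad_w = -14.0   # gradient of the loss w.r.t. that parameter
w = w - learning_rate * grad_w
print(w)         # 0.8 - 0.1 * (-14.0) = 2.2
```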
{
"cell_type": "markdown",
"id": "dbdcb6fa",
"metadata": {},
"source": [
"##### Hyperparameters\n",
"\n",
"Matrices and vectors, $W^0$, $\\mathbf{b}^0$, $W^1$, $\\mathbf{b}^1$, etc., are called *model parameters* or simply *parameters*. However, we also have other parameters such as the coefficient used in gradient descent (i.e., learning rate), the coefficients for regularization, etc. These parameters are not part of a model and are called hyperparameters."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e97a8795",
"metadata": {},
"outputs": [],
"source": [
"# Execute this cell to load the notebook's style sheet, then ignore it\\n\",\n",
"from IPython.core.display import HTML\n",
"css_file = '../style/custom.css'\n",
"HTML(open(css_file, \"r\").read())"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
111 changes: 111 additions & 0 deletions scripts/lesson_5_functions.py
@@ -0,0 +1,111 @@
#! /usr/bin/env python3
# -*- coding: utf-8 -*-
# vim:fenc=utf-8
"""Functions from lesson 5.

Usage:
from lesson_5_functions import *

This will make the following functions available:
- logistic
- classify
- performance
"""
from autograd import numpy


# doing `from lesson_5_functions import *` will import these objects
__all__ = ["logistic", "classify", "performance"]


def logistic(x):
"""Logistic/sigmoid function.

Arguments
---------
x : numpy.ndarray
The input to the logistic function.

Returns
-------
numpy.ndarray
The output.

Notes
-----
The function does not restrict the shape of the input array. The output
has the same shape as the input.
"""
x = numpy.clip(x, -300., 300.)
return 1. / (1. + numpy.exp(-x))


def classify(x, params, model):
"""Use a logistic model to label data with 0 or/and 1.

Arguments
---------
x : numpy.ndarray
The input of the model. The shape should be (n_images, n_total_pixels).
params : a tuple/list of two elements
The first element is a 2D array with shape (n_total_pixels, 1). The
second element is a scalar.
model : a callable object
The model that takes in `x` and `params` and then returns the probabilities.

Returns
-------
labels : numpy.ndarray
The labels (0.0 or 1.0); they have the same shape as the probabilities returned by `model`.

Notes
-----
This function only works with multiple images, i.e., x has a shape of
(n_images, n_total_pixels).
"""
probabilities = model(x, params)
labels = (probabilities >= 0.5).astype(float)
return labels


def performance(predictions, answers, beta=1.0):
"""Calculate precision, recall, and F-score.

Arguments
---------
predictions : numpy.ndarray of integers
The predicted labels.
answers : numpy.ndarray of integers
The true labels.
beta : float
A coefficient representing the weight of recall.

Returns
-------
precision, recall, score : float
Precision, recall, and F-score, respectively.
"""
true_idx = (answers == 1) # the location where the answers are 1
false_idx = (answers == 0) # the location where the answers are 0

# true positive: answers are 1 and predictions are also 1
n_tp = numpy.count_nonzero(predictions[true_idx] == 1)

# false positive: answers are 0 but predictions are 1
n_fp = numpy.count_nonzero(predictions[false_idx] == 1)

# true negative: answers are 0 and predictions are also 0
n_tn = numpy.count_nonzero(predictions[false_idx] == 0)

# false negative: answers are 1 but predictions are 0
n_fn = numpy.count_nonzero(predictions[true_idx] == 0)

# precision, recall, and f-score
precision = n_tp / (n_tp + n_fp)
recall = n_tp / (n_tp + n_fn)
score = (
(1.0 + beta**2) * precision * recall /
(beta**2 * precision + recall)
)

return precision, recall, score
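
A hypothetical usage sketch for this module (not part of the pull request): the simple logistic-regression model, the random data, and all shapes below are made up purely to show how `classify` and `performance` fit together.

```python
# Hypothetical usage of lesson_5_functions; the logistic-regression model,
# the random data, and all shapes are made up for illustration only.
import numpy as np
from lesson_5_functions import logistic, classify, performance


def logistic_model(x, params):
    """Return the probability of the positive class for each row of x."""
    weights, bias = params
    return logistic(np.dot(x, weights) + bias)


rng = np.random.default_rng(0)
x = rng.random((20, 64))                   # 20 fake "images", 64 pixels each
answers = rng.integers(0, 2, size=20)      # fake true labels (0 or 1)
params = (rng.normal(size=(64, 1)), 0.0)   # weights and bias

predictions = classify(x, params, logistic_model).flatten()
precision, recall, score = performance(predictions, answers)
print(precision, recall, score)
```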
100 changes: 100 additions & 0 deletions scripts/lesson_7_functions.py
@@ -0,0 +1,100 @@
#! /usr/bin/env python3
# -*- coding: utf-8 -*-
# vim:fenc=utf-8

"""The functions used in lesson 7.

Usage:
from lesson_7_functions import *

This will make the following functions available:
- classify
- performance
- neural_network_model
- model_loss
"""
from autograd import numpy
from lesson_5_functions import logistic, classify, performance


# doing `from lesson_7_functions import *` will import these objects
__all__ = ["neural_network_model", "model_loss", "regularized_loss", "classify", "performance"]


def neural_network_model(x, params):
"""A fully-connected neural network with L=1.

Arguments
---------
x : numpy.ndarray
The input of the model. Its shape should be (n_images, n_total_pixels).
params : a tuple/list of four elements
- The first element is W0, a 2D array with shape (n_total_pixels, n_z1).
- The second element is b0, a 1D array with length n_z1.
- The third element is W1, a 1D array with length n_z1.
- The fourth element is b1, a scalar.

Returns
-------
yhat : numpy.ndarray
The predicted values obtained from the model. It's a 1D array with
length n_images.
"""
z1 = logistic(numpy.dot(x, params[0])+params[1])
yhat = logistic(numpy.dot(z1, params[2])+params[3])
return yhat


def model_loss(x, true_labels, params):
"""Calculate the predictions and the loss w.r.t. the true values.

Arguments
---------
x : numpy.ndarray
The input of the model. The shape should be (n_images, n_total_pixels).
true_labels : numpy.ndarray
The true labels of the input images. Should be 1D and have length of
n_images.
params : a tuple/list of four elements
- The first element is W0, a 2D array with shape (n_total_pixels, n_z1).
- The second element is b0, a 1D array with length n_z1.
- The third element is W1, a 1D array with length n_z1.
- The fourth element is b1, a scalar.

Returns
-------
loss : a scalar
The binary cross-entropy loss averaged over the images.
"""
pred = neural_network_model(x, params)

n_images = x.shape[0]

# binary cross-entropy loss, averaged over the images
loss = - (
numpy.dot(true_labels, numpy.log(pred+1e-15)) +
numpy.dot(1.-true_labels, numpy.log(1.-pred+1e-15))
) / n_images

return loss


def regularized_loss(x, true_labels, params, _lambda=1.):
"""Return the loss with regularization.

Arguments
---------
x, true_labels, params :
Parameters for function `model_loss`.
_lambda : float
The weight of the regularization term. Default: 1.0

Returns
-------
loss : a scalar
The averaged cross-entropy loss plus the regularization term.
"""
loss = model_loss(x, true_labels, params)
Nw = params[0].shape[0] * params[0].shape[1] + params[2].size
reg = ((params[0]**2).sum() + (params[2]**2).sum()) / Nw
return loss + _lambda * reg
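
A hypothetical usage sketch for this module (not part of the pull request): it shows one gradient-descent step on `regularized_loss`, with the gradients supplied by `autograd`. The shapes, data, regularization weight, and learning rate are made up for illustration only.

```python
# Hypothetical usage of lesson_7_functions: one gradient-descent step on the
# regularized loss, with the gradients supplied by autograd.  All shapes,
# data, and coefficients are made up for illustration only.
import numpy as np
from autograd import grad
from lesson_7_functions import regularized_loss

n_images, n_total_pixels, n_z1 = 20, 64, 8
rng = np.random.default_rng(0)

x = rng.random((n_images, n_total_pixels))
true_labels = rng.integers(0, 2, size=n_images).astype(float)
params = [
    rng.normal(size=(n_total_pixels, n_z1)),  # W0
    np.zeros(n_z1),                           # b0
    rng.normal(size=n_z1),                    # W1
    0.0,                                      # b1
]

# gradients of the loss w.r.t. params (the third positional argument)
grad_fn = grad(regularized_loss, 2)
grads = grad_fn(x, true_labels, params, 0.01)

learning_rate = 0.5  # made-up learning rate
params = [p - learning_rate * g for p, g in zip(params, grads)]
print(regularized_loss(x, true_labels, params, 0.01))
```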