updated perceptron notebook

dcavar · dcavar · commit 19dca46afdf9 · 2024-10-27T21:32:51.000-04:00
diff --git a/notebooks/Multilayer_Perceptron.ipynb b/notebooks/Multilayer_Perceptron.ipynb
@@ -14,42 +14,259 @@
     "**License:** [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/) ([CA BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/))\n",
     "\n",
     "**Literature:**\n",
+    "\n",
+    "- Samy Baladram \"[Multilayer Perceptron, Explained: A Visual Guide with Mini 2D Dataset](https://towardsdatascience.com/multilayer-perceptron-explained-a-visual-guide-with-mini-2d-dataset-0ae8100c5d1c)\"\n",
     "\n"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 1,
    "metadata": {},
    "outputs": [],
    "source": [
-    "import numpy as np\n"
+    "import numpy as np"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
-   "source": []
+   "source": [
+    "## Data\n",
+    "\n",
+    "We will use the data set from Samy Baladram's article listed above. The data shows scores for temperature and humidity from 0 to 3, and a corresponding decision whether playing golf is possible. See [here](https://towardsdatascience.com/support-vector-classifier-explained-a-visual-guide-with-mini-2d-dataset-62e831e7b9e9) for an explanation of the data set."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "training_data = [\n",
+    "    (0, 0, 1),\n",
+    "    (1, 0, 0),\n",
+    "    (1, 1, 0),\n",
+    "    (2, 0, 0),\n",
+    "    (3, 1, 1),\n",
+    "    (3, 2, 1),\n",
+    "    (2, 3, 1),\n",
+    "    (3, 3, 0)\n",
+    "]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "test_data = [\n",
+    "    (0, 1, 0),\n",
+    "    (0, 2, 0),\n",
+    "    (1, 3, 1),\n",
+    "    (2, 2, 1),\n",
+    "    (3, 1, 1)\n",
+    "]"
+   ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Introduction"
+    "## Introduction\n",
+    "\n",
+    "The network architecture will consume an input vector with two dimensions. One dimension is the score for temperature and the other is the score for humidity.\n",
+    "\n",
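+    "We can design the first hidden layer with three nodes, a second hidden layer with two nodes, and an output layer with one node.\n",
+    "\n",
+    "All layers are fully connected. The weights between the input and the first hidden layer are represented as a matrix $W$ of 2 x 3 dimensions, the weights between the first and the second hidden layer as a matrix $U$ of 3 x 2 dimensions, and the weights between the second hidden layer and the output node as a matrix $O$ of 2 x 1 dimensions. Each layer also has its own bias vector."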
+   ]
+  },
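+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a sketch of how these matrices are used, the forward pass for one input row vector $x$ can be written as below, assuming ReLU activations in both hidden layers and a sigmoid $\\sigma$ at the output, as in the forward-pass loop later in this notebook. The symbols $h_1$, $h_2$, $b_W$, $b_U$, $b_O$ and $\\hat{y}$ are notation introduced here for illustration; $b_W$, $b_U$ and $b_O$ stand for `bias_W`, `bias_U` and `bias_O`.\n",
+    "\n",
+    "$$h_1 = \\max(0, xW + b_W)$$\n",
+    "\n",
+    "$$h_2 = \\max(0, h_1 U + b_U)$$\n",
+    "\n",
+    "$$\\hat{y} = \\sigma(h_2 O + b_O)$$\n",
+    "\n",
+    "With $x$ of shape 1 x 2, the intermediate shapes are 1 x 3 for $h_1$, 1 x 2 for $h_2$, and 1 x 1 for $\\hat{y}$."
+   ]
+  },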
+  {
+   "cell_type": "code",
+   "execution_count": 37,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "W [[0.57916493 0.1989773  0.71685006]\n",
+      " [0.06420334 0.23917944 0.03679699]]\n",
+      "U [[0.44530666 0.60784364]\n",
+      " [0.77164787 0.40612112]\n",
+      " [0.83222563 0.69558143]]\n",
+      "bias_W [[0.90328775 0.89391968 0.63126251]]\n",
+      "bias_U [[0.93231218 0.7755912 ]]\n",
+      "O [[0.6369282 ]\n",
+      " [0.36734706]]\n",
+      "bias_O [[0.93714153]]\n"
+     ]
+    }
+   ],
+   "source": [
+    "W = np.random.random((2, 3))\n",
+    "print(f\"W {W}\")\n",
+    "U = np.random.random((3, 2))\n",
+    "print(f\"U {U}\")\n",
+    "bias_W = np.random.random((1, 3))\n",
+    "print(f\"bias_W {bias_W}\")\n",
+    "bias_U = np.random.random((1, 2))\n",
+    "print(f\"bias_U {bias_U}\")\n",
+    "O = np.random.random((2, 1))\n",
+    "print(f\"O {O}\")\n",
+    "bias_O = np.random.random((1, 1))\n",
+    "print(f\"bias_O {bias_O}\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "input_data [[0 0]\n",
+      " [1 0]\n",
+      " [1 1]\n",
+      " [2 0]\n",
+      " [3 1]\n",
+      " [3 2]\n",
+      " [2 3]\n",
+      " [3 3]]\n",
+      "input_data_ground_truth [[1]\n",
+      " [0]\n",
+      " [0]\n",
+      " [0]\n",
+      " [1]\n",
+      " [1]\n",
+      " [1]\n",
+      " [0]]\n"
+     ]
+    }
+   ],
+   "source": [
+    "input_data = np.array([[x[0], x[1]] for x in training_data])\n",
+    "input_data_ground_truth = np.array([[x[2]] for x in training_data])\n",
+    "print(f\"input_data {input_data}\")\n",
+    "print(f\"input_data_ground_truth {input_data_ground_truth}\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 17,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "array([1, 0])"
+      ]
+     },
+     "execution_count": 17,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "one_hot = np.array([0, 1, 0, 0, 0, 0, 0, 0])\n",
+    "one_hot.dot(input_data)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 18,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[0 0] [1]\n",
+      "[1 0] [0]\n",
+      "[1 1] [0]\n",
+      "[2 0] [0]\n",
+      "[3 1] [1]\n",
+      "[3 2] [1]\n",
+      "[2 3] [1]\n",
+      "[3 3] [0]\n"
+     ]
+    }
+   ],
+   "source": [
+    "for row, true_score in zip(input_data, input_data_ground_truth):\n",
+    "    print(row, true_score)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 38,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def sigmoid(z):\n",
+    "    return 1/(1 + np.exp(-z))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 42,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def loss_function(predicted, actual):\n",
+    "    return np.log(predicted) if actual else np.log(1 - predicted)"
    ]
   },
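+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For reference, a common way to write the binary cross-entropy for a single example is given below, with $\\hat{y}$ denoting the predicted probability and $y \\in \\{0, 1\\}$ the true label (both symbols are notation assumed here, not names used in the code).\n",
+    "\n",
+    "$$L(\\hat{y}, y) = -\\left[ y \\ln(\\hat{y}) + (1 - y) \\ln(1 - \\hat{y}) \\right]$$\n",
+    "\n",
+    "As a worked example, a prediction of $\\hat{y} = 0.9$ gives $L = -\\ln(0.9) \\approx 0.105$ when $y = 1$, and $L = -\\ln(0.1) \\approx 2.303$ when $y = 0$. Confident wrong predictions are penalized much more strongly than confident correct ones."
+   ]
+  },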
   {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {},
    "outputs": [],
-   "source": []
+   "source": [
+    "learning_rate = 0.01"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 50,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "output 0.9658545034605426 - true score: 1 - loss -0.03474207364924937\n",
+      "output 0.986959889282255 - true score: 0 - loss -4.3397252318950565\n",
+      "output 0.9894527613414252 - true score: 0 - loss -4.5518911918432865\n",
+      "output 0.995086368253607 - true score: 0 - loss -5.315741947375225\n",
+      "output 0.9985133193959704 - true score: 1 - loss -0.0014877868101581678\n",
+      "output 0.9988002123932317 - true score: 1 - loss -0.0012005079281317262\n",
+      "output 0.9974135571146144 - true score: 1 - loss -0.002589793507494032\n",
+      "output 0.9990317957413032 - true score: 0 - loss -6.940067481896969\n"
+     ]
+    }
+   ],
+   "source": [
+    "for row, true_score in zip(input_data, input_data_ground_truth):\n",
+    "    # print(row, true_score)\n",
+    "    hidden_layer_W = np.maximum(row.dot(W) + bias_W, 0)[0]  # ReLU activation\n",
+    "    # print(f\"hidden_layer_W {hidden_layer_W}\")\n",
+    "    hidden_layer_U = np.maximum(hidden_layer_W.dot(U) + bias_U, 0)[0]  # ReLU activation\n",
+    "    # print(f\"hidden_layer_U {hidden_layer_U}\")\n",
+    "    output = (sigmoid(hidden_layer_U.dot(O) + bias_O))[0][0]\n",
+    "    loss = loss_function(output, true_score[0])\n",
+    "    print(f\"output {output} - true score: {true_score[0]} - loss {loss}\")"
+   ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Inference"
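+    "The loop above runs a forward pass over the training data and scores each prediction with a loss based on binary cross-entropy. Note that `loss_function` returns the log-likelihood of the true label, which is why the printed values are negative; the binary cross-entropy is the negative of this value."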
    ]
   },
   {
@@ -66,6 +283,52 @@
     "## Training"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Backpropagation\n",
+    "\n",
+    "\n",
+    "### Derivative Rules\n",
+    "\n",
+    "\n",
+    "#### Constant Rule\n",
+    "\n",
+    "$y = k$ with $k$ a constant: $\\frac{dy}{dx}=0$\n",
+    "\n",
+    "\n",
+    "#### Power Rule\n",
+    "\n",
+    "$y=x^n$ the derivative is: $\\frac{dy}{dx} = n x^{n-1}$\n",
+    "\n",
+    "\n",
+    "#### Exponential Rule\n",
+    "\n",
+    "$y=e^{kx}$ the derivative is: $\\frac{dy}{dx}= k e^{kx}$\n",
+    "\n",
+    "\n",
+    "#### Natural Logarithm Rule\n",
+    "\n",
+    "$y=\\ln(x)$ the derivative is: $\\frac{dy}{dx}=\\frac{1}{x}$\n",
+    "\n",
+    "\n",
+    "#### Sum and Difference Rule\n",
+    "\n",
+    "$y = u + v$ or $y = u - v$ the derivatives are: $\\frac{dy}{dx} = \\frac{du}{dx} + \\frac{dv}{dx}$ or $\\frac{dy}{dx} = \\frac{du}{dx} - \\frac{dv}{dx}$\n",
+    "\n",
+    "\n",
+    "#### Product Rule\n",
+    "\n",
+    "$y = u v$  the derivative is: $\\frac{dy}{dx} = \\frac{du}{dx} v + \\frac{dv}{dx} u$\n",
+    "\n",
+    "\n",
+    "#### Chain Rule\n",
+    "\n",
+    "$y(x) = u(v(x))$ the derivative is: $\\frac{dy}{dx} = \\frac{du}{dv} \\frac{dv}{dx}$\n",
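+    "\n",
+    "\n",
+    "#### Worked Example\n",
+    "\n",
+    "As an illustration of how these rules combine (a sketch, with $z$, $\\hat{y}$, $y$ and $L$ as notation assumed here), consider the sigmoid $\\sigma(z) = (1 + e^{-z})^{-1}$ used at the output. Writing it as $u(v) = v^{-1}$ with $v(z) = 1 + e^{-z}$, we have $\\frac{du}{dv} = -v^{-2}$, and the constant, sum, and exponential rules give $\\frac{dv}{dz} = -e^{-z}$, so by the chain rule:\n",
+    "\n",
+    "$$\\frac{d\\sigma}{dz} = \\frac{e^{-z}}{(1 + e^{-z})^2} = \\sigma(z)(1 - \\sigma(z))$$\n",
+    "\n",
+    "For the binary cross-entropy $L = -[y \\ln(\\hat{y}) + (1 - y) \\ln(1 - \\hat{y})]$ with $\\hat{y} = \\sigma(z)$, the natural logarithm rule gives $\\frac{dL}{d\\hat{y}} = -\\frac{y}{\\hat{y}} + \\frac{1 - y}{1 - \\hat{y}}$, and one more application of the chain rule yields:\n",
+    "\n",
+    "$$\\frac{dL}{dz} = \\frac{dL}{d\\hat{y}} \\frac{d\\hat{y}}{dz} = \\hat{y} - y$$\n",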
+    "\n"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -82,8 +345,22 @@
   }
  ],
  "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
   "language_info": {
-   "name": "python"
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.7"
   }
  },
  "nbformat": 4,