
Commit 19dca46

updated perceptron notebook
1 parent 8836172 commit 19dca46

File tree

1 file changed: +284 -7 lines changed

notebooks/Multilayer_Perceptron.ipynb

@@ -14,42 +14,259 @@
1414
"**License:** [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/) ([CA BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/))\n",
1515
"\n",
1616
"**Literature:**\n",
17+
"\n",
18+
"- Samy Baladram \"[Multilayer Perceptron, Explained: A Visual Guide with Mini 2D Dataset](https://towardsdatascience.com/multilayer-perceptron-explained-a-visual-guide-with-mini-2d-dataset-0ae8100c5d1c)\"\n",
1719
"\n"
1820
]
1921
},
2022
{
2123
"cell_type": "code",
22-
"execution_count": null,
24+
"execution_count": 1,
2325
"metadata": {},
2426
"outputs": [],
2527
"source": [
26-
"import numpy as np\n"
28+
"import numpy as np"
2729
]
2830
},
2931
{
3032
"cell_type": "markdown",
3133
"metadata": {},
32-
"source": []
34+
"source": [
35+
"## Data\n",
36+
"\n",
37+
"We will use the data set from Samy Baladram's article listed above. The data shows scores for temperature and humidity from 0 to 3, and a corresponding decision whether playing golf is possible. See [here](https://towardsdatascience.com/support-vector-classifier-explained-a-visual-guide-with-mini-2d-dataset-62e831e7b9e9) for an explanation of the data set."
38+
]
39+
},
40+
{
41+
"cell_type": "code",
42+
"execution_count": 5,
43+
"metadata": {},
44+
"outputs": [],
45+
"source": [
46+
"training_data = [\n",
47+
" (0, 0, 1),\n",
48+
" (1, 0, 0),\n",
49+
" (1, 1, 0),\n",
50+
" (2, 0, 0),\n",
51+
" (3, 1, 1),\n",
52+
" (3, 2, 1),\n",
53+
" (2, 3, 1),\n",
54+
" (3, 3, 0)\n",
55+
"]"
56+
]
57+
},
58+
{
59+
"cell_type": "code",
60+
"execution_count": 6,
61+
"metadata": {},
62+
"outputs": [],
63+
"source": [
64+
"test_data = [\n",
65+
" (0, 1, 0),\n",
66+
" (0, 2, 0),\n",
67+
" (1, 3, 1),\n",
68+
" (2, 2, 1),\n",
69+
" (3, 1, 1)\n",
70+
"]"
71+
]
3372
},
3473
{
3574
"cell_type": "markdown",
3675
"metadata": {},
3776
"source": [
38-
"## Introduction"
77+
"## Introduction\n",
78+
"\n",
79+
"The network architecture will consume an input vector with two dimensions. One dimension is the score for temperature and the other is the score for humidity.\n",
80+
"\n",
81+
"We can design the first hidden layer with three nodes, a second subsequent hidden layer with two nodes, and an output layer with one node.\n",
82+
"\n",
83+
"All nodes are fully connected and represented as a matrix $W$ of 2 x 3 dimensions. The second hidden layer is a matrix $U$ with 3 x 2 dimensions."
84+
]
85+
},
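A compact restatement of the forward pass implemented in the cells below (added here for orientation; ReLU hidden activations and a sigmoid output, matching the later code):

$$
h_1 = \mathrm{ReLU}(x W + b_W), \qquad
h_2 = \mathrm{ReLU}(h_1 U + b_U), \qquad
\hat{y} = \sigma(h_2 O + b_O)
$$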
86+
{
87+
"cell_type": "code",
88+
"execution_count": 37,
89+
"metadata": {},
90+
"outputs": [
91+
{
92+
"name": "stdout",
93+
"output_type": "stream",
94+
"text": [
95+
"W [[0.57916493 0.1989773 0.71685006]\n",
96+
" [0.06420334 0.23917944 0.03679699]]\n",
97+
"U [[0.44530666 0.60784364]\n",
98+
" [0.77164787 0.40612112]\n",
99+
" [0.83222563 0.69558143]]\n",
100+
"bias_W [[0.90328775 0.89391968 0.63126251]]\n",
101+
"bias_U [[0.93231218 0.7755912 ]]\n",
102+
"O [[0.6369282 ]\n",
103+
" [0.36734706]]\n",
104+
"bias_O [[0.93714153]]\n"
105+
]
106+
}
107+
],
108+
"source": [
109+
"W = np.random.random((2, 3))\n",
110+
"print(f\"W {W}\")\n",
111+
"U = np.random.random((3, 2))\n",
112+
"print(f\"U {U}\")\n",
113+
"bias_W = np.random.random((1, 3))\n",
114+
"print(f\"bias_W {bias_W}\")\n",
115+
"bias_U = np.random.random((1, 2))\n",
116+
"print(f\"bias_U {bias_U}\")\n",
117+
"O = np.random.random((2, 1))\n",
118+
"print(f\"O {O}\")\n",
119+
"bias_O = np.random.random((1, 1))\n",
120+
"print(f\"bias_O {bias_O}\")"
121+
]
122+
},
123+
{
124+
"cell_type": "code",
125+
"execution_count": 16,
126+
"metadata": {},
127+
"outputs": [
128+
{
129+
"name": "stdout",
130+
"output_type": "stream",
131+
"text": [
132+
"input_data [[0 0]\n",
133+
" [1 0]\n",
134+
" [1 1]\n",
135+
" [2 0]\n",
136+
" [3 1]\n",
137+
" [3 2]\n",
138+
" [2 3]\n",
139+
" [3 3]]\n",
140+
"input_data_ground_truth [[1]\n",
141+
" [0]\n",
142+
" [0]\n",
143+
" [0]\n",
144+
" [1]\n",
145+
" [1]\n",
146+
" [1]\n",
147+
" [0]]\n"
148+
]
149+
}
150+
],
151+
"source": [
152+
"input_data = np.array([[x[0], x[1]] for x in training_data])\n",
153+
"input_data_ground_truth = np.array([[x[2]] for x in training_data])\n",
154+
"print(f\"input_data {input_data}\")\n",
155+
"print(f\"input_data_ground_truth {input_data_ground_truth}\")"
156+
]
157+
},
158+
{
159+
"cell_type": "code",
160+
"execution_count": 17,
161+
"metadata": {},
162+
"outputs": [
163+
{
164+
"data": {
165+
"text/plain": [
166+
"array([1, 0])"
167+
]
168+
},
169+
"execution_count": 17,
170+
"metadata": {},
171+
"output_type": "execute_result"
172+
}
173+
],
174+
"source": [
175+
"one_hot = np.array([0, 1, 0, 0, 0, 0, 0, 0])\n",
176+
"one_hot.dot(input_data)"
177+
]
178+
},
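The cell above shows that dotting a one-hot vector with the data matrix selects the corresponding row. Equivalently (an illustrative aside, not part of the committed notebook):

```python
# Index 1 is "hot", so the product equals input_data[1], i.e. [1, 0].
assert (one_hot.dot(input_data) == input_data[1]).all()
```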
179+
{
180+
"cell_type": "code",
181+
"execution_count": 18,
182+
"metadata": {},
183+
"outputs": [
184+
{
185+
"name": "stdout",
186+
"output_type": "stream",
187+
"text": [
188+
"[0 0] [1]\n",
189+
"[1 0] [0]\n",
190+
"[1 1] [0]\n",
191+
"[2 0] [0]\n",
192+
"[3 1] [1]\n",
193+
"[3 2] [1]\n",
194+
"[2 3] [1]\n",
195+
"[3 3] [0]\n"
196+
]
197+
}
198+
],
199+
"source": [
200+
"for row, true_score in zip(input_data, input_data_ground_truth):\n",
201+
" print(row, true_score)"
202+
]
203+
},
204+
{
205+
"cell_type": "code",
206+
"execution_count": 38,
207+
"metadata": {},
208+
"outputs": [],
209+
"source": [
210+
"def sigmoid(z):\n",
211+
" return 1/(1 + np.exp(-z))"
212+
]
213+
},
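For the small scores in this data set the definition above is fine, but np.exp(-z) overflows for large negative z. A numerically stable variant is a common alternative (a sketch added here, not part of the commit):

```python
def sigmoid_stable(z):
    # exp is only ever applied to a non-positive argument, so it cannot overflow.
    e = np.exp(-np.abs(z))
    return np.where(z >= 0, 1 / (1 + e), e / (1 + e))
```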
214+
{
215+
"cell_type": "code",
216+
"execution_count": 42,
217+
"metadata": {},
218+
"outputs": [],
219+
"source": [
220+
"def loss_function(predicted, actual):\n",
221+
" return np.log(predicted) if actual else np.log(1 - predicted)"
39222
]
40223
},
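Note that loss_function returns the per-example log-likelihood; the binary cross-entropy loss is its negative, averaged over the examples. A minimal sketch (an addition, not part of the commit; the clipping epsilon is an assumption to keep np.log away from zero):

```python
import numpy as np

def binary_cross_entropy(predicted, actual):
    # Clip probabilities so np.log never receives 0 (assumed epsilon of 1e-12).
    p = np.clip(predicted, 1e-12, 1 - 1e-12)
    # Negative mean log-likelihood over all examples.
    return -np.mean(actual * np.log(p) + (1 - actual) * np.log(1 - p))
```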
41224
{
42225
"cell_type": "code",
43226
"execution_count": null,
44227
"metadata": {},
45228
"outputs": [],
46-
"source": []
229+
"source": [
230+
"learning_rate = 0.01"
231+
]
232+
},
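The learning rate scales each gradient-descent step. A hypothetical update (grad_W stands in for dloss/dW, which this commit does not yet compute):

```python
# Hypothetical: apply one gradient-descent step to the first-layer weights.
W = W - learning_rate * grad_W  # grad_W is a placeholder for dloss/dW
```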
233+
{
234+
"cell_type": "code",
235+
"execution_count": 50,
236+
"metadata": {},
237+
"outputs": [
238+
{
239+
"name": "stdout",
240+
"output_type": "stream",
241+
"text": [
242+
"output 0.9658545034605426 - true score: 1 - loss -0.03474207364924937\n",
243+
"output 0.986959889282255 - true score: 0 - loss -4.3397252318950565\n",
244+
"output 0.9894527613414252 - true score: 0 - loss -4.5518911918432865\n",
245+
"output 0.995086368253607 - true score: 0 - loss -5.315741947375225\n",
246+
"output 0.9985133193959704 - true score: 1 - loss -0.0014877868101581678\n",
247+
"output 0.9988002123932317 - true score: 1 - loss -0.0012005079281317262\n",
248+
"output 0.9974135571146144 - true score: 1 - loss -0.002589793507494032\n",
249+
"output 0.9990317957413032 - true score: 0 - loss -6.940067481896969\n"
250+
]
251+
}
252+
],
253+
"source": [
254+
"for row, true_score in zip(input_data, input_data_ground_truth):\n",
255+
" # print(row, true_score)\n",
256+
" hidden_layer_W = np.maximum(row.dot(W) + bias_W, 0)[0] # ReLU activation\n",
257+
" # print(f\"hidden_layer_W {hidden_layer_W}\")\n",
258+
" hidden_layer_U = np.maximum(hidden_layer_W.dot(U) + bias_U, 0)[0] # ReLU activation\n",
259+
" # print(f\"hidden_layer_U {hidden_layer_U}\")\n",
260+
" output = (sigmoid(hidden_layer_U.dot(O) + bias_O))[0][0]\n",
261+
" loss = loss_function(output, true_score[0])\n",
262+
" print(f\"output {output} - true score: {true_score[0]} - loss {loss}\")"
263+
]
47264
},
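Since every row passes through the same weights, the loop above can also be expressed as one batched computation; a sketch (assuming W, U, O, the biases, and sigmoid from the earlier cells are in scope):

```python
# Batched forward pass over all eight training rows at once.
H1 = np.maximum(input_data.dot(W) + bias_W, 0)  # shape (8, 3), ReLU
H2 = np.maximum(H1.dot(U) + bias_U, 0)          # shape (8, 2), ReLU
outputs = sigmoid(H2.dot(O) + bias_O)           # shape (8, 1), probabilities
```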
48265
{
49266
"cell_type": "markdown",
50267
"metadata": {},
51268
"source": [
52-
"## Inference"
269+
"Adding a loss function using binary cross-entropy:"
53270
]
54271
},
55272
{
@@ -66,6 +283,52 @@
66283
"## Training"
67284
]
68285
},
286+
{
287+
"cell_type": "markdown",
288+
"metadata": {},
289+
"source": [
290+
"## Backpropagation \n",
291+
"\n",
292+
"\n",
293+
"### Derivative Rules\n",
294+
"\n",
295+
"\n",
296+
"#### Constant Rule\n",
297+
"\n",
298+
"$y = k$ with $k$ a constant: $\\frac{dy}{dx}=0$\n",
299+
"\n",
300+
"\n",
301+
"#### Power Rule\n",
302+
"\n",
303+
"$y=x^n$ the derivative is: $\\frac{dy}{dx} (n -1)x^{n-1}$ \n",
304+
"\n",
305+
"\n",
306+
"#### Exponential Rule\n",
307+
"\n",
308+
"$y=e^{kx}$ the derivative is: $\\frac{dy}{dx}= k e^{kx}$\n",
309+
"\n",
310+
"\n",
311+
"#### Natural Logarithm Rule\n",
312+
"\n",
313+
"$y=ln(x)$ the derivative is: $\\frac{dy}{dx}=\\frac{1}{x}$\n",
314+
"\n",
315+
"\n",
316+
"#### Sum and Difference Rule\n",
317+
"\n",
318+
"$y = u + v$ or $y = u - v$ the derivatives are: $\\frac{dy}{dx} = \\frac{du}{dx} + \\frac{dv}{dx}$ or $\\frac{dy}{dx} = \\frac{du}{dx} - \\frac{dv}{dx}$\n",
319+
"\n",
320+
"\n",
321+
"#### Product Rule\n",
322+
"\n",
323+
"$y = u v$ the derivative is: $\\frac{dy}{dx} = \\frac{du}{dx} v + \\frac{dv}{dx} u$\n",
324+
"\n",
325+
"\n",
326+
"#### Chain Rule\n",
327+
"\n",
328+
"$y(x) = u(v(x))$ the derivative is: $\\frac{dy(x)}{dx} = \\frac{du(v(x))}{dx} \\frac{dv(x)}{dx}$\n",
329+
"\n"
330+
]
331+
},
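As a worked example combining the power, exponential, and chain rules on this network's output activation (a derivation added for illustration):

$$
\sigma(z) = (1 + e^{-z})^{-1}
\quad\Longrightarrow\quad
\frac{d\sigma}{dz} = -(1 + e^{-z})^{-2} \cdot (-e^{-z})
= \frac{e^{-z}}{(1 + e^{-z})^{2}}
= \sigma(z)\,\bigl(1 - \sigma(z)\bigr)
$$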
69332
{
70333
"cell_type": "code",
71334
"execution_count": null,
@@ -82,8 +345,22 @@
82345
}
83346
],
84347
"metadata": {
348+
"kernelspec": {
349+
"display_name": "Python 3",
350+
"language": "python",
351+
"name": "python3"
352+
},
85353
"language_info": {
86-
"name": "python"
354+
"codemirror_mode": {
355+
"name": "ipython",
356+
"version": 3
357+
},
358+
"file_extension": ".py",
359+
"mimetype": "text/x-python",
360+
"name": "python",
361+
"nbconvert_exporter": "python",
362+
"pygments_lexer": "ipython3",
363+
"version": "3.12.7"
87364
}
88365
},
89366
"nbformat": 4,
