A toy neural network for binary classification in dc
.
dc
is one of the oldest UNIX programs that is still around.
It is a reverse polish notation (RPN) calculator (Wikipedia).
The version used here is GNU dc
, which comes preinstalled with all major GNU systems.
This program is purely as a proof of concept and has no real value other than maybe some educational value.
One day I thought to myself if it would be possible to run a neural network in a RPN calculator and here it is.
dc
works with multiple registers, stacks, macros, and even some kind of arrays. The input data is stored into arrays,
while all calculation is done through macros stored in various registers.
Read more in the manual.
sn
: store the top-of-stack in registern
ln
: copy value in registern
to top-of-stackz
: put the depth of the stack on the stackx
: execute macro on top-of-stack:r
: top-of-stack is index to arrayr
while next-on-stack is saved at that position;r
: retrive contenten from arrayr
with index from top-of-stack- Conditionals
- Arithmetic
#
everything that follows is interpreted as a comment
The neural network is used for binary classifcation and has two input neurons, one hidden layer with two neurons and one output neuron.
The weigths and biases are initiated as random values. dc
has no random function so these were generated using python and then copied to the program code.
Victor Zhou's blog post was used as the main guide. The structure of the network is the same, as well as the used input data.
The network uses backpropagation with gradient descent for learning. The (partial) derivatives of the neurons are all hardcoded.
The activation function is not the sigmoid, as in the blog post, because the exponential function doesn't exist and exponentiation can only be done with integer exponents. Instead an algebraic function is used, which is very similar to the sigmoid function:
This function returns values between 0 and 1, which is perfect for binary classification.
MSE loss is used and reported after each training step.
dc nn.dc
dc
doesn's require any whitespace (besides between different numbers to be put on the stack). That's why almost all whitespace (and comments) can be omitted and the program can be executed as a (loooong) one-liner:
20k[q]sQ0.1sp1000sE[d2^1+v2*/0.5+]sA[2^1+3^v2*1r/]sa[dd;yr;Y-2^LM+sM1-d0!=m]sm[0sMlnlmxlMln/dsM]sl0.8761123145115244 1:w0.1113624055358604 2:w_2.4816227332892038 3:w 0.1630997065297297 4:w0.8309874697929129 5:w_0.4641056543425773 6:w_0.1252385391632506 1:b0.3514078803331424 2:b0.0999569028091161 3:b[lXx2;w*r1;w*+1;b+d1:1lAxd2:1lXx4;w*r3;w*+2;b+d1:2lAxd2:26;w*r5;w*+3;b+d1:olAxd2:o]sf[[_2 _1]1][[ 25 6]0][[ 17 4]0][[_15 _6]1]zsn[zszxlz:ylz:xlZx]s_[z0<_]sZlZx[li;xsXlfxdli:Yli;yr-_2*0:L1;olax2;1*1:L1;olax2;2*2:L1;olax3:L1;olax5;w*4:L1;olax6;w*5:LlXx1;1lax*6:L1;1lax*7:L1;1lax8:LlXx1;2lax*9:L1;2lax*10:L1;2lax11:L1;wlp0;L4;L7;L***-1:w2;wlp0;L4;L6;L***-2:w1;blp0;L4;L8;L***-1:b3;wlp0;L5;L10;L***-3:w4;wlp0;L5;L9;L***-4:w2;blp0;L5;L11;L***-2:b5;wlp0;L1;L**-5:w6;wlp0;L2;L**-6:w3;blp0;L3;L**-3:bli1-sili0<T]sT[lEd1-sE0=QlnsilTxllx[loss: ]nnAPDPcltx]stltx