Backpropagation is a supervised learning algorithm and is mainly used by Multi-Layer-Perceptrons to change the weights connected to the net's hidden neuron layer(s).

The backpropagation algorithm uses a computed output error to change the weight values in the backward direction, from the output layer towards the input layer.

To get this net error, a forward propagation phase must be carried out first. During forward propagation, the neurons are activated using the sigmoid activation function.

The formula of sigmoid activation is:

$$f(x) = \frac{1}{1 + e^{-x}}$$

where x is the input of the neuron.
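The sigmoid formula can be written as a one-line function. The following is a minimal sketch; the class name `SigmoidSketch` and the method name `sigmoid` are illustrative assumptions and do not come from the original text.

```java
/** Minimal sketch of the sigmoid activation function (names are hypothetical). */
final class SigmoidSketch {

    /** Returns 1 / (1 + e^(-input)) for the given neuron input. */
    static double sigmoid(double input) {
        return 1.0 / (1.0 + Math.exp(-input));
    }

    public static void main(String[] args) {
        // An input of 0 lies exactly in the middle of the sigmoid curve.
        System.out.println(sigmoid(0.0));  // prints 0.5
        // The input of hidden neuron 1 from the worked example in this text.
        System.out.println(sigmoid(0.55));
    }
}
```

The output always lies strictly between 0 and 1, which is why the output error in the example can never be exactly zero for targets of 0 or 1.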

The algorithm works as follows:

1. Perform the forward propagation phase for an input pattern and calculate the output error.
2. Change all weight values of each weight matrix using the formula: weight(new) = weight(old) + learning rate * output error * output(neurons i) * output(neurons i+1) * ( 1 - output(neurons i+1) )
3. Go to step 1.
4. The algorithm ends when all output patterns match their target patterns.
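The update formula from the second step can be sketched as a small helper. This follows the rule exactly as it is stated in this text; the class, method and parameter names are assumptions made for illustration only.

```java
/** Sketch of the weight update rule from the steps above (all names are hypothetical). */
final class WeightUpdateSketch {

    /**
     * Returns the new value of a weight that leads from a neuron in layer i
     * (output outputFrom) to a neuron in layer i+1 (output outputTo).
     */
    static double updatedWeight(double oldWeight, double learningRate, double outputError,
                                double outputFrom, double outputTo) {
        // change = learning rate * output error * output(i) * output(i+1) * (1 - output(i+1))
        double change = learningRate * outputError * outputFrom * outputTo * (1.0 - outputTo);
        return oldWeight + change;
    }

    public static void main(String[] args) {
        // Hypothetical values chosen only to show the call.
        System.out.println(updatedWeight(0.5, 0.25, -0.4, 1.0, 0.5));
    }
}
```

Note that the factor `outputTo * (1 - outputTo)` is the derivative of the sigmoid function expressed through its own output, so no extra call to the activation function is needed during the backward phase.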

Example:

Suppose you have a 3-layered Multi-Layer-Perceptron with two input neurons, two hidden neurons and one output neuron.

Patterns to be learned:

| input | target |
|-------|--------|
| 0 1   | 0      |
| 1 1   | 1      |

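First, the weight values are set to random values: 0.62, 0.42, 0.55, -0.17 for weight matrix 1 and 0.35, 0.81 for weight matrix 2. In weight matrix 1, weights 1 and 2 lead from input neuron 1 to hidden neurons 1 and 2, and weights 3 and 4 lead from input neuron 2 to hidden neurons 1 and 2. In weight matrix 2, weights 1 and 2 lead from hidden neurons 1 and 2 to the output neuron.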

The learning rate of the net is set to 0.25.

Next, the values of the first input pattern (0 1) are applied to the neurons of the input layer (the output of the input layer is the same as its input).

The neurons in the hidden layer are activated:

- Input of hidden neuron 1: 0 * 0.62 + 1 * 0.55 = 0.55
- Input of hidden neuron 2: 0 * 0.42 + 1 * (-0.17) = -0.17
- Output of hidden neuron 1: 1 / ( 1 + exp(-0.55) ) = 0.634135591
- Output of hidden neuron 2: 1 / ( 1 + exp(+0.17) ) = 0.457602059

The neurons in the output layer are activated:

- Input of output neuron: 0.634135591 * 0.35 + 0.457602059 * 0.81 = 0.592605124
- Output of output neuron: 1 / ( 1 + exp(-0.592605124) ) = 0.643962658

Compute an error value by subtracting output from target: 0 - 0.643962658 = -0.643962658
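The forward propagation of the first pattern can be reproduced with a few lines of code. This is only a sketch for the 2-2-1 net of this example; the flat array layout of the weights and all names are assumptions made for illustration.

```java
/** Sketch of the forward propagation for the 2-2-1 example net (names are hypothetical). */
final class ForwardPassSketch {

    static double sigmoid(double input) {
        return 1.0 / (1.0 + Math.exp(-input));
    }

    /**
     * Returns {output of hidden neuron 1, output of hidden neuron 2, output of output neuron}.
     * Assumed layout: w1 = {in1->hid1, in1->hid2, in2->hid1, in2->hid2}, w2 = {hid1->out, hid2->out}.
     */
    static double[] forward(double[] in, double[] w1, double[] w2) {
        double hidden1 = sigmoid(in[0] * w1[0] + in[1] * w1[2]);
        double hidden2 = sigmoid(in[0] * w1[1] + in[1] * w1[3]);
        double output = sigmoid(hidden1 * w2[0] + hidden2 * w2[1]);
        return new double[] {hidden1, hidden2, output};
    }

    public static void main(String[] args) {
        double[] w1 = {0.62, 0.42, 0.55, -0.17};
        double[] w2 = {0.35, 0.81};
        double target = 0.0;

        double[] result = forward(new double[] {0, 1}, w1, w2);
        System.out.printf("hidden outputs: %.9f %.9f%n", result[0], result[1]);
        System.out.printf("output: %.9f, error: %.9f%n", result[2], target - result[2]);
    }
}
```

Run with the weights of the example, the printed values should agree with the hand calculation above up to rounding in the last digit.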

Now that we have the output error, let's do the backpropagation.

We start with changing the weights in weight matrix 2:

- Value for changing weight 1: 0.25 * (-0.643962658) * 0.634135591 * 0.643962658 * (1-0.643962658) = -0.023406638
- Value for changing weight 2: 0.25 * (-0.643962658) * 0.457602059 * 0.643962658 * (1-0.643962658) = -0.016890593
- Change weight 1: 0.35 + (-0.023406638) = 0.326593362
- Change weight 2: 0.81 + (-0.016890593) = 0.793109407

Now we will change the weights in weight matrix 1:

- Value for changing weight 1: 0.25 * (-0.643962658) * 0 * 0.634135591 * (1-0.634135591) = 0
- Value for changing weight 2: 0.25 * (-0.643962658) * 0 * 0.457602059 * (1-0.457602059) = 0
- Value for changing weight 3: 0.25 * (-0.643962658) * 1 * 0.634135591 * (1-0.634135591) = -0.037351064
- Value for changing weight 4: 0.25 * (-0.643962658) * 1 * 0.457602059 * (1-0.457602059) = -0.039958271
- Change weight 1: 0.62 + 0 = 0.62 (not changed)
- Change weight 2: 0.42 + 0 = 0.42 (not changed)
- Change weight 3: 0.55 + (-0.037351064) = 0.512648936
- Change weight 4: -0.17 + (-0.039958271) = -0.209958271
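The weight changes for both matrices can be sketched in code as follows. The sketch applies the update rule as given in this text, using the same output error for both weight matrices; the flat weight layout and all names are assumptions made for illustration.

```java
import java.util.Arrays;

/** Sketch of the backward phase for the 2-2-1 example net (names are hypothetical). */
final class BackwardPassSketch {

    /** Change for one weight: rate * error * output(i) * output(i+1) * (1 - output(i+1)). */
    static double delta(double rate, double error, double outFrom, double outTo) {
        return rate * error * outFrom * outTo * (1.0 - outTo);
    }

    /** Updates w1 and w2 in place from values already computed in the forward phase. */
    static void backward(double rate, double error, double[] in, double[] hidden,
                         double output, double[] w1, double[] w2) {
        // Weight matrix 2: hidden neuron j -> output neuron.
        for (int j = 0; j < 2; j++) {
            w2[j] += delta(rate, error, hidden[j], output);
        }
        // Weight matrix 1: index i * 2 + j is the weight from input i to hidden neuron j.
        for (int i = 0; i < 2; i++) {
            for (int j = 0; j < 2; j++) {
                w1[i * 2 + j] += delta(rate, error, in[i], hidden[j]);
            }
        }
    }

    public static void main(String[] args) {
        double[] w1 = {0.62, 0.42, 0.55, -0.17};
        double[] w2 = {0.35, 0.81};
        double[] in = {0, 1};
        double[] hidden = {0.634135591, 0.457602059}; // hidden outputs from the forward phase
        double output = 0.643962658;                  // net output from the forward phase
        double error = 0.0 - output;                  // target - output

        backward(0.25, error, in, hidden, output, w1, w2);
        System.out.println("weight matrix 1: " + Arrays.toString(w1));
        System.out.println("weight matrix 2: " + Arrays.toString(w2));
    }
}
```

Because the first input value is 0, the two weights leaving input neuron 1 receive a change of zero, which matches the "not changed" entries in the calculation above.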

The first input pattern has now been propagated through the net.

The same procedure is used for the next input pattern, but now with the changed weight values.

After the forward and backward propagation of the second pattern, one learning step is complete and the net error can be calculated by adding up the squared output errors of each pattern.

By performing this procedure repeatedly, this error value gets smaller and smaller.

The algorithm finishes successfully when the net error is zero (perfect) or approximately zero.
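The whole procedure, repeated learning steps with the net error as stop criterion, might be sketched like this. The tolerance of 0.01 and the limit of 100,000 learning steps are assumptions chosen for illustration; the text does not say how small "approximately zero" is, nor how many steps this particular net needs, so the loop is capped in case the tolerance is not reached. All names are hypothetical.

```java
/** Sketch of repeated learning steps for the 2-2-1 example net (names are hypothetical). */
final class TrainingLoopSketch {
    static final double[][] INPUTS = {{0, 1}, {1, 1}};
    static final double[] TARGETS = {0, 1};

    // Assumed layout: w1 = {in1->hid1, in1->hid2, in2->hid1, in2->hid2}, w2 = {hid1->out, hid2->out}.
    double[] w1 = {0.62, 0.42, 0.55, -0.17};
    double[] w2 = {0.35, 0.81};
    double rate = 0.25;

    static double sigmoid(double input) {
        return 1.0 / (1.0 + Math.exp(-input));
    }

    /** Forward and backward propagation of every pattern; returns the net error. */
    double learningStep() {
        double netError = 0.0;
        for (int p = 0; p < INPUTS.length; p++) {
            double[] in = INPUTS[p];
            double[] hidden = {
                sigmoid(in[0] * w1[0] + in[1] * w1[2]),
                sigmoid(in[0] * w1[1] + in[1] * w1[3])
            };
            double out = sigmoid(hidden[0] * w2[0] + hidden[1] * w2[1]);
            double error = TARGETS[p] - out;
            netError += error * error; // sum of the squared output errors

            for (int j = 0; j < 2; j++) {
                w2[j] += rate * error * hidden[j] * out * (1 - out);
                for (int i = 0; i < 2; i++) {
                    w1[i * 2 + j] += rate * error * in[i] * hidden[j] * (1 - hidden[j]);
                }
            }
        }
        return netError;
    }

    public static void main(String[] args) {
        TrainingLoopSketch net = new TrainingLoopSketch();
        double tolerance = 0.01; // assumed meaning of "approximately zero"
        int maxSteps = 100_000;  // assumed safety limit
        int steps = 1;
        double netError = net.learningStep();
        while (netError > tolerance && steps < maxSteps) {
            netError = net.learningStep();
            steps++;
        }
        System.out.println("learning steps: " + steps + ", net error: " + netError);
    }
}
```

Since the sigmoid output never reaches exactly 0 or 1, a practical implementation needs some tolerance of this kind rather than a test for a net error of exactly zero.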

Note that this algorithm is also applicable for Multi-Layer-Perceptrons with more than one hidden layer.

If all values of an input pattern are zero, the weights in weight matrix 1 would never be changed for this pattern, because every weight change contains the output of the sending neuron as a factor, and the net could not learn it. For that reason, a "pseudo input" called bias is created, which has a constant output value of 1.

This changes the structure of the net: the bias is connected through additional weights to the neurons of the hidden layer and the output layer.

These additional weights have initial random values and are changed in the same way as the other weights. Because the bias sends a constant output of 1 to the following neurons, those neurons always receive at least one input signal that differs from zero, so the bias weights can still be changed when all values of an input pattern are zero.
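The effect of the bias on the weight change can be illustrated with a short sketch. The error and output values below are hypothetical numbers chosen for illustration, and the names are assumptions; only the update rule itself comes from this text.

```java
/** Sketch comparing the weight change for a zero input and for the bias (names are hypothetical). */
final class BiasSketch {

    /** Change for one weight: rate * error * output(i) * output(i+1) * (1 - output(i+1)). */
    static double delta(double rate, double error, double outFrom, double outTo) {
        return rate * error * outFrom * outTo * (1.0 - outTo);
    }

    public static void main(String[] args) {
        double rate = 0.25;
        double error = -0.5;     // hypothetical output error
        double hiddenOut = 0.5;  // hypothetical output of a hidden neuron
        double zeroInput = 0.0;  // input neuron of an all-zero pattern
        double bias = 1.0;       // constant output of the bias

        // The weight from a zero input cannot change, the bias weight can.
        System.out.println(delta(rate, error, zeroInput, hiddenOut) == 0.0); // prints true
        System.out.println(delta(rate, error, bias, hiddenOut) != 0.0);      // prints true
    }
}
```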


Copyright (c) 1996-2020 Neural Networks with Java - Jochen Fröhlich. All rights reserved.