# Backpropagation

Backpropagation is a supervised learning algorithm, mainly used by Multi-Layer-Perceptrons to adjust the weights connected to the net's hidden neuron layer(s).

The backpropagation algorithm uses the computed output error to change the weight values, working backward from the output layer toward the input layer.

To obtain this error, a forward propagation phase must be carried out first. During forward propagation, the neurons are activated using the sigmoid activation function.

The formula of sigmoid activation is:
```
                  1
f(input) = ----------------
           1 + e^(-input)
```
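As a minimal sketch, the sigmoid formula can be written as a small Python function. The name `sigmoid` is chosen here for illustration and does not come from the original text.

```python
import math

def sigmoid(x: float) -> float:
    """Sigmoid activation: maps any real input to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0.0))   # 0.5
print(sigmoid(0.55))  # input of hidden neuron 1 in the example further down
```

An input of zero gives exactly 0.5; large positive inputs approach 1 and large negative inputs approach 0.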

The algorithm works as follows:

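1. Perform the forward propagation phase for an input pattern and calculate the output error (target minus output)
2. Change all weight values of each weight matrix using the formula weight(new) = weight(old) + learning rate * output error * output(neurons i) * output(neurons i+1) * ( 1 - output(neurons i+1) ), where neurons i is the layer the weight comes from and neurons i+1 is the layer it leads to
3. Repeat steps 1 and 2 for each remaining input pattern
4. The algorithm ends when all output patterns match their target patterns; otherwise, start again at step 1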
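The update formula from step 2 can be sketched as a single function. The function and parameter names are hypothetical, and the numbers in the usage lines are made up purely to show the arithmetic. The factor `out_to * (1 - out_to)` is the derivative of the sigmoid function expressed through the neuron's output.

```python
def weight_change(learning_rate, output_error, out_from, out_to):
    """Amount added to one weight that links a neuron in layer i
    (output out_from) to a neuron in layer i+1 (output out_to)."""
    return learning_rate * output_error * out_from * out_to * (1.0 - out_to)

def updated_weight(old_weight, learning_rate, output_error, out_from, out_to):
    """New weight value after applying the change from step 2."""
    return old_weight + weight_change(learning_rate, output_error, out_from, out_to)

# Made-up values: 0.5 * 1.0 * 1.0 * 0.5 * (1 - 0.5) = 0.125
print(weight_change(0.5, 1.0, 1.0, 0.5))        # 0.125
print(updated_weight(1.0, 0.5, 1.0, 1.0, 0.5))  # 1.125
```

If the sending neuron's output is 0, the change is 0 regardless of the error, so that weight stays as it is.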

Example:

Suppose you have the following 3-layered Multi-Layer-Perceptron:

[Figure: Backpropagation in a 3-layered Multi-Layer-Perceptron]

Patterns to be learned:

| input | target |
|-------|--------|
| 0 1   | 0      |
| 1 1   | 1      |

First, the weight values are set to random values: 0.62, 0.42, 0.55, -0.17 for weight matrix 1 and 0.35, 0.81 for weight matrix 2.

The learning rate of the net is set to 0.25.

Next, the values of the first input pattern (0 1) are applied to the neurons of the input layer (the output of an input neuron is the same as its input).

The neurons in the hidden layer are activated:

```
Input of hidden neuron 1:       0 * 0.62 + 1 * 0.55    = 0.55
Input of hidden neuron 2:       0 * 0.42 + 1 * (-0.17) = -0.17
Output of hidden neuron 1:      1 / ( 1 + exp(-0.55) ) = 0.634135591
Output of hidden neuron 2:      1 / ( 1 + exp(+0.17) ) = 0.457602059
```

The neurons in the output layer are activated:

```
Input of output neuron:         0.634135591 * 0.35 + 0.457602059 * 0.81 = 0.592605124
Output of output neuron:        1 / ( 1 + exp(-0.592605124) ) = 0.643962658
Compute an error value by
subtracting output from target: 0 - 0.643962658 = -0.643962658
```
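The forward propagation above can be reproduced with a short Python sketch. The layout of `w1` (row = input neuron, column = hidden neuron) is an assumption inferred from the order in which the weights are used in the calculations; the function name `forward` is illustrative.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, w1, w2):
    """Forward pass for a net with one hidden layer, one output neuron, no bias.

    w1[i][j] is the weight from input neuron i to hidden neuron j,
    w2[j] is the weight from hidden neuron j to the output neuron.
    """
    hidden = [sigmoid(sum(x * w1[i][j] for i, x in enumerate(inputs)))
              for j in range(len(w2))]
    output = sigmoid(sum(h * w for h, w in zip(hidden, w2)))
    return hidden, output

w1 = [[0.62, 0.42], [0.55, -0.17]]  # weight matrix 1
w2 = [0.35, 0.81]                   # weight matrix 2

hidden, output = forward([0, 1], w1, w2)
error = 0 - output                  # target minus output for pattern (0 1)
print(hidden, output, error)
```

Up to rounding, the printed values agree with the hidden outputs, the net output, and the error computed by hand above.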

Now that we have the output error, let's do the backpropagation.

We start with changing the weights in weight matrix 2:

```
Value for changing weight 1:    0.25 * (-0.643962658) * 0.634135591
                                * 0.643962658 * (1-0.643962658) = -0.023406638
Value for changing weight 2:    0.25 * (-0.643962658) * 0.457602059
                                * 0.643962658 * (1-0.643962658) = -0.016890593
Change weight 1:                0.35 + (-0.023406638) = 0.326593362
Change weight 2:                0.81 + (-0.016890593) = 0.793109407
```
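The update of weight matrix 2 can be checked with a few lines of Python. This sketch simply plugs in the rounded values from the calculation above; the variable names are illustrative.

```python
learning_rate = 0.25
output_error = -0.643962658
hidden_out = [0.634135591, 0.457602059]  # outputs of the two hidden neurons
net_out = 0.643962658                    # output of the output neuron
w2 = [0.35, 0.81]                        # weight matrix 2 before the update

# Every weight in matrix 2 leads to the same output neuron,
# so this part of the formula is shared by both weights.
common = learning_rate * output_error * net_out * (1 - net_out)

changes = [common * h for h in hidden_out]
new_w2 = [w + c for w, c in zip(w2, changes)]
print(changes)
print(new_w2)
```

Only the hidden neuron's output differs between the two weights, which is why the weight fed by the more active hidden neuron changes more.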

Now we will change the weights in weight matrix 1:

```
Value for changing weight 1:    0.25 * (-0.643962658) * 0
                                * 0.634135591 * (1-0.634135591) = 0
Value for changing weight 2:    0.25 * (-0.643962658) * 0
                                * 0.457602059 * (1-0.457602059) = 0
Value for changing weight 3:    0.25 * (-0.643962658) * 1
                                * 0.634135591 * (1-0.634135591) = -0.037351064
Value for changing weight 4:    0.25 * (-0.643962658) * 1
                                * 0.457602059 * (1-0.457602059) = -0.039958271
Change weight 1:                0.62 + 0 = 0.62         (not changed)
Change weight 2:                0.42 + 0 = 0.42         (not changed)
Change weight 3:                0.55 + (-0.037351064) = 0.512648936
Change weight 4:                -0.17 + (-0.039958271) = -0.209958271
```
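The same check for weight matrix 1, again as a sketch with the rounded values from the example. It mirrors the rule exactly as the text states it, applying the output error directly to weight matrix 1. The nested-list layout (row = input neuron, column = hidden neuron) is an assumption based on the order of weights 1 to 4.

```python
learning_rate = 0.25
output_error = -0.643962658
inputs = [0, 1]                          # first input pattern
hidden_out = [0.634135591, 0.457602059]  # outputs of the two hidden neurons
w1 = [[0.62, 0.42],                      # weights 1 and 2 (from input neuron 1)
      [0.55, -0.17]]                     # weights 3 and 4 (from input neuron 2)

new_w1 = [
    [w1[i][j]
     + learning_rate * output_error * inputs[i] * hidden_out[j] * (1 - hidden_out[j])
     for j in range(len(hidden_out))]
    for i in range(len(inputs))
]
print(new_w1)
```

The first row is left untouched because the first input value is 0, which makes every change in that row 0.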

The first input pattern has now been propagated through the net.

The same procedure is used for the next input pattern, but with the changed weight values.

After the forward and backward propagation of the second pattern, one learning step is complete and the net error can be calculated by adding up the squared output errors of each pattern.

By performing this procedure repeatedly, this error value gets smaller and smaller.

The algorithm finishes successfully when the net error is zero (perfect) or approximately zero.
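Putting the pieces together, a complete learning step and the surrounding loop might look like the sketch below. The names `train_step`, `max_steps`, and `tolerance` are hypothetical, as are the cap of 1000 steps and the threshold of 0.01 for "approximately zero". Whether and how quickly the net error shrinks depends on the starting weights and the learning rate, so the loop is capped rather than assumed to converge.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_step(patterns, w1, w2, learning_rate):
    """One learning step: forward and backward pass for every pattern in turn.

    Updates w1 and w2 in place and returns the net error, i.e. the sum of
    the squared output errors of all patterns.
    """
    net_error = 0.0
    for inputs, target in patterns:
        # Forward propagation
        hidden = [sigmoid(sum(x * w1[i][j] for i, x in enumerate(inputs)))
                  for j in range(len(w2))]
        output = sigmoid(sum(h * w for h, w in zip(hidden, w2)))
        error = target - output
        net_error += error ** 2
        # Backpropagation: weight matrix 2 first, then weight matrix 1
        for j, h in enumerate(hidden):
            w2[j] += learning_rate * error * h * output * (1 - output)
        for i, x in enumerate(inputs):
            for j, h in enumerate(hidden):
                w1[i][j] += learning_rate * error * x * h * (1 - h)
    return net_error

patterns = [([0, 1], 0), ([1, 1], 1)]
w1 = [[0.62, 0.42], [0.55, -0.17]]
w2 = [0.35, 0.81]

max_steps = 1000   # hypothetical cap on the number of learning steps
tolerance = 0.01   # hypothetical threshold for "approximately zero"
history = []
for step in range(max_steps):
    net_error = train_step(patterns, w1, w2, learning_rate=0.25)
    history.append(net_error)
    if net_error < tolerance:
        break

print(len(history), history[0], history[-1])
```

Recording the net error after each learning step in `history` makes it easy to inspect how the error develops over time.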

Note that this algorithm is also applicable to Multi-Layer-Perceptrons with more than one hidden layer.

### "What happens if all values of an input pattern are zero?"

If all values of an input pattern are zero, the weights in weight matrix 1 would never be changed for this pattern, and the net could not learn it. For that reason, a "pseudo input" called Bias is created, which has a constant output value of 1.

This changes the structure of the net in the following way:

[Figure: Backpropagation in a 3-layered Multi-Layer-Perceptron using Bias values]

These additional weights, leading to the neurons of the hidden layer and the output layer, have random initial values and are changed in the same way as the other weights. Because the Bias always sends a constant output of 1 to the following neurons, those neurons always receive at least one input that differs from zero, so the Bias weights can still be changed when every value of the input pattern is zero.
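The effect of the Bias on an all-zero input pattern can be illustrated with a small sketch. Here the Bias is treated as one extra input whose weights form the last row of weight matrix 1. The bias weights 0.1 and -0.3 and the output error of -0.5 are made-up values, and the function name is hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

BIAS = 1.0  # constant output of the Bias "pseudo input"

def hidden_weight_changes(inputs, w1, output_error, learning_rate):
    """Changes for weight matrix 1, with the Bias appended as an extra input."""
    extended = list(inputs) + [BIAS]
    hidden = [sigmoid(sum(x * w1[i][j] for i, x in enumerate(extended)))
              for j in range(len(w1[0]))]
    return [[learning_rate * output_error * x * h * (1 - h) for h in hidden]
            for x in extended]

# The last row holds the Bias weights (made-up starting values)
w1 = [[0.62, 0.42], [0.55, -0.17], [0.1, -0.3]]

changes = hidden_weight_changes([0, 0], w1, output_error=-0.5, learning_rate=0.25)
for row in changes:
    print(row)
```

The rows belonging to the two zero-valued inputs contain only zeros, while the Bias row does not, so the net can still adapt to this pattern through the Bias weights.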