Recurrent back-propagation networks take values from hidden layer and/or output layer units and copy them down to the input layer for use with the next input. These values that are copied down are a kind of coded record of what the recent inputs to the network have been and this gives a network a simple kind of short-term memory, possibly a little like human short-term memory. For instance, suppose you want a network to memorize the two short sequences, "acb" and "bcd". In the middle of both of these sequences is the letter, `c'. In the first case you want a network to take in `a' and output `c', then take in `c' and output `b'. In the second case you want a network to take in `b' and output `c', then take in `c' and output `d'. To do this a network needs a simple memory of what came before the `c'.
Let the network be an 7-3-4 network where input units 1-4 and output units 1-4 stand for the letters a-d and the `h' stands for the value of a hidden layer unit. So the codes are:
a: 1000 b: 0100 c: 0010 d: 0001In action, the networks need to do the following. When `a' is input, `c' must be output:
0010 <- output layer hhh <- hidden layer 1000 stm <- input layerIn this context, when `c' is input, `b' should be output:
0100 hhh 0010 stmFor the other string, when `b' is input, `c' is output:
0010 hhh 0100 stmand when `c' in input, `d' is output:
0001 hhh 0010 stmThis is easy to do if the network keeps a short-term memory of what its most recent inputs have been. Suppose we input a and the output is c:
0010 <- output layer hhh <- hidden layer 1000 stm <- input layerPlacing `a' on the input layer generates some kind of code (like a hash code) on the 3 units in the hidden layer. On the other hand, placing `b' on the input units will generate a different code on the hidden units. All we need to do is save these hidden unit codes and input them with a `c'. In one case the network will output `b' and in the other case it will output `d'. In one particular run inputting `a' produced:
0 0 1 0 0.993 0.973 0.020 1 0 0 0 0 0 0When `c' is input the hidden layer units are copied down to input to give:
0 1 0 0 0.006 0.999 0.461 0 0 1 0 0.993 0.973 0.020For the other pattern, inputting `b' gave:
0 0 1 0 0.986 0.870 0.020 0 1 0 0 0 0 0Then the input of `c' gave:
0 0 0 1 0.005 0.999 0.264 0 0 1 0 0.986 0.870 0.020This particular problem can be set up as follows:
m 7 3 4 s 7 ci t 0.2 f ic * compressed input format rt rexample.datwhere the data file rexample.dat is:
1000 H 0010 0010 H 0100 0100 H 0010 0010 H 0001The first four values on each line are the normal input. The H codes for however many hidden layer units there are. The last four values are the desired outputs.
By the way, this simple problem does not converge particularly fast and you may need to do a number of runs before you hit on initial values that will work quickly. It will work more reliably with more hidden units.
Rather than using recurrent networks to memorize sequences of letters they are probably more useful at predicting the value of some variable at time t+1 given its value at t, t-1, t-2, ... . A very simple of this is to give the value of sin(t+1) given a recent history of inputs to the net. Given a value of sin(t) the curve may be going up or down and the net needs to keep track of this in order to correctly predict the next value. The following setup will do this:
m 1+5 5 1 f ir a aol dd uq qp e 0.02 ci rt rsin.datand the data file rsin.bp is:
0.00000 H 0.15636 0.15636 H 0.30887 0.30887 H 0.45378 . . . -0.15950 H -0.00319 -0.00319 H 0.15321and in fact it converges rather rapidly. The complete set of data can be found in the example file rsin.bp.
Another recurrent network included in the examples is one designed to memorize two lines of poetry. The two lines were:
I the heir of all the ages in the foremost files of time
For I doubt not through all the ages ones increasing purpose runs
but for the sake of making the problem simpler each word was shortened to 5 characters giving:
i the heir of all the ages in the frmst files of
time for i doubt not thru the ages one incre purpo runs
The letters were coded by taking the last 5 bits of their ASCII codes. See the file poetry.bp.