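This article introduces a very simple gated recurrent neural network (gated RNN), the chaos-free network (CFN). It has two gates, a horizontal (forget) gate $\theta_t$ and a vertical (input) gate $\eta_t$:

$$
\begin{aligned}
h_t &= \theta_t \odot \tanh(h_{t-1}) + \eta_t \odot \tanh(W x_t),\\
\theta_t &= \sigma(U_\theta h_{t-1} + V_\theta x_t + b_\theta),\\
\eta_t &= \sigma(U_\eta h_{t-1} + V_\eta x_t + b_\eta),
\end{aligned}
$$

where $\sigma$ is the logistic sigmoid function and $\odot$ denotes elementwise multiplication.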
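A minimal pure-Python sketch of one CFN update may help, assuming the update h_t = θ_t ⊙ tanh(h_{t−1}) + η_t ⊙ tanh(W x_t) with gates θ_t = σ(U_θ h_{t−1} + V_θ x_t + b_θ) and η_t = σ(U_η h_{t−1} + V_η x_t + b_η). The `CFNParams` container, the helper names and the example weights are hypothetical labels of our own, and plain lists stand in for a tensor library.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def sigmoid(z: float) -> float:
    """Logistic sigmoid sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Plain matrix-vector product m @ v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


@dataclass
class CFNParams:
    """Hypothetical parameter container for one CFN layer (names are illustrative)."""
    W: Matrix        # input projection
    U_theta: Matrix  # recurrent weights of the horizontal/forget gate
    V_theta: Matrix  # input weights of the horizontal/forget gate
    b_theta: Vector
    U_eta: Matrix    # recurrent weights of the vertical/input gate
    V_eta: Matrix    # input weights of the vertical/input gate
    b_eta: Vector


def gate(u: Matrix, v: Matrix, b: Vector, h_prev: Vector, x: Vector) -> Vector:
    """sigma(U h_{t-1} + V x_t + b), applied elementwise."""
    return [sigmoid(p + q + r) for p, q, r in zip(matvec(u, h_prev), matvec(v, x), b)]


def cfn_step(p: CFNParams, h_prev: Vector, x: Vector) -> Vector:
    """One CFN update: h_t = theta_t * tanh(h_{t-1}) + eta_t * tanh(W x_t)."""
    theta = gate(p.U_theta, p.V_theta, p.b_theta, h_prev, x)
    eta = gate(p.U_eta, p.V_eta, p.b_eta, h_prev, x)
    wx = matvec(p.W, x)
    return [th * math.tanh(h) + et * math.tanh(z)
            for th, h, et, z in zip(theta, h_prev, eta, wx)]


# Usage with arbitrary example weights: 2 hidden units, 2 input features.
I2 = [[1.0, 0.0], [0.0, 1.0]]
Z2 = [[0.0, 0.0], [0.0, 0.0]]
params = CFNParams(W=I2, U_theta=Z2, V_theta=Z2, b_theta=[1.0, 1.0],
                   U_eta=Z2, V_eta=Z2, b_eta=[0.0, 0.0])
h1 = cfn_step(params, [0.0, 0.0], [2.0, -1.0])  # eta = 0.5, so h1 = 0.5 * tanh(W x)
```

Feeding a zero input after `h1` makes every component shrink, since only the θ_t ⊙ tanh(h_{t−1}) term survives.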
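Assume the input is a single pulse at time $t_0$, i.e. $W x_t = 0$ for all $t \neq t_0$. If the hidden state is initialized to 0, the network's response to the pulse $x_{t_0}$ is

$$
h_t = 0 \ (t < t_0), \qquad h_{t_0} = \eta_{t_0} \odot \tanh(W x_{t_0}), \qquad h_t = \theta_t \odot \tanh(h_{t-1}) \ (t > t_0).
$$

Since $0 < \theta_t(i) < 1$ and $|\tanh z| \le |z|$, each $|h_t(i)|$ shrinks at every step and decays to 0, with the forget gate controlling how fast. So when hidden unit $h_t(i)$ encounters a strong signal, it is activated and then decays toward 0 until it is activated again.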
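To see the decay concretely, here is a sketch of a single-unit CFN driven by one pulse. For simplicity it assumes x_t = 0 for every t > t_0 (so the gate input terms vanish too) and a hidden state that starts at 0; the function name and the example gate values are hypothetical. With a constant forget gate, a value closer to 1 should give a visibly slower decay.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def impulse_response(wx_pulse: float, eta_pulse: float,
                     u_theta: float, b_theta: float, steps: int) -> List[float]:
    """Trace of a scalar CFN unit after a single input pulse at t0.

    Assumes h_{t0-1} = 0, so h_{t0} = eta_{t0} * tanh(W x_{t0}); afterwards
    x_t = 0, leaving h_t = theta_t * tanh(h_{t-1}) with
    theta_t = sigmoid(u_theta * h_{t-1} + b_theta).
    """
    h = eta_pulse * math.tanh(wx_pulse)  # activation caused by the pulse
    trace = [h]
    for _ in range(steps):
        theta = sigmoid(u_theta * h + b_theta)  # forget gate, always in (0, 1)
        h = theta * math.tanh(h)               # no input term after the pulse
        trace.append(h)
    return trace


# Same pulse, two constant forget gates: sigmoid(-2) ~ 0.12 (fast) vs sigmoid(3) ~ 0.95 (slow).
fast = impulse_response(wx_pulse=3.0, eta_pulse=0.9, u_theta=0.0, b_theta=-2.0, steps=20)
slow = impulse_response(wx_pulse=3.0, eta_pulse=0.9, u_theta=0.0, b_theta=3.0, steps=20)
```

Both traces shrink strictly at every step; only the rate differs, which is the sense in which the forget gate sets the decay speed.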
[Figure: comparison with zero input]
The model in this paper has only one attractor, the zero state, whereas other models, namely the vanilla RNN, the LSTM and the GRU, can exhibit chaotic dynamical behavior.
The article then shows that this non-chaotic RNN also performs well on word-level language modeling, which indirectly suggests that chaotic dynamics do not explain the success of these models on such tasks.
Chaos in Recurrent Neural Networks
Consider the discrete dynamical system $u_t = \Phi(u_{t-1})$, where $u_t \in \mathbb{R}^d$.
The trajectory $u_0, \Phi(u_0), \Phi(\Phi(u_0)), \dots$ eventually enters an attractor (an invariant set) of the system, which for chaotic systems is often a fractal.
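The definition can be made concrete by iterating a map Φ from an initial point. The Hénon map (a = 1.4, b = 0.3) is used here purely as a stand-in: it is a standard 2-dimensional example whose attractor is a fractal, and it does not come from the original article. The helper names `orbit`, `henon` and `halve` are our own.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def orbit(phi: Callable[[Point], Point], u0: Point, steps: int) -> List[Point]:
    """Return the trajectory [u0, phi(u0), phi(phi(u0)), ...] of length steps + 1."""
    trajectory = [u0]
    for _ in range(steps):
        trajectory.append(phi(trajectory[-1]))
    return trajectory


def henon(u: Point, a: float = 1.4, b: float = 0.3) -> Point:
    """Henon map (x, y) -> (1 - a x^2 + y, b x), a classic map with a fractal attractor."""
    x, y = u
    return (1.0 - a * x * x + y, b * x)


def halve(u: Point) -> Point:
    """A contraction whose only attractor is the fixed point (0, 0)."""
    return (0.5 * u[0], 0.5 * u[1])


chaotic = orbit(henon, (0.0, 0.0), 1000)  # settles onto the strange attractor
simple = orbit(halve, (1.0, -1.0), 10)     # converges to the single fixed point
```

The contrast mirrors the article's point: the CFN behaves like `halve` (a single attractor at zero), while chaotic systems wander over a fractal set.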
Every RNN can be written in the form $h_t = \Phi(h_{t-1}, W x_t)$.
In the absence of input ($W x_t = 0$), the RNN induces the dynamical system $h_t = \Phi(h_{t-1}, 0)$.
This induced system characterizes the RNN's ability to produce complex trajectories on its own.
Can this dynamical behavior actually appear in practice? It can, because the parameters $W$ are learned. When the network encounters an unimportant data point $x_{t_0}$ that couples only weakly to the hidden units, i.e. $W_j x_{t_0} \approx 0$ for unit $j$, the data has little influence, and the input-free dynamics take over for the following period of time, until a strongly coupled (important) signal arrives.
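As a sketch of the form h_t = Φ(h_{t−1}, W x_t) and its input-free system, assume a vanilla RNN with Φ(h, z) = tanh(U h + z + b), where z stands for the projected input W x_t. The helper names and example weights are hypothetical. The last lines check numerically that a weakly coupled input (W x_t ≈ 0) gives almost the same step as the autonomous map.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def vanilla_phi(U: Matrix, b: Vector) -> Callable[[Vector, Vector], Vector]:
    """Return Phi(h, z) = tanh(U h + z + b), where z plays the role of W x_t."""
    def phi(h: Vector, z: Vector) -> Vector:
        return [math.tanh(sum(u * hj for u, hj in zip(row, h)) + zi + bi)
                for row, zi, bi in zip(U, z, b)]
    return phi


def induced_map(phi: Callable[[Vector, Vector], Vector], dim: int) -> Callable[[Vector], Vector]:
    """Input-free dynamical system h_t = Phi(h_{t-1}, 0)."""
    zeros = [0.0] * dim
    return lambda h: phi(h, zeros)


phi = vanilla_phi(U=[[0.9, -0.4], [0.4, 0.9]], b=[0.0, 0.1])
autonomous = induced_map(phi, dim=2)

h = [0.5, -0.2]
driven = phi(h, [1e-9, -1e-9])  # weakly coupled input, W x_t close to 0
free = autonomous(h)            # no input at all
gap = max(abs(a - c) for a, c in zip(driven, free))
```

Because tanh is 1-Lipschitz, the gap between the driven and free steps is at most the size of the input term, so a string of weakly coupled inputs leaves the network following the autonomous map.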
Chaotic Behavior of LSTM and GRU in the Absence of Input Data
Consider the following dynamical system induced by an LSTM with no input:
The specific parameters are:
The hidden state is initialized as:
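A sketch of the structure of such an LSTM-induced map, assuming the standard LSTM cell with zero input: f, i, o = σ(U h_{t−1} + b) for each gate, g = tanh(U_g h_{t−1} + b_g), c_t = f ⊙ c_{t−1} + i ⊙ g, h_t = o ⊙ tanh(c_t). The state is the pair (h_t, c_t), so 2 hidden units give a 4-dimensional system. The weights below are made-up placeholders, not the parameters of the original example, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(U: Matrix, h: Vector, b: Vector) -> Vector:
    """U h + b."""
    return [sum(u * hj for u, hj in zip(row, h)) + bi for row, bi in zip(U, b)]


def lstm_free_step(params: Dict[str, Tuple[Matrix, Vector]],
                   h: Vector, c: Vector) -> Tuple[Vector, Vector]:
    """One step of the input-free LSTM map (h_{t-1}, c_{t-1}) -> (h_t, c_t)."""
    def pre(name: str) -> Vector:
        U, b = params[name]
        return affine(U, h, b)

    f = [sigmoid(v) for v in pre("f")]    # forget gate
    i = [sigmoid(v) for v in pre("i")]    # input gate
    o = [sigmoid(v) for v in pre("o")]    # output gate
    g = [math.tanh(v) for v in pre("g")]  # candidate cell value
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


# Placeholder weights (NOT the values from the original example), 2 hidden units.
params = {
    "f": ([[0.5, -1.0], [1.0, 0.5]], [0.0, 0.0]),
    "i": ([[-0.5, 1.0], [1.0, -0.5]], [0.0, 0.0]),
    "o": ([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5]),
    "g": ([[-3.0, 4.0], [4.0, 3.0]], [0.0, 0.0]),
}
h, c = [0.1, -0.1], [0.0, 0.0]
states = []
for _ in range(50):
    h, c = lstm_free_step(params, h, c)
    states.append(h + c)  # 4-dimensional state (h_1, h_2, c_1, c_2)
```

Plotting two coordinates of `states` would give a 2-dimensional projection of the 4-dimensional trajectory, which is the kind of view Figure 1 presents.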
Figure 1 shows the resulting attractor. The system is 4-dimensional, and the figure shows a 2-dimensional projection of its attractor.
A chaotic dynamical system is sensitive to initial conditions. To demonstrate this, the authors take a given initial point, perturb it by a random amount in $[-10^{-7}, 10^{-7}]$, 100,000 times in total, and iterate the system from each perturbed point. By step 200, the perturbed trajectories almost fill the entire attractor.
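The sensitivity experiment can be sketched as follows: perturb an initial point by uniform noise in [−ε, ε], iterate every perturbed copy, and measure how far the endpoints spread from the unperturbed trajectory. Since the LSTM parameters of the original example are not reproduced here, the Hénon map (a = 1.4, b = 0.3) stands in as the chaotic system and a contraction as a non-chaotic contrast. The function name is hypothetical, and the number of perturbations is far smaller than the 100,000 used by the authors.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def perturbation_spread(phi: Callable[[Vector], Vector], u0: Vector, eps: float,
                        n_perturb: int, steps: int, seed: int = 0) -> float:
    """Max distance, after `steps` iterations, between perturbed copies of u0 and u0's own orbit."""
    rng = random.Random(seed)

    def run(u: Vector) -> Vector:
        for _ in range(steps):
            u = phi(u)
        return u

    reference = run(list(u0))
    spread = 0.0
    for _ in range(n_perturb):
        start = [x + rng.uniform(-eps, eps) for x in u0]
        spread = max(spread, math.dist(reference, run(start)))
    return spread


def henon(u: Vector) -> Vector:
    """Chaotic stand-in: Henon map with a = 1.4, b = 0.3."""
    return [1.0 - 1.4 * u[0] ** 2 + u[1], 0.3 * u[0]]


def halve(u: Vector) -> Vector:
    """Non-chaotic contrast: every orbit contracts to (0, 0)."""
    return [0.5 * x for x in u]


chaotic_spread = perturbation_spread(henon, [0.0, 0.0], eps=1e-7, n_perturb=200, steps=200)
stable_spread = perturbation_spread(halve, [0.0, 0.0], eps=1e-7, n_perturb=200, steps=200)
```

In the chaotic case, perturbations of size 1e-7 grow to the size of the attractor well before step 200; in the contracting case they shrink further, which is the behavior the CFN is designed to have.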
The example above is hand-constructed. The authors also trained an LSTM without dropout on the Penn Treebank corpus, and the trained network likewise exhibits chaotic behavior when run without input. When input is present, the system is no longer autonomous and is fully controlled by the input signal.
Chaos-Free Behavior of the CFN
Experimental results: hidden units in the upper layers decay more slowly than those in the lower layers, so they retain a signal for longer.
A Recurrent Neural Network Without Chaos