An RNN (Recurrent Neural Network) is a type of neural network used for the analysis, prediction, and classification of time series data.
For a general introduction to RNNs, see the next article, "Deep Learning: From Image to Sequence". This article describes Bengio's RNN-RBM and assumes some deep learning background: the basic principles of neural network training, the structure and principle of the RBM, and simple time series models.
This article focuses on the architecture and code walkthrough of one RNN application: music composition. For more information, see the paper: Modeling Temporal Dependencies in High-Dimensional Sequences.
-----------------------------------------------------------------
Content:
1. General architecture and ideas of RNN
2. Definition of RNN-RBM
2.1 RTRBM Structure
2.2 RNN-RBM Network Architecture
2.3 RNN-RBM Training
3. Implementation of RNN-RBM and interpretation of Program
-----------------------------------------------------------------
1. General architecture and ideas of RNN
An RNN is a neural network model for processing time series data. It is designed to model such data for simulation, prediction, and classification.
Fig. 1. Architecture of an RNN
Panel A of the figure shows the basic structure of an RNN. In short, an RNN is a neural network composed of input units (u), internal units (x), and output units (y). The internal units are connected to each other so that they form a ring. Intuitively, the purpose is to make the state of the network at the next time point depend on its state at the current time point; in other words, the network has memory.
Unrolling the ring in time gives panel B, which shows how the network passes information across times t-1, t, and t+1 (only the forward-propagation dependencies between nodes are shown).
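The recurrence described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than code from the referenced tutorial: the function name rnn_forward, the weight names W_in, W_rec, and W_out, the tanh nonlinearity, and the layer sizes are all assumptions chosen for the example.

```python
import numpy as np

def rnn_forward(inputs, W_in, W_rec, W_out, b_x, b_y):
    """Run a simple RNN over a sequence; return internal states and outputs."""
    x = np.zeros(W_rec.shape[0])           # internal state starts at zero
    states, outputs = [], []
    for u_t in inputs:                     # one iteration per time point
        # The new state depends on the current input and the previous state.
        x = np.tanh(W_in @ u_t + W_rec @ x + b_x)
        states.append(x)
        outputs.append(W_out @ x + b_y)
    return np.array(states), np.array(outputs)

rng = np.random.default_rng(0)
n_in, n_internal, n_out, n_time = 3, 5, 2, 4   # hypothetical sizes
W_in = rng.normal(scale=0.1, size=(n_internal, n_in))
W_rec = rng.normal(scale=0.1, size=(n_internal, n_internal))
W_out = rng.normal(scale=0.1, size=(n_out, n_internal))
inputs = rng.normal(size=(n_time, n_in))
states, outputs = rnn_forward(inputs, W_in, W_rec, W_out,
                              np.zeros(n_internal), np.zeros(n_out))
print(states.shape, outputs.shape)  # (4, 5) (4, 2)
```

Because x is fed back through W_rec, a change to the first input still affects the last state, which is the "memory" the ring provides.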
2. Definition of the RNN-RBM:
2.1 RTRBM Structure
First we introduce the RTRBM (a simplified version of the RNN-RBM, proposed by Sutskever). The structure of the RTRBM is as follows:
In the figure, each red box contains an RBM; h denotes the hidden states and v the visible nodes, for example the representation of the audio at one time point (in practice, to increase the dimensionality, some work extends v(t) to cover N frames of data before and after it). The two-way arrows indicate the conditional probabilities of h given v and of v given h, that is:
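$$P(h_i = 1 \mid v) = \sigma(b_h + W^\top v)_i, \qquad P(v_j = 1 \mid h) = \sigma(b_v + W h)_j \tag{1}$$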
σ is the sigmoid function.
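As a rough illustration of these two conditional probabilities, the sketch below computes them for a small binary RBM and draws one sample from each. It assumes the standard RBM form P(h = 1 | v) = σ(bh + v·W) and P(v = 1 | h) = σ(bv + h·Wᵀ), with W of shape (n_visible, n_hidden); the sizes and the initial vector v0 are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 4                       # hypothetical sizes
W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
bv, bh = np.zeros(n_visible), np.zeros(n_hidden)

v0 = np.array([1., 0., 1., 0., 0., 1.])          # an arbitrary visible vector
p_h = sigmoid(v0 @ W + bh)                       # P(h_i = 1 | v), one value per hidden unit
h = (rng.random(n_hidden) < p_h).astype(float)   # sample every hidden unit independently
p_v = sigmoid(h @ W.T + bv)                      # P(v_j = 1 | h), one value per visible unit
v1 = (rng.random(n_visible) < p_v).astype(float) # sample every visible unit independently
```

Given one layer, the units of the other layer are conditionally independent, which is why each layer can be sampled in a single vectorized step.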
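The joint probability distribution of v and h over the RBMs at all time points is:
$$P(\{v^{(t)}, h^{(t)}\}) = \prod_{t=1}^{T} P(v^{(t)}, h^{(t)} \mid A^{(t)}) \tag{2}$$
where $A^{(t)} = \{v^{(\tau)}, h^{(\tau)} \mid \tau < t\}$, that is, the set of all {v, h} before time t.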
In addition, the RTRBM can be understood as follows: each time point is influenced by the hidden state h(t-1) of the previous time point (through W' and W''), and the RBM then reaches a steady state for (h(t), v(t)). Each RBM's parameters depend on the previous time point, but we can assume that only the bias terms are affected by the previous hidden state, which has the same effect. That is:
$$b_h^{(t)} = b_h + W' h^{(t-1)}, \qquad b_v^{(t)} = b_v + W'' h^{(t-1)} \tag{2}$$
2.2 RNN-RBM Network Architecture
After looking at the structure of the RTRBM, Bengio reasoned as follows: the hidden layer of the RTRBM describes the conditional probability distribution of the visible units, so it can only hold temporary information (presumably what remains once the RBM has reached its steady state). Could the hidden layer of the RBM be replaced with an RNN? This led to the RNN-RBM:
Each red box still contains an RBM, and the green box below shows an RNN unrolled in time. The advantage of this design is that the hidden layer is split in two: one part (h) only represents the steady state of the current RBM, while the other (ĥ, written u in the notes below) holds the hidden nodes of the RNN.
Note on the network structure of the RNN-RBM: the nodes are v (visible), u (internal units of the RNN), and h (hidden).
Edges: v-u, u-v, v-h (a bidirectional edge, the same as h-v), u-h, and u-u (actually a ring; unrolled in the time series model it becomes u^t - u^{t+1}).
These edges share their weights across all time steps (the different moments of the sequence).
Therefore the five weight matrices above, plus the biases bv, bh, and bu, are the parameters to optimize.
2.3 RNN-RBM Training
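1. Compute the RNN hidden units ĥ.
2. Compute bh(t) and bv(t) from the bias equation (2) in Section 2.1, and obtain a sample of v(t) by k-step block Gibbs sampling.
3. Estimate the gradient of the NLL (negative log-likelihood) cost with respect to the RBM parameters (W, bh, bv) and update them.
4. Estimate the gradient with respect to the RNN parameters (W2, W3, bĥ) and update them.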
3. Implementation of RNN-RBM and interpretation of Program
3.1 Preparation and environment configuration
3.1.1 For the reference program, see: http://deeplearning.net/tutorial/rnnrbm.html
3.1.2 Download the midi package (http://www.iro.umontreal.ca/~lisa/deep/midi.zip) and extract it into the Python package directory (mine is /usr/lib/python2.7/dist-packages).
3.1.3 Download the dataset (the Nottingham Database of folk tunes) and put it in the same folder as the code.
3.2 Key points of the program:
1. build_rbm: builds a single RBM and runs k steps of v-h-v (Gibbs) sampling
Input: five parameters: v (visible), W (RBM weights), bv (v_bias), bh (h_bias), k (the k in CD-k)
Output:
v_sample (the CD-k sample of the visible units),
cost (the RBM's <NLL> cost, that is, FE(input) - FE(v_sample), where FE is the free energy; the cost is used to estimate the parameter gradients),
monitor (a monitoring cost: with CD-k, the reconstruction cross-entropy is observed in place of the cost above),
updates (the updates dictionary from the sampling scan)
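The outputs listed above can be sketched without Theano. The following NumPy version is an approximation written for illustration, not the tutorial's build_rbm: the function names free_energy, gibbs_step, and rbm_cd_k are hypothetical, the free energy assumes a binary RBM, and no gradients or updates dictionary are produced.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def free_energy(v, W, bv, bh):
    """Free energy of a binary RBM, summed over all rows of v."""
    # np.logaddexp(0, x) is a numerically stable log(1 + exp(x)).
    return -(v * bv).sum() - np.logaddexp(0.0, v @ W + bh).sum()

def gibbs_step(v, W, bv, bh, rng):
    """One v -> h -> v step; returns the visible means and a visible sample."""
    mean_h = sigmoid(v @ W + bh)
    h = (rng.random(mean_h.shape) < mean_h).astype(float)
    mean_v = sigmoid(h @ W.T + bv)
    return mean_v, (rng.random(mean_v.shape) < mean_v).astype(float)

def rbm_cd_k(v, W, bv, bh, k, rng):
    """Run k Gibbs steps from v; return v_sample, the CD-k cost, and a monitor."""
    v_sample = v
    for _ in range(k):
        _, v_sample = gibbs_step(v_sample, W, bv, bh, rng)
    n = v.shape[0]
    # cost = FE(input) - FE(v_sample), averaged over the rows of v
    cost = (free_energy(v, W, bv, bh) - free_energy(v_sample, W, bv, bh)) / n
    # monitor: log-likelihood form of the reconstruction cross-entropy (always <= 0)
    mean_v, _ = gibbs_step(v_sample, W, bv, bh, rng)
    monitor = (v * np.log(mean_v) + (1 - v) * np.log(1 - mean_v)).sum() / n
    return v_sample, cost, monitor

rng = np.random.default_rng(0)
v = (rng.random((5, 6)) < 0.3).astype(float)     # 5 time steps, 6 visible units
W = rng.normal(scale=0.01, size=(6, 4))
v_sample, cost, monitor = rbm_cd_k(v, W, np.zeros(6), np.zeros(4), k=15, rng=rng)
```

Because bv and bh broadcast against v, the same functions also accept one bias row per time step, which is how the time-dependent biases bv_t and bh_t enter the RBM.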
2. build_rnnrbm: builds the RNN-RBM
Input: n_visible, n_hidden (number of hidden nodes of the conditional RBM), n_hidden_recurrent (number of hidden nodes of the RNN)
Default: n_visible = 88 (the 88 piano keys), n_hidden = 150, n_hidden_recurrent = 100
Output:
v: the training time series data
v_t: the sampled (generated) time series of the visible units
params: W, bv (v_bias), bh, Wuh, Wuv, Wvu, Wuu, bu (the eight parameters to train)
v_sample, cost, monitor, updates_rbm = build_rbm(v, W, bv_t[:], bh_t[:], k=15)
updates_train: the parameter updates dictionary in training mode
updates_generate: the parameter updates dictionary in generation mode
Nested function:
recurrence(v_t, u_tm1):
In training mode, u_t is computed from the current v_t and u_tm1.
In generation mode, the v_t input is None; v_t is sampled by a Gibbs chain that starts from an all-zero vector, and the corresponding u_t is then computed.
Note: line 156: (u_t, bv_t, bh_t), updates_train = theano.scan(lambda v_t, u_tm1, *_: recurrence(v_t, u_tm1), sequences=v, outputs_info=[u0, None, None], non_sequences=params) runs recurrence at every time step of sequences=v to obtain the corresponding bv_t and bh_t. These time-dependent biases are then used in the RBM to sample v (Gibbs sampling): v_sample, cost, monitor, updates_rbm = build_rbm(v, W, bv_t[:], bh_t[:], k=15)
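To make the training-mode recurrence concrete, here is a NumPy sketch of what the scan computes for each time step. It is an illustration under assumptions, not the tutorial code: the helper names init_params, recurrence_train, and scan_train are hypothetical, the initialization scales are illustrative, and a plain Python loop stands in for theano.scan.

```python
import numpy as np

def init_params(n_visible=88, n_hidden=150, n_hidden_recurrent=100, seed=0):
    """The eight parameters listed above, with illustrative initial values."""
    rng = np.random.default_rng(seed)
    def normal(rows, cols, scale):
        return rng.normal(scale=scale, size=(rows, cols))
    return {
        "W": normal(n_visible, n_hidden, 0.01),
        "bv": np.zeros(n_visible),
        "bh": np.zeros(n_hidden),
        "Wuh": normal(n_hidden_recurrent, n_hidden, 1e-4),
        "Wuv": normal(n_hidden_recurrent, n_visible, 1e-4),
        "Wvu": normal(n_visible, n_hidden_recurrent, 1e-4),
        "Wuu": normal(n_hidden_recurrent, n_hidden_recurrent, 1e-4),
        "bu": np.zeros(n_hidden_recurrent),
    }

def recurrence_train(v_t, u_tm1, p):
    """Training mode: v_t is given, so only the biases and u_t are computed."""
    bv_t = p["bv"] + u_tm1 @ p["Wuv"]     # visible bias conditioned on u(t-1)
    bh_t = p["bh"] + u_tm1 @ p["Wuh"]     # hidden bias conditioned on u(t-1)
    u_t = np.tanh(p["bu"] + v_t @ p["Wvu"] + u_tm1 @ p["Wuu"])
    return u_t, bv_t, bh_t

def scan_train(v, p):
    """Loop over the sequence, as the scan does with sequences=v."""
    u = np.zeros(p["bu"].shape)           # u0 is all zeros
    us, bvs, bhs = [], [], []
    for v_t in v:                         # iteration count = sequence length
        u, bv_t, bh_t = recurrence_train(v_t, u, p)
        us.append(u); bvs.append(bv_t); bhs.append(bh_t)
    return np.array(us), np.array(bvs), np.array(bhs)

p = init_params()
v = (np.random.default_rng(1).random((10, 88)) < 0.1).astype(float)
u_seq, bv_seq, bh_seq = scan_train(v, p)
print(u_seq.shape, bv_seq.shape, bh_seq.shape)  # (10, 100) (10, 88) (10, 150)
```

The resulting bv_seq and bh_seq hold one bias row per time step, so a single RBM call can process the whole sequence at once.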
3. RnnRbm.train:
Input: files (the file names), batch_size, num_epochs
Function: training mode. Splits each MIDI file in the train directory into batches for SGD parameter updates and computes the cost.
Output: the mean cost over all MIDI files in the train directory
4. RnnRbm.generate:
Function: generation mode. Uses the recurrence of build_rnnrbm, through updates_generate, to generate a sequence of n_steps time steps (line 166: 200 by default).
(v_t, u_t), updates_generate = theano.scan(lambda u_tm1, *_: recurrence(None, u_tm1), outputs_info=[None, u0], non_sequences=params, n_steps=200)
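Generation mode can be sketched in the same spirit. The NumPy loop below is a hedged illustration, not the tutorial's implementation: the names sample_rbm and generate are hypothetical, the tiny random parameters are untrained (so the output is noise rather than music), and the layer sizes are chosen only to keep the example fast.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_rbm(bv_t, bh_t, W, k, rng):
    """k Gibbs steps starting from an all-zero visible vector."""
    v = np.zeros(bv_t.shape)
    for _ in range(k):
        h = (rng.random(bh_t.shape) < sigmoid(v @ W + bh_t)).astype(float)
        v = (rng.random(bv_t.shape) < sigmoid(h @ W.T + bv_t)).astype(float)
    return v

def generate(p, n_steps=200, k=25, seed=0):
    """Sample v_t from the RBM at each step, then feed it back into the RNN."""
    rng = np.random.default_rng(seed)
    u = np.zeros(p["bu"].shape)                    # u0 is all zeros
    piano_roll = []
    for _ in range(n_steps):
        bv_t = p["bv"] + u @ p["Wuv"]              # biases conditioned on u(t-1)
        bh_t = p["bh"] + u @ p["Wuh"]
        v_t = sample_rbm(bv_t, bh_t, p["W"], k, rng)
        u = np.tanh(p["bu"] + v_t @ p["Wvu"] + u @ p["Wuu"])
        piano_roll.append(v_t)
    return np.array(piano_roll)

# Hypothetical small model: 8 visible, 5 hidden, 4 recurrent units.
init = np.random.default_rng(42)
n_v, n_h, n_u = 8, 5, 4
p = {"W": init.normal(scale=0.01, size=(n_v, n_h)),
     "bv": np.zeros(n_v), "bh": np.zeros(n_h), "bu": np.zeros(n_u),
     "Wuh": init.normal(scale=1e-4, size=(n_u, n_h)),
     "Wuv": init.normal(scale=1e-4, size=(n_u, n_v)),
     "Wvu": init.normal(scale=1e-4, size=(n_v, n_u)),
     "Wuu": init.normal(scale=1e-4, size=(n_u, n_u))}
piano_roll = generate(p)
print(piano_roll.shape)  # (200, 8)
```

Unlike training mode, no input sequence drives the loop here, so the number of iterations has to be given explicitly through n_steps.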
5. The several n_steps:
1. line 60: chain, updates = theano.scan(lambda v: gibbs_step(v)[1], outputs_info=[v], n_steps=k). Here k is the number of Gibbs sampling steps in CD-k; the chain runs for k steps rather than until it converges.
2. line 148: v_t, _, _, updates = build_rbm(T.zeros((n_visible,)), W, bv_t, bh_t, k=25). Here k=25 means that each sample inside recurrence is drawn with 25 Gibbs steps (the same meaning as in item 1).
3. line 164: (v_t, u_t), updates_generate = theano.scan(lambda u_tm1, *_: recurrence(None, u_tm1), outputs_info=[None, u0], non_sequences=params, n_steps=200). In generation mode, n_steps=200 means that the generated sequence (v_t) is 200 steps long. Why, then, does the x-axis of the final plot only reach 60? See the role of dt in RnnRbm: the plot is drawn with extent = (0, self.dt * len(piano_roll)) + self.r, where dt = 0.3 by default and len(piano_roll) = 200, so the axis ends at 0.3 × 200 = 60.
4. line 156: (u_t, bv_t, bh_t), updates_train = theano.scan(lambda v_t, u_tm1, *_: recurrence(v_t, u_tm1), sequences=v, outputs_info=[u0, None, None], non_sequences=params). No n_steps is given here; the number of scan iterations is determined by the length of the sequence.
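The x-axis question in item 3 comes down to a short calculation, worked out below. The pitch range r = (21, 109) is an assumption made for this example (it is consistent with n_visible = 88); dt and the sequence length are the values given in the text.

```python
dt = 0.3            # time covered by one generated step (default given above)
n_steps = 200       # length of the generated sequence, len(piano_roll)
r = (21, 109)       # assumed pitch range: 109 - 21 = 88 visible units

# Tuple concatenation gives (x_min, x_max, y_min, y_max) for the plot.
extent = (0, dt * n_steps) + r
x_max = extent[1]   # dt * n_steps, i.e. about 60
print(extent)
```

The 200 generated steps therefore span an x-axis of length 60, because each step covers dt = 0.3 units of time.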
Architecture and program of RNN-RBM for Music Composition