Initialization of GRU and LSTM weights in TensorFlow
When writing a model, you sometimes want the RNN's weight matrices to be initialized in a particular way, such as Xavier or orthogonal initialization. The obvious way to write it is:
```python
cell = LSTMCell if self.args.use_lstm else GRUCell
# The scope name here is illustrative; the point is the initializer argument.
with tf.variable_scope("birnn", initializer=tf.orthogonal_initializer()):
    inputs = tf.nn.embedding_lookup(embedding, questions_bt)
    cell_fw = MultiRNNCell(cells=[cell(hidden_size) for _ in range(num_layers)])
    cell_bw = MultiRNNCell(cells=[cell(hidden_size) for _ in range(num_layers)])
    outputs, last_states = tf.nn.bidirectional_dynamic_rnn(
        cell_fw=cell_fw, cell_bw=cell_bw, dtype="float32",
        inputs=inputs, swap_memory=True)
```
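The same pattern works for Xavier initialization. Here is a minimal standalone sketch, assuming TF 1.x with contrib available (the scope name and tensor shapes are only illustrative):

```python
import tensorflow as tf
from tensorflow.contrib.rnn import GRUCell

# Illustrative placeholder: batch x time x embedding_dim
inputs = tf.placeholder(tf.float32, [None, 20, 300])
# Every variable created inside this scope without its own initializer
# should pick up the Xavier (Glorot) initializer.
with tf.variable_scope("rnn_xavier",
                       initializer=tf.contrib.layers.xavier_initializer()):
    outputs, state = tf.nn.dynamic_rnn(GRUCell(128), inputs, dtype=tf.float32)
```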
But does writing it this way actually initialize the weights correctly? Let's follow the source of bidirectional_dynamic_rnn to find out, starting with the forward direction:
```python
with vs.variable_scope("fw") as fw_scope:
    output_fw, output_state_fw = dynamic_rnn(
        cell=cell_fw, inputs=inputs, sequence_length=sequence_length,
        initial_state=initial_state_fw, dtype=dtype,
        parallel_iterations=parallel_iterations, swap_memory=swap_memory,
        time_major=time_major, scope=fw_scope)
```
We can see that it opens a variable_scope called fw_scope. Following dynamic_rnn further, this scope is only used to manage the variables' caching device, and dynamic_rnn actually calls the following:
```python
(outputs, final_state) = _dynamic_rnn_loop(
    cell,
    inputs,
    state,
    parallel_iterations=parallel_iterations,
    swap_memory=swap_memory,
    sequence_length=sequence_length,
    dtype=dtype)
```
In short, after a chain of calls everything comes down to this one statement:
```python
call_cell = lambda: cell(input_t, state)
```
So in the end it is GRUCell's or LSTMCell's __call__() that gets invoked. Looking inside, for example, GRU's __call__():
```python
def __call__(self, inputs, state, scope=None):
  """Gated recurrent unit (GRU) with nunits cells."""
  with _checked_scope(self, scope or "gru_cell", reuse=self._reuse):
    with vs.variable_scope("gates"):  # Reset gate and update gate.
      # We start with bias of 1.0 to not reset and not update.
      value = sigmoid(_linear(
          [inputs, state], 2 * self._num_units, True, 1.0))
      r, u = array_ops.split(
          value=value,
          num_or_size_splits=2,
          axis=1)
    with vs.variable_scope("candidate"):
      c = self._activation(_linear([inputs, r * state],
                                   self._num_units, True))
    new_h = u * state + (1 - u) * c
  return new_h, new_h
```
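For reference, the code above is just the standard GRU update; in my own notation (matching the variable names r, u, c in the code, with the gate biases started at 1.0):

$$
\begin{aligned}
r &= \sigma\big(W_r[x,\,h] + b_r\big),\\
u &= \sigma\big(W_u[x,\,h] + b_u\big),\\
c &= \tanh\big(W_c[x,\,r\odot h] + b_c\big),\\
h_{\text{new}} &= u\odot h + (1-u)\odot c,
\end{aligned}
$$

where $[x, h]$ denotes concatenation and $\odot$ is element-wise multiplication. The gate matrices $W_r, W_u$, the candidate matrix $W_c$, and their biases are the parameters that have to be created somewhere.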
Hmm, there are no weight or bias variables here, and the __init__() method doesn't create them either. Look at this _linear() function: in fact all of the weights are created inside it (for LSTMCell as well). This is where the mystery lies:
```python
with vs.variable_scope(scope) as outer_scope:
  weights = vs.get_variable(
      _WEIGHTS_VARIABLE_NAME, [total_arg_size, output_size], dtype=dtype)
  # ... some code
  with vs.variable_scope(outer_scope) as inner_scope:
    inner_scope.set_partitioner(None)
    biases = vs.get_variable(
        _BIAS_VARIABLE_NAME, [output_size],
        dtype=dtype,
        initializer=init_ops.constant_initializer(bias_start, dtype=dtype))
```
So this method opens yet another variable_scope and then calls get_variable() to create the weights and biases. After all these nested variable_scopes, is the initializer we defined in the outermost scope still used? Let's experiment (a sketch follows):
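The original experiment code is not reproduced here; the following is a minimal sketch of the same idea under TF 1.x (the scope and variable names are made up for illustration):

```python
import tensorflow as tf

# The outer scope sets an initializer; the inner scope and the variable do not.
with tf.variable_scope("outer", initializer=tf.constant_initializer(7.0)):
    with tf.variable_scope("inner"):
        w = tf.get_variable("w", shape=[2, 2])  # no explicit initializer

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(w))  # a 2x2 matrix of 7.0 -> the outer initializer was inherited
```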
OK, the experiment shows that with nested variable_scopes, if an inner scope does not specify its own initializer, it inherits the one from the outer scope. So the conclusion is: for these two RNN variants in the TensorFlow 1.1.0 implementation, you only need to call them inside a variable_scope that carries an initializer, and their weight matrices will be initialized accordingly.
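As a sanity check of this conclusion, one can inspect the variables after initialization. A sketch assuming TF 1.x, where the matrices created by _linear() are named "weights" (later 1.x versions use "kernel"):

```python
import numpy as np
import tensorflow as tf
from tensorflow.contrib.rnn import GRUCell

inputs = tf.placeholder(tf.float32, [None, 10, 8])  # batch x time x features
with tf.variable_scope("check", initializer=tf.orthogonal_initializer()):
    outputs, state = tf.nn.dynamic_rnn(GRUCell(8), inputs, dtype=tf.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for v in tf.trainable_variables():
        w = sess.run(v)
        # With input size 8 and 8 units, the "gates" matrix is square (16 x 16),
        # so orthogonal initialization should give W^T W ~ I.
        if "weights" in v.name and w.ndim == 2 and w.shape[0] == w.shape[1]:
            print(v.name, np.allclose(w.T.dot(w), np.eye(w.shape[0]), atol=1e-4))
```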
However, neither the LSTM nor the GRU cell exposes a way to choose an initializer for the biases (although the initial bias value itself can apparently be set, as the bias_start argument of _linear suggests). Original (Chinese) post: http://cairohy.github.io/2017/05/05/ml-coding-summarize/Tensorflow%E4%B8%ADGRU%E5%92%8CLSTM%E7%9A%84%E6%9D%83%E9%87%8D%E5%88%9D%E5%A7%8B%E5%8C%96/