GRU and LSTM weight initialization in TensorFlow


When writing a model, you sometimes want to initialize the RNN's weight matrices in a particular way, for example with Xavier or orthogonal initialization, which looks as simple as this:

 
cell = LSTMCell if self.args.use_lstm else GRUCell
with tf.variable_scope("rnn", initializer=tf.orthogonal_initializer()):  # scope name assumed; not legible in the original
    input = tf.nn.embedding_lookup(embedding, questions_bt)
    cell_fw = MultiRNNCell(cells=[cell(hidden_size) for _ in range(num_layers)])
    cell_bw = MultiRNNCell(cells=[cell(hidden_size) for _ in range(num_layers)])
    outputs, last_states = tf.nn.bidirectional_dynamic_rnn(
        cell_bw=cell_bw, cell_fw=cell_fw, dtype="float32",
        inputs=input, swap_memory=True)

So does writing it this way actually initialize the weights correctly? Let's follow the bidirectional_dynamic_rnn code to find out. First, the forward direction:

with vs.variable_scope("fw") as fw_scope:
    output_fw, output_state_fw = dynamic_rnn(
        cell=cell_fw, inputs=inputs, sequence_length=sequence_length,
        initial_state=initial_state_fw, dtype=dtype,
        parallel_iterations=parallel_iterations, swap_memory=swap_memory,
        time_major=time_major, scope=fw_scope)

We find that it opens a variable_scope called fw_scope. Following dynamic_rnn further, this scope turns out to be used only for caching management, and dynamic_rnn actually calls the following:

(outputs, final_state) = _dynamic_rnn_loop(
    cell, inputs, state,
    parallel_iterations=parallel_iterations,
    swap_memory=swap_memory,
    sequence_length=sequence_length,
    dtype=dtype)

In short, the call chain eventually reaches this statement:

call_cell = lambda: cell(input_t, state)

So, in the end, GRUCell's or LSTMCell's __call__() gets called. Looking inside, for example GRU's __call__():

def __call__(self, inputs, state, scope=None):
    """Gated recurrent unit (GRU) with nunits cells."""
    with _checked_scope(self, scope or "gru_cell", reuse=self._reuse):
        with vs.variable_scope("gates"):  # Reset gate and update gate.
            # We start with bias of 1.0 to not reset and not update.
            value = sigmoid(_linear([inputs, state], 2 * self._num_units, True, 1.0))
            r, u = array_ops.split(value=value, num_or_size_splits=2, axis=1)
        with vs.variable_scope("candidate"):
            c = self._activation(_linear([inputs, r * state],
                                         self._num_units, True))
        new_h = u * state + (1 - u) * c
    return new_h, new_h

Hey, there are no weights or biases here, and the __init__() method does not create them either. Look at this _linear(): in fact all the weights are created inside this method (the same holds for LSTMCell), and this is where the mystery lies:

 
with vs.variable_scope(scope) as outer_scope:
    weights = vs.get_variable(
        _WEIGHTS_VARIABLE_NAME, [total_arg_size, output_size], dtype=dtype)
    # ... some code ...
    with vs.variable_scope(outer_scope) as inner_scope:
        inner_scope.set_partitioner(None)
        biases = vs.get_variable(
            _BIAS_VARIABLE_NAME, [output_size], dtype=dtype,
            initializer=init_ops.constant_initializer(bias_start, dtype=dtype))

So this method opens yet another variable_scope and then calls get_variable() to obtain the weights and biases. The question is: after nesting that many variable_scopes, is the initializer we defined on the outer scope still used? Let's experiment:
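Here is a minimal experiment (not from the original post; the scope and variable names are made up for illustration) showing how nested variable_scopes pass an initializer down to plain get_variable() calls:

import numpy as np
import tensorflow as tf

# Only the outermost scope sets an initializer; the inner scopes (standing in
# for "fw", "gru_cell", "gates" inside the RNN code) set none of their own.
with tf.variable_scope("outer", initializer=tf.orthogonal_initializer()):
    with tf.variable_scope("middle"):
        with tf.variable_scope("inner"):
            w = tf.get_variable("weights", shape=[64, 64])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    w_val = sess.run(w)
    # If the outer initializer propagated down, W is orthogonal: W^T W ~= I.
    print(np.allclose(w_val.T.dot(w_val), np.eye(64), atol=1e-5))  # True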

OK, our tests show that with nested variable_scopes, if an inner scope does not specify its own initializer, it inherits the outer scope's. So our conclusion is: for these two RNN variants in the TensorFlow 1.1.0 implementation, you only need to call them inside a variable_scope that sets an initializer, and their weights will be initialized that way.
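As a sanity check, here is a sketch along the same lines (assuming the TF 1.1-era tf.contrib.rnn API and the variable names seen in the source above, e.g. gru_cell/gates/weights; the sizes are arbitrary):

import numpy as np
import tensorflow as tf

num_units = 32
inputs = tf.placeholder(tf.float32, [None, 32])        # batch x input_size
state = tf.placeholder(tf.float32, [None, num_units])

with tf.variable_scope("rnn", initializer=tf.orthogonal_initializer()):
    cell = tf.contrib.rnn.GRUCell(num_units)
    cell(inputs, state)  # variable creation happens on the first call

# Pick out the gate weight matrix created inside the nested scopes.
gate_w = [v for v in tf.trainable_variables()
          if "gates" in v.name and "weights" in v.name][0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    w = sess.run(gate_w)              # shape [32 + 32, 2 * 32] = [64, 64]
    print(gate_w.name)                # e.g. rnn/gru_cell/gates/weights:0
    print(np.allclose(w.T.dot(w), np.eye(w.shape[1]), atol=1e-5))  # True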

However, neither LSTM nor GRU offers a way to set an initializer for the biases (although it seems the initial constant value can still be chosen), as the sketch below illustrates.
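A small illustration of that point (again a sketch, assuming the TF 1.1-era API and the hard-coded variable name "biases"): even with a scope initializer set, the gate biases come out as the constant bias_start that GRUCell passes to _linear(), not as whatever the scope initializer would produce.

import tensorflow as tf

inputs = tf.placeholder(tf.float32, [None, 32])
state = tf.placeholder(tf.float32, [None, 32])

with tf.variable_scope("bias_check", initializer=tf.constant_initializer(5.0)):
    cell = tf.contrib.rnn.GRUCell(32)
    cell(inputs, state)

gate_b = [v for v in tf.trainable_variables()
          if v.name.startswith("bias_check")
          and "gates" in v.name and "biases" in v.name][0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Prints a vector of 1.0s (GRUCell's bias_start), not 5.0 from the scope.
    print(sess.run(gate_b))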

Original address: http://cairohy.github.io/2017/05/05/ml-coding-summarize/Tensorflow%E4%B8%ADGRU%E5%92%8CLSTM%E7%9A%84%E6%9D%83%E9%87%8D%E5%88%9D%E5%A7%8B%E5%8C%96/