Initialization of GRU and LSTM weights in TensorFlow
When writing a model, you sometimes want the RNN's weight matrices to be initialized in a particular way, such as Xavier or orthogonal initialization. The obvious way to write it is:
```python
cell = LSTMCell if self.args.use_lstm else GRUCell
# The scope name here is illustrative; the point is the initializer argument.
with tf.variable_scope("birnn", initializer=tf.orthogonal_initializer()):
    inputs = tf.nn.embedding_lookup(embedding, questions_bt)
    cell_fw = MultiRNNCell(cells=[cell(hidden_size) for _ in range(num_layers)])
    cell_bw = MultiRNNCell(cells=[cell(hidden_size) for _ in range(num_layers)])
    outputs, last_states = tf.nn.bidirectional_dynamic_rnn(
        cell_fw=cell_fw, cell_bw=cell_bw, dtype="float32",
        inputs=inputs, swap_memory=True)
```
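The same pattern works for Xavier initialization. Here is a minimal standalone sketch, assuming TF 1.x with contrib available (the scope name and tensor shapes are only illustrative):

```python
import tensorflow as tf
from tensorflow.contrib.rnn import GRUCell

# Illustrative placeholder: batch x time x embedding_dim
inputs = tf.placeholder(tf.float32, [None, 20, 300])
# Every variable created inside this scope without its own initializer
# should pick up the Xavier (Glorot) initializer.
with tf.variable_scope("rnn_xavier",
                       initializer=tf.contrib.layers.xavier_initializer()):
    outputs, state = tf.nn.dynamic_rnn(GRUCell(128), inputs, dtype=tf.float32)
```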
But does writing it this way actually initialize the weights correctly? Let's follow the source of bidirectional_dynamic_rnn to find out, starting with the forward direction:
```python
with vs.variable_scope("fw") as fw_scope:
    output_fw, output_state_fw = dynamic_rnn(
        cell=cell_fw, inputs=inputs, sequence_length=sequence_length,
        initial_state=initial_state_fw, dtype=dtype,
        parallel_iterations=parallel_iterations, swap_memory=swap_memory,
        time_major=time_major, scope=fw_scope)
```
We can see that it opens a variable_scope called fw_scope. Following dynamic_rnn further, this scope is only used to manage the variables' caching device, and dynamic_rnn actually calls the following:
```python
(outputs, final_state) = _dynamic_rnn_loop(
    cell,
    inputs,
    state,
    parallel_iterations=parallel_iterations,
    swap_memory=swap_memory,
    sequence_length=sequence_length,
    dtype=dtype)
```
In short, after a chain of calls everything comes down to this one statement:
```python
call_cell = lambda: cell(input_t, state)
```
So in the end it is GRUCell's or LSTMCell's __call__() that gets invoked. Looking inside, for example, GRU's __call__():
```python
def __call__(self, inputs, state, scope=None):
  """Gated recurrent unit (GRU) with nunits cells."""
  with _checked_scope(self, scope or "gru_cell", reuse=self._reuse):
    with vs.variable_scope("gates"):  # Reset gate and update gate.
      # We start with bias of 1.0 to not reset and not update.
      value = sigmoid(_linear(
          [inputs, state], 2 * self._num_units, True, 1.0))
      r, u = array_ops.split(
          value=value,
          num_or_size_splits=2,
          axis=1)
    with vs.variable_scope("candidate"):
      c = self._activation(_linear([inputs, r * state],
                                   self._num_units, True))
    new_h = u * state + (1 - u) * c
  return new_h, new_h
```
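For reference, the code above is just the standard GRU update; in my own notation (matching the variable names r, u, c in the code, with the gate biases started at 1.0):

$$
\begin{aligned}
r &= \sigma\big(W_r[x,\,h] + b_r\big),\\
u &= \sigma\big(W_u[x,\,h] + b_u\big),\\
c &= \tanh\big(W_c[x,\,r\odot h] + b_c\big),\\
h_{\text{new}} &= u\odot h + (1-u)\odot c,
\end{aligned}
$$

where $[x, h]$ denotes concatenation and $\odot$ is element-wise multiplication. The gate matrices $W_r, W_u$, the candidate matrix $W_c$, and their biases are the parameters that have to be created somewhere.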
Hmm, there are no weight or bias variables here, and the __init__() method doesn't create them either. Look at this _linear() function: in fact all of the weights are created inside it (for LSTMCell as well). This is where the mystery lies:
```python
with vs.variable_scope(scope) as outer_scope:
  weights = vs.get_variable(
      _WEIGHTS_VARIABLE_NAME, [total_arg_size, output_size], dtype=dtype)
  # ... some code
  with vs.variable_scope(outer_scope) as inner_scope:
    inner_scope.set_partitioner(None)
    biases = vs.get_variable(
        _BIAS_VARIABLE_NAME, [output_size],
        dtype=dtype,
        initializer=init_ops.constant_initializer(bias_start, dtype=dtype))
```
So this method opens yet another variable_scope and then calls get_variable() to create the weights and biases. After all these nested variable_scopes, is the initializer we defined in the outermost scope still used? Let's experiment (a sketch follows):
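The original experiment code is not reproduced here; the following is a minimal sketch of the same idea under TF 1.x (the scope and variable names are made up for illustration):

```python
import tensorflow as tf

# The outer scope sets an initializer; the inner scope and the variable do not.
with tf.variable_scope("outer", initializer=tf.constant_initializer(7.0)):
    with tf.variable_scope("inner"):
        w = tf.get_variable("w", shape=[2, 2])  # no explicit initializer

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(w))  # a 2x2 matrix of 7.0 -> the outer initializer was inherited
```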
OK, the experiment shows that with nested variable_scopes, if an inner scope does not specify its own initializer, it inherits the one from the outer scope. So the conclusion is: for these two RNN variants in the TensorFlow 1.1.0 implementation, you only need to call them inside a variable_scope that carries an initializer, and their weight matrices will be initialized accordingly.
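As a sanity check of this conclusion, one can inspect the variables after initialization. A sketch assuming TF 1.x, where the matrices created by _linear() are named "weights" (later 1.x versions use "kernel"):

```python
import numpy as np
import tensorflow as tf
from tensorflow.contrib.rnn import GRUCell

inputs = tf.placeholder(tf.float32, [None, 10, 8])  # batch x time x features
with tf.variable_scope("check", initializer=tf.orthogonal_initializer()):
    outputs, state = tf.nn.dynamic_rnn(GRUCell(8), inputs, dtype=tf.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for v in tf.trainable_variables():
        w = sess.run(v)
        # With input size 8 and 8 units, the "gates" matrix is square (16 x 16),
        # so orthogonal initialization should give W^T W ~ I.
        if "weights" in v.name and w.ndim == 2 and w.shape[0] == w.shape[1]:
            print(v.name, np.allclose(w.T.dot(w), np.eye(w.shape[0]), atol=1e-4))
```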
However, neither the LSTM nor the GRU cell exposes a way to choose an initializer for the biases (although the initial bias value itself can apparently be set, as the bias_start argument of _linear suggests). Original (Chinese) post: http://cairohy.github.io/2017/05/05/ml-coding-summarize/Tensorflow%E4%B8%ADGRU%E5%92%8CLSTM%E7%9A%84%E6%9D%83%E9%87%8D%E5%88%9D%E5%A7%8B%E5%8C%96/