Learning TensorFlow Neural Network Optimization Strategies
When optimizing a neural network model we run into several practical questions, such as how to set the learning rate. With an exponentially decaying learning rate, the model can approach a good solution quickly in the early stages of training and then settle stably into the optimal region later on. Over-fitting is handled with regularization, and a moving average model can make the final model more robust on unseen data.
1. Set the learning rate
The learning rate should be neither too large nor too small. TensorFlow provides a flexible way to set it: the exponential decay method. With this method, a relatively large learning rate is used first to reach a good solution quickly, and the learning rate is then gradually reduced as the iterations continue, so the model becomes more stable in the later stages of training and approaches the optimum slowly and smoothly.
tf.train.exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=False, name=None)
This function applies exponential decay to the learning rate and returns the decayed rate used in each actual optimization step: decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps), where learning_rate is the initial learning rate, decay_rate is the decay coefficient, and decay_steps is the decay speed. When staircase is set to False, the learning rate decays as a smooth continuous curve; when staircase is set to True, global_step / decay_steps is truncated to an integer, so the learning rate decays as a staircase function. A common use of the staircase setting is to reduce the learning rate once per complete pass over the training data.
Example: learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step, 100000, 0.96, staircase=True)
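As a minimal sketch of how this decayed learning rate is typically wired into training (TF 1.x style): the variable v and its square loss below are placeholders standing in for a real model, not code from this article.

import tensorflow as tf

v = tf.Variable(10.0, dtype=tf.float32)
loss = tf.square(v)                              # toy loss for illustration only

global_step = tf.Variable(0, trainable=False)    # counts how many training steps have run
learning_rate = tf.train.exponential_decay(
    0.1, global_step, 100000, 0.96, staircase=True)

# Passing global_step makes minimize() increment it after every update,
# which in turn drives the decay schedule.
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(
    loss, global_step=global_step)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(5):
        sess.run(train_step)
    print(sess.run([global_step, learning_rate]))   # e.g. [5, 0.1] before any decay kicks in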
2. Over-fitting
1. Over-fitting Problem and Its Solution
The so-called over-fitting problem means that when a model is too complex, it can memorize the random noise in every training example instead of learning the general trend of the training data.
To avoid over-fitting, the common approach is regularization. The idea is to add a term describing the model's complexity to the loss function, so the optimization objective becomes J(θ) + λR(w), where R(w) measures the complexity of the model (it involves the weights w but not the bias b) and λ is the weight of this complexity loss in the total loss. The complexity of the model is generally determined by the weights w alone. There are two common choices of R(w). One is L1 regularization: R(w) = Σ|w_i|.
The other is L2 regularization: R(w) = Σ w_i².
Whichever regularization is used, the basic idea is to constrain the weights so that the model cannot fit the random noise in the training data. The difference is that L1 regularization makes the parameters sparser, while L2 does not. "Sparser" means more parameters become exactly 0, which gives an effect similar to feature selection. In practice, L1 and L2 regularization can also be used together, for example R(w) = Σ(α|w_i| + (1 - α)w_i²), as in the sketch below.
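A rough illustration of the combined use, simply adding an L1 term and an L2 term on the same weight matrix; the scale values 0.5 are arbitrary examples, not recommendations.

import tensorflow as tf

w = tf.constant([[1.0, -2.0], [-3.0, 4.0]])
combined = (tf.contrib.layers.l1_regularizer(0.5)(w)
            + tf.contrib.layers.l2_regularizer(0.5)(w))

with tf.Session() as sess:
    print(sess.run(combined))   # 5.0 + 7.5 = 12.5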
2. TensorFlow solution for the over-fitting problem
loss = tf.reduce_mean(tf.square(y_ - y)) + tf.contrib.layers.l2_regularizer(lambda)(w)
The above is a loss function that includes an L2 regularization term. The first part is the mean squared error loss and the second part is the regularization term. Here lambda is the weight of the regularization term, i.e. the λ in J(θ) + λR(w), and w is the parameter whose regularization loss is computed. tf.contrib.layers.l2_regularizer() returns a function that computes the L2 regularization term of a given parameter; similarly, tf.contrib.layers.l1_regularizer() returns a function that computes the L1 regularization term of a given parameter.
import tensorflow as tf

# Compare the effects of the L1 and L2 regularization functions
w = tf.constant([[1.0, -2.0], [-3.0, 4.0]])
with tf.Session() as sess:
    # 0.5 * (|1| + |-2| + |-3| + |4|) = 5.0
    print(sess.run(tf.contrib.layers.l1_regularizer(0.5)(w)))   # 5.0
    # 0.5 * [(1 + 4 + 9 + 16) / 2] = 7.5
    # TensorFlow divides the L2 regularization term by 2 to keep the derivative simpler
    print(sess.run(tf.contrib.layers.l2_regularizer(0.5)(w)))   # 7.5
When the number of parameters in a neural network grows, the loss function defined above becomes long and hard to read. Moreover, when the network structure is complex, the code that defines the structure and the code that computes the loss may not be in the same function, so passing the regularization losses around through variables is inconvenient. To solve this problem, TensorFlow provides collections. See the code section for a concrete implementation.
tf.add_to_collection() adds an element to the named collection; tf.get_collection() returns a list of the elements stored in that collection.
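Before the full network code in the last section, here is a minimal standalone sketch of how these two functions cooperate; the collection name 'losses', the weight shapes, and the 0.01 scale are illustrative choices only.

import tensorflow as tf

w1 = tf.Variable(tf.random_normal([2, 3]), dtype=tf.float32)
w2 = tf.Variable(tf.random_normal([3, 1]), dtype=tf.float32)

# Store each weight's regularization loss in the named collection.
tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(0.01)(w1))
tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(0.01)(w2))

# Later, possibly in a different function, collect every stored term and sum them.
regularization_loss = tf.add_n(tf.get_collection('losses'))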
3. Moving Average Model
Another technique that makes a model more robust on test data is the moving average model. When a neural network is trained with stochastic gradient descent, a moving average model can improve the final model's performance in many applications; training with either GradientDescent or Momentum can benefit from the ExponentialMovingAverage method.
TensorFlow provides the class tf.train.ExponentialMovingAverage to implement the moving average model. When a tf.train.ExponentialMovingAverage object is created, a decay rate decay must be specified, along with an optional num_updates parameter that dynamically controls the decay rate. tf.train.ExponentialMovingAverage maintains a shadow variable for each variable; the shadow variable is initialized to the initial value of the corresponding variable, and whenever the variable is updated, shadow_variable = decay * shadow_variable + (1 - decay) * variable. The formula shows that decay controls the speed at which the model updates: the larger decay is, the more stable the moving averages are, and in practice decay is usually set to a number close to 1. num_updates defaults to None; if it is set, the decay rate actually used is min(decay, (1 + num_updates) / (10 + num_updates)).
The apply method of a tf.train.ExponentialMovingAverage object returns an operation that updates the moving averages of var_list, which must be a list of Variable or Tensor objects; running this operation updates the shadow variables of var_list. The average method returns the moving-average value of a variable.
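In practice the moving-average update is usually coupled to the training step so that the shadow variables are refreshed automatically after every parameter update. A minimal sketch of that pattern (TF 1.x); the toy variable, loss, and learning rate below are placeholders, not code from this article.

import tensorflow as tf

v = tf.Variable(0.0, dtype=tf.float32)
loss = tf.square(v - 5.0)                         # toy loss for illustration only
global_step = tf.Variable(0, trainable=False)
train_step = tf.train.GradientDescentOptimizer(0.1).minimize(
    loss, global_step=global_step)

ema = tf.train.ExponentialMovingAverage(0.99, num_updates=global_step)
maintain_averages_op = ema.apply(tf.trainable_variables())

# Group the two operations so every training step also refreshes the shadow variables.
with tf.control_dependencies([train_step]):
    train_op = tf.group(maintain_averages_op)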
4. Code examples
1. L2 regularization of the weights in a complex neural network structure
import tensorflow as tf

# L2 regularization of the weights in a complex neural network structure.
# Define the weights of each layer and add each weight's L2 regularization
# term to the 'losses' collection.
def get_weight(shape, lambda1):
    var = tf.Variable(tf.random_normal(shape), dtype=tf.float32)
    tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(lambda1)(var))
    return var

x = tf.placeholder(tf.float32, (None, 2))
y_ = tf.placeholder(tf.float32, (None, 1))

layer_dimension = [2, 10, 5, 3, 1]   # number of nodes in each layer of the network
n_layers = len(layer_dimension)

current_layer = x                    # the current layer starts as the input layer
in_dimension = layer_dimension[0]

# Build the 5-layer fully connected network structure in a loop
for i in range(1, n_layers):
    out_dimension = layer_dimension[i]
    weight = get_weight([in_dimension, out_dimension], 0.003)
    bias = tf.Variable(tf.constant(0.1, shape=[out_dimension]))
    current_layer = tf.nn.relu(tf.matmul(current_layer, weight) + bias)
    in_dimension = layer_dimension[i]

mse_loss = tf.reduce_mean(tf.square(y_ - current_layer))
tf.add_to_collection('losses', mse_loss)
loss = tf.add_n(tf.get_collection('losses'))   # loss function containing all regularization terms
2. tf.train.ExponentialMovingAverage example
import tensorflow as tf

# tf.train.ExponentialMovingAverage example
v1 = tf.Variable(0, dtype=tf.float32)
step = tf.Variable(0, trainable=False)   # step simulates the number of training iterations

# Define a moving average object with decay rate 0.99 and num_updates = step
ema = tf.train.ExponentialMovingAverage(0.99, num_updates=step)

# apply() returns an operation that updates the moving averages of var_list;
# var_list must be a list of Variables or Tensors.
# Running this operation updates the shadow variables.
maintain_averages_op = ema.apply(var_list=[v1])

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)

    # average() returns the moving-average value of the variable
    print(sess.run([v1, ema.average(v1)]))   # [0.0, 0.0]

    sess.run(tf.assign(v1, 5))
    # decay = min{0.99, (1 + step) / (10 + step) = 0.1} = 0.1
    # the moving average of v1 is updated to 0.1 * 0.0 + 0.9 * 5 = 4.5
    sess.run(maintain_averages_op)
    print(sess.run([v1, ema.average(v1)]))   # [5.0, 4.5]

    sess.run(tf.assign(step, 10000))
    sess.run(tf.assign(v1, 10))
    # decay = min{0.99, (1 + step) / (10 + step) ≈ 0.999} = 0.99
    # the moving average of v1 is updated to 0.99 * 4.5 + 0.01 * 10 = 4.555
    sess.run(maintain_averages_op)
    print(sess.run([v1, ema.average(v1)]))   # [10.0, 4.5549998]

    # the moving average of v1 is updated to 0.99 * 4.555 + 0.01 * 10 = 4.60945
    sess.run(maintain_averages_op)
    print(sess.run([v1, ema.average(v1)]))   # [10.0, 4.6094499]
That is all for this article. I hope it is helpful for your study.