"Deeplearning.ai" The second course: lifting the deep neural network--weight initialization

First, initialization

Proper weight initialization helps prevent gradients from exploding or vanishing. For ReLU activation functions, the weights can be initialized as

    W[l] = np.random.randn(n[l], n[l-1]) * np.sqrt(2 / n[l-1])

also known as "He initialization". For tanh activation functions, the weights are initialized as

    W[l] = np.random.randn(n[l], n[l-1]) * np.sqrt(1 / n[l-1])

also known as "Xavier initialization". The following variant can also be used:

    W[l] = np.random.randn(n[l], n[l-1]) * np.sqrt(2 / (n[l-1] + n[l]))

In these formulas, l refers to the l-th layer of the network, l-1 to the previous layer, and n[l] to the number of units in layer l.
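
As a minimal sketch of these three scalings in NumPy (the helper name init_layer_weights and its arguments are hypothetical, not part of the course code):

import numpy as np

def init_layer_weights(n_curr, n_prev, mode="he"):
    """Return an (n_curr, n_prev) weight matrix using the chosen scaling."""
    if mode == "he":            # suited to ReLU activations
        scale = np.sqrt(2.0 / n_prev)
    elif mode == "xavier":      # suited to tanh activations
        scale = np.sqrt(1.0 / n_prev)
    else:                       # the alternative variant above
        scale = np.sqrt(2.0 / (n_prev + n_curr))
    return np.random.randn(n_curr, n_prev) * scale

# For example, a layer of 5 units receiving 10 inputs:
W = init_layer_weights(5, 10, mode="he")
print(W.std())   # roughly sqrt(2/10), i.e. about 0.45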


Second, the programming exercise

We are given the following two-dimensional dataset:


The task is to train a network that correctly classifies the red dots and the blue dots. First, import the required packages; init_utils.py is the helper module provided with the exercise.

import numpy as np
import matplotlib.pyplot as plt
import sklearn
import sklearn.datasets
from init_utils import sigmoid, relu, compute_loss, forward_propagation, backward_propagation
from init_utils import update_parameters, predict, load_dataset, plot_decision_boundary, predict_dec

%matplotlib inline
plt.rcParams['figure.figsize'] = (7.0, 4.0)  # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# load image dataset: blue/red dots in circles
train_X, train_Y, test_X, test_Y = load_dataset()



1. Build the neural network model

def model(X, Y, learning_rate=0.01, num_iterations=15000, print_cost=True, initialization="he"):
    """
    Implements a three-layer neural network: LINEAR->RELU->LINEAR->RELU->LINEAR->SIGMOID.

    Arguments:
    X -- input data, of shape (2, number of examples)
    Y -- true "label" vector (containing 0 for red dots; 1 for blue dots), of shape (1, number of examples)
    learning_rate -- learning rate for gradient descent
    num_iterations -- number of iterations to run gradient descent
    print_cost -- if True, print the cost every 1000 iterations
    initialization -- flag to choose which initialization to use ("zeros", "random" or "he")

    Returns:
    parameters -- parameters learnt by the model
    """
    grads = {}
    costs = []  # to keep track of the loss
    m = X.shape[1]  # number of examples
    layers_dims = [X.shape[0], 10, 5, 1]

    # Initialize parameters dictionary.
    if initialization == "zeros":
        parameters = initialize_parameters_zeros(layers_dims)
    elif initialization == "random":
        parameters = initialize_parameters_random(layers_dims)
    elif initialization == "he":
        parameters = initialize_parameters_he(layers_dims)

    # Loop (gradient descent)
    for i in range(0, num_iterations):
        # Forward propagation: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID.
        a3, cache = forward_propagation(X, parameters)

        # Loss
        cost = compute_loss(a3, Y)

        # Backward propagation.
        grads = backward_propagation(X, Y, cache)

        # Update parameters.
        parameters = update_parameters(parameters, grads, learning_rate)

        # Print the loss every 1000 iterations
        if print_cost and i % 1000 == 0:
            print("Cost after iteration {}: {}".format(i, cost))
            costs.append(cost)

    # plot the loss
    plt.plot(costs)
    plt.ylabel('cost')
    plt.xlabel('iterations (per hundreds)')
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()

    return parameters



2. Initialize the weights to 0

def initialize_parameters_zeros(layers_dims):
    """
    Arguments:
    layers_dims -- python array (list) containing the size of each layer.

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
                    b1 -- bias vector of shape (layers_dims[1], 1)
                    ...
                    WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                    bL -- bias vector of shape (layers_dims[L], 1)
    """
    parameters = {}
    L = len(layers_dims)            # number of layers in the network

    for l in range(1, L):
        parameters['W' + str(l)] = np.zeros((layers_dims[l], layers_dims[l-1]))
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
    return parameters




Train the network:

parameters = model(train_X, train_Y, initialization="zeros")
print("On the train set:")
predictions_train = predict(train_X, train_Y, parameters)
print("On the test set:")
predictions_test = predict(test_X, test_Y, parameters)



The cost curve after training:


The training accuracy is 0.5 and the test accuracy is 0.5. Printing the predictions on the test set:
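
The printed output is not reproduced here; a minimal way to inspect it, reusing predictions_test from the training snippet above, is:

print("predictions_test = " + str(predictions_test))   # with zero initialization, every entry is 0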


Plotting the decision boundary:


The model predicts 0 for every test example. Initializing all weights to 0 fails to break symmetry: every neuron in a layer computes the same function and receives the same gradient, so every neuron keeps learning the same thing.
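
A minimal sketch of this effect (a hypothetical 2-input, 2-hidden-unit, 1-output network, not part of the exercise code): with all-zero weights, the hidden activations and every weight gradient come out as zero, so gradient descent never separates the neurons.

import numpy as np

np.random.seed(1)
X = np.random.randn(2, 5)                       # 5 examples
Y = (np.random.rand(1, 5) > 0.5).astype(float)

W1, b1 = np.zeros((2, 2)), np.zeros((2, 1))
W2, b2 = np.zeros((1, 2)), np.zeros((1, 1))

# Forward pass
A1 = np.maximum(0, W1 @ X + b1)                 # both hidden units compute exactly the same thing
A2 = 1 / (1 + np.exp(-(W2 @ A1 + b2)))

# Backward pass (cross-entropy loss)
m = X.shape[1]
dZ2 = A2 - Y
dW2 = dZ2 @ A1.T / m                            # all zeros, because A1 is all zeros
dA1 = W2.T @ dZ2                                # all zeros, because W2 is all zeros
dZ1 = dA1 * (A1 > 0)
dW1 = dZ1 @ X.T / m                             # all zeros as well

print(dW1, dW2)   # every weight gradient is 0, so the symmetry is never broken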


3. Initialize the weights randomly to large values

def initialize_parameters_random(layers_dims):
    """
    Arguments:
    layers_dims -- python array (list) containing the size of each layer.

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
                    b1 -- bias vector of shape (layers_dims[1], 1)
                    ...
                    WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                    bL -- bias vector of shape (layers_dims[L], 1)
    """
    np.random.seed(3)               # this seed makes sure your "random" numbers match ours
    parameters = {}
    L = len(layers_dims)            # integer representing the number of layers

    for l in range(1, L):
        parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * 10
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
    return parameters



Training this model gives the following cost curve:


The training accuracy is 0.83 and the test accuracy is 0.86. The decision boundary looks like this:


The cost is very large at first because the large initial weights push the sigmoid outputs for some examples very close to 0 or 1; when such a confident prediction is wrong, the log-loss for that example is huge. Poor initialization can cause gradients to explode or vanish, and it slows down training.
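
A rough illustration of why the initial cost is so large (hypothetical numbers, not taken from the exercise): weights scaled by 10 produce large pre-activations, the sigmoid saturates near 0 or 1, and a confidently wrong prediction contributes an enormous cross-entropy term.

import numpy as np

z = np.array([-30.0, 30.0])            # large pre-activations from weights scaled by 10
a = 1 / (1 + np.exp(-z))               # sigmoid saturates: roughly [9e-14, 1 - 9e-14]
y = np.array([1.0, 0.0])               # suppose both predictions are confidently wrong

eps = 1e-15                            # guard against log(0)
loss = -(y * np.log(a + eps) + (1 - y) * np.log(1 - a + eps))
print(loss)                            # each term is on the order of 30, so a few such examples dominate the cost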


4. Using He initialization

def initialize_parameters_he(layers_dims):
    """
    Arguments:
    layers_dims -- python array (list) containing the size of each layer.

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
                    b1 -- bias vector of shape (layers_dims[1], 1)
                    ...
                    WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                    bL -- bias vector of shape (layers_dims[L], 1)
    """
    np.random.seed(3)
    parameters = {}
    L = len(layers_dims) - 1        # integer representing the number of layers

    for l in range(1, L + 1):
        parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * np.sqrt(2 / layers_dims[l-1])
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))

    return parameters



Cost curve:


The training accuracy is 0.9933333 and the test accuracy is 0.96. The decision boundary:

As can be seen, sensible weight initialization substantially improves the network's performance.
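
As a final, hypothetical illustration of why the scaling matters (a stack of random ReLU layers, not part of the exercise): pushing the same input through many layers shows zero initialization killing the signal, a x10 initialization blowing it up, and He initialization keeping the activations at a stable scale.

import numpy as np

np.random.seed(2)
n, depth = 200, 10                      # 10 hidden layers of 200 ReLU units (hypothetical sizes)
X = np.random.randn(n, 200)             # 200 random input examples

for name, scale in [("zeros", 0.0),
                    ("large random (x10)", 10.0),
                    ("he", np.sqrt(2.0 / n))]:
    A = X
    for _ in range(depth):
        W = np.random.randn(n, n) * scale
        A = np.maximum(0, W @ A)        # forward through one ReLU layer
    print(name, "-> std of final activations:", A.std())

# zeros kills the signal entirely, the x10 scaling explodes by orders of magnitude
# per layer, and He initialization keeps the activation scale roughly constant.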
