First, weight initialization
Proper weight initialization can keep gradients from exploding or vanishing. For ReLU activation functions, the weights can be initialized to:

$$W^{[l]} = \text{np.random.randn}(n^{[l]}, n^{[l-1]}) \times \sqrt{\frac{2}{n^{[l-1]}}}$$

This is also known as "He initialization". For tanh activation functions, the weights are initialized to:

$$W^{[l]} = \text{np.random.randn}(n^{[l]}, n^{[l-1]}) \times \sqrt{\frac{1}{n^{[l-1]}}}$$

This is also known as "Xavier initialization". The following formula can also be used:

$$W^{[l]} = \text{np.random.randn}(n^{[l]}, n^{[l-1]}) \times \sqrt{\frac{2}{n^{[l-1]} + n^{[l]}}}$$

In the formulas above, $l$ refers to the current layer of the neural network, $l-1$ to the previous layer, and $n^{[l]}$ denotes the number of units in layer $l$.
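As a concrete illustration, here is a minimal NumPy sketch of the three rules for a single layer; the layer sizes n_prev and n_curr are hypothetical values chosen only for this example:

import numpy as np

n_prev, n_curr = 5, 10   # hypothetical sizes of the previous and current layer

# He initialization, recommended for ReLU layers
W_he = np.random.randn(n_curr, n_prev) * np.sqrt(2. / n_prev)

# Xavier initialization, recommended for tanh layers
W_xavier = np.random.randn(n_curr, n_prev) * np.sqrt(1. / n_prev)

# Variant that scales by both the previous and the current layer size
W_variant = np.random.randn(n_curr, n_prev) * np.sqrt(2. / (n_prev + n_curr))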
Second, the programming exercise
We have the following two-dimensional data:

Train a network that correctly classifies the red dots and the blue dots. Import the required packages; the helper file init_utils.py can be downloaded here.
import numpy as np
import matplotlib.pyplot as plt
import sklearn
import sklearn.datasets
from init_utils import sigmoid, relu, compute_loss, forward_propagation, backward_propagation
from init_utils import update_parameters, predict, load_dataset, plot_decision_boundary, predict_dec

%matplotlib inline
plt.rcParams['figure.figsize'] = (7.0, 4.0)  # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# load image dataset: blue/red dots in circles
train_X, train_Y, test_X, test_Y = load_dataset()
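If init_utils.py is not at hand, a similar two-circle dataset can be generated directly with scikit-learn. This is only a rough stand-in for load_dataset; the sample count and noise level below are assumptions, not necessarily the values used by the helper file:

import sklearn.datasets

def make_circles_dataset(n_samples=300, noise=0.05):
    # Rough stand-in for load_dataset: two concentric circles of points,
    # returned as features of shape (2, m) and labels of shape (1, m).
    X, y = sklearn.datasets.make_circles(n_samples=n_samples, noise=noise)
    return X.T, y.reshape(1, -1)

# Example: train_X, train_Y = make_circles_dataset()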
1. Establish a neural network model
def model(X, Y, learning_rate=0.01, num_iterations=15000, print_cost=True, initialization="he"):
    """
    Implements a three-layer neural network: LINEAR->RELU->LINEAR->RELU->LINEAR->SIGMOID.

    Arguments:
    X -- input data, of shape (2, number of examples)
    Y -- true "label" vector (containing 0 for red dots; 1 for blue dots), of shape (1, number of examples)
    learning_rate -- learning rate for gradient descent
    num_iterations -- number of iterations to run gradient descent
    print_cost -- if True, print the cost every 1000 iterations
    initialization -- flag to choose which initialization to use ("zeros", "random" or "he")

    Returns:
    parameters -- parameters learnt by the model
    """
    grads = {}
    costs = []                       # to keep track of the loss
    m = X.shape[1]                   # number of examples
    layers_dims = [X.shape[0], 10, 5, 1]

    # Initialize parameters dictionary
    if initialization == "zeros":
        parameters = initialize_parameters_zeros(layers_dims)
    elif initialization == "random":
        parameters = initialize_parameters_random(layers_dims)
    elif initialization == "he":
        parameters = initialize_parameters_he(layers_dims)

    # Loop (gradient descent)
    for i in range(0, num_iterations):
        # Forward propagation: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID
        a3, cache = forward_propagation(X, parameters)

        # Loss
        cost = compute_loss(a3, Y)

        # Backward propagation
        grads = backward_propagation(X, Y, cache)

        # Update parameters
        parameters = update_parameters(parameters, grads, learning_rate)

        # Print the loss every 1000 iterations
        if print_cost and i % 1000 == 0:
            print("Cost after iteration {}: {}".format(i, cost))
            costs.append(cost)

    # Plot the loss
    plt.plot(costs)
    plt.ylabel('cost')
    plt.xlabel('iterations (per hundreds)')
    plt.title("Learning rate = " + str(learning_rate))
    plt.show()

    return parameters
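forward_propagation, compute_loss, backward_propagation and update_parameters all come from init_utils.py and are not reproduced here. As a point of reference, update_parameters is just a plain gradient descent step; a minimal sketch, assuming the grads dictionary uses keys "dW1", "db1", and so on, could look like this:

def update_parameters(parameters, grads, learning_rate):
    # One gradient descent step on every weight matrix and bias vector.
    L = len(parameters) // 2                  # number of layers with parameters
    for l in range(1, L + 1):
        parameters["W" + str(l)] -= learning_rate * grads["dW" + str(l)]
        parameters["b" + str(l)] -= learning_rate * grads["db" + str(l)]
    return parameters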
2. Initialize the weights to 0
def initialize_parameters_zeros(layers_dims):
    """
    Arguments:
    layers_dims -- python array (list) containing the size of each layer.

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                  W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
                  b1 -- bias vector of shape (layers_dims[1], 1)
                  ...
                  WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                  bL -- bias vector of shape (layers_dims[L], 1)
    """
    parameters = {}
    L = len(layers_dims)            # number of layers in the network

    for l in range(1, L):
        parameters['W' + str(l)] = np.zeros((layers_dims[l], layers_dims[l-1]))
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))

    return parameters
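As a quick sanity check, calling it on a small, purely hypothetical layer specification shows that every parameter starts out at zero:

params_demo = initialize_parameters_zeros([3, 2, 1])       # hypothetical sizes: 3 inputs, 2 hidden units, 1 output
print(params_demo["W1"].shape, params_demo["W1"].sum())    # (2, 3) 0.0
print(params_demo["b2"].shape, params_demo["b2"].sum())    # (1, 1) 0.0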
Train the network:

parameters = model(train_X, train_Y, initialization="zeros")
print("On the train set:")
predictions_train = predict(train_X, train_Y, parameters)
print("On the test set:")
predictions_test = predict(test_X, test_Y, parameters)
The cost curve after training:

The training accuracy is 0.5 and the test accuracy is 0.5. Output the predictions on the test set:

Draw the decision boundary:

The model predicts 0 for every test example. Because the weights are initialized to 0, the network never breaks symmetry: every neuron in a layer computes the same function and learns the same thing.
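A tiny toy computation (not part of the assignment code) makes this concrete: with all-zero weights every hidden unit computes the same value, and the weight gradients are all zero, so gradient descent never moves the network out of this symmetric state.

import numpy as np

X = np.array([[1.0, -2.0], [3.0, 0.5]])          # 2 toy examples with 2 features
Y = np.array([[1.0, 0.0]])                        # toy labels

W1 = np.zeros((3, 2))                             # zero-initialized hidden layer (3 units)
b1 = np.zeros((3, 1))
W2 = np.zeros((1, 3))                             # zero-initialized output layer
b2 = np.zeros((1, 1))

Z1 = W1 @ X + b1
A1 = np.maximum(0, Z1)                            # ReLU: every hidden unit outputs 0 for every example
A2 = 1 / (1 + np.exp(-(W2 @ A1 + b2)))            # sigmoid output: 0.5 for every example

dZ2 = A2 - Y
dW2 = dZ2 @ A1.T / X.shape[1]                     # all zeros, because A1 is all zeros
dA1 = W2.T @ dZ2                                  # all zeros, because W2 is all zeros
print(A1, dW2, dA1, sep="\n")                     # the weights receive no learning signal at all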
3. Initialize the weights to large random values
def initialize_parameters_random(layers_dims):
    """
    Arguments:
    layers_dims -- python array (list) containing the size of each layer.

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                  W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
                  b1 -- bias vector of shape (layers_dims[1], 1)
                  ...
                  WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                  bL -- bias vector of shape (layers_dims[L], 1)
    """
    np.random.seed(3)               # this seed makes sure your "random" numbers will be the same as ours
    parameters = {}
    L = len(layers_dims)            # integer representing the number of layers

    for l in range(1, L):
        parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * 10
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))

    return parameters
Training this model gives the following cost curve:

The training accuracy is 0.83 and the test accuracy is 0.86. The decision boundary looks like this:

The cost is very large at first because the large initial weights drive the sigmoid output of some examples very close to 0 or 1, and when such an example is misclassified the log loss becomes huge. More generally, poor initialization can cause gradients to explode or vanish and slows down training.
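To see how quickly the loss blows up, take a single illustrative example where large weights push the pre-activation to a big negative value while the true label is 1; the cross-entropy loss -log(a) then becomes enormous and the sigmoid gradient nearly vanishes (the numbers are purely illustrative):

import numpy as np

z = -30.0                        # large negative pre-activation caused by big random weights
a = 1 / (1 + np.exp(-z))         # sigmoid output, about 9.4e-14
loss = -np.log(a)                # cross-entropy loss for true label y = 1, about 30
grad = a * (1 - a)               # sigmoid derivative, about 9.4e-14: almost no gradient flows back
print(a, loss, grad)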
4. Using He initialization
def initialize_parameters_he(layers_dims):
    """
    Arguments:
    layers_dims -- python array (list) containing the size of each layer.

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                  W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
                  b1 -- bias vector of shape (layers_dims[1], 1)
                  ...
                  WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                  bL -- bias vector of shape (layers_dims[L], 1)
    """
    np.random.seed(3)
    parameters = {}
    L = len(layers_dims) - 1        # integer representing the number of layers

    for l in range(1, L + 1):
        parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * np.sqrt(2. / layers_dims[l-1])
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))

    return parameters
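As a quick check, the empirical standard deviation of a He-initialized weight matrix should be close to sqrt(2/n_prev). The layer sizes below are hypothetical and chosen only to make the statistics stable:

params_he = initialize_parameters_he([500, 1000, 1])   # hypothetical sizes: 500 inputs, 1000 hidden units, 1 output
print(params_he["W1"].std())                           # roughly 0.063
print(np.sqrt(2. / 500))                               # 0.0632..., the target scale for W1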
Cost curve:
The training accuracy is 0.9933333 and the test accuracy is 0.96. The decision boundary:

A sensible weight initialization clearly improves the network's performance.