First, weight initialization
Proper weight initialization can keep gradients from exploding or vanishing. For ReLU activation functions, the weights can be initialized to:

$$W^{[l]} = \text{np.random.randn}(n^{[l]}, n^{[l-1]}) \times \sqrt{\frac{2}{n^{[l-1]}}}$$

This is also known as "He initialization". For tanh activation functions, the weights are initialized to:

$$W^{[l]} = \text{np.random.randn}(n^{[l]}, n^{[l-1]}) \times \sqrt{\frac{1}{n^{[l-1]}}}$$

This is also known as "Xavier initialization". The following formula can also be used:

$$W^{[l]} = \text{np.random.randn}(n^{[l]}, n^{[l-1]}) \times \sqrt{\frac{2}{n^{[l-1]} + n^{[l]}}}$$

In the formulas above, $l$ refers to the current layer of the neural network, $l-1$ to the previous layer, and $n^{[l]}$ denotes the number of units in layer $l$.
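As a concrete illustration, here is a minimal NumPy sketch of the three rules for a single layer; the layer sizes n_prev and n_curr are hypothetical values chosen only for this example:

import numpy as np

n_prev, n_curr = 5, 10   # hypothetical sizes of the previous and current layer

# He initialization, recommended for ReLU layers
W_he = np.random.randn(n_curr, n_prev) * np.sqrt(2. / n_prev)

# Xavier initialization, recommended for tanh layers
W_xavier = np.random.randn(n_curr, n_prev) * np.sqrt(1. / n_prev)

# Variant that scales by both the previous and the current layer size
W_variant = np.random.randn(n_curr, n_prev) * np.sqrt(2. / (n_prev + n_curr))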
Second, the programming exercise
We have the following two-dimensional data:

Train a network that correctly classifies the red dots and the blue dots. Import the required packages; the helper file init_utils.py can be downloaded here.
import numpy as np
import matplotlib.pyplot as plt
import sklearn
import sklearn.datasets
from init_utils import sigmoid, relu, compute_loss, forward_propagation, backward_propagation
from init_utils import update_parameters, predict, load_dataset, plot_decision_boundary, predict_dec

%matplotlib inline
plt.rcParams['figure.figsize'] = (7.0, 4.0)  # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# load image dataset: blue/red dots in circles
train_X, train_Y, test_X, test_Y = load_dataset()
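If init_utils.py is not at hand, a similar two-circle dataset can be generated directly with scikit-learn. This is only a rough stand-in for load_dataset; the sample count and noise level below are assumptions, not necessarily the values used by the helper file:

import sklearn.datasets

def make_circles_dataset(n_samples=300, noise=0.05):
    # Rough stand-in for load_dataset: two concentric circles of points,
    # returned as features of shape (2, m) and labels of shape (1, m).
    X, y = sklearn.datasets.make_circles(n_samples=n_samples, noise=noise)
    return X.T, y.reshape(1, -1)

# Example: train_X, train_Y = make_circles_dataset()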
1. Establish a neural network model
def model(X, Y, learning_rate=0.01, num_iterations=15000, print_cost=True, initialization="he"):
    """
    Implements a three-layer neural network: LINEAR->RELU->LINEAR->RELU->LINEAR->SIGMOID.

    Arguments:
    X -- input data, of shape (2, number of examples)
    Y -- true "label" vector (containing 0 for red dots; 1 for blue dots), of shape (1, number of examples)
    learning_rate -- learning rate for gradient descent
    num_iterations -- number of iterations to run gradient descent
    print_cost -- if True, print the cost every 1000 iterations
    initialization -- flag to choose which initialization to use ("zeros", "random" or "he")

    Returns:
    parameters -- parameters learnt by the model
    """
    grads = {}
    costs = []                       # to keep track of the loss
    m = X.shape[1]                   # number of examples
    layers_dims = [X.shape[0], 10, 5, 1]

    # Initialize parameters dictionary
    if initialization == "zeros":
        parameters = initialize_parameters_zeros(layers_dims)
    elif initialization == "random":
        parameters = initialize_parameters_random(layers_dims)
    elif initialization == "he":
        parameters = initialize_parameters_he(layers_dims)

    # Loop (gradient descent)
    for i in range(0, num_iterations):
        # Forward propagation: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID
        a3, cache = forward_propagation(X, parameters)

        # Loss
        cost = compute_loss(a3, Y)

        # Backward propagation
        grads = backward_propagation(X, Y, cache)

        # Update parameters
        parameters = update_parameters(parameters, grads, learning_rate)

        # Print the loss every 1000 iterations
        if print_cost and i % 1000 == 0:
            print("Cost after iteration {}: {}".format(i, cost))
            costs.append(cost)

    # Plot the loss
    plt.plot(costs)
    plt.ylabel('cost')
    plt.xlabel('iterations (per hundreds)')
    plt.title("Learning rate = " + str(learning_rate))
    plt.show()

    return parameters
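forward_propagation, compute_loss, backward_propagation and update_parameters all come from init_utils.py and are not reproduced here. As a point of reference, update_parameters is just a plain gradient descent step; a minimal sketch, assuming the grads dictionary uses keys "dW1", "db1", and so on, could look like this:

def update_parameters(parameters, grads, learning_rate):
    # One gradient descent step on every weight matrix and bias vector.
    L = len(parameters) // 2                  # number of layers with parameters
    for l in range(1, L + 1):
        parameters["W" + str(l)] -= learning_rate * grads["dW" + str(l)]
        parameters["b" + str(l)] -= learning_rate * grads["db" + str(l)]
    return parameters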
2. Initialize the weights to 0
def initialize_parameters_zeros(layers_dims):
    """
    Arguments:
    layers_dims -- python array (list) containing the size of each layer.

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                  W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
                  b1 -- bias vector of shape (layers_dims[1], 1)
                  ...
                  WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                  bL -- bias vector of shape (layers_dims[L], 1)
    """
    parameters = {}
    L = len(layers_dims)            # number of layers in the network

    for l in range(1, L):
        parameters['W' + str(l)] = np.zeros((layers_dims[l], layers_dims[l-1]))
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))

    return parameters
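As a quick sanity check, calling it on a small, purely hypothetical layer specification shows that every parameter starts out at zero:

params_demo = initialize_parameters_zeros([3, 2, 1])       # hypothetical sizes: 3 inputs, 2 hidden units, 1 output
print(params_demo["W1"].shape, params_demo["W1"].sum())    # (2, 3) 0.0
print(params_demo["b2"].shape, params_demo["b2"].sum())    # (1, 1) 0.0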
Train the network:

parameters = model(train_X, train_Y, initialization="zeros")
print("On the train set:")
predictions_train = predict(train_X, train_Y, parameters)
print("On the test set:")
predictions_test = predict(test_X, test_Y, parameters)
The cost curve after training:

The training accuracy is 0.5 and the test accuracy is 0.5. Output the predictions on the test set:

Draw the decision boundary:

The model predicts 0 for every test example. Because the weights are initialized to 0, the network never breaks symmetry: every neuron in a layer computes the same function and learns the same thing.
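A tiny toy computation (not part of the assignment code) makes this concrete: with all-zero weights every hidden unit computes the same value, and the weight gradients are all zero, so gradient descent never moves the network out of this symmetric state.

import numpy as np

X = np.array([[1.0, -2.0], [3.0, 0.5]])          # 2 toy examples with 2 features
Y = np.array([[1.0, 0.0]])                        # toy labels

W1 = np.zeros((3, 2))                             # zero-initialized hidden layer (3 units)
b1 = np.zeros((3, 1))
W2 = np.zeros((1, 3))                             # zero-initialized output layer
b2 = np.zeros((1, 1))

Z1 = W1 @ X + b1
A1 = np.maximum(0, Z1)                            # ReLU: every hidden unit outputs 0 for every example
A2 = 1 / (1 + np.exp(-(W2 @ A1 + b2)))            # sigmoid output: 0.5 for every example

dZ2 = A2 - Y
dW2 = dZ2 @ A1.T / X.shape[1]                     # all zeros, because A1 is all zeros
dA1 = W2.T @ dZ2                                  # all zeros, because W2 is all zeros
print(A1, dW2, dA1, sep="\n")                     # the weights receive no learning signal at all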
3. Initialize the weights to large random values
def initialize_parameters_random(layers_dims):
    """
    Arguments:
    layers_dims -- python array (list) containing the size of each layer.

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                  W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
                  b1 -- bias vector of shape (layers_dims[1], 1)
                  ...
                  WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                  bL -- bias vector of shape (layers_dims[L], 1)
    """
    np.random.seed(3)               # this seed makes sure your "random" numbers will be the same as ours
    parameters = {}
    L = len(layers_dims)            # integer representing the number of layers

    for l in range(1, L):
        parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * 10
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))

    return parameters
Training this model gives the following cost curve:

The training accuracy is 0.83 and the test accuracy is 0.86. The decision boundary looks like this:

The cost is very large at first because the large initial weights drive the sigmoid output of some examples very close to 0 or 1, and when such an example is misclassified the log loss becomes huge. More generally, poor initialization can cause gradients to explode or vanish and slows down training.
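To see how quickly the loss blows up, take a single illustrative example where large weights push the pre-activation to a big negative value while the true label is 1; the cross-entropy loss -log(a) then becomes enormous and the sigmoid gradient nearly vanishes (the numbers are purely illustrative):

import numpy as np

z = -30.0                        # large negative pre-activation caused by big random weights
a = 1 / (1 + np.exp(-z))         # sigmoid output, about 9.4e-14
loss = -np.log(a)                # cross-entropy loss for true label y = 1, about 30
grad = a * (1 - a)               # sigmoid derivative, about 9.4e-14: almost no gradient flows back
print(a, loss, grad)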
4. Using He initialization
def initialize_parameters_he(layers_dims):
    """
    Arguments:
    layers_dims -- python array (list) containing the size of each layer.

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                  W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
                  b1 -- bias vector of shape (layers_dims[1], 1)
                  ...
                  WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                  bL -- bias vector of shape (layers_dims[L], 1)
    """
    np.random.seed(3)
    parameters = {}
    L = len(layers_dims) - 1        # integer representing the number of layers

    for l in range(1, L + 1):
        parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * np.sqrt(2. / layers_dims[l-1])
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))

    return parameters
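As a quick check, the empirical standard deviation of a He-initialized weight matrix should be close to sqrt(2/n_prev). The layer sizes below are hypothetical and chosen only to make the statistics stable:

params_he = initialize_parameters_he([500, 1000, 1])   # hypothetical sizes: 500 inputs, 1000 hidden units, 1 output
print(params_he["W1"].std())                           # roughly 0.063
print(np.sqrt(2. / 500))                               # 0.0632..., the target scale for W1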
Cost curve:
The training accuracy is 0.9933333 and the test accuracy is 0.96. The decision boundary:

A sensible weight initialization clearly improves the network's performance.