Experienced programmers walk you through regularization techniques in deep learning (with Python code)!

Source: Internet
Author: User

Contents

1. What is regularization?

2. How does regularization reduce overfitting?

3. Various regularization techniques in deep learning:

L2 and L1 regularization

Dropout

Data augmentation

Early stopping

4. Case study: using Keras on the MNIST dataset

1. What is regularization?

Before going into this topic, take a look at these pictures:

Have you seen this picture before? As we move from left to right, our model learns the details and the noise in the training data too closely, which results in poor performance on unseen data.

In other words, as we move from left to right, the complexity of the model increases: the training error keeps decreasing, but the test error does not necessarily decrease, as the following figure shows.
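The same pattern can be reproduced numerically. The sketch below is an illustration, not code from the original article: it assumes a made-up 1-D dataset (noisy samples of a sine curve) and uses the degree of a fitted polynomial as a stand-in for model complexity. Because every higher-degree polynomial family contains the lower-degree ones, the least-squares training error can only go down as the degree grows, while nothing forces the test error to follow.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples of sin(pi * x) on [-1, 1] (a made-up toy dataset)."""
    x = rng.uniform(-1, 1, n)
    y = np.sin(np.pi * x) + rng.normal(0, 0.3, n)
    return x, y

x_tr, y_tr = make_data(15)    # small training set
x_te, y_te = make_data(200)   # held-out test set

def mse(p, x, y):
    return float(np.mean((p(x) - y) ** 2))

results = {}
for degree in (1, 3, 9, 14):
    # least-squares polynomial fit; a higher degree means a more complex model
    p = np.polynomial.Polynomial.fit(x_tr, y_tr, degree)
    results[degree] = (mse(p, x_tr, y_tr), mse(p, x_te, y_te))
    print(f"degree {degree:2d}: train MSE {results[degree][0]:.4f}, test MSE {results[degree][1]:.4f}")
```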

2. How does regularization reduce overfitting?

Consider a neural network that is overfitting the training data, as shown in the figure below.

If you have studied regularization in machine learning, you know that regularization penalizes the coefficients. In deep learning, it penalizes the weight matrices of the nodes.

Assume that our regularization coefficient is so high that some of the weight matrices are nearly equal to zero.

This results in a much simpler, almost linear network, which slightly underfits the training data.
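To make this concrete, here is a minimal sketch using ridge regression (linear least squares plus an L2 penalty) as a stand-in for a neural network, because it has a closed-form solution. The synthetic data and the helper `ridge_weights` are hypothetical. As λ grows, the norm of the learned weights shrinks towards zero and the training error rises: with a very large λ, the model underfits.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(50, 10))          # 50 samples, 10 features (synthetic)
true_w = rng.normal(size=10)
y = X @ true_w + rng.normal(scale=0.5, size=50)

def ridge_weights(X, y, lam):
    """Closed-form minimiser of ||Xw - y||^2 + lam * ||w||^2 (hypothetical helper)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

norms, mses = [], []
for lam in (0.0, 1.0, 10.0, 100.0, 1e4):
    w = ridge_weights(X, y, lam)
    norms.append(float(np.linalg.norm(w)))           # size of the learned weights
    mses.append(float(np.mean((X @ w - y) ** 2)))    # training error
    print(f"lambda={lam:>8}: ||w|| = {norms[-1]:.3f}, train MSE = {mses[-1]:.3f}")
```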

Such a large value of the regularization coefficient is not useful. We need to tune it to obtain a well-fitted model, as shown in the figure below.

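3. Various regularization techniques in deep learning

L2 and L1 regularization

L2 and L1 regularization update the general cost function by adding a regularization term: cost function = loss + regularization term. In L2 regularization, this term is commonly written as λ/(2m) × Σ‖w‖², the sum of the squared weights scaled by λ and by the number of training examples m. Here, lambda (λ) is the regularization parameter, a hyperparameter that is tuned for better results. L2 regularization is also called weight decay because it forces the weights to decay towards zero (but not exactly to zero).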
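The name "weight decay" follows from the gradient of the penalty. The sketch below assumes plain gradient descent and writes the penalty as λ Σ w², which is how Keras's `regularizers.l2(l)` computes it (without a 1/(2m) factor, so λ values are not interchangeable between the two conventions). Its gradient is 2λw, so each update is w ← w − η(∂L/∂w + 2λw). Setting the data gradient ∂L/∂w to zero isolates the penalty: each step multiplies the weights by (1 − 2ηλ), so they shrink geometrically towards zero without ever reaching it. `l2_penalty` and `sgd_step_with_l2` are hypothetical helpers, not library functions.

```python
import numpy as np

def l2_penalty(weights, lam):
    """L2 term: lam * sum of squared weights over all weight arrays."""
    return lam * sum(float(np.sum(w ** 2)) for w in weights)

def sgd_step_with_l2(w, data_grad, lr, lam):
    """One gradient step on loss + lam * ||w||^2; the penalty's gradient is 2 * lam * w."""
    return w - lr * (data_grad + 2 * lam * w)

w = np.array([1.0, -0.5, 0.25])
for _ in range(100):
    # zero data gradient: each step scales w by (1 - 2 * 0.1 * 0.05) = 0.99
    w = sgd_step_with_l2(w, data_grad=np.zeros_like(w), lr=0.1, lam=0.05)

print(w)                          # every weight has shrunk, none is exactly zero
print(l2_penalty([w], lam=0.05))
```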

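In L1 regularization, we penalize the absolute values of the weights instead, so the regularization term is proportional to λ Σ|w|. Unlike L2, L1 can drive some weights exactly to zero, which makes it useful when we want to compress the model.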
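Why L1 produces exact zeros is easiest to see with the soft-thresholding (proximal) update, a standard way to optimize L1-penalized objectives. This is an illustrative swap-in: Keras simply adds the L1 term to the loss and relies on ordinary gradient updates, which tend to leave weights close to zero rather than exactly at it. The helpers below are hypothetical. Each step moves every weight towards zero by the fixed amount ηλ and clips at zero, so small weights are removed entirely, whereas an L2 update only scales weights by a constant factor.

```python
import numpy as np

def l1_penalty(w, lam):
    """L1 term: lam * sum of absolute weights."""
    return lam * float(np.sum(np.abs(w)))

def soft_threshold(w, t):
    """Proximal operator of t * ||w||_1: shrink each weight towards 0 by t, clipping at exactly 0."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def prox_step_l1(w, data_grad, lr, lam):
    """Gradient step on the data loss, followed by the L1 proximal step."""
    return soft_threshold(w - lr * data_grad, lr * lam)

w = np.array([2.0, -0.5, 0.25, 0.05])
for _ in range(20):
    # zero data gradient isolates the penalty: each step removes 0.1 * 0.5 = 0.05 from |w|
    w = prox_step_l1(w, data_grad=np.zeros_like(w), lr=0.1, lam=0.5)

print(w)   # the three smaller weights are now exactly zero
```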

Dropout

What does dropout do? At every iteration, it randomly selects some nodes and removes them, along with all of their incoming and outgoing connections, as shown in the figure below.

So each iteration has a different set of nodes, which results in a different set of outputs. Dropout can also be thought of as an ensemble technique in machine learning.
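A minimal NumPy sketch of the mechanism, assuming the "inverted dropout" formulation that Keras's `Dropout(rate)` layer uses: during training each unit is zeroed with probability `rate` and the surviving units are scaled by 1/(1 − rate), and at inference time the layer passes its input through unchanged. `dropout_forward` is a hypothetical helper, not a Keras API.

```python
import numpy as np

def dropout_forward(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate`, scale the survivors by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return activations  # at inference time every node is kept and nothing is rescaled
    keep_mask = rng.random(activations.shape) >= rate  # True for the units that survive
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
h = np.ones((4, 8))  # toy batch: 4 examples, 8 hidden units each
out1 = dropout_forward(h, rate=0.25, rng=rng)  # one training iteration
out2 = dropout_forward(h, rate=0.25, rng=rng)  # the next iteration draws a fresh mask
print(out1)
print("masks differ:", bool((out1 != out2).any()))
```

The 1/(1 − rate) scaling keeps the expected activation unchanged, which is why no correction is needed when dropout is switched off at inference.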

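Early stopping

Early stopping is a kind of cross-validation strategy: we keep part of the training set aside as a validation set and stop training as soon as performance on the validation set stops improving. In Keras, this is done with the EarlyStopping callback, where patience is the number of epochs with no further improvement after which training is stopped. To understand this better, look at the diagram above: after the dashed line, each epoch results in a higher validation error. Therefore, with patience set to 5, training stops 5 epochs after the dashed line, because there is no further improvement.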

Note: it is possible that after those 5 epochs (the value set for patience here), the model would start improving again and the validation error would begin to decrease. Therefore, take care when tuning this hyperparameter.
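The patience rule can be written as a small pure-Python function. This is a simplified sketch of the idea behind Keras's `EarlyStopping` callback, not its actual implementation; `early_stopping_epoch` and both loss sequences are made up for illustration. The second sequence shows the risk described in the note: the validation loss improves again at epoch 9 (counting from 0), which a patience of 5 never gets to see.

```python
def early_stopping_epoch(val_losses, patience, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 0-based, for per-epoch validation losses.

    Training stops once `patience` consecutive epochs pass without the loss
    beating the best value seen so far by more than `min_delta`.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch  # ran out of epochs without stopping


val_losses = [0.90, 0.60, 0.45, 0.40, 0.42, 0.44, 0.47, 0.50, 0.55, 0.61, 0.70]
late_dip = [0.90, 0.60, 0.45, 0.40, 0.42, 0.44, 0.43, 0.41, 0.41, 0.36]

print(early_stopping_epoch(val_losses, patience=5))  # (8, 3)
print(early_stopping_epoch(late_dip, patience=5))    # (8, 3): stops before the dip
print(early_stopping_epoch(late_dip, patience=6))    # (9, 9): sees the improvement
```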

4. Case study: using Keras on the MNIST dataset

Now, let's load the data.

Now let's look at some of the images.

```python
# import Keras modules
from keras.models import Sequential
from keras.layers import Dense, Input

# define vars
input_num_units = 784
hidden1_num_units = 500
hidden2_num_units = 500
hidden3_num_units = 500
hidden4_num_units = 500
hidden5_num_units = 500
output_num_units = 10

epochs = 10
batch_size = 128

# Baseline: five fully connected ReLU layers, no regularization
model = Sequential([
    Input(shape=(input_num_units,)),
    Dense(hidden1_num_units, activation='relu'),
    Dense(hidden2_num_units, activation='relu'),
    Dense(hidden3_num_units, activation='relu'),
    Dense(hidden4_num_units, activation='relu'),
    Dense(hidden5_num_units, activation='relu'),
    Dense(output_num_units, activation='softmax'),
])
```

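Next, we add L2 regularization to the hidden layers (in Keras, by passing kernel_regularizer=regularizers.l2(0.0001) to each Dense layer). Note that the value of lambda here is 0.0001. Great! We obtained better accuracy than with the previous NN model.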

Now let's try L1 regularization.

```python
## L1
from keras import regularizers
from keras.models import Sequential
from keras.layers import Dense, Input

# Same architecture as above, with an L1 penalty (lambda = 0.0001) on each hidden layer's weights
model = Sequential([
    Input(shape=(input_num_units,)),
    Dense(hidden1_num_units, activation='relu', kernel_regularizer=regularizers.l1(0.0001)),
    Dense(hidden2_num_units, activation='relu', kernel_regularizer=regularizers.l1(0.0001)),
    Dense(hidden3_num_units, activation='relu', kernel_regularizer=regularizers.l1(0.0001)),
    Dense(hidden4_num_units, activation='relu', kernel_regularizer=regularizers.l1(0.0001)),
    Dense(hidden5_num_units, activation='relu', kernel_regularizer=regularizers.l1(0.0001)),
    Dense(output_num_units, activation='softmax'),
])

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
trained_model_5d = model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size,
                             validation_data=(x_test, y_test))
```

This time there is no improvement. Next, let's try dropout.

```python
## Dropout
from keras.models import Sequential
from keras.layers import Dense, Dropout, Input

# Randomly drop 25% of each hidden layer's outputs during training
model = Sequential([
    Input(shape=(input_num_units,)),
    Dense(hidden1_num_units, activation='relu'),
    Dropout(0.25),
    Dense(hidden2_num_units, activation='relu'),
    Dropout(0.25),
    Dense(hidden3_num_units, activation='relu'),
    Dropout(0.25),
    Dense(hidden4_num_units, activation='relu'),
    Dropout(0.25),
    Dense(hidden5_num_units, activation='relu'),
    Dropout(0.25),
    Dense(output_num_units, activation='softmax'),
])

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
trained_model_5d = model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size,
                             validation_data=(x_test, y_test))
```

Good! Dropout also gives some improvement over the simple NN model.

Now, let's try data augmentation.

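```python
import os

import numpy as np
import pandas as pd
from PIL import Image
from keras.preprocessing.image import ImageDataGenerator  # legacy API, deprecated in recent Keras

datagen = ImageDataGenerator(zca_whitening=True)

# loading data (data_dir is the dataset folder used when the data was loaded earlier)
train = pd.read_csv(os.path.join(data_dir, 'Train', 'train.csv'))

temp = []
for img_name in train.filename:
    image_path = os.path.join(data_dir, 'Train', 'Images', 'train', img_name)
    img = np.array(Image.open(image_path).convert('L'), dtype='float32')  # grayscale
    temp.append(img)

x_train = np.stack(temp)
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)  # channels-last
x_train = x_train.astype('float32')

# ZCA whitening needs statistics computed from the data; the generator only transforms
# images passed through it (e.g. datagen.flow(...) or datagen.standardize(...))
datagen.fit(x_train)
```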

```python
## splitting
from keras.utils import to_categorical

y_train = to_categorical(train.label.values)
split_size = int(x_train.shape[0] * 0.7)

x_train, x_test = x_train[:split_size], x_train[split_size:]
y_train, y_test = y_train[:split_size], y_train[split_size:]

## reshaping: flatten each image into a 784-long vector and scale pixels to [0, 1]
x_train = np.reshape(x_train, (x_train.shape[0], -1)) / 255
x_test = np.reshape(x_test, (x_test.shape[0], -1)) / 255

## structure using dropout
from keras.models import Sequential
from keras.layers import Dense, Dropout, Input

model = Sequential([
    Input(shape=(input_num_units,)),
    Dense(hidden1_num_units, activation='relu'),
    Dropout(0.25),
    Dense(hidden2_num_units, activation='relu'),
    Dropout(0.25),
    Dense(hidden3_num_units, activation='relu'),
    Dropout(0.25),
    Dense(hidden4_num_units, activation='relu'),
    Dropout(0.25),
    Dense(hidden5_num_units, activation='relu'),
    Dropout(0.25),
    Dense(output_num_units, activation='softmax'),
])

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
trained_model_5d = model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size,
                             validation_data=(x_test, y_test))
```

Wow! We got a big jump in the accuracy score. And the good news is that it works every time: we only need to choose appropriate arguments for the generator based on the images in the dataset.
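The case study applies ZCA whitening through Keras's ImageDataGenerator. Another very common augmentation for handwritten digits is to shift each image by a few pixels, so the model sees slightly displaced copies of every digit. The sketch below is plain NumPy rather than a Keras API; `random_shift` is a hypothetical helper, and it assumes a batch of grayscale images of shape (n, 28, 28).

```python
import numpy as np

def random_shift(images, max_shift, rng):
    """Shift each image by up to `max_shift` pixels along each axis, filling vacated pixels with 0.

    `images` has shape (n, height, width), e.g. (n, 28, 28) for MNIST digits.
    """
    n, h, w = images.shape
    # zero-pad the borders so a shifted window never leaves the array
    padded = np.pad(images, ((0, 0), (max_shift, max_shift), (max_shift, max_shift)))
    out = np.empty_like(images)
    for i in range(n):
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)  # independent shift per image
        top, left = max_shift + dy, max_shift + dx
        out[i] = padded[i, top:top + h, left:left + w]
    return out

rng = np.random.default_rng(0)
batch = np.zeros((2, 28, 28), dtype="float32")
batch[:, 10:18, 12:16] = 1.0  # a toy "digit": a bright 8x4 rectangle
augmented = random_shift(batch, max_shift=2, rng=rng)
print(augmented.shape)  # (2, 28, 28)
```

Because the toy digit sits well inside the frame, shifting it by at most 2 pixels moves it without cutting anything off; on real digits, keep `max_shift` small so strokes are not pushed out of the image.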

Finally, let's try the last technique: early stopping.

```python
from keras.callbacks import EarlyStopping

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# Stop once validation accuracy has not improved for 2 epochs in a row
# (newer Keras versions log it as 'val_accuracy'; older ones used 'val_acc')
trained_model_5d = model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size,
                             validation_data=(x_test, y_test),
                             callbacks=[EarlyStopping(monitor='val_accuracy', patience=2)])
```

We can see that the model stopped training after only 5 epochs, because the validation accuracy was no longer improving. It gives good results when we run it with a larger number of epochs. You can think of it as a technique for optimizing the number of epochs.

Conclusion

I hope you now understand regularization and the different techniques for applying it in deep learning models. Whatever deep learning task you are working on, I strongly recommend using regularization. It will help you broaden your horizons and understand the subject better.

You are welcome to visit my blog or follow my WeChat public account so we can learn together: https://home.cnblogs.com/u/Python1234/ (Python Learning Exchange)

You are also welcome to join my thousand-member Python learning and Q&A group: 125240963
