Using CNNs (convolutional neural nets) to detect facial keypoints, tutorial (V): training specialist networks and pre-training


Part 9: Training specialist networks

Remember the 70% of training data that we threw away at the beginning? If we want a competitive score on the Kaggle leaderboard, that is not a good idea: those 70% of samples contain quite a few keypoints that our model never gets to see.
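To see why this matters, here is a minimal sketch (assuming the competition's training.csv from the data introduction part) that counts how many labeled rows each target column actually has; the counts differ a lot between keypoints:

import pandas as pd

df = pd.read_csv('training.csv')
# count the non-missing labels per column; the columns differ widely in
# how many labeled rows they have, which is exactly the data that a
# single all-keypoints model never sees after dropping incomplete rows
print(df.count().sort_values())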

So instead of training only one model, let's train several specialist networks, each of which predicts a different set of targets. We train one model to predict left_eye_center and right_eye_center, another to predict nose_tip, and so on. In the end we have six models, which lets us take full advantage of the training data and hopefully gets us better predictions.

All six specialist networks use the same network architecture. Because total training time becomes very long, it pays to have a strategy that does not wait until max_epochs is reached, but stops training early once the validation error stops improving. This strategy is called early stopping, and we will implement it as another on_epoch_finished callback. Here is the implementation:

class EarlyStopping(object):
    def __init__(self, patience=100):
        self.patience = patience
        self.best_valid = np.inf
        self.best_valid_epoch = 0
        self.best_weights = None

    def __call__(self, nn, train_history):
        current_valid = train_history[-1]['valid_loss']
        current_epoch = train_history[-1]['epoch']
        if current_valid < self.best_valid:
            self.best_valid = current_valid
            self.best_valid_epoch = current_epoch
            self.best_weights = nn.get_all_params_values()
        elif self.best_valid_epoch + self.patience < current_epoch:
            print("Early stopping.")
            print("Best valid loss was {:.6f} at epoch {}.".format(
                self.best_valid, self.best_valid_epoch))
            nn.load_params_from(self.best_weights)
            raise StopIteration()

As you can see, there are two branches inside __call__: the first handles the case where the current validation error is better than the best we have seen so far; the second fires when the epoch with the best validation error is more than patience epochs behind the current epoch. In the first branch we save the network's weights:

self.best_weights = nn.get_all_params_values()

In the second branch, we restore the network's weights to the values saved at the best validation error, and then raise a StopIteration to tell NeuralNet that we want to stop training.

nn.load_params_from(self.best_weights)
raise StopIteration()

Let's update the on_epoch_finished handlers in the network definition and add early stopping:

net8 = NeuralNet(
    # ...
    on_epoch_finished=[
        AdjustVariable('update_learning_rate', start=0.03, stop=0.0001),
        AdjustVariable('update_momentum', start=0.9, stop=0.999),
        EarlyStopping(patience=200),
        ],
    # ...
    )
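As a reminder, AdjustVariable is the callback from an earlier part of this tutorial; roughly, it linearly anneals a parameter (such as the learning rate) from start to stop over the course of max_epochs:

import numpy as np

class AdjustVariable(object):
    def __init__(self, name, start=0.03, stop=0.001):
        self.name = name
        self.start, self.stop = start, stop
        self.ls = None

    def __call__(self, nn, train_history):
        if self.ls is None:
            # precompute a linear schedule across all epochs
            self.ls = np.linspace(self.start, self.stop, nn.max_epochs)
        epoch = train_history[-1]['epoch']
        new_value = np.float32(self.ls[epoch - 1])
        # the named attribute is a Theano shared variable on the net
        getattr(nn, self.name).set_value(new_value)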

So far so good, but how do we define these specialist networks so that each makes its own set of predictions? Let's make a list:

SPECIALIST_SETTINGS = [
    dict(
        columns=(
            'left_eye_center_x', 'left_eye_center_y',
            'right_eye_center_x', 'right_eye_center_y',
            ),
        flip_indices=((0, 2), (1, 3)),
        ),

    dict(
        columns=(
            'nose_tip_x', 'nose_tip_y',
            ),
        flip_indices=(),
        ),

    dict(
        columns=(
            'mouth_left_corner_x', 'mouth_left_corner_y',
            'mouth_right_corner_x', 'mouth_right_corner_y',
            'mouth_center_top_lip_x', 'mouth_center_top_lip_y',
            ),
        flip_indices=((0, 2), (1, 3)),
        ),

    dict(
        columns=(
            'mouth_center_bottom_lip_x', 'mouth_center_bottom_lip_y',
            ),
        flip_indices=(),
        ),

    dict(
        columns=(
            'left_eye_inner_corner_x', 'left_eye_inner_corner_y',
            'right_eye_inner_corner_x', 'right_eye_inner_corner_y',
            'left_eye_outer_corner_x', 'left_eye_outer_corner_y',
            'right_eye_outer_corner_x', 'right_eye_outer_corner_y',
            ),
        flip_indices=((0, 2), (1, 3), (4, 6), (5, 7)),
        ),

    dict(
        columns=(
            'left_eyebrow_inner_end_x', 'left_eyebrow_inner_end_y',
            'right_eyebrow_inner_end_x', 'right_eyebrow_inner_end_y',
            'left_eyebrow_outer_end_x', 'left_eyebrow_outer_end_y',
            'right_eyebrow_outer_end_x', 'right_eyebrow_outer_end_y',
            ),
        flip_indices=((0, 2), (1, 3), (4, 6), (5, 7)),
        ),
    ]

We discussed the importance of flip_indices for data augmentation earlier. In the data introduction part, our load2d() function also takes an optional cols parameter that lets us extract only certain columns. We will use both of these features in fit_specialists(), which trains the specialist networks:

import pickle
from collections import OrderedDict
from sklearn.base import clone

def fit_specialists():
    specialists = OrderedDict()

    for setting in SPECIALIST_SETTINGS:
        cols = setting['columns']
        X, y = load2d(cols=cols)

        model = clone(net)
        model.output_num_units = y.shape[1]
        model.batch_iterator_train.flip_indices = setting['flip_indices']
        # set number of epochs relative to number of training examples:
        model.max_epochs = int(1e7 / y.shape[0])
        if 'kwargs' in setting:
            # an optional 'kwargs' in the settings list is used to
            # set any other parameter of the net:
            vars(model).update(setting['kwargs'])

        print("Training model for columns {} for {} epochs".format(
            cols, model.max_epochs))
        model.fit(X, y)
        specialists[cols] = model

    with open('net-specialists.pickle', 'wb') as f:
        # we persist a dictionary with all models:
        pickle.dump(specialists, f, -1)

There is nothing surprising here: we train a series of models and store them in a dictionary. Even with early stopping in place, training all six specialists takes about half a day on a single GPU, so I don't recommend running this yourself.
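Once trained, the specialists are used together at prediction time. Here is a minimal sketch of how that could look; the predict_with_specialists() helper is hypothetical, not part of the tutorial's code, and assumes the dictionary written by fit_specialists() above:

from collections import OrderedDict

def predict_with_specialists(specialists, X):
    # hypothetical helper: each specialist predicts only its own columns,
    # and we stitch the per-column predictions back together
    y_pred = OrderedDict()
    for cols, model in specialists.items():
        pred = model.predict(X)
        for i, col in enumerate(cols):
            y_pred[col] = pred[:, i]
    return y_pred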

Running on multiple GPUs would certainly be faster, but it is still extravagant. The next part describes a way to reduce training time; first, let's take a look at the results of these expensively trained models.

[Figure: learning curves of the six specialist models. Solid lines show the RMSE (root mean square error) on the validation set; dashed lines show the training set error. mean is the validation error averaged over all models, weighted by the number of targets each model predicts. All curves are scaled to the same length on the x-axis.]
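For concreteness, here is a minimal sketch of how such a weighted mean could be computed; it assumes the net-specialists.pickle written above, nolearn's train_history_ attribute, and the 48-pixel target scaling used throughout this series (targets were scaled to [-1, 1]):

import pickle
import numpy as np

with open('net-specialists.pickle', 'rb') as f:
    specialists = pickle.load(f)

losses, weights = [], []
for cols, model in specialists.items():
    # the best validation loss is an MSE on targets scaled to [-1, 1];
    # multiply the RMSE by 48 to get back to pixel units
    best_valid = min(h['valid_loss'] for h in model.train_history_)
    losses.append(np.sqrt(best_valid) * 48)
    weights.append(len(cols))

print("weighted mean validation RMSE: {:.4f}".format(
    np.average(losses, weights=weights)))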

Part 10: Supervised pre-training

This part of the tutorial discusses a way to make training the specialist networks faster. The idea is to initialize a specialist's weights with the weights of the trained net6 or net7, instead of random values. If you remember our early stopping implementation, copying weights from one network to another is simple: just use the load_params_from() method. Let's change fit_specialists() to do exactly that; the lines marked with a # ! comment are new:

def fit_specialists(fname_pretrain=None):
    if fname_pretrain:  # !
        with open(fname_pretrain, 'rb') as f:  # !
            net_pretrain = pickle.load(f)  # !
    else:  # !
        net_pretrain = None  # !

    specialists = OrderedDict()

    for setting in SPECIALIST_SETTINGS:
        cols = setting['columns']
        X, y = load2d(cols=cols)

        model = clone(net)
        model.output_num_units = y.shape[1]
        model.batch_iterator_train.flip_indices = setting['flip_indices']
        model.max_epochs = int(4e6 / y.shape[0])
        if 'kwargs' in setting:
            # an optional 'kwargs' in the settings list is used to
            # set any other parameter of the net:
            vars(model).update(setting['kwargs'])

        if net_pretrain is not None:  # !
            # if a pretrain model is given, use it to initialize the
            # weights of our new specialist model:
            model.load_params_from(net_pretrain)  # !

        print("Training model for columns {} for {} epochs".format(
            cols, model.max_epochs))
        model.fit(X, y)
        specialists[cols] = model

    with open('net-specialists.pickle', 'wb') as f:
        # this time we're persisting a dictionary with all models:
        pickle.dump(specialists, f, -1)
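With this in place, pre-training is just a matter of passing in a pickle of the trained net (assuming, say, net6 was saved as 'net6.pickle' in an earlier part):

fit_specialists('net6.pickle')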

It turns out that initializing with the weights of a well-trained network instead of random values has two practical benefits: training converges faster, about four times faster in this case, and the network generalizes better, because pre-training acts as a regularizer. Here are the same learning curves as before, this time for the specialists trained with pre-training:

[Figure: learning curves of the six specialist models with pre-training.]

In the end, this solution scores an RMSE of 2.13 on the leaderboard.

Part 11: Conclusions

By now you probably have a dozen ideas of your own that you want to try. You can find the source code for the tutorial's final program and start experimenting. The code also includes the function that generates a Kaggle submission file; run python kfkd.py to find out how to use the script from the command line.

There is a whole bunch of obvious improvements you could make: try optimizing each specialist network on its own. Looking at the six networks, you can see that the models overfit to different degrees (a minimal sketch of how to check this follows below). If a model shows almost no overfitting, like the ones with the green or yellow curves, you could try decreasing the amount of dropout; if it overfits badly, increase the dropout.
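One way to observe these different degrees of overfitting is to compare the final training and validation losses per specialist; a minimal sketch, assuming the specialists dictionary loaded from the pickle as above:

for cols, model in specialists.items():
    last = model.train_history_[-1]
    # a valid/train ratio well above 1 suggests overfitting (add dropout);
    # a ratio near or below 1 suggests the net could use less regularization
    print("{}: train={:.5f}  valid={:.5f}  ratio={:.2f}".format(
        cols[0], last['train_loss'], last['valid_loss'],
        last['valid_loss'] / last['train_loss']))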

In the definition of SPECIALIST_SETTINGS we can add settings for a particular network. Say we wanted to add more regularization to the second network (the one for nose_tip); we could change its entry like so:

    dict(
        columns=(
            'nose_tip_x', 'nose_tip_y',
            ),
        flip_indices=(),
        kwargs=dict(dropout2_p=0.3, dropout3_p=0.4),  # !
        ),

There are all sorts of other places where you could try to improve things: maybe add a convolutional layer or a fully connected layer? I look forward to hearing what works for you.
