Common tricks of deep learning

Source: Internet
Author: User
This article collects tips on implementing and tuning convolutional networks for image processing, and on reading the related papers. In my experience some of these techniques are quite effective, and most can be found in the documentation or source code of open-source libraries and then simply called.

1. Build a validation set. A dataset may not come with a validation set, so split one off from the given training set at a fixed ratio (e.g. 9:1); a split sketch follows below.

2. Augment the training data. To train the network better it is sometimes necessary to enlarge the original dataset. Common methods [1]: flip images horizontally; apply random crops, scaling, and rotation; use PCA to alter the intensity of the RGB values, i.e. compute the eigenvalues and eigenvectors of the RGB covariance, then add to the image the eigenvectors scaled by their corresponding eigenvalues and by a random number drawn from a Gaussian with mean 0 and standard deviation 0.1 [2]. A sketch of this PCA color augmentation also follows below.
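A minimal sketch of the split in point 1, assuming the data fits in memory as NumPy arrays (the names X, y, and train_val_split are mine, not from the original article):

```python
import numpy as np

def train_val_split(X, y, val_ratio=0.1, seed=0):
    """Shuffle, then hold out val_ratio of the samples as a validation set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = int(len(X) * val_ratio)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]

# toy usage: 1000 samples split 9:1
X = np.random.rand(1000, 3, 32, 32).astype(np.float32)
y = np.random.randint(0, 10, size=1000)
X_tr, y_tr, X_va, y_va = train_val_split(X, y)
print(X_tr.shape, X_va.shape)  # (900, 3, 32, 32) (100, 3, 32, 32)
```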
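And a sketch of the PCA color augmentation in point 2. For simplicity this version computes the RGB covariance per image, whereas the AlexNet recipe [3] computes it once over the whole training set; the function name and the assumed [0, 1] pixel range are mine:

```python
import numpy as np

def pca_color_augment(img, alpha_std=0.1, rng=None):
    """AlexNet-style PCA color noise on one HxWx3 image with values in [0, 1]."""
    rng = rng or np.random.default_rng()
    flat = img.reshape(-1, 3)
    cov = np.cov(flat, rowvar=False)            # 3x3 covariance of the RGB channels
    eigval, eigvec = np.linalg.eigh(cov)        # eigenvalues and eigenvectors
    alpha = rng.normal(0.0, alpha_std, size=3)  # Gaussian, mean 0, std 0.1
    shift = eigvec @ (alpha * eigval)           # sum_i alpha_i * lambda_i * v_i
    return np.clip(img + shift, 0.0, 1.0)

img = np.random.rand(224, 224, 3)               # stand-in for a real image
aug = pca_color_augment(img)
```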
3. Preprocessing. Commonly subtract the mean and divide by the standard deviation, or rescale features to -1~1. This matters mainly when features live on different scales: in house price prediction, for example, the floor area and the number of bedrooms are not of the same order of magnitude, so each feature dimension needs to be normalized. Another method is PCA whitening. In image processing, however, it is usually enough to subtract a mean image (or a per-channel mean) and compute on the result directly. A normalization sketch follows below.

4. Weight initialization. Do not initialize all weights to 0: that makes most of the updates Δw identical. Instead draw from a Gaussian or uniform distribution. Such a distribution, however, makes the variance of the output grow with the number of input units, so divide by the square root of the fan-in (the number of inputs); see the second sketch below.

5. Convolution tricks. Make the input image size a power of 2 or a multiple of 32, such as 32, 64, 96, or 224. Use 3*3 or 5*5 kernels. Pad the input with zeros around the border (padding) so the spatial size is preserved: since the output size is (input - kernel + 2*padding)/stride + 1, at stride 1 a 5*5 kernel needs padding 2 (2 pixels added on every side) and a 3*3 kernel needs padding 1.
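A minimal sketch of the per-dimension normalization in point 3, using the house price example; all names and the toy numbers are made up for illustration:

```python
import numpy as np

def standardize(X, eps=1e-8):
    """Zero-mean, unit-variance scaling of each feature dimension."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / (std + eps), mean, std

# toy housing features: [area in m^2, bedrooms] live on very different scales
X = np.array([[120.0, 3.0], [85.0, 2.0], [240.0, 5.0], [60.0, 1.0]])
X_norm, mu, sigma = standardize(X)
print(X_norm.mean(axis=0), X_norm.std(axis=0))  # ~0 and ~1 per column
```

For images, the same idea reduces to subtracting the mean image or the per-channel mean, as noted above.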
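And a sketch of the fan-in-scaled initialization in point 4, with a quick check that the output variance stays close to the input variance (the layer sizes are arbitrary):

```python
import numpy as np

def init_weights(fan_in, fan_out, rng=None):
    """Gaussian init divided by sqrt(fan_in) to keep the output variance stable."""
    rng = rng or np.random.default_rng(0)
    return rng.normal(0.0, 1.0, size=(fan_in, fan_out)) / np.sqrt(fan_in)

# unit-variance inputs through a 512 -> 256 layer
x = np.random.default_rng(1).normal(size=(10000, 512))
W = init_weights(512, 256)
print(x.var(), (x @ W).var())  # both close to 1; without the sqrt(fan_in)
                               # the output variance would be ~512x larger
```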
6. Pooling layer tricks. The pooling layer can also help prevent overfitting. Use overlapped pooling, i.e. a stride smaller than the pooling window so the pooled regions overlap, but the pooling window should not exceed 3. Max pooling works better than average pooling. A sketch follows below.

7. Avoid overfitting. Dropout helps [1], generally placed after the fully connected layers [3], though it slows convergence. Regularization also helps: L2 penalizes peaked (large) weights, while L1 drives weights to 0, producing sparse solutions that tend to select the useful inputs. Alternatively, impose an upper bound (3 or 4) on the norm of each weight vector and renormalize w whenever it is updated (a max-norm constraint); see the second sketch below.
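A minimal sketch of the overlapped pooling in point 6, assuming PyTorch: a window of 3 with stride 2, so neighboring windows overlap:

```python
import torch
import torch.nn as nn

# overlapping max pooling: stride < kernel_size
pool = nn.MaxPool2d(kernel_size=3, stride=2)

x = torch.randn(1, 64, 56, 56)  # (batch, channels, height, width)
print(pool(x).shape)            # torch.Size([1, 64, 27, 27])
```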
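And a sketch combining the anti-overfitting tools in point 7, again in PyTorch: dropout after a fully connected layer, L2 regularization via the optimizer's weight_decay, and a hand-rolled max-norm constraint (the helper apply_max_norm is my own):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(32 * 32 * 3, 512),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # dropout after the fully connected layer
    nn.Linear(512, 10),
)

# weight_decay implements the L2 penalty on the weights
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)

def apply_max_norm(module, max_norm=3.0):
    """Clamp each unit's incoming weight vector to an L2 norm of at most max_norm."""
    for layer in module.modules():
        if isinstance(layer, nn.Linear):
            with torch.no_grad():
                norms = layer.weight.norm(dim=1, keepdim=True).clamp(min=1e-12)
                layer.weight.mul_(norms.clamp(max=max_norm) / norms)

# call apply_max_norm(model) after each optimizer step
```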
8. Fine-tuning from pretrained parameters. Use the parameters of a pretrained network as initial values, then fine-tune; you can keep the parameters of the earlier layers fixed and adjust only the later ones. When fine-tuning, consider the image size and the correlation with the original dataset: if the correlation is high, retrain only the last output layer; the less related the data, the more layers you should fine-tune. A freezing sketch follows below.

9. Learning rate. Set the initial value to 0.1, then at certain stages of training divide it by 2 or by 5, decreasing step by step. Adding momentum [2] lets the network converge faster. If the number of nodes is increased, lower the learning rate; if the number of layers is increased, lower the learning rate of the later layers. A schedule sketch also follows below.
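A minimal freezing sketch for point 8, assuming torchvision 0.13 or later: load an ImageNet-pretrained ResNet-18, freeze everything, and retrain only a new last layer (the 10-class head is an arbitrary example; for less related data, unfreeze more of the later layers):

```python
import torch.nn as nn
from torchvision import models

# load pretrained parameters as the initial values
net = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# keep the parameters of the earlier layers unchanged
for p in net.parameters():
    p.requires_grad = False

# replace and train only the last layer for the new task
net.fc = nn.Linear(net.fc.in_features, 10)
```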
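And a sketch of the step-wise schedule with momentum from point 9, assuming PyTorch; the stand-in parameter and the 30-epoch step size are illustrative choices:

```python
import torch

params = [torch.nn.Parameter(torch.randn(10, 10))]  # stand-in for model.parameters()
opt = torch.optim.SGD(params, lr=0.1, momentum=0.9)

# divide the learning rate by 5 every 30 epochs: 0.1 -> 0.02 -> 0.004 -> ...
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=30, gamma=0.2)

for epoch in range(90):
    # ... forward pass, loss.backward(), opt.step(), opt.zero_grad() ...
    sched.step()
```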
10. Activation functions. The sigmoid saturates as an activation function and its gradient vanishes: near output values 0 and 1 the gradient is close to 0 (this can be seen from the shape of the sigmoid curve). Rectified Linear Units (ReLU) can be used instead and train faster, but ReLU is more fragile: if a large gradient changes a unit's weights a lot, that unit's activation may stay at 0 forever (a "dead" unit). There are improved versions such as PReLU, which learns the negative slope as a parameter, and RReLU, which samples it randomly; the effect of the leaky version (slope in the 0 to 0.01 range) is not very stable. A small gradient demo follows below.

11. Observe training by plotting. For different hyperparameter settings, plot how the metrics on the training and test sets change and analyze the trends to see which parameter values are appropriate: loss against epoch for different learning rates (x-axis epoch, y-axis loss or accuracy), and likewise loss against epoch for different batch sizes. A plotting sketch also follows below.
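A small demo of the saturation argument in point 10, assuming PyTorch: sigmoid gradients vanish for large |x|, while ReLU passes gradient 1 for positive inputs and exactly 0 otherwise, which is how units die:

```python
import torch

x = torch.tensor([-10.0, -1.0, 0.0, 1.0, 10.0], requires_grad=True)

torch.sigmoid(x).sum().backward()
print(x.grad)  # ~4.5e-05 at x = +/-10: the saturated ends kill the gradient

x.grad = None
torch.relu(x).sum().backward()
print(x.grad)  # tensor([0., 0., 0., 1., 1.]): zero gradient wherever x <= 0

leaky = torch.nn.LeakyReLU(negative_slope=0.01)  # leaky variant, fixed small slope
prelu = torch.nn.PReLU()                         # PReLU: the slope is a learned parameter
```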
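And a plotting sketch for point 11 with matplotlib; the recorded losses are fabricated purely to show the axes layout:

```python
import matplotlib.pyplot as plt

# hypothetical per-epoch training losses for two learning rates
history = {
    "lr=0.1":  [2.3, 1.1, 0.8, 0.7, 0.65],
    "lr=0.01": [2.3, 1.9, 1.5, 1.2, 1.0],
}

for label, losses in history.items():
    plt.plot(range(1, len(losses) + 1), losses, label=label)
plt.xlabel("epoch")  # horizontal axis: epoch
plt.ylabel("loss")   # vertical axis: loss (or accuracy)
plt.legend()
plt.show()
```

The same plot with different batch sizes in the legend works identically.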
12. Imbalanced datasets. If the dataset is similar to ImageNet, the fine-tuning method can be used directly. If it is not close, first consider rebalancing the number of samples per class: reduce the data of the larger classes (undersampling) and replicate the data of the smaller classes (oversampling); a resampling sketch follows the references.

The above are techniques I have used when implementing convolutional neural networks or have seen mentioned in articles. Most of them are already implemented in current open-source software and can be called directly.

References:
[1] http://cs231n.stanford.edu/
[2] http://lamda.nju.edu.cn/weixs/project/cnntricks/cnntricks.html
[3] http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf
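Finally, the resampling sketch for point 12 in NumPy: oversample each smaller class with replacement until it matches the largest one (undersampling is the mirror image; all names here are illustrative):

```python
import numpy as np

def oversample_to_balance(X, y, rng=None):
    """Replicate minority-class samples until every class matches the largest."""
    rng = rng or np.random.default_rng(0)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    idx = []
    for c in classes:
        c_idx = np.flatnonzero(y == c)
        idx.append(rng.choice(c_idx, size=n_max, replace=True))
    idx = np.concatenate(idx)
    rng.shuffle(idx)
    return X[idx], y[idx]

X = np.random.rand(110, 8)
y = np.array([0] * 100 + [1] * 10)        # 10:1 class imbalance
Xb, yb = oversample_to_balance(X, y)
print(np.unique(yb, return_counts=True))  # both classes now have 100 samples
```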
