Deep Learning Assistant

Tags: theano

This is a tidy-up of an article on how to tune network parameters. The beginning is translated fairly directly, and later on I have added some understanding of my own; a few parts I was not sure about I have left out, so if anything seems off, check the original article, which is linked at the end. My level is limited, so please point out any mistakes.


Get data: make sure you have enough high-quality input-output pairs, and that they are representative.

Sometimes such a large dataset is not available. For example, when training character recognition on the MNIST database, it is easy to reach a 98% recognition rate, yet in real-world use the results are not nearly as good. If you dump the database images to disk and look at them, you will find that nearly all the characters sit in the center of the image, are almost the same size, and have no large rotations, so these images are not representative enough; the dataset needs to be expanded by some means. Conversely, some of the data may actually harm training, and if there is a way to remove such data, that should also help.

Preprocessing: it is very effective to center the data so that it has zero mean and unit variance in each dimension. When the input values span a very wide range, it is better to apply log(1 + x) to the data first.
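As a rough sketch of this step in plain NumPy (the array X and its shape are made up for illustration):

```python
import numpy as np

# X: (n_samples, n_features) training data; shape and values are illustrative.
X = np.random.rand(1000, 784).astype(np.float32)

# For inputs with a very wide range, compress it first with log(1 + x).
X = np.log1p(X)

# Zero-center each dimension and scale it to unit variance.
mean = X.mean(axis=0)
std = X.std(axis=0) + 1e-8   # guard against division by zero on constant features
X = (X - mean) / std

# Remember to apply the training-set mean/std to validation and test data too.
```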

The MNIST example that ships with Theano only scales the input x to floating-point values in [0, 1], so one could object to that, but this kind of step really is necessary. I remember that in face recognition, log(1 + x) was applied to reduce the effect of lighting, presumably by compressing high brightness values; but when training a convolutional neural network this step did not improve things much, probably because the network may already be performing an equivalent transformation internally without our knowing. Even so, trying to incorporate some features you have designed yourself can sometimes give better results.

Choosing the minibatch size: although in practice a minibatch size of 1 can give better results and less overfitting, we trade that off for faster training, so a larger minibatch is generally chosen. For example, the MNIST example uses a minibatch size of 128; for larger datasets you can increase it further, or raise it according to how much GPU memory is available.
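A minimal sketch of minibatch iteration (my own NumPy illustration, not the Theano example's code):

```python
import numpy as np

def iterate_minibatches(X, y, batch_size=128, shuffle=True):
    """Yield (inputs, targets) minibatches; 128 matches the MNIST example.
    The final incomplete batch, if any, is simply dropped."""
    idx = np.arange(len(X))
    if shuffle:
        np.random.shuffle(idx)
    for start in range(0, len(X) - batch_size + 1, batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]
```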

Gradient normalization: you do not have to adjust the learning rate during training; changing the minibatch size (and normalizing the gradient by it) changes the effective gradient instead. (The title and the body of this item do not quite agree in the original.)
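My reading of this item, as an illustrative sketch: divide the summed gradient by the minibatch size, so that the update scale stays the same when the minibatch size changes:

```python
import numpy as np

batch_size = 128
learning_rate = 0.1

w = np.zeros(10)                                      # illustrative parameters
per_example_grads = np.random.randn(batch_size, 10)   # stand-in for real gradients

# Normalizing by the batch size keeps the gradient's scale independent of
# batch_size, so the learning rate does not need retuning when it changes.
grad = per_example_grads.sum(axis=0) / batch_size
w -= learning_rate * grad
```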

Adjusting the learning rate: start from a standard learning rate, then decrease it slowly.

A typical learning rate is 0.1. It works in most of the situations described here, but sometimes a smaller value is appropriate depending on the case.

Use a validation set for cross-validation: hold part of the training set out of training to check how well training is going, and adjust the learning rate, or stop training, according to the result. If training no longer improves the results on the validation set, try reducing the learning rate to 1/2 or 1/5 of its value. Another method is to track the ratio between the size of each weight update and the size of the weights themselves: around 10^-3 is good; if it is much smaller, learning will be slow, and if it is much larger, training will be unstable.
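A sketch of both heuristics: lowering the learning rate when the validation score stalls, and watching the update-to-weight ratio. All names and thresholds here are illustrative:

```python
import numpy as np

def adjust_learning_rate(lr, val_losses, factor=0.5):
    """Reduce the learning rate once the validation loss stops improving
    (factor=0.5 gives lr/2; factor=0.2 would give lr/5)."""
    if len(val_losses) >= 2 and val_losses[-1] >= min(val_losses[:-1]):
        return lr * factor
    return lr

def update_ratio(w, update):
    """Ratio of update size to weight size; around 1e-3 is healthy.
    Much smaller means slow learning, much larger means instability."""
    return np.linalg.norm(update) / (np.linalg.norm(w) + 1e-8)
```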

Weight initialization: initialize the weights randomly at the start.

The initialization method mentioned in the original is not especially clear to me, so I have not written it out. It does say that a shallow network will work with a fairly simple initialization, but a deep network will probably not work at all without a good one; so if your network model is not working, try re-initializing the weights before rushing to change the architecture. The Theano examples initialize the weights uniformly in ±sqrt(6/(fan_in + fan_out)), where fan_in is the number of units in the previous layer and fan_out is the number of units in the next layer.
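That formula is the Glorot-style uniform initialization used in the Theano tutorials; a NumPy sketch:

```python
import numpy as np

def init_weights(fan_in, fan_out, rng=np.random):
    """Uniform initialization in +/- sqrt(6 / (fan_in + fan_out))."""
    bound = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-bound, bound, size=(fan_in, fan_out)).astype(np.float32)

W = init_weights(784, 500)   # e.g. a 784-unit input layer into 500 hidden units
```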

Gradient checking: if you are not using Theano or Torch, you need to check your gradients numerically; I remember this from Andrew Ng's machine learning course. Since I use Theano, I have not had to pay attention to this when programming.
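For anyone computing gradients by hand, a minimal centered finite-difference check (my own sketch, in the style taught in that course):

```python
import numpy as np

def check_gradient(f, grad_f, w, eps=1e-5):
    """Compare an analytic gradient grad_f against finite differences of f."""
    num_grad = np.zeros_like(w)
    for i in range(w.size):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus.flat[i] += eps
        w_minus.flat[i] -= eps
        num_grad.flat[i] = (f(w_plus) - f(w_minus)) / (2 * eps)
    return np.max(np.abs(num_grad - grad_f(w)))   # should be tiny, e.g. < 1e-7

# Example: f(w) = sum(w^2) has gradient 2w, so the check returns ~0.
w = np.random.randn(5)
print(check_gradient(lambda v: np.sum(v ** 2), lambda v: 2 * v, w))
```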

Expanding the dataset: for images you can add rotations and flips; for speech you can add random noise, among other methods. The effect is very noticeable.
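A sketch of simple image augmentation with flips and small rotations, using scipy.ndimage (the angle range is illustrative, and note that flips are unsuitable for digits like those in MNIST):

```python
import numpy as np
from scipy.ndimage import rotate

def augment(img, rng=np.random):
    """Return a randomly flipped and slightly rotated copy of a 2-D image."""
    out = img
    if rng.rand() < 0.5:
        out = np.fliplr(out)              # horizontal flip
    angle = rng.uniform(-15, 15)          # small random rotation, in degrees
    return rotate(out, angle, reshape=False, mode="nearest")
```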

Using dropout: a dropout layer is good at preventing overfitting, but one obvious drawback is that it slows down training. Do not forget to turn dropout off at test time and to multiply the weights accordingly (namely by 1 − the dropout probability). At test time, remember to remove the dropout layer; I tried removing it but did not feel it made much difference, so perhaps I did not do it correctly.
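A sketch of the two equivalent conventions in NumPy: drop during training and rescale the weights by (1 − p) at test time, or use "inverted" dropout, which rescales at training time so nothing changes at test time:

```python
import numpy as np

def dropout_train(a, p=0.5, rng=np.random):
    """Inverted dropout: zero each unit with probability p and rescale the
    survivors by 1/(1-p), so test-time activations need no adjustment."""
    mask = (rng.rand(*a.shape) >= p) / (1.0 - p)
    return a * mask

# With the classic convention instead, you would drop without rescaling during
# training and multiply the outgoing weights by (1 - p) at test time.
```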

Ensembling: train several classifiers and then average their results. I am a little unsure about this one and have not used it; I will complete it once I understand it better.
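As I understand it, the simplest version averages the predicted class probabilities of several independently trained models; a sketch, where predict_proba is an assumed per-model method:

```python
import numpy as np

def ensemble_predict(models, X):
    """Average class probabilities over the ensemble, then pick the best class."""
    probs = np.mean([m.predict_proba(X) for m in models], axis=0)
    return probs.argmax(axis=1)
```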


It suddenly occurred to me that regularization is not mentioned here; I am just noting that today and will add it later.


Reference: http://yyue.blogspot.com/2015/01/a-brief-overview-of-deep-learning.html (the link may not open from every network). This is not a full translation; it includes some of my own understanding.

