Deep Learning Network Debugging Skills _01

Source: Internet
Author: User
Tags: theano

From Alchemy Lab: https://zhuanlan.zhihu.com/p/20792837

Neural network code is harder to debug than ordinary code. Compared with compile errors and outright run-time crashes, the tricky thing about neural networks is that the program often runs normally, yet the results fail to converge, and that can be much more troublesome to track down. Based on my everyday experience debugging neural networks, I have summarized some fairly general debugging techniques below; in a follow-up article I will write specifically about how to debug Theano. I hope this helps you debug your own networks.

What to do if you encounter NaN

Most people have run into NaN problems at some point. They are generally caused by one of the following:

1. Division by zero. There are actually two possibilities here: the dividend is infinite or already NaN, or the divisor is 0. A NaN or 0 produced earlier in the computation is likely to be passed along and produce NaNs downstream. Check every place in the network where a division can occur, such as the Softmax layer, and then examine the data carefully. While helping others debug, I have even seen training data files in which some values were NaN; once training started, as soon as a NaN sample was read in, everything after it became NaN. You can try adding some logging to output the network's intermediate results and see at which step NaN first appears (a minimal detection sketch follows this list; how to handle this in Theano is described later).

2. The gradient is too large, so the updated values become NaN. RNNs in particular are prone to gradient explosion when the sequence is long. There are generally a few remedies:
   - Clip the gradient (gradient clipping): limit the maximum gradient norm. Concretely, compute value = sqrt(w1^2 + w2^2 + ...); if value exceeds a threshold, multiply the gradient by a decay coefficient so that its norm equals the threshold. Typical thresholds are 5, 10, or 15 (see the clipping sketch after this list).
   - Reduce the learning rate. An initial learning rate that is too large can also cause this problem. Note that even when training with an adaptive-learning-rate algorithm such as Adam, you can still run into a learning rate that is too high; such algorithms generally have a learning-rate parameter that you can make smaller.

3. The initial parameter values are too large, which can also lead to NaN. It is best to normalize the inputs and outputs as well. For details, see my earlier article: Deep Learning Personal Alchemy Experience - Alchemy Laboratory - Zhihu column.
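
As a starting point for the logging suggestion above, here is a minimal numpy sketch (not from the original article) that checks an array for NaN/Inf and shows a numerically stable Softmax that avoids the overflow and divide-by-zero cases; the file name train.txt is only an illustration:

```python
import numpy as np

def check_finite(name, array):
    """Raise if an intermediate result or data batch contains NaN or Inf,
    so the first step that produces NaN can be located."""
    if not np.all(np.isfinite(array)):
        raise ValueError("%s contains NaN or Inf" % name)

def stable_softmax(x):
    """Softmax with the row maximum subtracted first, so exp() cannot
    overflow and the denominator cannot become 0 for the largest entry."""
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

# Example: scan a training-data file for bad values before training starts.
# "train.txt" is a hypothetical whitespace-separated data file.
data = np.loadtxt("train.txt")
check_finite("training data", data)
```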
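
And a minimal numpy sketch of the clipping rule described above, assuming the gradients are held as a list of numpy arrays:

```python
import numpy as np

def clip_gradients(grads, threshold=5.0):
    """Global-norm gradient clipping: value = sqrt(w1^2 + w2^2 + ...).
    If value exceeds the threshold, scale every gradient by
    threshold / value so that the global norm equals the threshold."""
    value = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if value > threshold:
        grads = [g * (threshold / value) for g in grads]
    return grads
```
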
What to do if the neural network can't learn

Perhaps you never encountered a NaN problem, or you have already solved it, and the network trains normally, but the cost does not go down and the predictions are abnormal. Print out the cost on the training set and the cost on the test set and look at their trends; a minimal monitoring sketch follows this paragraph. In the normal case, the training-set cost keeps declining and finally flattens out or oscillates within a small range, while the test-set cost declines first and then starts to oscillate or slowly rise. If the training-set cost does not drop at all, the code may have a bug, the data may have a problem (bad values, preprocessing errors, etc.), or the hyperparameters (network size, number of layers, learning rate, etc.) may be set unreasonably.
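
A minimal sketch of this kind of monitoring, assuming you supply two callables of your own that run one training epoch and evaluate the test-set cost (neither comes from the original article):

```python
def track_costs(train_one_epoch, evaluate_cost, num_epochs=50):
    """Record the trends described above: the training cost should fall and
    then flatten, while the test cost falls first and later oscillates or
    slowly rises. Both arguments are callables returning a scalar cost."""
    train_costs, test_costs = [], []
    for epoch in range(num_epochs):
        train_costs.append(train_one_epoch())
        test_costs.append(evaluate_cost())
        print("epoch %d  train cost %.4f  test cost %.4f"
              % (epoch, train_costs[-1], test_costs[-1]))
    return train_costs, test_costs
```
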
Manually construct 10 examples and train the neural network on them repeatedly to see whether the cost drops. If it does not drop, the network code probably has a bug and needs careful inspection. If the cost does drop, run predictions on those 10 examples and check whether the results match expectations; if they do, the network itself is probably normal, and you can then check whether the hyperparameters and the data have problems (a minimal sketch of this sanity check follows below). If you wrote all of the neural network code yourself, it is strongly recommended to do a gradient check to make sure there are no errors in the gradient calculation (see the second sketch below). Start with the simplest network, and do not just look at the cost: also look at what the network's predictions actually are, and make sure it produces the expected results. For example, for a language-model experiment, first try a single-layer RNN; if one RNN layer behaves normally, try an LSTM, and then go further to a multi-layer LSTM. If possible, feed in a fixed input, compute the correct output of every step by hand, and then check whether each step of the neural network produces the same result.
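
A minimal sketch of the 10-example sanity check, assuming hypothetical train_step and predict callables for your own model:

```python
def overfit_tiny_dataset(train_step, predict, xs, ys, steps=1000):
    """Train repeatedly on ~10 hand-constructed examples. The cost should
    approach 0; if it does not, suspect a bug in the network code."""
    for step in range(steps):
        cost = train_step(xs, ys)        # one gradient update, returns the cost
        if step % 100 == 0:
            print("step %d  cost %.6f" % (step, cost))
    print("predictions:", predict(xs))   # compare by eye against ys
```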
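
And a minimal numpy sketch of a gradient check, comparing the analytic gradient against a centered finite difference; f and grad_f stand in for your own cost and gradient functions:

```python
import numpy as np

def gradient_check(f, grad_f, w, eps=1e-5):
    """Compare the analytic gradient grad_f(w) against a centered finite
    difference of the cost f(w), one parameter at a time."""
    analytic = np.asarray(grad_f(w), dtype=float)
    numeric = np.zeros_like(analytic)
    for i in range(w.size):
        old = w.flat[i]
        w.flat[i] = old + eps
        plus = f(w)
        w.flat[i] = old - eps
        minus = f(w)
        w.flat[i] = old                  # restore the parameter
        numeric.flat[i] = (plus - minus) / (2.0 * eps)
    rel_err = np.abs(analytic - numeric) / np.maximum(
        1e-8, np.abs(analytic) + np.abs(numeric))
    return rel_err.max()
```

A maximum relative error around 1e-7 usually indicates a correct gradient; anything much above 1e-4 is suspect.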

References

http://russellsstewart.com/notes/0.html
