Constant network accuracy in TensorFlow, and the weight-initialization NaN problem


I recently got started with deep learning because a project involves some mobile development work. After listening to suggestions from a few friends, I finally settled on TensorFlow as my deep-learning platform. Over the last two days I implemented VGGNet in TensorFlow, following the VGGNet demo on the TFLearn official site, but when training it on the 17flowers dataset I found that no matter how many iterations ran, the accuracy and the loss stayed at roughly constant values; in other words, the network did not converge. At first this was very confusing: I had followed the official demo, so how could this happen? The first step was to print the intermediate values, like this:

for i in range(1000):  # iteration count was garbled in the source; 1000 is a typical value
    batch_xs, batch_ys = mnist.train.next_batch(100)  # batch size likewise garbled; 100 assumed
    sess.run(train_step, feed_dict={xs: batch_xs, ys: batch_ys, keep_prob: 0.5})

    print("loss:", sess.run(cross_entropy, feed_dict={xs: batch_xs, ys: batch_ys, keep_prob: 0.5}))
    print(cross_entropy)  # note: this prints the Tensor object itself, not its value
    if i % 50 == 0:  # reporting interval also garbled in the source; 50 assumed
        print(compute_accuracy(mnist.test.images, mnist.test.labels))
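To also check whether the weights themselves have gone NaN, a quick sweep over all trainable variables works. This is only a sketch: it assumes the live tf.Session named sess from the loop above, with NumPy imported as np.

import numpy as np
import tensorflow as tf

# Scan every trainable variable for NaNs after a training step.
for var in tf.trainable_variables():
    value = sess.run(var)
    if np.isnan(value).any():
        print("NaN in", var.name, "shape", value.shape)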

The loss printed here is the value of the cross-entropy loss, but the result showed that it was NaN. So I traced backwards and printed the values of the weights and biases, and they were NaN as well. I then googled the question and found that many people have in fact run into this problem. On StackOverflow, one explanation goes like this:

Actually, it turned out to be something stupid. I'm posting this in case anyone else runs into a similar error.

cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv))

is actually a horrible way of computing the cross-entropy. In some samples, certain classes can be excluded with certainty after a while, resulting in y_conv = 0 for that sample. That's normally not a problem, since you're not interested in those, but with cross_entropy written as above it yields 0 * log(0) for that particular sample/class. Hence the NaN.

Replacing it with

cross_entropy = -tf.reduce_sum(y_ * tf.log(tf.clip_by_value(y_conv, 1e-10, 1.0)))

solved all my problems.
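The failure mode described in that answer is easy to reproduce with plain NumPy. The values below are toy numbers, not the actual network outputs: a sample whose softmax output is exactly 0 for one class puts a 0 * log(0) term into the sum.

import numpy as np

y_ = np.array([0.0, 1.0])      # one-hot label
y_conv = np.array([0.0, 1.0])  # prediction: the excluded class got exactly 0

# 0 * log(0) is undefined, so the whole loss becomes NaN ...
print(-np.sum(y_ * np.log(y_conv)))                       # nan
# ... while clipping keeps every log argument strictly positive.
print(-np.sum(y_ * np.log(np.clip(y_conv, 1e-10, 1.0))))  # -0.0 (finite)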

So I tried it, and the NaN indeed went away. But I only partly understood this answer. It seems to mean that for some samples, the forward pass produces an output of exactly 0 at the final layer, so log(0) makes the result NaN. Another commenter then pointed out that the clipping approach is not ideal: once a value reaches the threshold, the gradient through it is blocked during backpropagation. Instead, a small constant can be added directly inside the log, so the predicted value can never be exactly 0:

cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv + 1e-10))
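The objection about gradients can be checked directly with tf.gradients. This is a small sketch with made-up values, not the original network: where the clip is active, the gradient through tf.clip_by_value is exactly zero, while the version with the added constant still propagates one.

import tensorflow as tf

y = tf.constant([1e-12, 0.5])  # the first entry lies below the 1e-10 clipping threshold

grad_clip = tf.gradients(tf.log(tf.clip_by_value(y, 1e-10, 1.0)), y)[0]
grad_eps = tf.gradients(tf.log(y + 1e-10), y)[0]

with tf.Session() as sess:
    print(sess.run(grad_clip))  # [0.0, 2.0]: zero gradient where clipping kicked in
    print(sess.run(grad_eps))   # [~1e10, 2.0]: a gradient survives everywhere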

Although the problem is solved, my understanding of it is not thorough. If any reader understands it thoroughly, please take the trouble to point it out in the comments. Thank you very much.

---------------------------------------- Split Line ----------------------------------------
After I brought this problem up, my advisor was also very interested; he found it remarkable that adding such a tiny value is enough to let the network keep training. Ultimately, though, the cause is that some of the predicted values become mathematically meaningless. My advisor's guess was that, because computing the cross-entropy loss involves exponentials of e (in the softmax), it can cause an arithmetic overflow. The question remains open.
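The overflow suspicion is easy to illustrate with NumPy. The logits below are made up, not the actual VGGNet values: exponentiating a large logit overflows float32 to inf, and a naive softmax then produces NaN, whereas the usual max-subtraction trick stays finite.

import numpy as np

logits = np.array([1000.0, 10.0], dtype=np.float32)

# Naive softmax: np.exp(1000.0) overflows to inf, so the nan comes from inf / inf.
print(np.exp(logits) / np.sum(np.exp(logits)))    # [nan, 0.]

# Stable softmax: subtracting the max keeps every exponent <= 0.
shifted = logits - np.max(logits)
print(np.exp(shifted) / np.sum(np.exp(shifted)))  # approximately [1., 0.]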


The StackOverflow answer is here: http://stackoverflow.com/questions/33712178/tensorflow-nan-bug
