Problem:
While training a network with TensorFlow, I found that the loss was NaN on every batch, so the accuracy stayed at 0.
NaN is an infinity or not-a-number value. It typically appears when a number is divided by 0 or when log(0) is computed, so the first thing to check is whether the loss function takes the log of a network output that can reach exactly 0, which would produce NaN.
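A small NumPy sketch (with made-up values, not the author's actual network outputs) shows the mechanism: log(0) gives -inf, and a typical cross-entropy term then multiplies that -inf by a 0-valued label weight, which yields NaN.

```python
import numpy as np

# A sigmoid output that has saturated to exactly 0.0 for one example.
probs = np.array([0.9, 0.5, 0.0])
labels = np.array([1.0, 1.0, 0.0])

logs = np.log(probs)        # third entry is -inf (log of 0)
loss_terms = labels * logs  # 0.0 * -inf evaluates to NaN

print(np.isinf(logs[2]))        # True
print(np.isnan(loss_terms[2]))  # True
```

Once one term is NaN, any sum or mean over the batch loss is NaN as well, which matches the symptom above.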
Many causes are suggested online, such as a learning rate that is too large, a batch size that is too large, or dirty training data. I tried lowering the learning rate, starting from 0.001 and dividing it by 10 repeatedly, down to 1/1000 of the original value, but the NaN still appeared.
I then reduced the batch_size as well, but that did not solve it either.
Then I came across this blog post: http://blog.sina.com.cn/s/blog_6ca0f5eb0102wr4j.html#cmt_5A0D972D-72F73880-BE365276-926-938
It suggested the cause might be log(0), so I added a clip at every place in the loss function where a log is taken, for example:

tf.log(tf.clip_by_value(tf.sigmoid(self.scores), 1e-8, 1.0))
After adding the clip and continuing training, the loss was no longer NaN.
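The same fix can be sketched in NumPy. This is a minimal illustration with hypothetical logits (the original `self.scores` is the author's variable; the values and labels here are invented): clipping the sigmoid output into [1e-8, 1.0] before the log, mirroring `tf.log(tf.clip_by_value(tf.sigmoid(self.scores), 1e-8, 1.0))`, keeps the loss finite even when the sigmoid saturates to 0.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical logits; the very negative one drives sigmoid() to ~0.
scores = np.array([2.0, 0.0, -50.0])
labels = np.array([1.0, 0.0, 1.0])

# Clip probabilities away from exact 0 (and 1) before taking the log.
probs = np.clip(sigmoid(scores), 1e-8, 1.0)
loss = -np.mean(labels * np.log(probs) +
                (1 - labels) * np.log(np.clip(1 - probs, 1e-8, 1.0)))

print(np.isfinite(loss))  # True: the log never sees an exact 0
```

Note that clipping only masks the symptom; in TensorFlow the more numerically stable approach is to pass the raw logits to `tf.nn.sigmoid_cross_entropy_with_logits` rather than computing sigmoid and log separately.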