"Turn" TensorFlow implementation and application of four cross entropy algorithms

Author: Chen Dihao (CSDN)
Source: http://www.jianshu.com/p/75f7e60dae95
Also published at: http://dataunion.org/26447.html

Introduction to cross-entropy

Cross entropy is a kind of loss function (also called a cost function), used to describe the gap between a model's predicted values and the true values. Another common loss function is the mean squared error (MSE), defined as follows:

MSE = (1/n) · Σᵢ (yᵢ − aᵢ)²

where y is the true label and a is the prediction.


The squared error is easy to understand: subtract the prediction from the true value, take the absolute value or square it to avoid negative numbers, and then average over the samples to get the mean squared error. Note that the prediction here has already gone through the sigmoid activation function, so it lies between 0 and 1.
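For illustration, a minimal NumPy sketch of this computation (the logits and labels are made-up values, not from the original article):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    logits = np.array([2.0, -1.0, 0.5])        # hypothetical W*x outputs
    labels = np.array([1.0, 0.0, 1.0])         # ground-truth 0/1 labels

    a = sigmoid(logits)                        # predictions in (0, 1)
    mse = np.mean((labels - a) ** 2)           # mean squared error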
The squared error can express the gap between the prediction and the true value, but for classification problems it does not work as well as cross entropy (the reasons are discussed in the blog post the original article links to). Cross entropy is defined as follows, from https://hit-scir.gitbooks.io/neural-networks-and-deep-learning-zh_cn/content/chap3/c3s1.html:

C = −(1/n) · Σ [y·ln(a) + (1−y)·ln(1−a)]


The article above also explains why cross entropy can serve as a loss function: first, the value it produces is always positive; second, the more accurate the prediction, the smaller the value. Note that the prediction a here has also gone through sigmoid activation, so it lies between 0 and 1. If the label y is 1 and the prediction a is also 1, the first term −y·ln(a) is −1·ln(1), which equals 0, and the second term −(1−y)·ln(1−a) is −0·ln(0), which is also 0, so the loss is 0. If the label is 1 but the prediction is close to 0, the loss becomes infinitely large. This matches what we expect from a loss function.
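To see this behaviour, a small NumPy check of −[y·ln(a) + (1−y)·ln(1−a)] with made-up predictions (the eps term only guards against log(0)):

    import numpy as np

    def cross_entropy(y, a, eps=1e-12):
        # -[y*ln(a) + (1 - y)*ln(1 - a)]
        return -(y * np.log(a + eps) + (1 - y) * np.log(1 - a + eps))

    print(cross_entropy(1.0, 0.999))   # label 1, near-correct prediction -> loss close to 0
    print(cross_entropy(1.0, 0.001))   # label 1, badly wrong prediction  -> very large loss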

The sigmoid activation function is emphasized repeatedly here because some of these functions cannot be used for multi-label (multi-objective) or multi-class problems. TensorFlow itself provides several implementations of the cross-entropy algorithm.

The cross-entropy functions of TensorFlow

TensorFlow implements four cross-entropy functions for classification problems:
tf.nn.sigmoid_cross_entropy_with_logits, tf.nn.softmax_cross_entropy_with_logits, tf.nn.sparse_softmax_cross_entropy_with_logits and tf.nn.weighted_cross_entropy_with_logits. For details, see the API documentation: https://www.tensorflow.org/versions/master/api_docs/python/nn.html#sparse_softmax_cross_entropy_with_logits

sigmoid_cross_entropy_with_logits in detail

Let's look at sigmoid_cross_entropy_with_logits first. Why? Because its implementation matches the cross-entropy definition above, and it was also the first cross-entropy function implemented in TensorFlow. Its inputs are logits and targets. The logits are the W·x outputs of the neural network model; note that they do not need to be passed through sigmoid first. The targets have the same shape as the logits and hold the ground-truth label values. For example, if the model judges whether each of 100 images contains any of 10 kinds of animals, both tensors have shape [100, 10]. The documentation also notes that the 10 classes are independent and need not be mutually exclusive; we call this a multi-label (multi-objective) problem, where each label row may contain any number of 1s and 0s. There is also another kind of problem, the multi-class problem: for example, age split into 5 buckets, where only one of the 5 values may be 1. Can that problem use this function directly? The answer is no. Let's look at the implementation of sigmoid_cross_entropy_with_logits.
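A hedged usage sketch for the multi-label case just described (TensorFlow 2.x-style calls; the [100, 10] shapes and random data are only placeholders for a real model and dataset):

    import tensorflow as tf

    batch_size, num_targets = 100, 10                 # 100 images, 10 independent animal tags

    # logits are the raw W*x outputs of the model; do not apply sigmoid yourself
    logits = tf.random.normal([batch_size, num_targets])
    # targets/labels have the same shape; entries are 0 or 1 and several may be 1 at once
    labels = tf.cast(tf.random.uniform([batch_size, num_targets]) > 0.5, tf.float32)

    # element-wise loss, shape [100, 10]; reduce it (e.g. mean) for training
    loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
    mean_loss = tf.reduce_mean(loss)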


[Figure: implementation of sigmoid_cross_entropy_with_logits]
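The original post showed a screenshot of the code here. As a stand-in, the following is a minimal NumPy sketch of the formulation the TensorFlow documentation gives for this op: the naive form labels·(−log σ(x)) + (1−labels)·(−log(1−σ(x))), rewritten into a numerically stable equivalent.

    import numpy as np

    def sigmoid_cross_entropy_with_logits_sketch(labels, logits):
        # naive form:  z * -log(sigmoid(x)) + (1 - z) * -log(1 - sigmoid(x))
        # stable form: max(x, 0) - x*z + log(1 + exp(-|x|))
        x, z = logits, labels
        return np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))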


You can see that this is the standard cross-entropy algorithm: the W·x value is passed through sigmoid so that it lies between 0 and 1, and the result is then plugged into the cross-entropy formula to compute the loss. For binary classification this is fine, but for the multi-class case mentioned above, such as age buckets in the range 0 to 4 where the target value is also 0 to 4, the prediction after sigmoid is limited to 0 to 1, the 1−z term in the formula can go negative, and on reflection there is no linear relationship between the encodings 0 to 4 in the first place, so plugging the label values in directly produces very large errors. So for the multi-class problem the labels cannot be used directly. We can be flexible, though: one-hot encode the 5 age buckets into a 5-dimensional label and train them as 5 independent targets, but then nothing guarantees that exactly one output is 1 (a minimal sketch of this workaround follows below). For this kind of problem TensorFlow also provides cross-entropy functions based on softmax.
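A minimal sketch of the one-hot workaround just mentioned (illustrative values only): the 5 age buckets are one-hot encoded and then scored as 5 independent binary targets under the sigmoid-based loss, with nothing forcing exactly one output towards 1.

    import tensorflow as tf

    ages = tf.constant([0, 3, 4, 1])                  # hypothetical age-bucket labels in 0..4
    onehot = tf.one_hot(ages, depth=5)                # shape [4, 5], one 1 per row
    logits = tf.random.normal([4, 5])                 # raw model outputs

    # each of the 5 buckets is treated as an independent binary target
    loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=onehot, logits=logits)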

softmax_cross_entropy_with_logits in detail

The softmax algorithm itself is very simple: raise e to the power of each value, then divide each result by the sum so that the outputs are proportions guaranteed to add up to 1. In general the softmax outputs can be interpreted as confidences, that is, probabilities. The algorithm is implemented as follows.


[Figure: the softmax algorithm]
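A minimal NumPy sketch of softmax as described above (the max-subtraction is a standard numerical-stability trick, not part of the definition):

    import numpy as np

    def softmax(x):
        # subtract the max for numerical stability; the result is positive and sums to 1
        e = np.exp(x - np.max(x))
        return e / np.sum(e)

    print(softmax(np.array([1.0, 2.0, 3.0])))   # approx. [0.09, 0.24, 0.67], sums to 1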


softmax_cross_entropy_with_logits is quite different from sigmoid_cross_entropy_with_logits. The inputs are similar in that logits and labels have the same shape, but the classification targets are mutually exclusive, guaranteeing that only one field has a value; for example, a picture in CIFAR-10 can only belong to one class, unlike the earlier case of judging whether it contains several animals. Why this restriction? In the comment at the head of the function we see that the logits passed in must be unscaled: apply neither sigmoid nor softmax, because the function applies softmax internally, which is more efficient. Any input passed through softmax becomes a probability prediction that sums to 1, and that value can be substituted into the adapted cross-entropy formula −y·ln(a) − (1−y)·ln(1−a) to obtain a meaningful loss value. If the problem is multi-label, the softmax output no longer yields several probabilities that sum to 1, and a label containing more than one 1 cannot be used to compute the cross entropy, so this function is only suitable for single-label binary or multi-class problems. The TensorFlow function is defined as follows.


[Figure: definition of softmax_cross_entropy_with_logits]
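As a usage sketch (hedged: TensorFlow 2.x-style calls, with made-up shapes and random data standing in for a real CIFAR-10 pipeline):

    import tensorflow as tf

    batch_size, num_classes = 100, 10                      # e.g. a CIFAR-10-style single-label setup

    logits = tf.random.normal([batch_size, num_classes])   # unscaled scores: no sigmoid, no softmax
    class_ids = tf.random.uniform([batch_size], maxval=num_classes, dtype=tf.int32)
    labels = tf.one_hot(class_ids, depth=num_classes)      # exactly one 1 per row

    # one loss value per example, shape [batch_size]
    loss = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)
    mean_loss = tf.reduce_mean(loss)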


One addition: for multi-class problems, for example age split into our 5 categories manually coded as 0, 1, 2, 3 and 4, the output is a 5-dimensional feature, so we need to one-hot encode the labels ourselves as 00001, 00010, 00100, 01000 and 10000 before they can be used as input to this function. In theory the labels do not have to be strict one-hot vectors: any probability distribution that sums to 1 also works, but it must sum to 1, since the practical meaning of labels that do not sum to 1 is unclear. The TensorFlow C++ implementation plans to check these parameters, which can warn users in advance and avoid misuse.
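A small hedged sketch of the two kinds of labels just described (hard one-hot rows and a soft distribution whose rows still sum to 1); the numbers are illustrative only:

    import tensorflow as tf

    # strict one-hot labels for 5 age buckets coded 0..4
    hard = tf.one_hot([2, 0], depth=5)                     # rows like [0, 0, 1, 0, 0]

    # a soft distribution also works as long as every row sums to 1
    soft = tf.constant([[0.05, 0.05, 0.80, 0.05, 0.05],
                        [0.80, 0.05, 0.05, 0.05, 0.05]])

    logits = tf.random.normal([2, 5])
    loss_hard = tf.nn.softmax_cross_entropy_with_logits(labels=hard, logits=logits)
    loss_soft = tf.nn.softmax_cross_entropy_with_logits(labels=soft, logits=logits)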

sparse_softmax_cross_entropy_with_logits in detail

sparse_softmax_cross_entropy_with_logits is an easier-to-use version of softmax_cross_entropy_with_logits; apart from the form of its input parameters, the functionality and algorithm implementation are the same. As mentioned above, the input of softmax_cross_entropy_with_logits must be a one-hot-like multi-dimensional feature, but for CIFAR-10, ImageNet and most classification scenarios there is only one class label per example, and the label values are integers encoded from 0, so converting them to one-hot every time is tedious. Is there a better way? The answer is to use sparse_softmax_cross_entropy_with_logits. Its first parameter, logits, is the same as before, with shape [batch_size, num_classes]. The second parameter, labels, previously also had to be [batch_size, num_classes], otherwise the cross entropy could not be computed; this function tightens the restriction to shape [batch_size], and the values must be int32 or int64, encoded from 0 and in the range [0, num_classes). If our encoding starts from 1 or uses a step larger than 1, some label values will fall outside this range and the code will error out and exit. This is easy to understand: through these restrictions TensorFlow knows which class a value such as 3, 6 or 9 passed in by the user corresponds to, and internally it efficiently implements something like one-hot encoding. It simply simplifies the user's input; if the user has already done one-hot encoding, the softmax_cross_entropy_with_logits function without the "sparse" can be used directly.
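A hedged usage sketch of the sparse variant (TensorFlow 2.x-style calls; random integer labels stand in for real class ids):

    import tensorflow as tf

    batch_size, num_classes = 100, 10

    logits = tf.random.normal([batch_size, num_classes])   # shape [batch_size, num_classes]
    # integer class ids, shape [batch_size], values in [0, num_classes)
    labels = tf.random.uniform([batch_size], maxval=num_classes, dtype=tf.int64)

    # no manual one-hot encoding needed; returns one loss per example
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)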

weighted_cross_entropy_with_logits in detail

weighted_cross_entropy_with_logits is an extended version of sigmoid_cross_entropy_with_logits; its input parameters and implementation are similar, but it additionally supports a pos_weight parameter, whose purpose is to increase or decrease the loss contributed by positive samples when computing the cross entropy. The implementation principle is very simple: on top of the traditional sigmoid-based cross-entropy algorithm, the term computed from the positive samples is multiplied by a coefficient. The algorithm is implemented as follows.


[Figure: implementation of weighted_cross_entropy_with_logits]
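In place of the screenshot, a hedged usage sketch (TensorFlow 2.x keyword names; in older releases the first argument was called targets, and pos_weight=2.0 is an arbitrary illustrative value):

    import tensorflow as tf

    logits = tf.random.normal([100, 10])
    labels = tf.cast(tf.random.uniform([100, 10]) > 0.5, tf.float32)

    # pos_weight > 1 increases the penalty on positive samples (useful under class imbalance);
    # pos_weight < 1 decreases it
    loss = tf.nn.weighted_cross_entropy_with_logits(labels=labels, logits=logits, pos_weight=2.0)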

Summary

These are the cross-entropy function implementations that TensorFlow currently provides. Users need to understand the difference between multi-label and multi-class scenarios and choose an implementation based on sigmoid or softmax according to the business requirement, that is, whether the classification targets are independent or mutually exclusive. The sigmoid-based loss now also has a weighted implementation, and when using softmax we can either do the one-hot encoding ourselves or use the more convenient sparse_softmax_cross_entropy_with_logits function.

The cross-entropy functions provided by TensorFlow basically cover multi-label and multi-class problems, but for a scenario that is both multi-label and multi-class, softmax_cross_entropy_with_logits certainly cannot be used, and if we use sigmoid_cross_entropy_with_logits we treat the multi-class features as independent, when in fact exactly one of them is 1 and they are not independent, so the computed loss is not as effective as with softmax. It is foreseeable that the TensorFlow community will implement more ops to solve problems like this in the future, and we hope more people will take part in contributing algorithms and code to TensorFlow!


