Sigmoid Cross Entropy Loss

1. Cross Entropy Error
The mathematics behind cross entropy (CE) error and its relationship to NN training is fairly complex but, fortunately, the result is remarkably simple to understand and implement. CE error is best explained by example. Suppose you have just three training items with the following computed outputs and target outputs:

    computed outputs   target outputs
    0.1  0.3  0.6      0  0  1
    0.2  0.6  0.2      0  1  0
    0.3  0.4  0.3      1  0  0

In the demo, a 4-7-3 NN (4 input, 7 hidden, 3 output nodes) is instantiated and then trained using the back-propagation algorithm in conjunction with cross entropy error. After training completed, the NN model correctly predicted the species of 96.67 percent (0.9667) of the test items.

Using a winner-takes-all evaluation technique, the NN predicts the first two data items correctly because the positions of the largest computed outputs match the positions of the 1 values in the target outputs, but the NN is incorrect on the third data item. The mean (average) squared error for this data is the sum of the squared errors divided by three. The squared error for the first item is (0.1 - 0)^2 + (0.3 - 0)^2 + (0.6 - 1)^2 = 0.01 + 0.09 + 0.16 = 0.26. Similarly, the squared error for the second item is 0.04 + 0.16 + 0.04 = 0.24, and the squared error for the third item is 0.49 + 0.16 + 0.09 = 0.74. The mean squared error is (0.26 + 0.24 + 0.74) / 3 = 0.41.
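
To make the arithmetic concrete, the few lines of plain Python below (using the three items from the table above) reproduce the same squared errors and their mean:

    computed = [[0.1, 0.3, 0.6],
                [0.2, 0.6, 0.2],
                [0.3, 0.4, 0.3]]
    targets  = [[0, 0, 1],
                [0, 1, 0],
                [1, 0, 0]]

    # Squared error per item, then the mean over the three items.
    squared_errors = [sum((c - t) ** 2 for c, t in zip(cs, ts))
                      for cs, ts in zip(computed, targets)]
    print([round(se, 2) for se in squared_errors])   # [0.26, 0.24, 0.74]
    print(round(sum(squared_errors) / 3, 2))         # 0.41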

Notice that, in some sense, the NN predicted the first and second items with the same accuracy, because for both items the computed output that corresponds to the target output of 1 is 0.6. But observe that the squared errors are different (0.26 and 0.24), because all three outputs contribute to the sum.

The mean (average) CE error for the three items is the sum of the CE errors divided by three. Expressed as a formula, the CE error for a single item is CE = -Σ_j t_j * ln(o_j), where o_j is the j-th computed output and t_j is the corresponding target output.

In words this means, "Add up the product of the log to the base e of each computed output times its corresponding target output, and then take the negative of this sum." For the three items above, the CE of the first item is -(ln(0.1)*0 + ln(0.3)*0 + ln(0.6)*1) = -(0 + 0 - 0.51) = 0.51. The CE of the second item is -(ln(0.2)*0 + ln(0.6)*1 + ln(0.2)*0) = -(0 - 0.51 + 0) = 0.51. The CE of the third item is -(ln(0.3)*1 + ln(0.4)*0 + ln(0.3)*0) = -(-1.20 + 0 + 0) = 1.20. So the mean cross entropy error for the three-item data set is (0.51 + 0.51 + 1.20) / 3 = 0.74.
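
The same style of check works for the cross entropy figures (again plain Python, with the values from the example above):

    import math

    computed = [[0.1, 0.3, 0.6],
                [0.2, 0.6, 0.2],
                [0.3, 0.4, 0.3]]
    targets  = [[0, 0, 1],
                [0, 1, 0],
                [1, 0, 0]]

    # CE per item: the negative sum of target * ln(computed output).
    ce_errors = [-sum(t * math.log(c) for c, t in zip(cs, ts))
                 for cs, ts in zip(computed, targets)]
    print([round(ce, 2) for ce in ce_errors])   # [0.51, 0.51, 1.2]
    print(round(sum(ce_errors) / 3, 2))         # 0.74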

Notice that when computing mean cross entropy error with neural networks in situations where the target outputs consist of a single 1 with the remaining values equal to 0, all the terms in the sum except one (the term with a 1 target) vanish because of the multiplication by the 0s. Put another way, cross entropy essentially ignores all computed outputs which don't correspond to a 1 target output. The idea is that when computing error during training, you really don't care how far off the outputs associated with non-1 targets are; you're only concerned with how close the single computed output that corresponds to the target value of 1 is to that value of 1. So, for the three items above, the CEs for the first and second items, which in a sense were predicted with equal accuracy, are both 0.51.

    • Recalling Andrew Ng's ML course treatment of logistic regression, it becomes clear that the logistic regression loss Ng refers to is actually the sigmoid cross entropy loss (note the observation above). Of course, sigmoid cross entropy loss is not only used in such problems; it can also be applied to multi-label learning (see the concept of multi-label learning below).
    • The difference between multi-label learning and traditional single-label learning is that:
      Traditional single-label classification is concerned with learning from a set of examples that are associated with a single label l from a set of disjoint labels L, |L| > 1. In multi-label classification, the examples are associated with a set of labels Y ⊆ L. In the past, multi-label classification was mainly motivated by the tasks of text categorization and medical diagnosis. Nowadays, we notice that multi-label classification methods are increasingly required by modern applications, such as protein function classification, music categorization and semantic scene classification.

2. Calculation of the SigmoidCrossEntropyLoss layer in Caffe
Reference: CaffeCN
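
The detailed calculation from the CaffeCN reference is not reproduced on this page, but the idea can be sketched. The layer takes raw (pre-sigmoid) scores as input, applies the sigmoid internally, and computes an element-wise binary cross entropy in a numerically stable form, which is what makes it suitable for the multi-label setting above where each output is an independent 0/1 label. The Python sketch below only illustrates that formula (the function name and example values are mine; Caffe's actual layer is implemented in C++/CUDA), and in the classic implementation the summed loss is normalized by the batch size:

    import numpy as np

    def sigmoid_cross_entropy_loss(x, t):
        # x: raw scores (logits), shape (batch, labels); t: 0/1 targets.
        # Per element: -[t*log(sigmoid(x)) + (1-t)*log(1-sigmoid(x))],
        # rewritten as max(x, 0) - x*t + log(1 + exp(-|x|)) so that
        # exp() never overflows for large positive or negative x.
        x = np.asarray(x, dtype=np.float64)
        t = np.asarray(t, dtype=np.float64)
        per_element = np.maximum(x, 0) - x * t + np.log1p(np.exp(-np.abs(x)))
        return per_element.sum() / x.shape[0]   # normalize by batch size

    # Example: a batch of two items, each with three independent labels.
    logits  = np.array([[ 2.0, -1.0,  0.5],
                        [-0.3,  1.2, -2.0]])
    targets = np.array([[1, 0, 1],
                        [0, 1, 0]])
    print(sigmoid_cross_entropy_loss(logits, targets))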
