A simple and understandable derivation of the Softmax cross-entropy loss function

This post is transferred from: http://m.blog.csdn.net/qian99/article/details/78046329

Writing out the derivation of Softmax not only helps clarify my own thinking, but may also benefit others. Isn't that nice? ~

In neural networks for classification tasks, Softmax is often attached to the output layer. The key step in backpropagation is taking derivatives, and working through this derivation gives a deeper understanding of the backpropagation process, as well as more insight into how gradients propagate.
Softmax function

The Softmax ("soft maximum") function is generally used as the output layer of a neural network for classification tasks. Its output can be interpreted as the probability of choosing each class: for example, given a classification task with three classes, Softmax turns the relative magnitudes of the three outputs into three class probabilities that sum to 1.

The Softmax function has the following form:

$$a_i = \frac{e^{z_i}}{\sum_k e^{z_k}}$$

where $z_i$ is the input to the $i$-th output neuron and $a_i$ is the corresponding Softmax output. These symbols are used throughout the derivation below.
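To make the formula concrete, here is a minimal NumPy sketch (my own illustration, not code from the original post; the input values are made up). Subtracting the maximum before exponentiating is a standard stability trick: the shift cancels between numerator and denominator, so the result is unchanged, but overflow is avoided.

```python
import numpy as np

def softmax(z):
    """Softmax: a_i = exp(z_i) / sum_k exp(z_k), computed stably."""
    e = np.exp(z - np.max(z))  # shift by max(z); cancels out, avoids overflow
    return e / e.sum()

z = np.array([1.0, 2.0, 3.0])  # made-up logits
a = softmax(z)
print(a)        # [0.09003057 0.24472847 0.66524096]
print(a.sum())  # 1.0 -- a valid probability distribution
```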
Loss function

Backpropagation in a neural network requires a loss function, which expresses the error between the true values and the network's estimates; once the error is known, we know how to modify the network's weights.
The loss function can take many forms. Here we use the cross-entropy function, mainly because its derivative turns out to be simple and easy to compute, and because cross-entropy alleviates the slow-learning problem that some other loss functions suffer from. The cross-entropy function is:

$$C = -\sum_i y_i \ln a_i$$

Here, $y_i$ represents the true classification result (the label for class $i$), and $a_i$ is the Softmax output from above.
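Continuing the sketch above (again my own illustration, reusing the softmax and the output a from the previous snippet), the cross-entropy with a one-hot label simply picks out $-\ln a_i$ at the true class:

```python
def cross_entropy(a, y):
    """Cross-entropy C = -sum_i y_i * ln(a_i)."""
    eps = 1e-12  # guard against log(0)
    return -np.sum(y * np.log(a + eps))

y = np.array([0.0, 0.0, 1.0])  # one-hot label: the true class is index 2
print(cross_entropy(a, y))     # -ln(0.66524096) ≈ 0.4076
```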

First, we need to be clear about what we are computing: the gradient of the loss with respect to the output $z_i$ of each neuron, namely:

$$\frac{\partial C}{\partial z_i}$$
By the chain rule for composite functions:

$$\frac{\partial C}{\partial z_i} = \sum_j \frac{\partial C}{\partial a_j}\,\frac{\partial a_j}{\partial z_i}$$
Some readers may wonder why this involves $a_j$ rather than just $a_i$. Look again at the Softmax formula: its denominator contains the outputs of all neurons, so every $a_j$, including those with $j \ne i$, depends on $z_i$. All of the $a_j$ must therefore be included in the calculation, and, as shown below, the derivative splits into the two cases $i = j$ and $i \ne j$.
Let us work through the factors one by one. The first is:

$$\frac{\partial C}{\partial a_j} = \frac{\partial\!\left(-\sum_k y_k \ln a_k\right)}{\partial a_j} = -\frac{y_j}{a_j}$$
The second factor is slightly more complicated, so we break it into two cases.
① If $i = j$:

$$\frac{\partial a_j}{\partial z_i} = \frac{\partial}{\partial z_i}\!\left(\frac{e^{z_i}}{\sum_k e^{z_k}}\right) = \frac{e^{z_i}\sum_k e^{z_k} - \left(e^{z_i}\right)^2}{\left(\sum_k e^{z_k}\right)^2} = a_i - a_i^2 = a_i\,(1 - a_i)$$
② If $i \ne j$:

$$\frac{\partial a_j}{\partial z_i} = \frac{\partial}{\partial z_i}\!\left(\frac{e^{z_j}}{\sum_k e^{z_k}}\right) = -\,e^{z_j}\cdot\frac{e^{z_i}}{\left(\sum_k e^{z_k}\right)^2} = -\,a_j\,a_i$$
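The two cases can be written in one line as $\partial a_j/\partial z_i = a_j(\delta_{ij} - a_i)$, so the full Jacobian is $\mathrm{diag}(a) - a\,a^{\top}$. This compact matrix form is my own aside, not from the original post; here is a quick numerical check, continuing the NumPy sketch:

```python
def softmax_jacobian(a):
    """Jacobian J[j, i] = da_j/dz_i: a_i*(1 - a_i) on the diagonal (case ①),
    -a_j*a_i off the diagonal (case ②)."""
    return np.diag(a) - np.outer(a, a)

J = softmax_jacobian(a)
print(np.allclose(np.diag(J), a * (1 - a)))  # True -- matches case ①
print(np.isclose(J[0, 1], -a[0] * a[1]))     # True -- matches case ②
```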
The next step is to combine the two pieces:

$$\frac{\partial C}{\partial z_i} = -\frac{y_i}{a_i}\,a_i(1 - a_i) + \sum_{j \ne i}\left(-\frac{y_j}{a_j}\right)\left(-\,a_j a_i\right) = -y_i + y_i a_i + \sum_{j \ne i} y_j a_i = a_i \sum_j y_j - y_i$$
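This combination is exactly a Jacobian-vector product, which can be checked numerically with the snippets above (a continuation of my sketch, not from the original post):

```python
dC_da = -y / a                          # first factor: dC/da_j = -y_j / a_j
chain = softmax_jacobian(a).T @ dC_da   # sum over j of (dC/da_j) * (da_j/dz_i)
print(np.allclose(chain, a * y.sum() - y))  # True -- matches a_i * sum_j y_j - y_i
```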
The final result looks much simpler. For a classification problem, the given labels $y_i$ are one-hot: exactly one class is 1 and all the others are 0. Hence $\sum_j y_j = 1$, and the gradient reduces to:

$$\frac{\partial C}{\partial z_i} = a_i - y_i$$
Even cleaner: for the true class the gradient is simply the neuron's output minus 1, and for every other class it is just the output itself. Isn't that neat? ~
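To close the loop, the derived gradient $a_i - y_i$ can be verified against a central finite-difference approximation (a final continuation of my sketch; the test values are made up):

```python
def grad_z(z, y):
    """Analytic gradient of cross_entropy(softmax(z), y) with respect to z."""
    return softmax(z) - y

z = np.array([1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 1.0])
eps = 1e-6
numeric = np.array([
    (cross_entropy(softmax(z + eps * np.eye(3)[i]), y)
     - cross_entropy(softmax(z - eps * np.eye(3)[i]), y)) / (2 * eps)
    for i in range(3)
])
print(np.allclose(numeric, grad_z(z, y), atol=1e-6))  # True
```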
