Deep Learning: 13 (Softmax Regression)


Transferred from: http://www.cnblogs.com/tornadomeet/archive/2013/03/22/2975978.html

Author: tornadomeet

Source: http://www.cnblogs.com/tornadomeet

From the earlier post Deep Learning: Four (logistic regression exercise), we know that logistic regression (with suitable features) handles some non-linear classification problems well. However, it only applies to two-class problems, and it gives a probability value along with each classification result. So if we need a similar method (one that outputs both the predicted class and its probability) for multi-class problems, how do we extend it? This post covers the multi-class extension of logistic regression: Softmax regression. The reference material is the web page http://deeplearning.stanford.edu/wiki/index.php/Softmax_Regression

In logistic regression, the hypothesis the system learns is:

h_θ(x) = 1 / (1 + exp(-θ^T x))

The corresponding loss function is:

J(θ) = -(1/m) Σ_{i=1}^{m} [ y_i·log h_θ(x_i) + (1 - y_i)·log(1 - h_θ(x_i)) ]

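The hypothesis and loss above can be sketched in Python/NumPy (an illustration only; the exercise series itself uses Matlab):

```python
import numpy as np

def sigmoid(z):
    # h_theta(x) = 1 / (1 + exp(-theta^T x))
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(theta, X, y):
    # Average negative log-likelihood over m samples.
    # X: (m, n) design matrix, y: (m,) labels in {0, 1}, theta: (n,)
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
```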
As can be seen, for a given sample the system outputs a probability value, which is the probability that the sample belongs to class '1'; since there are only 2 classes, the probability of the other class is simply 1 minus this result. Now suppose we face a multi-class problem with k classes in total. In Softmax regression, the system's hypothesis becomes:

h_θ(x_i) = [ p(y_i = 1 | x_i; θ), p(y_i = 2 | x_i; θ), ..., p(y_i = k | x_i; θ) ]^T
         = (1 / Σ_{j=1}^{k} exp(θ_j^T x_i)) · [ exp(θ_1^T x_i), exp(θ_2^T x_i), ..., exp(θ_k^T x_i) ]^T
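In code, the softmax hypothesis for a single input can be sketched as follows (a NumPy illustration; subtracting the maximum logit is a standard numerical-stability trick and does not change the probabilities, since the shared factor cancels in the ratio):

```python
import numpy as np

def softmax_probs(Theta, x):
    # Theta: (k, n) parameter matrix, one row theta_j per class; x: (n,) input.
    logits = Theta @ x          # theta_j^T x for each class j
    logits -= logits.max()      # shared shift; cancels in the ratio below
    e = np.exp(logits)
    return e / e.sum()          # p(y = j | x; Theta) for j = 1..k
```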

The parameter θ is no longer a column vector but a matrix; each row of the matrix can be regarded as the parameters of the classifier for one class, and there are k rows in total. So the matrix θ can be written in the following form:

θ = [ θ_1^T ; θ_2^T ; ... ; θ_k^T ]   (k rows, one per class)

At this point, the loss function of the system is:

J(θ) = -(1/m) [ Σ_{i=1}^{m} Σ_{j=1}^{k} 1{y_i = j} · log( exp(θ_j^T x_i) / Σ_{l=1}^{k} exp(θ_l^T x_i) ) ]

Here 1{·} is the indicator function: when the expression inside the braces is true the function evaluates to 1, otherwise it evaluates to 0.
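The loss with the indicator function can be sketched as follows (a NumPy illustration; indexing with the label array plays the role of 1{y_i = j}, picking out exactly one log-probability per sample):

```python
import numpy as np

def softmax_loss(Theta, X, y):
    # X: (m, n) samples, y: (m,) integer labels in {0, ..., k-1}, Theta: (k, n).
    m = X.shape[0]
    logits = X @ Theta.T                            # (m, k)
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # 1{y_i = j} selects one term per sample: the log-probability
    # of its true class.
    return -np.mean(log_probs[np.arange(m), y])
```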

Of course, if we want to use gradient descent, Newton's method, or the L-BFGS method to obtain the parameters of the system, we must work out the derivative of the loss function. For Softmax regression, the partial derivative of the loss function is as follows:

∇_{θ_j} J(θ) = -(1/m) Σ_{i=1}^{m} [ x_i · ( 1{y_i = j} - p(y_i = j | x_i; θ) ) ]

Note that ∇_{θ_j} J(θ) is itself a vector: it is the partial derivative with respect to the parameters of class j only, so we need this expression for every one of the k classes. Its l-th element, ∂J(θ)/∂θ_{jl}, is the partial derivative of the loss function with respect to the l-th parameter of class j.
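A sketch of this gradient in NumPy (illustrative; the one-hot matrix Y encodes the indicator 1{y_i = j}):

```python
import numpy as np

def softmax_grad(Theta, X, y):
    # Returns the (k, n) gradient matrix: row j is
    # grad_j = -(1/m) * sum_i x_i * (1{y_i = j} - p(y_i = j | x_i; Theta))
    m = X.shape[0]
    logits = X @ Theta.T
    logits -= logits.max(axis=1, keepdims=True)
    e = np.exp(logits)
    P = e / e.sum(axis=1, keepdims=True)   # (m, k) predicted probabilities
    Y = np.zeros_like(P)
    Y[np.arange(m), y] = 1.0               # one-hot labels: 1{y_i = j}
    return -(Y - P).T @ X / m
```

One quick sanity check: because both Y and P sum to 1 across classes for every sample, the gradient rows sum to the zero vector, a symptom of the redundant parameterization discussed next.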

More interestingly, the optimal parameters of Softmax regression are not unique: given one optimal parameter matrix, if the same vector is subtracted from every row, the resulting loss value is unchanged, which shows the solution is not unique. The mathematical proof is as follows:

p(y_i = j | x_i; θ - ψ) = exp((θ_j - ψ)^T x_i) / Σ_{l=1}^{k} exp((θ_l - ψ)^T x_i)
                        = exp(θ_j^T x_i)·exp(-ψ^T x_i) / ( Σ_{l=1}^{k} exp(θ_l^T x_i)·exp(-ψ^T x_i) )
                        = exp(θ_j^T x_i) / Σ_{l=1}^{k} exp(θ_l^T x_i)

That is, subtracting any vector ψ from every row θ_j leaves all predicted probabilities, and therefore the loss, unchanged.
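This invariance is easy to check numerically (an illustrative NumPy snippet; Theta, psi, and x are arbitrary made-up values):

```python
import numpy as np

def softmax_probs(Theta, x):
    logits = Theta @ x
    logits -= logits.max()
    e = np.exp(logits)
    return e / e.sum()

rng = np.random.default_rng(0)
Theta = rng.normal(size=(3, 4))   # arbitrary parameters, k = 3 classes
psi = rng.normal(size=4)          # arbitrary vector subtracted from every row
x = rng.normal(size=4)

p1 = softmax_probs(Theta, x)
p2 = softmax_probs(Theta - psi, x)   # broadcasting subtracts psi from each row
# p1 and p2 agree: exp(-psi^T x) cancels from numerator and denominator.
```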

So what causes this? At a macro level it can be understood as follows: the loss function here is convex but not strictly convex, that is, the region around a minimum is "flat", so all parameter values in that neighbourhood give the same loss. How can we avoid this problem? Adding a regularization term solves it (for example, with Newton's method, if no regularization is added the Hessian matrix may be singular, causing exactly the situation above; after adding the regularization term the Hessian is invertible). The loss function with the regularization term added is as follows:

J(θ) = -(1/m) [ Σ_{i=1}^{m} Σ_{j=1}^{k} 1{y_i = j} · log( exp(θ_j^T x_i) / Σ_{l=1}^{k} exp(θ_l^T x_i) ) ] + (λ/2) Σ_{i=1}^{k} Σ_{j} θ_{ij}^2

The partial-derivative expression at this point is:

∇_{θ_j} J(θ) = -(1/m) Σ_{i=1}^{m} [ x_i · ( 1{y_i = j} - p(y_i = j | x_i; θ) ) ] + λ·θ_j

What remains is to solve for the parameters with a numerical optimization method. In addition, the mathematical formulas make it clear that Softmax regression is the extension of logistic regression to multiple classes.
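As a minimal end-to-end sketch, batch gradient descent on the regularized loss can be written as follows (illustrative NumPy code on a made-up toy dataset, not the tutorial's Matlab exercise; a bias feature is appended as an all-ones column):

```python
import numpy as np

def fit_softmax(X, y, k, lam=1e-4, lr=0.1, steps=1000):
    # Batch gradient descent on the regularized softmax loss.
    m, n = X.shape
    Theta = np.zeros((k, n))
    Y = np.zeros((m, k))
    Y[np.arange(m), y] = 1.0                      # one-hot labels, 1{y_i = j}
    for _ in range(steps):
        logits = X @ Theta.T
        logits -= logits.max(axis=1, keepdims=True)
        e = np.exp(logits)
        P = e / e.sum(axis=1, keepdims=True)      # predicted probabilities
        grad = -(Y - P).T @ X / m + lam * Theta   # regularized gradient
        Theta -= lr * grad
    return Theta

# Toy data: three well-separated Gaussian clusters (hypothetical example).
rng = np.random.default_rng(1)
k = 3
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
X = np.vstack([rng.normal(c, 0.3, size=(30, 2)) for c in centers])
X = np.hstack([X, np.ones((X.shape[0], 1))])      # append bias feature
y = np.repeat(np.arange(k), 30)

Theta = fit_softmax(X, y, k)
pred = (X @ Theta.T).argmax(axis=1)
```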

The web tutorial also describes the differences between Softmax regression and k separate binary classifiers, and when to use each. The main point, in summary: if the required classes are strictly mutually exclusive, that is, no sample can belong to two classes at the same time, you should use Softmax regression; conversely, if there is some overlap between the classes, you should use k binary classifiers.

Resources:

Deep Learning: Four (logistic regression exercise)

http://deeplearning.stanford.edu/wiki/index.php/Softmax_Regression
