Deep Learning: 13 (Softmax Regression)


Transferred from: http://www.cnblogs.com/tornadomeet/archive/2013/03/22/2975978.html

Author: tornadomeet

Source: http://www.cnblogs.com/tornadomeet

From the earlier post Deep Learning: Four (logistic regression exercise), we know that logistic regression (with suitable features) handles some non-linear classification problems well. However, it only applies to two-class problems, and it gives a probability value along with each classification result. So if we need a similar method (one that outputs both the predicted class and its probability) for multi-class problems, how do we extend it? This post covers the multi-class extension of logistic regression: Softmax regression. The reference material is the web page http://deeplearning.stanford.edu/wiki/index.php/Softmax_Regression

In logistic regression, the hypothesis the system learns is:

h_θ(x) = 1 / (1 + exp(-θ^T x))

The corresponding loss function is:

J(θ) = -(1/m) Σ_{i=1}^{m} [ y_i·log h_θ(x_i) + (1 - y_i)·log(1 - h_θ(x_i)) ]

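The hypothesis and loss above can be sketched in Python/NumPy (an illustration only; the exercise series itself uses Matlab):

```python
import numpy as np

def sigmoid(z):
    # h_theta(x) = 1 / (1 + exp(-theta^T x))
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(theta, X, y):
    # Average negative log-likelihood over m samples.
    # X: (m, n) design matrix, y: (m,) labels in {0, 1}, theta: (n,)
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
```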
As can be seen, for a given sample the system outputs a probability value, which is the probability that the sample belongs to class '1'; since there are only 2 classes, the probability of the other class is simply 1 minus this result. Now suppose we face a multi-class problem with k classes in total. In Softmax regression, the system's hypothesis becomes:

h_θ(x_i) = [ p(y_i = 1 | x_i; θ), p(y_i = 2 | x_i; θ), ..., p(y_i = k | x_i; θ) ]^T
         = (1 / Σ_{j=1}^{k} exp(θ_j^T x_i)) · [ exp(θ_1^T x_i), exp(θ_2^T x_i), ..., exp(θ_k^T x_i) ]^T
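In code, the softmax hypothesis for a single input can be sketched as follows (a NumPy illustration; subtracting the maximum logit is a standard numerical-stability trick and does not change the probabilities, since the shared factor cancels in the ratio):

```python
import numpy as np

def softmax_probs(Theta, x):
    # Theta: (k, n) parameter matrix, one row theta_j per class; x: (n,) input.
    logits = Theta @ x          # theta_j^T x for each class j
    logits -= logits.max()      # shared shift; cancels in the ratio below
    e = np.exp(logits)
    return e / e.sum()          # p(y = j | x; Theta) for j = 1..k
```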

The parameter θ is no longer a column vector but a matrix; each row of the matrix can be regarded as the parameters of the classifier for one class, and there are k rows in total. So the matrix θ can be written in the following form:

θ = [ θ_1^T ; θ_2^T ; ... ; θ_k^T ]   (k rows, one per class)

At this point, the loss function of the system is:

J(θ) = -(1/m) [ Σ_{i=1}^{m} Σ_{j=1}^{k} 1{y_i = j} · log( exp(θ_j^T x_i) / Σ_{l=1}^{k} exp(θ_l^T x_i) ) ]

Here 1{·} is the indicator function: when the expression inside the braces is true the function evaluates to 1, otherwise it evaluates to 0.
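The loss with the indicator function can be sketched as follows (a NumPy illustration; indexing with the label array plays the role of 1{y_i = j}, picking out exactly one log-probability per sample):

```python
import numpy as np

def softmax_loss(Theta, X, y):
    # X: (m, n) samples, y: (m,) integer labels in {0, ..., k-1}, Theta: (k, n).
    m = X.shape[0]
    logits = X @ Theta.T                            # (m, k)
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # 1{y_i = j} selects one term per sample: the log-probability
    # of its true class.
    return -np.mean(log_probs[np.arange(m), y])
```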

Of course, if we want to use gradient descent, Newton's method, or the L-BFGS method to obtain the parameters of the system, we must work out the derivative of the loss function. For Softmax regression, the partial derivative of the loss function is as follows:

∇_{θ_j} J(θ) = -(1/m) Σ_{i=1}^{m} [ x_i · ( 1{y_i = j} - p(y_i = j | x_i; θ) ) ]

Note that ∇_{θ_j} J(θ) is itself a vector: it is the partial derivative with respect to the parameters of class j only, so we need this expression for every one of the k classes. Its l-th element, ∂J(θ)/∂θ_{jl}, is the partial derivative of the loss function with respect to the l-th parameter of class j.
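A sketch of this gradient in NumPy (illustrative; the one-hot matrix Y encodes the indicator 1{y_i = j}):

```python
import numpy as np

def softmax_grad(Theta, X, y):
    # Returns the (k, n) gradient matrix: row j is
    # grad_j = -(1/m) * sum_i x_i * (1{y_i = j} - p(y_i = j | x_i; Theta))
    m = X.shape[0]
    logits = X @ Theta.T
    logits -= logits.max(axis=1, keepdims=True)
    e = np.exp(logits)
    P = e / e.sum(axis=1, keepdims=True)   # (m, k) predicted probabilities
    Y = np.zeros_like(P)
    Y[np.arange(m), y] = 1.0               # one-hot labels: 1{y_i = j}
    return -(Y - P).T @ X / m
```

One quick sanity check: because both Y and P sum to 1 across classes for every sample, the gradient rows sum to the zero vector, a symptom of the redundant parameterization discussed next.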

More interestingly, the optimal parameters of Softmax regression are not unique: given one optimal parameter matrix, if the same vector is subtracted from every row, the resulting loss value is unchanged, which shows the solution is not unique. The mathematical proof is as follows:

p(y_i = j | x_i; θ - ψ) = exp((θ_j - ψ)^T x_i) / Σ_{l=1}^{k} exp((θ_l - ψ)^T x_i)
                        = exp(θ_j^T x_i)·exp(-ψ^T x_i) / ( Σ_{l=1}^{k} exp(θ_l^T x_i)·exp(-ψ^T x_i) )
                        = exp(θ_j^T x_i) / Σ_{l=1}^{k} exp(θ_l^T x_i)

That is, subtracting any vector ψ from every row θ_j leaves all predicted probabilities, and therefore the loss, unchanged.
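This invariance is easy to check numerically (an illustrative NumPy snippet; Theta, psi, and x are arbitrary made-up values):

```python
import numpy as np

def softmax_probs(Theta, x):
    logits = Theta @ x
    logits -= logits.max()
    e = np.exp(logits)
    return e / e.sum()

rng = np.random.default_rng(0)
Theta = rng.normal(size=(3, 4))   # arbitrary parameters, k = 3 classes
psi = rng.normal(size=4)          # arbitrary vector subtracted from every row
x = rng.normal(size=4)

p1 = softmax_probs(Theta, x)
p2 = softmax_probs(Theta - psi, x)   # broadcasting subtracts psi from each row
# p1 and p2 agree: exp(-psi^T x) cancels from numerator and denominator.
```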

So what causes this? At a macro level it can be understood as follows: the loss function here is convex but not strictly convex, that is, the region around a minimum is "flat", so all parameter values in that neighbourhood give the same loss. How can we avoid this problem? Adding a regularization term solves it (for example, with Newton's method, if no regularization is added the Hessian matrix may be singular, causing exactly the situation above; after adding the regularization term the Hessian is invertible). The loss function with the regularization term added is as follows:

J(θ) = -(1/m) [ Σ_{i=1}^{m} Σ_{j=1}^{k} 1{y_i = j} · log( exp(θ_j^T x_i) / Σ_{l=1}^{k} exp(θ_l^T x_i) ) ] + (λ/2) Σ_{i=1}^{k} Σ_{j} θ_{ij}^2

The partial-derivative expression at this point is:

∇_{θ_j} J(θ) = -(1/m) Σ_{i=1}^{m} [ x_i · ( 1{y_i = j} - p(y_i = j | x_i; θ) ) ] + λ·θ_j

What remains is to solve for the parameters with a numerical optimization method. In addition, the mathematical formulas make it clear that Softmax regression is the extension of logistic regression to multiple classes.
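As a minimal end-to-end sketch, batch gradient descent on the regularized loss can be written as follows (illustrative NumPy code on a made-up toy dataset, not the tutorial's Matlab exercise; a bias feature is appended as an all-ones column):

```python
import numpy as np

def fit_softmax(X, y, k, lam=1e-4, lr=0.1, steps=1000):
    # Batch gradient descent on the regularized softmax loss.
    m, n = X.shape
    Theta = np.zeros((k, n))
    Y = np.zeros((m, k))
    Y[np.arange(m), y] = 1.0                      # one-hot labels, 1{y_i = j}
    for _ in range(steps):
        logits = X @ Theta.T
        logits -= logits.max(axis=1, keepdims=True)
        e = np.exp(logits)
        P = e / e.sum(axis=1, keepdims=True)      # predicted probabilities
        grad = -(Y - P).T @ X / m + lam * Theta   # regularized gradient
        Theta -= lr * grad
    return Theta

# Toy data: three well-separated Gaussian clusters (hypothetical example).
rng = np.random.default_rng(1)
k = 3
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
X = np.vstack([rng.normal(c, 0.3, size=(30, 2)) for c in centers])
X = np.hstack([X, np.ones((X.shape[0], 1))])      # append bias feature
y = np.repeat(np.arange(k), 30)

Theta = fit_softmax(X, y, k)
pred = (X @ Theta.T).argmax(axis=1)
```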

The web tutorial also describes the differences between Softmax regression and k separate binary classifiers, and when to use each. The main point, in summary: if the required classes are strictly mutually exclusive, that is, no sample can belong to two classes at the same time, you should use Softmax regression; conversely, if there is some overlap between the classes, you should use k binary classifiers.

Resources:

Deep Learning: Four (logistic regression exercise)

http://deeplearning.stanford.edu/wiki/index.php/Softmax_Regression
