Deeplearning Road (ii): Softmax Regression

1. Softmax regression model

The Softmax regression model is the extension of the logistic regression model to multi-class classification problems (logistic regression solves binary classification problems).

For a training set $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}$, the label now takes one of $k$ values: $y^{(i)} \in \{1, 2, \ldots, k\}$.

For a given test input $x$, we use the hypothesis function to estimate the probability $p(y = j \mid x)$ for each class $j$. In other words, we estimate the probability of each possible classification result. The hypothesis function therefore outputs a $k$-dimensional vector representing these estimated probabilities. The hypothesis function takes the following form:

$$h_\theta(x^{(i)}) = \begin{bmatrix} p(y^{(i)} = 1 \mid x^{(i)}; \theta) \\ p(y^{(i)} = 2 \mid x^{(i)}; \theta) \\ \vdots \\ p(y^{(i)} = k \mid x^{(i)}; \theta) \end{bmatrix} = \frac{1}{\sum_{j=1}^{k} e^{\theta_j^T x^{(i)}}} \begin{bmatrix} e^{\theta_1^T x^{(i)}} \\ e^{\theta_2^T x^{(i)}} \\ \vdots \\ e^{\theta_k^T x^{(i)}} \end{bmatrix}$$

Here $\theta_1, \theta_2, \ldots, \theta_k$ are the parameters of the model. The factor $\frac{1}{\sum_{j=1}^{k} e^{\theta_j^T x^{(i)}}}$ normalizes the distribution, so the sum of all probabilities is 1.
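As a rough illustration of the hypothesis above, here is a minimal NumPy sketch; the function name `softmax_hypothesis` and the $(k, n)$ layout of `Theta` (one row $\theta_j$ per class) are assumptions of this example, not from the original text:

```python
import numpy as np

def softmax_hypothesis(Theta, x):
    """Estimated p(y = j | x; theta) for every class j = 1..k.

    Theta is assumed to be a (k, n) matrix whose row j is theta_j,
    and x an n-dimensional feature vector; both names are illustrative.
    """
    scores = Theta @ x                    # theta_j^T x for each class j
    scores = scores - scores.max()        # shift for numerical stability; result unchanged
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()  # normalize so the k probabilities sum to 1
```

Because of the final normalization, the returned vector always sums to 1, matching the hypothesis above.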

The cost function of Softmax regression is:

$$J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m} \sum_{j=1}^{k} 1\{y^{(i)} = j\}\, \log \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}}\right]$$

where $1\{\cdot\}$ is the indicator function: $1\{\text{a true statement}\} = 1$ and $1\{\text{a false statement}\} = 0$.

The above formula is a generalization of the logistic regression cost function. The logistic regression cost function can be rewritten as:

$$J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m} \sum_{j=0}^{1} 1\{y^{(i)} = j\}\, \log p(y^{(i)} = j \mid x^{(i)}; \theta)\right]$$

It can be seen that the Softmax cost function and the logistic cost function are very similar in form, except that in the Softmax loss function the sum runs over all $k$ possible values of the class label. Note that in Softmax regression the probability of classifying $x^{(i)}$ as class $j$ is:

$$p(y^{(i)} = j \mid x^{(i)}; \theta) = \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}}$$
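The following sketch shows one way the class probabilities and the cost $J(\theta)$ could be computed with NumPy; the helper names, the $(m, n)$ data matrix `X`, and 0-based labels `y` in $\{0, \ldots, k-1\}$ are assumptions made here for illustration:

```python
import numpy as np

def class_probabilities(Theta, X):
    """p(y = j | x^{(i)}; theta) for every example i and class j, as an (m, k) matrix."""
    scores = X @ Theta.T                           # theta_j^T x^{(i)} for all i, j
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    probs = np.exp(scores)
    return probs / probs.sum(axis=1, keepdims=True)

def softmax_cost(Theta, X, y):
    """J(theta): average negative log-probability of each example's true class."""
    m = X.shape[0]
    probs = class_probabilities(Theta, X)
    # Only the true-class term of each example survives the indicator 1{y^{(i)} = j}
    return -np.log(probs[np.arange(m), y]).mean()
```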

Taking derivatives, the gradient of $J(\theta)$ with respect to $\theta_j$ is:

$$\nabla_{\theta_j} J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ x^{(i)} \left( 1\{y^{(i)} = j\} - p(y^{(i)} = j \mid x^{(i)}; \theta) \right) \right]$$

With the above partial derivative formula, we can plug it into gradient descent or other optimization algorithms to minimize $J(\theta)$. For example, in the standard implementation of gradient descent, each iteration performs the update:

$$\theta_j := \theta_j - \alpha \nabla_{\theta_j} J(\theta), \quad j = 1, \ldots, k$$
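A matching sketch of the gradient and one gradient descent update, under the same assumed shapes (the `indicator` matrix plays the role of $1\{y^{(i)} = j\}$):

```python
import numpy as np

def softmax_gradient(Theta, X, y):
    """Gradient of J(theta) with respect to each row theta_j of the (k, n) matrix Theta."""
    m = X.shape[0]
    scores = X @ Theta.T
    scores -= scores.max(axis=1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)      # p(y^{(i)} = j | x^{(i)}; theta)
    indicator = np.zeros_like(probs)
    indicator[np.arange(m), y] = 1.0               # 1{y^{(i)} = j}
    return -(indicator - probs).T @ X / m          # (k, n): one row per theta_j

# One iteration of batch gradient descent with learning rate alpha:
# Theta -= alpha * softmax_gradient(Theta, X, y)
```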

2. Weight decay

In practical applications, to keep the algorithm simple and clear, it is common to keep all the parameters rather than arbitrarily fixing one of them to zero. But then we need to modify the cost function by adding a weight decay term. Weight decay resolves the numerical problems caused by the parameter redundancy of Softmax regression (the model is over-parameterized: subtracting the same vector from every $\theta_j$ leaves the predictions unchanged).

We modify the cost function by adding a weight decay term, which penalizes large parameter values. The cost function now becomes:

$$J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m} \sum_{j=1}^{k} 1\{y^{(i)} = j\}\, \log \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}}\right] + \frac{\lambda}{2} \sum_{i=1}^{k} \sum_{j=0}^{n} \theta_{ij}^2$$

With this weight decay term (for any $\lambda > 0$), the cost function becomes strictly convex, so a unique solution is guaranteed. The Hessian matrix is now invertible, and because the function is convex, gradient descent, L-BFGS, and similar algorithms are guaranteed to converge to the global optimum.

To use an optimization algorithm, we need the derivative of this new cost function:

$$\nabla_{\theta_j} J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ x^{(i)} \left( 1\{y^{(i)} = j\} - p(y^{(i)} = j \mid x^{(i)}; \theta) \right) \right] + \lambda \theta_j$$

By minimizing $J(\theta)$, we obtain a working Softmax regression model.
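Under the same assumed shapes as the earlier sketches, here is a sketch of the weight-decay version of the cost and gradient; for simplicity it penalizes every entry of `Theta`, which is an assumption of this example:

```python
import numpy as np

def softmax_cost_and_grad_decay(Theta, X, y, lam):
    """Regularized cost J(theta) and its gradient, with weight decay strength lam."""
    m = X.shape[0]
    scores = X @ Theta.T
    scores -= scores.max(axis=1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    indicator = np.zeros_like(probs)
    indicator[np.arange(m), y] = 1.0
    data_cost = -np.log(probs[np.arange(m), y]).mean()
    cost = data_cost + 0.5 * lam * np.sum(Theta ** 2)      # + lambda/2 * sum of theta_ij^2
    grad = -(indicator - probs).T @ X / m + lam * Theta    # extra lambda * theta_j term
    return cost, grad
```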

3. Model Selection

Suppose you are developing a music classification application that needs to recognize $k$ types of music. Should you use a Softmax classifier, or should you use logistic regression to build $k$ independent binary classifiers?

The choice depends on whether your categories are mutually exclusive. For example, if your four categories of music are classical, country, rock, and jazz, you can assume each training sample carries exactly one label (i.e. a song can only belong to one of these four types of music). In this case you should use Softmax regression with the number of classes $k = 4$.

If instead your four categories are vocals, dance, movie soundtrack, and pop, then the categories are not mutuallyexclusive; for example, a song can come from a movie soundtrack and also contain vocals. In this case it is more appropriate to build four binary logistic regression classifiers, so that for each new musical piece the algorithm can decide separately whether it belongs to each category.

Now consider a computer vision example, where your task is to classify images into three different categories. (i) Suppose the three categories are: indoor scene, outdoor urban scene, and outdoor wilderness scene. Would you use Softmax regression or three logistic regression classifiers? (ii) Now suppose the three categories are: indoor scene, black-and-white image, and image containing people. Would you choose Softmax regression or multiple logistic regression classifiers?

In the first example, the three categories are mutually exclusive, so the Softmax regression classifier is the more appropriate choice. In the second example, the categories can overlap, so it is more appropriate to build three independent logistic regression classifiers.
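To make the contrast concrete, here is a small illustrative sketch (the score values are made up): a single softmax spreads one unit of probability across the four mutually exclusive genres, while four independent logistic (sigmoid) classifiers can each answer "yes" on their own for overlapping labels:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

scores = np.array([2.0, 0.5, -1.0, 0.1])   # hypothetical theta_j^T x for the four labels

# Mutually exclusive genres (classical / country / rock / jazz): one softmax with k = 4.
softmax_probs = np.exp(scores - scores.max())
softmax_probs /= softmax_probs.sum()
print(softmax_probs, softmax_probs.sum())   # probabilities sum to 1: one distribution over genres

# Overlapping labels (vocals / dance / soundtrack / pop): four independent classifiers.
independent_probs = sigmoid(scores)
print(independent_probs)                    # each in (0, 1); several can be high at once
```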

Reprinted from: http://www.cnblogs.com/Rosanna/p/3865212.html

Reference: http://deeplearning.stanford.edu/wiki/index.php/Softmax_Regression
