Deeplearning Road (ii): Softmax Regression

1. Softmax regression model

The Softmax regression model is the extension of the logistic regression model to multi-class classification problems (logistic regression solves binary classification problems).

For a training set $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}$, the label now takes one of $k$ values: $y^{(i)} \in \{1, 2, \ldots, k\}$.

For a given test input $x$, we use the hypothesis function to estimate the probability $p(y = j \mid x)$ for each class $j$. In other words, we estimate the probability of each possible classification result. The hypothesis function therefore outputs a $k$-dimensional vector representing these estimated probabilities. The hypothesis function takes the following form:

$$h_\theta(x^{(i)}) = \begin{bmatrix} p(y^{(i)} = 1 \mid x^{(i)}; \theta) \\ p(y^{(i)} = 2 \mid x^{(i)}; \theta) \\ \vdots \\ p(y^{(i)} = k \mid x^{(i)}; \theta) \end{bmatrix} = \frac{1}{\sum_{j=1}^{k} e^{\theta_j^T x^{(i)}}} \begin{bmatrix} e^{\theta_1^T x^{(i)}} \\ e^{\theta_2^T x^{(i)}} \\ \vdots \\ e^{\theta_k^T x^{(i)}} \end{bmatrix}$$

Here $\theta_1, \theta_2, \ldots, \theta_k$ are the parameters of the model. The factor $\frac{1}{\sum_{j=1}^{k} e^{\theta_j^T x^{(i)}}}$ normalizes the distribution, so the sum of all probabilities is 1.
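As a rough illustration of the hypothesis above, here is a minimal NumPy sketch; the function name `softmax_hypothesis` and the $(k, n)$ layout of `Theta` (one row $\theta_j$ per class) are assumptions of this example, not from the original text:

```python
import numpy as np

def softmax_hypothesis(Theta, x):
    """Estimated p(y = j | x; theta) for every class j = 1..k.

    Theta is assumed to be a (k, n) matrix whose row j is theta_j,
    and x an n-dimensional feature vector; both names are illustrative.
    """
    scores = Theta @ x                    # theta_j^T x for each class j
    scores = scores - scores.max()        # shift for numerical stability; result unchanged
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()  # normalize so the k probabilities sum to 1
```

Because of the final normalization, the returned vector always sums to 1, matching the hypothesis above.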

The cost function of Softmax regression is:

$$J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m} \sum_{j=1}^{k} 1\{y^{(i)} = j\}\, \log \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}}\right]$$

where $1\{\cdot\}$ is the indicator function: $1\{\text{a true statement}\} = 1$ and $1\{\text{a false statement}\} = 0$.

The above formula is a generalization of the logistic regression cost function. The logistic regression cost function can be rewritten as:

$$J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m} \sum_{j=0}^{1} 1\{y^{(i)} = j\}\, \log p(y^{(i)} = j \mid x^{(i)}; \theta)\right]$$

It can be seen that the Softmax cost function and the logistic cost function are very similar in form, except that in the Softmax loss function the sum runs over all $k$ possible values of the class label. Note that in Softmax regression the probability of classifying $x^{(i)}$ as class $j$ is:

$$p(y^{(i)} = j \mid x^{(i)}; \theta) = \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}}$$
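The following sketch shows one way the class probabilities and the cost $J(\theta)$ could be computed with NumPy; the helper names, the $(m, n)$ data matrix `X`, and 0-based labels `y` in $\{0, \ldots, k-1\}$ are assumptions made here for illustration:

```python
import numpy as np

def class_probabilities(Theta, X):
    """p(y = j | x^{(i)}; theta) for every example i and class j, as an (m, k) matrix."""
    scores = X @ Theta.T                           # theta_j^T x^{(i)} for all i, j
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    probs = np.exp(scores)
    return probs / probs.sum(axis=1, keepdims=True)

def softmax_cost(Theta, X, y):
    """J(theta): average negative log-probability of each example's true class."""
    m = X.shape[0]
    probs = class_probabilities(Theta, X)
    # Only the true-class term of each example survives the indicator 1{y^{(i)} = j}
    return -np.log(probs[np.arange(m), y]).mean()
```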

Taking derivatives, the gradient of $J(\theta)$ with respect to $\theta_j$ is:

$$\nabla_{\theta_j} J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ x^{(i)} \left( 1\{y^{(i)} = j\} - p(y^{(i)} = j \mid x^{(i)}; \theta) \right) \right]$$

With the above partial derivative formula, we can plug it into gradient descent or other optimization algorithms to minimize $J(\theta)$. For example, in the standard implementation of gradient descent, each iteration performs the update:

$$\theta_j := \theta_j - \alpha \nabla_{\theta_j} J(\theta), \quad j = 1, \ldots, k$$
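A matching sketch of the gradient and one gradient descent update, under the same assumed shapes (the `indicator` matrix plays the role of $1\{y^{(i)} = j\}$):

```python
import numpy as np

def softmax_gradient(Theta, X, y):
    """Gradient of J(theta) with respect to each row theta_j of the (k, n) matrix Theta."""
    m = X.shape[0]
    scores = X @ Theta.T
    scores -= scores.max(axis=1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)      # p(y^{(i)} = j | x^{(i)}; theta)
    indicator = np.zeros_like(probs)
    indicator[np.arange(m), y] = 1.0               # 1{y^{(i)} = j}
    return -(indicator - probs).T @ X / m          # (k, n): one row per theta_j

# One iteration of batch gradient descent with learning rate alpha:
# Theta -= alpha * softmax_gradient(Theta, X, y)
```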

2. Weight decay

In practical applications, to keep the algorithm simple and clear, it is common to keep all the parameters rather than arbitrarily fixing one of them to zero. But then we need to modify the cost function by adding a weight decay term. Weight decay resolves the numerical problems caused by the parameter redundancy of Softmax regression (the model is over-parameterized: subtracting the same vector from every $\theta_j$ leaves the predictions unchanged).

We modify the cost function by adding a weight decay term, which penalizes large parameter values. The cost function now becomes:

$$J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m} \sum_{j=1}^{k} 1\{y^{(i)} = j\}\, \log \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}}\right] + \frac{\lambda}{2} \sum_{i=1}^{k} \sum_{j=0}^{n} \theta_{ij}^2$$

With this weight decay term (for any $\lambda > 0$), the cost function becomes strictly convex, so a unique solution is guaranteed. The Hessian matrix is now invertible, and because the function is convex, gradient descent, L-BFGS, and similar algorithms are guaranteed to converge to the global optimum.

To use an optimization algorithm, we need the derivative of this new cost function:

$$\nabla_{\theta_j} J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ x^{(i)} \left( 1\{y^{(i)} = j\} - p(y^{(i)} = j \mid x^{(i)}; \theta) \right) \right] + \lambda \theta_j$$

By minimizing $J(\theta)$, we obtain a working Softmax regression model.
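Under the same assumed shapes as the earlier sketches, here is a sketch of the weight-decay version of the cost and gradient; for simplicity it penalizes every entry of `Theta`, which is an assumption of this example:

```python
import numpy as np

def softmax_cost_and_grad_decay(Theta, X, y, lam):
    """Regularized cost J(theta) and its gradient, with weight decay strength lam."""
    m = X.shape[0]
    scores = X @ Theta.T
    scores -= scores.max(axis=1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    indicator = np.zeros_like(probs)
    indicator[np.arange(m), y] = 1.0
    data_cost = -np.log(probs[np.arange(m), y]).mean()
    cost = data_cost + 0.5 * lam * np.sum(Theta ** 2)      # + lambda/2 * sum of theta_ij^2
    grad = -(indicator - probs).T @ X / m + lam * Theta    # extra lambda * theta_j term
    return cost, grad
```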

3. Model Selection

Suppose you are developing a music classification application that needs to recognize $k$ types of music. Should you use a Softmax classifier, or should you use logistic regression to build $k$ independent binary classifiers?

The choice depends on whether your categories are mutually exclusive. For example, if your four categories of music are classical, country, rock, and jazz, you can assume each training sample carries exactly one label (i.e. a song can only belong to one of these four types of music). In this case you should use Softmax regression with the number of classes $k = 4$.

If instead your four categories are vocals, dance, movie soundtrack, and pop, then the categories are not mutuallyexclusive; for example, a song can come from a movie soundtrack and also contain vocals. In this case it is more appropriate to build four binary logistic regression classifiers, so that for each new musical piece the algorithm can decide separately whether it belongs to each category.

Now consider a computer vision example, where your task is to classify images into three different categories. (i) Suppose the three categories are: indoor scene, outdoor urban scene, and outdoor wilderness scene. Would you use Softmax regression or three logistic regression classifiers? (ii) Now suppose the three categories are: indoor scene, black-and-white image, and image containing people. Would you choose Softmax regression or multiple logistic regression classifiers?

In the first example, the three categories are mutually exclusive, so the Softmax regression classifier is the more appropriate choice. In the second example, the categories can overlap, so it is more appropriate to build three independent logistic regression classifiers.
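To make the contrast concrete, here is a small illustrative sketch (the score values are made up): a single softmax spreads one unit of probability across the four mutually exclusive genres, while four independent logistic (sigmoid) classifiers can each answer "yes" on their own for overlapping labels:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

scores = np.array([2.0, 0.5, -1.0, 0.1])   # hypothetical theta_j^T x for the four labels

# Mutually exclusive genres (classical / country / rock / jazz): one softmax with k = 4.
softmax_probs = np.exp(scores - scores.max())
softmax_probs /= softmax_probs.sum()
print(softmax_probs, softmax_probs.sum())   # probabilities sum to 1: one distribution over genres

# Overlapping labels (vocals / dance / soundtrack / pop): four independent classifiers.
independent_probs = sigmoid(scores)
print(independent_probs)                    # each in (0, 1); several can be high at once
```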

Reprinted from: http://www.cnblogs.com/Rosanna/p/3865212.html

Reference: http://deeplearning.stanford.edu/wiki/index.php/Softmax_Regression
