Stanford CS229 Machine Learning Course Notes III: The Perceptron, Softmax Regression

To wrap up the first four lectures of the course, here are notes on two models that Andrew Ng mentioned in those lectures.

The Perceptron Learning Algorithm

Model:

From the model we can see that the perceptron is very similar to logistic regression, except that the g function of logistic regression is the logistic function (also called the sigmoid function), a continuous curve whose value runs from 0 to 1: as z→∞, g(z)→1, and as z→−∞, g(z)→0.

g(z) = 1/(1 + e^(−z))
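The limiting behavior described above can be checked with a minimal sketch (not code from the notes; the function name is my own):

```python
import math

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)): a continuous curve rising from 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0.0))    # 0.5, the midpoint
print(sigmoid(10.0))   # close to 1 for large positive z
print(sigmoid(-10.0))  # close to 0 for large negative z
```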

The g function of the perceptron, by contrast, is a piecewise function that outputs only 0 or 1:

g(z) = 1 if z ≥ 0, and g(z) = 0 if z < 0
Although the perceptron resembles logistic regression in form, it is difficult to attach a probabilistic interpretation to its predictions, and hard to argue that the perceptron algorithm is derived by maximizing a likelihood function. Nevertheless, Andrew Ng gives a gradient-ascent-style update rule for training the perceptron model (the same form as for logistic regression):

θ_j := θ_j + α (y^(i) − h_θ(x^(i))) x_j^(i)
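That update rule can be sketched in a few lines of plain Python (a minimal illustration, not code from the notes; the AND-style toy dataset below is hypothetical):

```python
def step(z):
    """The perceptron's g: outputs only 0 or 1."""
    return 1 if z >= 0 else 0

def train_perceptron(X, y, lr=1.0, epochs=10):
    """X: feature vectors with a leading 1 for the intercept term."""
    theta = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            h = step(sum(t * xj for t, xj in zip(theta, xi)))
            # theta_j := theta_j + lr * (y - h) * x_j, same form as logistic regression
            theta = [t + lr * (yi - h) * xj for t, xj in zip(theta, xi)]
    return theta

# AND-style toy data: label 1 only when both inputs are 1
X = [[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
y = [0, 0, 0, 1]
theta = train_perceptron(X, y)
preds = [step(sum(t * xj for t, xj in zip(theta, xi))) for xi in X]
print(preds)  # [0, 0, 0, 1]
```

Because the data is linearly separable, the loop stops changing theta once every example is classified correctly.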

Softmax regression

1. Model

From the previous lectures we know that logistic regression, built on the Bernoulli distribution, outputs the probability of a binary variable (e.g. the probability that a user clicks a link, or the probability that a user returns), and thus solves binary classification problems. But when we want multi-class classification (e.g. dividing users into low-, medium-, and high-recharge groups according to how much they will top up in the next month), logistic regression is no longer enough. Here we can use the multinomial distribution, which outputs the probability of each of several possible values, and still construct the model from the generalized-linear-model perspective:
1.1 The multinomial distribution belongs to the exponential family. (If the output variable takes k different values, the corresponding multinomial distribution has k−1 parameters: φ_i denotes the probability of the i-th value occurring, and the last parameter φ_k can be obtained as 1 minus the sum of the first k−1 parameters, so only k−1 parameters are needed.)
For the normal and Bernoulli distributions discussed earlier we had T(y) = y, but here we need to define T(y) differently:

In addition, define:
(T(y))_i denotes the i-th element of the vector T(y); for example, (T(1))_1 = 1 and (T(1))_2 = 0.
1{·} is the indicator function: 1{true} = 1, 1{false} = 0.
So (T(y))_i = 1{y = i}.
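The definitions above can be illustrated with a small helper (a sketch; the function name T and the 1..k class numbering follow the notes' convention, but the code is my own):

```python
def T(y, k):
    """T(y): a (k-1)-dimensional vector with (T(y))_i = 1{y == i}.
    Classes are numbered 1..k; y = k maps to the all-zeros vector."""
    return [1 if y == i else 0 for i in range(1, k)]

print(T(1, 3))  # [1, 0]  -> (T(1))_1 = 1, (T(1))_2 = 0
print(T(2, 3))  # [0, 1]
print(T(3, 3))  # [0, 0]  -> the k-th class needs no coordinate of its own
```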
Thus we can derive the exponential-family form of the multinomial distribution:

1.2 The goal is to predict the expectation of T(y). Since T(y) is a vector, the output will also be a vector of expectations, whose i-th element is:

This corresponds to the probability of each value of the multinomial variable occurring.
1.3 In the exponential-family representation of the multinomial distribution, η is a function of φ; inverting this relationship gives:

This is the softmax function, from which softmax regression takes its name. Finally, substituting η_i = θ_i^T x into the formula, we obtain the softmax regression model:
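The resulting model can be sketched as follows (a minimal illustration, not the notes' code; it assumes classes 1..k with η_k fixed to 0, as in the derivation):

```python
import math

def softmax_probs(thetas, x):
    """Class probabilities under softmax regression.
    thetas holds the k-1 parameter vectors theta_i; eta_k is fixed at 0."""
    etas = [sum(t * xj for t, xj in zip(th, x)) for th in thetas] + [0.0]
    exps = [math.exp(e) for e in etas]
    total = sum(exps)
    return [e / total for e in exps]

# With all parameters at zero, every eta is 0 and all k classes are equiprobable.
probs = softmax_probs([[0.0, 0.0], [0.0, 0.0]], [1.0, 2.0])
print(probs)  # each of the 3 classes gets probability 1/3
```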

2. Strategy

Since this is a probabilistic model, we again maximize the log-likelihood function:

3. Algorithm

The optimal solution can be obtained by gradient ascent. It is worth noting that softmax regression learns a (k−1)×n parameter matrix: one n-dimensional vector θ_i for each of the first k−1 classes.
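A batch gradient-ascent sketch of the whole procedure (my own illustration, not the notes' code; the gradient of the log-likelihood with respect to θ_i is the sum over examples of (1{y = i} − φ_i)·x, and the toy 3-class dataset is hypothetical):

```python
import math

def softmax_probs(thetas, x):
    """Class probabilities; thetas holds k-1 parameter vectors, eta_k = 0."""
    etas = [sum(t * xj for t, xj in zip(th, x)) for th in thetas] + [0.0]
    exps = [math.exp(e) for e in etas]
    total = sum(exps)
    return [e / total for e in exps]

def train_softmax(X, y, k, lr=0.1, epochs=300):
    """Batch gradient ascent on the softmax log-likelihood."""
    n = len(X[0])
    thetas = [[0.0] * n for _ in range(k - 1)]  # the (k-1) x n parameter matrix
    for _ in range(epochs):
        grads = [[0.0] * n for _ in range(k - 1)]
        for x, yi in zip(X, y):
            phi = softmax_probs(thetas, x)
            for i in range(k - 1):
                ind = 1.0 if yi == i + 1 else 0.0  # classes numbered 1..k
                for j in range(n):
                    grads[i][j] += (ind - phi[i]) * x[j]
        for i in range(k - 1):
            thetas[i] = [t + lr * g for t, g in zip(thetas[i], grads[i])]
    return thetas

# Hypothetical toy data: 3 classes separated along one feature (plus intercept).
X = [[1, -2.0], [1, -1.8], [1, -0.1], [1, 0.1], [1, 1.8], [1, 2.0]]
y = [1, 1, 2, 2, 3, 3]
thetas = train_softmax(X, y, k=3)
preds = [max(range(3), key=lambda c: softmax_probs(thetas, x)[c]) + 1 for x in X]
print(preds)
```

Because the log-likelihood is concave in θ, plain gradient ascent with a small enough step size converges; on this separable toy set the predicted classes match the labels.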
