Machine Learning---2. Linear regression and data mining from the maximum likelihood view

http://blog.csdn.net/ppn029012/article/details/8908104
Category: Mathematics, Machine Learning. Posted 2013-05-10 00:34. Tags: MLE, machine learning


1. Review of linear regression

In the previous section, we studied the relationship between house size and house price, using linear regression to fit a linear equation so that it matches the observed house size data as closely as possible.

So our approach to the problem was:

Treat the data as fact

Fit a specific model (e.g. a linear or nonlinear equation) to the data

Here the data is treated as sacred, and the model's job is to match it. The data is the truth; when the error is large, it only shows that the model is not good enough and needs more work to match the data.
2. Linear regression from another angle

Above, the data was treated as fact. Put differently, the data should be a manifestation of some underlying fact: the house price data should be one expression of the relationship between house price and house size. Now let's assume that the relationship between house price and size in Beijing is fixed:

Price = house size * 500

We don't know this relationship, but we have collected 5 data points, each written as (price, size):

(500, 1), (502, 1), (1510, 3), (1120, 2), (1500, 2). You will find that these 5 points do not all follow the relationship. Why? Because the data reflects not only the relationship between house price and house size. It most likely also includes the relationships between house price and the age of the house, its orientation, the neighborhood environment, and many other factors. Some of these factors can be observed and some cannot.
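As a quick check of how far each observation sits from the assumed relationship, here is a minimal sketch. It reads each pair as (price, size), and the helper name `f` is only an illustrative choice.

```python
# (price, size) pairs copied from the text
data = [(500, 1), (502, 1), (1510, 3), (1120, 2), (1500, 2)]

def f(size):
    """The fixed relationship assumed in the text: price = size * 500."""
    return 500 * size

# residual = observed price minus the price the relationship predicts
residuals = [price - f(size) for price, size in data]
print(residuals)
```

Only the first point matches the relationship exactly; the remaining gaps are what the other, unmodelled factors contribute.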

So it is possible, in principle, to predict house prices perfectly and accurately! We would just need to find all the factors that affect house prices.

But finding all the factors that affect house prices is impossible! So we may as well settle for an approximate relationship and treat all the other factors as small noise unrelated to the size of the house. So

$$y = f(x) + \epsilon$$

Here y is the house price, f(x) is the relationship between house price and house size, and $\epsilon$ is some small noise that has nothing to do with the size of the house. Because $\epsilon$ is random, we can represent it with a random variable E:

$$y = f(x) + E$$
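To make the "relationship plus noise" picture concrete, here is a small simulation sketch. The Gaussian noise and its standard deviation of 50 are assumptions chosen only for illustration (the text has not fixed a noise distribution at this point), and `sample_price` is a hypothetical helper name.

```python
import random

def f(x):
    """Assumed underlying relationship: price = size * 500."""
    return 500 * x

def sample_price(x, noise_sd=50.0, rng=random):
    """Draw one observed price: the relationship plus zero-mean noise."""
    return f(x) + rng.gauss(0.0, noise_sd)

rng = random.Random(0)  # fixed seed so the run is repeatable
sizes = [1, 1, 3, 2, 2]
prices = [sample_price(x, rng=rng) for x in sizes]
for x, y in zip(sizes, prices):
    print(x, round(y, 1), "noise:", round(y - f(x), 1))
```

Each simulated price differs from f(x) by a different random amount, which is the behaviour the five real data points show.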




3. Maximum Likelihood

In any case, now that we have a set of (x, y) pairs, we can try to find the most probable f(x) to fit the data.

What does "most probable" mean?

Suppose we have m candidate models f(x). We need to evaluate which model is most likely to have produced the data D = (X, Y). "How likely" is naturally expressed as a probability,

$$P(Y \mid X; \theta)$$

where $\theta$ is the parameter of f(x). If the n data points are independent of one another, then

$$P(Y \mid X; \theta) = \prod_{i=1}^{n} P(y_i \mid x_i; \theta)$$

The following function expresses how likely it is that the model generates the data (X, Y):

$$L(\theta) = \prod_{i=1}^{n} P(y_i \mid x_i; \theta)$$

Because X and Y are already fixed, the only way to maximize the likelihood is to adjust the value of $\theta$.
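To see the likelihood as a function of the parameter while the data stays fixed, here is a minimal sketch. It assumes, purely for illustration, a one-parameter model f(x) = a * x and Gaussian noise with standard deviation 100; `gaussian_logpdf`, `log_likelihood` and `noise_sd` are hypothetical names. It sums log densities, which is the logarithm of the product of per-point probabilities and avoids multiplying many tiny numbers.

```python
import math

# (price, size) pairs from the text
data = [(500, 1), (502, 1), (1510, 3), (1120, 2), (1500, 2)]

def gaussian_logpdf(e, sd):
    """Log density of N(0, sd^2) evaluated at e."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - e ** 2 / (2 * sd ** 2)

def log_likelihood(a, data, noise_sd=100.0):
    """Sum of per-point log densities for the model price = a * size."""
    return sum(gaussian_logpdf(price - a * size, noise_sd)
               for price, size in data)

# Same data, different parameter values: only the parameter moves the likelihood.
for a in (400, 500, 600):
    print(a, log_likelihood(a, data))
```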

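For any single data point $(x_i, y_i)$, we can calculate

$$P(y_i \mid x_i; \theta) = p_E(y_i - f(x_i))$$

where $\theta$ denotes the parameters of f(x) and $p_E$ is the probability density of the noise E. So to calculate the likelihood that a model produces the data, we only need to know the error between the model's predicted value and the actual value, and the distribution of the noise random variable E.

The process of solving the maximum likelihood problem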

With this, the problem can be solved: given the existing data D = (X, Y) and any parameterized model f(x), find the best parameters as follows:

Select a model f(x) and initialize its parameters

Assume a distribution for the noise random variable E (e.g. uniform, Gaussian, ...) and derive the likelihood expression

Compute the likelihood function and adjust the parameters until the likelihood reaches its maximum

The adjustment can use the gradient descent method described in the previous chapter (applied to the negative of the likelihood) or, of course, solve for the extremum directly by setting the derivative to 0.
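The three steps can be sketched as follows. This is an illustration under stated assumptions, not the author's code: the model is the one-parameter f(x) = a * x, the noise is assumed Gaussian with standard deviation 100, and the learning rate and iteration count are arbitrary choices that happen to converge for this data.

```python
# (price, size) pairs from the text
data = [(500, 1), (502, 1), (1510, 3), (1120, 2), (1500, 2)]

NOISE_SD = 100.0  # step 2 (assumption): Gaussian noise with this standard deviation

def grad_log_likelihood(a, data, sd=NOISE_SD):
    """Derivative with respect to a of sum_i log N(y_i - a * x_i; 0, sd^2)."""
    return sum((y - a * x) * x for y, x in data) / sd ** 2

# Step 1: choose the model f(x) = a * x and initialise its parameter.
a = 0.0

# Step 3: follow the derivative uphill until the log-likelihood stops improving.
learning_rate = 100.0
for _ in range(200):
    a += learning_rate * grad_log_likelihood(a, data)

# Alternative from the text: set the derivative to 0 and solve directly.
a_closed_form = sum(y * x for y, x in data) / sum(x * x for _, x in data)

print(a, a_closed_form)
```

Both routes arrive at the same parameter value; the iterative one is useful when the derivative cannot be solved in closed form.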

The likelihood function depends on the choice of model f(x) and on the choice of distribution for the noise random variable E. Below I show how maximum likelihood relates to the two kinds of regression from the earlier chapters: linear regression and logistic regression (classification).
4. Maximum likelihood turns into linear regression

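Here I choose the model f(x) = ax + b, and the noise random variable E follows a normal distribution $N(0, \sigma^2)$ with $\sigma^2 = 2$. For n data points the likelihood is

$$L(a, b) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y_i - (a x_i + b))^2}{2\sigma^2}\right)$$

and taking the logarithm gives

$$\log L(a, b) = -n \log\left(\sqrt{2\pi}\,\sigma\right) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - (a x_i + b))^2$$

To maximize the likelihood, you only need to minimize $\sum_{i=1}^{n} (y_i - (a x_i + b))^2$. Does this formula look familiar? It is the cost function from the earlier linear regression! Linear regression turns out to be just a special case of maximum likelihood.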
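A numerical check of this claim, as a sketch: it fits f(x) = ax + b to the five data points by ordinary least squares and confirms that moving away from that fit lowers the Gaussian log-likelihood. The noise variance of 2 is read from N(0, 2) in the text (taking the second argument as the variance is an assumption), and the function names are illustrative.

```python
import math

# sizes and prices from the five (price, size) pairs in the text
xs = [1, 1, 3, 2, 2]
ys = [500, 502, 1510, 1120, 1500]
SIGMA2 = 2.0  # assumed noise variance

def sse(a, b):
    """Sum of squared errors: the linear regression cost (up to a constant factor)."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))

def log_likelihood(a, b, sigma2=SIGMA2):
    """Gaussian log-likelihood: a constant minus sse / (2 * sigma2)."""
    n = len(xs)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - sse(a, b) / (2 * sigma2)

# Ordinary least squares in closed form
n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n
a_ls = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
        / sum((x - x_mean) ** 2 for x in xs))
b_ls = y_mean - a_ls * x_mean

print(a_ls, b_ls, log_likelihood(a_ls, b_ls))
# Any move away from the least-squares fit lowers the log-likelihood.
print(log_likelihood(a_ls + 1, b_ls), log_likelihood(a_ls, b_ls - 5))
```

Because the log-likelihood is a constant minus a positive multiple of the squared error, the value of the noise variance changes the height of the likelihood but not where its maximum lies.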
5. Maximum likelihood and classification

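Here I choose the sigmoid model $f(x) = \frac{1}{1 + e^{-\theta^T x}}$, where $\theta$ is the parameter vector. The noise distribution is no longer Gaussian but an extremely complicated distribution. Luckily, we can still write down the likelihood expression, because

$$P(y = 1 \mid x; \theta) = f(x)$$

$$P(y = 0 \mid x; \theta) = 1 - f(x)$$

To unify these two equations,

$$P(y \mid x; \theta) = f(x)^{y} \, (1 - f(x))^{1 - y}$$

So for n independent data points,

$$L(\theta) = \prod_{i=1}^{n} f(x_i)^{y_i} \, (1 - f(x_i))^{1 - y_i}$$

$$\log L(\theta) = \sum_{i=1}^{n} \left[ y_i \log f(x_i) + (1 - y_i) \log(1 - f(x_i)) \right]$$

Finally, we can see that the log-likelihood is the negative of the cost function in logistic regression, up to a positive constant factor. So maximizing the likelihood is equivalent to minimizing the cost function in logistic regression.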
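The equivalence can be checked numerically. The sketch below assumes the model is the sigmoid used in logistic regression with a single feature, f(x) = sigmoid(a * x + b); the data points and parameter values are made up for illustration and do not come from the original post.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy data: one feature x and a binary label y
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [0, 0, 1, 0, 1, 1]

def likelihood(a, b):
    """Product over all points of f(x)^y * (1 - f(x))^(1 - y)."""
    result = 1.0
    for x, y in zip(xs, ys):
        p = sigmoid(a * x + b)
        result *= p ** y * (1 - p) ** (1 - y)
    return result

def cost(a, b):
    """Logistic regression cost: mean cross-entropy over the data."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(a * x + b)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(xs)

a, b = 2.0, -3.5  # arbitrary parameter values for the comparison
print(math.log(likelihood(a, b)), -len(xs) * cost(a, b))
```

The two printed numbers agree up to rounding: the log-likelihood equals the negative cost multiplied by the number of data points, so the parameters that maximize one minimize the other.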

Both kinds of problems above, linear regression and logistic regression, can be derived from maximum likelihood estimation, which shows that maximum likelihood estimation is a more general way to describe fitting a model to data.
