Learning the concepts of linear regression, logistic regression, and other kinds of regression: machine learning in practice

Source: Internet
Author: User

Regression tries to find a functional expression that describes how a set of variables change in relation to one another. This expression is called the regression equation.

Prerequisites for a regression problem:

1) Collected data

2) A hypothesized model

The model is a function that contains unknown parameters, whose values are estimated by learning from the data. The model is then used to predict or classify new data.

1. Linear regression

Each component of the collected data can be viewed as a feature.

When two variables have an exact, strict linear relationship, y = ax + b expresses the functional relationship between them, where x is the independent variable and y is the dependent variable. In real life, however, interference from other factors means that many relationships between two variables are not strictly functional and cannot be captured exactly by a function equation. To distinguish such a relationship from a true functional relationship, we call it a regression relationship, and the linear equation used to express it is called the regression line, or linear regression.

The vector representation is:
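In the usual notation, which is assumed here rather than taken from the text, the model for an input with features x_1, ..., x_n and the convention x_0 = 1 can be sketched as:

```latex
h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \cdots + \theta_n x_n
            = \sum_{j=0}^{n} \theta_j x_j
            = \theta^{T} x
```

Here \theta is the parameter (weight) vector and x is the feature vector.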

This is a linear model, and it can be understood as a combination problem: each feature is given a weight, and the weighted features are summed to produce a result value.

How do we obtain the weights, that is, the values of the parameter vector theta? Written as a linear matrix equation, the problem usually cannot be solved directly. Datasets that admit a unique exact solution are extremely rare; in practice we almost always face an overdetermined system of equations that has no exact solution. We therefore take a step back and turn the parameter-solving problem into an error-minimization problem, looking for the closest solution. This is a relaxed solution.

To find the closest solution, we must first define the error expression to be minimized.

The error is defined as follows:
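A common choice, assumed here, is the sum of squared differences between predictions and observed values over the m training samples, with h_\theta(x) = \theta^{T} x denoting the linear prediction:

```latex
J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^{2}
```

The factor 1/2 is only a convention that simplifies the derivative; it does not change the minimizing theta.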


We seek the value of theta that minimizes J(theta).

How do we find the minimum of J(theta)?

1) Least squares

The least squares method gives a direct, closed-form mathematical solution, but it requires X to have full rank.
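In matrix form the closed-form solution is usually written theta = (X^T X)^(-1) X^T y, which is why the rank condition matters: if X is rank-deficient, X^T X cannot be inverted. As a minimal sketch, the single-feature case y ≈ a*x + b can be solved with the standard library alone. The function name `fit_line_least_squares` and the sample data are illustrative assumptions, not part of the original article.

```python
def fit_line_least_squares(xs, ys):
    """Closed-form least squares fit of y ~ a*x + b (one feature plus intercept)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of squared deviations of x; zero means every x is identical,
    # which is the single-feature version of a rank-deficient X.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("all x values are identical: X does not have full rank")

    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    a = sxy / sxx              # slope
    b = mean_y - a * mean_x    # intercept
    return a, b


# Illustrative data lying exactly on y = 2x + 1.
slope, intercept = fit_line_least_squares([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

With more than one feature, the same idea requires solving the full normal equations, typically with a linear algebra library.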


2) Gradient descent

The variants include plain gradient descent, batch gradient descent, and incremental gradient descent. In essence they all come down to the same ingredients: computing the partial derivatives, choosing the step size (learning rate), updating the parameters, and checking convergence.
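The ingredients listed above can be sketched for the single-feature model y ≈ a*x + b. This is a minimal batch version under stated assumptions: the gradient is averaged over the samples, and the learning rate, iteration cap, and tolerance are illustrative values, not recommendations from the original article.

```python
def batch_gradient_descent(xs, ys, learning_rate=0.05, max_iters=10000, tol=1e-12):
    """Minimize the mean squared error of y ~ a*x + b with batch gradient descent."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(max_iters):
        # Prediction errors on the whole dataset (this is what makes it "batch").
        errors = [a * x + b - y for x, y in zip(xs, ys)]

        # Partial derivatives of (1 / 2n) * sum(error^2) with respect to a and b.
        grad_a = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n

        # Update step: move against the gradient, scaled by the learning rate.
        new_a = a - learning_rate * grad_a
        new_b = b - learning_rate * grad_b

        # Convergence check: stop once the parameters barely change.
        converged = abs(new_a - a) < tol and abs(new_b - b) < tol
        a, b = new_a, new_b
        if converged:
            break
    return a, b


# Illustrative data lying exactly on y = 2x + 1.
slope, intercept = batch_gradient_descent([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

An incremental variant would update a and b after each individual sample instead of after a full pass over the data. A learning rate that is too large makes the updates diverge, while one that is too small makes convergence slow.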

Gradient descent is a standard method in optimization theory. Studying it together with optimization theory makes it easy to understand.

2. Logistic regression

The relationship between logistic regression and linear regression, and their similarities and differences:

The logistic regression model is a non-linear model built on the sigmoid function, also known as the logistic function. In essence, however, it is still a linear regression model: apart from the sigmoid mapping, the other steps and algorithms are those of linear regression. Logistic regression can be said to rest on linear regression theory.

A linear regression model, however, cannot produce the non-linear form of the sigmoid, and the sigmoid handles 0/1 classification problems easily.

In addition, its derivation follows the same maximum likelihood route as linear regression: write the likelihood function as a product over the samples (the distribution here may be Bernoulli, Poisson, or another form), differentiate, and obtain the loss function.
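For the Bernoulli case, these steps can be sketched as follows, assuming m independent samples, labels y^{(i)} \in \{0, 1\}, and h_\theta(x) = 1 / (1 + e^{-\theta^{T} x}) (standard notation, not taken from the text):

```latex
L(\theta) = \prod_{i=1}^{m} h_\theta(x^{(i)})^{y^{(i)}}
            \left( 1 - h_\theta(x^{(i)}) \right)^{1 - y^{(i)}}

\ell(\theta) = \log L(\theta)
             = \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)})
               + (1 - y^{(i)}) \log \left( 1 - h_\theta(x^{(i)}) \right) \right]

\frac{\partial \ell(\theta)}{\partial \theta_j}
             = \sum_{i=1}^{m} \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}
```

The loss function is then the negative log-likelihood, -\ell(\theta).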


Logistic regression function:


It expresses a 0/1 form of classification.
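The logistic (sigmoid) function is g(z) = 1 / (1 + e^(-z)), which maps any real number into the interval (0, 1). A minimal sketch of how it turns a linear score into a 0/1 decision follows. The function names and the 0.5 threshold are illustrative assumptions.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # safe: z is negative, so exp(z) is at most 1
    return ez / (1.0 + ez)


def predict_class(theta, x, threshold=0.5):
    """Return 1 if sigmoid(theta . x) reaches the threshold, otherwise 0."""
    z = sum(t * xi for t, xi in zip(theta, x))  # the linear part, theta^T x
    return 1 if sigmoid(z) >= threshold else 0


# Illustrative weights and input; the first input component is the constant 1.
label = predict_class([1.0, -2.0], [1.0, 0.25])
```

The linear part theta^T x is exactly the linear regression model; only the final sigmoid mapping is new.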

Application examples:

Whether an email is spam (classification). Whether a tumor is cancerous (diagnostic prediction). Whether a transaction is financial fraud (classification).

3. General (generalized) linear regression

Linear regression uses the Gaussian distribution as its error model, while logistic regression uses the Bernoulli distribution.

The Gaussian, Bernoulli, beta, and Dirichlet distributions all belong to the exponential family.

Through maximum likelihood estimation, we can derive the error model (the error to be minimized) of generalized linear regression.

Softmax regression:

Softmax regression is an example of generalized linear regression. It is used for multi-class problems (logistic regression solves two-class problems), such as classifying characters into the 10 digits 0-9, where y has 10 possible values.

The distribution over these possible values belongs to the exponential family, and the probabilities of all possible values sum to 1. The result for one input can be expressed as:
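In the usual formulation, assumed here, class j has its own parameter vector theta_j, and the probability of class j is exp(theta_j^T x) divided by the sum of exp(theta_l^T x) over all classes l. A minimal sketch follows. The function names are illustrative, and subtracting the maximum score is a common numerical safeguard rather than part of the definition.

```python
import math


def softmax(scores):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max leaves the result unchanged
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_probabilities(thetas, x):
    """One score theta_j . x per class, then softmax over the scores."""
    scores = [sum(t * xi for t, xi in zip(theta, x)) for theta in thetas]
    return softmax(scores)


# Illustrative three-class example with two input components.
probs = class_probabilities([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, 2.0])
predicted_class = probs.index(max(probs))
```

The predicted class is the one with the largest probability.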


The cost function is:
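In the standard formulation, assumed here, with m training samples, k classes, one parameter vector \theta_j per class, and 1\{\cdot\} denoting the indicator function, the cost can be sketched as:

```latex
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k} 1\{ y^{(i)} = j \}
            \log \frac{ e^{\theta_j^{T} x^{(i)}} }{ \sum_{l=1}^{k} e^{\theta_l^{T} x^{(i)}} }
```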


It is a generalization of the cost function of logistic regression.


Softmax regression has no closed-form solution (it would require solving a system of high-order polynomial equations), so it is still solved with gradient descent or L-BFGS.

When k = 2, softmax regression reduces to logistic regression, which also shows that softmax regression is a generalization of logistic regression.
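The reduction can be checked directly. Assuming one parameter vector per class, \theta_1 and \theta_2, dividing the numerator and denominator by e^{\theta_1^{T} x} gives:

```latex
P(y = 1 \mid x) = \frac{ e^{\theta_1^{T} x} }{ e^{\theta_1^{T} x} + e^{\theta_2^{T} x} }
                = \frac{1}{ 1 + e^{ -(\theta_1 - \theta_2)^{T} x } }
```

This is the sigmoid applied to (\theta_1 - \theta_2)^{T} x, so only the difference between the two parameter vectors matters.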

The connections among linear regression, logistic regression, and softmax regression are worth revisiting repeatedly. The more you think about them, the deeper your understanding becomes.

4. Fitting: fitting a model/function

From measured data we estimate a hypothesized model/function. How well does it fit, and is the fitted model appropriate? The result falls into one of three categories: underfitting, an appropriate fit, and overfitting. The diagrams in the article listed in the appendix make this easy to understand.

Underfitting:
Appropriate fit:
Overfitting:

How to address overfitting.

The origin of the problem: the model is too complex, with too many parameters and too many features.

Methods:

1) Reduce the number of features, either by manual selection or by using a model selection algorithm.

http://www.cnblogs.com/heaad/archive/2011/01/02/1924088.html (a review of feature selection algorithms)

2) Regularization: keep all the features, but reduce the influence of the parameter values.
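One common way to do this, assumed here because the text does not name a specific penalty, is to add an L2 term to the squared-error cost so that large weights are penalized. In this minimal sketch the function name `regularized_cost`, the symbol `lam` for the regularization strength, and the choice to leave the intercept theta[0] unpenalized are all illustrative assumptions.

```python
def regularized_cost(theta, rows, ys, lam):
    """Squared-error cost plus an L2 penalty on every weight except theta[0]."""
    m = len(rows)

    # Sum of squared prediction errors over all samples.
    squared_error = sum(
        (sum(t * x for t, x in zip(theta, row)) - y) ** 2
        for row, y in zip(rows, ys)
    )

    # Penalty grows with the size of the weights; lam controls its strength.
    penalty = lam * sum(t * t for t in theta[1:])

    return (squared_error + penalty) / (2 * m)


# Illustrative data: the first component of each row is the constant 1.
rows = [[1.0, 0.0], [1.0, 1.0]]
ys = [1.0, 3.0]
cost = regularized_cost([1.0, 2.0], rows, ys, lam=0.5)
```

With `lam = 0` this is the plain squared-error cost. Increasing `lam` pushes the minimizer toward smaller weights while keeping every feature in the model.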

The advantage of regularization is that, when there are many features, each feature receives an appropriately sized weight.

5. Probabilistic interpretation: why linear regression uses the sum of squares as the error function
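A standard argument, sketched here under the assumption that each observation is y^{(i)} = \theta^{T} x^{(i)} + \epsilon^{(i)} with independent Gaussian noise \epsilon^{(i)} \sim N(0, \sigma^{2}), runs as follows:

```latex
p(y^{(i)} \mid x^{(i)}; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma}
    \exp\left( -\frac{ \left( y^{(i)} - \theta^{T} x^{(i)} \right)^{2} }{ 2\sigma^{2} } \right)

\ell(\theta) = \sum_{i=1}^{m} \log p(y^{(i)} \mid x^{(i)}; \theta)
             = m \log \frac{1}{\sqrt{2\pi}\,\sigma}
               - \frac{1}{\sigma^{2}} \cdot \frac{1}{2}
                 \sum_{i=1}^{m} \left( y^{(i)} - \theta^{T} x^{(i)} \right)^{2}
```

The first term does not depend on \theta, so maximizing the log-likelihood \ell(\theta) is the same as minimizing the sum of squared errors.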


Reposted from: http://xgli0910.blog.163.com/blog/static/469621682013101211712163/

http://blog.csdn.net/viewcode/article/details/8794401#
