"Python Data Mining" regression model and its application

Last Update: 2017-08-21 Source: Internet

Author: User


Linear Regression

Definition: In supervised learning, the training sample is $D = \{(x^{(i)}, y^{(i)});\ i = 1, \dots, m\}$, where the target $y^{(i)}$ is a continuous variable. We need to learn a mapping $f: x \to y$, and we assume that the input $x$ and the output $y$ are linearly related.

Consider a set of data:

where $x$ is a two-dimensional real vector. For example, $x_1^{(i)}$ is the living area of the $i$-th house and $x_2^{(i)}$ is the number of rooms in that house.

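To perform supervised learning, we must decide how to represent the function (the hypothesis) in the computer. As a first approximation, we represent it as a linear function of $x$:

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$$

In matrix form, with the convention $x_0 = 1$ and $n$ features (here $n = 2$):

$$h_\theta(x) = \sum_{j=0}^{n} \theta_j x_j = \theta^T x$$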
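As a rough illustration of the matrix form, the following NumPy sketch evaluates $h_\theta(x) = \theta^T x$ for a small batch of houses. The helper names `add_intercept` and `hypothesis`, the feature values, and the parameter vector are all hypothetical, chosen only to show the mechanics, in particular the $x_0 = 1$ convention for the intercept.

```python
import numpy as np

def add_intercept(X):
    """Prepend a column of ones so that theta[0] plays the role of theta_0 (x_0 = 1)."""
    X = np.asarray(X, dtype=float)
    return np.hstack([np.ones((X.shape[0], 1)), X])

def hypothesis(theta, X):
    """Linear hypothesis h_theta(x) = theta^T x, evaluated for every row (sample) of X."""
    return add_intercept(X) @ theta

# Hypothetical houses: columns are (living area, number of rooms).
X = np.array([[2104.0, 3.0],
              [1600.0, 3.0]])
theta = np.array([50.0, 0.1, 20.0])  # illustrative parameters, not fitted to any data
print(hypothesis(theta, X))
```

Prepending the column of ones lets a single matrix product handle the intercept, which is what makes the matrix form convenient.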

Now, given the training data, how do we choose (that is, learn) the value of $\theta$? A reasonable approach is to make $h_\theta(x)$ close to $y$, at least on the training examples.

We therefore define a loss function (also called a cost function):

$$J(\theta) = \frac{1}{2}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

Here the mapping $f$ from $x$ to $y$ is written as a function of $\theta$, namely $h_\theta(x)$.

There are many kinds of loss functions; choose one according to the requirements of the problem.

We then minimize the loss function. It helps to choose a convex loss, as the squared-error loss is for linear regression: every local minimum of a convex function is also a global minimum, so we need not worry much about the algorithm converging to a poor local optimum.
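The squared-error loss can be sketched directly in NumPy. This assumes $J(\theta) = \frac{1}{2}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2$ (one common choice among the many possible loss functions) and a design matrix that already contains the $x_0 = 1$ column; the function name `squared_error_cost` and the toy data are illustrative.

```python
import numpy as np

def squared_error_cost(theta, X, y):
    """J(theta) = 1/2 * sum_i (h_theta(x_i) - y_i)^2 for a linear hypothesis.

    X is assumed to already contain a leading column of ones (x_0 = 1).
    """
    residuals = X @ theta - y
    return 0.5 * np.sum(residuals ** 2)

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # intercept column + one feature
y = np.array([2.0, 4.0, 6.0])                       # toy data lying on y = 2x
print(squared_error_cost(np.array([0.0, 2.0]), X, y))  # parameters that fit exactly
print(squared_error_cost(np.array([0.0, 1.0]), X, y))  # parameters that underpredict
```

Because $J(\theta)$ is a convex quadratic in $\theta$, the exact-fit parameters above are also its global minimizer.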

Gradient Descent

To minimize the loss function as fast as possible, think of walking down a hill by the quickest route: each step should go in the direction in which the slope descends most steeply. That direction is the negative gradient, whose components are the partial derivatives of the loss function with respect to each parameter.

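So the iteration rule of the algorithm, applied to every $\theta_j$ simultaneously, is:

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$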

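Suppose there are now $n$ features, i.e. variables $x_j$ ($j = 1, \dots, n$). For the squared-error loss $J(\theta) = \frac{1}{2}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2$, taking the partial derivative gives the update for every $j$ (with $x_0 = 1$):

$$\theta_j := \theta_j - \alpha \sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$$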

where $\alpha$ is the learning rate, a parameter of the algorithm. The larger $\alpha$ is, the larger each step and the faster the descent, but too large a step can overshoot the minimum and oscillate back and forth (or even diverge), so the algorithm fails to settle on an accurate solution.
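A minimal batch gradient descent sketch for this update rule, assuming the squared-error loss $J(\theta) = \frac{1}{2}\sum_i (h_\theta(x^{(i)}) - y^{(i)})^2$, a zero initialization, and a fixed number of iterations as the stopping rule (all choices made here for illustration, not taken from the original). The function name `batch_gradient_descent` and the synthetic data are hypothetical.

```python
import numpy as np

def batch_gradient_descent(X, y, alpha, n_iters):
    """Minimize J(theta) = 1/2 * sum_i (theta^T x_i - y_i)^2 by batch gradient descent.

    X is an (m, n+1) design matrix whose first column is all ones (x_0 = 1).
    Each iteration updates every theta_j simultaneously:
        theta_j := theta_j - alpha * sum_i (h_theta(x_i) - y_i) * x_ij
    """
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        gradient = X.T @ (X @ theta - y)  # vector of partial derivatives dJ/dtheta_j
        theta = theta - alpha * gradient
    return theta

x = np.linspace(0.0, 1.0, 50)
y = 1.0 + 3.0 * x                          # synthetic, noise-free data: y = 1 + 3x
X = np.column_stack([np.ones_like(x), x])
print(batch_gradient_descent(X, y, alpha=0.02, n_iters=5000))
```

On this particular data set the largest eigenvalue of $X^T X$ is roughly 63, so any $\alpha$ above about $2/63 \approx 0.032$ makes the iteration unstable: calling the function with `alpha=0.05` makes the parameters oscillate with growing amplitude instead of converging, which is exactly the failure mode of an overly large learning rate.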

Underfitting and Overfitting

Underfitting: there are too few features, and the model is too simple to fit the data adequately.

Overfitting: there are many features and the model is very complex. The hypothesis curve fits the training data very well but loses generality, so its predictions on new samples are poor.
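To see both problems numerically, the sketch below fits polynomials of increasing degree to noisy samples of a sine curve, treating the polynomial degree as a stand-in for the number of features. The data, the degrees (1, 3, 12), and the use of `numpy.polynomial.Polynomial.fit` are assumptions made for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(42)
x_train = np.sort(rng.uniform(0.0, 1.0, 15))
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.2, x_train.size)
x_test = np.linspace(0.0, 1.0, 200)   # "new samples" not seen during fitting
y_test = np.sin(2 * np.pi * x_test)   # noise-free ground truth

def mse(model, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((model(x) - y) ** 2))

results = {}
for degree in (1, 3, 12):             # degree acts as model complexity
    model = Polynomial.fit(x_train, y_train, degree)  # least-squares polynomial fit
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    train_err, test_err = results[degree]
    print(f"degree={degree:2d}  train MSE={train_err:.4f}  test MSE={test_err:.4f}")
```

Because each higher-degree model contains the lower-degree ones, the training error can only go down as the degree grows; it is the error on new points that reveals overfitting. The exact numbers depend on the random sample, but typically degree 1 has high error on both sets, while degree 12 reaches the lowest training error yet a worse test error than degree 3.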

Regularization

A regularization term added to the loss function penalizes large parameter values and thereby controls the magnitude of the parameters.

Several kinds of regularization term exist; the most commonly used are:

L1 regularization: $|\theta_j|$

L2 regularization: $\theta_j^2$
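The L1 and L2 terms (summed over the parameters) are easy to compute directly. To show how a regularization term controls parameter magnitude, this sketch also uses the closed-form solution of L2-regularized least squares (ridge regression), $\theta = (X^T X + \lambda I)^{-1} X^T y$; this closed form is swapped in here for brevity and is not the gradient-descent procedure described earlier. The penalty weight $\lambda$, the synthetic data, and all function names are illustrative assumptions.

```python
import numpy as np

def l1_penalty(theta):
    """L1 regularization term: sum_j |theta_j|."""
    return float(np.sum(np.abs(theta)))

def l2_penalty(theta):
    """L2 regularization term: sum_j theta_j^2."""
    return float(np.sum(theta ** 2))

def ridge_fit(X, y, lam):
    """Closed-form minimizer of 1/2*||X theta - y||^2 + lam/2 * ||theta||^2."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
true_theta = np.array([4.0, -3.0, 2.0, 0.0, 1.0])   # hypothetical ground truth
y = X @ true_theta + rng.normal(0.0, 0.5, size=30)

l2_values = []
for lam in (0.0, 1.0, 10.0, 100.0):
    theta = ridge_fit(X, y, lam)
    l2_values.append(l2_penalty(theta))
    print(f"lambda={lam:6.1f}  L1={l1_penalty(theta):.3f}  L2={l2_values[-1]:.3f}")
```

As $\lambda$ grows, the squared L2 norm of the fitted parameters shrinks monotonically toward zero, which is the "amplitude control" the regularization term provides.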

Logistic Regression

When linear regression is used for classification problems, the model is not robust: noisy points (outliers) can seriously distort the fit.

We can modify the linear regression algorithm appropriately to obtain the function we want.

We introduce the sigmoid function:

$$g(z) = \frac{1}{1 + e^{-z}}$$

Applying it to the original linear hypothesis gives:

$$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$

Looking at the graph of the function: when $z > 0$, $g(z) > 0.5$, and $g(0) = 0.5$. The sigmoid therefore compresses the output of linear regression into the range $(0, 1)$, with 0.5 serving as the threshold between the two classes.
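A direct NumPy version of the sigmoid $g(z) = 1/(1 + e^{-z})$, offered as a small sketch; it lets you check the properties just described, namely $g(0) = 0.5$, $g(z) > 0.5$ for $z > 0$, and every output lying inside $(0, 1)$.

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z}); maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(sigmoid(z))
```

For very large negative inputs `np.exp(-z)` can overflow with a warning (the result still tends to 0); `scipy.special.expit` is a numerically safer alternative if SciPy is available.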

1. Linear decision boundary:

Suppose the linear function is $\theta^T x = \theta_0 + \theta_1 x_1 + \theta_2 x_2$.

When $\theta^T x > 0$, $g(\theta^T x) > 0.5$;

When $\theta^T x < 0$, $g(\theta^T x) < 0.5$.

The decision boundary is therefore the straight line $\theta^T x = 0$.
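The linear decision rule can be sketched as follows: predict class 1 when $g(\theta^T x) \ge 0.5$, which is equivalent to $\theta^T x \ge 0$. The parameter vector $\theta = (-3, 1, 1)$, which gives the boundary $x_1 + x_2 = 3$, the sample points, and the function name `predict` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, X):
    """Return 1 where g(theta^T x) >= 0.5 (i.e. theta^T x >= 0), else 0."""
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend x_0 = 1
    return (sigmoid(X1 @ theta) >= 0.5).astype(int)

theta = np.array([-3.0, 1.0, 1.0])  # hypothetical: boundary is the line x1 + x2 = 3
X = np.array([[1.0, 1.0], [2.0, 2.0], [0.5, 2.0], [3.0, 1.0]])
print(predict(theta, X))
```

Assigning points exactly on the boundary ($g = 0.5$) to class 1 is just a convention.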

2. Nonlinear decision boundary:

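Suppose the argument of $g$ includes quadratic terms:

$$z = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_2^2$$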

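With $\theta_0 = -1$, $\theta_1 = 0$, $\theta_2 = 0$, $\theta_3 = 1$, $\theta_4 = 1$, we get the function $g(x_1^2 + x_2^2 - 1)$, so the decision boundary is the unit circle $x_1^2 + x_2^2 = 1$. For points inside the circle the argument of $g$ is less than 0 (so $g < 0.5$); for points outside it is greater than 0 (so $g > 0.5$).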
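For the nonlinear example, the features can be expanded to $(1, x_1, x_2, x_1^2, x_2^2)$ so that the same linear machinery yields a circular boundary. This sketch assumes $\theta = (-1, 0, 0, 1, 1)$, i.e. $g(x_1^2 + x_2^2 - 1)$; the helper name `quadratic_features` and the sample points are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def quadratic_features(X):
    """Map each (x1, x2) to (1, x1, x2, x1^2, x2^2) to match theta_0..theta_4."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones_like(x1), x1, x2, x1 ** 2, x2 ** 2])

theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])  # gives g(x1^2 + x2^2 - 1)
points = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 1.0], [2.0, 0.0]])
z = quadratic_features(points) @ theta
print(z)                  # negative inside the unit circle, positive outside
print(sigmoid(z) >= 0.5)  # True only for points outside the circle
```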

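If we define the loss function as in linear regression, but with the sigmoid hypothesis:

$$J(\theta) = \frac{1}{2}\sum_{i=1}^{m}\left(g(\theta^T x^{(i)}) - y^{(i)}\right)^2$$

then this function is non-convex and has local minima, so a different loss function should be chosen.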

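Instead, define the loss for a single sample as:

$$\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log\left(h_\theta(x)\right) & \text{if } y = 1 \\ -\log\left(1 - h_\theta(x)\right) & \text{if } y = 0 \end{cases}$$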

The graph of the function is as follows:

From this function we can see that:

For a positive sample ($y = 1$): when $h_\theta(x)$ approaches 1 (e.g. 0.99…9), we want a small cost, and when the prediction approaches 0 (e.g. 0.00…1), we want a large cost;

For a negative sample ($y = 0$): when $h_\theta(x)$ approaches 0 (e.g. 0.00…1), we want a small cost, and when the prediction approaches 1 (e.g. 0.99…9), we want a large cost.
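This behaviour can be checked numerically with the per-sample cost, assuming $-\log(h_\theta(x))$ for $y = 1$ and $-\log(1 - h_\theta(x))$ for $y = 0$. The function name `logistic_cost` and the probe values 0.999 and 0.001 are illustrative.

```python
import numpy as np

def logistic_cost(h, y):
    """Per-sample cost: -log(h) when y == 1, -log(1 - h) when y == 0 (h strictly inside (0, 1))."""
    return np.where(y == 1, -np.log(h), -np.log(1.0 - h))

h = np.array([0.999, 0.001])                   # predicted probabilities h_theta(x)
cost_pos = logistic_cost(h, np.array([1, 1]))  # both samples labelled y = 1
cost_neg = logistic_cost(h, np.array([0, 0]))  # both samples labelled y = 0
print(cost_pos)  # small for 0.999, large for 0.001
print(cost_neg)  # large for 0.999, small for 0.001
```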

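Since $y$ is always 0 or 1, the two cases combine into a single expression, and averaging over the $m$ training samples gives:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + (1 - y^{(i)})\log\left(1 - h_\theta(x^{(i)})\right)\right]$$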

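Adding an L2 regularization term (the intercept $\theta_0$ is conventionally not penalized):

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + (1 - y^{(i)})\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$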
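Putting the pieces together, here is a sketch of regularized logistic regression trained by batch gradient descent. It assumes the cross-entropy loss with an L2 term $\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$ that leaves the intercept $\theta_0$ unpenalized (a common convention, assumed here), plus a fixed learning rate and iteration count; `cost_and_grad`, `train`, the hyperparameter values, and the synthetic data are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grad(theta, X, y, lam):
    """Regularized cross-entropy loss and its gradient.

    X already contains a leading column of ones; theta[0] (the intercept) is not penalized.
    """
    m = len(y)
    h = sigmoid(X @ theta)
    eps = 1e-12  # guards against log(0) when a prediction saturates
    cost = -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))
    cost += lam / (2 * m) * np.sum(theta[1:] ** 2)
    grad = X.T @ (h - y) / m
    grad[1:] += lam / m * theta[1:]
    return cost, grad

def train(X, y, lam=1.0, alpha=0.5, n_iters=2000):
    """Batch gradient descent from theta = 0 with a fixed learning rate."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        _, grad = cost_and_grad(theta, X, y, lam)
        theta -= alpha * grad
    return theta

rng = np.random.default_rng(0)
X_raw = rng.normal(size=(200, 2))
y = (X_raw[:, 0] + X_raw[:, 1] > 0).astype(float)  # synthetic labels, boundary x1 + x2 = 0
X = np.column_stack([np.ones(len(y)), X_raw])
theta = train(X, y)
accuracy = np.mean((sigmoid(X @ theta) >= 0.5) == y)
print(theta, accuracy)
```

The small `eps` inside the logarithms is only an implementation safeguard, not part of the loss itself.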

II. Binary and Multi-class Classification

One vs One

One vs Rest

Method One (One vs One):

1. First separate triangles from crosses, obtaining classifier C1 and the probability values $P_{C1}(x)$ and $1 - P_{C1}(x)$.

2. Next separate triangles from squares, obtaining classifier C2 and the probability values $P_{C2}(x)$ and $1 - P_{C2}(x)$.

3. Finally separate squares from crosses, obtaining classifier C3 and the probability values $P_{C3}(x)$ and $1 - P_{C3}(x)$.

The three classifiers yield six probability values, and the sample is assigned to the class with the largest probability.
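Method One can be sketched in NumPy with three pairwise logistic-regression classifiers. Following the text, each pairwise classifier contributes two probabilities and the class with the single largest probability wins. The class encoding (0 = triangle, 1 = square, 2 = cross), the cluster data, the training settings, and every function name here are assumptions made for illustration.

```python
import numpy as np
from itertools import combinations

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_ones(X):
    """Prepend x_0 = 1 to every sample."""
    return np.column_stack([np.ones(len(X)), X])

def train_binary(X, y, alpha=0.1, n_iters=3000):
    """Unregularized binary logistic regression fitted by batch gradient descent (labels in {0, 1})."""
    Xb = add_ones(X)
    theta = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        theta -= alpha * Xb.T @ (sigmoid(Xb @ theta) - y) / len(y)
    return theta

def fit_one_vs_one(X, y, classes):
    """Train one classifier per pair (a, b); it outputs P(a), and P(b) = 1 - P(a)."""
    models = {}
    for a, b in combinations(classes, 2):
        mask = (y == a) | (y == b)
        models[(a, b)] = train_binary(X[mask], (y[mask] == a).astype(float))
    return models

def predict_one_vs_one(models, X):
    """Gather both probabilities from every pairwise classifier; return the class with the largest one."""
    predictions = []
    for xb in add_ones(X):
        best_class, best_p = None, -1.0
        for (a, b), theta in models.items():
            p_a = sigmoid(xb @ theta)
            for cls, p in ((a, p_a), (b, 1.0 - p_a)):
                if p > best_p:
                    best_class, best_p = cls, p
        predictions.append(best_class)
    return np.array(predictions)

# Hypothetical data: one Gaussian cluster per class (0 = triangle, 1 = square, 2 = cross).
rng = np.random.default_rng(1)
centers = [(0.0, 0.0), (4.0, 0.0), (2.0, 2.0 * np.sqrt(3.0))]
X = np.vstack([rng.normal(c, 0.5, size=(30, 2)) for c in centers])
y = np.repeat([0, 1, 2], 30)
models = fit_one_vs_one(X, y, classes=[0, 1, 2])
accuracy = np.mean(predict_one_vs_one(models, X) == y)
print(f"training accuracy: {accuracy:.2f}")
```

Library implementations of One vs One, such as scikit-learn's `OneVsOneClassifier`, typically combine the pairwise classifiers by majority voting instead of taking the largest single probability.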

Method Two (One vs Rest):

1. First classify triangles: determine whether a sample is a triangle, obtaining classifier C1 and the probability value $P_{C1}(x)$.

2. Next classify squares: determine whether a sample is a square, obtaining classifier C2 and the probability value $P_{C2}(x)$.

3. Finally classify crosses: determine whether a sample is a cross, obtaining classifier C3 and the probability value $P_{C3}(x)$.

The three classifiers yield three probability values, and the sample is assigned to the class with the largest probability.
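Method Two needs only one classifier per class. This self-contained sketch trains one unregularized logistic-regression classifier per class on hypothetical cluster data (class encoding 0 = triangle, 1 = square, 2 = cross is an assumption) and predicts the class whose classifier gives the largest $P_{Ck}(x)$. All function names and training settings are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_ones(X):
    """Prepend x_0 = 1 to every sample."""
    return np.column_stack([np.ones(len(X)), X])

def train_binary(X, y, alpha=0.1, n_iters=3000):
    """Unregularized binary logistic regression fitted by batch gradient descent (labels in {0, 1})."""
    Xb = add_ones(X)
    theta = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        theta -= alpha * Xb.T @ (sigmoid(Xb @ theta) - y) / len(y)
    return theta

def fit_one_vs_rest(X, y, classes):
    """One classifier per class c: label 1 if the sample belongs to c, else 0."""
    return {c: train_binary(X, (y == c).astype(float)) for c in classes}

def predict_one_vs_rest(models, X):
    """Compute P_c(x) for every class c and pick the class with the largest probability."""
    classes = list(models)
    probs = np.column_stack([sigmoid(add_ones(X) @ models[c]) for c in classes])
    return np.array(classes)[np.argmax(probs, axis=1)]

# Hypothetical data: one Gaussian cluster per class.
rng = np.random.default_rng(1)
centers = [(0.0, 0.0), (4.0, 0.0), (2.0, 2.0 * np.sqrt(3.0))]
X = np.vstack([rng.normal(c, 0.5, size=(30, 2)) for c in centers])
y = np.repeat([0, 1, 2], 30)
models = fit_one_vs_rest(X, y, classes=[0, 1, 2])
accuracy = np.mean(predict_one_vs_rest(models, X) == y)
print(f"training accuracy: {accuracy:.2f}")
```

One vs Rest trains fewer classifiers than One vs One (k instead of k(k-1)/2 for k classes), but each classifier sees an unbalanced "one class against all others" training set.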

Application:

Cond.....

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
