Principles of multivariate linear models, python code, and Linear Models

Last Update:2017-10-14 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Share URL: http://www.cnblogs.com/DesertHero2013/p/7662721.html

1) Goal: Use a linear combination of attributes to make a prediction model. That is:

Where is, after w and B are learned, the model is determined. It can be understood as the weight of each attribute value.

2) Performance Measurement: calculate the mean square error and minimize it, which is the required w and B:

The mean square error is the mean of the sum of the distance squares from the actual values, that is, the mean of the sum of the squared errors. However, we can see that this is not divided by m. Of course, the sum of squares is the smallest, and dividing by m is also the smallest.

The least square method is used to minimize the Euclidean distance from the sample to the straight line.

A.Least Squares: Find 1(Group) Estimated value,MakeDistance between the actual value and the estimated valueMinimum. It is ideal to summarize and minimize the absolute values of the two differences, but it is troublesome to calculate the minimum value of the absolute values in mathematics,Find an (Group) estimated value to minimize the difference between the actual value and the estimated value.. Least square is used in the English language of "2 multiplication". In fact, the literal meaning of the English is "least square ". At this time, calculate the derivative of the sum of the square of the difference to the parameter, and take the first derivative to zero, that is, OLSE.

3) In general, we encounter more than one attribute, so we use a matrix to solve it. X indicates the matrix of m * (d + 1) size, indicating the attribute value, y indicates the mark, and w * indicates the required weight.

The = (w, B) Here is a vector that is intentionally combined. W is a vector, and B is a value, which is equivalent to an error.

It is the matrix form of the mean variance. If it is the smallest, you only need to evaluate the result. Get:

The above formula is = 0. We can see the required parameters.

This is the formula we can use for programming. This method is called the regular equation method. How can I find the above formula?

2X ^ T (Xw-y) = 0,

2X ^ TXw-2X ^ Ty = 0

X ^ TXw = X ^ Ty

If X ^ TX is full, it is reversible. Therefore, the left side of both sides is multiplied by (X ^ TX) ^-1 at the same time.

Therefore:

W = (X ^ TX) ^-1) X ^ Ty, that is, the preceding result.

The following is our Python code:

#-*-Coding: UTF-8-*-"Created on Tue Oct 10 23:10:00 2017 Version: python3.5.1 @ author: Stone" "import pandas as pdfrom numpy. linalg import invfrom numpy import dot # regular equation method # fitting linear model: Sepal. length ~ Sepal. Width + Petal. Length + Petal. Width # In this model, the Length of a specified flower is estimated based on the Width of the flower surface, the Width of the flower surface, and the Length of the flower surface. # Model: Sepal. length = Sepal. width * x1 + Petal. length * x2 + Petal. width * x3 + biris = pd.read_csv('iris.csv ') ← read iris.csv file temp = iris. iloc [:,] # indicates that in the iris of your defined matrix, the rows are [0, end), and the columns are [) # iloc directly determines the number of rows and columns, left closed right open temp ['x0'] = 1 # To add error B to w * = (w, B) vector (w * On P55 page of machine learning by Zhou Zhihua has a '^' symbol on it, which is hard to be entered here). Add one column X0 to the end of the temp table and assign one value to each line, it is to absorb B into w *. X = temp. iloc [:, [0, 1, 2, 3] # The Value assignment matrix X is column 0, 1, 2, and 3, with 150 rows. The column name is Sepal. width, Petal. length, Petal. width, x0Y = iris. iloc [:, 1] # select the tag column: Sepal. length is our fitting target Sepal. length. Y = Y. values. reshape (len (iris), 1) # values is the returned value reshape () remodeling array, as shown in the following figure: a new list of iris table lengths, such as [a], [B],...] # In the book, Y is a tag, which is the target we need to fit. Theta_n = dot (inv (dot (X. t, X), X. t), Y) # theta = (x' X) ^ (-1) x' Y dot is calculated according to the matrix multiplication rule. inv () is the X of the inverse matrix. T is the conversion print (theta_n) of the X matrix # print the weight coefficient print ('enter the Sepal to be predicted in sequence. width, Petal. length, Petal. width: ') x1, x2, x3 = map (float, input (). split () t = (0.65083716 * x1 + 0.70913196 * x2-0.55648266 * x3 + 1.85599749) print ('predicted CIDR length: ', t) # tested, the error of the three types of data is acceptable, at least fitting. # The system values of x1, x2, x3, and B above are the obtained weights.

Iris training set used in this Code: http://files.cnblogs.com/files/sumai/iris.rar

The output is as follows:

The length of the original flower volume is 5.4, so the fitting is successful. The accuracy is hard to say.

Linear regression: it is to find the linear relationship between several sample property values (independent variables) and labeled values (my fit targets, dependent variables. I have just learned this and I have an error. I hope to point it out.

I am sorry for the poor layout and formula of the first such article.

Reference: 1. https://www.cnblogs.com/sumai/p/5211558.html

2. Zhou Zhihua's machine learning

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Principles of multivariate linear models, python code, and Linear Models

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Principles of multivariate linear models, python code, and Linear Models

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support