Machine Learning: How to Use the Least Squares Method in Python

The title says "use" rather than "implement" because Python's libraries already implement the specific algorithms for us, so we only need to learn how to call them. As our skills and experience grow, and the algorithms in the libraries no longer meet our needs, we can try implementing the algorithms ourselves.

What is "least squares"?

Definition: the least squares method (also called the method of least squares) is a mathematical optimization technique. It finds the function that best matches the data by minimizing the sum of the squared errors.

Function: the least squares method can be used to estimate unknown values so that the sum of the squared differences between the estimated values and the actual data is as small as possible.

Principle: the position of the line is determined by minimizing the sum of squared residuals (in mathematical statistics, a residual is the difference between an actual observed value and the estimated value).
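To make the idea of a residual concrete, here is a minimal sketch. The three data points and the candidate line y = 2x + 1 are made-up values chosen only for illustration, and the helper names `residuals` and `sum_of_squares` are hypothetical.

```python
import numpy as np

# Hypothetical observations (illustrative values, not from the article)
x = np.array([1.0, 2.0, 3.0])
y_observed = np.array([3.1, 4.9, 7.2])

def residuals(k, b, x, y):
    """Residual = observed value minus the value estimated by the line k*x + b."""
    return y - (k * x + b)

def sum_of_squares(k, b, x, y):
    """The quantity that the least squares method tries to minimize."""
    return float(np.sum(residuals(k, b, x, y) ** 2))

# Evaluate the candidate line y = 2x + 1
r = residuals(2.0, 1.0, x, y_observed)
rss = sum_of_squares(2.0, 1.0, x, y_observed)
print("residuals:", r)
print("sum of squared residuals:", rss)
```

A different choice of k and b gives a different sum; least squares picks the pair for which this sum is smallest.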

Basic idea: for a linear regression model, assume that n groups of observed values (X1, Y1), (X2, Y2), ..., (Xn, Yn) are drawn from the population. Countless curves could be fitted to these n points in the plane. Linear regression requires the sample regression function to fit this set of values as well as possible; that is, the line should pass through the center of the sample data as closely as possible. The criterion for choosing the best fitting line is therefore to minimize the total fitting error (that is, the total of the squared residuals).
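For a straight line, the minimum can also be written down directly. The sketch below uses the standard closed-form formulas for simple linear regression rather than a library solver, with the same sample data as the implementation in this article; the helper name `fit_line` is hypothetical.

```python
import numpy as np

# Sample data used in this article's implementation
Xi = np.array([6.19, 2.51, 7.29, 7.01, 5.7, 2.66, 3.98, 2.5, 9.1, 4.2])
Yi = np.array([5.25, 2.83, 6.41, 6.71, 5.1, 4.23, 5.05, 1.98, 10.5, 6.3])

def fit_line(x, y):
    """Closed-form least squares estimates for the line y = k*x + b."""
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by the variance of x
    k = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: chosen so that the line passes through (x_mean, y_mean)
    b = y_mean - k * x_mean
    return float(k), float(b)

k, b = fit_line(Xi, Yi)
print("k =", round(k, 4), "b =", round(b, 4))
```

The intercept formula shows why the fitted line sits "at the center" of the data: it always passes through the point of means.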

The implementation code is as follows:

    # Least squares
    import numpy as np                   # scientific computing library
    import scipy as sp                   # algorithm libraries built on top of numpy
    import matplotlib.pyplot as plt      # plotting library
    from scipy.optimize import leastsq   # the least squares solver

    '''
    Set the sample data. Real data would need to be preprocessed here.
    '''
    # Sample data (Xi, Yi), converted to arrays
    Xi = np.array([6.19, 2.51, 7.29, 7.01, 5.7, 2.66, 3.98, 2.5, 9.1, 4.2])
    Yi = np.array([5.25, 2.83, 6.41, 6.71, 5.1, 4.23, 5.05, 1.98, 10.5, 6.3])

    '''
    Define the function to fit and the error function.
    How to decide the shape of the function:
    1. Plot the sample data first.
    2. Choose the functional form (straight line, parabola, sine/cosine, etc.)
       from the approximate shape of the plot.
    '''
    # Function to fit: specifies the shape of the function
    def func(p, x):
        k, B = p
        return k * x + B

    # Error function: x and y are arrays that correspond one-to-one to Xi and Yi above
    def error(p, x, y):
        return func(p, x) - y

    '''
    Main part, with some notes:
    1. leastsq returns a tuple. The first element is the solution. The second
       element is an integer status flag (ier); values 1 to 4 mean that a
       solution was found.
    2. Example: para => (array([0.61349535, 1.79409255]), 3)
    3. The first element holds as many values as there are parameters to solve for.
    '''
    # Initial values of k and B. They can be set freely; after several experiments,
    # the choice of p0 was found to affect the value in para[1].
    p0 = [1, 1]  # assumed starting guess, chosen arbitrarily

    # Pack the arguments of the error function other than p into args (required by leastsq)
    para = leastsq(error, p0, args=(Xi, Yi))

    # Read the result
    k, B = para[0]
    print("k =", k, "B =", B)
    print("cost: " + str(para[1]))  # para[1] is the integer status flag, not a residual value
    print("The fitting line for the solution is:")
    print("y = " + str(round(k, 2)) + "x + " + str(round(B, 2)))

    '''
    Plot the data to check the fit. matplotlib does not support Chinese by default.
    If the labels are in Chinese, a font must be configured separately; if an
    error is reported, switch the labels to English.
    '''
    # Draw the sample points
    plt.figure(figsize=(8, 6))  # figure size ratio 8:6
    plt.scatter(Xi, Yi, color="green", label="Sample Data", linewidth=2)

    # Draw the fitted line
    x = np.linspace(0, 15, 100)  # 100 evenly spaced points from 0 to 15
    y = k * x + B                # the fitted function
    plt.plot(x, y, color="red", label="fit straight line", linewidth=2)
    plt.legend()  # draw the legend
    plt.show()
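To look more closely at what `leastsq` returns, one option is to pass `full_output=True`, which also returns the covariance estimate, a diagnostics dictionary, a message, and the integer flag `ier`. The following is a sketch under the assumption that the starting guess `[1.0, 1.0]` is acceptable; the sum of squared residuals is computed here from `infodict["fvec"]` because `leastsq` does not return it directly.

```python
import numpy as np
from scipy.optimize import leastsq

# Same sample data as in the article
Xi = np.array([6.19, 2.51, 7.29, 7.01, 5.7, 2.66, 3.98, 2.5, 9.1, 4.2])
Yi = np.array([5.25, 2.83, 6.41, 6.71, 5.1, 4.23, 5.05, 1.98, 10.5, 6.3])

def error(p, x, y):
    """Residuals of the line k*x + b against the observed y values."""
    k, b = p
    return k * x + b - y

# Starting guess chosen arbitrarily for this sketch
solution, cov_x, infodict, mesg, ier = leastsq(
    error, [1.0, 1.0], args=(Xi, Yi), full_output=True
)
k, b = solution

# fvec holds the residuals at the solution; squaring and summing gives the fitting error
cost = float(np.sum(infodict["fvec"] ** 2))

print("k =", k, "b =", b)
print("ier:", ier)          # 1, 2, 3 or 4 means a solution was found
print("message:", mesg)
print("sum of squared residuals:", cost)
```

Trying a few different starting guesses and comparing `ier` and the computed sum is a simple way to see how sensitive the solver is to `p0`.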

The result is as follows:

Output result:

    k = 0.900458420439 B = 0.831055638877
    cost: 1
    The fitting line for the solution is:
    y = 0.9x + 0.83

Drawing result:

Note: this article only covers the straight-line case. Fitting a curve works in a similar way, but curves can over-fit the data, which will be discussed in future posts.
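As a rough illustration of how the same approach extends to a curve, the sketch below swaps the straight line for a parabola and leaves everything else unchanged. The data is synthetic, generated from a known parabola without noise, and the starting guess is arbitrary; both are assumptions made only for this example.

```python
import numpy as np
from scipy.optimize import leastsq

# Synthetic, noise-free data from y = 2x^2 - 3x + 1 (illustrative only)
x = np.linspace(-2.0, 2.0, 9)
y = 2.0 * x**2 - 3.0 * x + 1.0

# Function to fit: a parabola instead of a straight line
def func(p, x):
    a, b, c = p
    return a * x**2 + b * x + c

# Error function: same structure as in the straight-line case
def error(p, x, y):
    return func(p, x) - y

p0 = [1.0, 1.0, 1.0]  # arbitrary starting guess
params, ier = leastsq(error, p0, args=(x, y))
a, b, c = params
print("a =", round(a, 3), "b =", round(b, 3), "c =", round(c, 3))
```

Only `func` and the length of `p0` change; with noisy data and higher-degree curves, this flexibility is what makes over-fitting possible.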