Python Machine Learning: "Regression (One)"


Chit-chat first

As usual, a few casual words about life before the technical part. I've been a little lazy lately, probably because of exams, and these last two days have been full of written tests and assorted interviews, so I haven't been calm enough to sit down, write code, and study algorithms quietly. The first time I chatted with a technical director, I didn't really understand decorators and didn't even know what functional programming was; yesterday, chatting with another manager, I couldn't express what I'd learned about algorithms well either. Frustrating. Anyway, besides thinking, the brain needs decent food and more exercise; fortunately, yoga every evening lets me settle down and calm myself. Looking back on those interviews: I can now express the answers to those questions without difficulty, but the weak point is response speed. My answers weren't dull, but they weren't perfect either, which usually means the knowledge hasn't been consolidated through repetition. So, no more rambling; time to write notes.

Previous article: Python Machine Learning "Getting Started"

Body:
In the previous introductory article, we mainly introduced the two kinds of machine learning tasks: supervised learning and unsupervised learning. The two most important things in supervised learning are regression and classification. Here we mainly talk about regression prediction. Many readers probably got a headache from all the text in the previous section, so this section tries to explain with examples implemented in Python. The next article will walk through a complete process of crawling data, analyzing it, training on it, and finally making predictions.

1. The origin of "regression"
The word "regression" was coined by Darwin's cousin Francis Galton (the genius of the family). Galton began by using pea seeds (the parent generation) to predict the size of the next generation, found some regularities, and then turned to observing human heredity. He found that if both parents are taller than average, the child also tends toward a greater height, but usually not taller than the parents (a sentence perfectly embodied by the people around me, which is why I agonize over being the shortest in my family). This phenomenon is the child's height regressing (going back) toward the average height. The relationship between numerical prediction and this regression-to-the-mean phenomenon is actually not that close, but since the man was Darwin's cousin, people adopted the academic name his family coined for it ~

In the last section, we talked about the house-price prediction problem, which is really about taking an input variable x and mapping it to a continuous expected output through a function f(x). Specifically, suppose we have a set of data pairs (x(i), y(i)), where i = 1, ..., m; that is, there are m sample pairs in total. We will now use the ex0.zip data file that comes with the book "Machine Learning in Action" to implement regression prediction on this dataset.

First, download the attachment and open the data file ex0.txt. Observe that:

    • The first column of the data is all 1, so clearly we can use the next two columns as the x and y values, even though at this point we don't know what the data actually represent in the real application.
    • Columns are separated by tabs, with one sample per row, which makes the later data reading easy (see the quick check below).
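Before writing the parser, a quick way to confirm this layout is to print the first few lines of the file. This is just a throwaway sketch, assuming ex0.txt sits in the current working directory:

# Peek at the first three lines of the data file to confirm the tab-delimited layout.
# Assumes ex0.txt is in the current working directory.
with open('ex0.txt') as f:
    for _ in range(3):
        print(f.readline().rstrip('\n').split('\t'))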

Well, the most intuitive way to analyze data is to visualize it, so let's draw a picture and see how the data trend:

1.1 Preparing data: Importing data from a text file using Python

Create a file named "reg.py"; all of the code in this section is saved in that file. Before drawing, we need two preparation steps: reading the data and parsing it into a matrix. The code is as follows:

# coding=utf-8
__author__ = 'wing1995'

from numpy import *


def file2matrix(filename):
    f = open(filename)
    contents = f.readlines()
    length = len(contents)  # number of rows in the file
    Mat = zeros((length, 3))  # create an empty matrix for storing the file contents
    index = 0
    for line in contents:
        line = line.strip()  # remove the newline character from each line
        data = line.split('\t')
        Mat[index, :] = data  # store each column of this row at the row index of the empty matrix
        index += 1
    return mat(Mat)

Okay, now that we have the matrix, we can index it any way we like, for example to pull out the second column of data:

1 " C:\Users\wing1995\Desktop\machinelearninginaction\Ch08\ex0.txt " 2 Datamat = File2matrix (data_file)3print datamat[:, 1]

PS: Why the first column of the data is all 1 will be explained later; it is the default feature x0, which pairs with theta0 to act as the intercept term.

1.2 Analyzing data: Creating a scatter plot with matplotlib

The drawing mainly uses the following functions:

Basic plotting functions: plt.scatter(), plt.plot(), plt.bar()
Axis and title customization: plt.xlabel(), plt.ylabel(), plt.title()
Displaying and clearing figures: plt.show(), plt.clf()

The details of each plotting function can be viewed with the help command; the basics won't be repeated here. Just add the plotting function directly to the "reg.py" file:

import matplotlib.pyplot as plt


def my_scatter(dataMat):
    x = dataMat[:, 1]
    y = dataMat[:, 2]
    plt.xlabel('x')
    plt.ylabel('y')
    plt.scatter(x, y)
    plt.show()

Calling my_scatter(dataMat) gives the following: [Figure: scatter plot of the x column against the y column of ex0.txt]

It can clearly be seen that the data trend upward, and if we want to fit that trend with a curve, a straight line should do. Therefore, we give the "hypothesis function" of the fitted curve:

h(x) = theta0 + theta1 * x

So-called fitting is trying to build a function h(x) that maps the input data x to the output result y.
The sample above is a bit large, so let's take a small example with just a few points:

Now, randomly guess the two parameters, say theta0 = 2 and theta1 = 2; the hypothesis function is then h(x) = 2 + 2*x. The resulting mapping is as follows:
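The mapping table is easiest to reproduce in code. Here is a minimal sketch using made-up inputs x = 1, 2, 3, 4 (the original table's values are not preserved here, so these numbers are illustrative only):

# Map a few hypothetical inputs through the guessed hypothesis h(x) = 2 + 2*x.
theta0, theta1 = 2, 2
for x in [1, 2, 3, 4]:  # illustrative inputs; the original table's values are lost
    print('x = %d  ->  h(x) = %d' % (x, theta0 + theta1 * x))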

Looked at this way, it is not easy to tell whether our hypothesis function predicts the y values well. Hence the concept of a "cost function".

Cost function J(theta): we can measure the accuracy of a hypothesis function with a cost function, where accuracy refers to the gap between the predicted value h(x) and the true value y. Since the sample size is usually greater than one, we need to compare, over the whole sample, the value the hypothesis function produces for each input against the actual value y, and take the average of the squared prediction errors. The formula is as follows:

J(theta0, theta1) = 1/(2m) * sum over i = 1..m of (h(x(i)) - y(i))^2

As a math person this looks very familiar; engineering folks may find the formula less comfortable. Generally speaking, everyone knows the definition of a mean: the J(theta0, theta1) above is just half the mean of the squared errors, where m is the number of samples, e.g. m = 4 for the sample data in the table above. This cost function may be better known by another name: the "squared error function", or half the "mean squared error". Halving the mean is simply a convenience for the gradient descent algorithm, because the factor of 2 produced by differentiating the square cancels the 1/2 here. Another question is why the original designers of the error did not just use the raw error (letting positive and negative errors offset) instead of squaring it; this is something I have never been clear about. The ROF mixed model I studied in image processing also has this kind of sum-of-squares expression, and when my supervisor asked me why, I couldn't answer, only thinking of it as a fixed definition. If any friend knows, please explain.
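For the cancellation mentioned above, a standard one-line calculus derivation (stated here for reference) shows where the 1/2 goes when J is differentiated with respect to theta_j:

\[
\frac{\partial J}{\partial \theta_j}
= \frac{1}{2m}\sum_{i=1}^{m} 2\left(h(x^{(i)}) - y^{(i)}\right)\frac{\partial h(x^{(i)})}{\partial \theta_j}
= \frac{1}{m}\sum_{i=1}^{m}\left(h(x^{(i)}) - y^{(i)}\right) x_j^{(i)}
\]

Here x_0^(i) = 1 for every sample, which is exactly why the first column of the data file is all 1, and why no stray 1/2 appears in the gradient descent update given below.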

With this cost function, we can compute the fitting accuracy of the hypothesis function from the table above. But then the problem comes: the theta values in the hypothesis function above were also just our guesses, and for large sample data a subjectively chosen theta will often not fit accurately enough. How do we solve for the optimal theta that gives the best-fitting hypothesis function?

Gradient descent algorithm: we now have a hypothesis function and a way to measure its accuracy (the cost function). What we need next is a way to improve the hypothesis function, and that is the gradient descent method.

1.3 Drawing a diagram in Python to better understand the cost function J(theta)
1.3.1 Implementing the cost function in code

def computeCost(X, y, m, theta):
    pre = X * theta  # predicted values
    s = 0
    for i in range(m):
        s += (pre[i] - y[i]) ** 2
    J = 1.0 / (2 * m) * s  # cost function (1.0 avoids integer division in Python 2)
    return J


X = dataMat[:, 0:2]
y = dataMat[:, 2]
m = len(y)  # number of samples
theta = zeros((2, 1))  # initialize theta
iterations = 1500  # number of iterations

J = computeCost(X, y, m, theta)
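As a side note, since X and theta are NumPy matrices here, the same cost can be computed without the explicit loop. A minimal sketch (the name computeCostVec is my own, not from the original):

# Vectorized equivalent of computeCost, assuming X, y, theta are NumPy matrices.
def computeCostVec(X, y, m, theta):
    err = X * theta - y  # (m, 1) column vector of prediction errors
    return float(err.T * err) / (2 * m)  # (1/2m) * sum of squared errors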

The code above implements the cost function. Since we initialize theta to zero, this first J is just the cost of the all-zero hypothesis h(x) = 0; next we need the gradient descent algorithm to actually compute theta0 and theta1. So first, here is a plot I drew in MATLAB for a Coursera assignment (on different data); the Python code to produce this kind of diagram will follow:

[Figure: cost function surface/contour plot from a Coursera assignment, drawn in MATLAB]

What the figure shows is theta0 and theta1 converging, over the whole iteration (J0 -> J3), toward the best hypothesis function: starting from the initial values theta0 = theta1 = 0 and iterating repeatedly down to the optimum J3. The red cross is the global optimum of J(theta), i.e. the corresponding optimal theta values. At that point the cost is smallest and the prediction is best; substituting that theta into the hypothesis function gives exactly the regression function we need, the one with the best fit.

The step from J0 to J1 comes from taking the partial derivatives of the cost function J(theta0, theta1) with respect to theta0 and theta1: the slope from J0 to J1 is given by those partial derivatives. It is like a person walking downhill: the slope picks the direction and angle of descent, theta0 and theta1 are two little people walking down together, and the stride length is the learning rate alpha. So whether the two little people reach the bottom of the slope is determined by their starting position (here initialized to 0), the direction of descent (the partial derivatives), and the stride (the learning rate alpha).

In general, the gradient descent update is:
Repeat until convergence:

    theta_j := theta_j - alpha * dJ(theta0, theta1)/dtheta_j,  for j = 0 and j = 1 (updating both simultaneously)
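For readers who can't wait for the next post, here is a minimal sketch of batch gradient descent for this problem, written to match the computeCost interface above (the function name gradientDescent and the alpha value 0.01 are my own choices, not from the original):

# Minimal batch gradient descent sketch for h(x) = theta0 + theta1 * x.
# X is an (m, 2) matrix whose first column is all 1 (the x0 feature),
# y is an (m, 1) matrix, alpha is the learning rate.
def gradientDescent(X, y, m, theta, alpha, iterations):
    for _ in range(iterations):
        err = X * theta - y           # (m, 1) prediction errors
        grad = X.T * err / m          # (2, 1) partial derivatives of J
        theta = theta - alpha * grad  # simultaneous update of theta0 and theta1
    return theta

theta = gradientDescent(X, y, m, mat(zeros((2, 1))), 0.01, iterations)
print(computeCost(X, y, m, theta))  # the cost should now be much smaller than for theta = 0

Plotting plt.plot(X[:, 1], X * theta) on top of the earlier scatter plot should show the fitted straight line.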

Gee, readers are probably tired by now, and I have exams coming up, so I'll write this much first; the concrete implementation code of the gradient descent algorithm will be given in the next post ~

Python machine learning "regression One"

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.