There are plenty of articles online that explain the principles of linear regression, but very few provide reference code. Here I implement it directly in Python; later I will also implement neural networks, regression trees, and other kinds of machine learning algorithms.
This is a first small exercise. My writing may not be the clearest, so please bear with me.
Briefly, my own understanding: training a linear regression means that, given a set of feature values ([x1, x2, x3, x4, ..., xn]) and the corresponding result Y for each sample, the goal is to learn a weight vector W. Figure one shows the original training data set (of course a real training set is much larger than this; it is drawn here only for intuition). We append a column of all 1s, and then the trained weights W (how to train them is explained below) times the data matrix give the predicted results. Here the original data set and the trained W are used to compute the cost of the model later; of course, given a set of new data, you can use W to predict its results the same way.
Figure two: calculating the Y values with the weights
Figure three: the Y values of the original data
This expression is the general form.
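Written out explicitly (standard notation, not copied from the post's figures), the hypothesis for a single sample x with x0 = 1 prepended is:

h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \cdots + \theta_n x_n = \theta^T x, \qquad x_0 = 1

which is exactly why appending the column of 1s to X lets the whole data set be predicted in one matrix product, h = X\theta.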
In this example, our predictive model is the hypothesis, and J is the cost function; we start with the non-regularized J and then use regularization to eliminate overfitting. (What is overfitting? Simply put, the performance on the training set is too good: the trained model predicts the training set 100% correctly. That is a problem, because some values in the original data are just noise, and if your model cannot rule them out, it will predict wrongly on future data sets. In other words, the model has no fault tolerance. We want a model that generalizes well, not one that only fits the training set.)
Note that all theta values must be updated at the same time. The non-regularized J cannot prevent overfitting, so we use the regularized formula below.
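Concretely, the simultaneous update is the standard gradient-descent rule (alpha is the learning rate; the grad column vector computed in the code below stacks all these partial derivatives):

\theta_j := \theta_j - \alpha \, \frac{\partial J(\theta)}{\partial \theta_j}, \qquad \text{for all } j = 0, 1, \ldots, n \text{ at once}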
Our goal is to make J as close to 0 as possible, which amounts to making the error on the data set as small as possible. Observe the expression of the predictive function: the variable multiplying theta0 is fixed at 1, and to give the derivation of J a uniform form, the X data set is padded with a column of 1s; my implementation follows this form. The code below for J is the cost function. Note that the regularization does not include theta0. (I understand this by a limiting argument: if lambda is very large, then all the other thetas are forced small, leaving theta0 to keep the prediction reasonable. Put another way, theta0 sets a basic level, similar to an average, and the other thetas fine-tune around it.)
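For reference, the regularized cost and its gradient in standard form (the regularization sum starts at j = 1, skipping theta0, which is exactly what regtheta[0,0] = 0 enforces in the code):

J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]

\frac{\partial J}{\partial \theta_j} = \frac{1}{m}\left[\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\,x_j^{(i)} + \lambda\,\theta_j\right] \quad (\text{the } \lambda\theta_j \text{ term is omitted for } j = 0)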
<pre name= "code" class= "Python" >def countcostfunc (X, y, theta, Lamb): #X为训练集, y means that the Lable,theta on the training set represents weights, Lamb is a regular constant #注意这里的X, theta have been expanded, X added a column 1,theta added theta0, this is mainly for the uniform operation of the vector m, n = Np.shape (X) #X矩阵行m, column n H = Np.dot (X, theta) #举证相乘 Regtheta = np.copy (theta) #深拷贝, because, Python is a reference, regtheta[0,0] = 0 J = ((Np.dot ( H-Y). Transpose (), (h-y)) + Lamb * Np.dot (Regtheta.transpose (), Regtheta))/float (2 * m)) grad = (1 / Float (M)) * (Np.dot (X.transpose (), h-y) + Lamb * regtheta) #对所有theta求导, this is the J function to the Theta respectively after the derivation, and then into a column vector return J, grad
# theta already includes theta0, but X does not yet have the extra column of 1s.
# Uses a trained model to predict results.
def predict(X, theta):
    X = np.array(X)
    m, n = np.shape(X)
    X = np.hstack([np.ones((m, 1)), X])  # add the column of 1s to X
    h = np.dot(X, theta)
    return h
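As a quick sanity check of predict (a made-up toy call, not from the original post):

import numpy as np
theta = np.array([[1.0], [2.0]])  # hypothetical weights: theta0 = 1, theta1 = 2
X = np.array([[0], [1], [2]])     # three samples with one feature each
print predict(X, theta)           # -> [[1.], [3.], [5.]], i.e. 1 + 2*x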
The following Python code is a complete linear regression example, explained through the code. Note that the theta below is the W weight vector mentioned earlier. For easy visualization I temporarily generate a simple data set on the fly instead of using an 80M data set. The code starts with some imported files and libraries (details later); it targets Python 2.7.
# coding=utf-8
import numpy as np
import granddescent
import costfunction
import loaddata
import predict
import write2file
import rmse
import os
import random
import matplotlib.pyplot as plt
from pylab import plot, show
from scipy import stats
Generate a data set whose horizontal axis is x and whose vertical axis is Y:
def handle_data_self():
    x_train = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]).reshape(10, 1)
    y_train = np.array([0, 2, 3, 4, 5, 6, 7, 4, 9, 10]).reshape(10, 1)
    x_cross = np.copy(x_train)
    y_cross = np.copy(y_train)
    x_test = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 23, 1]).reshape(14, 1)
    plt.scatter(x_train, y_train, color="red")
    # plt.show()
    return x_train, y_train, x_cross, y_cross, x_test
Finally, the main program, main.py:
<pre name= "code" class= "Python" > #coding =utf-8import numpy as NP #系统库import Granddescentimport Costfunctionimport Loaddataimport predictimport write2fileimport rmseimport os #库 import random #库import matplotlib.pyplot as PLT #库def Regr Ession_simple (X, y, X1, y1, option): Alpha = float (option["alpha"]) maxcycles = Int (option["maxcycle"]) Lamb = FL Oat (option["lamb"]) Save = bool (option["Saverecord"]) add = bool (option["add"]) Optgoal = option["Optgoal"] me Thods = option["Method"] Thetapath = option["Thetawritepath"] if Option.has_key ("Thetawritepath") Else None m,n = NP . Shape (X) J_train = None theta = None if methods = = "Stocgraddescent": j_train, theta = GRANDDESCENT.STOCG Raddescent (Np.copy (X), y, maxcycles, alpha, lamb) elif methods = = "Granddescent": j_train, theta = granddescent. Granddescent (Np.copy (X), y, maxcycles, alpha, lamb) J_cross = Costfunction.countcostfunc (Np.hstack ([Np.ones (X1.shape [0], 1)), X1]), Y1, Theta,Lamb) Rmseresult = Rmse.countrmse (Np.copy (X1), Y1,theta) #后面都是保存一些东西 if Save and Thetapath:if not add and OS. Path.exists (Thetapath): Os.remove (thetapath) File_object = None if not Add:file_objec t = open (Thetapath, ' W ') elif os.path.exists (thetapath): File_object = open (Thetapath, ' a ') Else: File_object = open (Thetapath, ' W ') file_object.write ("j_traincost=>" + str (j_train) + ", j_crosscost=& gt; "+ str (j_cross[0]) +",alpha=> "+ str (alpha) +",lamb=> "+ str (Lamb) +", cycles=> "+ str (maxcycles) +", RM Se=> "+ str (rmseresult) +" \ n "+",theata==> "+str (Theta.transpose ()) +" \ n ") file_object.close () J = None if optgoal = = "J_train": j = j_train elif Optgoal = = "J_cross": j = j_cross elif Optgoal = = "Rmse": J = Rmseresult else:print "Optgoal fault!!!" Return J, Thetadef handle_data_self (): X_train = Np.array ([0,1,2,3,4,5,6,7,8, 9]). Reshape (1) y_train = Np.array ([0,2,3,4,5,6,7,4,9,10]). Reshape (1) X_cross = Np.copy (x_train) Y_cross = Np.copy (y_train) x_test = Np.array ([0,1,2,3,4,5,6,7,8,9, 10,11,23,1]). Reshape (14,1) plt.scatter (X_train, Y_train, Color= "Red") plt.show () return x_train,y_train,x_cross,y_cross,x_testif __name__ = = "__main__": print "load data. .." X_train, Y_train, X_cross, y_cross, x_test = handle_data_self () print "Load data finished" print "Enter traing ..." ac tion = "regression_granddescent" J = none theta = None if action = = "Regression_granddescent": option = {" Maxcycle ":", "alpha": 0.05, "lamb": 0.001, "Saverecord": 1, "Thetawritepath": "./thetasave.txt", "add" : 1, "Optgoal": "J_train", "Method": "Granddescent"} J, theta = Regression_simple (X_train.copy (), Y_ Train.copy (), X_cross.copy (), y_cross.copy (), option) Plt.plot (X_train, Predict.predict (X_train, theta), color= "Gre En ") plt.sHow () print "Complete traing" if action = = "Regression_granddescent" or action = = "Regression_granddescentwithbestalphaa Ndlamb "or action = =" Regression_stocgraddescent ": Y_pre = predict.predict (x_test, theta) write2file.savepre Diction (Y_pre, path= "Sample_submission.csv") print "forecast complete", "Save results to Sample_submission.csv"
where

    x_train, y_train, x_cross, y_cross, x_test = handle_data_self()

returns, in order: the original training set X, the training labels y, the cross-validation set X, the cross-validation labels y, and the test set X.

Original training set X: these are the feature values of your training samples. Note that the x0 column of 1s is not added here; it is added elsewhere. The column is added so that the derivative calculations later have a uniform form.
granddescent.py file
<pre name= "code" class= "Python" > #coding =utf-8import numpy as Npimport operatorimport copyimport mathimport Costfunctiondef granddescent (x, y, maxcycles, alpha, Lamb): m,n = Np.shape (x) theta = Np.zeros ((n + 1, 1)) X = Np.hstack ([Np.ones ((m,1)), X]) j = 0 for i in range (Maxcycles): j, Grad = Costfunction.countcostfunc ( Np.copy (X), Y, Np.copy (theta), lamb) #print "J", I+1, "sub-==>", j theta = Theta-alpha * Grad return J, Thet A
costfunction.py file
<pre name= "code" class= "Python" > #coding =utf-8import numpy as Npimport operatorimport copyimport mathdef Countcostfunc (x, Y, theta, Lamb): m, n = np.shape (x) h = Np.dot (x, theta) Regtheta = np.copy (theta) regtheta[0,0] = 0 J = ((Np.dot ((h-y). Transpose (), (h-y)) + Lamb * Np.dot (Regtheta.transpose (), Regtheta))/ Float (2 * m)) grad = (1 /float (m)) * (Np.dot (X.transpose (), h-y) + Lamb * regtheta) return J, GRA D
rmse.py file
# coding=utf-8
import numpy as np
import operator
import copy
import math
import costfunction

def countrmse(X, y, theta):
    m, n = np.shape(X)
    X = np.hstack([np.ones((m, 1)), X])              # expand X with the column of 1s
    h = np.dot(X, theta)
    sumdiffy = np.dot((h - y).transpose(), (h - y))  # sum of squared errors
    j = sumdiffy / float(m)
    return math.sqrt(j)
predict.py file
# coding=utf-8
import numpy as np
import costfunction

# theta already includes theta0, but X does not yet have the extra column of 1s
def predict(X, theta):
    X = np.array(X)
    m, n = np.shape(X)
    X = np.hstack([np.ones((m, 1)), X])
    h = np.dot(X, theta)
    return h
write2file.py file
def saveprediction(y_pre, path):
    file_object = open(path, 'w')
    file_object.write("id,reference\n")
    for i in range(len(y_pre)):
        file_object.write(str(i) + "," + str(y_pre[i][0]) + "\n")
    file_object.close()  # flush and close the output file
The final results are as follows: the red dots are the original data set, and the green line is the prediction computed with the trained theta.
This post used gradient descent in Python to implement regularized linear regression. To summarize some important points:
1. I do not preprocess the data here, because the feature values already share the same order of magnitude. When they do not, there are two common scaling methods (a sketch of both follows after this item): the first, most commonly used, transforms every feature to mean 0 and variance 1 (x = (x - column mean) / column standard deviation); the second maps it roughly to between -0.5 and +0.5 (x = (x - column mean) / (max - min), taking the minimum and maximum of each column and then transforming each value).
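A minimal sketch of the two scalings (my illustration, not from the original code; assumes X is an m-by-n numpy array with one feature per column and no constant columns):

import numpy as np

def standardize(X):
    # method 1: each column to mean 0, variance 1
    return (X - X.mean(axis=0)) / X.std(axis=0)

def mean_normalize(X):
    # method 2: each column to roughly [-0.5, +0.5]
    return (X - X.mean(axis=0)) / (X.max(axis=0) - X.min(axis=0))

Remember to apply the same column means and scales, computed on the training set, to the cross-validation and test sets.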
2. Adding a column of 1s to the data set and a theta0 to theta lets the gradient be computed uniformly, without treating theta0 as a special case.
3. Besides batch gradient descent (for small-scale data), there is also stochastic gradient descent (suitable for large-scale data, say on the order of GB or more; it is not shown here and will be given later if needed). Linear algebra can also solve X * theta = Y directly via the normal equation, theta = (X^T X)^{-1} X^T y (a sketch follows this item). This method has certain restrictions (X^T X must be invertible, which regularization can fix), and because inverting a very large matrix is expensive and numerically error-prone, it is generally not used.
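A minimal sketch of the normal-equation solution (my illustration, not part of the original code; regularization is added in the standard way, with theta0 excluded via the zero in the top-left of the penalty matrix):

import numpy as np

def normal_equation(X, y, lamb=0.0):
    m, n = X.shape
    X = np.hstack([np.ones((m, 1)), X])   # same column of 1s as elsewhere
    penalty = lamb * np.eye(n + 1)
    penalty[0, 0] = 0                     # do not regularize theta0
    # solves (X^T X + penalty) * theta = X^T y without an explicit inverse
    return np.linalg.solve(np.dot(X.T, X) + penalty, np.dot(X.T, y))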
4. The regularization uses the L2 square of theta, which makes the derivative easy to compute. There is also L1, which uses the absolute value; its effect can be better, but its derivative is awkward, so it is not covered here.
5. Here there are only two weights, theta0 and theta1. A slightly larger problem may have tens of thousands of them, and the predicted value is still their linear weighted combination.
6. Another key point is that setting alpha and lambda is empirical; you generally try combinations such as the following (a grid-search sketch follows):
alphas = [0.1, 0.05, 0.01, 0.005, 0.001, 0.0005, 0.0001]
lambs = [1, 0.1, 0.01, 0.001]
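A minimal grid-search sketch (my illustration, not from the original post; it reuses regression_simple and the data from main.py above, selects by cross-validation RMSE, and again uses a placeholder iteration count):

best_j, best_theta, best_pair = None, None, None
for alpha in alphas:
    for lamb in lambs:
        option = {"maxcycle": 500,  # placeholder iteration count
                  "alpha": alpha, "lamb": lamb, "saverecord": 0, "add": 0,
                  "optgoal": "rmse", "method": "granddescent"}
        j, theta = regression_simple(x_train.copy(), y_train.copy(),
                                     x_cross.copy(), y_cross.copy(), option)
        if best_j is None or j < best_j:
            best_j, best_theta, best_pair = j, theta, (alpha, lamb)
print "best (alpha, lamb):", best_pair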
7. Parallelization is not covered yet; industrial use requires distributed training so models can be trained in limited time. Since the linear model is simple and prediction is fast, it is still a mainstream algorithm.