Fitting golf distance and accuracy data: the gradient descent method in detail


Dataset

The dataset pga.csv used in this article contains driving statistics for professional golfers, with two attributes: accuracy and distance. Accuracy is the fraction of fairways hit, and distance is the average driving distance off the tee. Our goal is to predict accuracy from distance; in golf, the farther a player drives the ball, the less accurate the drive tends to be.

    • For many machine learning algorithms the input data is pre-processed first, for example by normalization, because when the distance between two feature vectors is computed, a feature with a large numeric range will dominate the result. Here accuracy is a percentage while distance is measured in yards, so the two features are on very different scales. The normalization used here subtracts the mean of each feature and divides by its standard deviation.
import pandas
import matplotlib.pyplot as plt

pga = pandas.read_csv("pga.csv")

# Normalize the data. A DataFrame column can be accessed with the .column_name
# attribute syntax, which returns a Series (ndarray-like) of that column.
pga.distance = (pga.distance - pga.distance.mean()) / pga.distance.std()
pga.accuracy = (pga.accuracy - pga.accuracy.mean()) / pga.accuracy.std()
print(pga.head())

plt.scatter(pga.distance, pga.accuracy)
plt.xlabel('normalized distance')
plt.ylabel('normalized accuracy')
plt.show()

'''
   distance  accuracy
0  0.314379 -0.707727
1  1.693777 -1.586669
2 -0.059695 -0.176699
3 -0.574047  0.372640
4  1.343083 -1.934584
'''

Linear Model
    • Looking at the scatter plot, we can see that distance and accuracy are negatively correlated. First, scikit-learn's basic LinearRegression is used to fit the data:
from sklearn.linear_model import LinearRegression
import numpy as np

# We can add a dimension to an array by using np.newaxis
print("Shape of the series:", pga.distance.shape)
# pga.distance is a one-dimensional array, so we add a second axis explicitly
print("Shape with newaxis:", pga.distance[:, np.newaxis].shape)
'''
Shape of the series: (197,)
Shape with newaxis: (197, 1)
'''

# The X variable in LinearRegression.fit() must have 2 dimensions
lm = LinearRegression()
lm.fit(pga.distance[:, np.newaxis], pga.accuracy)
theta1 = lm.coef_[0]
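    • To see the fit against the data, the slope and intercept of the model can be reused to overlay the regression line on the scatter plot (a small follow-up sketch, not in the original):

# Overlay the fitted regression line on the normalized scatter plot (added sketch)
theta0 = lm.intercept_
plt.scatter(pga.distance, pga.accuracy)
plt.plot(pga.distance, theta0 + theta1 * pga.distance, color='red')
plt.xlabel('normalized distance')
plt.ylabel('normalized accuracy')
plt.show()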
Cost Function, Introduction
    • The linear regression above uses the scikit-learn package to estimate the model parameters by the least squares method. Least squares fits a linear model efficiently and provides exact parameter values. However, when the design matrix is very large, that is, when the training set is too big, solving directly with matrix operations is no longer practical, and an iterative method is needed to estimate the parameters. Gradient descent is a commonly used iterative algorithm.
# The cost function of a single-variable linear model
def cost(theta0, theta1, x, y):
    # Initialize cost
    J = 0
    # The number of observations
    m = len(x)
    # Loop through each observation
    for i in range(m):
        # Compute the hypothesis
        h = theta1 * x[i] + theta0
        # Add to cost
        J += (h - y[i])**2
    # Average and normalize cost
    J /= (2*m)
    return J

# The cost for theta0=0 and theta1=1
print(cost(0, 1, pga.distance, pga.accuracy))

theta0 = 100
theta1s = np.linspace(-3, 2, 100)
costs = []
for theta1 in theta1s:
    costs.append(cost(theta0, theta1, pga.distance, pga.accuracy))

plt.plot(theta1s, costs)
plt.show()

    • The code above fixes the intercept at 100 and sweeps the slope over 100 equally spaced values from -3 to 2. For each slope it evaluates the error of the corresponding model using the cost formula J(theta0, theta1) = (1/(2m)) * sum_i (theta0 + theta1*x_i - y_i)^2, and then plots the error against the slope. The error turns out to be smallest at a slope of roughly -0.7.

import numpy as np
from mpl_toolkits.mplot3d import Axes3D

theta0s = np.linspace(-2, 2, 100)
theta1s = np.linspace(-2, 2, 100)
COST = np.empty(shape=(100, 100))

# t0s: 100 rows (the length of theta1s) of copies of theta0s
# t1s: 100 columns (the length of theta0s) of copies of theta1s
t0s, t1s = np.meshgrid(theta0s, theta1s)

# For each parameter combination compute the cost
for i in range(100):
    for j in range(100):
        COST[i, j] = cost(t0s[0, i], t1s[j, 0], pga.distance, pga.accuracy)

# Make a 3D surface plot
fig2 = plt.figure()
ax = fig2.gca(projection='3d')
ax.plot_surface(X=t0s, Y=t1s, Z=COST)
plt.show()

    • The code above first uses a new function, np.meshgrid, which takes two arrays, the first of length m and the second of length n. It returns two arrays of shape (n, m): the first array repeated as n rows, and the second array repeated as m columns. For example:
      x = [1,2,3], y = [5,6]  gives  X = [[1,2,3],[1,2,3]], Y = [[5,5,5],[6,6,6]].
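    • As a quick check of how np.meshgrid behaves, here is a small runnable example (added for illustration, not from the original article):

import numpy as np

x = np.array([1, 2, 3])
y = np.array([5, 6])
X, Y = np.meshgrid(x, y)
print(X)   # [[1 2 3]
           #  [1 2 3]]
print(Y)   # [[5 5 5]
           #  [6 6 6]]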

    • The code then evaluates the cost over this grid of parameter pairs and draws a 3D surface of the error as a function of both coefficients. The lowest point on the surface corresponds to the optimal parameters.

Cost Function, slopes

What are the differences between the least squares method and the gradient descent method?

1. Essentially the same: both methods start from the given known data (independent and dependent variables) and compute an estimation function for the dependent variable, which is then used to predict the dependent variable for new data.
2. The goal is the same: within the framework of the known data, make the total squared difference between the estimates and the actual values as small as possible (in fact, the loss does not necessarily have to be the squared error).
3. The implementation and the results differ: least squares finds the global minimum directly by differentiation and is non-iterative. Gradient descent is an iterative method: it starts from an initial parameter vector, repeatedly adjusts the parameters in the direction in which the error decreases fastest, and after a number of iterations reaches a local minimum. Its drawbacks are that convergence slows down near the minimum and that it is sensitive to the choice of the starting point; most improvements to the method target these two issues.

Of course, gradient descent has other uses as well, for example in other extremum problems. Newton's method is also a good alternative; it converges in fewer iterations than gradient descent, but each iteration is computationally more expensive.
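    • For a concrete comparison, the least squares solution of this one-variable model can also be written in closed form with the normal equation theta = (X^T X)^(-1) X^T y. The following is a minimal sketch (not part of the original article) that applies it to the normalized pga data and should roughly reproduce the slope found by LinearRegression:

# Closed-form least squares via the normal equation (added sketch, assuming the
# normalized pga DataFrame from the code above is available)
import numpy as np

X = np.column_stack([np.ones(len(pga)), pga.distance])   # add an intercept column
y = pga.accuracy.values
theta = np.linalg.inv(X.T @ X) @ X.T @ y
print("intercept, slope:", theta)   # should roughly match LinearRegression / gradient descent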

    • Gradient descent method: initialize the model parameters to some starting value w0, then take the partial derivative of the error with respect to each parameter. The negative of the partial derivative is the direction of steepest descent, so each parameter is updated by subtracting its partial derivative, scaled by a learning rate (the step size): theta_j := theta_j - alpha * dJ/dtheta_j.

    • Compute the partial derivative of the cost with respect to the first parameter, theta0:
# Partial derivative of cost with respect to theta0
def partial_cost_theta0(theta0, theta1, x, y):
    # Hypothesis
    h = theta0 + theta1*x
    # Difference between hypothesis and observation
    diff = (h - y)
    # Compute partial derivative
    partial = diff.sum() / (x.shape[0])
    return partial

partial0 = partial_cost_theta0(1, 1, pga.distance, pga.accuracy)
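    • The gradient descent code below also calls a helper partial_cost_theta1() that is not shown in the article. A minimal sketch, written to be consistent with partial_cost_theta0() (the partial derivative with respect to theta1 is the average of (h - y) * x), might look like this:

# Partial derivative of cost with respect to theta1 (added sketch; the original
# article does not show this helper, but the gradient descent code below uses it)
def partial_cost_theta1(theta0, theta1, x, y):
    # Hypothesis
    h = theta0 + theta1 * x
    # (Hypothesis minus observation) times x
    diff = (h - y) * x
    # Average over all observations to get the partial derivative
    partial = diff.sum() / (x.shape[0])
    return partial

partial1 = partial_cost_theta1(0, 5, pga.distance, pga.accuracy)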
Gradient Descent algorithm
    • From the previous visualization we can see that different slopes and intercepts produce different errors; to reduce the model's error we must find the parameter values that minimize the error function. The code below implements gradient descent in full, combining the functions written earlier: the cost function cost() and the partial-derivative functions partial_cost_theta0() and partial_cost_theta1(). This is a good pattern when building a larger project: write small functions first, then stitch them together into the finished routine.
# x is our feature vector -- distance
# y is our target variable -- accuracy
# alpha is the learning rate
# theta0 is the initial theta0
# theta1 is the initial theta1
def gradient_descent(x, y, alpha=0.1, theta0=0, theta1=0):
    max_epochs = 1000   # Maximum number of iterations
    counter = 0         # Initialize a counter
    c = cost(theta0, theta1, x, y)   # Initial cost
    costs = [c]         # Let's store each update
    # Set a convergence threshold to find where the cost function is minimized.
    # When the difference between the previous cost and the current cost
    # is less than this value, we'll say the parameters have converged.
    convergence_thres = 0.000001
    cprev = c + 10
    theta0s = [theta0]
    theta1s = [theta1]

    # Keep updating until the costs converge or we hit the maximum number of iterations
    while (np.abs(cprev - c) > convergence_thres) and (counter < max_epochs):
        cprev = c
        # Alpha times the partial derivative is our update
        update0 = alpha * partial_cost_theta0(theta0, theta1, x, y)
        update1 = alpha * partial_cost_theta1(theta0, theta1, x, y)

        # Update theta0 and theta1 at the same time:
        # we want to compute the slopes at the same set of hypothesised parameters,
        # so we apply the updates only after finding both partial derivatives
        theta0 -= update0
        theta1 -= update1

        # Store thetas
        theta0s.append(theta0)
        theta1s.append(theta1)

        # Compute the new cost
        c = cost(theta0, theta1, x, y)

        # Store updates
        costs.append(c)
        counter += 1   # Count

    return {'theta0': theta0, 'theta1': theta1, "costs": costs}

print("Theta1 =", gradient_descent(pga.distance, pga.accuracy)['theta1'])
descend = gradient_descent(pga.distance, pga.accuracy, alpha=.01)
plt.scatter(range(len(descend["costs"])), descend["costs"])
plt.show()
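    • As a sanity check (an added sketch, not part of the original article), the slope found by gradient descent should be close to the least squares slope produced by LinearRegression earlier:

# Compare the gradient descent slope with the least squares slope (added sketch)
result = gradient_descent(pga.distance, pga.accuracy, alpha=0.01)
print("gradient descent theta1:", result['theta1'])
print("LinearRegression theta1:", lm.coef_[0])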
