Simple linear regression implemented with PHP, Part 2

Source: Internet
Author: User
In the first part of this two-part series ("Simple linear regression with PHP"), I explained why a math library is useful for PHP. I also demonstrated how to develop and implement the core of a simple linear regression algorithm using PHP as the implementation language.

The goal of this article is to show you how to use the SimpleLinearRegression class discussed in Part 1 to build a substantial data research tool.

Brief review: Concept

The basic goal of simple linear regression modeling is to find the best-fitting straight line through a two-dimensional plane of paired X and Y values (that is, X and Y measurements). Once the line is found with the least-squares method, various statistical tests can be performed to determine how well the line fits the observed deviation in the Y values.

A linear equation (y = mx + b) has two parameters that must be estimated from the supplied X and Y data: the slope (m) and the y-intercept (b). Once these two parameters are estimated, you can plug observed X values into the linear equation and examine the predicted Y values that the equation generates.

To estimate the m and b parameters with the least-squares method, you need to find the estimates of m and b that minimize the difference between the observed and predicted Y values across all X values. The difference between an observed and a predicted value is called the error (y_i - (mx_i + b)). If you square each error value and then sum these residuals, the result is a number called the squared error of prediction. Using the least-squares method to determine the best-fitting line means looking for the estimates of m and b that minimize this squared error of prediction.
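The squared error described above can be computed directly. The following is a minimal sketch, assuming PHP 7 or later; the function name sumSquaredErrors and the sample data are made up for illustration and do not come from the SimpleLinearRegression class.

```php
<?php
// Hypothetical helper (not from the SimpleLinearRegression class):
// sum of squared errors for the line y = m*x + b over paired observations.
function sumSquaredErrors(array $x, array $y, float $m, float $b): float
{
    $sse = 0.0;
    foreach ($x as $i => $xi) {
        $error = $y[$i] - ($m * $xi + $b); // observed minus predicted
        $sse += $error * $error;
    }
    return $sse;
}

// Made-up data for illustration only.
$x = [1, 2, 3, 4];
$y = [2, 4, 5, 8];

echo sumSquaredErrors($x, $y, 2.0, 0.0), "\n"; // candidate line y = 2x
echo sumSquaredErrors($x, $y, 1.9, 0.0), "\n"; // candidate line y = 1.9x
```

For this data the second candidate gives the smaller total, so it is the better fit of the two lines tried.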

There are two basic ways to find the estimates of m and b that satisfy the least-squares criterion. In the first approach, you use a numerical search procedure that tries different values of m and b, evaluates them, and eventually settles on the estimates that yield the smallest squared error. The second approach is to use calculus to derive equations for estimating m and b. I won't go into the calculus involved in deriving these equations, but I did use these analytic equations in the SimpleLinearRegression class to find the least-squares estimates of m and b (see the getSlope() and getYIntercept() methods of the SimpleLinearRegression class).
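For readers who want to see the analytic approach in code, here is a minimal sketch using the standard textbook least-squares formulas (slope = Sxy / Sxx, intercept = mean of Y minus slope times mean of X). It assumes PHP 7 or later. The function name estimateLine and the data are hypothetical, and the sketch is not the code of the SimpleLinearRegression class.

```php
<?php
// Textbook least-squares estimates; an illustrative sketch, not the
// code of the SimpleLinearRegression class from Part 1.
function estimateLine(array $x, array $y): array
{
    $n = count($x);
    $meanX = array_sum($x) / $n;
    $meanY = array_sum($y) / $n;

    $sxy = 0.0; // sum of cross-products of deviations
    $sxx = 0.0; // sum of squared deviations of x
    foreach ($x as $i => $xi) {
        $sxy += ($xi - $meanX) * ($y[$i] - $meanY);
        $sxx += ($xi - $meanX) ** 2;
    }

    $m = $sxy / $sxx;          // slope
    $b = $meanY - $m * $meanX; // y-intercept
    return ['slope' => $m, 'intercept' => $b];
}

$fit = estimateLine([1, 2, 3, 4], [2, 4, 5, 8]); // made-up data
printf("y = %.2fx + %.2f\n", $fit['slope'], $fit['intercept']);
```

A numerical search would reach roughly the same estimates by trial and error; the closed-form version gets there in a single pass over the data.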

Even if you have equations that find the least-squares estimates of m and b, it does not follow that plugging these parameters into a linear equation yields a straight line that fits the data well. The next step in the simple linear regression process is to determine whether the remaining squared prediction error is acceptable.

You can use a statistical decision procedure to reject the alternative hypothesis that a straight line fits the data. This procedure is based on computing a T statistic and using a probability function to find the probability of randomly observing a value that large. As mentioned in Part 1, the SimpleLinearRegression class generates a number of summary values, one of which is the T statistic, which can be used to measure how well a linear equation fits the data. If the fit is good, the T statistic tends to be large. If the T value is small, you should replace your linear equation with a default model that assumes the mean of the Y values is the best predictor (because the mean of a set of values is often a useful predictor of the next observation).

To test whether the T statistic is large enough that you should not use the mean of the Y values as the best predictor, you need to compute the probability of obtaining that T statistic by chance. If the probability is low, you can reject the null hypothesis that the mean is the best predictor and, correspondingly, be confident that a simple linear model fits the data well. (For more information on computing the probability of a T statistic, see Part 1.)
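As a rough illustration of where a T statistic for the slope comes from, the sketch below divides the estimated slope by its standard error, using the standard textbook formula SE = sqrt((SSE / (n - 2)) / Sxx). It assumes PHP 7 or later. The function name slopeTStatistic and the data are hypothetical, and the SimpleLinearRegression class from Part 1 may organize this calculation differently.

```php
<?php
// Illustrative sketch: t statistic for the slope of a fitted line.
// Standard textbook formulas; not necessarily how Part 1 computes it.
function slopeTStatistic(array $x, array $y): float
{
    $n = count($x);
    $meanX = array_sum($x) / $n;
    $meanY = array_sum($y) / $n;

    $sxy = 0.0;
    $sxx = 0.0;
    foreach ($x as $i => $xi) {
        $sxy += ($xi - $meanX) * ($y[$i] - $meanY);
        $sxx += ($xi - $meanX) ** 2;
    }
    $m = $sxy / $sxx;          // least-squares slope
    $b = $meanY - $m * $meanX; // least-squares intercept

    $sse = 0.0; // sum of squared errors around the fitted line
    foreach ($x as $i => $xi) {
        $sse += ($y[$i] - ($m * $xi + $b)) ** 2;
    }

    // Standard error of the slope, with n - 2 degrees of freedom.
    $standardError = sqrt(($sse / ($n - 2)) / $sxx);
    return $m / $standardError;
}

printf("t = %.3f\n", slopeTStatistic([1, 2, 3, 4], [2, 4, 5, 8])); // made-up data
```

The resulting value would then be handed to a probability function for Student's t with n - 2 degrees of freedom to decide whether it is large enough.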

Returning to the statistical decision procedure: it tells you when to reject the null hypothesis, but it does not tell you whether to accept the alternative hypothesis. In a research setting, the linear model hypothesis needs to be established through both theoretical and statistical arguments.

You will build a data research tool that implements the statistical decision procedure for a linear model (the T test) and provides summary data that can be used to construct the theoretical and statistical arguments needed to establish a linear model. The data research tool can be classified as a decision support tool for knowledge workers who study patterns in small to medium-sized data sets.

From a learning point of view, simple linear regression modeling is worth studying because it is the gateway to understanding more advanced forms of statistical modeling. For example, many of the core concepts in simple linear regression lay a good foundation for understanding multiple regression, factor analysis, and time series analysis.

Simple linear regression is also a versatile modeling technique. You can use it to model curvilinear data by transforming the raw data, typically with a logarithmic or power transformation. These transformations can linearize the data so that simple linear regression can be used to model it. The resulting linear model is expressed as a linear formula relating the transformed values.
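To make the idea of linearizing data concrete, the following sketch fits an exponential curve y = a * exp(b * x) by regressing ln(y) on x, since ln(y) = ln(a) + b * x is a straight line. It assumes PHP 7 or later. The helper fitLine, the variable names, and the synthetic data are all made up for illustration.

```php
<?php
// Hypothetical helper: textbook least-squares fit of y = m*x + b.
function fitLine(array $x, array $y): array
{
    $n = count($x);
    $meanX = array_sum($x) / $n;
    $meanY = array_sum($y) / $n;
    $sxy = 0.0;
    $sxx = 0.0;
    foreach ($x as $i => $xi) {
        $sxy += ($xi - $meanX) * ($y[$i] - $meanY);
        $sxx += ($xi - $meanX) ** 2;
    }
    $m = $sxy / $sxx;
    return ['slope' => $m, 'intercept' => $meanY - $m * $meanX];
}

// Synthetic curved data generated from y = 2 * exp(x).
$x = [0, 1, 2, 3];
$y = array_map(function ($xi) { return 2 * exp($xi); }, $x);

// Linearize with a log transformation, then fit a straight line.
$logY = array_map('log', $y);
$fit = fitLine($x, $logY);

// Back-transform the intercept to recover the multiplier a.
$a = exp($fit['intercept']);
$b = $fit['slope'];
printf("y = %.3f * exp(%.3f * x)\n", $a, $b);
```

Note that the fitted formula relates x to the transformed values ln(y); predictions on the original scale require the back-transformation shown in the last step.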


Probability functions

In the previous article, I sidestepped the problem of implementing probability functions in PHP by handing the work off to R to obtain the probability values. I wasn't completely satisfied with that solution, so I began to look into the question of what is needed to develop PHP-based probability functions.

I started searching the Internet for information and code. One source of both is the set of probability functions in the book Numerical Recipes in C. I re-implemented some of the probability function code (the gammln.c and betai.c functions) in PHP, but I was not satisfied with the result. The code seemed longer than some other implementations. In addition, I also needed inverse probability functions.

Fortunately, I stumbled upon John Pezzullo's Interactive Statistical Calculation pages. John's page on probability distribution functions has all the functions I needed, implemented in JavaScript so that they are easy to study.

I ported the Student T and Fisher F functions to PHP. I changed the API slightly to conform to Java naming conventions and embedded all the functions in a class named Distribution. A nice feature of this implementation is the doCommonMath method, which is reused by all the functions in the library. The other tests (normal and chi-square) that I did not take the trouble to implement also use the doCommonMath method.

Another aspect of this port is also worth noting. In JavaScript, you can assign dynamically determined values to instance variables, for example:

var PiD2 = Pi() / 2

You cannot do this in PHP. You can assign only simple constant values to instance variables. Hopefully, this limitation will be resolved in PHP5.
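One common way to work around this restriction is to compute the value in a constructor instead of in the declaration. The sketch below assumes PHP 5 or later (for __construct); the class name CircleMath is hypothetical and is not part of the Distribution class.

```php
<?php
// Hypothetical class for illustration; not the Distribution class.
class CircleMath
{
    // A function call such as pi() / 2 is not allowed in this declaration.
    public $piD2;

    public function __construct()
    {
        // Computed once, when the object is created.
        $this->piD2 = pi() / 2;
    }
}

$c = new CircleMath();
echo $c->piD2, "\n";
```

Another option is simply to compute pi() / 2 inline wherever it is needed, which avoids the instance variable altogether.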

Note that the code in Listing 1 does not define instance variables. This is because in the JavaScript version they are assigned dynamically computed values.

Listing 1. Implementing probability functions


// [The beginning of this listing is missing from the source.]

      ... doCommonMath($cth * $cth, 2, $df - 3, -1) / (pi() / 2);
    } else {
      return 1 - $sth * $this->doCommonMath($cth * $cth, 1, $df - 3, -1);
    }
  }

  function getInverseStudentT($p, $df) {
    $v  = 0.5;
    $dv = 0.5;
    $t  = 0;
    // Halve the step size until it is smaller than the tolerance.
    while ($dv > 1e-6) {
      $t  = (1 / $v) - 1;
      $dv = $dv / 2;
      if ($this->getStudentT($t, $df) > $p) {
        $v = $v - $dv;
      } else {
        $v = $v + $dv;
      }
    }
    return $t;
  }

  function getFisherF($f, $n1, $n2) {
    // implemented but not shown
  }

  function getInverseFisherF($p, $n1, $n2) {
    // implemented but not shown
  }
}
?>
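The inverse function in Listing 1 works by repeatedly halving a step size and nudging a search variable until the forward probability matches the target. To show that search pattern in a runnable form, the sketch below applies it to Student's t with 1 degree of freedom, where the two-tailed probability has the closed form 1 - (2 / pi) * atan(|t|). It assumes PHP 7 or later. The function names twoTailedT1 and inverseT1 are hypothetical stand-ins, not methods of the Distribution class.

```php
<?php
// Two-tailed probability for Student's t with 1 degree of freedom,
// which has the closed form 1 - (2 / pi) * atan(|t|).
function twoTailedT1(float $t): float
{
    return 1 - atan(abs($t)) / (pi() / 2);
}

// Same interval-halving search as getInverseStudentT() in Listing 1,
// applied to the 1-df case so that the sketch is self-contained.
function inverseT1(float $p): float
{
    $v = 0.5;
    $dv = 0.5;
    $t = 0.0;
    while ($dv > 1e-6) {
        $t = (1 / $v) - 1; // maps v in (0, 1) onto t in (0, infinity)
        $dv = $dv / 2;
        if (twoTailedT1($t) > $p) {
            $v -= $dv;     // probability too high: move toward larger t
        } else {
            $v += $dv;     // probability too low: move toward smaller t
        }
    }
    return $t;
}

printf("%.4f\n", inverseT1(0.5));
printf("%.4f\n", inverseT1(0.05));
```

Because the search stops once the step falls below 1e-6, the result is an approximation; its accuracy on the t scale is coarser for small probabilities, where a small change in v corresponds to a large change in t.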


Graphics output

So far, the output methods you have implemented display summary values in HTML format. It is also appropriate to display a scatter plot or line plot of the data in GIF, JPEG, or PNG format.

Rather than writing the code to generate line and scatter plots myself, I decided it was better to use a PHP-based graphics library called JpGraph. JpGraph is actively developed by Johan Persson, whose project website describes it as follows:

Whether for "quick and dirty" graphs with a minimum of code, or for complex professional graphs that require very fine-grained control, JpGraph makes them easy to draw. JpGraph is equally well suited to scientific and business types of graphs.

The JpGraph distribution contains a number of sample scripts that can be tailored to specific needs. Using JpGraph for the data research tool was simple: I found a sample script that did something similar to what I needed, and then adapted it to my specific needs.

The script in Listing 3 is extracted from the sample data research tool (explore.php). It demonstrates how to invoke the library and populate the line and scatter classes with data from the SimpleLinearRegression analysis. The comments in this code were written by Johan Persson (the JpGraph code base is well documented).

Listing 3. Details of functions from the Sample Data research tool explore.php


// [The beginning of this listing is missing from the source.]

$graph->SetScale("linlin");

// Setup title
$graph->title->Set("$title");
$graph->img->SetMargin(50, 20, 20, 40);
$graph->xaxis->SetTitle("$x_name", "center");
$graph->yaxis->SetTitleMargin(30);
$graph->yaxis->title->Set("$y_name");
$graph->title->SetFont(FF_FONT1, FS_BOLD);

// Make sure that the x-axis is always at the
// bottom of the plot and not just at y=0, which is
// the default position
$graph->xaxis->SetPos('min');

// Create the scatter plot with some nice colors
$sp1 = new ScatterPlot($slr->Y, $slr->X);
$sp1->mark->SetType(MARK_FILLEDCIRCLE);
$sp1->mark->SetFillColor("red");
$sp1->SetColor("blue");
$sp1->SetWeight(3);
$sp1->mark->SetWidth(4);

// Create the regression line
$lplot = new LinePlot($slr->PredictedY, $slr->X);
$lplot->SetWeight(2);
$lplot->SetColor('navy');

// Add the plots to the graph
$graph->Add($sp1);
$graph->Add($lplot);

// ... and stroke
$graph_name = "temp/test.png";
$graph->Stroke($graph_name);
?>

// [The listing ends with an HTML image tag for the generated file; its attributes are garbled in the source.]


Data research script

The data research tool consists of a single script (explore.php) that invokes methods of the SimpleLinearRegressionHTML class and the JpGraph library.

The script uses simple processing logic. The first part of the script performs basic validation of the submitted form data. If the form data passes validation, the second part of the script is executed.
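A minimal sketch of what such a validation step might look like follows, assuming PHP 7.1 or later. The function names, the form field names x_values and y_values, and the minimum of three data points are all assumptions made for illustration; they are not taken from explore.php.

```php
<?php
// Hypothetical validation helpers; the field names used below are
// assumptions, not taken from explore.php.
function parseSeries(string $csv): ?array
{
    $values = [];
    foreach (explode(',', $csv) as $item) {
        $item = trim($item);
        if (!is_numeric($item)) {
            return null; // reject empty or non-numeric entries
        }
        $values[] = (float) $item;
    }
    return $values;
}

function validateForm(array $form): ?array
{
    $x = parseSeries($form['x_values'] ?? '');
    $y = parseSeries($form['y_values'] ?? '');
    // Both series must parse, have matching lengths, and contain
    // enough points to leave at least one degree of freedom (n - 2).
    if ($x === null || $y === null || count($x) !== count($y) || count($x) < 3) {
        return null; // the caller would redisplay the entry form
    }
    return ['x' => $x, 'y' => $y];
}

$data = validateForm(['x_values' => '1, 2, 3, 4', 'y_values' => '2, 4, 5, 8']);
echo $data === null ? "invalid\n" : "valid\n";
```

Returning null on failure keeps the calling script simple: it can branch once on the result and either show the form again or proceed to the analysis.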

The second part of the script contains code that analyzes the data and displays the summary results in HTML and graphical formats. The basic structure of the explore.php script is shown in Listing 4:

Listing 4. Structure of explore.php

// [The beginning of this listing is missing from the source; the surviving
// text resumes at the end of a statement that echoes $title.]

  $slr->showTableSummary($x_name, $y_name);
  echo "<br><br>";
  $slr->showAnalysisOfVariance();
  echo "<br><br>";
  $slr->showParameterEstimates($x_name, $y_name);
  echo "<br>";
  $slr->showFormula($x_name, $y_name);
  echo "<br><br>";
  $slr->showRValues($x_name, $y_name);
  echo "<br>";

  include("jpgraph/jpgraph.php");
  include("jpgraph/jpgraph_scatter.php");
  include("jpgraph/jpgraph_line.php");

  // The code for displaying the graphics is inline in the
  // explore.php script. The code for these two plots
  // finishes off the script:

  // Omitted code for displaying scatter plus line plot
  // Omitted code for displaying residuals plot
}
?>
