A PHP data research tool for simple linear regression

Source: Internet
Author: User
Concept

The basic goal behind simple linear regression modeling is to find the best-fitting straight line through a two-dimensional plane of paired X and Y values (that is, X and Y measurements). Once this line has been found using the least squares method, you can perform various statistical tests to determine how far the observed Y values deviate from it.

The linear equation (y = mx + b) has two parameters that must be estimated from the supplied X and Y data: the slope (m) and the y-axis intercept (b). Once these two parameters have been estimated, you can plug observed X values into the linear equation and obtain the predicted Y values.

To estimate the m and b parameters with the least squares method, you need to find the estimates of m and b that make the discrepancy between the observed and predicted Y values as small as possible across all X values. The difference between an observed value and a predicted value is called the error (y_i - (m x_i + b)). If you square each error value and then sum these squared residuals, the result is the squared error of prediction. Using the least squares method to determine the best-fitting line means finding the estimates of m and b that minimize the squared error of prediction.
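The squared error of prediction described above can be sketched in a few lines of PHP. This is a minimal illustration, not code from the SimpleLinearRegression class: the function name squaredErrorOfPrediction and the sample data are assumptions made for the example, and the type declarations require PHP 7 or later.

```php
<?php
// Hypothetical helper (not part of the SimpleLinearRegression class):
// sums the squared errors (y_i - (m * x_i + b))^2 for a candidate line.
function squaredErrorOfPrediction(array $x, array $y, float $m, float $b): float
{
    $sum = 0.0;
    foreach ($x as $i => $xi) {
        $error = $y[$i] - ($m * $xi + $b); // observed minus predicted
        $sum  += $error * $error;
    }
    return $sum;
}

// Illustrative data that lies exactly on y = 2x + 1.
$x = [1, 2, 3, 4];
$y = [3, 5, 7, 9];

$perfectFit = squaredErrorOfPrediction($x, $y, 2.0, 1.0); // the true line
$worseFit   = squaredErrorOfPrediction($x, $y, 1.0, 1.0); // a poorer candidate

echo $perfectFit, "\n";
echo $worseFit, "\n";
```

A candidate line with a smaller result fits the data better, which is the quantity the least squares method minimizes.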

Two basic methods can be used to find the estimates of m and b that satisfy the least squares criterion. The first is a numerical search procedure that tries different values of m and b and evaluates them to find the pair with the smallest squared error. The second is to use calculus to derive equations for m and b. I am not going to discuss the calculus involved in deriving these equations, but I did use the resulting analytical equations in the SimpleLinearRegression class to find m and b (see the getSlope() and getYIntercept() methods in the SimpleLinearRegression class).
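For readers who want to see the analytical route in code, the sketch below uses the standard closed-form least squares equations: the slope is the sum of cross-products of the X and Y deviations divided by the sum of squared X deviations, and the intercept is the mean of Y minus the slope times the mean of X. The function leastSquaresEstimates is a hypothetical standalone helper; the actual getSlope() and getYIntercept() code is not shown in this article and may be organized differently.

```php
<?php
// Hypothetical standalone function; the article's SimpleLinearRegression
// class exposes getSlope() and getYIntercept(), whose code is not shown here.
function leastSquaresEstimates(array $x, array $y): array
{
    $n = count($x);
    if ($n < 2 || $n !== count($y)) {
        throw new InvalidArgumentException('Need at least two paired observations.');
    }

    $meanX = array_sum($x) / $n;
    $meanY = array_sum($y) / $n;

    $sumXY = 0.0; // sum of cross-products of deviations from the means
    $sumXX = 0.0; // sum of squared X deviations
    foreach ($x as $i => $xi) {
        $sumXY += ($xi - $meanX) * ($y[$i] - $meanY);
        $sumXX += ($xi - $meanX) ** 2;
    }
    if ($sumXX == 0.0) {
        throw new InvalidArgumentException('All X values are identical.');
    }

    $m = $sumXY / $sumXX;       // slope
    $b = $meanY - $m * $meanX;  // y-axis intercept

    return ['slope' => $m, 'intercept' => $b];
}

// Illustrative data that lies exactly on y = 2x + 1.
$fit = leastSquaresEstimates([1, 2, 3, 4], [3, 5, 7, 9]);
printf("y = %.2fx + %.2f\n", $fit['slope'], $fit['intercept']);
```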

Even if you have equations for the least squares estimates of m and b, it does not follow that plugging these parameters into the linear equation yields a straight line that fits the data well. The next step in the simple linear regression process is to determine whether the remaining squared error of prediction is acceptable.

You can use a statistical decision procedure to decide whether to reject the alternative hypothesis that a straight line fits the data. This procedure is based on computing a T statistic and using a probability function to find the probability of observing a value that large by chance. As mentioned in Part 1, the SimpleLinearRegression class generates a large number of summary values, one of which is the T statistic, which can be used to gauge how well the linear equation fits the data. If the fit is good, the T statistic tends to be large. If the T value is small, you should replace your linear equation with a default model that assumes the mean of the Y values is the best predictor (because the mean of a set of values is often a useful predictor of the next observed value).

To test whether the T statistic is large enough that you should not use the mean of the Y values as the best predictor, you compute the probability of obtaining that T statistic by chance. If the probability is low, you can dispense with the null assumption that the mean is the best predictor and be more confident that a simple linear model fits the data well. (For more information about calculating the probability of the T statistic, see Part 1.)
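As a rough illustration of where a T statistic can come from, the sketch below uses one common form for simple linear regression: the estimated slope divided by its standard error, with n - 2 degrees of freedom. This is an assumption made for the example; the article does not show how the SimpleLinearRegression class computes its T value, and the function name slopeTStatistic and the sample data are hypothetical.

```php
<?php
// Hypothetical helper, not the SimpleLinearRegression API.
// Assumes at least three observations that are not perfectly collinear,
// otherwise the standard error would be zero or undefined.
function slopeTStatistic(array $x, array $y): array
{
    $n     = count($x);
    $meanX = array_sum($x) / $n;
    $meanY = array_sum($y) / $n;

    $sxy = 0.0; // sum of cross-products of deviations
    $sxx = 0.0; // sum of squared X deviations
    foreach ($x as $i => $xi) {
        $sxy += ($xi - $meanX) * ($y[$i] - $meanY);
        $sxx += ($xi - $meanX) ** 2;
    }
    $m = $sxy / $sxx;
    $b = $meanY - $m * $meanX;

    // Squared error of prediction left over after fitting the line.
    $sse = 0.0;
    foreach ($x as $i => $xi) {
        $sse += ($y[$i] - ($m * $xi + $b)) ** 2;
    }

    $df            = $n - 2;
    $standardError = sqrt(($sse / $df) / $sxx); // standard error of the slope

    return ['t' => $m / $standardError, 'df' => $df];
}

// Illustrative, nearly linear data.
$result = slopeTStatistic([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]);
printf("t = %.2f with %d degrees of freedom\n", $result['t'], $result['df']);
```

The T value and degrees of freedom would then be passed to a Student T probability function to obtain the probability of seeing a value that large by chance.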

Back to the statistical decision procedure. It tells you when to reject the null hypothesis, but it does not tell you whether to accept the alternative hypothesis. In a research setting, you need to establish the alternative hypothesis of a linear model through both theoretical and statistical arguments.

The data research tool built here implements a statistical decision procedure for linear models (the T test) and provides summary data that can be used to construct the theoretical and statistical arguments needed to establish a linear model. It can be classified as a decision-support tool for knowledge workers doing focused research on small to medium-sized data sets.

From a learning perspective, simple linear regression modeling is worth studying because it is a gateway to understanding more advanced forms of statistical modeling. For example, many core concepts in simple linear regression lay the foundation for understanding multiple regression, factor analysis, and time series analysis.

Simple linear regression is also a versatile modeling technique. You can use it to model curvilinear data by transforming the raw data (usually with a logarithmic or power transformation). These transformations can linearize the data so that it can be modeled with simple linear regression. The resulting linear model is expressed as a linear formula relating the transformed values.
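A small sketch may make the idea of linearizing data concrete. It assumes illustrative data generated from an exponential curve, y = 2 * exp(0.5 * x), takes the natural logarithm of Y, and fits a straight line to the transformed values. The helper fitLine is hypothetical and uses the standard closed-form least squares equations.

```php
<?php
// Hypothetical helper: closed-form least squares fit of y = m*x + b.
function fitLine(array $x, array $y): array
{
    $n     = count($x);
    $meanX = array_sum($x) / $n;
    $meanY = array_sum($y) / $n;

    $sxy = 0.0;
    $sxx = 0.0;
    foreach ($x as $i => $xi) {
        $sxy += ($xi - $meanX) * ($y[$i] - $meanY);
        $sxx += ($xi - $meanX) ** 2;
    }
    $m = $sxy / $sxx;

    return [$m, $meanY - $m * $meanX]; // [slope, intercept]
}

// Illustrative data generated from y = 2 * exp(0.5 * x): a curve in raw units.
$x = [0, 1, 2, 3, 4];
$y = array_map(function ($xi) { return 2 * exp(0.5 * $xi); }, $x);

// Log-transform Y so that ln(y) = ln(2) + 0.5 * x is a straight line in x.
$logY = array_map('log', $y);
list($slope, $intercept) = fitLine($x, $logY);

printf("ln(y) = %.3fx + %.3f\n", $slope, $intercept);
printf("back-transformed scale factor: %.3f\n", exp($intercept));
```

The fitted slope and intercept describe the transformed values, so predictions in the original units require applying the inverse transformation (here, exp).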

   Probability Functions

In the previous article, I sidestepped the problem of implementing probability functions in PHP by using R to obtain the probability values. I was not completely satisfied with this solution, so I began to research what would be required to develop PHP-based probability functions.

I started searching online for information and code. One source of both is the book Numerical Recipes in C. I re-implemented some of its probability function code (the gammln.c and betai.c functions) in PHP, but I was still not satisfied with the results. Compared with some other implementations, there seemed to be rather more code. In addition, I also needed inverse probability functions.

Fortunately, I stumbled upon John Pezzullo's Interactive Statistical Calculation site. John's page on probability distribution functions has all the functions I needed, implemented in JavaScript for ease of learning.

I ported the Student T and Fisher F functions to PHP. I changed the API slightly to conform to Java naming style and embedded all the functions in a class named Distribution. A great feature of this implementation is the doCommonMath method, which is reused by all the functions in this library. Other tests that I did not take the trouble to implement (the normal and chi-square tests) also use the doCommonMath method.

One other aspect of this port is worth noting. In JavaScript, you can assign dynamically determined values to instance variables, for example:

 var PiD2 = pi() / 2  


You cannot do this in PHP: only simple constant values can be assigned to instance variables when they are declared. Hopefully this limitation will be addressed in PHP 5.
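One common way to work around this restriction is to assign the computed value in a constructor rather than in the property declaration. The sketch below is only an illustration: the class name HalfPiExample and the property $piD2 are hypothetical and are not part of the Distribution class, and the code uses PHP 5 or later syntax.

```php
<?php
// Hypothetical class for illustration; not part of the article's library.
class HalfPiExample
{
    // A property declaration cannot call a function such as pi(),
    // so the property is declared here without a computed default.
    public $piD2;

    public function __construct()
    {
        // The dynamically determined value is assigned at construction time.
        $this->piD2 = pi() / 2;
    }
}

$example = new HalfPiExample();
echo $example->piD2, "\n";
```

The ported code in this article takes a different route and simply computes pi() / 2 inline wherever it is needed.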

Note that the code in Listing 1 does not define any instance variables. This is because in the JavaScript version they are assigned dynamically determined values.

Listing 1. Implementing probability functions

 <?php
 // Distribution.php
 // Copyright John Pezzullo
 // Released under same terms as PHP.
 // PHP Port and OO'fying by Paul Meagher

 class Distribution {

   // Series summation shared by the probability functions.
   function doCommonMath($q, $i, $j, $b) {
     $zz = 1;
     $z  = $zz;
     $k  = $i;
     while ($k <= $j) {
       $zz = $zz * $q * $k / ($k - $b);
       $z  = $z + $zz;
       $k  = $k + 2;
     }
     return $z;
   }

   // Two-tailed probability of a T value with $df degrees of freedom.
   function getStudentT($t, $df) {
     $t  = abs($t);
     $w  = $t / sqrt($df);
     $th = atan($w);

     if ($df == 1) {
       return 1 - $th / (pi() / 2);
     }

     $sth = sin($th);
     $cth = cos($th);

     if (($df % 2) == 1) {
       return 1 - ($th + $sth * $cth * $this->doCommonMath($cth * $cth, 2, $df - 3, -1))
                  / (pi() / 2);
     } else {
       return 1 - $sth * $this->doCommonMath($cth * $cth, 1, $df - 3, -1);
     }
   }

   // Finds the T value whose probability is $p by repeatedly halving the step.
   function getInverseStudentT($p, $df) {
     $v  = 0.5;
     $dv = 0.5;
     $t  = 0;
     while ($dv > 1e-6) {
       $t  = (1 / $v) - 1;
       $dv = $dv / 2;
       if ($this->getStudentT($t, $df) > $p) {
         $v = $v - $dv;
       } else {
         $v = $v + $dv;
       }
     }
     return $t;
   }

   function getFisherF($f, $n1, $n2) {
     // implemented but not shown
   }

   function getInverseFisherF($p, $n1, $n2) {
     // implemented but not shown
   }
 }
 ?>
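The search used by getInverseStudentT in Listing 1 may be easier to follow in isolation. The sketch below applies the same step-halving idea to an arbitrary tail probability function that decreases as t grows. The function name invertTailProbability is hypothetical, and exp(-t) is used only as a stand-in tail function whose inverse is known exactly (t = -ln(p)), so the result can be checked.

```php
<?php
// Hypothetical generalization of the search in getInverseStudentT.
// $tail must return a probability that decreases as $t increases.
function invertTailProbability(callable $tail, float $p): float
{
    $v  = 0.5; // mapped to t through t = 1/v - 1, so v in (0, 1) covers t > 0
    $dv = 0.5; // step size, halved on every iteration
    $t  = 0.0;

    while ($dv > 1e-6) {
        $t  = (1 / $v) - 1;
        $dv = $dv / 2;
        if ($tail($t) > $p) {
            $v = $v - $dv; // probability still too high: move toward larger t
        } else {
            $v = $v + $dv; // probability too low: move toward smaller t
        }
    }
    return $t;
}

// Stand-in tail function for illustration only.
$t = invertTailProbability(function ($t) { return exp(-$t); }, 0.1);
printf("t = %.3f, check: -ln(0.1) = %.3f\n", $t, -log(0.1));
```

Searching over v instead of t keeps the search interval bounded, which is why a fixed number of halvings is enough to cover the whole positive range of t.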
