Concept
The basic goal behind simple linear regression modeling is to find the best-fitting line in a two-dimensional plane consisting of paired X and Y values (that is, X and Y measurements). Once you have found this line using the least-squares (minimum variance) method, you can perform various statistical tests to determine how well the line fits the observed deviations in the Y values.
The linear equation (y = mx + b) has two parameters that must be estimated from the provided X and Y data: the slope (m) and the y-axis intercept (b). Once these two parameters have been estimated, you can enter observed X values into the linear equation and obtain the Y values that the equation predicts.
To estimate the m and b parameters with the least-squares method, you look for the estimates of m and b that minimize the difference between the observed and predicted Y values across all X values. The difference between an observed value and a predicted value is called the error (y_i - (mx_i + b)). If you square each error value and then sum these residuals, the result is a number known as the predicted squared difference (the predicted variance). Using the least-squares method to determine the best-fitting line means finding the estimates of m and b that minimize this predicted variance.
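The quantity being minimized can be sketched in a few lines of PHP. This is only an illustration: the function name sumSquaredError and the sample data are assumptions made for the example and are not part of the SimpleLinearRegression class.

```php
<?php
// Hypothetical helper (not part of the article's class): sum of squared
// errors between observed Y values and the values predicted by y = m*x + b.
function sumSquaredError(array $X, array $Y, float $m, float $b): float
{
    $sse = 0.0;
    foreach ($X as $i => $x) {
        $error = $Y[$i] - ($m * $x + $b); // observed minus predicted
        $sse += $error * $error;
    }
    return $sse;
}

$X = [1, 2, 3, 4];
$Y = [3, 5, 7, 9]; // made-up data lying exactly on y = 2x + 1

echo sumSquaredError($X, $Y, 2.0, 1.0), "\n"; // the true line: prints 0
echo sumSquaredError($X, $Y, 1.0, 1.0), "\n"; // a worse line: prints 30
```

The least-squares estimates are simply the m and b for which this function returns its smallest possible value.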
There are two basic ways to find the estimates of m and b that satisfy the least-squares criterion. In the first approach, you use a numeric search process that tries different values of m and b and evaluates them, eventually settling on the estimates that yield the minimum variance. The second approach is to use calculus to derive equations for estimating m and b. I'm not going to delve into the calculus involved in deriving these equations, but I did use these analytic equations in the SimpleLinearRegression class to find the least-squares estimates of m and b (see the getSlope() and getYIntercept() methods in the SimpleLinearRegression class).
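For reference, the standard textbook closed-form estimates are m = Sxy / Sxx and b = mean(Y) - m * mean(X), where Sxy is the sum of cross-products of the X and Y deviations from their means and Sxx is the sum of squared X deviations. A minimal sketch follows; the function name leastSquaresEstimates and the data are assumptions, and the code is not taken from the article's class.

```php
<?php
// Hypothetical function: a sketch of the standard closed-form least-squares
// estimates, not the code of the article's SimpleLinearRegression class.
function leastSquaresEstimates(array $X, array $Y): array
{
    $n = count($X);
    $xMean = array_sum($X) / $n;
    $yMean = array_sum($Y) / $n;

    $sxy = 0.0; // sum of cross-products of deviations
    $sxx = 0.0; // sum of squared X deviations
    foreach ($X as $i => $x) {
        $sxy += ($x - $xMean) * ($Y[$i] - $yMean);
        $sxx += ($x - $xMean) ** 2;
    }

    $m = $sxy / $sxx;          // slope
    $b = $yMean - $m * $xMean; // y-axis intercept
    return [$m, $b];
}

// Made-up data lying exactly on y = 2x + 1.
[$m, $b] = leastSquaresEstimates([1, 2, 3, 4], [3, 5, 7, 9]);
printf("y = %.2fx + %.2f\n", $m, $b); // prints "y = 2.00x + 1.00"
```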
Even though you have equations for finding the least-squares estimates of m and b, it does not follow that plugging these parameters into a linear equation gives you a line that fits the data well. The next step in the simple linear regression process is to determine whether the remaining predicted variance is acceptable.
You can use a statistical decision procedure to decide whether to reject the alternative hypothesis that a straight line fits the data. This procedure is based on computing a T statistic and using a probability function to find the probability of randomly observing a value that large. As mentioned in Part 1, the SimpleLinearRegression class generates a number of summary values, one of which is the T statistic, which can be used to measure how well the linear equation fits the data. If the fit is good, the T statistic tends to be large; if the T value is small, you should replace your linear equation with a default model that assumes the mean of the Y values is the best predictor (because the average of a set of values can often be a useful predictor of the next observation).
To test whether the T statistic is large enough to rule out using the mean of the Y values as the best predictor, you need to compute the probability of obtaining that T statistic by chance. If the probability is low, you can reject the null hypothesis that the mean is the best predictor and be correspondingly confident that a simple linear model fits the data well. (For more information on calculating the probability of a T statistic, see Part 1.)
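As a rough illustration of where such a T value can come from, the sketch below uses the standard textbook formula for the slope, t = m / SE(m), with n - 2 degrees of freedom. This is an assumption made for the example: the SimpleLinearRegression class described in Part 1 may compute its T statistic differently, and the function name slopeTStatistic and the data are hypothetical.

```php
<?php
// Hypothetical function: T statistic for the slope of a simple linear
// regression, using the textbook formula t = m / SE(m) with n - 2 degrees
// of freedom. An assumption for illustration, not the article's class.
function slopeTStatistic(array $X, array $Y): array
{
    $n = count($X);
    $xMean = array_sum($X) / $n;
    $yMean = array_sum($Y) / $n;

    $sxy = 0.0;
    $sxx = 0.0;
    foreach ($X as $i => $x) {
        $sxy += ($x - $xMean) * ($Y[$i] - $yMean);
        $sxx += ($x - $xMean) ** 2;
    }
    $m = $sxy / $sxx;
    $b = $yMean - $m * $xMean;

    $sse = 0.0; // sum of squared errors around the fitted line
    foreach ($X as $i => $x) {
        $sse += ($Y[$i] - ($m * $x + $b)) ** 2;
    }

    $df = $n - 2;
    $standardError = sqrt(($sse / $df) / $sxx); // standard error of the slope
    return ['t' => $m / $standardError, 'df' => $df];
}

$result = slopeTStatistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]); // made-up data
printf("t = %.3f, df = %d\n", $result['t'], $result['df']);
```

The T value and its degrees of freedom are then passed to a Student T probability function to obtain the probability of seeing a value that large by chance.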
Let's return to the statistical decision procedure. It tells you when to reject the null hypothesis, but it does not tell you whether to accept the alternative hypothesis. In a research setting, the linear model hypothesis has to be established through both theoretical and statistical arguments.
You will build a data research tool that implements the statistical decision procedure for a linear model (the T-test) and provides summary data that can be used to construct the theoretical and statistical arguments needed to establish a linear model. Data research tools can be classified as decision support tools for knowledge workers who study patterns in small to medium-sized data sets.
From a learning point of view, simple linear regression modeling is worth studying because it is the gateway to understanding more advanced forms of statistical modeling. For example, many of the core concepts in simple linear regression lay a good foundation for understanding multiple regression, factor analysis, and time series.
Simple linear regression is also a versatile modeling technique. You can use it to model curvilinear data by transforming the raw data, usually with a logarithmic or power transformation. These transformations linearize the data so that simple linear regression can be used to model it. The resulting linear model is expressed as a linear formula relating the transformed values.
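A minimal sketch of the idea, assuming data that follow a power curve y = a * x^k: taking the logarithm of both variables gives ln(y) = ln(a) + k * ln(x), which is a straight line in the transformed values. The helper name fitLine and the data are hypothetical.

```php
<?php
// Hypothetical helper: closed-form least-squares slope and intercept.
function fitLine(array $X, array $Y): array
{
    $n = count($X);
    $xMean = array_sum($X) / $n;
    $yMean = array_sum($Y) / $n;
    $sxy = 0.0;
    $sxx = 0.0;
    foreach ($X as $i => $x) {
        $sxy += ($x - $xMean) * ($Y[$i] - $yMean);
        $sxx += ($x - $xMean) ** 2;
    }
    $slope = $sxy / $sxx;
    return [$slope, $yMean - $slope * $xMean];
}

// Made-up data following the power curve y = 3 * x^2.
$X = [1, 2, 4, 8];
$Y = [3, 12, 48, 192];

// Fit a straight line to the log-transformed values.
[$k, $lnA] = fitLine(array_map('log', $X), array_map('log', $Y));

// Convert the intercept back to the original scale.
printf("y = %.2f * x^%.2f\n", exp($lnA), $k); // prints "y = 3.00 * x^2.00"
```

Note that the fitted formula relates the transformed values; the intercept has to be exponentiated to recover the coefficient on the original scale.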
Probability Functions
In the previous article, I sidestepped the problem of implementing probability functions in PHP by relying on R to obtain the probability values. I wasn't completely satisfied with that solution, so I began to look into the question: what would it take to develop PHP-based probability functions?
I started searching the Internet for information and code. One source of both is the probability functions in the book Numerical Recipes in C. I re-implemented some of the probability function code (the gammln.c and betai.c functions) in PHP, but I was not satisfied with the result. The code seemed somewhat bulkier than some other implementations. In addition, I also needed inverse probability functions.
Fortunately, I stumbled upon John Pezzullo's Interactive Statistical Calculation Pages. The site's page on probability distribution functions has all the functions I needed, implemented in JavaScript for ease of study.
I ported the Student T and Fisher F functions to PHP. I changed the API slightly to conform to the Java naming style and embedded all the functions in a class named Distribution. A nice feature of this implementation is the doCommonMath method, which all the functions in the library reuse. The other tests that I did not bother to implement (normal and chi-square) also use the doCommonMath method.
Another aspect of this port is also worth noting. In JavaScript, you can assign a dynamically determined value to an instance variable when you declare it. This cannot be done in PHP, where you can assign only a simple constant value to an instance variable. Hopefully, this limitation will be resolved in PHP5.
Note that the code in Listing 1 does not define instance variables. This is because in the JavaScript version they are dynamically assigned values.
Listing 1: Implementing the probability function
<?php
// distribution.php
// Copyright John Pezzullo
// Released under same terms as PHP.
// PHP port and OO'fying by Paul Meagher

class Distribution {

  // Series summation shared by all the distribution functions.
  function doCommonMath($q, $i, $j, $b) {
    $zz = 1;
    $z  = $zz;
    $k  = $i;
    while ($k <= $j) {
      $zz = $zz * $q * $k / ($k - $b);
      $z  = $z + $zz;
      $k  = $k + 2;
    }
    return $z;
  }

  // Probability of the given T value with $df degrees of freedom.
  function getStudentT($t, $df) {
    $t  = abs($t);
    $w  = $t / sqrt($df);
    $th = atan($w);
    if ($df == 1) {
      return 1 - $th / (pi() / 2);
    }
    $sth = sin($th);
    $cth = cos($th);
    if (($df % 2) == 1) {
      return 1 - ($th + $sth * $cth * $this->doCommonMath($cth * $cth, 2, $df - 3, -1)) / (pi() / 2);
    } else {
      return 1 - $sth * $this->doCommonMath($cth * $cth, 1, $df - 3, -1);
    }
  }

  // T value for probability $p, found by repeatedly halving the step $dv.
  function getInverseStudentT($p, $df) {
    $v  = 0.5;
    $dv = 0.5;
    $t  = 0;
    while ($dv > 1e-6) {
      $t  = (1 / $v) - 1;
      $dv = $dv / 2;
      if ($this->getStudentT($t, $df) > $p) {
        $v = $v - $dv;
      } else {
        $v = $v + $dv;
      }
    }
    return $t;
  }

  function getFisherF($f, $n1, $n2) {
    // implemented but not shown
  }

  function getInverseFisherF($p, $n1, $n2) {
    // implemented but not shown
  }
}
?>
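The search used by the inverse Student T function in Listing 1 can be tried on its own. The sketch below is a standalone illustration restricted to one degree of freedom, where the probability has the closed form used by the listing's df == 1 branch. The function names are assumptions made for this example and are not part of the Distribution class.

```php
<?php
// Student T probability for 1 degree of freedom, using the same closed
// form as the df == 1 branch of Listing 1.
function twoTailedProbabilityDf1(float $t): float
{
    return 1 - atan(abs($t)) / (pi() / 2);
}

// Hypothetical standalone version of the inverse search: $v in (0, 1] is
// mapped to $t = 1/$v - 1 in [0, infinity), and the step $dv is halved on
// each pass until it drops below 1e-6.
function inverseTwoTailedDf1(float $p): float
{
    $v = 0.5;
    $dv = 0.5;
    $t = 0.0;
    while ($dv > 1e-6) {
        $t = (1 / $v) - 1;
        $dv = $dv / 2;
        if (twoTailedProbabilityDf1($t) > $p) {
            $v -= $dv; // probability too high: move toward a larger t
        } else {
            $v += $dv; // probability too low: move toward a smaller t
        }
    }
    return $t;
}

printf("%.2f\n", inverseTwoTailedDf1(0.5));  // about 1.00
printf("%.2f\n", inverseTwoTailedDf1(0.05)); // about 12.71
```

Mapping the unbounded T value onto the bounded variable $v is what lets a simple halving search cover the whole range of possible T values.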
Output Method
Now that the probability functions have been implemented in PHP, the only remaining problem in developing a PHP-based data research tool is designing a method for displaying the results of the analysis.
A simple solution is to display the values of instance variables on the screen as needed. In the first article, that is what I did when showing the linear equation, T value, and T probability for the Burnout Study. It is helpful to be able to access specific values for specific purposes, and SimpleLinearRegression supports this kind of usage.
However, another way to output results is to systematically group the parts of the output. If you study the output of the major statistical packages used for regression analysis, you will find that they tend to group their output in the same way. They typically include a Summary Table, an Analysis of Variance table, a Parameter Estimates table, and R Values. Similarly, I created some output methods with the following names:
- showSummaryTable()