Concepts
The basic goal behind simple linear regression modeling is to find the best-fitting straight line through a two-dimensional plane of paired X and Y values (that is, your X and Y measurements). Once you have found this line using the least-squares method, you can perform various statistical tests to determine how well the line accounts for the observed deviations in the Y values.
The linear equation (y = mx + b) has two parameters that must be estimated from the supplied X and Y data: the slope (m) and the y-axis intercept (b). Once these two parameters have been estimated, you can enter observed X values into the linear equation and obtain the predicted Y values.
To estimate the m and b parameters with the least-squares method, you look for estimates of m and b that make the difference between the observed and predicted Y values as small as possible across all X values. The difference between an observed value and its predicted value is called the error, y_i - (m * x_i + b). If you square each error and then sum the squared errors, the result is a number called the squared error of prediction. Using the least-squares method to determine the best-fitting line means looking for the estimates of m and b that minimize this squared error of prediction.
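As a minimal sketch of the quantity being minimized, the following computes the squared error of prediction for two candidate lines. The function names and the data are made up for illustration and are not taken from the SimpleLinearRegression class; the code assumes PHP 7 or later.

```php
<?php
// Hypothetical helpers; these names are not from the article's class.

// Predicted Y for a single X value using y = mx + b.
function predictY(float $m, float $b, float $x): float {
    return $m * $x + $b;
}

// Sum of the squared errors (observed Y minus predicted Y) for candidate m and b.
function sumSquaredErrors(array $xs, array $ys, float $m, float $b): float {
    $sse = 0.0;
    foreach ($xs as $i => $x) {
        $error = $ys[$i] - predictY($m, $b, $x);
        $sse += $error * $error;
    }
    return $sse;
}

// Made-up illustrative data.
$xs = [1, 2, 3, 4];
$ys = [3, 5, 7, 10];

echo sumSquaredErrors($xs, $ys, 2.0, 1.0), "\n"; // prints 1
echo sumSquaredErrors($xs, $ys, 2.3, 0.5), "\n"; // smaller, roughly 0.3
```

The second line fits this data better than the first because its squared error of prediction is smaller; the least-squares estimates are the m and b for which no other line does better.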
Two basic approaches can be used to find the estimates of m and b that satisfy the least-squares criterion. In the first approach, you use a numerical search procedure that tries different values of m and b, evaluates them, and eventually settles on the estimates that produce the smallest squared error. The second approach is to use calculus to derive equations for estimating m and b. I am not going to delve into the calculus involved in deriving these equations, but I did use these analytic equations in the SimpleLinearRegression class to find the least-squares estimates of m and b (see the getSlope() and getYIntercept() methods in the SimpleLinearRegression class).
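For reference, the analytic approach can be sketched with the standard textbook formulas: the slope is the sum of cross-deviations divided by the sum of squared X deviations, and the intercept follows from the two means. The function below is a hypothetical standalone helper, not the code of getSlope() or getYIntercept(), which may organize the calculation differently.

```php
<?php
// Hypothetical standalone function using the standard least-squares formulas:
//   m = sum((x - meanX) * (y - meanY)) / sum((x - meanX)^2)
//   b = meanY - m * meanX
function leastSquaresEstimates(array $xs, array $ys): array {
    $n = count($xs);
    $meanX = array_sum($xs) / $n;
    $meanY = array_sum($ys) / $n;

    $sxy = 0.0; // sum of cross-deviations
    $sxx = 0.0; // sum of squared X deviations
    foreach ($xs as $i => $x) {
        $sxy += ($x - $meanX) * ($ys[$i] - $meanY);
        $sxx += ($x - $meanX) ** 2;
    }

    $m = $sxy / $sxx;          // slope
    $b = $meanY - $m * $meanX; // y-axis intercept
    return ['m' => $m, 'b' => $b];
}

// Made-up illustrative data.
$fit = leastSquaresEstimates([1, 2, 3, 4], [3, 5, 7, 10]);
printf("m = %.2f, b = %.2f\n", $fit['m'], $fit['b']); // m = 2.30, b = 0.50
```

The sketch assumes the X values are not all identical; otherwise the denominator is zero and no slope can be estimated.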
Having equations for the least-squares estimates of m and b does not mean that plugging those parameters into a linear equation yields a straight line that fits the data well. The next step in the simple linear regression process is to determine whether the remaining squared error of prediction is acceptable.
You can use a statistical decision procedure to test the alternative hypothesis that a straight line fits the data. This procedure is based on computing a T statistic and using a probability function to find the probability of observing a value that large by chance. As mentioned in Part 1, the SimpleLinearRegression class generates a number of summary values, one of which is the T statistic, which can be used to measure how well a linear equation fits the data. If the fit is good, the T statistic tends to be large; if the T value is small, you should replace your linear equation with a default model that assumes the mean of the Y values is the best predictor (because the mean of a set of values is often a useful predictor of the next observation).
To test whether the T statistic is large enough to rule out the mean of the Y values as the best predictor, you compute the probability of obtaining a T statistic that large by chance. If the probability is low, you can reject the null hypothesis that the mean is the best predictor and be correspondingly confident that a simple linear model fits the data well. (For more information on calculating the probability of the T statistic, see Part 1.)
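To make the procedure concrete, here is a minimal sketch that computes a T statistic for a fitted line. It assumes the statistic in question is the usual one for the slope, t = m / SE(m) with n - 2 degrees of freedom; the function name is hypothetical and the SimpleLinearRegression class may compute its summary values differently.

```php
<?php
// Hypothetical helper: t statistic for the slope, t = m / SE(m),
// where SE(m) = sqrt((SSE / (n - 2)) / Sxx).
// Assumes at least 3 points, varying X values, and a fit that is not perfect.
function slopeTStatistic(array $xs, array $ys): array {
    $n = count($xs);
    $meanX = array_sum($xs) / $n;
    $meanY = array_sum($ys) / $n;

    $sxx = 0.0;
    $sxy = 0.0;
    foreach ($xs as $i => $x) {
        $sxx += ($x - $meanX) ** 2;
        $sxy += ($x - $meanX) * ($ys[$i] - $meanY);
    }
    $m = $sxy / $sxx;
    $b = $meanY - $m * $meanX;

    // Squared error of prediction left over after fitting the line.
    $sse = 0.0;
    foreach ($xs as $i => $x) {
        $sse += ($ys[$i] - ($m * $x + $b)) ** 2;
    }

    $df = $n - 2;
    $stdErrSlope = sqrt(($sse / $df) / $sxx);
    return ['t' => $m / $stdErrSlope, 'df' => $df];
}

// Made-up illustrative data.
$result = slopeTStatistic([1, 2, 3, 4], [3, 5, 7, 10]);
printf("t = %.2f with %d degrees of freedom\n", $result['t'], $result['df']);
// prints: t = 13.28 with 2 degrees of freedom
```

The T value and degrees of freedom are the two inputs a Student T probability function needs in order to report how likely a value that large is to occur by chance.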
Let's return to the statistical decision procedure. It tells you when to reject the null hypothesis, but it does not tell you whether to accept the alternative hypothesis. In a research setting, you need to establish the linear model hypothesis through both theoretical and statistical arguments.
The data exploration tool that you will build implements the statistical decision procedure for linear models (the T-test) and provides summary data that can be used to construct the theoretical and statistical arguments needed to establish a linear model. The tool can be classed as a decision support tool for knowledge workers who study patterns in small to medium-sized data sets.
From a learning point of view, simple linear regression modeling is worth studying because it is a gateway to understanding more advanced forms of statistical modeling. For example, many of the core concepts in simple linear regression lay a good foundation for understanding multiple regression, factor analysis, and time series analysis.
Simple linear regression is also a versatile modeling technique. You can use it to model curvilinear data by transforming the raw data, typically with a logarithmic or power transformation. These transformations linearize the data so that simple linear regression can be used to model it. The resulting linear model is expressed as a linear formula in terms of the transformed values.
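As a sketch of the idea, suppose the data follow an exponential curve y = a * exp(k * x). Taking the natural logarithm of each Y value gives ln(y) = ln(a) + k * x, which is a straight line in x, so ordinary least squares applies. The helper name and the data below are made up for illustration.

```php
<?php
// Hypothetical helper: least-squares slope and intercept for paired values.
function fitLine(array $xs, array $ys): array {
    $n = count($xs);
    $meanX = array_sum($xs) / $n;
    $meanY = array_sum($ys) / $n;
    $sxy = 0.0;
    $sxx = 0.0;
    foreach ($xs as $i => $x) {
        $sxy += ($x - $meanX) * ($ys[$i] - $meanY);
        $sxx += ($x - $meanX) ** 2;
    }
    $m = $sxy / $sxx;
    return ['m' => $m, 'b' => $meanY - $m * $meanX];
}

// Made-up data that doubles at each step: y = 2 * 2^x.
$xs = [0, 1, 2, 3];
$ys = [2, 4, 8, 16];

// Linearize with a logarithmic transformation of Y, then fit a line.
$logYs = array_map(function ($y) { return log($y); }, $ys);
$fit = fitLine($xs, $logYs);

// Map the fitted line back to the original scale: a = exp(intercept), k = slope.
$a = exp($fit['b']);
$k = $fit['m'];
printf("y is approximately %.2f * exp(%.4f * x)\n", $a, $k);
// prints: y is approximately 2.00 * exp(0.6931 * x)
```

Note that the fitted slope and intercept describe the transformed values, so predictions have to be transformed back (here with exp) before they are compared with the original Y values.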
Probability functions
In the previous article, I sidestepped the problem of implementing probability functions in PHP by relying on R to obtain the probability values. I wasn't completely satisfied with that solution, so I began to investigate what it would take to develop probability functions in PHP.
I started searching the Internet for information and code. One source of both is the probability functions in the book Numerical Recipes in C. I reimplemented some of the probability function code (the gammln.c and betai.c functions) in PHP, but I wasn't satisfied with the results: the code seemed somewhat bloated compared with some other implementations. In addition, I also needed inverse probability functions.
Luckily, I stumbled upon John Pezzullo's Interactive Statistical Calculation site. John's page on probability distribution functions has all the functions I needed, implemented in JavaScript to make them easy to learn from.
I ported the Student T and Fisher F functions to PHP. I changed the API a bit to conform to Java naming style and embedded all the functions in a class called Distribution. A nice feature of this implementation is the doCommonMath method, which is reused by all the functions in the library. Other tests that I did not take the time to implement (the normal and chi-square tests) also use the doCommonMath method.
Another aspect of this port is also worth noting. In JavaScript, you can assign a dynamically computed value to an instance variable, for example:
var PiD2 = pi() / 2;
You cannot do this in PHP. You can only assign simple constant values to instance variables. Hopefully this limitation will be lifted in PHP 5.
Note that the code in Listing 1 does not define any instance variables. This is because in the JavaScript version they are assigned dynamically computed values.
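One common way around this limitation is to declare the instance variable without a computed default and assign it in the constructor. The sketch below uses a hypothetical class name and the constructor syntax of later PHP versions (`__construct`); it is not part of the Distribution class.

```php
<?php
// Hypothetical class illustrating the workaround; not part of Listing 1.
class AngleConstants {
    // A function call such as pi() / 2 is not allowed as a default value here.
    public $piD2;

    public function __construct() {
        // Computed values can be assigned once the object is being built.
        $this->piD2 = pi() / 2;
    }
}

$constants = new AngleConstants();
printf("%.4f\n", $constants->piD2); // prints 1.5708
```

Listing 1 takes a simpler route and calls pi() / 2 inline wherever the value is needed.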
Listing 1. Implementing Probability Functions
<?php

// Distribution.php

// Copyright John Pezzullo
// Released under same terms as PHP.
// PHP port and OO'fying by Paul Meagher

class Distribution {

  // Series summation shared by all of the distribution functions.
  function doCommonMath($q, $i, $j, $b) {
    $zz = 1;
    $z  = $zz;
    $k  = $i;
    while ($k <= $j) {
      $zz = $zz * $q * $k / ($k - $b);
      $z  = $z + $zz;
      $k  = $k + 2;
    }
    return $z;
  }

  // Probability associated with a T value for $df degrees of freedom.
  function getStudentT($t, $df) {
    $t  = abs($t);
    $w  = $t / sqrt($df);
    $th = atan($w);
    if ($df == 1) {
      return 1 - $th / (pi() / 2);
    }
    $sth = sin($th);
    $cth = cos($th);
    if (($df % 2) == 1) {
      return 1 - ($th + $sth * $cth * $this->doCommonMath($cth * $cth, 2, $df - 3, -1)) / (pi() / 2);
    } else {
      return 1 - $sth * $this->doCommonMath($cth * $cth, 1, $df - 3, -1);
    }
  }

  // T value associated with probability $p, found by interval halving.
  function getInverseStudentT($p, $df) {
    $v  = 0.5;
    $dv = 0.5;
    $t  = 0;
    while ($dv > 1e-6) {
      $t  = (1 / $v) - 1;
      $dv = $dv / 2;
      if ($this->getStudentT($t, $df) > $p) {
        $v = $v - $dv;
      } else {
        $v = $v + $dv;
      }
    }
    return $t;
  }

  function getFisherF($f, $n1, $n2) {
    // implemented but not shown
  }

  function getInverseFisherF($p, $n1, $n2) {
    // implemented but not shown
  }

}
?>
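One way to sanity-check a port like this is to compare it against a case that has a closed form. For 2 degrees of freedom, the series in doCommonMath contributes only its initial term, and getStudentT reduces to 1 - t / sqrt(t^2 + 2). The sketch below is self-contained and uses hypothetical function names; it applies the same interval-halving search as getInverseStudentT to that closed form.

```php
<?php
// Closed-form two-tailed probability for 2 degrees of freedom
// (what getStudentT reduces to when $df == 2).
function twoTailedP2df(float $t): float {
    $t = abs($t);
    return 1 - $t / sqrt($t * $t + 2);
}

// Same halving search as getInverseStudentT, applied to the closed form.
// $v ranges over (0, 1) and maps to t = 1/v - 1, so a smaller $v means a larger t.
function inverseTwoTailedP2df(float $p): float {
    $v  = 0.5;
    $dv = 0.5;
    $t  = 0.0;
    while ($dv > 1e-6) {
        $t  = (1 / $v) - 1;
        $dv = $dv / 2;
        if (twoTailedP2df($t) > $p) {
            $v -= $dv; // probability still too high: move toward larger t
        } else {
            $v += $dv; // probability too low: move toward smaller t
        }
    }
    return $t;
}

printf("%.3f\n", inverseTwoTailedP2df(0.05)); // prints 4.303
```

The result agrees with the familiar two-tailed critical value of about 4.303 for 2 degrees of freedom at the 0.05 level, which suggests that the search logic in Listing 1 behaves as intended for this case.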