The importance of a math library in PHP

Introduction

Compared with other open source languages such as Perl and Python, PHP lacks a strong community effort to develop a math library. One reason may be that many mature mathematical tools already exist, and these may discourage the community from building its own tools in PHP. For example, I have studied a powerful tool, the S System. S has an impressive set of statistical libraries designed specifically for analyzing datasets, and its language design won the 1998 ACM Software System Award. If S, or its open source counterpart R, is only a shell_exec call away, why bother implementing the same statistical functionality in PHP? For more information about the S System, its ACM award, or R, see References. Isn't reimplementing it a waste of developer energy? If the goal is to save developer effort and use the best tool for the job, then the current state of affairs in PHP makes sense.

On the other hand, developing PHP math libraries can be motivated by teaching. For perhaps 10 percent of people, mathematics is an interesting subject worth exploring. For those who are also proficient in PHP, developing a PHP math library can enhance the process of learning mathematics. In other words, do not just read the chapter on T tests; also implement a class that computes the corresponding intermediate values and displays them in a standard format. Through this guided exercise, I hope to show that developing a PHP math library is not a very difficult task and that it can be an interesting technical and learning challenge. In this article, I present an example PHP math class named SimpleLinearRegression, which demonstrates a general approach to developing PHP math libraries.
Let's start by discussing the general principles that guided my development of the SimpleLinearRegression class. I used six:

1. Create one class per analysis model.
2. Use backward chaining to develop classes.
3. Expect lots of getters.
4. Store intermediate results.
5. Develop a preference for verbose APIs.
6. Perfection is not a goal.

Let's examine each of these guidelines in more detail.

Create one class per analysis model. Each major analysis test or procedure should have a PHP class with the same name as the test or procedure. The class contains input functions, functions that compute intermediate and summary values, and output functions that display those values on screen in text or graphic format.

Use backward chaining to develop classes. In mathematical programming, the goal of coding is usually the standard output values that an analysis procedure (such as MultipleRegression, TimeSeries, or ChiSquared) is expected to generate. From a problem-solving perspective, this means you can use backward chaining to develop mathematical methods. For example, the summary output screen displays one or more summary statistics. These summary statistics depend on intermediate statistics, which may in turn depend on intermediate statistics at a deeper level, and so on. This backward-chaining approach leads to the next principle.

Expect lots of getters. Most of the development work for a math class involves computing intermediate and summary values. In practice, this means you should not be surprised if your class contains many getter methods that compute these values.

Store intermediate results. Store intermediate results in the result object so that you can use them as input to subsequent computations. This principle is used in the design of the S language.
In the SimpleLinearRegression class, this principle is implemented by choosing instance variables to hold the computed intermediate and summary results.

Develop a preference for verbose APIs. When I chose a naming scheme for the member functions and instance variables of the SimpleLinearRegression class, I found that long, descriptive names (such as getSumSquaredError instead of getYY2) make it easier to understand what a function does and what a variable means. I did not abandon abbreviated names entirely; however, when I use one, I provide a comment that fully explains its meaning. In my opinion, the heavily abbreviated naming schemes common in mathematical programming make it harder to understand a mathematical routine and to verify, step by step, that it is correct.

Perfection is not a goal. The goal of this coding exercise is not to develop a highly optimized and rigorous math engine for PHP. In these early stages, the emphasis should be on learning how to implement analysis tests and on understanding the problems involved.

Instance variables

When modeling a statistical test or procedure, you need to decide which instance variables to declare. That choice can be driven by the intermediate and summary values the analysis procedure generates: each intermediate and summary value can have a corresponding instance variable that holds it as an object property. I used this analysis to determine which variables to declare for the SimpleLinearRegression class in Listing 1. A similar analysis can be performed for MultipleRegression, ANOVA, or TimeSeries procedures.

Listing 1. Instance variables of the SimpleLinearRegression class

Constructor

The constructor of the SimpleLinearRegression class accepts an X vector and a Y vector, which must contain the same number of values.
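To make the principles and the instance-variable analysis above concrete, here is a minimal skeleton. It is a hypothetical sketch, not the article's Listing 1. The class name RegressionSketch, its property list, and the use of exceptions are assumptions. The class declares instance variables for the inputs and a few intermediate values, performs the "equal size" and "more than one value" input checks described for the constructor, and stores each getter's result so later getters can reuse it. It assumes PHP 7.4 or later for typed properties.

```php
<?php
// Hypothetical skeleton illustrating the principles; not the article's class.
class RegressionSketch
{
    // Inputs
    public int $n;
    public array $X;
    public array $Y;
    public float $ConfInt;
    public float $Alpha;

    // Intermediate results, stored so that later getters can reuse them
    public float $XMean;
    public float $YMean;
    public float $SumXX;
    public float $SumXY;

    public function __construct(array $X, array $Y, float $ConfidenceInterval = 95.0)
    {
        $numX = count($X);
        $numY = count($Y);
        // "Equal size" test
        if ($numX !== $numY) {
            throw new InvalidArgumentException('X and Y must contain the same number of values.');
        }
        // "More than one value" test
        if ($numX <= 1) {
            throw new InvalidArgumentException('X and Y must contain more than one value.');
        }
        $this->n = $numX;
        $this->X = array_values($X); // re-index so loops can use 0..n-1
        $this->Y = array_values($Y);
        $this->ConfInt = $ConfidenceInterval;
        // For a 95% interval this gives 0.975, the upper-tail probability
        $this->Alpha = (1 + $this->ConfInt / 100) / 2;

        // Backward chaining in reverse: compute the deepest values first and
        // store each one in an instance variable.
        $this->XMean = $this->getMean($this->X);
        $this->YMean = $this->getMean($this->Y);
        $this->SumXX = $this->getSumXX();
        $this->SumXY = $this->getSumXY();
    }

    public function getMean(array $values): float
    {
        return array_sum($values) / count($values);
    }

    // Sum of squared deviations of X from its mean (reuses the stored XMean)
    public function getSumXX(): float
    {
        $sum = 0.0;
        foreach ($this->X as $x) {
            $sum += ($x - $this->XMean) ** 2;
        }
        return $sum;
    }

    // Sum of cross-products of deviations (reuses the stored XMean and YMean)
    public function getSumXY(): float
    {
        $sum = 0.0;
        for ($i = 0; $i < $this->n; $i++) {
            $sum += ($this->X[$i] - $this->XMean) * ($this->Y[$i] - $this->YMean);
        }
        return $sum;
    }
}
```

Throwing exceptions instead of halting the script lets the caller decide how to handle bad input. That is a design choice of this sketch, not necessarily of the original class.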
You can also set a confidence interval for your predicted Y values; the default is 95%. The constructor begins by verifying that the data are in a form suitable for processing. Once the input vectors pass the "equal size" and "more than one value" tests, the core of the algorithm runs. It computes the intermediate and summary values of the statistical procedure through a series of getter methods and assigns the return value of each call to an instance variable of the class. Storing the results this way makes the intermediate and summary values available to the routines called later in the chain of calculations. You can also display these results by calling the output methods of this class, as described in Listing 2.

Listing 2. Calling the class output methods

```php
// ... (the opening of the constructor, including the input checks, was not preserved)
    $this->n = $numX;
    $this->X = $X;
    $this->Y = $Y;
    $this->ConfInt = $ConfidenceInterval;
    // Upper-tail probability for the interval, e.g. 95 -> 0.975
    $this->Alpha = (1 + ($this->ConfInt / 100)) / 2;
    // Each getter may rely on values stored by the calls before it
    $this->XMean = $this->getMean($this->X);
    $this->YMean = $this->getMean($this->Y);
    $this->SumXX = $this->getSumXX();
    $this->SumYY = $this->getSumYY();
    $this->SumXY = $this->getSumXY();
    $this->Slope = $this->getSlope();
    $this->YInt = $this->getYInt();
    $this->PredictedY = $this->getPredictedY();
    $this->Error = $this->getError();
    $this->SquaredError = $this->getSquaredError();
    $this->SumError = $this->getSumError();
    $this->TotalError = $this->getTotalError();
    $this->SumSquaredError = $this->getSumSquaredError();
    $this->ErrorVariance = $this->getErrorVariance();
    $this->StdErr = $this->getStdErr();
    $this->SlopeStdErr = $this->getSlopeStdErr();
    $this->YIntStdErr = $this->getYIntStdErr();
    $this->SlopeTVal = $this->getSlopeTVal();
    $this->YIntTVal = $this->getYIntTVal();
    $this->R = $this->getR();
    $this->RSquared = $this->getRSquared();
    $this->DF = $this->getDF();
    // Probabilities and critical values are obtained from R (see Listing 4)
    $this->SlopeProb = $this->getStudentProb($this->SlopeTVal, $this->DF);
    $this->YIntProb = $this->getStudentProb($this->YIntTVal, $this->DF);
    $this->AlphaTVal = $this->getInverseStudentProb($this->Alpha, $this->DF);
    $this->ConfIntOfSlope = $this->getConfIntOfSlope();
    return true;
  }
?>
```

I derived the method names and their order by combining backward chaining with an undergraduate statistics textbook, which shows step by step how to compute the intermediate values. Each method name is the name of the intermediate value it computes, prefixed with "get".

The SimpleLinearRegression procedure produces a straight line that fits the data. The line has the standard equation y = b + mx, where b is the Y intercept and m is the slope. In PHP, the equation looks like Listing 3:

Listing 3. PHP equation that fits the model to the data

```php
$PredictedY[$i] = $YIntercept + $Slope * $X[$i];
```

The SimpleLinearRegression class uses the least squares method to estimate the Y intercept and slope parameters. These estimates are used to construct the linear equation in Listing 3, which models the relationship between the X and Y values. Using the derived equation, you can obtain the predicted Y value for each X value. If the linear equation fits the data well, the observed Y values are close to the predicted values.

The SimpleLinearRegression class generates a considerable number of summary values for judging how good the fit is. An important one is the T statistic, which measures how well a linear equation fits the data. If the fit is good, the T statistic tends to be large. If the T statistic is small, the linear equation should give way to a simpler default model.
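The least-squares estimates and the T statistic described above can be illustrated with a short standalone function. This is a sketch under stated assumptions, not the article's getter methods. The function name fitSimpleLinearRegression and its array keys are hypothetical. It uses the standard textbook formulas: m = Sxy / Sxx, b = mean(Y) - m * mean(X), and T = m / SE(m), where SE(m) = sqrt((SSE / (n - 2)) / Sxx).

```php
<?php
// Hypothetical standalone sketch (not the article's class): least-squares fit of
// y = b + m*x and the T statistic for the slope, using textbook formulas.
// $X and $Y are assumed to be zero-indexed lists of numbers, and the points are
// assumed not to lie exactly on one line (otherwise the T statistic is undefined).
function fitSimpleLinearRegression(array $X, array $Y): array
{
    $n = count($X);
    if ($n !== count($Y) || $n < 3) {
        // The error variance uses n - 2 degrees of freedom, so n must be at least 3.
        throw new InvalidArgumentException('Need equal-sized X and Y with at least 3 points.');
    }
    $xMean = array_sum($X) / $n;
    $yMean = array_sum($Y) / $n;

    $sumXX = 0.0; // sum of (x - xMean)^2
    $sumXY = 0.0; // sum of (x - xMean) * (y - yMean)
    $sumYY = 0.0; // sum of (y - yMean)^2
    for ($i = 0; $i < $n; $i++) {
        $dx = $X[$i] - $xMean;
        $dy = $Y[$i] - $yMean;
        $sumXX += $dx * $dx;
        $sumXY += $dx * $dy;
        $sumYY += $dy * $dy;
    }
    if ($sumXX == 0.0) {
        throw new InvalidArgumentException('X values must not all be equal.');
    }

    $slope = $sumXY / $sumXX;          // m, least-squares slope
    $yInt  = $yMean - $slope * $xMean; // b, least-squares Y intercept

    // Predicted Y for each X, as in Listing 3, and the sum of squared errors
    $predictedY = [];
    $sumSquaredError = 0.0;
    for ($i = 0; $i < $n; $i++) {
        $predictedY[$i] = $yInt + $slope * $X[$i];
        $sumSquaredError += ($Y[$i] - $predictedY[$i]) ** 2;
    }

    $df = $n - 2;
    $errorVariance = $sumSquaredError / $df;      // mean squared error
    $slopeStdErr = sqrt($errorVariance / $sumXX); // standard error of the slope

    return [
        'Slope'       => $slope,
        'YInt'        => $yInt,
        'PredictedY'  => $predictedY,
        'DF'          => $df,
        'SlopeStdErr' => $slopeStdErr,
        'SlopeTVal'   => $slope / $slopeStdErr, // T statistic for slope = 0
        'RSquared'    => 1 - $sumSquaredError / $sumYY,
    ];
}

$fit = fitSimpleLinearRegression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]);
printf("y = %.2f + %.2f x, t = %.4f, R^2 = %.2f\n",
    $fit['YInt'], $fit['Slope'], $fit['SlopeTVal'], $fit['RSquared']);
// prints: y = 2.20 + 0.60 x, t = 2.1213, R^2 = 0.60
```

Packing everything into one function is only for brevity. The class-based approach described earlier stores each of these values in its own instance variable instead.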
The default model assumes that the mean of Y is the best predictor of Y. The mean of a set of values is often a useful prediction of the next observation, which is why it serves as the default. To test whether the T statistic is large enough to reject the mean of Y as the best predictor, you need to compute the probability of obtaining that T statistic by chance. If that probability is very low, you can reject the null hypothesis that the mean is the best predictor. Correspondingly, you can be confident that the simple linear model fits the data well. So how do you compute the probability of a T statistic?

Calculating the probability of the T statistic

Because PHP lacks a math routine for computing the probability of a T statistic, I decided to hand this task to the statistical computing package R (see www.r-project.org in References). I also want to draw your attention to this package because R offers many ideas that PHP developers could emulate in a PHP math library. With R, you can check whether the values produced by a PHP math library agree with those from a mature, freely available open source statistics package. The code in Listing 4 shows how easy it is to hand a computation off to R.

Listing 4. Handing a computation to the R statistical computing package
```php
// ... (the beginning of the listing, through the start of getStudentProb(),
// was not preserved; the surviving text resumes at: RPath --slave";)
    $result = shell_exec($cmd);
    // R prints its answer as "[1] <value>"; keep the part after the space
    list($LineNumber, $Probability) = explode(" ", trim($result));
    return $Probability;
  }

  // Returns the T value for a given cumulative probability (alpha) via R's qt()
  function getInverseStudentProb($alpha, $df) {
    $InverseProbability = 0.0;
    $cmd = "echo 'qt($alpha, $df)' | $this->RPath --slave";
    $result = shell_exec($cmd);
    list($LineNumber, $InverseProbability) = explode(" ", trim($result));
    return $InverseProbability;
  }
}
?>
```

Note that the path to the R executable is set once and used in both functions. The first function returns the probability associated with a T statistic according to Student's t distribution; the second, its inverse, computes the T value corresponding to a given alpha setting. The getStudentProb method is used to evaluate how well the linear model fits. The getInverseStudentProb method returns an intermediate value that is used to compute the confidence interval of each predicted Y value. Space does not allow me to describe every function in this class in detail, so if you want to understand the terms and steps involved in simple linear regression analysis, I encourage you to consult an undergraduate statistics textbook.

Burnout study

To demonstrate how to use this class, I can use data from a study of burnout in public utilities. Michael Leiter and Kimberly Ann Meechan studied the relationship between a measure of burnout called the Exhaustion Index and an independent variable called Concentration. Concentration refers to the proportion of a person's social interactions that come from their work environment. To study the relationship between the Exhaustion Index values and the Concentration values of the individuals in their sample, load these values into appropriately named arrays and instantiate the class with them.
After the class is instantiated, display some of the summary values it generates to evaluate how well the linear model fits the data. Listing 5 shows the script that loads the data and displays the summary values.

Listing 5. Script that loads the data and displays summary values

```php
// ... (the beginning of the listing was not preserved; it resumes at:
// format, $slr->YInt);)
$Slope     = sprintf($slr->format, $slr->Slope);
$SlopeTVal = sprintf($slr->format, $slr->SlopeTVal);
// Probabilities are shown with six decimal places
$SlopeProb = sprintf("%01.6f", $slr->SlopeProb);
?>
```
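Listing 4 builds its R command by interpolating values into a shell string and splits R's reply on a single space. A somewhat more defensive variant is sketched below. The helper names parseRScalar, evalRScalar, and slopeConfidenceInterval, as well as the default R path, are hypothetical and chosen for illustration. The only R functions mentioned are qt, which appears in Listing 4, and pt, R's Student t distribution function. The confidence-interval helper assumes the textbook form slope plus or minus t * SE(slope), with t obtained from qt(Alpha, DF). The article does not show how getConfIntOfSlope is actually defined.

```php
<?php
// Hypothetical helpers, not part of the article's class.

// Extract the number from R output such as "[1] 2.228139".
function parseRScalar(string $output): float
{
    if (!preg_match('/^\s*\[1\]\s+(\S+)/', $output, $m) || !is_numeric($m[1])) {
        throw new RuntimeException('Unexpected R output: ' . trim($output));
    }
    return (float) $m[1];
}

// Pipe one R expression, e.g. "qt(0.975, 3)", to R and parse the reply.
// The default path is an assumption; point it at your own R executable.
// --slave keeps R quiet and implies --no-save, as in Listing 4.
function evalRScalar(string $expr, string $rPath = '/usr/bin/R'): float
{
    // escapeshellarg() quotes both pieces so they reach the shell unmodified
    $cmd = 'echo ' . escapeshellarg($expr) . ' | ' . escapeshellarg($rPath) . ' --slave';
    $output = shell_exec($cmd);
    if (!is_string($output)) {
        throw new RuntimeException("Could not run R at $rPath");
    }
    return parseRScalar($output);
}

// Confidence interval for the slope: slope +/- tCrit * SE(slope),
// where tCrit is the critical value, e.g. qt(Alpha, DF).
function slopeConfidenceInterval(float $slope, float $slopeStdErr, float $tCrit): array
{
    $halfWidth = $tCrit * $slopeStdErr;
    return [$slope - $halfWidth, $slope + $halfWidth];
}
```

For example, evalRScalar('qt(0.975, 3)') asks R for the critical value of a 95% interval with 3 degrees of freedom, and evalRScalar('2 * pt(-abs(2.1213), 3)') asks for the two-sided probability of a T statistic of 2.1213. Both calls require a working R installation at the given path.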