Nonlinear regression optimizes the parameters of a nonlinear function, given a nonlinear relationship between the variables, so that the model's RSS (residual sum of squares) is minimized. The most commonly used nonlinear regression modeling function in R is nls, illustrated here with the USPop dataset from the car package. In the data, population is the population count and year is the year. A scatter plot of the two variables reveals a nonlinear relationship between them. Before building a nonlinear regression model, two things must be determined in advance: the form of the nonlinear function and the initial values of the parameters.
First, model fitting
For a population model we can use the logistic growth function, which captures both the initial exponential growth and the limit imposed by finite resources. Its form is:

population = theta1 / (1 + exp(-(theta2 + theta3 * year)))
We first load the car package to get the data, then fit the model with the nls function, where theta1, theta2, and theta3 are the three parameters to be estimated, start sets the initial parameter values, and trace = TRUE prints the iteration history. By default nls uses the Gauss-Newton method to find the optimum; in the trace output the first column is the RSS value and the next three columns are the parameter estimates. The result is then displayed with summary.
library(car)
pop.mod1 <- nls(population ~ theta1/(1 + exp(-(theta2 + theta3*year))),
                start = list(theta1 = 400, theta2 = -49, theta3 = 0.025),
                data = USPop, trace = TRUE)   # theta1: rough guess at the ceiling
summary(pop.mod1)
Another, easier way is to use one of the built-in self-starting models, in which case we only need to specify the function form and need not supply initial parameter values. For the logistic function, the self-starting function is called SSlogis.
pop.mod2 <- nls(population ~ SSlogis(year, phi1, phi2, phi3), data = USPop)
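Note that SSlogis uses a different parameterization from the explicit formula above: it fits Asym / (1 + exp((xmid - input)/scal)). A small sketch on simulated data (used here so the example does not depend on the car package; the conversion formulas below are derived algebraically, and the variable names are mine) shows how to recover theta1, theta2, theta3 from its coefficients:

```r
# SSlogis fits Asym / (1 + exp((xmid - input)/scal)), so algebraically:
#   theta1 = Asym,  theta2 = -xmid/scal,  theta3 = 1/scal
set.seed(1)
year <- seq(1790, 1940, by = 10)                 # simulated stand-in for USPop
population <- 400 / (1 + exp((1870 - year) / 40)) + rnorm(length(year), 0, 2)
d <- data.frame(year = year, population = population)

m <- nls(population ~ SSlogis(year, Asym, xmid, scal), data = d)
co <- coef(m)
theta1 <- unname(co["Asym"])                     # asymptote (resource ceiling)
theta2 <- unname(-co["xmid"] / co["scal"])
theta3 <- unname(1 / co["scal"])
c(theta1 = theta1, theta2 = theta2, theta3 = theta3)
```

In the pop.mod2 call above, phi1, phi2, phi3 therefore play the roles of Asym, xmid, scal.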
Second, judging the goodness of fit
After fitting a nonlinear regression model, it is necessary to judge how well it fits, because the parameter optimization can get stuck at a local extremum rather than the global one. The most intuitive check is to draw the fitted curve over the original data points.
library(ggplot2)
p <- ggplot(USPop, aes(year, population))
p + geom_point(size = 3) +
  geom_line(aes(year, fitted(pop.mod1)), col = "red")
To compare the fit of several models, use the AIC function and prefer the model with the smallest value. (AIC is the Akaike information criterion, used to compare models; the similar BIC is the Bayesian information criterion.)
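As a minimal illustration (on simulated data; the model names are mine, not from the original), AIC can rank a correctly specified nonlinear model against a misspecified linear one:

```r
# Compare two candidate models by AIC: smaller is better.
set.seed(42)
x <- seq(0, 10, length.out = 50)
y <- 5 / (1 + exp(-(x - 5))) + rnorm(50, 0, 0.1)   # the truth is logistic
d <- data.frame(x = x, y = y)

m_logis  <- nls(y ~ SSlogis(x, Asym, xmid, scal), data = d)  # correct family
m_linear <- lm(y ~ x, data = d)                              # misspecified

AIC(m_logis, m_linear)   # the logistic fit should show the smaller AIC
BIC(m_logis, m_linear)
```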
Third, residual diagnostics
To check whether the model's error assumptions hold (independent, homoscedastic, roughly normal errors), we examine the residuals of the fitted model.
plot(fitted(pop.mod1), resid(pop.mod1), type = "b")
fitted returns the fitted values (predict returns predictions, including on new data); resid and residuals both return the residuals.
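A quick sketch on simulated data (variable names are illustrative) confirms these relationships:

```r
# On the training data, fitted() and predict() of an nls model coincide,
# resid() and residuals() are identical, and residual = observed - fitted.
set.seed(7)
x <- runif(30)
y <- x^3 + 5.2 + rnorm(30, 0, 0.05)
m <- nls(y ~ b0 + x^b1, start = list(b0 = 0, b1 = 2))

isTRUE(all.equal(as.numeric(fitted(m)), as.numeric(predict(m))))
isTRUE(all.equal(as.numeric(resid(m)), as.numeric(residuals(m))))
isTRUE(all.equal(as.numeric(resid(m)), y - as.numeric(fitted(m))))
predict(m, newdata = list(x = c(0.25, 0.5)))   # predictions at new points
```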
Parameter estimation of a function model
There are at least three situations to distinguish when estimating a function:
1. The approximate form of the function is known, and its parameters need to be estimated;
2. The model is unknown, but we want a reasonably good model to describe it;
3. The model is unknown and its explicit expression is not of interest; we only want its values at unobserved points.
The first of these is fitting, or parameter estimation; the second is called function approximation; and the third is called function interpolation. From a statistical point of view, the first is a parametric problem and the other two are nonparametric problems.
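For a flavor of the third problem, base R's splinefun builds an interpolating function through observed points without assuming any parametric form (a sketch on made-up points, not from the original post):

```r
# Interpolation: recover values at unobserved points with no parametric model.
xs <- seq(0, 1, by = 0.25)
ys <- xs^3 + 5.2                  # pretend these five points are all we observe
f  <- splinefun(xs, ys)           # interpolating cubic spline (stats package)
f(0.6)                            # estimate at an unobserved point
f(0.5) - ys[3]                    # an interpolant passes through the data
```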
Take a power function with a constant term as an example. The simulated model is y = x^beta + mu + epsilon, with beta = 3 and mu = 5.2.

Generate the simulated data:
len <- 24
x <- runif(len)
y <- x^3 + 5.2 + rnorm(len, 0, 0.06)
ds <- data.frame(x = x, y = y)
str(ds)
'data.frame': 24 obs. of 2 variables:
 $ x: num 0.3961 0.2004 0.0407 0.9873 0.83 ...
 $ y: num 5.37 5.15 5.21 6.06 5.75 ...
plot(y ~ x, main = "power model")
s <- seq(0, 1, length = 100)
lines(s, s^3, lty = 2, col = "red")
The nls estimate is obtained as follows:
rhs <- function(x, b0, b1) {
  b0 + x^b1
}
m.1 <- nls(y ~ rhs(x, Intercept, power), data = ds,
           start = list(Intercept = 0, power = 2), trace = TRUE)
The trace output is:
629.9495   : 0        2
0.08918652 : 5.174334 2.526742
0.08346069 : 5.184992 2.786072
0.08303884 : 5.188992 2.870896
0.08301127 : 5.190252 2.894080
0.08300947 : 5.190594 2.900075
0.08300935 : 5.190683 2.901602
0.08300934 : 5.190705 2.901989
0.08300934 : 5.190711 2.902088
0.08300934 : 5.190713 2.902112
summary(m.1)

The output is:
Formula: y ~ rhs(x, Intercept, power)

Parameters:
          Estimate Std. Error t value Pr(>|t|)
Intercept  5.19071    0.02189 237.150  < 2e-16 ***
power      2.90211    0.30825   9.415 3.58e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.06143 on 22 degrees of freedom

Number of iterations to convergence: 9
Achieved convergence tolerance: 4.296e-06
If instead we use least-squares estimation on the log-log scale, the result is as follows:

model <- lm(I(log(y)) ~ I(log(x)))
summary(model)
The output is:
Call:
lm(formula = I(log(y)) ~ I(log(x)))

Residuals:
      Min        1Q    Median        3Q       Max
-0.054991 -0.029944 -0.008994  0.037530  0.073063

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.755422   0.011128 157.755  < 2e-16 ***
I(log(x))   0.055445   0.009502   5.835 7.17e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.03851 on 22 degrees of freedom
Multiple R-squared: 0.6075,  Adjusted R-squared: 0.5897
F-statistic: 34.05 on 1 and 22 DF,  p-value: 7.166e-06
We can plot the simulated data, the true model, the nls fit, and the least-squares fit together to get an intuitive sense of how good each fit is:
plot(ds$y ~ ds$x, main = "fitted power model, with intercept",
     sub = "blue: fit; magenta: fit LSE; green: known")
lines(s, s^3 + 5.2, lty = 2, col = "green")
lines(s, predict(m.1, list(x = s)), lty = 1, col = "blue")
lines(s, exp(predict(model, list(x = s))), lty = 2, col = "magenta")
segments(x, y, x, fitted(m.1), lty = 2, col = "red")
legend("topleft", c("NLS fit", "least squares (magenta)", "true model", "residuals"),
       col = c("blue", "magenta", "green", "red"), pch = 15, cex = 0.7)
R: nonlinear regression with nls and parameter estimation of a function model