Using nls in R for nonlinear regression and parameter estimation of a function model

Source: Internet
Author: User

Nonlinear regression optimizes the parameters of a nonlinear function, given a nonlinear relationship between the variables, so that the model's RSS (residual sum of squares) is minimized. The most commonly used nonlinear regression modeling function in R is nls, explained here using the USPop dataset in the car package as an example. In this dataset, population is the number of people and year is the year; a scatter plot of the two reveals the nonlinear relationship between them. To build a nonlinear regression model, two things must be determined in advance: the form of the nonlinear function and the initial values of the parameters.

First, fitting the model

For the population model, a logistic growth function is a natural choice: it captures both the initial exponential growth and the limit imposed by total resources. Its form is:

population = theta1 / (1 + exp(-(theta2 + theta3 * year)))

The car package is first loaded to read the data, and the model is then fitted with the nls function, where theta1, theta2 and theta3 are the three parameters to be estimated, start sets the initial parameter values, and trace = TRUE shows the iteration process. By default nls uses the Gauss-Newton method to find the optimum; in the trace output the first column is the RSS value and the next three columns are the parameter estimates. The result is then displayed with summary.

library(car)
pop.mod1 <- nls(population ~ theta1/(1 + exp(-(theta2 + theta3*year))),
                start = list(theta1 = 400, theta2 = -49, theta3 = 0.025),
                data = USPop, trace = TRUE)
summary(pop.mod1)

Another convenient approach is to use a built-in self-starting model, in which case we only need to specify the function form, without supplying initial parameter values. In this example, the self-starting function for the logistic model is called SSlogis:

pop.mod2 <- nls(population ~ SSlogis(year, phi1, phi2, phi3), data = USPop)
Second, judging the goodness of fit

After a nonlinear regression model is fitted, its goodness of fit should be checked, because the parameter optimization can sometimes get stuck at a local rather than a global optimum. The most intuitive check is to draw the fitted curve over the original data points.

library(ggplot2)
p <- ggplot(USPop, aes(year, population))
p + geom_point(size = 3) + geom_line(aes(year, fitted(pop.mod1)), col = "red")

To compare the fit of several models, use the AIC function and take the model with the smallest value. (AIC is the Akaike information criterion, used for model comparison; the similar BIC is the Bayesian information criterion.)
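
As a quick sketch (not from the original post; the data are simulated from the power model used later in this article, and the model names m_pow and m_lin are illustrative), AIC and BIC can be compared across an nls fit and a simpler linear fit:

```r
# Sketch: comparing candidate models with AIC/BIC (smaller is better).
# Data are simulated from y = x^3 + 5.2 + noise, as later in this post.
set.seed(1)
x <- runif(24)
y <- x^3 + 5.2 + rnorm(24, 0, 0.06)
ds <- data.frame(x, y)

m_pow <- nls(y ~ b0 + x^b1, data = ds, start = list(b0 = 0, b1 = 2))  # power model
m_lin <- lm(y ~ x, data = ds)                                         # linear model

AIC(m_pow, m_lin)  # one row per model; pick the smaller AIC
BIC(m_pow, m_lin)  # BIC penalizes extra parameters more heavily
```

Both functions accept several fitted models at once and return a table with one row per model, which makes side-by-side comparison straightforward.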

Third, residual diagnostics

Nonlinear least squares rests on the usual assumptions about the error term (zero mean, constant variance). To check whether these assumptions hold, we examine the residuals of the fitted model.

plot(fitted(pop.mod1), resid(pop.mod1), type = "b")

 

Note: fitted returns the fitted values (predict returns predictions, optionally at new data); resid and residuals are synonyms and both return the residuals.
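
A minimal sketch of the difference (simulated data; the variable names are illustrative):

```r
# fitted() evaluates the model at the observed data points;
# predict() can also evaluate it at new points via newdata.
set.seed(2)
d <- data.frame(x = runif(30))
d$y <- 2 * d$x + 1 + rnorm(30, 0, 0.1)
fit <- lm(y ~ x, data = d)

head(fitted(fit))                                      # in-sample fitted values
predict(fit, newdata = data.frame(x = c(0.25, 0.75)))  # predictions at new x

# resid() and residuals() are synonyms: observed minus fitted
all.equal(resid(fit), d$y - fitted(fit))  # TRUE
```

The same generics work on nls objects, which is why fitted(pop.mod1) and resid(pop.mod1) can be used above.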

Parameter estimation of a function model

There are at least three kinds of questions to care about in function estimation:

1. The approximate form of the function is known, and its parameters need to be estimated;

2. The model is unknown, but we want to describe it with a reasonably good model;

3. The model is unknown and its explicit expression is not of interest; we only want its values at unobserved points.

The first of these three questions is fitting, or parameter estimation; the second is called function approximation; and the third is called function interpolation. From a statistical point of view, the first is a parametric problem, while the remaining two are nonparametric problems.
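
For the third problem, base R already provides interpolation tools; a minimal sketch (the sample function sin and the points are chosen purely for illustration):

```r
# Interpolation: estimate values of f at unobserved points
# without assuming any parametric model.
x <- 0:4
y <- sin(x)
f_lin <- approxfun(x, y)  # piecewise-linear interpolant
f_spl <- splinefun(x, y)  # cubic-spline interpolant

f_lin(2.5)  # linear estimate between sin(2) and sin(3)
f_spl(2.5)  # smoother spline-based estimate

# Both interpolants reproduce the observed points:
all.equal(f_spl(2), sin(2))  # TRUE
```

Unlike the nls fits below, these interpolants make no claim about the functional form; they only connect the observed points.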

Take a power function with a constant term as an example: simulate the model y = x^beta + mu + epsilon, with beta = 3 and mu = 5.2.

Generate the simulated data:

len <- 24
x <- runif(len)
y <- x^3 + 5.2 + rnorm(len, 0, 0.06)
ds <- data.frame(x = x, y = y)
str(ds)

'data.frame': 24 obs. of  2 variables:
 $ x: num  0.3961 0.2004 0.0407 0.9873 0.83 ...
 $ y: num  5.37 5.15 5.21 6.06 5.75 ...

plot(y ~ x, main = "power model")
s <- seq(0, 1, length.out = 100)
lines(s, s^3, lty = 2, col = "red")

 

The estimation with the nls function is as follows:

RHS <- function(x, b0, b1) {
    b0 + x^b1
}
m.1 <- nls(y ~ RHS(x, Intercept, power), data = ds,
           start = list(Intercept = 0, power = 2), trace = TRUE)

The trace output is as follows:

629.9495   : 0        2
0.08918652 : 5.174334 2.526742
0.08346069 : 5.184992 2.786072
0.08303884 : 5.188992 2.870896
0.08301127 : 5.190252 2.894080
0.08300947 : 5.190594 2.900075
0.08300935 : 5.190683 2.901602
0.08300934 : 5.190705 2.901989
0.08300934 : 5.190711 2.902088
0.08300934 : 5.190713 2.902112

summary(m.1)

The output is as follows:

Formula: y ~ RHS(x, Intercept, power)

Parameters:
          Estimate Std. Error t value Pr(>|t|)
Intercept  5.19071    0.02189 237.150  < 2e-16 ***
power      2.90211    0.30825   9.415 3.58e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.06143 on 22 degrees of freedom

Number of iterations to convergence: 9
Achieved convergence tolerance: 4.296e-06

If instead the model is log-transformed and estimated by ordinary least squares, the results are as follows (note that log-linearization is exact only for a model of the form y = a*x^beta without an additive constant, so here it can only approximate):

model <- lm(I(log(y)) ~ I(log(x)))
summary(model)

The output is as follows:

Call:
lm(formula = I(log(y)) ~ I(log(x)))

Residuals:
      Min        1Q    Median        3Q       Max
-0.054991 -0.029944 -0.008994  0.037530  0.073063

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.755422   0.011128 157.755  < 2e-16 ***
I(log(x))   0.055445   0.009502   5.835 7.17e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.03851 on 22 degrees of freedom
Multiple R-squared: 0.6075,  Adjusted R-squared: 0.5897
F-statistic: 34.05 on 1 and 22 DF,  p-value: 7.166e-06

Finally, we can plot the simulated data, the true model, the nls fit and the least-squares fit together, to get an intuitive judgment of how well each model fits:

plot(ds$y ~ ds$x, main = "fitted power model, with intercept",
     sub = "blue: nls fit; magenta: LSE fit; green: known model")
lines(s, s^3 + 5.2, lty = 2, col = "green")
lines(s, predict(m.1, list(x = s)), lty = 1, col = "blue")
lines(s, exp(predict(model, list(x = s))), lty = 2, col = "magenta")
segments(x, y, x, fitted(m.1), lty = 2, col = "red")
legend("topleft", c("nls fit", "least-squares fit", "true model", "residuals"),
       col = c("blue", "magenta", "green", "red"), pch = 15, cex = 0.7)

  

