Quantile regression in R

Source: Internet
Author: User

Quantile regression
In this post we talk about quantile regression. Most people have seen traditional regression many times, but quantile regression less often. The method is actually quite old, dating back to about 1978, although the theory was not yet complete at that time. In 2005 Roger Koenker, the founder of quantile regression, published a monograph on it, "Quantile Regression", with Cambridge University Press. The "father of quantile regression" was also due to publish the "Handbook of Quantile Regression" this year, but it has not officially come out yet. Quantile regression is now very widely applied and is particularly important in finance. Here I give a brief introduction to its basic principle and then work through a complete case in R. Why R? Because the inventor of quantile regression wrote an R package for it, quantreg, which was the first and at one time the only quantile regression package. Python and Julia now have related packages too, but I still feel the R package is the best.
So what is quantile regression? It grows out of traditional regression. Traditional regression is generally called least squares regression, or mean regression, because it models the conditional mean of y given x. This is rather abstract; I explained it in more detail in a previous blog post. Quantile regression extends mean regression: besides the mean, it can fit other points of the conditional distribution, namely its quantiles, producing several regression lines. The first thing to emphasize is that these are quantiles of the response variable y, not of x. So if we choose several quantile levels, we get several regression lines. Quantile regression has also been extended to nonlinear regression, so that we can fit several curves, and, like generalized linear models, it can be applied to binary response variables. The detailed theory will have to wait for a later post. Here we work through a case in R, which should give a gradual feel for what quantile regression means. The data used in the case should be familiar: household income and food expenditure. Let's look at the code.
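The idea can be made concrete with the check (pinball) loss that quantile regression minimises. The following is a minimal base-R sketch, not code from quantreg. The helper names `check_loss` and `fit_constant` are hypothetical, and the exponential sample merely stands in for a skewed response y. Minimising the summed check loss over a single constant recovers the sample quantile of y, just as minimising the summed squared loss recovers the mean.

```r
# Check (pinball) loss: rho_tau(u) = u * (tau - I(u < 0))
check_loss <- function(u, tau) u * (tau - (u < 0))

# Hypothetical helper: the constant q that minimises sum(loss(y - q))
fit_constant <- function(y, loss) {
  optimize(function(q) sum(loss(y - q)), range(y), tol = 1e-10)$minimum
}

set.seed(1)
y <- rexp(201)  # skewed toy sample standing in for the response y

taus <- c(0.05, 0.25, 0.5, 0.75, 0.95)
q_hat <- sapply(taus, function(tau) fit_constant(y, function(u) check_loss(u, tau)))

# type = 1 is the inverse empirical CDF, the exact minimiser when n * tau is not an integer
q_ref <- quantile(y, taus, type = 1, names = FALSE)
print(cbind(tau = taus, check_loss_fit = q_hat, sample_quantile = q_ref))

# Squared loss is minimised by the mean instead
mean_hat <- fit_constant(y, function(u) u^2)
print(c(squared_loss_fit = mean_hat, mean = mean(y)))
```

Replacing the constant q with intercept + slope * x turns this into a quantile regression line. Each choice of tau gives a different line.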

# Load the quantile regression package
library(quantreg)
# Load the data
data(engel)
# Check the storage mode of the data
mode(engel)
[1] "list"
# View the variable names
names(engel)
[1] "income"  "foodexp"
# Check the class
class(engel)
[1] "data.frame"
# View the first rows of the data (head() shows six by default)
head(engel)
income  foodexp
1 420.1577 255.8394
2 541.4117 310.9587
3 901.1575 485.6800
4 639.0802 402.9974
5 750.8756 495.5608
6 945.7989 633.7978
# Draw a scatter plot to look at the data
plot(engel$income, engel$foodexp, xlab = 'income', ylab = 'foodexp')

[Figure: scatter plot of foodexp against income]

Let's continue with a brief look at the data

# Look at the range of foodexp
boxplot(engel$foodexp, xlab = 'foodexp')
# Quick check of whether the response foodexp is normally distributed
qqnorm(engel$foodexp, main = 'QQ plot')
qqline(engel$foodexp, col = 'red', lwd = 2)

The results are as follows:

[Figure: box plot of foodexp]

Below is the QQ plot:

[Figure: normal QQ plot of foodexp with a red reference line]

The results show that the variable y clearly does not follow a normal distribution. This is not a problem here: quantile regression does not require y to be normally distributed, and it is not sensitive to outliers. Next, for comparison, we fit a mean regression as well as several quantile regressions.
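To see the robustness claim in miniature, the following base-R sketch fits a median (tau = 0.5) regression exactly by brute force. It then compares that fit with least squares when a single y value is corrupted. The sketch relies on a standard property of linear quantile regression: with one predictor, some optimal line passes through two of the observations, so checking every pair finds an exact minimiser. This O(n^3) search is only practical for tiny data sets; quantreg's rq() solves the same problem as a linear program. The toy data and the helper names `check_loss` and `rq_line` are hypothetical.

```r
# Check (pinball) loss: rho_tau(u) = u * (tau - I(u < 0))
check_loss <- function(u, tau) u * (tau - (u < 0))

# Hypothetical helper: exact quantile regression line for one predictor.
# Some optimal line passes through two observations, so try every pair.
rq_line <- function(x, y, tau) {
  best <- c(intercept = NA, slope = NA)
  best_loss <- Inf
  n <- length(x)
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      if (x[i] == x[j]) next  # two points with equal x do not define a line y = a + b x
      slope <- (y[j] - y[i]) / (x[j] - x[i])
      intercept <- y[i] - slope * x[i]
      loss <- sum(check_loss(y - intercept - slope * x, tau))
      if (loss < best_loss) {
        best_loss <- loss
        best <- c(intercept = intercept, slope = slope)
      }
    }
  }
  best
}

# Toy data exactly on the line y = 2 + 0.5 x, then one y value corrupted
x <- 1:20
y <- 2 + 0.5 * x
y[20] <- 100

median_fit <- rq_line(x, y, tau = 0.5)
ols_fit <- unname(coef(lm(y ~ x)))
print(rbind(median = median_fit, least_squares = ols_fit))
```

The median line stays on the 19 clean points (intercept 2, slope 0.5). The least-squares slope, in contrast, is pulled well away from 0.5 by the single corrupted y value.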

# Attach engel so that its columns can be referred to directly
attach(engel)
# Fit quantile regressions at the five levels tau = 0.05, 0.25, 0.5, 0.75, 0.95, giving five regression lines
rq_result <- rq(foodexp ~ income, tau = c(0.05, 0.25, 0.5, 0.75, 0.95))
summary(rq_result)

Call: rq(formula = foodexp ~ income, tau = c(0.05, 0.25, 0.5, 0.75, 0.95))

tau: [1] 0.05

Coefficients:
            coefficients lower bd  upper bd 
(Intercept) 124.88004     98.30212 130.51695
income        0.34336      0.34333   0.38975

Call: rq(formula = foodexp ~ income, tau = c(0.05, 0.25, 0.5, 0.75, 0.95))

tau: [1] 0.25

Coefficients:
            coefficients lower bd  upper bd 
(Intercept)  95.48354     73.78608 120.09847
income        0.47410      0.42033   0.49433

Call: rq(formula = foodexp ~ income, tau = c(0.05, 0.25, 0.5, 0.75, 0.95))

tau: [1] 0.5

Coefficients:
            coefficients lower bd  upper bd 
(Intercept)  81.48225     53.25915 114.01156
income        0.56018      0.48702   0.60199

Call: rq(formula = foodexp ~ income, tau = c(0.05, 0.25, 0.5, 0.75, 0.95))

tau: [1] 0.75

Coefficients:
            coefficients lower bd  upper bd 
(Intercept)  62.39659     32.74488 107.31362
income        0.64401      0.58016   0.69041

Call: rq(formula = foodexp ~ income, tau = c(0.05, 0.25, 0.5, 0.75, 0.95))

tau: [1] 0.95

Coefficients:
            coefficients lower bd  upper bd 
(Intercept)  64.10396     46.26495  83.57896
income        0.70907      0.67390   0.73444

# These are the coefficients of each regression line. Now let's plot them
plot(income, foodexp, cex = 0.25, type = 'n', xlab = 'income', ylab = 'foodexp')
points(income, foodexp, cex = 0.5, col = 'blue')
# Add the median regression line
abline(rq(foodexp ~ income, tau = 0.5), col = 'blue')
# Add the mean (least squares) regression line as a dashed red line
abline(lm(foodexp ~ income), lty = 2, col = 'red')
# Add the other quantile regression lines in gray
taus <- c(0.05, 0.1, 0.25, 0.75, 0.9, 0.95)
for (i in 1:length(taus)) {
  abline(rq(foodexp ~ income, tau = taus[i]), col = 'gray')
}

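The result looks like this:

[Figure: foodexp against income with the median regression line (blue), the least squares line (dashed red) and the other quantile regression lines (gray)]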
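Each fitted line can also be read numerically as an estimated conditional quantile of food expenditure at a chosen income. As a small worked example, the sketch below plugs the coefficients printed by summary(rq_result) into intercept + slope * income. It uses a hypothetical income of 1000, chosen only for illustration. It uses base R and the printed (rounded) coefficients rather than the fitted object.

```r
# Coefficients copied from the summary(rq_result) output above
coefs <- data.frame(
  tau       = c(0.05, 0.25, 0.5, 0.75, 0.95),
  intercept = c(124.88004, 95.48354, 81.48225, 62.39659, 64.10396),
  slope     = c(0.34336, 0.47410, 0.56018, 0.64401, 0.70907)
)

income0 <- 1000  # hypothetical income level, chosen only for illustration

# Estimated conditional quantile of foodexp at income0 for each tau
coefs$foodexp_q <- coefs$intercept + coefs$slope * income0
print(coefs)

# Width of the central 90% band (0.95 line minus 0.05 line) at income0
band_90 <- coefs$foodexp_q[coefs$tau == 0.95] - coefs$foodexp_q[coefs$tau == 0.05]
print(band_90)
```

The 0.95 slope (0.70907) is about twice the 0.05 slope (0.34336). The gap between these two lines therefore grows with income, so the spread of food expenditure widens as income rises, which a single mean regression line cannot show.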

From the figure above we can see that quantile regression fits several lines. This is very useful when the distribution of the data is complex: each line reflects the relationship between the independent and dependent variable at a different quantile level. This is only a small part of what quantile regression can do. Once we have estimates at different quantiles, we can also estimate the conditional probability density and make the corresponding density predictions.
That's all for this post.
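One simple way to turn two neighbouring quantile lines into a density estimate is the difference quotient f ≈ (tau2 - tau1) / (Q(tau2) - Q(tau1)). It is offered here as an illustrative sketch in the spirit of the difference-quotient (sparsity) estimates used in the quantile regression literature, not as the author's method. The idea is that if the conditional quantile rises steeply between two levels, little probability mass sits in that interval. The code applies it to the tau = 0.25 and tau = 0.75 coefficients from the summary above, at a hypothetical income of 1000.

```r
# Coefficients (intercept, slope) of the tau = 0.25 and tau = 0.75 lines from summary(rq_result)
tau1 <- 0.25; b1 <- c(95.48354, 0.47410)
tau2 <- 0.75; b2 <- c(62.39659, 0.64401)

income0 <- 1000  # hypothetical income level, chosen only for illustration
q1 <- b1[1] + b1[2] * income0  # estimated 25% conditional quantile of foodexp
q2 <- b2[1] + b2[2] * income0  # estimated 75% conditional quantile of foodexp

# Difference-quotient estimate of the conditional density of foodexp,
# i.e. the average density over the interval [q1, q2]
density_hat <- (tau2 - tau1) / (q2 - q1)
print(c(q1 = q1, q2 = q2, density = density_hat))
```

Using closer quantile levels (for example 0.45 and 0.55) localises the estimate around the median but makes it noisier. Repeating the calculation over a grid of tau values traces out an estimated density curve.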
