R in Action reading notes (17): Chapter 12, resampling and the bootstrap

Last Update:2015-05-05 Source: Internet

Author: User

12.4 Additional comments on permutation tests

In addition to the coin and lmPerm packages, R offers other packages that can be used for permutation tests. The perm package provides some of the same functionality as the coin package, so it can serve as an independent check on coin's results. The corrperm package provides permutation tests of correlations with repeated measures.

The logregperm package provides a permutation test for logistic regression. Another important package is glmperm, which extends permutation tests to generalized linear models.

Traditional tests rely on knowledge of the underlying sampling distribution, and permutation tests provide a powerful alternative. In each of the permutation tests described above, we could test statistical hypotheses without any reference to the normal, t, F, or chi-square distributions. Permutation tests are most useful when the data are clearly non-normal (for example, heavily skewed), when outliers are present, when samples are small, or when no parametric test exists. However, if the original sample is a poor representation of the population of interest, no test, including a permutation test, will improve the inferences. Permutation tests are primarily used to generate p-values for testing a null hypothesis, which helps answer the question "does an effect exist?"
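As an illustration of how a permutation test produces a p-value without a theoretical distribution, here is a minimal base-R sketch for a difference in two group means. It does not use coin or lmPerm, and the data values and variable names are hypothetical, chosen only for the example.

```r
# Two-sample permutation test for a difference in means (base R only).
# The scores below are made-up illustration data.
set.seed(1234)
group_a <- c(12, 15, 11, 18, 14, 16, 13, 17)
group_b <- c(10, 9, 13, 8, 11, 12, 7, 10)

scores <- c(group_a, group_b)
n_a <- length(group_a)
observed <- mean(group_a) - mean(group_b)

# Shuffle the pooled scores many times, split them into two groups of the
# original sizes, and recompute the difference in means each time.
n_perm <- 9999
perm_diffs <- replicate(n_perm, {
  shuffled <- sample(scores)
  mean(shuffled[1:n_a]) - mean(shuffled[-(1:n_a)])
})

# Two-sided p-value: share of arrangements at least as extreme as observed.
# Adding 1 to numerator and denominator counts the observed arrangement.
p_value <- (sum(abs(perm_diffs) >= abs(observed)) + 1) / (n_perm + 1)
p_value
```

A small p-value suggests the observed difference would be unusual if group labels were unrelated to the scores. It says nothing about whether the sample represents the population well.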

12.5 Bootstrapping

Bootstrapping generates an empirical distribution of a test statistic (or a set of statistics) by repeatedly sampling with replacement from the original sample.

It lets you generate confidence intervals for statistics, and test statistical hypotheses, without assuming a specific underlying theoretical distribution. For example, suppose you want a 95% confidence interval for a sample mean, and the sampling distribution of the mean is not normal:

(1) Randomly select 10 observations from the sample, replacing each one after it is drawn. Some observations may be selected more than once, and some may not be selected at all.

(2) Calculate and record the sample mean.

(3) Repeat steps 1 and 2 1,000 times.

(4) Sort the 1,000 sample means from smallest to largest.

(5) Find the sample means at the 2.5th and 97.5th percentiles. Here, these are the 25th value from the start and the 25th value from the end of the sorted list. They are the limits of the 95% confidence interval.
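The five steps above can be sketched in base R as follows. This is only an illustration: the ten observations are hypothetical, and the variable names are chosen for the example.

```r
# Percentile bootstrap for a sample mean, following steps (1) to (5).
# The ten observations are made-up illustration data.
set.seed(1234)
x <- c(38, 45, 41, 52, 36, 40, 47, 33, 44, 39)

# Steps 1 to 3: resample with replacement and record the mean, 1,000 times
boot_means <- replicate(1000, mean(sample(x, size = length(x), replace = TRUE)))

# Step 4: sort the 1,000 means from smallest to largest
sorted_means <- sort(boot_means)

# Step 5: the 25th value from each end bounds the middle 95%
# (position 976 is the 25th value counting back from position 1000)
ci <- c(lower = sorted_means[25], upper = sorted_means[976])
ci
```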

12.6 Bootstrapping with the boot package

The boot package provides extensive facilities for bootstrapping and related resampling methods. You can bootstrap a single statistic (such as a median) or a vector of statistics (such as a set of regression coefficients).

In general, bootstrapping involves three main steps.

(1) Write a function that returns the statistic or statistics of interest. If there is a single statistic (such as a median), the function should return a number; if there is a set of statistics (such as a set of regression coefficients), the function should return a vector.

(2) Process this function through the boot() function to generate R bootstrap replications of the statistic(s).

(3) Use the boot.ci() function to obtain confidence intervals for the statistic(s) generated in step (2).

The main bootstrapping function is boot(), which has the format: bootobject <- boot(data=, statistic=, R=, ...)

data: a vector, matrix, or data frame

statistic: a function that produces the k statistics to be bootstrapped (k=1 when bootstrapping a single statistic). The function should include an indices parameter that boot() can use to select cases for each replication

R: the number of bootstrap replicates

...: additional parameters to be passed to the function that produces the statistic(s) of interest

The boot() function calls the statistic function R times. Each time, it generates a set of random indices, sampled with replacement from the integers 1:nrow(data). These indices are used inside the statistic function to select a sample. The statistics are calculated on that sample, and the results are accumulated in bootobject. The object returned by boot() includes these elements:

t0: the observed values of the k statistics applied to the original data

t: an R x k matrix in which each row holds one bootstrap replicate of the k statistics

You can access these elements as bootobject$t0 and bootobject$t.
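To make the roles of the indices parameter, t0, and t concrete, here is a minimal sketch that bootstraps a single statistic (k=1), the median of mtcars$mpg. The function name med is hypothetical; boot() and the mtcars data set are standard.

```r
library(boot)

# Statistic function: boot() supplies `indices`, the resampled positions.
# For a vector, the resample is simply data[indices].
med <- function(data, indices) {
  median(data[indices])
}

set.seed(1234)
results <- boot(data = mtcars$mpg, statistic = med, R = 1000)

results$t0       # the median computed on the original data
dim(results$t)   # R rows (one per replicate) by k columns, here 1000 by 1
```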

Once the bootstrap samples have been generated, you can examine the results with print() and plot(). If the results look reasonable, use the boot.ci() function to obtain confidence intervals for the statistic(s). The format is as follows:

boot.ci(bootobject, conf=, type=)

bootobject: the object returned by the boot() function

conf: the desired confidence level (default: conf=0.95)

type: the type of confidence interval returned. Possible values are "norm", "basic", "stud", "perc", "bca", and "all" (default: type="all")

The type parameter sets the method for obtaining the confidence limits. The perc method (percentile) is the one demonstrated in the sample mean example above, and the bca method produces an interval that makes simple adjustments for bias.
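As a rough check on how the perc interval relates to the replicates stored in bootobject$t, the sketch below bootstraps a mean and compares the limits from boot.ci() with plain quantiles of the replicates. The function name mean_stat is hypothetical. The sketch assumes the limits sit in the last two positions of the percent and bca components of the returned object, which is how the boot package stores them as far as I know.

```r
library(boot)

# Statistic function for the mean of a numeric vector
mean_stat <- function(data, indices) mean(data[indices])

set.seed(1234)
results <- boot(data = mtcars$mpg, statistic = mean_stat, R = 1000)

ci <- boot.ci(results, conf = 0.95, type = c("perc", "bca"))
ci$percent[4:5]   # percentile limits (lower, upper)
ci$bca[4:5]       # bias-corrected and accelerated limits

# For comparison: plain quantiles of the replicates. boot.ci() interpolates
# between order statistics, so these are close but need not be identical.
quantile(results$t, c(0.025, 0.975))
```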

12.6.1 Bootstrapping a single statistic

> library(boot)
> # Return R-squared for a regression fitted to the resampled rows
> rsq <- function(formula, data, indices) {
+   d <- data[indices, ]
+   fit <- lm(formula, data=d)
+   return(summary(fit)$r.squared)
+ }
> set.seed(1234)
> results <- boot(data=mtcars, statistic=rsq, R=1000, formula=mpg~wt+disp)
> print(results)

ORDINARY NONPARAMETRIC BOOTSTRAP

Call:
boot(data = mtcars, statistic = rsq, R = 1000, formula = mpg ~
    wt + disp)

Bootstrap Statistics :
     original      bias    std. error
t1* 0.7809306 0.0133367  0.05068926

Plotting the results with plot(results) shows that the bootstrapped R-squared values are not normally distributed. A 95% confidence interval for the R-squared value can be obtained with the following code:

> boot.ci(results, type=c("perc", "bca"))

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates

CALL :
boot.ci(boot.out = results, type = c("perc", "bca"))

Intervals :
Level     Percentile            BCa
95%   ( 0.6838,  0.8833 )   ( 0.6344,  0.8549 )
Calculations and Intervals on Original Scale
Some BCa intervals may be unstable

12.6.2 Bootstrapping several statistics

First, create a function that returns a vector of regression coefficients:

> # Return the vector of regression coefficients for the resampled rows
> bs <- function(formula, data, indices) {
+   d <- data[indices, ]
+   fit <- lm(formula, data=d)
+   return(coef(fit))
+ }
> results <- boot(data=mtcars, statistic=bs, R=1000, formula=mpg~wt+disp)
> print(results)

ORDINARY NONPARAMETRIC BOOTSTRAP

Call:
boot(data = mtcars, statistic = bs, R = 1000, formula = mpg ~
    wt + disp)

Bootstrap Statistics :
       original        bias     std. error
t1* 34.96055404  0.1101475088 2.445950503
t2* -3.35082533 -0.0857307946 1.114315903
t3* -0.01772474  0.0003223228 0.008201569

> # index=2 selects the second statistic, the coefficient for wt
> plot(results, index=2)
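When several statistics are bootstrapped, the same index parameter can be passed to boot.ci() to choose which column of bootobject$t the interval is computed for. The sketch below repeats the regression example under that assumption. The set.seed() call and the names ci_wt and ci_disp are additions for the example, so the numbers will not necessarily match the output printed above.

```r
library(boot)

# Return the regression coefficients fitted to the resampled rows
bs <- function(formula, data, indices) {
  d <- data[indices, ]
  fit <- lm(formula, data = d)
  coef(fit)
}

set.seed(1234)
results <- boot(data = mtcars, statistic = bs, R = 1000,
                formula = mpg ~ wt + disp)

# index selects the column of results$t: 1 = intercept, 2 = wt, 3 = disp
ci_wt   <- boot.ci(results, type = "bca", index = 2)
ci_disp <- boot.ci(results, type = "bca", index = 3)

ci_wt$bca[4:5]     # 95% BCa limits for the wt coefficient
ci_disp$bca[4:5]   # 95% BCa limits for the disp coefficient
```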

12.7 Summary

This chapter introduced a set of computer-intensive methods based on randomization and resampling that let you test hypotheses and form confidence intervals without reference to a known theoretical distribution. These methods are particularly valuable when the data come from an unknown population distribution, when there are serious outliers, when the sample size is small, or when no parametric method exists to answer the hypothesis you are interested in.


This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
