A brief introduction to several statistical distributions in R

Source: Internet
Author: User

There are many statistical distributions, and most of them are available in R. Because space is limited, this article picks a few common and important ones and briefly describes each distribution's definition, formula, and usage in R.

Here's a list of distributions:

rnorm(n, mean=0, sd=1) Gaussian (normal) distribution
rexp(n, rate=1) exponential distribution
rgamma(n, shape, scale=1) gamma distribution
rpois(n, lambda) Poisson distribution
rweibull(n, shape, scale=1) Weibull distribution
rcauchy(n, location=0, scale=1) Cauchy distribution
rbeta(n, shape1, shape2) beta distribution
rt(n, df) t distribution
rf(n, df1, df2) F distribution
rchisq(n, df) χ2 (chi-square) distribution
rbinom(n, size, prob) binomial distribution
rgeom(n, prob) geometric distribution
rhyper(nn, m, n, k) hypergeometric distribution
rlogis(n, location=0, scale=1) logistic distribution
rlnorm(n, meanlog=0, sdlog=1) log-normal distribution
rnbinom(n, size, prob) negative binomial distribution
runif(n, min=0, max=1) uniform distribution
rwilcox(nn, m, n), rsignrank(nn, n) Wilcoxon distributions

Note that the functions above follow a pattern: every name begins with r, which generates random values from the distribution. To get the probability density (or the probability mass, for a discrete distribution), replace r with d.

To get the cumulative probability (the distribution function), replace r with p.

To get quantiles, replace r with q.
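As a small illustration of the prefix convention, the sketch below applies all four prefixes to the standard normal distribution. The variable names and the seed are chosen only for this example.

```r
# The four prefixes applied to the standard normal distribution.
set.seed(1)                  # arbitrary seed, only for reproducibility
x_draws <- rnorm(5)          # r: five random values
dens0   <- dnorm(0)          # d: density at 0, equal to 1/sqrt(2*pi)
prob    <- pnorm(1.96)       # p: P(X <= 1.96), about 0.975
quant   <- qnorm(0.975)      # q: quantile, the inverse of pnorm, about 1.96

print(x_draws)
print(c(density = dens0, probability = prob, quantile = quant))

# q undoes p: qnorm(pnorm(a)) gives back a (up to rounding)
round_trip <- qnorm(pnorm(1.96))
print(round_trip)
```

The same four prefixes work for every distribution in the list, for example dbinom, pbinom, qbinom, and rbinom.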

Binomial distribution:

The binomial distribution describes n independent repetitions of a Bernoulli trial. Each trial has only two possible outcomes, which are mutually exclusive. The trials are independent of one another, and the probability that the event occurs stays the same in every trial. Such a series of trials is called an n-fold Bernoulli experiment. When the number of trials is 1, the binomial distribution reduces to the 0-1 (Bernoulli) distribution.

Formula: P(ξ=k) = C(n,k) * p^k * (1-p)^(n-k)

where p is the probability of success in a single trial, n is the number of independent trials, k is the number of successes among the n trials, and C(n,k) is the binomial coefficient.

Expected value: Eξ = np

Variance: Dξ = np(1-p)
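The formula, expected value, and variance above can be checked numerically. The following is a minimal sketch with illustrative values n = 10 and p = 0.3 (not taken from the article); it compares the hand-written formula with R's dbinom.

```r
# Check the binomial formula, expectation, and variance for n = 10, p = 0.3.
n <- 10
p <- 0.3
k <- 0:n                                       # every possible number of successes

manual  <- choose(n, k) * p^k * (1 - p)^(n - k)  # C(n,k) * p^k * (1-p)^(n-k)
builtin <- dbinom(k, size = n, prob = p)         # R's probability mass function

ex <- sum(k * builtin)                         # expected value, should equal n*p
vx <- sum((k - ex)^2 * builtin)                # variance, should equal n*p*(1-p)

print(all.equal(manual, builtin))
print(c(expected = ex, variance = vx))
```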

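The binomial distribution in R:

p = 0.4      # probability of success in each trial
k = 200      # number of trials per experiment (the size argument)
n = 10000    # number of random values to draw
x = rbinom(n, k, p)
hist(x)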


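To standardize:

mean = k*p                 # expected value of the binomial distribution
var = k*p*(1-p)            # variance of the binomial distribution
z = (x-mean)/sqrt(var)     # subtract the mean, divide by the standard deviation
hist(z)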


Plot the density:

mean = k*p
var = k*p*(1-p)
z = (x-mean)/sqrt(var)
d = density(z)             # kernel density estimate of the standardized values
plot(d)


Normal distribution:

The normal curve is low at both ends, high in the middle, and symmetric about its center, so it is often called the bell curve.

If a random variable X follows a normal distribution with expectation μ and variance σ^2, this is written X ~ N(μ, σ^2).

When μ = 0 and σ = 1, the normal distribution is the standard normal distribution.
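One practical point when moving from the N(μ, σ^2) notation to R: the normal functions take the standard deviation (sd), not the variance. The sketch below uses illustrative values μ = 2 and σ = 3; it compares dnorm with the standard normal density formula and shows that standardizing reduces any normal probability to a standard normal one.

```r
# Normal density and standardization for mu = 2, sigma = 3 (illustrative values).
mu    <- 2
sigma <- 3
x     <- c(-1, 2, 4.5)

# Standard normal density formula, written out by hand
manual  <- exp(-(x - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))
builtin <- dnorm(x, mean = mu, sd = sigma)     # sd is sigma, not sigma^2

# P(X <= x) equals P(Z <= (x - mu)/sigma) for standard normal Z
z        <- (x - mu) / sigma
p_direct <- pnorm(x, mean = mu, sd = sigma)
p_std    <- pnorm(z)

print(all.equal(manual, builtin))
print(all.equal(p_direct, p_std))
```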

The normal distribution in R:

# mean and var are the values computed in the binomial example; k values are drawn
x = rnorm(k, mean=mean, sd=sqrt(var))
hist(x)


Poisson distribution:

The Poisson distribution is a discrete probability distribution commonly used in statistics and probability. It was published by the French mathematician Siméon-Denis Poisson in 1838.

Probability function of the Poisson distribution:

P(X=k) = λ^k * e^(-λ) / k!, k = 0, 1, 2, ...

The parameter λ of the Poisson distribution is the average number of occurrences of a random event per unit time (or unit area). The Poisson distribution is suitable for describing the number of random events that occur per unit time.
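To make the role of λ concrete, the sketch below compares the standard Poisson probability function, λ^k e^(-λ) / k!, with R's dpois, and confirms numerically that both the mean and the variance equal λ. The value λ = 5 is only an illustration.

```r
# Poisson probabilities for lambda = 5 (illustrative value).
lambda <- 5
k      <- 0:10

manual  <- lambda^k * exp(-lambda) / factorial(k)   # formula written by hand
builtin <- dpois(k, lambda)                         # R's probability mass function

# Mean and variance, summed over a range wide enough that the tail is negligible
kk      <- 0:200
pk      <- dpois(kk, lambda)
pois_mean <- sum(kk * pk)
pois_var  <- sum((kk - pois_mean)^2 * pk)

print(all.equal(manual, builtin))
print(c(mean = pois_mean, variance = pois_var))
```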

The Poisson distribution in R:

par(mfrow=c(2,2), mar=c(3,4,1,1))   # 2 x 2 grid of plots
# k is the number of values to draw (set earlier)
lambda = 0.5
x = rpois(k, lambda)
hist(x)
lambda = 1
x = rpois(k, lambda)
hist(x)
lambda = 5
x = rpois(k, lambda)
hist(x)
lambda = 10
x = rpois(k, lambda)
hist(x)


Binomial distribution and Poisson distribution:

When n of the binomial distribution is large and p is very small, the Poisson distribution can be used to approximate the binomial distribution, with λ = np. As a rule of thumb, the Poisson formula can be used for the approximation when n ≥ 10 and p ≤ 0.1.
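The quality of the approximation can be measured directly by comparing the two probability functions. The sketch below uses n = 1000 and p = 0.005 (so λ = 5) as a favorable case and n = 10, p = 0.5 as an unfavorable one; these values are illustrative, picked from the ranges plotted in this section.

```r
# Compare binomial and Poisson probabilities when n is large and p is small.
n      <- 1000
p      <- 0.005
lambda <- n * p                       # lambda = np = 5
k      <- 0:20

binom_p  <- dbinom(k, size = n, prob = p)
pois_p   <- dpois(k, lambda)
max_diff <- max(abs(binom_p - pois_p))   # largest pointwise gap

# Same lambda = 5, but small n and large p: the approximation is much worse
max_diff_bad <- max(abs(dbinom(0:10, size = 10, prob = 0.5) - dpois(0:10, 5)))

print(round(cbind(k, binomial = binom_p, poisson = pois_p), 4)[1:8, ])
print(c(small_p = max_diff, large_p = max_diff_bad))
```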

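par(mfrow=c(3,3), mar=c(3,4,1,1))   # 3 x 3 grid: one panel per (p, n) pair
k = 10000                  # number of values drawn per panel
p = c(0.5, 0.05, 0.005)    # success probabilities
n = c(10, 100, 1000)       # numbers of trials
for (i in p) {
  for (j in n) {
    x = rbinom(k, j, i)
    hist(x)
  }
}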


Chi-Square Distribution:

If n independent random variables ξ1, ξ2, ..., ξn each follow the standard normal distribution (that is, they are independent and identically distributed standard normal variables), then the sum of their squares is a new random variable, and its distribution is called the chi-square distribution with n degrees of freedom.

The chi-square distribution is a new distribution constructed from the normal distribution. When the number of degrees of freedom n is large, the chi-square distribution is approximately normal.
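The definition can be tried out by simulation: square standard normal draws, add them up, and compare the result with R's chi-square functions. The sketch below assumes 5 degrees of freedom and 10000 simulated sums, and uses the known facts that a chi-square variable with n degrees of freedom has mean n and variance 2n; the seed and variable names are arbitrary.

```r
# Build chi-square values as sums of squared standard normal draws.
set.seed(42)                 # arbitrary seed, only for reproducibility
n_df  <- 5                   # degrees of freedom (illustrative)
n_sim <- 10000               # number of simulated sums

z      <- matrix(rnorm(n_sim * n_df), nrow = n_sim, ncol = n_df)
sum_sq <- rowSums(z^2)       # each row: sum of n_df squared normals

# Compare simulated quartiles with the theoretical ones from qchisq
probs <- c(0.25, 0.5, 0.75)
emp   <- unname(quantile(sum_sq, probs))
theo  <- qchisq(probs, df = n_df)

print(c(mean = mean(sum_sq), variance = var(sum_sq)))   # near 5 and 10
print(rbind(empirical = emp, theoretical = theo))
```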

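The chi-square distribution in R:

k = 10000                           # number of values to draw
par(mfrow=c(2,2), mar=c(3,4,1,1))   # 2 x 2 grid of plots
x = rchisq(k, 2)                    # 2 degrees of freedom
d = density(x)
plot(d)
x = rchisq(k, 5)
d = density(x)
plot(d)
x = rchisq(k, 100)
d = density(x)
plot(d)
x = rchisq(k, 1000)                 # large df: the curve looks nearly normal
d = density(x)
plot(d)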


F Distribution:

The F distribution is defined as follows. Let X and Y be two independent random variables, where X follows a chi-square distribution with k1 degrees of freedom and Y follows a chi-square distribution with k2 degrees of freedom. Divide each variable by its degrees of freedom and take the ratio: F = (X/k1) / (Y/k2). This ratio follows the F distribution with first degree of freedom k1 and second degree of freedom k2.
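The ratio construction can be simulated directly from chi-square draws and compared with R's pf. This is a minimal sketch assuming k1 = 5 and k2 = 20 (illustrative values); it also uses the known fact that the mean of the F distribution is k2/(k2-2) when k2 > 2.

```r
# Build F values as a ratio of two scaled chi-square draws.
set.seed(7)                  # arbitrary seed, only for reproducibility
k1    <- 5                   # first degree of freedom (illustrative)
k2    <- 20                  # second degree of freedom (illustrative)
n_sim <- 20000

x     <- rchisq(n_sim, df = k1)
y     <- rchisq(n_sim, df = k2)
f_sim <- (x / k1) / (y / k2)         # each chi-square divided by its df

emp_cdf  <- mean(f_sim <= 1)         # simulated P(F <= 1)
theo_cdf <- pf(1, df1 = k1, df2 = k2)  # theoretical P(F <= 1)

print(c(empirical = emp_cdf, theoretical = theo_cdf))
print(c(sim_mean = mean(f_sim), theory_mean = k2 / (k2 - 2)))
```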

k = 10000                           # number of values to draw
par(mfrow=c(2,2), mar=c(3,4,1,1))   # 2 x 2 grid of plots
x = rf(k, 1, 100)                   # df1 = 1, df2 = 100
hist(x)
x = rf(k, 1, 10000)
hist(x)
x = rf(k, 10, 10000)
hist(x)
x = rf(k, 10000, 10000)
hist(x)


t distribution:

The shape of the t distribution curve depends on n (more precisely, on the degrees of freedom v). Compared with the standard normal curve, the smaller the degrees of freedom v, the flatter the t curve: the middle of the curve is lower and the tails are higher. The larger v is, the closer the t curve is to the normal curve, and as v → ∞ the t distribution becomes the standard normal distribution.
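These shape claims can be read off from the density and distribution functions without any simulation. The sketch below compares the height of the curve at 0 and the probability below -3 for a few degrees of freedom; the specific values of v and the cutoff -3 are chosen only for illustration.

```r
# Peak height and tail weight of the t distribution versus the standard normal.
dfs       <- c(2, 5, 10, 100)       # degrees of freedom to compare
peak      <- dt(0, df = dfs)        # height of the curve at its center
tail_prob <- pt(-3, df = dfs)       # P(T <= -3), a measure of tail weight

print(data.frame(df = dfs, peak = peak, tail_prob = tail_prob))
print(c(normal_peak = dnorm(0), normal_tail = pnorm(-3)))

# With very large degrees of freedom the t density is close to the normal one
gap_large_df <- abs(dt(0, df = 1e6) - dnorm(0))
print(gap_large_df)
```

The peak rises toward dnorm(0) and the tail probability falls toward pnorm(-3) as the degrees of freedom grow.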

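k = 10000                           # number of values to draw
par(mfrow=c(2,2), mar=c(3,4,1,1))   # 2 x 2 grid of plots
x = rt(k, 2)                        # 2 degrees of freedom: heavy tails
hist(x)
x = rt(k, 5)
hist(x)
x = rt(k, 10)
hist(x)
x = rt(k, 100)                      # 100 degrees of freedom: close to normal
hist(x)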


Plots of the relationships among several distributions:


# Split x into groups of n values and return the mean of each group
i2mean = function(x, n=10) {
  k = length(x)
  nobs = k/n                  # number of groups
  xm = matrix(x, nobs, n)     # one group per row
  y = rowMeans(xm)
  return(y)
}

par(mfrow=c(5,1), mar=c(3,4,1,1))   # five plots stacked vertically

# Binomial
p = 0.05
n = 100
k = 10000
x = i2mean(rbinom(k, n, p))
d = density(x)
plot(d, main="Binomial")

# Poisson
lambda = 10
x = i2mean(rpois(k, lambda))
d = density(x)
plot(d, main="Poisson")

# Chi-square
x = i2mean(rchisq(k, 5))
d = density(x)
plot(d, main="Chi-Square")

# F
x = i2mean(rf(k, 10, 10000))
d = density(x)
plot(d, main="F Dist")

# t
x = i2mean(rt(k, 5))
d = density(x)
plot(d, main="T Dist")
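The averaging helper used for these plots splits the draws into groups and replaces each group with its mean, which is presumably why all five density curves end up looking roughly bell-shaped. The sketch below is a self-contained illustration of that grouping step; group_means is a hypothetical name for a function with the same logic as the helper, and the Poisson parameters are chosen only as an example.

```r
# Averaging in groups: same logic as the helper above, under a hypothetical name.
group_means <- function(x, n = 10) {
  nobs <- length(x) / n        # number of groups; assumes length(x) is a multiple of n
  xm   <- matrix(x, nobs, n)   # filled column by column, one group per row
  rowMeans(xm)
}

# Small deterministic case: the rows are 1,3,...,19 and 2,4,...,20
small <- group_means(1:20, n = 10)
print(small)

# Random case: means of 10 Poisson(10) draws keep the mean but shrink the spread
set.seed(123)                  # arbitrary seed, only for reproducibility
draws <- rpois(10000, lambda = 10)
means <- group_means(draws)
print(c(sd_raw = sd(draws), sd_means = sd(means), expected = sqrt(10 / 10)))
```

Because matrix fills by column, each group is made of values spaced nobs apart rather than neighboring values; for independent draws this makes no difference.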

From: Chopping wood and asking the woodcutter
