R Language Data Analysis Series, Part Four, by Comaple.zhang
Statistical analysis always involves random variables. A random variable is a mathematical model that mathematicians build to describe real-world data. With random variables we can even forecast, for example, how many users will visit a website over the next few days, or the future trend of a stock. In this section we explore several common probability distributions, as well as R's flow-control statements.
The common distributions covered here are the normal (Gaussian), exponential, gamma, and beta distributions.
Normal distribution
If a random variable X follows a normal distribution with mean (mathematical expectation) μ and variance σ^2, we write X ~ N(μ, σ^2). The mean μ determines where the density curve is centered, and the standard deviation σ determines its spread: a larger σ gives a wider, flatter curve. Because the curve is bell-shaped, it is often called the bell curve. The standard normal distribution is the normal distribution with μ = 0 and σ = 1.
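As a quick check of the definition, the sketch below writes the normal density $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}$ by hand and compares it with R's built-in dnorm(). The helper name normal_pdf is hypothetical, introduced only for this illustration. The last line uses pnorm() to show how much probability lies within one, two and three standard deviations of the mean.

```r
# Hypothetical helper: the normal density written out from its formula
normal_pdf <- function(x, mu = 0, sigma = 1) {
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

x <- c(-2, -1, 0, 1, 2)
# The hand-written formula and dnorm() agree (prints TRUE)
all.equal(normal_pdf(x, 0, 2), dnorm(x, 0, 2))

# Probability within 1, 2 and 3 standard deviations of the mean
within_k_sd <- round(pnorm(1:3) - pnorm(-(1:3)), 4)
within_k_sd   # 0.6827 0.9545 0.9973
```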
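par(mgp = c(0.6, 0.6, 0))  # spacing of axis title, tick labels and axis line
x <- seq(-5, 5, length.out = 100)
y <- dnorm(x, 0, 1)  # standard normal N(0, 1)
plot(x, y, xlim = c(-4, 4), ylim = c(0, 0.8), col = "red", type = "l",
     ylab = "density", xlab = "x", main = "The Normal Density Distribution")
lines(x, dnorm(x, 0, 2), col = "blue")     # wider curve: sigma = 2
lines(x, dnorm(x, -2, 1), col = "orange")  # shifted left: mu = -2
lines(x, dnorm(x, 0, 0.5), col = "green")  # narrower curve: sigma = 0.5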
Exponential distribution
The lifetimes of many electronic products approximately follow an exponential distribution, and the lifetimes of some systems can also be approximated by it. It is one of the most commonly used distributions in reliability research. When a product fails only through accidental (random) failures, its lifetime follows an exponential distribution. For example, if a component is known to have already worked for s hours, the conditional probability that it keeps working for another t hours is the same as the probability that a brand-new component works for t hours from the start: P(X > s + t | X > s) = P(X > t). This is the memoryless property of the exponential distribution, and it is widely used in reliability research.
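The memoryless property can be checked numerically with pexp(), which gives the exponential distribution function; lower.tail = FALSE returns the survival probability P(X > x). The failure rate of 0.5 per hour and the values of s and t below are arbitrary assumptions chosen for illustration.

```r
rate   <- 0.5  # assumed failure rate per hour
s_used <- 3    # hours the component has already worked
t_more <- 2    # additional hours we ask about

# P(X > s + t | X > s) = P(X > s + t) / P(X > s)
cond <- pexp(s_used + t_more, rate, lower.tail = FALSE) /
        pexp(s_used, rate, lower.tail = FALSE)

# P(X > t) for a brand-new component
fresh <- pexp(t_more, rate, lower.tail = FALSE)

all.equal(cond, fresh)  # TRUE: the s hours already used do not matter
```

Both quantities equal exp(-rate * t_more), which is why s_used drops out.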
x <- seq(-1, 2, length.out = 100)
y <- dexp(x, 0.5)  # rate = 0.5; the density is 0 for x < 0
plot(x, y, col = "red", xlim = c(0, 2), ylim = c(0, 5), type = "l",
     xaxs = "i", yaxs = "i", ylab = "density", xlab = "x",
     main = "The Exponential Density Distribution")
lines(x, dexp(x, 1), col = "green")   # rate = 1
lines(x, dexp(x, 2), col = "blue")    # rate = 2
lines(x, dexp(x, 5), col = "orange")  # rate = 5: starts at 5, decays fastest
Gamma distribution
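Gamma function: $\Gamma(\alpha) = \int_0^\infty x^{\alpha-1} e^{-x}\,dx$, for α > 0.
The gamma function generalizes the factorial to the real numbers: for a positive integer n, Γ(n) = (n - 1)!.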
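The sketch below illustrates the factorial connection with R's built-in gamma() and factorial(), and evaluates the defining integral numerically with integrate() for a non-integer argument. The value alpha = 2.5 is an arbitrary choice for illustration.

```r
n <- 1:6
gamma(n)            # 1 1 2 6 24 120
factorial(n - 1)    # the same values: Gamma(n) = (n - 1)!

# Non-integer argument: evaluate the defining integral numerically
alpha <- 2.5
integral <- integrate(function(x) x^(alpha - 1) * exp(-x),
                      lower = 0, upper = Inf)$value
integral
gamma(alpha)        # agrees with the integral above

gamma(0.5)^2        # equals pi, since Gamma(1/2) = sqrt(pi)
```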
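Probability density function of the gamma distribution with shape α and rate β: $f(x;\alpha,\beta) = \frac{\beta^{\alpha} x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)}$, for x > 0. In R, dgamma(x, shape, rate) uses this parameterization, so dgamma(x, 1, 2) is the density with α = 1 and β = 2.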
x <- seq(0, 10, length.out = 100)
y <- dgamma(x, 1, 2)  # shape = 1, rate = 2 (an exponential with rate 2)
plot(x, y, col = "red", xlim = c(0, 10), ylim = c(0, 2), type = "l",
     xaxs = "i", yaxs = "i", ylab = "density", xlab = "x",
     main = "The Gamma Density Distribution")
lines(x, dgamma(x, 2, 2), col = "green")   # shape = 2, rate = 2
lines(x, dgamma(x, 3, 2), col = "blue")    # shape = 3, rate = 2
lines(x, dgamma(x, 5, 1), col = "orange")  # shape = 5, rate = 1
lines(x, dgamma(x, 9, 1), col = "black")   # shape = 9, rate = 1
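To confirm which parameterization dgamma() uses, the sketch below writes the shape/rate density $f(x) = \beta^{\alpha} x^{\alpha-1} e^{-\beta x} / \Gamma(\alpha)$ by hand and compares the two. The helper name gamma_pdf is hypothetical. It also checks numerically the known result that the mean of a gamma distribution with shape α and rate β is α/β.

```r
# Hypothetical helper: gamma density in the shape/rate form
gamma_pdf <- function(x, shape, rate) {
  rate^shape * x^(shape - 1) * exp(-rate * x) / gamma(shape)
}

x <- seq(0.1, 10, by = 0.1)
# Third positional argument of dgamma() is the rate (prints TRUE)
all.equal(gamma_pdf(x, 3, 2), dgamma(x, 3, 2))

# Mean of Gamma(shape = 3, rate = 2) should be 3 / 2
gamma_mean <- integrate(function(x) x * dgamma(x, 3, 2), 0, Inf)$value
gamma_mean
```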
Beta distribution
An important property of the beta distribution is that it is the conjugate prior of the Bernoulli and binomial distributions, which gives it important applications in machine learning and mathematical statistics.
The distribution has two shape parameters, α and β (α, β > 0), and is defined on the interval (0, 1). Its density is $f(x;\alpha,\beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}$, where B(α, β) is the beta function.
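Conjugacy means that a Beta(α, β) prior on a success probability p, combined with k successes in n Bernoulli trials, gives a Beta(α + k, β + n - k) posterior. The sketch below checks this by normalizing prior times likelihood numerically with integrate() and comparing the result with dbeta(). The prior Beta(2, 2) and the data (7 successes in 10 trials) are made-up values for illustration only.

```r
a <- 2; b <- 2    # assumed prior Beta(a, b)
k <- 7; n <- 10   # hypothetical data: 7 successes in 10 trials

# Unnormalized posterior density: prior times binomial likelihood
unnorm <- function(p) dbeta(p, a, b) * dbinom(k, n, p)
z <- integrate(unnorm, 0, 1)$value   # normalizing constant

p <- c(0.2, 0.5, 0.8)
numeric_post   <- unnorm(p) / z
conjugate_post <- dbeta(p, a + k, b + n - k)   # Beta(9, 5)

all.equal(numeric_post, conjugate_post)  # TRUE up to integration error
```

This closed-form update is what makes the beta prior convenient: no numerical integration is needed in practice.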
x <- seq(-5, 5, length.out = 10000)  # dbeta() is 0 outside (0, 1)
y <- dbeta(x, 0.5, 0.5)
plot(x, y, col = "red", xlim = c(0, 1), ylim = c(0, 6), type = "l",
     xaxs = "i", yaxs = "i", ylab = "density", xlab = "x",
     main = "The Beta Density Distribution")
lines(x, dbeta(x, 5, 1), col = "green")
lines(x, dbeta(x, 1, 3), col = "blue")
lines(x, dbeta(x, 2, 2), col = "orange")
lines(x, dbeta(x, 2, 5), col = "black")
legend("top", legend = paste("a =", c(0.5, 5, 1, 2, 2), "b =", c(0.5, 1, 3, 2, 5)),
       lwd = 1, col = c("red", "green", "blue", "orange", "black"))
Flow control statements
Branch statements
if / else:
a <- 5
if (a > 10) {
  print('a > 10')
} else if (a < 10) {
  print('a < 10')   # this branch runs for a = 5
} else {
  print('a = 10')
}
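An if statement tests a single TRUE/FALSE value. To apply the same three-way test to every element of a vector, base R's vectorized ifelse() can be nested, as in this sketch; the example vector is an assumption made up for illustration.

```r
a <- c(3, 10, 12)   # example values
res <- ifelse(a > 10, "a>10", ifelse(a < 10, "a<10", "a=10"))
res   # "a<10" "a=10" "a>10"
```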
switch statement:
case <- 4
# With a numeric first argument, switch() returns the case-th remaining argument
switch(case, 'too low', 'low', 'normal', 'high', 'too high')
## [1] "high"
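switch() also accepts a character first argument, matching it against argument names. In the sketch below (the function name describe and its labels are hypothetical), an empty alternative such as "too low" = falls through to the next non-empty one, and the final unnamed argument acts as the default when nothing matches.

```r
describe <- function(level) {
  switch(level,
         "too low"  = ,                     # falls through to "low"
         "low"      = "below normal range",
         "normal"   = "within normal range",
         "high"     = ,                     # falls through to "too high"
         "too high" = "above normal range",
         "unknown level")                   # default for unmatched input
}

describe("too low")   # "below normal range"
describe("too high")  # "above normal range"
describe("weird")     # "unknown level"
```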
For loop:
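web.pv <- sample(100:5000, 30)  # 30 random daily page-view counts, no repeats
web.day <- seq(as.Date('2015-01-01'), by = 1, length.out = 30)
web.data <- data.frame(web.day, web.pv)
for (item in web.data$web.pv) {
  # sample() draws without replacement, so each value identifies one row
  print(paste(web.data$web.day[which(web.data$web.pv == item)], item))
}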
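Looking each value up with which() relies on the page-view counts being unique. A sketch that avoids that assumption loops over row indices instead; because paste() is vectorized, the same strings can also be built without any loop. The set.seed() call is added here only to make the random draw reproducible.

```r
set.seed(1)  # reproducible random draw
web.pv   <- sample(100:5000, 30)
web.day  <- seq(as.Date("2015-01-01"), by = 1, length.out = 30)
web.data <- data.frame(web.day, web.pv)

# Loop over row numbers: works even if two days share a count
for (i in seq_len(nrow(web.data))) {
  print(paste(web.data$web.day[i], web.data$web.pv[i]))
}

# Vectorized alternative: one call builds all 30 strings
lines_out <- paste(web.data$web.day, web.data$web.pv)
head(lines_out, 3)
```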
While loop:
i <- 1  # R vectors are indexed from 1
while (i <= length(web.pv)) {
  print(web.pv[i])
  i <- i + 1
}
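A while loop is most useful when the number of iterations is not known in advance. As a sketch, the loop below counts how many days it takes for cumulative page views to reach a target; the page-view numbers and the target of 6000 are hypothetical values chosen for illustration.

```r
pv <- c(1200, 800, 4500, 300, 2500)  # hypothetical daily page views
target <- 6000
total <- 0
day <- 0
# Stop when the target is reached or the data run out
while (total < target && day < length(pv)) {
  day <- day + 1
  total <- total + pv[day]
}
day    # 3
total  # 6500
```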
Functions
Define a function for the linear expression y = a * x + b, and then plot its graph:
library(ggplot2)
demo.fun1 <- function(x, a, b) {
  return(a * x + b)
}
x <- seq(-5, 5, by = 0.01)
a <- 3
b <- 7
y <- demo.fun1(x, a, b)
df <- data.frame(x, y)
g <- ggplot(df, aes(x, y))
g <- g + geom_line(colour = 'red')  # graph of the linear function
g <- g + geom_hline(yintercept = 0) + geom_vline(xintercept = 0)  # draw the axes
g <- g + ggtitle(paste('y =', a, '* x +', b))  # add title
g
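Because R arithmetic is vectorized, demo.fun1 accepts a whole vector of x values at once, which is why a single call produced every point of the line. The sketch below also checks that the line crosses y = 0 at x = -b/a and draws the same graph with base graphics' curve(), in the style of the earlier density plots; using curve() here is a swapped-in alternative to ggplot2, not part of the original example.

```r
demo.fun1 <- function(x, a, b) a * x + b
a <- 3; b <- 7

demo.fun1(c(-1, 0, 1), a, b)   # 4 7 10

root <- -b / a                 # where the line crosses y = 0
all.equal(demo.fun1(root, a, b), 0)   # TRUE

# Base-graphics version of the same plot
curve(demo.fun1(x, a, b), from = -5, to = 5, col = "red", ylab = "y")
abline(h = 0, v = 0)           # draw the axes
```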
Define a cubic polynomial function:
library(ggplot2)
demo.fun3 <- function(x, a, b, c, d) {
  return(a * x^3 + b * x^2 + c * x + d)
}
a <- 1
b <- 5
c <- 6
d <- -10
x <- seq(-5, 5, by = 0.01)
y <- demo.fun3(x, a, b, c, d)
df <- data.frame(x, y)
g <- ggplot(df, aes(x, y))
g <- g + geom_line(colour = 'green')  # cubic curve
g <- g + geom_hline(yintercept = 0) + geom_vline(xintercept = 0)  # draw the axes
g <- g + ggtitle(paste('y =', a, '* x^3 +', b, '* x^2 +', c, '* x +', d))  # add title
g
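The plotted curve crosses the x axis where the cubic equals zero. As a sketch, base R's polyroot() finds all roots of a polynomial, taking the coefficients from the constant term upward; keeping only roots with a negligible imaginary part leaves the real crossing points. The 1e-8 cut-off is an assumed tolerance.

```r
demo.fun3 <- function(x, a, b, c, d) a * x^3 + b * x^2 + c * x + d
a <- 1; b <- 5; c <- 6; d <- -10

# Coefficients in increasing degree: d + c*x + b*x^2 + a*x^3.
# R still finds the function c() even though a variable named c exists.
roots <- polyroot(c(d, c, b, a))
real_roots <- Re(roots[abs(Im(roots)) < 1e-8])
real_roots                          # one real root, near 0.89
demo.fun3(real_roots, a, b, c, d)   # approximately 0
```

The other two roots form a complex-conjugate pair, which matches the plot: the curve crosses the x axis only once.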