Introduction
We often want to look at how a batch of data is distributed. Histograms, density plots, box plots, violin plots and dot plots are all good ways to do this. Here we briefly introduce the three we use most often: histograms, density plots and box plots.
Histogram
Many people are unclear about the difference between a bar chart and a histogram. A bar chart displays categorical (nominal) data, and its bars are separated from one another. A histogram displays numerical data, and its bins sit next to each other because they cover a continuous range.
Single-group histogram
The most basic approach is to add geom_histogram() to a ggplot() call.
library(gcookbook)
library(ggplot2)
ggplot(faithful, aes(x = waiting)) + geom_histogram()
By default the histogram splits the data into 30 bins; use the binwidth argument to change the width of each bin.
ggplot(faithful, aes(x = waiting)) +
  geom_histogram(binwidth = 8, fill = "white", colour = "black")  # bin width of 8, which gives roughly 8 bins for this data
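The effect of binwidth on the resulting bins can be checked without looking at the picture. The following is a minimal sketch, assuming ggplot2 is installed. The object names p, bins and bin_widths are illustrative, and boundary = 40 is an arbitrary choice made here so that the bin edges fall on multiples of 5.

```r
library(ggplot2)

# Build the histogram with an explicit bin width and an aligned bin edge.
p <- ggplot(faithful, aes(x = waiting)) +
  geom_histogram(binwidth = 5, boundary = 40, fill = "white", colour = "black")

# layer_data() returns the data computed for a layer (here: one row per bin).
bins <- layer_data(p, 1)
bin_widths <- bins$xmax - bins$xmin

# Show the edges and the number of observations that fall in each bin.
print(bins[, c("xmin", "xmax", "count")])
```

Every row of bins is one bar of the histogram, so nrow(bins) is the number of bins actually drawn for the chosen binwidth.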
Grouped histograms
As with other plot types, grouped histograms are drawn with facet_grid(var ~ .), which splits the data by the variable var and draws one panel per group, rather than several histograms in a single panel. If the grouping variable is numeric, it is best converted to a factor first.
library(MASS)  # provides the birthwt data set
ggplot(birthwt, aes(x = bwt)) +
  geom_histogram(fill = "white", colour = "black") +
  facet_grid(smoke ~ .)
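Since smoke is stored as 0/1, the panel strips only show those numbers. One way to get readable labels is to turn the variable into a labelled factor before faceting. This is a sketch rather than the only approach; the name bw_labeled and the label text "Non-smoker" / "Smoker" are choices made here for illustration.

```r
library(ggplot2)
library(MASS)  # provides the birthwt data set

# Work on a copy so the original data set stays untouched.
bw_labeled <- birthwt

# Convert the numeric 0/1 indicator into a factor with readable labels.
bw_labeled$smoke <- factor(bw_labeled$smoke,
                           levels = c(0, 1),
                           labels = c("Non-smoker", "Smoker"))

# One histogram panel per smoking status, stacked vertically.
p_facet <- ggplot(bw_labeled, aes(x = bwt)) +
  geom_histogram(fill = "white", colour = "black") +
  facet_grid(smoke ~ .)
print(p_facet)
```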
Kernel density curve
To draw a density curve, use geom_density() and map a continuous variable to x.
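ggplot(faithful, aes(x = waiting)) + geom_density()
# You can also fill the area under the curve with a colour
ggplot(faithful, aes(x = waiting)) +
  geom_density(fill = "blue", alpha = .2) +
  xlim(35, 105)  # assumed limits, matching the histogram overlay example
# If you don't like the line being joined to the baseline, draw it another way
ggplot(faithful, aes(x = waiting)) +
  geom_line(stat = "density") +
  expand_limits(y = 0)  # expand_limits() makes the y axis range include 0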
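# Density curve overlaid on a histogram
# after_stat(density) needs ggplot2 >= 3.3.0; older versions use ..density..
ggplot(faithful, aes(x = waiting, y = after_stat(density))) +
  geom_histogram(fill = "cornsilk", colour = "grey60", size = .2) +
  geom_density() +
  xlim(35, 105)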
Grouped density curve
birthwt1 <- birthwt
birthwt1$smoke <- factor(birthwt1$smoke)  # fill needs a discrete variable
ggplot(birthwt1, aes(x = bwt, fill = smoke)) + geom_density(alpha = .3)
Box plot
Box plots are widely used, especially for comparing several groups of data. The code below shows how they work in practice.
ggplot(birthwt, aes(x = factor(race), y = bwt)) + geom_boxplot()
# If there are many outliers, use outlier.size and outlier.shape to set their size and shape
ggplot(birthwt, aes(x = factor(race), y = bwt)) +
  geom_boxplot(outlier.size = 1.5, outlier.shape = 21)
# To check whether the distribution is skewed, add the mean and compare it with the median;
# stat_summary() draws the mean as a diamond (ggplot2 < 3.3.0 uses fun.y instead of fun)
ggplot(birthwt, aes(x = factor(race), y = bwt)) +
  geom_boxplot() +
  stat_summary(fun = "mean", geom = "point", shape = 23, size = 3, fill = "white")
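The same mean-versus-median comparison can be done numerically, which helps when the diamond and the median line are too close to tell apart by eye. The sketch below uses only base R and the birthwt data from MASS; the names group_mean, group_median and skew_hint are illustrative.

```r
library(MASS)  # provides the birthwt data set

# Mean and median birth weight for each race group.
group_mean   <- tapply(birthwt$bwt, birthwt$race, mean)
group_median <- tapply(birthwt$bwt, birthwt$race, median)

# Collect both statistics and their difference in one table.
skew_hint <- data.frame(
  race   = names(group_mean),
  mean   = as.numeric(group_mean),
  median = as.numeric(group_median)
)
skew_hint$diff <- skew_hint$mean - skew_hint$median

print(skew_hint)
```

As a rough rule of thumb, a mean clearly above the median hints at a right-skewed group and a mean clearly below it hints at a left-skewed one; a difference near zero is consistent with a roughly symmetric distribution.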