Summarizing data distributions with the ggplot2 package in R

Source: Internet
Author: User
Tags: ggplot
Introduction

We often want to see how a batch of data is distributed. Histograms, density plots, box plots, violin plots, and dot plots all serve this purpose well. Here we briefly introduce the three we use most often: histograms, density plots, and box plots.

Histogram

Many people confuse bar charts with histograms. A bar chart displays categorical (nominal) data, and its bars are separated from one another. A histogram displays numerical data, and its neighbouring bins are contiguous.

Single-group histogram

The most basic form is to add geom_histogram() after the ggplot() call.

library(gcookbook)
library(ggplot2)
ggplot(faithful, aes(x = waiting)) + geom_histogram()
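
To make the bar chart versus histogram distinction concrete, here is a minimal sketch. It assumes the birthwt data set from the MASS package (used later in this article); the bin width of 250 is an arbitrary choice for illustration, not a value from the original.

```r
library(ggplot2)
library(MASS)  # provides the birthwt data set

# Bar chart: race is a category code (1, 2, 3), so each distinct
# value gets its own separate bar.
bar_plot <- ggplot(birthwt, aes(x = factor(race))) + geom_bar()

# Histogram: bwt (birth weight in grams) is numeric, so the values
# are grouped into adjacent intervals (bins).
# binwidth = 250 is an assumed value, chosen only for this sketch.
hist_plot <- ggplot(birthwt, aes(x = bwt)) + geom_histogram(binwidth = 250)

# layer_data() returns the data computed for a layer.
bar_counts  <- layer_data(bar_plot)$count   # one count per category
hist_counts <- layer_data(hist_plot)$count  # one count per bin
```

Both sets of counts add up to the number of rows in birthwt; only the way the x values are grouped differs.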


By default the histogram splits the data into 30 bins; we can change the bin width with the binwidth argument.

ggplot(faithful, aes(x = waiting)) + geom_histogram(binwidth = 8, fill = "white", colour = "black")  # each bin is now 8 units wide
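
To check what binwidth actually does, the computed bins can be inspected. This is a sketch using layer_data() from ggplot2; the exact bin edges depend on the ggplot2 version, so none are listed here.

```r
library(ggplot2)

p <- ggplot(faithful, aes(x = waiting)) +
  geom_histogram(binwidth = 8, fill = "white", colour = "black")

# One row per bin: lower edge, upper edge, and number of observations.
bins <- layer_data(p)[, c("xmin", "xmax", "count")]
print(bins)

# binwidth sets the width of every bin, not the number of bins.
bin_widths <- bins$xmax - bins$xmin
```

The number of bins follows from the range of the data divided by the chosen width, so a larger binwidth gives fewer, wider bars.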

Grouped histograms

As with other plots, grouped histograms use facet_grid(var ~ .), which splits the data by the variable var and draws one panel per group rather than several histograms in a single plot. If the grouping variable is numeric, it is best converted to a factor first.

library(MASS)  # provides the birthwt data set
ggplot(birthwt, aes(x = bwt)) + geom_histogram(fill = "white", colour = "black") + facet_grid(smoke ~ .)
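
The conversion to a factor mentioned above can be sketched as follows. The labels "No smoke" and "Smoke" are an assumption added for readability (in birthwt, smoke is coded 0 and 1), and birthwt1 is just a working copy so the original data stays untouched.

```r
library(ggplot2)
library(MASS)  # provides the birthwt data set

# Work on a copy and turn the 0/1 codes into a labelled factor.
# The label text is an assumption made for this sketch.
birthwt1 <- birthwt
birthwt1$smoke <- factor(birthwt1$smoke,
                         levels = c(0, 1),
                         labels = c("No smoke", "Smoke"))

# One histogram panel per factor level, stacked vertically.
p <- ggplot(birthwt1, aes(x = bwt)) +
  geom_histogram(fill = "white", colour = "black") +
  facet_grid(smoke ~ .)
```

With a labelled factor, the facet strips show the label text instead of the bare codes 0 and 1.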

Kernel density curve

To draw a density curve, use geom_density() and map a continuous variable to x.

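ggplot(faithful, aes(x = waiting)) + geom_density()
# You can also fill the enclosed area with a colour
ggplot(faithful, aes(x = waiting)) +
  geom_density(fill = "blue", alpha = .2) +
  xlim(35, 105)  # limits assumed: the source line was cut off, so these match the last example below
# If you don't like the line running along the bottom and sides, draw the curve another way
ggplot(faithful, aes(x = waiting)) + geom_line(stat = "density") +
  expand_limits(y = 0)  # expand_limits() makes the y-axis range include 0
# Density curve drawn over a histogram
# (..density.. rescales the bar heights to densities; newer ggplot2 versions spell it after_stat(density))
ggplot(faithful, aes(x = waiting, y = ..density..)) +
  geom_histogram(fill = "cornsilk", colour = "grey60", size = .2) +
  geom_density() +
  xlim(35, 105)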
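
The overlay works because both layers are drawn on the density scale. As a sketch, here is the same plot in the spelling used by newer ggplot2 releases; it assumes ggplot2 3.4.0 or later, where after_stat(density) replaces ..density.. and linewidth replaces size for outlines.

```r
library(ggplot2)

# Assumes ggplot2 >= 3.4.0 (after_stat() and linewidth).
p <- ggplot(faithful, aes(x = waiting, y = after_stat(density))) +
  geom_histogram(fill = "cornsilk", colour = "grey60", linewidth = .2) +
  geom_density() +
  xlim(35, 105)

# The density layer is the second layer. The area under the curve,
# approximated with the trapezoid rule, should be close to 1.
dens <- layer_data(p, 2)
area <- sum(diff(dens$x) * (head(dens$y, -1) + tail(dens$y, -1)) / 2)
```

Because the curve encloses an area of about 1, the histogram bars must be rescaled from counts to densities to share the same y-axis.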




Grouped density curve

birthwt1 <- birthwt
birthwt1$smoke <- factor(birthwt1$smoke)
ggplot(birthwt1, aes(x = bwt, fill = smoke)) + geom_density(alpha = .3)

Box plot

Box plots are widely used, especially for comparing several groups of data. The code below shows how practical they are.

ggplot(birthwt, aes(x = factor(race), y = bwt)) + geom_boxplot()
# If there are many outliers, use outlier.size and outlier.shape to set their size and shape
ggplot(birthwt, aes(x = factor(race), y = bwt)) +
  geom_boxplot(outlier.size = 1.5, outlier.shape = 21)
# To see whether a distribution is skewed, add the mean and compare it with the median;
# stat_summary() draws the mean as a diamond (fun.y is called fun in ggplot2 3.3.0 and later)
ggplot(birthwt, aes(x = factor(race), y = bwt)) + geom_boxplot() +
  stat_summary(fun.y = "mean", geom = "point", shape = 23, size = 3, fill = "white")
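
The mean-versus-median comparison can also be checked numerically. This is a minimal sketch using base R on the same birthwt data; the name summary_tbl and its column names are chosen here for illustration.

```r
library(MASS)  # provides the birthwt data set

# Mean and median birth weight for each race group.
means   <- tapply(birthwt$bwt, birthwt$race, mean)
medians <- tapply(birthwt$bwt, birthwt$race, median)

summary_tbl <- data.frame(
  race   = names(means),
  mean   = as.numeric(means),
  median = as.numeric(medians)
)

# A mean well above the median suggests a longer right tail,
# and a mean well below it suggests a longer left tail.
summary_tbl$mean_minus_median <- summary_tbl$mean - summary_tbl$median
print(summary_tbl)
```

When the diamond in the plot sits close to the median line, the difference in the last column is small relative to the spread of the group.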


