I have been busy with final exams all June, so today I am taking a quick look at basic plotting with the ggplot2 package.
Base R already has many plotting functions, such as plot(), barplot(), and qqplot().
But there is also the well-known ggplot2 package, whose functions produce attractive graphics and are flexible to use.
According to the official ggplot2 handbook, a statistical graphic is a mapping from data to the aesthetic attributes (aes, such as color, shape, and size) of geometric objects (geom, such as points, lines, and bars). A graphic may also contain statistical transformations of the data (stat). Finally, it is drawn in a coordinate system (coord), and faceting (facet, which splits the plotting window into several sub-windows) can be used to draw the same graphic for different subsets of the data.
First, the basic elements: data and mapping, geometric objects (geom), statistical transformations (stat), scales (scale), the coordinate system (coord), and facets (facet).
These components are joined with "+" to form layers, so the layer is an important concept in ggplot2.
The data used below is a graduate dataset, the practice data from "Data Analysis and R Language Modeling", edited by Wang Bin. It has 48 sample points and 9 attributes.
I. Data
In ggplot2, the dataset must be a data.frame. This format is convenient for storing data, and with %+% you can swap in a different dataset while keeping the original plot settings.
library("ggplot2")  # load the package
ug = read.table("clipboard", header = TRUE)  # read the data copied to the clipboard
head(ug)
p = ggplot(ug, aes(score, income, color = sex)) + geom_point()
ug.c = transform(ug, income = income * 1.5)  # scale income by 1.5, everything else unchanged
p %+% ug.c
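The ug data is read from the clipboard, so it cannot be reproduced here. As a minimal sketch of the same %+% idea, the block below uses a small made-up data frame (toy and its values are assumptions, not the author's data). layer_data() is used only to inspect what each plot would draw.

```r
library(ggplot2)

# Hypothetical stand-in for the ug data: three rows, two numeric columns
toy <- data.frame(score = c(60, 75, 90), income = c(1000, 2000, 3000))

p_toy <- ggplot(toy, aes(score, income)) + geom_point()

# Same plot specification, new data: income scaled by 1.5
toy_scaled <- transform(toy, income = income * 1.5)
p_scaled <- p_toy %+% toy_scaled

# y values each plot would draw
y_original <- layer_data(p_toy)$y
y_scaled <- layer_data(p_scaled)$y
```

The plot object itself is never rebuilt; only its data is replaced, which is why the mapping and the point layer carry over.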
II. Mapping
aes() is the mapping function in ggplot2. A mapping is the correspondence between variables in the dataset and graphic attributes.
1. The concept of mapping
> p = ggplot(ug, aes(score, income, color = sex)) + geom_point()
> summary(p)
data: id, name, sex, region, birth, income, height, weight, score
  [48x9]
mapping: x = score, y = income, colour = sex
> p1 = ggplot(data = ug)
> summary(p1)
data: id, name, sex, region, birth, income, height, weight, score
  [48x9]
We can see that p maps score to the x axis, income to the y axis, and sex to the color, whereas p1 has no mapping at all.
2. Setting and mapping
A mapping associates the discrete or continuous values of a variable with different values of a graphic attribute, whereas a setting fixes the attribute to a single value for all the data.
p2 = ggplot(ug, aes(score, income))
p2 + geom_point(color = "blue")  # setting: all points are drawn in blue
p2 + geom_point(aes(color = "blue"))  # mapping: "blue" is treated as data
The last statement does not do what was intended. Inside aes(), color = "blue" treats "blue" as a variable and maps its values to the color attribute. That variable contains a single string, which is treated as a discrete variable, so the default color scale draws the points in pink rather than blue.
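The difference can be checked without looking at the picture. The sketch below, on a made-up data frame standing in for ug (an assumption), uses layer_data() to read back the colour each point would be drawn with.

```r
library(ggplot2)

# Hypothetical stand-in for ug
toy <- data.frame(score = c(60, 75, 90), income = c(1000, 2000, 3000))
base <- ggplot(toy, aes(score, income))

# Setting: the colour is a fixed value outside aes()
set_colours <- layer_data(base + geom_point(colour = "blue"))$colour

# Mapping: "blue" inside aes() becomes a one-level discrete variable,
# so the colour comes from the default discrete colour scale instead
mapped_colours <- layer_data(base + geom_point(aes(colour = "blue")))$colour

print(unique(set_colours))
print(unique(mapped_colours))
```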
Compare the following three methods
ggplot(ug, aes(score, income), colour = sex) + geom_point()
ggplot(ug, aes(score, income, colour = sex)) + geom_point()
The first draws black points; the second and third color the points by sex. The third is easier to remember: it amounts to drawing the plot first and then adding the color of the points.
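Only two of the three statements appear in the listing. Judging from the description, the third most likely maps colour inside geom_point(). The sketch below reconstructs it under that assumption, on a made-up data frame standing in for ug.

```r
library(ggplot2)

# Hypothetical stand-in for ug
toy <- data.frame(
  score  = c(60, 75, 90, 80),
  income = c(1000, 2000, 3000, 2500),
  sex    = c("F", "M", "F", "M")
)

# Second form from the text: colour mapped in ggplot()
p_second <- ggplot(toy, aes(score, income, colour = sex)) + geom_point()

# Assumed third form: draw the plot first, then add the colour mapping in the point layer
p_third <- ggplot(toy, aes(score, income)) + geom_point(aes(colour = sex))

# Both forms should assign the same colours to the same points
same_colours <- identical(layer_data(p_second)$colour, layer_data(p_third)$colour)
```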
3. Grouping
Grouping is also a kind of mapping in ggplot2. By default, ggplot2 puts all observations into one group; to group the observations by an additional discrete variable, the default grouping must be changed.
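As a minimal sketch of changing the default grouping, the block below uses the group aesthetic with geom_line(). The long data frame and its column names are made up for illustration.

```r
library(ggplot2)

# Hypothetical repeated measurements: three subjects, each measured at three times
long <- data.frame(
  id    = rep(c(1, 2, 3), each = 3),
  time  = rep(c(1, 2, 3), times = 3),
  value = c(1, 2, 3, 2, 3, 5, 3, 3, 4)
)

# Default grouping: all observations form one group, so a single line runs through every point
p_one <- ggplot(long, aes(time, value)) + geom_line()

# Explicit grouping: one line per subject
p_by_id <- ggplot(long, aes(time, value, group = id)) + geom_line()

n_groups_default <- length(unique(layer_data(p_one)$group))
n_groups_by_id <- length(unique(layer_data(p_by_id)$group))
```

Here id is numeric, so ggplot2 has no discrete variable to group by until group = id is given explicitly.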
III. Layers
1. Setting mappings in geometric objects
We can set the mapping in ggplot(), where it becomes the default. Later geometric objects can reuse that default mapping or override it at any time.
The examples below use the diamonds dataset. It has a very large number of rows, so we take a sample first to make the plots easier to read.
data(diamonds)
head(diamonds)
set.seed()  # set the random seed
small.diamonds = diamonds[sample(nrow(diamonds), ), ]  # draw the sample
head(small.diamonds)
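The seed and the sample size are not given in the listing. To get something runnable, the sketch below picks both arbitrarily: the seed 123 and the size 1000 are assumptions, not the author's values.

```r
library(ggplot2)

data(diamonds)

set.seed(123)     # assumed seed, chosen only for illustration
n_sample <- 1000  # assumed sample size, not taken from the text

# Sample row indices without replacement and keep all columns
small.diamonds <- diamonds[sample(nrow(diamonds), n_sample), ]
dim(small.diamonds)
```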
dp = ggplot(small.diamonds, aes(x = carat, y = price, color = factor(color)))  # set the default mapping
dp + geom_point()  # reuse the default mapping to draw a scatter plot
dp + geom_point(aes(shape = factor(cut)))  # add a shape mapping in this layer
dp + geom_point(aes(y = cut))  # override the default y mapping; note that the y-axis label still shows the default, price
dp + geom_point(aes(color = NULL))  # remove the default color mapping
Pay particular attention to how the second and third plots are drawn.
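What a layer-level aes() does to the default mapping can be read back with layer_data(). This is a sketch under assumptions: the seed and the sample size of 500 are arbitrary, and small is a hypothetical name for the sample.

```r
library(ggplot2)

set.seed(1)  # assumed seed
small <- diamonds[sample(nrow(diamonds), 500), ]  # assumed sample size

dp <- ggplot(small, aes(x = carat, y = price, colour = factor(color)))

# Default mapping only: colour varies with the diamond color
d_default <- layer_data(dp + geom_point())

# Layer removes the colour mapping: every point gets the same default colour
d_no_colour <- layer_data(dp + geom_point(aes(colour = NULL)))

# Layer overrides y: the y positions now come from the levels of cut, not from price
d_y_cut <- layer_data(dp + geom_point(aes(y = cut)))
```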
IV. Geometric objects
dp = ggplot(small.diamonds, mapping = aes(x = carat, y = price, shape = cut, color = factor(color)))  # set the default mapping
dp + geom_point()
These two statements also produce the second plot from the diamonds example above. The difference is that the earlier version created the ggplot first and added the extra mapping in the point layer, whereas here all the mappings are set in ggplot() and a plain point layer is added.
1. Histogram
# histogram
ggplot(small.diamonds) + geom_histogram(aes(x = price))
You can also fill the bars with different colors according to a variable, such as cut or the diamond's color.
ggplot(small.diamonds) + geom_histogram(aes(x = price, fill = cut))
ggplot(small.diamonds) + geom_histogram(aes(x = price, fill = color))
2. Bar chart
# bar charts by different variables
ggplot(small.diamonds) + geom_bar(aes(x = clarity))
ggplot(small.diamonds) + geom_bar(aes(x = color))
Note the difference between a histogram and a bar chart. A histogram splits continuous data into bins of a given width, counts the observations in each bin, and draws the counts as bars. A bar chart is for categorical data and counts the observations in each category.
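The two kinds of counting can be compared directly. The sketch below reads the computed counts back with layer_data(); the seed, the sample size of 500, and the bin width of 1000 are arbitrary assumptions.

```r
library(ggplot2)

set.seed(1)  # assumed seed
small <- diamonds[sample(nrow(diamonds), 500), ]  # assumed sample size

# Histogram: continuous price is split into bins, then counted per bin
hist_data <- layer_data(
  ggplot(small) + geom_histogram(aes(x = price), binwidth = 1000)
)

# Bar chart: categorical clarity is counted per level
bar_data <- layer_data(ggplot(small) + geom_bar(aes(x = clarity)))

# The bar heights correspond to the nonzero entries of a plain frequency table
table(small$clarity)
bar_data$count
```

In both cases every observation lands in exactly one bar, so the counts add up to the number of rows.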
3. Density plot
# density plot
ggplot(small.diamonds) + geom_density(aes(x = price, color = clarity))  # color sets the line color
ggplot(small.diamonds) + geom_density(aes(x = price, fill = cut))  # fill fills the area under the curve
4. Box plot
# box plot
ggplot(small.diamonds) + geom_boxplot(aes(x = cut, y = price, fill = clarity))
ggplot2 has many other geom_xxx functions: geom_abline, geom_area, geom_bar, geom_bin2d, geom_blank, geom_boxplot, geom_contour, geom_density, geom_density2d, geom_dotplot, geom_errorbar, geom_errorbarh, geom_freqpoly, geom_hex, geom_histogram, geom_hline, geom_jitter, geom_line, geom_linerange, geom_map, geom_path, geom_point, geom_pointrange, geom_polygon, geom_quantile, geom_raster, geom_rect, geom_ribbon, geom_rug, geom_segment, geom_smooth, geom_step, geom_text, geom_tile, geom_violin, geom_vline
V. Scales
# scales
ggplot(small.diamonds) + geom_point(aes(x = carat, y = price, shape = cut, color = color))
ggplot(small.diamonds) + geom_point(aes(x = carat, y = price, shape = cut, color = color)) + scale_y_log10() + scale_color_manual(values = rainbow(7))  # log-transform the y variable and set the colors manually
Compare the two plots.
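A scale changes how data values are turned into positions or colors. As a sketch of what scale_y_log10() does, the block below compares the y values of the same plot with and without the scale; the seed and sample size are arbitrary assumptions.

```r
library(ggplot2)

set.seed(1)  # assumed seed
small <- diamonds[sample(nrow(diamonds), 500), ]  # assumed sample size

p_raw <- ggplot(small) + geom_point(aes(x = carat, y = price))
p_log <- p_raw + scale_y_log10()

# y values as they are handed to the point layer
y_raw <- as.numeric(layer_data(p_raw)$y)
y_log <- as.numeric(layer_data(p_log)$y)

# With the log scale, the layer works with log10(price) instead of price
range(y_raw)
range(y_log)
```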
VI. Statistical transformations
A statistical transformation computes something from the raw data and then displays the result on the plot.
For example, it can add a regression line to a scatter plot.
# statistical transformation
ggplot(small.diamonds, aes(x = carat, y = price)) + geom_point() + scale_y_log10() + stat_smooth()
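With no method given, stat_smooth() chooses a smoother automatically, which for a small sample is a loess curve rather than a straight line. If a straight regression line is wanted, one option is to pass method = "lm". The sketch below does that and reads back the fitted values; the seed and sample size are arbitrary assumptions.

```r
library(ggplot2)

set.seed(1)  # assumed seed
small <- diamonds[sample(nrow(diamonds), 500), ]  # assumed sample size

p_lm <- ggplot(small, aes(x = carat, y = price)) +
  geom_point() +
  scale_y_log10() +
  stat_smooth(method = "lm", formula = y ~ x)  # line fitted on the log10(price) scale

# Layer 2 holds the fitted line (y) and its confidence band (ymin, ymax)
fit <- layer_data(p_lm, 2)
head(fit[, c("x", "y", "ymin", "ymax")])
```

Because the scale is applied before the statistic, the line is a linear fit of log10(price) on carat, not of price itself.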
Many other statistical transformations are available, such as: stat_abline, stat_identity, stat_bin, stat_qq, stat_bin2d, stat_quantile, stat_bindot, stat_smooth, stat_binhex, stat_spoke, stat_boxplot, stat_sum, stat_contour, stat_summary, stat_density, stat_summary2d, stat_density2d, stat_summary_hex, stat_ecdf, stat_unique, stat_function, stat_vline, stat_hline, stat_ydensity
VII. Coordinate systems
1. Use coord_flip() to flip the axes
# coordinate systems
ggplot(small.diamonds, aes(x = clarity, fill = clarity)) + geom_bar()
ggplot(small.diamonds, aes(x = clarity, fill = clarity)) + geom_bar() + coord_flip()
2. Use coord_polar() to convert to polar coordinates
# polar coordinates
ggplot(small.diamonds) + geom_bar(aes(x = factor(1), fill = cut)) + coord_polar(theta = "y")
# x here plays the role that clarity played above: it is a factor variable
In fact, a stacked bar drawn in polar coordinates is simply a pie chart.
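The link between the stacked bar and the pie can be made explicit: the height of each bar segment becomes the angle of a slice. A minimal sketch, with an arbitrary seed and sample size as assumptions:

```r
library(ggplot2)

set.seed(1)  # assumed seed
small <- diamonds[sample(nrow(diamonds), 500), ]  # assumed sample size

# One stacked bar: segment heights are the counts per cut
bar <- ggplot(small) + geom_bar(aes(x = factor(1), fill = cut))

# Mapping the y axis to the angle turns the segments into pie slices
pie <- bar + coord_polar(theta = "y")

# Each slice's share of the circle is its share of the total count
slices <- layer_data(pie)
share <- slices$count / sum(slices$count)
round(share, 3)
```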
# bullseye chart
ggplot(small.diamonds) + geom_bar(aes(x = factor(1), fill = cut)) + coord_polar()
# wind rose chart
ggplot(small.diamonds) + geom_bar(aes(x = clarity, fill = cut)) + coord_polar()
VIII. Facets
Here we fit a separate regression (of price on carat) for each clarity level, using facets.
# faceting; this is a single line of code. Note in particular that x and y must be specified in ggplot()
ggplot(small.diamonds, aes(x = carat, y = price, color = clarity)) + geom_point() + scale_y_log10() + facet_wrap(~clarity) + stat_smooth()
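As a quick check of what facet_wrap(~clarity) does, the sketch below counts the panels it creates. The smoother is left out to keep the example small, and the seed and sample size are arbitrary assumptions.

```r
library(ggplot2)

set.seed(1)  # assumed seed
small <- diamonds[sample(nrow(diamonds), 500), ]  # assumed sample size

p_facet <- ggplot(small, aes(x = carat, y = price)) +
  geom_point() +
  scale_y_log10() +
  facet_wrap(~clarity)

# One panel per clarity level present in the sample
n_panels <- length(unique(layer_data(p_facet)$PANEL))
n_levels <- length(unique(small$clarity))
```

Any layer added after the facet, such as stat_smooth(), is then computed separately within each panel.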
IX. Themes
To customize a plot's title and its x-axis and y-axis labels, ggplot2 provides ggtitle(), xlab(), and ylab(). We may also want to change the font, font size, axes, background, and other elements, which is done through the theme() function.
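A minimal sketch of these functions together. The title, the label texts, and the particular theme() settings are made up for illustration, and the seed and sample size are arbitrary assumptions.

```r
library(ggplot2)

set.seed(1)  # assumed seed
small <- diamonds[sample(nrow(diamonds), 500), ]  # assumed sample size

p_labeled <- ggplot(small, aes(x = carat, y = price)) +
  geom_point() +
  ggtitle("Price against carat") +  # plot title
  xlab("Carat") +                   # x-axis label
  ylab("Price (USD)") +             # y-axis label
  theme(
    plot.title = element_text(size = 16, face = "bold"),  # title font size and weight
    axis.text = element_text(size = 10),                  # tick label size
    panel.background = element_rect(fill = "white")       # panel background color
  )
```

Each theme() argument names one plot element and pairs it with an element_*() description of how to draw it.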
ggplot2 also provides some ready-made themes, such as theme_grey() (the default), theme_bw() (a white-background theme), and theme_classic(). The ggthemes package adds more: theme_economist, theme_economist_white, theme_wsj, theme_excel, theme_few, theme_foundation, theme_igray, theme_solarized, theme_stata, theme_tufte
# themes
install.packages("ggthemes")
library("ggthemes")
ggplot(small.diamonds) + geom_boxplot(aes(x = cut, y = price, fill = clarity)) + theme_wsj()
That covers the basics of ggplot2; the key now is to practice using it.