R language: fine-grained plotting with ggplot2, using practical business charts as an example

Source: Internet
Author: User
Tags: ggplot

1. Preparation before drawing: a custom ggplot2 format painter
2. Preparation before drawing: an introduction to the data-shaping tools dplyr/tidyr
3. Charts commonly used in business:

1) Simple column chart + text (single variable)
2) Faceted column chart (facet_wrap/facet_grid)
3) Clustered column chart (position = "dodge")
4) Stacked column chart (add a percentage variable first, then plot it as a column chart)
5) Pie chart, polar chart
6) Multiple-line chart

Objective

This article is a continuation of my previous blog post. We took on a job that uses R to produce customized data reports, which involves a great deal of fine-tuning of charts. After studying ggplot2 in depth, I came to feel strongly that drawing with ggplot2 is quite different from drawing with Excel.

To draw well with ggplot2 you need to know many technical details. These details are scattered across the two books R Visualization Technology and ggplot2: Data Analysis and Charting Technology, and across the Internet. So here I use my own study notes to walk through fine-grained plotting with ggplot2 and to present the plotting workflow I arrived at.

If you want to learn more, buy the books or practice on your own. Many technical details only become clear once you explore them yourself. Good luck.

1. Preparation before drawing: a custom ggplot2 format painter

Before drawing, we first define the ggplot2 format painter.

ggplot2 already ships with attractive themes such as theme_gray and theme_bw. But for charts used at work, many companies have clear rules on chart format, colour and font. Our company, for example, has strict rules on the main colours, the auxiliary colours and the fonts. As Liu Wanxiang pointed out in an early blog post on colour, the charts in many business magazines share a very similar colour style. It is therefore worth modifying the theme so that it fits our business needs and keeps the chart style uniform.

ggplot2 does let you adjust spacing, background colour and fonts in code for each chart. But if every chart is tuned that way, the code becomes cumbersome, and changing it is painful when the boss suddenly decides to change the style.

Fortunately, ggplot2 lets us define the chart style in advance. We can create objects such as mytheme or myline with a fixed colour scheme and then, much like a saved chart template or the format painter in Excel, add them to any chart we generate. This makes it quick to change the content of a chart while keeping the style uniform.

Before running the code, load the related packages first:

library(ggplot2)
library(dplyr)
library(RColorBrewer)
library(tidyr)
library(grid)
# Load the format painter
# Define the font first (windowsFonts() is available on Windows only)
windowsFonts(CA = windowsFont("Calibri"))

Next comes a demonstration. I will first share a theme painter that I use regularly. The colour reference is as follows:

Main colours: blue #085A9C, red #EF0808, grey #526373
Auxiliary colours: light yellow #FFFFE7, orange #FF9418, green #219431, bright yellow #FF9418, purple #9C52AD

The customized objects include mytheme, myline_blue, mycolour_3 and several others:

# Define the font (Windows only)
windowsFonts(CA = windowsFont("Calibri"))

# Define the shapes, colours and themes of the plot elements in advance.
# Custom theme: all-white background, no border, and one colour for all fonts.
mytheme <- theme_bw() +
  theme(legend.position = "top",
        panel.border = element_blank(),
        panel.grid.major = element_line(linetype = "dashed"),
        panel.grid.minor = element_blank(),
        plot.title = element_text(size = 15, colour = "#003087", family = "CA"),
        legend.text = element_text(size = 9, colour = "#003087", family = "CA"),
        legend.key = element_blank(),
        axis.text = element_text(size = 10, colour = "#003087", family = "CA"),
        strip.text = element_text(size = 12, colour = "#EF0808", family = "CA"),
        strip.background = element_blank())

# Pie-chart theme: the base theme without axis text, ticks, titles or grid
pie_theme <- mytheme +
  theme(axis.text = element_blank(),
        axis.ticks = element_blank(),
        axis.title = element_blank(),
        panel.grid.major = element_blank())

# Predefined line, area, point and bar layers
myline_blue <- geom_line(colour = "#085A9C", size = 2)
myline_red <- geom_line(colour = "#EF0808", size = 2)
myarea <- geom_area(colour = NA, fill = "#003087", alpha = .2)
mypoint <- geom_point(size = 3, shape = 21, colour = "#003087", fill = "white")
mybar <- geom_bar(fill = "#0C8DC4", stat = "identity")

# Colour palettes: to cope with different numbers of categories,
# prepare combinations of 3 colours and 7 colours in advance
mycolour_3 <- scale_fill_manual(values = c("#085A9C", "#EF0808", "#526373"))
mycolour_7 <- scale_fill_manual(values = c("#085A9C", "#EF0808", "#526373",
                                           "#FFFFE7", "#FF9418", "#219431", "#9C52AD"))
mycolour_line_7 <- scale_color_manual(values = c("#085A9C", "#EF0808", "#526373",
                                                 "#0C8DC4", "#FF9418", "#219431", "#9C52AD"))

After running the above code in R, you can use these objects directly. For example:

1) First generate a simple chart:

# Without the format painter
p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_bar(stat = "identity") +
  ggtitle("Sepal.Length by Species")
p

Here we only specify that the x-axis is the discrete variable Species. With stat = "identity", the Sepal.Length values within each species are stacked, so the height of each bar is their sum. The result is a plain column chart.

Now apply the theme defined earlier (mytheme): the background, axes and font colours change accordingly.

p + mytheme

Because the format painter also defines a blue bar layer (mybar), we can use it here to produce a blue column chart directly.

ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  ggtitle("Sepal.Length by Species") +
  mybar + mytheme
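The format-painter idea can also be sketched in a platform-independent way. The example below is only an illustration: brush_theme and brush_bar are hypothetical names that do not appear in the code above, and the Calibri font is left out so that the sketch also runs outside Windows. It relies on the fact that ggplot2 accepts a list of components on the right-hand side of +.

```r
library(ggplot2)

# Hypothetical font-free theme, so the sketch does not need windowsFonts()
brush_theme <- theme_bw() +
  theme(legend.position = "top",
        panel.border = element_blank(),
        panel.grid.minor = element_blank(),
        plot.title = element_text(size = 15, colour = "#003087"))

# A list of plot components can be added to a plot in a single step
brush_bar <- list(
  geom_bar(stat = "identity", fill = "#0C8DC4"),
  brush_theme
)

# The same brush is reused on two unrelated data sets
p1 <- ggplot(iris, aes(x = Species, y = Sepal.Length)) + brush_bar
p2 <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + brush_bar
```

Bundling a layer and a theme into one list keeps every chart in a report visually consistent, and a style change only has to be made in one place.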

2. Preparation before drawing: the data-shaping tools dplyr/tidyr

With the format painters defined in advance, we can draw quickly and efficiently.

Before drawing, though, the data must be brought into the required shape, just as with Excel charts. In Excel we use PivotTables or helper formulas; in R we use common packages such as dplyr and tidyr to reshape the data.

The two ggplot2 books I read earlier mostly use the reshape2 + plyr combination. In practice, Hadley's later packages dplyr and tidyr are more convenient. Their use is covered in the JHU course Getting and Cleaning Data, and the instructors have also prepared a swirl course, which can be installed as follows.

install.packages("swirl")
library(swirl)
# Install the course package for Getting and Cleaning Data
install_from_swirl("Getting and Cleaning Data")
swirl()

For the rest, you can also refer to my blog post.

In short, once you are comfortable with dplyr, you can quickly take data such as per-transaction stock records and summarize it (group_by and summarize) or even spread it into columns (spread). For example, the transaction records can be broken down by transaction price and by buy/sell direction:

# `data` is the transaction data just described; it can be downloaded from the forecaster network
data %>%
  group_by(Price, Buysell) %>%
  summarize(Money = sum(Money, na.rm = TRUE)) %>%
  spread(Buysell, Money)

To use ggplot2 well, we have to master ways of reshaping data quickly. The swirl course above is very useful and covers the most recent methods, so it is well worth studying.
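As a self-contained sketch of the group_by, summarize and spread steps, the example below uses a small invented table. The data frame trades, its column names and all of its values are assumptions made up for illustration; they are not the real transaction data.

```r
library(dplyr)
library(tidyr)

# Hypothetical trade records; every value here is invented
trades <- data.frame(
  price   = c(10.1, 10.1, 10.2, 10.2, 10.2),
  buysell = c("B", "S", "B", "B", "S"),
  money   = c(1000, 500, 2000, NA, 800)
)

wide <- trades %>%
  group_by(price, buysell) %>%                               # one group per price and direction
  summarize(money = sum(money, na.rm = TRUE), .groups = "drop") %>%
  spread(buysell, money)                                     # one column per direction

wide
```

In current tidyr releases, spread() is superseded by pivot_wider(names_from = buysell, values_from = money), which should produce the same wide table here.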

3. Charts commonly used in business

Next, I will share the code for the charts I used most often during this project. Further techniques and variants of these charts can be found in the two reference books (R Visualization Technology | ggplot2: Data Analysis and Charting Technology). Here I only pick out the charts I use most and explain them.

1) Simple column chart + text (single variable)
2) Faceted column chart (facet_wrap/facet_grid)
3) Clustered column chart (position = "dodge")
4) Stacked column chart (add a percentage variable first, then plot it as a column chart)
5) Pie chart, polar chart
6) Multiple-line chart

Before drawing, let me first talk about the limitations of ggplot2.

The biggest limitation of ggplot2 is that it does not directly support dual-axis charts and pie charts. Even where these can be drawn, they need a lot of settings and are cumbersome to make.
As I understand it, the root of this limitation lies in the aesthetic and analytic preferences of ggplot2's developer, Hadley Wickham. For details, see his answer on StackOverflow:

It's not possible in ggplot2 because I believe plots with separate y scales (not y-scales that are transformations of each other) are fundamentally flawed.

An expert of his standing can afford to be willful, even with a crowd of people under his answer asking for dual axes. I do not know whether Hadley has since changed his mind and put dual axes on the list of upcoming ggplot2 features. But in my own experience, dual-axis charts and pie charts are harder, more cumbersome and uglier to make in ggplot2. It is easier to do them in Excel, or to follow the master's advice and use a faceted chart or a column chart instead.

So, before looking at the common charts below, remember that ggplot2 is not omnipotent. It can produce very attractive charts, but some charts are beyond it, so being able to use several tools is a necessity.
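The quote draws a line between independent y scales and y scales that are transformations of each other. Newer ggplot2 releases provide sec_axis() for the second case. The sketch below is only an illustration of that case: the multiplier 7 is an assumed conversion rate chosen for the example, and the axis names are made up.

```r
library(ggplot2)
library(dplyr)

avg <- diamonds %>%
  group_by(cut) %>%
  summarize(avg_price = mean(price))

# The secondary axis is a pure transformation of the primary axis:
# every primary value is multiplied by 7 (an assumed rate, for illustration only)
p_sec <- ggplot(avg, aes(x = cut, y = avg_price)) +
  geom_bar(stat = "identity", fill = "#0C8DC4") +
  scale_y_continuous(
    name = "Average price (USD)",
    sec.axis = sec_axis(~ . * 7, name = "Average price (assumed rate x 7)")
  )

p_sec
```

This does not give two independent scales, so it does not replace a true dual-axis chart. It only relabels the same bars in a second unit.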

With that caveat in mind, we use the diamonds data set that ships with ggplot2, together with dplyr/tidyr, to show how to draw the common charts.

Apart from dual-axis charts and pie charts, ggplot2 handles the common chart types well.

First define the title:

mytitle <- ggtitle("Demo: using diamonds as an example")

1) Simple column chart

The code is as follows. It uses the format painters mybar and mytheme, then adds the column labels with geom_text (vjust = 1 places each label just inside the top of its column).

data1 <- diamonds %>%
  group_by(cut) %>%
  summarize(avg_price = mean(price))
column_chart <- ggplot(data1, aes(x = cut, y = avg_price, fill = as.factor(cut))) +
  mytitle + mybar + mytheme +
  geom_text(aes(label = round(avg_price)), vjust = 1, colour = "white")
column_chart

2) Faceted column chart

Sometimes we want to compare several categories quickly. facet_wrap or facet_grid draws the corresponding panels with very little code. This is also part of the reason ggplot2 does not support dual axes: faceting gives a quick comparison without that extra work.

The code is as follows:

# Prepare the data with dplyr
data2 <- diamonds %>%
  group_by(cut, color) %>%
  summarize(avg_price = mean(price))
# Draw, applying the predefined plot elements
ggplot(data2, aes(x = color, y = avg_price)) +
  facet_wrap(~cut, ncol = 2) +
  mytitle + mybar + mytheme
# In facet_wrap, adding scales = "free" gives each panel its own axis ranges.
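To make the effect of the scales argument concrete, here is a minimal sketch that builds the same faceted chart twice. The names base, p_fixed and p_free are hypothetical, and a plain blue geom_bar stands in for the mybar painter so that the block runs on its own.

```r
library(ggplot2)
library(dplyr)

data2 <- diamonds %>%
  group_by(cut, color) %>%
  summarize(avg_price = mean(price), .groups = "drop")

base <- ggplot(data2, aes(x = color, y = avg_price)) +
  geom_bar(stat = "identity", fill = "#0C8DC4")

# Default: all panels share the same x and y ranges, so heights are comparable
p_fixed <- base + facet_wrap(~cut, ncol = 2)

# "free_y": each panel gets its own y range, which shows within-panel detail
# but makes bar heights no longer comparable across panels
p_free <- base + facet_wrap(~cut, ncol = 2, scales = "free_y")
```

For a business report, shared scales are usually the safer default. Free scales are mainly worth it when the panels differ greatly in magnitude.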

3) Clustered column chart
The key point is to pass position = "dodge" to geom_bar so that the columns sit side by side. If this argument is removed, the default is a stacked chart.

The code is as follows:

data3 <- diamonds %>%
  filter(cut %in% c("Fair", "Very Good", "Ideal")) %>%
  group_by(cut, color) %>%
  summarize(avg_price = mean(price))
# Clustered column chart
clustered_chart <- ggplot(data3, aes(x = color, y = avg_price, fill = cut)) +
  geom_bar(stat = "identity", position = "dodge") +
  mytheme + mytitle + mycolour_3
clustered_chart

If you want to control which colour goes with which category, use a factor.

For example, this single line redefines the factor with a new level order. When the chart is drawn again, both the colours and the order of the columns follow the new order, which is very convenient. The level names must match the data exactly, including capitalization.

data3$cut <- factor(data3$cut, levels = c("Very Good", "Ideal", "Fair"))
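The behaviour of factor levels can be checked on a small vector without any plotting. This is a base R sketch; cut_chr and the f_* objects are hypothetical names used only here.

```r
# A small character vector standing in for the cut column
cut_chr <- c("Fair", "Ideal", "Very Good", "Ideal")

# Default: levels are sorted alphabetically
f_default <- factor(cut_chr)

# Custom order: this is the order ggplot2 uses for columns, legend and colours
f_custom <- factor(cut_chr, levels = c("Very Good", "Ideal", "Fair"))

# Level names are case-sensitive: a level that does not match the data
# ("Very good" instead of "Very Good") silently turns those values into NA
f_typo <- factor(cut_chr, levels = c("Very good", "Ideal", "Fair"))

levels(f_default)
levels(f_custom)
sum(is.na(f_typo))
```

Checking for unexpected NA values after re-levelling is a cheap way to catch a misspelled level before it shows up as a missing column in the chart.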

4) Percent stacked chart
Before plotting, we need to add a percentage column to the data. Here we use mutate(percent = n / sum(n)) within each color, and we remove position = "dodge".

data4 <- diamonds %>%
  filter(cut %in% c("Fair", "Very Good", "Ideal")) %>%
  count(color, cut) %>%
  group_by(color) %>%   # compute the percentage within each color, so every column sums to 100%
  mutate(percent = n / sum(n))
# Stacked chart
stacked_chart <- ggplot(data4, aes(x = color, y = percent, fill = cut)) +
  geom_bar(stat = "identity") +
  mytheme + mytitle + mycolour_3
stacked_chart

Of course, you can also draw this as an area chart. However, if some of the data are missing, an area chart is quite likely to come out wrong.
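As an alternative to computing the percentage column by hand, ggplot2 can rescale each stack to 100% itself through position = "fill". The sketch below is one possible variant, not the method used above; p_fill is a hypothetical name, and the predefined theme and palette are left out so the block runs on its own.

```r
library(ggplot2)
library(dplyr)

data4 <- diamonds %>%
  filter(cut %in% c("Fair", "Very Good", "Ideal")) %>%
  count(color, cut)

# position = "fill" rescales every stack so that it spans 0 to 1,
# and scales::percent formats the axis labels as percentages
p_fill <- ggplot(data4, aes(x = color, y = n, fill = cut)) +
  geom_bar(stat = "identity", position = "fill") +
  scale_y_continuous(labels = scales::percent)

p_fill
```

The explicit percent column is still useful when the percentages are needed elsewhere, for example as text labels.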

5) Pie chart and polar chart

For reference, see the article "[R] First kiss R: plotting pie charts with ggplot" and the article on drawing with ggplot2.
ggplot2 has no dedicated pie-chart geom. Basically, you first draw a column chart and then convert it into a pie chart with coord_polar.

There are two ways to draw it:
1) Do not specify a real x-axis. Use geom_bar to build a single stacked column from y, map fill to the category, and let coord_polar project y directly.
The advantage of this method is that the code is relatively simple: coord_polar("y").
For a way to add labels, see: http://stackoverflow.com/questions/8952077/pie-plot-getting-its-text-on-top-of-each-other#

data5 <- diamonds %>%
  count(cut) %>%
  mutate(percent = n / sum(n))
ggplot(data5, aes(x = factor(1), y = percent, fill = cut)) +
  geom_bar(stat = "identity", width = 3) +
  mycolour_7 +
  coord_polar("y") + pie_theme + mytitle

2) Specify the x-axis, and map the same variable to the colour (fill). First draw a column chart, then bend it into a circle. The disadvantage is that the code is relatively cumbersome.

ggplot(data5, aes(x = cut, y = percent, fill = cut)) +
  geom_bar(stat = "identity", width = 3) +
  mycolour_7 + coord_polar("x") + pie_theme + mytitle

I have tried many times, though, and I still find it hard to work out how to label a pie chart. If you need labels, a pie chart may not be as good a choice as a column chart.
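One approach to labelling that tends to work for the first method is to stack the text layer in the same way as the bars and centre each label within its slice. The sketch below assumes this approach; p_pie is a hypothetical name, and theme_void() stands in for pie_theme so the block runs on its own.

```r
library(ggplot2)
library(dplyr)

data5 <- diamonds %>%
  count(cut) %>%
  mutate(percent = n / sum(n))

p_pie <- ggplot(data5, aes(x = factor(1), y = percent, fill = cut)) +
  geom_bar(stat = "identity", width = 1) +
  # position_stack(vjust = 0.5) puts each label at the middle of its stacked segment,
  # which becomes the middle of the slice after the polar projection
  geom_text(aes(label = scales::percent(percent, accuracy = 1)),
            position = position_stack(vjust = 0.5)) +
  coord_polar("y") +
  theme_void()

p_pie
```

Labels for very thin slices can still overlap, so for data with many small categories a column chart remains easier to read.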

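Here is also a way to draw faceted pie charts:

# Count by color and cut so that the data has a color column to facet on;
# the percentages are computed within each color
data5_1 <- diamonds %>%
  filter(color %in% c("D", "E", "F", "G")) %>%
  count(color, cut) %>%
  group_by(color) %>%
  mutate(percent = n / sum(n))
ggplot(data5_1, aes(x = factor(1), y = percent, fill = cut)) +
  geom_bar(stat = "identity", width = 3) +
  mycolour_7 +
  coord_polar("y") + pie_theme +
  facet_wrap(~color, ncol = 4) +
  theme(legend.position = "bottom") + mytitle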

6) Line chart

Besides the column charts above, we also draw many line charts.
A simple line chart is straightforward.
A chart with several lines takes a little more care.

The key is to build a long table with a category A, a category B and a value. Then put A on the x-axis and map both group and colour to B.
The following code shows the processing.
If group is removed, geom_line does not know which points to connect into a line.

data6 <- diamonds %>%
  count(color, cut) %>%
  filter(color %in% c("D", "E", "F")) %>%
  mutate(percent = n / sum(n))
ggplot(data6, aes(x = cut, y = n, group = color, colour = color)) +
  geom_line(size = 1.5) + mypoint +
  mycolour_line_7 + mytheme + mytitle
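The role of the group aesthetic can be inspected by looking at the groups ggplot2 computes for the line layer. In this sketch, p_nogroup and p_group are hypothetical names and the predefined painters are left out. With a discrete x-axis and no explicit group, ggplot2 derives the groups from all discrete variables in the plot, including x.

```r
library(ggplot2)
library(dplyr)

data6 <- diamonds %>%
  count(color, cut) %>%
  filter(color %in% c("D", "E", "F"))

# Without group: every combination of cut and color is its own group,
# so each "line" has a single point and nothing is drawn
p_nogroup <- ggplot(data6, aes(x = cut, y = n, colour = color)) +
  geom_line()

# With group = color: one group, and therefore one line, per colour
p_group <- ggplot(data6, aes(x = cut, y = n, group = color, colour = color)) +
  geom_line()

n_groups_without <- length(unique(layer_data(p_nogroup)$group))
n_groups_with    <- length(unique(layer_data(p_group)$group))
```

Comparing n_groups_without with n_groups_with shows why the explicit group mapping is needed whenever the x-axis is a factor.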

There are also some other useful charts.

In short, ggplot2's grammar is rather distinctive. There are pitfalls everywhere, and surprises too. For business charts, you have to explore and adjust bit by bit to make sure the style and the details are flawless.
The advantage of drawing with ggplot2 is that once the common plotting code has been sorted out, it can be reused indefinitely, especially the format painters and predefined themes. In other words, drawing with ggplot2 can become both faster and cheaper over time.

Beyond that, what I personally value most in ggplot2 is its range of aesthetic mappings, its ability to produce charts quickly during exploratory data analysis, its ability to combine with maps, and its potential for dynamic interaction.

Combining R/Python with dynamic web pages (mostly D3) is currently popular; see example 1 and example 2.

I hope you will not limit your plotting to the common charts and formatting adjustments selected above. And allow me one complaint: learning this style of plotting really is hard going (づ ̄~~ ̄|||)

Resources:

1. Liu Wanxiang's business chart series
2. R Visualization Technology
3. ggplot2: Data Analysis and Charting Technology
4. The almighty StackOverflow
5. ggplot2
6. Colour related:
Liu Wanxiang's early blog post on colour
HTML two-colour combinations, and HTML colour palettes by style
