Objective
ggplot2 is one of the most powerful graphics packages in the R language, and its real strength lies in its own concept of data visualization. Once you are familiar with the basic routine of ggplot2, data visualization becomes easy and well organized.
This article gives a general introduction to the visualization concepts behind ggplot2 and its development routine. Specific drawing methods (line charts, bar charts, box plots, and so on) will be explained one by one in the following articles.
Core concept
1. Separation of data, data-related graphics, and data-independent graphics
This is arguably the most attractive feature of ggplot2. Data visualization is the process of mapping the information we find in data onto graphical elements.
ggplot2 separates three things: the data, the mapping from data to graphical features, and the graphical features that are independent of the data. This is somewhat like an MVC framework in Java. It lets ggplot2 users see clearly which part of a chart they are working on, and develop or adjust it in a targeted way.
2. Layer-style development logic
In ggplot2, a graph is built up one layer at a time. For example, suppose we first decide to explore the relationship between height and weight and draw a simple scatter plot. We then decide to distinguish the sexes, so the colour of each point is made to correspond to gender. Next, to distinguish regions, we split the plot into three small panels by region. Finally, we add a regression line to see the trend directly. This is a structured, layer-by-layer process in which each step adds more information. In ggplot2, each of these steps is a layer that can be superimposed on the previous step and displayed.
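The layered process described above can be sketched roughly as follows. This is only an illustration: the data frame `people`, its column names, and the region labels are made up here and do not come from any real dataset.

```r
library(ggplot2)

# Hypothetical data: made-up heights, weights, sexes, and regions
set.seed(1)
people <- data.frame(
  height = rnorm(90, mean = 168, sd = 9),
  sex    = rep(c("F", "M"), times = 45),
  region = rep(c("East", "Central", "West"), each = 30)
)
people$weight <- 0.9 * people$height - 90 + rnorm(90, sd = 6)

# Step 1: a plain scatter plot of height against weight
p <- ggplot(people, aes(x = height, y = weight)) + geom_point()

# Step 2: colour the points by sex
p <- p + aes(colour = sex)

# Step 3: split the plot into three small panels, one per region
p <- p + facet_wrap(~ region)

# Step 4: add a linear regression line to show the trend
p <- p + geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

print(p)
```

Because colour is mapped to sex for the whole plot, the regression line in this sketch is fitted separately for each sex within each panel.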
3. Free combination of various graphic elements
Thanks to the layered development logic of ggplot2, we are free to combine graphical elements in whatever way we can imagine.
Basic Development Steps
1. Initialize - ggplot()
This step sets the x-axis, the y-axis, and the "aesthetic features" of the graph. The basic form is as follows:
p <- ggplot(data = , aes(x = , y = ))
Setting the x-axis and the y-axis in this step is easy to understand. So what is an "aesthetic feature"?
For example, in this scatter plot the x-axis represents age and the y-axis represents height, which is easy to understand:
But besides the relationship between age and height, the figure also shows the weight of each sample point: the darker the colour, the greater the weight. So the weight information, like age and height, also needs to be bound to a specific column. This column is the "aesthetic feature" of the scatter plot.
Take a look at the R drawing code:
ggplot(heightweight, aes(x = ageYear, y = heightIn, colour = weightLb)) + geom_point()
The colour parameter is the "aesthetic feature" of this graph.
As another example, in the following bar chart the x-axis represents the date and the y-axis represents the weight, which is easy to understand:
However, each date in this figure corresponds to two different weights, which are compared using two bars. The column that separates the two bars is another "aesthetic feature".
Now look at the drawing code:
ggplot(cabbage_exp, aes(x = Date, y = Weight, fill = Cultivar)) + geom_bar(position = "dodge", stat = "identity")
The fill parameter is the "aesthetic feature" of this graph.
To sum up, in addition to its coordinate position, each sample point in a figure can display information in other forms, such as size, colour depth, or grouping. These additional forms, each of which must be bound to a column, are called "aesthetic features".
An aesthetic feature is supplied as a column, just like the x and y axes, and that column necessarily has the same number of elements as the x and y columns. It is also set in the same place as x and y: inside aes() in the call to ggplot().
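As a small sketch of binding further columns as aesthetic features, the example below maps size and colour in addition to x and y. The data frame `samples` and its values are invented purely for illustration.

```r
library(ggplot2)

# Hypothetical sample data; every column has the same number of rows
samples <- data.frame(
  age    = c(11, 12, 13, 14, 15),
  height = c(55, 58, 61, 63, 66),
  weight = c(75, 85, 95, 110, 120),
  group  = c("a", "b", "a", "b", "a")
)

# x and y give the position; size and colour carry two more columns
p <- ggplot(samples, aes(x = age, y = height, size = weight, colour = group)) +
  geom_point()

print(p)
```

Since all four columns come from the same data frame, they automatically have the same number of elements.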
2. Draw the layer - geom_bar()/geom_line(), etc.
The previous step configured the data for the visualization. This step draws the actual graph the task requires, such as a line chart, bar chart, or scatter plot. The specific methods will be explained in detail in the following chapters; here we focus on the stat parameter of the drawing functions. This parameter is the statistic applied to sample points that fall at the same position. For most drawing functions, such as geom_point() and geom_line(), it defaults to "identity", which preserves the original y value of each sample point. geom_bar() is an exception: its default is "count", which counts the rows at each x position. Other statistics are also available, for example a sum of the y values that appear at each position.
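The effect of the stat parameter can be seen in a minimal sketch like the one below. The data frame `sales` is hypothetical, and computing the sum through stat = "summary" with fun = sum is one possible way to do it (it assumes ggplot2 3.3.0 or later).

```r
library(ggplot2)

# Hypothetical data: two rows share the same date "d1"
sales <- data.frame(
  date   = c("d1", "d1", "d2"),
  amount = c(2, 3, 4)
)

# stat = "identity": bar heights come straight from the y column
# (the two "d1" rows are stacked on top of each other)
p_identity <- ggplot(sales, aes(x = date, y = amount)) +
  geom_bar(stat = "identity")

# stat = "count" (the geom_bar() default): bar height is the number of rows per date
p_count <- ggplot(sales, aes(x = date)) +
  geom_bar()

# stat = "summary" with fun = sum: one bar per date holding the sum of y
p_sum <- ggplot(sales, aes(x = date, y = amount)) +
  geom_bar(stat = "summary", fun = sum)
```

Calling layer_data() on any of these plots shows the values that the statistic actually computed for the layer.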
3. Adjust data-related graphic elements - the scale family of functions, some specialized functions
In ggplot2, the scale mechanism is specifically responsible for mapping data to graphical elements. You might ask: doesn't the "aesthetic feature" already define this mapping? In fact, the aesthetic feature only selects which data is to be mapped; it does not specify which graphical elements the data is mapped to.
For example, suppose a table records the length, width, and depth of different kinds of pools, and we need to draw a bar chart of the lengths and widths of the different pool types. The initialization step sets up this mapping:
The scale function then completes the mapping:
Here A maps to red and B maps to blue.
You might also ask: my code does not use a scale, so how is the mapping done? The system supplies a default mapping, just as each drawing function has a default value for stat.
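A minimal sketch of this division of labour between aes() and a scale might look like the following. The `pools` data frame, its column names, and its values are assumptions made for illustration; only the A-to-red and B-to-blue mapping comes from the text above.

```r
library(ggplot2)

# Hypothetical pool table; names and numbers are made up
pools <- data.frame(
  type   = c("A", "B"),
  length = c(50, 25),
  width  = c(21, 12)
)

# aes() only says that the fill depends on the pool type
p <- ggplot(pools, aes(x = type, y = length, fill = type)) +
  geom_bar(stat = "identity")

# The scale says which colour each type gets: A -> red, B -> blue
p_manual <- p + scale_fill_manual(values = c(A = "red", B = "blue"))

print(p_manual)
```

Printing `p` instead of `p_manual` shows the default mapping that ggplot2 applies when no scale is given.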
4. Adjust data-independent graphical elements - theme(), some specialized functions
This part covers elements that have nothing to do with the data itself, such as the format of the image title and the font of the text. You can set them simply by calling the theme() function or a specialized function, such as annotate(), which adds an annotation to the picture.
Once a layer is drawn, you can inspect the result, make adjustments, and then start on the next layer until the entire picture is complete.
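As a rough sketch of such data-independent adjustments, the example below formats a title with theme() and adds a text annotation with annotate(). The data, the title text, and the annotation position are all made up for illustration.

```r
library(ggplot2)

# Hypothetical data for illustration
df <- data.frame(x = 1:5, y = c(2, 4, 3, 6, 5))

p <- ggplot(df, aes(x = x, y = y)) +
  geom_line() +
  # Title text (the title itself is not tied to the data)
  ggtitle("A hypothetical title") +
  # Title format: larger, bold text
  theme(plot.title = element_text(size = 16, face = "bold")) +
  # A free-standing text annotation placed at fixed coordinates
  annotate("text", x = 4, y = 6.3, label = "peak")

print(p)
```

Note that annotate() is added to the plot as a layer of its own, while theme() changes no layer at all.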
Summary
As the first post in this series, this article introduces ggplot2, the data visualization package for the R language, from a general and abstract point of view. If some of the concepts feel too abstract to grasp, do not dwell on them. Come back to this article after reading the rest of the series, and you will likely get more out of it.
The next article will explain, from a specific and detailed perspective, how to use the ggplot2 package to produce various kinds of data visualizations.
Finally, here are some finished drawings made with ggplot2:
Chapter One: ggplot2 drawing overview