Objective
Data visualization is an important part of data mining. It is used not only when first exploring and understanding data, but throughout the whole data mining process.
Visualization gives you a better grasp of the data as a whole and lets you express your conclusions clearly. It therefore helps not only while carrying out a project, but also when discussing requirements with customers or writing papers.
Before going into individual chart types, let's look at two basic plotting functions: plot and legend.
Basic drawing function: plot
In R, plot is the basic function for drawing points and line segments.
The most basic call is plot(x, y), where x and y hold the x-axis and y-axis data. plot also accepts a number of parameters for fine-tuning the result:
pch: the symbol used to draw a point, either a single character or an integer from 0 to 25, such as pch="+" or pch=1.
lty: line type, such as lty=2 or lty=1.
lwd: line width, such as lwd=2.
col: color of points, lines, text, and filled areas. col.axis, col.sub, and col.main set the color of the axis annotation, subtitle, and main title respectively, such as col=2 or col.sub=2.
font: font settings, with font.axis, font.sub, and font.main following the same pattern as col.
cex: character expansion factor, the ratio of the size of text and plotting symbols to the default size.
xlim and ylim: the ranges of the x and y axes. For example, plot(Passign, type="l", xlim=c(0,100)) draws the x-axis from 0 to 100.
add=TRUE: forces the function to act as a low-level plotting function, adding new graphic elements to the current plot (only available for some functions).
axes=FALSE: suppresses the axes, which is useful when you want to draw custom axes with the axis() function. The default is axes=TRUE, which draws the axes.
log: log="x", log="y", or log="xy" makes the x-axis, the y-axis, or both logarithmic. This works for many plots, but not all.
The type parameter controls the type of plot that is drawn (especially for lines):
type="p" draws points only (default)
type="l" draws lines
type="b" draws both points and lines
type="o" draws points over the lines
type="h" draws vertical lines from each point down to the zero axis (x-axis), giving a high-density plot
type="s" and type="S" draw step charts. In the first form, the top of each vertical step matches the data point; in the second form, the bottom matches.
type="n" draws no data at all. The axes are still drawn (by default) and the coordinate system is still set up from the data, which makes this ideal for building a plot afterwards with low-level plotting functions.
xlab=string, ylab=string: set the labels of the x and y axes. Use these to override the default labels, which are usually the names of the objects passed to the high-level plotting function.
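The parameters above can be combined in a single call. The following is a minimal sketch on synthetic data (the vectors x and y are assumptions made for illustration, not the article's data set); it uses only base graphics.

```r
# Synthetic data standing in for the article's series (an assumption).
x <- 1:20
y <- x^2

# Points joined by lines, with explicit symbol, line style, width and colour.
plot(x, y,
     type = "b",          # points and lines together
     pch  = 16,           # filled circle
     lty  = 2,            # dashed line
     lwd  = 2,            # double line width
     col  = "blue",
     cex  = 0.8,          # symbols at 80% of the default size
     xlim = c(0, 25),     # x-axis runs from 0 to 25
     main = "plot() parameters",
     xlab = "x", ylab = "x squared")

# The same data as a step chart on a logarithmic y-axis.
plot(x, y, type = "s", log = "y", col.main = "red",
     main = "Step chart, log y-axis")
```

Changing type to "p", "l", "o", or "h" in the first call is a quick way to compare the output styles listed above.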
On top of a plot created by plot, you can add points, lines, and text. Points and lines are added with the points and lines functions, which are simple to call and are not covered in detail here.
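As a brief illustration of that layering, the sketch below sets up an empty plot with type="n" and then adds a line, points, and a text label with the low-level functions lines, points, and text. The sine and cosine curves are placeholder data chosen for this example.

```r
x <- seq(0, 2 * pi, length.out = 50)

# Set up the axes only; no data is drawn yet.
plot(x, sin(x), type = "n", ylim = c(-1, 1),
     xlab = "x", ylab = "value")

lines(x, sin(x), col = "blue", lwd = 2)    # add a line
points(x, cos(x), col = "red", pch = 4)    # add points
text(pi, 0.5, labels = "sin and cos")      # add text at (pi, 0.5)
```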
Another important drawing function is described below.
Basic drawing function: legend
legend(x, y, legend, ...) adds a legend at the specified position in the current plot. The plotting symbols, line styles, colors, and so on are identified by the labels in the character vector legend. At least one further argument v, a vector of the same length as legend that holds the corresponding values of the plotting unit, must also be given:
legend( , fill=v)
- the colors of the filled boxes
legend( , col=v)
- the colors of the points or lines
legend( , lty=v)
- the line styles
legend( , lwd=v)
- the line widths
legend( , pch=v)
- the plotting characters (character vector)
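A minimal sketch of a legend for two curves follows. The data are placeholders; the position keyword "topright" is used in place of explicit x and y coordinates, which legend also accepts.

```r
x <- seq(0, 2 * pi, length.out = 50)
plot(x, sin(x), type = "l", col = "blue", lty = 1, lwd = 2,
     ylim = c(-1, 1), xlab = "x", ylab = "value")
lines(x, cos(x), col = "red", lty = 2, lwd = 2)

# Each legend entry takes the colour, line type and width of its curve.
leg <- legend("topright",
              legend = c("sin(x)", "cos(x)"),
              col = c("blue", "red"),
              lty = c(1, 2),
              lwd = 2)
```

The vectors passed to col and lty have the same length as the legend vector, one value per label.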
Histogram
Use the hist function to draw a histogram of a single variable. The example is a histogram of the claim amounts in an insurance claims database.
The function call code for the graph is:
The hist function takes several parameters:
- First parameter: the data vector
- density: the density of the shading lines, in lines per inch. The higher the value, the denser the shading.
- main: the chart title, such as "Histogram of Freq of Insurance$claims".
- xlab: x-axis label
- ylab: y-axis label
- col, border: the fill color of the bars and the color of their borders. Colors can be customized freely, but when density is set, col colors the shading lines instead of filling the bars.
- breaks: the break points between bins, or the number of bins
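Since the insurance data set is not available here, the following sketch uses randomly generated claim amounts (the vector claims is hypothetical) to show how these parameters fit together in one hist call.

```r
set.seed(1)
# Hypothetical claim amounts; not the article's insurance data.
claims <- rexp(500, rate = 1 / 2000)

h <- hist(claims,
          breaks = 20,            # suggested number of bins
          col = "lightblue",      # fill colour of the bars
          border = "white",       # colour of the bar borders
          main = "Histogram of claim amounts",
          xlab = "Claim amount",
          ylab = "Frequency")
```

hist also returns the computed bins, so h$breaks and h$counts can be inspected after drawing.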
Bar chart
The chart is a bar chart of claimant age in the insurance claims data set.
A bar chart resembles a histogram, but each interval must be defined by hand, because the categories need not be related to one another. Bar charts also handle counts of non-numeric data well. They are therefore used very often and deserve attention.
In general, drawing a bar chart takes two steps:
1. Generate the vector of counts, that is, the value each bar represents.
The calling code in this example is:
The resulting vector holds the number of claimants in each interval.
2. Draw the chart.
The calling code is:
The barplot function takes several parameters:
- First parameter: the data vector, which you usually have to compute yourself.
- names.arg: the name of each bar
- main, xlab, ylab, col, density: same meaning as for the histogram. However, col and density accept vectors, so that each bar can be styled individually.
- beside: whether to draw a grouped bar chart or a stacked one. Its use is described below.
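The two steps can be sketched as follows. The ages are randomly generated stand-ins for the article's data, and using cut and table to build the counts is one possible approach, not necessarily the one used in the original code.

```r
set.seed(1)
# Hypothetical claimant ages (an assumption for illustration).
age <- sample(18:60, 300, replace = TRUE)

# Step 1: count the claimants in each custom age interval.
groups <- cut(age, breaks = c(-Inf, 24, 29, 35, Inf),
              labels = c("<25", "25-29", "30-35", ">35"))
claims_age <- as.vector(table(groups))

# Step 2: draw the bar chart, one colour per bar.
mids <- barplot(claims_age,
                names.arg = levels(groups),
                col = c("green", "blue", "orange", "yellow"),
                main = "Claimants by age group",
                xlab = "Age group", ylab = "Number of claimants")
```

barplot returns the x positions of the bar midpoints, which is handy when adding labels above the bars later.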
Bar charts are especially often used to draw grouped bars, as shown:
To draw this grouped chart, make the following changes to the code above:
1. Generate two vectors of counts and bind them with the rbind function:
2. In the barplot call, add the argument beside=TRUE:
If it is omitted, a stacked bar chart is drawn instead:
3. Finally, add a legend at the top left of the chart explaining the two kinds of bars:
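A minimal sketch of these three steps is shown below. The two groups (male and female) and their counts are invented for the example, since the original data are not reproduced here.

```r
# Hypothetical counts per age group for two groups (an assumption).
male   <- c(40, 65, 80, 55)
female <- c(35, 70, 60, 45)

# Step 1: bind the two count vectors into a matrix, one row per group.
counts <- rbind(male, female)

# Step 2: beside = TRUE places the bars side by side; FALSE would stack them.
mids <- barplot(counts,
                beside = TRUE,
                names.arg = c("<25", "25-29", "30-35", ">35"),
                col = c("steelblue", "tomato"),
                xlab = "Age group", ylab = "Number of claimants")

# Step 3: explain the two bar colours in the top-left corner.
legend("topleft", legend = c("male", "female"),
       fill = c("steelblue", "tomato"))
```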
Pie chart
This example draws a 3D pie chart, which requires an additional package, plotrix. The pie chart is simple to draw with its pie3D function; an example call follows:
Two parameters deserve explanation:
- explode: the spacing between the individual pie sectors
- labelcex: the text size of the sector labels
The other parameters are almost the same as in the plotting functions above and are not repeated here.
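As a rough sketch, a pie3D call with these two parameters might look like the following. The counts are hypothetical, and the fallback to the base pie function when plotrix is not installed is an addition made only so that the example runs anywhere.

```r
# Hypothetical counts per age group (an assumption for illustration).
claims_age <- c(60, 85, 90, 65)
age_labels <- c("<25", "25-29", "30-35", ">35")
slice_cols <- c("green", "blue", "orange", "yellow")

if (requireNamespace("plotrix", quietly = TRUE)) {
  plotrix::pie3D(claims_age, labels = age_labels,
                 explode = 0.1,    # push the sectors apart
                 labelcex = 0.8,   # text size of the labels
                 col = slice_cols, main = "Claimants by age group")
} else {
  # Fallback: flat pie chart from base graphics.
  pie(claims_age, labels = age_labels, col = slice_cols,
      main = "Claimants by age group")
}
```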
Chinese character compatibility solution
When exporting to high-resolution PDF, Chinese characters sometimes fail to render correctly. The solution is:
1. Install and load the Cairo package
2. Call CairoPDF("full path to the PDF") to set the save path and file name of the PDF file
3. Add the parameter family="font name" when calling the plotting function. The table of font names is given at the end of the article.
4. Call dev.off() to save the file
Here is sample code that saves a high-resolution PDF:
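library(Cairo)    # provides CairoPDF
library(plotrix)  # provides pie3D
# Open the PDF device; all drawing goes to this file until dev.off()
CairoPDF("f:\\1.pdf")
# claims_age is the vector of claimant counts per age group built for the bar chart
pie3D(claims_age, labels = c("<25", "25-29", "30-35", ">35"), explode = 0.1, labelcex = 0.8,
      main = "Chinese characters", col = c("green", "blue", "orange", "yellow"), family = "SimSun")
# Close the device to write the file
dev.off()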
Summary
R also supports many other chart types, and some of the more complex ones are provided by dedicated packages. Choose and draw according to your actual needs.
Appendix: font parameter table (font name, followed by the value to pass to family):
1 New Fine Ming: PMingLiU
2 Fine Ming: MingLiU
3 Standard Kai: DFKai-SB
4 Hei: SimHei
5 Song: SimSun
6 New Song: NSimSun
7 Fang Song: FangSong
8 Kai: KaiTi
9 Fang Song GB2312: FangSong_GB2312
10 Kai GB2312: KaiTi_GB2312
11 Microsoft JhengHei: Microsoft JhengHei
12 Microsoft YaHei: Microsoft YaHei
13 Li Shu: LiSu
14 You Yuan: YouYuan
15 Huawen Fine Hei: STXihei
16 Huawen Kai: STKaiti
17 Huawen Song: STSong
18 Huawen Zhong Song: STZhongsong
19 Huawen Fang Song: STFangsong
20 Founder Shu: FZShuTi
21 Founder Yao: FZYaoti
22 Huawen Caiyun: STCaiyun
23 Huawen Hupo: STHupo
24 Huawen Li: STLiti
25 Huawen Xingkai: STXingkai
26 Huawen Xinwei: STXinwei
Chapter Two: Data visualization-Basic API