Reposted from http://www.cnblogs.com/jiangmiaomiao/p/6991632.html
0 Introduction
R supports four graphics systems: base graphics, grid graphics, lattice graphics, and ggplot2. Base graphics is the default graphics system in R.
1 Basic graphics function plot()
The type parameter of plot() specifies how the data are drawn. Commonly used values are:
- "p" for "points"
- "l" for "lines"
- "o" for "overlaid" (lines drawn over points)
- "s" for "steps"
type = "n" is a special option: it sets up the axes without drawing any data, so that data from multiple sources can then be added to the same axes.
For example:
plot(x, y, xlab = "", ylab = "", pch = 2, col = "red")
pch: shape of the data points
col: color of the data points
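As a minimal sketch of how the type values differ, the following draws one made-up series four ways and then uses type = "n" to set up empty axes before adding two series. The vectors x, y1 and y2 are invented for illustration and do not come from the original post.

```r
# Made-up data for illustration only
x  <- 1:10
y1 <- x^2
y2 <- 100 - 8 * x

# Show the four basic type values side by side
old_par <- par(mfrow = c(2, 2))
for (tp in c("p", "l", "o", "s")) {
  plot(x, y1, type = tp, main = paste0('type = "', tp, '"'))
}
par(old_par)

# type = "n" draws only the axes; the data are added afterwards
plot(range(x), range(c(y1, y2)), type = "n", xlab = "x", ylab = "y")
points(x, y1, pch = 2, col = "red")   # first source: red triangles
lines(x, y2, lty = 2, col = "blue")   # second source: dashed blue line
```

Passing the combined range of both series to the empty plot keeps every later point inside the axes.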
2 Other types of graphics functions
(1) Pie chart: pie()
(2) Histograms are the most common way to show the distribution of a numeric variable
hist(): base R; a histogram of the number of observations falling into each bin
truehist(): MASS package; the histogram is normalized so that it gives an estimate of the probability density.
A density plot can be seen as a smoothed histogram, for example lines(density(x))
One limitation of histograms and density plots is that it is hard to judge from them whether the data follow a Gaussian (normal) distribution
Use qqPlot() (car package) to check whether the data follow a Gaussian (normal) distribution
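The three views described above can be compared side by side. This is only a sketch: it uses the built-in faithful data rather than anything from the original post, and it swaps in base R's qqnorm() and qqline() for the QQ-plot so that no extra package is needed.

```r
# Built-in Old Faithful eruption durations (datasets package)
d <- faithful$eruptions
dens <- density(d)

old_par <- par(mfrow = c(1, 3))

# Histogram on the density scale so the density curve can be overlaid
hist(d, freq = FALSE, main = "Histogram", xlab = "Eruption time (min)")
lines(dens, lwd = 2)

# Density estimate on its own
plot(dens, main = "Density")

# Normal QQ-plot: points far from the reference line suggest non-normality
qqnorm(d, main = "Normal QQ-plot")
qqline(d, lty = 2)

par(old_par)
```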
(3) sunflowerplot() function
Each point in a scatterplot corresponds to one (x, y) pair. If the same (x, y) pair occurs several times, the points are drawn on top of each other and the repeats cannot be seen. There are several workarounds, such as jittering, which adds a small random value to each x and y so that repeated points show up as a cluster of nearby points. Another effective approach is the sunflowerplot() function, in which repeated values are drawn as "sunflowers" and each petal represents one occurrence of the data point.
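A small sketch of the overplotting problem and the two remedies, using the built-in mtcars data as a stand-in (an assumption; the original post does not use it here):

```r
# mtcars has many repeated (cyl, gear) pairs, so points overlap heavily
x <- mtcars$cyl
y <- mtcars$gear

old_par <- par(mfrow = c(1, 3))
plot(x, y, main = "Raw scatterplot")
set.seed(1)  # jitter() is random; fix the seed for reproducibility
plot(jitter(x), jitter(y), main = "Jittered")
sf <- sunflowerplot(x, y, main = "Sunflower plot")
par(old_par)

# sunflowerplot() invisibly returns the unique pairs and their counts
data.frame(x = sf$x, y = sf$y, count = sf$number)
```

The returned counts show how many observations hide behind each visible point in the raw scatterplot.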
(4) boxplot() function
boxplot() shows the distribution of a numeric variable y for each unique value of a variable x. x should not have too many unique values; with more than about 10 the plot becomes hard to read.
Optional parameters:
varwidth: lets the width of each box vary to reflect the size of the corresponding subset of the data.
log: applies a logarithmic transformation to the y values
las: controls the orientation of the axis labels to make them more readable
# Create a variable-width boxplot with a log-scaled y-axis and horizontal labels
boxplot(y ~ x, data = Boston, varwidth = TRUE, log = "y", las = 1)
(5) Mosaic plot: mosaicplot()
A mosaic plot can be thought of as a scatterplot between categorical variables. It can also be used to examine the relationship between numeric variables that take only a few distinct values.
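As an illustration, two mtcars columns with only a few distinct values each can be cross-tabulated and passed to mosaicplot(). The choice of data set and variables is an assumption made for this sketch.

```r
# Cross-tabulate two low-cardinality variables from the built-in mtcars data
tab <- table(cyl = mtcars$cyl, gear = mtcars$gear)

# Tile areas are proportional to the cell counts in the table
mosaicplot(tab, main = "Cylinders by gears", color = TRUE)
```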
(6) bagplot() (aplpack package)
A simple boxplot summarizes the range of variation of a numeric variable with five numbers:
maximum, minimum, median, upper quartile, and lower quartile.
The standard boxplot computes a nominal data range from three of these numbers and marks points outside that range as outliers, drawn as individual points. The bag plot extends this idea to the relationship between two numeric variables: the two-dimensional "bag" corresponds to the box of the standard boxplot, and outliers are marked individually.
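The one-dimensional case can be checked numerically in base R. This sketch covers only the ordinary boxplot, not bagplot() itself, and uses mtcars$hp as an arbitrary example variable.

```r
hp <- mtcars$hp

# Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(hp)
five

# boxplot.stats() applies the usual 1.5 * box-length rule and lists the outliers
bs <- boxplot.stats(hp)
bs$out

# The same outliers appear as separate points in the plot
boxplot(hp, horizontal = TRUE, main = "Horsepower")
```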
(7) corrplot(): plotting a correlation matrix
The correlation matrix is a useful tool for getting a first look at the relationships between several numeric variables.
In the plot, an elongated ellipse indicates a strong correlation between the two variables, and a nearly circular ellipse indicates a correlation close to 0.
# Load the corrplot library for the corrplot() function
library(corrplot)
# Compute the correlation matrix for these variables
corrMat <- cor(data)
# Generate the correlation ellipse plot
corrplot(corrMat, method = "ellipse")
(8) Building and plotting an rpart() model
Decision trees are easy to visualize and interpret, and are a popular kind of predictive model.
# Load the rpart library (the Boston data come from MASS)
library(rpart)
library(MASS)
# Fit an rpart model to predict medv from all other Boston variables
tree_model <- rpart(medv ~ ., data = Boston)
# Plot the structure of this decision tree model
plot(tree_model)
# Add labels to this plot
text(tree_model, cex = 0.7)
(9) Using symbols() to show relationships between more than two variables
A scatterplot shows how one numeric variable changes with a second numeric variable. symbols() extends the scatterplot to show the influence of further variables. The circles parameter creates a bubble chart: each data point is drawn as a circle whose radius is based on the value of a third variable.
# Cars93 comes from the MASS package
library(MASS)
# Call symbols() to create the default bubble plot
# (Cylinders is a factor, so as.numeric() converts it to its level codes)
symbols(Cars93$Horsepower, Cars93$MPG.city,
        circles = as.numeric(Cars93$Cylinders))
# Repeat, with the inches argument specified
symbols(Cars93$Horsepower, Cars93$MPG.city,
        circles = as.numeric(Cars93$Cylinders),
        inches = 0.2)
(10) lattice plot example
# Load the lattice package (the UScereal data come from MASS)
library(lattice)
library(MASS)
# Use xyplot() to construct the conditional scatterplot
xyplot(calories ~ sugars | shelf, data = UScereal)
3 Environment function par()
The par() function sets graphics parameters. The settings stay in effect until they are changed by another call to par().
Calling par() with no arguments returns the current values of all graphics parameters.
Example: creating a plot array with one row and two columns
par(mfrow = c(1, 2))
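Because the settings persist, it is often convenient to save the old values and restore them afterwards. A minimal sketch, using mtcars as example data and old_par as an illustrative variable name:

```r
# Query a single parameter
par("mfrow")

# Setting parameters returns the previous values, which makes restoring easy
old_par <- par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))
plot(mtcars$wt, mtcars$mpg, main = "Weight")
plot(mtcars$hp, mtcars$mpg, main = "Horsepower")
par(old_par)  # back to the earlier layout and margins
```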
4 Adding details to a plot
(1) lines(): add lines to an existing plot
# Create the numerical vector x
x <- seq(0, 10, length.out = 200)
# Compute the Gaussian density for x with mean 2 and standard deviation 0.2
gauss1 <- dnorm(x, mean = 2, sd = 0.2)
# Compute the Gaussian density with mean 4 and standard deviation 0.5
gauss2 <- dnorm(x, mean = 4, sd = 0.5)
# Plot the first Gaussian density
plot(x, gauss1, type = "l", ylab = "Gaussian probability density")
# Add lines for the second Gaussian density
lines(x, gauss2, lty = 2, lwd = 3)
(2) points()
In plot() or points(), the pch parameter can be set from a variable in the data.
# Create an empty plot using type = "n"
plot(mtcars$hp, mtcars$mpg, type = "n",
     xlab = "Horsepower", ylab = "Gas mileage")
# Add points with shapes determined by cylinder number
points(mtcars$hp, mtcars$mpg, pch = mtcars$cyl)
# Create a second empty plot
plot(mtcars$hp, mtcars$mpg, type = "n",
     xlab = "Horsepower", ylab = "Gas mileage")
# Add points with shapes as cylinder characters
points(mtcars$hp, mtcars$mpg,
       pch = as.character(mtcars$cyl))
(3) Adding a trend line from a linear regression model
abline() adds a straight line to an existing plot. The line is specified by the intercept parameter a and the slope parameter b.
For example, abline(a = 0, b = 1) adds an equality reference line (y = x) with intercept 0 and slope 1.
The parameters can also be taken from a linear regression model
# The whiteside data come from the MASS package
library(MASS)
# Build a linear regression model for the whiteside data
linear_model <- lm(Gas ~ Temp, data = whiteside)
# Create a Gas vs. Temp scatterplot from the whiteside data
plot(whiteside$Temp, whiteside$Gas)
# Use abline() to add the linear regression line
abline(linear_model, lty = 2)
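The equality reference line, abline(a = 0, b = 1), is typically used when both axes are on the same scale. The sketch below is not from the original post: it fits an illustrative model, fit, to the built-in mtcars data and compares fitted with observed values.

```r
# Illustrative model on built-in data
fit <- lm(mpg ~ wt, data = mtcars)

# Observed vs. fitted values share the same units, so y = x is meaningful
plot(mtcars$mpg, fitted(fit), xlab = "Observed mpg", ylab = "Fitted mpg")
abline(a = 0, b = 1, lty = 2)   # points on this line are predicted exactly
```

Points above the line are over-predicted by the model, points below it are under-predicted.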
(4) Using text() to label features of a plot
Parameters:
- x: x-coordinate(s) of the text
- y: y-coordinate(s) of the text
- labels: the label to draw at each (x, y) position
- adj: horizontal justification, normally between 0 and 1. With 0 the text starts at the x position and extends to the right, with 1 it ends at the x position and extends to the left, and 0.5 centers it
- cex: font size relative to the default
- font: font face (for example 1 plain, 2 bold, 3 italic)
- srt: rotation angle of the text, in degrees
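The effect of these parameters is easiest to see on a nearly empty plot. In this sketch the coordinates and labels are arbitrary choices made for illustration.

```r
# A single reference point at (5, 5) on a 0-10 canvas
plot(5, 5, xlim = c(0, 10), ylim = c(0, 10), pch = 19, xlab = "x", ylab = "y")
abline(v = 5, lty = 3)                        # marks the x position used below

text(5, 7, labels = "adj = 0",   adj = 0)     # starts at x = 5, extends right
text(5, 6, labels = "adj = 0.5", adj = 0.5)   # centered on x = 5
text(5, 4, labels = "adj = 1",   adj = 1)     # ends at x = 5, extends left
text(5, 2, labels = "cex = 1.5, font = 2", cex = 1.5, font = 2)  # larger, bold
text(1, 5, labels = "srt = 90", srt = 90)     # rotated by 90 degrees
```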
(5) legend()
Adds explanatory text (a legend) to a plot
# pch 3 is the plus sign, pch 1 the open circle
legend("topright", pch = c(3, 1), legend = c("Before", "After"))
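A legend is only useful if its symbols match those drawn in the plot. The following self-contained sketch uses mtcars and two illustrative groups (automatic and manual transmission), which are assumptions and not part of the original post.

```r
# Two groups from the built-in mtcars data: automatic (am = 0) and manual (am = 1)
auto   <- mtcars$am == 0
manual <- mtcars$am == 1

plot(mtcars$wt, mtcars$mpg, type = "n", xlab = "Weight", ylab = "Gas mileage")
points(mtcars$wt[auto],   mtcars$mpg[auto],   pch = 17)
points(mtcars$wt[manual], mtcars$mpg[manual], pch = 1)

# The pch values in the legend must match those used for the points
lg <- legend("topright", pch = c(17, 1), legend = c("Automatic", "Manual"))
```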
(6) Using axis() to add custom axes
To use your own axis labels, set axes = FALSE in the plotting function to suppress the default axes, then call axis() to draw the custom axes
Parameters of axis():
side: axis position, 1 bottom, 2 left, 3 top, 4 right
at: the points at which tick marks are drawn
labels: the label for each tick mark
# Create a boxplot of sugars by shelf value, without axes
boxplot(sugars ~ shelf, data = UScereal,
        axes = FALSE)
# Add a default y-axis to the left of the boxplot
axis(side = 2)
# Add an x-axis below the plot, labelled 1, 2, and 3
axis(side = 1, at = c(1, 2, 3))
# Add a second x-axis above the plot
axis(side = 3, at = c(1, 2, 3),
     labels = c("floor", "middle", "top"))
(7) Using supsmu() to add a smooth trend curve
Some scatterplots are clearly not linear, and a curve is needed to bring out the behavior of the data. The bass parameter controls the smoothness of the trend curve: the default is 0, and larger values (up to 10) produce smoother curves.
# Create a scatterplot of MPG.city vs. Horsepower
plot(Cars93$Horsepower, Cars93$MPG.city)
# Call supsmu() to generate a smooth trend curve, with default bass
trend1 <- supsmu(Cars93$Horsepower, Cars93$MPG.city)
# Add this trend curve to the plot
lines(trend1)
# Call supsmu() for a second trend curve, with bass = 10
trend2 <- supsmu(Cars93$Horsepower, Cars93$MPG.city,
                 bass = 10)
# Add this trend curve as a heavy, dotted line
lines(trend2, lty = 3, lwd = 2)
5 Judging how many scatterplots are too many
matplot() draws several scatterplots on the same set of axes. The points of each scatterplot are drawn as a digit from 1 to n, where n is the total number of scatterplots included.
# df: a data frame with the columns used below (the names match UScereal from MASS)
# Set up a two-by-two plot array
par(mfrow = c(2, 2))
# Use matplot() to generate an array of two scatterplots
matplot(df$calories, df[, c("protein", "fat")],
        xlab = "calories", ylab = "")
# Add a title
title("Two scatterplots")
# Use matplot() to generate an array of three scatterplots
matplot(df$calories, df[, c("protein", "fat", "fibre")],
        xlab = "calories", ylab = "")
# Add a title
title("Three scatterplots")
# Use matplot() to generate an array of four scatterplots
matplot(df$calories,
        df[, c("protein", "fat", "fibre", "carbo")],
        xlab = "calories", ylab = "")
# Add a title
title("Four scatterplots")
# Use matplot() to generate an array of five scatterplots
matplot(df$calories,
        df[, c("protein", "fat", "fibre", "carbo", "sugars")],
        xlab = "calories", ylab = "")
# Add a title
title("Five scatterplots")
6 Judging how many words are too many
wordcloud() (wordcloud package) draws words in different sizes according to how often they occur: more frequent words are drawn larger, less frequent words smaller.
First argument (words): character vector of the words
Second argument (freq): numeric vector with the number of occurrences of each word
scale: a two-element vector giving the relative sizes of the largest and smallest words
min.freq: only words occurring at least min.freq times are included in the word cloud; the default is 3.
# Create the wordcloud of all model names with smaller scaling
wordcloud(words = names(model_table),
          freq = as.numeric(model_table),
          scale = c(0.75, 0.25),
          min.freq = 1)
7 Using multiple plots to examine data
# truehist() and the geyser data come from MASS; qqPlot() comes from car
library(MASS)
library(car)
# Set up a two-by-two plot array
par(mfrow = c(2, 2))
# Plot the raw duration data
plot(geyser$duration, main = "Raw data")
# Plot the normalized histogram of the duration data
truehist(geyser$duration, main = "Histogram")
# Plot the density of the duration data
plot(density(geyser$duration), main = "Density")
# Construct the normal QQ-plot of the duration data
qqPlot(geyser$duration, main = "QQ-plot")
8 Constructing and displaying layout matrices
1. Use matrix() to build a matrix of plot positions, then call layout() to create the plot array; layout.show() is used to verify the shape of the plot array.
# Define row1, row2, row3 for plots 1, 2, and 3
row1 <- c(0, 1)
row2 <- c(2, 0)
row3 <- c(0, 3)
# Use the matrix function to combine these rows into a matrix
layoutMatrix <- matrix(c(row1, row2, row3),
                       byrow = TRUE, nrow = 3)
# Call the layout() function to set up the plot array
layout(layoutMatrix)
# Show where the three plots will go
layout.show(n = 3)
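The same layout can be written as a single self-contained sketch. It relies on layout() invisibly returning the number of figures, which saves hard-coding the count passed to layout.show(); n_plots is an illustrative variable name.

```r
# 3 x 2 layout matrix: 0 marks an empty cell, 1 to 3 are the plot slots
layoutMatrix <- matrix(c(0, 1,
                         2, 0,
                         0, 3), byrow = TRUE, nrow = 3)

# layout() invisibly returns the number of figures it will hold
n_plots <- layout(layoutMatrix)
layout.show(n_plots)

# Reset to a single plot per page when done
layout(1)
```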
2. Creating a plot array
# Set up the plot array
layout(layoutMatrix)
# Construct the vectors indexB and indexA
indexB <- which(whiteside$Insul == "Before")
indexA <- which(whiteside$Insul == "After")
# Create plot 1 and add title
plot(whiteside$Temp[indexB], whiteside$Gas[indexB],
     ylim = c(0, 8))
title("Before data only")
# Create plot 2 and add title
plot(whiteside$Temp, whiteside$Gas,
     ylim = c(0, 8))
title("Complete dataset")
# Create plot 3 and add title
plot(whiteside$Temp[indexA], whiteside$Gas[indexA],
     ylim = c(0, 8))
title("After data only")
3. Creating arrays with plots of different sizes and shapes
# Create row1, row2, and layoutVector
row1 <- c(1, 0, 0)
row2 <- c(0, 2, 2)
layoutVector <- c(row1, rep(row2, 2))
# Convert layoutVector into layoutMatrix
layoutMatrix <- matrix(layoutVector, byrow = TRUE, nrow = 3)
# Set up the plot array
layout(layoutMatrix)
# Plot scatterplot
plot(Boston$rad, Boston$zn)
# Plot sunflower plot
sunflowerplot(Boston$rad, Boston$zn)
9 Graphics functions can return useful information
Besides drawing the plot, barplot() returns a numeric vector giving the center of each bar.
This return value is useful for placing text on the bars of a horizontal bar chart: it can be passed as the y argument of text(), which puts the text at the vertical middle of each horizontal bar at any chosen x position.
# Create a table of Cylinders frequencies
tbl <- table(Cars93$Cylinders)
# Generate a horizontal barplot of these frequencies
mids <- barplot(tbl, horiz = TRUE,
                col = "transparent",
                names.arg = "")
# Add names labels with text(); mids supplies y, the x position (20) is an arbitrary choice
text(20, mids, names(tbl))
# Add count labels with text(), at another arbitrary x position (35)
text(35, mids, as.numeric(tbl))
10 Saving graphics to a file
PNG files are easy to share, for example as email attachments. png() creates and names a PNG file and opens a graphics device that captures all graphics output until the device is closed with dev.off().
# Call png() with the name of the file we want to create
png("bubbleplot.png")
# Re-create the plot from the last exercise
symbols(Cars93$Horsepower, Cars93$MPG.city,
        circles = as.numeric(Cars93$Cylinders),
        inches = 0.2)
# Save our file and return to our interactive session
dev.off()
# Verify that we have created the file
list.files(pattern = "png")
11 Colors in graphics
1. Recommended colors
IScolors <- c("red", "green", "yellow", "blue", "black", "white", "pink", "cyan", "gray", "orange", "brown", "purple")
2. Using color to enhance the bubble chart
# Iliinsky and Steele color name vector
IScolors <- c("red", "green", "yellow", "blue",
              "black", "white", "pink", "cyan",
              "gray", "orange", "brown", "purple")
# Create the colored bubble plot
# (Cylinders is a factor, so as.numeric() converts it to its level codes)
symbols(Cars93$Horsepower, Cars93$MPG.city,
        circles = as.numeric(Cars93$Cylinders), inches = 0.2,
        bg = IScolors[as.numeric(Cars93$Cylinders)])
3. Using color to enhance stacked bar charts
By default, barplot() draws the segments of each bar in shades of gray
# Create a table of Cylinders by Origin
tbl <- table(Cars93$Cylinders, Cars93$Origin)
# Create the default stacked barplot
barplot(tbl)
# Enhance this plot with color
barplot(tbl, col = IScolors)