Study notes on "Data Analysis R language Combat", Chapter 4: Data description

Source: Internet
Author: User

4.1 R Drawing Overview

The following two commands show demonstrations of two-dimensional and three-dimensional graphs, respectively:

> demo(graphics)

> demo(persp)

R provides a variety of drawing-related commands, which fall into three categories:

High-level drawing commands: create a new plot on a graphics device, possibly including axes, labels, titles, and so on.

Low-level drawing commands: add more graphic elements, such as extra points, lines, and labels, to an existing plot.

Interactive graphics commands: let you use the mouse to add graphical information to, or extract it from, an existing plot.

Plotting in R mainly follows these steps:

① Take the raw data and prepare the variables required for the plot.

② Set up the drawing area and split it if necessary.

③ Draw the graphic, for example by creating the axes and drawing a scatter plot, curve, or other type of diagram.

④ Annotate the graphic, including adding titles, axis labels, text annotations, and so on.

⑤ Format the graphic and add a legend. This includes setting the line width, line style, and colors, the shape, size, and color of the plotted points, and the axis formatting.

⑥ Save and export the graphic in the specified file format and with the specified properties, for later use.
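
The six steps above can be sketched in base R as follows. This is only an illustrative outline: the simulated data, the file name workflow_demo.pdf, and the variable names are assumptions, not part of the original notes. The file device is opened first so that everything drawn afterwards ends up in the saved file.

```r
# Step 1: prepare the data (simulated values, for illustration only)
set.seed(1)
x <- seq(0, 10, length.out = 50)
y <- 2 * x + rnorm(50, sd = 2)

# Step 6 (set up early): open a file device so the result is saved
out_file <- file.path(tempdir(), "workflow_demo.pdf")
pdf(file = out_file, width = 8, height = 4)

# Step 2: split the drawing area into 1 row x 2 columns
par(mfrow = c(1, 2))

# Step 3: draw the graphic (labels are left empty and added in step 4)
plot(x, y, pch = 16, col = "steelblue", xlab = "", ylab = "")

# Step 4: annotate with a title and axis labels
title(main = "Scatter plot", xlab = "x", ylab = "y")

# Step 5: format and add a legend
abline(lm(y ~ x), lwd = 2, lty = 2, col = "red")
legend("topleft", legend = c("data", "linear fit"),
       pch = c(16, NA), lty = c(NA, 2), col = c("steelblue", "red"))

# Second frame of the split area
hist(y, col = "grey80", main = "Histogram of y")

# Step 6 (finish): close the device to write the file
dev.off()
```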

4.2 Plot Area Segmentation

Three main functions, par(), layout(), and split.screen(), can be used to split the graphics area.

4.2.1 Function par()

The function par() divides the drawing area into a regular grid. For example, par(mfrow = c(3, 2)) divides the graphics area into a 3x2 array of frames, each of which displays one graphic; the frames are filled row by row. Use mfcol instead to fill them column by column.
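
A minimal sketch of the difference between mfrow and mfcol. The plotted curves and panel titles are arbitrary and only serve to show the order in which the frames are filled.

```r
# 3 rows x 2 columns, filled row by row; par() returns the old settings
old_par <- par(mfrow = c(3, 2))
for (i in 1:6) {
  plot(1:10, (1:10)^(i / 2), type = "b", main = paste("Panel", i))
}

# Same grid, but filled column by column
par(mfcol = c(3, 2))
for (i in 1:6) {
  plot(1:10, (1:10)^(i / 2), type = "b", main = paste("Panel", i))
}

# Restore the layout that was active before the first par() call
par(old_par)
```

With mfrow, panel 2 appears to the right of panel 1; with mfcol, it appears below it.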

4.2.2 Function layout()

The main argument of layout() is a matrix; by defining this matrix, the graphics area can be split flexibly. Note that matrix() fills its entries by column by default.

layout(mat, widths = rep.int(1, ncol(mat)), heights = rep.int(1, nrow(mat)), respect = FALSE)

mat is a matrix that defines how the window is partitioned. A 0 element means that nothing is drawn at that position; the nonzero elements must be consecutive integers starting from 1, that is, 1, 2, ..., N, and their values determine the order in which the figures are drawn. widths sets the widths of the columns of the window, and heights sets the heights of the rows. For example:

layout(matrix(1:4, 2, 2)) # split the drawing area into a 2x2 grid of frames

layout(matrix(c(1, 3, 2, 3), 2, 2)) # split the graphics area into three irregular regions

layout(matrix(c(1, 1, 2, 3, 2, 3), 2, 3)) # split the graphics area into another irregular arrangement

After the partition is complete, the resulting structure can be displayed with layout.show(3). To cancel the segmentation of the graphics area, enter the command layout(1).
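
As a small illustration of widths and heights, the sketch below builds a layout in which figure 1 spans the whole top row. The matrix, the width and height ratios, and the plotted data are all assumptions chosen for the example.

```r
# Figure 1 occupies both cells of the top row; figures 2 and 3 share the bottom row
m <- matrix(c(1, 1,
              2, 3), nrow = 2, byrow = TRUE)

# Left column twice as wide as the right; bottom row twice as tall as the top.
# layout() returns the number of figures in the layout.
n_fig <- layout(m, widths = c(2, 1), heights = c(1, 2))

layout.show(n_fig)                      # outline and number the regions

set.seed(3)
v <- rnorm(100)
plot(v, type = "l", main = "Figure 1")  # drawn in region 1
hist(v, main = "Figure 2")              # drawn in region 2
boxplot(v, main = "Figure 3")           # drawn in region 3

layout(1)                               # cancel the split
```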

4.2.3 Function split.screen()

split.screen() also uses a vector or matrix to split the area flexibly.

> split.screen(c(2, 1)) # split the graphics area into an upper and a lower part

[1] 1 2

> split.screen(c(1, 2), screen = 2) # split the second part (lower half) into two regions

[1] 3 4

> screen(1) # prepare to draw in the first region

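A short sketch of a complete split.screen() session, including drawing and cleanup. The split c(1, 2) for the lower half is an assumption (one row, two columns), and the plotted values are made up.

```r
# Split the device into an upper and a lower screen (screens 1 and 2)
top_bottom <- split.screen(c(2, 1))

# Assumed split: divide screen 2 into 1 row x 2 columns (screens 3 and 4)
left_right <- split.screen(c(1, 2), screen = 2)

vals <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)

screen(1)                          # select the upper screen
plot(1:10, type = "b", main = "Screen 1")

screen(3)                          # lower left
hist(vals, main = "Screen 3", xlab = "value")

screen(4)                          # lower right
boxplot(vals, main = "Screen 4")

close.screen(all.screens = TRUE)   # leave split-screen mode
```

Unlike par(mfrow = ...), screens can be selected in any order with screen(), which is convenient when one region is redrawn several times.
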
4.3 Two-dimensional graphics

4.3.1 High-level drawing functions

1. Function plot()

plot() is the most commonly used high-level drawing function. It is a generic function, so the graphic it produces depends on the type of its argument.
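
The following sketch illustrates how the output of plot() changes with the class of its argument. The small vectors are made-up data used only to trigger the different methods.

```r
x <- c(2.1, 3.4, 1.9, 5.6, 4.2)
f <- factor(c("a", "b", "a", "c", "b"))

plot(x)                                         # numeric vector: values against their index
plot(f)                                         # factor: bar chart of level counts
plot(f, x)                                      # factor and numeric: one box plot per level
plot(data.frame(u = x, v = x^2, w = sqrt(x)))   # data frame: scatter-plot matrix
plot(sin, -pi, pi)                              # function: curve over the given range
```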

Other high-level drawing functions:

Parameter settings for high-level drawing functions:

Function hist()

hist(x, breaks = "Sturges", freq = NULL, probability = !freq, include.lowest = TRUE, right = TRUE, density = NULL, angle = 45, col = NULL, border = NULL, main = paste("Histogram of", xname), xlim = range(breaks), ylim = NULL, xlab = xname, ylab, axes = TRUE, plot = TRUE, labels = FALSE, nclass = NULL, warn.unused = TRUE, ...)
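
A brief sketch of a few of these parameters in use, on simulated data (the sample and its parameters are assumptions). With plot = FALSE, hist() only returns the computed bins; with freq = FALSE, the y-axis shows densities, so a density curve can be overlaid.

```r
set.seed(10)
x <- rnorm(200, mean = 50, sd = 10)

# Compute the bins without drawing anything
h <- hist(x, breaks = "Sturges", plot = FALSE)

# Draw a density-scaled histogram with roughly 15 bins and overlay a kernel density curve
hist(x, breaks = 15, freq = FALSE, col = "lightgray", border = "white",
     main = "Histogram of x with density curve", xlab = "x")
lines(density(x), lwd = 2)
```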

4.3.2 Multivariate data plotting

> data(warpbreaks)

> coplot(breaks ~ 1:54 | wool * tension, data = warpbreaks, col = "red", bg = "pink", pch = 21, bar.bg = c(fac = "lightblue"))

4.3.3 Low-level drawing functions

Once a basic graphic has been drawn with a high-level function, low-level drawing functions can add new graphic elements to it, such as points, legends, and annotations.
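
A minimal sketch of this pattern: a high-level call creates an empty frame, and low-level calls fill it in. The simulated data and the label text are assumptions for illustration.

```r
set.seed(42)
x <- 1:20
y <- x + rnorm(20)

plot(x, y, type = "n", main = "Low-level additions")  # high-level call: axes only, no points

points(x, y, pch = 19, col = "blue")                  # add the points
fit <- lm(y ~ x)
abline(fit, col = "red", lwd = 2)                     # add the least-squares line
lines(x, x, lty = 3)                                  # add the reference line y = x
text(5, max(y), "annotation", pos = 4)                # add a text label
legend("bottomright",
       legend = c("data", "least-squares fit", "y = x"),
       pch = c(19, NA, NA), lty = c(NA, 1, 3),
       col = c("blue", "red", "black"))               # add a legend
```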

4.3.4 Graphic Beautification

4.3.5 Interactive Drawing Commands

R's interactive functions let users extract or add information directly on a graph with the mouse. The simplest and most commonly used is:

locator(n, type = "n", ...)

> x = rnorm(10)

> plot(x)

> locator(5, "o", col = "red")

$x

[1] 1.929092 4.018157 6.998556 10.034663

[5] 7.945598

$y

[1] 1.21499224 0.97074910 0.43574030

[4] 0.05192964 -0.70406106

Another interesting interactive function in R is identify(), which is used to find points in a scatter plot. When the mouse is clicked in the graph, it reads the coordinates of the pointer and searches the points given by (x, y); if one of them is close enough to the pointer, its label is drawn next to it on the plot and its index is returned.

identify(x, y, labels, ...)
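
A small sketch of identify() in use. The coordinates and labels are made up, and the call is wrapped in interactive() because it needs a mouse-capable device; in a non-interactive session the selection simply stays empty.

```r
x <- c(1.2, 2.5, 3.1, 4.8, 5.0)
y <- c(2.0, 1.5, 3.7, 2.9, 4.4)
labs <- c("A", "B", "C", "D", "E")

plot(x, y, pch = 19)

picked <- integer(0)
if (interactive()) {
  # Click near up to two points; their labels are drawn on the plot
  picked <- identify(x, y, labels = labs, n = 2)
}

# Indices returned by identify() can be used to look up the selected observations
print(labs[picked])
```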

4.4 Three-dimensional graphics

R has three basic functions for drawing three-dimensional graphs:

image(x, y, z) produces a rectangular grid in which the z values are represented by different colors.

contour(x, y, z) draws contour lines of the z values.

persp(x, y, z) produces a 3D surface.
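
The three functions can be compared on the same surface. In this sketch the grid and the function z = sin(sqrt(x^2 + y^2)) are illustrative choices; all three functions expect z as a matrix with one row per x value and one column per y value, which outer() provides.

```r
x <- seq(-pi, pi, length.out = 30)
y <- seq(-pi, pi, length.out = 30)

# z[i, j] = sin(sqrt(x[i]^2 + y[j]^2))
z <- outer(x, y, function(a, b) sin(sqrt(a^2 + b^2)))

image(x, y, z, main = "image()")        # colors encode z
contour(x, y, z, main = "contour()")    # contour lines of z
persp(x, y, z, theta = 30, phi = 30,
      col = "lightblue", main = "persp()")  # 3D surface, rotated for visibility
```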

4.5 Lattice Package

The lattice package draws graphs for multivariate datasets; most of its functions take a formula as the primary argument.

For example, y ~ x | z plots y against x, drawing a separate panel for each value of the conditioning variable z.

> library(ggplot2)

> library(lattice)

> data(diamonds, package = "ggplot2")

> sample = diamonds[sample(nrow(diamonds), 1000), ]

> xyplot(price ~ carat, data = sample, groups = cut, auto.key = list(corner = c(1, 0)), type = c("p", "smooth"), span = .7, main = "Price vs. Carat")

To compare data across the levels of a categorical variable, it is sometimes necessary to split the graphics area. With lattice this is easy: just set the parameter layout. lattice also contains functions that draw three-dimensional graphics: cloud() draws a three-dimensional scatter plot, similar in effect to plot3d() but with support for grouping, and wireframe() draws a 3D surface, similar in effect to persp() in the base graphics.

> x = seq(-pi, pi, len = 20)

> y = seq(-pi, pi, len = 20)

> g = expand.grid(x = x, y = y)

> g$z = sin(sqrt(g$x^2 + g$y^2))

> wireframe(z ~ x * y, data = g, drape = TRUE, aspect = c(3, 1), colorkey = TRUE, main = expression(z == sin(sqrt(x^2 + y^2))))

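The layout parameter and cloud() mentioned above can be sketched as follows. To keep the example free of extra dependencies it uses the built-in mtcars data instead of diamonds; that substitution and the chosen variables are assumptions.

```r
library(lattice)

# One panel per number of cylinders, arranged as 3 columns x 1 row via layout
p_panels <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
                   layout = c(3, 1),
                   type = c("p", "r"),   # points plus a regression line per panel
                   xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
print(p_panels)

# Grouped three-dimensional scatter plot
p_cloud <- cloud(mpg ~ wt * hp, data = mtcars, groups = factor(cyl),
                 auto.key = list(columns = 3))
print(p_cloud)
```

lattice functions return a trellis object rather than drawing immediately, so print() is needed when the code runs inside a script or function.
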
4.6 ggplot2 Package

ggplot2 is a high-level plotting package for R. It treats drawing as a mapping from a mathematical space (the data) to a space of graphical elements, for example mapping different values to different colors or other graphical properties. ggplot2 uses a Photoshop-like layer design, allowing the user to build graphics step by step and making individual layers easier to modify.

4.6.1 Quick Draw

qplot(x, y = NULL, ..., data, facets = NULL, margins = FALSE, geom = "auto", stat = list(NULL), position = list(NULL), xlim = c(NA, NA), ylim = c(NA, NA), log = "", main = NULL, xlab = deparse(substitute(x)), ylab = deparse(substitute(y)), asp = NA)

Take the diamonds dataset as an example:

> sample = diamonds[sample(nrow(diamonds), 200), ]

> qplot(carat, price, data = sample, shape = cut, color = color)

A smooth curve can be added to the scatter plot above. The method parameter specifies the curve-fitting method and defaults to method = "loess" (local regression smoothing). The parameter span controls how smooth the curve is: the larger the value, the smoother the curve.

> qplot(carat, price, data = sample, geom = c("point", "smooth"), span = .3)

Use qplot() to draw a more attractive histogram of the variable carat:

> qplot(carat, data = diamonds, geom = "histogram", binwidth = .1, xlim = c(0, 3), fill = color)

4.6.2 Layered drawing

(1) Data and mappings

ggplot(data, mapping = aes(x, y, <other aesthetics>))

Here data specifies the dataset, and the parameter mapping builds the mapping, usually with the function aes() to refer to the variables. Other aesthetics, such as color, shape, and size, can also be mapped to categorical variables.

> sample = diamonds[sample(nrow(diamonds), 1000), ]

> p = ggplot(data = sample, mapping = aes(x = carat, y = price, color = clarity)) # the first layer is stored in p

(2) Geometric objects

After the base layer has determined the data source and mapping, new layers can be added one after another with the plus sign (+). The second layer adds a function of the geom class, which draws the graphic elements in the diagram, such as points, lines, and polygons; geoms can also be used to draw other types of graphics, such as histograms, box plots, and so on.

The basic parameters of these functions are the same. Take a scatter plot as an example:

geom_point(mapping = NULL, data = NULL, stat = "identity", position = "identity", na.rm = FALSE, ...)

The parameter mapping builds the mapping and data specifies the dataset; both can be omitted if they were already specified in the first layer. stat applies a statistical transformation to this layer's data. position adjusts the position of this layer's graphics and is often used for bar charts and histograms: "identity" displays the elements as they are, "dodge" places them side by side, "stack" stacks them, and "fill" stacks them and shows relative proportions. "jitter" adds random disturbance and is often applied to scatter plots to prevent points from overlapping.

> p + geom_point() + geom_smooth()

To fit a single smooth curve to all of the data in the graphic above, map color only in the point layer:

> p = ggplot(data = sample, aes(x = carat, y = price))

> p + geom_point(aes(color = clarity)) + geom_smooth()

When mapping data, the function aes() can be used to set the graphic style: the parameters color, shape, and size specify which variables control the color, shape, and size of the points, so that even a simple scatter plot can convey a large amount of information.

> sample = diamonds[sample(nrow(diamonds), 100), ]

> p = ggplot(data = sample, aes(x = carat, y = price))

> p + geom_point(aes(color = color, shape = cut, size = clarity), alpha = .5, position = "jitter")

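The position values described in this section can be compared directly on a bar chart. This is a minimal sketch: the choice of cut and clarity as variables and the 500-row subset used for the jitter example are assumptions.

```r
library(ggplot2)

# Counts of diamonds per cut, split by clarity; only `position` differs below
base <- ggplot(diamonds, aes(x = cut, fill = clarity))

p_stack <- base + geom_bar(position = "stack")  # segments stacked on top of each other
p_dodge <- base + geom_bar(position = "dodge")  # bars placed side by side
p_fill  <- base + geom_bar(position = "fill")   # stacked and rescaled to proportions

# Jitter adds small random noise so that overlapping points become visible
p_jitter <- ggplot(diamonds[1:500, ], aes(x = cut, y = price)) +
  geom_point(position = "jitter", alpha = 0.5)

print(p_dodge)
```
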
(3) Scale

Scales control how graphical properties are displayed, mainly the axis scales, the color values, the legend style, and so on. Using a scale function is equivalent to adding a new layer, so it is still attached with "+". Apart from the base layer ggplot(), the other layer settings can also be applied to plots created with qplot().

The scale functions that set the axis style generally begin with "scale_x" (or "scale_y" for the y-axis).
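
A short sketch of scale functions for both axes and for color. The axis names, break positions, palette, and the 500-row sample are illustrative assumptions.

```r
library(ggplot2)

set.seed(123)
smp <- diamonds[sample(nrow(diamonds), 500), ]

p <- ggplot(smp, aes(x = carat, y = price, color = cut)) +
  geom_point(alpha = 0.6) +
  scale_x_continuous(name = "Carat", breaks = seq(0, 5, by = 0.5)) +  # x-axis title and ticks
  scale_y_log10(name = "Price (log10 scale)") +                       # log-transformed y-axis
  scale_color_brewer(palette = "Set1", name = "Cut quality")          # colors and legend title

print(p)
```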

(4) Statistical transformations

Statistical transformation functions, which begin with "stat", apply a transformation to the original data; this is a very important capability. We can define custom functions, compute them on the original data and show the result on the graph, or change the default statistical parameters used by the geom_ functions.

For example, use stat_smooth() to smooth the data and add a nonlinear regression line to the carat-price scatter plot.

> sample = diamonds[sample(nrow(diamonds), 1000), ]

> ggplot(sample, aes(x = carat, y = price)) + geom_point() + scale_y_log10() + stat_smooth()

Here the second layer adds the scatter points, the third layer applies a log10 transformation to the y-axis, and the fourth layer adds the smoothing statistical transformation.

(5) Facet

When we want to observe the effect of a categorical variable on the data, distinguishing groups by color or shape alone is often not enough; the data need to be grouped by the values of the variable and plotted separately. This is what faceting is for: it controls how the data are grouped and how the panels are arranged, producing conditional plots.

Common functions are facet_wrap(~x, ncol), where x is the grouping variable and ncol sets the number of columns in which the panels are arranged. facet_grid(x ~ .) can be used as an alternative.

> ggplot(sample, aes(x = carat, y = price)) + geom_point(aes(colour = cut)) + scale_y_log10() + stat_smooth() + facet_wrap(~cut, ncol = 3)

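The two faceting functions can be placed side by side as follows. This is a sketch on a random 1000-row sample of diamonds; the seed and the object names are assumptions.

```r
library(ggplot2)

set.seed(7)
smp <- diamonds[sample(nrow(diamonds), 1000), ]

base <- ggplot(smp, aes(x = carat, y = price)) +
  geom_point(alpha = 0.4) +
  scale_y_log10()

# facet_wrap: panels wrapped into a grid with 3 columns
p_wrap <- base + facet_wrap(~cut, ncol = 3)

# facet_grid: one row of panels per level of cut, sharing a single column
p_grid <- base + facet_grid(cut ~ .)

print(p_wrap)
print(p_grid)
```

facet_wrap() suits a single grouping variable with many levels, while facet_grid() lays panels out strictly by rows and columns, which is useful when two grouping variables are crossed.
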
(6) Coordinate system

4.7 Saving Graphics

After you complete the drawing, the final step is to save and export the graphic in the specified file format and with the specified properties, for later use. Finished graphs can be saved in several formats; the name of the function that opens each device matches the file extension. The file formats that can be generated include PNG, JPEG, and PDF:

png(file = "myplot.png", bg = "transparent")

jpeg(file = "myplot.jpeg")

pdf(file = "myplot.pdf")

After the device is opened, drawing happens in the background by default and nothing is shown on screen, so you need to close the file with dev.off() before viewing the graphics file.
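
The open-draw-close sequence can be sketched as below. The example writes to a temporary directory so that it does not touch the working directory; the file name and the plotted data are assumptions.

```r
# Build a path inside the session's temporary directory
out_pdf <- file.path(tempdir(), "myplot.pdf")

pdf(file = out_pdf, width = 6, height = 4)   # open the file device
plot(1:10, (1:10)^2, type = "b",
     main = "Saved to PDF")                  # drawn into the file, not on screen
dev.off()                                    # close the device to finish writing

file.exists(out_pdf)                         # check that the file was created
```

The same pattern applies to png() and jpeg(); only the function that opens the device changes.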

In addition, the function ggsave() in the ggplot2 package can also be used to save graphics, and different file types can be specified.

ggsave(filename = default_name(plot), plot = last_plot(),

device = default_device(filename), path = NULL, scale = 1, ...)

filename specifies the path, name, and extension of the output file; the file path can also be set through path. plot is the graphic object to save and defaults to the last displayed graphic. device specifies the device to use and is by default inferred from the file extension. scale is a scaling factor. To save the pie chart above as a PDF file, a single instruction is enough:

> ggsave(filename = "D:/data/pie.pdf")

This creates a PDF file; using the extension .png instead saves the graphic in PNG format.