Data Visualization in R

Last Update:2017-03-09 Source: Internet

Author: User

R has four frameworks for generating graphics: base graphics, grid, lattice, and ggplot2.

Categorical data can be visualized with bar charts, dot charts, column charts, spine plots, mosaic plots, pie charts, and fourfold plots.
Continuous data can be visualized with box plots, histograms, scatter plots and their variants, and Pareto charts.

==============================================

I. Visualization of categorical data

1. Bar Chart
A bar chart can be drawn with the barplot function in the graphics package or with the barchart function in the lattice package. The examples use data from the RSADBE package.

(1)
> library(RSADBE)
> data(Severity_Counts)
> library(lattice)
> barchart(Severity_Counts, xlab="Bug Count", xlim=c(0,12000))
This loads the packages and the dataset. xlab sets the x-axis label, and xlim sets the range of the frequency axis.

(2)
> library(lattice)
> barplot(Severity_Counts, xlab="Bug Count", horiz=TRUE, xlim=c(0,12000))
Setting horiz to TRUE draws the bars horizontally.

(3)
> data(Bug_Metrics_Software)
> barplot(Bug_Metrics_Software[,,1], beside=TRUE, col=c("lightblue", "mistyrose", "lightcyan", "lavender", "cornsilk"), legend=c("JDT", "PDE", "Equinox", "Lucene", "Mylyn"))
> title(main="Before Release Bug Frequency", font.main=4)
beside=TRUE draws the bars of each group side by side; without it, a stacked bar chart is produced. col sets the colors, and legend sets the legend text.

(4)
> par(mfrow=c(1,2))
> barplot(Bug_Metrics_Software[,,1], beside=TRUE)
> barplot(Bug_Metrics_Software[,,2], beside=TRUE)
par(mfrow=c(1,2)) splits the plotting area into one row and two columns, so the two charts appear side by side.
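
The difference between stacked and side-by-side bars can also be tried without installing RSADBE. The following is a minimal sketch that assumes the built-in VADeaths matrix as a stand-in for the bug-count data; the variable names are illustrative only.

```r
# VADeaths ships with R: a 5 x 4 matrix of death rates (age group x population).
op <- par(mfrow = c(1, 2))            # one row, two columns of plots
mids_stacked <- barplot(VADeaths, main = "Stacked", cex.names = 0.6)
mids_beside  <- barplot(VADeaths, beside = TRUE, main = "Side by side",
                        cex.names = 0.6, legend.text = rownames(VADeaths))
par(op)                               # restore the previous layout
```

barplot() invisibly returns the bar midpoints: one value per column for the stacked chart, and one value per cell when beside=TRUE. These midpoints are useful for adding text or lines on top of the bars.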

2. Dot Chart
The dot chart, also known as the Cleveland dot plot, can be drawn with dotchart in the graphics package or dotplot in the lattice package.
(1)
> dotchart(Severity_Counts, col=15:16, lcolor="black", pch=2:3, labels=names(Severity_Counts), main="Dot Plot for the Before and After Release Bug Frequency", cex=1.5)
col=15:16 sets the colors, lcolor sets the color of the lines through the points, pch=2:3 sets the plotting symbols, labels and main set the displayed labels and title, and cex=1.5 scales text and symbols to 1.5 times the default size.

(2)
> par(mfrow=c(1,2))
> dotchart(Bug_Metrics_Software[,,1], gcolor=1:5, col=6:10, lcolor="black", pch=15:19, labels=names(Bug_Metrics_Software[,,1]), main="Before Release Bug Frequency", xlab="Frequency Count")
> dotchart(Bug_Metrics_Software[,,2], gcolor=1:5, col=6:10, lcolor="black", pch=15:19, labels=names(Bug_Metrics_Software[,,2]), main="After Release Bug Frequency", xlab="Frequency Count")
The two dot charts are displayed side by side in a single figure.
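
The lattice alternative, dotplot, is mentioned above but not shown. As a hedged sketch, the call below uses the built-in VADeaths matrix in place of the RSADBE data; the title and axis label are illustrative.

```r
library(lattice)

# One panel per column of the matrix, stacked in a single column of panels.
p <- dotplot(VADeaths, groups = FALSE, layout = c(1, 4),
             xlab = "Deaths per 1000", main = "Death rates in Virginia, 1940")
print(p)   # lattice plots are objects; print() draws them
```

Unlike dotchart, a lattice call returns a plot object, so it can be stored, modified, and drawn later.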

3. Line chart
Line charts work for both categorical and continuous data. For continuous data, use the plot() command with the type= option, such as:
> plot(Nile, type="l")

If the observations are in arbitrary order, the line jumps back and forth and no trend can be seen. Order the observations by the x variable first, keeping each y value paired with its x value, such as:
> ord <- order(mf$Length)
> plot(mf$Length[ord], mf$NO3[ord], type="l")
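
When sorting data for a line chart, both variables have to be reordered together; sorting only the x vector would pair each x with the wrong y. A small sketch with made-up numbers (the vectors x and y are hypothetical) shows how order() keeps the pairs intact:

```r
# Hypothetical measurements recorded in arbitrary order
x <- c(20, 5, 15, 10)
y <- c(8, 2, 6, 4)        # each y belongs to the x at the same position

ord <- order(x)           # positions that put x in increasing order
x_sorted <- x[ord]
y_sorted <- y[ord]        # y reordered with x, so the pairs stay intact

plot(x_sorted, y_sorted, type = "l", xlab = "x", ylab = "y")
```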

For categorical data, calling plot() directly does not show the category labels on the horizontal axis, so they have to be added by hand, such as:
> plot(rain, type="b", axes=FALSE, xlab="Month", ylab="Rainfall")
> month = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
> axis(side=1, at=1:length(rain), labels=month)
> axis(side=2)
> box()
First, axes=FALSE suppresses the default axes. axis() then draws each axis explicitly: side= selects where the axis is drawn (1 for the bottom, 2 for the left, 3 for the top, and 4 for the right), at= gives the positions of the tick marks in the form 1:n, and labels= gives the tick labels. Finally, box() draws a frame around the plot.
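
The same steps can be run end to end as follows. This is a sketch under assumptions: the rain values are invented for illustration, and R's built-in month.abb constant supplies the month labels instead of a hand-typed vector.

```r
# Hypothetical monthly rainfall values, for illustration only
rain <- c(3, 5, 7, 5, 3, 2, 6, 8, 5, 6, 9, 8)

plot(rain, type = "b", axes = FALSE, xlab = "Month", ylab = "Rainfall")
axis(side = 1, at = seq_along(rain), labels = month.abb)  # bottom axis with month names
axis(side = 2)                                            # default left axis
box()                                                     # frame around the plot
```

Using month.abb avoids typing errors in the label vector, and seq_along(rain) keeps the tick positions in step with the data length.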

3. Spine Chart
Unlike a bar chart, the bars of a spine plot all have the same height, and their widths vary with the frequency. A spine plot can be drawn with the spineplot function.
> # Illustrative counts (assumed); replace them with your own 3 x 3 table of counts
> ShiftOperator <- matrix(c(40, 35, 28, 26, 40, 22, 52, 46, 49), nrow=3, dimnames=list(c("Shift 1", "Shift 2", "Shift 3"), c("Operator 1", "Operator 2", "Operator 3")), byrow=TRUE)
> spineplot(ShiftOperator)
> abline(h=0.33, lwd=3, col="red")
> abline(h=0.67, lwd=3, col="red")
> abline(v=0.33, lwd=3, col="green")
> abline(v=0.67, lwd=3, col="green")
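
To see what the widths and heights in a spine plot encode, the proportions can be computed directly. The sketch below assumes the built-in HairEyeColor table, collapsed to two dimensions, as example data; the variable names are illustrative.

```r
tab <- margin.table(HairEyeColor, c(1, 2))    # collapse over Sex: Hair x Eye

widths  <- prop.table(margin.table(tab, 1))   # share of each hair colour (bar widths)
heights <- prop.table(tab, margin = 1)        # eye-colour shares within each hair colour

print(round(widths, 3))
spineplot(tab)
```

Each bar's width corresponds to an entry of widths, and the segments inside a bar correspond to one row of heights, which always sums to 1.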

4. Mosaic Plot
Mosaic plots can be drawn with the mosaicplot function. The Titanic data serves as the example.
> xtabs(Freq~Class, data=Titanic)
> prop.table(xtabs(Freq~Class+Survived, data=Titanic), margin=1)
> xtabs(Freq~Sex, data=Titanic)
> prop.table(xtabs(Freq~Sex+Survived, data=Titanic), margin=1)
> xtabs(Freq~Age, data=Titanic)
> prop.table(xtabs(Freq~Age+Survived, data=Titanic), margin=1)
> mosaicplot(Titanic, col=c("red", "green"))
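
The role of margin=1 in prop.table is easy to miss: it divides every row by its row total, so each row shows conditional proportions. A brief sketch, assuming the built-in Titanic table converted to a data frame first (the names titanic_df, tab, and rates are illustrative):

```r
titanic_df <- as.data.frame(Titanic)             # columns: Class, Sex, Age, Survived, Freq

tab   <- xtabs(Freq ~ Class + Survived, data = titanic_df)
rates <- prop.table(tab, margin = 1)             # divide each row by its row total
print(round(rates, 3))

mosaicplot(tab, col = c("red", "green"), main = "Survival by class")
```

Because every row of rates sums to 1, the values can be read as survival and death proportions within each class, which is what the tile heights in the mosaic plot display.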

5. Pie Chart
Pie charts are simple, but they are often hard to read and compare. A pie chart is drawn with the pie() function.
> pie(Severity_Counts[1:5])
> title("Severity Counts Post-Release of JDT Software")

6. Fourfold Plot
The fourfold plot displays a 2 x 2 x k contingency table as k panels, one for each 2 x 2 table. In each panel, the frequencies of the four cells are drawn as quarter circles whose radii are proportional to the square root of the frequency. Unlike the slices of a pie chart, the sectors of a fourfold plot therefore differ in radius rather than in angle. Fourfold plots can be drawn with the fourfoldplot function.
> fourfoldplot(UCBAdmissions, mfrow=c(2,3), space=0.4)
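
The square-root relationship between frequency and radius can be checked by hand for a single 2 x 2 slice. The sketch below assumes department A of the built-in UCBAdmissions table and uses std = "ind.max", which, as far as the fourfoldplot documentation describes it, scales a table by its largest cell rather than standardizing the margins (the default).

```r
dept_a <- UCBAdmissions[, , "A"]         # one 2 x 2 slice: Admit x Gender

radii <- sqrt(dept_a / max(dept_a))      # relative radius of each quarter circle
print(round(radii, 2))

fourfoldplot(dept_a, std = "ind.max")    # largest cell gets radius 1
```

Because area grows with the square of the radius, taking the square root makes the area of each quarter circle proportional to the cell frequency.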

===================================================

II. Visualization of continuous data

1. Box Plot
The box plot is based on the five-number summary: minimum, lower quartile, median, upper quartile, and maximum. It can be drawn with the boxplot function in the graphics package or the bwplot function in the lattice package, such as:

(1)
> library(RSADBE)
> data(resistivity)
> boxplot(resistivity, range=0)

(2)
> library(lattice)
> resistivity2 <- data.frame(rep(names(resistivity), each=8), c(resistivity[,1], resistivity[,2]))
> names(resistivity2) <- c("Process", "Resistivity")
> bwplot(Resistivity~Process, data=resistivity2, notch=TRUE)

The chart can be refined with a few more settings, such as:
> boxplot(fw$count, fw$speed, names=c("count", "speed"), xlab="var", ylab="value", range=0, col="gray90")
names sets the group labels, xlab and ylab set the axis labels, range controls how far the whiskers extend (range=0 extends them to the minimum and maximum), and col sets the box color.

Setting horizontal=TRUE draws the box plot horizontally. If the data is a data frame made up of a response variable and a grouping (predictor) variable, the formula syntax response variable ~ predictor variable can be used, such as:
> boxplot(rich~graze, data=grass, horizontal=TRUE, range=0)

The response variable rich is on the left of ~, the predictor (grouping variable) graze is on the right, and the result is a horizontal box plot.
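
The effect of range=0 and the formula syntax can be verified on data that ships with R. This sketch assumes the built-in InsectSprays data frame (columns count and spray) in place of the grass data; boxplot() returns the statistics it draws, which makes the five-number summary visible.

```r
b <- boxplot(count ~ spray, data = InsectSprays,
             horizontal = TRUE, range = 0, col = "gray90")

# b$stats has one column per group:
# whisker end, lower hinge, median, upper hinge, whisker end
print(b$stats)

# With range = 0 the whiskers reach the extremes, so no points are flagged as outliers.
print(length(b$out))
```

Note that boxplot() uses hinges, which are close to but not always identical to the quartiles returned by quantile().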

2. Histogram
Histograms can be drawn with the hist function in the graphics package or the histogram function in the lattice package. The Galton data serves as the example.
> data(galton)
> par(mfrow=c(2,2))
> hist(galton$parent, breaks="FD", xlab="Height of Parent", main="Histogram for Parent Height with Freedman-Diaconis Breaks", xlim=c(60,75))
> hist(galton$parent, xlab="Height of Parent", main="Histogram for Parent Height with Sturges Breaks", xlim=c(60,75))
> hist(galton$child, breaks="FD", xlab="Height of Child", main="Histogram for Child Height with Freedman-Diaconis Breaks", xlim=c(60,75))
> hist(galton$child, xlab="Height of Child", main="Histogram for Child Height with Sturges Breaks", xlim=c(60,75))

Several options control the histogram; most of them apply to other plotting commands as well:
col: fill color
main: plot title
xlab: x-axis title
ylab: y-axis title
xlim: x-axis range
ylim: y-axis range
breaks: sets how the histogram is divided into bins
freq: logical option; TRUE plots frequencies (counts), FALSE plots probability densities
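
The breaks and freq options can be explored without plotting by asking hist() for its computed values. The sketch below assumes the built-in faithful data set as example data; the object names are illustrative.

```r
x <- faithful$eruptions                              # Old Faithful eruption times

h_sturges <- hist(x, plot = FALSE)                   # default: Sturges' rule
h_fd      <- hist(x, breaks = "FD", plot = FALSE)    # Freedman-Diaconis rule

# Number of bins chosen by each rule
print(c(sturges = length(h_sturges$counts), fd = length(h_fd$counts)))

# freq = FALSE plots densities, so the bar areas sum to 1
hist(x, breaks = "FD", freq = FALSE, col = "gray90",
     main = "Eruption times", xlab = "Minutes")
```

The counts always add up to the number of observations, while the densities multiplied by the bin widths add up to 1, which is what makes a density histogram comparable to a fitted density curve.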

3. Scatter Plot
Histograms describe a single variable; scatter plots show the relationship between variables. For two variables, use plot(x, y), where x and y are two vectors. If the data is a two-column data frame, the first column is used as x and the second as y. For a data frame with more than two columns, the pairs function draws a scatter plot matrix, for example:

(1)
> data(DCD)
> plot(DCD$Drain_current, DCD$GTS_Voltage, type="b", xlim=c(1,2.2), ylim=c(0.6,2.4), xlab="Current Drain", ylab="Voltage")
> points(DCD$Drain_current, DCD$GTS_Voltage/1.15, type="b", col="green")
The points function adds further points to an existing plot, and plot() accepts xlab and ylab to customize the axis labels.

The plotting symbol is set with the pch= option. pch accepts the numbers 0 to 25, each of which corresponds to a symbol, such as:
> plot(0:25, rep(1,26), pch=0:25, cex=2)
In this command, x is 0 to 25, y is 1 repeated 26 times with rep(), and cex= adjusts the symbol size.
Besides the numbers 0 to 25, a character can be given directly as the symbol, such as:
> plot(fw$count, fw$speed, pch="+", cex=4, col="gray90")
In this example, the symbol is +, the symbol size cex is 4, and the color col is gray90.

plot() chooses the axis limits from the data; they can also be set explicitly with xlim=c(start, end) and ylim=c(start, end), such as:
> plot(fw$count, fw$speed, xlim=c(0,30), ylim=c(0,50))

plot() also accepts formula syntax, with the dependent variable on the left of ~ and the independent variable on the right, such as:
> plot(count~speed, data=fw)

abline() adds a straight line to a scatter plot. lwd= sets the line width and lty= sets the line type, which takes a value from 0 to 6, such as:
> abline(lm(count~speed, data=fw), lty=3, lwd=2, col="gray90")
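
The formula interface and abline() fit together naturally when adding a regression line. Here is a minimal sketch that assumes the built-in cars data set (columns speed and dist) instead of the fw data; the axis labels are illustrative.

```r
plot(dist ~ speed, data = cars, pch = 16,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")

fit <- lm(dist ~ speed, data = cars)      # response on the left of ~, predictor on the right
abline(fit, lty = 3, lwd = 2, col = "red")

# abline() also draws horizontal (h =) or vertical (v =) reference lines
abline(h = mean(cars$dist), lty = 2)
```

Passing the fitted model to abline() uses its intercept and slope directly, so the line stays consistent with the model even if the data change.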

If the data is a data frame with several columns, plot() draws a scatter plot matrix of all the variables. To plot only some of the variables, use the formula form rather than $, such as:
> plot(~Length+Speed+NO3, data=mf)
> pairs(~Length+Speed+NO3, data=mf)
In this form, no dependent variable is needed on the left of ~.

(2)
With many variables, a scatter plot matrix shows every pairwise plot. Because the matrix is symmetric, only half of it needs to show scatter plots, and the remaining panels can show something else. This requires two custom panel functions, one that draws histograms on the diagonal and one that prints correlations in the upper triangle, such as:

> panel.hist <- function(x, ...) {
+   usr <- par("usr"); on.exit(par(usr = usr))
+   par(usr = c(usr[1:2], 0, 1.5))
+   h <- hist(x, plot = FALSE)
+   breaks <- h$breaks; nB <- length(breaks)
+   y <- h$counts
+   y <- y/max(y)
+   rect(breaks[-nB], 0, breaks[-1], y, col = "cyan", ...)
+ }
> panel.cor <- function(x, y, digits = 2, prefix = "", cex.cor, ...) {
+   usr <- par("usr"); on.exit(par(usr = usr))
+   par(usr = c(0, 1, 0, 1))
+   r <- abs(cor(x, y, use = "complete.obs"))
+   txt <- format(c(r, 0.123456789), digits = digits)[1]
+   txt <- paste(prefix, txt, sep = "")
+   if (missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
+   text(0.5, 0.5, txt, cex = cex.cor * r)
+ }
> data(Gasoline)
> pairs(Gasoline, diag.panel=panel.hist, lower.panel=panel.smooth, upper.panel=panel.cor)
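
When the goal is simply to avoid the redundant half of the matrix, a shorter route is to switch one triangle off. The sketch below assumes three columns of the built-in mtcars data as example variables and relies on pairs() accepting NULL for a panel argument.

```r
vars <- mtcars[, c("mpg", "wt", "hp")]

r <- cor(vars)
print(round(r, 2))            # symmetric: r[i, j] equals r[j, i]

# Draw only the lower triangle, with a smoothed trend line in each panel
pairs(vars, upper.panel = NULL, lower.panel = panel.smooth)
```

The symmetry of the correlation matrix is the reason nothing is lost by hiding one triangle: each upper panel would only repeat a lower panel with the axes swapped.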

4. Pareto Chart
The Pareto chart is based on the Pareto principle, also known as the 80-20 rule, and can be drawn with the pareto.chart function in the qcc package.
> library(qcc)
> reject_freq = c(9, 22, 15, 40, 8)
> names(reject_freq) = c("No Addr.", "Illegible", "Curr. Customer", "No Sign.", "Other")
> reject_freq
> options(digits=2)
> pareto.chart(reject_freq)
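
What a Pareto chart shows can be reproduced in base R, which helps when qcc is not installed. The following sketch reuses the rejection counts from the example above; it is a hand-rolled approximation, not the qcc implementation.

```r
reject_freq <- c("No Addr." = 9, "Illegible" = 22, "Curr. Customer" = 15,
                 "No Sign." = 40, "Other" = 8)

sorted  <- sort(reject_freq, decreasing = TRUE)    # largest cause first
cum_pct <- 100 * cumsum(sorted) / sum(sorted)      # cumulative percentage
print(round(cum_pct, 1))

# Bars in decreasing order, with the cumulative count drawn over them
mids <- barplot(sorted, las = 2, ylim = c(0, sum(sorted)), ylab = "Frequency")
lines(mids, cumsum(sorted), type = "b", pch = 19)
```

Sorting the categories before accumulating is the essential step: it puts the few causes that account for most of the rejections at the left of the chart.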

This article is an English version of an article originally published in Chinese on aliyun.com.