Getting Started with qplot (ggplot2 package)

Source: Internet
Author: User

The dataset used in this article is the diamonds dataset from the ggplot2 package, which contains price and quality information for about 54,000 diamonds. The data cover the four "C"s that reflect diamond quality: carat weight (carat), cut, color, and clarity; and five physical measurements: depth, table (the width of the diamond's top), x, y, and z.

Another dataset used in this article is a random sample of 100 rows drawn from the original data:

set.seed(1410) # make the sample reproducible

dsmall <- diamonds[sample(nrow(diamonds), 100), ]

1 Basic usage

Format: qplot(x, y, data = data1)

Example: qplot(carat, price, data = diamonds)

qplot(log(carat), log(price), data = diamonds)
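The calls above can be collected into one runnable sketch. It assumes the ggplot2 package is installed; the object names p_raw, p_log and p_full are illustrative only, and the last assignment shows what is assumed to be the equivalent form written with ggplot() and geom_point().

```r
library(ggplot2)  # provides qplot() and the diamonds dataset

# Reproducible random sample of 100 diamonds
set.seed(1410)
dsmall <- diamonds[sample(nrow(diamonds), 100), ]

# Scatterplot of price against carat on the raw scale
p_raw <- qplot(carat, price, data = diamonds)

# The same relationship after log-transforming both variables
p_log <- qplot(log(carat), log(price), data = diamonds)

# Assumed equivalent written with the full ggplot() grammar
p_full <- ggplot(diamonds, aes(log(carat), log(price))) + geom_point()
```

Typing p_log at the console, or calling print(p_log) in a script, draws the plot.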

Color, size, shape, and other aesthetic attributes

qplot(carat, price, data = dsmall, colour = color)

qplot(carat, price, data = dsmall, shape = cut)

qplot(carat, price, data = diamonds, alpha = I(1/100))

2 Geometric objects (geom)

geom = "point" draws a scatterplot.

geom = "smooth" fits a smooth curve and displays the curve together with its standard error.

geom = "boxplot" draws a box-and-whisker plot, which summarizes the distribution of a set of points.

geom = "path" and geom = "line" draw lines between data points. A line plot only connects points from left to right, whereas a path plot can go in any direction.

For one-dimensional distributions, the choice of geometric object depends on the type of the variable:

For continuous variables, geom = "histogram" draws a histogram, geom = "freqpoly" draws a frequency polygon, and geom = "density" draws a density curve.

For discrete variables, geom = "bar" draws a bar chart.

Example: adding a smooth curve to a scatterplot

qplot(carat, price, data = dsmall, geom = c("point", "smooth"))

qplot(carat, price, data = diamonds, geom = c("point", "smooth"))

To omit the standard error band:

qplot(carat, price, data = dsmall, geom = c("point", "smooth"), se = FALSE)

The method parameter selects the smoother (loess/gam/lm/rlm).

method = "loess", the default when n is small, uses local regression. The smoothness of the curve is controlled by the span parameter, which ranges from 0 (very wiggly) to 1 (very smooth).

Example: qplot(carat, price, data = dsmall, geom = c("point", "smooth"), span = 0.2)

Loess does not work well for large datasets, so when n exceeds 1000 a different smoothing algorithm is used by default.
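A minimal sketch of the effect of span, written with ggplot() and geom_smooth() on the assumption that this is the layer qplot() builds for geom = "smooth". The names scatter, p_wiggly, p_smooth and p_no_se are illustrative.

```r
library(ggplot2)

# Reproducible random sample of 100 diamonds
set.seed(1410)
dsmall <- diamonds[sample(nrow(diamonds), 100), ]

# Shared scatterplot layer
scatter <- ggplot(dsmall, aes(carat, price)) + geom_point()

# Small span: each local fit uses few neighbours, so the curve is wiggly
p_wiggly <- scatter +
  geom_smooth(method = "loess", formula = y ~ x, span = 0.2)

# Large span: each local fit uses every point, so the curve is very smooth
p_smooth <- scatter +
  geom_smooth(method = "loess", formula = y ~ x, span = 1)

# The same smoother drawn without the standard-error band
p_no_se <- scatter +
  geom_smooth(method = "loess", formula = y ~ x, span = 1, se = FALSE)
```

Printing p_wiggly and p_smooth one after the other makes the difference between the two span values visible.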

method = "gam", formula = y ~ s(x) calls the mgcv package to fit a generalized additive model. For large datasets, use y ~ s(x, bs = "cs"), which is the default when there are more than 1000 observations.

method = "lm" fits a straight line by default, but you can specify formula = y ~ poly(x, 2) to fit a quadratic polynomial, or load the splines package and use a natural spline: formula = y ~ ns(x, 2). The second argument is the degrees of freedom: the more degrees of freedom, the more wiggly the curve.

Example: library(splines)

qplot(carat, price, data = dsmall, geom = c("point", "smooth"), method = "lm", formula = y ~ ns(x, 5))
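The following sketch compares the linear-model smoothers described above. It uses ggplot() with geom_smooth(), assumed here to be the layer behind geom = "smooth", and also fits the same spline models directly with lm() to show how the degrees of freedom translate into coefficients. All object names are illustrative.

```r
library(ggplot2)
library(splines)  # provides ns() for natural splines

# Reproducible random sample of 100 diamonds
set.seed(1410)
dsmall <- diamonds[sample(nrow(diamonds), 100), ]

scatter <- ggplot(dsmall, aes(carat, price)) + geom_point()

# Straight line, quadratic polynomial, and natural spline with 5 df
p_line <- scatter + geom_smooth(method = "lm", formula = y ~ x)
p_quad <- scatter + geom_smooth(method = "lm", formula = y ~ poly(x, 2))
p_ns5  <- scatter + geom_smooth(method = "lm", formula = y ~ ns(x, 5))

# The same spline models fitted directly: each extra degree of freedom
# adds one basis column, so the count is the intercept plus df
fit_ns2 <- lm(price ~ ns(carat, 2), data = dsmall)
fit_ns5 <- lm(price ~ ns(carat, 5), data = dsmall)
n_coef <- c(ns2 = length(coef(fit_ns2)), ns5 = length(coef(fit_ns5)))
```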

method = "rlm" works like lm but uses a robust fitting algorithm, so the result is less sensitive to outliers. This method is part of the MASS package, which must be loaded before it is used.
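As a rough illustration of the robust smoother, the sketch below loads MASS and requests method = "rlm" through geom_smooth(), assumed to be the layer behind geom = "smooth". It also fits ordinary and robust regressions directly so that their coefficients can be compared. Object names are illustrative.

```r
library(ggplot2)
library(MASS)  # provides rlm(); must be loaded before method = "rlm" is used

# Reproducible random sample of 100 diamonds
set.seed(1410)
dsmall <- diamonds[sample(nrow(diamonds), 100), ]

# Scatterplot with a robust linear fit
p_rlm <- ggplot(dsmall, aes(carat, price)) +
  geom_point() +
  geom_smooth(method = "rlm", formula = y ~ x)

# Ordinary least squares versus robust regression on the same data
fit_lm  <- lm(price ~ carat, data = dsmall)
fit_rlm <- rlm(price ~ carat, data = dsmall)

# One row of coefficients per method, for side-by-side comparison
coef_table <- rbind(lm = coef(fit_lm), rlm = coef(fit_rlm))
```

Printing coef_table shows how far the robust estimates move away from the least-squares ones for this particular sample.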

Boxplots and jittered plots

If a dataset contains a categorical variable and one or more continuous variables, these two plot types show how the continuous variable varies across the levels of the categorical variable.

Example: the following plots show how the price per carat of a diamond varies with color

qplot(color, price / carat, data = diamonds, geom = "jitter")

qplot(color, price / carat, data = diamonds, geom = "boxplot")
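A short sketch of the two plots above, assuming ggplot2 is installed. The tapply() call is an addition for illustration: it computes the per-color medians of price per carat, which are the values drawn as the line inside each box.

```r
library(ggplot2)

# Jittered points: every diamond is drawn, with random horizontal noise
p_jitter <- qplot(color, price / carat, data = diamonds, geom = "jitter")

# Boxplot: the distribution within each color is reduced to a few summaries
p_box <- qplot(color, price / carat, data = diamonds, geom = "boxplot")

# Median price per carat for each color level
med_ppc <- tapply(diamonds$price / diamonds$carat, diamonds$color, median)
```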

Histograms and density plots

Histograms and density plots show the distribution of a single variable. The following two plots show the histogram and the density of diamond weight.

Example: qplot(carat, data = diamonds, geom = "density")

qplot(carat, data = diamonds, geom = "histogram")

For the histogram, the bin width is set with the binwidth parameter.

To compare distributions across groups, simply add an aesthetic mapping, such as:

qplot(carat, data = diamonds, geom = "density", colour = color)

qplot(carat, data = diamonds, geom = "histogram", binwidth = 0.1, xlim = c(0, 3), fill = color)
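To see how much the choice of binwidth matters, the sketch below draws the same histogram with three bin widths and adds a frequency polygon split by color. The specific widths (1, 0.1 and 0.01) and the object names are choices made for this illustration.

```r
library(ggplot2)

# The same data with coarse, medium and fine bins
p_coarse <- qplot(carat, data = diamonds, geom = "histogram",
                  binwidth = 1, xlim = c(0, 3))
p_medium <- qplot(carat, data = diamonds, geom = "histogram",
                  binwidth = 0.1, xlim = c(0, 3))
p_fine   <- qplot(carat, data = diamonds, geom = "histogram",
                  binwidth = 0.01, xlim = c(0, 3))

# Frequency polygon with one line per color, using the medium bin width
p_freq <- qplot(carat, data = diamonds, geom = "freqpoly",
                binwidth = 0.1, xlim = c(0, 3), colour = color)
```

Because xlim = c(0, 3) drops diamonds heavier than 3 carats, drawing these plots is expected to produce a warning about removed rows.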

Bar charts

For discrete variables, the bar chart is the counterpart of the histogram and is drawn with geom = "bar". The bar geom counts the number of observations at each level, so there is no need to summarize the data beforehand. If the data have already been summarized, or if you want to tabulate the data in some other way (for example, by summing a continuous variable), use the weight aesthetic.

Example: the left plot is a plain bar chart of diamond color, and the right plot is a bar chart weighted by carat

qplot(color, data = diamonds, geom = "bar")

qplot(color, data = diamonds, geom = "bar", weight = carat) +
  scale_y_continuous("carat")
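The difference between the two bar charts can be checked against base R. In this sketch, table() reproduces the bar heights of the plain chart and tapply() reproduces the heights of the weighted chart, assuming that the weight aesthetic sums the weights within each level. Object names are illustrative.

```r
library(ggplot2)

# Plain bar chart: bar height is the number of diamonds of each color
p_count <- qplot(color, data = diamonds, geom = "bar")

# Weighted bar chart: bar height is the total carat weight of each color
p_weight <- qplot(color, data = diamonds, geom = "bar", weight = carat) +
  scale_y_continuous("carat")

# The same summaries computed directly
counts       <- table(diamonds$color)
carat_totals <- tapply(diamonds$carat, diamonds$color, sum)

# Bar heights actually computed by ggplot2 for the weighted chart
weighted_heights <- layer_data(p_weight)$count
```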

Line and path plots for time series

The x-axis of a line plot is usually time, showing how a single variable changes over time. A path plot shows how two variables have changed together over time, with time encoded in the order in which the points are connected.

The examples use the economics dataset, which contains US economic data for the past 40 years. The left plot shows the change in the unemployment rate, and the right plot shows the median number of weeks of unemployment.

qplot(date, unemploy / pop, data = economics, geom = "line")

qplot(date, uempmed, data = economics, geom = "line")

The plots below show the path of the unemployment rate against the length of unemployment. The left plot has many crossings, so the direction of time is not obvious; in the right plot, the year is mapped to the colour aesthetic, which makes the direction of time easier to see.

qplot(unemploy / pop, uempmed, data = economics, geom = c("point", "path"))

year <- function(x) as.POSIXlt(x)$year + 1900

qplot(unemploy / pop, uempmed, data = economics, geom = c("point", "path"), colour = year(date))
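The year() helper relies on as.POSIXlt() storing years as an offset from 1900. The sketch below checks it on two dates chosen for illustration and then uses it to colour the path plot.

```r
library(ggplot2)

# as.POSIXlt() stores the year as years since 1900, so add 1900 back
year <- function(x) as.POSIXlt(x)$year + 1900

# Quick check on two example dates
example_years <- year(as.Date(c("1967-07-01", "2015-04-01")))  # 1967 2015

# Path plot with the year mapped to colour, so the direction of time is visible
p_path <- qplot(unemploy / pop, uempmed, data = economics,
                geom = c("point", "path"), colour = year(date))
```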

3 Faceting

Earlier, aesthetics (colour and shape) were used to compare groups by drawing all groups on the same plot. Faceting takes a different approach: it splits the data into subsets and creates a matrix of plots, drawing each subset in its own panel.

By default, faceting in qplot() splits the plot into a grid of panels, specified by a formula of the form row_var ~ col_var. You can specify any number of row and column variables, but with more than two variables the resulting plot may be too large to display on screen. To facet by only a row or only a column variable, use . as a placeholder: for example, row_var ~ . creates a single column with multiple rows.

Example: histograms of carat conditional on color. The second plot shows density rather than counts, so that the comparison of distributions across groups is not affected by group size.

qplot(carat, data = diamonds, facets = color ~ ., geom = "histogram", binwidth = 0.1, xlim = c(0, 3))

qplot(carat, ..density.., data = diamonds, facets = color ~ ., geom = "histogram", binwidth = 0.1, xlim = c(0, 3))
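The faceting formula maps onto facet_grid() in the full ggplot() grammar. The sketch below is an assumed equivalent of the two qplot() calls, plus a third variant with the panels laid out in a single row. It uses after_stat(density), which assumes ggplot2 3.3.0 or later; on older versions ..density.. plays the same role. Object names are illustrative.

```r
library(ggplot2)

# Shared base: carat on the x-axis, restricted to 0-3 carats
hist_base <- ggplot(diamonds, aes(carat)) + xlim(0, 3)

# color ~ . : one row of the grid per color, counts on the y-axis
p_counts <- hist_base +
  geom_histogram(binwidth = 0.1) +
  facet_grid(color ~ .)

# Same layout with density on the y-axis, so group size does not matter
p_density <- hist_base +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.1) +
  facet_grid(color ~ .)

# . ~ color : one column of the grid per color, all in a single row
p_single_row <- hist_base +
  geom_histogram(binwidth = 0.1) +
  facet_grid(. ~ color)
```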

4 Other options

qplot() has other options that control the appearance of the plot. These parameters work the same way as the corresponding arguments of base R's plot().

xlim, ylim: set the display range of the x- and y-axes, for example xlim = c(0, 20)

log: a character vector indicating which axes should be log-transformed; for example, log = "x" for the x-axis and log = "xy" for both the x-axis and the y-axis.

main: the main title of the plot; the argument can be a string or an expression

xlab, ylab: set the label text for the x- and y-axes, either as a string or as a mathematical expression

Example:

qplot(
  carat, price, data = dsmall,
  xlab = "Weight (carats)", ylab = "Price ($)",
  main = "Price-weight relationship"
)
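The options listed above can be tried one at a time. In this sketch the x-axis is labelled as weight because carat is the x variable; the axis limits in p_zoom are arbitrary values picked for illustration, as are the object names.

```r
library(ggplot2)

# Reproducible random sample of 100 diamonds
set.seed(1410)
dsmall <- diamonds[sample(nrow(diamonds), 100), ]

# Axis labels and a main title
p_labels <- qplot(carat, price, data = dsmall,
                  xlab = "Weight (carats)", ylab = "Price ($)",
                  main = "Price-weight relationship")

# Logarithmic scale on both axes
p_loglog <- qplot(carat, price, data = dsmall, log = "xy")

# Restrict the displayed range of both axes
p_zoom <- qplot(carat, price, data = dsmall,
                xlim = c(0, 2), ylim = c(0, 10000))
```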
