Part 5: Scatter plots in R data visualization

Source: Internet
Author: User
Tags: ggplot

Introduction to scatter plots

A scatter plot is often used to describe the relationship between two continuous variables; each point represents one sample in the target data set.

Fitted lines are also often drawn on a scatter plot to represent a model.

Plotting a basic scatter plot

This example uses the test data set ah, which includes the columns ageyear and heightin.

To draw the plot, first call the ggplot() function to select the data set and specify the horizontal and vertical axes in the aes() argument. Then call the scatter plot function geom_point() to draw a basic scatter plot. The R example code is as follows:

library(ggplot2)

# Base function: select the data set and map the axes
ggplot(ah, aes(x = ageyear, y = heightin)) +
  # Scatter plot function
  geom_point()

Result: a scatter plot of heightin against ageyear.

Grouping data by color and point shape

This example uses the test data set sah, which includes the columns ageyear, heightin, and sex.

Building on the basic scatter plot, map a grouping variable to an aesthetic in the aes() call of the base function. You can use either colour or shape, so that different groups are drawn in different colors or point shapes. The R example code (grouping by color) is as follows:

# Base function: colour sets the grouping
ggplot(sah, aes(x = ageyear, y = heightin, colour = sex)) +
  # Scatter plot function
  geom_point()

Result: the scatter plot with one point color per sex.

The R example code (grouping by point shape) is as follows:

# Base function: shape sets the grouping
ggplot(sah, aes(x = ageyear, y = heightin, shape = sex)) +
  # Scatter plot function
  geom_point()

Result: the scatter plot with one point shape per sex.

Note: Point shapes can be customized, with about 36 shapes to choose from. See the ggplot2 manual for details.
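As a minimal sketch of picking point shapes by hand, the following uses ggplot2's scale_shape_manual() with the shape codes 1 (hollow circle) and 4 (cross). The data frame demo is made-up stand-in data that only borrows the column names used in this section; it is not the article's test data set.

```r
library(ggplot2)

# Hypothetical stand-in data (values are made up for illustration)
demo = data.frame(
  ageyear  = c(11.9, 12.9, 13.4, 14.3, 12.1, 13.0, 14.5, 15.0),
  heightin = c(56.3, 62.3, 59.0, 62.5, 57.5, 60.0, 63.5, 66.0),
  sex      = c("f", "f", "f", "f", "m", "m", "m", "m")
)

p = ggplot(demo, aes(x = ageyear, y = heightin, shape = sex)) +
  geom_point(size = 3) +
  # One shape code per group: 1 = hollow circle, 4 = cross
  scale_shape_manual(values = c(f = 1, m = 4))
```

Printing p (for example with print(p)) draws the plot. Naming the entries of values ties each shape to a group explicitly instead of relying on the order of the groups.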

Mapping a continuous variable

This example uses the test data set sahw, which includes the columns ageyear, heightin, and weightlb.

In the previous example, the variable mapped to the grouping was a discrete variable. A continuous variable other than those on the horizontal and vertical axes can also be mapped, either to the color intensity or to the point size of the scatter plot. The R example code (mapping to color) is as follows:

# Base function: colour is bound to a continuous variable
ggplot(sahw, aes(x = ageyear, y = heightin, colour = weightlb)) +
  # Scatter plot function
  geom_point()

Result: the scatter plot with point color varying continuously with weightlb.

The R example code (mapping to size) is as follows:

# Base function: size is bound to a continuous variable
ggplot(sahw, aes(x = ageyear, y = heightin, size = weightlb)) +
  # Scatter plot function
  geom_point()

Result: the scatter plot with point size varying with weightlb.

Handling overlapping points

This example uses the test data set sahw, which includes the columns ageyear, heightin, weightlb, and sex.

If many points in the plot overlap, you can make them easier to see by setting their transparency. The R example code is as follows:

# Base function: size is bound to a continuous variable, colour to a discrete one
ggplot(sahw, aes(x = ageyear, y = heightin, size = weightlb, colour = sex)) +
  # Scatter plot function: alpha sets the point transparency
  geom_point(alpha = .5) +
  # Make the area of each point proportional to the variable value
  scale_size_area() +
  # Scale function: palette sets the color scheme
  scale_colour_brewer(palette = "Set1")

Result: semi-transparent points, colored by sex, whose area is proportional to weightlb.

Adding a regression model fitted line

This example uses the test data set sah, which includes the columns ageyear, heightin, and sex.

To add a fitted regression line to a scatter plot, the key step is to call geom_smooth() (or, equivalently, stat_smooth()). The R example code is as follows:

# Base function: colour is bound to the discrete variable sex
ggplot(sah, aes(x = ageyear, y = heightin, colour = sex)) +
  # Scatter plot function
  geom_point() +
  # Scale function: palette sets the color scheme
  scale_colour_brewer(palette = "Set1") +
  # Fitted regression line and confidence band (0.95 by default; set with the level argument)
  geom_smooth()

Result: the scatter plot with a fitted curve and confidence band for each sex.

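The fitted line is a curve because the model used is a local regression model (loess, which geom_smooth() uses by default for data sets with fewer than 1,000 observations). Add method = "lm" to the geom_smooth() call to fit a classic linear regression instead, which gives a straight fitted line for each group.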
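As a minimal sketch of the linear fit described here, the following passes method = "lm" to geom_smooth() and also widens the confidence band through the level argument. The data frame demo is made-up stand-in data with the same column names as this section's examples, and the value 0.99 is only an illustration.

```r
library(ggplot2)

# Hypothetical stand-in data (values are made up for illustration)
demo = data.frame(
  ageyear  = c(11.5, 12.0, 13.0, 14.0, 15.0, 11.8, 12.5, 13.5, 14.5, 15.5),
  heightin = c(55.0, 57.5, 59.0, 61.5, 62.0, 56.0, 58.5, 62.0, 65.5, 67.0),
  sex      = c("f", "f", "f", "f", "f", "m", "m", "m", "m", "m")
)

p = ggplot(demo, aes(x = ageyear, y = heightin, colour = sex)) +
  geom_point() +
  scale_colour_brewer(palette = "Set1") +
  # Straight-line fit per group; level sets the confidence band (default 0.95)
  geom_smooth(method = "lm", formula = y ~ x, level = 0.99)
```

Printing p draws the plot. Because colour = sex is set in the base function, geom_smooth() fits one line per sex; moving colour into geom_point() alone would give a single line for all points.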

Adding a custom model fitted line

This example uses the test data set sah, which includes the columns ageyear, heightin, and sex.

The previous section showed how to fit the sample points with a global or local regression model and draw the fitted line; both the fitting and the drawing are done automatically by the geom_smooth() function provided by ggplot2.

More often, however, we fit the data with a model from another package rather than one built into ggplot2. For this scenario we need a custom function. The function accepts the model, the names of the horizontal-axis and vertical-axis variables, the horizontal axis range, and the number of sample points along the horizontal axis, and it outputs a data frame containing the predictor variable and the predicted values. The R implementation is as follows:

# Function: output model predictions
# Arguments:
#   model:   model object
#   xvar:    name of the predictor variable
#   yvar:    name of the predicted variable
#   xrange:  range of the predictor variable
#   samples: number of predictor sample points
# Returns: a data frame of predictor values and predicted values
predictvals = function(model, xvar, yvar, xrange = NULL, samples = 100, ...) {
  # If the model is one of lm/glm/loess, xrange can be generated automatically
  if (is.null(xrange)) {
    if (any(class(model) %in% c("lm", "glm")))
      xrange = range(model$model[[xvar]])
    else if (any(class(model) %in% "loess"))
      xrange = range(model$x)
  }
  # Generate and return the predictor / predicted value data set
  newdata = data.frame(x = seq(xrange[1], xrange[2], length.out = samples))
  names(newdata) = xvar
  newdata[[yvar]] = predict(model, newdata = newdata, ...)
  newdata
}

After fitting a model with another package, pass the model and the other arguments to the function above to obtain the prediction data set. Finally, draw the new data set as a line.
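The workflow just described can be sketched for a single model as follows. This is an illustration under stated assumptions: demo is made-up stand-in data, a quadratic lm() from base R stands in for a model from another package, and the block repeats a condensed copy of the predictvals function so that it runs on its own.

```r
library(ggplot2)

# Condensed copy of the predictvals function from this section
predictvals = function(model, xvar, yvar, xrange = NULL, samples = 100, ...) {
  if (is.null(xrange)) {
    if (any(class(model) %in% c("lm", "glm")))
      xrange = range(model$model[[xvar]])
    else if (any(class(model) %in% "loess"))
      xrange = range(model$x)
  }
  newdata = data.frame(x = seq(xrange[1], xrange[2], length.out = samples))
  names(newdata) = xvar
  newdata[[yvar]] = predict(model, newdata = newdata, ...)
  newdata
}

# Hypothetical stand-in data (values are made up for illustration)
demo = data.frame(
  ageyear  = c(11.5, 12.0, 12.5, 13.0, 13.5, 14.0, 14.5, 15.0, 15.5, 16.0),
  heightin = c(55.0, 57.0, 58.5, 60.5, 61.5, 63.0, 63.5, 64.5, 64.5, 65.0)
)

# Stand-in for "a model from another package": a quadratic linear model
model = lm(heightin ~ ageyear + I(ageyear^2), data = demo)

# 50 evenly spaced predictor values with their predictions
pred = predictvals(model, xvar = "ageyear", yvar = "heightin", samples = 50)

# Points from the raw data, line from the prediction data set
p = ggplot(demo, aes(x = ageyear, y = heightin)) +
  geom_point() +
  geom_line(data = pred)
```

Because pred uses the same column names as demo, geom_line() can reuse the aesthetic mappings of the base function without restating them.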

A slightly more complex example is presented below. It splits the data set into two groups by sex, builds a regression model for each group, and plots the fitted lines. The R implementation is as follows:

library(plyr)

# Modeling function: set the model here
make_model = function(data) {
  loess(heightin ~ ageyear, data)
}
# Split the data set by sex and return a list of models
models = dlply(sah, "sex", .fun = make_model)
# Make predictions for each data set (male/female)
predvals = ldply(models, .fun = predictvals, xvar = "ageyear", yvar = "heightin")
# Draw the data points and the model fitted lines
ggplot(sah, aes(x = ageyear, y = heightin, colour = sex)) +
  geom_point() +
  geom_line(data = predvals)

Result: the scatter plot with one loess fitted line per sex.

Adding marginal rugs to a scatter plot

This example uses the faithful data set that ships with R, which has the columns eruptions and waiting.

The method is simple: add the marginal rug function geom_rug() to the original scatter plot code. The R implementation is as follows:

# Base function
ggplot(faithful, aes(x = eruptions, y = waiting)) +
  # Scatter plot function
  geom_point() +
  # Marginal rug function
  geom_rug()

Result: the scatter plot with rug marks along both axes.

Adding labels to a scatter plot

This example uses the test data set cty_1, which includes the columns healthexp, infmortality, and Name.

Adding labels to a scatter plot is also simple: add the text function geom_text() to the original scatter plot code. The R implementation is as follows:

# Base function
ggplot(cty_1, aes(x = healthexp, y = infmortality)) +
  # Scatter plot function
  geom_point() +
  # Text function: in aes(), y shifts the original vertical value upward and label sets the bound text.
  # The y offset places the text above the sample point instead of on top of it.
  geom_text(aes(y = infmortality + .2, label = Name))

Result: the scatter plot with each point labeled by Name just above it.

PS: In this example, we redefine the aesthetic mappings inside the text function. The text layer then uses the new mappings, while the other layers are unchanged.
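To check this behavior, the following sketch compares the y values that each layer actually uses, read back with ggplot2's layer_data() helper. The data frame demo is made-up stand-in data that only borrows the column names of this example.

```r
library(ggplot2)

# Hypothetical stand-in data (values are made up for illustration)
demo = data.frame(
  healthexp    = c(1000, 2500, 4000),
  infmortality = c(6.0, 4.5, 3.0),
  Name         = c("A", "B", "C")
)

p = ggplot(demo, aes(x = healthexp, y = infmortality)) +
  geom_point() +
  # Layer-level aes(): overrides y and adds label for this layer only
  geom_text(aes(y = infmortality + .2, label = Name))

point_y = layer_data(p, 1)$y   # y values used by geom_point()
text_y  = layer_data(p, 2)$y   # y values used by geom_text()

# The text layer is shifted upward; the point layer keeps the original values
print(text_y - point_y)
```

The x mapping is not restated in geom_text(), so the text layer inherits it from the base function; only the aesthetics named in the layer's own aes() are replaced.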
