R Language: Exploring Two Variables

Last Update:2017-12-25 Source: Internet

Author: User



Objective:

Learn the workflow for two-variable analysis by exploring the data in the file pseudo_facebook.tsv.

Knowledge Points:

1. ggplot syntax

2. How to make a scatter plot

3. How to optimize scatter plots

4. Conditional means

5. Correlation between variables

6. Subplots

7. Smoothing

Brief introduction:

A histogram suits the exploration of a single variable, because it shows how the values are distributed across the whole data set. To explore the relationship between two variables, a scatter plot is the more appropriate tool.

Case Analysis:

1. Make a scatter plot based on age and number of friends

# Import the ggplot2 plotting package
library(ggplot2)
setwd('d:/udacity/Data Analysis Advanced/R')
# Load the data file (tab-separated)
pf <- read.csv('pseudo_facebook.tsv', sep = '\t')
# Make a scatter plot with qplot syntax (qplot is deprecated as of ggplot2 3.4.0)
qplot(x = age, y = friend_count, data = pf)
# Make the same scatter plot with ggplot syntax, which is more explicit
ggplot(aes(x = age, y = friend_count), data = pf) +
  geom_point()

Figure 2-1

2. Overplotting. Most of the points in Figure 2-1 overlap, which makes the relationship between age and number of friends hard to see, so use alpha and geom_jitter to adjust the plot

# geom_jitter adds random noise to separate coincident points
# alpha = 1/20 sets each point to 1/20 opacity, so it takes about 20 overlapping points to look like one solid point
# xlim(13, 90) restricts the x-axis to the range 13 to 90
ggplot(aes(x = age, y = friend_count), data = pf) +
  geom_jitter(alpha = 1/20) +
  xlim(13, 90)

Figure 2-2
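The effect of jittering can be illustrated without the Facebook data. The sketch below uses base R's jitter() as a stand-in for geom_jitter, on made-up integer ages; the data, the seed, and the amount of 0.4 are assumptions chosen only for illustration.

```r
# Synthetic stand-in data (not from pseudo_facebook.tsv): integer ages,
# so many observations share exactly the same x value.
set.seed(42)
age <- sample(13:20, size = 200, replace = TRUE)

# Base R jitter() with a positive `amount` adds uniform noise
# in [-amount, amount] to each value.
age_jittered <- jitter(age, amount = 0.4)

# Before: at most 8 distinct x positions. After: the noise spreads them out.
cat("distinct x before:", length(unique(age)), "\n")
cat("distinct x after: ", length(unique(age_jittered)), "\n")

# Largest shift applied to any single value.
max_shift <- max(abs(age_jittered - age))
cat("largest shift:", max_shift, "\n")
```

Because ages are recorded as whole years, an unjittered scatter plot stacks the points into vertical columns. Spreading them horizontally by less than half a year keeps each point closest to its true age while making the density visible.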

3. Use the coord_trans function to apply a transformation to an axis, which makes the plot easier to read.

# Apply a square-root transformation to the y-axis (number of friends) to make the plot easier to read
ggplot(aes(x = age, y = friend_count), data = pf) +
  geom_point(alpha = 1/20) +
  xlim(13, 90) +
  coord_trans(y = "sqrt")

Figure 2-3
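As a rough illustration of what a square-root axis does, the sketch below computes where a few made-up friend counts would land after the transformation. The counts are assumptions chosen as perfect squares so the arithmetic is easy to follow.

```r
# Made-up friend counts, chosen as perfect squares for readability.
friend_count <- c(0, 100, 400, 900, 1600)

# Position of each count on a square-root axis.
axis_position <- sqrt(friend_count)

# Gaps between neighbouring values, before and after the transformation.
raw_gaps  <- diff(friend_count)
sqrt_gaps <- diff(axis_position)

print(data.frame(friend_count, axis_position))
cat("raw gaps: ", raw_gaps, "\n")
cat("sqrt gaps:", sqrt_gaps, "\n")
```

The raw gaps grow as the counts grow, while the transformed gaps stay equal. On the plot this means the crowded low end of the y-axis gets more room and the sparse high end gets less.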

4. Conditional means: group the data by a field, then compute summary statistics for each group to build a new data frame

# 1. Import the dplyr package
# 2. Use group_by to group by the age field
# 3. Use summarise to compute the mean, the median, and the group size
# 4. Then use arrange to sort by age
library('dplyr')
pf.fc_by_age <- pf %>%
  group_by(age) %>%
  summarise(friend_count_mean = mean(friend_count),
            friend_count_median = median(friend_count),
            n = n()) %>%
  arrange(age)
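The grouping step can be checked by hand on a tiny data set. The sketch below is a base R stand-in for the group-then-summarise pipeline, using tapply() instead of dplyr; the toy data frame and the name toy_by_age are made up for illustration.

```r
# Toy data (made up for illustration): five users, two distinct ages.
toy <- data.frame(
  age          = c(13, 13, 14, 14, 14),
  friend_count = c(10, 30, 5, 15, 40)
)

# tapply() splits friend_count by age and applies a function to each group.
group_mean   <- tapply(toy$friend_count, toy$age, mean)
group_median <- tapply(toy$friend_count, toy$age, median)
group_size   <- tapply(toy$friend_count, toy$age, length)

# Assemble one row per age and sort by age.
toy_by_age <- data.frame(
  age                 = as.numeric(names(group_mean)),
  friend_count_mean   = as.vector(group_mean),
  friend_count_median = as.vector(group_median),
  n                   = as.vector(group_size)
)
toy_by_age <- toy_by_age[order(toy_by_age$age), ]
print(toy_by_age)
```

Both ages have the same mean here, but the median for age 14 is lower because one large value pulls the mean up. That is one reason to compute both statistics per group.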

5. Overlay the summary data on the original data. According to the plot, a trend emerges: the number of friends rises between ages 13 and 26, and from age 26 onward it slowly declines

# 1. Make a scatter plot of age and friends, limiting the range of x and y
# 2. Add a line for the mean
# 3. Add a dashed line for the 0.9 quantile
# 4. Add a line for the 0.5 quantile (the median)
# 5. Add a dashed line for the 0.1 quantile
# Note: the argument fun is named fun.y in ggplot2 releases before 3.3.0
ggplot(aes(x = age, y = friend_count), data = pf) +
  geom_point(alpha = 1/10, position = position_jitter(h = 0), color = 'orange') +
  coord_cartesian(xlim = c(13, 90), ylim = c(0, 1000)) +
  geom_line(stat = 'summary', fun = mean) +
  geom_line(stat = 'summary', fun = quantile, fun.args = list(probs = .9), linetype = 2, color = 'blue') +
  geom_line(stat = 'summary', fun = quantile, fun.args = list(probs = .5), color = 'green') +
  geom_line(stat = 'summary', fun = quantile, fun.args = list(probs = .1), color = 'blue', linetype = 2)

Figure 2-4
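To make the quantile lines concrete, the sketch below calls quantile() directly on a small made-up vector, which is what each summary line computes within every age group. The values 0 through 10 are an assumption chosen so the results are easy to verify by eye.

```r
# Made-up friend counts for a single age group: the integers 0 through 10.
friend_count <- 0:10

# quantile() returns the value below which the given share of observations falls.
q <- quantile(friend_count, probs = c(0.1, 0.5, 0.9))
print(q)

# The 0.5 quantile is the median.
med <- median(friend_count)
```

Between the 0.1 and 0.9 lines lie roughly 80% of the users of each age, so the band between the two dashed lines shows how widely friend counts vary around the median.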

6. Calculating correlations

# Use the cor.test function for the calculation; subset() restricts it to part of the data set
# Pearson's r measures the strength of the linear relationship between two variables; the closer its absolute value is to 1, the stronger the correlation
with(subset(pf, age <= 70), cor.test(age, friend_count, method = 'pearson'))
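For reference, Pearson's r is the covariance of the two variables divided by the product of their standard deviations. The sketch below computes it by hand on made-up numbers and compares the result with the built-in cor(), which uses the Pearson method by default; the vectors x and y are assumptions for illustration.

```r
# Made-up paired observations, for illustration only.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Deviations of each observation from its mean.
dx <- x - mean(x)
dy <- y - mean(y)

# Pearson's r: sum of cross-products over the square root of
# the product of the sums of squares.
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Built-in equivalent (method = "pearson" is the default).
r_builtin <- cor(x, y)
print(c(manual = r_manual, builtin = r_builtin))
```

cor.test reports the same coefficient and adds a significance test and a confidence interval for it.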

7. Strongly correlated variables. A scatter plot of www_likes_received against likes_received shows how closely the two variables are related; in this case the correlation between them is very strong

# Use quantile to cut off some extreme values
# Filter through xlim and ylim
# Also add a fitted line to show the overall trend
ggplot(aes(x = www_likes_received, y = likes_received), data = pf) +
  geom_point() +
  xlim(0, quantile(pf$www_likes_received, 0.95)) +
  ylim(0, quantile(pf$likes_received, 0.95)) +
  geom_smooth(method = 'lm', color = 'red')

Figure 2-5
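The effect of a 95th-percentile axis limit can be sketched on a small made-up vector. The values below are assumptions: 99 ordinary like counts plus one extreme outlier.

```r
# Made-up like counts: 99 ordinary values plus one extreme outlier.
likes <- c(1:99, 10000)

# The 95th percentile serves as the upper axis limit.
upper <- quantile(likes, 0.95)

# Values above the limit fall outside the plotted range.
kept <- likes[likes <= upper]
cat("upper limit:", upper, "\n")
cat("kept", length(kept), "of", length(likes), "values\n")
```

Without the limit, the single outlier would stretch the axis to 10000 and squeeze every other point into a corner. Note that xlim and ylim in ggplot2 drop out-of-range observations before any fitted line is computed, so the line reflects only the retained points.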

8. Make three line charts of the relationship between age and number of friends: the mean friend count by age measured in months, the mean friend count by age in years, and the mean friend count by age rounded to the nearest multiple of 5

In the resulting figure, p1 shows the most detail, p2 shows the mean number of friends for each age in years, and p3 shows the general trend between age and number of friends

# Load gridExtra to arrange several plots in one figure
library(gridExtra)
# Age in fractional years, using the month of birth
pf$age_with_months <- pf$age + (12 - pf$dob_month) / 12
pf.fc_by_age_months <- pf %>%
  group_by(age_with_months) %>%
  summarise(friend_count_mean = mean(friend_count),
            friend_count_median = median(friend_count),
            n = n()) %>%
  arrange(age_with_months)
p1 <- ggplot(aes(x = age_with_months, y = friend_count_mean),
             data = subset(pf.fc_by_age_months, age_with_months < 71)) +
  geom_line() +
  geom_smooth()
p2 <- ggplot(aes(x = age, y = friend_count_mean),
             data = subset(pf.fc_by_age, age < 71)) +
  geom_line() +
  geom_smooth()
p3 <- ggplot(aes(x = round(age / 5) * 5, y = friend_count),
             data = subset(pf, age < 71)) +
  geom_line(stat = 'summary', fun = mean)
grid.arrange(p1, p2, p3, ncol = 1)
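The two x-axis transformations used for the first and third plots are plain arithmetic and can be checked on a few made-up users. The ages and birth months below are assumptions for illustration.

```r
# Made-up users: age in whole years and birth month (1 = January).
age       <- c(36, 36, 37, 38)
dob_month <- c(3, 12, 6, 1)

# Fractional age: users born earlier in the year get a larger fraction.
age_with_months <- age + (12 - dob_month) / 12

# Binning used for the third plot: round age to the nearest multiple of 5.
age_bin <- round(age / 5) * 5

print(data.frame(age, dob_month, age_with_months, age_bin))
```

The fractional age yields twelve times as many x positions as whole years, so each group is smaller and the line is noisier. The 5-year bins do the opposite: they pool more users per point, which smooths the line at the cost of detail.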

