The addition of p-value and significance markers in the R language Visual learning notes

Source: Internet
Author: User
Tags ggplot

The addition of p-value and significance markers in the R language Visual learning notes

Http://www.jianshu.com/p/b7274afff14f?from=timeline

In the previous article, I mentioned how to add the GGPUBR package to the ggplot diagram p-value and the significance of the markup, this article will be described in detail. Demo with Data set Toothgrowth

#先加载包library(ggpubr)#加载数据集ToothGrowthdata("ToothGrowth")head(ToothGrowth)
##    len  supp  dose## 1  4.2   VC   0.5## 2  11.5  VC   0.5## 3  7.3   VC   0.5## 4  5.8   VC   0.5## 5  6.4   VC   0.5## 6  10.0  VC   0.5
Comparison method

The main methods of comparison used in R are the following:

Method R function Description
T-test T.test () Compare two groups (parameters)
Wilcoxon test Wilcox.test () Compare two groups (non-parametric)
ANOVA AoV () or ANOVA () Compare multiple groups (parameters)
Kruskal-wallis Kruskal.test () Compare multiple groups (non-parametric)

A variety of comparison methods followed by time one by one to explain.

Add to p-value

The main use of the GGPUBR package of two functions:

    • compare_means(): One or more groups can be compared
    • stat_compare_mean(): Auto Add p-value , significant mark to Ggplot chart
Compare_means () function

The function is mainly used as follows:

compare_means(formula, data, method = "wilcox.test", paired = FALSE,  group.by = NULL, ref.group = NULL, ...)

Comments:

    • formula: Shape x~group , where x is a numeric variable, group is a factor, can be one or more
    • data: Data Set
    • method: The method of comparison, the default is "wilcox.test" , the other optional methods are: "t.test" , "anova" ,"kruskal.test"
    • paired: Do you want to paired test ( TRUE or FALSE )
    • group_by: Whether to group when comparing
    • ref.group: Whether you need to specify a reference group
Stat_compare_means () function

Main usage:

stat_compare_means(mapping = NULL, comparisons = NULL hide.ns = FALSE,                   label = NULL,  label.x = NULL, label.y = NULL,  ...)

Comments:

    • mapping: aes() a set of aesthetic mappings created by
    • comparisons: Specify groups that need to be compared and added p-value , marked with significance
    • hide.ns: Do you want to display the significance mark NS
    • label: The type of the significance tag, Optional: p.signif (significant mark), p.format (display p-value )
    • label.x, label.y : Significant label adjustment
    • ...: Other parameters
Compare two separate sets of
compare_means(len~supp, data=ToothGrowth)

Results explained:

  • .y: The y variable used in the test
  • p:p-value
  • p.adj: Adjusted p-value . Default isp.adjust.method="holm"
  • p.format: After rounding thep-value
  • p.signif: Level of significance
  • method: A method for statistical testing to draw a box line diagram
    p <- ggboxplot(ToothGrowth, x="supp", y="len", color = "supp", palette = "jco", add = "jitter")#添加p-valuep+stat_compare_means()
    #使用其他统计检验方法p+stat_compare_means(method = "t.test")

    The above significance marks can be label.x adjusted by, label.y , hjust and vjust
    The significance tag can be aes() changed by mapping:
    • aes(label=..p.format..)Or aes(lebel=paste0("p=",..p.format..)) : Show only p-value , do not show statistical test method
    • aes(label=..p.signif..): Show only significant levels
    • aes(label=paste0(..method..,"\n", "p=",..p.format..)): p-value displayed with a significant level of the branch

Give me a chestnut:

p+stat_compare_means(aes(label=..p.signif..), label.x = 1.5, label.y = 40)


You can also specify the label as a character vector, do not map, just p.signif both ends of the. Remove and then

p+stat_compare_means(label = "p.signif", label.x = 1.5, label.y = 40)
Comparison of two paired sample
compare_means(len~supp, data=ToothGrowth, paired = TRUE)
Visualize with ggpaired ()
ggpaired(ToothGrowth, x="supp", y="len", color = "supp", line.color = "gray", line.size = 0.4, palette = "jco")+ stat_compare_means(paired = TRUE)
Multi-Group comparison

Global Test

compare_means(len~dose, data=ToothGrowth, method = "anova")
Visualization of
ggboxplot(ToothGrowth, x="dose", y="len", color = "dose", palette = "jco")+stat_compare_means()
#使用其他的方法ggboxplot(ToothGrowth, x="dose", y="len", color = "dose", palette = "jco")+ stat_compare_means(method = "anova")

Pairwise comparisons: If the grouping variable contains more than two levels, the pairwise test is automatically performed, and the default method is "Wilcox.test"

compare_means(len~dose, data=ToothGrowth)
#可以指定比较哪些组my_comparisons <- list(c("0.5", "1"), c("1", "2"), c("0.5", "2"))ggboxplot(ToothGrowth, x="dose", y="len", color = "dose",palette = "jco")+stat_compare_means(comparisons=my_comparisons)+ # Add pairwise comparisons p-value stat_compare_means(label.y = 50) # Add global p-value


You can change the position of the label by modifying the parameter Label.y

ggboxplot(ToothGrowth, x="dose", y="len", color = "dose",palette = "jco")+stat_compare_means(comparisons=my_comparisons, label.y = c(29, 35, 40))+ # Add pairwise comparisons p-value stat_compare_means(label.y = 45) # Add global p-value


As for the two groups that are connected by adding lines to the comparison, this feature has been implemented by the package ggsignif

##设定参考组compare_means(len~dose, data=ToothGrowth, ref.group = "0.5",  #以dose=0.5组为参考组 method = "t.test" )
#可视化ggboxplot(ToothGrowth, x="dose", y="len", color = "dose", palette = "jco")+ stat_compare_means(method = "anova", label.y = 40)+ # Add global p-valuestat_compare_means(label = "p.signif", method = "t.test", ref.group = "0.5") # Pairwise comparison against reference

The reference group can also be set to. All. That is, all averages

compare_means(len~dose, data=ToothGrowth, ref.group = ".all.", method = "t.test")
#可视化ggboxplot(ToothGrowth, x="dose", y="len", color = "dose", palette = "jco")+stat_compare_means(method = "anova", label.y = 40)+# Add global p-valuestat_compare_means(label = "p.signif", method = "t.test", ref.group = ".all.")#Pairwise comparison against all


Next, use the dataset myeloma in the Survminer package to explain why sometimes we need to set the Ref.group to. All.

library(survminer)#没安装的先安装再加载data("myeloma")head(myeloma)

We will draw the expression profile of the DEPDC1 gene according to the patient's group, to see if there are significant differences between the groups, we can compare between the 7 groups, but there are so many combinations of group comparisons, so we can compare each group in the 7 group with the full average. See if the DEPDC1 gene is expressed or low in different groups.

compare_means(DEPDC1~molecular_group, data = myeloma, ref.group = ".all.", method = "t.test")
#可视化DEPDC1基因表达谱ggboxplot(myeloma, x="molecular_group", y="DEPDC1", color = "molecular_group", add = "jitter", legend="none")+ rotate_x_text(angle = 45)+ geom_hline(yintercept = mean(myeloma$DEPDC1), linetype=2)+# Add horizontal line at base mean stat_compare_means(method = "anova", label.y = 1600)+ # Add global annova p-value stat_compare_means(label = "p.signif", method = "t.test", ref.group = ".all.")# Pairwise comparison against all

It can be seen that the DEPDC1 gene is significantly expressed in the proliferation group, while in hyperdiploid and low bone disease significantly lower expression

We can also remove the non-significant mark NS by simply hide.ns=true the parameters

ggboxplot(myeloma, x="molecular_group", y="DEPDC1", color = "molecular_group", add = "jitter", legend="none")+rotate_x_text(angle = 45)+ geom_hline(yintercept = mean(myeloma$DEPDC1), linetype=2)+# Add horizontal line at base mean stat_compare_means(method = "anova", label.y = 1600)+ # Add global annova p-value stat_compare_means(label = "p.signif", method = "t.test", ref.group = ".all.", hide.ns = TRUE)# Pairwise comparison against all
Multiple grouping variables

A statistical test is performed after grouping by another variable, such as grouping by variable dose:

compare_means(len~supp, data=ToothGrowth, group.by = "dose")
#可视化p <- ggboxplot(ToothGrowth, x="supp", y="len", color = "supp", palette = "jco", add = "jitter", facet.by = "dose", short.panel.labs = FALSE)#按dose进行分面#label只绘制p-valuep+stat_compare_means(label = "p.format")
#label绘制显著性水平p+stat_compare_means(label = "p.signif", label.x = 1.5)
#将所有箱线图绘制在一个panel中p <- ggboxplot(ToothGrowth, x="dose", y="len", color = "supp", palette = "jco", add = "jitter")p+stat_compare_means(aes(group=supp))
#只显示p-valuep+stat_compare_means(aes(group=supp), label = "p.format")
#显示显著性水平p+stat_compare_means(aes(group=supp), label = "p.signif")
进行paired sample检验compare_means(len~supp, data=ToothGrowth, group.by = "dose", paired = TRUE)
#可视化p <- ggpaired(ToothGrowth, x="supp", y="len", color = "supp", palette = "jco", line.color="gray", line.size=0.4, facet.by = "dose", short.panel.labs = FALSE)#按dose分面#只显示p-valuep+stat_compare_means(label = "p.format", paired = TRUE)
Other graphical bar and line graphs (a grouping variable)
#有误差棒的条形图,实际上我以前的文章里有纯粹用ggplot2实现ggbarplot(ToothGrowth, x="dose", y="len", add = "mean_se")+ stat_compare_means()+ stat_compare_means(ref.group = "0.5", label = "p.signif", label.y = c(22, 29))
#有误差棒的线图ggline(ToothGrowth, x="dose", y="len", add = "mean_se")+stat_compare_means()+ stat_compare_means(ref.group = "0.5", label = "p.signif", label.y = c(22, 29))
Bar and Line charts (two grouping variables)
ggbarplot(ToothGrowth, x="dose", y="len", add = "mean_se", color = "supp", palette = "jco", position = position_dodge(0.8))+ stat_compare_means(aes(group=supp), label = "p.signif", label.y = 29)
ggline(ToothGrowth, x="dose", y="len", add = "mean_se", color = "supp", palette = "jco")+ stat_compare_means(aes(group=supp), label = "p.signif", label.y = c(16, 25, 29))
Sessioninfo
Sessioninfo () # # R version 3.4.0 (2017-04-21) # # platform:x86_64-w64-mingw32/x64 (64-bit) # # Running Under:windows 8.1 x64 (Build 9600) # # # Matrix products:default## # locale:## [1] lc_collate=chinese (Simplified) _china.936 # # [2] Lc_ctype=ch Inese (Simplified) _china.936 # # [3] Lc_monetary=chinese (Simplified) _china.936## [4] lc_numeric=c # # [5] LC_TIME=Chinese (Simplified) _china.936 # # # # attached base packages:## [1] stats graphics grdevices utils datasets Methods Base # # # # Othe R attached packages:## [1] survminer_0.4.0 ggpubr_0.1.3 magrittr_1.5 ggplot2_2.2.1 # # # Loaded via a namespace (and not a ttached): # # [1] rcpp_0.12.11 compiler_3.4.0 plyr_1.8.4## [4] tools_3.4.0 digest_0.6.12 evaluate_0.10 # # [7] tibble_1.3.3 G  table_0.2.0 nlme_3.1-131 # [16] lattice_0.20-35 rlang_0.1.1 matrix_1.2-10 # [psych_1.7.5] ggsci_2.4 DBI_0.6-1 Cmprsk_2.2-7 yaml_2.1.14 parallel_3.4.0 # # [[] gridextra_2.2.1 dplyr_0.5.0 stringr_1.2.0 # # [] knitr_1.16 survMisc_0.5 .4 rprojroot_1.2 # # [25]grid_3.4.0 data.table_1.10.4 kmsurv_0.1-5 # # [[] r6_2.2.1 km.ci_0.5-2 survival_2.41-3 # # [] foreign_0.8-68 rmarkdown_1 .5 reshape2_1.4.2 # # [tidyr_0.6.3] purrr_0.2.2.2 splines_3.4.0 # # [PNS] backports_1.1.0 scales_0.4.1 htmltools_0.3.6 # # [assertthat_0.2.0] mnormt_1.5-5 xtable_1.8-2 # # [] colorspace_1.3-2 ggsignif_0.2.0 labeling_0.3 # # [+] stringi_1.1. 5 lazyeval_0.2.0 munsell_0.4.3 # # [broom_0.4.2] zoo_1.8-0

R Language Visual Learning notes add p-value and significance tags

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.