The addition of p-value and significance markers in the R language Visual learning notes
Http://www.jianshu.com/p/b7274afff14f?from=timeline
In the previous article, I mentioned how to add the GGPUBR package to the ggplot
diagram p-value
and the significance of the markup, this article will be described in detail. Demo with Data set Toothgrowth
#先加载包library(ggpubr)#加载数据集ToothGrowthdata("ToothGrowth")head(ToothGrowth)
## len supp dose## 1 4.2 VC 0.5## 2 11.5 VC 0.5## 3 7.3 VC 0.5## 4 5.8 VC 0.5## 5 6.4 VC 0.5## 6 10.0 VC 0.5
Comparison method
The main methods of comparison used in R are the following:
Method |
R function |
Description |
T-test |
T.test () |
Compare two groups (parameters) |
Wilcoxon test |
Wilcox.test () |
Compare two groups (non-parametric) |
ANOVA |
AoV () or ANOVA () |
Compare multiple groups (parameters) |
Kruskal-wallis |
Kruskal.test () |
Compare multiple groups (non-parametric) |
A variety of comparison methods followed by time one by one to explain.
Add to
p-value
The main use of the GGPUBR package of two functions:
compare_means()
: One or more groups can be compared
stat_compare_mean()
: Auto Add p-value
, significant mark to Ggplot chart
Compare_means () function
The function is mainly used as follows:
compare_means(formula, data, method = "wilcox.test", paired = FALSE, group.by = NULL, ref.group = NULL, ...)
Comments:
formula
: Shape x~group
, where x is a numeric variable, group
is a factor, can be one or more
data
: Data Set
method
: The method of comparison, the default is "wilcox.test"
, the other optional methods are: "t.test"
, "anova"
,"kruskal.test"
paired
: Do you want to paired test
( TRUE
or FALSE
)
group_by
: Whether to group when comparing
ref.group
: Whether you need to specify a reference group
Stat_compare_means () function
Main usage:
stat_compare_means(mapping = NULL, comparisons = NULL hide.ns = FALSE, label = NULL, label.x = NULL, label.y = NULL, ...)
Comments:
mapping
: aes()
a set of aesthetic mappings created by
comparisons
: Specify groups that need to be compared and added p-value
, marked with significance
hide.ns
: Do you want to display the significance mark NS
label
: The type of the significance tag, Optional: p.signif
(significant mark), p.format
(display p-value
)
label.x
, label.y
: Significant label adjustment
- ...: Other parameters
Compare two separate sets of
compare_means(len~supp, data=ToothGrowth)
Results explained:
.y
: The y variable used in the test
p
:p-value
p.adj
: Adjusted p-value
. Default isp.adjust.method="holm"
p.format
: After rounding thep-value
p.signif
: Level of significance
method
: A method for statistical testing to draw a box line diagramp <- ggboxplot(ToothGrowth, x="supp", y="len", color = "supp", palette = "jco", add = "jitter")#添加p-valuep+stat_compare_means()
#使用其他统计检验方法p+stat_compare_means(method = "t.test")
The above significance marks can be label.x
adjusted by, label.y
, hjust
and vjust
The significance tag can be aes()
changed by mapping:
aes(label=..p.format..)
Or aes(lebel=paste0("p=",..p.format..))
: Show only p-value
, do not show statistical test method
aes(label=..p.signif..)
: Show only significant levels
aes(label=paste0(..method..,"\n", "p=",..p.format..))
: p-value
displayed with a significant level of the branch
Give me a chestnut:
p+stat_compare_means(aes(label=..p.signif..), label.x = 1.5, label.y = 40)
You can also specify the label as a character vector, do not map, just p.signif both ends of the. Remove and then
p+stat_compare_means(label = "p.signif", label.x = 1.5, label.y = 40)
Comparison of two paired sample
compare_means(len~supp, data=ToothGrowth, paired = TRUE)
Visualize with ggpaired ()
ggpaired(ToothGrowth, x="supp", y="len", color = "supp", line.color = "gray", line.size = 0.4, palette = "jco")+ stat_compare_means(paired = TRUE)
Multi-Group comparison
Global Test
compare_means(len~dose, data=ToothGrowth, method = "anova")
Visualization of
ggboxplot(ToothGrowth, x="dose", y="len", color = "dose", palette = "jco")+stat_compare_means()
#使用其他的方法ggboxplot(ToothGrowth, x="dose", y="len", color = "dose", palette = "jco")+ stat_compare_means(method = "anova")
Pairwise comparisons: If the grouping variable contains more than two levels, the pairwise test is automatically performed, and the default method is "Wilcox.test"
compare_means(len~dose, data=ToothGrowth)
#可以指定比较哪些组my_comparisons <- list(c("0.5", "1"), c("1", "2"), c("0.5", "2"))ggboxplot(ToothGrowth, x="dose", y="len", color = "dose",palette = "jco")+stat_compare_means(comparisons=my_comparisons)+ # Add pairwise comparisons p-value stat_compare_means(label.y = 50) # Add global p-value
You can change the position of the label by modifying the parameter Label.y
ggboxplot(ToothGrowth, x="dose", y="len", color = "dose",palette = "jco")+stat_compare_means(comparisons=my_comparisons, label.y = c(29, 35, 40))+ # Add pairwise comparisons p-value stat_compare_means(label.y = 45) # Add global p-value
As for the two groups that are connected by adding lines to the comparison, this feature has been implemented by the package ggsignif
##设定参考组compare_means(len~dose, data=ToothGrowth, ref.group = "0.5", #以dose=0.5组为参考组 method = "t.test" )
#可视化ggboxplot(ToothGrowth, x="dose", y="len", color = "dose", palette = "jco")+ stat_compare_means(method = "anova", label.y = 40)+ # Add global p-valuestat_compare_means(label = "p.signif", method = "t.test", ref.group = "0.5") # Pairwise comparison against reference
The reference group can also be set to. All. That is, all averages
compare_means(len~dose, data=ToothGrowth, ref.group = ".all.", method = "t.test")
#可视化ggboxplot(ToothGrowth, x="dose", y="len", color = "dose", palette = "jco")+stat_compare_means(method = "anova", label.y = 40)+# Add global p-valuestat_compare_means(label = "p.signif", method = "t.test", ref.group = ".all.")#Pairwise comparison against all
Next, use the dataset myeloma in the Survminer package to explain why sometimes we need to set the Ref.group to. All.
library(survminer)#没安装的先安装再加载data("myeloma")head(myeloma)
We will draw the expression profile of the DEPDC1 gene according to the patient's group, to see if there are significant differences between the groups, we can compare between the 7 groups, but there are so many combinations of group comparisons, so we can compare each group in the 7 group with the full average. See if the DEPDC1 gene is expressed or low in different groups.
compare_means(DEPDC1~molecular_group, data = myeloma, ref.group = ".all.", method = "t.test")
#可视化DEPDC1基因表达谱ggboxplot(myeloma, x="molecular_group", y="DEPDC1", color = "molecular_group", add = "jitter", legend="none")+ rotate_x_text(angle = 45)+ geom_hline(yintercept = mean(myeloma$DEPDC1), linetype=2)+# Add horizontal line at base mean stat_compare_means(method = "anova", label.y = 1600)+ # Add global annova p-value stat_compare_means(label = "p.signif", method = "t.test", ref.group = ".all.")# Pairwise comparison against all
It can be seen that the DEPDC1 gene is significantly expressed in the proliferation group, while in hyperdiploid and low bone disease significantly lower expression
We can also remove the non-significant mark NS by simply hide.ns=true the parameters
ggboxplot(myeloma, x="molecular_group", y="DEPDC1", color = "molecular_group", add = "jitter", legend="none")+rotate_x_text(angle = 45)+ geom_hline(yintercept = mean(myeloma$DEPDC1), linetype=2)+# Add horizontal line at base mean stat_compare_means(method = "anova", label.y = 1600)+ # Add global annova p-value stat_compare_means(label = "p.signif", method = "t.test", ref.group = ".all.", hide.ns = TRUE)# Pairwise comparison against all
Multiple grouping variables
A statistical test is performed after grouping by another variable, such as grouping by variable dose:
compare_means(len~supp, data=ToothGrowth, group.by = "dose")
#可视化p <- ggboxplot(ToothGrowth, x="supp", y="len", color = "supp", palette = "jco", add = "jitter", facet.by = "dose", short.panel.labs = FALSE)#按dose进行分面#label只绘制p-valuep+stat_compare_means(label = "p.format")
#label绘制显著性水平p+stat_compare_means(label = "p.signif", label.x = 1.5)
#将所有箱线图绘制在一个panel中p <- ggboxplot(ToothGrowth, x="dose", y="len", color = "supp", palette = "jco", add = "jitter")p+stat_compare_means(aes(group=supp))
#只显示p-valuep+stat_compare_means(aes(group=supp), label = "p.format")
#显示显著性水平p+stat_compare_means(aes(group=supp), label = "p.signif")
进行paired sample检验compare_means(len~supp, data=ToothGrowth, group.by = "dose", paired = TRUE)
#可视化p <- ggpaired(ToothGrowth, x="supp", y="len", color = "supp", palette = "jco", line.color="gray", line.size=0.4, facet.by = "dose", short.panel.labs = FALSE)#按dose分面#只显示p-valuep+stat_compare_means(label = "p.format", paired = TRUE)
Other graphical bar and line graphs (a grouping variable)
#有误差棒的条形图,实际上我以前的文章里有纯粹用ggplot2实现ggbarplot(ToothGrowth, x="dose", y="len", add = "mean_se")+ stat_compare_means()+ stat_compare_means(ref.group = "0.5", label = "p.signif", label.y = c(22, 29))
#有误差棒的线图ggline(ToothGrowth, x="dose", y="len", add = "mean_se")+stat_compare_means()+ stat_compare_means(ref.group = "0.5", label = "p.signif", label.y = c(22, 29))
Bar and Line charts (two grouping variables)
ggbarplot(ToothGrowth, x="dose", y="len", add = "mean_se", color = "supp", palette = "jco", position = position_dodge(0.8))+ stat_compare_means(aes(group=supp), label = "p.signif", label.y = 29)
ggline(ToothGrowth, x="dose", y="len", add = "mean_se", color = "supp", palette = "jco")+ stat_compare_means(aes(group=supp), label = "p.signif", label.y = c(16, 25, 29))
Sessioninfo
Sessioninfo () # # R version 3.4.0 (2017-04-21) # # platform:x86_64-w64-mingw32/x64 (64-bit) # # Running Under:windows 8.1 x64 (Build 9600) # # # Matrix products:default## # locale:## [1] lc_collate=chinese (Simplified) _china.936 # # [2] Lc_ctype=ch Inese (Simplified) _china.936 # # [3] Lc_monetary=chinese (Simplified) _china.936## [4] lc_numeric=c # # [5] LC_TIME=Chinese (Simplified) _china.936 # # # # attached base packages:## [1] stats graphics grdevices utils datasets Methods Base # # # # Othe R attached packages:## [1] survminer_0.4.0 ggpubr_0.1.3 magrittr_1.5 ggplot2_2.2.1 # # # Loaded via a namespace (and not a ttached): # # [1] rcpp_0.12.11 compiler_3.4.0 plyr_1.8.4## [4] tools_3.4.0 digest_0.6.12 evaluate_0.10 # # [7] tibble_1.3.3 G table_0.2.0 nlme_3.1-131 # [16] lattice_0.20-35 rlang_0.1.1 matrix_1.2-10 # [psych_1.7.5] ggsci_2.4 DBI_0.6-1 Cmprsk_2.2-7 yaml_2.1.14 parallel_3.4.0 # # [[] gridextra_2.2.1 dplyr_0.5.0 stringr_1.2.0 # # [] knitr_1.16 survMisc_0.5 .4 rprojroot_1.2 # # [25]grid_3.4.0 data.table_1.10.4 kmsurv_0.1-5 # # [[] r6_2.2.1 km.ci_0.5-2 survival_2.41-3 # # [] foreign_0.8-68 rmarkdown_1 .5 reshape2_1.4.2 # # [tidyr_0.6.3] purrr_0.2.2.2 splines_3.4.0 # # [PNS] backports_1.1.0 scales_0.4.1 htmltools_0.3.6 # # [assertthat_0.2.0] mnormt_1.5-5 xtable_1.8-2 # # [] colorspace_1.3-2 ggsignif_0.2.0 labeling_0.3 # # [+] stringi_1.1. 5 lazyeval_0.2.0 munsell_0.4.3 # # [broom_0.4.2] zoo_1.8-0
R Language Visual Learning notes add p-value and significance tags