For more information, please pay attention to the public number "Big Data wind Control"
The code is encapsulated in the function Plotks_n, Pred_var is the prediction result, can be a score or probability form; Labels_var is a good or bad label, with a value of 1 or 0, 1 for a bad customer, 0 for an excellent customer, and a descending for controlling data in descending order of default probabilities, If the Pred_var is a score, then descending=0, if Pred_var is a probability form, then descending=1;n indicates that the KS value is calculated after dividing n parts after the data is sorted in descending order of risk.
The Plotks_n function returns a list of the elements in the list that are the KS maximum, the person with the maximum value of KS Hundreds of points, the KS curve object, and the KS data frame.
The code is as follows:
#################### Plotks_n ################################
Plotks_n<-function (Pred_var, Labels_var, Descending, N) {
# Pred_var is prop:descending=1
# Pred_var is score:descending=0
Library (DPLYR)
df<-data.frame (Pred=pred_var, Labels=labels_var)
if (descending==1) {
Df1<-arrange (DF, desc (Pred), labels)
}else if (descending==0) {
Df1<-arrange (DF, Pred, labels)
}
Df1$good1<-ifelse (df1$labels==0,1,0)
Df1$bad1<-ifelse (df1$labels==1,1,0)
Df1$cum_good1<-cumsum (DF1$GOOD1)
Df1$cum_bad1<-cumsum (DF1$BAD1)
Df1$rate_good1<-df1$cum_good1/sum (DF1$GOOD1)
Df1$rate_bad1<-df1$cum_bad1/sum (DF1$BAD1)
if (descending==1) {
Df2<-arrange (DF, desc (Pred), desc (labels))
}else if (descending==0) {
Df2<-arrange (DF, Pred, desc (labels))
}
Df2$good2<-ifelse (df2$labels==0,1,0)
Df2$bad2<-ifelse (df2$labels==1,1,0)
Df2$cum_good2<-cumsum (DF2$GOOD2)
Df2$cum_bad2<-cumsum (DF2$BAD2)
Df2$rate_good2<-df2$cum_good2/sum (DF2$GOOD2)
Df2$rate_bad2<-df2$cum_bad2/sum (DF2$BAD2)
rate_good<-(DF1$RATE_GOOD1+DF2$RATE_GOOD2)/2
rate_bad<-(df1$rate_bad1+df2$rate_bad2)/2
Df_ks<-data.frame (Rate_good,rate_bad)
Df_ks$ks<-df_ks$rate_bad-df_ks$rate_good
l<-Nrow (Df_ks)
if (n>l) n<-L
df_ks$tile<-1:l
qus<-quantile (1:l, probs = seq (0,1, 1/n)) [-1]
qus<-Ceiling (Qus)
df_ks<-Df_ks[df_ks$tile%in%qus,]
df_ks$tile<-df_ks$tile/l
Df_0<-data.frame (rate_good=0,rate_bad=0,ks=0,tile=0)
Df_ks<-rbind (Df_0, Df_ks)
M_ks<-max (Df_ks$ks)
Pop<-df_ks$tile[which (Df_ks$ks==m_ks)]
M_good<-df_ks$rate_good[which (Df_ks$ks==m_ks)]
M_bad<-df_ks$rate_bad[which (Df_ks$ks==m_ks)]
Library (GGPLOT2)
Plotks<-ggplot (Df_ks) +
Geom_line (Aes (Tile,rate_bad), colour= "Red2", size=1.2) +
Geom_line (Aes (Tile,rate_good), colour= "Blue3", size=1.2) +
Geom_line (Aes (TILE,KS), colour= "Forestgreen", size=1.2) +
Geom_vline (xintercept=pop,linetype=2,colour= "Gray", size=0.6) +
Geom_hline (yintercept=m_ks,linetype=2,colour= "Forestgreen", size=0.6) +
Geom_hline (yintercept=m_good,linetype=2,colour= "Blue3", size=0.6) +
Geom_hline (yintercept=m_bad,linetype=2,colour= "Red2", size=0.6) +
Annotate ("text", x = 0.5, y = 1.05, Label=paste ("ks=", Round (M_ks, 4), "at Pop=", round (Pop, 4)), size=4, alpha=0.8) +
Scale_x_continuous (Breaks=seq (0,1,.2)) +
Scale_y_continuous (Breaks=seq (0,1,.2)) +
Xlab ("of total Population") +
Ylab ("of total bad/good") +
Ggtitle (label= "Ks-chart") +
THEME_BW () +
Theme
Plot.title=element_text (colour= "gray24", size=12,face= "bold"),
Plot.background = Element_rect (fill = "gray90"),
Axis.title=element_text (size=10),
Axis.text=element_text (colour= "gray35")
)
Result<-list (M_ks=m_ks,pop=pop,plotks=plotks,df_ks=df_ks)
Return (Result)
}
######################### over #######################################
Next, take the actual data as an example to see the result of the function's operation.
Pred_train is the predicted result of modeling, here is the probability form:
> Pred_train
[1] 0.40418112 0.35814193 0.45220572 0.53482002 0.12923573 ...
Labels_train is good or bad tag:
> Labels_train
[1] 0 0 0 0 0 ...
The results of the function run are stored in the Train_ks:
Train_ks<-plotks_n (Pred_train, Labels_train, 1, 100)
Let's look at each element in Train_ks:
1, KS maximum value
> Train_ks$m_ks
[1] 0.4492765
2, KS to take the maximum number of people hundreds of points
> Train_ks$pop
[1] 0.3803191
3. KS Curve Object
R language to draw KS curve; R Language Implementation KS curve