R language to draw KS curve; R Language Implementation KS curve

Source: Internet
Author: User

For more information, please pay attention to the public number "Big Data wind Control"

The code is encapsulated in the function Plotks_n, Pred_var is the prediction result, can be a score or probability form; Labels_var is a good or bad label, with a value of 1 or 0, 1 for a bad customer, 0 for an excellent customer, and a descending for controlling data in descending order of default probabilities, If the Pred_var is a score, then descending=0, if Pred_var is a probability form, then descending=1;n indicates that the KS value is calculated after dividing n parts after the data is sorted in descending order of risk.

The Plotks_n function returns a list of the elements in the list that are the KS maximum, the person with the maximum value of KS Hundreds of points, the KS curve object, and the KS data frame.

The code is as follows:

#################### Plotks_n ################################
Plotks_n<-function (Pred_var, Labels_var, Descending, N) {
# Pred_var is prop:descending=1
# Pred_var is score:descending=0
Library (DPLYR)

df<-data.frame (Pred=pred_var, Labels=labels_var)

if (descending==1) {
Df1<-arrange (DF, desc (Pred), labels)
}else if (descending==0) {
Df1<-arrange (DF, Pred, labels)
}

Df1$good1<-ifelse (df1$labels==0,1,0)
Df1$bad1<-ifelse (df1$labels==1,1,0)
Df1$cum_good1<-cumsum (DF1$GOOD1)
Df1$cum_bad1<-cumsum (DF1$BAD1)
Df1$rate_good1<-df1$cum_good1/sum (DF1$GOOD1)
Df1$rate_bad1<-df1$cum_bad1/sum (DF1$BAD1)

if (descending==1) {
Df2<-arrange (DF, desc (Pred), desc (labels))
}else if (descending==0) {
Df2<-arrange (DF, Pred, desc (labels))
}

Df2$good2<-ifelse (df2$labels==0,1,0)
Df2$bad2<-ifelse (df2$labels==1,1,0)
Df2$cum_good2<-cumsum (DF2$GOOD2)
Df2$cum_bad2<-cumsum (DF2$BAD2)
Df2$rate_good2<-df2$cum_good2/sum (DF2$GOOD2)
Df2$rate_bad2<-df2$cum_bad2/sum (DF2$BAD2)

rate_good<-(DF1$RATE_GOOD1+DF2$RATE_GOOD2)/2
rate_bad<-(df1$rate_bad1+df2$rate_bad2)/2
Df_ks<-data.frame (Rate_good,rate_bad)

Df_ks$ks<-df_ks$rate_bad-df_ks$rate_good

l<-Nrow (Df_ks)
if (n>l) n<-L
df_ks$tile<-1:l
qus<-quantile (1:l, probs = seq (0,1, 1/n)) [-1]
qus<-Ceiling (Qus)
df_ks<-Df_ks[df_ks$tile%in%qus,]
df_ks$tile<-df_ks$tile/l
Df_0<-data.frame (rate_good=0,rate_bad=0,ks=0,tile=0)
Df_ks<-rbind (Df_0, Df_ks)

M_ks<-max (Df_ks$ks)
Pop<-df_ks$tile[which (Df_ks$ks==m_ks)]
M_good<-df_ks$rate_good[which (Df_ks$ks==m_ks)]
M_bad<-df_ks$rate_bad[which (Df_ks$ks==m_ks)]

Library (GGPLOT2)
Plotks<-ggplot (Df_ks) +
Geom_line (Aes (Tile,rate_bad), colour= "Red2", size=1.2) +
Geom_line (Aes (Tile,rate_good), colour= "Blue3", size=1.2) +
Geom_line (Aes (TILE,KS), colour= "Forestgreen", size=1.2) +

Geom_vline (xintercept=pop,linetype=2,colour= "Gray", size=0.6) +
Geom_hline (yintercept=m_ks,linetype=2,colour= "Forestgreen", size=0.6) +
Geom_hline (yintercept=m_good,linetype=2,colour= "Blue3", size=0.6) +
Geom_hline (yintercept=m_bad,linetype=2,colour= "Red2", size=0.6) +

Annotate ("text", x = 0.5, y = 1.05, Label=paste ("ks=", Round (M_ks, 4), "at Pop=", round (Pop, 4)), size=4, alpha=0.8) +

Scale_x_continuous (Breaks=seq (0,1,.2)) +
Scale_y_continuous (Breaks=seq (0,1,.2)) +

Xlab ("of total Population") +
Ylab ("of total bad/good") +

Ggtitle (label= "Ks-chart") +

THEME_BW () +

Theme
Plot.title=element_text (colour= "gray24", size=12,face= "bold"),
Plot.background = Element_rect (fill = "gray90"),
Axis.title=element_text (size=10),
Axis.text=element_text (colour= "gray35")
)

Result<-list (M_ks=m_ks,pop=pop,plotks=plotks,df_ks=df_ks)
Return (Result)
}
######################### over #######################################

Next, take the actual data as an example to see the result of the function's operation.

Pred_train is the predicted result of modeling, here is the probability form:

> Pred_train

[1] 0.40418112 0.35814193 0.45220572 0.53482002 0.12923573 ...

Labels_train is good or bad tag:

> Labels_train

[1] 0 0 0 0 0 ...

The results of the function run are stored in the Train_ks:

Train_ks<-plotks_n (Pred_train, Labels_train, 1, 100)

Let's look at each element in Train_ks:

1, KS maximum value

> Train_ks$m_ks

[1] 0.4492765

2, KS to take the maximum number of people hundreds of points

> Train_ks$pop

[1] 0.3803191

3. KS Curve Object

R language to draw KS curve; R Language Implementation KS curve

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.