R language to draw KS curve; R Language Implementation KS curve

Last Update:2018-05-23 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

For more information, please pay attention to the public number "Big Data wind Control"

The code is encapsulated in the function Plotks_n, Pred_var is the prediction result, can be a score or probability form; Labels_var is a good or bad label, with a value of 1 or 0, 1 for a bad customer, 0 for an excellent customer, and a descending for controlling data in descending order of default probabilities, If the Pred_var is a score, then descending=0, if Pred_var is a probability form, then descending=1;n indicates that the KS value is calculated after dividing n parts after the data is sorted in descending order of risk.

The Plotks_n function returns a list of the elements in the list that are the KS maximum, the person with the maximum value of KS Hundreds of points, the KS curve object, and the KS data frame.

The code is as follows:

#################### Plotks_n ################################
Plotks_n<-function (Pred_var, Labels_var, Descending, N) {
# Pred_var is prop:descending=1
# Pred_var is score:descending=0
Library (DPLYR)

df<-data.frame (Pred=pred_var, Labels=labels_var)

if (descending==1) {
Df1<-arrange (DF, desc (Pred), labels)
}else if (descending==0) {
Df1<-arrange (DF, Pred, labels)
}

Df1$good1<-ifelse (df1$labels==0,1,0)
Df1$bad1<-ifelse (df1$labels==1,1,0)
Df1$cum_good1<-cumsum (DF1$GOOD1)
Df1$cum_bad1<-cumsum (DF1$BAD1)
Df1$rate_good1<-df1$cum_good1/sum (DF1$GOOD1)
Df1$rate_bad1<-df1$cum_bad1/sum (DF1$BAD1)

if (descending==1) {
Df2<-arrange (DF, desc (Pred), desc (labels))
}else if (descending==0) {
Df2<-arrange (DF, Pred, desc (labels))
}

Df2$good2<-ifelse (df2$labels==0,1,0)
Df2$bad2<-ifelse (df2$labels==1,1,0)
Df2$cum_good2<-cumsum (DF2$GOOD2)
Df2$cum_bad2<-cumsum (DF2$BAD2)
Df2$rate_good2<-df2$cum_good2/sum (DF2$GOOD2)
Df2$rate_bad2<-df2$cum_bad2/sum (DF2$BAD2)

rate_good<-(DF1$RATE_GOOD1+DF2$RATE_GOOD2)/2
rate_bad<-(df1$rate_bad1+df2$rate_bad2)/2
Df_ks<-data.frame (Rate_good,rate_bad)

Df_ks$ks<-df_ks$rate_bad-df_ks$rate_good

l<-Nrow (Df_ks)
if (n>l) n<-L
df_ks$tile<-1:l
qus<-quantile (1:l, probs = seq (0,1, 1/n)) [-1]
qus<-Ceiling (Qus)
df_ks<-Df_ks[df_ks$tile%in%qus,]
df_ks$tile<-df_ks$tile/l
Df_0<-data.frame (rate_good=0,rate_bad=0,ks=0,tile=0)
Df_ks<-rbind (Df_0, Df_ks)

M_ks<-max (Df_ks$ks)
Pop<-df_ks$tile[which (Df_ks$ks==m_ks)]
M_good<-df_ks$rate_good[which (Df_ks$ks==m_ks)]
M_bad<-df_ks$rate_bad[which (Df_ks$ks==m_ks)]

Library (GGPLOT2)
Plotks<-ggplot (Df_ks) +
Geom_line (Aes (Tile,rate_bad), colour= "Red2", size=1.2) +
Geom_line (Aes (Tile,rate_good), colour= "Blue3", size=1.2) +
Geom_line (Aes (TILE,KS), colour= "Forestgreen", size=1.2) +

Geom_vline (xintercept=pop,linetype=2,colour= "Gray", size=0.6) +
Geom_hline (yintercept=m_ks,linetype=2,colour= "Forestgreen", size=0.6) +
Geom_hline (yintercept=m_good,linetype=2,colour= "Blue3", size=0.6) +
Geom_hline (yintercept=m_bad,linetype=2,colour= "Red2", size=0.6) +

Annotate ("text", x = 0.5, y = 1.05, Label=paste ("ks=", Round (M_ks, 4), "at Pop=", round (Pop, 4)), size=4, alpha=0.8) +

Scale_x_continuous (Breaks=seq (0,1,.2)) +
Scale_y_continuous (Breaks=seq (0,1,.2)) +

Xlab ("of total Population") +
Ylab ("of total bad/good") +

Ggtitle (label= "Ks-chart") +

THEME_BW () +

Theme
Plot.title=element_text (colour= "gray24", size=12,face= "bold"),
Plot.background = Element_rect (fill = "gray90"),
Axis.title=element_text (size=10),
Axis.text=element_text (colour= "gray35")
)

Result<-list (M_ks=m_ks,pop=pop,plotks=plotks,df_ks=df_ks)
Return (Result)
}
######################### over #######################################

Next, take the actual data as an example to see the result of the function's operation.

Pred_train is the predicted result of modeling, here is the probability form:

> Pred_train

[1] 0.40418112 0.35814193 0.45220572 0.53482002 0.12923573 ...

Labels_train is good or bad tag:

> Labels_train

[1] 0 0 0 0 0 ...

The results of the function run are stored in the Train_ks:

Train_ks<-plotks_n (Pred_train, Labels_train, 1, 100)

Let's look at each element in Train_ks:

1, KS maximum value

> Train_ks$m_ks

[1] 0.4492765

2, KS to take the maximum number of people hundreds of points

> Train_ks$pop

[1] 0.3803191

3. KS Curve Object

R language to draw KS curve; R Language Implementation KS curve

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

R language to draw KS curve; R Language Implementation KS curve

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

R language to draw KS curve; R Language Implementation KS curve

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support