Causes of and solutions to bimodal score distributions in credit scorecard model development

Source: Internet
Author: User

Text by: Zheng Shang, Liu Chaoli

Reposted from: A few letters of mutual integration

In credit scorecard model development, normality is an important criterion for checking whether the distribution of model scores is valid. A normal distribution has a single peak, but in actual modeling the credit score distribution sometimes turns out to be bimodal, that is, it has two peaks.

When a bimodal distribution appears, the assumption that the data follow one consistent pattern is broken. We need to examine the cause from different angles and adjust the model so that it accurately reflects the patterns in the business and the data and can be applied reliably.

Based on practical experience building scorecard models for dozens of internet finance companies, we summarize some of the factors that cause bimodal distributions.

This article analyzes bimodal credit score distributions from three angles: the business model, changes to the data entry system, and individual variables. It explains why each occurs and shares specific solutions through real cases, for discussion with readers.

1

Reference model: Default model (mixed model covering interest-first and equal-installment products)

Angle of explanation: Business model (or the definition of good and bad customers)

Using the project's initial definition of good and bad customers, the score distribution is as shown in Figure 1. It is bimodal.

Figure 1 Distribution histogram of credit score

Because the good/bad customer definition differed considerably from the actual business model and the proportion of bad customers was too low, we redefined good and bad customers. The new score distribution is shown in Figure 2.

Figure 2 Histogram of credit score distribution after changing good or bad customer definition

As can be seen, redefining good and bad customers resolved the bimodal score distribution.

2

Reference model: Default model (mixed model covering interest-first and equal-installment products)

Angle of explanation: Input system change

The score distributions are shown in Figures 3 and 4. Both models have a bimodal problem.

Figure 3 Credit score distribution histogram of the interest-first model

Figure 4 Credit score distribution histogram of the equal-installment model

After comparing the variables in the two scorecard models, we found that one variable common to both, whether the spouse knows about the loan, received very low scores in both models. Analyzing this variable in the raw data, we found that it has no missing values after December 27, 2014. We also found that the residence-related variables have no missing values after that date. We therefore judged that the company made a major change at that point in time. After checking with the company, we confirmed that it had indeed adjusted the application interface, the required fields, risk control, and other related items at that time. For the spouse-knows-about-the-loan variable, a "missing" option existed before the system adjustment; after the adjustment there was no "missing" option and the system defaulted to "yes".

Because the entry system changed significantly, the data before and after the change are not comparable, so we decided to build the models only on data entered after that point in time.
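The check described above, finding the date after which a field is never missing and keeping only later records, can be sketched as follows. This is a minimal illustration, not the procedure actually used in the project: the field names (`apply_date`, `spouse_knows_loan`) and the toy records are hypothetical.

```python
from datetime import date


def last_missing_date(records, field):
    """Return the latest entry date on which `field` is missing, or None."""
    missing_dates = [r["apply_date"] for r in records if r.get(field) is None]
    return max(missing_dates) if missing_dates else None


def filter_after(records, cutoff):
    """Keep only records entered strictly after `cutoff`."""
    return [r for r in records if r["apply_date"] > cutoff]


# Hypothetical toy records; None marks a missing value.
records = [
    {"apply_date": date(2014, 12, 20), "spouse_knows_loan": None},
    {"apply_date": date(2014, 12, 26), "spouse_knows_loan": "yes"},
    {"apply_date": date(2014, 12, 27), "spouse_knows_loan": None},
    {"apply_date": date(2014, 12, 28), "spouse_knows_loan": "yes"},
    {"apply_date": date(2015, 1, 5), "spouse_knows_loan": "yes"},
]

cutoff = last_missing_date(records, "spouse_knows_loan")
modeling_sample = filter_after(records, cutoff)
print(cutoff, len(modeling_sample))
```

If several fields stop having missing values on the same date, as happened here, that is a strong hint of a system change, but it still needs to be confirmed with the company.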

The new score distributions are shown in Figures 5 and 6:

Figure 5 Credit score distribution histogram of the new interest-first model

Figure 6 Credit score distribution histogram of the new equal-installment model

By identifying the change to the entry system and modeling on the data entered after the change, we brought the bimodal distribution back to a single-peak distribution.

3

Reference Model: Application Model

Angle of explanation: Variables

The credit score distribution is shown in Figure 7. It is bimodal.

Figure 7 Distribution histogram of credit score

When a bimodal distribution is found, we consider whether a single variable has such a strong effect on the prediction that it dominates the overall credit score distribution. To find this variable, we recompute the overall score while deleting one variable at a time, in descending order of the model variables' IV (information value), and observe the score distribution after each deletion. When the bank card average monthly transaction flow variable is deleted, the score distribution is as shown in Figure 8 and the second peak disappears, which confirms that the bimodal problem is caused by this variable.

Figure 8 Credit score distribution histogram after deleting the bank card average monthly transaction flow variable
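The deletion procedure can be sketched as below. This is only an illustration under stated assumptions: the IV values are taken as given, the variable names and score points are invented, and `count_peaks` is a crude histogram-based peak counter standing in for visual inspection of the histogram.

```python
from collections import Counter


def count_peaks(scores, bin_width=20, min_share=0.05):
    """Count local maxima in a fixed-width histogram of the scores.

    A bin is a peak if it holds at least `min_share` of the sample and is
    strictly higher than both neighbouring bins (a crude heuristic).
    """
    hist = Counter(int(s // bin_width) for s in scores)
    lo, hi = min(hist), max(hist)
    padded = [0] + [hist.get(b, 0) for b in range(lo, hi + 1)] + [0]
    threshold = min_share * len(scores)
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] >= threshold
        and padded[i] > padded[i - 1]
        and padded[i] > padded[i + 1]
    )


def scan_by_iv(points, iv):
    """Delete one variable at a time, highest IV first, and count peaks.

    points: one dict per applicant mapping variable name -> score points.
    iv:     dict mapping variable name -> information value.
    """
    results = []
    for var in sorted(iv, key=iv.get, reverse=True):
        totals = [sum(p for v, p in row.items() if v != var) for row in points]
        results.append((var, count_peaks(totals)))
    return results


# Hypothetical scorecard points: "monthly_flow" splits applicants into two
# far-apart groups, "age" contributes a single-peaked spread.
points = []
for flow_pts in (0, 200):
    for age_pts, n in ((40, 2), (60, 6), (80, 2)):
        points += [{"monthly_flow": flow_pts, "age": age_pts}] * n

iv = {"monthly_flow": 0.45, "age": 0.20}  # assumed IV values
full_scores = [sum(row.values()) for row in points]
full_peaks = count_peaks(full_scores)
scan = scan_by_iv(points, iv)
print(full_peaks, scan)
```

In this toy data the full score has two peaks, and only the deletion of `monthly_flow` brings the count down to one, which mirrors the diagnosis in the text.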

Because the score distribution after deleting the bank card average monthly transaction flow variable is still not a normal or approximately normal distribution, we analyze further by tabulating the score and the sample count for each bin of the variable. Under equal-frequency binning, the scores of the two bins, flow less than or equal to 39,000 yuan and flow greater than 39,000 yuan, differ greatly. We then consider whether the score of the "less than or equal to 39,000" bin is so low, and so far from the scores of the other bins, that the overall credit score fails to be normal.
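A per-bin table of this kind can be produced with a few lines of code. The sketch below assumes each applicant has already been assigned a bin label; the bin labels follow the 39,000 cut-off in the text, but the score points (10 and 75) are invented for illustration.

```python
from collections import defaultdict


def bin_table(applicant_bins, scorecard_points):
    """Count applicants per bin and attach the scorecard points of each bin."""
    counts = defaultdict(int)
    for label in applicant_bins:
        counts[label] += 1
    total = len(applicant_bins)
    return [
        {"bin": b, "n": counts[b], "share": counts[b] / total, "points": pts}
        for b, pts in scorecard_points.items()
    ]


def largest_gap(table):
    """Largest difference in points between adjacent bins (bins in order)."""
    pts = [row["points"] for row in table]
    return max(abs(b - a) for a, b in zip(pts, pts[1:]))


# Hypothetical points per bin; equal-frequency binning gives equal counts.
scorecard_points = {"<=39000": 10, ">39000": 75}
applicant_bins = ["<=39000"] * 50 + [">39000"] * 50

table = bin_table(applicant_bins, scorecard_points)
gap = largest_gap(table)
for row in table:
    print(row)
print("largest gap between adjacent bins:", gap)
```

A large gap between two bins that each hold a large share of the sample is what pushes the overall score into two separate clusters.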


Therefore, we split the sample into two groups according to whether the bank card average monthly transaction flow is greater than 39,000 or less than or equal to 39,000. The overall score distributions of the two groups are shown in Figures 9 and 10, respectively.

Figure 9 Credit score histogram for bank card average monthly transaction flow greater than 39,000

Figure 10 Credit score histogram for bank card average monthly transaction flow less than or equal to 39,000

As Figures 9 and 10 show, after splitting the sample by whether the bank card average monthly transaction flow is greater than 39,000 or not, the score distribution in each of the two data sets shows a degree of normality. The peak between 430 and 450 in Figure 10 is the cause of the small peak in the overall score distribution.
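Splitting the sample at the threshold and locating the peak of each group can be sketched as follows. The (flow, score) pairs are made up for illustration, and the histogram bins are assumed to be 20 points wide and aligned so that one bin covers 430 to 450.

```python
from collections import Counter


def split_by_threshold(rows, threshold=39000):
    """Split (flow, score) pairs into scores at or below / above the threshold."""
    low = [score for flow, score in rows if flow <= threshold]
    high = [score for flow, score in rows if flow > threshold]
    return low, high


def modal_bin(scores, bin_width=20, origin=10):
    """Return (start, end) of the histogram bin holding the most scores."""
    hist = Counter(
        origin + int((s - origin) // bin_width) * bin_width for s in scores
    )
    start, _ = max(hist.items(), key=lambda kv: kv[1])
    return (start, start + bin_width)


# Hypothetical (monthly flow in yuan, credit score) pairs.
rows = [
    (12000, 415), (20000, 435), (25000, 440), (30000, 445), (39000, 470),
    (45000, 500), (60000, 535), (80000, 540), (90000, 545), (120000, 570),
]

low_scores, high_scores = split_by_threshold(rows)
print("low-flow peak bin:", modal_bin(low_scores))
print("high-flow peak bin:", modal_bin(high_scores))
```

When the two groups each look roughly single-peaked but peak in different score ranges, the combined sample shows two peaks.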

However, repeatedly adjusting the binning of the bank card average monthly transaction flow variable still did not solve the bimodal problem. We therefore analyzed whether some unknown external factor causes the samples in the two ranges of the variable, [0, 39000] and 39000+, to differ greatly.

We first checked whether the difference was due to entry time. Verification showed that the samples in every bin of the bank card average monthly transaction flow variable are spread across the entry period from September 19, 2014 to May 15, 2015, with no significant difference.

We next considered whether the difference was caused by different loan products. Further analysis found that, within the same product category, the frequency distribution across the bins of the bank card average monthly transaction flow variable and the distribution of bad customers show no significant difference.

We concluded that the external factor producing the two peaks is neither entry time nor loan product. A possible explanation is that when the bank card average monthly transaction flow is low (less than 39,000), the business applies stricter requirements for approving a loan, and the proportion of bad customers in the sample for this range is very high, so the score for this range is very low and the overall score distribution ends up bimodal.

After adjusting the scores assigned to the bins of the bank card average monthly transaction flow variable, the credit score distribution is as shown in Figure 11.

Figure 11 Credit score histogram after adjusting the bank card average monthly transaction flow variable

Thus, by adjusting the variable's scores, we turned the bimodal distribution into a single-peak distribution.

With the industry developing rapidly, the management systems and risk control strategies of internet finance companies are constantly being updated and refined, and their internal systems are constantly being improved. Changes to the business model, system upgrades, and personnel changes can all alter the patterns implicit in the data and break the data consistency assumption. The data modeler should first confirm with the client the time of each change to the business, systems, and personnel, in order to anticipate and logically verify possible changes in the data.

Once the data is settled, the number of models must be decided. Keep the customer group and product within each model as homogeneous as possible, and avoid mixing different customer groups or different products in one model. After determining the data time window and the number of models, define good and bad customers in light of the business model and the client's needs, so that model development proceeds smoothly.

After scorecard development is complete, if the score distribution is bimodal, look for the cause from several angles: whether the client's business model has changed substantially, whether the entry system has been updated, whether the good/bad customer definition is appropriate, whether the variable binning is reasonable, and so on. Models with different business backgrounds call for different checks. We hope this article gives you an intuitive understanding of bimodal distributions. The situations met in real work vary widely, and each specific case must be approached practically, identifying the cause from the relevant business background.
