How to customize a personalized credit scoring model in the era of internet finance

Source: Internet
Author: User

Compared with traditional financial institutions, internet finance companies win on personalized service, which has given rise to a wide variety of companies that specialize deeply in particular market segments. However, the experience accumulated in traditional risk control can no longer meet the risk control needs of these emerging internet finance companies.

How can you build, in the least time, a personalized risk control model that fits your own business characteristics? How can you use advanced big data techniques to quickly improve the efficiency of risk control and reduce its cost? Let us walk you through it today.

Risk control is becoming the "pain point" of internet finance development

After explosive growth in 2013 and 2014, the internet finance industry has gradually entered a stage of standardized development. 2016 is the year of industry supervision: as regulatory policy for the internet finance industry becomes clearer and more standardized, risk management has become the "pain point" in the industry's development. Only with good risk management can a company come out ahead in the new round of reshuffling and gain access to a broader market and higher-quality users.

Let's take a look at two examples:

Case One

Company A used to offer mortgage loans and is now moving into consumer loans. Without historical data for the new business, it has no way to build a reasonable risk management model.

Case Two

Company B designed a product whose main customer group is university students. A general-purpose risk control model is clearly not applicable to this product, but building a new model requires a great deal of manpower and material resources.

Most internet finance companies have not been operating for long and have served a limited number of customers, so the information they collect does not add up to a large volume of data. Only a few companies in the country hold mature data, and the barriers between companies are obvious, which makes the "data island" effect particularly pronounced. For a pure third-party organization, acquiring large amounts of data is expensive, and building a complete risk control system takes a great deal of manpower, material resources, and effort. To solve this problem, companies need to cooperate, actively break the "data island" pattern, and jointly build risk management models.

Create a customized big data risk control model

It is not easy for a single company to independently develop a personalized risk control model that fits its own business characteristics. The biggest difficulty is that you may lack users' historical default data, or your customer data may lack finance-related dimensions. If a data platform could supplement the financial labels and historical default data for the corresponding customer group, wouldn't that be a good way to meet the personalized risk control needs of the many internet finance companies? Meeting these needs takes at least the following six steps:

1. Data extraction

First, we pick out the variables related to the customer and generate many further derivative variables from them; together these form the variable pool for the modeling platform. Based on business knowledge, the variables are divided into seven dimensions: contract fulfillment ability, identity characteristics, behavioral characteristics, consumption preferences, dishonesty risk, growth potential, and social credit.
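To make the idea of derivative variables concrete, here is a minimal sketch. The raw field names (`monthly_income`, `monthly_debt_payment`, `late_payments_12m`, `loans_12m`) and the two derived ratios are hypothetical illustrations, not variables taken from the platform described in this article.

```python
def derive_variables(customer):
    """Return derived variables for one customer record (a dict of raw fields).

    All field names are hypothetical and used only for illustration.
    """
    income = customer["monthly_income"]
    loans = customer["loans_12m"]
    return {
        # Debt burden relative to income; None when income is missing or zero.
        "debt_to_income": customer["monthly_debt_payment"] / income if income else None,
        # Late payments per loan over the past 12 months; 0.0 when there were no loans.
        "late_ratio_12m": customer["late_payments_12m"] / loans if loans else 0.0,
    }


example = {
    "monthly_income": 8000,
    "monthly_debt_payment": 2000,
    "late_payments_12m": 1,
    "loans_12m": 4,
}
derived = derive_variables(example)
```

Each derived value would then be assigned to one of the seven dimensions; a debt-to-income ratio, for instance, would plausibly sit under contract fulfillment ability.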

2. Data supplementation from similar customer groups

Most small-loan company clients have relatively few samples, and more samples are needed to build a better model and use more variables. Using advanced sampling methods and a similar-customer-group matching algorithm, we take part of our own sample data and add it to the client's data, breaking the "data island".
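The article does not say which matching algorithm is used. As a stand-in, the sketch below uses plain nearest-neighbor matching: for each client customer it picks the closest rows from a larger pool by Euclidean distance. The function name, the feature vectors, and the assumption that features are already scaled to comparable ranges are all illustrative.

```python
import math


def match_similar(client_rows, pool_rows, k=1):
    """For each client row, select the k nearest pool rows (Euclidean distance).

    Returns the selected pool rows without duplicates, in pool order.
    """
    matched = set()
    for c in client_rows:
        ranked = sorted(range(len(pool_rows)), key=lambda i: math.dist(c, pool_rows[i]))
        matched.update(ranked[:k])
    return [pool_rows[i] for i in sorted(matched)]


# Made-up, pre-scaled feature vectors.
client = [(0.1, 0.2), (0.9, 0.8)]
pool = [(0.0, 0.25), (0.5, 0.5), (1.0, 0.75), (5.0, 5.0)]
supplement = match_similar(client, pool, k=1)
```

The rows in `supplement` are the pool customers that most resemble the client's own customers, which is the property a supplementary sample needs so that it does not distort the client's population.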

3. Missing value filling

Modeling data may be missing because it was never collected or because a customer has no records for a particular line of business. We fill the missing values of different types of variables with different methods, including median filling, functional-relationship filling, and Bayesian network filling. This greatly simplifies the subsequent modeling work.
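Of the three methods, median filling is the simplest to show. The following is a minimal sketch that assumes missing values are represented as `None`; the function name and sample values are made up for illustration.

```python
from statistics import median


def fill_with_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing observed, so there is no median to use
    m = median(observed)
    return [m if v is None else v for v in values]


filled = fill_with_median([3000, None, 5000, 4000, None])
```

The median is less sensitive to extreme values than the mean, which matters for skewed variables such as income.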

4. Feature engineering

Feature engineering consists of two main parts: generating derivative variables and filtering variables. The derivative variables are built mainly from the actual business meaning of the features, or with machine learning algorithms and principal component analysis. After the derivative variables are generated, the variables need to be filtered to improve the efficiency of the model and prevent overfitting. During model training, feature sets produced by different methods can be used, and the resulting models are eventually blended. The final set of feature variables is business-oriented and technology-supported.
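The article does not list its filtering criteria. As one common and simple possibility, the sketch below drops variables that are mostly missing or nearly constant; the thresholds, function name, and sample columns are assumptions for illustration only.

```python
from statistics import pvariance


def filter_variables(columns, max_missing=0.5, min_variance=1e-8):
    """Keep variables that are mostly observed and not (near-)constant."""
    kept = []
    for name, values in columns.items():
        observed = [v for v in values if v is not None]
        missing_rate = 1 - len(observed) / len(values)
        if missing_rate > max_missing:
            continue  # too sparse to be reliable
        if len(observed) < 2 or pvariance(observed) <= min_variance:
            continue  # constant columns cannot separate good from bad customers
        kept.append(name)
    return kept


columns = {
    "age": [25, 31, 47, 38],
    "has_id": [1, 1, 1, 1],               # constant
    "income": [None, None, None, 5000],   # 75% missing
}
selected = filter_variables(columns)
```

In practice this kind of coarse filter is usually followed by a filter on predictive power, for example ranking variables by their KS against the default label.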

5. Establish a decision model

We analyze the data with professional data processing methods, quantify the data indicators, and build models with cutting-edge big data decision-making techniques, including gradient boosted decision trees (GBDT), deep learning, and Bayesian networks. Depending on each client's needs, we work out a customized combination of models, produce the final output as a blended model, and re-optimize the model regularly to improve its predictive power.
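How the models are combined is not described in detail. The simplest form of blending is a weighted average of each model's predicted default probability, sketched below. The model names, weights, and scores are made up, and a stacked model would instead train a second-level model on these outputs.

```python
def blend_scores(model_scores, weights):
    """Weighted average of per-model default probabilities for each customer."""
    total = sum(weights.values())
    n = len(next(iter(model_scores.values())))
    return [
        sum(weights[m] * model_scores[m][i] for m in model_scores) / total
        for i in range(n)
    ]


# Made-up predictions from two models for two customers.
scores_by_model = {"logistic": [0.25, 0.75], "gbdt": [0.5, 0.5]}
blended = blend_scores(scores_by_model, {"logistic": 1, "gbdt": 3})
```

The weights would normally be chosen on held-out data rather than fixed by hand.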

6. Model tuning

For random forest and GBDT models, the optimal parameters have to be selected from a large parameter space. The parameters fall into two main types: tree-specific parameters, which affect each individual tree, and boosting parameters, which affect the ensemble algorithm as a whole. Adjusting these parameters can improve the accuracy of the model while guarding against overfitting.
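A straightforward way to search such a parameter space is an exhaustive grid search over both parameter groups. The sketch below is library-agnostic: `evaluate` is a hypothetical callback that would train a model with the given parameters and return a validation metric such as KS or AUC. The parameter names follow those commonly used by GBDT libraries and are illustrative.

```python
from itertools import product


def grid_search(evaluate, tree_grid, boosting_grid):
    """Try every parameter combination and return (best_score, best_params)."""
    grid = {**tree_grid, **boosting_grid}
    names = list(grid)
    best = None
    for combo in product(*(grid[n] for n in names)):
        params = dict(zip(names, combo))
        score = evaluate(params)
        if best is None or score > best[0]:
            best = (score, params)
    return best


def toy_evaluate(params):
    """Stand-in for a real validation metric; peaks at depth 4, learning rate 0.1."""
    return -abs(params["max_depth"] - 4) - abs(params["learning_rate"] - 0.1) * 10


tree_grid = {"max_depth": [2, 4, 6], "min_samples_leaf": [20]}         # affect each tree
boosting_grid = {"learning_rate": [0.05, 0.1], "n_estimators": [100]}  # affect the ensemble
best_score, best_params = grid_search(toy_evaluate, tree_grid, boosting_grid)
```

To guard against overfitting, the score returned by `evaluate` should come from cross-validation or a held-out set, never from the training data itself.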

Verifying the effectiveness of custom joint modeling

To verify the effectiveness of the model, we invited a small-loan company to take part in a test and used its data to measure the actual effect.

In general, the customer base is determined by the characteristics of the product. For a product aimed at short-term lending, for example, the corresponding customers tend to be higher risk: they may have a temporary cash-flow need and only want to borrow for up to one month. The small-loan company gave us the customer base for such a product; we supplemented it with the customers' historical default data and financial attribute tags and then built the model. The proportion of variables used from each dimension is as follows:

In general, different categories of data differ in how well they predict default risk. The accompanying chart shows how significant the variables of each dimension are with respect to the default label, using KS as the indicator.
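For readers unfamiliar with the indicator, KS (Kolmogorov-Smirnov) is the largest gap between the cumulative score distributions of defaulting and non-defaulting customers. The following is a minimal sketch on made-up scores and labels (1 = default); it assumes both classes are present in the sample.

```python
def ks_statistic(scores, labels):
    """Largest gap between the cumulative distributions of defaulters (1) and others (0)."""
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    pairs = sorted(zip(scores, labels))
    bad_seen = good_seen = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Consume every customer tied at this score before comparing the two CDFs.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                bad_seen += 1
            else:
                good_seen += 1
            j += 1
        best = max(best, abs(bad_seen / n_bad - good_seen / n_good))
        i = j
    return best


# Made-up model scores and default labels.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
ks = ks_statistic(scores, labels)
```

A KS of 0 means the variable or score does not separate the two groups at all, and a KS of 1 means it separates them perfectly.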

It can be seen that the contract fulfillment and identity characteristics dimensions, which have the largest number of variables, perform best in the model. This is in line with the industry consensus that personal identity information and financial information are the most important information for predicting default.

In the modeling process, we tested logistic regression, random forest, gradient boosted decision trees (GBDT), and a combined model (stacked models) in turn. The KS values of the models on the original data and on the fused data are as follows:

In terms of KS, the model built on the fused data improves by a significant 19% over the model built only on the client's own data, and on the fused data the combined model improves by 10% over the best-performing single model, GBDT. The AUC improvements from using fused data and the combined model are consistent with the KS improvements.
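AUC, the second metric mentioned, can be read as the probability that a randomly chosen defaulter receives a higher score than a randomly chosen non-defaulter. The sketch below computes it directly from that definition on made-up data; it compares every pair, so it is meant for illustration rather than for large samples.

```python
def auc(scores, labels):
    """Probability that a random defaulter outscores a random non-defaulter (ties count half)."""
    bad = [s for s, y in zip(scores, labels) if y == 1]
    good = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((b > g) + 0.5 * (b == g) for b in bad for g in good)
    return wins / (len(bad) * len(good))


# Made-up model scores and default labels (1 = default).
example_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
example_labels = [0, 0, 0, 1, 0, 1, 1, 1]
auc_value = auc(example_scores, example_labels)
```

An AUC of 0.5 corresponds to random ranking, so improvements are best judged relative to that baseline.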

In addition to an overall assessment of the customer's credit, we also calculate seven sub-dimension scores that reflect different aspects of creditworthiness, and we compute the correlations between the sub-dimension scores. In the correlation chart of the sub-dimension scores, a darker color means a higher value and a stronger linear correlation between the two variables. It can be seen that the sub-dimensions are correlated to some degree, but not strongly, so each still adds complementary value.
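The linear correlation referred to here is typically the Pearson coefficient. A minimal sketch of computing a correlation matrix across sub-dimension scores follows; the two dimension names and their scores are made up, and the code assumes no score column is constant.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(sub_scores):
    """Pairwise Pearson correlations between all sub-dimension score columns."""
    names = list(sub_scores)
    return {a: {b: pearson(sub_scores[a], sub_scores[b]) for b in names} for a in names}


# Made-up sub-dimension scores for four customers.
sub_scores = {
    "identity": [600, 650, 700, 750],
    "behavior": [610, 640, 720, 730],
}
matrix = correlation_matrix(sub_scores)
```

Values close to 1 or -1 would indicate that two sub-dimensions carry largely redundant information, while moderate values support the point that each one adds something of its own.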

Tracking verification in a real business scenario

Based on the test results above, we applied the model at a large peer-to-peer (P2P) online lending company and validated it on customers who actually defaulted and customers who repaid normally in April 2016. In the chart, the pink area represents defaulting customers and the blue area represents normal customers. The model's sensitivity still exceeds 0.21 in the real environment, which confirms that the model is highly robust.

Summary

Our tracking results show that the model above achieved a very significant improvement in a real business scenario, for two main reasons. The first is the richness of big data: by making full use of our own data, we can go a long way toward filling the gaps in the client's data. The second is the advanced modeling and analysis methodology, which ensures that the big data is fused and the predictive features the business scenario really needs are extracted, so that the credit risk of the target customer group can be predicted very accurately.

We won't keep you in suspense: the model above was built on the modeling platform launched by Qianhai Credit (Qianhai Zhengxin). If you would like to have a smart risk control expert at your side, leave us a message.

About "Chaoyang 35 place"

We are a professional big data mining team from Qianhai Credit (Qianhai Zhengxin). We are not only data scientists but also data enthusiasts. We use data mining to solve professional problems, and we also enjoy finding the fun in data from every field and bringing you fresh, reliable content in an entertaining, easy-to-understand way. We publish one original, substantive article every week, and you are welcome to leave a comment to discuss with us and other big data enthusiasts.
