Random Forest: Life Insurance Customer Information Analysis

Source: Internet
Author: User

Note: Due to professional requirements, all figures in this article have been modified and are not the real numbers. I apologize for not being able to post the source code.

Goal:

Analyze the risk characteristics of customers.

Background:

At present, the promotion analysis system used by the marketing department only analyzes the information returned from customer surveys, and it covers only four dimensions: age, gender, marital status, and income. Its prediction accuracy is low. The marketing department wants to use existing life insurance customer information to identify the key factors that affect customers' choice of insurance products, so that marketing activities can be targeted more precisely.

Modeling process:

Input: Personal information was extracted from the existing tens of millions of customer records. After cleaning, more than 100 features remained, including marital status, age, income, height and weight, occupational risk level, and residential area. The category of the product each customer holds is used as the class label: savings insurance, life insurance, term insurance, investment insurance, and so on.

Algorithm:

First, use a decision tree to make rough predictions and verify the validity of the input data; then use a random forest to output the important features.

The advantages of a decision tree are that it is intuitive, easy to implement, handles both discrete and continuous variables, and requires little change to the process when variables are added. One year of customer information was extracted from the data as the training set, and a decision tree was built to predict the category of insurance product chosen by each customer.
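The original source code is not available, so the following is only a minimal sketch of this step, assuming scikit-learn and NumPy are installed. The data is synthetic, and the feature names, the labelling rule, and the tree depth are all hypothetical choices made for illustration; none of them come from the real customer data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: the real customer records are not public.
rng = np.random.default_rng(0)
n = 2000
age = rng.integers(18, 70, size=n)        # continuous-like variable
married = rng.integers(0, 2, size=n)      # discrete variable (0 = no, 1 = yes)
income = rng.normal(50, 15, size=n)       # continuous variable
X = np.column_stack([age, married, income])

# Hypothetical labelling rule so the tree has a pattern to learn:
# 0 = savings, 1 = life, 2 = term.
y = np.where(age < 30, 0, np.where(married == 1, 1, 2))

# Hold out part of the data to measure the hit rate on unseen customers.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(X_train, y_train)

# score() returns the mean accuracy, i.e. the hit rate.
hit_rate = tree.score(X_test, y_test)
print(f"hit rate: {hit_rate:.2f}")
```

Because the synthetic labels follow a simple rule, the hit rate here is far higher than anything reported in this article; the point is only to show the shape of the workflow.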

Results Analysis:

The hit rate of the first run was only 40%, so its confusion matrix was examined.

The confusion matrix shows that the decision tree performs very poorly on the last category, to the point of having essentially no effect, and that it does not distinguish well between the third and fourth categories.

The last category is investment insurance. This indicates that the existing customer features are not enough to set investment insurance apart from the other categories, and more features need to be added.

The third and fourth categories are in fact both term insurance: one is defined by a fixed payment period, the other by the age of the insured. The two differ little in essence and can be merged.

For now, filter out the investment insurance customer information, merge the term insurance customer information, and re-run the model to produce a new confusion matrix.

The classification has improved, and the hit rate reaches 60%.

Categories 2, 3, and 4 appear to be well separated; only the first category, savings insurance, is poorly distinguished. Filtering out that part of the customer information would give a good hit rate.
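For readers unfamiliar with the two measures used here, the sketch below computes a hit rate and a confusion matrix with the Python standard library only. The labels and predictions are made up for illustration and are not the article's results; the function names are likewise hypothetical.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

def hit_rate(actual, predicted):
    """Share of customers whose predicted product matches the actual one."""
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)

def per_class_recall(matrix):
    """Diagonal count divided by row total, one value per actual class."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(matrix)]

# Made-up data for illustration only.
labels = ["savings", "life", "term", "investment"]
actual = ["savings", "savings", "life", "life",
          "term", "term", "investment", "investment"]
predicted = ["savings", "life", "life", "life",
             "term", "life", "savings", "term"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(f"{label:>10}: {row}")
print("hit rate:", hit_rate(actual, predicted))
print("recall per class:", per_class_recall(matrix))
```

In this toy data the last row has a zero on the diagonal, which is the pattern described above: a class the model never predicts correctly shows up as a row whose counts all fall off the diagonal.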
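The relabelling step described above (dropping investment insurance and merging the two term insurance variants before re-running the model) might look roughly like this. The category names and the record layout are assumptions, since the article does not give the exact labels.

```python
# Hypothetical category names; the article does not give exact labels.
MERGE = {"term_fixed_period": "term", "term_to_age": "term"}
DROP = {"investment"}

def relabel(records):
    """Drop filtered categories and merge the two term variants into one."""
    cleaned = []
    for features, product in records:
        if product in DROP:
            continue  # temporarily excluded until better features exist
        cleaned.append((features, MERGE.get(product, product)))
    return cleaned

# Made-up records: (feature dict, product category).
records = [
    ({"age": 34}, "savings"),
    ({"age": 41}, "term_fixed_period"),
    ({"age": 29}, "term_to_age"),
    ({"age": 52}, "investment"),
    ({"age": 47}, "life"),
]

cleaned = relabel(records)
remaining = sorted({product for _, product in cleaned})
print(remaining)
```

Keeping the mapping in a dictionary makes it easy to undo the merge later if new features turn out to separate the two term products.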

Beyond improving on the accuracy of a single decision tree, a more important advantage of the random forest is that it outputs the importance of each feature, which is exactly what the marketing department needs.

The final results show that over the past 10 years, the customer's marital status, age, and height and weight have contributed the most to the customer's choice of insurance products.
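As an illustration of how a random forest can rank features, here is a minimal sketch using scikit-learn's `RandomForestClassifier` and its impurity-based `feature_importances_` attribute. The article does not say which library or importance measure was actually used, so this is an assumption; the data, the feature names, and the labelling rule are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: two informative features and one pure-noise feature.
rng = np.random.default_rng(0)
n = 2000
feature_names = ["married", "age", "noise"]
married = rng.integers(0, 2, size=n)
age = rng.integers(18, 70, size=n)
noise = rng.normal(size=n)
X = np.column_stack([married, age, noise])

# Hypothetical rule: the product depends only on age and marital status.
y = np.where(age < 30, 0, np.where(married == 1, 1, 2))

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

# feature_importances_ is normalized so the values sum to 1.
ranked = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

Impurity-based importance can favor features with many distinct values, so on real data it is worth cross-checking the ranking with another measure, such as permutation importance.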

The model results are eventually presented in Tableau, for example:

The trend in each feature's contribution

The number of policies under each category of the important features
