Data Mining Application Example: RFM Model Analysis and Customer Segmentation

Source: Internet
Author: User

I recently helped a telecommunications company complete a data mining project in which the RFM model played a fairly representative part, so I would like to share the details of the RFM modeling approach with you. Mobile top-up (recharge) is a major form of telecom business, and customer recharge records meet the transaction-data requirements of the RFM model exactly.

According to Arthur Hughes of the Database Marketing Institute in the United States, a customer database contains three key elements that together make up the best indicators for data analysis: how recently the customer last purchased (Recency), how often the customer purchases (Frequency), and how much the customer spends (Monetary).

Two of my earlier blog posts describe the RFM idea and the IBM Modeler workflow in detail; interested readers can refer to them.

The RFM model: R (Recency) indicates how long ago the customer's last purchase was, F (Frequency) is the number of purchases the customer made in the most recent period, and M (Monetary) is the amount the customer spent in the most recent period. The raw data generally has three fields: customer ID, purchase time (in date format), and purchase amount. Data mining software processes these fields and, with weighting if desired, produces an RFM score. The score can then be used for customer segmentation, customer classification, ranking customers by value, and so on, in support of database marketing.
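The roll-up from raw transactions to one R/F/M row per customer can be sketched in plain Python. This is only an illustration of the idea, not the Modeler implementation; the function name `rfm_aggregate`, the `as_of` reference date, and the sample records are all invented for the example.

```python
from collections import defaultdict
from datetime import date


def rfm_aggregate(transactions, as_of):
    """Roll (customer_id, purchase_date, amount) rows up to one R/F/M record per customer."""
    last_date = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, purchase_date, amount in transactions:
        # Keep the most recent purchase date seen for each customer.
        if customer_id not in last_date or purchase_date > last_date[customer_id]:
            last_date[customer_id] = purchase_date
        frequency[customer_id] += 1
        monetary[customer_id] += amount
    return {
        cid: {
            "recency": (as_of - last_date[cid]).days,  # days since last purchase
            "frequency": frequency[cid],               # number of purchases
            "monetary": monetary[cid],                 # total amount
        }
        for cid in last_date
    }


# Hypothetical recharge records, invented for illustration.
transactions = [
    ("A", date(2012, 1, 3), 50.0),
    ("A", date(2012, 1, 20), 30.0),
    ("B", date(2012, 1, 10), 100.0),
]
rfm = rfm_aggregate(transactions, as_of=date(2012, 1, 31))
```

Here Recency is measured in days before the reference date, so a smaller value means a more recent customer.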

Here I again borrow the RFM customer classification chart from @Data mining and data analysis.

Software tools for this analysis: IBM SPSS Statistics 19, IBM SPSS Modeler 14.1, Tableau 7.0, Excel, and PPT

Although RFM analysis is only a small part of the project, it still has to cope with massive amounts of data, which places demands on both the memory and the hard disk capacity of the computer.

Let me first share some experience with mining and processing massive data (this refers to a personal computer platform only):

  • The data usually arrives as compressed text files that must be decompressed, and everything is measured in gigabytes, so an external hard disk with its own power supply is preferable for storage. If the customer does not tell you, you probably will not know how many records and fields there are;
  • A default installation of Modeler usually swaps data through the C drive, so reserve at least 100 GB there; otherwise reading the data will fail for lack of space;
  • Massive data processing requires patience. Waiting more than 30 minutes for a result is common, especially during sampling, merging, data restructuring, and neural network modeling. You need stamina, because an interruption at the last minute is a tragedy, haha;
  • The preparation and data preprocessing stages are usually said to take 70% of a data mining project; I would say that for a large data set they can take more than 90% of the time. One reason is the processing time itself; another is that often only one computer is available, rather than several working in parallel;
  • Large data sets behave differently from small ones; this is the experience I keep emphasizing. A lot of the work therefore relies on sampling to inspect the data and to rehearse operations, but remember: even if the sample looks normal, the full data set may still have problems. I recommend storing the data with "|" as the field delimiter;
  • The importance of the mining engineer's understanding of the industry and business insight cannot be overemphasized. Good data mining must be market-oriented, and of course IT staff and marketing staff need a good communication mechanism;
  • Data mining requires understanding the data dictionary and the meaning of the semantic layer, and effort spent on metadata management and understanding pays off. Otherwise you may, for example, finish restructuring the data, discover a problem, and have to start over, which is a tragedy;
  • Every massive data mining job is when I spend the most time on Weibo. The computer really cannot calculate as fast as I would like, so I wait for it on Weibo, haha!
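Two of the points above (not knowing the record and field counts, and a clean-looking sample hiding problems in the full file) can be checked by streaming through a "|"-delimited file once instead of loading it into memory. The sketch below uses only Python's standard library; the function names and the three-line sample are hypothetical.

```python
import csv
import io
from itertools import islice


def peek(lines, n=5, delimiter="|"):
    """Return the first n parsed rows without reading the rest of the data."""
    return list(islice(csv.reader(lines, delimiter=delimiter), n))


def count_records_and_fields(lines, delimiter="|"):
    """Stream through the data once, counting rows and how many rows have each field count."""
    rows = 0
    field_counts = {}
    for row in csv.reader(lines, delimiter=delimiter):
        rows += 1
        field_counts[len(row)] = field_counts.get(len(row), 0) + 1
    return rows, field_counts


# Hypothetical sample. In practice, pass an open file handle instead,
# for example open(path, newline="", encoding="utf-8").
sample = "1001|2012-01-03|50\n1002|2012-01-10|100\n1003|2012-01-11\n"
head = peek(io.StringIO(sample), n=2)
total, widths = count_records_and_fields(io.StringIO(sample))
```

In this sample the first two rows look fine, while the full pass reveals one row with only two fields, which is exactly the kind of problem a small sample can miss.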

The main idea in adapting traditional RFM analysis to RFM analysis for the telecom business:

The RFM model and the customer segmentation that follows are only a small part of the data mining project. Assume we have a data set of one month of customer recharge behavior (the actual data covers six months). We first build an analysis stream with IBM Modeler:

The data structure fully satisfies the requirements of RFM analysis, and a single month of records contains 30 million transactions!

We first generate R (Recency), F (Frequency), and M (Monetary) using the mining tool's RFM nodes: the RFM Aggregate node and the RFM Analysis node.

Then we use the RFM Analysis node to restructure and reorganize the base data of the RFM model.

Now we have the Recency_score, Frequency_score, Monetary_score, and Rfm_score fields of the RFM model. Each of R, F, and M is cut into five bins, and the RFM score weights the three bin scores by 100, 10, and 1, so it identifies 5 × 5 × 5 = 125 RFM cube blocks.
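The five-way cut and the 100/10/1 weighting can be illustrated with a few lines of Python. This is a simplified rank-based binning written for the example, not necessarily the exact binning rule the Modeler node applies; `quintile_scores` and the five sample customers are hypothetical, and the sketch assumes that a more recent purchase earns a higher Recency score.

```python
def quintile_scores(values, higher_is_better=True):
    """Score each value from 1 to 5 by splitting the ranked values into five equal-sized bins.

    Ties are broken by input order, which is a simplification.
    """
    n = len(values)
    # Worst value first, so that rank 0 falls into bin 1 and the best value into bin 5.
    order = sorted(range(n), key=lambda i: values[i], reverse=not higher_is_better)
    scores = [0] * n
    for rank, i in enumerate(order):
        scores[i] = rank * 5 // n + 1
    return scores


# Hypothetical customers: days since last recharge, recharge count, total amount.
recency = [2, 30, 10, 25, 5]
frequency = [8, 1, 4, 2, 6]
monetary = [150.0, 20.0, 400.0, 60.0, 300.0]

r_score = quintile_scores(recency, higher_is_better=False)  # fewer days is better
f_score = quintile_scores(frequency)
m_score = quintile_scores(monetary)

# Weights 100, 10, 1 make the three digits of the score read as R, F, M.
rfm_score = [100 * r + 10 * f + m for r, f, m in zip(r_score, f_score, m_score)]
```

A score of 553, for example, reads directly as R bin 5, F bin 5, M bin 3, which is why each score corresponds to one cell of the 125-block cube.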

The traditional RFM model is now complete, but 125 segments are too many for identifying customer characteristics and behavior, so the customer base needs a further, coarser segmentation;

In addition: the RFM model is really just a data processing method that could also be carried out with general data restructuring techniques, but the built-in RFM module is more straightforward. Conversely, the RFM module can also be used to restructure data that is not RFM data.

We can import the resulting data into Tableau for descriptive analysis (data mining software is rather weak at descriptive statistics and tabular output, haha):

We can also compare different blocks: mean analysis, block category analysis, and so on.

At this point we can see how convenient Tableau is as a visualization tool.

Next, we continue with the mining tool and cluster on the three fields R, F, and M. The cluster analysis mainly uses the Kohonen, K-means, and TwoStep algorithms:

At this point we have to consider whether to use the three variables R (Recency), F (Frequency), and M (Monetary) directly or to transform them first. Because the three fields are measured on different scales, it is best to standardize them, for example with Z-scores (in practice one can also choose linear interpolation, comparison methods, or other standardization methods). Another consideration is how to weight the three indicators, because in real marketing their importance clearly differs!
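As a small illustration of Z-score standardization, each value is replaced by its distance from the field mean in units of the field's standard deviation. The helper below is a sketch using Python's standard library; it uses the population standard deviation, which is one of several reasonable choices, and the function name is invented for the example.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant field carries no information; map it to zeros instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical monetary totals for three customers.
z = z_scores([10, 20, 30])
```

After this transformation R, F, and M are on comparable scales, so no single field dominates the distance calculation in clustering merely because of its units.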

One study notes that, regarding the weights of the RFM variables, Arthur Hughes holds that the three indicators are equally important and therefore does not weight them differently. Bob Stone, through an empirical analysis of credit card data, holds that the weights differ: frequency should receive the highest weight, recency the second highest, and monetary value the lowest;

Here we use a simple weighting: wr=2, wf=3, wm=5 (in practice the weights should be set by experts or marketers). The specific clustering method and the number of clusters require repeated testing and evaluation, and the three methods should be compared to see which is best!
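One way to combine the weights with clustering is to multiply each standardized field by its weight before running K-means. The post does not say exactly how the weights were applied in Modeler, so treat this as an assumption; the code is a bare-bones Lloyd's algorithm in plain Python, and `weighted_kmeans` and the sample points are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans(points, weights, k, iterations=50, seed=0):
    """K-means on points whose coordinates are first multiplied by per-field weights."""
    scaled = [tuple(w * x for w, x in zip(weights, p)) for p in points]
    rng = random.Random(seed)
    centers = rng.sample(scaled, k)  # initial centers: k distinct input rows
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in scaled]
        if new_labels == labels:
            break  # assignments stopped changing, so the centers are final
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(scaled, labels) if label == c]
            if members:
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centers


# Hypothetical standardized (R, F, M) values for four customers.
points = [(-1.0, -1.0, -1.0), (-0.9, -1.1, -1.0), (1.0, 1.0, 1.0), (1.1, 0.9, 1.0)]
labels, centers = weighted_kmeans(points, weights=(2, 3, 5), k=2)
```

Note that scaling a field by w multiplies its contribution to the squared distance by w squared, so the effective emphasis is stronger than the raw weights suggest; whether that is desirable is a modeling decision.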

The result of fast clustering (K-means):

And the clustering result of the Kohonen neural network algorithm:

Next we want to identify the meaning and category of each cluster. Here we can use C5.0 rules to identify the characteristics of the different clusters:

One of the feature charts from TwoStep clustering:

The Analysis node is used to judge how well the C5.0 rule model recognizes the clusters:
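The kind of check this step performs can be approximated by comparing the cluster each customer actually belongs to with the cluster the rule model predicts, then reporting the overall accuracy and a coincidence (confusion) matrix. The sketch below is a rough analogue in plain Python, not the node's actual output; the labels are invented.

```python
from collections import Counter


def evaluate(actual, predicted):
    """Return overall accuracy and a Counter of (actual, predicted) label pairs."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    matrix = Counter(zip(actual, predicted))
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / len(actual), matrix


# Hypothetical cluster labels: true cluster vs. cluster predicted by the rules.
actual = ["c1", "c1", "c2", "c2", "c3"]
predicted = ["c1", "c2", "c2", "c2", "c3"]
accuracy, matrix = evaluate(actual, predicted)
```

Off-diagonal entries such as ("c1", "c2") show which clusters the rules confuse with each other, which is often more informative than the single accuracy figure.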

The results are good. We can choose among the three clustering methods, or simply pick the clustering result that is easiest to interpret. Here we select the Kohonen clustering result and write the cluster field to the data set, so that the data can be imported into SPSS for means analysis and exported to Excel!

After outputting the results, import the data into Excel, compare each cluster's mean on the three fields R, F, and M with the overall mean of that field, and use Excel's conditional formatting to show whether each cluster mean is above or below the overall mean. Then identify customer types in combination with the RFM cube blocks: through RFM analysis, the customer base is divided into six levels, namely important customers to keep, important customers to develop, important customers to retain, generally important customers, general customers, and worthless customers (some levels may not exist);
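The comparison that Excel's conditional formatting displays can be expressed as a small table of arrows: for each cluster, mark each field up if the cluster mean is at or above the overall mean, and down otherwise. The sketch below is illustrative only; the function name, the field keys, and the four sample rows are hypothetical, and R is assumed to be days since the last recharge, so a down arrow on R means more recent customers.

```python
from statistics import mean


def trend_table(rows, fields=("R", "F", "M")):
    """Map each cluster to a string such as 'R↓ F↑ M↑' relative to the overall field means."""
    overall = {f: mean(r[f] for r in rows) for f in fields}
    table = {}
    for cluster in sorted({r["cluster"] for r in rows}):
        members = [r for r in rows if r["cluster"] == cluster]
        marks = []
        for f in fields:
            arrow = "↑" if mean(m[f] for m in members) >= overall[f] else "↓"
            marks.append(f + arrow)
        table[cluster] = " ".join(marks)
    return table


# Hypothetical customers with their cluster assignment.
rows = [
    {"cluster": 1, "R": 3, "F": 9, "M": 500},
    {"cluster": 1, "R": 5, "F": 7, "M": 400},
    {"cluster": 2, "R": 25, "F": 1, "M": 30},
    {"cluster": 2, "R": 27, "F": 3, "M": 70},
]
table = trend_table(rows)
```

Each arrow pattern can then be matched by the analyst to one of the customer levels; that mapping is a business judgment and is deliberately left out of the code.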

Another option is to weight the standardized scores of the three indicators R, F, and M for each cluster, sum them, and rank the clusters by total score to identify the customer-value level of each category;
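The ranking described here amounts to a weighted sum per cluster. A minimal sketch, assuming the cluster means are already standardized and that R has been oriented so that a higher value means a more recent customer (both are assumptions, as are the numbers and the name `rank_clusters`):

```python
def rank_clusters(cluster_means, weights):
    """Rank clusters by the weighted sum of their standardized mean scores, highest first."""
    totals = {
        cluster: sum(weights[f] * scores[f] for f in weights)
        for cluster, scores in cluster_means.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical standardized cluster means; weights follow wr=2, wf=3, wm=5.
cluster_means = {
    "c1": {"R": 1.0, "F": 0.5, "M": 1.2},
    "c2": {"R": -0.5, "F": -0.2, "M": -0.8},
    "c3": {"R": 0.1, "F": 0.9, "M": -0.1},
}
ranking = rank_clusters(cluster_means, weights={"R": 2, "F": 3, "M": 5})
```

The first entry in `ranking` is the cluster with the highest overall value score under the chosen weights; changing the weights can change the order, which is why they should come from experts or marketers.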

At this point, if we are satisfied with the RFM model analysis and the customer segmentation, we can end the analysis here! If we also have a repository of customer background information, we can use the clustering results and RFM scores as independent variables for other data mining models!

Transferred from: http://shenhaolaoshi.blog.sohu.com/201923838.html
