As market competition intensifies, China Telecom faces growing pressure and rising customer churn. Statistics indicate that fixed-line and PHS disconnections this year have exceeded new account openings. In such a difficult market, the urgent task is to do everything possible to reduce customer loss. It is therefore necessary to use data mining methods to build a set of models that can predict customer churn in a timely manner.
(i) Determine the target of the customer churn model: produce a list of customers who are likely to churn. Market analysis showed that the churn rate for fixed-line and PHS services is relatively high, whereas broadband and other data services are still in a growth period and have a relatively low churn rate. We therefore limit the products covered by the forecast to fixed-line and PHS. In addition, we exclude customers whose accounts were forcibly cancelled for unpaid charges, because these customers have no value. Customers who have subscribed to a fixed-term package that has not yet expired are also excluded. This makes the target scope of the modeling more explicit.
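The scoping rules above can be sketched as a simple filter. This is a minimal illustration under stated assumptions, not the authors' implementation: the record fields (`product`, `forced_cancellation`, `package_end`), the product codes and the function name `in_scope` are all hypothetical.

```python
from datetime import date

# Hypothetical product codes for the two products covered by the forecast.
IN_SCOPE_PRODUCTS = {"fixed", "phs"}


def in_scope(service, today):
    """Return True if a service record belongs in the modeling population."""
    if service["product"] not in IN_SCOPE_PRODUCTS:
        return False  # broadband and other data services are out of scope
    if service["forced_cancellation"]:
        return False  # account forcibly cancelled for unpaid charges
    end = service["package_end"]
    if end is not None and end >= today:
        return False  # still bound by an unexpired fixed-term package
    return True


# Made-up sample records, for illustration only.
services = [
    {"product": "fixed", "forced_cancellation": False, "package_end": None},
    {"product": "broadband", "forced_cancellation": False, "package_end": None},
    {"product": "phs", "forced_cancellation": True, "package_end": None},
    {"product": "phs", "forced_cancellation": False, "package_end": date(2030, 1, 1)},
]
today = date(2024, 6, 1)
modeling_population = [s for s in services if in_scope(s, today)]
```

Only the first sample record survives the filter; the other three are dropped by the product, forced-cancellation and package rules respectively.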
(ii) Obtain data for modeling. Modeling data can be extracted from the various operational systems: customer, service, product, package and business data from the IBSS system; local call, long-distance, intelligent network and provincial data service billing data from the billing and accounting system; channel data from the CMMS system; address and resource data from the resource system; call data from the exchange system; and so on. Some data must also be obtained through market research, for example a survey of which areas other operators have wired for fixed-line service, that is, the fixed-line competition areas. The junction box data for those areas can then be tagged with a "competition area" flag.
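Tagging junction boxes with the survey result might look like the sketch below. The field names (`box_id`, `area`, `competition_area`), the area names and the function `flag_competition` are assumptions made for illustration; the source does not describe the actual schema of the resource system.

```python
def flag_competition(junction_boxes, competition_areas):
    """Return copies of the junction box records with a boolean flag added."""
    return [
        {**box, "competition_area": box["area"] in competition_areas}
        for box in junction_boxes
    ]


# Made-up resource-system records and market-research result.
junction_boxes = [
    {"box_id": "JB-001", "area": "north"},
    {"box_id": "JB-002", "area": "south"},
]
competition_areas = {"north"}  # areas where another operator has wiring

flagged = flag_competition(junction_boxes, competition_areas)
```

The original records are left unchanged, so the flag can be recomputed whenever the survey is updated.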
(iii) Clean, format and convert the data into a modeling dataset. A customer may have several fixed lines and PHS handsets, and churn here refers to the disconnection of an individual fixed line or PHS, not to the customer ceasing to use all telecommunications products. Forecasting churn at the level of the whole customer is therefore of little value. After analysis, we determined that the modeling object is the service entity, that is, an individual fixed line or PHS. Each row of the model set represents one fixed line or PHS, and the billing data occupy the columns. In addition, to keep the forecast close to reality, we use billing data for the most recent 12 months. Next, we eliminate invalid variables such as ID number, phone number, absolute dates and address data, which are not useful for modeling. The last step is to add derived variables. This requires in-depth analysis of the telecommunications business and a degree of creativity in order to produce derived variables that are meaningful for modeling. For example, from the junction box of a fixed line we derive a "whether in the competition region" variable, and from the call date we derive variables such as "days" and "whether holiday". The monthly variables can also be combined to produce their sum and variance over all months, as well as each month's share of the total. With this cleaning and transformation work done, we have a dataset for modeling.
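The derived variables described above can be sketched as follows. This is an illustration under assumptions: the function and key names are hypothetical, the population variance is used because the source does not say which variance is meant, and the "days" variable is interpreted here as the day of the week.

```python
from datetime import date
from statistics import pvariance


def derive_monthly_features(monthly_charges):
    """Combine monthly billing values into sum, variance and monthly shares."""
    total = sum(monthly_charges)
    shares = [m / total if total else 0.0 for m in monthly_charges]
    return {
        "total": total,
        "variance": pvariance(monthly_charges),  # population variance (assumed)
        "shares": shares,  # each month's fraction of the total
    }


def derive_date_features(call_date, holidays):
    """Derive calendar variables from an absolute call date."""
    return {
        "weekday": call_date.weekday(),  # 0 = Monday ... 6 = Sunday
        "is_holiday": call_date in holidays,
    }


# Made-up values: three months of charges and one public holiday.
monthly = derive_monthly_features([10.0, 20.0, 30.0])
calendar = derive_date_features(date(2024, 1, 1), {date(2024, 1, 1)})
```

Replacing the absolute date with calendar variables keeps the useful signal while dropping a field that would not generalize to future months.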
(iv) Build the models. We chose the SAS EM package as the modeling tool and the decision tree as the mining algorithm. The decision tree algorithm can handle hundreds of fields, supports exploratory analysis and is highly automated. Because fixed-line and PHS products differ considerably, a separate forecast model is needed for each. Next, we categorize the customers. By average monthly spending, they are divided into high-value and low-value customers. Two further special categories are recently opened accounts and customers who have subscribed to packages. We build a model for each of the four customer types and then merge the models.
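One way to picture the merged model is as a dispatcher that routes each service to the model for its product and customer type. The sketch below is not the SAS EM workflow; it uses stand-in models that return a constant probability, and the thresholds, field names and the order in which the categories are checked are all assumptions.

```python
PRODUCTS = ("fixed", "phs")
SEGMENTS = ("recent_account", "package", "high_value", "low_value")


def segment(service, value_threshold=100.0, recent_months=3):
    """Assign a service to one of the four customer types.

    The special categories are checked first (an assumption; the source
    does not state the precedence).
    """
    if service["months_since_opening"] < recent_months:
        return "recent_account"
    if service["has_package"]:
        return "package"
    if service["avg_monthly_spend"] >= value_threshold:
        return "high_value"
    return "low_value"


def merged_predict(service, models):
    """Route the service to the model for its product and customer type."""
    model = models[(service["product"], segment(service))]
    return model(service)


def constant_model(probability):
    """Stand-in for a trained decision tree: always returns one probability."""
    return lambda service: probability


models = {(p, s): constant_model(0.5) for p in PRODUCTS for s in SEGMENTS}
models[("phs", "high_value")] = constant_model(0.8)

example = {
    "product": "phs",
    "months_since_opening": 24,
    "has_package": False,
    "avg_monthly_spend": 150.0,
}
churn_probability = merged_predict(example, models)
```

In practice each entry in `models` would be a trained decision tree; the dispatch structure stays the same.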
(v) Model assessment. The models are also evaluated separately for the four customer categories. That is, a scoring dataset is generated for each of the four customer types and fed into the model to obtain forecast results. The results are then compared with the actual outcomes to evaluate the effectiveness of the model.
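Comparing forecasts with actual outcomes per customer type could be done along the following lines. The source does not name the evaluation metrics; precision and recall are used here as one common choice, and the function name and the row layout are hypothetical.

```python
from collections import defaultdict


def evaluate_by_segment(rows):
    """rows: iterable of (segment, predicted_churn, actual_churn) tuples."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for seg, predicted, actual in rows:
        # "t"/"f": was the forecast right; "p"/"n": what was forecast.
        key = ("t" if predicted == actual else "f") + ("p" if predicted else "n")
        counts[seg][key] += 1

    report = {}
    for seg, c in counts.items():
        flagged = c["tp"] + c["fp"]  # services forecast to churn
        churned = c["tp"] + c["fn"]  # services that actually churned
        report[seg] = {
            "precision": c["tp"] / flagged if flagged else None,
            "recall": c["tp"] / churned if churned else None,
        }
    return report


# Made-up scoring results for two of the four customer types.
rows = [
    ("high_value", True, True),
    ("high_value", True, False),
    ("high_value", False, True),
    ("high_value", False, False),
    ("low_value", True, True),
    ("low_value", False, False),
]
report = evaluate_by_segment(rows)
```

Reporting the metrics per customer type shows whether one of the four models is noticeably weaker than the others.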
(vi) Use the model's predictions to support decision making. Once the customer churn forecast model is established, we can predict the probability that a customer will churn in a timely manner. When that probability exceeds a certain threshold, we regard the customer as likely to churn and can promptly launch a targeted marketing package to retain the customer.
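The thresholding step can be sketched as below. The threshold of 0.5, the service identifiers and the function name `retention_targets` are illustrative assumptions; the source does not state the cut-off that was used.

```python
def retention_targets(scores, threshold=0.5):
    """Return (service_id, probability) pairs above the threshold,
    ordered from highest to lowest churn probability."""
    targets = [(sid, p) for sid, p in scores.items() if p > threshold]
    return sorted(targets, key=lambda t: t[1], reverse=True)


# Made-up churn probabilities produced by the forecast model.
scores = {"A": 0.9, "B": 0.2, "C": 0.6}
targets = retention_targets(scores)
```

Sorting by probability lets a retention campaign with a limited budget start with the services most at risk.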