How to Make Data Mining Effective in Enterprise Informatization

China's banking, securities, telecommunications, and insurance industries are all talking about "data concentration", hoping to build customer relationship management and business intelligence on that foundation. A new job title, "data mining engineer", has also begun to appear in company recruiting posts.

Will data mining actually work? Some business leaders have misgivings. Data mining people use strange technical terms, and their backgrounds are hard to place: not quite computer scientists, not quite statisticians, and not marketing planners either. Their results are not easy to understand, so what do those results ultimately mean for the development of my business? Some managers with technical backgrounds may be enthusiastic about data mining, hoping it will quickly reveal new business models and new ways to make money, while managers who rely on business intuition tend to resist such a precise, quantitative approach, and the shortcomings of data mining itself leave it open to attack.

To make data mining fully effective, enterprise managers need to understand it and data mining personnel need to put in more effort. Drawing on experience from past data mining projects, the author tries to explain some commonly confusing problems.

1. Application of the results

Problem: Some data mining results are delivered as probabilities, and this is where criticism is most likely. A business executive may ask: I wanted you to predict customer churn, so why can't you tell me exactly which customers will leave next month, instead of only giving me each customer's probability of leaving? I wanted you to predict which customers will commit insurance fraud, and again you hand me each customer's probability of fraud. How am I supposed to use these probability values, and do I dare to?

Explanation: A data mining prediction model is an approximation of the real world. The customer behavior information stored in the enterprise's customer database can never be exhaustive, and the information that was not collected may be exactly what is related to churn or fraud. A model built on the available information is therefore imprecise, and what it produces is not a deterministic result but a probability. Such a result is still useful: among customers predicted to have a high probability of churning, the number who actually churn is often particularly high, so the enterprise can focus its retention and care efforts on this group, which is highly targeted and saves resources. Similarly, among customers with a higher probability of fraud, the actual fraud rate is much higher than in other customer groups, so specialized investigators can concentrate on these customers and achieve more with less effort. Saving resources means increasing returns.
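One way to act on a probability value is to turn it into an expected gain per customer. The sketch below is only an illustration, not the author's method: the customer value, the cost of a retention offer, and the share of contacted churners who can be saved (`save_rate`) are all assumed figures, and the function names are hypothetical.

```python
def expected_gain(churn_probability, customer_value, retention_cost, save_rate):
    """Expected net gain from contacting one customer.

    Assumption: a contacted customer who would have churned is retained
    with probability save_rate, which preserves customer_value.
    """
    return churn_probability * save_rate * customer_value - retention_cost


def select_customers(scored_customers, customer_value, retention_cost, save_rate):
    """Keep customers whose expected gain is positive, highest probability first."""
    chosen = [
        (customer_id, probability)
        for customer_id, probability in scored_customers
        if expected_gain(probability, customer_value, retention_cost, save_rate) > 0
    ]
    return sorted(chosen, key=lambda item: item[1], reverse=True)


# Hypothetical model output: (customer id, predicted churn probability).
scored = [("A", 0.82), ("B", 0.05), ("C", 0.40), ("D", 0.12)]

targets = select_customers(
    scored, customer_value=1000.0, retention_cost=50.0, save_rate=0.3
)
print(targets)
```

With these assumed figures the break-even probability is 50 / (0.3 × 1000), about 0.17, so only customers A and C are worth contacting. The probability does not say who will leave; it says where limited retention effort is likely to pay off.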

2. Selection of variables

Problem: Building a predictive model is an appealing idea. The target of the prediction is relatively easy to determine: if you want to predict customer churn, then "customer has churned" (a binary variable) is the target variable; if you want to predict the rise and fall of a stock, then "closing price rises" is the target variable. Deciding which variables to use as independent variables (recall the definition of a function from high school algebra) is much more troublesome. In other words, which factors are related to the target variable is often a matter of opinion, and if this question is not resolved it directly affects the performance of the predictive model. So who should decide: the business people or the data mining personnel?

Explanation: The best way is to combine the two. Long business experience gives business people a keen sense of which factors are closely related to the target variable. But experience is limited. However carefully they think, business people will omit many factors that seem unrelated but actually matter, and because the human brain has limited processing capacity, they sometimes have to ignore some factors and the complex, subtle interactions among them. This is where data mining people can play a role: statistics offers a number of mature methods that help pick the right variables for constructing a prediction model.
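As a minimal sketch of one such statistical method, candidate variables can be screened by the strength of their correlation with the target. The text does not say which methods the author used; simple correlation screening is chosen here only because it is short, and it misses nonlinear effects and interactions between variables. The variable names and data are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant variable carries no information
    return covariance / sqrt(var_x * var_y)


def rank_variables(candidates, target):
    """Sort candidate variables by absolute correlation with the target."""
    scores = {
        name: abs(pearson(values, target)) for name, values in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented example: 1 means the customer churned, 0 means the customer stayed.
churned = [1, 1, 1, 0, 0, 0, 0, 0]
candidates = {
    "complaints_last_quarter": [3, 2, 4, 0, 1, 0, 0, 1],
    "monthly_fee": [30, 50, 40, 35, 45, 50, 30, 40],
    "months_since_last_upgrade": [20, 18, 24, 3, 6, 2, 5, 4],
}

ranking = rank_variables(candidates, churned)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this invented data the monthly fee has no linear relationship with churn and ends up last, while the two behavioral variables rank high. A screening like this is a starting point for discussion with business people, not a replacement for their judgment.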

There is another common phenomenon: a data mining person selects a variable that turns out to improve the accuracy of the model but has no reasonable business explanation, and the business people then ask for it to be removed. In fact, data mining results often go beyond what we imagine, and our instinct is to reject things we cannot understand, even at the risk of damaging the model's predictive performance. This practice is harmful, for three reasons. First, being unable to explain something now does not mean it cannot be explained later (Wal-Mart's "beer and diapers" rule is said to have been explained only afterwards, through supplementary market research). Second, data mining results are not products of imagination; they rest on complex algorithms grounded in mathematical theory developed over thousands of years and proven effective countless times, so they cannot be simply dismissed. Third, if the variable has entered the predictive model and proved beneficial to its accuracy, it is a pity to remove it. Do not forget the basic principle that "practice is the sole criterion for testing truth".

3. The superstition about "lift"

Problem: After learning how predictive models are evaluated, business people often ask a data mining engineer: "What is the lift of your model? It seems that a model with a lift below 3.0 is a bad one." So how high must the lift be before the model can be accepted?

Explanation: Lift is an important indicator for a predictive model, but not the only one. There are also confusion matrices, response capture rates, ROC curves, threshold-based diagnostics, and more. Lift differs between industries, and may differ between regions within the same industry. We have tried predicting mobile phone customer churn with roughly the same independent variables: the model's lift was only 2.2 in Guangdong, reached 5.2 when applied at another time, and reached 7.0 in Hubei province. Acceptance of a model therefore cannot be based on lift alone; it should be judged by the return on investment that its predictions can generate. Even so, data mining personnel should actively try different ways of improving the model and, provided the model does not overfit, raise its prediction accuracy as far as possible, because a 1% gain in accuracy may mean millions in revenue for the business.
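For readers unfamiliar with the indicator, lift is commonly computed as the response rate among the highest-scored customers divided by the response rate in the whole population. The sketch below assumes that definition; the scores and outcomes are invented, and `lift_at` is a hypothetical helper name.

```python
def lift_at(scores, outcomes, fraction):
    """Lift in the top `fraction` of customers ranked by model score.

    scores:   predicted probabilities, one per customer
    outcomes: 1 if the event (e.g. churn) actually happened, else 0
    """
    overall_rate = sum(outcomes) / len(outcomes)
    if overall_rate == 0:
        raise ValueError("lift is undefined when no customer has the event")
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, round(len(ranked) * fraction))
    top_rate = sum(outcome for _, outcome in ranked[:top_n]) / top_n
    return top_rate / overall_rate


# Invented example: 10 customers, 2 of whom actually churned.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

print(lift_at(scores, outcomes, 0.2))  # top 20% of customers by score
print(lift_at(scores, outcomes, 1.0))  # whole population, always 1.0
```

Here the top 20% contains one churner out of two customers (a rate of 0.5) against an overall rate of 0.2, so the lift is 2.5. Because the denominator is the overall rate, the same model can show very different lift in markets with different base rates, which is one reason a fixed threshold such as 3.0 is not a sound acceptance criterion.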

4. The purpose of segmentation

Problem: Compared with traditional experience-based segmentation, customer segmentation produced by data mining can take more customer behavior attributes into account and offers richer segmentation possibilities, with each customer group showing more distinct behavioral characteristics. But what kind of segmentation result is good? How many groups should customers be divided into? If group sizes differ greatly, is that a poor segmentation?

Explanation: Predictive models have many metrics, but there is no fixed standard for measuring the performance of a customer segmentation model, because we do not know in advance which group a customer should belong to. Whether a segmentation model is good or bad is judged mainly from a business point of view. Dividing customers into hundreds of groups does give a more detailed understanding of each group, but can our account managers look after that many? Can the existing customer management system handle so many customer groups? If not, we should use fewer groups. Group sizes sometimes differ greatly. It may be that some large groups of customers really do behave very similarly, while some small groups also share common behavioral characteristics; a small group may be one with abnormal behavior, for example a group showing the characteristics of fraud. If business arrangements demand it (for example, each account manager must be responsible for roughly the same number of customers), the enterprise often requires the groups to be more even in size, and in that case the similarity of customers within the same group is compromised.

In addition, because data mining tools are so powerful, data mining personnel may become fascinated by the large number of segmentation results and lose sight of the purpose of segmentation, while business personnel may regard the segments as final and not open to adjustment. The best approach is close interaction between business people and data mining personnel: define the segments according to business requirements, and try a variety of adjustments to arrive at a reasonably suitable scheme and result. For example, if the segmentation should focus on customers' long-distance calling behavior, you can select long-distance-related factors as segmentation variables, and even multiply these variables by a weight factor to emphasize their role.
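The effect of a weight factor can be shown with a small clustering sketch. The text does not name a clustering algorithm, so a bare-bones k-means is used here purely as a stand-in; the two variables (long-distance usage and local usage, both assumed to be pre-scaled to a comparable range), the four customers, and the weight of 3.0 are all invented.

```python
from math import dist


def apply_weights(rows, weights):
    """Multiply each variable by its weight before clustering."""
    return [[value * w for value, w in zip(row, weights)] for row in rows]


def farthest_point_init(rows, k):
    """Deterministic start: first row, then rows farthest from chosen centroids."""
    centroids = [list(rows[0])]
    while len(centroids) < k:
        next_row = max(rows, key=lambda r: min(dist(r, c) for c in centroids))
        centroids.append(list(next_row))
    return centroids


def kmeans(rows, k, iterations=20):
    """Very small k-means; returns the group label of each row."""
    centroids = farthest_point_init(rows, k)
    labels = []
    for _ in range(iterations):
        # Assign every customer to the nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(row, centroids[c])) for row in rows
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [row for row, label in zip(rows, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Columns: [long-distance usage, local usage], both scaled to a similar range.
customers = [[0.0, 0.0], [0.0, 1.0], [0.5, 0.0], [0.5, 1.0]]

plain = kmeans(customers, k=2)
weighted = kmeans(apply_weights(customers, [3.0, 1.0]), k=2)
print(plain)     # groups follow local usage
print(weighted)  # groups follow long-distance usage
```

Without weights, the two groups separate customers by local usage, because that variable has the larger spread. After multiplying long-distance usage by 3.0, the same algorithm groups customers by their long-distance behavior instead, which is the adjustment the business requirement asked for.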

5. Choice of tools

Problem: Data mining tools are well known to be expensive. The expensive ones cost millions of yuan for a two-year lease, while the cheap ones can be bought for hundreds of thousands of yuan. How should one choose?

Explanation: The choice should depend on the enterprise's needs, its budget, the skill level of the users, and similar factors. If you need to build hundreds of models per year, data and model management is very complex, data mining is expected to bring large benefits, and the users have a solid theoretical foundation and practical skills, you should choose powerful, flexible, and efficient mining tools; otherwise, consider suite-style products with relatively simple features. Enterprises can also consult the mining software evaluation reports published by some consulting firms. It is worth mentioning that some free software popular abroad, such as ADE-4, Lisp-stat, and R, is gradually being recognized and used domestically as well. R is a stand-alone programming environment with numerous packages available to call; its development flexibility is nearly on a par with large commercial software such as SAS, but it places higher demands on the user.

6. Problems that "mining" cannot solve

Problem: Because enterprises have long lacked quantitative analysis, they do not sort their analytical needs by whether those needs actually fall within the scope of data mining. For example, an enterprise may ask how to optimize its network resources, how to find an optimal operating plan for an uncertain system with many random factors (logistics, a factory supply chain, a queuing system, and so on), or how to project future changes in market share and competitive advantage from the current situation. Is data mining up to these jobs?

Explanation: In the academic sense, these problems do not belong to data mining; they belong to operations research, discrete event simulation, and system dynamics simulation, respectively. These techniques are still rarely applied in China, and data mining personnel should broaden their role and extend their statistical analysis and data modeling skills to meet these new enterprise needs. For example, the telecommunications industry often talks about "marketing rehearsal": predicting the results of a marketing plan before it is implemented, so that the plan can be adjusted in advance to achieve the best outcome. This is in fact a typical competitive dynamics simulation problem. Such problems require taking time into account, considering the positive and negative feedback among factors, and building a structured model of how the factors interact; once validated, the model is used to predict real scenarios. Because the model runs on a computer, an enterprise manager can test any idea on it without risk: the effect on returns of adjusting various factors, the appropriateness of a response to a competitor, and the impact an action will have on the market environment.
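To make the idea of a "marketing rehearsal" concrete, here is a deliberately tiny simulation in the spirit of system dynamics. It is a toy under stated assumptions, not a validated model: market share grows through direct marketing and through word of mouth (a positive feedback, since more customers bring in more customers) and shrinks through churn (a negative feedback). All parameter names and values are invented.

```python
def simulate_market_share(
    initial_share, months, marketing_effect, word_of_mouth, churn_rate
):
    """Step a toy market-share model forward one month at a time.

    marketing_effect: fraction of non-customers won per month by marketing
    word_of_mouth:    extra fraction won per unit of current share
    churn_rate:       fraction of current customers lost per month
    """
    share = initial_share
    history = [share]
    for _ in range(months):
        non_customers = 1.0 - share
        gained = non_customers * (marketing_effect + word_of_mouth * share)
        lost = share * churn_rate
        # Keep the share inside its valid range.
        share = min(1.0, max(0.0, share + gained - lost))
        history.append(share)
    return history


# Rehearse two plans on the model before spending any real budget.
baseline = simulate_market_share(
    0.20, months=12, marketing_effect=0.01, word_of_mouth=0.05, churn_rate=0.03
)
campaign = simulate_market_share(
    0.20, months=12, marketing_effect=0.03, word_of_mouth=0.05, churn_rate=0.03
)

print(f"baseline share after 12 months: {baseline[-1]:.3f}")
print(f"campaign share after 12 months: {campaign[-1]:.3f}")
```

A real model would include competitor reactions and would be calibrated and verified against history before anyone relied on it. The point of the sketch is only that, once such a model exists, alternative plans can be compared by changing parameters and rerunning it.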

In short, data mining, together with other mathematical modeling methods, will play an increasingly significant role in the innovation and efficiency of our businesses. This will depend on painstaking exploration by enterprise business personnel, data mining personnel, and other analysts.
Author: Yue Aden

