Misunderstandings About Data Mining

Source: Internet
Author: User

To most people, data mining can seem a mysterious process. When enterprises without experience take on data mining projects, misconceptions often become a major obstacle to success. Correcting these misconceptions promptly is therefore an important task before a project begins.

 

Data mining is all about algorithms

When it comes to algorithms, we think of building models from historical data. Data mining algorithms are the mechanism for creating mining models and are decisive for the final output of the mining. As new data mining techniques have emerged and commercial data mining products have matured, a product usually offers several algorithms for the same business problem, and choosing the right algorithm for a specific task is very challenging.

You can use different algorithms to perform the same business task, and each algorithm generates different results. Algorithms can also be used in combination. In a data mining solution, some algorithms can be used to explore the data, while others predict specific outcomes from it. For example, you can use a clustering algorithm to identify patterns and divide the data into groups of similar records, and then use the grouping results to build a better decision-making model.
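The two-stage idea (cluster first, then decide using the groups) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the spend figures, the acceptance flags, the tiny one-dimensional k-means helper `kmeans_1d`, and the 50% targeting rule are all hypothetical and do not come from any particular data mining product.

```python
from statistics import mean

def kmeans_1d(values, k, iterations=20):
    """Very small 1-D k-means; returns (centroids, labels)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids over the sorted values.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return centroids, labels

# Hypothetical data: monthly spend per customer and whether each accepted an offer.
spend = [12, 15, 14, 80, 85, 90, 13, 88]
accepted = [0, 0, 1, 1, 1, 1, 0, 0]

# Step 1: clustering divides customers into similar groups.
centroids, labels = kmeans_1d(spend, k=2)

# Step 2: a deliberately trivial decision model built on the groups:
# target a group if its historical acceptance rate is at least 50%.
rate_by_group = {}
for c in range(2):
    outcomes = [a for a, lab in zip(accepted, labels) if lab == c]
    if outcomes:
        rate_by_group[c] = mean(outcomes)

target_groups = [c for c, rate in rate_by_group.items() if rate >= 0.5]
print(centroids, labels, target_groups)
```

In a real project the second stage would be a proper predictive model; the point here is only that the output of one algorithm becomes the input of another.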

You can also use multiple algorithms in one solution to perform different tasks. For example, you can use a regression tree algorithm to obtain financial forecasts and a rule-based algorithm to perform market basket analysis.
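As a rough illustration of what a rule-based market basket analysis measures, the sketch below computes the support and confidence of one candidate rule ("customers who buy bread also buy milk"). The baskets and item names are hypothetical, and the helper functions `support` and `confidence` are written here for illustration rather than taken from any library.

```python
# Hypothetical shopping baskets; item names are illustrative only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimated probability of the consequent given the antecedent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

# Rule under test: {bread} -> {milk}
rule_support = support({"bread", "milk"}, baskets)
rule_confidence = confidence({"bread"}, {"milk"}, baskets)
print(rule_support, rule_confidence)
```

Here three of the five baskets contain both items, and three of the four bread baskets also contain milk, so the rule has a support of 0.6 and a confidence of about 0.75.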

From this we can see that in a data mining project, once the mining objectives are clear and the characteristics of the various algorithms are understood, selecting and using algorithms correctly to obtain the expected results is the key step.

For implementing data mining projects, the industry has a recognized methodology, CRISP-DM (Cross-Industry Standard Process for Data Mining). As the name suggests, the model is defined in general terms and can be used in different industries to solve business problems.

The CRISP-DM process model consists of six steps that cover the entire data mining process: business understanding, data understanding, data preparation, modeling, model evaluation, and model deployment.

Of these six steps, the application of data mining algorithms is concentrated mainly in the modeling stage. Clearly, algorithms are not all there is to data mining. The preparation of the data used for modeling largely determines the success or failure of the data mining project.

Therefore, in a successful data mining project, 60%-80% of the time is spent on the business understanding, data understanding, and data preparation stages. Data mining projects also place particular emphasis on tying the algorithms closely to the actual business. Otherwise, the result may be "garbage in, garbage out".

 

In a data mining project, the only criterion for verifying a model is prediction accuracy.

The prediction accuracy of a model is an important indicator of whether the model is good or bad, but it is not the only one. A good data mining model needs to be evaluated from many angles before it is put into practical use, to confirm that it fully meets the business objectives. Many indicators are available for evaluating a data mining model, such as accuracy, lift, ROC curves, and gain charts.
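To make two of these indicators concrete, here is a minimal sketch that computes accuracy and lift for a small churn example. The labels, scores, 0.5 threshold, and the helper names `accuracy` and `lift_at` are all hypothetical; lift is taken here as the churn rate among the top-scored fraction of customers divided by the overall churn rate, which is one common way to define it.

```python
def accuracy(actual, predicted):
    """Share of records where the predicted class equals the actual class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def lift_at(actual, scores, fraction):
    """Positive rate in the top-scored fraction divided by the overall positive rate."""
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, int(len(ranked) * fraction))
    top_rate = sum(a for _, a in ranked[:top_n]) / top_n
    overall_rate = sum(actual) / len(actual)
    return top_rate / overall_rate

# Hypothetical churn labels (1 = churned) and model scores for ten customers.
actual = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.8, 0.1, 0.3, 0.6, 0.45, 0.05, 0.15]
predicted = [1 if s >= 0.5 else 0 for s in scores]

model_accuracy = accuracy(actual, predicted)
top20_lift = lift_at(actual, scores, 0.2)
print(model_accuracy, top20_lift)
```

The two numbers answer different questions: accuracy says how often the classification is right overall, while lift says how much better than random targeting a campaign would do if it contacted only the highest-scored customers.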

Accuracy is the most basic and simple indicator. However, for users to accept a model's results, relying solely on these evaluation indicators is not enough. We also need to explain the usability of the model results, that is, the business value that the data mining model can bring. This is, in effect, the interpretability of the data mining model. In real data mining projects, the interpretability of a model is often more important than its evaluation indicators.

When evaluating a model, consider not only the technical evaluation criteria but also the business goals and the criteria for business success. A one-sided pursuit of predictive accuracy ignores the original purpose of data mining. We do not mine to build a perfect mathematical model, but to solve practical business problems. Therefore, the interpretability and practicality of the mining results are the most fundamental criteria. For example, in the customer churn problem, a prediction model that captures more churners does not necessarily help retain more customers. The key lies in how much the prediction results help the retention marketing activities.

 

Data mining requires a data warehouse

By definition, data mining, also known as knowledge discovery in databases (KDD), is a nontrivial process of extracting valid, novel, potentially useful, and ultimately understandable patterns from large amounts of data. Simply put, data mining means extracting or "mining" knowledge from large amounts of data.

A good data source is an important guarantee for the success of data mining. Therefore, data mining requires a data mart. Data warehouses, by contrast, are generally built mainly for decision support systems, and data may lose some information that is useful for data mining during the ETL process.

This is especially true in the data validation phase. When records in a dataset are matched and duplicates are found, some records are deleted or several records are merged into one record with more complete information. For data mining, hidden information is likely to be lost in the process. Duplicate records may be of no use to a decision support system, but for data mining they may be an important source for discovering implicit patterns.
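The loss described here is easy to see on a toy example. In the sketch below, the complaint log and its customer IDs are hypothetical; removing duplicates keeps the list of distinct records that a report might need, but discards the repeat counts that a mining task could use as a signal.

```python
from collections import Counter

# Hypothetical complaint log: (customer_id, issue) rows; repeats are real events.
rows = [
    ("c1", "billing"),
    ("c2", "billing"),
    ("c1", "billing"),
    ("c1", "billing"),
    ("c3", "delivery"),
]

# Warehouse-style cleaning: keep one row per distinct record.
deduplicated = sorted(set(rows))

# Mining-oriented view: how often each record occurred.
occurrences = Counter(rows)
repeat_customers = [cust for (cust, _), n in occurrences.items() if n > 1]

print(deduplicated)
print(repeat_customers)
```

After deduplication, customer `c1` looks the same as `c2`, even though the raw data shows three billing complaints from `c1`, which is exactly the kind of pattern a churn model might rely on.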

 

Data mining should be completed by technical experts

Data mining is a business application process: it uses large amounts of data to discover patterns and applies them in the enterprise's business activities to generate commercial value. It involves multiple elements.

One very important element is high-quality data mining personnel. These include people who know the data, such as database administrators, who know exactly where the data is stored. They include people who understand the business, who can raise questions in a timely manner, help analysts turn business problems into data mining problems, understand the data mining results, and translate those results into actual business operations that create value. And they include analysts, who need to understand data mining algorithms and functions, be proficient with the relevant data mining software products, and work with business personnel to convert business problems into data mining problems and solve them.

Therefore, successful data mining projects are jointly completed by business experts and technical experts. Excellent data mining tools should help business experts to participate in data mining projects. Only by integrating business knowledge into data mining projects can the data mining results truly serve commercial applications.

 

Massive Data

In the data mining process, the original business goals are easily drowned in massive amounts of data. Throughout the project, keep the business problems to be solved clearly in view so that the project delivers its intended results. If you simply start analyzing a pile of data without a project plan, it is easy to get lost in the data and waste time.

Do not let the project be driven solely by the volume of data; stay focused on the business goals. You may not need all the data in the system, only the data relevant to the project.

 

Ensure successful implementation of Data Mining Projects

With the misconceptions corrected, how should a project be carried out? The CRISP-DM methodology mentioned above is a good approach.

Begin with the end in mind

To achieve the expected ROI at the end of the project, determine the criteria for evaluating the final results before the project starts (for example, which business assessment indicators will be used).

Set expectations

Make sure the project's investors understand that data mining is not a magic tool for solving business problems. Data mining is a way to solve business problems with the help of computer technology. As with any business problem, investors first need to pose a problem that can be solved, and then look for a solution.

For example, if you plan to segment customers for the company's marketing department, work with colleagues in that department to determine what results are ultimately expected. For example: "We have product information and demographic data, so we want to segment customers by income, age, and other attributes in order to show the product preferences of customers at different levels."

Limit the scope of the initial project

Start with a realistic goal and schedule, and move to a more complex project once you succeed. For example, rather than trying to increase the value of new customers right away, concentrate on smaller, more practical targets, such as a cross-selling project in one region or a customer retention project.

Ensure teamwork

A data mining project is a team effort. It requires business users who understand the actual problems and data, data analysts who provide analysis solutions, and database administrators who provide data access. They often come from different departments and have different interests, so it is very important to find workable ways to cooperate.

Avoid data spam

Throughout the project, keep the business problems to be solved clearly in view so that the project delivers its intended results. If you simply start analyzing a pile of data without a project plan, you will easily get lost in the data and waste time. Do not let the project be driven solely by the volume of data; stay focused on the business goals. You may not need all the data in the system, only the data relevant to the project. You may even find that the existing data cannot solve the real business problem. Even a massive amount of data does not guarantee that you have the right data for modeling. For example, using the latest information to predict customer behavior is often more accurate than using a large amount of historical data.
