CRISP-DM: The Basis of the Data Mining Standard Specification

Source: Internet
Author: User

I. Preface

Every time we talk about data mining, some people immediately bring up ETL, algorithms, and mathematical models, which is a headache for those of us who do engineering implementation. In fact, in data mining, algorithms are only a means and a tool. We are not creating algorithms (except for research work abroad); we are only using them. In other words, we are the engineering practitioners of algorithms. Data mining is nothing new, and big data mining is not an isolated concept: in essence it still uses traditional data mining methods, and only the implementation tools have changed. Here I introduce the CRISP-DM data mining standard process model, released nearly 20 years ago, to share with everyone. I hope some readers find it useful.

II. Framework

III. Details

3.1 Business Understanding

The initial phase focuses on understanding the project objectives and requirements from a business perspective, and then translating this knowledge into a data mining problem definition and a preliminary plan for achieving the objectives.

3.2 Data Understanding

The data understanding phase starts with initial data collection. It continues with activities whose goal is to become familiar with the data, identify data quality problems, gain first insights into the data, or detect interesting subsets that suggest hypotheses about hidden information.
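
As a rough illustration of "becoming familiar with the data", the sketch below profiles a handful of records by counting missing values and distinct values per column. The records, the column names, and the `profile` helper are all hypothetical and exist only for this example; a real project would more likely use a dedicated data analysis tool.

```python
from collections import Counter

# Hypothetical sample records; None marks a missing value.
records = [
    {"age": 34, "city": "Hangzhou", "churned": 0},
    {"age": None, "city": "Beijing", "churned": 1},
    {"age": 51, "city": "Hangzhou", "churned": 0},
    {"age": 29, "city": None, "churned": 1},
]


def profile(rows):
    """Return per-column missing counts, distinct counts, and the top value."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        report[col] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "most_common": Counter(present).most_common(1),
        }
    return report


summary = profile(records)
for col, stats in summary.items():
    print(col, stats)
```

Even a summary this small surfaces the kind of quality problems the phase is meant to find, such as columns with missing entries.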

3.3 Data Preparation

The data preparation phase covers all activities needed to construct the final dataset from the unprocessed raw data. This dataset is the input to the modeling tools. Tasks in this phase are likely to be performed multiple times and in no prescribed order. They include selecting tables, records, and attributes, as well as transforming and cleaning the data for the modeling tools.
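
The tasks named above (selecting attributes, cleaning records, and converting values) can be sketched in a few lines. This is only a minimal example under stated assumptions: the raw rows, the attribute names, and the helpers `prepare` and `min_max_scale` are hypothetical, and min-max scaling is just one possible transformation.

```python
# Hypothetical raw rows as they might arrive from a source table (all strings).
raw = [
    {"id": 1, "age": "34", "income": "5200", "note": "vip"},
    {"id": 2, "age": "", "income": "4100", "note": ""},
    {"id": 3, "age": "51", "income": "9800", "note": ""},
]


def prepare(rows, attributes):
    """Select attributes, drop incomplete records, convert values to float."""
    selected = [{a: row[a] for a in attributes} for row in rows]
    complete = [r for r in selected if all(r[a] != "" for a in attributes)]
    return [{a: float(r[a]) for a in attributes} for r in complete]


def min_max_scale(rows, attribute):
    """Rescale one numeric attribute to the range [0, 1]."""
    values = [r[attribute] for r in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    return [{**r, attribute: (r[attribute] - lo) / span} for r in rows]


dataset = min_max_scale(prepare(raw, ["age", "income"]), "income")
print(dataset)
```

Because each step is a small function, the steps can be rerun or reordered, which matches the iterative nature of this phase.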

3.4 Modeling

In this phase, various modeling techniques are selected and applied, and their parameters are tuned to optimal values. Typically, several techniques can solve the same type of data mining problem. Some techniques have specific requirements on the form of the data, so it is often necessary to step back to the data preparation phase.
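
To make "tuning parameters to optimal values" concrete, here is a minimal sketch that tries several values of k for a hand-written k-nearest-neighbour classifier and keeps the one with the best validation accuracy. The technique, the data points, and every name here are assumptions chosen for illustration; CRISP-DM itself does not prescribe any particular algorithm.

```python
from collections import Counter

# Hypothetical 1-D training data as (feature, label); (2.2, 1) is a noisy point.
train = [(1.0, 0), (1.5, 0), (2.0, 0), (2.2, 1), (6.0, 1), (6.5, 1), (7.0, 1)]
# Hypothetical held-out validation data used only to compare parameter values.
validation = [(1.2, 0), (2.5, 0), (5.5, 1), (6.8, 1)]


def knn_predict(points, x, k):
    """Predict a label by majority vote among the k nearest training points."""
    nearest = sorted(points, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(points, held_out, k):
    """Fraction of held-out examples classified correctly for a given k."""
    hits = sum(1 for x, y in held_out if knn_predict(points, x, k) == y)
    return hits / len(held_out)


candidates = (1, 3, 5)
scores = {k: accuracy(train, validation, k) for k in candidates}
best_k = max(candidates, key=lambda k: scores[k])  # first best on ties
print(scores, best_k)
```

With this toy data, k = 1 is misled by the noisy point while larger values are not, which is the kind of difference parameter tuning is meant to expose.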

3.5 Evaluation

By this stage you have built a model that appears to be of high quality from a data analysis perspective. Before the final deployment of the model, it is important to evaluate it thoroughly and to review the steps used to construct it, to make sure that it achieves the business objectives. A key purpose of this phase is to determine whether any important business issue has not been sufficiently considered. At the end of this phase, a decision on how to use the data mining results should be reached.
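
One way to tie evaluation back to a business objective is to compute standard metrics and compare them with an agreed threshold. The sketch below assumes a binary classifier, made-up labels and predictions, and a hypothetical business requirement of at least 70% recall; none of these come from the CRISP-DM specification.

```python
def evaluate(actual, predicted):
    """Compute confusion-matrix counts plus precision and recall (label 1 = positive)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical ground truth and model output on a held-out test set.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

REQUIRED_RECALL = 0.70  # hypothetical threshold agreed with the business side

metrics = evaluate(actual, predicted)
meets_objective = metrics["recall"] >= REQUIRED_RECALL
print(metrics, meets_objective)
```

The numeric check covers only part of the phase; reviewing whether the right business question was modeled in the first place remains a manual step.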

3.6 Deployment

Generally, creating a model is not the end of a project. Even if the purpose of the model is to extract knowledge from the data, the knowledge gained still needs to be organized and presented in a form the user can work with. Depending on the requirements, this phase can be as simple as generating a report or as complex as implementing a repeatable data mining process. In many cases, it is the customer rather than the data analyst who carries out the deployment.
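
The "simple report" end of the deployment spectrum might look like the sketch below, which serializes model parameters and metrics to JSON so they can be handed over, then renders them as plain text. The functions `export_model` and `render_report`, and the parameter and metric values passed to them, are hypothetical placeholders.

```python
import json


def export_model(params, metrics):
    """Serialize model parameters and evaluation metrics to a JSON string."""
    return json.dumps({"params": params, "metrics": metrics}, sort_keys=True)


def render_report(serialized):
    """Turn the serialized model description into a short plain-text report."""
    model = json.loads(serialized)
    lines = ["Model report"]
    for section in ("params", "metrics"):
        lines.append(section + ":")
        for name, value in sorted(model[section].items()):
            lines.append("  {} = {}".format(name, value))
    return "\n".join(lines)


# Hypothetical values standing in for the outcome of earlier phases.
artifact = export_model({"k": 3}, {"recall": 0.75})
report = render_report(artifact)
print(report)
```

Keeping the exported artifact separate from the report means the customer can regenerate the report, or load the same parameters into their own systems, without involving the analyst.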

IV. Summary

From the phases described above, we can see that implementing the algorithm is only one part of data mining. To carry out data mining and achieve its goals, there is still much more that we need to do.

