General steps of Data Mining



From the perspective of the data itself, data mining usually involves eight steps: information collection, data integration, data reduction, data cleaning, data transformation, the data mining process, model evaluation, and knowledge representation.

Step (1) Information Collection: based on the chosen analysis object, identify the feature information that the analysis requires, select an appropriate collection method, and save the collected information to a database. For massive data, it is vital to choose a suitable data warehouse for storage and management.
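As a minimal sketch of this step, the following Python snippet keeps only the features needed for the analysis and saves them to a database. The raw records, the `customers` table, and the chosen feature names are hypothetical, and an in-memory SQLite database merely stands in for whatever storage system or data warehouse is actually used.

```python
import sqlite3

# Hypothetical raw records as they might arrive from a collection source.
raw_records = [
    {"id": 1, "name": "Ann", "age": 34, "city": "Hangzhou", "browser": "Firefox"},
    {"id": 2, "name": "Bob", "age": 51, "city": "Beijing", "browser": "Chrome"},
    {"id": 3, "name": "Eve", "age": 27, "city": "Hangzhou", "browser": "Safari"},
]

# Features assumed to matter for the analysis object; the rest are dropped.
FEATURES = ("id", "age", "city")


def collect(records, features=FEATURES):
    """Keep only the selected features of each record, as tuples."""
    return [tuple(rec[f] for f in features) for rec in records]


def save(rows, conn):
    """Store the collected rows in a table named `customers`."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(id INTEGER PRIMARY KEY, age INTEGER, city TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers (id, age, city) VALUES (?, ?, ?)", rows
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the real database
save(collect(raw_records), conn)
stored = conn.execute(
    "SELECT id, age, city FROM customers ORDER BY id"
).fetchall()
print(stored)
```

Selecting the features before storage keeps the database limited to what the analysis needs; which features qualify depends entirely on the analysis object.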

Step (2) Data Integration: logically or physically consolidate data from different sources, in different formats, and with different characteristics, to provide comprehensive data sharing across the enterprise.
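A small sketch of integration, assuming two hypothetical sources: a CSV export from an order system and a list of records from a CRM that uses a different key name and key type. The field names and the join on a customer key are illustrative assumptions, not part of the original text.

```python
import csv
import io

# Source A: hypothetical CSV export from an order system.
orders_csv = """customer_id,amount
1,120.50
2,80.00
1,35.25
"""

# Source B: hypothetical CRM records with a different key name and key type.
crm_records = [
    {"CustomerID": "1", "Region": "East"},
    {"CustomerID": "2", "Region": "North"},
]


def integrate(orders_text, crm):
    """Join both sources on the customer key into one unified record list."""
    region_by_id = {int(r["CustomerID"]): r["Region"] for r in crm}
    unified = []
    for row in csv.DictReader(io.StringIO(orders_text)):
        cid = int(row["customer_id"])
        unified.append({
            "customer_id": cid,
            "amount": float(row["amount"]),
            "region": region_by_id.get(cid),  # None if the CRM has no match
        })
    return unified


integrated = integrate(orders_csv, crm_records)
for rec in integrated:
    print(rec)
```

The point of the sketch is that differences in format (CSV text versus in-memory records) and in naming are resolved once, so later steps see a single consistent record layout.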

Step (3) Data Reduction: most data mining algorithms take a long time to run even on a small amount of data, and the data involved in mining business operations is usually very large. Data reduction techniques produce a reduced representation of the dataset that is much smaller yet closely preserves the integrity of the original data, so that mining the reduced data yields the same, or almost the same, results as mining the original.
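One way to obtain a smaller representation is aggregation. The sketch below, which uses hypothetical daily sales records, collapses them into one total per month; it assumes the later analysis only needs monthly figures, in which case the smaller dataset gives the same answers as the original.

```python
from collections import defaultdict

# Hypothetical daily sales records: (month, amount).
daily_sales = [
    ("2023-01", 100), ("2023-01", 150), ("2023-01", 50),
    ("2023-02", 200), ("2023-02", 100),
    ("2023-03", 250),
]


def reduce_by_month(records):
    """Aggregate daily records into one total per month."""
    totals = defaultdict(int)
    for month, amount in records:
        totals[month] += amount
    return dict(totals)


monthly = reduce_by_month(daily_sales)
print(len(daily_sales), "records reduced to", len(monthly))
print(monthly)
```

Aggregation is only one option; sampling or dropping irrelevant attributes serve the same goal, and which one is appropriate depends on what the mining step needs to preserve.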

Step (4) Data Cleaning: some data in the database is incomplete (attributes of interest lack values), noisy (attribute values are incorrect), or inconsistent (the same information is represented in different ways). The data therefore needs to be cleaned so that complete, correct, and consistent data is stored in the data warehouse; otherwise the mining results will be unsatisfactory.
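The three kinds of problem can be illustrated with a short sketch. The records, the valid age range, and the policy of replacing bad values with the mean of the valid ones are all assumptions made for illustration; real cleaning rules depend on the data and the business context.

```python
# Hypothetical raw records showing the three kinds of problem.
raw = [
    {"name": "Ann", "gender": "F", "age": 34},
    {"name": "Bob", "gender": "male", "age": None},   # incomplete: missing age
    {"name": "Eve", "gender": "Female", "age": 290},  # noise: impossible age
    {"name": "Dan", "gender": "M", "age": 46},
]

# Assumed mapping of inconsistent spellings to a single representation.
GENDER_MAP = {"f": "F", "female": "F", "m": "M", "male": "M"}


def clean(records, min_age=0, max_age=120):
    """Return cleaned copies of the records.

    Gender codes are unified, and missing or out-of-range ages are
    replaced by the mean of the valid ages.
    """
    valid_ages = [
        r["age"] for r in records
        if r["age"] is not None and min_age <= r["age"] <= max_age
    ]
    mean_age = sum(valid_ages) / len(valid_ages)
    cleaned = []
    for r in records:
        age = r["age"]
        if age is None or not (min_age <= age <= max_age):
            age = mean_age
        cleaned.append({
            "name": r["name"],
            "gender": GENDER_MAP[r["gender"].strip().lower()],
            "age": age,
        })
    return cleaned


cleaned = clean(raw)
for rec in cleaned:
    print(rec)
```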

Step (5) Data Transformation: convert the data into a form suitable for mining by means of smoothing, aggregation, generalization, and normalization. For real-valued data, converting values through concept hierarchies and discretization is also an important step.
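Two of these transformations can be sketched briefly: min-max normalization, which rescales values to the range [0, 1], and discretization of a numeric attribute into higher-level concepts. The cut points for the age groups are assumed for the example.

```python
def min_max_normalize(values):
    """Scale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal; map them to 0.0 to avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize_age(age):
    """Map a numeric age to a higher-level concept (assumed cut points)."""
    if age < 30:
        return "young"
    if age < 60:
        return "middle-aged"
    return "senior"


ages = [20, 35, 50, 80]
normalized = min_max_normalize(ages)
labels = [discretize_age(a) for a in ages]
print(normalized)  # [0.0, 0.25, 0.5, 1.0]
print(labels)      # ['young', 'middle-aged', 'middle-aged', 'senior']
```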

Step (6) Data Mining Process: based on the data in the data warehouse, select an appropriate analysis tool and apply statistical methods, case-based reasoning, decision trees, rule-based reasoning, fuzzy sets, or even neural networks and genetic algorithms to process the data and obtain useful analytical information.
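To make one of the listed methods concrete, here is a minimal sketch of the simplest possible decision tree: a single split on one numeric attribute, often called a decision stump. The training data (weekly visits and whether the customer bought) is hypothetical, and the function names are chosen for this example only; practical work would normally use an established library rather than hand-written code.

```python
def train_stump(samples):
    """Fit a one-level decision tree on (value, label) pairs with labels 0/1.

    Returns (threshold, label_below, label_above) with the fewest training
    errors. Values <= threshold are predicted as label_below, the rest as
    label_above.
    """
    values = sorted({v for v, _ in samples})
    if len(values) < 2:
        raise ValueError("need at least two distinct values to split on")
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in candidates:
        for below, above in ((0, 1), (1, 0)):
            errors = sum(
                1 for v, y in samples
                if (below if v <= t else above) != y
            )
            if best is None or errors < best[0]:
                best = (errors, t, below, above)
    _, threshold, below, above = best
    return threshold, below, above


def predict(stump, value):
    """Classify a value with a trained stump."""
    threshold, below, above = stump
    return below if value <= threshold else above


# Hypothetical training data: (weekly_visits, bought) pairs.
training = [(10, 0), (20, 0), (30, 0), (60, 1), (70, 1), (80, 1)]
stump = train_stump(training)
print(stump)
print(predict(stump, 25), predict(stump, 50))
```

The learned rule reads as "customers above the threshold tend to buy", which is the kind of directly interpretable result that makes decision trees popular in business settings.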

Step (7) Model Evaluation: industry experts verify the correctness of the data mining results from a business perspective.

Step (8) Knowledge Representation: present the analytical information obtained from data mining to the user in visual form, or store it as new knowledge in a knowledge base for use by other applications.
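Both options can be sketched in a few lines. The example below renders a hypothetical mining result (rules with support counts) as a plain-text bar chart for the user, and serializes the same result to JSON as a stand-in for a knowledge base entry. The rule names, counts, and JSON layout are assumptions for illustration.

```python
import json

# Hypothetical mining result: rules and how often each was observed.
rules = {"bread -> butter": 12, "milk -> cereal": 7, "tea -> sugar": 3}


def text_bar_chart(counts, width=20):
    """Render counts as a text bar chart; the longest bar has `width` marks."""
    peak = max(counts.values())
    label_width = max(len(label) for label in counts)
    lines = []
    for label, n in counts.items():
        bar = "#" * max(1, round(n / peak * width))
        lines.append(f"{label.ljust(label_width)} | {bar} {n}")
    return "\n".join(lines)


def to_knowledge_base(counts):
    """Serialize the rules to JSON so other applications can load them."""
    return json.dumps({"rules": counts}, indent=2, sort_keys=True)


chart = text_bar_chart(rules)
kb_text = to_knowledge_base(rules)
print(chart)
print(kb_text)
```

A text chart is the crudest form of visualization; the same data could equally feed a plotting library or a dashboard, while the JSON form is what another application would consume.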

Data mining is an iterative process. If any step fails to meet its expected goal, you need to go back to the previous step, adjust it, and run it again. Not every data mining task requires every step listed here. For example, if a task does not involve multiple data sources, step (2) can be omitted.
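The iterative nature of the process, and the fact that a step can be skipped, can be shown with a simplified loop. Everything here is hypothetical: the `run_pipeline` function, the toy `mine` and `evaluate` functions, and the idea of loosening a threshold on each round are stand-ins for whatever adjustment a real project would make.

```python
def run_pipeline(sources, mine, evaluate, max_rounds=5):
    """Run a simplified mining loop.

    `sources` is a list of record lists, `mine(data, round_no)` returns a
    result, and `evaluate(result)` returns True when the goal is met.
    Returns (result, rounds_used, steps_run).
    """
    steps_run = []
    if len(sources) > 1:
        # Integration is needed only when several sources exist.
        data = [rec for src in sources for rec in src]
        steps_run.append("integrate")
    else:
        data = list(sources[0])
    for round_no in range(1, max_rounds + 1):
        steps_run.append("mine")
        result = mine(data, round_no)
        steps_run.append("evaluate")
        if evaluate(result):
            return result, round_no, steps_run
        # Goal not met: loop back and rerun with an adjusted setting.
    raise RuntimeError("goal not met within max_rounds")


def mine(data, round_no):
    """Toy mining step: keep values above a threshold that loosens each round."""
    threshold = 100 - 20 * round_no  # assumed adjustment rule
    return [x for x in data if x >= threshold]


def evaluate(result):
    """Toy goal: at least three findings."""
    return len(result) >= 3


result, rounds, steps = run_pipeline([[55, 70, 90, 30]], mine, evaluate)
print(result, rounds, steps)
```

With a single source the `integrate` step never runs, and the loop repeats the mining and evaluation steps until the goal is met or the round limit is reached.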

Step (3) data reduction, step (4) data cleaning, and step (5) data transformation are collectively called data preprocessing. In data mining, at least 60% of the cost may be spent in step (1), information collection, and at least 60% of the time and effort is spent on data preprocessing.

 

This article is excerpted from New Internet: Big Data Mining, by Tan Lei, published by Electronic Industry Publishing House.
