Common Data Mining Methods

Source: Internet
Author: User
This article introduces the basic concepts and common methods of data mining, the process of extracting hidden, previously unknown, and potentially useful information and knowledge.

Basic Concepts

Data mining is the process of extracting hidden, previously unknown, but potentially useful information and knowledge from large volumes of incomplete, noisy, and fuzzy data. As a broad, application-oriented interdisciplinary field, data mining integrates mature tools and techniques from many disciplines, including data warehousing, statistics, machine learning, pattern recognition, artificial intelligence, and neural networks.

Process Model

For enterprises, data mining means finding the hidden "nuggets of knowledge" in the "data mine" so that they can reduce unnecessary investment while increasing the return on capital. The most widely used data mining process model is CRISP-DM (Cross-Industry Standard Process for Data Mining). CRISP-DM divides the data mining life cycle into six stages: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment.

[Figure: CRISP-DM data mining process model]


Common Methods

Most data mining methods are not designed to solve one specific problem, and they are not mutually exclusive. It cannot be said that a given problem must be solved with one particular method and no other. In general, no single method is best for a given data analysis task. Before deciding which model or method to adopt, try several candidates and then select the one that performs better. Different methods have different strengths and weaknesses depending on the data.
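
As a rough illustration of trying several candidate methods and keeping the better one, the sketch below scores two simple classifiers on held-out data. The toy data set, the function names, and the choice of accuracy as the selection criterion are all assumptions made for this example; a real project would use its own data and evaluation measure.

```python
from collections import Counter

# Toy labeled data (hypothetical): (feature vector, class label).
train = [((1.0, 1.1), "a"), ((1.2, 0.9), "a"), ((0.9, 1.0), "a"),
         ((3.0, 3.2), "b"), ((3.1, 2.9), "b")]
holdout = [((1.1, 1.0), "a"), ((2.9, 3.0), "b"), ((3.2, 3.1), "b")]


def majority_model(train_rows):
    """Baseline: always predict the most common training label."""
    label = Counter(lbl for _, lbl in train_rows).most_common(1)[0][0]
    return lambda x: label


def nearest_neighbor_model(train_rows):
    """Predict the label of the single closest training record."""
    def predict(x):
        def sq_dist(row):
            return sum((a - b) ** 2 for a, b in zip(row[0], x))
        return min(train_rows, key=sq_dist)[1]
    return predict


def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)


# Fit every candidate on the same training data, score on the same holdout.
candidates = {
    "majority": majority_model(train),
    "nearest_neighbor": nearest_neighbor_model(train),
}
scores = {name: accuracy(m, holdout) for name, m in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

Scoring every candidate on the same held-out records keeps the comparison fair; with more data, cross-validation would give a steadier estimate than a single split.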

Data mining methods include association analysis, cluster analysis, prediction, time series pattern analysis, and deviation analysis.

Common and widely used algorithms and models include:

1. Traditional Statistical Methods: sampling techniques, multivariate statistical analysis, and statistical prediction methods.

2. Visualization Technology: uses charts and other graphical means to express data features intuitively.

3. Decision Tree: builds a tree-shaped structure from a series of rules to represent a decision set. It can be used for classification and prediction. Common algorithms include CART, CHAID, ID3, C4.5, and C5.0.

4. Artificial Neural Network: simulates the function of the human nervous system. It imitates the structure of a biological neural network, passing data through an input layer, a hidden layer, and an output layer to compute a result. It is a non-linear prediction model learned through training, and it can perform data mining tasks such as classification, clustering, feature mining, and regression analysis.

5. Genetic Algorithms: an optimization technique based on the concept of biological evolution. It simulates processes such as gene combination, crossover, mutation, and natural selection to search for better solutions.

6. Association Rule Mining Algorithm: association rules describe relationships among data items and take the form "A1 ∧ A2 ∧ ... ∧ An → B1 ∧ B2 ∧ ... ∧ Bn". Mining generally proceeds in two steps: Step 1, find the frequent itemsets; Step 2, use the frequent itemsets to generate association rules.

7. Nearest Neighbor Technique: identifies a new record by combining the historical records most similar to it. It can be used for clustering and deviation analysis.
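
The crossover, mutation, and selection steps named in item 5 can be illustrated with a small sketch. It maximizes the number of 1-bits in a bit string, a toy objective assumed here only to keep the example short; the population size, mutation rate, elitism step, and all function names are likewise hypothetical choices rather than part of any standard.

```python
import random


def fitness(bits):
    """Toy objective (assumed): count of 1-bits, to be maximized."""
    return sum(bits)


def evolve(n_bits=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)  # fixed seed keeps the run repeatable
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[:pop_size // 2]   # selection: keep the fitter half
        children = [population[0][:]]          # elitism: carry the best forward
        while len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)     # single-point crossover
            child = mom[:cut] + dad[cut:]
            child = [1 - b if rng.random() < mutation_rate else b
                     for b in child]           # mutation: flip bits occasionally
            children.append(child)
        population = children
    best = max(population, key=fitness)
    return best, history


best, history = evolve()
print("best fitness:", fitness(best), "initial best:", history[0])
```

Because the best individual is copied unchanged into each new generation, the best fitness in this sketch never decreases from one generation to the next.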
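
The two steps of association rule mining described in item 6 can be sketched as follows. This is a minimal brute-force illustration, not an optimized implementation of Apriori or any other named algorithm; the transactions, the thresholds `min_support` and `min_confidence`, and the function names are assumptions chosen for the example.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]


def support(itemset, rows):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= row for row in rows) / len(rows)


def frequent_itemsets(rows, min_support):
    """Step 1: enumerate itemsets whose support meets the threshold."""
    items = sorted(set().union(*rows))
    found = {}
    for size in range(1, len(items) + 1):
        level = {}
        for combo in combinations(items, size):
            s = support(frozenset(combo), rows)
            if s >= min_support:
                level[frozenset(combo)] = s
        if not level:  # no frequent itemset of this size, so none larger
            break
        found.update(level)
    return found


def association_rules(freq, min_confidence):
    """Step 2: split each frequent itemset into antecedent -> consequent."""
    rules = []
    for itemset, s in freq.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is itself frequent,
                # so its support is already stored in freq.
                confidence = s / freq[lhs]
                if confidence >= min_confidence:
                    rules.append((set(lhs), set(itemset - lhs), confidence))
    return rules


freq = frequent_itemsets(transactions, min_support=0.6)
for lhs, rhs, conf in association_rules(freq, min_confidence=0.7):
    print(lhs, "->", rhs, round(conf, 2))
```

Confidence here is the support of the whole itemset divided by the support of the antecedent, which is the usual way to rank candidate rules once the frequent itemsets are known.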
