Common Data Mining Methods

Source: Internet
Author: User

Basic Concepts

Data mining is the process of extracting potentially useful information and knowledge, hidden and not known in advance, from massive, incomplete, noisy, and fuzzy data. As a broad, application-oriented interdisciplinary field, data mining integrates mature tools and techniques from many disciplines, including data warehouse technology, statistics, machine learning, pattern recognition, artificial intelligence, and neural networks.

Process Model

For enterprises, data mining means finding the hidden "nuggets of knowledge" in the "data mine", helping them reduce unnecessary investment while increasing the return on capital. The most widely used data mining process model is CRISP-DM (Cross-Industry Standard Process for Data Mining). CRISP-DM divides the data mining life cycle into six stages: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.

[Figure: the CRISP-DM data mining process model]


Common Methods

Most data mining methods are not designed to solve one specific problem, and they are not mutually exclusive. No problem requires one particular method to the exclusion of all others, and in general there is no single best method for a given data analysis task. Before settling on a model or method, try several candidates and then select the one that performs better. Different methods have different strengths and weaknesses in different data environments.
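The advice to try several candidates and keep the better one can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the toy data, the two candidate models (a mean baseline and a least-squares line), and the use of mean squared error on a held-out set are all assumptions made for the example.

```python
from statistics import mean

# Toy (x, y) data, invented for illustration only.
train = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.9), (5, 5.1)]
holdout = [(6, 6.0), (7, 7.2)]

def fit_mean(data):
    """Candidate 1: always predict the mean of the training targets."""
    avg = mean(y for _, y in data)
    return lambda x: avg

def fit_line(data):
    """Candidate 2: least-squares line y = a*x + b."""
    mx = mean(x for x, _ in data)
    my = mean(y for _, y in data)
    sxx = sum((x - mx) ** 2 for x, _ in data)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    a = sxy / sxx
    b = my - a * mx
    return lambda x: a * x + b

def mse(model, data):
    """Mean squared error of a fitted model on (x, y) pairs."""
    return mean((model(x) - y) ** 2 for x, y in data)

# Fit every candidate on the training data, score it on held-out data,
# and keep the one with the lowest error.
candidates = {"mean baseline": fit_mean, "linear fit": fit_line}
scores = {name: mse(fit(train), holdout) for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

Scoring on data the models did not see during fitting is what makes the comparison meaningful; on a different data set the ranking of the candidates may well change.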

Data mining methods include association analysis, cluster analysis, prediction and time-series pattern analysis, and deviation analysis.

Common and widely used algorithms and models include:

1. Traditional statistical methods: sampling techniques, multivariate statistical analysis, and statistical prediction methods.

2. Visualization techniques: express the characteristics of the data intuitively using charts and other graphical means.

3. Decision tree: a tree diagram built from a series of rules. The tree structure represents a set of decisions and can be used for classification and prediction. Common algorithms include CART, CHAID, ID3, C4.5, and C5.0.

4. Artificial neural network: models the function of human neurons and imitates the structure of a biological neural network. Data passes through an input layer, hidden layers, and an output layer, where it is adjusted and computed to produce the final result. It is a nonlinear prediction model learned through training, and it can perform data mining tasks such as classification, clustering, feature mining, and regression analysis.

5. Genetic algorithm: an optimization technique based on the theory of natural evolution. It simulates processes such as gene recombination, crossover, mutation, and natural selection to search for better solutions.

6. Association rule mining: an association rule describes a relationship among data items in the form "A1 ∧ A2 ∧ ... ∧ An → B1 ∧ B2 ∧ ... ∧ Bn". Mining generally takes two steps: first, find the frequent itemsets; second, use the frequent itemsets to generate association rules.

7. Nearest neighbor technique: identifies a new record by reference to a combination of the most similar historical records that have already been identified. It can be used for clustering and deviation analysis.
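To make the decision tree entry in the list above more concrete, here is a minimal sketch of how a single split might be chosen. It assumes one numeric feature and uses Gini impurity, the criterion associated with CART; ID3 and C4.5 rely on information gain and gain ratio instead. The function names and the toy data are hypothetical, and a real tree would apply this search recursively over many features.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows):
    """rows: list of (value, label). Return (threshold, weighted_gini)
    for the threshold that gives the purest left/right partition."""
    values = sorted({v for v, _ in rows})
    best = None
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [lab for v, lab in rows if v <= threshold]
        right = [lab for v, lab in rows if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

# Toy data, invented for illustration: (age, bought_product).
rows = [(22, "no"), (25, "no"), (30, "no"), (35, "yes"), (40, "yes"), (52, "yes")]
print(best_split(rows))
```

The chosen threshold becomes one rule in the tree ("age <= threshold"), and each side of the split is then examined in the same way until the leaves are pure enough.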
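The genetic algorithm entry in the list above names selection, crossover, and mutation; the sketch below shows one way these steps can fit together. It is an illustration under stated assumptions, not a reference implementation: the toy objective (maximize the number of 1 bits), the parameter values, the truncation selection scheme, and the function names are all chosen for the example.

```python
import random

def fitness(bits):
    """Toy objective: the number of 1 bits in the chromosome."""
    return sum(bits)

def evolve(n_bits=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    initial_best = max(map(fitness, pop))
    for _ in range(generations):
        # Selection: keep the fitter half of the population as parents.
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        # Elitism: carry the best individual over unchanged.
        children = [parents[0][:]]
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: splice two parents at a random cut point.
            cut = rng.randrange(1, n_bits)
            child = a[:cut] + b[cut:]
            # Mutation: flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    return initial_best, best

initial_best, best = evolve()
print(initial_best, fitness(best), best)
```

Because the best individual is always copied into the next generation, the best fitness in the population can never decrease from one generation to the next.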
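The two-step association rule procedure described in the list above (find the frequent itemsets, then generate rules from them) can be sketched as follows. This is a brute-force illustration that enumerates every candidate itemset, not an optimized algorithm such as Apriori; the transactions, thresholds, and function names are assumptions made for the example.

```python
from itertools import combinations

# Toy market-basket transactions, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]

def frequent_itemsets(transactions, min_support):
    """Step 1: find every itemset whose support is at least min_support."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            # Support = fraction of transactions containing the itemset.
            support = sum(cset <= t for t in transactions) / n
            if support >= min_support:
                result[cset] = support
    return result

def association_rules(itemsets, min_confidence):
    """Step 2: split each frequent itemset into antecedent -> consequent."""
    rules = []
    for itemset, support in itemsets.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                a = frozenset(antecedent)
                # Every subset of a frequent itemset is itself frequent,
                # so itemsets[a] is always present.
                confidence = support / itemsets[a]
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, support, confidence))
    return rules

itemsets = frequent_itemsets(transactions, min_support=0.6)
rules = association_rules(itemsets, min_confidence=0.7)
for a, b, s, c in rules:
    print(sorted(a), "->", sorted(b), f"support={s:.2f} confidence={c:.2f}")
```

Support measures how often the items occur together, while confidence measures how often the consequent appears in transactions that contain the antecedent.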
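The nearest neighbor idea from the list above, labeling a new record from the most similar historical records, can be sketched as a small k-nearest-neighbor classifier. The Euclidean distance, the majority vote, the value k=3, and the toy records are assumptions made for illustration.

```python
from collections import Counter
from math import dist

# Historical records that have already been labeled (toy data).
history = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((2.0, 1.5), "low"),
    ((6.0, 6.5), "high"),
    ((7.0, 6.0), "high"),
    ((6.5, 7.5), "high"),
]

def knn_classify(record, history, k=3):
    """Label a new record by majority vote among its k closest neighbors."""
    nearest = sorted(history, key=lambda item: dist(record, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_classify((1.2, 1.4), history))
print(knn_classify((6.8, 6.9), history))
```

The same distance computation can support deviation analysis: a record whose nearest neighbors are all far away is a candidate outlier.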
