Introduction to Data Mining Technology

Source: Internet
Author: User

Abstract: Data mining is a new and important research field. This paper introduces the concept and purpose of data mining, its common methods, the data mining process, and methods for evaluating data mining software. It also discusses the problems facing the field and its future prospects.

Keywords: data mining; data collection

1. Introduction

Data mining is the process of extracting implicit, previously unknown, but potentially useful information and knowledge from large volumes of incomplete, noisy, fuzzy, and random data. With the rapid development of information technology, the amount of accumulated data is growing quickly, so there is an urgent need to extract useful knowledge from massive data sets. Data mining is a data processing technology developed to meet this need. It is a critical step in Knowledge Discovery in Databases (KDD).

2. The task of data mining

The main tasks of data mining are association analysis, cluster analysis, classification, prediction, time-series pattern analysis, and deviation analysis.

⑴ Association Analysis (Association)

Association rule mining was first proposed by Rakesh Agrawal and colleagues. When the values of two or more variables exhibit some regularity, the relationship is called an association. Associations are an important kind of discoverable knowledge in databases, and they are divided into simple associations, sequential associations, and causal associations. The purpose of association analysis is to uncover the hidden network of relationships in a database. Association rules are generally evaluated against two thresholds, support and confidence; additional measures such as interestingness and correlation are introduced so that the mined rules better meet requirements.
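The two thresholds can be illustrated with a minimal sketch. It assumes the usual definitions (support as the fraction of transactions containing an itemset, confidence as the support of the whole rule divided by the support of its antecedent); the shopping baskets and the function names are hypothetical and not taken from the original paper.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    antecedent = frozenset(antecedent)
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    return support(transactions, antecedent | frozenset(consequent)) / base


# Hypothetical shopping baskets, for illustration only.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

rule_support = support(transactions, {"bread", "milk"})          # 2 of 4 baskets
rule_confidence = confidence(transactions, {"bread"}, {"milk"})  # 2 of 3 bread baskets
```

A rule would be kept only if both values reach the minimum thresholds chosen by the analyst.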

⑵ Cluster Analysis (Clustering)

Clustering groups data into several categories according to similarity, so that data within the same category are similar to one another while data in different categories are dissimilar. Cluster analysis can establish macroscopic concepts and reveal the distribution patterns of the data and possible relationships among data attributes.
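As one possible illustration of grouping by similarity, the sketch below uses k-means with Euclidean distance. The paper does not name a specific clustering algorithm, so the choice of k-means, the sample points, and the starting centroids are all assumptions.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def k_means(points, centroids, iterations=10):
    """Assign each point to its nearest centroid, then move centroids to cluster means."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical 2-D data with two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
initial = [(1.0, 1.0), (8.0, 8.0)]  # starting centroids picked by hand

final_centroids, clusters = k_means(points, initial)
```

Points in the same cluster end up close to a shared centroid, which is the sense of "similar" used here.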

⑶ Classification (Classification)

Classification finds a conceptual description of each category, that is, a description of the class that captures the overall characteristics of its data, and uses this description to construct a model, usually expressed as rules or a decision tree. Classification rules are obtained by applying an algorithm to a training data set. Classification can be used for both rule description and prediction.
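The idea of learning a rule from training data can be sketched with a one-level decision tree (a "decision stump") on a single numeric attribute. This is a deliberately small stand-in for the rule and decision tree models mentioned above; the attribute (age), the labels, and the function names are hypothetical.

```python
from collections import Counter


def train_stump(values, labels):
    """Learn a threshold rule: value <= t gives one label, value > t gives another."""
    ordered = sorted(set(values))
    if len(ordered) < 2:
        raise ValueError("need at least two distinct attribute values")
    best = None
    # Candidate thresholds lie midway between consecutive distinct values.
    for low, high in zip(ordered, ordered[1:]):
        threshold = (low + high) / 2
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(l != left_label for l in left) + sum(l != right_label for l in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    return best[1:]  # (threshold, left_label, right_label)


def predict(model, value):
    """Apply the learned rule to a new attribute value."""
    threshold, left_label, right_label = model
    return left_label if value <= threshold else right_label


# Hypothetical training set: customer age and whether they bought a product.
ages = [22, 25, 30, 41, 46, 52]
bought = ["no", "no", "no", "yes", "yes", "yes"]

model = train_stump(ages, bought)
```

The learned model reads as a rule ("if age is above the threshold, predict yes") and can also be applied to unseen values with `predict`, which matches the two uses named above.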

⑷ Prediction (Prediction)

Prediction uses historical data to discover patterns of change, builds a model from them, and uses that model to predict the types and characteristics of future data. Prediction is concerned with accuracy and uncertainty, which are usually measured by the prediction variance.
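A minimal sketch of this workflow, assuming a straight-line model fitted by ordinary least squares: the historical data are hypothetical, and the residual variance is used here only as a rough indicator of uncertainty, not as the exact prediction variance.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residual_variance(xs, ys, slope, intercept):
    """Unbiased estimate of the variance of the fit errors (n - 2 degrees of freedom)."""
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    return sum(r * r for r in residuals) / (len(xs) - 2)


# Hypothetical history: period number and observed value.
periods = [1, 2, 3, 4, 5]
values = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(periods, values)
forecast = slope * 6 + intercept  # predicted value for the next period
variance = residual_variance(periods, values, slope, intercept)
```

A smaller variance indicates that the model tracked the historical data more closely, which is one reason to place more trust in the forecast.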

⑸ Time-Series Pattern Analysis (Time-Series Pattern)

A time-series pattern is a pattern with a high probability of recurrence, discovered by searching time-series data. Like regression, it uses known data to predict future values; the difference is that the data are distinguished by the time at which each variable is observed.
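One simple way to look for recurring patterns is to count how often each fixed-length run of consecutive observations appears. The sketch below assumes the series has already been discretized into symbols; the symbols, the window length, and the minimum count are hypothetical choices.

```python
from collections import Counter


def frequent_patterns(series, length, min_count=2):
    """Return each run of `length` consecutive values seen at least `min_count` times."""
    windows = (tuple(series[i:i + length]) for i in range(len(series) - length + 1))
    counts = Counter(windows)
    return {pattern: count for pattern, count in counts.items() if count >= min_count}


# Hypothetical daily movements of some measured quantity, in time order.
series = ["up", "up", "down", "up", "up", "down", "flat", "up", "up", "down"]

patterns = frequent_patterns(series, length=3)
```

Because the windows are taken in time order, a frequent pattern such as two rises followed by a fall suggests what may come next when its beginning reappears.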

⑹ Deviation Analysis (Deviation)

Deviations contain a great deal of useful knowledge. The data in a database often include many anomalies, and detecting them is very important. The basic method of deviation detection is to look for differences between observed results and a reference.
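The comparison between observations and a reference can be sketched as follows. The sketch assumes the reference is the mean of the data and that an observation counts as a deviation when it lies more than a chosen number of standard deviations away; both choices, along with the sample readings, are assumptions for illustration.

```python
from statistics import mean, pstdev


def find_deviations(observations, threshold=2.0):
    """Return observations whose distance from the mean exceeds `threshold` standard deviations."""
    reference = mean(observations)
    spread = pstdev(observations)
    if spread == 0:
        return []  # all values are identical, so nothing deviates
    return [x for x in observations if abs(x - reference) / spread > threshold]


# Hypothetical sensor readings with one obvious anomaly.
readings = [10, 11, 9, 10, 12, 10, 50]

anomalies = find_deviations(readings)
```

In practice the reference could also be a model prediction or an earlier time period, and a single extreme value can distort the mean, so a more robust reference such as the median is sometimes preferred.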
