Keywords: data warehouse, data mining, data warehouse vs data mining
1 Data Mining
1.1 difference between data mining and traditional data analysis
The essential difference between data mining and traditional data analysis, such as queries, reports and online analytical processing (OLAP), is that data mining extracts information and discovers knowledge without a clearly stated hypothesis. The information obtained from data mining should have three characteristics: it is previously unknown, valid and practical. In other words, data mining discovers information or knowledge that intuition cannot reach, and sometimes knowledge that is counterintuitive. The more unexpected the mined information, the more valuable it may be. Traditional data analysis, by contrast, typically pulls data from a large database and examines it with specialized analysis software to answer questions the analyst has already framed. Data mining is therefore quite different from traditional analysis methods.
1.2 application value of data mining
(1) Classification: first select a training set of already classified records from the data, apply a classification technique to the training set to build a classification model, and then use the model to classify the unclassified data.
(2) Estimation: similar to classification. The difference is that classification produces a discrete output, while estimation produces a continuous value; the set of classes in classification is fixed in advance, while the value produced by estimation is not.
(3) Clustering: grouping records. The difference between clustering and classification is that clustering does not depend on predefined classes and does not need a training set. For example, China Mobile uses the data mining tool Markway analysis system to cluster users' WAP browsing behavior and to carry out precision marketing based on the resulting customer clusters.
(4) Discovery of association rules and sequence patterns: an association is a link in which one event tends to occur together with another. For example, people who buy beer every day are also likely to buy cigarettes. The strength of such a rule can be described by its support and confidence. Unlike an association, a sequence pattern is an association across time. For example, the bank adjusts the interest rate today and the stock market changes tomorrow.
(5) Prediction: a model derived from classification or estimation is used to predict unknown variables.
(6) Deviation detection: describing the few extreme, exceptional cases of the analysis object in order to reveal their underlying causes.
In addition, data mining is widely used in customer analysis, operations research, optimization of enterprise resources, anomaly detection and the management of enterprise analysis models.
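The support and confidence measures mentioned in item (4) can be made concrete with a small sketch. The baskets below are invented purely for illustration, and the helper names `count`, `support` and `confidence` are hypothetical; the definitions used are the common ones (support is the share of all baskets containing the whole itemset, confidence is the share of baskets with the antecedent that also contain the consequent).

```python
from typing import FrozenSet, Iterable, List

# Hypothetical shopping baskets, invented purely for illustration.
transactions: List[FrozenSet[str]] = [
    frozenset({"beer", "cigarettes", "chips"}),
    frozenset({"beer", "cigarettes"}),
    frozenset({"beer", "bread"}),
    frozenset({"milk", "bread"}),
    frozenset({"beer", "cigarettes", "milk"}),
]


def count(itemset: Iterable[str], baskets: List[FrozenSet[str]]) -> int:
    """Number of baskets that contain every item in `itemset`."""
    wanted = frozenset(itemset)
    return sum(1 for basket in baskets if wanted <= basket)


def support(itemset: Iterable[str], baskets: List[FrozenSet[str]]) -> float:
    """Fraction of all baskets that contain the whole itemset."""
    return count(itemset, baskets) / len(baskets)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               baskets: List[FrozenSet[str]]) -> float:
    """Among baskets containing the antecedent, the fraction that also
    contain the consequent. Returns 0.0 if the antecedent never occurs."""
    base = count(antecedent, baskets)
    both = count(frozenset(antecedent) | frozenset(consequent), baskets)
    return both / base if base else 0.0


# Rule under test: "beer -> cigarettes".
rule_support = support({"beer", "cigarettes"}, transactions)
rule_confidence = confidence({"beer"}, {"cigarettes"}, transactions)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

In this toy data, three of the five baskets contain both items and four contain beer, so the rule has support 0.60 and confidence 0.75. Real association rule mining additionally searches for all itemsets above chosen thresholds, which this sketch does not attempt.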
2 Data Warehouse
2.1 characteristics of data warehouse
(1) Subject-oriented data collection. A data warehouse is organized around subjects such as customers, suppliers, products and sales. It focuses on data modeling and analysis for decision makers, rather than on the organization's daily operations and transaction processing.
(2) Integrated data collection. The data in a data warehouse is extracted from the original, scattered databases, cleaned, and then systematically processed, summarized and organized. Inconsistencies in the source data must be eliminated so that the warehouse holds consistent, global information about the whole enterprise.
(3) Time-variant data collection. A data warehouse provides information from a historical perspective. Its data usually contains historical information, which supports quantitative analysis and forecasting of an enterprise's development and future trends.
(4) Non-volatile data collection. The data in a data warehouse is mainly used for enterprise decision-making and analysis. The operations involved are mainly queries, with few modifications or deletions, and the data usually only needs to be loaded and refreshed periodically. A data warehouse usually needs only two kinds of operation, initial loading and data access, so its data is relatively stable, with little or no updating.
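The time-variant and non-volatile characteristics can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The table `customer_balance_history`, its columns, the helper `load_snapshot` and all values are hypothetical; the point is only that each periodic load appends a dated snapshot and never updates or deletes earlier rows, so history stays queryable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical warehouse table: one row per customer per load date.
conn.execute(
    """
    CREATE TABLE customer_balance_history (
        customer_id   INTEGER NOT NULL,
        snapshot_date TEXT    NOT NULL,  -- ISO date of the periodic load
        balance       REAL    NOT NULL,
        PRIMARY KEY (customer_id, snapshot_date)
    )
    """
)


def load_snapshot(snapshot_date, rows):
    """Append one periodic load. Earlier snapshots are left untouched."""
    conn.executemany(
        "INSERT INTO customer_balance_history VALUES (?, ?, ?)",
        [(customer_id, snapshot_date, balance) for customer_id, balance in rows],
    )
    conn.commit()


# Two periodic loads with invented values.
load_snapshot("2024-01-31", [(1, 100.0), (2, 250.0)])
load_snapshot("2024-02-29", [(1, 140.0), (2, 230.0)])

# Data access: read the full history of one customer, oldest first.
history = conn.execute(
    "SELECT snapshot_date, balance FROM customer_balance_history "
    "WHERE customer_id = ? ORDER BY snapshot_date",
    (1,),
).fetchall()
print(history)
```

An operational database would more likely overwrite the balance in place; keeping one row per snapshot is what makes trend analysis over time possible.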
2.2 type of data warehouse
According to the types of data they manage and the scope of enterprise problems they address, data warehouses can generally be divided into three types: the enterprise data warehouse (EDW), the operational data store (ODS) and the data mart. An enterprise data warehouse is a general-purpose data warehouse. It contains a large amount of detailed data as well as a large amount of summarized or aggregated data, and this data is non-volatile and history-oriented. It is used to make strategic or tactical decisions across many areas of the enterprise. An operational data store can provide decision support on operational data, and can also serve as a staging area when loading data into the data warehouse. Like an EDW, an ODS is subject-oriented and integrated, but unlike an EDW it is volatile: it contains only current, detailed data and no accumulated or historical data. A data mart is a portion of the data separated out from the data warehouse for a specific application purpose or scope, and can also be called departmental data or subject data. Several data marts can form an EDW.
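As a rough sketch of the data mart idea, the following uses Python's built-in `sqlite3` module to carve a small, purpose-specific extract out of a warehouse table. The table names `edw_sales` and `mart_north_sales`, the columns and the rows are all hypothetical, and a real deployment would normally use a dedicated warehouse platform rather than SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical enterprise-wide fact table holding detailed data.
conn.execute(
    "CREATE TABLE edw_sales (region TEXT, product TEXT, sale_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO edw_sales VALUES (?, ?, ?, ?)",
    [
        ("north", "router", "2024-01-05", 120.0),
        ("north", "switch", "2024-01-09", 80.0),
        ("south", "router", "2024-01-07", 200.0),
    ],
)

# The "data mart": a separate, pre-aggregated extract that serves one
# department (here, an assumed northern sales team) and nothing else.
conn.execute(
    """
    CREATE TABLE mart_north_sales AS
    SELECT product, SUM(amount) AS total_amount
    FROM edw_sales
    WHERE region = 'north'
    GROUP BY product
    """
)

mart_rows = conn.execute(
    "SELECT product, total_amount FROM mart_north_sales ORDER BY product"
).fetchall()
print(mart_rows)
```

This shows the top-down direction, where the mart is derived from the warehouse; the reverse direction mentioned above, where several marts are combined into an EDW, would instead merge such tables under shared, consistent definitions.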
2.3 comparison between data warehouse and traditional database
Databases and data warehouses are related, but they are also different. The data warehouse did not emerge to replace the database; at present, most data warehouses are still managed by an RDBMS. Databases and data warehouses complement each other, and each has its own strengths. Their differences can be compared from the following aspects:
(1) Different starting points: a database is designed around transactions; a data warehouse is designed around subjects.
(2) Different data stored: a database generally stores online transaction data; a data warehouse generally stores historical data.
(3) Different design rules: database design avoids redundancy as far as possible and generally follows the normal forms