The relationship between data mining, data warehousing, and OLAP

Source: Internet
Author: User

Reprinted from: http://blog.csdn.net/zdhsnail/archive/2008/02/21/2111248.aspx

If a data warehouse is compared to a mine, then data mining is the work of digging in that mine. Data mining is neither magic that creates something from nothing nor alchemy: without sufficiently rich and complete data, it is hard to expect it to unearth meaningful information.

To turn large volumes of data into useful information, the data must first be collected efficiently. As technology has advanced, full-featured database systems have become the best tool for collecting data. Put simply, data warehousing gathers useful data from other systems and stores it in one integrated storage area. A data warehouse is therefore, in practice, a very large relational database holding data that has been processed and integrated to meet the needs of decision support systems, whether for decision-making or for data analysis. From an information technology perspective, the goal of data warehousing is to deliver the right data to the right person in the organization at the right time.

Many people confuse data warehousing with data mining and cannot tell them apart. Data warehousing is a relatively new area of database technology. As database technology has become widespread, computer systems have come to help us operate, compute, and reason, changing both the way work is done and the way decisions are made.

A data warehouse is itself a very large database. It stores data integrated from the organization's operational databases, especially data obtained from online transaction processing (OLTP) systems. Once the integrated data is in the warehouse, the company's decision makers use it to make decisions. This conversion and integration process is the biggest challenge in building a data warehouse, because turning operational data into useful strategic information is the whole point of the warehouse. In summary, a data warehouse should contain integrated data, detailed and summarized data, historical data, and metadata (data that explains the data). Mining useful information and knowledge from the warehouse is the main purpose of building a data warehouse and of using data mining. The two nevertheless differ in both nature and process. Put another way, the data warehouse should be built first so that data mining can be carried out efficiently, because the data in the warehouse has already been cleaned (it contains no erroneous records), is complete, and is integrated. The relationship between the two can therefore be summed up as: "Data mining is a process and a technique for finding useful information in a massive data warehouse."
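
The integration step described above can be sketched in a few lines of Python using the standard sqlite3 module. This is only a minimal illustration, not a real warehouse design: the table names (sales_detail, sales_summary), the column layout, the sample rows, and the cleaning rule (drop rows whose amount is missing or not positive) are all assumptions made up for the example. It shows operational rows being cleaned, loaded as detailed data, and rolled up into summarized data.

```python
import sqlite3

def build_warehouse(oltp_rows):
    """Clean OLTP rows, load them as detail data, and derive a monthly summary.

    All table and column names here are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales_detail "
        "(order_id INTEGER, sale_date TEXT, product TEXT, amount REAL)"
    )
    # Cleaning step (assumed rule): skip rows with a missing or non-positive amount.
    clean = [r for r in oltp_rows if r[3] is not None and r[3] > 0]
    conn.executemany("INSERT INTO sales_detail VALUES (?, ?, ?, ?)", clean)
    # Summarized data: total sales per product per month (YYYY-MM).
    conn.execute(
        "CREATE TABLE sales_summary AS "
        "SELECT product, substr(sale_date, 1, 7) AS month, SUM(amount) AS total "
        "FROM sales_detail "
        "GROUP BY product, substr(sale_date, 1, 7)"
    )
    conn.commit()
    return conn

# Invented operational records: (order_id, sale_date, product, amount).
oltp_rows = [
    (1, "2008-01-15", "diapers", 12.0),
    (2, "2008-01-20", "diapers", 8.0),
    (3, "2008-02-02", "beer", 5.0),
    (4, "2008-02-03", "beer", None),   # erroneous: missing amount
    (5, "2008-02-10", "beer", -1.0),   # erroneous: negative amount
]

conn = build_warehouse(oltp_rows)
detail_count = conn.execute("SELECT COUNT(*) FROM sales_detail").fetchone()[0]
summary = conn.execute(
    "SELECT product, month, total FROM sales_summary ORDER BY product, month"
).fetchall()
```

Keeping both the detail table and the summary table mirrors the "detailed and summarized data" requirement: analysts can read the rollup quickly and still drill down to individual records.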

OLAP (online analytical processing) refers to query and analysis programs that work online against a database. Some may say, "I already have OLAP tools, so I don't need data mining." In fact, the two are entirely different. The main difference is that data mining is used to generate hypotheses, while OLAP is used to verify them. In short, OLAP is driven by the user: the user starts with a hypothesis and then uses OLAP to check whether it holds. Data mining helps the user come up with hypotheses in the first place. So with OLAP or other query tools the user does the exploring, whereas with data mining the tool helps with the exploration.

For example, a market analyst planning shelf layout for a supermarket might assume that baby diapers and baby milk powder are often bought together, and can then use an OLAP tool to verify whether that assumption holds and how strong the evidence is. Data mining works differently. After sorting through a large volume of checkout data, the analyst does not need to assume or expect any particular result; the mining technique searches the data for latent rules, and the analyst may discover that diapers and beer are often bought together. This is something OLAP cannot do.
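
The contrast in this example can be made concrete with a small Python sketch. The baskets below are invented for illustration, and the helper names (support, frequent_pairs) and the 0.3 threshold are assumptions, not part of any OLAP or mining product. The first call checks one pair the analyst named in advance, which is the OLAP-style verification; the second enumerates every pair in the data and ranks them by how often they occur, which is a very simplified form of frequent-itemset mining.

```python
from collections import Counter
from itertools import combinations

# Hypothetical checkout baskets, invented for this example.
baskets = [
    {"diapers", "milk powder", "beer"},
    {"diapers", "beer"},
    {"diapers", "milk powder"},
    {"beer", "chips"},
    {"diapers", "beer", "chips"},
    {"milk powder", "bread"},
]

def support(items, baskets):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)

def frequent_pairs(baskets, min_support):
    """Return all item pairs whose support is at least `min_support`,
    sorted from most to least frequent."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    n = len(baskets)
    pairs = [(pair, c / n) for pair, c in counts.items() if c / n >= min_support]
    return sorted(pairs, key=lambda item: (-item[1], item[0]))

# Verification: the analyst supplies the pair to check.
hypothesis_support = support({"diapers", "milk powder"}, baskets)

# Discovery: no pair is supplied; the data suggests which pairs matter.
discovered = frequent_pairs(baskets, min_support=0.3)
top_pair, top_support = discovered[0]
```

With this sample data the named pair appears in 2 of 6 baskets, while the ranking puts ("beer", "diapers") first at 3 of 6, a pair the analyst never asked about. Real mining tools use far more scalable algorithms than this brute-force pair count.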

Data mining can often uncover relationships that go beyond what people would arrive at by induction, whereas OLAP can only confirm particular relationships through user-driven queries and visual reports. Data mining's ability to automatically find patterns and relationships in the data that nobody even suspected goes beyond the limits of our experience, education, and imagination. OLAP can complement data mining, but it cannot replace this capability.

 
