On the nature of data warehousing and data mining

Source: Internet
Author: User

Data warehousing and data mining are two broad concepts. Both are already mature abroad. In China, as enterprises have accumulated data and ERP systems have matured over the past few years, data warehousing and data mining have begun to take off.

How to build a data warehouse and carry out data mining is a question that deserves ongoing discussion and refinement, in terms of both technology and commercial application. With the continuous introduction of new technologies and concepts, traditional data warehouse techniques and methods have changed considerably, and applications built on the data warehouse have developed along with them. Each enterprise can choose its design and implementation flexibly, according to its own characteristics. Drawing on some of these newer technical features, this article describes an architecture for data warehousing and data mining.

Starting from a mature data warehouse architecture, building an EDW (Enterprise Data Warehouse) is a good choice. An EDW provides a complete, non-redundant view of the key historical changes across all enterprise data. On top of the EDW, data marts can be created for different subject areas. Requirements for data marts differ from one enterprise to another; typical subject areas include users, business processes, and products, and many subject-oriented applications can be built on the EDW. Establishing the EDW is therefore very important.

The key to establishing an EDW is to grasp the essence of the data warehouse: it provides information about key historical changes and faithfully reconstructs the key historical views of enterprise data. This is what is commonly called the DSS (decision support system) layer. Beneath the DSS layer, the ODS (operational data store) layer serves as the current data view and the buffer layer serves as the incremental data view. By combining the incremental data view with the data view at the DSS layer, and by using slowly changing dimension or surrogate key techniques, any historical change to the data can in theory be reproduced at the DSS layer. When implementing the DSS layer, however, two points are critical: a database capable of high-performance computation, and an incremental load frequency that matches the pace of the business.

For the sake of maintainability in a concrete implementation, slowly changing dimensions can be handled with a fairly uniform, general method: add a snapshot start time and a snapshot end time to each record and combine them with the primary key of the business system. This is enough to build a snapshot view of the key history of real enterprise data at the DSS layer. During implementation, the key is to separate the incremental data in the buffer layer into three sets: purely new data (set A), data whose key historical information has changed (set B), and data whose key historical information has not changed (set C). Snapshot records must be added for sets A and B, and existing snapshot records must be updated for set C. This process places heavy demands on high-performance computing and must be driven by business needs.
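The three-set handling described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the author's implementation: the names `Snapshot`, `apply_increment`, and `as_of` are hypothetical, the history is kept in an in-memory list rather than a database, and two details are assumed rather than taken from the article: that adding a snapshot for set B also closes the previous snapshot by setting its end time, and that "updating" a set C record means overwriting its non-historical attributes in place.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Snapshot:
    business_key: str                    # primary key from the business system
    tracked: dict                        # key attributes whose history is kept
    other: dict                          # attributes overwritten in place (assumption)
    start_time: datetime                 # snapshot start time
    end_time: Optional[datetime] = None  # snapshot end time; None means "current"


def apply_increment(history, increment, load_time):
    """Apply one batch from the buffer layer to the snapshot history.

    history:   list of Snapshot, modified in place
    increment: iterable of (business_key, tracked, other) tuples
    Returns how many rows fell into sets A, B and C.
    """
    current = {s.business_key: s for s in history if s.end_time is None}
    counts = {"A": 0, "B": 0, "C": 0}
    for key, tracked, other in increment:
        row = current.get(key)
        if row is None:
            # Set A: purely new data, so add a snapshot record.
            current[key] = Snapshot(key, dict(tracked), dict(other), load_time)
            history.append(current[key])
            counts["A"] += 1
        elif row.tracked != tracked:
            # Set B: key historical information changed.
            # Close the old snapshot (assumption) and add a new one.
            row.end_time = load_time
            current[key] = Snapshot(key, dict(tracked), dict(other), load_time)
            history.append(current[key])
            counts["B"] += 1
        else:
            # Set C: key historical information unchanged, update in place.
            row.other = dict(other)
            counts["C"] += 1
    return counts


def as_of(history, key, when):
    """Return the snapshot of `key` valid at time `when`, or None."""
    for s in history:
        if (s.business_key == key and s.start_time <= when
                and (s.end_time is None or when < s.end_time)):
            return s
    return None
```

With records stored this way, the view of any business key at any past moment can be recovered by calling `as_of(history, key, when)`, which is the sense in which the DSS layer can reproduce historical changes. In a real warehouse the same logic would more likely be expressed as set-based SQL against the buffer and DSS tables, where the performance concerns mentioned above apply.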

On top of the DSS layer, data marts can be built on whichever type of database the EDW solution uses. A data mart can generally be modeled as a star schema, which makes multidimensional analysis easier.
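As a rough illustration of why a star model suits multidimensional analysis, the sketch below builds a tiny data mart in an in-memory SQLite database using Python's standard `sqlite3` module. The table names (`fact_sales`, `dim_customer`, `dim_product`), the columns, the sample rows, and the helper functions `build_mart` and `sales_by` are all hypothetical and chosen only for the example; the article does not prescribe any particular schema.

```python
import sqlite3

# Grouping columns the caller may choose from (hypothetical dimensions).
DIMENSIONS = {"region": "c.region", "category": "p.category"}


def build_mart():
    """Create a minimal star schema: one fact table and two dimension tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE fact_sales (
            customer_id INTEGER REFERENCES dim_customer(customer_id),
            product_id  INTEGER REFERENCES dim_product(product_id),
            amount      REAL
        );
        INSERT INTO dim_customer VALUES (1, 'North'), (2, 'South');
        INSERT INTO dim_product  VALUES (10, 'Books'), (20, 'Music');
        INSERT INTO fact_sales   VALUES (1, 10, 30.0), (1, 20, 20.0), (2, 10, 50.0);
    """)
    return conn


def sales_by(conn, *dims):
    """Total sales grouped by one or more of the dimensions listed above."""
    if not dims:
        raise ValueError("at least one dimension is required")
    cols = ", ".join(DIMENSIONS[d] for d in dims)  # KeyError on unknown names
    sql = f"""
        SELECT {cols}, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_customer c ON c.customer_id = f.customer_id
        JOIN dim_product  p ON p.product_id  = f.product_id
        GROUP BY {cols}
        ORDER BY {cols}
    """
    return conn.execute(sql).fetchall()
```

Calling `sales_by(conn, "region")` or `sales_by(conn, "region", "category")` on the same fact table gives totals at different levels of detail. Because every dimension joins directly to the central fact table, changing the angle of analysis only changes the grouping columns, not the shape of the query.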

A mature EDW exists to support commercial applications, and one important application is data mining: searching the massive data held in the EDW for useful information that supports the development of the enterprise. The point here is not any specific data mining vendor; it is more important to understand the general concept. Data mining requires the EDW to provide, from the DSS layer, the key historical views of enterprise data and to reassemble them into a set of business decision factors. Combined with mature data mining algorithms, the massive data processed in the EDW then becomes a source of information for decision-making. During data mining, it is important to develop models based on the enterprise's actual business. A theoretical model detached from that business is likely to fail when it is put into practice.

Having discussed the nature of data warehousing and data mining, which vendors and technologies are worth considering?

As business intelligence and data warehousing mature, more and more vendors are entering the field, and each has its own strengths. For an EDW, the choice depends on the capacity, computational complexity, and real-time requirements of the data warehouse. At the low end, Microsoft SQL Server can be considered; SQL Server 2005 is significantly enhanced in business intelligence. In the mid-range, the Oracle data warehouse solution is an option, and Sybase IQ performs well when the data volume is not very large. For a very large data warehouse, high-end specialized solutions should be considered. At present, the high-end data warehouses from Teradata and IBM, built on a shared-nothing architecture, handle massive data, complex business computation, and real-time data processing with high scalability, but they require a very large investment.

Data warehousing and data mining, which were abstract notions a few years ago, have gradually become concrete, and they play an increasingly important role in enterprise decision-making and business process optimization. A good data warehouse and data mining solution is only the beginning. More importantly, it moves business thinking away from traditional gut-feel decisions toward refined marketing and decisions based on data. Business and technology drive each other: business is the primary driver, but good technology can also promote the development of the business. Only technology that suits the business can play its proper role in that process.
