An Implementation Method for a Lightweight Data Warehouse

Source: Internet
Author: User

1 Introduction

Databases have become an indispensable part of large software systems and play an increasingly important role in them, and database design has become an important factor in software performance and robustness. As software architectures grow more complex, developers have to design more tables to store the data they need, and the more tables there are, the more complex the database becomes. Writing data to a complex database poses no problem, but reading the data back out can be extremely difficult. This is intolerable in software with demanding response-time requirements.

The data warehouse has been a very active research direction in the database field in recent years. Its main concern is to provide decision support by mining massive amounts of historical data. Although the data warehouse is designed to support decision makers, its subject orientation, integration, and stability also offer a way to solve the problems described above.

2 Problem description

We take network management software as an example to illustrate why a lightweight data warehouse is needed. A large-scale network management system is very complex: it manages many devices, spans a complex network hierarchy, and must respond quickly. The client side of a network management system can be roughly divided into two parts, the management GUI and the report system. The main function of the GUI is to monitor devices and collect data from them. The main function of the report system is to analyze the collected data and provide detailed reports on network operation, which may include daily, weekly, monthly, quarterly, and annual reports. Each report may be further divided by rules such as network hierarchy or area (geographic or logical).

For reasons of performance, language, or complexity, developers of network management software may store some values, such as time, in an inappropriate data type. Many developers prefer a long integer to a datetime or string type because it is easier for programs to manipulate and compare, and performs slightly better. However, an improper data type can put more pressure on the report system and the database and cause performance to deteriorate seriously.

Joins across multiple tables are another main factor affecting database performance. To obtain the desired result, a query sometimes has to select from several tables according to join conditions, and the amount of data involved grows rapidly with each additional join. Taking the table cluster in Figure 1 as an example, queried with a SQL Server database, the record counts and response times are as follows:

As the table above shows, when the data volume reaches the millions, the response time with the inappropriate data type reaches the minute level, and even with the appropriate data type it is as long as 30 seconds. Moreover, these figures come from local queries in a database query tool. In real applications, which are more often distributed systems, the database and the report queries run on different hosts. Once network transmission and the database interface are taken into account, the response is even slower. Such response times are intolerable for any good software. The amount of data collected in network management is huge: record counts in the millions are common when counting packets and bytes sent at the IP layer or when monitoring the status of important or core equipment. Besides network management software, the same problem arises in any software design that requires frequent, high-density sampling.
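The kind of report query discussed in this section can be sketched as follows. This is a minimal illustration, not the schema from Figure 1: the `device` and `sample` tables, their columns, and the sample rows are all hypothetical, and SQLite (via Python's standard `sqlite3` module) stands in for SQL Server. It shows that when time is stored as a long integer, a daily report has to convert every row to a date and join to another table before it can aggregate.

```python
import sqlite3

# Hypothetical schema: time is stored as a long (epoch seconds), as the text describes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE device (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE sample (device_id INTEGER, ts_epoch INTEGER, bytes_sent INTEGER);
""")
conn.executemany("INSERT INTO device VALUES (?, ?)", [(1, "north"), (2, "south")])
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?)",
    [
        (1, 0, 100),      # 1970-01-01 00:00:00 UTC
        (1, 3600, 50),    # 1970-01-01 01:00:00 UTC
        (2, 86400, 70),   # 1970-01-02 00:00:00 UTC
    ],
)

# Daily report per region: every row needs an epoch-to-date conversion
# and a join to the device table before it can be grouped.
daily_report = conn.execute("""
    SELECT d.region,
           date(s.ts_epoch, 'unixepoch') AS day,
           SUM(s.bytes_sent) AS total_bytes
    FROM sample AS s
    JOIN device AS d ON d.id = s.device_id
    GROUP BY d.region, day
    ORDER BY d.region, day
""").fetchall()

print(daily_report)
```

With only three rows the cost is invisible. The point is the shape of the query: the per-row conversion and the join are repeated for every report request, which is where the pressure on the database comes from as the row count grows.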

3 Data Warehouse

3.1 Data Warehouse Introduction

The data warehouse concept originated in the 1980s and was first set out in the book "Building the Data Warehouse" by William H. Inmon, the father of the data warehouse. After extensive research in the years since, the concept has become clearer. A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data used to support business management and the decision-making process. Compared with a traditional online transaction processing system, the data in a data warehouse has the following characteristics:

3.1.1 Subject-oriented

Subject orientation is one of the most important features of a data warehouse. Traditional data is application-oriented: data and application are tightly coupled. A data warehouse is instead organized by subject, where a subject is a criterion for classifying data at a higher level of abstraction. Data organized by subject does not overlap logically.

3.1.2 Integration

In many enterprises today, data is scattered rather than integrated. The main causes of this scattering are the distributed nature of transaction processing, data inconsistency, external data, and unstructured data. The data in a data warehouse is derived from these existing business systems or management information systems, which are independent of one another in their data dictionaries, coding rules, naming conventions, and keys, and may even conflict. Before entering the data warehouse, this data must be reorganized, transformed, cleaned, and integrated so that the original application-oriented data structures are converted to subject-oriented ones, as the data warehouse requires.
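The integration step described above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the two source systems (`billing_rows`, `crm_rows`), their field names, and their coding rules are hypothetical. The sketch shows two sources that name and encode the same facts differently being cleaned and mapped onto one common record layout before loading.

```python
from datetime import datetime, timezone

# Hypothetical records from two independent source systems that
# disagree on field names, gender codes, and date representation.
billing_rows = [{"cust_id": "C001", "sex": "M", "created": "2020-01-15"}]
crm_rows = [{"customerNo": "c002 ", "gender": 2, "signup_ts": 1579046400}]

# Each source has its own coding rule for the same attribute.
GENDER_FROM_BILLING = {"M": "male", "F": "female"}
GENDER_FROM_CRM = {1: "male", 2: "female"}


def from_billing(row):
    """Map a billing record onto the common layout."""
    return {
        "customer_id": row["cust_id"].strip().upper(),
        "gender": GENDER_FROM_BILLING.get(row["sex"], "unknown"),
        "signup_date": row["created"],
    }


def from_crm(row):
    """Map a CRM record onto the common layout, converting epoch seconds to a date."""
    signup = datetime.fromtimestamp(row["signup_ts"], tz=timezone.utc)
    return {
        "customer_id": row["customerNo"].strip().upper(),
        "gender": GENDER_FROM_CRM.get(row["gender"], "unknown"),
        "signup_date": signup.date().isoformat(),
    }


# The integrated data set uses one naming scheme and one set of codes.
integrated = [from_billing(r) for r in billing_rows] + [from_crm(r) for r in crm_rows]

for record in integrated:
    print(record)
```

Keeping one small mapping function per source keeps each system's quirks isolated, so adding another source means adding another mapper rather than changing the common layout.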

3.1.3 Stability

Data in a data warehouse mainly serves information analysis and management decision making and needs to be accumulated over the long term. It usually enters the data warehouse after large amounts of raw data have been cleaned, processed, and integrated, so it is essentially never, or only rarely, modified, which makes it stable.

3.1.4 Emphasis on time series

Data in ordinary applications need not contain a time factor. It represents only the current state, an instantaneous snapshot of the enterprise as it is now. The data in a data warehouse consists of snapshots of the enterprise at each point in time, a dynamic record that varies with time. This lets us discover and mine the underlying patterns of the business as it changes and so provide support for decisions.
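The difference between a current-state image and a series of timestamped snapshots can be shown with a short sketch. The names used here (`current_status`, `history`, `record`, `status_at`) and the device data are hypothetical, chosen only for illustration. The operational view overwrites its single slot per device, while the warehouse-style view only appends, which also reflects the stability property from 3.1.3.

```python
# Operational view: one slot per device, overwritten on every update.
current_status = {}

# Warehouse-style view: append-only list of (timestamp, device, status) snapshots.
history = []


def record(ts, device, status):
    """Apply an update to both views. Assumes calls arrive in time order."""
    current_status[device] = status        # the old value is lost
    history.append((ts, device, status))   # the old value is kept


def status_at(ts, device):
    """Return the last status recorded at or before ts, or None if there is none."""
    matches = [s for (t, d, s) in history if d == device and t <= ts]
    return matches[-1] if matches else None


record(1, "router-a", "up")
record(2, "router-a", "down")
record(3, "router-a", "up")

print(current_status)              # only the latest state survives
print(status_at(2, "router-a"))    # the history can still answer questions about the past
```

Only the append-only history can answer a question such as "what was the status at time 2?". The overwritten current-state view has already discarded that information.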
