As the Business Intelligence (BI) system of a provincial mobile company has gradually matured, it has stimulated strong application demand from the local branches: they want finer data granularity, more flexible handling of requirements, and results they can act on more readily. To make its business analysis system play a greater role in daily production and operation, the provincial mobile company decided to build a "data mart" platform for its local companies that better suits local characteristics.
As a subset of the provincial business analysis data warehouse, this platform keeps its data consistent with the provincial business analysis system while extending that system's applications to the key links of market operation, fully supporting the daily production and operation of every department in the municipal branches. The data mart delivers targeted, timely, flexible, and fine-grained data analysis that can quickly guide market operation and production activities, greatly promotes the refined operation of local branches, and supports marketing analysis and management better suited to regional characteristics. At the same time, it further extends and improves the value chain of the provincial mobile business analysis system.
Two flexible options
The case above mentioned data marts, data warehouses, and business analysis systems. A data warehouse is a collection of data gathered from multiple data sources and stored in a consistent manner. Bill Inmon, one of the founders of data warehousing, defines it this way: a data warehouse is a subject-oriented, integrated, non-volatile, and time-variant collection of data used to support management decision making. Building a data warehouse requires cleaning, extracting, transforming, integrating, and loading the data. To meet different needs, the data is first cleaned to ensure its correctness, then extracted and converted into the form the data warehouse requires, and finally loaded into the data warehouse.
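The clean, convert, and load steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the `call_fact` table, and the rounding rule are hypothetical and do not come from the mobile company's system.

```python
import sqlite3

# Hypothetical raw call records as they might arrive from a source system.
RAW_CALLS = [
    {"msisdn": " 13800000001 ", "city": "hangzhou", "seconds": "125"},
    {"msisdn": "13800000002", "city": "Ningbo", "seconds": "-5"},  # invalid duration
    {"msisdn": "", "city": "Ningbo", "seconds": "60"},             # missing number
    {"msisdn": "13800000003", "city": "NINGBO", "seconds": "61"},
]


def clean(rows):
    """Drop records that fail basic correctness checks."""
    for row in rows:
        msisdn = row["msisdn"].strip()
        seconds = int(row["seconds"])
        if msisdn and seconds >= 0:
            yield {"msisdn": msisdn, "city": row["city"], "seconds": seconds}


def transform(rows):
    """Convert cleaned records into the form the warehouse table expects."""
    for row in rows:
        # Assumed rule: normalise city names and round durations up to whole minutes.
        minutes = (row["seconds"] + 59) // 60
        yield (row["msisdn"], row["city"].title(), minutes)


def load(conn, rows):
    """Write the converted records into the (hypothetical) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS call_fact (msisdn TEXT, city TEXT, minutes INTEGER)"
    )
    conn.executemany("INSERT INTO call_fact VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn, raw_rows):
    """Run clean -> transform -> load and return what ended up in the table."""
    load(conn, transform(clean(raw_rows)))
    return conn.execute(
        "SELECT msisdn, city, minutes FROM call_fact ORDER BY msisdn"
    ).fetchall()
```

Calling `run_etl(sqlite3.connect(":memory:"), RAW_CALLS)` keeps only the two valid records; the two bad rows never reach the table.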
A mobile company's business analysis system can be understood as a data warehouse. A data warehouse is usually an enterprise-level application, so its scope and the investment involved are huge, and some enterprises cannot afford it. They therefore hope to build a customized subset of a data warehouse for their own applications in the key departments that need it most. This demand gave rise to the data mart.
A data mart focuses on selected subjects and is department-level in scope. In mobile companies, the group and provincial companies use data warehouses, while city branches use data marts for regionalized and personalized data analysis.
In the telecom industry, data warehouses are usually built first and data marts afterwards, that is, a top-down approach is adopted. However, this is not the only method. Lu Dongming, technical director at Sybase, told reporters that American Express once set up a data mart for its credit card fraud protection system. Because other applications became linked to that data mart, it later grew into an enterprise-level data warehouse holding 50 TB of data. The company adopted a bottom-up approach: create a data mart first and then develop it into a data warehouse.
The top-down and bottom-up methods have always been debated. Two people deserve mention here, Ralph Kimball and Bill Inmon. Both have been innovators in the business intelligence field, have developed and tested new technologies and architectures, and have written many books on data warehousing.
Inmon believes that data in a data warehouse should be organized around subjects, such as customers, suppliers, and products. Each subject area contains only information about that subject. Subjects should be added to the data warehouse one at a time, and a data mart should be created from the data warehouse only once multiple subjects are readily accessible. In other words, all data in a given data mart should come from subject-oriented data storage. Inmon's method involves more up-front work, which delays initial access to information. However, he believes that this centralized architecture will keep providing greater consistency and flexibility and, in the long run, will genuinely save resources and effort.
Kimball's view is that "a data warehouse is only a combination of data marts." He believes that "data warehouses can be built progressively through a series of data marts with the same dimensions." Each data mart draws on multiple data sources to meet a specific business need. By using "conformed" dimensions, users can view information across different data marts, which means the marts share commonly defined elements. Kimball's method delivers integrated data to answer the business problems an enterprise urgently needs to solve, and does so faster than Inmon's method. Inmon's method creates a data mart only after several single-subject areas have been built in a centralized data warehouse, and Kimball believes this lacks flexibility and takes too long in the current business environment.
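To make the idea of shared dimensions concrete, here is a minimal Python sketch in which two separate data marts use the same customer dimension, so their figures can be lined up side by side. The marts, customer names, and numbers are invented for illustration and are not taken from Kimball's writing or from any real system.

```python
# Shared dimension: one customer table, defined once and used by both marts.
CUSTOMER_DIM = {
    1: {"name": "Customer A", "segment": "enterprise"},
    2: {"name": "Customer B", "segment": "consumer"},
    3: {"name": "Customer C", "segment": "consumer"},
}

# Fact rows of two separate (hypothetical) data marts, both keyed by customer_id.
USAGE_MART = [(1, 300), (2, 120), (3, 80), (1, 200)]   # (customer_id, call minutes)
BILLING_MART = [(1, 90.0), (2, 35.5), (3, 20.0)]       # (customer_id, amount billed)


def total_by_segment(fact_rows):
    """Aggregate one mart's facts by the shared 'segment' attribute."""
    totals = {}
    for customer_id, value in fact_rows:
        segment = CUSTOMER_DIM[customer_id]["segment"]
        totals[segment] = totals.get(segment, 0) + value
    return totals


def drill_across():
    """Combine results from both marts; possible only because they share the dimension."""
    minutes = total_by_segment(USAGE_MART)
    revenue = total_by_segment(BILLING_MART)
    return {
        segment: {"minutes": minutes[segment], "revenue": revenue[segment]}
        for segment in sorted(minutes)
    }
```

If each mart defined its own customer keys or segments, the final step would have nothing reliable to join on, which is the practical reason the shared definitions matter.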
In practice, which method to choose depends on the project's main business driver. If an enterprise is suffering from poor data management and inconsistent data, or wants to lay a solid foundation for the future, Inmon's method is the better choice. Kimball's method suits an enterprise that urgently needs information delivered. Once the urgent information needs are met, a data architecture transformation plan that includes an independent data warehouse should be considered. The data warehouse isolates the data marts from legacy systems and OLTP systems and supports faster creation of future data marts.
Zhang Jian, senior solution designer at AsiaInfo Technology, said: "In actual project construction, enterprises do not follow a purely top-down or purely bottom-up approach."
Liu Qing, a contributor to this newspaper, said that most mature foreign data warehouse vendors think top-down and adopt Bill Inmon's method, first establishing a complete enterprise data warehouse. Typically, they have designed abstract conceptual models for an industry and can generate logical and physical models to fit the actual environment. When building a complete data warehouse, designers consider which applications it will ultimately serve and make trade-offs accordingly.
Independent or dependent
Lu Dongming said: "Vendor product lines do not distinguish between data warehouses and data marts, because the underlying technology of the two is the same. You cannot simply divide data warehouses and data marts by capacity. Some data marts abroad have reached 20 TB, which is larger than many data warehouses."
Data marts can be divided into two types: independent and dependent. In an independent data mart, the data comes from one or more operational systems or external information providers, or is generated within a specific department or region. The data in a dependent data mart comes directly from the enterprise data warehouse.
Liu Qing believes that dependent data marts are currently the more common kind in China. The difference between the two kinds lies in whether their data comes from a central data warehouse. Because data warehouse projects in China were mostly promoted by foreign vendors in the early stage, some irrationality was inevitable: most projects started from the data warehouse concept, consolidating scattered, heterogeneous data. An independent data mart is a small data warehouse that a department or region builds for itself because its users have analysis needs and need data to support them. An independent data mart is an analysis application for a specific line of business.
From the application's point of view, independent and dependent data marts look much alike, because the modeling method currently used for data marts is basically the same: a subject-oriented star schema that serves an analysis application. The main difference between the two lies in scalability and data consistency. Because each independent data mart may use its own local encoding, data is hard to share among them when several independent data marts exist. Dependent data marts are based on the uniform encoding of the central data warehouse, so their data can be shared.
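The encoding problem can be illustrated with a small Python sketch. The two city marts, their product codes, and the subscriber counts below are hypothetical; the point is only that identical local codes with different meanings cannot be added together, whereas figures expressed in one shared code set can.

```python
from collections import Counter

# Two independent marts, each with its own local product codes (hypothetical data).
MART_CITY_A = [("P1", 10), ("P2", 4)]   # in city A, P1 means "prepaid"
MART_CITY_B = [("P1", 7), ("P2", 3)]    # in city B, P1 means "postpaid"

# Mapping from each city's local codes to one uniform code set,
# standing in for the encoding a central data warehouse would enforce.
LOCAL_CODES = {
    "city_a": {"P1": "prepaid", "P2": "postpaid"},
    "city_b": {"P1": "postpaid", "P2": "prepaid"},
}


def naive_merge(*marts):
    """Add up subscribers by raw local code; this silently mixes different meanings."""
    total = Counter()
    for mart in marts:
        for code, subscribers in mart:
            total[code] += subscribers
    return dict(total)


def merge_with_uniform_codes(marts_by_city):
    """Translate every local code to the shared code set before adding up."""
    total = Counter()
    for city, mart in marts_by_city.items():
        for code, subscribers in mart:
            total[LOCAL_CODES[city][code]] += subscribers
    return dict(total)
```

The naive merge reports a single "P1" total that mixes prepaid and postpaid subscribers, while the second function gives comparable prepaid and postpaid totals. A dependent data mart avoids the translation step altogether because it inherits the warehouse's codes.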
Data Structure
The structure of the data in a data mart is generally described as a star structure or a snowflake structure. A star structure contains two basic parts: a fact table and various dimension tables. The fact table holds the densest data in the data mart. For a telephone company, call data is the densest; for a bank, it is data on account reconciliation and ATM activity; for the retail industry, it is sales and inventory data. (CCW-cnw)
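As a rough illustration of the star structure described above, the following Python sketch uses the standard `sqlite3` module to build one fact table of call records surrounded by two dimension tables. The table names, columns, and sample rows are assumptions made for the example, not a schema from the article.

```python
import sqlite3


def build_star_schema():
    """Create a tiny star schema: one fact table referencing two dimension tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, segment TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, month TEXT);
        -- The fact table holds the high-volume call data, keyed by the dimensions.
        CREATE TABLE fact_call (
            customer_id INTEGER REFERENCES dim_customer(customer_id),
            date_id INTEGER REFERENCES dim_date(date_id),
            minutes INTEGER
        );
        INSERT INTO dim_customer VALUES (1, 'enterprise'), (2, 'consumer');
        INSERT INTO dim_date VALUES (20240101, '2024-01'), (20240201, '2024-02');
        INSERT INTO fact_call VALUES
            (1, 20240101, 30), (1, 20240201, 45),
            (2, 20240101, 10), (2, 20240101, 5);
    """)
    return conn


def minutes_by_segment_and_month(conn):
    """Typical star-schema query: join the fact table to its dimensions and aggregate."""
    return conn.execute("""
        SELECT c.segment, d.month, SUM(f.minutes)
        FROM fact_call AS f
        JOIN dim_customer AS c ON c.customer_id = f.customer_id
        JOIN dim_date AS d ON d.date_id = f.date_id
        GROUP BY c.segment, d.month
        ORDER BY c.segment, d.month
    """).fetchall()
```

A snowflake structure would differ only in that the dimension tables themselves are split further, for example moving `segment` into its own table referenced by `dim_customer`.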