Bi data warehouse data layering

Source: Internet
Author: User

Why layer data warehouses?

    • Use space for time and a large amount of preprocessing to improve the user experience (efficiency) of the application system. Therefore, a large amount of redundant data exists in the data warehouse;
    • If there is no hierarchy, changing the business rules of the Source Business system will affect the entire data cleansing process, resulting in a huge workload.
    • Data cleansing can be simplified through hierarchical data management, because the previous step is divided into multiple steps to complete, which is equivalent to splitting a complex job into multiple simple tasks, A large black box is converted into a white box, and the processing logic of each layer is relatively simple and easy to understand. In this way, we can easily ensure the correctness of each step. when data is wrong, we usually only need to adjust a certain step.

Data Warehouse standards can be divided into four layers: ODS (temporary storage layer), PDW (Data Warehouse layer), mid (data mart layer), and app (Application Layer)

ODS layer:

The temporary storage layer is the temporary storage area of interface data. It prepares for the next step of data processing. Generally, the data on the ODS layer is homogeneous with that on the source system. The main purpose is to simplify subsequent data processing. In terms of data granularity, the data granularity of the ODS layer is the finest. Tables on the ODS layer generally have two types: one for storing the data to be loaded and the other for storing the processed historical data. Historical data must be cleared 3-6 months later to save space. However, different projects should be treated differently. If the data volume of the source system is small, it can be retained for a longer period or even full storage;

PDW layer:

For the data warehouse layer, the data in the PDW layer should be consistent, accurate, and clean, that is, the data after cleaning (removing impurities) the source system data. This layer of data generally follows the third paradigm of the database, and its data granularity is usually the same as that of ODS. On the PDW layer, all historical data in the Bi system is saved, for example, 10 years of data.

Mid layer:

For the data mart layer, this layer of data is subject-oriented to organize data, usually in a star or snowflake structure. In terms of data granularity, this layer of data is mild Summary-level data, and no detailed data exists. In terms of data time span, it is usually part of the PDW layer. The main purpose is to meet the needs of user analysis. From the perspective of analysis, generally, you only need to analyze data from recent years (such as the last three years. In terms of data breadth, it still covers all business data.

APP layer:

For the application layer, this layer of data is built to meet specific analysis needs, and is also a star or snowflake structure data. Data is highly aggregated in terms of data granularity. In terms of data breadth, it does not necessarily cover all business data, but a real subset of mid-layer data. In a sense, it is a duplication of mid-layer data. In extreme cases, a model can be built for each report at the app layer to support it. The standard Tiering of the data warehouse for the purpose of changing the space for time is only a standard of the recommended nature, in actual implementation, the Data Warehouse hierarchy should be determined based on the actual situation. Different types of data may also adopt different layering methods.

--- [Supplement]

Data Cache layer:

The database layer used to store the raw data provided by the interface. The table structure of this layer is basically the same as that of the source data. The data storage time depends on the data size and project conditions. If the data size is large, you can save only recent data and back up historical data. This layer aims at data transfer and backup.

Core data layer:

The data at this layer is integrated to a certain extent on the basis of the data cache layer. It is called a data mart, and the storage is still a relational model. The purpose of this layer is to integrate necessary data to prepare for the next multi-dimensional model.

Analysis Application Layer:

The data at this layer is the multidimensional model data that needs to be constructed based on business analysis. Data can be directly used for analysis and presentation.

Note: Data layers can be tailored to actual project requirements. If the business is relatively simple and independent, the core data layer and the analysis application layer can be merged. In addition, application data can be analyzed from multi-dimensional model data, relational model data, or even raw data.

Bi data warehouse data layering

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.