BI architecture and data warehouse design
A data warehouse solution is usually part of a larger BI strategy. When you plan to use the Fast Track architecture, the most important thing is to understand the role the system plays in the big-picture architecture of that BI strategy. This section reviews the role of the data warehouse in the enterprise environment and the best practices for database design in that environment.
BI and Data Warehouse Architecture Overview
Figure 1 shows the architecture of a typical BI environment, including the ETL processes, temporary data storage in a staging environment, the data warehouse or data marts, cubes, the reporting presentation layer, dashboards, and analytic views.
In a small solution, several of these components can reside on a single physical machine, but a better architecture distributes the components across different servers.
Figure 3 shows the different physical servers that make up the BI environment.
Figure 3: The hardware architecture of a multi-server BI environment in which each server performs a different function
This environment represents a scalable SMP data warehouse model, in which the data warehouse is centralized on a separate server. The Fast Track reference architecture targets the data warehouse server and its storage components, based on an SMP system.
The difference between a data warehouse and a data mart lies in the scope of the data. A data warehouse contains the data of the entire company, whereas a data mart handles the data of a single department or subject area. Data marts are often created from a central data warehouse or an enterprise operational data store (ODS). Many enterprise environments have multiple data marts. You can have data marts without a data warehouse, or a data warehouse without data marts.
The Fast Track reference architecture can be applied to data warehouses or data marts. Although data marts are often smaller than data warehouses, they can hold several GB of data and need the same tuning methods and architecture as data warehouses.
Data warehouse design
The term data warehouse is often used for any store of historical business data, but the term by itself does not imply any design principles. The most common misconceptions about data warehouses are as follows:
- An offline backup or replicated copy of a transactional system is not a data warehouse. Copying these databases does not meet the requirements of history tracking or efficient reporting. A system designed for transaction processing often poses complexity and performance challenges for reporting, and it does not store data history.
- A database made up of flat, denormalized tables that combine text descriptions with many columns is not a data warehouse. Queries against such a system can run into performance problems because of the width of the tables, the number of rows, and the space the tables occupy. Of course, data warehouse design models cover a range that includes various degrees and variations of normalization. However, the goal is the same: optimize reporting and analysis while tracking historical data.
Data warehouses are often designed using dimensional modeling, a technique that addresses both performance and history-tracking goals effectively. Dimensional modeling focuses on optimizing the reporting structures (attributes) and separating them from the numeric measures used in analysis.
A dimensional model uses two major types of tables: dimension tables and fact tables.
- A dimension table holds the attributes of a business entity, such as product, store, invoice, or date. When dimension data changes, you sometimes need to keep the history, which requires adding a new primary key, called the surrogate key. A dimension table often has many columns, but its row count is usually much smaller than that of the second table type.
- A fact table contains the numeric values being analyzed, called measures or facts. To relate these values to attributes, a fact table also has foreign keys that reference the dimension tables (through their surrogate keys). Fact tables are organized by measure or transaction type, such as sales, account balances, or events. Fact tables often have millions or even billions of rows, but they are well suited to reporting because they do not carry extraneous columns.
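The relationship between a dimension table, its surrogate (proxy) key, and a fact table can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not part of the reference architecture: the table names, column names, the `is_current` flag, and the `upsert_product` helper are all hypothetical. When a product attribute changes, the old dimension row is kept and a new row with a new surrogate key is added, so older facts keep pointing at the attribute values that were valid when they were recorded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    product_id  TEXT NOT NULL,                      -- business key from the source system
    category    TEXT NOT NULL,
    is_current  INTEGER NOT NULL DEFAULT 1          -- 1 = latest version of the product
);
CREATE TABLE fact_sales (
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    quantity    INTEGER NOT NULL,
    amount      REAL NOT NULL
);
""")

def upsert_product(conn, product_id, category):
    """Return the surrogate key of the current version of a product.

    If the category changed, the old row is kept as history and a new
    row with a new surrogate key becomes the current version.
    """
    row = conn.execute(
        "SELECT product_key, category FROM dim_product "
        "WHERE product_id = ? AND is_current = 1",
        (product_id,),
    ).fetchone()
    if row and row[1] == category:
        return row[0]  # nothing changed, reuse the existing key
    if row:
        conn.execute(
            "UPDATE dim_product SET is_current = 0 WHERE product_key = ?",
            (row[0],),
        )
    cur = conn.execute(
        "INSERT INTO dim_product (product_id, category) VALUES (?, ?)",
        (product_id, category),
    )
    return cur.lastrowid

# A sale recorded while the product is categorized as 'Snacks'
k1 = upsert_product(conn, "P-100", "Snacks")
conn.execute("INSERT INTO fact_sales VALUES (?, ?, ?)", (k1, 2, 5.0))

# The product is re-categorized; later sales reference the new key
k2 = upsert_product(conn, "P-100", "Health Food")
conn.execute("INSERT INTO fact_sales VALUES (?, ?, ?)", (k2, 1, 3.0))

sales_by_category = dict(conn.execute(
    "SELECT d.category, SUM(f.amount) FROM fact_sales f "
    "JOIN dim_product d ON d.product_key = f.product_key "
    "GROUP BY d.category"
).fetchall())
print(sales_by_category)
```

Because the fact rows store the surrogate key rather than the business key, the report attributes each sale to the category that applied at the time of the sale.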
Figure 4 shows an example of two fact tables: store sales and store inventory. The fact tables in the middle of the diagram share the same dimensions, such as store, date, and buyer. Because the dimensions are conformed (consistent across both fact tables), this design allows sales records and inventory quantities to be analyzed and queried together.
Figure 4: An example of two fact tables and their related dimension tables
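As a rough sketch of why shared dimensions matter, the following example builds two small fact tables that reference the same store and date dimensions and then queries them together. The schema, sample rows, and names are assumptions made for illustration (the buyer dimension is omitted for brevity). Each fact table is aggregated separately to the same grain before the results are joined on the shared dimension keys, which avoids double counting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, store_name TEXT);
CREATE TABLE dim_date  (date_key  INTEGER PRIMARY KEY, week_label TEXT);
CREATE TABLE fact_store_sales (
    store_key INTEGER, date_key INTEGER, units_sold INTEGER);
CREATE TABLE fact_store_inventory (
    store_key INTEGER, date_key INTEGER, units_on_hand INTEGER);

INSERT INTO dim_store VALUES (1, 'Downtown'), (2, 'Airport');
INSERT INTO dim_date  VALUES (20240101, '2024-W01');
INSERT INTO fact_store_sales VALUES
    (1, 20240101, 30), (1, 20240101, 10), (2, 20240101, 5);
INSERT INTO fact_store_inventory VALUES
    (1, 20240101, 100), (2, 20240101, 50);
""")

# Aggregate each fact table on its own, then join on the shared keys.
query = """
WITH sales AS (
    SELECT store_key, date_key, SUM(units_sold) AS units_sold
    FROM fact_store_sales
    GROUP BY store_key, date_key
),
inventory AS (
    SELECT store_key, date_key, SUM(units_on_hand) AS units_on_hand
    FROM fact_store_inventory
    GROUP BY store_key, date_key
)
SELECT s.store_name, d.week_label, sales.units_sold, inventory.units_on_hand
FROM sales
JOIN inventory
  ON inventory.store_key = sales.store_key
 AND inventory.date_key  = sales.date_key
JOIN dim_store s ON s.store_key = sales.store_key
JOIN dim_date  d ON d.date_key  = sales.date_key
ORDER BY s.store_name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The join only works because both fact tables use the same `store_key` and `date_key` values; if each table had its own definition of a store or a date, the two result sets could not be lined up reliably.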
Like dimension tables, fact tables also track history, such as historical sales, historical inventory, historical account balances, or historical events. For example, an inventory history fact table can record a weekly snapshot of the stock level of each product in each warehouse. Based on this information, users can discover trends and analyze inventory levels over the period, which is not possible in the source transaction system.
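To make the weekly inventory example concrete, here is a small sketch that computes week-over-week changes from snapshot rows. The row layout, the sample values, and the `weekly_changes` function are hypothetical; in a real warehouse the rows would come from a query against the inventory fact table.

```python
from collections import defaultdict

# Hypothetical weekly snapshot rows: (week, warehouse, product, units_on_hand)
snapshots = [
    ("2024-W01", "WH-A", "P-100", 120),
    ("2024-W02", "WH-A", "P-100", 90),
    ("2024-W03", "WH-A", "P-100", 60),
    ("2024-W01", "WH-A", "P-200", 40),
    ("2024-W02", "WH-A", "P-200", 45),
]

def weekly_changes(rows):
    """Return the week-over-week change in stock per (warehouse, product)."""
    series = defaultdict(list)
    # Zero-padded week labels within one year sort chronologically as strings.
    for week, warehouse, product, on_hand in sorted(rows):
        series[(warehouse, product)].append((week, on_hand))
    changes = {}
    for key, points in series.items():
        changes[key] = [
            (week, on_hand - prev_on_hand)
            for (_, prev_on_hand), (week, on_hand) in zip(points, points[1:])
        ]
    return changes

changes = weekly_changes(snapshots)
for key, deltas in changes.items():
    print(key, deltas)
```

A transactional source system typically holds only the current stock level, so a trend like the steady weekly decline of product P-100 is only visible once the snapshots are retained in the fact table.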
The Fast Track data warehouse reference architecture covers indexes and partitions in the "Select and implement the correct Fast Track system" section, which focuses on the tables that make up the data warehouse design.