ETL Model Design

Source: Internet
Author: User

A traditional relational database generally represents data as two-dimensional tables: one dimension is the row, the other is the column, and the intersection of a row and a column is a data element. Relational data is based on the relational model and is queried and manipulated through standard SQL.

A data warehouse is modeled as a multidimensional database. It extends the relational database model, with the star model as its main structure; the snowflake model and the fact constellation are extensions built on that basis. Whatever the architecture, dimension tables, fact tables, and the measures stored in fact tables are essential components.

    • Dimension: a structural property of a cube. Dimensions are organized hierarchies (levels) of categories used to describe the data in the fact table. These categories and levels describe sets of similar members, and users base their analysis on these member sets.
    • Measure: in a cube, a measure is a set of values, usually numeric, based on a column in the cube's fact table. Measures are the central values being analyzed in the cube, that is, the numeric data (such as sales, gross profit, and cost) that end users focus on when they browse the cube. Which measures you select depends on the type of information end users request. Common measures include sales, cost, expenditures, and production count.
    • Fact table: the table that stores the measures for business events, together with the keys that reference the dimension tables, so that its rows can be aggregated along any dimension.
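To make the three terms concrete, here is a minimal sketch in plain Python. The rows, the dimension names (product, region), and the `total_by` helper are all made up for illustration; they only show how one measure in a fact table is aggregated along different dimensions.

```python
from collections import defaultdict

# Hypothetical fact rows: two dimension values (product, region)
# followed by one measure (sales amount).
facts = [
    ("laptop", "north", 1200.0),
    ("laptop", "south", 800.0),
    ("phone", "north", 500.0),
    ("phone", "south", 700.0),
]

def total_by(dimension_index, rows):
    """Sum the sales measure for each member of the chosen dimension."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension_index]] += row[2]
    return dict(totals)

# The same measure, analyzed along two different dimensions.
sales_by_product = total_by(0, facts)
sales_by_region = total_by(1, facts)
```

The fact rows never change here; only the dimension chosen for grouping does.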

1) Star Model
The star model is the most common way to structure a data warehouse. It organizes the warehouse into one integrated system that provides end users with reporting services and with objects for analysis services. A star model supports a wide range of decision-support queries by using one fact table that covers a topic and several dimension tables that hold denormalized descriptions of the facts. It can be implemented in a relational database: the fact table sits at the core of the model, with the dimension tables arranged around it. Each dimension table is connected to the central fact table, and the dimension tables are connected to one another only through the fact table, so an object in one dimension table is associated with an object in another dimension table by way of the fact rows. Each dimension table is joined to the fact table by its primary key.
The fact table primarily contains data that describes specific business events, that is, the measures for those events. In general, data in the fact table is not modified; new data is simply appended. The dimension tables mainly contain the descriptive attributes of the data stored in the fact table. Each dimension table is linked to rows in the fact table through a foreign key in the fact table that references the dimension key. This requires that the foreign keys in the fact table not be null, unlike in a general-purpose database, where a foreign key may be null. With this structure, users can start their analysis in a dimension table, obtain the dimension keys, and use them to query the central fact table, which reduces the amount of fact data scanned and improves query performance.
There are two main reasons for using the star model. The first is query efficiency. The data in a star model has already been preprocessed and most of it sits in one large fact table, so a query mostly scans the fact table and does not need to join a large number of tables. Because the dimension tables are generally small, they can even be held in cache, which makes joins to the fact table fast. The second is ease of understanding. For users who are not computer professionals the star model is more intuitive, and by studying it they can easily compose a variety of queries.
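The star layout described in this section can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumptions: the table and column names (`dim_product`, `dim_date`, `fact_sales`, and so on) and the sample rows are hypothetical, and SQLite stands in for whatever database the warehouse actually runs on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical star schema: one central fact table, two surrounding dimension tables.
conn.executescript("""
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    category    TEXT NOT NULL          -- denormalized: category is stored inline
);
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,
    year     INTEGER NOT NULL,
    month    INTEGER NOT NULL
);
CREATE TABLE fact_sales (
    product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
    date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
    sales_amount REAL    NOT NULL      -- the measure
);
""")

conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "laptop", "computer"), (2, "phone", "mobile")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(20240101, 2024, 1), (20240201, 2024, 2)])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [(1, 20240101, 1200.0), (1, 20240201, 800.0), (2, 20240101, 500.0)])

# Start from a dimension attribute, then reach the fact table through the dimension key.
computer_sales = conn.execute("""
    SELECT p.category, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON f.product_key = p.product_key
    WHERE p.category = 'computer'
    GROUP BY p.category
""").fetchall()

# A fact row with a null dimension key is rejected by the NOT NULL constraint.
try:
    conn.execute("INSERT INTO fact_sales VALUES (NULL, 20240101, 10.0)")
    null_key_rejected = False
except sqlite3.IntegrityError:
    null_key_rejected = True
```

The query touches only one dimension table and the fact table, which is the single-join access pattern the star model is designed for.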
2) Snowflake Model
The snowflake model is an extension of the star model in which each dimension table can be connected to one or more detail category tables. In this model a dimension table keeps the role it has in the star model and is also joined to detail category tables, which describe the facts in more detail along that dimension. Moving this detailed description into the category tables is intended to keep the fact table small and to improve query efficiency.
The snowflake model normalizes the dimension tables of the star model. Its dimension tables store normalized data, so a query works with several smaller normalized tables instead of the large denormalized tables of a star model, which reduces redundancy in the dimension data. Because the dimensions are normalized and kept at a finer granularity, the snowflake model makes data warehouse applications more flexible.
The extra joins between the normalized tables, however, take a considerable amount of time. In general, a snowflake model is less efficient to query than a star model.
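As a rough illustration of that trade-off, the sketch below normalizes a product dimension into a product table and a separate category table. The names (`dim_category`, `dim_product`, `fact_sales`) and the rows are hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical snowflake layout: the category attribute is moved out of
# dim_product into its own table, so each category name is stored once.
conn.executescript("""
CREATE TABLE dim_category (
    category_key  INTEGER PRIMARY KEY,
    category_name TEXT NOT NULL
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    category_key INTEGER NOT NULL REFERENCES dim_category(category_key)
);
CREATE TABLE fact_sales (
    product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
    sales_amount REAL NOT NULL
);
INSERT INTO dim_category VALUES (1, 'computer'), (2, 'mobile');
INSERT INTO dim_product VALUES (1, 'laptop', 1), (2, 'tablet', 1), (3, 'phone', 2);
INSERT INTO fact_sales VALUES (1, 1200.0), (2, 300.0), (3, 500.0);
""")

# Grouping by category now needs two joins: fact -> product -> category.
sales_by_category = conn.execute("""
    SELECT c.category_name, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_product  AS p ON f.product_key  = p.product_key
    JOIN dim_category AS c ON p.category_key = c.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
```

In a star layout the category would sit directly in the product dimension and the same question would need a single join.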
3) Constellation Model
A complex business intelligence application often holds multiple fact tables in the data warehouse. When multiple fact tables share one or more dimension tables, the result is a fact constellation, also known as a galaxy schema.
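A minimal sketch of a fact constellation follows, assuming two hypothetical fact tables (`fact_sales` and `fact_inventory`) that share one dimension table (`dim_product`). All names and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two fact tables reference the same dimension table.
conn.executescript("""
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE fact_sales (
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    units_sold  INTEGER NOT NULL
);
CREATE TABLE fact_inventory (
    product_key    INTEGER NOT NULL REFERENCES dim_product(product_key),
    units_in_stock INTEGER NOT NULL
);
INSERT INTO dim_product VALUES (1, 'laptop'), (2, 'phone');
INSERT INTO fact_sales VALUES (1, 3), (1, 2), (2, 7);
INSERT INTO fact_inventory VALUES (1, 10), (2, 4);
""")

# Aggregate each fact table on its own, then line the results up
# through the shared dimension so rows are not double counted.
sold_vs_stock = conn.execute("""
    SELECT p.name, s.sold, i.stock
    FROM dim_product AS p
    JOIN (SELECT product_key, SUM(units_sold) AS sold
          FROM fact_sales GROUP BY product_key) AS s
      ON s.product_key = p.product_key
    JOIN (SELECT product_key, SUM(units_in_stock) AS stock
          FROM fact_inventory GROUP BY product_key) AS i
      ON i.product_key = p.product_key
    ORDER BY p.name
""").fetchall()
```

Joining the two fact tables directly to each other would multiply rows for products with several sales records, which is why each fact table is aggregated before the join.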
4) Data mart
Data mart is a term that comes up often when building a data warehouse. A data warehouse is enterprise-wide and collects information about topics across the entire organization, such as customers, goods, sales, assets, and people. A data mart is a subset of that enterprise-wide data, for example only the information on the sales topic, so it is useful only to a specific group of users and its scope is limited to the selected topic.
A data mart is a section (or topic) separated out of the data warehouse for one department of the enterprise; the separation can be either logical or physical.
The data warehouse stores the overall information of the enterprise, whereas a data mart stores only the information needed for one topic. The aim is to reduce the amount of data processed and to make the information faster and more flexible to use.
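One way to picture the difference between a logical and a physical data mart is a view versus a copied table. The sketch below is an assumption-laden simplification: the single `warehouse_facts` table, its `topic` column, and the mart names are hypothetical, and a real warehouse would hold many tables per topic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
-- Hypothetical enterprise-wide table covering several topics.
CREATE TABLE warehouse_facts (
    topic  TEXT NOT NULL,
    item   TEXT NOT NULL,
    amount REAL NOT NULL
);
INSERT INTO warehouse_facts VALUES
    ('sales',  'laptop',   1200.0),
    ('sales',  'phone',     500.0),
    ('assets', 'forklift', 9000.0);

-- Logical data mart: a view that exposes only the sales topic.
CREATE VIEW sales_mart_logical AS
    SELECT item, amount FROM warehouse_facts WHERE topic = 'sales';

-- Physical data mart: a separate table holding a copy of the same subset.
CREATE TABLE sales_mart_physical AS
    SELECT item, amount FROM warehouse_facts WHERE topic = 'sales';
""")

logical_rows = conn.execute(
    "SELECT item, amount FROM sales_mart_logical ORDER BY item").fetchall()
physical_rows = conn.execute(
    "SELECT item, amount FROM sales_mart_physical ORDER BY item").fetchall()
```

The view always reflects the current warehouse data, while the copied table has to be refreshed when the warehouse changes; both give sales users a smaller data set to query.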
Because a data warehouse is enterprise-wide and can model multiple related topics, its data is generally organized using the galaxy (fact constellation) model.
