"Bi thing-the art of data" understanding Dimension Data Warehouse-fact table, dimension table, aggregation table

Source: Internet
Author: User

Fact table

In a multidimensional data warehouse, a table that holds a detailed value or fact for a measure is called a fact table. A fact table that stores sales and sales by state, product, and month has 5 columns that are conceptually similar to the following example.

Sate

Product

Mouth

Units

Dollars

WA

Mountain-100

January

3

7.95

WA

Cable Lock

January

4

7.32

OR

Mountain-100

January

3

7.95

OR

Cable Lock

January

4

7.32

WA

Mountain-100

February

16

42.40

In the sample data rows for these fact tables, the first 3 columns-state, product, and month-are key columns. The remaining two columns-sales and sales-are measures. Each column in the fact table is typically either a key-value column or a measure column, but it may also contain columns for other reference purposes-such as a purchase order number or a issuing ticket.

In the fact table, each measure has a column. Different fact tables will have different measures. A sales data warehouse may contain these two measure columns: Sales and sales. A field information Data Warehouse may contain 3 measure columns: Total, number of minutes, and number of defects. When you create a report, you can think of a measure as an extra dimension. That is, you can use sales and sales as side-by-side column headings, or you can use them as row headings. In the fact table, however, each measure is displayed as a separate column.

The fact table data row contains the details of the lowest level from which you want to get measure information. In other words, the most detailed item member in the fact table for each dimension has data rows. If you have measures that use other dimensions, simply create another fact table for those measures and dimensions. The data warehouse may contain different fact tables that have different measures and dimensions.

The sample data row in the preceding table shows the conceptual layout of the fact table. The fact is that the fact table almost always uses an integer value to represent (dimension) members without using descriptive names. Because the fact table tends to contain a large number of unimaginable rows of data-in a medium-sized data warehouse, the fact table often contains millions of rows of data-using integer key values can effectively reduce the size of the fact table. The actual layout of the fact table is shown below.

state_id

prod_id

Month

Sales_units

Sales_dollars

1

347

1

3

7.95

1

447

1

4

7.32

2

347

1

3

7.95

2

447

1

4

7.32

1

347

2

16

42.40

When you use an integer key value in a fact table, the name of the dimension member needs to be placed in another table-that is, the dimension table. Typically, each dimension in the fact table has a dimension table.

The fact table prefix is fact.

Induction:

Each data warehouse contains one or more fact tables. The fact table may contain business sales data, such as cash register transactions.

The resulting data, the fact table usually contains a large number of rows. The main feature of the fact table is the inclusion of numeric data (facts), which can be summarized to provide the relevant units as historical data, each fact table contains a multiple-part index that contains the primary key for the correlation latitude table as the foreign key, and the dimension table contains the attributes of the fact record. The fact table should not contain descriptive information, nor should it contain any data other than the numeric Measure field and the related indexed field that makes the corresponding item in the fact and latitude table.

Measures that are included in the fact table have two: one is a measure that can be accumulated, and the other is a non-cumulative measure. The most useful measure is a cumulative measure, and its cumulative number makes sense. Users can get summary information by accumulating measures, for example. You can summarize the sales of specific items for a group of stores within a specific time period. Non-accumulative measures can also be used in fact tables, and single summary results are generally meaningless, for example, when measuring temperatures at different locations in a building, it is meaningless to accumulate the temperature of all the different positions in the building, but averaging is meaningful.

In general, a fact table is associated with one or more latitude tables, and a user can use one or more dimension tables when creating a cube with a fact table.

Dimension table

A dimension table contains a specific name for each member of the dimension. The name of the dimension member is called a property (Attribute). Assuming that there are 3 products in the Product dimension, the dimension table will look like the following.

prod_id

Product_Name

347

Mountain-100

339

Road-650

64u

Cable Lock

A product name is a property of a product member. Because the product ID in the dimension table matches the product ID in the fact table, it is called a key attribute. Because each product ID has only one product name, which is displayed with a name instead of an integer value, it is still considered part of the key attribute.

In the Data Warehouse, the key attribute in the dimension table must contain a unique value for each member of the dimension. The term description in relational database is that the key attribute is called the primary key column. The primary key values in each dimension table are related to the key values in any related fact table. Each key value that appears once in the dimension table appears more than once in the fact table. For example, product ID 347 for Mountain-100 appears in only one dimension table data row, but it appears in multiple fact table data rows. This is called a one-to-many relationship. In the fact table, the key-value column, which is the "many" side of a one-to-many relationship, is called a foreign-key column. Relational databases use matching primary key columns (in dimension tables) and foreign key columns (in the fact table) values to join dimension tables to the fact table.

Moving the dimension information to a separate table, in addition to making the fact table smaller, has the added advantage of adding additional information to each dimension member. For example, a dimension table might add category information for each product, as shown below.

< P align= "left" >prod_id

product_name

category

347

mountain-100

bikes

339

Road-650

bikes

447

cable Lock

accessories

Now the category is another property of the product. If you know the product ID, you can infer not only the product Name but also the category. The name of the key property may be unique-because each key has only one name, but other properties do not need to be unique, such as the category attribute may appear several times. This allows you to create a report that groups fact table information by product and category.

In addition to the name, a dimension table can contain many other properties. Essentially, each property corresponds to a column in the dimension table. The following is an example of a product dimension table with only 3 members with additional attributes.

prod_id

Product_Name

Category

Color

Size

Price

347

Mountain-100

Bikes

Black

44

782.99

339

Road-650

Bikes

Silver

48

3399.99

447

Cable Lock

Accessories

NA

NA

25.00

Dimension attributes can be either grouped or non-grouped. In other words, have you ever seen a report that groups metrics by which attribute? In our example, Category, size, and color are all grouped properties. This naturally reminds you of the possibility of grouping sales by color, size, or type in a report. But Price doesn't look like a grouped property-at least it's not. There may be more meaningful other properties in the report-such as Price group, but the prices themselves vary too much to make grouping on the report less meaningful. Similarly, grouping on a report by the product Description attribute is not very meaningful. In a customer dimension, city, country, gender, and marital status are properties that can be grouped by them on a report, but street address or nickname should be non-grouped. Non-grouped properties are often referred to as member properties (member property).

Some of the grouped properties can be combined to create a natural hierarchy (natural hierarchy). For example, if product has category and subcategory attributes, in most cases a single product will only belong to a single subcategory, and a single subcategory will only belong to a single category. This will form a natural hierarchy. In the report, categories may be displayed, then allow the user to drill through to subcategories from a category and eventually drill through to the products.

Hierarchies--or drillthrough paths--are not necessarily natural (for example, each low-level member determines the next-highest member). For example, you might create a report that groups products by color, but allow users to drill through each color to a different size. Because of the drillthrough capabilities of the report, color and size form a hierarchy, but there is no information based on size to determine what color the product will be. This is a hierarchical structure, but not a natural hierarchy-but it is not an unnatural hierarchy. Color and size form a hierarchy and nothing wrong, it's just a simple fact: the same size can appear in multiple color.

The dimension table prefix is dim.

Induction:

A dimension table can be viewed as a window into which the user analyzes the data, and the latitude table contains the characteristics of fact records in the fact table, some features provide descriptive information, and some attributes specify how the fact table data is aggregated to provide useful information to the analyst, and the dimension table contains a hierarchy of attributes that help summarize the data. For example, a dimension table that contains product information typically contains hierarchies that divide products into categories such as food, beverages, non-consumer goods, and each of these products is subdivided more than once until each product reaches the lowest level.

In a dimension table, each table contains fact attributes that are independent of the other dimension tables, for example, the Customer dimension table contains data about the customer. Column fields in a dimension table can divide information into hierarchical levels of structure.

Conclusion:

1, the fact table is what you want to pay attention to;

2. The dimension table is the angle from which you observe the transaction, and from what angle to observe the content.

For example, the sales of goods in a region are viewed from a regional perspective. The fact table is the sales table, and the dimension table is the regional table.

Aggregation tables

The data is stored in the fact table in the most detailed format, and the various reports can take advantage of the data. General query statements when querying the fact table, one operation often involves thousands of records, but by using aggregate, average, extremum and other aggregation techniques can greatly reduce the number of data queries. Therefore, the underlying data from the fact table should be stored in the intermediate table in advance by aggregation. The intermediate table stores the aggregated information, so it is called the aggregation table, and this process is called the aggregation process.

"Bi thing-the art of data" understanding Dimension Data Warehouse-fact table, dimension table, aggregation table

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.