Fact tables and dimension tables (reproduced)

Source: Internet
Author: User

A dimension represents the amount of data you want to analyze, such as when you analyze a product's sales, you can choose to analyze it by category, or by region. Such a press. The analysis forms a dimension. The previous example can have two dimensions: type and region. In addition, each dimension can have sub-dimensions (called attributes), such as attributes that can have subtypes, product names, and so on.

Here are two common dimension table structures:

Product dimension table: prod_id, Product_Name, Category, Color, Size, price
Time dimension table: TimeKey, Season, year, Month, Date

The fact table is a result table that is generated from a dimension after data aggregation. Examples of its structure are as follows:

Sales Fact Sheet: prod_id (Reference Product dimension table), TimeKey (Reference Time dimension table), SalesAmount (total sales, in currency), Unit (volume)

The tables above are present in the Data Warehouse. From here you can see that it has several features:

1. The redundancy of the dimension tables is large, mainly because the dimensions are generally small (relative to the fact table), and the redundancy of the dimension tables can save a lot of space in the fact table.
2. Fact sheets are generally very large, and if queried in an ordinary way, the time to get the results generally is not acceptable to us. So it usually has to do some special processing. As in SQL Server 2005, the fact table is processed such as pre-build processing.
3. The primary key of the dimension table generally takes the type of the flag column of the integer value, which is also to save the storage space of the fact table.

The dividing line between the fact table and the dimension table

Fact tables are used to store the skeleton content of a theme. Take the daily workload as an example, the workload may have the following properties: Work date, staff, work hours, overtime, work nature, whether the field, work content, auditor. So what is the backbone content? It is easy to see the length of work, overtime is the backbone, that is, the basic content of the workload theme, then the date of work, staff, the nature of work, whether the field, work content is the backbone of information? Careful analysis of the characteristics will find, date, personnel, nature, whether the field can be classified, such as the date of the year-month-day level, the staff also have a subordinate relationship, the field and normal work is also two types of work attendance record, and the length of work and overtime is not such a sense. So we generally list the attributes that can be categorized, become dimension tables, and maintain the referential relationship between facts and dimensions in the fact table.

In the above example, the fact table can be designed as follows
Workdate Employeeid,worktypeid,islegwork,content,
The time, the employee, the type of work, and whether the field is classified as a dimension table.

All in all, the same tables that establish the primary foreign key relationship are the same. But the establishment of a dimension table is a hierarchy (though not a must, but also a typical feature), and the fact table is set up for the fact that already happened, is the archive of historical data, that is to say, should not be modified. Take the test department test software bug as an example. Each bug is a fact. The state of this bug may be designed to be new, transferred, repaired, rejected, etc. in the data dictionary. Then there is a field status in the bug table in the fact table. When a tester or developer changes the value of this state, how is it updated in the fact table? Is it a direct update to the status or what other way? Obviously, in order to be able to track the history of this bug, it should be to reinsert a new record. So what's the difference between this and the previous database design? You can see that for the original record and the newly inserted records, the other fields are all the same, i.e. all redundant. If you use BugID as the primary key, you will find that the primary key is redundant (of course, you can only delete the primary key before inserting). So it can be seen that the fact table generally does not have a primary key. The quality of the data is fully grasped by the business system.

In general, the design of a fact table is based on the ability to correctly record historical information, and the design of the dimension table is based on the ability to aggregate the subject content in a suitable perspective.

Dimension is the angle of the analysis problem, each dimension represents a unified path to access information in the Data Warehouse.

In real-world problems, some dimensions contain multiple levels.

The fact is that the intersection of each dimension is a measure of a particular event that exists only if the combination of certain dimension values does not cause a hole. The number of facts attribute is called a measure.

The recognition of fact data and dimension data must be based on specific subject matter. A fact table, a measure (measure) that stores facts, and a foreign key value that points to each dimension. The dimension table is used to hold the metadata of the dimension, that is, the description information of the dimension, including the hierarchy of the dimension and the member category.

The resulting data, the fact table usually contains a large number of rows. The main feature of the fact table is the inclusion of numeric data (facts), which can be aggregated to provide the relevant units as historical data, each fact table contains a multiple-part index that contains the primary key of the correlation latitude table as the foreign key, and dimension tables contain the attributes of a fact record。   The fact table should not contain descriptive information, nor should it contain any data other than the numeric Measure field and the related indexed field that makes the corresponding item in the fact and latitude table. Measures that are included in the fact table have two: one is a measure that can be accumulated, and the other is a non-cumulative measure. The most useful measure is a cumulative measure, and its cumulative number makes sense. Users can get summary information by accumulating measures, for example. You can summarize the sales of specific items for a group of stores within a specific time period.   Non-accumulative measures can also be used in fact tables, and single summary results are generally meaningless, for example, when measuring temperatures at different locations in a building, it is meaningless to accumulate the temperature of all the different positions in the building, but averaging is meaningful. In general, a fact table is associated with one or more latitude tables, and a user can use one or more dimension tables when creating a cube with a fact table.
Let's give a practical example. The bank accounts for the deposit, the actual data in table A, including the account number, affiliation, deposit amount and so on, B is the corresponding relationship between the institution number and the institution name.   Then a is the fact table and B is the dimension table. A dimension table can be viewed as a window into which the user analyzes the data, and the latitude table contains the characteristics of fact records in the fact table, some features provide descriptive information, and some attributes specify how the fact table data is aggregated to provide useful information to the analyst, and the dimension table contains a hierarchy of attributes that help summarize the data.   For example, a dimension table that contains product information typically contains hierarchies that divide products into categories such as food, beverages, non-consumer goods, and each of these products is subdivided more than once until each product reaches the lowest level. In a dimension table, each table contains fact attributes that are independent of the other dimension tables, for example, the Customer dimension table contains data about the customer.  Column fields in a dimension table can divide information into hierarchical levels of structure. To put it simply:
1, the fact table is what you want to pay attention to;
2, the dimension table is you to observe the angle of the transaction, from which angle to observe this content.

For example, the sales of goods in a region are viewed from a regional perspective. The fact table is the sales table, the dimension table is the regional table.

Fact tables and dimension tables (reproduced)

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.