"Reprint" MySQL Multidimensional Data Warehouse Guide, Part 1, Chapter 1


MySQL Multidimensional Data Warehouse Guide

Part 1: Basic Principles

Chapter list:

Chapter 1: Basic Composition

Chapter 2: Dimension History

Chapter 3: Additive Measures

Chapter 4: Dimensional Queries

Overview of This Part

You will use a relational database to implement a dimensional data warehouse. Fact tables and dimension tables are the basic building blocks of a data warehouse schema, and in the first part of this book you will build them in a MySQL database.

Chapter 1: Basic Composition

Overview

This chapter covers two important topics: the star schema and surrogate keys. The star schema is the data structure of a dimensional data warehouse. A surrogate key is a field added to a dimension table to serve as its primary key in the data warehouse.

In this chapter you will start the long journey of building a real data warehouse. You will complete the following tasks in this chapter:

• Create a database user.

• Create two relational databases, one as the data warehouse and the other as the source database.

• Create the database tables for the data warehouse.

• Generate surrogate keys.

You need to create a source database because your data warehouse does not yet have any source data. This step is usually unnecessary in practice, because a real data warehouse is likely to be built on existing data sources. You will start using this source database in the second chapter of this book.

Section 1: Star Schema

A good dimensional data warehouse has a simple data structure. From a technical point of view, a simple structure means faster queries. When a dimensional data warehouse is implemented in a relational database, it has two types of tables: fact tables and dimension tables. A fact table holds business facts (measures). A dimension table holds the descriptive attributes by which you query and categorize those facts.

Note: You will understand fact tables and dimension tables better after reading the first two chapters of this book.

The way these tables are related in the data warehouse looks like a star, so this structure is called a star schema.

Note: Besides the star schema, the snowflake schema is also used in data warehouses. However, it is harder to model than a star schema. A snowflake schema is also harder to understand and apply, and its query performance is lower than that of a star schema. Because of these drawbacks, the snowflake schema is not well suited to building dimensional data warehouses, so this book covers only the star schema.

A star schema has a fact table surrounded by two or more dimension tables. A single-star structure has only one fact table. A multi-star structure has several fact tables, one for each star, and a dimension table can be shared by multiple fact tables. This chapter covers only the single-star structure; you will learn about the multi-star structure in Chapter 19, "Multi-Star Schema".

Figure 1-1 shows a single-star dimensional schema. It is the sales order data warehouse that we will expand throughout this book.

Figure 1-1: A single-star dimensional schema

Fact table names usually end with the suffix fact, and dimension table names usually end with the suffix dim (short for dimension). Following this naming convention, it is clear that Figure 1-1 has one fact table (the sales order fact table, sales_order_fact) and four dimension tables (the customer dimension customer_dim, the order dimension order_dim, the product dimension product_dim, and the date dimension date_dim). The fact table contains one or more measurable fact values (measures for short), while the dimension tables categorize the measures.

Each dimension table has exactly one surrogate key field, whose name ends with the suffix sk. Each dimension table's surrogate key field has a matching field of the same name in the fact table, which makes the database easier to query. However, a field with the sk suffix in the fact table is not a surrogate key of the fact table; it refers to the surrogate key of the corresponding dimension table.

The lines in Figure 1-1 that connect the sales_order_fact fact table to the four dimension tables represent the joins used to query these tables. These joins are based on the surrogate key fields of the dimension tables.
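The structure in Figure 1-1 can be sketched as table definitions. The example below is a minimal sketch, not the book's own script: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL, so that it runs without a server. The table and field names come from Figure 1-1 and Tables 1-1 through 1-5; the column types and the REFERENCES clauses are assumptions added for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")

# Only the table and field names come from the text.
# Column types and REFERENCES clauses are assumptions.
conn.executescript("""
CREATE TABLE customer_dim (
    customer_sk   INTEGER PRIMARY KEY,  -- surrogate key
    customer_no   INTEGER,
    customer_name TEXT
);
CREATE TABLE product_dim (
    product_sk   INTEGER PRIMARY KEY,   -- surrogate key
    product_code INTEGER,
    product_name TEXT
);
CREATE TABLE date_dim (
    date_sk INTEGER PRIMARY KEY,        -- surrogate key
    date    TEXT
);
CREATE TABLE order_dim (
    order_sk     INTEGER PRIMARY KEY,   -- surrogate key
    order_number INTEGER
);
-- The fact table carries one sk field per dimension plus the measure.
CREATE TABLE sales_order_fact (
    customer_sk  INTEGER REFERENCES customer_dim (customer_sk),
    product_sk   INTEGER REFERENCES product_dim (product_sk),
    date_sk      INTEGER REFERENCES date_dim (date_sk),
    order_sk     INTEGER REFERENCES order_dim (order_sk),
    order_amount INTEGER
);
""")

# Split the created tables by naming convention: *_fact and *_dim.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
fact_tables = [t for t in tables if t.endswith("_fact")]
dim_tables = [t for t in tables if t.endswith("_dim")]
print("fact tables:", fact_tables)
print("dimension tables:", dim_tables)
```

Splitting the table list by suffix shows the naming convention at work: one table ends in fact and four end in dim.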

When building a dimensional data warehouse, you generate the surrogate key values inside the data warehouse; they are not derived from the source data. Surrogate key values are sequential numbers.

Note: Surrogate keys are described in detail later, in the section "Surrogate Keys".

Now that you know the concepts of star schema, fact table, and dimension table, let's look at an example. Suppose we are interested in order amounts and decide to use the order_amount field of the sales_order_fact table in Figure 1-1 as a measure. Table 1-1 shows a sample row of the sales_order_fact table.

Table 1-1: Sample row of the sales_order_fact table

customer_sk   product_sk   date_sk   order_sk   order_amount
1             1            1         1          1000

The corresponding rows in the dimension tables associated with Table 1-1 are shown in Tables 1-2 through 1-5.

Table 1-2: Associated row in the customer dimension table

customer_sk   customer_no   customer_name
1             1             Dons Limited

Table 1-3: Associated row in the product dimension table

product_sk   product_code   product_name
1            1              Cangcung Hard Disk

Table 1-4: Associated row in the date dimension table

date_sk   date
1         2007-02-01

Table 1-5: Associated row in the order dimension table

order_sk   order_number
1          1

The row in the fact table shows that the order amount is $1,000. This is the measure. The customer_sk value of 1 in the sales_order_fact table corresponds to the row whose customer_sk value is 1 in the customer_dim dimension table. This association shows that the customer who placed the order is Dons Limited. In the same way, you use the product_sk value in the fact table to look up the product information in the product dimension table. You get the order date by joining the fact table to the date_dim dimension table on the date_sk field, and the order number by joining the fact table to the order_dim dimension table on the order_sk field.
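The lookups described above can be expressed as a single join query. The following is a minimal sketch under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL, the column types are assumed, and the rows are the sample rows from Tables 1-1 through 1-5.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")

# Column types are assumptions; names and rows come from Tables 1-1 to 1-5.
conn.executescript("""
CREATE TABLE customer_dim (customer_sk INTEGER PRIMARY KEY,
                           customer_no INTEGER, customer_name TEXT);
CREATE TABLE product_dim  (product_sk INTEGER PRIMARY KEY,
                           product_code INTEGER, product_name TEXT);
CREATE TABLE date_dim     (date_sk INTEGER PRIMARY KEY, date TEXT);
CREATE TABLE order_dim    (order_sk INTEGER PRIMARY KEY,
                           order_number INTEGER);
CREATE TABLE sales_order_fact (customer_sk INTEGER, product_sk INTEGER,
                               date_sk INTEGER, order_sk INTEGER,
                               order_amount INTEGER);

INSERT INTO customer_dim VALUES (1, 1, 'Dons Limited');
INSERT INTO product_dim  VALUES (1, 1, 'Cangcung Hard Disk');
INSERT INTO date_dim     VALUES (1, '2007-02-01');
INSERT INTO order_dim    VALUES (1, 1);
INSERT INTO sales_order_fact VALUES (1, 1, 1, 1, 1000);
""")

# Join the fact table to each dimension table on its surrogate key field.
query = """
SELECT c.customer_name, p.product_name, d.date,
       o.order_number, f.order_amount
FROM sales_order_fact AS f
JOIN customer_dim AS c ON f.customer_sk = c.customer_sk
JOIN product_dim  AS p ON f.product_sk  = p.product_sk
JOIN date_dim     AS d ON f.date_sk     = d.date_sk
JOIN order_dim    AS o ON f.order_sk    = o.order_sk
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each JOIN follows one of the lines in Figure 1-1, so the single fact row is returned together with its customer name, product name, order date, and order number.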

Section 2: Surrogate Keys

The surrogate key field of a dimension table is the table's primary key. A surrogate key value is usually a sequential number and carries no business meaning. In contrast, many key values from the source data do have business meaning.

You generate surrogate key values inside your data warehouse; you do not take them from the source data. Chapter 2, "Dimension History", explains the reason for doing so.

In MySQL, you can generate surrogate key values by setting the AUTO_INCREMENT attribute on the surrogate key field. MySQL then inserts the next sequential integer whenever you supply a NULL value for the auto-increment field.
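The behavior can be tried out with a small sketch. It is an illustration under stated assumptions, not the book's script: it runs against an in-memory SQLite database through Python's built-in sqlite3 module instead of MySQL, so the column is declared with SQLite's AUTOINCREMENT keyword where MySQL would use the AUTO_INCREMENT attribute. The second and third customers are hypothetical values added for the example.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")

# SQLite spells the attribute AUTOINCREMENT; MySQL uses AUTO_INCREMENT.
conn.execute("""
    CREATE TABLE customer_dim (
        customer_sk   INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_no   INTEGER,
        customer_name TEXT
    )
""")

# (customer_no, customer_name) pairs standing in for source data.
# Only the first row appears in the text; the others are hypothetical.
source_rows = [
    (1, "Dons Limited"),
    (7, "Hypothetical Customer A"),
    (9, "Hypothetical Customer B"),
]

# Supplying NULL for the surrogate key lets the database assign the value.
conn.executemany(
    "INSERT INTO customer_dim VALUES (NULL, ?, ?)", source_rows)

result = conn.execute(
    "SELECT customer_sk, customer_no FROM customer_dim "
    "ORDER BY customer_sk").fetchall()
keys = [row[0] for row in result]
print(result)
```

The generated customer_sk values are sequential and independent of the customer_no values that came from the source rows.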

We now have enough background to start building our data warehouse. The next section describes, step by step, the tasks you need to complete in this chapter.

--------------------------------------------------------------------------------------------------------------- -------------------------------

Original address: http://blog.chinaunix.net/uid-43642-id-2124571.html