Data Warehousing Special: Kimball Bus Matrix Description-Official edition

Source: Internet
Author: User

First, preface

Over the years, I had found that a matrix depiction of the Data Warehouse plan was a pretty good planning tool once you ha ve gathered the business requirements and performed a full data audit. This matrix approach have been exceptionally effective for distributed data warehouses without a center. Most of the new web-oriented, multiple organization warehouses we be trying to build these days with no center, so it's Even more urgent that we find a-to plan these beasts.

二、一级 Data Mart (First-level-Marts)

The matrix is simply a vertical list of data marts and a horizontal list of dimensions. Figure 1 are a example matrix for the enterprise Data Warehouse of a large telecommunications company. You start the matrix by listing all the first-level data marts so could possibly build over the NE XT three years across the enterprise. A First-level data mart is a collection of related fact tables and dimension tables which is typically:

    • Derived from a single data source
    • supported and implemented by a single department
    • Based on the most atomic data pos sible to collect from the source
    • conformed to the "Data Warehouse bus."

First-level data marts should be the smallest and least risky initial implementations of an enterprise Data warehouse. They form a foundation on which a larger implementation can is brought to completion in the least amount of time is still guaranteed to contribute to the final result without being incompatible stovepipes.

You should try-to-reduce the risk of implementation as much as possible by basing the First-level data marts on single production sources . In my experience, the cost and complexity of data warehouse implementation, once the ' right ' data has been chosen, turns o UT to is proportional to the number of data sources this must be extracted. Each separate data source can is as much as a six-month programming and testing exercise. Must create a production data pipeline from the legacy source through the data staging area and in to the fact and dim Ension tables of the presentation part of the Data warehouse.

In Figure 1, the First-level data marts for the telecommunications company is many of the major production data source S. An obvious production data source is the customer billing system, listed first. This row in the matrix was meant to represent all the base-level fact tables you expect to build in this data mart. Assume this data mart contains one major base-level fact table, the grain of which are the individual line item on a custom ER bill. Assume the line item in the bill represents the class of service provided, not the individual telephone call within the CL The service. With these assumptions, you can check off the dimensions this fact table needs. For customer bills, need time, customer, service, rate Category, Local Service Provider, Long Distance Provider, Locat Ion, and account Status.

Continue to develop the matrix, rows by listing, all the possible first-level data marts, that could is developed in the N Ext three years, based on known, existing data sources. Sometimes I am asked to include a First-level data mart based on a production system, does not yet exist. I usually decline the offer. I try to avoid including "potential" data sources, unless there was a very specific design and implementation plan in place . Another dangerously idealistic data source is the grand corporate data model, which usually takes up a whole wall of the I T Department. Most of the This data model cannot was used as a data source because it is not real. Ask the corporate data architect to highlight with a red pen the tables in the corporate data model that is currently pop Ulated with real data. These red tables is legitimate drivers of data marts in the planning matrix and can be used as sources.

The planning matrix columns indicate all the dimensions a data mart might need. A Real Enterprise Data Warehouse contains more dimensions than those in Figure 1. It is often helpful to attempt a comprehensive list of dimensions before filling in the matrix. When you start with a large list of dimensions, it becomes a kind of the creative exercise to ask whether a given dimension co Uld possibly is associated with a data mart. This activity could suggest interesting ways to add dimensional data sources to existing fact tables. If You are study the details of Figure 1, your may decide that is more X ' s should is filled in, or that some significant dimension s should be added. If So, more power to you! You were using the matrix as it was intended.

Inviting Data Mart Groups to the conforming meeting

Looking across the the The matrix is revealing. You can see the full dimensionality of all data mart at a glance. Dimensions can tested for inclusion or exclusion. But the real power of the matrix comes from looking at the columns. A column in the Matrix was a map of where the dimension is required.

Figure 1 The Matrix Plan for the enterprise Data Warehouse of a large telecommunications company.

The first dimension, time, is required in every data mart. Every data mart is a time series. But even the time dimension requires some thought. When a dimension was used in multiple data marts, , it must be conformed . Conformed dimensions is the basis for distributed data warehouses, and with using conformed dimensions is the the-to-avoid STO Vepipe data marts. A dimension is conformed when both copies of the dimensions are either exactly the same (including the values of the keys a nd all the attributes), or else one dimension are a perfect subset of the other. So using the time dimension with all the data marts implies, the data Mart teams agree on a corporate calendar. All the data mart teams must with this calendar and agree on fiscal periods, holidays, and workdays.

The grain of the conformed time dimension needs to being consistent as well. An obvious source of stovepipe data marts are the reckless use of incompatible weeks and months across the data marts. Get rid of awkward time spans such as quad weeks or 4-4-5-week quarters.

The second dimension in Figure 1, Customer, was even more interesting than time. Developing a standard definition for "customer" are one of the most important steps in combining separate sources of data F Rom around the enterprise. The willingness to seek a common definition of the customer are a major litmus test for an organization intending to build An enterprise Data warehouse. Roughly speaking, if an organization are unwilling to agree in a common definition of the customer across all data marts, T He organization should not attempt to build a data warehouse that spans these data marts. The data marts should remain separate forever.

For these reasons, your can think of the planning matrix columns as the invitation list to the conforming meeting! The planning matrix reveals the interaction between the data marts and the dimensions.

Communicating with the Boss

The planning matrix is a good communication vehicle for senior management. It's simple and direct. Even if the executive does not know much about the technical details of the data Warehouse, the planning matrix sends the message, definitions of calendars, customers, and products must is defined, or the enterprise won ' t be able t o Use it data.

A meeting to conform a dimension are probably more political than technical. The Data Warehouse project leader does not need to being the sole force for conforming a dimension such as Customer. A senior manager such as the enterprise CIO should be willing to appear at the conforming meeting and make it clear how IM Portant the task of conforming the dimension is. This political support is very important. It Gets the Data Warehouse project manager off the hook and puts the burden of the decision making process on senior Manag Ement ' s shoulders, where it belongs.

三、二级 Data Mart (Second-level-Marts)

After you has represented all the major production sources in the enterprise with First-level data marts, you CA n define one or more second-level marts. A Second-level data mart is a combination of the or more First-level marts. In most cases, a Second-level mart are more than a simple union of data sets from the First-level marts. For example, a Second-level profitability Mart could result from a complex allocation process that associates costs from SEv eral first-level cost-oriented data marts onto products and customers contained in a First-level revenue mart. I discussed the issues of creating these kinds of profitability data marts in my column, "Not so Fast."

Iv. Summary

The Matrix planning technique helps you build an enterprise data warehouse, especially when the warehouse is a distributed Combination of far-flung data marts. The matrix becomes a resource that's part technical tool, part project management tool, and part communication vehicle to Senior management.

Data Warehousing Special: Kimball Bus Matrix Description-Official edition

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.