21 Principles of Data Warehouse design [dmresearch.net]

Source: Internet
Author: User
Tags data structures dba final implement key query requires client
Design | data
[For the convenience of their own reading so collect and organize here, Www.DMResearch.net]



21 Principles of Data Warehouse design

--7 steps, 7 taboos and 7 ways of thinking



Seven steps to efficiently implement a data warehouse

The data warehouse is somewhat related to our common RDBMS system, but it is different. If you have not implemented a data warehouse, the whole process will bring you a completely different experience from setting goals to designing, from creating data structures to writing data analysis programs, to facing critical users. In a word, if you try to create a data warehouse in an old way, you're not faced with budget overruns or the data warehouse you're building is not functioning well.

There are a lot of issues to be aware of when dealing with a data warehouse project, but there are also a lot of constructive references that can help you get the job done more smoothly. Open minded, constantly trying new ways, is also necessary to find a feasible method of data Warehouse implementation.

1. Equipped with a full-time project manager or your own overall responsibility for project management
Under normal circumstances, the project manager will be responsible for the implementation of multiple projects at the same time. This is done entirely out of financial and IT resources considerations. However, for the management of the Data Warehouse project, it is absolutely impossible to have one person and several projects. Since you are in an area where you and your team have not entered before, everything about data warehousing-data analysis, design, programming, testing, modification, maintenance-is all brand new, so you or your assigned project manager can be a great help to the success of the project if you are fully committed to it.

2. Push project management responsibilities to other project managers
Because the Data warehouse implementation process is so difficult, to avoid self abuse, you can push project management responsibilities to other project managers after the current phase of the project is complete. Of course, this new project manager will have to compound the first one said to be full-time. Why would you do that? First of all, from the project manager's point of view, the Data Warehouse implementation process at any one stage is enough to make people physically and mentally tired. From the development of physical storage devices to the realization of extract-transform-load, from the design and development model to the OLAP, all phases are obviously more difficult than the previously contacted projects. Each phase requires not only new processing methods, new management methods, but also innovative viewpoints. Therefore, the management responsibility to other project managers will not damage the project, but also can play a role in helping.

3. Communicate with the user
What is said here is much more important than an article itself. You have to understand that in the design phase of the data warehouse, the potential users themselves are not sure what they need the data warehouse to do for them. They are constantly exploring and discovering their own needs, and your development team is doing the same thing with customers. More frequent contact with customers, more records, and let your team focus on the results of the project requirements discussion rather than the process itself.

Since your communication with your client is to understand what type of data is stored and how to store it efficiently, you may need to use a new approach (along with your users) to observe the data rather than directly process the data. You can try to find hidden information, such as digital fluctuations over a period of time. Instead of trying to find the answer to the project's needs, ask for answers.

4. Technology/Information base as the leader
Since the various stages of the data Warehouse implementation are very different, you need someone who can maintain the continuity of the entire project, but this responsibility does not require that kind of full-time. There are three key aspects of project implementation: architecture, technology and business. The focus of the architecture is to ensure that the architecture of the Data warehouse is well maintained throughout the project from the physical level. And we should focus on technology because the development team and key users are using tools they've never used before, and someone has to oversee the development process and the consistency of the tools used.

Finally, the business requirements emerging in the application process of the data warehouse must be analyzed and recorded in detail to promote the development process of the machine to continue. If users cannot communicate well with developers and other users, the development process for data analysis and metrics will be postponed, so someone has to focus on business development and drive development to a higher level.

5. Jump out of the trap of repeatedly modifying the program
The first implementation of the Data warehouse is certainly not the final delivery version. Why, then? In fact, before you actually see the product, you can't be sure you know what your goal is. Or, the end user will only be able to tell you the product is not what he wants until after using the Data Warehouse product for some time. Unlike the projects you've been dealing with in the past, business intelligence is in the early stages of development, and each company has different interpretations of business intelligence, so your project never succeeds.

In order to get the data in the right format, you need to explore in the changing situation. Bi has a strong personality, different environments, different markets, and different companies have different bi. What does that mean? This means that you need to put the database administrator in a relatively closed environment, do not let him know the data structure of the warehouse and the ETL program is constantly changing. There is no other way to do that. This will ease the pressure on you and the DBA.

6. Data source analysis for a large number of front-end resources
In the Data Warehouse implementation, you have to trek through old data from old databases, old tape drives, and remote data. Most of them are messy and difficult to obtain. You have to do a lot of this data, and also design an ETL program to find useful information. If you want the whole project to work out well and find a way to succeed at once, it will take your developers enough time to fully study the old data, rule the messy data, and try to design and implement robust data acquisition and conversion processes. The ETL portion of the Data Warehouse consumes 80% of the total project resources, so make sure your resources are in the cutting edge.

7. Putting interpersonal relationships first
In the implementation of the Data warehouse The real hell doesn't come from technology or development, but from the people around you. You may encounter a leader who is not optimistic about the project and has no time to listen to your presentation. You may meet some developers who procrastinate too long and complain about why they can't use the old method. You may also encounter some unrealistic fantasies of users who want to be able to perform their imagined functions with a light mouse, but are unwilling to do more intellectual investment and train their own employees. And you're already exhausted, encouraging investment, and promoting new development skills in development teams and users (and even bosses).

Anyway, you have to keep smiling. When all is done, your troubles are dispelled, and the last laugh is the easiest.



Seven taboos in the process of data Warehouse development

The OLTP technology we've been using in the past may be hiding a lot of serious flaws. The implementation of the Data warehouse is not a simple task, you will find the rich experience accumulated before, not suitable for the unique requirements of each data warehouse.

The terms listed below are some of the problems you will face in implementing the Data Warehouse, some of which may not seem as serious as you might think, but you should try to avoid similar problems. The Data warehouse is not a transaction processing system, it does not have a certain standard and does not implement a particular application, but it is inherently very organized. All in all, the data warehouses built by each company are unique, and the implementation of each data warehouse is not immutable. In the implementation of the data warehouse need to pay attention to not only "how to do", but also pay attention to "how to do". Here are the seven points we've summed up.

1. Don't write code that you can't quickly modify
The program you are writing is primarily for data analysis, not for dealing with transactions. And your users don't really know what kind of program they really want. So you have to rewrite the code several times before you know what kind of program the user needs. If you write a program that has good structure and flexibility, it won't be a waste of effort even if you need to change it. Conversely, you will be exhausted by yourself.

2. Do not use the database access API that cannot be modified
In the past, your database can provide a stable data query service for a large number of customers. Now, your program must be able to cope with more data queries. This makes it imperative to rewrite the program so that each query request gets the maximum amount of data, which is generally not a success, so the program can adapt to new requirements as quickly as possible by choosing the appropriate API to modify.

3. Don't design anything that can't be expanded
In online process (OLTP) applications, data analysis is not a real application. In fact, the key to data analysis is to get a lot of old data, extract the data model from it, and infer the new information with this model. The code that you write to access the potential information should be extensible and can append new data. Never assume that data is in a fixed format in code that supports data analysis.

4. Do not attach unnecessary functions
A warehouse to do is just the right service, the user went into the warehouse, from the shelves to obtain their own information, nothing more. Because business intelligence, analytics, and regularity issues all have their own processes, your customer's only need is to get information. They need an application environment that allows them to quickly obtain data from the Data warehouse for the analysis process, regardless of the data. Maybe you want to help them refine the data they get, but it's best not to do so. Be sure to keep in mind that you do not add any functionality that affects data access performance to your customer's Data Analyzer.

5. Do not simplify the process of data cleanup and data source analysis
The most noteworthy place in the implementation of the data warehouse is to analyze the data source for the extract-transform-load mechanism and to purge the data for the optimal load. The safe approach is to assume that the project manager will need more than half of the project's resources at this stage. On the contrary, if you make a simplification in this area, you will certainly regret it later. So even if the system works slowly, do not simplify the process of cleaning up old data.

6. Do not avoid particle size and zoning problems
In the process of data warehouse design, there are two biggest data storage problems, the first is how to position the transformation data to a proper granularity, and the second is how to partition the data absolutely. Why are these two issues so important? Because the response capability of the entire data warehouse is affected by granularity, and the efficiency of data access is directly related to the data partitioning performance. So this is a crucial task and do not try to avoid facing these problems.

7. Do not use OLAP before you consider business problems
Users usually don't know what kind of program they want before they see the program with their own eyes. So there are a lot of mistakes in their view, like they want the results to be faithful to performance metrics, or they want the program to make a difference in the business of their department or company. And you have to jump out of your area of responsibility, from an IT manager's point of view of the user department up to the entire enterprise operation mode, in order to avoid such problems in the development process. In the usual OLTP development, you can easily understand the business process. In the field of online analytical processing (OLAP), everything needs to be inspected in person, and the people who work around you may not find your misunderstanding about the business. Therefore, do not assume that you already know enough information. Constant questioning will give you a real idea of what "business" is like in "Business intelligence".


Seven ways to develop data warehouse successfully


For most IT consultants, the difficulty of implementing a data warehouse is more difficult than any project that has been done before. Given the different data structures, uses, and application development methodologies, the accumulated experience and skills are largely useless. But as long as you make a slight correction on your way forward, you'll find it's not difficult to implement a data warehouse, even if it's the first time you've implemented a data warehouse.

The following is a list of steps to consider in the Data Warehouse implementation process, some of which you may never be aware of, while others may have been used in the implementation process, but rethinking perhaps you will have more insight. Open thinking, try new ways, find a feasible method of data Warehouse implementation.

1. Think twice about how the application is implemented
The data Warehouse does not involve transaction processing and is only a small part of the report. The nature of the Data Warehouse application is analysis, especially for business intelligence analysis. Bi is not the usual data: it is a new data that is modeled from the old data. So how can you dig out the new data from the old data? In fact, this job is not for you to accomplish, but for your client to do. From the project manager's point of view, there should be an experienced data Form designer working with you to decide how to integrate the various programs. The main challenge will be how to look at the data in a new way, which is what your customers are trying to use.

2. Create an abstract, well deployed database access component
In the past you have contacted the database project and the current data warehouse, there is a point of difference, that is: in the online Transaction processing (OLTP) environment, the number of users is very large, but the use of less data; Analytical processing (OLAP) environments are just the opposite, with a small number of users using large amounts of data. And your job is to write an application that optimizes the difference. Here's a clue: in all of your analysis programs, you have to be able to crawl continuous data items so that data that is similar to the physical structure of the original data can be stored in the data structures that you build and access later. How to achieve it? First, do not normalize data. The second minimizes the number of read requests placed in the array. In this way, the DBA will be happy to work with you.

3. Keep Loose
Now looking back at the first step, you should be able to understand that defining an analytic program is not a simple matter, and it is generally difficult to achieve the final product that meets the requirements for the first time. And that's the same problem with the data structure you're going to analyze. In a word, the implementation process will have many variables, you need to constantly change your program. Usually we all want to minimize the number of changes. In a data warehouse implementation, the essence is to analyze the process without error, which also requires the involvement of the DBA. Don't die. Grab your program design, code, block diagram, or something else you've built and don't let go, keep adjusting to that change.

4. Putting Management first
How do you do in analyzing data sources? Do you think it is very difficult to clean up the rubbish data? You're not the only one who thinks this way. People who have done similar jobs have this view. In a general-scale organization, as part of the Data Warehouse implementation process, there will be a large number of old data that must be consistently processed. So analyzing the data source and spending hours writing the converter to import the old data into the Data warehouse is the hardest part of the entire data Warehouse implementation process. And this is the most important part of the whole project, which can account for three-fourths of the whole project cycle and budget. So be sure to treat it with care.

5. Finding problems from the lines
Communication with the user is a very troublesome thing, why do you say so? Because many users don't know what they want before they see the final product. Defining a Data Warehouse application is an exploratory process, and the process has to be repeated. Remember that the so-called "business intelligence" is defined by the user, who processes the business process according to their own understanding. So these users are the bridge between connecting data and business processes. What they want is not the data itself but the intelligence behind the data. You can have them discuss, think, and give constructive advice. But don't let them solve or let them arbitrarily imagine and publish those "possible" views. Finally, be sure to keep an eye on the user's conclusions.

6. Stay ahead
The data Warehouse does not seem to be rooted in traditional OLTP patterns, as it is. Although many people are involved in the development of the data Warehouse, the implementation of the data warehouse seems to be quite confusing for the first time because its framework is very different from that of the previous system. But it's important to keep going. It has two important roles to play.

First, the technology leadership. It can track the deployment and proper use of software tools at any stage of the project, as well as the development process. If this is a combination of your background, you can pay more attention to it.

Second, the leadership of the architecture. It enables projects to be converted at all stages, and the physical and logical architecture of the data warehouse and the systems it supports are consistent and unchanged. And that's what you can offer.

7. Issuing a warning
Finally, remember that you are not the only one on the new continent. Everyone around you will have one or a few questions: unrealistic expectations, misconceptions about technology, old habits or bad habits, competitive behavior, or lack of trust in the project. Although the task of communication and communication should be the responsibility of the project manager, in fact, you have to assume the same responsibility. So what do you do as a technical director? First of all, we must sincerely treat the people around, but we must erect prestige, appropriate to issue a warning. When you find that the project is slow, the resource is lost, or the employee loses the target, be outspoken. Giving warnings quickly and clearly is a wise move in most cases. A hastily launched Data Warehouse project may derail, but don't let failed projects take you down.


Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.