21 principles of data warehouse design

Source: Internet
Author: User

21 principles of data warehouse design: 7 steps, 7 taboos and 7 ideas

Seven steps for efficient data warehouse implementation

A data warehouse is related to the familiar RDBMS, but it is not the same thing. If you have never implemented one, the whole process, from setting goals to producing a design, and from creating data structures to writing data analysis programs for demanding users, will be a completely different experience from your previous projects. In short, if you try to build a data warehouse the old way, you will either overrun the budget or end up with a warehouse that does not work well.

There are many pitfalls to watch for in a data warehouse project, but there is also plenty of constructive guidance that can help you finish the job more smoothly. An open mind and a willingness to keep trying new approaches are also necessary for finding a workable implementation method.

1. Have a full-time project manager, or take full responsibility for project management yourself
A project manager usually runs several projects at the same time, because funding and IT resources are limited. A data warehouse project, however, must not be one of several projects handled by the same person. You and your team are entering a field you have never worked in before: everything about a data warehouse (data analysis, design, programming, testing, modification, and maintenance) is new. If you or the project manager you appoint can devote full attention to it, the project is far more likely to succeed.

2. Hand over project management responsibilities to other project managers
Implementing a data warehouse is gruelling. To avoid burning yourself out, you can hand project management over to another project manager once the current stage is completed. Of course, the new project manager must also be full-time, as the first principle requires. Why? From the project manager's point of view, any single stage of a data warehouse implementation is enough to leave a person physically and mentally exhausted. From provisioning physical storage to implementing extract-transform-load (ETL), and from designing and developing models to OLAP, every stage is noticeably harder than in previous projects. Each stage demands new processing methods, new management methods, and fresh ideas. Handing management responsibilities to another project manager will therefore help the project rather than harm it.

3. Communicate with users
This point is far more important than a short article can convey. You must understand that at the design stage, potential users do not yet know what they actually need the data warehouse to do for them. They are still exploring and discovering their own needs, and your development team is doing the same through its contact with them. Talk to users more often, take more notes, and have your team focus on the outcomes of the requirements discussions rather than on the discussions themselves.

The purpose of talking to users is to understand what kinds of data will be stored and how to store them effectively, so you (together with your users) may need to adopt a new way of looking at the data instead of simply processing it. Try to find hidden information, such as how figures fluctuate over a period of time. In other words, look for the questions the data can raise rather than for ready-made answers to the project's requirements.
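As an illustration of looking for fluctuations over time, here is a minimal Python sketch. The sample data and the function names (monthly_totals, month_over_month) are assumptions made up for this example, not part of any particular tool.

```python
from collections import defaultdict


def monthly_totals(rows):
    """Sum amounts per 'YYYY-MM' month from (iso_date, amount) pairs."""
    totals = defaultdict(float)
    for iso_date, amount in rows:
        totals[iso_date[:7]] += amount
    return dict(sorted(totals.items()))


def month_over_month(totals):
    """Return {month: fractional change versus the previous month}."""
    months = list(totals)
    changes = {}
    for prev, cur in zip(months, months[1:]):
        if totals[prev] != 0:  # skip months with no baseline to compare against
            changes[cur] = (totals[cur] - totals[prev]) / totals[prev]
    return changes


# Hypothetical sample data: (sale date, amount)
sales = [
    ("2023-01-05", 100.0),
    ("2023-01-20", 100.0),
    ("2023-02-11", 150.0),
    ("2023-03-02", 300.0),
]
totals = monthly_totals(sales)
changes = month_over_month(totals)
print(changes)
```

A sharp swing between two months is the kind of hidden information worth bringing back to users as a question.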

4. Let technical/information leads guide the project
Because the stages of a data warehouse implementation differ so much from one another, you need someone to keep the whole project running continuously, although this role does not have to be full-time. The implementation has three important aspects: architecture, technology, and business. An architecture lead ensures that the data warehouse architecture, from the physical layer up, is maintained consistently throughout the project. A technology lead is needed because the development team and key users are working with tools they have never used before, and someone must oversee the development process and the consistent use of those tools.

Finally, the business requirements that emerge as the data warehouse is used must be analyzed and recorded in detail so that the system keeps evolving. If users cannot communicate well with developers and with each other, the development of analyses and measures will be delayed. Someone must therefore keep watch over the business side and push development to the next level.

5. Get out of the single-pass trap and expect repeated modifications
The first data warehouse you implement will certainly not be the final delivered version. Why? Until you actually see a product, you cannot be sure what your goal is. In other words, end users can tell you whether the product meets their expectations only after they have used it for a while. Unlike the projects you have handled before, business intelligence (BI) is still in its infancy, and every company interprets it differently. Your project will therefore never succeed on the first attempt.

To get the data into the right format, you have to keep exploring as conditions change. BI is highly individual: different environments, markets, and enterprises each have their own BI. What does this mean? It means you need to keep the database administrator (DBA) in the loop and make sure he or she knows that the warehouse's data structures and ETL programs will keep changing. There is no way around this, and being open about it takes pressure off both you and the DBA.

6. Invest heavily in up-front analysis
During the implementation you will have to trudge through old data from legacy databases, old tape drives, and remote sources. Most of it is messy and hard to get at. You need to process this data in bulk and design ETL programs that extract the useful information. If you want the project to run smoothly and succeed, your developers must spend enough time studying the old data thoroughly, normalizing messy data, and designing and implementing robust data collection and conversion processes. ETL can consume as much as 80% of a data warehouse project's total resources, so make sure those resources are spent where they count.
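The extract, transform, and load steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the legacy rows, the field names (cust, date, amount), the two accepted date formats, and the sales table are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3
from datetime import datetime

# Hypothetical legacy rows with inconsistent casing, whitespace, and date formats.
LEGACY_ROWS = [
    {"cust": " alice ", "date": "2001-03-04", "amount": "1,200.50"},
    {"cust": "BOB", "date": "04/03/2001", "amount": "300"},
    {"cust": "", "date": "2001-03-05", "amount": "15"},        # missing customer
    {"cust": "Carol", "date": "not a date", "amount": "20"},   # unparseable date
]

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # assumed source formats


def parse_date(text):
    """Try each known format; return an ISO date string or None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def transform(row):
    """Return a cleaned (customer, iso_date, amount) tuple, or None to reject."""
    customer = row["cust"].strip().title()
    iso_date = parse_date(row["date"])
    if not customer or iso_date is None:
        return None
    amount = float(row["amount"].replace(",", ""))
    return (customer, iso_date, amount)


def run_etl(rows, conn):
    """Load clean rows into the sales table and return the rejects for review."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, sale_date TEXT, amount REAL)"
    )
    clean, rejected = [], []
    for row in rows:
        result = transform(row)
        if result is None:
            rejected.append(row)
        else:
            clean.append(result)
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", clean)
    conn.commit()
    return len(clean), rejected


conn = sqlite3.connect(":memory:")
loaded, rejected = run_etl(LEGACY_ROWS, conn)
print(loaded, "rows loaded;", len(rejected), "rows rejected")
```

Keeping the rejected rows instead of silently dropping them gives the team something concrete to study when analyzing the old data.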

7. Put handling people first
In a data warehouse implementation, the real hell comes not from technology or development but from the people around you. You may meet a leader who is not optimistic about the project and has no time to listen to you. You may meet developers who have dragged the process out for too long and complain that they cannot do things the old way. You may also meet users with unrealistic fantasies, who expect to click the mouse and get the functions they imagined but are unwilling to invest more in their own understanding or in better training for their staff. You will wear yourself out encouraging investment and introducing new skills to developers, users, and even your bosses.

Keep smiling. When everything is done, your troubles will be swept away, and the last laugh will be the easiest.

Seven taboos during data warehouse development

The OLTP techniques we relied on in the past may hide serious shortcomings when applied here. Implementing a data warehouse is not a simple task, and you will find that your accumulated experience does not fit the unique needs of each data warehouse.

The items listed below are some of the problems you will face when implementing a data warehouse. Some may seem less serious than others, but you should try to avoid all of them. A data warehouse is not a transaction processing system. It follows no fixed standard and does not implement one specific application, yet it is highly organized by nature. In short, every company's data warehouse is unique, and no implementation method is fixed. When implementing a data warehouse, pay attention not only to how to do it but also to how not to do it. The following is a summary of what not to do.

1. Do not write code that cannot be modified quickly
The programs you write are mainly for data analysis rather than transaction processing, and your users do not really know what programs they want. You will therefore have to modify the code several times before you understand what users need. If your programs are well structured and flexible, modifying them will not take much effort. Otherwise, you will wear yourself out.

2. Do not use database access APIs that cannot be modified
In the past, your database served stable queries to a large number of customers. Now your programs must cope with much heavier data queries. Rewriting them so that each query request retrieves as much data as possible becomes unavoidable. Such changes rarely succeed on the first attempt, so only an API that you can modify will let the program adapt to new requirements quickly.

3. Do not design anything that cannot be extended
In online transaction processing (OLTP) applications, data analysis is not really an application in its own right. The key to data analysis is to take a large amount of old data, extract a data model from it, and use that model to infer new information. The code you write to reach this latent information should be extensible and should tolerate new data being appended. In code that supports data analysis, do not assume the data has a fixed format.
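One way to avoid assuming a fixed format is to discover the fields at run time instead of hard-coding a column list. The following Python sketch shows the idea. The function name summarize and the sample rows are hypothetical.

```python
def summarize(rows, measures=None):
    """Aggregate numeric fields without hard-coding the column list.

    rows     -- iterable of dicts; rows may carry different keys
    measures -- optional set of field names to restrict the summary to
    """
    summary = {}
    for row in rows:
        for key, value in row.items():
            if measures is not None and key not in measures:
                continue
            # Only numeric values are aggregated; bool is excluded on purpose.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                continue
            stats = summary.setdefault(key, {"count": 0, "total": 0.0})
            stats["count"] += 1
            stats["total"] += value
    return summary


rows = [
    {"region": "north", "sales": 10.0},
    {"region": "south", "sales": 5.0, "returns": 1},  # a field that appeared later
]
summary = summarize(rows)
print(summary)
```

When a new measure such as returns appears in later data, the summary picks it up without any code change.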

4. Do not add unnecessary functions
A warehouse needs only the right services: users walk in and take the information they need from the shelves. That is all. Because business intelligence, analysis, and routine questions each have their own processing procedures, your customers' only need is to obtain information. They need an environment that lets them quickly pull the data their analysis requires from the data warehouse, whatever that data looks like. You may want to help them refine the data they retrieve, but it is best not to. Do not add any function to the customer's data analysis programs that degrades data access performance.

5. Do not cut corners on data cleansing and data source analysis
The most important tasks in a data warehouse implementation are analyzing the data sources for the extract-transform-load mechanism and cleansing the data so that it loads well. Expect this stage to need more than half of the project's resources. If you cut corners here, you will regret it later. So even if progress is slow, do not simplify the process of cleansing old data.
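Data source analysis often starts with a simple profile of each column. The sketch below, in Python, counts missing and distinct values per column. The function name profile and the sample rows are assumptions for illustration only.

```python
def profile(rows):
    """Per-column counts of missing and distinct values for a list of dict rows."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        # Treat both None (absent) and empty strings as missing.
        present = [v for v in values if v not in (None, "")]
        report[col] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return report


# Hypothetical source rows: inconsistent casing and gaps in the country column.
source_rows = [
    {"id": 1, "country": "DE"},
    {"id": 2, "country": "de"},
    {"id": 3, "country": ""},
    {"id": 4},
]
report = profile(source_rows)
print(report)
```

Here the country column shows two missing values and two distinct spellings of what is probably one country, which is exactly the kind of finding that should feed the cleansing rules.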

6. Do not avoid granularity and partitioning issues
Two major data storage problems arise in data warehouse design. The first is choosing an appropriate level of granularity for the converted data, and the second is deciding how to partition the data. Why do these two questions matter so much? Because granularity affects the responsiveness of the entire data warehouse, and data access efficiency depends directly on how the data is partitioned. This is key work, so do not try to sidestep it.
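To make the two decisions concrete, here is a small Python sketch that rolls the same facts up to different grains and splits them into yearly partitions. The fact layout (date, product, amount), the function names, and the choice of year as the partition key are assumptions for the example, not recommendations.

```python
from collections import defaultdict


def roll_up(facts, grain):
    """Aggregate (iso_date, product, amount) facts to 'day', 'month', or 'year' grain."""
    width = {"day": 10, "month": 7, "year": 4}[grain]  # prefix length of 'YYYY-MM-DD'
    totals = defaultdict(float)
    for iso_date, product, amount in facts:
        totals[(iso_date[:width], product)] += amount
    return dict(totals)


def partition_by_year(facts):
    """Group facts by calendar year so a query can touch only the years it needs."""
    parts = defaultdict(list)
    for fact in facts:
        parts[fact[0][:4]].append(fact)
    return dict(parts)


# Hypothetical daily facts
facts = [
    ("2022-12-30", "widget", 5.0),
    ("2023-01-02", "widget", 2.0),
    ("2023-01-15", "widget", 3.0),
    ("2023-02-01", "gadget", 7.0),
]
monthly = roll_up(facts, "month")
partitions = partition_by_year(facts)
print(len(facts), "daily rows ->", len(monthly), "monthly rows")
```

A coarser grain means fewer rows to scan but less detail to drill into, which is why the choice has to be made with users rather than avoided.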

7. Do not use OLAP before considering business issues
Users usually do not know what kind of program they want until they see it with their own eyes, so their expectations are often mistaken. For example, they want the analysis results to reflect performance measurements faithfully, or they want the programs to change how their departments or companies work. You must therefore step outside your usual responsibilities. As an IT manager, you have to consider how the user departments and the whole enterprise operate in order to avoid such problems during development. In ordinary OLTP development, business processes are easy to understand. In Online Analytical Processing (OLAP), everything needs to be examined in person. The people around you may not notice your misunderstandings about the business, so do not assume you already know enough. Only by asking constantly can you truly understand what the "business" in "Business Intelligence" means.

Seven ideas for smooth data warehouse development

For most IT consultants, implementing a data warehouse is harder than any previous project. Because the data structures, uses, and application development methods are all different, most previously accumulated experience and skills do not carry over. However, with a few small corrections to your course, you will find that implementing a data warehouse is not that difficult, even if it is your first.

The following are points to consider during a data warehouse implementation. Some you may never have thought of, and others you may already have applied, but thinking them through again may give you new insight. Keep an open mind and keep trying new approaches to find an implementation method that works.

1. Think twice about how to implement the application
A data warehouse does not process transactions, and reporting is only a small part of what it does. The essence of data warehouse applications is analysis, especially business intelligence analysis. BI is not data in the usual sense: it is new data modeled from old data. So how do we dig this new data out of the old? In fact, that job is done not by you but by your customers. From the project lead's perspective, an experienced data designer should work with you to decide how the various programs fit together. The main challenge here is learning to look at the data in a new way, which is exactly what your customers are trying to do.

2. Create abstract, well-deployed database access components
The database projects you worked on in the past differ from a data warehouse in one key way: in an online transaction processing (OLTP) environment, a very large number of users each work with a relatively small amount of data, while in an Online Analytical Processing (OLAP) environment the opposite is true, with a small number of users working with a large amount of data. Your job is to write applications optimized for this difference. Here is a clue: every analysis program must be able to fetch contiguous runs of data items, so the data structures you create and later access should store data in a form close to the physical layout of the original data. How? First, do not normalize the data. Second, read it into arrays to minimize the number of read requests. Your DBA will be glad to work with you on this.
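The two hints above (keep the data denormalized, and read it in arrays) can be sketched with Python's standard sqlite3 module. The WarehouseReader class, the sales_flat table, and the batch size are hypothetical names and values chosen for the example. SQLite runs in-process, so the batching here only illustrates the access pattern that would save round trips against a networked database.

```python
import sqlite3


class WarehouseReader:
    """Hypothetical access component that hides the driver and reads in batches."""

    def __init__(self, conn, batch_size=1000):
        self.conn = conn
        self.batch_size = batch_size

    def batches(self, sql, params=()):
        """Yield lists of rows, at most batch_size rows per list."""
        cursor = self.conn.execute(sql, params)
        while True:
            rows = cursor.fetchmany(self.batch_size)
            if not rows:
                break
            yield rows


conn = sqlite3.connect(":memory:")
# Denormalized table: region and product names are stored directly, with no joins needed.
conn.execute("CREATE TABLE sales_flat (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales_flat VALUES (?, ?, ?)",
    [("north", f"p{i}", float(i)) for i in range(10)],
)

reader = WarehouseReader(conn, batch_size=4)
batch_sizes = [
    len(batch)
    for batch in reader.batches("SELECT region, product, amount FROM sales_flat")
]
print(batch_sizes)
```

Because analysis programs call only batches(), the batch size or even the underlying driver can change later without touching them.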

3. Stay flexible
Now look back at the first idea. You should see by now that defining an analytical program is not simple, and that it is hard to deliver a satisfactory final product on the first attempt. The same is true of the data structures you are going to analyze. In short, there are many variables during implementation, and you will need to change your programs constantly. We usually want to minimize the number of changes, but in a data warehouse implementation the priority is getting the analysis right, which also requires the DBA's participation. Do not cling to your program designs, code, block diagrams, or anything else you create; keep adjusting them as things change.

4. Put management first
How do you analyze the data sources? Do you find cleaning up junk data very difficult? You are not alone: everyone who has done similar work feels the same. In an organization of ordinary size, a data warehouse implementation has to process a large amount of old data in a consistent way. Writing the conversion programs that analyze the data sources and import the old data into the warehouse therefore takes a great many hours. This is also the most important part of the project, accounting for about 3/4 of the project's schedule and budget. So be careful.

5. Read between the lines
Communicating with users is hard work. Why? Because many users do not know what kind of product they want until they see the final product. Defining a data warehouse application is an exploratory process that has to be repeated. Remember that "Business Intelligence" is defined by the users, who run their business processes according to their own understanding. These users are therefore the bridge between the data and the business. What they want is not the data itself but the intelligence hidden behind it. You can ask them to discuss, think, and give constructive comments, but never leave the solution to them, and never let them stop at imagining and voicing "what if" ideas. Finally, pay attention at all times to the conclusions users draw.

6. Stay ahead
A data warehouse does not seem to be rooted in the traditional OLTP model. Although many people take part in its development, the early implementation can look quite chaotic because its framework differs from that of earlier systems. Persistence is therefore very important, and it matters in two respects.

First, technology. Someone has to track the deployment and proper use of software tools, as well as the development process, at every stage of the project. If this matches your background, give it extra attention.

Second, architecture. The physical and logical architecture of the data warehouse, and of the systems that support it, must persist unchanged as the project moves from stage to stage. This is something you can provide.

7. Issue warnings
Remember that you are not the only one who has entered a new world. Everyone around you has one or more of the following problems: unrealistic expectations, misunderstandings about the technology, old or bad habits, competitive behavior, or lack of confidence in the project. Although the project manager is responsible for communication and similar tasks, you must take on the same responsibility. How should you act as the technical lead? First, treat the people around you with sincerity, but also establish your authority and give warnings when they are due. When you see that progress is slow, resources are being lost, or staff have lost sight of the goal, say so bluntly. A quick, clear warning is wise in most cases. A rushed data warehouse project may derail, but do not let a failed project drag you down.
