Flexible and effective data warehouse solutions, Part 1: Customer interaction and project planning

This series describes a flexible and effective approach to planning, designing, and implementing a basic data warehouse solution based on IBM DB2 Data Warehouse Edition. Part 1 focuses on the customer interaction process and on planning the data warehouse project.

 Introduction

Business intelligence has grown to encompass an increasing number of data analysis technologies. Whichever analysis method is used, the data warehouse remains an important foundation for exploiting information assets. This series of articles helps you use IBM DB2 Data Warehouse Edition (DB2 DWE) to deliver the data warehouse infrastructure that is critical to on-demand business intelligence. This article focuses on data warehouse planning, including the customer interaction process, business discovery, the project proposal, and the project plan.

Target readers

This article is for IT professionals who need to know how to deliver data warehouse solutions. It assumes that you are familiar with system and database concepts. Many topics that underpin a good data warehouse solution, including system and database design, administration, and performance tuning, are not covered here. This article focuses only on issues closely related to data warehouses.

What is business intelligence?

Business intelligence (BI) is the collection and analysis of large volumes of data to gain insight that drives strategic and tactical business decisions. BI is a set of processes and technologies used to turn data into information. It covers a wide range of technologies, including data warehousing, multidimensional analysis or online analytical processing (OLAP), data mining, and data visualization, as well as simple querying and the many analysis tools used to produce reports. These technologies let business users collect, store, access, and analyze data to improve their business decisions.

Figure 1. What is business intelligence?

What is a data warehouse?

A data warehouse is a centralized repository that contains comprehensive detailed and summary data and provides a complete view of customers, suppliers, business processes, and transactions as they change over time.

A data mart, on the other hand, contains a subset of the data stored in a data warehouse that is of interest to a specific business community, department, or user group (for example, sales promotion, finance, or account collection).

A data mart is defined by the functional scope of its users, not by the size of its database; it is important to realize this. In a well-structured BI system, the data warehouse acts as the source for multiple data marts.

What is data warehousing?

Data warehousing is the design and implementation of the processes and tools used to manage and deliver complete, timely, correct, and understandable information for decision making. It includes all activities that enable an enterprise to create, manage, and maintain a data warehouse or data mart. Data warehousing governs the development, implementation, and operation of a data warehouse or data mart. It includes metadata management, data acquisition, data cleansing, data integration, storage management, data distribution, data archiving, operational reporting, analytical reporting, security management, backup and recovery planning, and so on.

The following sections cover data warehousing except for reporting and analysis. They focus on preparing data for analysis, a task that typically accounts for 80% of most data warehouse project plans.

Why choose IBM DB2 Data Warehouse Edition?

IBM DB2 DWE is a powerful and complete business intelligence infrastructure product. It includes DB2, integrated OLAP, advanced data mining, extraction, transformation, and loading (ETL), and reporting tools. DB2 DWE works with, and improves the performance of, advanced desktop OLAP tools such as DB2 OLAP Server and those from IBM partners.

DB2 DWE is one of the most cost-effective data warehouse tools. According to a 2004 research report from Market Magic Ltd (see references), the five-year Probable Cost of Ownership (PCO) for data warehouses is lower for DB2 DWE than for Oracle and NCR Teradata.

Predictable and unlimited scalability is a key criterion for a business intelligence platform. DB2 meets this requirement through its unique shared-nothing architecture. This scalability applies to both large and small databases.

Scalability and price are important, but they do not by themselves solve the challenge of building a BI platform. DB2 DWE completes the picture by delivering key analysis and mining technologies. DB2 Cube Views for OLAP applications, Intelligent Miner Scoring for real-time, in-database data mining, and deeply embedded extenders such as DB2 Spatial Extender are fully integrated with DB2, as are newer capabilities such as XML query, to ensure seamless integration and optimized performance.

Customer interaction process

The customer interaction process for a data warehouse solution resembles that of other IT solutions in some respects. However, data warehouse solutions differ in important ways, including a strong business-data orientation, multiple layers of process iteration, and greater end-user involvement.

Figure 2 shows the main interactions that you, as a data warehouse solution provider, have with the customer during a successful project.

Figure 2. Customer interaction process for a data warehouse solution

Solution start-up: In the initial step of customer interaction, you and your customer decide to start the data warehouse project and begin establishing an agreement. This step is common to all types of projects, so it is not discussed in detail in this article.

Business discovery: This is the process of understanding the gap between the current and the expected business data analysis capabilities. It includes collecting and recording business requirements, understanding the customer's environment, and performing a gap analysis. (For details, see the next section.)

Solution proposal: Based on the customer's needs, you present a proposal for the data warehouse project or solution.

Solution planning: In this step, you plan the solution and specify the required data warehouse infrastructure, personnel, and resources.

Warehouse conceptual modeling: The high-level warehouse design includes the warehouse architecture and implementation choices, and conceptual data modeling that captures all business subject areas defined in the business requirements.

Warehouse phase design: This includes logical and physical data modeling to capture the business requirements at a more detailed level, but only for the subject areas in the current project iteration. This step also includes the ETL process design.

Solution implementation cycle: The data warehouse implementation includes implementing the target warehouse, the data mart databases, and the ETL processes.

Solution deployment: Move the new data warehouse solution into the production environment.

This customer interaction process is based on a bottom-up (or phased) data warehouse implementation approach. After the data warehouse solution is deployed, you can start a project for other business subject areas related to the current business requirements, beginning with new logical and physical data modeling, or, if there are new business requirements, restart from the business discovery phase.

Business discovery

The business discovery process consists of three tasks: collecting and recording business requirements, understanding the customer's business environment, and performing a gap analysis. These tasks can overlap, and you will often carry out several of them at the same time. For example, part of understanding the business requirements is investigating the customer's business data sources, which touches all three business discovery tasks. Before starting the business discovery process, it is important for the solution provider to understand the objective of each task.

The purpose of the gap analysis is to understand the customer's business challenges and needs, and to evaluate the resources required to close the gap between the current business state and the business requirements.

Figure 3. Business discovery process

  Collect and record business requirements

During this task, you should discover and understand the customer's business challenges, identify and prioritize the business requirements, and determine the business subject areas of interest. In a perfect world, you would have a complete set of written business requirements for the data warehouse project at the start of the customer interaction. In the real business world, especially in the mid-market, the initial business requirements are usually incomplete. Initial contact often consists of phone calls, email, or informal conversations. It is important to follow up on all initial meetings and fully identify all business requirements before investing too much time and resources in the project.

Collecting complete business requirements is not a trivial task. It requires active communication with your customer. An experienced analyst should have strong business and interpersonal skills and reasonable knowledge of data warehousing and data modeling.

Determine end user requirements

When gathering requirements, you collect and record the requirements of end users. You usually need to study how end users take part in business processes and information analysis activities. Because end users do not necessarily understand data warehouse concepts, ask questions that help you understand their specific business problems. At this stage, end-user requirements are usually recorded informally and are not expressed as detailed data structures. To collect them, you can interview end users, study existing documents and reports, and observe ongoing information analysis activities. Experience in business process engineering and information analysis can be very helpful.

 

End user requirements can be divided into four categories:

Business objectives are high-level statements, in business terms, of the goals of information analysis. A given data warehouse project may have one or more business objectives. For example, a business objective could be: "The data warehouse must support the analysis of operating costs and of product sales profits."

Taken together, the business objectives of a data warehouse project help determine the project scope. They also help identify the information subject areas involved in the project and the business processes (usually at a high level) that end users analyze.

Business queries are the queries, hypotheses, and analytical questions that end users raise during their daily information analysis activities. Like business objectives, business queries are expressed in business terms, not in SQL. You should not typically expect them to be precisely formulated. Frequently encountered kinds of business queries include:

Existence-checking queries, for example, "Has a given product been sold to a specific customer?"

Item comparison queries, for example, "Compare the purchase amounts of two customers over the past six months" or "Compare the number of items sold for a specific product, per store and per week."

Trend analysis queries, for example, "How have sales of a given set of products grown over the past 12 months?"

Queries that analyze ratios, rankings, and clusters, for example, "Rank our best customers by last year's dollar sales."

Statistical analysis queries, for example, "Calculate the average item sales for each product category in each sales region."

Data analysis scenarios are a good way to add substance to the set of requirements you capture and analyze. For example, some business requirements are derived by analyzing existing report and query workflows and interpreting the current structure of business data analysis.

Existing data models may be available and can be used to further specify or support the end-user requirements. You can reconstruct and integrate the source data models to obtain them.
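
End users state business queries in business terms, but the warehouse designer eventually has to check that the data model can answer them. The following is a minimal sketch in Python, using the standard sqlite3 module, of how four of the query types above might translate into SQL. The sales table, its columns, and the sample rows are hypothetical and exist only for illustration; they are not part of DB2 DWE or of any customer schema.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (customer TEXT, product TEXT, category TEXT, "
    "region TEXT, sale_month TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Acme", "Widget", "Hardware", "East", "2004-01", 100.0),
        ("Acme", "Widget", "Hardware", "East", "2004-02", 150.0),
        ("Bolt", "Gadget", "Hardware", "West", "2004-01", 80.0),
        ("Bolt", "Manual", "Books", "West", "2004-02", 20.0),
    ],
)


def existence_check(customer, product):
    """Existence check: was a given product sold to a specific customer?"""
    row = conn.execute(
        "SELECT EXISTS(SELECT 1 FROM sales WHERE customer = ? AND product = ?)",
        (customer, product),
    ).fetchone()
    return bool(row[0])


def monthly_trend(product):
    """Trend analysis: total sales of one product per month, oldest first."""
    return conn.execute(
        "SELECT sale_month, SUM(amount) FROM sales WHERE product = ? "
        "GROUP BY sale_month ORDER BY sale_month",
        (product,),
    ).fetchall()


def customer_ranking():
    """Ranking: customers ordered by total sales, best first."""
    return conn.execute(
        "SELECT customer, SUM(amount) AS total FROM sales "
        "GROUP BY customer ORDER BY total DESC"
    ).fetchall()


def average_by_category_and_region():
    """Statistical analysis: average sale amount per category and region."""
    return conn.execute(
        "SELECT category, region, AVG(amount) FROM sales "
        "GROUP BY category, region ORDER BY category, region"
    ).fetchall()
```

For example, existence_check("Acme", "Widget") answers the existence-checking question for one customer and product, and customer_ranking() lists customers from highest to lowest total sales. Writing even a rough query per business question is a quick way to spot data that the model does not yet capture.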

End-user requirements span many areas, and many factors can affect the results, such as the end users' business knowledge, how well they can express themselves, and how long they were interviewed. User requirements also change over time: what is correct one day may no longer hold the next. How do you know when the user requirements have been successfully identified? There is no absolute test, but if your requirements answer the following questions, you probably have enough information to start data modeling:

Who (which individuals, groups, and organizations) is of interest to the user?

What business processes and functions do end users analyze?

Why do users need the data?

When (for which points in time) does the data need to be recorded?

Where (geographically and organizationally) do the relevant processes occur?

How can you measure the performance or state of the business processes and functions?

Determine functional requirements

End-user requirements help you understand the current business processes and business challenges, while functional requirements help you understand the scope of the services that the customer expects from the data warehouse solution. The questions you ask are based on your data warehouse knowledge and on your assessment and understanding of the end-user requirements. Functional requirement information usually comes from key business contacts, business managers, IT professionals, and potential end users. Functional requirements help you set the overall project scope and objectives. Ask the following questions:

What new information analysis capabilities do you need to improve your business? Give a detailed definition of the reports you want to build on the data warehouse.

If you have an existing data analysis process, what problems do you encounter with it?

How many potential users will the new data warehouse have? Where are they located?

How often do business reports need to be regenerated?

Who on the customer side will participate in the project, and what are their responsibilities?

What is the project budget (if that information is available)?

What is the target date for project completion?

If specific aggregate measures need to be defined, what are their definitions?

What type of security configuration does the data warehouse need?

Understanding the customer's environment

While collecting and recording business requirements, you must also come to understand the customer's environment. This task continues throughout the project. Understanding the customer environment early in the project is very important for avoiding misunderstandings and unpleasant surprises. Many business and technical assumptions are based on the results of this early survey.

Understand the customer's business environment

It is difficult to predict what you need to know to fully understand the customer's business environment, because every business is unique. However, for a successful customer interaction you must know a few things, including but not limited to:

Who is the project decision maker?

Who is the key contact person for the project?

What types of business problems need to be solved?

Who are the end users? End users may not be decision makers, but they provide valuable information about the usability of the data warehouse.

What specialized business knowledge do you need?

Does the customer have IT staff? If so, how much support can you get from them?

Understanding the information infrastructure environment

The customer's network environment may be simple or complex. You may not need to understand everything about the network, but you do need to record whatever is relevant to the data warehouse production environment so that you can design and configure the data warehouse. Here are some things you should know:

What types of network connectivity and protocols are used in the production environment?

What are the average and maximum throughput of network traffic? When are the periods of contention and peak load?

How many end users must the data warehouse support? What operating systems and reporting applications does the customer plan to use?

Where are the end users located (on a LAN, across the public Internet or a WAN, on a VPN, or some combination)?

Understanding the data environment

Understanding the customer's data environment is one of the most important tasks in a data warehouse project. It is the basis for the following work:

Constructing a realistic project proposal and contract.

Designing and implementing data acquisition.

Designing the data warehouse and data marts.

Designing data validation and cleansing.

Assign this task to experienced data warehouse professionals who understand business analysis, data analysis, and modeling. They need to work with the customer's IT staff and end users.

To identify the data sources (both internal and external) that will support the warehouse data model and the business requirements, you need to record every data source. Describe its location, system, access method, data traffic and update frequency, data security, and data quality.
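
One way to keep the data source record consistent is to capture the same fields for every source. The sketch below, in Python, assumes a hypothetical DataSource structure whose fields follow the attributes listed above; the field names and the two sample entries are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    """One entry in a data source inventory (hypothetical field names)."""
    name: str
    location: str
    system: str
    access_method: str
    update_frequency: str
    security_notes: str
    quality_notes: str = ""  # left empty until data quality has been assessed
    internal: bool = True    # False for sources outside the customer's network


def unassessed(sources):
    """Return the names of sources whose data quality is not yet recorded."""
    return [s.name for s in sources if not s.quality_notes.strip()]


# Sample entries, invented for illustration.
inventory = [
    DataSource(
        name="orders",
        location="HQ data center",
        system="DB2 on AIX",
        access_method="JDBC",
        update_frequency="daily",
        security_notes="read-only account",
        quality_notes="about 5% of rows lack a region code",
    ),
    DataSource(
        name="market_feed",
        location="external vendor",
        system="flat files",
        access_method="SFTP",
        update_frequency="weekly",
        security_notes="covered by vendor agreement",
        internal=False,
    ),
]
```

Calling unassessed(inventory) lists the sources that still need a data quality review, which makes it easy to see what remains open before the proposal is written.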

Data quality is one of the most important issues when identifying data sources. You must determine whether the business data is available and whether the quality of the data in every source is sufficient to support the business requirements. If there is a data quality problem, you and your customer need to know as soon as possible.

Data problems may have existed in a customer's data sources for many years. In many cases, they are discovered during the early analysis of the source data or later, during the design and implementation of the data transformation process. Make sure your customer is informed so that they can prepare a plan for dealing with them.

Checking data quality is not a trivial task; it requires both data modeling and business knowledge. Most likely, you will need some end users to take part. In some cases, you may not be allowed to access sensitive business data. If so, try to obtain random samples of the business data, and let the customer alter some data values in ways that do not affect data quality.
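
A first pass at checking data quality on a sample can be as simple as counting missing values per field. The sketch below assumes the sample arrives as a list of Python dictionaries; the field names, the sample rows, and the 10% threshold are hypothetical. It measures only completeness, not correctness, so it complements rather than replaces a review with end users.

```python
def profile_missing(rows, fields):
    """Return the fraction of missing (None or empty) values for each field."""
    total = len(rows)
    report = {}
    for field in fields:
        missing = sum(1 for row in rows if row.get(field) in (None, ""))
        report[field] = missing / total if total else 0.0
    return report


def fields_to_review(report, max_missing=0.1):
    """Return, in sorted order, the fields whose missing fraction exceeds the threshold."""
    return sorted(f for f, fraction in report.items() if fraction > max_missing)


# Sample rows, invented for illustration.
sample = [
    {"customer_id": "C1", "region": "East", "amount": 120.0},
    {"customer_id": "C2", "region": "", "amount": 75.5},
    {"customer_id": "C3", "region": None, "amount": 30.0},
    {"customer_id": "C4", "region": "West", "amount": None},
]

report = profile_missing(sample, ["customer_id", "region", "amount"])
```

Passing the report to fields_to_review gives a short list of fields to raise with the customer early, in line with the advice to surface data quality problems as soon as possible.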

You need to know as much as possible about the project-related data. The following questions need detailed answers (not just high-level ones):

How many data sources are relevant to the project, and where are they located?

Will the data warehouse access the data sources directly? What types of data connections are supported?

Does the data warehouse require external data from outside the customer's enterprise network (intranet)? How can that data be accessed?

How much new data is generated each day across all data sources?

What is the expected frequency of data updates in the data warehouse?

Is any data shared across sources? If so, which is the primary data source?

What is the data quality? If possible, check all available data fields.

Can the customer correct missing or dirty data in the data sources?

Can the customer ensure the quality of the corrected data fields in the future? If not, who will clean up the data?

If missing or dirty data cannot be corrected in the customer's data sources, what business rules will be used to correct it?

Gap analysis

After collecting the business requirements and studying the business and data environments, you can perform a gap analysis. The gap analysis reviews the information you have gathered and determines the resources and work required to meet the customer's requirements. The purpose of the gap analysis is to:

Understand the customer's business challenges.

Understand the subject areas of the customer's business problems.

Provide the customer with an estimate of the resources or support they need to supply, and an estimate of the development work you need to do to deliver a solution that meets the customer's requirements.

Help specify the technologies and tools to be used in the project.

The gap analysis is very important because it is the basis for your data warehouse project proposal and design.

Requirements analysis

Depending on the time and resources available, you may decide to carry out the requirements analysis at the business level. This means producing a complete report of your understanding of the customer's business challenges, business subject areas, report definitions, and performance measures.

Using requirements analysis techniques, you can build an initial warehouse data model that represents the end-user requirements you captured informally. Requirements analysis produces a diagrammatic (schematic) representation of the model that information analysts can interpret directly. Once the results pass the requirements validation stage, they become the main input to data warehouse modeling.

Identify business challenges

Once you have a complete set of business requirements for the project, the related business challenges are easy to identify. You may have to go back to the customer and ask more questions to discover and prioritize all of their business challenges. In a data warehouse project, you usually group business challenges by department, by the business questions to be answered, and by how closely those questions are related. While identifying business challenges, you may also discover potential new projects, so look out for new opportunities.

Identify business subject areas

Subject areas are broad groupings by topic of interest to the business. To draw up a list of candidate subject areas, first consider the customer's business interests (for example, customers, profits, sales, organizations, or products). To help identify subject areas, consider the "when, where, who, what, why, and how" of those business interests. For example, possible answers to the "who" question include customers, employees, managers, suppliers, business partners, and competitors. After listing all candidate subject areas, you can decompose, rearrange, select, and redefine them to produce the list that best represents your customer's organization. Determine the hierarchy or other grouping of the subject areas to give a clear definition of what they are and how they are related.

Once you have developed a list of subject areas, you need to define the business relationships between them. These relationships are a good starting point for determining the dimensions that may be used in a dimensional data warehouse model, because the subject areas give a complete picture of the business areas of interest. Based on the customer's business challenges, it is important to prioritize some subject areas and to base the project proposal on those priorities.

Identify gaps

Based on the results of the business requirements analysis and your understanding of the business environment, you can identify the gap between what the customer has today and what they expect from the data warehouse project.

Data gap

Data is undoubtedly the key element of a data warehouse project. The first question the gap analysis must answer is whether the available business data supports the business requirements. Pay special attention to the following:

Is there enough data to meet the project's business objectives? If not, can an external data source fill the gap?

Does the quality of the business data meet the business requirements? If not, can the data be cleaned up enough to meet the stated requirements?

Although providing high-quality data is the customer's responsibility, this is an opportunity to offer the customer additional services.

Infrastructure gap

Evaluate the customer's network, hardware, software, and existing applications. Pay special attention to the following:

Are the server systems and network robust enough to handle the expected warehouse data flows?

If there are multiple data sources across several networks and locations, what types of networks connect them? How much data can be exchanged between the local networks?

How can the data sources be accessed? If you cannot access them directly, what level of IT support do you need from the customer to obtain the data?

Does the customer need to add new servers for the data warehouse? If so, what are the server specifications?

What data management and analysis tools are available for the warehouse? What kinds of data analysis services do these tools provide?

Resource gap

Determining the resource gap involves assessing people, skills, domain knowledge, schedule, and budget. Pay special attention to the following:

Can the customer provide the kinds of skills and the staff hours required to support the project?

Is the customer's project budget sufficient to cover the entire set of business requirements? If not, which subject areas are the most important to include within the current budget?

Project proposal

All of your work up to this point lays the foundation for the project proposal. A data warehouse project proposal should contain at least the following information:

Your understanding of the customer's business needs.

Your understanding of the customer's business challenges.

Your understanding of the customer's information infrastructure and data environment.

The scope of the solution.

The subject areas of the customer's business problems.

The business and technical approach you will use as the solution provider.

The business and technical assumptions that apply to the project.

The definition of the project's final deliverables.

If the customer has IT professionals who understand data modeling, it is a good idea to include an initial data mart model in the proposal to demonstrate how you have captured the business requirements.

The complete project proposal is the basis of the final project contract that you sign with your customer. Above all, include all necessary business and technical assumptions in the proposal and the contract so that both parties clearly understand what is expected from the project. Any change to the project assumptions may affect the project schedule and budget; making the customer aware of this and stating it clearly will save a lot of trouble later.

The following project assumptions should be included.

Business assumptions

What level of customer business knowledge does the project require?

What level of support from the customer's business management does the project need?

What level of IT professional support does the project require?

Which subject areas does the project cover, and what are the expected project deliverables?

Technical assumptions

What data needs to be transferred to the data warehouse?

How often does the data warehouse need to be updated?

If shared data exists among multiple data sources, which one is the primary data source?

If there is no primary data source, what business rules govern the integration of shared data?

What happens if there is missing or dirty data? The proposal should include detailed business rules for data repair. Your customer should correct missing data in their own data sources, but they may ask for help. As a rule of thumb, you do not modify any customer data yourself; you only cleanse it according to the agreed business rules.

What are the business definitions of all data aggregations in the data warehouse? Your customer needs to provide this information. Make sure a deadline for providing it is specified.

Technical introduction

In the project proposal, you should describe in general terms the technology to be used in the solution. If you present the proposal to the customer, plan a brief technical demonstration as part of it. It helps enormously if the customer can see what the project will ultimately deliver.

Developing the project plan

After the customer signs off on the solution proposal, the next step is to create the actual project plan, with as much project detail as possible. The plan clearly records the expectations of both parties, so the customer knows what to expect from you and what you need from them. It is a good idea to involve the customer as much as possible in developing the plan. Without the customer's understanding and support, the plan is not really a plan. The project plan should contain the following elements.

Project scope

Data warehouse project plans have much in common with typical IT project plans. However, data warehouse projects also have some unique characteristics:

Data warehouse objectives are usually stated in general terms. Data warehouse development requirements should not be too specific. If they are, they can constrain the design of the data warehouse and may exclude seemingly unrelated factors that turn out to be critical to the analysis being performed.

One of the main reasons for defining the scope of a project is to keep the target from shifting throughout the lifecycle as new requirements emerge. In data warehousing, be careful when defining the scope. You need to prevent the target from changing with every new requirement. However, two keys to a valuable data warehouse are its flexibility and its ability to handle queries that were unknown at design time. Therefore, when defining the scope, recognize that the delivered data warehouse is likely to be broader than what the initial requirements specified.

Because data warehouse projects are iterative, the project scope may include only the most important or urgent subject areas. Remember, however, that the high-level data warehouse design should cover all business subject areas.

The main purpose of a data warehouse is data analysis. Do not confuse operational objectives with the informational objectives of the data warehouse.

Infrastructure plan

The data warehouse infrastructure plan describes the software, hardware, data networks, and other elements that support the data warehouse. It is based on the gap analysis and the project budget.

Personnel plan

Once the customer approves the project plan, the entire project team selected for the solution should be assembled. The skills and personnel plan should include the following details:

The required skills, detailed responsibilities, and schedule of each team member. Key team members should always have a backup.

A formal definition of exceptions, such as changes to the project scope or to project team members.

The data warehouse team should include:

A project manager, responsible for managing and coordinating the solution interaction between you and your customer.

Domain experts, who provide business knowledge for the data warehouse design.

End users, responsible for testing and validating the warehouse design and implementation.

A data warehouse architect, the key person in data discovery and data warehouse design. Having at least one experienced data warehouse architect involved is very important to the success of a data warehouse project. The architect usually comes from the warehouse solution provider.

A data modeler, responsible for the logical and physical data modeling of the warehouse.

ETL developers, responsible for ETL design and development.

This is a list of roles; one person can fill several roles in a data warehouse project. For example, the data warehouse architect and the data modeler can be the same person, and the domain experts and end users can be the same group of people.

Design and development plan

Develop a comprehensive plan for the design, development, and testing of the warehouse solution, based on the available skills and experience. All technical team members should take part in creating the project plan, because only they know what it will take to carry it out. The plan should include:

A comprehensive list of required hardware, software, and documentation.

A detailed list of the deliverables (such as organization charts, data, and formats) that the customer will provide at different stages or within the project time frame.

A comprehensive schedule for project design, development, and testing activities.

A detailed list of the deliverables (including documents, training materials, and the solution) that you will provide at different stages or within the project time frame.

A comprehensive list of project dependencies, assumptions, and risks, with contingency plans.

Project checkpoint plan

You should work out the project checkpoint plan with the customer. Some customers use this plan to approve the project step by step. The plan should include:

A comprehensive timetable that includes the key checkpoints for both you and your customer.

A comprehensive list of project deliverables for each checkpoint.

Deployment and user acceptance test plan

Before deploying all or part of the solution deliverables to the customer's production environment, you should carry out a user acceptance test (UAT). UAT is a highly formal test process; it constitutes the customer's formal approval of your project deliverables. UAT can take a lot of end-user time, because you may need to train the end users before they can begin. The plan should include:

The final deliverables of the project and their deployment schedule.

End-user training for the solution.

The UAT schedule.

Customer education plan

Customer education is part of every phase of a data warehouse project. It is very important for end users to take part in the solution development process, because they can catch errors at an early stage and learn a great deal about how to use the solution. The customer education plan should include:

A list of the end users assigned to the project and their project schedules.

A list of the project deliverables (including user documentation) at the main checkpoints, with their schedule.

A timetable for formal user education.

Financial and technical risk assessment

Without enough experienced and skilled people involved, a data warehouse project is a high-risk undertaking. Have experienced colleagues in your organization review the data warehouse project plan to ensure that:

The technical risks of the project are reasonably low.

The project schedule is feasible.

The project will be profitable.

After this review, go over the project plan with your customer and arrive at a project plan that both parties agree on.

 
