About Data Warehouse

Source: Internet
Author: User

A data warehouse is an environment rather than a product that provides current and historical data for decision-making support. This data is difficult or cannot be obtained in traditional operational databases. Data Warehouse Technology is a general term for various technologies and modules to effectively integrate operational data into a unified environment to provide decision-making data access. All you do is to make it easier for you to query the required information and provide decision-making support.

Composition of Data Warehouses

Data Warehouse database: it is the core of the entire data warehouse environment. It is the place where data is stored and provides support for data retrieval. Compared with the manipulation-type database, it features support for massive data volumes and fast retrieval technology.

Data extraction tools: extract data from various storage methods, perform necessary conversion and sorting, and store the data in a data warehouse. Access to different data storage methods is the key to data extraction tools.ProgramMVS job control language (jcl), Unix scripts, and SQL statements to access different data. Data conversion includes deleting data segments that are meaningless to decision-making applications, transforming them to uniform data names and definitions, calculating statistics and derived data, and assigning missing value data to default values; unify different data definitions.

Metadata: Metadata describes the structure and creation of data in a data warehouse.CubeData. It can be divided into two categories by purpose, technical metadata and commercial metadata.

Technical metadata is the data used by the design and management personnel of the data warehouse for development and daily management. Including data source information, description of data conversion, definition of objects and data structures in the data warehouse, rules used for data cleaning and data updating, and ing between source data and target data; user Access Permissions, data backup history, data import history, and Information Release history.

Commercial metadata describes the data in a data warehouse from the business perspective. Including: Business topic description, including data, queries, and reports;

Metadata provides an information directory to access the data warehouse. This directory fully describes the data in the data warehouse, how the data is obtained, and how to access the data. It is the center for data warehouse operation and maintenance. The data warehouse server uses it to store and update data. Users can use it to understand and access data.

Access Tool: provides a means for users to access the data warehouse. Data Query and report tools, application development tools, Management Information System (EIS) tools, online analysis (OLAP) tools, and data mining tools are available.

Data Marts: A part of data that is independent from a data warehouse for a specific purpose or scope of application, also known as Department data or subject data (subject area ). In the implementation process of the data warehouse, you can often start from the data mart of a department, and then use several data marketplaces to form a complete data warehouse. It should be noted that when different data marketplaces are implemented, field definitions of the same meaning must be compatible, so that subsequent implementation of data warehouses will not cause great trouble.

Data Warehouse Management: security and privilege management; tracking data updates; data quality inspection; managing and updating metadata; auditing and reporting data warehouse usage and status; deleting data; copy, split, and distribute data, backup and recovery, and storage management.

Information Publishing System: send data or other related data in the data warehouse to different locations or users. The web-based information publishing system is the most effective way to deal with multi-user access.

Create a data warehouse
Why do we need to build a data warehouse:

Business: use all possible data to quickly and correctly make decisions. Users are experts in the business field, rather than computer professionals. Enterprise Data doubles every 18 months, there is a need for an effective way to access this data; competition is intensifying in terms of business intelligence and effective enterprise data.

Technically: the computing power of computers is getting cheaper (the price of MIPs is falling), the price of storage media is falling, the bandwidth is increasing, and the transmission capacity of networks is getting cheaper; the computer environment of enterprises is becoming more and more complex. application systems of different manufacturers exist at the same time in various times. New applications need to access data of other applications.

Precautions for Implementing Data Warehouses:

Commercial (considering ROI)

Implementation steps: from top to bottom or from bottom to top

Human resources: training or employment

Design (think big, but start small)

Many types of data sources may be used. Historical data may be "old" and the database may become very large.

Compared with OLTP, data warehouses are more business-driven than it-driven. They need to constantly communicate with end users, the creation process may never end.

Key points:

1) The data warehouse should contain detailed data (cleaned ).

2) any data that the user can see should have a corresponding description in the metadata.

3) consider how the data in the data warehouse is distributed across servers when the data volume rapidly increases to the point where the data in a single server cannot be stored, by topic, geographical location, or time? These policies have a significant impact on the performance of the entire data warehouse.

4) should you note that the metadata formats supported by the tools are compatible with those supported by the data warehouse design tools? Can metadata formats of different tools be freely converted?

5) the use of the data warehouse by the end user has a great impact on the performance of the Data Warehouse. To improve the performance of the data warehouse model, the user's use of the data warehouse should be taken into account.

Nine steps for designing a data warehouse:

1) select an appropriate topic (the domain in which the problem is to be resolved)

2) Clearly define the fact table

3) confirm and confirm dimensions

4) Choosing the facts

5) Calculate and store derivative data segments in the fact table

6) Rounding out the dimension tables

7) Choosing the duration of the database

8) the need to track slowly changing dimensions

9) determine the query priority and query mode.

Technically

Hardware Platform: the hard disk capacity of the Data Warehouse is usually 2-3 times of the hard disk capacity of the operating database. Generally, the mainframe has more reliable performance and stability, and is easy to combine with legacy systems; while PC or Unix servers are more flexible, it is easy to operate and provides the ability to dynamically generate query requests for queries. When selecting a hardware platform, do you have to consider: Do you want to provide parallel I/O throughput? What is the multi-CPU support capability?

Data Warehouse DBMS: its ability to store large data volumes, query performance, and support for parallel processing.

Network Structure: The implementation of the data warehouse will produce a large amount of data communication in that part of the network segment, and there is no need to improve the network structure.

Implementation

Steps for creating a data warehouse:

1) Collect and analyze business requirements

2) establish a data model and physical design of the Data Warehouse

3) define the data source

4) Select the Data Warehouse Technology and Platform

5) extract, purify, and convert data from operational databases to Data Warehouses

6) Select Access and report tools

7) select database connection Software

8) Select data analysis and data presentation software

9) update the data warehouse

Data Extraction, cleaning, conversion, and migration

1) Data Conversion tools should be able to read data from various data sources.

2) supports flat files, index files, and legacy DBMS.

3) data can be integrated with different types of data sources.

4) standardized data access interfaces

5) it is best to have the ability to read data from the data dictionary.

6) generated by the toolCodeIt must be maintained in the Development Environment

7) Only the specified data that meets the specified conditions and the specified part of the source data can be extracted.

8) data type conversion and Character Set conversion during extraction

9) generate derivative fields during extraction

10) allows the data warehouse management system to automatically call to regularly extract data, or generate flat files based on the results.

11) the vitality and product support capabilities of software vendors must be carefully evaluated.

Major Data Extraction Tool suppliers: prism solutions. Carleton's passport. Information builders Inc.'s EDA/SQL. SAS Institute Inc.

Metadata

Is the center for the operation and maintenance of the entire data warehouse environment. All kinds of software and tools need to access this part of data. Prim provides Directory Manager to develop and manage metadata.

User

Casual user: you only need to access predefined queries and generate reports. You don't need any tools, just take a look.

Power User: You usually need to define some simple queries or combine predefined queries. Drill-Drown is usually required. Which data query tools are required by such users for non-computer personnel.

Expert: These users usually need to define complex queries and directly analyze various types of data stored in the data warehouse.

Benefits of building a data warehouse

Data Warehouse applications include data positioning, data presentation (reports and charts), test hypothesis, knowledge discovery, and shared analysis.

Tangible benefits

Improves product inventory control, reduces promotion fees for common products, makes more efficient decisions, and provides a big picture for the entire enterprise.

Intangible benefits

Put all the data in one place to facilitate access and improve production efficiency; reduce repeated data processing and analysis; increase user types; and provide support for business processes.

Computer World developer club http://dev-club.esnai.com

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.