A Review of Data Warehouse and OLAP Technology

Source: Internet
Author: User
Tags: tools and utilities

1. Introduction

Broadly speaking, a data warehouse is a type of database that is maintained separately from the organization's operational databases. A data warehouse system integrates data from various application systems, providing a solid platform for unified analysis of historical data and for information processing.

Data warehousing is a collection of decision-support technologies, aimed at enabling the knowledge worker (executive, manager, analyst) to make better and faster decisions.

A data warehouse is a "subject-oriented, integrated, time-varying, non-volatile collection of data that is used primarily in organizational decision making" (William H. Inmon, 1996).

Be sure to differentiate a data warehouse from data warehousing (the process of building and using a data warehouse).

Four keywords, subject-oriented, integrated, time-varying, and non-volatile, differentiate data warehouses from other data storage systems.

Subject-oriented: a data warehouse focuses on modeling and analyzing data for decision makers. Integrated: a data warehouse is built by integrating multiple heterogeneous data sources. Time-varying: the stored data provides information from a historical perspective (such as the past 5 to 10 years). Non-volatile: a data warehouse is a physically separate store of data, so it requires only two data access operations: the initial loading of data and access to data.

Data warehouses support on-line analytical processing (OLAP), which differs from the on-line transaction processing (OLTP) supported by operational databases.

Note: distinguish OLTP from OLAP:

The main task of an online operational database is to execute online transactions and query processing. OLTP is therefore customer-oriented (for example, student score records). It usually manages current data and uses an ER model with an application-oriented database design. Access consists mainly of short atomic transactions.

OLAP is intended for knowledge workers and is used for data analysis. An OLAP system manages a large amount of historical data and provides summarization and aggregation mechanisms. It generally uses a star or snowflake model with a subject-oriented database design. Most accesses to an OLAP system are read-only operations, so query throughput and response time are more important than transaction throughput.

To facilitate complex analysis and visualization, data in a data warehouse is usually modeled in multiple dimensions. Dimensions are hierarchical, such as day-month-quarter-year, and product-category-industry.

Typical OLAP operations include roll-up (increasing the level of aggregation) and drill-down (decreasing the level of aggregation or increasing detail) along one or more dimension hierarchies, slice and dice (selection and projection), and pivot (re-orienting the multidimensional view of data).
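As a rough illustration of these operations, the following sketch applies them to a handful of in-memory records using only the Python standard library. The fact rows, the `month_of` helper, and the `aggregate` function are all hypothetical names made up for this example; a real OLAP server would work very differently internally.

```python
from collections import defaultdict

# Hypothetical fact rows: (day, city, product, sales amount).
FACTS = [
    ("2024-01-05", "Boston", "laptop", 1200.0),
    ("2024-01-17", "Boston", "phone", 600.0),
    ("2024-02-03", "Austin", "laptop", 1100.0),
    ("2024-02-20", "Boston", "laptop", 1300.0),
]


def month_of(day):
    """Map a day to its parent level in the day-month hierarchy."""
    return day[:7]


def aggregate(rows, key):
    """Sum the sales measure over the groups defined by key(row)."""
    totals = defaultdict(float)
    for row in rows:
        totals[key(row)] += row[3]
    return dict(totals)


# Roll-up: move from the day level up to the month level.
by_month = aggregate(FACTS, lambda r: month_of(r[0]))

# Drill-down: return to the more detailed day level.
by_day = aggregate(FACTS, lambda r: r[0])

# Slice and dice: select one city, then project onto the product dimension.
boston_rows = [r for r in FACTS if r[1] == "Boston"]
boston_by_product = aggregate(boston_rows, lambda r: r[2])

# Pivot: the same cells, re-oriented from (city, month) to (month, city).
city_month = aggregate(FACTS, lambda r: (r[1], month_of(r[0])))
month_city = {(month, city): total for (city, month), total in city_month.items()}
```

Roll-up and drill-down here differ only in the grouping key, which is why a dimension hierarchy such as day-month-quarter-year is enough to describe both.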

Data warehouses can be implemented on standard or extended relational database management systems; servers of this kind are known as Relational OLAP (ROLAP) servers. In contrast, multidimensional OLAP (MOLAP) servers use special data structures to store multidimensional data directly.

 

2. Architecture and End-to-End Processing


Figure 1: Data warehouse architecture



Figure 2: a readable data warehouse architecture


A three-tier architecture is usually used: front-end tools (top tier), OLAP server (middle tier), and data warehouse server (bottom tier).

The bottom-tier data warehouse server is usually a relational database system. The middle-tier OLAP server is typically implemented using a ROLAP or MOLAP model. The top tier consists of front-end clients for data analysis and mining (such as trend analysis and prediction).

 

3. Back-End Tools and Utilities

Back-end tools are used to extract, clean, load, and refresh data. Data extraction collects data, usually from multiple heterogeneous external sources. Data cleaning detects errors in the data and corrects them where possible. Data loading sorts, summarizes, and merges the data, computes views, checks integrity, and builds indexes and partitions. Refresh propagates updates from the data sources to the data warehouse.
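The four steps can be sketched as a tiny pipeline. This is only a minimal illustration under stated assumptions: the two sources (`CSV_SOURCE`, `APP_SOURCE`), the `sales` table, and the function names are hypothetical, and SQLite from the Python standard library stands in for the warehouse server.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a CSV export with inconsistent formatting.
CSV_SOURCE = (
    "city,product,amount\n"
    "Boston,laptop,1200\n"
    "boston ,phone,600\n"
    "Austin,laptop,n/a\n"
)
# Hypothetical source 2: records from another application, with other field names.
APP_SOURCE = [{"town": "Austin", "item": "phone", "total": 550.0}]


def extract():
    """Gather rows from both heterogeneous sources into one common shape."""
    rows = [(r["city"], r["product"], r["amount"])
            for r in csv.DictReader(io.StringIO(CSV_SOURCE))]
    rows += [(r["town"], r["item"], r["total"]) for r in APP_SOURCE]
    return rows


def clean(rows):
    """Normalize names; set aside rows whose amount cannot be parsed."""
    cleaned, rejected = [], []
    for city, product, amount in rows:
        try:
            cleaned.append((city.strip().title(), product.strip().lower(), float(amount)))
        except ValueError:
            rejected.append((city, product, amount))
    return cleaned, rejected


def load(conn, rows):
    """Insert rows into the warehouse table and make sure the index exists."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (city TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_sales_city ON sales (city)")
    conn.commit()


def totals_by_city(conn):
    """Summarize the loaded data per city."""
    return dict(conn.execute("SELECT city, SUM(amount) FROM sales GROUP BY city"))


conn = sqlite3.connect(":memory:")
cleaned, rejected = clean(extract())
load(conn, cleaned)
initial_totals = totals_by_city(conn)

# Refresh: propagate a later update from a source into the warehouse.
load(conn, [("Austin", "laptop", 1100.0)])
refreshed_totals = totals_by_city(conn)
```

Rejected rows are kept rather than silently dropped, so that errors found during cleaning can be corrected at the source.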

 

4. Conceptual Models and Front-End Tools

In a multidimensional data model, there is a set of numeric measures that are the objects of analysis. Examples of such measures are sales, budget, revenue, inventory, and ROI. Each of the numeric measures depends on a set of dimensions, which provide the context for the measure. For example, the dimensions associated with a sale amount can be the city, product name, and the date when the sale was made. Each dimension is described by a set of attributes.


Figure 3: Multidimensional Data Model


5. Database Design Methods

Here we discuss the design of relational database schemas that reflect the multidimensional view of data. Most data warehouses use a star schema to represent the multidimensional data model. The database includes a fact table. Each row of the fact table covers all the dimensions: it holds a pointer (foreign key) to a row in each dimension table, along with the numeric measures. The columns of each dimension table correspond to the attributes of that dimension.
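A minimal sketch of such a design, using SQLite from the Python standard library, might look as follows. The table and column names (`fact_sales`, `dim_city`, `dim_product`, `dim_date`) and the sample rows are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table per dimension; each column is an attribute of that dimension.
CREATE TABLE dim_city (city_id INTEGER PRIMARY KEY, name TEXT, state TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);

-- The fact table: one foreign key per dimension plus the numeric measure.
CREATE TABLE fact_sales (
    city_id INTEGER REFERENCES dim_city(city_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    amount REAL
);

INSERT INTO dim_city VALUES (1, 'Boston', 'MA'), (2, 'Austin', 'TX');
INSERT INTO dim_product VALUES (1, 'laptop', 'computers'), (2, 'phone', 'mobile');
INSERT INTO dim_date VALUES (1, '2024-01-05', '2024-01', 2024),
                            (2, '2024-02-03', '2024-02', 2024);
INSERT INTO fact_sales VALUES (1, 1, 1, 1200.0), (1, 2, 1, 600.0), (2, 1, 2, 1100.0);
""")

# A typical star join: follow the foreign keys out to the dimension tables
# and aggregate the measure by dimension attributes.
sales_by_category_month = conn.execute("""
    SELECT p.category, d.month, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
""").fetchall()
```

Every dimension is exactly one join away from the fact table, which is what gives the schema its star shape.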


Figure 4: Star schema example


The snowflake schema is a variant of the star schema in which some dimension tables are normalized, so that their data is further decomposed into additional tables. Normalizing the dimension tables reduces redundancy, which makes them easier to maintain and saves storage space.
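The following sketch shows one possible snowflake design for a product dimension, again using SQLite from the Python standard library. The tables `dim_product` and `dim_category` and their contents are hypothetical; the point is only that category attributes are stored once per category instead of once per product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The category attributes are split out of the product dimension.
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, name TEXT, industry TEXT);
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    name TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    amount REAL
);

INSERT INTO dim_category VALUES (1, 'computers', 'electronics'), (2, 'mobile', 'electronics');
INSERT INTO dim_product VALUES (1, 'laptop', 1), (2, 'desktop', 1), (3, 'phone', 2);
INSERT INTO fact_sales VALUES (1, 1200.0), (2, 900.0), (3, 600.0), (1, 1100.0);
""")

# Reaching a category attribute now takes two joins instead of one.
sales_by_category = conn.execute("""
    SELECT c.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_category AS c ON c.category_id = p.category_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
```

The extra join is the usual trade-off of this design: less redundancy in the dimension tables in exchange for longer join paths at query time.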


Figure 5: Snowflake schema example


Complex applications may require multiple fact tables that share dimension tables. This kind of schema can be seen as a collection of star schemas and is therefore called a fact constellation.


6. Indexing Technology

A data warehouse may contain a large amount of data, so it is necessary to optimize query response. First, data warehouses use redundant structures, such as indexes and materialized views. In addition, parallelism can be used to reduce query response time. OLAP data can be indexed with bitmap indexes and join indexes.

Index Structure

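A bitmap index is an alternative representation of the record ID (RID) list: for each value of the indexed column it keeps a bit vector with one bit per record, and a bit is set when the corresponding record holds that value. Join indexing gained its popularity from its use in relational database query processing.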
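To make the idea of a bitmap index concrete, here is a small sketch that uses Python integers as bit vectors. The sample rows and the helper names `build_bitmap_index` and `rids` are hypothetical, and real systems typically store compressed bitmaps rather than plain integers.

```python
from collections import defaultdict

# Hypothetical records; the position in the list serves as the record ID (RID).
ROWS = [
    {"city": "Boston", "product": "laptop"},
    {"city": "Austin", "product": "phone"},
    {"city": "Boston", "product": "phone"},
    {"city": "Austin", "product": "laptop"},
    {"city": "Boston", "product": "laptop"},
]


def build_bitmap_index(rows, column):
    """Build one bit vector per distinct value; bit i is set if row i has that value."""
    index = defaultdict(int)
    for rid, row in enumerate(rows):
        index[row[column]] |= 1 << rid
    return dict(index)


def rids(bitmap):
    """Convert a bit vector back into the equivalent RID list."""
    return [rid for rid in range(bitmap.bit_length()) if (bitmap >> rid) & 1]


city_index = build_bitmap_index(ROWS, "city")
product_index = build_bitmap_index(ROWS, "product")

# A conjunctive selection becomes a bitwise AND of two bit vectors.
matches = city_index["Boston"] & product_index["laptop"]
matching_rids = rids(matches)
```

Because selections on several columns reduce to bitwise AND and OR operations, this representation tends to suit columns with few distinct values.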


Figure 6: bitmap index example


Materialized views and OLAP index structures are designed to speed up data cube query processing.

