(Reproduced from the Peking University Gaoke website, http://www.pku-ht.com/)
The concept of Online Analytical Processing (OLAP) was first proposed in 1993 by E. F. Codd, the father of the relational database, who also put forward 12 rules for OLAP. The proposal attracted a strong response, and OLAP has since become a product category clearly distinct from OLTP.
Today's data processing can be roughly divided into two categories: online transaction processing (OLTP) and online analytical processing (OLAP). OLTP is the main application of traditional relational databases and handles basic, day-to-day transactions such as bank transactions. OLAP is the main application of data warehouse systems. It supports complex analytical operations, focuses on decision support, and presents query results in an intuitive, easy-to-understand form. The following table compares OLTP and OLAP.
| | OLTP | OLAP |
|---|---|---|
| User | Operators and lower-level managers | Decision makers and senior managers |
| Function | Routine operations | Analysis and decision-making |
| DB design | Application-oriented | Subject-oriented |
| Data | Current, up-to-date, two-dimensional, and discrete | Historical, aggregated, multidimensional, and integrated |
| Access | Read/write dozens of records | Read millions of records |
| Unit of work | Simple transactions | Complex queries |
| Number of users | Thousands | Hundreds |
| DB size | 100 MB to GB | 100 GB to TB |
OLAP is a software technology that enables analysts, managers, and executives to access information quickly, consistently, and interactively from multiple perspectives, giving them a deeper understanding of the data. OLAP is designed to meet decision-support needs and specific query and reporting requirements in a multidimensional environment. Its core concept is the "dimension".
A "dimension" is a high-level category that represents one perspective from which people observe the real world. Dimensions usually contain hierarchies, and these can be quite complex. By modeling several important attributes of an entity as separate dimensions, you can compare data across those dimensions. For this reason, OLAP can also be regarded as a collection of multidimensional data analysis tools.
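The sketch below illustrates a hierarchical dimension in Python. It assumes a time dimension with the levels day, month, quarter, and year. The level names, the functions `time_levels` and `aggregate_at`, and the sales figures are all hypothetical. Summing the same measure at a higher level of the hierarchy produces coarser totals.

```python
from collections import defaultdict
from datetime import date

# Hypothetical time-dimension hierarchy: day -> month -> quarter -> year.
def time_levels(d: date) -> dict:
    """Return the member of each hierarchy level that a given day belongs to."""
    quarter = (d.month - 1) // 3 + 1
    return {
        "day": d.isoformat(),
        "month": f"{d.year}-{d.month:02d}",
        "quarter": f"{d.year}-Q{quarter}",
        "year": str(d.year),
    }

def aggregate_at(level: str, sales: list) -> dict:
    """Sum a sales measure at the requested level of the time hierarchy."""
    totals = defaultdict(float)
    for d, amount in sales:
        totals[time_levels(d)[level]] += amount
    return dict(totals)

# Invented sample data: (day, sales amount).
sales = [
    (date(2023, 1, 15), 100.0),
    (date(2023, 2, 3), 250.0),
    (date(2023, 4, 20), 80.0),
]
print(aggregate_at("month", sales))    # {'2023-01': 100.0, '2023-02': 250.0, '2023-04': 80.0}
print(aggregate_at("quarter", sales))  # {'2023-Q1': 350.0, '2023-Q2': 80.0}
print(aggregate_at("year", sales))     # {'2023': 430.0}
```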
The basic multidimensional analysis operations in OLAP include drilling (roll-up and drill-down), slice, dice, rotation (pivot), drill-across, and drill-through.
· Drilling changes the level within a dimension hierarchy and therefore the granularity of the analysis. It includes roll-up and drill-down. Roll-up aggregates low-level detail data into higher-level summary data along a dimension, or reduces the number of dimensions. Drill-down does the reverse: it moves from summary data to more detailed data, or adds new dimensions.
· Slice and dice examine how the measures are distributed over the remaining dimensions after values have been fixed on some dimensions. If two dimensions remain, the operation is a slice. If three remain, it is a dice.
· Rotation changes the orientation of the dimensions. In other words, it rearranges where the dimensions are placed in a table (for example, swapping rows and columns).
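The operations above can be sketched in Python on a toy three-dimensional data set. This is an illustration only. The fact rows, the dimension names, and the helpers `aggregate`, `slice_`, `dice`, and `pivot` are hypothetical, and this is not how any particular OLAP engine implements them. Roll-up and drill-down are modeled as grouping by fewer or more dimensions. Slice fixes one dimension. Dice restricts several dimensions to member subsets while keeping all three. Pivot swaps the two axes of a 2-D result.

```python
from collections import defaultdict

# Toy fact rows: (region, quarter, product, sales). Values are invented.
FACTS = [
    ("North", "Q1", "TV", 10),
    ("North", "Q1", "Phone", 20),
    ("North", "Q2", "TV", 15),
    ("South", "Q1", "TV", 5),
    ("South", "Q2", "Phone", 30),
]
DIMS = ("region", "quarter", "product")

def aggregate(rows, keep):
    """Sum sales grouped by the dimensions in `keep`; dropping a dimension rolls up."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)

def slice_(rows, dim, member):
    """Slice: fix one dimension to a single member and drop it from the result."""
    i = DIMS.index(dim)
    remaining = [d for d in DIMS if d != dim]
    return aggregate([r for r in rows if r[i] == member], remaining)

def dice(rows, **members):
    """Dice: restrict several dimensions to subsets of members, keeping all dimensions."""
    return [r for r in rows
            if all(r[DIMS.index(d)] in allowed for d, allowed in members.items())]

def pivot(table):
    """Rotate a 2-D result {(row, col): value} by swapping rows and columns."""
    return {(c, r): v for (r, c), v in table.items()}

by_region = aggregate(FACTS, ["region"])                      # roll up to region
by_region_quarter = aggregate(FACTS, ["region", "quarter"])   # drill down by adding quarter
print(by_region)                        # {('North',): 45, ('South',): 35}
print(slice_(FACTS, "quarter", "Q1"))   # {('North', 'TV'): 10, ('North', 'Phone'): 20, ('South', 'TV'): 5}
print(dice(FACTS, region={"North"}, product={"TV"}))
print(pivot(by_region_quarter))         # keys become (quarter, region)
```

The choice to model roll-up as "group by fewer dimensions" keeps every operation on the same flat list of facts. A real system would usually also roll up along a dimension's hierarchy (for example, quarter to year).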
OLAP can be implemented in several ways. Based on how data is stored, implementations are divided into ROLAP, MOLAP, and HOLAP.
ROLAP (Relational OLAP) is an OLAP implementation based on a relational database. With the relational database at its core, multidimensional data is represented and stored in relational structures. ROLAP splits the multidimensional structure into two kinds of tables. Fact tables store the measures and the dimension keys. Dimension tables use at least one table per dimension to store descriptive information such as hierarchy levels and member categories. Dimension tables are linked to the fact table through primary and foreign keys, forming a "star schema". For dimensions with complex hierarchies, several tables can describe a single dimension so that redundant data does not take up too much storage. This extension of the star schema is called a "snowflake schema".
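As a minimal ROLAP illustration, the following sketch builds a star schema in an in-memory SQLite database using Python's standard `sqlite3` module. The table names (`fact_sales`, `dim_region`, `dim_product`), their columns, and the data are hypothetical. A multidimensional question ("sales by region and product") is answered relationally by joining the fact table to its dimension tables and grouping.

```python
import sqlite3

# Hypothetical star schema: one fact table plus two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, region  TEXT, country  TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product TEXT, category TEXT);
CREATE TABLE fact_sales (
    region_id  INTEGER REFERENCES dim_region(region_id),   -- foreign key to a dimension
    product_id INTEGER REFERENCES dim_product(product_id), -- foreign key to a dimension
    sales      REAL                                         -- the measure
);
INSERT INTO dim_region  VALUES (1, 'North', 'CN'), (2, 'South', 'CN');
INSERT INTO dim_product VALUES (1, 'TV', 'Electronics'), (2, 'Phone', 'Electronics');
INSERT INTO fact_sales  VALUES (1, 1, 10), (1, 2, 20), (2, 1, 5), (2, 2, 30);
""")

# Join facts to dimensions through the keys, then aggregate the measure.
rows = conn.execute("""
    SELECT r.region, p.product, SUM(f.sales)
    FROM fact_sales AS f
    JOIN dim_region  AS r ON f.region_id  = r.region_id
    JOIN dim_product AS p ON f.product_id = p.product_id
    GROUP BY r.region, p.product
    ORDER BY r.region, p.product
""").fetchall()
for row in rows:
    print(row)  # e.g. ('North', 'Phone', 20.0)
```

In a snowflake variant, the `category` column would typically move into its own table referenced from `dim_product`. This removes the repeated category text at the cost of one more join.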
MOLAP (Multidimensional OLAP) is an OLAP implementation based on multidimensional data organization. With multidimensional data at its core, MOLAP stores data in multidimensional arrays organized as a "cube" structure. In MOLAP, rotating, dicing, and slicing the cube are the main techniques used to generate multidimensional reports.
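The MOLAP idea can be sketched with a dense three-dimensional array in plain Python, where each dimension member is mapped to a fixed array position. This is a simplified illustration under assumed names. The member lists, the helpers `put`, `slice_quarter`, `dice`, and `rotate`, and the values are hypothetical, and real MOLAP engines use specialized (often compressed) storage. Slicing, dicing, and rotation become index operations on the array.

```python
# Hypothetical dimension members; the list position is the array coordinate.
REGIONS = ["North", "South"]
QUARTERS = ["Q1", "Q2"]
PRODUCTS = ["TV", "Phone"]

# Dense cube: cube[region][quarter][product] holds the sales measure.
cube = [[[0.0 for _ in PRODUCTS] for _ in QUARTERS] for _ in REGIONS]

def put(region, quarter, product, value):
    """Write one cell by translating member names into array coordinates."""
    cube[REGIONS.index(region)][QUARTERS.index(quarter)][PRODUCTS.index(product)] = value

# Invented sample cells.
for cell in [("North", "Q1", "TV", 10.0), ("North", "Q1", "Phone", 20.0),
             ("North", "Q2", "TV", 15.0), ("South", "Q1", "TV", 5.0),
             ("South", "Q2", "Phone", 30.0)]:
    put(*cell)

def slice_quarter(quarter):
    """Slice: fix the quarter coordinate, leaving a 2-D region x product matrix."""
    q = QUARTERS.index(quarter)
    return [[cube[r][q][p] for p in range(len(PRODUCTS))] for r in range(len(REGIONS))]

def dice(regions, quarters, products):
    """Dice: extract a sub-cube restricted to the given members on every dimension."""
    return [[[cube[REGIONS.index(r)][QUARTERS.index(q)][PRODUCTS.index(p)]
              for p in products] for q in quarters] for r in regions]

def rotate(matrix):
    """Rotate a 2-D slice by transposing rows and columns."""
    return [list(col) for col in zip(*matrix)]

q1 = slice_quarter("Q1")
print(q1)          # [[10.0, 20.0], [5.0, 0.0]]
print(rotate(q1))  # [[10.0, 5.0], [20.0, 0.0]]
print(dice(["North"], ["Q1", "Q2"], ["TV"]))  # [[[10.0], [15.0]]]
```

A dense array makes cell lookup a direct index computation. The trade-off is that it stores a slot for every combination of members, even empty ones (such as South/Q1/Phone above).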
HOLAP (Hybrid OLAP) is an OLAP implementation based on a hybrid data organization. For example, the lower layer is relational while the upper layer is a multidimensional matrix. This approach offers greater flexibility.
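The following is one possible HOLAP arrangement, sketched under stated assumptions. Detailed facts stay in a relational table (an in-memory SQLite table here), while region-by-quarter aggregates are precomputed into an in-memory 2-D array. The table name `sales`, the member lists, the `query` helper, and the routing rule (summary queries go to the array, product-level queries go to SQL) are all hypothetical choices made for illustration.

```python
import sqlite3

# Lower layer: detailed facts kept in a relational table (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, product TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("North", "Q1", "TV", 10), ("North", "Q1", "Phone", 20),
    ("North", "Q2", "TV", 15), ("South", "Q1", "TV", 5), ("South", "Q2", "Phone", 30),
])

# Upper layer: region x quarter aggregates precomputed into a 2-D array.
REGIONS = ["North", "South"]
QUARTERS = ["Q1", "Q2"]
summary = [[0.0] * len(QUARTERS) for _ in REGIONS]
for region, quarter, total in conn.execute(
        "SELECT region, quarter, SUM(amount) FROM sales GROUP BY region, quarter"):
    summary[REGIONS.index(region)][QUARTERS.index(quarter)] = total

def query(region, quarter, product=None):
    """Serve summary-level queries from the array; fall back to SQL for product detail."""
    if product is None:
        return summary[REGIONS.index(region)][QUARTERS.index(quarter)]
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales "
        "WHERE region = ? AND quarter = ? AND product = ?",
        (region, quarter, product)).fetchone()
    return total

print(query("North", "Q1"))        # 30.0, read from the precomputed array
print(query("North", "Q1", "TV"))  # 10.0, read from the relational table
```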
OLAP can be implemented in other ways as well, for example with a dedicated SQL server that provides special support for SQL queries against certain storage schemas (such as star and snowflake schemas).
OLAP is an online data access and analysis tool for specific problems. It analyzes, queries, and reports on data across multiple dimensions. A dimension is a particular angle from which people observe data. For example, when examining a product's sales, an enterprise usually looks at them from the perspectives of time, region, and product. Here, time, region, and product are the dimensions. The multidimensional arrays formed by combining these dimensions with the measured indicators are the basis of OLAP analysis. They can be written formally as (Dimension 1, Dimension 2, ..., Dimension N, measure), for example (Region, Time, Product, Sales). Multidimensional analysis applies slice, dice, drill-down, and roll-up to multidimensional data. This lets users observe the data in the database from multiple perspectives and gain a deep understanding of the information it contains.
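The formal representation (Dimension 1, ..., Dimension N, measure) can be illustrated by storing each fact as a (Region, Time, Product, Sales) tuple. The sketch then computes the measure for every combination of dimensions, from the grand total up to full detail, which gives 2^N views for N dimensions. The data, the `cuboid` helper, and the `"ALL"` placeholder for an aggregated-away dimension are hypothetical conventions chosen for this sketch.

```python
from collections import defaultdict
from itertools import combinations

# Each fact is (Region, Time, Product, Sales): N = 3 dimensions plus one measure (invented data).
DIMS = ("region", "time", "product")
FACTS = [
    ("North", "2023-Q1", "TV", 10),
    ("North", "2023-Q2", "Phone", 20),
    ("South", "2023-Q1", "TV", 5),
]

def cuboid(dims):
    """Total the measure for one combination of dimensions; others become "ALL"."""
    totals = defaultdict(int)
    for *members, sales in FACTS:
        key = tuple(m if d in dims else "ALL" for d, m in zip(DIMS, members))
        totals[key] += sales
    return dict(totals)

# Every combination of dimensions, from the grand total (k = 0) to full detail (k = N).
views = {combo: cuboid(combo)
         for k in range(len(DIMS) + 1)
         for combo in combinations(DIMS, k)}
print(len(views))          # 8
print(views[()])           # {('ALL', 'ALL', 'ALL'): 35}
print(views[("region",)])  # {('North', 'ALL', 'ALL'): 30, ('South', 'ALL', 'ALL'): 5}
```

Precomputing all views like this is only practical for small N, because the number of views doubles with each added dimension.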
Depending on how the aggregated data is organized, the common OLAP implementations today are mainly MOLAP, based on multidimensional databases, and ROLAP, based on relational databases. MOLAP organizes and stores data multidimensionally, whereas ROLAP uses existing relational database technology to simulate multidimensional data. In data warehouse applications, OLAP tools generally serve as the front end. OLAP tools can also be combined with data mining and statistical analysis tools to strengthen decision analysis.