One, Basic Concepts
Dimension: a specific angle from which people observe data. It is a class of attributes considered when examining a problem, and the set of those attributes constitutes a dimension (time dimension, geography dimension, etc.).
Level of a dimension: within one observation angle (that is, one dimension), the data can be described at varying degrees of detail (time dimension: date, month, quarter, year).
Member of a dimension: a value of the dimension, describing the position of a data item in that dimension ("a certain day of a certain month of a certain year" is a description of a position on the time dimension).
Measure: a value of the multidimensional array (January 2000, Shanghai, notebook computer, 10000).
Cube (multidimensional data set): the backbone of decision analysis and the core of OLAP, sometimes called a cube or hypercube. OLAP presents a multidimensional view to the user. A cube can be represented by a multidimensional array.
Multidimensional array: a combined representation of dimensions and variables. A multidimensional array can be written as (dimension 1, dimension 2, ..., dimension n, observed variable), for example (time, region, product, sales).
Data cell: a value of the multidimensional array, located by choosing one member on each dimension (January 2000, Shanghai, notebook computer, ¥100000).
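The concepts above can be sketched in a few lines of Python. This is only an illustration under assumptions: the names `dimensions`, `cube`, `Cell`, and `get_cell` are hypothetical, and every figure except the January 2000 Shanghai notebook example is invented.

```python
from collections import namedtuple

# A data cell: one member per dimension plus the measure value.
Cell = namedtuple("Cell", ["time", "region", "product", "sales"])

# Each dimension is described by its list of members (illustrative values).
dimensions = {
    "time": ["2000-01", "2000-02"],
    "region": ["Shanghai", "Beijing"],
    "product": ["notebook computer", "desktop computer"],
}

# The cube maps (time, region, product) coordinates to a sales measure.
# Only the first entry mirrors the example in the text; the rest are made up.
cube = {
    ("2000-01", "Shanghai", "notebook computer"): 10000,
    ("2000-01", "Beijing", "notebook computer"): 7000,
    ("2000-02", "Shanghai", "desktop computer"): 4000,
}


def get_cell(cube, time, region, product):
    """Return the data cell at the given coordinates, or None if it is empty."""
    key = (time, region, product)
    if key not in cube:
        return None
    return Cell(time, region, product, cube[key])
```

Calling `get_cell(cube, "2000-01", "Shanghai", "notebook computer")` returns the cell whose `sales` field holds the measure, while coordinates with no recorded data return `None`.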
Two, Concept and Guidelines
Online Analytical Processing (OLAP) was first proposed in 1993 by E. F. Codd, the father of the relational database, who also presented 12 guidelines for OLAP:
Guideline 1: multidimensional conceptual view
Guideline 2: transparency
Guideline 3: accessibility
Guideline 4: consistent reporting performance
Guideline 5: client/server architecture
Guideline 6: generic dimensionality (all dimensions are equivalent)
Guideline 7: dynamic sparse matrix handling
Guideline 8: multi-user support
Guideline 9: unrestricted cross-dimensional operations
Guideline 10: intuitive data manipulation
Guideline 11: flexible reporting
Guideline 12: unlimited dimensions and aggregation levels
The main feature of online analytical processing is that it is modeled directly on the user's multi-angle way of thinking: a multidimensional data model is built for the user in advance. Here, a dimension is an angle of analysis. For example, when analyzing sales data, the time period is one dimension, and product category, distribution channel, geographical distribution, and customer group are each a dimension as well. Once the multidimensional data model is built, the user can quickly obtain data from each angle of analysis, switch dynamically between angles, or combine several angles in one analysis, which gives great analytical flexibility. This is the basic reason OLAP has received so much attention in recent years: it differs fundamentally from older management information systems in both design philosophy and implementation.
Three, Common Analysis Methods
The basic multidimensional analysis operations of OLAP are drilling (roll-up and drill-down), slice and dice, and rotation (pivot).
Drilling: changes the level of a dimension and thereby the granularity of the analysis. It includes drill-down and drill-up (also called roll-up). Roll-up generalizes low-level detail data along one dimension into higher-level aggregated data, or reduces the number of dimensions. Drill-down does the opposite: it moves from aggregated data to detail data, or adds new dimensions.
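As a minimal sketch of roll-up and drill-down, assuming a cube stored as a dictionary keyed by (month, city, product): the function names and the sample figures below are invented for illustration and do not come from any particular OLAP product.

```python
from collections import defaultdict

# Invented sample data: (month, city, product) -> sales.
monthly_sales = {
    ("2002-01", "Shanghai", "notebook"): 100,
    ("2002-02", "Shanghai", "notebook"): 150,
    ("2002-04", "Shanghai", "notebook"): 200,
    ("2002-01", "Beijing", "notebook"): 80,
}


def month_to_quarter(month):
    """Map 'YYYY-MM' to 'YYYY-Qn', one level up in the time hierarchy."""
    year, mm = month.split("-")
    return f"{year}-Q{(int(mm) - 1) // 3 + 1}"


def roll_up_time(cells):
    """Roll up from month level to quarter level by summing the measure."""
    result = defaultdict(int)
    for (month, city, product), sales in cells.items():
        result[(month_to_quarter(month), city, product)] += sales
    return dict(result)


def drop_dimension(cells, index):
    """Roll up by removing the dimension at position `index` entirely."""
    result = defaultdict(int)
    for key, sales in cells.items():
        result[key[:index] + key[index + 1:]] += sales
    return dict(result)


def drill_down(cells, quarter):
    """Drill down: return the month-level cells that make up one quarter."""
    return {k: v for k, v in cells.items() if month_to_quarter(k[0]) == quarter}
```

Here `roll_up_time` climbs the time hierarchy from month to quarter, `drop_dimension` shows roll-up by dimension reduction, and `drill_down` goes back from a quarter to its monthly detail.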
Slice and dice: after a value is selected on some of the dimensions, examine how the measure is distributed over the remaining dimensions. If only two dimensions remain, the result is a slice; if three or more remain, it is a dice. For example, slicing the time dimension with "time = January 2002" gives a slice over the product and store dimensions. The dice operation defines a subcube by selecting values on two or more dimensions, for example "time = January 2002" and "city = Shanghai".
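The selection step can be sketched as follows. This is an assumption-laden illustration: the cube layout, the `DIMS` tuple, the `select` helper, and all figures are hypothetical.

```python
# Dimension order of the keys in the cube below.
DIMS = ("time", "city", "product")

# Invented sample data: (time, city, product) -> sales.
cube = {
    ("2002-01", "Shanghai", "notebook"): 100,
    ("2002-01", "Beijing", "notebook"): 80,
    ("2002-01", "Shanghai", "desktop"): 60,
    ("2002-02", "Shanghai", "notebook"): 150,
}


def select(cube, **fixed):
    """Fix one member on each named dimension and return the sub-cube.

    The fixed dimensions are dropped from the keys of the result, so the
    result is indexed only by the remaining dimensions.
    """
    fixed_idx = {DIMS.index(name): member for name, member in fixed.items()}
    keep = [i for i in range(len(DIMS)) if i not in fixed_idx]
    result = {}
    for key, value in cube.items():
        if all(key[i] == member for i, member in fixed_idx.items()):
            result[tuple(key[i] for i in keep)] = value
    return result


# Selecting on time leaves a two-dimensional view over city and product.
by_city_product = select(cube, time="2002-01")

# Selecting on time and city leaves only the product dimension.
shanghai_jan = select(cube, time="2002-01", city="Shanghai")
```

The same helper covers both cases in the text; which name applies depends only on how many dimensions are fixed and how many remain.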
Rotation (pivot): changes the orientation of the dimensions, that is, rearranges where the dimensions are placed in the table (for example, swapping rows and columns).
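For a two-dimensional view, a row and column swap is just a transpose. The sketch below assumes the view is a list of rows; the labels and figures are invented.

```python
def pivot(table):
    """Swap the row and column dimensions of a two-dimensional view."""
    return [list(column) for column in zip(*table)]


# Invented view: rows are products, columns are cities.
row_labels = ["notebook", "desktop"]
col_labels = ["Shanghai", "Beijing", "Guangzhou"]
view = [
    [100, 80, 50],
    [60, 40, 30],
]

rotated = pivot(view)
# After rotation, cities label the rows and products label the columns.
rotated_rows, rotated_cols = col_labels, row_labels
```

Applying `pivot` twice returns the original view, since rotation changes only the presentation and not the data.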
Four, Storage Formats
According to how they store data, OLAP systems can be divided into three types: relational OLAP (ROLAP), multidimensional OLAP (MOLAP), and hybrid OLAP (HOLAP).
1.ROLAP
ROLAP stores the multidimensional data used for analysis in a relational database and, depending on the needs of the application, defines a set of materialized views that are also stored as tables in the relational database. Not every SQL query is saved as a materialized view; only queries that are used frequently and are expensive to compute are chosen. For each query against the OLAP server, the precomputed materialized views are used first to generate the result, which improves query efficiency. The RDBMS used as ROLAP storage is also optimized for OLAP, with features such as parallel storage, parallel queries, parallel data management, cost-based query optimization, bitmap indexes, and SQL OLAP extensions (CUBE, ROLLUP).
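The idea of answering queries from a precomputed summary can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumptions: the table names, the `city_total` helper, and the data are hypothetical, and because SQLite has no materialized views, an ordinary summary table created with `CREATE TABLE ... AS SELECT` stands in for one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_fact (month TEXT, city TEXT, product TEXT, amount INTEGER);
    INSERT INTO sales_fact VALUES
        ('2002-01', 'Shanghai', 'notebook', 100),
        ('2002-01', 'Beijing',  'notebook', 80),
        ('2002-02', 'Shanghai', 'notebook', 150),
        ('2002-02', 'Shanghai', 'desktop',  60);

    -- Precomputed summary stored as an ordinary table, standing in for
    -- a materialized view of a frequent, expensive query.
    CREATE TABLE sales_by_city_month AS
        SELECT city, month, SUM(amount) AS total
        FROM sales_fact
        GROUP BY city, month;
""")


def city_total(conn, city):
    """Answer a per-city query from the summary table, not the fact table."""
    row = conn.execute(
        "SELECT SUM(total) FROM sales_by_city_month WHERE city = ?", (city,)
    ).fetchone()
    return row[0]  # None when the city has no rows
```

In a real ROLAP deployment the summary would have to be refreshed when the fact table changes; this sketch leaves that out.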
2.MOLAP
MOLAP physically stores the multidimensional data used for OLAP analysis as a multidimensional array, forming a "cube" structure. The attribute values of each dimension are mapped to subscripts or subscript ranges of the array, and the summary data is stored in the cells of the array as its values. Because MOLAP uses a new storage structure implemented at the physical layer, it is also called physical OLAP. ROLAP, by contrast, is implemented mainly through software tools or middleware while the physical layer still uses the storage structure of the relational database, so it is called virtual OLAP.
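The mapping from dimension members to array subscripts can be sketched as below, assuming a small dense three-dimensional array built from nested lists. The member lists, the `store` and `lookup` helpers, and the figures are invented for illustration; real MOLAP engines use far more compact storage.

```python
# Dimension members in a fixed order; a member's position is its subscript.
times = ["2002-01", "2002-02"]
cities = ["Shanghai", "Beijing"]
products = ["notebook", "desktop"]

index = {
    "time": {member: i for i, member in enumerate(times)},
    "city": {member: i for i, member in enumerate(cities)},
    "product": {member: i for i, member in enumerate(products)},
}

# Dense array indexed as array[time][city][product]; None marks an empty cell.
array = [[[None for _ in products] for _ in cities] for _ in times]


def store(time, city, product, value):
    """Write a summary value into the cell addressed by the three members."""
    t, c, p = index["time"][time], index["city"][city], index["product"][product]
    array[t][c][p] = value


def lookup(time, city, product):
    """Read a cell directly by subscript, with no search or join."""
    t, c, p = index["time"][time], index["city"][city], index["product"][product]
    return array[t][c][p]


store("2002-01", "Shanghai", "notebook", 100)
store("2002-02", "Beijing", "desktop", 40)
```

Lookups are direct subscript accesses, but every combination of members occupies a slot whether or not it holds data, which is why sparse cubes are costly in this layout.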
3.HOLAP
Because MOLAP and ROLAP each have their own advantages and disadvantages, and their structures are quite different, designing an OLAP structure is a challenge for the analyst. For this reason a new structure, hybrid OLAP (HOLAP), has been proposed to combine the advantages of the MOLAP and ROLAP structures. To date there is no formal definition of HOLAP. It is clear, however, that a HOLAP structure should not simply bundle the MOLAP and ROLAP structures together, but should integrate the strengths of both so that it can satisfy the various complex analysis requests of users.
Five, Epilogue
From the application point of view, a data warehouse system covers a wider scope than online analytical processing: besides OLAP, it can also use traditional reports, or data mining techniques such as mathematical statistics and artificial intelligence. In terms of application scope, online analytical processing is usually divided into applications according to the subject the user analyzes, such as sales analysis, marketing analysis, and customer profitability analysis. Each analysis subject forms one OLAP application, and all of the OLAP applications together are only part of the data warehouse system.