Some knowledge about OLAP-background required by the next project

Source: Internet
Author: User
Tags types of tables snowflake schema

1. dimension is a specific angle for people to observe the topic. Each dimension is described by a table, called a dimension table. It is a detailed description of the dimension.

2. fact indicates the topic of interest. It is also described by a table. It is called a fact table. It is mainly characterized by numerical data (FACTS ), these numeric data can be summarized to provide information about the operation history.

3. Each fact table includes an index composed of multiple fields. The Index consists of the primary keys of the dimension table. The primary keys of the dimension table can also be called dimension identifiers. Fact tables generally do not contain descriptive information. dimension tables contain information describing fact table fact records. Multi-dimensional data structures formed between multiple dimension tables reflect the multi-dimensional nature of data in space, also known as "multi-dimensional cubes ".

DW's multidimensional data model is divided into three types: Star mode, snowflake mode, and fact constellation mode. The following sections describe each other.

1.Star Mode(Star Schema)
Most DW data streams use the star mode. The structure of the star mode includes the following three points:
● A fact table containing large amounts of data without redundant data;
● Several dimension tables with relatively less data;
● Each dimension independently forms a dimension table. Each dimension table has a dimension identifier associated with the fact table, and its Graphical description is star.

2.Snowflake Mode(Snowflake schema)
Dimensions are generally hierarchical or lattice structures. In Example 2, the product hierarchy is commodity → category. Each tuples in the product table indicate the category and category of the product; the store hierarchy is store → city → province → country → continent. Each tuples in the store table indicate the city, province, country, and continent of the store. The date hierarchy is day → month → year, each tuple in the date table indicates the month and year to which "day" belongs. In star mode,Large data redundancyThe snowflake mode should be used.

The snowflake mode is an extension of the star mode,It is actually a normalization of the star mode.. In snowflake mode, dimension tables in star mode are further layered. The original dimension tables may be extended to small fact tables to form partial "hierarchical" areas.It minimizes data storage and combines smaller dimension tables to improve query performance.

The snowflake mode increases the number of tables that users must process and the complexity of some queries. But this can make the system more professional and practical, while reducing the general degree of the system.

 

3.Fact constellation Mode(Fact constellation schema)
The fact constellation mode refers to the fact tables with multiple fact tables.Share some dimension tables. Actually,The fact constellation mode is a combination of the star and snowflake modes.

 

 

The OLAP concept has the following five features:
(1) fast: Users have high requirements on the Rapid Response of OLAP. The system should be able to respond to most of the user's analysis requirements within 5 seconds.
(2) testability: the OLAP system shouldAble to process any logic analysis and statistical analysis related to applications.
(3) Sharing: among a large number of usersAchieve potential sharing of Secret DataRequired security requirements.
(4) multidimensional:Multidimensional Processing is a key feature of OLAP.. The system must provide multidimensional views and analysis of data.
(5) Information: No matter how large the data volume is or where the data is stored, the OLAP system should be able to obtain information in a timely manner and manage large-capacity information.
OLAP is a technical concept independent of DW. Its basic idea is that the company's decision makers should be able to flexibly manipulate the company's data, observe the company's status and understand the changes in multiple dimensions.

 

3. Basic concepts of OLAP
There are several basic concepts in OLAP:
(1)Object)
In analyticdb, the analytical objects that we focus on and focus on are called "objects ". Generally, an application has one or more objects, which constitute the focus of the analysis application. For example, in an analytic application of a chain store, one of the objects is the sales amount, which is the gathering point of this application analysis.
(2)Variable (variant)
A variable is the actual meaning of data, that is, the description of what the data is ". Generally, a variable is always a numerical metric. For example, "Number of people", "unit price", and "sales volume" are all variables, "100" and "200" are values of the variable.

(3)Dimension (dimension)
In Analytic Applications, objects can be analyzed and observed from different perspectives and different results can be obtained. "Dimension" is used to reflect the object's observation angle. For example, in the chain store example, the sales amount can have the following three dimensions :,
● Time dimension: analyzes and counts the sales amount by time period.
● Commodity dimension: the sales amount of different commodities can be analyzed and calculated by product classification.
● Regional dimension: You can analyze and calculate the sales amount of a chain store in different regions.
Dimensions have their own inherent features, suchHierarchy(Used for data aggregation and analysis ),Sort(Used to define variables) andComputing Logic(Matrix-based algorithms), these features are very useful for decision making.

(5)Dimension member
A value of a dimension is a member of the dimension. If a dimension is multi-level, the dimension members are composed of values of different dimension levels. For example, the time dimension consists of three layers: day, month, and year. Each value is combined on the day, month, and year, A member of the time dimension, "a month or a day", is obtained ".

(6)Multi-dimensional array
A multidimensional array can be expressed as follows: (Dimension 1, dimension 2 ,..., Dimension n, variable ). For example, if the product sales data is a three-dimensional cube organized by time, region, and sales channel, plus the variable "sales", it forms a three-dimensional array: (time, region, sales channel, sales ).
If you expand the dimension of a product, you will get a four-dimensional array: (goods, time, region, sales channel, sales ).

(7)Data Unit (cell)
The value of a multi-dimensional array is called a data unit.
When each dimension of a multidimensional array selects a Dimension member, the combination of these dimension members uniquely determines the value of a variable. In this case, the data unit can be expressed as: (Dimension 1 dimension member, dimension 2-dimension members ,..., Dimension n-dimensional Member, variable value ).
For example, the "toothpaste", "January 2004", "Shanghai", and "wholesale" members of products, regions, time and sales channels are used ", the only value that determines the variable "sales" (assuming 100000) can be expressed as: (toothpaste, January 2004, Shanghai, wholesale, 100000 ).

 

4. Comparison between OLAP and OLTP

 

 

 

ROLAP
ROLAP stores multidimensional data used for analysis in relational databases, and defines a batch of real views as tables based on application requirements.
ROLAP is mainly implemented through some software tools or intermediate software. The physical layer still adopts the storage structure of relational databases.Virtual OLAP(Virtual OLAP ).
Although data is stored in a relational format, data is provided in multiple dimensions for users. To hide the storage format, a semantic layer of metadata should be created, which will be mapped to the relational table. To improve response time, metadata should also be generated for summarized data or clustered data. All metadata is stored in a relational database that can be maintained and managed.

Basic Steps:
(1) Use star, snowflake, and hybrid modes to construct a dimension model.
(2) Add appropriate clustering and summary data.
(3) break down large databases into manageable parts to improve efficiency.
(4) Add the generated index or bit mode index to enhance the function.
(5) generate and store metadata. Metadata includes the definition of a dimension, the ing from a dimension to a relational table, the hierarchical relationship between dimensions, the definition and description of generalization and aggregation, formulas and calculations, and other data.
Procedure:
(1) construct customer tools using the application view or view of data.
(2) query OLAP from customer tools and check metadata in real time.
(3) create multiple select statements and/or related subqueries and submit them to the relational database.
(4) perform multi-dimensional functions on the database query results, such as formulas and calculations, from byte to application description conversion.
(5) return the result to the client tool for further processing and display, or display it immediately.
The main functions provided to users and administrators are::
(1) Business view of relational databases.
(2) Support at the dimension level.
(3) computing, statistics, and accounting functions, which can be expanded by users.
(4) Details.
(5) Select the frontend and backend tools.
(6) The database administrator can enhance the existing backup and recovery functions and provide database subsets for case analysis.
(7) use metadata navigation.
(8) multi-level security control with permissions.
Possible problems:
(1) This method allows data to be decomposed and processed in star and snowflake modes. This enhances functions, but the cost has a negative impact on the flexibility and scalability of relational databases. This makes it more difficult to modify the database and may require batch modification.
(2) unless it is a batch load, the star schema used and its deformation, and the aggregated and summarized data used assume that the data is static.
(3) perform row-level calculation. For example, when the profit is equal to the cost of income reduction, You need to replace rows and columns. This calculation is still restricted even if multiple select statements are used.
(4) managing and maintaining metadata is a long-term problem that consumes long-term costs.

ROLAP implementation
1. Divide the multi-dimensional structure of data into two types of tables:
Fact table (used to store measurement values of facts and code values of each dimension)
Dimension Table (dimension description, including dimension levels and member categories)
2. Two structures are designed:
Star Model
Snowflake Model


Note: In ROLAP, multi-dimensional data cubes do not actually exist. Generally, after accepting the customer's OLAP request, the ROLAP server needs to convert the SQL statement into a multi-dimensional access statement, the Join Operation is used to combine multi-dimensional data cubes. Therefore, the response time of ROLAP is long.


OLAP-MOLAP Based on Multi-dimensional database (mddb)
OLAP Server: stores OLAP Service software and multidimensional databases
Mddb storage: "hyper-cubes"
Mddb access: multidimensional operations
Basic design steps:
(1) select functions, such as sales revenue analysis and financial analysis reports.
(2) specify digital information, for example, store measurement information similar to sales revenue and customer.
(3) determine the dimension (time, region, product, etc.) and the granularity of each dimension. For example, the time is on a monthly or quarterly basis, and the region is based on the city, county, or region.
(4) define the logical model and load multi-dimensional data storage, which can be obtained directly from the data source, or filter and match the selected content of the data warehouse.
Main functions provided to users:
(1) Quick Response to connotation queries. Only quick response can ensure the process of analysis and thinking.
(2) interact with multiple databases to support applications such as prediction, pre-planning, and budgeting.
(3) explore the rich links between each dimension element or information to discover subtle relationships.
(4) powerful computing engines and comparative analysis, including classification, comparison, percentage classification, maximum value, minimum value, average value, average turnover rate, and period-based comparison.
(5) Cross-Dimension calculation, or row-based calculation for spreadsheet-based applications.
(6) comprehensive statistics and financial functions, such as cash transfers, trend analysis, and time series analysis.
(7) Smart timing, including the year, current time, finance, or internal calendar of the date.
(8) Ability to query detailed data at the underlying layer of the data warehouse.
(9) powerful navigation functions along single-dimension or multi-dimensional processing of datum points, tables, fine section and overview.
Possible problems:
(1) the dimension of the supported multidimensional database is smaller than that of the relational database. In this case, the sparse matrix technology can be used to save space. The side effect is to increase operations. Because general information and aggregation information are stored, the storage requirements are smaller than the original ones.
(2) storing data at a granularity level (summary, aggregated, pre-calculated, and derived data) will lead to side effects. For example, the detailed data level cannot be reached in detail section.
(3) High-Level Data has access and security control, but does not have access control based on the used permissions and subset level.
(4) Changing the dimension structure requires reorganizing the multi-dimensional database. Common backup and recovery functions are limited.
(5) The selection is restricted because a specific frontend and backend is required. However, after the multi-dimensional database is expanded, it cannot be transplanted to another multi-dimensional database.

Molap firstSort all foreign keysAnd write the sorted index values into the virtual multidimensional cube. Of course, Virtual Multi-dimensional cubes are conceived for ease of understanding. molap's actual data is stored in a data file, the order of data placement is the same as that of the Virtual Multi-dimensional cube in the order of X, Y, and Z coordinates (for example ). To facilitate data search, molap needs to create a dimension index in advance, which is placed in the summary file (outline) of molap.
The ROLAP mode is on the left and molap mode is on the right. The two correspond to the same 3D model.

Holap: Hybrid Online Analytical Processing
So far, there is no formal definition of holap. However, it is obvious that the holap structure should not be a simple combination of the molap and ROLAP structures, but an organic combination of the advantages of these two structures and technologies, which can satisfy users' various complex analysis requests.

  • For common dimension and dimension levels, use multidimensional data tables in holap to record.
  • Dimensions and data that are not commonly used are stored in a ROLAP star structure.
  • A large amount of detailed data can be stored in a relational database, while aggregation is kept in a separate molap storage.
  • Holap has a major performance between molap and ROLAP, and its technical complexity is higher than that of molap and ROLAP.
  • Holap benefits from the scalability of ROLAP and the fast computing of molap. (For example, ms SQL Server)
  • In a holap multi-dimensional data table, the data dimension is less than that in molap, and the data storage capacity is also less than that in molap.
  • Holap is slower than molap in terms of data access speed.






 

Some knowledge about OLAP-background required by the next project

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.