ssas--Foundation--cube

Source: Internet
Author: User
Tags snowflake schema

I. Analysis Services  

Analysis Services is the data engine for decision support and BI solutions. It provides analysis data that is used in reports and clients.

It can create high-performance query structures, business logic, and KPIs (Enterprise key performance indicators) in a multipurpose data model that can be accessed by any client program that supports Analysis Services as a data source.

Creation of a multipurpose data model: Use SQL Server data Tools and select the tabular or multi-and data mining project template.

Data population for multipurpose data models: typically data warehouses

Use more as a data model: deploy it in an Analysis Services instance that runs the database in a specific server mode. And make the data available to authorized users of the application connection.

Examples of Analysis Services:

    • Tabular instances, running tabular models
    • Multi-and data mining instances, running OLAP multi-and data mining models (default)
    • PowerPivot for SharePoint runs the PowerPivot or Excel data model in SharePoint. 

Ii. SSAS Architecture

SSAS uses server components and client components to provide online analytical processing (OLAP) and data mining capabilities for business intelligence programs.

    • Server components: The server component of SSAS is Application Msmdsrv.exe, which is implemented as a Microsoft Windows service. Multiple instances from one computer are supported, and each Analysis Services instance is implemented as a separate instance of Windows service. The application contains many components, including XMLA listener components, query processor components, and so on.
    • Client components: Communicate with Analysis Services using XMLA (XML for analysis). XMLA SOAP-based protocol for issuing commands and receiving responses. Query Language: SQL,MDX (an industry standard query language for analysis), DMX (query language for the data mining industry)
    • OLAP Support: Allows users to design, create, and manage multidimensional structures of data aggregated by multiple data sources. Multidimensional Analysis is the core of OLAP.
    • Data mining capabilities: Allows users to design, create, and manipulate data mining models.

Three, the basic concept of multidimensional data analysis:

  Observation variables: such as number 1000

 Measures: Numbers without context are data, not information, and the first thing to do when looking for digital information is to determine the metrics of the data, such as sales, sales, and so on. By adding tags, numbers change from data to information, which is metadata, and one way to convert data into information is to add metadata. the table in which the measure is located is the fact table, and there can be only one fact table in the regular cube structure.

Dimension: People observe the angle of the objective world, is a kind of high-level classification. Dimensions contain hierarchical relationships, which can be used as multiple dimensions for several important attributes. such as time dimension, Product dimension, model dimension and so on. Time dimension, the product sales volume changes over time. The table that contains the dimension information is called the dimension table.

Dimension members: One of the values of a dimension is called a dimension member. such as a certain day of the year on a Time dimension. A dimension member does not have to take a value at each dimension level, such as a year of the month.

Hierarchies: The collection of dimension members and their relative positions. such as the time dimension of the month and day three levels. The street number of the metropolitan area of the state city in the geographical dimension.

Cube: A collection of data that is constructed from a subset of the Data warehouse.

  

Dimension table: A table that contains information about a dimension. A dimension table consists of a primary key and a dimension attribute, which is the column property of the dimension table; Product dimension table: prod_id, Product_Name, Category, Color, Size, Price Time dimension table: TimeKey, Season, year, Month, Date. The primary key of the dimension table is the integer value, in order to save the storage space of the fact table.

Fact table: The table that contains the measure values. such as Sales Fact table: prod_id (Reference Product dimension table), TimeKey (Reference Time dimension table), SalesAmount (total sales, in currency), Unit (volume)

Iv. multidimensional data model and structure

At present, the main data Warehouse modeling is divided into two kinds, one is Entity Relationship modeling (Entity-relationship Modeling) and Dimension Modeling (Dimension Modeling). Here we discuss dimension modeling, which is the core of the Data Warehouse, and the model is very important.

1. Conceptual model

  The conceptual model is also what is often called demand analysis. That is, in the process of communicating with the user, determine the information that the Data warehouse asks for access, including current, future, and historical-related data.

  The goal of the conceptual model is to create a topic-oriented information package diagram. Because the hypercube lacks the intuition in the performance, especially when the dimension exceeds the three-dimensional, the data collection and the representation are more difficult. So the goal of the packet graph is to expand the hypercube on the plane, and use the two-dimensional table to represent multidimensional features. It is necessary to determine three main contents of the conceptual model using the information package diagram, which is an example of the analysis of product sales and Forecast topic:

a) determining dimensions

Get a multidimensional analysis of sales data in an e-marketplace to determine the dimensions that affect sales: time, goods, stores, customers

b) Identify categories

Analyze each dimension to determine the transfer and mapping relationship between it and the category. such as time of year, quarter, month, week, day and so on. Commodities have large categories, small categories, specific goods and so on.

c) Identification of indicators

Determine the user needs of the indicator system, the indicator is the user's most related information. According to the sales situation as the basis to determine the relevant sales indicators, such as the current sales volume, the current sales, cumulative sales, cumulative sales and so on.

2. Logical model

Using Star Graph modeling technology can establish a perfect logical model for data warehouse. A star chart is used to describe data warehouse requirements, and therefore includes three logical entities, dimensions, metrics, and detailed categories. It is known by the packet graph that each cell within a dimension is a category that represents a detailed hierarchy within that dimension.

A) Star-mode

Star schema is a multidimensional data model consisting of a fact table and a dimension table logically surrounding the fact table. The fact table is the core of the star pattern, which holds a large number of business-nature facts, and the fact table contains the metric attribute and the outer code that points to the surrounding dimension table, and one fact in the fact table points to a tuple in each dimension table. The large amount of data stored in a fact table is a measure of the object that is closely related to the subject and which the user is most concerned about. The user relies on the dimension attribute in the dimension table to query and analyze the data in the fact table and get the data that supports the decision.

b) Snowflake mode

Snowflake mode is a further extension and normalization of the star schema, which is to divide the dimension table associated with the fact table into a direct associated master dimension table and a Zevi table associated with the primary dimension table. The snowflake pattern adds hierarchy to the star pattern, which embodies the different granularity of the dimension.

c) Galaxy mode

When there is a common dimension between multiple topics, you can connect the fact table by sharing the dimension table.

Comparison of the advantages and disadvantages of snowflake and star patterns:

From the data format: Snowflake uses normalized data, and star uses inverse normalization data.

From the loading content, the star model loads the dimension tables and does not need to add a satellite model between the dimensions; the snowflake model loads the data mart.

From the level of ETL complexity: the star Model ETL is relatively simple, and can achieve a high degree of parallelism.

From the query performance point of view, in the OLTP-DW link, because the Snowflake form to do multiple table joins, performance will be lower than the star schema, but from the Dw-olap link, because the snowflake schema is more conducive to the aggregation of measures, so performance is higher than the star schema.

In terms of the complexity of the model, the star schema is simpler.

From the hierarchical concept, the snowflake architecture is closer to the structure of the OLTP system, which is more consistent with the business logic and the level is clear.

From a storage-space perspective, a snowflake schema has all the benefits of a relational data model without generating redundant data, whereas a star schema produces data redundancy.

It is generally recommended to use a star schema. Because we are in the actual project, often the most concern is the query performance problem, as for disk space is generally not a problem. Of course, a snowflake dimension can be used when the size of the data in the dimension table is enormous, the storage space is saved, or if the business logic is complex and must be clearly defined in a hierarchical concept.

V. OLAP and analytical services

Olap:olap is a technology that enables analysts to quickly, consistently and interactively observe information from all aspects in order to achieve an in-depth understanding of the data. Depending on how the data is stored, OLAP can be divided into Rolap,molap,holap

  ROLAP: A relational database to store data.

Advantages:

– No size limit– The existing relational database technology can be used.– Detailed data and profile storage can be implemented via SQL– Existing relational databases have made many optimizations for OLAP, including parallel storage, parallel queries, parallel data management, cost-based query optimization, bitmap indexing, SQL OLAP extensions (Cube,rollup), and much more Roalp speedDisadvantages:– generally slower response times– does not support read and write operations on precomputed–sql Unable to complete partial calculation? Cannot complete multiple rows of calculations? Unable to complete calculation between dimensions

MOLAP: Multidimensional databases to store data

Advantages: 

– Good performance and fast response times– Designed for OLAP– Support for high-performance decision support calculations? complex, cross-dimensional computing? Multi-user Read and write operations? Row-level calculationsDisadvantages:– Increase system complexity and increase system training and maintenance costs– Difficult to reach terabytes (only 10~20g), limited by file size in the operating system platform– Pre-calculations are required and may result in data explosion– Cannot support dynamic changes of dimensions– Lack of standards for data models and data access

ssas--Foundation--cube

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.