Overview of dimension model Data Warehouse base objects Concept

Source: Internet
Author: User

I. Measures, indicators, indicators

Metrics and dimensions constitute the main concepts of OLAP , and are measures for numerical, continuous fields stored in a fact table or a multidimensional cube. This conforms to the above meaning, there is a standard, a measure field is definitely a unified unit, for example, the number of yuan, households. If a measure field, where the measure may be the euro and possibly the dollar, then this metric cannot be aggregated.

In OLAP There is also the idea of calculating measures, with a total cost divided by the number of users, to get the average cost per household. But is this really a measure? This is not a measure of the original sense, just to address the convenience.

This is to say the indicator, the English Metric. In performance management software, there is usually this concept. Its definition can be expressed as " it is a value that represents a certain relative degree. " It differs from the concept of measurement, which is an absolute value, the result of a ruler, a sum of numbers, etc. And the indicator requires at least two measurements between the calculation to get, for example, ARPU, with income than the number of users, such as income growth rate, the monthly income than the previous month income. Of course, the calculation of the indicator may also require more than two measurements.

and Indicator literally means indicator, in the KPI , the last I is it, but in Chinese name it, always called " key performance Indicators ", and not called " indicator ", it will cause some confusion.

We act as indicators: traffic lights, reminding pedestrians whether they are waiting or passing, alarm lights in the monitoring room, reminders of anomalies, car dashboards, reminding the driver whether the oil is adequate and how fast it is. Their role is to pass on a macro message that motivates people to move on to the next step. A red light stops the Green Line, and when the alarm lights up, send someone to see it. In the present common enterprise performance management software, the dashboard (in some places called the cockpit) display interface is also essential, it is with this intuitive and more symbolic indicators to reflect the operation of the enterprise.

The idea of presenting a KPI is to be expected to be based on a rough (non-detailed) message, rather than data, for the next decision. resulting in different decision behaviors must be discrete input, the simplest is a switch, is or not (such as alarm lights). if metrics and indicators are quantitative, the indicator is a qualitative one.

However, the KPIs in these systems are not exactly the indicators mentioned above, and many of the system constructs are called metric systems or indicator systems. And for an enterprise, which indicators can fully reflect the business activities, which also needs to be carefully formulated, rather than let the technical department put forward a bunch of specious indicator names, such as the number of users in the network, income, and so on, this is not a KPI.

Description of the three differences:

" Metric " is an absolute quantitative value;

" Indicators " is a relative value calculated based on two or more measurements;

" indicator " is based on measures or indicators, and according to a benchmark worth the qualitative results.

Second, the difference between the middle level and the grade of the dimension

When defining dimensions in OLAP , layers (Hierarchy) and levels are two confusingconcepts. Simply put, a layer is a classification of dimension members, which is the inclusion relationship between dimension members or dimension member properties.

A dimension must contain at least one layer. Take [ product dimension as an example, you can create a [ origin layer, you can create a [ manufacturer layer, you can also create a [ category layer. In ssas , you can not define a layer, at which point the default layer for the dimension is allmembers The layer. In the schema definition Tool for mondrian , All manual definitions are required.

A layer must contain at least one level to [ product " Dimension as an example, [ origin layer can contain province - city - county three levels, [ category layer can contain daily necessities - washing supplies - detergent three levels. Levels are defined with 2 , one defined between the attributes of a dimension member, such as [ Each member of the product Dimension has a product family, a large class, and a small class of three attributes, which defines [ classify layer level, take advantage of these three properties directly, That is, each level is a property of a member. The other is between the dimension members, such as hr , each level is a specific dimension member, that is, each level is one or more dimension members, Each level contains multiple properties. The latter level is often persisted in the database in a recursive manner.

Third, data Warehouse related terms

Data Warehouse : A data Warehouse is a collection of data that supports management decisions. The data is subject-oriented, integrated, not easily lost, and time-variant . A data warehouse is a collection of snapshots for all operating environments and external data sources. It does not need to be very precise, because it must be extracted from the operating environment on a specific time basis.

Data mart : A data warehouse is limited to a single topic area, such as a customer, department, location, and so on. Data marts can rely on data warehouses when they get data from the Data warehouse, or they do not depend on the data warehouse when they get data from the operating system.

fact : The fact is that the information unit in the Data Warehouse is also a unit in the multidimensional space, limited by the analysis unit. The fact is stored in a table (when using a relational database) or a cell in a multidimensional database. Each fact includes basic information about the fact (sales, sales, cost, gross profit, gross margin, etc.) and is related to the dimension. In some cases, when all the necessary information is stored in the dimension, the mere fact that it occurs is sufficient information for the Data Warehouse.

dimension : A dimension is a class of attributes that reflect a business, and a collection of such attributes constitutes a dimension. For example, a geographic dimension might include the level of a country, region, province, and city. A time dimension may include levels for the year, Quarter, month, week, and day.

level : An element of the dimension hierarchy. Levels describe the hierarchy of data, from the highest (most summarized) level of data to the lowest (most detailed) level (such as the Big category-classification - Small category - fine classification). The level exists only in the dimension. A level is based on a member property in a column or dimension in a dimension table.

Data Cleansing : Data that is useless to the data warehouse system or that does not conform to the data format specification is called dirty data. The process of cleaning is the process of clearing dirty data.

Data Acquisition : A part of the back-end processing in a warehouse system. Data acquisition process refers to the collection of data from the business system in relation to the parameters of the warehouse.

Data Transformation : Interpreting business data and modifying its content to conform to data Warehouse data format specifications and into data storage media in the Data warehouse. Data transformations include conversion of data storage formats and conversion of data representations (such as product code to product name conversions).

Online Analytical Processing (OLAP online analytical Processing):OLAP is a multidimensional analysis technique used to meet decision-making users in a large number of business data, To explore the regularity of business activities, the analysis of market operational trends and to assist them in making strategic development decisions. According to the storage method of data, OLAP is divided into ROLAP,MOLAP and HOLAP. In the Data environment of customer information Data Warehouse CCDW ,OLAP provides on-line analysis mechanism of drilling, drilling, slicing and rotating. Completed functions include multi-angle real-time query, simple data analysis, and supplemented by a variety of graphical display analysis results.

Data Mining : The process of discovering new information in Data Warehouse data is called data mining, and these new information is not available from the operating system.

slices : A technique used to limit the analysis space in a dimension to a subset of data in a data warehouse.

dice : A technique used to limit the analysis space in multiple dimensions to a subset of data in a data warehouse.

Star Mode : is the best design pattern for data warehouse applications. It is named for its physical representation as a central entity, typically including indicator data, radiation data, and often a dimension that facilitates the browsing and aggregation of indicator data. The result of the star Graph model is often query data structure, which can provide the optimal data structures for the fast response user's query request. star graphs also often produce a two-tier model that contains dimension data and indicator data .

Snowflake mode : Refers to an extended star chart. A star chart typically generates a two-layer structure, that is, only dimensions and indicators , and the snowflake chart generates additional layers. In the process of construction of real Data Warehouse system, it usually only expands three layers: dimension (dimension entity), indicator (indicator entity) and related descriptive data (class detail entity); a snowflake model of over three layers should be avoided in a data warehouse system . This is because they begin to be like the normalized structure that is more inclined to support OLTP applications than the non-formatted structure optimized for data warehouses and OLAP applications.

granularity: Granularity will directly determine the level of detail that a built-in warehouse system can provide for decision support. A higher granularity indicates that the data in the warehouse is thicker and, conversely, thinner. Granularity is related to the specific indicator, which is expressed in the dimension values of some of the hierarchical dimensions that describe this indicator. For example, the time dimension, the time can be divided into years, seasons, months, weeks, days and so on.

The granularity of data stored in a data warehouse model will have an impact on many aspects of information systems. In fact table, what level of various dimensions as the most fine granularity, will determine whether the stored data can meet the functional requirements of information analysis, and the granularity of the hierarchical division of the aggregation table and the granularity of the selection will directly affect the query response time.

measure : In a cube, a measure is a set of values that is based on a column in a cube's fact table and is usually a number. In addition, the measure is the central value of the cube being parsed. That is, the measure is the digital data (such as sales, gross profit, cost) that the end user focuses on when they browse the cube.

Copyright NOTICE: This article for Bo Master original article, without Bo Master permission not reproduced.

Overview of dimension model Data Warehouse base objects Concept

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.