Basic concepts:
1. Multi-dimensional dataset (cube): the cube is the central object in Online Analytical Processing (OLAP), a technology that provides fast access to data in a data warehouse. A cube is a collection of data, usually constructed from a subset of the data warehouse, organized and aggregated into a multi-dimensional structure defined by a set of dimensions and measures.
2. Dimension: a dimension is a structural feature of a multi-dimensional dataset. Dimensions are organized hierarchies of categories and levels that describe the data in the fact table; these categories and levels describe similar sets of members, on the basis of which the user performs analysis.
3. Measure: in a multi-dimensional dataset, a measure is a set of values, usually numeric, based on a column in the cube's fact table. Measures are the central values of the cube being analyzed; that is, they are the numeric data that end users focus on when browsing the cube. Which measures you select depends on the type of information end users request. Common measures include sales, cost, expenditure, and production count.
4. Metadata: the structural model of the data and applications in the different OLAP components. Metadata describes objects such as the tables in the OLTP database and the data warehouses and multi-dimensional datasets built from them, and records which applications reference which blocks of data.
5. Level: a level is an element of a dimension hierarchy. Levels describe the hierarchy of the data, from the highest (most aggregated) level down to the lowest (most detailed) level.
6. Data mining: data mining lets you define models containing grouping and prediction rules and apply them to data in relational databases or multi-dimensional OLAP datasets. These prediction models can then be used to perform complex data analysis automatically, identifying trends that help spot new opportunities and select the opportunities that carry high risk.
7. Multi-dimensional OLAP (MOLAP): the MOLAP storage mode stores a partition's aggregations and a copy of its source data in a multi-dimensional structure on the Analysis Server computer. Depending on the percentage and design of the partition's aggregations, MOLAP offers the potential for the fastest query response times. In short, MOLAP is best suited to partitions in frequently used cubes that need fast query response.
8. Relational OLAP (ROLAP): in the ROLAP storage mode, a partition's aggregations are stored in tables of the relational database specified in the partition's data source. You can also use the ROLAP storage mode for partition data without creating aggregations in the relational database.
9. Hybrid OLAP (HOLAP): the HOLAP storage mode combines features of MOLAP and ROLAP: the partition's source data stays in the relational database, while its aggregations are stored in a multi-dimensional structure.
10. Granularity: the level or depth of data aggregation.
11. Aggregation: an aggregation is a pre-computed summary of data. Because the answers are prepared before the questions are asked, aggregations improve query response time.
12. Dice (sub-block): the portion of the data defined by members of multiple dimensions is called a dice.
13. Slice: the portion of the data defined by a single member of one dimension is called a slice.
14. Data drilling (drillthrough): the end user selects a single cell from a regular, virtual, or linked cube and retrieves a result set from that cell's source data to obtain more detailed information. This operation is data drilling.
15. Data mining model: a data mining model is the saved definition of the grouping and prediction rules described under data mining above. Once defined, the model can be applied to data in relational databases or multi-dimensional OLAP datasets to carry out complex analysis automatically.
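Several of the concepts above, measures, granularity, and pre-computed aggregations, can be illustrated with a small sketch. This is not how Analysis Services is implemented; it only shows why an aggregation built ahead of time answers a query faster than scanning raw fact rows. All names and values are illustrative assumptions.

```python
from collections import defaultdict

# Raw "fact table": one row per transaction; amount is the measure.
fact_rows = [
    {"year": 2003, "month": 1, "amount": 100},
    {"year": 2003, "month": 1, "amount": 250},
    {"year": 2003, "month": 2, "amount": 300},
    {"year": 2004, "month": 1, "amount": 50},
]

# Build the aggregation once, ahead of any query -- the answer is
# "prepared before the question is raised".
monthly_totals = defaultdict(int)
for row in fact_rows:
    monthly_totals[(row["year"], row["month"])] += row["amount"]

# A later query is a cheap lookup instead of a full scan of fact rows.
print(monthly_totals[(2003, 1)])  # 350
```

The key (year, month) here fixes the granularity of the aggregation: queries at that level or coarser can be answered from the summary alone.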
Instance construction process and analysis
1. We now use a simple example to analyze and discuss the construction of a Microsoft SQL Server data warehouse. In practice, building a data warehouse is quite complex, combining the warehouse's front-end technology with strong business requirements; the example here only illustrates the general construction process.
2. Building the data warehouse model involves two considerations. The first is what useful data the original data sources can provide, that is, which data can feed the warehouse after filtering. The second is the analysis results the company's business side requires. This calls for close cooperation with the company's senior decision makers to fully understand their business needs, because the main users of a data warehouse are precisely those senior decision makers.
A great deal of preliminary work is needed at this stage, because the data in the original database may differ greatly from the data warehouse you are about to build; the two structures are entirely different things. How do you extract your original data into a useful data warehouse? The original database stores fragmented transaction records, while the warehouse needs converted, extracted statistical data. For example, the original database may store every deposit and withdrawal record day by day; the warehouse does not care about individual records, but wants to compute the month's total deposits and withdrawals as quickly as possible. If that kind of query were run against the original database, the warehouse would lose its purpose: the sheer volume of data would make the query impractical. Instead, you convert the data meaningful to this query into the warehouse. This is data cleansing, that is, ETL. There are many cleansing methods and many details involved, such as matching data types, converting data formats, handling duplicate primary keys when data from remote tables is consolidated, and scheduling how data is loaded into the warehouse on a regular basis. In my example I did not strictly go through this step, because I have neither a standardized source database nor standardized business requirements; I simply used the star and snowflake models to create several typical data warehouse tables. The table relationships are as follows:
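The bank example above can be sketched in a few lines. This is a hedged illustration of the ETL idea, not the article's actual process: daily deposit/withdrawal records are extracted from an operational table, transformed (grouped by month), and loaded into a warehouse summary table. All table and column names are assumptions made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Source: fragmented transaction data, one row per deposit/withdrawal.
conn.execute("CREATE TABLE transactions (tx_date TEXT, kind TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        ("2004-01-05", "deposit", 500.0),
        ("2004-01-20", "withdraw", 200.0),
        ("2004-02-03", "deposit", 300.0),
    ],
)

# Target: the warehouse keeps converted, pre-summarized monthly figures,
# so the "total for this month" query never touches the raw records.
conn.execute(
    """CREATE TABLE monthly_summary AS
       SELECT substr(tx_date, 1, 7) AS month,
              kind,
              SUM(amount) AS total,
              COUNT(*)    AS tx_count
       FROM transactions
       GROUP BY month, kind"""
)

rows = list(conn.execute("SELECT * FROM monthly_summary ORDER BY month, kind"))
for row in rows:
    print(row)
```

In a real system this load step would run on a schedule and would also handle type mismatches and duplicate keys, as the text notes.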
In the diagram, FACT is the fact table; TIME, ADDRESS, and DETAIL are the time dimension, the address dimension, and the detailed-address dimension, with DETAIL a sub-dimension of ADDRESS. Together they form a snowflake model, and each table holds part of the data.
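The snowflake layout just described can be sketched as relational DDL. The column names are assumptions inferred from the cube structure listed later (YEAR/DAY, PROVINCE/CITY, STREET/MARK); the article's actual tables may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE TIME (                 -- time dimension
        time_id  INTEGER PRIMARY KEY,
        YEAR     INTEGER,
        DAY      INTEGER
    );
    CREATE TABLE DETAIL (               -- sub-dimension of ADDRESS
        detail_id INTEGER PRIMARY KEY,
        STREET    TEXT,
        MARK      TEXT
    );
    CREATE TABLE ADDRESS (              -- snowflaked: points at DETAIL
        address_id INTEGER PRIMARY KEY,
        PROVINCE   TEXT,
        CITY       TEXT,
        detail_id  INTEGER REFERENCES DETAIL(detail_id)
    );
    CREATE TABLE FACT (                 -- fact table holding the measures
        time_id    INTEGER REFERENCES TIME(time_id),
        address_id INTEGER REFERENCES ADDRESS(address_id),
        amount     REAL
    );
    """
)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['ADDRESS', 'DETAIL', 'FACT', 'TIME']
```

The snowflake shape comes from ADDRESS referencing DETAIL; in a pure star model, STREET and MARK would be folded into ADDRESS itself.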
Now that the data warehouse has been established, the next step is to create a metadata database on the OLAP server. This database differs from the one mentioned earlier: it stores metadata, such as the multi-dimensional dataset, roles, data sources, shared dimensions, and mining models we are about to create. Next, connect to the data source defined in the ODBC Data Source Administrator so that it points at the data warehouse. With these tasks complete, you can use the dimension tables in the warehouse to create shared dimensions. We take the time dimension and the address dimension as examples; the creation process is the same for both. First create the TIME dimension, then use ADDRESS and DETAIL to create a snowflake-model shared dimension, the DETAIL dimension. After a dimension is created, it must be processed before it takes effect. With the dimensions in place, you can create the multi-dimensional dataset: a dataset based on the dimension tables and the fact table that allows fast access to the data warehouse. Our cube structure is as follows:
DETAIL (STREET)
DETAIL (MARK)
ADDRESS (PROVINCE, CITY)
TIME (YEAR, DAY)
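The dimension levels above fix the granularities at which the cube can be browsed. As a minimal sketch (with invented sample members, not data from the article), the same fact data can be rolled up at the coarse TIME.YEAR level and then drilled down to (YEAR, DAY):

```python
from collections import defaultdict

# Sample fact rows: (year, day, province, city, amount).
facts = [
    (2004, 1, "Guangdong", "Guangzhou", 10.0),
    (2004, 1, "Guangdong", "Shenzhen", 20.0),
    (2004, 2, "Guangdong", "Guangzhou", 5.0),
]

def rollup(rows, key):
    """Aggregate the amount measure over the chosen dimension levels."""
    totals = defaultdict(float)
    for r in rows:
        totals[key(r)] += r[4]
    return dict(totals)

by_year = rollup(facts, key=lambda r: r[0])              # coarse level
by_year_day = rollup(facts, key=lambda r: (r[0], r[1]))  # drilled down

print(by_year)      # {2004: 35.0}
print(by_year_day)  # {(2004, 1): 30.0, (2004, 2): 5.0}
```

Moving from the first grouping to the second is exactly the drill-down operation defined among the basic concepts: the same measure at a finer granularity.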
A simple data warehouse architecture has now been established. We use front-end analysis tools to query it and see whether it meets our simple business requirements; Excel is used as the query tool first. Besides ready-made tools such as Excel and English Query, we can also query OLAP directly with MDX functions.
So far, a simple data warehouse has been created and some simple business queries can be run against it. This example has mainly analyzed the creation process of a data warehouse, deepening our understanding of data warehouses and of the basic concepts involved.