The construction and analysis of a SQL Server data warehouse
(i) Basic concepts:
1. Cubes: Cubes are the primary objects of online analytical processing (OLAP), a technology that provides fast access to data in a data warehouse. A cube is a set of data, usually constructed from a subset of the data warehouse, organized and summarized into a multidimensional structure defined by a set of dimensions and measures.
2. Dimensions: A dimension is a structural attribute of a cube. Dimensions are organized hierarchies of categories (levels) that describe the data in the fact table. These categories and levels describe collections of similar members, and users base their analysis on these member collections.
3. Measures: In a cube, a measure is a set of values, usually numeric, based on a column in the cube's fact table. Measures are the central values of the cube that are analyzed; that is, they are the numeric data that end users focus on when browsing the cube. The measures you select depend on the types of information end users request. Common measures include sales, cost, expenditures, and production count.
4. Metadata: The structural model of the data and applications in the various OLAP components. Metadata describes objects such as tables in an OLTP database and cubes in a data warehouse or data mart, and also records which applications reference which blocks of data.
5. Level: A level is an element of a dimension hierarchy. Levels describe the hierarchical structure of data, from the highest (most summarized) level to the lowest (most detailed) level of data.
6. Data mining: Data mining lets you define models that contain grouping and prediction rules, which can be applied to data in a relational database or a multidimensional OLAP data set. These prediction models can then be used to automate complex data analysis, identifying trends that help spot new opportunities and choose winning ones.
7. Multidimensional OLAP (MOLAP): The MOLAP storage mode stores a partition's aggregations, and a copy of its source data, in a multidimensional structure on the Analysis server computer. MOLAP offers the potential for the fastest query response times, depending on the percentage and design of the partition's aggregations. In short, MOLAP is best suited to partitions in frequently used cubes that require fast query response.
8. Relational OLAP (ROLAP): The ROLAP storage mode stores a partition's aggregations in tables in the relational database specified as the partition's data source. You can also use the ROLAP storage mode for a partition's data without creating aggregations in the relational database.
9. Hybrid OLAP (HOLAP): The HOLAP storage mode combines characteristics of MOLAP and ROLAP: aggregations are stored in a multidimensional structure, while the source data remains in the relational database.
10. Granularity: the level or depth at which data is summarized.
11. Aggregations: An aggregation is a precalculated summary of data. Because the answers are prepared before the questions are asked, aggregations improve query response time.
12. Dice: A partition of a cube's data defined by members from more than one dimension.
13. Slice: A partition of a cube's data qualified by a single member of one dimension (see the MDX sketch after this list).
14. Data drillthrough: When an end user selects a single cell from a regular cube, virtual cube, or linked cube and retrieves the result set from that cell's source data for more detailed information, that is data drillthrough.
15. Data mining model: A model, produced by data mining, that contains grouping and prediction rules which can be applied to data in a relational database or a multidimensional OLAP data set. Such prediction models can then be used to automate complex data analysis, identifying trends that help spot new opportunities and choose winning ones.
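To make the dimension, measure, and slice concepts concrete, here is a minimal MDX sketch. All of the names in it (the [Sales] cube, the [Measures].[Sales Amount] measure, the [Time] and [Region] dimensions, and the slicer member) are purely illustrative and are not part of the example built later in this article.

-- Measures go on one axis and a dimension level on the other;
-- the WHERE clause is the slicer that restricts the result to a single member.
SELECT
    { [Measures].[Sales Amount] } ON COLUMNS,
    { [Time].[Year].Members }     ON ROWS
FROM [Sales]
WHERE ( [Region].[Province].[Guangdong] )

Restricting the result by the single member in the WHERE clause is a slice; restricting it by members from several dimensions at once would be a dice.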
(ii) Example construction process and analysis
1. A simple example is now used to analyze and explore the construction process of an MS SQL Server data warehouse. In practice, building a data warehouse is quite complex: it combines data warehousing front-end technology with strong business requirements. This example only illustrates the general construction process.
2. Building the data warehouse model consists of two parts. The first is to consider what useful data the original data sources can provide, that is, which data can serve the data warehouse after filtering. The second is to determine what analysis results the company's business layer needs. This step should be done in close cooperation with the company's senior decision makers, with a full understanding of their business needs, because the main users of the data warehouse are the company's senior decision makers.
A great deal of up-front work happens at this stage, because the data in the original database may be very different from the data warehouse you are about to build; the two structures can be entirely different things. How do you extract the raw data into data that is useful for the warehouse? The original database holds fragmented transaction data, while the data warehouse needs transformed and refined statistics. For example, the original database stores every daily deposit and withdrawal record, but the data warehouse does not care about each individual record; it wants the month's total deposits and withdrawals in the shortest possible time. If that query were run against the original database, the point of the data warehouse would be lost, and with large data volumes the query might never finish. Instead, the meaningful data is moved into the data warehouse; this is data cleansing, that is, ETL. There are many ways to implement data cleansing and many details to handle, such as matching data types, converting data formats, resolving duplicate primary keys when tables from different sites are merged, and loading data into the warehouse regularly and on schedule. In my example there are no strict steps, because I have neither a standard source database nor formal business requirements; I simply used the star and snowflake models to build a few typical data warehouse tables.
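As a rough illustration of the monthly rollup described above, here is a minimal T-SQL sketch. The table and column names (BankTransaction, TxnDate, TxnType, Amount) are hypothetical and are not part of the example schema that follows; in SQL Server 2000 a statement like this would typically be scheduled as part of the ETL process (for example through DTS).

-- Roll up hypothetical daily transaction records into monthly totals
SELECT
    YEAR(TxnDate)  AS TxnYear,
    MONTH(TxnDate) AS TxnMonth,
    SUM(CASE WHEN TxnType = 'D' THEN Amount ELSE 0 END) AS TotalDeposits,
    SUM(CASE WHEN TxnType = 'W' THEN Amount ELSE 0 END) AS TotalWithdrawals
FROM dbo.BankTransaction                  -- hypothetical OLTP source table
GROUP BY YEAR(TxnDate), MONTH(TxnDate);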
Their table relationships are as follows: Fact in the window is the fact table, and Time, Address, and Detail are the time dimension, the address dimension, and the detailed address dimension. Detail is a sub-dimension table of Address, so together they form a snowflake model. Some of the data is shown below.
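Because the original screenshots of the table relationships and data are not reproduced here, the T-SQL below is only one plausible layout of these four tables. The key columns and the direction of the Address/Detail join are assumptions inferred from the cube structure used later (Province, City, Street, Mark, Year, Day), and the Amount measure column is hypothetical.

-- One possible snowflake layout: Fact joins Time and Detail; Detail joins Address
CREATE TABLE [Address] (
    AddressID INT PRIMARY KEY,
    Province  NVARCHAR(50),
    City      NVARCHAR(50)
);

CREATE TABLE [Detail] (                       -- sub-dimension of Address
    DetailID  INT PRIMARY KEY,
    AddressID INT REFERENCES [Address](AddressID),
    Street    NVARCHAR(100),
    Mark      NVARCHAR(100)
);

CREATE TABLE [Time] (
    TimeID INT PRIMARY KEY,
    [Year] INT,
    [Day]  INT
);

CREATE TABLE [Fact] (
    TimeID   INT REFERENCES [Time](TimeID),
    DetailID INT REFERENCES [Detail](DetailID),
    Amount   MONEY                            -- hypothetical measure column
);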
3. Now that the data warehouse has been successfully established, the next step is to create a metadata database on the OLAP server. This database is different from what we usually call a database: it is a database of metadata, such as the cubes, roles, data sources, shared dimensions, and mining models that we will create next. You then connect to the data warehouse through the data source that you established earlier in the ODBC Data Source Administrator.
I created the database mmm and the data source test, as follows:
After that is done, we can use the dimension tables in the data warehouse to create shared dimensions, taking the Time and Address dimensions as examples. The creation process is the same for the others.
The next step is to create the Time dimension; in the same way, Address and Detail are used together to create a snowflake-model shared dimension.
Click Next to create the Detail dimension. Process the dimension after it is created so that it takes effect.
Once the dimensions are created, the cube can be built. A cube is a data set based on the dimension tables and the fact table that provides fast access to the data warehouse. Our cube structure is as follows:
Detail (Street)
Detail (Mark)
Address (Province, City)
Time (Year, Day)
The cube STUDY is created as follows:
Click Next to finish creating the cube (STUDY), which is then processed as follows:
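Once the STUDY cube has been processed, it can already be browsed with a simple MDX query. The sketch below assumes the fact table exposes a count-style measure named [Measures].[Count]; that measure name is an assumption, while the dimensions follow the cube structure above.

-- Break the assumed Count measure down by year on the Time dimension
SELECT
    { [Measures].[Count] }    ON COLUMNS,
    { [Time].[Year].Members } ON ROWS
FROM [STUDY]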
Then we create a mining model.
After processing completes, the result is as follows:
A simple data warehouse architecture has now been successfully established. Next, we use front-end analysis tools to query the data warehouse we have built and see whether it can meet our simple business requirements. First, Excel is used as the query tool:
In addition to using off-the-shelf tools such as Excel and English Query, we can also use MDX functions to query OLAP directly.
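As a rough sketch of the kind of query MDX functions make possible, the example below uses TopCount against the STUDY cube. The [Measures].[Count] measure and the [2003] year member are assumptions for illustration, not values taken from the example data.

-- Top three cities by the assumed Count measure within one (assumed) year
SELECT
    { [Measures].[Count] } ON COLUMNS,
    TopCount( [Address].[City].Members, 3, [Measures].[Count] ) ON ROWS
FROM [STUDY]
WHERE ( [Time].[Year].[2003] )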
So far, a simple data warehouse has been created and used to answer some simple business queries. The purpose of this example has been to walk through the creation process of a data warehouse, deepen the understanding of data warehousing, and reinforce the basic concepts.