Microsoft Data Warehouse Architecture!

Source: Internet
Author: User
Tags contains include microsoft sql server object model query
Schema | Data Microsoft Data Warehouse architecture

Absrtact: This article briefly introduces the data warehouse using the Microsoft Data Warehouse architecture, discusses the functions that the data warehouse can implement, the appropriate time to use the Data Warehouse, and how to synthesize the Data warehouse and system architecture.
Directory
Brief introduction
Data Warehouse
A cube as a data warehouse model
Making decisions using the Data Warehouse
Viewing cube fragments and programming interfaces
Microsoft Data Warehouse Architecture
Other applications of the Data Warehouse
Make the Data warehouse easy to make mistakes
Summarize
Brief introduction
Data warehousing software is already included in version 7.0 Microsoft®sql Server™ released in 1998. If you are unfamiliar with the data Warehouse, you may ask: "What can it do?" When is the best time to use a data warehouse? How can I combine a data warehouse with a system architecture? "This article will briefly describe the data warehouse using the Microsoft Data Warehouse schema.
Data Warehouse
In the relatively short history of the software industry, databases have become the basis of systems for collecting and distributing information. These databases are deeply hidden by statistics or measuring methods that strategists can study to improve the overall efficiency of the system. Data acquisition is an important part of this kind of information, the purpose is to make the correct decision according to the database content. Until recently, such efforts required expensive and cumbersome data-collection packages, or database experts who were able to accurately translate information requests into useful, more efficient queries. The costs of both programmes are extremely expensive for everyone else, except for the mega-type institutions.

For the purposes of data acquisition, common relational databases that can be effectively used in product classification, Inventory control, and order entry systems may not be the best design. Summarizing the output across tables and across databases (sometimes even across servers) can be very complex, and this complexity is avoidable. When data acquisition is required, it will be more meaningful to create a repository for data analysis queries. That's what the Data warehouse means. Information from different parts of the system is integrated into the data warehouse for easy access.
A cube as a data warehouse model
The name of the Data Warehouse repository is not perfect, but it is quite satisfying. How do I differentiate between a data warehouse cube and a geometric cube? There are several important differences between the two cubes. Data Warehouse cubes are defined by any number of dimensions (not limited to three-dimensional, and sometimes data warehouse cubes may be less than three-dimensional). The dimensions of the Data Warehouse cube are described as the length, width and height of the geometric cube. If you want, you can combine dimensions into any number of levels.

The relationship between the two dimensions can be shaped with a grid. A dimension is similar to a marker on a grid axis. The cell is the content. The content corresponds to the intersection result of each dimension of the cube. The data in the cell is a unit of measurement. The unit of measurement is the whole basis of judging the cube. If the cube is about the number of items sold, the unit of measure is the count of the number of items sold. To repeat the grid example, the unit of measurement is the number you find in the grid cell.



Figure 1: The above illustration shows the structure of the two-dimensional cube. In this example, the product and region are dimensions, and broccoli, soap, Oregon, Washington, Seattle, and Spokane are the levels of each of the dimensions. The cells that contain the different charts are the contents. The single data in the content cell is the measurement value. In this example, the cube used three units of measurement.
Dimensions and Levels
The dimensions of a data warehouse cube designed for grocery stores may include products, offers, time (business days), and regions. It may also include employee dimensions and customer dimensions (some grocery stores may have member accounts).

Levels are used to organize dimensions into smaller units as needed. Depending on the level of configuration in the cube, they may also contain other levels. For example, suppose you have a region dimension. Perhaps the grocery store is open in three states and uses the state boundary as the dividing line. Suppose the area dimension contains three levels: California, Oregon and Washington. If the store also includes other subregions (such as Seattle, Olympia, Yakima, and Spokane) in the state of Washington, these levels can be added as child levels to the Washington area even if the California and Oregon regions do not. Levels are only a convenient way to organize dimension content.
Content and unit of measurement
Content is composed of various dimensions. Locating content is similar to using a coordinate system. Just as the origin in a mathematical cube can be expressed as (x=0,y=0,z=0), the content is represented by a particular dimension combination (for example (Product=broccoli, Region=seattle, Time=wednesday)), Generate a content about selling broccoli in Seattle in Wednesday. Depending on how the cube is used, the content may display a unit of measure similar to "580 units sold" or "Sales $860.00". The meaning of the unit of measurement depends on how the cube is defined. In this case, there may be a variety of broccoli or multiple stores in the Seattle area. The value represents a summary of the definition group. The unit of measurement in a cube may be a number. For a grocery store, the unit of measurement may be product price, net sales, sales volume, commodity cost, etc.
Summary
The mathematical operation of counting and summing is one of the important reasons that data warehouse is useful, and it belongs to the summary function. After the dimension has been organized and the cube has been processed, the rollup will begin to be calculated. Typically, the cube is summarized immediately after the initial fill or after the contents of the cube have been changed.
Making decisions using the Data Warehouse
Suppose a grocery store situation. Given that a promotion has been going on for several days, the shopkeeper needs to decide whether to promote it again. Shopkeepers may have the following question: "Do you sell more products during the promotion than before the sale?" ”

In a transactional database with a common structure, the grocery store inventory system can record prices, products, sales, and promotions. Inventory systems are optimized for inserting and updating records, and may also be optimized for simple programmatic choices, such as retrieving project costs. This is not possible: the system is organized so that the generated reports can describe in detail the validity of a sale by day or by product. In fact, there is always a contradiction between systems designed for transaction effectiveness and systems designed for query effectiveness. In this case, you should use the Data warehouse. A data warehouse is a separate repository that uses relevant data from existing resources in the optimized structure.

In this case, it would be easy to answer the owner's question with the Data warehouse. By using product, promotion, and Time cube dimensions, you can generate the desired results by measuring the sum of the content records of the item sales.

The information in other systems may not even be in the same database as compared to this technique. The inventory data source may not be the same as a customer data source or an employee data source. Even if the system is in the same database, it is tedious to establish a query system that merges and summarizes the results in a way that produces the correct answer. In fact, merging data sources and rollup results is what data Warehouse software is best at.

The FoodMart sample cube is included in the Microsoft SQL Server Analysis Services software, Microsoft's data warehousing software. This is a great resource for understanding what information can be used when using the Data Warehouse. The FoodMart sample uses a grocery store as a model.
Viewing cube fragments and programming interfaces
Although building a data warehouse into multidimensional may be a simple design choice, and for the processor, executing queries that span multiple dimension generation results is not particularly complex, but the structure of multidimensional output is very difficult to display. Charts, graphs, and tables are usually displayed using two dimensions. Although there are some good three-dimensional charting tools, but the chart will become difficult to read. The common technique for viewing multidimensional output is to view a cube's two-D output "fragment" at a time. This is also how Microsoft SQL Server analysis Tool displays output.
Use DSO
Fortunately, the output is not limited to two dimensions. Microsoft SQL Server Analysis Services provides a programming interface for multidimensional Data Warehouse output: DSO, decision Support Objects (Decision Support object). DSO can be used for programmatic access to multiple dimensions.

For more information about the DSO that contains the object model and Programmer's Reference, read the Decision Support object (Decision Support objects) on MSDN Online. The link is located in http://msdn.microsoft.com/library/psdk/sql/prabout_84a4.htm (English).
Using MDX
MDX (Multidimensional extension) is the syntax designed to query multidimensional objects and data. For such systems, it is more efficient and more meaningful to use MDX than to use SQL (designed for a completely different set of objects). The syntax of an MDX query is similar to the syntax of an SQL query. Note the following MDX query, which selects the sales figures from the previous example (Broccoli sold in Seattle in Wednesday):

SELECT [Measures]. [Sales] On COLUMNS [TIME]. [Wednesday] On the ROWS from Mysalescube WHERE [Region]. [Washington]. [Seattle] and [Product]. [Vegetable]. [Broccholi]

The output of this query is a column labeled "Sales," and a grid cell marked "Wednesday" with the sales number "$860.00" in the row and at the intersection.

SQL Server Analysis Server Manager contains an interface that receives MDX queries. In addition, MDX queries can also be integrated into programs that use DSO. For more information about MDX, which includes the basic structure of a simple MDX query and the primary difference between SQL and MDX, see "MDX" on MSDN Online. The link is located in http://msdn.microsoft.com/library/psdk/sql/agmdxbasics_04qg.htm (English).
Microsoft Data Warehouse Architecture
The Microsoft Data Warehouse architecture is an open architecture that is easy to integrate into the current system. The Microsoft SQL Server DTS tool is used to import, export, and repair or transform data, if required. The schema contains an object-centric programming interface for customizing the Data Warehouse implementation. There is also a user interface, Microsoft SQL Server analysis Services Manager, that can be used to configure the Data warehouse and fill in or update the contents of the cube. It can be used to schedule tasks, monitor performance, and execute queries against the Data warehouse.

You can learn more about how to create cubes using Microsoft SQL Server analysis Services Manager from MSDN Online, as well as the principles to be aware of when designing and configuring a data warehouse. See the "How to use" article in Http://msdn.microsoft.com/library/psdk/sql/aghtintro_2vov.htm (English).
Other applications of the Data Warehouse
The Data Warehouse is a great tool for decision support, but the software has other practical uses.

Data archiving is one of the uses. Some systems may have limited storage capacity. It is therefore necessary to frequently remove old data from these systems in order to accommodate new data. If you need to archive information to save a long history report, consider storing some of the old data in the Data Warehouse.

The Data Warehouse isolates the report data from the running system. By moving query work to a more efficient system, this isolation can improve the performance of the running system. can improve security. Sensitive information is saved in the running database that will not be exposed to the query. The extraction level provided by the data Warehouse simplifies access to the statistics that are generated by the decision support application.
Make the Data warehouse easy to make mistakes
The data warehouse may make business processes significantly more complex in several ways.

They rely on other systems. If a data warehouse relies on a data source that changes (for example, the data source is no longer available or is replaced by a new system), then the process of transferring data to the data warehouse needs to be modified. Designing a process to transfer data to a data warehouse is the most time-consuming task in the process of creating a data warehouse.

The availability of useful data is critical. Sometimes data source content is not useful for decision support. If the data source is temporary, or with an undefined structure, or if the content changes too quickly, these will reduce the stability required for the data warehouse. Creating a data Warehouse does not solve problems in the useful data collection process.
Summarize
The Microsoft Data Warehouse architecture provides tools for constructing, populating, viewing, and accessing data warehouses. The basic unit of Data Warehouse software is the cube, which is an integrated information repository from existing data sources.


Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.