A Free Trial That Lets You Build Big!
Start building with 50+ products and up to 12 months usage for Elastic Compute Service
Chapter 4 Section 2 How the pivot table component processes data
One of the most important and complex aspects of the pivot table component is how it interacts with various data sources and how it operates data in a session. This section explains how the pivot control communicates with the data source and how data is transmitted and operated during a session.
The functionality of the pivot control is a bit uncertain-because most of its functionality depends on the type of the data source it is connected. It can only use two types of data sources: Table column data and multi-dimensional data. (Multidimensional data sources can also be called OLAP data sources. I will use these two terms in this book .) We will also discuss the use of XML data as a data source. Although XML data looks very similar to any other table data source used for the pivot control, it still has some special requirements to be discussed.
Table column data source
The ole db data source of any existing public data table is a table column database. Generally, they all belong to the relational database engine field. However, this category can also contain non-relational data providers as long as they have some form of text command syntax or named tables ???.
Figure 4-6 shows the initial appearance of the report after loading the data returned by a table column data source to the pivot table component. (You can also view this report by running the ttablelist.htm file under the chap04file on the optical drive .)
This report is similar to the report generated after an external data area is imported in an Excel worksheet. However, because the pivot control combines the functions of the external data area and the pivot report, you can group data based on any field in the report, and create a new total for any field. For example, you can use the "move to row area", "Move to column area", and "autocalc" toolbox buttons to convert this normal data list to a pivot report from 4 to 7.
Figure 4-6. A pivot report that contains data from a table column data source.
Figure 4-7. A PivotTable report created from a common data list.
Because the data source is a table column data source, the pivot table control can display the details of any statistical value-this means that you can expand any number and immediately view the rows that make up this number. Figure 4-8 shows an overall structure for accessing the table column data source.
Figure 4-8. Access the table column data source.
When extracting data, the pivot control first connects to the ole db Provider defined in the provider attribute of the connection string. When creating a report in the design environment, you generally select the desired provider from the list in the Data Connection Properties dialog box. The provider is a COM component in the Process hosted on the client machine. It usually uses a dedicated protocol to communicate with the data server (if there is indeed a server ). For example, the SQL server provider uses various protocols to communicate with the server. The most common protocol is named pipe. However, the provider of the Microsoft JET database needs to have access to the MDB file because the Jet Database Engine is not a client-server system.
When the pivot control is connected to the data source, it passes the content in its commandtext attribute to the provider for execution. You can use the data source segment in the property toolbox to set the commandtext attribute during the design phase, or you can use code to set it at runtime. The pivot control uses the recordset object of ADO to execute command text. Therefore, any value that can be passed to the recordset open method can be used in the commandtext attribute. These values generally include SQL statements, table names, view names, or stored procedure names. After the provider executes the statement, it returns an irowset interface of ole db to allow the client to access the data returned by the execution command.
When you operate a table column data source, the pivot table control uses ADO to immediately load the returned data to a component named Microsoft cursor engine (wce). wce is a Microsoft Data Access Component (MDAC) it provides advanced traversal, sorting, and filtering functions on any data provider. Wce loads data from the data source provider to its own memory cache. If the data contained in the cache exceeds the memory limit, the data will eventually be converted to the disk. (This prevents wce from consuming all your available system memory .) After the data is loaded into wce, the pivot control filters, sorts, and traverses the data set by communicating with wce.
When you start to group the result set based on a field, or use steps similar to the preceding method to create a statistical value, a magic thing happens. To form a crosstab chart, the pivot table control uses another data pipeline named "pivot table service component. This component is actually the client provider of the OLAP Service, but it can also create a temporary cube on the client without the data source. When you group fields or create a statistical value, the pivot_table control transmits a reference pointing to the dataset and the dimension and the statistical value descriptions in the temporary cube to the pivot_table service component. This engine creates a temporary file under the temporary folder of the Microsoft Windows operating system. Therefore, if your company's policy is not to allow controls in the web browser to create temporary files, pay attention to this problem.
Name a temporary cube File
To implement this function, David worktendyke, one of the star data developers of the pivot table component, must design a scheme to name the temporary cube file, so that it does not overwrite any existing files or affect another pivot control running in other applications. His final solution is to use the current process and thread ID and the common cub extension to form a file name.
Therefore, when you use the pivot control of a table column data source to group and create statistical values, if you see some files with strange names in your Temporary Folder, remember that it is a temporary cube File Created by the pivot control. Don't worry-these files will be automatically deleted when the control is destroyed.
When you operate table column data, the pivot table control automatically generates two time layers for each date or date/time field in the detailed data. The first layer includes the Group Interval of year, quarter, month, and day, and the second layer includes the Group Interval of year, Week, and day. (Both layers are required, because weeks cannot exactly form a month .) These automatically formed layers allow you to easily analyze time-Dimension Data and view the Data Summary corresponding to each interval.
If you plan to use the pivot control on a web page, you may also need to study how to use Remote Data Services (RDS ). This is another data access Pipeline provided by MDAC. It uses http to access the data source. When Using RDS, the client only needs the RDS provider, which is installed along with the Office Web component. The RDS provider then communicates with the actual data provider (for example, SQL Server) through the Web server, so that the original data source provider only exists on the server. For more information about apsaradb for RDS, refer to the Microsoft website http://www.microsoft.com/datato access the data volume.
Multidimensional (OLAP) Data sources
You may be very familiar with table column data sources or relational data sources, but you may not know about multidimensional (or OLAP) data sources. Before I introduce how each element in the pivot table component maps to the structure of a multidimensional database, let me briefly introduce the concept of a multidimensional database.
In relational databases, tables and relationships are the most important data structures and concepts. You can build databases by defining tables that contain one or more columns, primary keys, rules, and so on. Then, you can establish a relationship between these tables by specifying the Foreign keys of the table (matching the primary keys of other tables. Specify the Foreign keys corresponding to the primary keys of other tables to establish relationships between these tables. Once you have done this, you can execute SQL statements on the database engine. You can use Association, sorting, restriction, and grouping as needed to meet your needs.
In a multi-dimensional database, the data structure of the primary key is a cube, or, more accurately, a hypercube ). This structure is an n-dimensional matrix, which is difficult to visualize. The items in each dimension are called members. The intersection of n members forms a number. Let's look at an example that makes us feel less abstract.
Suppose we model a company's sales data in a cube. In our example, we start from two dimensions: products and customers. A two-dimensional structure is easy to visualize because it looks like a rectangle. You may have seen this rectangle when comparing two-dimensional information, such as a cross table. Figure 4-9 shows the possible appearance of a rectangle.
Figure 4-9. A two-dimensional database.
Note that the customer name is displayed in one dimension, while the product name is displayed in another dimension. The number in the central area is the sales. There is a value in the combination of any product and customer, representing the sum of the amount consumed by the customer on the product. Note that each dimension contains an additional member named all. This member represents the statistical value of all members of the current dimension (generally the sum of all members ). Therefore, the intersection of MERs. All and a product represents the total sales of the product. Similarly, the intersection of products. All and a customer represents the total consumption produced by the customer. The intersection of two all members is the total sales of all customers and products.
Now imagine adding the 3D section containing the salesperson's name to this rectangle. The structure is changed to a three-dimensional cube, as shown in Figure 4-10.
Now the three coordinates-customer, product, and sales personnel-determine each intersection or unit in the cube. Another member named all appears in the sales personnel dimension, which represents the total sales volume of all sales personnel. This structure allows you to view data from multiple perspectives to help you answer various questions. Because these values are stored in the structure, multidimensional databases can quickly access any set of these units.
Figure 4-10. A 3D database.
It is difficult to visualize four-dimensional structures, but you can imagine that you need to summarize additional data values. For example, you may need to know not only the sales of the items sold, but also the sales quantity. These multi-values create the fourth dimension that contains two members (sales quantity and sales amount of the sold item. These values are called measures in multidimensional databases; however, most data sources regard dimensions as dimensions. Figure 4-11 shows a way to visualize Four-Dimensional Data.
In a multidimensional database, all data is stored in a four-dimensional supercube. However, you can use a Four-Dimensional Data Structure as multiple three-dimensional cubes to understand the four-dimensional data structure. If you want to view the sales amount of a specific customer, product, or sales person, you should check the first cube. If you need to know the number of products sold at the same intersection, you should check the second cube. Of course, you can extend this example to show tables in the cube and the cube in the cube-but so far, otherwise you will go crazy when attempting to visualize a 16-dimensional space.
Figure 4-11. A four-dimensional database.
Most multidimensional databases also allow you to group members in a dimension. In a group, parent and child elements are implicitly specified for the member. In fact, each dimension defines one or more layers within them, each layer contains one or more levels (levels), and each level contains a series of members. This imitates the natural structure of most classified data-a product generally belongs to a related product group, and a customer typically lives in a city in a State in a country, A salesperson belongs to a certain area, and so on. For example, customer dimensions can include levels of all, countries, States, cities, and customer names. The national members may be the United States, Canada, and Mexico, while the State members may be Washington, Oregon, Great Britain-Columbia, Albert, halisco, Veracruz, and so on.
A single dimension may contain multiple layers. For example, if you have an employee dimension, you may need to calculate travel expenses based on the Department structure to understand the total costs of each manager and department supervisor, or you may need to know the total cost of all employees (such as marketing, sales, product development, or administrative personnel) engaged in a job function. Members in dimensions are the same (all employees), but they are organized to different layers and therefore create different statistical values.
Many books, magazines, reports, and a large number of papers have deeply discussed multidimensional databases. If you have purchased a multi-dimensional database, it is likely that the sub-documents of your database describe these concepts in more detail than what I will introduce here.
How does the pivot table component interact with OLAP data sources?
The Pivot Table component communicates with the OLAP data source and interacts with the table data source in a similar way. Figure 4-12 shows an overall description of the structure.
Figure 4-12. The interaction between the pivot table component and an OLAP data source.
The pivot control uses the Microsoft-defined ole db for OLAP standard, which is supported by many multidimensional databases. This model is an extension of the ole db standard. Therefore, the interaction between the pivot control and the OLAP data provider is similar to that between the control and the data provider. The control first connects to the data provider, which is also an in-process COM component hosted on the client machine. The provider determines how it communicates with a multi-dimensional database. For example, the OLAP Service uses a TCP/IP socket connection between the client and the server.
After the pivot table component connects to the OLAP data source, it can display all layers and sizes in a specified hyper-cube in the pivot table Field List window. When you drag and drop layers and dimensions into the pivot control, or when the developer uses the code to insert layers and dimensions, the pivot control is in the MDX (multidimensional expression, required query requests are generated in the query language defined by the ole db for OLAP standard and executed on the data source. Finally, the data provider returns the query result. The Pivot Table Control displays the result on the screen.
When operating OLAP data sources, the amount of data transmitted over the network is very small. The OLAP provider generally only sends the MDX query string to the server, and the server returns the name of the cell and member you see on the interface. The server only sends the statistical value back to the client, instead of the underlying details necessary to create the total value. This allows the pivot control to quickly respond to requests, improve system scalability, and support a large number of concurrent clients.
Should I create a cube or just group the table column data?
When I show people that the pivot table component can group and summarize table column data, as if it is a report from OLAP cube, people often ask me, "Why do I need to create a cube?"
The answer to this question is divided into two parts. First, using a pre-creation method, the server-based hyper-cube is often more efficient than creating a temporary cube of table column data through the pivot control. When you Group a new field in a table column, the pivot control must recreate the cube and regenerate all the statistical values. A server-based Cube only creates these statistical values once. All clients accessing the cube share these statistical values.
Second, a pre-created cube can be defined at multiple levels to create a clear path for drilling in data. When the pivot control groups relational data, it only creates a layer for the date field. It cannot know that fields such as country, state, and city are actually three levels of the same layer. In a pre-created cube, you can define these layers so that data users can easily find the information they need.
The Pivot Table component also has a special data source, a URL that returns XML data in a specific format. In ado2.1, Microsoft's Data Access Group (the group that develops MDAC) defines an XML format for saving ole db row sets. They also create a data access pipeline called persistence provider, which can read and write XML data in this format to access a row set of ole db. The pivot control can use this provider to load XML data returned from a URL to the Microsoft cursor engine. Figure 4-13 describes the structure in this case.
Figure 4-13. Use a permanent provider to load XML data to wce.
I will explain immediately that if you want to use this method, you must pass that type of connection string to the pivot control. However, for the upcoming discussion, we need to insert a paragraph. The most important information required by the pivot control is the URL from which the XML data stream can be obtained. The Pivot Table Control passes the URL to the persistent provider. The persistent provider then uses the Windows Internet service to request the returned results of this URL. Then parse the result and load it to wce, while the pivot control then starts to process the result data, just as it processes table column data.
This XML data format is specific. Unfortunately, the corresponding documents in this format are not complete. However, the simplest way to view this format is to use the Save method of the ADO recordset object to save the content of A recordset to an object in adpersistxml format. If you need to dynamically generate data in this format-for example, on a Microsoft ASP page, you can use the recordset open method to test your output. If you can load your XML data to An ADO recordset object, it will be successfully loaded into the pivot table control, the control uses the same mechanism as the ADO recordset object. You can view the source code of the solution described in Chapter 6th, where an example of generating XML data on the ASP page is provided.
Start building with 50+ products and up to 12 months usage for Elastic Compute Service