When we analyze data, most of the time we rely on three methods: trend analysis, comparative analysis, and subdivision (segmentation) analysis. But there is another method we use often: cross analysis. It is especially powerful when troubleshooting data problems. I also owe everyone an apology: the blog may not be updated very frequently, but I will try to publish at least once a month and keep the quality of the articles up. Comments and discussion are welcome as always, and I hope we can raise some interesting topics and broaden our thinking about website data analysis together.
What is cross analysis?
Cross analysis means crossing data from different dimensions and analyzing it from several angles in combination, to reveal problems that analysis of a single, independent dimension cannot find.
Cross analysis is built on the multidimensional model and the data cube. It can be regarded as a special kind of subdivision, but it differs somewhat from the usual concept of subdivision; if you are interested, read the previous article, Data cubes and OLAP. Subdivision mostly goes deeper within the same dimension, which is drill-down in OLAP. For example, going from monthly summary data to daily data is subdivision in the time dimension, and going from province-level data to a finer regional level (for example, the cities within a province) is a drill-down in the region dimension.
Cross analysis is no longer confined to one dimension. Like the cube in the Data cubes and OLAP article, it is based on the intersection of different dimensions: the time, region, and product dimensions are crossed so that we can analyze the data in each small cell of the cube. With the OLAP slice and dice operations you can, for example, view the sales of electronic products in Shanghai in March, which helps us find many problems that cannot be found in a single dimension. In short, cross analysis is a horizontal combination across different dimensions, whereas subdivision is a vertical expansion within the same dimension.
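To make the difference between drilling down and crossing dimensions concrete, here is a minimal sketch in plain Python. The records, the field names, and the helper functions `aggregate` and `select` are all made up for illustration; a real OLAP engine would do this work on the cube itself.

```python
from collections import defaultdict

# Hypothetical fact records: (month, day, region, category, sales).
FIELDS = ("month", "day", "region", "category")
RECORDS = [
    ("March", 1, "Shanghai", "Electronics", 120),
    ("March", 1, "Shanghai", "Books", 40),
    ("March", 1, "Beijing", "Electronics", 90),
    ("March", 2, "Shanghai", "Electronics", 80),
    ("March", 2, "Beijing", "Books", 30),
]


def aggregate(records, dims):
    """Sum sales grouped by the named dimensions."""
    positions = [FIELDS.index(d) for d in dims]
    totals = defaultdict(int)
    for rec in records:
        totals[tuple(rec[p] for p in positions)] += rec[4]
    return dict(totals)


def select(records, **conditions):
    """Keep records that match every dimension=value condition.

    One condition corresponds to an OLAP slice; several conditions
    on different dimensions correspond to a dice.
    """
    return [
        rec for rec in records
        if all(rec[FIELDS.index(d)] == v for d, v in conditions.items())
    ]


# Drill-down: stay in the time dimension, move to a finer grain.
monthly = aggregate(RECORDS, ["month"])
daily = aggregate(RECORDS, ["month", "day"])

# Cross analysis: fix values on several dimensions at once and
# look at the single cell where they intersect.
cell = select(RECORDS, month="March", region="Shanghai", category="Electronics")
cell_total = sum(rec[4] for rec in cell)

print(monthly)     # {('March',): 360}
print(daily)       # {('March', 1): 250, ('March', 2): 110}
print(cell_total)  # 200
```

The drill-down only ever adds finer levels of one dimension to the grouping key, while the cross query combines conditions from different dimensions.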
How cross analysis is presented
Cross analysis involves combinations of several dimensions. Both charts and tables can display it, but a chart can express only a limited amount of data and does not easily show the intersection of multiple dimensions, so charts are not very common in cross analysis and tables are the main form. The tables we normally look at are two-dimensional: the first column usually holds one dimension, such as the date, and the header lists the indicators (in fact, the set of indicators can itself be treated as a special dimension, the metric dimension), so rows and columns together form the familiar two-dimensional table. A two-dimensional table can be extended to show richer dimensions:
The layout shown above is typical of a table built for multidimensional analysis: several dimensions are placed hierarchically on the rows and columns. If only one indicator is shown, the indicator dimension does not need to be displayed. The Excel PivotTable is in fact a tool for cross analysis. I mentioned pivot tables in the earlier article on data reports and reporting, and here I reuse the raw data from the screenshot in that article. What does it look like if we arrange each dimension according to the layout above?
It looks good, and the information displayed is very rich. The left side holds the date dimension and the product dimension, and the expand button lets you collapse and expand them, much like a subdivision operation. The header at the top is split into two levels, for the region dimension and the indicators. The Excel PivotTable offers rich settings, and by default it shows summary data for each dimension, which lets us look at the data from the whole to the parts, something very useful in data analysis. So how do we use the pivot table above to cross-analyze and check whether there is an anomaly?
Work from the general to the detailed. Start with the summary data for daily sales and conversion rate: collapse the product dimension and look at the indicator totals on the far right to see the daily summary. If sales or conversion rate dropped noticeably on some day, combine the dimensions to find the cause by looking at the detail data along each of them. Expand the product dimension to see which category of products had a sales problem that day, then cross it with the region dimension to locate which category had a problem in which province. The problem is thus pinned down at the detail level, which makes it easier to diagnose and then solve. In this sense, cross analysis embodies the idea of "breaking things down" that lies at the heart of analysis.
The approach above is a fairly common way of analyzing a problem, but we can rarely locate the problem in one step; often we end up querying the database or checking many different reports on the dashboard. With a cross table, a single report is enough to locate the problem quickly, moving from the overall picture to the details with clear logic and precise positioning. Used sensibly, cross analysis helps us troubleshoot problems much more efficiently.
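The general-to-detail procedure can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: the rows, the numbers, and the helpers `totals`, `biggest_drop`, and `locate`. Comparing against the previous day is just one simple way to define a "drop".

```python
from collections import defaultdict

# Hypothetical detail rows: (day, category, province, sales).
ROWS = [
    (1, "Electronics", "Shanghai", 100), (1, "Electronics", "Beijing", 100),
    (1, "Books", "Shanghai", 50),        (1, "Books", "Beijing", 50),
    (2, "Electronics", "Shanghai", 20),  (2, "Electronics", "Beijing", 98),
    (2, "Books", "Shanghai", 51),        (2, "Books", "Beijing", 49),
]


def totals(rows, key_fn):
    """Sum sales grouped by whatever key_fn extracts from a row."""
    out = defaultdict(int)
    for row in rows:
        out[key_fn(row)] += row[3]
    return dict(out)


def biggest_drop(before, after):
    """Return the key whose total fell the most from `before` to `after`."""
    return min(before, key=lambda k: after.get(k, 0) - before[k])


def locate(rows, prev_day, day):
    """Narrow a daily drop down to one (category, province) cell."""
    prev = [r for r in rows if r[0] == prev_day]
    cur = [r for r in rows if r[0] == day]
    # Step 1: expand the product dimension and find the worst category.
    category = biggest_drop(totals(prev, lambda r: r[1]),
                            totals(cur, lambda r: r[1]))
    # Step 2: cross that category with the region dimension.
    prev_c = [r for r in prev if r[1] == category]
    cur_c = [r for r in cur if r[1] == category]
    province = biggest_drop(totals(prev_c, lambda r: r[2]),
                            totals(cur_c, lambda r: r[2]))
    return category, province


daily = totals(ROWS, lambda r: r[0])
print(daily)               # {1: 300, 2: 218}
print(locate(ROWS, 1, 2))  # ('Electronics', 'Shanghai')
```

The daily totals show only that day 2 is lower; the category and province steps are what turn that observation into a specific cell to investigate.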
The basis of cross analysis
This has to be said: cross analysis rests on the underlying data model. If that model is not well designed, cross analysis in the upper layer is hard to implement, or the limited ways in which dimensions can be crossed restrict the analysis.
From a technical perspective, cross analysis is based on a multidimensional model. The richer the dimensions of the data, the richer and more flexible the crosses that can be made, and the more effectively problems can be found. Correspondingly, the more we want to cross every dimension with every other, the higher the demands on the underlying model. So the design of the underlying data model is critical. Let's again refer to the data cube in the Data cubes and OLAP article for a simple example:
If a website analysis report contains only the date at monthly granularity and the corresponding indicators, the data is stored as one record per month. Such highly aggregated data is clearly not helpful for analysis, so we need to build something like the data cube above to obtain more detailed data. A data cube can expand the detail in two directions. One is expansion in depth, that is, subdivision within one dimension: splitting one month into days turns one record into 30. The other is horizontal expansion, the crossing of multiple dimensions, as when the cube above adds a product dimension and a region dimension. The stored data then grows from the original single time dimension to three dimensions: time, product, and region. That is the form a three-dimensional cube can show. The dimensions can of course keep growing to four, five, or N, which is feasible in theory, but three are enough for this example.
For data storage, horizontal expansion and expansion in depth have the same effect: the number of records is multiplied. Suppose the product dimension is the product category and there are 20 major categories, plus 32 provinces or municipalities. After expanding in depth and horizontally, the original 1 record per month becomes:
1 × 30 × 20 × 32 = 19,200
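The arithmetic can be checked with a short Python sketch. The category and province names are placeholders, and the count is an upper bound that assumes every combination of day, category, and province actually has a record. The second call uses an assumed figure of 1,000 items to show how quickly the count grows when a dimension has more members.

```python
from itertools import product
from math import prod

days = range(1, 31)                                # 30 days in the month
categories = [f"category_{i}" for i in range(20)]  # 20 product categories
provinces = [f"province_{i}" for i in range(32)]   # 32 provinces or municipalities

# Every cell of the cube is one (day, category, province) combination.
cells = list(product(days, categories, provinces))
print(len(cells))  # 19200


def cell_count(*dimension_sizes):
    """Upper bound on records: the product of the dimension sizes."""
    return prod(dimension_sizes)


print(cell_count(30, 20, 32))    # 19200
print(cell_count(30, 1000, 32))  # 960000
```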
And when we build a real multidimensional model, many dimensions contain far more members than in the example above. Imagine that the site has hundreds or even thousands of products or pages: once the dimensions are multiplied together, the amount of data increases sharply. A rich multidimensional cube makes analysis more convenient, but it also puts pressure on data storage and queries.
So richer and more flexible analysis requires a more complex multidimensional model or data cube, and that in turn brings greater system overhead. Google Analytics strikes a good balance between flexible data analysis and a complex data model, and this is the foundation its features rest on. GA's Advanced Segments and custom dashboards are unmatched by other free web analytics tools of the same kind, which is why we classify GA as a website data analysis tool, while most of the others count only as website data statistics tools. Because GA has a strong underlying data model and efficient computation and response, many analysis features can be built on top of it, and many of them involve cross analysis. Here are screenshots of two of these features, secondary dimension and pivot:
The new version of Google Analytics adds many exciting features, and the secondary dimension feature carries over from the old version. In the image above, traffic source is selected as the secondary dimension in the page report of the Content module, so we can see where each page's traffic comes from and how each traffic source performs on that page. We may also find some interesting patterns, such as pages whose traffic comes almost entirely from one source. For example, some of my blog articles are reached almost entirely through search engines, while others are reached mostly through direct traffic.
In GA's reports you can choose the display format in the upper right corner, and the last option is pivot. Pivot expands the table so that dimensions can be placed at different levels of the rows and columns. The example above again crosses page with traffic source, with the source dimension placed above the indicators. GA supports at most two metrics on the two dimensions here; I chose Pageviews and Bounce Rate to measure the "quantity" and "quality" of each traffic source on each page, which is also valuable for analysis.
We often use multidimensional cross analysis in our daily work without noticing it, and it is especially effective for investigating and locating problems. So we need to find better forms of presenting data that make cross analysis easier. The pivot table introduced above is the most commonly used form and a useful one, but there are too few forms like it. I wonder whether anyone knows other, more effective ways to present cross analysis.
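As a rough illustration of what such a pivot computes, the sketch below crosses page with traffic source and keeps two metrics per cell. It does not use any Google Analytics API; the rows, the page paths, and the `pivot` and `render` helpers are hypothetical. Bounce rate is assumed here to be bounces divided by entrances for the page.

```python
from collections import defaultdict

# Hypothetical pre-aggregated rows:
# (page, source, pageviews, entrances, bounces).
ROWS = [
    ("/post-a", "search", 300, 200, 150),
    ("/post-a", "direct", 20, 10, 4),
    ("/post-b", "search", 30, 20, 10),
    ("/post-b", "direct", 250, 100, 30),
    ("/post-a", "search", 100, 50, 50),  # a second batch for the same cell
]


def pivot(rows):
    """Build {page: {source: (pageviews, bounce_rate)}}."""
    acc = defaultdict(lambda: [0, 0, 0])
    for page, source, pageviews, entrances, bounces in rows:
        cell = acc[(page, source)]
        cell[0] += pageviews
        cell[1] += entrances
        cell[2] += bounces
    table = defaultdict(dict)
    for (page, source), (pageviews, entrances, bounces) in acc.items():
        # Assumed definition: bounce rate = bounces / entrances.
        rate = bounces / entrances if entrances else None
        table[page][source] = (pageviews, rate)
    return {page: dict(cols) for page, cols in table.items()}


def render(table):
    """Lay the pivot out as text, with sources above the two metrics."""
    sources = sorted({s for cols in table.values() for s in cols})
    lines = ["page".ljust(10)
             + "".join(f"{s + ' (pv, bounce)':>24}" for s in sources)]
    for page in sorted(table):
        cells = []
        for s in sources:
            pageviews, rate = table[page].get(s, (0, None))
            shown = f"{rate:.0%}" if rate is not None else "n/a"
            cells.append(f"{pageviews}, {shown}".rjust(24))
        lines.append(page.ljust(10) + "".join(cells))
    return "\n".join(lines)


table = pivot(ROWS)
print(render(table))
```

Reading across a row shows which source dominates a page ("quantity"), and the bounce rate next to it shows how well that traffic engages ("quality").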