Problems and attempts of relational algebra (4) Hierarchical data and interaction

Source: Internet
Author: User

Abstract: This article is from Beijing Run Dry Software Technology Co., Ltd. Chairman Shang in the Tsinghua Big Data Industry Federation lectures.

When it comes to interactive computing, let's review the concept of OLAP. This word literally means online analysis, but what does online analysis actually do?

    • User guesses about what's happening
    • Calculate based on historical data to verify or falsify guesses
    • Revise the guesses based on the results of the calculations and repeat the process until a useful conclusion is reached

Business users see a number of phenomena, he will guess what is the reason, after guessing the beginning to take the historical data to see, see I guess right. Sales growth, may be a particular sales, I want to use data to verify, sales decreased, may have disaster, I want to take data validation. Guess right, may get a useful conclusion, guess wrong, I will change a guess. OLAP literally should be such a process.

But now the word OLAP has been severely narrowed. When it comes to OLAP, it refers to the cube that revolves around, rotating, slicing these. Strictly speaking, this operation is not analyzed, it has no process, it can only be regarded as the flexible presentation of the analysis results.

and the process of analysis, I have to look at the results of a step to know the next move, you can not design an end-to-end path in advance. You let the business people do their own, he will not model himself, the need for computing himself will not do, he can only look for technical staff, a lot of user agencies larger, like Mobile, bank, he looked for technical staff to help the cycle is in weeks, this can not play, and so calculate this conclusion when the market has no need. Unless you can find a person who knows both business and technology, it is rare and expensive. We'd better let the business people do many operations on their own.

In this sense, Excel is the best OLAP tool, which is much more useful than Cognos,bo. Excel like a calculator operation, I do a step, get the results and see what to do next, this is the real OLAP tools, Excel is nothing more than a small amount of data to support, the current version of the limit is 1 million lines, but flexible analysis when users often do not have that large amount of data.

The OLAP system, which is traditionally modeled, is like a cumbersome train model that runs much faster, but is too rigid. and users need more flexible car mode, I do not need to pull so much to run so fast, I need to go where I want to go.

We look at Excel, I don't care about its formatting ability, is not able to make a good-looking report out, I only care about its ability to calculate the data.

Excel's operational model can be understood as a variant of relational algebra, it has a natural order and dispersion, the correlation is weak, you have to join two tables, to use the lookup function to write, very troublesome. But this is not the focus of today's discussion, it will not unfold.

The basic model of Excel is a single-layer table, which is not closed for grouping operations. We have used excle all have this experience, you let it do a group, it will become another thing, you can not be free to the group after the various levels of the execution of single-level table action.

For example, I want to calculate a proportion of such a hierarchical group table, the formula will be difficult to fill.

Excel's $ symbol can only tube one layer, you use D36 divided by D4, the formula dragged to the next grid is not right, to the next group is even more useless, can only be hard written by hand, which is unacceptable. Other actions, such as sorting and re-filtering for the packet layer, are difficult.

The algebraic system behind Excel has no closeness to the grouping operation. We need to design an algebra that is closed to the grouping operations so that multiple layers of tables can be supported, and the result of the grouping is still the data type, so that it can be manipulated in a continuous manner.

This is a desktop tool that we developed based on this idea.

This product looks like Excel, the difference is that its model supports multi-level table, from the left side of the table can see the hierarchical information of each row, so that it can automatically deal with the level of grouping, the level of the same processing.

I want to calculate the ratio, fill in a lattice of the formula, the other same status of the lattice will be copied correctly, for example, here to calculate D35 divided by D2, the formula copied to the next group will automatically become D413 divided by D325, it will make things right.

We make an analogy between this table algebra and the arithmetic.

We can be free to add and subtract in the integer range, the result is an integer, but can not do division, the other is possible to go out, no longer is an integer, integer to Division is not closed, division in the integer set is irreversible.

If we extend the scope of the data to the rational number, it is closed to the division operation, I can be arbitrary continuous operation. But I want to redefine the subtraction operation rules for rational numbers, which are different from integers and are more complex than integers.

Similarly, in the Excel single-layer model, we are free to do the filtering, sorting, adding computed columns and other operations, but a grouping on the scope, can not continue to do, it is not closed to the grouping operation.

Coincidentally, the grouping operation is exactly called division in relational algebra. Strictly speaking, relational algebra is also closed to division, but it retains only the aggregation part, which is equivalent to the integer part of the division that is defined under integers, which is irreversible, and you do not return it, but we need a reversible grouping in the interactive operation.

We can design a closed and reversible multi-layer tabular model for the grouping operation, that is, the table after the grouping still belongs to this data type, so that the various operations can be executed continuously, complex interactive operation is feasible.

Similarly, it is necessary to redefine the rules for the operation of the original single-layer tables under the Multi-level table, such as sorting, filtering, filling in computed columns, and so on.

The actual application also needs more complex multi-layer table joins and other operations, these content is too many, will not elaborate.

Let's look at two examples of how to do this in a multi-layered tabular model.

Raw data:

Questions and calculation ideas:

This problem requires grouping to sort and calculate columns, and then sorting the group summary hierarchy.

According to the stock code group after the calculation of the situation of the continuous rise of each stock.

Calculates the maximum number of days to rise for each group and then sorts them in the group hierarchy.

The second question, the original data:

Questions and ideas:

The trouble here is that, in the 5th step, we need to break the packet and turn it into a single-layer table, which requires that the computing system is reversible for grouping. Excel is not reversible for grouping operations, and this process is not going to work.

Calculated after the top 10 per account.

The result can be obtained by knocking down the group of subjects and then re-grouping by students.

With such a multi-layered tabular model, you can make interactive data operations easier to do. This is a step forward for real OLAP.

Problems and attempts of relational algebra (4) Hierarchical data and interaction

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.