A quick understanding of Parquet DL and RL

Source: Internet
Author: User

For a detailed introduction to Parquet, please refer to the article "Next-generation Columnstore format Parquet", which describes Parquet in detail; I will not repeat that introduction here. However, its section on the definition level (DL) and repetition level (RL) is rather hard to understand, so this post gives a more accessible summary.

DL and RL are best understood through the Document object example in that article, which is excerpted as follows:

A complete example

In this section we use the Document example given in the Dremel paper, with its two records r1 and r2, to demonstrate how the repetition level and definition level are calculated. Undefined values are recorded as NULL, r denotes the repetition level, and d denotes the definition level.

This example and its explanation are clear enough, but for less experienced readers, like me, they are still quite hard to follow!

First, let me state my own understanding of DL and RL:

DL, the definition level: as the name implies, this is the depth, in the object tree for the corresponding schema, down to which the current node's path is actually defined. If the node has a value, its DL is the depth from the root to the node itself. If the value is NULL, its DL is the depth of the deepest ancestor on the path that is still defined. Only optional and repeated fields count toward this depth; a required field is always present when its parent is, so it adds nothing.
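The counting rule for DL can be sketched in a few lines of Python. This is only an illustration, not Parquet's own code: the path lists are my transcription of two columns of the Dremel paper's Document schema, and the function names are hypothetical.

```python
# Each path lists (field name, repetition type) from the root to the leaf.
# Transcribed from the Document schema in the Dremel paper (an assumption
# of this sketch, since the schema is not reproduced in this post).
CODE_PATH = [("Name", "repeated"), ("Language", "repeated"), ("Code", "required")]
COUNTRY_PATH = [("Name", "repeated"), ("Language", "repeated"), ("Country", "optional")]


def definition_level(path, defined_fields):
    """Count the non-required fields among the first `defined_fields` entries.

    `defined_fields` is how many fields along the path are actually present
    in the record, starting from the root.
    """
    return sum(1 for _, rep in path[:defined_fields] if rep != "required")


def max_definition_level(path):
    """DL of a value that is fully defined, i.e. the column's maximum DL."""
    return definition_level(path, len(path))


print(max_definition_level(CODE_PATH))     # Code is required, so it does not count
print(max_definition_level(COUNTRY_PATH))  # Country is optional, so it counts
print(definition_level(CODE_PATH, 1))      # only Name is present: a NULL entry
```

The last call models a Name that has no Language at all: the column still records a NULL, and its DL tells the reader that the path stopped being defined after Name.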

RL, the repetition level: as the name implies, this is the "repeat depth" of the current node in the object tree for the corresponding schema. It is the hardest part of the whole article to understand, so I give my own definition. Take a value that sits under a repeated node (a node in an array or list) and compare it with the previous value of the same column. The repetition depth is the level of the repeated field at which the current value repeats relative to that previous value. Put more plainly, it is the depth of the deepest repeated list that the two values share, counting only repeated fields. The first value of a column in each record has nothing before it to repeat, so its RL is 0. Admittedly, this is still a bit convoluted!
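As a rough sketch of this rule, the function below takes a column path and the name of the field at which the repetition happens, and counts the repeated fields down to that field. The path literals are transcribed from the Dremel paper's Document schema, and `repetition_level` is a hypothetical helper written for this post, not part of any Parquet library.

```python
# Paths list (field name, repetition type) from the root to the leaf,
# transcribed from the Dremel paper's Document schema (an assumption here).
CODE_PATH = [("Name", "repeated"), ("Language", "repeated"), ("Code", "required")]
FORWARD_PATH = [("Links", "optional"), ("Forward", "repeated")]


def repetition_level(path, repeating_field):
    """Return the RL of a value that repeats at `repeating_field`.

    Pass None for the first value of the column in a record (RL 0).
    Only repeated fields are counted on the way down to `repeating_field`.
    """
    if repeating_field is None:
        return 0
    level = 0
    for name, rep in path:
        if rep == "repeated":
            level += 1
        if name == repeating_field:
            if rep != "repeated":
                raise ValueError(f"{name} is not a repeated field")
            return level
    raise ValueError(f"{repeating_field} is not on the path")


print(repetition_level(CODE_PATH, None))           # first value in a record
print(repetition_level(CODE_PATH, "Language"))     # new Language, same Name
print(repetition_level(CODE_PATH, "Name"))         # new Name
print(repetition_level(FORWARD_PATH, "Forward"))   # Links is optional, not counted
```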

For example, in record r1 of the Dremel example, call the Code node with value en-us node A, the Code node with value en node B, and the Code node with value en-gb node C.

Node A is the first Code value seen in r1, so its RL is 0. Its path Name, Language, Code is three levels deep, but Code is a required field in the schema: whenever its parent Language exists, Code must have a value, so it does not count toward the definition level. Only Name and Language count, so the DL of A is 2, not 3. In general the DL only carries information about repeated and optional nodes.

Node B has the same schema path as node A. The two sit in different Language nodes under the same Name node, so they cannot be siblings inside one Language (their parents differ); the repetition happens at the Language level. The RL of B is therefore the depth of Language among the repeated ancestors, which is 2, and its DL is again 2.

Node C sits under a different Name node from A and B, so it can only repeat with them at the Name level, which gives RL = 1. Its path is fully defined, so by the same reasoning as above its DL is 2.
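The walk-through above can be checked with a small sketch that shreds only the Name.Language.Code column. The record literals are my transcription of r1 and r2 from the Dremel paper (the Links group is left out because it does not affect this column), and `shred_code_column` is a hypothetical helper specialised to this one column rather than a general striping algorithm.

```python
# r1 and r2 transcribed from the Dremel paper; Links is omitted for brevity.
r1 = {
    "DocId": 10,
    "Name": [
        {"Language": [{"Code": "en-us", "Country": "us"}, {"Code": "en"}],
         "Url": "http://A"},
        {"Url": "http://B"},
        {"Language": [{"Code": "en-gb", "Country": "gb"}]},
    ],
}
r2 = {"DocId": 20, "Name": [{"Url": "http://C"}]}


def shred_code_column(record):
    """Return (value, r, d) triples for Name.Language.Code in one record."""
    names = record.get("Name", [])
    if not names:
        # Nothing on the path is defined at all.
        return [(None, 0, 0)]
    out = []
    for i, name in enumerate(names):
        # The first Name starts the record (r = 0); later ones repeat at Name.
        r_name = 0 if i == 0 else 1
        languages = name.get("Language", [])
        if not languages:
            # Name is defined but Language is not: NULL with d = 1.
            out.append((None, r_name, 1))
            continue
        for j, lang in enumerate(languages):
            # A second or later Language under the same Name repeats at Language.
            r = r_name if j == 0 else 2
            # Name and Language are defined; required Code adds nothing to d.
            out.append((lang["Code"], r, 2))
    return out


for triple in shred_code_column(r1) + shred_code_column(r2):
    print(triple)
```

Note the NULL entries: the second Name in r1 has no Language, yet the column still records a placeholder with r = 1 and d = 1, which is what lets a reader reassemble the record and know that en-gb belongs to the third Name rather than the second.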

In summary

DL in one sentence: count how many non-required (optional or repeated) fields on the node's path are actually defined, and that count is the DL.

RL in one sentence: find the depth of the repeated field at which this node repeats relative to the previous node of the same column, and that depth is the RL.
