For a detailed introduction to Parquet, see the article "Next-generation Columnstore format Parquet". That article covers Parquet thoroughly, so I will not repeat the introduction here. Its section on the definition level (DL) and repetition level (RL), however, is hard to follow, so this post offers an easier summary.
DL and RL are best understood through an example. The Document example from that article is excerpted below:
A complete example
In this section we use the Document example given in the Dremel paper, together with its two records r1 and r2, to demonstrate how the repetition level and definition level are computed. Undefined values are recorded as NULL, r denotes the repetition level, and d denotes the definition level.
The example and its explanation are clear enough, but for slower learners like me they are still quite hard to understand!
First, let me state my own understanding of DL and RL:
DL, the definition level: as the name implies, this is the depth at which the current node is defined in the object tree, measured against the schema along the path from the root down to the node itself. If the node has a value, its DL is the depth from the root to the node itself. If the value is NULL, its DL is the depth of the deepest node on that path that is actually defined. Strictly speaking, only non-required (optional and repeated) nodes on the path are counted, since a required node is always defined whenever its parent is.
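To make the counting rule concrete, here is a minimal sketch that computes the largest DL a column can have. The schema paths are written from memory of the Document schema in the Dremel paper, and the `PATHS` table and `max_definition_level` helper are names I made up for illustration, not part of any Parquet API.

```python
# Hypothetical description of some Document columns: each entry lists the
# fields on the path from the root to the leaf with their repetition type.
PATHS = {
    "DocId": [("DocId", "required")],
    "Links.Forward": [("Links", "optional"), ("Forward", "repeated")],
    "Name.Url": [("Name", "repeated"), ("Url", "optional")],
    "Name.Language.Code": [
        ("Name", "repeated"), ("Language", "repeated"), ("Code", "required"),
    ],
    "Name.Language.Country": [
        ("Name", "repeated"), ("Language", "repeated"), ("Country", "optional"),
    ],
}


def max_definition_level(path):
    """Count the optional and repeated fields on the path; required ones add nothing."""
    return sum(1 for _, repetition in path if repetition != "required")


for column, path in PATHS.items():
    print(column, max_definition_level(path))
```

Under these assumptions Name.Language.Code tops out at 2 even though the leaf sits at depth 3, while Name.Language.Country can reach 3.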
RL, the repetition level: as the name implies, this is the "repetition depth" of the current node in the object tree, measured against the schema. This is the hardest part of the whole article to understand, so here is my own definition: for a node of repeated type (a node inside an array or list), call it node A, the repetition depth is the level at which it repeats relative to the previous node of the same schema type, call it node B. Put more plainly, it is the depth of the deepest repeated ancestor that the two nodes have in common. Admittedly, that is still a bit convoluted!
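One way to picture this "common repeated ancestor" idea is to give every value a position: the index it has inside each repeated ancestor, outermost first. The sketch below is my own illustration under that assumption (the `repetition_level` helper and the position tuples are hypothetical, not a Parquet API): the RL is simply the first level at which the position differs from that of the previous value.

```python
def repetition_level(prev_pos, cur_pos):
    """Return the RL of a value given its position and the previous value's position.

    A position is a tuple of indices, one per repeated ancestor, outermost
    first. prev_pos is None for the first value of a record.
    """
    if prev_pos is None:
        return 0  # a new record always starts at level 0
    for level, (prev_idx, cur_idx) in enumerate(zip(prev_pos, cur_pos), start=1):
        if prev_idx != cur_idx:
            return level  # this is the level at which the repetition happened
    raise ValueError("two consecutive values cannot share the same position")


# Positions (name_index, language_index) for a column nested under two
# repeated fields; a one-element tuple stands for a Name with no Language.
positions = [(0, 0), (0, 1), (1,), (2, 0)]

levels = []
previous = None
for position in positions:
    levels.append(repetition_level(previous, position))
    previous = position

print(levels)  # [0, 2, 1, 1]
```

The second value shares its outer ancestor with the first and differs only at the inner level, so it gets 2; the last two values each start a new outer element, so they get 1.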
For example, in the excerpt above, let A be the Code node with value en-us, B the Code node with value en, and C the Code node with value en-gb. A is the first Code value seen in r1 and sits at depth 3, so by plain depth counting RL = 0 and DL = 3.
B has the same schema type as the preceding node A. The two sit under different Language nodes, so they are not siblings inside one Language; they can only repeat at the Language depth. B's RL is therefore the depth of its Language ancestor, and its DL is B's own depth, giving RL = 2 and DL = 3.
Note, however, that Code is a required field in the schema, so it always has a value wherever its parent Language is defined. A required node therefore adds no definition level: DL is only meaningful for repeated and optional nodes, and only those are counted. The path Name.Language.Code contains two such nodes, Name and Language, so the maximum DL of this column is 2 (which here happens to equal its maximum RL), and the DL actually recorded for A and B is 2 rather than 3.
Similarly, C belongs to a different Name node than A and B, so it can only repeat at the Name depth. C also sits at depth 3, so plain depth counting gives RL = 1 and DL = 3; by the same reasoning as above, the recorded DL is 2.
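The walk-through above can be checked with a small sketch. The two records are written from memory of r1 and r2 in the Dremel paper, and `stripe_code_column` is a hypothetical helper hard-wired to the Name.Language.Code column; it is an illustration of the rules, not how Parquet itself is implemented.

```python
# r1 and r2 as plain Python dicts (assumed to match the Dremel paper's records).
r1 = {
    "DocId": 10,
    "Links": {"Forward": [20, 40, 60]},
    "Name": [
        {"Language": [{"Code": "en-us", "Country": "us"}, {"Code": "en"}],
         "Url": "http://A"},
        {"Url": "http://B"},
        {"Language": [{"Code": "en-gb", "Country": "gb"}]},
    ],
}
r2 = {
    "DocId": 20,
    "Links": {"Backward": [10, 30], "Forward": [80]},
    "Name": [{"Url": "http://C"}],
}


def stripe_code_column(record):
    """Return (value, r, d) triples for Name.Language.Code in one record."""
    names = record.get("Name", [])
    if not names:
        return [(None, 0, 0)]  # not even Name is defined
    rows = []
    for name_idx, name in enumerate(names):
        # The first Name starts the record (r=0); later ones repeat at Name (r=1).
        r_name = 0 if name_idx == 0 else 1
        languages = name.get("Language", [])
        if not languages:
            rows.append((None, r_name, 1))  # only Name is defined, so d=1
            continue
        for lang_idx, language in enumerate(languages):
            # The first Language inherits r from its Name; later ones repeat at Language (r=2).
            r = r_name if lang_idx == 0 else 2
            # Name and Language are defined; required Code adds nothing, so d=2.
            rows.append((language["Code"], r, 2))
    return rows


for record in (r1, r2):
    for value, r, d in stripe_code_column(record):
        print(value, r, d)
```

For r1 this yields en-us (r=0, d=2), en (r=2, d=2), NULL (r=1, d=1) and en-gb (r=1, d=2); r2 contributes a single NULL (r=0, d=1) because its only Name has no Language.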
In summary
DL in one sentence: the DL of a non-required node is the depth at which it is defined.
RL in one sentence: the RL of a repeated node is the depth at which it repeats relative to the previous node of the same type.
Quick understanding of parquet DL and RL