Why a Java binary tree cannot be uniquely reconstructed from its pre-order and post-order traversals alone

Source: Internet
Author: User

While studying data structures and algorithms in Java recently, I came across the ways of traversing a tree. In the process of trying to understand them, I read an article that looks at the problem in depth, from the point of view of information theory. The link and the content of the article are posted here.

"Article source" http://www.binarythink.net/2012/12/binary-tree-info-theory/

When we learn about binary trees, we inevitably learn the three ways of traversing them: pre-order traversal (root-left-right), in-order traversal (left-root-right), and post-order traversal (left-right-root). Each of these traversals converts a binary tree into a string sequence, which makes the tree easy to store on a computer. The reverse direction is where the trouble starts: we cannot infer a unique binary tree from a single traversal, which means a traversal is a one-way encoding. Even when we are given two traversals of the same tree, things are not quite perfect: pre-order combined with in-order, or in-order combined with post-order, can be reversed to produce a unique binary tree, but pre-order combined with post-order cannot. What is the reason for this?

1. Hash functions and reverse Polish notation

A. Hash mapping

An MD5 digest looks like this: e4da3b7fbbce2345d7772b0674a318d5. Because the MD5 algorithm is so complex, it was once considered very hard to crack, so many websites used it to protect key data such as user passwords. In other words, the site's database stored only the hashed passwords, so that even if a hacker obtained the whole database by some means, the passwords would stay unknown because the 32-character strings of "gibberish" could not be restored. In 1996, however, the German cryptographer Hans Dobbertin discovered a weakness in the MD5 algorithm (a collision in its compression function), and MD5 was gradually abandoned as a function for storing passwords. The flaw in question is that several different original texts (that is, unhashed texts) can yield exactly the same character sequence after MD5 processing. In cryptography this phenomenon is called a "collision". Such collisions are what made people worry about MD5: an attacker no longer needs to know your original password, because some other string of characters might hash to the same result and log in just as your password would.

The process of mapping a large set onto a small set through an algorithm, as MD5 does, is called a "hash mapping". Anyone familiar with data structures will know it, and even someone who has only just started with Java will have some idea of what the word hash means. In Java, if you print an object directly (and its class does not override toString), the console shows the class name followed by @ and a hexadecimal number, and that number comes from the object's hashCode method. Java's hash-based collections also use this hash value as a first check when deciding whether two objects are equal, with equals making the final decision (a bit like the password check mentioned above).
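To make the idea of a collision concrete in Java terms, here is a minimal sketch. The class name HashCollisionDemo and the helper collides are made up for illustration; the strings "Aa" and "BB" are a well-known pair whose String.hashCode values coincide.

```java
final class HashCollisionDemo {

    // True when two different strings are mapped to the same 32-bit hash value.
    static boolean collides(String x, String y) {
        return !x.equals(y) && x.hashCode() == y.hashCode();
    }

    public static void main(String[] args) {
        // String.hashCode is computed as s[0]*31^(n-1) + ... + s[n-1].
        System.out.println("Aa".hashCode());       // 65*31 + 97 = 2112
        System.out.println("BB".hashCode());       // 66*31 + 66 = 2112
        System.out.println(collides("Aa", "BB"));  // true

        // Printing a plain Object shows the class name, '@', and the hash code in hex.
        Object o = new Object();
        System.out.println(o);
    }
}
```

Since there are far more possible strings than 32-bit integers, such collisions are unavoidable, which is why hash-based collections still call equals after comparing hash codes.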

From the point of view of information theory, when an arbitrary string longer than the digest (128 bits, written as 32 hexadecimal characters) is processed by the hash function, what happens is essentially a compression of the message. When the processed information cannot be decoded back into a unique message, the compression is lossy. Whenever a process makes a message more uncertain, the amount of information it carries is reduced.

B. Tree traversal and reverse Polish notation

By now we may suddenly realize that the traversal of a tree can also be seen as a hash mapping. This time we are mapping a tree with a specific structure onto a string whose length equals the number of nodes in the tree, and our hash function is the traversal order we all know. For example, we can construct the following tree: A (B (D (G () ()) (H () ())) (E () ())) (C () (F (I () ()) ())) (in this rather cumbersome notation, described in the source article, each node is written as its label followed by its left and right subtrees in parentheses, and () is an empty subtree)

Hashing this tree with a pre-order traversal gives the string ABDGHECFI. However, when we try to decode the pre-order string, we find that the result is not unique. For example, take the binary tree A (B (D (G () ()) (H () ())) (E () ())) (C () (F () (I () ())))

Hashing it with a pre-order traversal gives the same result, which means the two trees collide under this algorithm. According to what we said earlier, this shows that our traversal is a lossy compression algorithm: when a tree is traversed, its information is not completely preserved. So what exactly is being discarded? When a binary tree is traversed in root-left-right order, the biggest problem is that when the algorithm encounters a missing left or right subtree, it does not record the absence but simply ignores it. Yet this is information contained in the structure of the tree. In other words, this "ignoring" is the key to the loss of information. We can therefore use a new traversal algorithm in which all the information is kept, simply by marking each ignored position (denoted by a placeholder character such as #). (Please work out the algorithm in your head.)

If we run such a program on the first tree, we get the marked pre-order result ABDG##H##E##C#FI###, where # marks each empty position. This result corresponds to exactly one binary tree, the original one.
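The marked traversal described above can be sketched as follows. This is only an illustration under stated assumptions: the class MarkedPreorder, its Node type, the '#' marker, and the exact shapes of the two example trees (G and H under D, I as the left or the right child of F) are reconstructions for demonstration, not code from the source article.

```java
final class MarkedPreorder {

    static final class Node {
        final char label;
        final Node left, right;
        Node(char label, Node left, Node right) {
            this.label = label;
            this.left = left;
            this.right = right;
        }
    }

    static Node leaf(char c) { return new Node(c, null, null); }

    // Plain pre-order: empty subtrees are silently skipped (information is lost).
    static String preorder(Node n) {
        if (n == null) return "";
        return n.label + preorder(n.left) + preorder(n.right);
    }

    // Marked pre-order: every empty subtree is recorded as '#'.
    static String markedPreorder(Node n) {
        if (n == null) return "#";
        return n.label + markedPreorder(n.left) + markedPreorder(n.right);
    }

    // Decodes a marked pre-order string back into the unique tree it describes.
    static Node decode(String s) {
        int[] pos = {0};
        Node root = decode(s, pos);
        if (pos[0] != s.length()) throw new IllegalArgumentException("trailing characters");
        return root;
    }

    private static Node decode(String s, int[] pos) {
        char c = s.charAt(pos[0]++);
        if (c == '#') return null;
        Node left = decode(s, pos);
        Node right = decode(s, pos);
        return new Node(c, left, right);
    }

    // Assumed example trees: identical except that I hangs left or right of F.
    static Node firstTree() {
        Node b = new Node('B', new Node('D', leaf('G'), leaf('H')), leaf('E'));
        Node c = new Node('C', null, new Node('F', leaf('I'), null));
        return new Node('A', b, c);
    }

    static Node secondTree() {
        Node b = new Node('B', new Node('D', leaf('G'), leaf('H')), leaf('E'));
        Node c = new Node('C', null, new Node('F', null, leaf('I')));
        return new Node('A', b, c);
    }

    public static void main(String[] args) {
        System.out.println(preorder(firstTree()));        // ABDGHECFI
        System.out.println(preorder(secondTree()));       // ABDGHECFI (collision)
        System.out.println(markedPreorder(firstTree()));  // ABDG##H##E##C#FI###
        System.out.println(markedPreorder(secondTree())); // ABDG##H##E##C#F#I##
    }
}
```

The price of losslessness is visible in the lengths: a tree with n nodes has n + 1 empty positions, so the marked string is 2n + 1 characters long instead of n.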

At this point many readers will think of a counter-example: reverse Polish notation. This is a way of writing expressions in which the operands come first and the operator follows them. For example, the infix expression 3 + 4 is written as 3 4 + (doesn't it remind you of Scheme? :-)). The beauty of this notation is that it is easy for a computer to evaluate with a stack, and for humans its simplicity lies in being completely free of parentheses. For example, the infix expression (3 - 4) * 5 can be written as 3 4 - 5 *, and 3 - 4 * 5 is written as 3 4 5 * -. So we seem to have a problem: in an infix expression every bit of information is necessary (including the parentheses), so how can the number of physical characters be reduced by a conversion that preserves the meaning?

Dropping the parentheses looks as if it should lose something: in infix notation there seems to be no way to tell 3 - (4 * 5) from (3 - 4) * 5 without them, yet in reverse Polish notation the same number of physical characters seems to carry more information. But if we look more closely at reverse Polish expressions, we find that the information carried by each character is not the same. In infix notation, 13 + 14 can be written directly as 13+14, which occupies 5 characters. In reverse Polish notation we cannot simply write 1314+, because we cannot tell where the boundary between the two operands lies: it might mean 1 + 314, or it might mean 131 + 4. To separate the two operands we have to add a space between them and write 13 14+, so the reverse Polish expression needs 6 characters.
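The stack-based evaluation mentioned above can be sketched in a few lines of Java. This is a minimal illustration, assuming tokens are separated by whitespace (exactly the separator the text argues is needed) and integer arithmetic; the class name RpnDemo is made up.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class RpnDemo {

    // Evaluates a whitespace-separated reverse Polish expression over integers.
    static long evaluate(String expr) {
        Deque<Long> stack = new ArrayDeque<>();
        for (String tok : expr.trim().split("\\s+")) {
            switch (tok) {
                case "+": case "-": case "*": case "/": {
                    long right = stack.pop(); // the operand pushed last is the right-hand side
                    long left = stack.pop();
                    stack.push(apply(tok, left, right));
                    break;
                }
                default:
                    stack.push(Long.parseLong(tok));
            }
        }
        if (stack.size() != 1) throw new IllegalArgumentException("malformed expression: " + expr);
        return stack.pop();
    }

    private static long apply(String op, long left, long right) {
        switch (op) {
            case "+": return left + right;
            case "-": return left - right;
            case "*": return left * right;
            default:  return left / right;
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate("3 4 - 5 *")); // (3 - 4) * 5 = -5
        System.out.println(evaluate("3 4 5 * -")); // 3 - 4 * 5 = -17
        System.out.println(evaluate("13 14 +"));   // 27
        System.out.println(evaluate("1 314 +"));   // 315: same digits, different split
    }
}
```

The last two calls use the same digits 1, 3, 1, 4; only the position of the space decides which sum is meant.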

C. Summary

Now we understand that in reverse Polish notation, although the parentheses disappear because the order of operations is forced into a "simpler" form, new ambiguity is created at the same time. In tree traversal, although all the nodes of a tree appear to be preserved on the surface, hidden information, such as whether a node has a left child, is not reflected in any single traversal. We seem to glimpse a vague conservation law, although so far we cannot quantify it. So we can make a bold guess: even if some representation combined the brevity of reverse Polish notation with the unambiguity of infix notation, or some traversal managed to restore a tree completely using only as many characters as there are nodes, we would not need to be overly excited about its cleverness, because it would inevitably require more complex parsing rules. The total amount of information involved may not be exactly what it was originally, but it is at least not as small as our ideal case would suggest.

2. Information redundancy and doubts

A. What makes a binary tree?

When we study data structures, we inevitably meet a theorem: given a tree's pre-order and in-order traversals, or its in-order and post-order traversals, the whole tree can be restored. Some teachers also mention that the pre-order and post-order traversals alone are not enough. We can do a small experiment here: given the pre-order traversal "ABDE" and the post-order traversal "DBEA", let us try to restore the tree. We find that there are two legal trees: A (B (D () ()) ()) (E () ())

and A (B () (D () ())) (E () ()).
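The small experiment can also be run mechanically. The sketch below is a brute-force enumeration written for illustration (the class PrePostAmbiguity and the method trees are hypothetical names, and node labels are assumed to be distinct): it tries every possible size for the left subtree and prints each consistent tree in the parenthesized notation used in this article.

```java
import java.util.ArrayList;
import java.util.List;

final class PrePostAmbiguity {

    // Returns every binary tree (in "X (left) (right)" notation) whose pre-order
    // and post-order traversals equal the given strings. The empty tree is "".
    static List<String> trees(String pre, String post) {
        List<String> out = new ArrayList<>();
        if (pre.length() != post.length()) return out;
        if (pre.isEmpty()) {
            out.add("");
            return out;
        }
        int n = pre.length();
        // The root is first in pre-order and last in post-order.
        if (pre.charAt(0) != post.charAt(n - 1)) return out;
        String restPre = pre.substring(1);
        String restPost = post.substring(0, n - 1);
        // Try every size k for the left subtree; the remainder is the right subtree.
        for (int k = 0; k <= n - 1; k++) {
            List<String> lefts = trees(restPre.substring(0, k), restPost.substring(0, k));
            if (lefts.isEmpty()) continue;
            List<String> rights = trees(restPre.substring(k), restPost.substring(k));
            for (String l : lefts) {
                for (String r : rights) {
                    out.add(pre.charAt(0) + " (" + l + ") (" + r + ")");
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String t : trees("ABDE", "DBEA")) {
            System.out.println(t);
        }
    }
}
```

For "ABDE" and "DBEA" the enumeration yields exactly the two trees above: D may hang on either side of B, and neither traversal records which.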

These are all just different traversal orders, so why can pre-order and post-order not complement each other completely, while pre-order and in-order together determine a unique tree?

With the brief introduction to information theory above, we can already guess roughly what is going on. Pre-order with in-order, and in-order with post-order, can each be restored to a unique binary tree, which shows that each pair contains enough information to cover everything in the tree. So why not pre-order with post-order? This is easier to understand for readers with some linear algebra, who will recognize the following phenomenon: column vectors a and b are linearly independent (that is, neither is a scalar multiple of the other), and vectors b and c are linearly independent, but we cannot conclude that a and c are linearly independent, because linear independence is not transitive. When a and c are linearly dependent, they need not be identical, but most of the information they carry is repeated. As a result, although a, b, and c are three different vectors, the information they carry amounts to that of a and b plus a small extra "amount" from c, namely the part that a and c do not share.

We can also explain this more plainly with sets. Let card(A) denote the cardinality of set A (that is, the number of elements in A). We know the formula $\mathrm{card}(A \cup B) = \mathrm{card}(A) + \mathrm{card}(B) - \mathrm{card}(A \cap B)$. The reason is that A and B share some elements, and because the elements of a set are unique, only one copy of each shared element is kept.

This is why pre-order and post-order traversals together cannot determine a complete binary tree. If we try to restore a binary tree from pre-order plus in-order, or from in-order plus post-order, we find that every bit of the data is used (you have probably practiced this enough in homework). For example, we first take the first element of the pre-order sequence as the root node, then find it in the in-order sequence to determine the ranges of the left and right subtrees, and then apply the same procedure recursively. Although, as mentioned in the first section, the left or right subtree of a node may be empty, the in-order sequence is split into three segments (the part to the left, the node itself, and the part to the right), so it is clear which elements belong to the left subtree and which to the right. With pre-order plus post-order, the flaw shows up at the very first step. The first element of the pre-order traversal is the root of the whole tree. But the last element of the post-order traversal is also the root of the whole tree. In other words, there is no "find" step here, because the last node of the post-order traversal is already the same as the first node of the pre-order traversal. So although the three combinations of traversals contain the same number of characters, the pre-order and post-order pair carries redundant (duplicated) information and therefore less useful information.
Thus, if pre-order plus in-order is exactly equivalent to a complete binary tree, with no more and no less information than needed, then pre-order plus post-order, which carries less information, naturally cannot contain the full information of the tree.
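The restore procedure from pre-order plus in-order described above can be sketched as follows. This is a minimal illustration that assumes all node labels are distinct and returns the tree in the article's parenthesized notation rather than as node objects; the class name RebuildFromPreIn is made up.

```java
final class RebuildFromPreIn {

    // Rebuilds the unique tree with the given pre-order and in-order traversals
    // and returns it as "X (left) (right)"; the empty tree is "".
    static String rebuild(String pre, String in) {
        if (pre.length() != in.length()) {
            throw new IllegalArgumentException("traversals differ in length");
        }
        if (pre.isEmpty()) return "";
        char root = pre.charAt(0);   // the first pre-order element is the root
        int k = in.indexOf(root);    // the "find" step: locate the root in the in-order string
        if (k < 0) throw new IllegalArgumentException("root missing from in-order: " + root);
        // in[0..k) is the left subtree, in[k+1..) is the right subtree,
        // so the left subtree has exactly k nodes.
        String left = rebuild(pre.substring(1, 1 + k), in.substring(0, k));
        String right = rebuild(pre.substring(1 + k), in.substring(k + 1));
        return root + " (" + left + ") (" + right + ")";
    }

    public static void main(String[] args) {
        // Same pre-order "ABDE", two different in-orders, two different trees.
        System.out.println(rebuild("ABDE", "DBAE")); // A (B (D () ()) ()) (E () ())
        System.out.println(rebuild("ABDE", "BDAE")); // A (B () (D () ())) (E () ())
    }
}
```

The position k of the root inside the in-order string is the piece of information that the pre-order and post-order pair lacks: there the root sits at a fixed end of both strings, so locating it tells us nothing about how the remaining nodes split.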

B. Remaining doubts

But this explanation is not complete, because not every pair of pre-order and post-order traversals fails to determine a unique binary tree. For example, there is only one tree whose pre-order traversal is ABCDE and whose post-order traversal is CDBEA: A (B (C () ()) (D () ())) (E () ()). Restoring this tree really comes down to restoring the subtree with pre-order BCD and post-order CDB. A little analysis shows that this subtree is unique, because C marks where the left subtree ends, and the left subtree contains exactly one element, which fixes the shape of the tree. If we change the post-order to DCB, the result is very different and the tree returns to a state of uncertainty.
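One way to see when the ambiguity appears is a standard counting argument, which is not part of the original article: a node has exactly one child precisely when the element that follows it in pre-order (its first child) is also the element that precedes it in post-order (its last child), and each such node doubles the number of possible trees because its only child may sit on the left or on the right. The sketch below assumes distinct labels and a consistent pair of traversals; the names PrePostCount, singleChildNodes, and treeCount are made up.

```java
final class PrePostCount {

    // Counts nodes with exactly one child, using only the two traversal strings.
    static int singleChildNodes(String pre, String post) {
        int count = 0;
        for (int i = 0; i + 1 < pre.length(); i++) {
            char parent = pre.charAt(i);
            char next = pre.charAt(i + 1);   // first child of parent, if parent has children
            int j = post.indexOf(next);
            // If next is immediately followed by parent in post-order, it is also
            // the last child of parent, so parent has exactly one child.
            if (j >= 0 && j + 1 < post.length() && post.charAt(j + 1) == parent) {
                count++;
            }
        }
        return count;
    }

    // Number of binary trees sharing the given pre-order and post-order traversals.
    static long treeCount(String pre, String post) {
        return 1L << singleChildNodes(pre, post);
    }

    public static void main(String[] args) {
        System.out.println(treeCount("ABCDE", "CDBEA")); // 1: no node has a single child
        System.out.println(treeCount("ABDE", "DBEA"));   // 2: B has a single child D
        System.out.println(treeCount("BCD", "CDB"));     // 1
        System.out.println(treeCount("BCD", "DCB"));     // 4: a chain B, C, D
    }
}
```

Under this view, the pair of traversals is unambiguous exactly when every node has zero or two children, which matches the ABCDE and CDBEA example.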

At this point we know that even though some information is lost in the pre-order and post-order pair, we can still recover the complete tree in some special cases. This means there may be some redundancy in the pre-order sequence or in the post-order sequence, but a redundancy that exists in a subtle way. Of course there is another possibility: that this way of understanding binary trees was unreliable from the start. That would not be a problem with information theory itself, but with whether the "information" I have been guessing at in this article corresponds strictly to the physical characters. Because I have no background in information theory and repeated Google searches did not turn up a satisfactory answer, I can only offer this very limited speculation and reasoning.

C. Summary

In fact, I never expected to run into this final problem, but it made me realize that information may be a complicated thing. For one of our traversal-generated strings, the information it carries is the arrangement of the individual letters, yet these arrangements hold a great many meanings. There are times when I think I have dug enough information out of a single string, but when another string from another traversal order appears, I find that the similarities and differences between them show up in a part I had not considered. It is tempting to think this reveals something intrinsic, but how can we be sure that we have fully grasped a piece of information, with nothing left in some deeper layer? As for the relationship between information and physical bits, I have not found the right material; perhaps I did not search widely enough, or perhaps others realized long ago that this is a very complex problem. Still, we have looked at a very basic data structure from a new perspective, and in my own learning I have always liked to grasp things from a more macroscopic point of view. The feeling of connecting the inside with the outside is a good one.

Postscript

This article took a long time to write, but I learned a lot from it. I had no background in information theory, and before writing I had only a very vague impression of the subject. However, in order to force myself "from ignorance to knowing", I felt I should write such an article to strengthen my understanding of information theory. While writing it I read through Mr. Wu's "The Beauty of Mathematics" but did not find it sufficient, so I also consulted parts of MacKay's "Information Theory, Inference, and Learning Algorithms" and a discussion posted on Toplang a few years ago, and I searched Google and wikis many times. Although in the end this did not answer my final question (perhaps I did not look carefully enough), I did learn a lot, and along the way I finished my first small JS program. New ideas kept emerging as I wrote, so the article was revised, with additions and deletions, several times, which confirms what Pongba said: "when writing, new content keeps pouring out."

