Java Essays: HashMap and Red-Black Trees

Last Update: 2017-12-10 Source: Internet

Author: User



Objective:

HashMap is a very common data structure: it is easy to use and fast. In this article I will analyze it in depth, so that when we use it we know not only how it works but also why.

A. Map

Let's start from the very basics: what is a map? When writing programs we often need to store and retrieve data. For small programs that handle a few hundred items, the storage and retrieval strategy hardly matters. For large data sets, however, the difference in efficiency is huge. Let's look at the options.

1. Linear Search:

Linear search is the most straightforward approach: iterate through all the data until you find the item you need. It works on linear structures such as arrays and linked lists. However, it is very inefficient for large data sets, with a time complexity of O(n).
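Here is a minimal Java sketch of linear search over an int array, for illustration only (the class name `LinearSearch` is hypothetical). It checks each element in turn, so in the worst case it touches all n elements.

```java
final class LinearSearch {
    // Scans every element in order; the worst case visits all n elements, so O(n).
    static int indexOf(int[] data, int target) {
        for (int i = 0; i < data.length; i++) {
            if (data[i] == target) {
                return i; // found: return its position
            }
        }
        return -1; // not present
    }
}
```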

2. Binary Search:

Binary search improves on linear search. Suppose I want to find a number (say 2) in "1,2,3,4,5,6,7,8":

- I first compare it with the middle element, 4.
- Since 2 is less than 4, I continue in the left part, "1,2,3", and compare 2 with its middle element, 2.
- They are equal, so the search is done.

The advantage of this method is that each comparison discards half of the remaining elements, which avoids a lot of unnecessary work. Its time complexity is O(log n). The limitation is that the data must be sorted.
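The walk-through above can be sketched as an iterative binary search in Java. This is an illustrative version with a hypothetical class name, not the JDK's own; the standard library already provides `java.util.Arrays.binarySearch`. The midpoint is computed as `lo + (hi - lo) / 2`, so searching for 2 in 1..8 probes 4 first and then 2, matching the example.

```java
final class BinarySearch {
    // Requires data to be sorted in ascending order.
    // Each comparison halves the remaining range, so the cost is O(log n).
    static int indexOf(int[] data, int target) {
        int lo = 0;
        int hi = data.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2; // avoids int overflow of (lo + hi)
            if (data[mid] == target) {
                return mid;
            } else if (data[mid] < target) {
                lo = mid + 1; // the target can only be in the right half
            } else {
                hi = mid - 1; // the target can only be in the left half
            }
        }
        return -1; // not present
    }
}
```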

3. Hash Table Lookup:

Now for the main point: the hash table. Lookup in a hash table takes O(1) time. No matter how much data you have, you can find the target with a single probe. Amazing, isn't it? Actually, the idea is quite simple. Let's take a look.

As the figure shows, the value stored in each array slot equals its index. For example, to store 11, I put it in a[11]; to find a number later, I go straight to the slot whose index is that number. This trades space for time. It can take a lot of memory, but retrieval is very fast: a single access finds the target.
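The "value equals index" table can be sketched as a direct-address table. This minimal version has a few simplifications:

- It assumes non-negative integer values no larger than a chosen maximum.
- It only records whether a value is present.
- The class name is hypothetical.

```java
final class DirectAddressTable {
    private final boolean[] present;

    // One slot per possible value: lookups are O(1), but memory grows with maxValue.
    DirectAddressTable(int maxValue) {
        present = new boolean[maxValue + 1];
    }

    void add(int value) {
        present[value] = true; // the value itself is the index
    }

    boolean contains(int value) {
        return value >= 0 && value < present.length && present[value];
    }
}
```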

4. Improving the Hash Table

Looking at the hash table above, you might ask: if I only want to store the single number 10000, do I really need to allocate up to a[10000] and waste all the other slots? No. Hash tables solve this with a hash function.

The essence of a hash table is that an element's index is computed from the element itself, so the element can be located quickly. A common hash function takes the stored number modulo (the remainder after division by) the length of the array. For example, with an array of length 9, the number 12 goes to position 12 mod 9 = 3, that is, a[3]. This way, large values fit into a small array.
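The remainder-based hash function is a one-liner. This sketch (hypothetical `ModHash` class) uses `Math.floorMod` rather than `%`, so negative keys still map to a valid, non-negative index.

```java
final class ModHash {
    // Maps any int key to a slot in [0, tableLength) by taking the remainder.
    // Math.floorMod keeps the result non-negative even for negative keys.
    static int slot(int key, int tableLength) {
        return Math.floorMod(key, tableLength);
    }
}
```

With a table of length 9, `slot(12, 9)` is 3 and `slot(10000, 9)` is 1, so even 10000 fits in a nine-slot array.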

5. Hash Collision Resolution Strategy

Having read the explanation above, you have probably spotted a problem: different numbers can have the same remainder and therefore the same address. This is called a hash collision. When there is a lot of data and only a few hash buckets, collisions become very frequent. A common way to resolve them is as follows.

In the figure, 12 and 0 collide at the same position. To handle this, each array element becomes the head of a linked list, and colliding elements are appended to that list. A lookup first finds the slot and then walks its list. A linked list is used because it saves space: its nodes are allocated one at a time and need not be contiguous in memory, so memory is used more fully.
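Separate chaining can be sketched with an array of `java.util.LinkedList` buckets. This is a simplified illustration, not how `HashMap` is written internally. The class name is hypothetical, and 12 buckets are chosen only so that 12 and 0 collide.

```java
import java.util.LinkedList;

final class ChainedHashSet {
    private final LinkedList<Integer>[] buckets;

    @SuppressWarnings("unchecked") // Java cannot create a generic array directly
    ChainedHashSet(int bucketCount) {
        buckets = new LinkedList[bucketCount];
        for (int i = 0; i < bucketCount; i++) {
            buckets[i] = new LinkedList<>(); // each slot heads its own chain
        }
    }

    // The hash function picks the slot; the chain holds every value that maps to it.
    private LinkedList<Integer> bucketFor(int value) {
        return buckets[Math.floorMod(value, buckets.length)];
    }

    void add(int value) {
        LinkedList<Integer> chain = bucketFor(value);
        if (!chain.contains(value)) { // linear scan, but only within one chain
            chain.add(value);
        }
    }

    boolean contains(int value) {
        return bucketFor(value).contains(value);
    }

    int chainLength(int value) {
        return bucketFor(value).size();
    }
}
```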

Java's HashMap

What does all this have to do with today's topic, HashMap? No more beating around the bush; let's get to it. Entries in a HashMap are key-value pairs, and they are stored much like the structure above. The key's hash determines the bucket (array slot), and the key-value entry is placed in the linked list headed at that slot. This makes lookups very fast.

The problem is that with few buckets and a lot of data, the lists become very long. Suppose I put 1000 numbers into a table of 11 slots. No matter how good the hash function is, the lists will be long, the advantage of the hash table disappears, and lookups degrade toward linear search. This is where red-black trees come in.
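For reference, OpenJDK 8's `HashMap` does not take a plain remainder. It mixes the high 16 bits of `hashCode()` into the low bits and then masks with `(n - 1)`, which works because its table length is always a power of two. The sketch below reproduces that computation in a standalone helper. The class and method names are hypothetical; the real logic lives in non-public `HashMap` code.

```java
final class BucketIndex {
    // Mirrors the spreading step OpenJDK 8's HashMap applies to hashCode():
    // XOR the high 16 bits into the low 16 bits so they influence the index.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Assumes tableLength is a power of two, as in HashMap; then
    // (tableLength - 1) & hash equals the non-negative remainder mod tableLength.
    static int indexFor(Object key, int tableLength) {
        return (tableLength - 1) & spread(key);
    }
}
```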

Red-Black Trees

Starting with JDK 1.8, Java improved HashMap. When the linked list in a bucket grows longer than 8 nodes, the bucket is converted into a red-black tree to speed up lookups. This only happens if the table already has at least 64 buckets; otherwise HashMap resizes the table instead. Let's talk about red-black trees next.
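To see the worst case that motivated this change, the sketch below builds a `HashMap` whose keys all share the same hash code, so every entry lands in one bucket. `CollidingKey` is a hypothetical test class. It implements `Comparable` so that, on JDK 8 and later, a treeified bucket can order keys that share a hash. The public API does not reveal whether a bucket has become a tree, so this only shows that lookups stay correct under heavy collisions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key type whose hashCode() is deliberately constant.
final class CollidingKey implements Comparable<CollidingKey> {
    final int id;

    CollidingKey(int id) { this.id = id; }

    @Override public int hashCode() { return 42; } // every key collides

    @Override public boolean equals(Object o) {
        return o instanceof CollidingKey && ((CollidingKey) o).id == id;
    }

    // Lets a treeified bucket order keys that share the same hash.
    @Override public int compareTo(CollidingKey other) {
        return Integer.compare(id, other.id);
    }
}

final class CollisionDemo {
    // Puts n entries that all fall into the same bucket.
    static Map<CollidingKey, Integer> build(int n) {
        Map<CollidingKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new CollidingKey(i), i * i);
        }
        return map;
    }
}
```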

AVL Tree

To understand red-black trees, you first need to know AVL trees, and to understand AVL trees, you first need binary trees. A binary tree is simple: each parent node has zero, one, or two child nodes, roughly as shown in the figure.

When we put data into a binary search tree, a number larger than the parent goes into the right subtree, and a smaller one goes into the left subtree. To look up a number, we compare it with the current node: if it is larger we go right, if smaller we go left, and we repeat recursively.

But this is not enough. With bad luck, the data may already be sorted, for example "1,2,3,4,5,6,7". The tree then becomes unbalanced and degenerates into a linked list. That is why AVL trees were introduced.
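A minimal binary search tree in Java makes the degeneration concrete (the `Bst` class is hypothetical, with int keys and duplicates ignored). Inserting 1 through 7 in sorted order produces a tree of height 7, essentially a linked list. Inserting the same keys in the order 4, 2, 6, 1, 3, 5, 7 produces a tree of height 3.

```java
final class Bst {
    private static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    private Node root;

    // Smaller keys go left, larger keys go right; duplicates are ignored.
    void insert(int key) {
        root = insert(root, key);
    }

    private Node insert(Node node, int key) {
        if (node == null) return new Node(key);
        if (key < node.key) node.left = insert(node.left, key);
        else if (key > node.key) node.right = insert(node.right, key);
        return node;
    }

    // Walks down one path, going left or right at each node.
    boolean contains(int key) {
        Node node = root;
        while (node != null) {
            if (key == node.key) return true;
            node = key < node.key ? node.left : node.right;
        }
        return false;
    }

    // Height counted in nodes: an empty tree has height 0.
    int height() {
        return height(root);
    }

    private int height(Node node) {
        return node == null ? 0 : 1 + Math.max(height(node.left), height(node.right));
    }
}
```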

The AVL tree is a self-balancing binary search tree. After every insertion, it ensures that the heights of the left and right subtrees of every node differ by at most 1. If an insertion breaks this rule, the tree is rebalanced. I won't go into the details here; there are only four cases:

- a left rotation,
- a right rotation,
- a left rotation followed by a right rotation,
- a right rotation followed by a left rotation.

As a result, both sides of every node stay balanced, so lookups keep the efficiency of binary search and never degenerate into a linked-list scan.
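The four rebalancing cases can be sketched as a compact AVL insertion in Java. This is an illustrative version with int keys and no deletion; the class and method names are hypothetical. Each node caches its height. After each insertion, any node whose subtree heights differ by more than 1 is fixed with a single or double rotation.

```java
final class AvlTree {
    private static final class Node {
        final int key;
        int height = 1; // height in nodes; a leaf has height 1
        Node left, right;
        Node(int key) { this.key = key; }
    }

    private Node root;

    private static int height(Node n) { return n == null ? 0 : n.height; }

    private static void update(Node n) {
        n.height = 1 + Math.max(height(n.left), height(n.right));
    }

    // Positive means left-heavy, negative means right-heavy.
    private static int balance(Node n) { return height(n.left) - height(n.right); }

    // Right rotation: the left child becomes the new subtree root.
    private static Node rotateRight(Node y) {
        Node x = y.left;
        y.left = x.right;
        x.right = y;
        update(y);
        update(x);
        return x;
    }

    // Left rotation: the right child becomes the new subtree root.
    private static Node rotateLeft(Node x) {
        Node y = x.right;
        x.right = y.left;
        y.left = x;
        update(x);
        update(y);
        return y;
    }

    void insert(int key) { root = insert(root, key); }

    private static Node insert(Node node, int key) {
        if (node == null) return new Node(key);
        if (key < node.key) node.left = insert(node.left, key);
        else if (key > node.key) node.right = insert(node.right, key);
        else return node; // ignore duplicates

        update(node);
        int b = balance(node);
        if (b > 1 && key < node.left.key) return rotateRight(node);  // left-left: right rotation
        if (b < -1 && key > node.right.key) return rotateLeft(node); // right-right: left rotation
        if (b > 1) {                                                 // left-right: left, then right
            node.left = rotateLeft(node.left);
            return rotateRight(node);
        }
        if (b < -1) {                                                // right-left: right, then left
            node.right = rotateRight(node.right);
            return rotateLeft(node);
        }
        return node;
    }

    int height() { return height(root); }

    boolean isBalanced() { return isBalanced(root); }

    private static boolean isBalanced(Node n) {
        return n == null
            || (Math.abs(balance(n)) <= 1 && isBalanced(n.left) && isBalanced(n.right));
    }
}
```

Inserting 1 through 7 in sorted order, which degenerated the plain binary search tree, yields a perfectly balanced AVL tree of height 3.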

2-3 Trees

There are many articles online that explain red-black trees in various ways, but I like to explain red-black trees together with 2-3 trees. Let's first look at what a 2-3 tree is.

Note: This image is from Baidu

It is actually easy to understand. Unlike an ordinary binary tree, a 2-3 tree has two kinds of nodes:

- A 2-node holds one value and has two child nodes.
- A 3-node holds two values and has three child nodes.

Now let's look at how a 2-3 tree is built.

Note: this picture is also from Baidu; my own drawings are too poor to use.

Building a 2-3 tree is actually very simple. In the figure, M is a 2-node, and EJ, to the left of M, is a 3-node. As before, larger values go to the right and smaller values to the left. Insertion works as follows:

- If a new number lands in a 2-node, it is simply added there, turning that node into a 3-node.
- If it lands in a 3-node, as in the figure where Z goes into SX, the node temporarily holds three values (S, X, Z). It is then split into two 2-nodes (S and Z), and the middle value (X) is pushed up to the parent. In the figure, X ends up next to R.
- If pushing a value up leaves the parent with too many values, the parent is split the same way, continuing upward until every node is valid.

Red-Black Trees

Red-black trees are very similar to 2-3 trees; they are essentially a variant of them.
Traditionally, a red-black tree is defined as a binary search tree that satisfies the following five properties:
(1) Every node is either red or black.
(2) The root is black.
(3) Every leaf (NIL) is black. [Note: "leaf" here means the empty (NIL or null) leaf nodes.]
(4) If a node is red, both of its children are black.
(5) For each node, every path from that node down to its descendant leaves contains the same number of black nodes.
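The five properties translate directly into a recursive check. In this sketch (hypothetical `RbChecker` class), null children play the role of the black NIL leaves, so properties (1) and (3) hold by construction. The code verifies properties (2), (4), and (5). It does not check the binary-search-tree ordering of keys.

```java
final class RbChecker {
    static final boolean RED = true;
    static final boolean BLACK = false;

    static final class Node {
        final int key;
        final boolean color;
        final Node left, right;

        Node(int key, boolean color, Node left, Node right) {
            this.key = key;
            this.color = color;
            this.left = left;
            this.right = right;
        }
    }

    // Null (NIL) leaves count as black.
    private static boolean isRed(Node n) {
        return n != null && n.color == RED;
    }

    // Property (2): the root is black; properties (4) and (5) are checked by blackHeight.
    static boolean isValid(Node root) {
        return !isRed(root) && blackHeight(root) != -1;
    }

    // Returns the number of black nodes on every path from n down to a NIL leaf
    // (counting the NIL), or -1 if property (4) or (5) is violated below n.
    private static int blackHeight(Node n) {
        if (n == null) return 1; // a NIL leaf is one black node
        if (isRed(n) && (isRed(n.left) || isRed(n.right))) return -1; // property (4)
        int left = blackHeight(n.left);
        int right = blackHeight(n.right);
        if (left == -1 || right == -1 || left != right) return -1;    // property (5)
        return left + (isRed(n) ? 0 : 1);
    }
}
```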
In other words, each node carries an extra color attribute, and the tree stays balanced through color changes and node rotations during insertion. Honestly, I am not very fond of the definition above. Another perspective is to compare the red-black tree with a 2-3 tree.

This picture, too, was found online.
Red-black trees can also be described as follows:
⑴ Red links lean left.
⑵ No node is connected to two red links at the same time.
⑶ The tree is perfectly black-balanced: every path from the root to an empty (null) link has the same number of black links.
Here the links between nodes are colored red or black instead of the nodes themselves. The two views are essentially equivalent, and the black height becomes the number of black links on a path. This is more intuitive.
In fact, every step of a red-black tree operation corresponds to an operation on a 2-3 tree. A 2-node corresponds to a node attached by black links, and the two values of a 3-node are joined by a red link.
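The link-based view is the left-leaning red-black tree introduced by Robert Sedgewick. Its insertion code mirrors the 2-3 tree steps:

- A new key joins its parent through a red link, like adding a value to a 2-node.
- `rotateLeft` and `rotateRight` fix link directions.
- `flipColors` splits a temporary 4-node by pushing the middle key up.

The Java sketch below follows that approach with int keys and no deletion. It is not the red-black tree used inside `HashMap`, which uses the classic node-colored form. Class and method names are hypothetical.

```java
final class LeftLeaningRedBlackTree {
    private static final boolean RED = true;
    private static final boolean BLACK = false;

    private static final class Node {
        final int key;
        Node left, right;
        boolean color; // color of the link from the parent to this node
        Node(int key, boolean color) { this.key = key; this.color = color; }
    }

    private Node root;

    private static boolean isRed(Node n) { return n != null && n.color == RED; }

    // Turns a right-leaning red link into a left-leaning one.
    private static Node rotateLeft(Node h) {
        Node x = h.right;
        h.right = x.left;
        x.left = h;
        x.color = h.color;
        h.color = RED;
        return x;
    }

    // Leans a left red link to the right (used when two reds are stacked on the left).
    private static Node rotateRight(Node h) {
        Node x = h.left;
        h.left = x.right;
        x.right = h;
        x.color = h.color;
        h.color = RED;
        return x;
    }

    // Splits a temporary 4-node: the middle key moves up to join its parent.
    private static void flipColors(Node h) {
        h.color = RED;
        h.left.color = BLACK;
        h.right.color = BLACK;
    }

    void insert(int key) {
        root = insert(root, key);
        root.color = BLACK; // the root never hangs off a red link
    }

    private static Node insert(Node h, int key) {
        if (h == null) return new Node(key, RED); // new keys attach via a red link
        if (key < h.key) h.left = insert(h.left, key);
        else if (key > h.key) h.right = insert(h.right, key);

        if (isRed(h.right) && !isRed(h.left)) h = rotateLeft(h);
        if (isRed(h.left) && isRed(h.left.left)) h = rotateRight(h);
        if (isRed(h.left) && isRed(h.right)) flipColors(h);
        return h;
    }

    boolean contains(int key) {
        Node n = root;
        while (n != null) {
            if (key == n.key) return true;
            n = key < n.key ? n.left : n.right;
        }
        return false;
    }

    // Checks rules (1) to (3) from the text.
    boolean isValid() { return check(root) != -1; }

    // Returns the black-link count below n, or -1 if a rule is broken.
    private static int check(Node n) {
        if (n == null) return 0;
        if (isRed(n.right)) return -1;            // rule (1): red links lean left
        if (isRed(n) && isRed(n.left)) return -1; // rule (2): no node touches two red links
        int l = check(n.left);
        int r = check(n.right);
        if (l == -1 || r == -1 || l != r) return -1; // rule (3): perfect black balance
        return l + (isRed(n) ? 0 : 1);
    }
}
```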

Advantages of Red-Black Trees

Compared to an AVL tree, a red-black tree offers almost the same lookup efficiency, since both stay balanced and are searched like a binary search. However, insertions and deletions, especially deletions, need fewer rebalancing operations.

Unlike the AVL tree, a red-black tree does not insist on strict balance; it tolerates a small amount of imbalance. This costs very little lookup efficiency but saves a lot of unnecessary rebalancing, which in an AVL tree is sometimes expensive. That is why red-black trees are used under the hood in so many places.

Summary

Internally, HashMap is an array of buckets, and each bucket holds a linked list or, once it grows long, a red-black tree. It combines the low memory overhead of linked lists with the efficient lookup of red-black trees, which makes it a very pleasant data structure to use.

A Small Bonus

I once hand-wrote an AVL tree in C++ for a sophomore data structures course project; anyone interested can use it as a reference.

Coming Next

Redis and Memcache Cache


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
