Data Structures and Algorithms 07: Hash Tables

Last Update:2016-04-20 Source: Internet

Author: User


A hash table is a data structure that is accessed directly by key value. That is, it maps a key to a location in a table so that records can be found quickly. The mapping function is called a hash function, the mapping process is called hashing, and the array that stores the records is called a hash table. For example, we can use the following method to map a key to an array subscript: arrayIndex = hugeNumber % arraySize.
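As a minimal sketch of the modulo mapping above, the following Java snippet maps a few keys to array subscripts. The class name ModuloHashDemo and the sample keys are illustrative assumptions, not part of the original article. Math.floorMod is used in place of the % operator so that a negative key would still give a valid subscript.

```java
// Illustrative sketch: map integer keys to array subscripts with a modulo hash.
class ModuloHashDemo {
    // Returns a subscript in the range [0, arraySize).
    static int arrayIndex(int hugeNumber, int arraySize) {
        // Math.floorMod never returns a negative value for a positive arraySize,
        // whereas the % operator returns a negative result for negative keys.
        return Math.floorMod(hugeNumber, arraySize);
    }

    public static void main(String[] args) {
        int arraySize = 13;
        int[] keys = {184, 302, 27, 40};
        for (int key : keys) {
            System.out.println(key + " -> " + arrayIndex(key, arraySize));
        }
    }
}
```

With an array size of 13, the keys 27 and 40 both map to subscript 1, so two different keys can end up competing for the same cell.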

After hashing, one problem is inevitable: different keys may produce the same hash address, that is, the same array subscript. This phenomenon is called a collision. So how do we deal with collisions? One method is open addressing: because data already exists at the subscript computed by the hash function, a systematic method is used to find another empty cell in the array, and the data is stored there instead. The other method is to create an array of linked lists, so that data is not stored directly in the array. When a collision occurs, the new data item is simply added to the linked list at that array subscript. This method is called separate chaining. Both methods are discussed below.

For open addressing, we first look at linear probing, which searches for empty cells linearly. If the data should be inserted at position 21 but that position is occupied, positions 22, 23, and so on are tried: the array subscript is incremented until an empty cell is found. The following is an implementation of a hash table based on linear probing:

public class HashTable {
    private DataItem[] hashArray; // each DataItem encapsulates one data record
    private int arraySize;
    private int itemNum;          // number of items currently stored in the array
    private DataItem nonItem;     // marker that replaces deleted items

    public HashTable() {
        arraySize = 13;
        hashArray = new DataItem[arraySize];
        nonItem = new DataItem(-1); // the key of a deleted item is -1
    }

    public boolean isFull() {
        return (itemNum == arraySize);
    }

    public boolean isEmpty() {
        return (itemNum == 0);
    }

    public void displayTable() {
        System.out.print("Table: ");
        for (int j = 0; j < arraySize; j++) {
            if (hashArray[j] != null) {
                System.out.print(hashArray[j].getKey() + " ");
            } else {
                System.out.print("** ");
            }
        }
        System.out.println("");
    }

    public int hashFunction(int key) {
        return key % arraySize; // hash function; keys are assumed to be non-negative
    }

    public void insert(DataItem item) {
        if (isFull()) {
            // extend the hash table
            System.out.println("The hash table is full and is being rehashed...");
            extendHashTable();
        }
        int key = item.getKey();
        int hashVal = hashFunction(key);
        // probe until an empty cell or a deleted marker is found
        while (hashArray[hashVal] != null && hashArray[hashVal].getKey() != -1) {
            ++hashVal;
            hashVal %= arraySize; // wrap around at the end of the array
        }
        hashArray[hashVal] = item;
        itemNum++;
    }

    /*
     * An array has a fixed size and cannot grow, so extending the hash table
     * means creating a larger array and inserting the data from the old array
     * into the new one. The hash function computes the location of an item
     * from the array size, so the items cannot simply be copied: they no
     * longer belong at the same positions as in the old array. Instead, the
     * old array is traversed in order and each item is inserted into the new
     * array with insert(). This is called rehashing. It is time-consuming,
     * but it is required whenever the array is extended.
     */
    public void extendHashTable() {
        int num = arraySize;
        itemNum = 0;    // reset the count; insert() recounts the items as they are moved
        arraySize *= 2; // double the array size
        DataItem[] oldHashArray = hashArray;
        hashArray = new DataItem[arraySize];
        for (int i = 0; i < num; i++) {
            DataItem item = oldHashArray[i];
            if (item != null && item != nonItem) { // skip empty cells and deleted markers
                insert(item);
            }
        }
    }

    public DataItem delete(int key) {
        if (isEmpty()) {
            System.out.println("Hash table is empty!");
            return null;
        }
        int hashVal = hashFunction(key);
        // examine at most arraySize cells, so a table without empty cells cannot loop forever
        for (int probes = 0; probes < arraySize && hashArray[hashVal] != null; probes++) {
            if (hashArray[hashVal].getKey() == key) {
                DataItem temp = hashArray[hashVal];
                hashArray[hashVal] = nonItem; // nonItem marks a deleted item; its key is -1
                itemNum--;
                return temp;
            }
            ++hashVal;
            hashVal %= arraySize;
        }
        return null;
    }

    public DataItem find(int key) {
        int hashVal = hashFunction(key);
        for (int probes = 0; probes < arraySize && hashArray[hashVal] != null; probes++) {
            if (hashArray[hashVal].getKey() == key) {
                return hashArray[hashVal];
            }
            ++hashVal;
            hashVal %= arraySize;
        }
        return null;
    }
}

class DataItem {
    private int iData;

    public DataItem(int data) {
        iData = data;
    }

    public int getKey() {
        return iData;
    }
}
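To make the probing order concrete, here is a small self-contained sketch of linear probing on a plain int array. It is not the article's HashTable class: the name LinearProbeDemo, the EMPTY marker, and the sample keys are assumptions chosen for illustration, and keys are assumed to be non-negative so that -1 can mark a free cell.

```java
import java.util.Arrays;

// Illustrative sketch: linear probing on a plain int array (EMPTY marks a free cell).
class LinearProbeDemo {
    static final int EMPTY = -1; // assumes all real keys are non-negative

    // Inserts key and returns the subscript where it was stored, or -1 if the table is full.
    static int insert(int[] table, int key) {
        int index = Math.floorMod(key, table.length);
        for (int probes = 0; probes < table.length; probes++) {
            if (table[index] == EMPTY) {
                table[index] = key;
                return index;
            }
            index = (index + 1) % table.length; // step to the next cell, wrapping around
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] table = new int[13];
        Arrays.fill(table, EMPTY);
        for (int key : new int[] {21, 34, 47, 22}) {
            System.out.println(key + " stored at " + insert(table, key));
        }
    }
}
```

The keys 21, 34, and 47 all hash to subscript 8 in a table of 13 cells, so they are stored at 8, 9, and 10. The key 22 hashes to 9, finds 9 and 10 occupied, and is stored at 11.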

Linear probing has a drawback: data may cluster. Once a cluster forms, it keeps growing. Data items that hash into the range of the cluster must step through it cell by cell and are inserted at its end, so the cluster becomes larger. The larger the cluster, the faster it grows. As a result, part of the hash table contains large clusters while the rest is sparse.
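The effect of a cluster can be sketched by counting how many cells linear probing has to examine before it reaches a free one. The class ClusterDemo and the cluster covering cells 8 to 11 are hypothetical values picked for this illustration.

```java
// Illustrative sketch: count the cells linear probing examines before finding a free one.
class ClusterDemo {
    // Returns the number of cells examined, starting at home, or -1 if no cell is free.
    static int cellsExamined(boolean[] occupied, int home) {
        int index = home;
        for (int examined = 1; examined <= occupied.length; examined++) {
            if (!occupied[index]) {
                return examined;
            }
            index = (index + 1) % occupied.length;
        }
        return -1;
    }

    public static void main(String[] args) {
        boolean[] occupied = new boolean[13];
        for (int i = 8; i <= 11; i++) {
            occupied[i] = true; // a cluster covering cells 8..11
        }
        for (int home = 7; home <= 12; home++) {
            System.out.println("home " + home + ": "
                    + cellsExamined(occupied, home) + " cell(s) examined");
        }
    }
}
```

A key whose home cell is 8 has to examine 5 cells, and every key that lands anywhere inside the cluster ends up in cell 12, which makes the cluster one cell longer.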

To solve this problem, we can use quadratic probing, which is one way to prevent clustering. The idea is to probe cells that are farther away, rather than cells adjacent to the original location. In linear probing, if the original subscript computed by the hash function is x, the probes are x + 1, x + 2, x + 3, and so on. In quadratic probing, the probes are x + 1, x + 4, x + 9, x + 16, and so on: the distance from the original position is the square of the step number. Although quadratic probing eliminates primary clustering, it produces a subtler clustering problem called secondary clustering. For example, suppose 184, 302, 420, and 544 are inserted into the table in that order and all of them hash to 7. Then 302 needs a probe at distance 1, 420 needs probes up to distance 4, and 544 needs probes up to distance 9. Every further key that hashes to 7 needs a still longer probe. Secondary clustering is not a serious problem, but quadratic probing is not often used because better solutions exist, such as double hashing.
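The quadratic probe order can be sketched as follows. The class name QuadraticProbeDemo and the table size of 59 are assumptions made for this example only; the article does not state a table size.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: list the cells visited by quadratic probing.
class QuadraticProbeDemo {
    // Returns the first `count` cells visited: home, home + 1, home + 4, home + 9, ...
    static List<Integer> probeSequence(int home, int tableSize, int count) {
        List<Integer> cells = new ArrayList<>();
        for (int step = 0; step < count; step++) {
            cells.add((home + step * step) % tableSize); // offset is the square of the step number
        }
        return cells;
    }

    public static void main(String[] args) {
        // every key whose home cell is 7 follows exactly this sequence
        System.out.println(probeSequence(7, 59, 5)); // prints [7, 8, 11, 16, 23]
    }
}
```

Because the sequence depends only on the home cell, all keys that hash to 7 walk through the same cells in the same order, which is the secondary clustering described above.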

To eliminate both primary and secondary clustering, one method is to generate a probe sequence that depends on the key, instead of being the same for every key. That is, different keys use different probe sequences even if they map to the same array subscript. Double hashing does this by hashing the key a second time with a different hash function and using the result as the step size. For a given key, the step size remains unchanged throughout the probe, but different keys use different step sizes. Experience shows that the second hash function must have the following features:

1. It is different from the first hash function;

2. It never outputs 0 (otherwise there is no step: every probe lands on the same position and the algorithm enters an endless loop).

Experts have found that the following form of hash function works very well: stepSize = constant - (key % constant), where constant is a prime number smaller than the array capacity.
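A minimal sketch of this step-size function is shown below. The class name StepSizeDemo and the sample keys are illustrative assumptions; Math.floorMod replaces the % operator so that the result stays in the range 1 to constant even for a negative key.

```java
// Illustrative sketch: the second hash function used to pick a step size.
class StepSizeDemo {
    // Returns a step size in the range [1, constant]; it is never 0.
    static int stepSize(int key, int constant) {
        return constant - Math.floorMod(key, constant);
    }

    public static void main(String[] args) {
        int constant = 5; // a prime smaller than the array capacity (13 in the listings)
        for (int key : new int[] {0, 4, 7, 20}) {
            System.out.println("key " + key + " -> step " + stepSize(key, constant));
        }
    }
}
```

With constant = 5, the keys 0, 4, 7, and 20 get the step sizes 5, 1, 3, and 5.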

Double hashing requires the table capacity to be a prime number. Suppose the table length is 15 (subscripts 0 to 14), which is not prime, and a particular key maps to 0 with a step size of 5. The probe sequence is then 0, 5, 10, 0, 5, 10, and so on. The algorithm only ever tries these three cells, so it cannot find empty cells elsewhere and eventually fails. If the array capacity is 13, a prime number, the probe sequence eventually visits every cell. That is, as long as there is an empty cell in the table, the probe will find it. Next, let's look at the double hashing code:

public class HashDouble {
    private DataItem[] hashArray; // DataItem is the class from the linear probing listing
    private int arraySize;
    private int itemNum;
    private DataItem nonItem;

    public HashDouble() {
        arraySize = 13;
        hashArray = new DataItem[arraySize];
        nonItem = new DataItem(-1);
    }

    public void displayTable() {
        System.out.print("Table: ");
        for (int i = 0; i < arraySize; i++) {
            if (hashArray[i] != null) {
                System.out.print(hashArray[i].getKey() + " ");
            } else {
                System.out.print("** ");
            }
        }
        System.out.println("");
    }

    public int hashFunction1(int key) { // first hash function
        return key % arraySize;
    }

    public int hashFunction2(int key) { // second hash function; the result is always 1 to 5
        return 5 - key % 5;
    }

    public boolean isFull() {
        return (itemNum == arraySize);
    }

    public boolean isEmpty() {
        return (itemNum == 0);
    }

    public void insert(DataItem item) {
        if (isFull()) {
            System.out.println("The hash table is full and is being rehashed...");
            extendHashTable();
        }
        int key = item.getKey();
        int hashVal = hashFunction1(key);
        int stepSize = hashFunction2(key); // use hashFunction2 to calculate the probe step size
        while (hashArray[hashVal] != null && hashArray[hashVal].getKey() != -1) {
            hashVal += stepSize;
            hashVal %= arraySize; // probe forward by the step size, wrapping around
        }
        hashArray[hashVal] = item;
        itemNum++;
    }

    public void extendHashTable() {
        int num = arraySize;
        itemNum = 0; // reset the count; insert() recounts the items as they are moved
        // At least double the array size, but keep it prime: simply doubling
        // 13 to 26 would let a step size of 2 or 4 skip half of the cells.
        arraySize = nextPrime(arraySize * 2);
        DataItem[] oldHashArray = hashArray;
        hashArray = new DataItem[arraySize];
        for (int i = 0; i < num; i++) {
            DataItem item = oldHashArray[i];
            if (item != null && item != nonItem) { // skip empty cells and deleted markers
                insert(item);
            }
        }
    }

    // returns the smallest prime that is greater than or equal to n
    private int nextPrime(int n) {
        while (!isPrime(n)) {
            n++;
        }
        return n;
    }

    private boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    public DataItem delete(int key) {
        if (isEmpty()) {
            System.out.println("Hash table is empty!");
            return null;
        }
        int hashVal = hashFunction1(key);
        int stepSize = hashFunction2(key);
        // examine at most arraySize cells, so a table without empty cells cannot loop forever
        for (int probes = 0; probes < arraySize && hashArray[hashVal] != null; probes++) {
            if (hashArray[hashVal].getKey() == key) {
                DataItem temp = hashArray[hashVal];
                hashArray[hashVal] = nonItem;
                itemNum--;
                return temp;
            }
            hashVal += stepSize;
            hashVal %= arraySize;
        }
        return null;
    }

    public DataItem find(int key) {
        int hashVal = hashFunction1(key);
        int stepSize = hashFunction2(key);
        for (int probes = 0; probes < arraySize && hashArray[hashVal] != null; probes++) {
            if (hashArray[hashVal].getKey() == key) {
                return hashArray[hashVal];
            }
            hashVal += stepSize;
            hashVal %= arraySize;
        }
        return null;
    }
}
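The prime-capacity requirement discussed before the listing can be checked with a short sketch that collects the cells a fixed step size can reach. The class name ProbeCoverageDemo and the method reachableCells are hypothetical names introduced for this illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Illustrative sketch: which cells can a fixed step size reach in a table of a given size?
class ProbeCoverageDemo {
    // Returns the distinct cells visited when repeatedly adding step, modulo tableSize.
    static Set<Integer> reachableCells(int start, int step, int tableSize) {
        Set<Integer> cells = new LinkedHashSet<>();
        int index = start;
        while (cells.add(index)) { // add() returns false once a cell repeats
            index = (index + step) % tableSize;
        }
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(reachableCells(0, 5, 15));        // prints [0, 5, 10]
        System.out.println(reachableCells(0, 5, 13).size()); // prints 13
    }
}
```

With a table of 15 cells and a step of 5, only three cells are ever visited. With 13 cells, the same step reaches every cell, because 13 shares no common factor with any step size from 1 to 5.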

With open addressing, collisions are resolved by probing for a vacant cell. The other method is to place a linked list in each cell of the hash table (separate chaining). The key of a data item is mapped to a cell of the hash table as usual, and the data item itself is inserted into the linked list of that cell. Other data items mapped to the same position are simply added to the linked list, with no need to find a vacancy in the original array. Let's take a look at the code for separate chaining:

public class HashChain {
    private SortedList[] hashArray; // each array cell holds a linked list
    private int arraySize;

    public HashChain(int size) {
        arraySize = size;
        hashArray = new SortedList[arraySize];
        // initialize every cell of the array with a new, empty linked list
        for (int i = 0; i < arraySize; i++) {
            hashArray[i] = new SortedList();
        }
    }

    public void displayTable() {
        for (int i = 0; i < arraySize; i++) {
            System.out.print(i + ": ");
            hashArray[i].displayList();
        }
    }

    public int hashFunction(int key) {
        return key % arraySize;
    }

    public void insert(LinkNode node) {
        int key = node.getKey();
        int hashVal = hashFunction(key);
        hashArray[hashVal].insert(node); // add the node directly to the linked list
    }

    public LinkNode delete(int key) {
        int hashVal = hashFunction(key);
        LinkNode temp = find(key);
        if (temp != null) {
            hashArray[hashVal].delete(key); // remove the data item from the linked list
        }
        return temp; // null if the key was not in the table
    }

    public LinkNode find(int key) {
        int hashVal = hashFunction(key);
        LinkNode node = hashArray[hashVal].find(key);
        return node;
    }
}

The code for the linked list class is as follows:

public class SortedList {
    private LinkNode first;

    public SortedList() {
        first = null;
    }

    public boolean isEmpty() {
        return (first == null);
    }

    // inserts the node so that the list stays sorted by key in ascending order
    public void insert(LinkNode node) {
        int key = node.getKey();
        LinkNode previous = null;
        LinkNode current = first;
        while (current != null && current.getKey() < key) {
            previous = current;
            current = current.next;
        }
        node.next = current; // link to the rest of the list, also when inserting at the head
        if (previous == null) {
            first = node;
        } else {
            previous.next = node;
        }
    }

    public void delete(int key) {
        LinkNode previous = null;
        LinkNode current = first;
        if (isEmpty()) {
            System.out.println("chain is empty!");
            return;
        }
        while (current != null && current.getKey() != key) {
            previous = current;
            current = current.next;
        }
        if (current == null) {
            return; // key not found, nothing to delete
        }
        if (previous == null) {
            first = first.next;
        } else {
            previous.next = current.next;
        }
    }

    public LinkNode find(int key) {
        LinkNode current = first;
        // the list is sorted, so the search can stop at the first larger key
        while (current != null && current.getKey() <= key) {
            if (current.getKey() == key) {
                return current;
            }
            current = current.next;
        }
        return null;
    }

    public void displayList() {
        System.out.print("List(First->Last): ");
        LinkNode current = first;
        while (current != null) {
            current.displayLink();
            current = current.next;
        }
        System.out.println("");
    }
}

class LinkNode {
    private int iData;
    public LinkNode next;

    public LinkNode(int data) {
        iData = data;
    }

    public int getKey() {
        return iData;
    }

    public void displayLink() {
        System.out.print(iData + " ");
    }
}
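For comparison, the same idea can be sketched with the standard library's java.util.LinkedList as the bucket type instead of the hand-written SortedList. This is a swapped-in simplification, not the article's implementation: the class name ChainingDemo and its methods are assumptions, the buckets are unsorted, and only int keys are stored.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Illustrative sketch: separate chaining with java.util.LinkedList buckets.
class ChainingDemo {
    private final List<LinkedList<Integer>> buckets = new ArrayList<>();

    ChainingDemo(int size) {
        for (int i = 0; i < size; i++) {
            buckets.add(new LinkedList<>()); // one empty chain per cell
        }
    }

    private LinkedList<Integer> bucketFor(int key) {
        return buckets.get(Math.floorMod(key, buckets.size()));
    }

    void insert(int key) {
        bucketFor(key).add(key); // colliding keys simply share a chain
    }

    boolean contains(int key) {
        return bucketFor(key).contains(key);
    }

    boolean delete(int key) {
        // Integer.valueOf selects remove(Object), not remove(int index)
        return bucketFor(key).remove(Integer.valueOf(key));
    }

    int chainLength(int index) {
        return buckets.get(index).size();
    }

    public static void main(String[] args) {
        ChainingDemo table = new ChainingDemo(13);
        table.insert(27);
        table.insert(40); // collides with 27 in cell 1
        System.out.println("chain length at 1: " + table.chainLength(1));
        System.out.println("contains 40: " + table.contains(40));
    }
}
```

Both 27 and 40 land in cell 1, so that chain has length 2, and neither insertion needs to search the array for a vacancy.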

When there are no collisions, insertion and deletion in a hash table take O(1) time, which is very fast. When collisions occur, the access time depends on the length of the subsequent probe sequence or linked list, because searching or deleting requires checking the items one by one. In the worst case, this takes O(N) time.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
