Search Algorithm Summary (2): Hash Tables

In terms of time complexity, insertion, search, and deletion in a red-black tree all take O(lg N) time on average.

Is there a data structure with even higher search efficiency? The answer is yes: the hash table, which this article introduces.

What is a hash table?

A hash table is a structure that stores data as key-value pairs (it is key-indexed). You only need to supply the key you are searching for to find the corresponding value.

The idea of hashing is very simple. If all keys are small integers, you can use a plain unordered array: the key is the index, and the element stored at that index is the corresponding value. In this way, you can quickly access the value of any key. That covers simple keys; hashing extends the idea to keys of more complex types.
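The integer-key case described above can be sketched in a few lines of Java. The class name `DirectAddressTable` and its methods are hypothetical names chosen for illustration, and the sketch assumes every key lies in the range 0 to keyRange-1.

```java
// Direct addressing: the integer key itself is the array index.
// DirectAddressTable is an illustrative name, not a library class.
class DirectAddressTable {
    private final String[] values;

    DirectAddressTable(int keyRange) {
        values = new String[keyRange]; // one slot per possible key
    }

    void put(int key, String value) {
        values[key] = value; // no searching, just indexing
    }

    String get(int key) {
        return values[key]; // null means the key is absent
    }
}
```

The cost of this simplicity is memory: the array needs one slot for every possible key, whether or not that key is ever used.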

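Hash lookup takes two steps:

1. Use a hash function to convert the key being searched for into an array index.
2. Handle hash collisions, which occur when two or more keys are converted to the same index.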

**A hash table is a classic example of a trade-off between time and space.** If memory were unlimited, you could use the key directly as the array index, and every query would take O(1) time. If time were unlimited, you could store the keys in an unordered array and search it sequentially, which needs only a small amount of memory. Hash tables use a moderate amount of both time and space to strike a balance between these two extremes. You only need to adjust the hash function to shift the trade-off between time and space.

Hash Functions

The first step of a hash search is to use the hash function to **map the key to an index**. This mapping function is the hash function. If we have an array with M slots, indexed 0 to M-1, we need a function that can convert any key into an index in that range. **A hash function must be easy to compute and must distribute all keys evenly.** As a simple example, the last three digits of a mobile phone number make a better key than the first three digits, because the first three digits have a high repetition rate. Likewise, using a whole ID card number is better than using only its first digit.

In reality, our keys are not all numbers, but may be strings and combinations of several values. Therefore, we need to implement our own hash function.

1. Positive Integer

The most common way to compute the hash value of a **positive integer** is the **division (remainder) method**: for an array of size M and any positive integer k, compute the remainder of k divided by M. M is generally chosen to be a prime number.
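A minimal Java sketch of the remainder method follows. The class name `ModularHash` and the table size M = 97 are assumptions made for illustration; any prime could be used.

```java
// Remainder (modular) hashing for positive integer keys.
class ModularHash {
    // Hypothetical table size: 97 is an arbitrary prime chosen for this sketch.
    static final int M = 97;

    static int hash(int k) {
        // k is assumed to be positive, so the result lies in 0..M-1.
        return k % M;
    }
}
```

With M = 97, the key 100 maps to index 3 and the key 212 maps to index 18.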

2. String

When a string is the key, we can treat it as a large integer and apply the same remainder method. One way is to fold the value of each character into the hash in turn, for example:

public int GetHashCode(string str){ char[] s = str.ToCharArray(); int hash = 0; for (int i = 0; i < s.Length; i++) { hash = s[i] + (31 * hash); } return hash;}

The default implementation for String in Java is similar to this.

The code above uses Horner's method to compute the hash value of the string. The formula is as follows:

$H = s[0] \cdot 31^{L-1} + \dots + s[L-3] \cdot 31^{2} + s[L-2] \cdot 31^{1} + s[L-1] \cdot 31^{0}$

For example, to compute the hash value of "call": the Unicode value of the character c is 99, of a is 97, and of l is 108. The hash value of the string "call" is therefore $3045982 = 99 \cdot 31^{3} + 97 \cdot 31^{2} + 108 \cdot 31^{1} + 108 \cdot 31^{0} = 108 + 31 \cdot (108 + 31 \cdot (97 + 31 \cdot 99))$.
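The "call" calculation can be checked with a short Java version of the same loop. This is a sketch rather than the article's own code: the class name `HornerHash` is hypothetical, and the body simply mirrors the character-by-character loop shown earlier.

```java
// Horner's method: hash = s[0]*31^(L-1) + ... + s[L-1]*31^0,
// evaluated left to right with one multiply and one add per character.
class HornerHash {
    static int hash(String str) {
        int hash = 0;
        for (int i = 0; i < str.length(); i++) {
            hash = str.charAt(i) + 31 * hash; // int overflow simply wraps around
        }
        return hash;
    }
}
```

`HornerHash.hash("call")` evaluates to 3045982, which matches both the hand calculation and the value returned by Java's built-in `"call".hashCode()`.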

If hashing every character is too time-consuming, you can sample characters at intervals to save time. For example, the following version steps through the string with a stride of one eighth of its length, so it reads only a small, roughly fixed number of characters:

public int GetHashCode(string str){ char[] s = str.ToCharArray(); int hash = 0; int skip = Math.Max(1, s.Length / 8); for (int i = 0; i < s.Length; i+=skip) { hash = s[i] + (31 * hash); } return hash;}

However, different strings can then produce the same hash value, which is the hash collision mentioned above. In particular, strings that differ only in the characters that are skipped all receive the same hash value. The following sections describe how to resolve hash collisions.
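The collision can be reproduced with a Java sketch of the sampling hash. The class name `SampledHash` and the two example strings are made up for illustration; they are not the strings from the original article.

```java
// Sampling hash: visits only every skip-th character of the string.
class SampledHash {
    static int hash(String str) {
        int hash = 0;
        int skip = Math.max(1, str.length() / 8); // stride of about 1/8 of the length
        for (int i = 0; i < str.length(); i += skip) {
            hash = str.charAt(i) + 31 * hash;
        }
        return hash;
    }
}
```

For a 16-character string the stride is 2, so only the characters at even positions are read. The hypothetical strings `"a0b0c0d0e0f0g0h0"` and `"a1b1c1d1e1f1g1h1"` differ only at odd positions and therefore collide.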

Resolving Hash Collisions: Zipper Method (Separate Chaining)

Through the hash function, we can convert keys to array indexes (0 to M-1), but when two or more keys have the same index, we need a way to deal with the collision.

A straightforward approach is to have each of the M array elements point to a linked list. Each node in a list stores a key-value pair whose key hashes to that index. This is the zipper method, also known as separate chaining.

A hash table based on the zipper method is simple to implement. It is probably the fastest (and the most widely used) symbol table implementation for applications in which key order is not important.
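A minimal Java sketch of the zipper method is shown here. The names `ChainingTable`, `Node`, and `index` are hypothetical, the table size is fixed at construction, and the sketch uses Java's built-in `hashCode()` rather than a hand-written hash function.

```java
// Zipper method (separate chaining): each array slot heads a linked list
// of the key-value pairs whose keys hash to that slot.
class ChainingTable {
    private static class Node {
        final String key;
        int value;
        final Node next;

        Node(String key, int value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node[] chains;

    ChainingTable(int m) {
        chains = new Node[m]; // m chains, all initially empty
    }

    private int index(String key) {
        // Mask off the sign bit so the remainder is never negative.
        return (key.hashCode() & 0x7fffffff) % chains.length;
    }

    void put(String key, int value) {
        int i = index(key);
        for (Node x = chains[i]; x != null; x = x.next) {
            if (x.key.equals(key)) {
                x.value = value; // key already present: update in place
                return;
            }
        }
        chains[i] = new Node(key, value, chains[i]); // prepend a new node
    }

    Integer get(String key) {
        for (Node x = chains[index(key)]; x != null; x = x.next) {
            if (x.key.equals(key)) {
                return x.value;
            }
        }
        return null; // key not in the table
    }
}
```

A lookup hashes the key to pick a chain and then searches that chain sequentially, so the cost depends on how long the chains are.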

Linear Probing

Linear probing is an open addressing method for resolving hash collisions. The basic principle is to store N key-value pairs in an array of size M, where M > N, and to use the empty slots in the array to resolve collisions.

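The simplest open addressing method is linear probing: when a collision occurs, that is, when the slot a key hashes to is already occupied by another key, we simply check the next position in the hash table (the index plus 1, wrapping around at the end of the array). Each such probe has one of three outcomes: a hit, where the key in the slot equals the key being searched for; a miss, where the slot is empty; or neither, where the slot holds a different key and the search continues with the next position.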

An empty element (marked as null) can be used as a marker for the end of a search.
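The probing loop can be sketched in Java as follows. The class name `LinearProbingTable` is hypothetical, the array size is fixed, and the sketch assumes the table is never completely full (M > N); without an empty slot the loops would not terminate.

```java
// Linear probing: keys and values live in parallel arrays, and a
// collision is resolved by stepping to the next slot (with wrap-around).
class LinearProbingTable {
    private final String[] keys;
    private final int[] values;

    LinearProbingTable(int m) {
        keys = new String[m];
        values = new int[m];
    }

    private int index(String key) {
        // Mask off the sign bit so the remainder is never negative.
        return (key.hashCode() & 0x7fffffff) % keys.length;
    }

    // Assumes at least one slot always stays empty.
    void put(String key, int value) {
        int i = index(key);
        while (keys[i] != null) {
            if (keys[i].equals(key)) {
                values[i] = value; // hit: update the existing entry
                return;
            }
            i = (i + 1) % keys.length; // slot holds another key: try the next one
        }
        keys[i] = key; // empty slot found: insert here
        values[i] = value;
    }

    Integer get(String key) {
        for (int i = index(key); keys[i] != null; i = (i + 1) % keys.length) {
            if (keys[i].equals(key)) {
                return values[i]; // hit
            }
        }
        return null; // reached an empty slot: miss
    }
}
```

In Java the strings `"Aa"` and `"BB"` have the same `hashCode()`, so inserting both into one table is a convenient way to exercise the probing path.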

As with the zipper method, the performance of an open addressing hash table depends on the ratio a = N/M, which is called the usage (load factor) of the hash table.

For the zipper method, a is the average length of each chain, so it is generally greater than 1. For a hash table based on linear probing, a is the proportion of the table that is occupied, so it cannot be greater than 1.

To ensure performance, we dynamically adjust the array size to keep the usage between 1/8 and 1/2.
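One possible resizing rule that keeps the usage in that range is sketched below in Java. The class name `ResizePolicy`, the method `nextCapacity`, and the exact thresholds (double when the table becomes half full, halve when it drops to one eighth full) are assumptions for illustration; the article itself only states the target range.

```java
// Decides what capacity a linear probing table should have,
// given that it currently holds n keys in m slots.
class ResizePolicy {
    static int nextCapacity(int n, int m) {
        if (n >= m / 2) {
            return 2 * m; // usage has reached 1/2: double the array
        }
        if (n > 0 && n <= m / 8) {
            return m / 2; // usage has fallen to 1/8: halve the array
        }
        return m; // usage is within range: keep the current size
    }
}
```

Whenever the capacity changes, every key has to be reinserted into the new array, because each key's index depends on the array size M.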