Python data structure and algorithm 29-1 hash lookup

Source: Internet
Author: User
Tags ord

In the previous chapters, we used the relative position information of the elements in the dataset to improve the performance of the lookup algorithm.

For example, you know that the list is ordered and you can find it using binary points. In this section we go farther and create a data structure that improves the lookup performance to O(1). Called Hash lookup.

To do this, we need to know the possible location of the element. Assuming that each element is in the position where he should be, it is only necessary to find the answer once, but the following will be seen. That's not what happened.

A hash table is a collection of data in which an easy-to -find location exists when the element is saved. Each location in a hash table, commonly called a slot, can hold a data element and be named with an integer(from0Start). So we'll have0number slots. 1number of slots and so on. At the beginning. The hash table has no data and the slot is empty. This allows you to initialize the slot values when you build the hash tableNone, Figure4displays a size of Onethe hash table, or is that therema hash table of slots. mfrom0to the.

watermark/2/text/ahr0cdovl2jsb2cuy3nkbi5uzxqvchl0ag9umjaxna==/font/5a6l5l2t/fontsize/400/fill/i0jbqkfcma==/ Dissolve/70/gravity/center ">


the mapping between the elements in the diagram and the saved slots is called a hash function, and the hash function takes an element as a parameter and returns a0to them-1Integer as the slot name. If we have a set of integers Wu, -, the. -, theand the to, our first hash function is able to use the remainder method. Simply divide the element by the size of the table, returning the remainder as the hash value. (h(Item)=Item% One). Table4is the hash value of the integer set above.

Table 4:simple Hash Function Using remainders

Item

Hash Value

54

10

26

4

93

5

17

6

77

0

31

9


Note that the remainder method is generally stored in all of the hash functions in some form. Because its results must be within the slot range.

Once the hash value is computed, the element is inserted into the specified position in the hash table.

5 what you see, attention. 6 Slots and One slot is empty, this introduces the concept of full-load factor, which is generally expressed as:

λ= number of elements/hash table capacity


Over here. It is


Λ=6/11

Now when we look for it, we simply use the hash function to calculate the slot value. Then check in the table to see if it exists, and this lookup action is O(1), due to the time of the hash value being computed. And the time to find it in the table is a constant. Assume that everything is in its place. We have found a constant-level search algorithm.

you may have noticed that this technique works only when each element corresponds to a single location, for example, in the example above, suppose to add a - . Then its hash value is 0. But The value of the number is 0. Then the problem comes out. 2 values corresponding to the same slot, which is called "collision", is very obvious,collision The hashing technique is difficult, and we discuss it in detail later.


hash Function

For a given set of data, the hash function maps each element to a single slot. Called the "perfect hash function", suppose we know that elements and sets are fixed. It may be possible to construct a perfect hash function. The bad news is that for a random collection of data, there is no system method to construct the perfect hash function, the good news is. A hash function that is imperfect can also provide good performance.


Suppose you must have a perfect hash function, one way is to make a hash table so that every element has its own index.

Although it is possible to do so in the case of little data, it is not feasible to assume that the data is very large.

For example, suppose a data item is a 8-digit number, which requires 1 billion slots. It would be too much if we only used to keep the number of 25 students.

Our goal is: Collision minimum, simple calculation, uniform distribution. There are several scenarios for extending the remainder method, several of which are discussed below.

Folding method: This method divides the elements into equal pieces (the last one may not be equal). And then put the pieces together as a hash value. For example, our data item is number 436-555-4601, then the number should be divided into 2 groups, and then add up, that is, 43+65+55+46+01, get 210. If the hash table has 11 slots. So one more step with 11 except 210来 to get the slots. That is 210%11=1. So the hash value of the number 436-555-4601 is 1. Some folding methods take one step further, before adding, the data bit order is reversed, in the example above. That is, 43+56+55+64+01=219 calculates 219 11=10.

There is also an algorithm called "Square take in the method", first calculate the square value of the element, then extract several numbers from it. For example, for element 44. Calculate 442=1936 first. Extract the middle two bit 93, then take the remainder method, get 5 (93%11=5)

Table 5:comparison of remainder and Mid-square Methods

Table 5 Comparison between the remainder method and the square take method

Item

Remainder

Mid-square

54

10

3

26

4

7

93

5

9

17

6

8

77

0

4

31

9

6

A hash function can also be created for character class elements, and the word cat can be seen as a number string

Ord (' C ')
99
Ord (' a ')
97
Ord (' t ')
116

We add up these three numbers and use the remainder method to calculate the hash value.

The following is a function that computes the hash value of a string:

Listing 1

def Hash (Astring, Tablesize):
sum = 0
 for inch Range (len(astring)):
sum = sum + Ord (Astring[pos])
return sum %tablesize

watermark/2/text/ahr0cdovl2jsb2cuy3nkbi5uzxqvchl0ag9umjaxna==/font/5a6l5l2t/fontsize/400/fill/i0jbqkfcma==/ Dissolve/70/gravity/center ">

Interestingly, in the above algorithm. The same letters in different order of words get the same hash value, the solution is to add the position of the letter as the weight. Figure 7 shows the use of the position as a weight factor. The modified hash function as an exercise.

You can also think of several ways to calculate hashes, but it is essential to remember that hashing functions must be simple and efficient. Cannot be the main burden of computation. Assuming that the hash function is too complex, the time to calculate the slot name exceeds the time of a simple sequential lookup or a two-minute lookup. So what's the point of a hash function?




Python data structure and algorithm 29-1 hash lookup

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.