Python Data Structures and Algorithms: Hash Tables

Source: Internet
Author: User

Hash Table Study Notes

Adapted and translated from "Think Complexity" and its online version: http://greenteapress.com/complexity/html/thinkcomplexity004.html

Hash tables provide very fast lookups, in constant time, and do not require the elements to be kept in any order.

Python's built-in dictionary type is implemented with a hash table.

To explain how the hash table works, let's try to implement the hash table structure without using a dictionary.

We need to define a data structure that contains a key-value mapping, while implementing the following two operations:

add(k, v):

Add a new item that maps from key k to value v.

With a Python dictionary, d, this operation is written d[k] = v.

get(target):

Look up and return the value that corresponds to key target.

With a Python dictionary, d, this operation is written d[target] or d.get(target).
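For reference, here is how the two operations look with a built-in dictionary. This is only a small illustration; the variable name d and the sample keys are made up for the example.

```python
# A built-in dict already provides both operations.
d = {}

# add(k, v): map a key to a value
d["CX12"] = 1130

# get(target): look up the value for a key
print(d["CX12"])         # 1130
print(d.get("CX12"))     # 1130

# d[target] raises KeyError for a missing key, while d.get returns None
print(d.get("missing"))  # None
```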

A simple implementation is a linear table (a list) that stores each key-value pair as a tuple:

    class LinearMap(object):
        """Linear table structure."""
        def __init__(self):
            self.items = []

        def add(self, k, v):  # append the (key, value) pair to the table
            self.items.append((k, v))

        def get(self, k):  # search for the key linearly
            for key, val in self.items:
                if key == k:  # key found: return its value
                    return val
            raise KeyError(k)  # key not found: raise an exception

We could keep the items list sorted as add inserts elements, and use binary search in get, which has a time complexity of O(log n). However, inserting a new element into the middle of a list is itself a linear operation, so this is not the best approach. It also still does not meet the requirement of constant-time lookup.
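The sorted-list idea can be sketched with the standard bisect module. The class name SortedMap and its two parallel lists are assumptions made for this illustration, not part of the original notes; unlike LinearMap, this sketch overwrites the value when a key is added twice.

```python
import bisect


class SortedMap(object):
    """Hypothetical variant of LinearMap that keeps its keys sorted."""

    def __init__(self):
        self.keys = []    # sorted list of keys
        self.values = []  # values, kept in the same order as the keys

    def add(self, k, v):
        i = bisect.bisect_left(self.keys, k)  # O(log n) search for the slot
        if i < len(self.keys) and self.keys[i] == k:
            self.values[i] = v                # key already present: overwrite
        else:
            self.keys.insert(i, k)            # O(n): later items shift right
            self.values.insert(i, v)

    def get(self, k):
        i = bisect.bisect_left(self.keys, k)  # O(log n) binary search
        if i < len(self.keys) and self.keys[i] == k:
            return self.values[i]
        raise KeyError(k)


m = SortedMap()
for key, value in [("SCX64", 1090), ("CX12", 1130), ("S48", 1988)]:
    m.add(key, value)
print(m.keys)         # ['CX12', 'S48', 'SCX64']
print(m.get("CX12"))  # 1130
```

The lookup is logarithmic, but list.insert still has to shift every later element, which is the linear cost mentioned above.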

We can improve on this by splitting the single lookup table into several smaller lists, say 100 sub-tables. A hash function computes the hash value of a key, and that value determines which sub-table the key is added to or looked up in. Compared with searching one list from the beginning, this greatly shortens the search. The get operation still grows linearly, but the BetterMap class brings us a step closer to a hash table:

    class BetterMap(object):
        """A faster lookup table built from LinearMap objects as sub-tables."""
        def __init__(self, n=100):
            self.maps = []  # the list of sub-tables
            for i in range(n):  # create n empty sub-tables
                self.maps.append(LinearMap())

        def find_map(self, k):  # compute the index with the hash function
            index = hash(k) % len(self.maps)
            return self.maps[index]  # return a reference to that sub-table

        # find the appropriate sub-table (a LinearMap object), then add or look up
        def add(self, k, v):
            m = self.find_map(k)
            m.add(k, v)

        def get(self, k):
            m = self.find_map(k)
            return m.get(k)

Test it:

    if __name__ == "__main__":
        table = BetterMap()
        pricedata = [("Hohner257", 257),
                     ("SW1664", 280),
                     ("SCX64", 1090),
                     ("SCX48", 830),
                     ("Super64", 2238),
                     ("CX12", 1130),
                     ("Hohner270", 620),
                     ("f64c", 9720),
                     ("S48", 1988)]

        for item, price in pricedata:
            table.add(k=item, v=price)

        print(table.get("CX12"))
        # >>> 1130
        print(table.get("QIMEI1248"))
        # >>> raises KeyError

Since different keys generally have different hash values, the keys are spread fairly evenly across the sub-tables.

When n=100, BetterMap's lookups are about 100 times faster than LinearMap's.
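To make the speed-up concrete, the sketch below counts how many keys land in each sub-table. The integer keys are an assumption chosen for the illustration: in CPython, small non-negative integers hash to themselves, so here every sub-table ends up with exactly 10 keys. With string keys the spread would be roughly even rather than exact.

```python
# Count how many keys fall into each of n sub-tables.
n = 100
keys = range(1000)            # illustrative integer keys

counts = [0] * n
for k in keys:
    counts[hash(k) % n] += 1  # same index computation as find_map

# A lookup scans only one sub-table, so its cost tracks the sub-table
# length rather than the total number of items.
print(len(keys))    # 1000 items in total
print(max(counts))  # length of the longest sub-table
```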

Clearly, BetterMap's lookup speed is limited by the parameter n: the number of sub-tables is fixed, but the length of each LinearMap is not, so a lookup within a sub-table is still linear. If we could bound the maximum length of each sub-table, the time to search a single sub-table would have a fixed upper limit, and LinearMap.get would take constant time. So we only need to keep track of the number of items: whenever the number of items per LinearMap exceeds a threshold, we resize the whole hash table by adding more LinearMaps and rehashing the items. This keeps lookups constant time.

The following is the implementation of the hash table:

    class HashMap(object):
        def __init__(self):
            # start with a table of capacity 2 (two sub-tables)
            self.maps = BetterMap(2)
            self.num = 0  # number of items in the table

        def get(self, k):
            return self.maps.get(k)

        def add(self, k, v):
            # if the number of items has reached the threshold (the number of
            # sub-tables), resize: the new table gets twice as many sub-tables
            # as there are items
            if self.num == len(self.maps.maps):
                self.resize()

            # add the new item to self.maps after any resize
            self.maps.add(k, v)
            self.num += 1

        def resize(self):
            """Rehash into a larger table; note that this takes linear time."""
            # first create a new table: number of sub-tables = 2 * number of items
            new_maps = BetterMap(self.num * 2)

            for m in self.maps.maps:  # visit each old sub-table
                for k, v in m.items:  # copy its items into the new table
                    new_maps.add(k, v)

            self.maps = new_maps  # make the new table the current table

Focus on the add method: it compares the number of items with the size of the BetterMap. If they are equal, the average number of items per LinearMap is 1, and add calls resize.

resize creates a new table with twice as many sub-tables as the old one, then "rehashes" the items in the old table and puts them in the new table.
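To see when the resizes happen, here is a compact, self-contained sketch of the same idea. TinyHashMap and its plain-list buckets are stand-ins invented for this illustration, not the HashMap class itself; the integer keys are arbitrary.

```python
class TinyHashMap(object):
    """Compact stand-in for HashMap, using plain lists as sub-tables."""

    def __init__(self):
        self.buckets = [[], []]  # start with two sub-tables
        self.num = 0             # number of items stored

    def add(self, k, v):
        if self.num == len(self.buckets):  # one item per sub-table on average
            self.resize()
        self.buckets[hash(k) % len(self.buckets)].append((k, v))
        self.num += 1

    def get(self, k):
        for key, val in self.buckets[hash(k) % len(self.buckets)]:
            if key == k:
                return val
        raise KeyError(k)

    def resize(self):
        old = self.buckets
        self.buckets = [[] for _ in range(self.num * 2)]  # twice as many
        for bucket in old:
            for k, v in bucket:  # rehash: the index depends on the new size
                self.buckets[hash(k) % len(self.buckets)].append((k, v))


t = TinyHashMap()
sizes = []
for i in range(9):
    t.add(i, i * i)
    sizes.append(len(t.buckets))
print(sizes)     # [2, 2, 4, 4, 8, 8, 8, 8, 16]
print(t.get(3))  # 9
```

The number of sub-tables doubles on the 3rd, 5th and 9th add, and every item has to be rehashed because its index depends on the table size.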

The resize process is linear, which does not sound like the constant time we require of a hash table. However, we do not need to resize often, so add is constant time most of the time and only occasionally linear. Because the total time for n adds is proportional to n, the average time per add is constant!

Suppose we want to add 32 elements. Counting one unit of time for each add and one unit for each item moved during a resize, the process is as follows:

1. Since the initial size is 2, the first two adds need no resize. Total time for adds 1 and 2: 2.

2. 3rd add: resize to 4 (time 2), so the 3rd add takes 3.

3. 4th add: time 1. Total so far: 6.

4. 5th add: resize to 8 (time 4), so the 5th add takes 5.

5. 6th to 8th adds: time 3 in total. Total so far: 6+5+3 = 14.

6. 9th add: resize to 16 (time 8), so the 9th add takes 9.

7. 10th to 16th adds: time 7 in total. Total so far: 14+9+7 = 30.

After 32 adds, the total time is 62 units. A pattern emerges from the process above: after n adds, where n is a power of 2, the total time is 2n-2 units, so the average time per add is strictly less than 2 units.
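The count above can be checked with a short simulation. It assumes the same cost model as the walkthrough (1 unit per add, 1 unit per item moved in a resize); the function name total_add_cost is made up for this sketch.

```python
def total_add_cost(n, initial_size=2):
    """Total time units for n adds: each add costs 1 unit,
    and a resize costs 1 unit per item moved."""
    size = initial_size  # number of sub-tables
    num = 0              # number of items stored so far
    total = 0
    for _ in range(n):
        if num == size:   # table is full: resize before adding
            total += num  # rehash every existing item
            size = num * 2
        total += 1        # the add itself
        num += 1
    return total


for n in [2, 4, 8, 16, 32]:
    print(n, total_add_cost(n), 2 * n - 2)
# 2 2 2
# 4 6 6
# 8 14 14
# 16 30 30
# 32 62 62
```

Dividing by n gives an average of 2 - 2/n units per add, which stays below 2 however large n gets.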

The case where n is a power of 2 is the best case; for other values of n the average time is slightly higher, but the important point is that it is still O(1).
