Python Dictionary implementation

Last Update:2014-09-05 Source: Internet

Author: User

Python dictionaries are implemented as hash tables.
Hash tables must allow for hash collisions, i.e. even if two keys have the same hash value, the table implementation must have a strategy to insert and retrieve the key and value pairs unambiguously.
Python dict uses open addressing to resolve hash collisions (explained below) (see dictobject.c:296-297).
A Python hash table is just a contiguous block of memory (sort of like an array, so you can do an O(1) lookup by index).
Each slot in the table can store one and only one entry. This is important.
Each entry in the table is actually a combination of three values: the hash, the key, and the value. This is implemented as a C struct (see dictobject.h:51-56).
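
The collision requirement above can be observed from pure Python. The sketch below uses a hypothetical class, CollidingKey, whose __hash__ always returns the same value, so every instance collides. It is only an illustration of the behaviour, not of how CPython implements it internally.

```python
class CollidingKey:
    """Hypothetical key type whose instances all share one hash value."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 42  # deliberately force every key to collide

    def __eq__(self, other):
        return isinstance(other, CollidingKey) and self.name == other.name


a = CollidingKey("a")
b = CollidingKey("b")
table = {a: "first", b: "second"}

# Same hash, different keys: both entries survive and stay distinguishable.
print(hash(a) == hash(b))   # True
print(table[a], table[b])   # first second
print(len(table))           # 2
```

Because the keys differ under ==, the dict keeps them as two separate entries even though their hashes are identical.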

The figure below is a logical representation of a Python hash table. In the figure, 0, 1, ..., i, ... on the left are the indices of the slots in the hash table (they are for illustrative purposes only and are obviously not stored along with the table!).

# Logical model of Python Hash table
-+-----------------+
0| <hash|key|value>|
-+-----------------+
1|      ...        |
-+-----------------+
.|      ...        |
-+-----------------+
i|      ...        |
-+-----------------+
.|      ...        |
-+-----------------+
n|      ...        |
-+-----------------+
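
The logical model in the figure can be mirrored in a few lines of Python. This is a rough sketch only: Entry, hash_value, MIN_SIZE and slots are illustrative names, not CPython identifiers, and the real table is a C array of structs rather than a Python list.

```python
from typing import Any, List, NamedTuple, Optional


class Entry(NamedTuple):
    """Illustrative stand-in for the C struct: one <hash|key|value> triple."""
    hash_value: int
    key: Any
    value: Any


MIN_SIZE = 8  # initial number of slots given in the article

# One contiguous block of slots; None marks an empty slot.
slots: List[Optional[Entry]] = [None] * MIN_SIZE

# Each slot holds at most one entry, and reaching a slot by index is O(1).
slots[3] = Entry(hash_value=hash("spam"), key="spam", value=1)
print(slots[3].key, slots[3].value)
print(sum(s is not None for s in slots), "of", len(slots), "slots used")
```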

When a new dict is initialized, it starts with 8 slots (see dictobject.h:49).
When adding entries to the table, we start with some slot, i, that is based on the hash of the key. CPython uses initial i = hash(key) & mask, where mask = PyDict_MINSIZE - 1 for a new table (but that's not really important). Just note that the initial slot, i, that is checked depends on the hash of the key.
If that slot is empty, the entry is added to the slot (by entry, I mean <hash|key|value>). But what if that slot is occupied!? Most likely because another entry has the same hash (hash collision!).
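
The initial slot computation can be tried directly. The helper initial_slot below is a hypothetical name for the expression hash(key) & mask from the text, and the sketch assumes a fresh table of 8 slots.

```python
MIN_SIZE = 8          # initial slot count stated in the text
mask = MIN_SIZE - 1   # 0b111: keeps the low three bits of the hash


def initial_slot(key, mask=mask):
    """Return the first slot index checked for `key` (hypothetical helper)."""
    return hash(key) & mask


# Small ints hash to themselves in CPython, so these results are predictable.
for key in (1, 9, 17, 6):
    print(key, "->", initial_slot(key))
```

Keys 1, 9 and 17 all map to slot 1 even though their hashes differ, which shows that two entries can compete for the same initial slot without having equal hashes.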

 
If the slot is occupied, CPython (and even PyPy) compares the hash and the key (by compare I mean == comparison, not the is comparison) of the entry in the slot against the key of the current entry to be inserted (dictobject.c:337,344-345). If both match, then it considers the entry to already exist: no new slot is used (the existing entry's value is replaced) and it moves on to the next entry to be inserted. If either the hash or the key doesn't match, it starts probing.
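
The difference between == and is matters in practice. As a small illustration, 1, 1.0 and True are distinct objects, yet they hash equally and compare equal, so a dict treats them as one key.

```python
a, b, c = 1, 1.0, True

d = {}
d[a] = "int"
d[b] = "float"   # same hash and a == b: the existing entry is reused
d[c] = "bool"    # likewise

print(a is b)                           # False: different objects
print(hash(a) == hash(b) == hash(c))    # True
print(len(d))                           # 1
print(d[1])                             # bool
print(list(d))                          # [1]
```

The stored key stays the first one inserted (1); only the value is replaced by later assignments.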
 
Probing just means it searches slot by slot to find an empty slot. Technically we could just go one by one, i+1, i+2, ... and use the first available one (that's linear probing). But for reasons explained beautifully in the comments (see dictobject.c:33-126), CPython uses random probing. In random probing, the next slot is picked in a pseudo-random order. The entry is added to the first empty slot. For this discussion, the actual algorithm used to pick the next slot is not really important (see dictobject.c:33-126 for the probing algorithm). What's important is that the slots are probed until the first empty slot is found.
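
For readers who do want a feel for the pseudo-random order, here is a minimal sketch of the recurrence described in the dictobject.c comments: the next index is 5*i + perturb + 1, and perturb is shifted right by PERTURB_SHIFT bits after each step. The generator probe_sequence is a hypothetical name, the 64-bit masking of the hash is an assumption, and the exact order of the two operations differs between CPython versions, so treat this as an approximation rather than the exact C code.

```python
from itertools import islice

PERTURB_SHIFT = 5  # shift constant used in CPython's dictobject.c


def probe_sequence(key_hash, mask):
    """Yield slot indices to try for key_hash, in probing order (sketch)."""
    perturb = key_hash & 0xFFFFFFFFFFFFFFFF  # assumption: treat hash as unsigned 64-bit
    i = perturb & mask
    yield i
    while True:
        # Masking every step gives the same indices as masking only on use.
        i = (5 * i + perturb + 1) & mask
        yield i
        perturb >>= PERTURB_SHIFT


# First few slots probed for a key whose hash is 1 in an 8-slot table.
print(list(islice(probe_sequence(1, 7), 9)))
```

Once perturb has been shifted down to zero, the recurrence i = 5*i + 1 (mod table size) cycles through every slot, so an empty slot is always found if one exists.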
 
The same thing happens for lookups: it just starts with the initial slot i (where i depends on the hash of the key). If the hash and the key don't both match the entry in the slot, it starts probing until it finds a slot with a match. If no matching slot is found (the probe reaches an empty slot), it reports a failure.
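
Insertion and lookup can be put together in a toy model. The class below, ToyTable, is a hypothetical teaching aid: it has a fixed size, never resizes, does not support deletion, and uses plain linear probing instead of CPython's perturbed sequence to keep the code short.

```python
class ToyTable:
    """Fixed-size open-addressing table (hypothetical teaching model)."""

    def __init__(self, size=8):
        self.slots = [None] * size      # each slot: None or (hash, key, value)
        self.mask = size - 1            # size must be a power of two

    def _probe(self, key_hash):
        # Linear probing for brevity; CPython uses a perturbed sequence.
        start = key_hash & self.mask
        for step in range(len(self.slots)):
            yield (start + step) & self.mask

    def insert(self, key, value):
        h = hash(key)
        for i in self._probe(h):
            entry = self.slots[i]
            if entry is None or (entry[0] == h and entry[1] == key):
                self.slots[i] = (h, key, value)   # new entry or overwrite
                return
        raise RuntimeError("table full (a real dict would have resized)")

    def lookup(self, key):
        h = hash(key)
        for i in self._probe(h):
            entry = self.slots[i]
            if entry is None:
                break                   # empty slot: key cannot be further on
            if entry[0] == h and entry[1] == key:
                return entry[2]
        raise KeyError(key)


t = ToyTable()
t.insert(1, "a")
t.insert(9, "b")    # 9 & 7 == 1: collides with key 1, lands in the next slot
t.insert(1, "c")    # existing key: value is overwritten in place
print(t.lookup(1), t.lookup(9))   # c b
```

Lookup follows exactly the same probe path as insertion, which is why a key that collided on the way in is still found on the way out.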
 
BTW, the dict will be resized if it is two-thirds full. This avoids slowing down lookups (see dictobject.h:64-65).

The rules that Python implements are: initially, the hash table of a dict has 8 slots (the PyDict_MINSIZE constant), and when the table is two-thirds full it is resized to keep the number of index collisions low. When the number of keys is below 50k, the size is multiplied by 4; when it is above 50k, the size is multiplied by 2. It is important to note that all keys are reinserted on every resize (from the perspective of the probing algorithm above, the initial slot i changes, so each index has to be recalculated), so the order of the keys is likely to change.
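
The growth rule can be simulated with simple arithmetic. This sketch follows the rule exactly as stated in the paragraph above (resize at two-thirds full, multiply by 4 below 50k keys and by 2 above). The names needs_resize, next_size and LARGE_DICT are hypothetical, and the real C code rounds sizes and counts deleted slots in ways this simulation ignores.

```python
MIN_SIZE = 8
LARGE_DICT = 50_000   # threshold quoted in the text


def needs_resize(used, size):
    """True once at least two thirds of the slots are in use."""
    return used * 3 >= size * 2


def next_size(used, size):
    """Growth rule as described above: x4 for small dicts, x2 for large."""
    return size * (2 if used > LARGE_DICT else 4)


size, history = MIN_SIZE, [MIN_SIZE]
for used in range(1, 1001):          # simulate 1000 insertions
    if needs_resize(used, size):
        size = next_size(used, size)
        history.append(size)

print(history)   # [8, 32, 128, 512, 2048]
```

Only four resizes are needed for a thousand insertions, which is why the cost of reinserting every key stays small on average.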

English part: http://stackoverflow.com/questions/327311/how-are-pythons-built-in-dictionaries-implemented

Chinese part: http://zhoutall.com/archives/497

Concrete implementation: http://www.laurentluce.com/posts/python-dictionary-implementation/
