In-Depth Understanding of the PHP Kernel (VI): Hash Tables and PHP's Hash Table Implementation
Original link: http://www.orlion.ga/241/
I. Hash table (HashTable)
Hash tables are used in most dynamic-language implementations. A hash table is a data structure that uses a hash function to map a particular key to a particular value, maintaining a one-to-one correspondence between keys and values.
Key: the identifier used to operate on the data, such as an index in a PHP array, a string key, and so on.
Slot (slot/bucket): a cell in the hash table that holds data, that is, the container in which the data is actually stored.
Hash function: a function that maps a key to the location of the slot where the data should be stored.
Hash collision: the situation where the hash function maps two different keys to the same index.
There are two methods for resolving hash collisions: chaining (the link method) and open addressing.
1. Conflict resolution
(1) Chaining
Chaining resolves collisions by using a linked list to hold the values of a slot: when different keys map to the same slot, the values are stored in that slot's linked list. (This is the approach PHP uses.)
(2) Open addressing
With open addressing, the slot itself holds the data directly. When data is inserted and the index the key maps to already holds data, a collision has occurred. The insert then moves on to the next slot, and if that slot is also occupied it keeps going until it finds an unoccupied slot. Lookup follows the same probing strategy.
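The open addressing strategy described above can be sketched with linear probing. This is only an illustrative sketch and is not taken from PHP: the names OaTable, OaSlot, oa_insert and oa_lookup, the fixed size of 8 and the additive hash are all assumptions made for the example, and removal (which would need tombstone markers) is left out.

```c
#include <stddef.h>
#include <string.h>

#define OA_SIZE 8   /* hypothetical fixed table size for this sketch */

typedef struct {
    const char *key;   /* NULL marks an empty slot */
    int value;
} OaSlot;

typedef struct {
    OaSlot slots[OA_SIZE];
} OaTable;

/* Simple additive hash, reduced to a slot index. */
static size_t oa_index(const char *key)
{
    size_t sum = 0;
    for (; *key != '\0'; ++key) {
        sum += (unsigned char)*key;
    }
    return sum % OA_SIZE;
}

/* Insert or update; returns 0 on success, -1 if every slot is occupied. */
static int oa_insert(OaTable *t, const char *key, int value)
{
    size_t start = oa_index(key);
    for (size_t step = 0; step < OA_SIZE; ++step) {
        OaSlot *s = &t->slots[(start + step) % OA_SIZE];
        if (s->key == NULL || strcmp(s->key, key) == 0) {
            s->key = key;      /* the table borrows the caller's string */
            s->value = value;
            return 0;
        }
    }
    return -1;
}

/* Lookup probes the same sequence; reaching an empty slot means the key is absent. */
static int oa_lookup(const OaTable *t, const char *key, int *out)
{
    size_t start = oa_index(key);
    for (size_t step = 0; step < OA_SIZE; ++step) {
        const OaSlot *s = &t->slots[(start + step) % OA_SIZE];
        if (s->key == NULL) {
            return -1;
        }
        if (strcmp(s->key, key) == 0) {
            *out = s->value;
            return 0;
        }
    }
    return -1;
}
```

A zero-initialized OaTable is an empty table. The keys "ab" and "ba" have the same character sum, so inserting both places the second one in the slot immediately after the first.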
2. Implementing a hash table
Implementing a hash table mostly comes down to three things:
* Implementing the hash function
* Resolving collisions
* Implementing the operation interface
(1) Data structure
First we need a container to hold our hash table. What the table needs to store is mainly the data itself. In addition, so that the number of elements currently stored can be found easily, a size field is kept alongside the container that holds the data. Below we implement a simple hash table. There are two basic data structures: one for the hash table itself, and a singly linked list node that actually holds the data. They are defined as follows:
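typedef struct _Bucket
{
    char *key;
    void *value;
    struct _Bucket *next;
} Bucket;

typedef struct _HashTable
{
    int size;           // number of slots in the table
    int elem_num;       // number of elements stored (used by hash_insert)
    Bucket **buckets;   // array of slots, each the head of a linked list
} HashTable;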
The definition above is similar to the implementation in PHP. To keep things simple, the key is restricted to a string, while the stored value can be of any type.
The Bucket structure is a singly linked list node, which is how hash collisions are resolved: when multiple keys map to the same index, the colliding elements are linked together.
(2) Hash function implementation
We use the simplest possible hashing algorithm: add up all the characters of the key string, then take the result modulo the hash table size so that the index falls within the bounds of the slot array.
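static int hash_str(char *key)
{
    int hash = 0;
    char *cur = key;

    // Add up every character up to, but not including, the terminating '\0'.
    while (*cur != '\0') {
        hash += (unsigned char)*cur;    // cast so the sum never goes negative
        ++cur;
    }

    return hash;
}

// Use this macro to obtain the index of key in the hash table
#define HASH_INDEX(ht, key) (hash_str((key)) % (ht)->size)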
PHP itself uses a hash algorithm called DJBX33A. To operate on the hash table, the following functions are defined:
int hash_init(HashTable *ht);                               // initialize the hash table
int hash_lookup(HashTable *ht, char *key, void **result);   // look up content by key
int hash_insert(HashTable *ht, char *key, void *value);     // insert content into the hash table
int hash_remove(HashTable *ht, char *key);                  // remove the content that key maps to
int hash_destroy(HashTable *ht);
The insert and lookup functions are shown below as examples:
int hash_insert(HashTable *ht, char *key, void *value)
{
    // Check whether the hash table needs to be resized. The table is not of
    // fixed size: when the inserted content is about to fill its storage,
    // the table is expanded to accommodate all elements.
    resize_hash_table_if_needed(ht);

    int index = HASH_INDEX(ht, key);    // find the index the key maps to

    Bucket *org_bucket = ht->buckets[index];
    Bucket *bucket = (Bucket *)malloc(sizeof(Bucket));  // allocate space for the new element

    bucket->key = strdup(key);
    // Store the value. Only the pointer is stored here;
    // the content it points to is not copied.
    bucket->value = value;
    bucket->next = NULL;    // malloc does not zero memory, so terminate the list explicitly

    LOG_MSG("Insert data p: %p\n", value);

    ht->elem_num += 1;      // record the number of elements now in the hash table

    if (org_bucket != NULL) {
        // Collision: place the new element at the head of the list
        LOG_MSG("Index collision found with org hashtable: %p\n", org_bucket);
        bucket->next = org_bucket;
    }

    ht->buckets[index] = bucket;

    LOG_MSG("Element inserted at index %i, now we have: %i elements\n",
        index, ht->elem_num);

    return SUCCESS;
}
Lookup first finds the slot the key maps to. If the slot contains elements, the key of each element in the linked list is compared in turn with the key being looked up until a match is found. If none matches, the lookup reports that there is no matching content.
int hash_lookup(HashTable *ht, char *key, void **result)
{
    int index = HASH_INDEX(ht, key);
    Bucket *bucket = ht->buckets[index];

    if (bucket == NULL) return FAILED;

    // Walk the linked list to find the right element. Normally the list
    // should contain only one element, so the loop rarely runs more than
    // once. Ensuring this requires a suitable hashing algorithm.
    while (bucket)
    {
        if (strcmp(bucket->key, key) == 0)
        {
            LOG_MSG("HashTable found key in index: %i with key: %s value: %p\n",
                index, key, bucket->value);
            *result = bucket->value;
            return SUCCESS;
        }

        bucket = bucket->next;
    }

    LOG_MSG("HashTable lookup missed the key: %s\n", key);
    return FAILED;
}
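The article shows only insert and lookup. As a rough sketch of how the declared hash_init and hash_remove might look for the same structures, consider the following. The initial size of 8, the values of SUCCESS and FAILED, and the assumption that the table owns a heap-allocated copy of each key (as when insert stores a duplicate of it) are choices made for this example, not something the original code confirms.

```c
#include <stdlib.h>
#include <string.h>

#define SUCCESS 0     /* assumed values; the article does not show them */
#define FAILED  -1

typedef struct _Bucket {
    char *key;
    void *value;
    struct _Bucket *next;
} Bucket;

typedef struct _HashTable {
    int size;
    int elem_num;
    Bucket **buckets;
} HashTable;

static int hash_str(char *key)
{
    int hash = 0;
    for (char *cur = key; *cur != '\0'; ++cur) {
        hash += (unsigned char)*cur;
    }
    return hash;
}

#define HASH_INDEX(ht, key) (hash_str((key)) % (ht)->size)

/* Allocate a zeroed slot array so every slot starts out as an empty list. */
int hash_init(HashTable *ht)
{
    ht->size = 8;       /* hypothetical initial size */
    ht->elem_num = 0;
    ht->buckets = calloc((size_t)ht->size, sizeof(Bucket *));
    return ht->buckets != NULL ? SUCCESS : FAILED;
}

/* Unlink and free the element stored under key, if there is one. */
int hash_remove(HashTable *ht, char *key)
{
    int index = HASH_INDEX(ht, key);
    /* link always points at the pointer that references the current node,
     * so the head of the list and interior nodes are handled the same way. */
    Bucket **link = &ht->buckets[index];

    while (*link != NULL) {
        Bucket *cur = *link;
        if (strcmp(cur->key, key) == 0) {
            *link = cur->next;
            free(cur->key);     /* assumes the key was heap-allocated on insert */
            free(cur);
            ht->elem_num -= 1;
            return SUCCESS;
        }
        link = &cur->next;
    }
    return FAILED;
}
```

Using a pointer to the link rather than a pointer to the node avoids a separate case for removing the first element of a slot's list.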
Arrays in PHP are implemented on top of a hash table. Elements added to an array are ordered, whereas in the hash table shown here the elements are physically spread more or less evenly across the slots, so they cannot be retrieved in insertion order. In PHP's implementation, the Bucket structure therefore maintains additional pointer fields to preserve the ordering relationship between elements.
II. PHP's hash table implementation
1. The hash table implementation in PHP
The hash table is a very important data structure in PHP, and most language features are built on it. For example, variable scopes and variable storage, the implementation of classes, and much of the Zend engine's internal data are stored in hash tables.
(1) Data structure and description
Zend uses doubly linked lists to hold the data so that the relationships between elements are preserved.
(2) Hash table structure
PHP's hash table is implemented in Zend/zend_hash.c. PHP uses the following two data structures to implement it: the HashTable structure holds the basic information needed for the table as a whole, and the Bucket structure holds the actual data. They are as follows:
typedef struct _hashtable {
    uint nTableSize;            // size of the hash bucket array; minimum 8, grows by doubling
    uint nTableMask;            // nTableSize - 1, used to optimize index computation
    uint nNumOfElements;        // number of elements currently in the table; count() returns this value directly
    ulong nNextFreeElement;     // position of the next numeric index
    Bucket *pInternalPointer;   // current traversal position (one of the reasons foreach is faster than for)
    Bucket *pListHead;          // pointer to the head element of the array
    Bucket *pListTail;          // pointer to the tail element of the array
    Bucket **arBuckets;         // the hash bucket array
    dtor_func_t pDestructor;
    zend_bool persistent;
    unsigned char nApplyCount;  // number of times this table is currently being visited recursively (guards against repeated recursion)
    zend_bool bApplyProtection; // whether repeated access to this table is restricted; when it is, recursion is limited to at most 3 levels
#if ZEND_DEBUG
    int inconsistent;
#endif
} HashTable;
The nTableSize field indicates the capacity of the hash table. The minimum initial capacity is 8. First, look at the hash table's initialization function:
ZEND_API int _zend_hash_init(HashTable *ht, uint nSize, hash_func_t pHashFunction,
                             dtor_func_t pDestructor, zend_bool persistent ZEND_FILE_LINE_DC)
{
    uint i = 3;
    //...
    if (nSize >= 0x80000000) {
        /* prevent overflow */
        ht->nTableSize = 0x80000000;
    } else {
        while ((1U << i) < nSize) {
            i++;
        }
        ht->nTableSize = 1 << i;
    }
    // ...
    ht->nTableMask = ht->nTableSize - 1;

    /* Uses ecalloc() so that Bucket* == NULL */
    if (persistent) {
        tmp = (Bucket **) calloc(ht->nTableSize, sizeof(Bucket *));
        if (!tmp) {
            return FAILURE;
        }
        ht->arBuckets = tmp;
    } else {
        tmp = (Bucket **) ecalloc_rel(ht->nTableSize, sizeof(Bucket *));
        if (tmp) {
            ht->arBuckets = tmp;
        }
    }

    return SUCCESS;
}
For example, if the initial size is set to 10, the algorithm above adjusts it to 16. In other words, the size is always rounded up to the nearest power of 2 that is not smaller than the requested size, and never below 8.
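The rounding rule can be isolated into a small standalone function. This sketch mirrors the loop in the initialization function shown above. The name round_up_table_size and the use of uint32_t are assumptions for the example and are not part of the Zend API.

```c
#include <stdint.h>

/* Start at 2^3 = 8 and keep doubling until the size is at least n_size.
 * Requests of 2^31 or more are capped at 2^31 to avoid overflowing the shift. */
static uint32_t round_up_table_size(uint32_t n_size)
{
    uint32_t i = 3;

    if (n_size >= 0x80000000u) {
        return 0x80000000u;
    }
    while ((UINT32_C(1) << i) < n_size) {
        i++;
    }
    return UINT32_C(1) << i;
}
```

With this function a request for 10 slots gives 16, a request for 8 stays at 8, and any request below 8 is raised to 8.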
Why adjust it this way? Let's look at how the HashTable maps a hash value to a slot:
h = zend_inline_hash_func(arKey, nKeyLength);
nIndex = h & ht->nTableMask;
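As the _zend_hash_init() function above shows, ht->nTableMask is ht->nTableSize - 1. A bitwise AND (&) is used here instead of a modulo operation because modulo is considerably more expensive than bitwise AND. This works only because the table size is a power of 2: the mask then has all of its low bits set, so h & nTableMask gives the same result as h % nTableSize.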
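To make the two steps concrete, here is a small sketch of a times-33 string hash in the DJBX33A style followed by mask-based indexing. It is a simplified illustration and not a copy of zend_inline_hash_func: the names times33_hash and slot_index, the use of uint32_t, and the cast of each byte to unsigned char are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Times-33 hash: start from 5381, then hash = hash * 33 + byte for each byte.
 * (hash << 5) + hash is hash * 33 written with a shift and an add. */
static uint32_t times33_hash(const char *key, size_t len)
{
    uint32_t hash = 5381;
    for (size_t i = 0; i < len; ++i) {
        hash = ((hash << 5) + hash) + (unsigned char)key[i];
    }
    return hash;
}

/* Map a hash value to a slot. table_size must be a power of two, so that
 * table_size - 1 is a mask with all low bits set. */
static uint32_t slot_index(uint32_t h, uint32_t table_size)
{
    return h & (table_size - 1);
}
```

For any power-of-two table_size, slot_index(h, table_size) equals h % table_size, which is why the table size is always rounded to a power of 2.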
After the size of the hash table has been set, storage must be allocated for it. As the initialization code above shows, a different allocation method is called depending on whether the table needs to be persistent. Persistence was described in the earlier discussion of the PHP life cycle: persistent content remains accessible across multiple requests, while non-persistent storage is freed at the end of the request. The details are covered in the memory management chapter.
The nNumOfElements field in HashTable is easy to understand: it is updated every time an element is inserted or removed with unset, so the count() function can return quickly when asked for the number of elements in an array.
To see what the nNextFreeElement field is for, first look at a piece of PHP code:
<?php
$a = array(10 => 'Hello');
$a[] = 'TIPI';
var_dump($a);

// output
array(2) {
  [10]=>
  string(5) "Hello"
  [11]=>
  string(4) "TIPI"
}
In PHP, you can add an element to an array without specifying an index, in which case a number is used as the index by default, much like an enumeration in C. The index assigned to such an element is determined by the nNextFreeElement field. If the array already contains numeric keys, the largest numeric key used so far plus 1 is used by default. In the example above there is already an element with key 10, so the newly inserted element gets the default index 11.
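The bookkeeping behind this behaviour can be modelled in a few lines. The sketch below tracks only the counter, not the stored values. IndexCounter, note_explicit_key and take_next_index are hypothetical names for this example, and using an unsigned counter is a simplification.

```c
/* Hypothetical model of the next-free-index counter. */
typedef struct {
    unsigned long next_free;    /* plays the role of nNextFreeElement */
} IndexCounter;

/* Called when an element is stored under an explicit numeric key:
 * the counter only ever moves forward, past the largest key seen. */
static void note_explicit_key(IndexCounter *c, unsigned long key)
{
    if (key >= c->next_free) {
        c->next_free = key + 1;
    }
}

/* Called for an append such as $a[] = ...: returns the index to use
 * and advances the counter. */
static unsigned long take_next_index(IndexCounter *c)
{
    return c->next_free++;
}
```

Starting from a zeroed counter, noting the explicit key 10 and then appending yields index 11, matching the PHP example. Noting a smaller key such as 3 afterwards does not move the counter backwards.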
Next, let's look at the Bucket data structure, which holds the hash table's data:
typedef struct bucket {
    ulong h;                    // hash value of the char *key, or the user-specified numeric index
    uint nKeyLength;            // length of the hash key; 0 if the array index is a number
    void *pData;                // points to the value, usually a copy of the user's data; for pointer data it points to pDataPtr
    void *pDataPtr;             // for pointer data, this holds the real value and pData above points to it
    struct bucket *pListNext;   // next element in the entire hash table
    struct bucket *pListLast;   // previous element in the entire hash table
    struct bucket *pNext;       // next element in the same hash bucket
    struct bucket *pLast;       // previous element in the same hash bucket
    char arKey[1];
    /* Stores the string key. This member must come last: only 1 byte is
       declared here, and what is actually stored is the value that
       char *key points to, which saves the cost of a separate assignment.
       Since this value is sometimes not needed at all, it also saves space. */
} Bucket;
As the comments on each field above explain, the h field holds the hash value of the key. In PHP, either a string or a number can be used as an array index. Because a numeric index is already unique, hashing it again would be wasteful, so a numeric index is stored in h directly. The nKeyLength field that follows h holds the key length, and if the index is a number, nKeyLength is 0. When an array is defined in PHP, a string key that looks like an integer is converted to a number, so string keys such as '10' and '11' are no different from the numeric indexes 10 and 11.
The Bucket structure maintains two doubly linked lists. The pNext and pLast pointers link the elements that share the same slot.
The pListNext and pListLast pointers, by contrast, link all of the elements in the entire hash table. The pListHead and pListTail fields of the HashTable structure point to the first and last elements of that list.
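A reduced sketch can show how one insertion updates both lists. It reuses the pointer field names from the article, but Node, Table, table_link and the fixed size of 8 are assumptions for this example. Key storage, resizing and memory management are left out, and the caller supplies the node.

```c
#include <stddef.h>

#define TABLE_SIZE 8    /* hypothetical size, a power of two */

typedef struct Node {
    unsigned long h;                    /* hash value or numeric index */
    int value;
    struct Node *pNext, *pLast;         /* collision chain within one slot */
    struct Node *pListNext, *pListLast; /* global list in insertion order */
} Node;

typedef struct {
    Node *slots[TABLE_SIZE];
    Node *pListHead, *pListTail;
} Table;

/* Link a caller-allocated node into both doubly linked lists. */
static void table_link(Table *t, Node *n)
{
    size_t idx = n->h & (TABLE_SIZE - 1);

    /* 1. Push onto the front of the slot's collision chain. */
    n->pLast = NULL;
    n->pNext = t->slots[idx];
    if (n->pNext != NULL) {
        n->pNext->pLast = n;
    }
    t->slots[idx] = n;

    /* 2. Append to the tail of the global list, preserving insertion order. */
    n->pListNext = NULL;
    n->pListLast = t->pListTail;
    if (t->pListTail != NULL) {
        t->pListTail->pListNext = n;
    } else {
        t->pListHead = n;
    }
    t->pListTail = n;
}
```

Walking from pListHead through pListNext visits the nodes in the order they were linked, regardless of which slots they landed in. That ordering is what a purely slot-based table cannot provide.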
Operation interfaces of the hash table:
PHP provides the following kinds of operation interfaces:
* Initialization, such as the zend_hash_init() function, which initializes the hash table, allocates space, and so on.
* Lookup, insertion, deletion, and update, which are the more routine operations.
* Iteration and looping, which are used to traverse a hash table.
* Copying, sorting, reversing, and destroying.