Hashtable in PHP Kernel

I. Hash table definition

A hash table maps a key to a record in a table by running the key through a specified hash function; the array that holds those records is the hash table.
Here, the hash can be any function, such as MD5, CRC32, SHA-1, or a custom implementation.

II. Hashtable performance

A hashtable is a data structure with high lookup performance, and many languages provide an implementation.
Ideally, a hashtable lookup is O(1): the cost is concentrated in computing hash(key), whose result locates the record in the table directly.
In practice, it often happens that key1 != key2 but hash(key1) == hash(key2). This is a hash collision. The lower the collision probability, the better the hashtable performs. On the other hand, a hash algorithm that is too complex also hurts hashtable performance, because it runs on every access.
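To make the collision case concrete, here is a minimal C sketch. The hash used (sum of the key's bytes modulo the table size) is a deliberately weak toy chosen for illustration, and the names `toy_hash` and `collides` are hypothetical; this is not the function PHP uses.

```c
#include <assert.h>
#include <stddef.h>

/* Toy hash (illustration only): sum of the key's bytes, modulo table_size. */
static unsigned toy_hash(const char *key, unsigned table_size)
{
    unsigned sum = 0;
    assert(table_size > 0);
    for (size_t i = 0; key[i] != '\0'; i++) {
        sum += (unsigned char)key[i];
    }
    return sum % table_size;
}

/* Returns 1 when the two keys are mapped to the same slot, 0 otherwise. */
static int collides(const char *key1, const char *key2, unsigned table_size)
{
    return toy_hash(key1, table_size) == toy_hash(key2, table_size);
}
```

With this toy hash, "ab" and "ba" are different keys with the same byte sum, so they always share a slot, while "ab" and "ac" land in different slots of an 8-slot table.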

III. Understanding the PHP hash table implementation

Hashtables are also used widely inside the PHP kernel, for example in thread safety, global variables, and resource management.
Arrays in PHP scripts (a PHP array is essentially a hashtable) are equally common: configuration files written as arrays, database query results, and so on.
Since PHP arrays are used so heavily, how are they implemented internally? How does the kernel resolve hash collisions and distribute elements evenly? And what should you watch for when using arrays in PHP scripts?

First, a diagram gives a general picture of the PHP hashtable implementation. Correction: in the diagram, the linked list used to resolve hash collisions is drawn as a singly linked list.
As the zend_hash_move_backwards_ex and zend_hash_del_key_or_index functions in Zend/zend_hash.c show, a doubly linked list is actually used.


The following code is used for further analysis.

1) Implementation of hashtable in PHP Kernel

PHP implements its hashtable mainly with two data structures: Bucket and HashTable.
Seen from a PHP script, a HashTable corresponds to an array and a Bucket to one element of that array. A multidimensional array is simply a HashTable in which a Bucket stores another HashTable.

HashTable structure:

    typedef struct _hashtable {
        uint nTableSize;            /* table length, not the number of elements */
        uint nTableMask;            /* table mask, always equal to nTableSize - 1 */
        uint nNumOfElements;        /* number of stored elements */
        ulong nNextFreeElement;     /* next free numeric index */
        Bucket *pInternalPointer;   /* records the current element during traversal, e.g. foreach */
        Bucket *pListHead;
        Bucket *pListTail;
        Bucket **arBuckets;         /* array of slots holding the stored elements */
        dtor_func_t pDestructor;    /* destructor */
        zend_bool persistent;       /* whether the table is allocated persistently; this suggests
                                       a table can stay in memory instead of being rebuilt on
                                       every request */
        unsigned char nApplyCount;
        zend_bool bApplyProtection;
    } HashTable;


Bucket structure:

    typedef struct bucket {
        ulong h;                    /* numeric index, or the integer hash of a string key */
        uint nKeyLength;            /* length of the string key (0 for a numeric index) */
        void *pData;                /* address where the data is actually stored */
        void *pDataPtr;             /* storage for data that is itself a pointer */
        struct bucket *pListNext;
        struct bucket *pListLast;
        struct bucket *pNext;       /* next element of the doubly linked list */
        struct bucket *pLast;       /* previous element of the doubly linked list */
        char arKey[1];              /* Must be last element */
    } Bucket;


The hash function of the PHP kernel hash table is very simple: the slot is the result of (h & ht->nTableMask), where h is the numeric key or the integer computed from a string key. The likely aim is to keep the hash step cheap and improve performance.


1.1) How to Create a hashtable
$array = array();

    /* Some code is omitted; only the main logic is shown. */
    ZEND_API int _zend_hash_init(HashTable *ht, uint nSize, hash_func_t pHashFunction,
                                 dtor_func_t pDestructor, zend_bool persistent ZEND_FILE_LINE_DC)
    {
        uint i = 3;
        Bucket **tmp;

        SET_INCONSISTENT(HT_OK);

        if (nSize >= 0x80000000) {
            /* The maximum table length is 0x80000000 (2147483648 in decimal). */
            /* prevent overflow */
            ht->nTableSize = 0x80000000;
        } else {
            /* The table length is rounded up to a power of 2. For example, 10
             * elements get a table of length 16, and 100 elements get 128.
             * The minimum length is 8, not 0, because i starts at 3 and
             * 1 << 3 = 8. */
            while ((1U << i) < nSize) {
                i++;
            }
            ht->nTableSize = 1 << i;
        }

        ht->nTableMask = ht->nTableSize - 1;

        /* ... */

        return SUCCESS;
    }

As the code above shows, even an empty array, or an array with fewer than 8 elements, gets a hashtable of length 8. Likewise, an array of 100 elements is allocated a hashtable of length 128, and so on.
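The sizing rule can be tried in isolation with a small sketch. The helper names `table_size_for` and `table_mask_for` are hypothetical; they only mirror the rounding loop described here and are not part of the Zend API.

```c
#include <assert.h>

/* Round a requested element count up to a power of two, with a minimum
 * of 8 and a cap of 0x80000000 (hypothetical helper, for illustration). */
static unsigned int table_size_for(unsigned int requested)
{
    unsigned int i = 3;                 /* start at 1 << 3 = 8 */
    unsigned int size;

    if (requested >= 0x80000000u) {
        return 0x80000000u;             /* cap to prevent overflow */
    }
    while ((1u << i) < requested) {
        i++;
    }
    size = 1u << i;
    assert((size & (size - 1)) == 0);   /* always a power of two */
    return size;
}

/* The mask is always the table size minus one. */
static unsigned int table_mask_for(unsigned int requested)
{
    return table_size_for(requested) - 1;
}
```

For example, a request for 10 elements yields a size of 16 and a mask of 15, and a request for 100 elements yields a size of 128.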

1.2) How does the kernel add a numeric index?

In a PHP array, a key can be a number or a string. Inside the kernel, only integer indexes are used: for a string key, the kernel applies the time33 algorithm to convert the string to an integer. The implementation is described in detail below.
$array[0] = "hello hashtable";

    /* Some code is omitted; only the main logic is shown. */
    ZEND_API int _zend_hash_index_update_or_next_insert(HashTable *ht, ulong h, void *pData,
            uint nDataSize, void **pDest, int flag ZEND_FILE_LINE_DC)
    {
        uint nIndex;
        Bucket *p;

        /* ... */
        nIndex = h & ht->nTableMask;
        p = ht->arBuckets[nIndex];
        /* ... */

        p = (Bucket *) pemalloc_rel(sizeof(Bucket) - 1, ht->persistent);
        if (!p) {
            return FAILURE;
        }
        p->nKeyLength = 0; /* Numeric indices are marked by making the nKeyLength == 0 */
        p->h = h;
        INIT_DATA(ht, p, pData, nDataSize);
        if (pDest) {
            *pDest = p->pData;
        }
        ht->arBuckets[nIndex] = p;
        ht->nNumOfElements++;
        return SUCCESS;
    }

The code above also shows that the kernel's hash function is simply h & ht->nTableMask, where h is the index set in PHP and nTableMask equals the allocated table length minus 1.
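Because the table length is always a power of two, masking with length minus 1 gives the same result as a modulo by the length, only cheaper. A minimal sketch, where `slot_for` is a hypothetical helper and not a kernel function:

```c
#include <assert.h>

/* Map an integer index h to a slot in a table of table_size slots.
 * The mask trick is only valid when table_size is a power of two. */
static unsigned long slot_for(unsigned long h, unsigned int table_size)
{
    assert(table_size != 0 && (table_size & (table_size - 1)) == 0);
    return h & (unsigned long)(table_size - 1);
}
```

For a table of length 8, index 10 goes to slot 2, exactly what 10 % 8 would give.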


1.3) How does the kernel handle string indexes in PHP?
$array['index'] = "hello hashtable";

Compared with a numeric index, only one extra step is needed: converting the string to an integer. The algorithm used is time33.
The implementation follows. Starting from 5381, it processes the string one character at a time: the running hash is multiplied by 33, computed as (hash << 5) + hash, and the character's byte value is added.
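The same rule can be written as a plain loop, which may be easier to read than an unrolled version. This is a sketch under assumptions: the name `times33` is hypothetical, and bytes are read as unsigned char, which matches the kernel function for ASCII keys.

```c
#include <assert.h>
#include <stddef.h>

/* time33 as a simple loop: hash = hash * 33 + byte, starting from 5381.
 * hash * 33 is the same value as (hash << 5) + hash. */
static unsigned long times33(const char *key, size_t len)
{
    unsigned long hash = 5381;

    for (size_t i = 0; i < len; i++) {
        hash = hash * 33 + (unsigned char)key[i];
    }
    return hash;
}
```

For the one-character key "a" (byte value 97), the result is 5381 * 33 + 97 = 177670.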

    static inline ulong zend_inline_hash_func(const char *arKey, uint nKeyLength)
    {
        register ulong hash = 5381;

        /* variant with the hash unrolled eight times */
        for (; nKeyLength >= 8; nKeyLength -= 8) {
            hash = ((hash << 5) + hash) + *arKey++;
            hash = ((hash << 5) + hash) + *arKey++;
            hash = ((hash << 5) + hash) + *arKey++;
            hash = ((hash << 5) + hash) + *arKey++;
            hash = ((hash << 5) + hash) + *arKey++;
            hash = ((hash << 5) + hash) + *arKey++;
            hash = ((hash << 5) + hash) + *arKey++;
            hash = ((hash << 5) + hash) + *arKey++;
        }
        switch (nKeyLength) {
            case 7: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
            case 6: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
            case 5: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
            case 4: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
            case 3: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
            case 2: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
            case 1: hash = ((hash << 5) + hash) + *arKey++; break;
            case 0: break;
        }
        return hash;
    }

zend_hash.c:

    /* Some code is omitted; only the main logic is shown. */
    ZEND_API int _zend_hash_add_or_update(HashTable *ht, const char *arKey, uint nKeyLength,
            void *pData, uint nDataSize, void **pDest, int flag ZEND_FILE_LINE_DC)
    {
        ulong h;
        uint nIndex;
        Bucket *p;

        h = zend_inline_hash_func(arKey, nKeyLength); /* string to integer */
        nIndex = h & ht->nTableMask;
        p = ht->arBuckets[nIndex];
        /* ... */

        /* Room for the key is allocated together with the bucket. */
        p = (Bucket *) pemalloc_rel(sizeof(Bucket) - 1 + nKeyLength, ht->persistent);
        if (!p) {
            return FAILURE;
        }
        memcpy(p->arKey, arKey, nKeyLength);
        p->nKeyLength = nKeyLength; /* non-zero, unlike a numeric index */
        p->h = h;
        INIT_DATA(ht, p, pData, nDataSize);
        if (pDest) {
            *pDest = p->pData;
        }
        ht->arBuckets[nIndex] = p;
        ht->nNumOfElements++;
        return SUCCESS;
    }


2) How the kernel achieves even distribution and resolves hash collisions

2.1) Even distribution
Even distribution means spreading all the elements to be stored evenly across the slots of the hashtable.
Deciding which slot an element goes to is the job of the hash function, so its implementation directly determines how even the distribution is.
As mentioned above, the PHP kernel does this in a simple way: h & ht->nTableMask.
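How even the result is depends on the keys. The following sketch counts the fullest slot for a run of integer keys in a table of length 8; `max_slot_load` and `TABLE_SIZE` are hypothetical names chosen for this illustration.

```c
#include <assert.h>

#define TABLE_SIZE 8u   /* assumed table length; must be a power of two */

/* Insert `count` integer keys (first, first + step, first + 2*step, ...)
 * using slot = h & (TABLE_SIZE - 1) and return the load of the fullest slot. */
static unsigned int max_slot_load(unsigned long first, unsigned long step,
                                  unsigned int count)
{
    unsigned int load[TABLE_SIZE] = {0};
    unsigned int max = 0;

    assert((TABLE_SIZE & (TABLE_SIZE - 1)) == 0);
    for (unsigned int i = 0; i < count; i++) {
        unsigned long h = first + (unsigned long)i * step;
        unsigned int slot = (unsigned int)(h & (TABLE_SIZE - 1));

        if (++load[slot] > max) {
            max = load[slot];
        }
    }
    return max;
}
```

Sixteen consecutive keys spread perfectly (two per slot), whereas sixteen keys that are all multiples of 8 pile up in slot 0. The mask is cheap, but it relies on the keys themselves being well spread.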

2.2) Hash collision

A hash collision occurs when key1 != key2 but the hash algorithm yields hash(key1) == hash(key2).
In the PHP kernel, this means key1 != key2 but key1 & ht->nTableMask is equal to key2 & ht->nTableMask.
The PHP kernel uses a doubly linked list to store colliding data. That is, each bucket is itself a node of a doubly linked list, and when a collision occurs the colliding entries are linked one after another in that list.
If no collision occurs, the bucket is a doubly linked list of length 1.

    ZEND_API int zend_hash_find(const HashTable *ht, const char *arKey, uint nKeyLength, void **pData)
    {
        ulong h;
        uint nIndex;
        Bucket *p;

        IS_CONSISTENT(ht);

        h = zend_inline_hash_func(arKey, nKeyLength);
        nIndex = h & ht->nTableMask;

        p = ht->arBuckets[nIndex];
        /* Finding an element in the slot is not enough to return it: h and
         * nKeyLength are compared first, then the key itself, to rule out a
         * hash collision. The loop walks the linked list until its end. */
        while (p != NULL) {
            if ((p->h == h) && (p->nKeyLength == nKeyLength)) {
                if (!memcmp(p->arKey, arKey, nKeyLength)) {
                    *pData = p->pData;
                    return SUCCESS;
                }
            }
            p = p->pNext;
        }
        return FAILURE;
    }
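The chaining and lookup logic can be reproduced in a small standalone sketch. Everything here (`Node`, `Table`, `demo_table`, `table_put`, `table_find`, `table_get`, the fixed length of 8, and linking new nodes at the head of the chain) is an assumption made for illustration; it follows the same idea but is not the Zend implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 8u                  /* power of two, like nTableSize */
#define TABLE_MASK (TABLE_SIZE - 1)    /* like nTableMask */

typedef struct Node {
    unsigned long h;       /* full hash of the key */
    size_t key_len;
    char *key;
    int value;
    struct Node *next;     /* next node in the same slot */
    struct Node *prev;     /* previous node in the same slot */
} Node;

typedef struct Table {
    Node *slots[TABLE_SIZE];
} Table;

static Table demo_table;   /* zero-initialized: every slot starts empty */

static unsigned long times33(const char *key, size_t len)
{
    unsigned long hash = 5381;
    for (size_t i = 0; i < len; i++) {
        hash = hash * 33 + (unsigned char)key[i];
    }
    return hash;
}

/* Walk the slot's chain; compare hash and length before the key bytes. */
static Node *table_find(const Table *t, const char *key)
{
    size_t len = strlen(key);
    unsigned long h = times33(key, len);

    for (Node *p = t->slots[h & TABLE_MASK]; p != NULL; p = p->next) {
        if (p->h == h && p->key_len == len && memcmp(p->key, key, len) == 0) {
            return p;
        }
    }
    return NULL;
}

/* Insert or update. Returns 0 on success, -1 if allocation fails. */
static int table_put(Table *t, const char *key, int value)
{
    size_t len = strlen(key);
    Node *p = table_find(t, key);
    Node **head;

    if (p != NULL) {
        p->value = value;
        return 0;
    }
    p = malloc(sizeof *p);
    if (p == NULL) {
        return -1;
    }
    p->key = malloc(len + 1);
    if (p->key == NULL) {
        free(p);
        return -1;
    }
    memcpy(p->key, key, len + 1);
    p->key_len = len;
    p->h = times33(key, len);
    p->value = value;

    /* Link the new node at the head of the slot's doubly linked chain. */
    head = &t->slots[p->h & TABLE_MASK];
    p->prev = NULL;
    p->next = *head;
    if (*head != NULL) {
        (*head)->prev = p;
    }
    *head = p;
    return 0;
}

/* Returns the stored value, or `fallback` when the key is absent. */
static int table_get(const Table *t, const char *key, int fallback)
{
    const Node *p = table_find(t, key);
    return p != NULL ? p->value : fallback;
}
```

With a table of length 8, the keys "a" and "i" hash to the same slot, so both end up in one chain and are told apart only by the comparison inside `table_find`.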

Next, I will write an introduction to distributed storage based on hash algorithms.

Address: http://blog.csdn.net/a600423444/article/details/8850617
