PHP kernel parsing: hash tables in PHP

Last Update: 2018-04-01 Source: Internet

Author: User



The most frequently used data types in PHP are strings and arrays, and much of PHP's ease of use comes from its flexible array type. Before introducing these data types in detail, it is necessary to introduce the hash table (HashTable), an especially critical data structure in the PHP implementation.

Hash tables are widely used in practice. For example, compilers usually maintain a symbol table to store symbols, and many high-level languages support hash tables explicitly. Hash tables generally provide search, insert, and delete operations. In the worst case these operations perform no better than on a linked list, but that case is rare: a well-designed hash algorithm can effectively avoid it. On average, these operations run in O(1) time, which is why hash tables are so popular.
Precisely because hash tables are easy to use and efficient, most dynamic language implementations rely on them.

To help readers follow the rest of this article, we first list the basic concepts of a HashTable implementation. A hash table is a data structure that uses a hash function to map a specific key to a specific value; it maintains a one-to-one correspondence between keys and values.
Key: the identifier used to operate on data, such as an integer index or a string key in a PHP array.

Slot/bucket: a unit used to store data in a hash table, that is, the container where data is actually stored.

Hash function: a function that maps keys to the location of slots where data is stored.

Hash collision: the hash function maps two different keys to the same index.
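To make these terms concrete, here is a minimal sketch that assumes a toy hash function (the hypothetical `toy_hash`, which simply sums character codes) and a hypothetical table of 8 slots. The keys "ab" and "ba" are different, yet both hash to 195 and land in slot 195 % 8 = 3, which is exactly a hash collision.

```c
#include <stddef.h>

#define TOY_SLOTS 8  /* hypothetical number of slots (buckets) */

/* Hypothetical toy hash function: the sum of the key's character codes. */
static unsigned int toy_hash(const char *key)
{
    unsigned int h = 0;
    for (const char *p = key; *p != '\0'; p++) {
        h += (unsigned char) *p;
    }
    return h;
}

/* Map a key to a slot index in [0, TOY_SLOTS). */
static size_t toy_slot(const char *key)
{
    return toy_hash(key) % TOY_SLOTS;
}
```

Because addition ignores character order, any two anagrams collide under this function, which is one reason such a simple hash is a poor choice in practice.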

A hash table can be understood as an extension of an array or an associative array. An array is addressed by numeric subscripts: if the keys are numbers within a small range, we can use an array directly as the hash table. If the key range is too large, however, using an array directly would require allocating space for every possible key. In many cases this is unrealistic, and even when the space is available, utilization would be low. Moreover, keys may not be numbers at all, especially in PHP, so a mapping function (the hash function) is used to map keys into a specific domain:

The code is as follows:
H(key) -> index
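The difference between using an array directly and using a hash function can be sketched as follows. This is a hypothetical example: the direct-address table works only because its keys are assumed to be integers in [0, 100), while `hashed_index` reduces an arbitrary non-negative integer key into a hypothetical table of 16 slots with a plain modulo, standing in for H(key).

```c
#include <stddef.h>

#define KEY_RANGE  100  /* assumed: keys are integers in [0, KEY_RANGE) */
#define TABLE_SIZE 16   /* hypothetical size of the hashed table */

/* Direct addressing: the key itself is the array index, one slot per possible key. */
static const char *direct_table[KEY_RANGE];

static void direct_put(unsigned int key, const char *value)
{
    direct_table[key] = value;  /* caller must guarantee key < KEY_RANGE */
}

static const char *direct_get(unsigned int key)
{
    return direct_table[key];   /* unused slots are NULL (static storage) */
}

/* Hashing: H(key) -> index maps a large key space into TABLE_SIZE slots. */
static size_t hashed_index(unsigned long key)
{
    return (size_t) (key % TABLE_SIZE);
}
```

With this `hashed_index`, the keys 1 and 17 both map to slot 1: shrinking the key space makes collisions unavoidable.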

By designing the hash function properly, we can map keys into a suitable range. Because the key space can be large (string keys, for example), mapping it into a small space means that two different keys may map to the same index; this is what we call a collision. There are currently two main ways to resolve hash collisions: chaining (the link method) and open addressing.

Conflict resolution

Chaining (link method): chaining resolves collisions by storing each slot's values in a linked list; when different keys map to the same slot, they are all kept in that slot's list. In the worst case, when all keys map to the same slot, every operation degrades to a linked-list operation with O(n) time complexity, so choosing a suitable hash function is critical. PHP's HashTable currently uses chaining to resolve collisions.
Open addressing: the other common way to resolve collisions is open addressing, in which data is stored directly in the slots themselves. When inserting, if the slot the key maps to already holds data, a collision has occurred, so the next slot is checked; if that slot is also occupied, probing continues until an unoccupied slot is found. Lookups follow the same probing rule.
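Open addressing can be sketched with linear probing, the simplest probing rule ("try the next slot"). This is a minimal sketch, not PHP's implementation (PHP uses chaining): the table size `OA_SIZE`, the `OaSlot` type, the multiply-by-31 hash, and the functions `oa_insert` and `oa_lookup` are all hypothetical. Deletion is omitted because removing an entry from a probe sequence needs extra care (for example, tombstone markers).

```c
#include <stddef.h>
#include <string.h>

#define OA_SIZE 8  /* hypothetical fixed table size */

typedef struct {
    const char *key;  /* NULL means the slot is unoccupied */
    int value;
} OaSlot;

static OaSlot oa_table[OA_SIZE];

/* Hypothetical string hash (multiply by 31, add each byte), reduced to a slot index. */
static size_t oa_hash(const char *key)
{
    unsigned int h = 0;
    while (*key != '\0') {
        h = h * 31 + (unsigned char) *key++;
    }
    return h % OA_SIZE;
}

/* Insert or update; returns 0 on success, -1 if the table is full. */
static int oa_insert(const char *key, int value)
{
    size_t start = oa_hash(key);
    for (size_t i = 0; i < OA_SIZE; i++) {
        size_t idx = (start + i) % OA_SIZE;  /* on collision, probe the next slot */
        if (oa_table[idx].key == NULL || strcmp(oa_table[idx].key, key) == 0) {
            oa_table[idx].key = key;  /* stores the pointer; caller keeps the key alive */
            oa_table[idx].value = value;
            return 0;
        }
    }
    return -1;
}

/* Follow the same probe sequence; reaching an empty slot means the key is absent. */
static int oa_lookup(const char *key, int *out)
{
    size_t start = oa_hash(key);
    for (size_t i = 0; i < OA_SIZE; i++) {
        size_t idx = (start + i) % OA_SIZE;
        if (oa_table[idx].key == NULL) {
            return -1;
        }
        if (strcmp(oa_table[idx].key, key) == 0) {
            *out = oa_table[idx].value;
            return 0;
        }
    }
    return -1;
}
```

Unlike chaining, the table can fill up: once all `OA_SIZE` slots are occupied, inserting a new key fails, which is why real open-addressing tables grow well before that point.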

Implementation of hash tables

Once the principle of a hash table is understood, implementing one is straightforward. There are only three tasks:
Implement the hash function
Resolve collisions
Implement the operation interface
First, we need a container for the hash table. Its main content is the data stored in it; in addition, to know conveniently how many elements are stored, it keeps an element count alongside the table size, and then the container that actually holds the data. As an example, a simple hash table is implemented below. There are two basic data structures: one for the hash table itself, and a singly linked list node that actually holds the data. The definitions are as follows:

The code is as follows:
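#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Status codes and logging helper assumed by the functions below */
#define SUCCESS 0
#define FAILED  -1
#define LOG_MSG(...) fprintf(stderr, __VA_ARGS__)

typedef struct _Bucket
{
    char *key;
    void *value;
    struct _Bucket *next;   // next element in the same slot's chain
} Bucket;

typedef struct _HashTable
{
    int size;               // number of slots
    int elem_num;           // number of stored elements
    Bucket **buckets;       // array of pointers to chain heads
} HashTable;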

The above definition is similar to PHP's implementation, with most irrelevant details trimmed to make it easier to understand: in this section, keys are simplified to strings, while the stored data can be of any type.
The Bucket struct is a singly linked list node used to resolve hash collisions among multiple keys, that is, the chaining method mentioned above: when multiple keys map to the same index, the colliding elements are linked together.
The hash function should map different keys to different slots (slots or buckets) as much as possible. We start with the simplest hashing algorithm: add up all the characters in the key string, then take the result modulo the hash table size so that the index falls within the bounds of the array.

The code is as follows:
static unsigned int hash_str(const char *key)
{
    unsigned int hash = 0;
    const char *cur = key;

    // Add up every character of the key (unsigned, so the index is never negative)
    while (*cur != '\0') {
        hash += (unsigned char) *cur++;
    }

    return hash;
}

// Use this macro to obtain the index of the key in the hash table
#define HASH_INDEX(ht, key) (hash_str((key)) % (ht)->size)

This hash algorithm is very simple and distributes keys poorly, so it is not used in real-world scenarios; PHP, for example, uses the DJBX33A algorithm. Interested readers can also study the hash algorithms used by MySQL, OpenSSL, and other open-source software.
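DJBX33A ("Daniel J. Bernstein, Times 33 with Addition") starts from 5381 and, for each byte, multiplies the running hash by 33 and adds the byte. Below is a minimal sketch of that recurrence as a possible drop-in alternative to `hash_str`; the name `hash_djbx33a` is hypothetical, and PHP's own version in the Zend engine is loop-unrolled and takes an explicit key length rather than relying on a NUL terminator.

```c
/* Hypothetical drop-in replacement for hash_str: DJBX33A (hash * 33 + c, seeded with 5381). */
static unsigned long hash_djbx33a(const char *key)
{
    unsigned long hash = 5381;

    while (*key != '\0') {
        /* (hash << 5) + hash == hash * 33; unsigned overflow simply wraps around */
        hash = ((hash << 5) + hash) + (unsigned char) *key++;
    }

    return hash;
}
```

Unlike the additive hash, this one depends on character order: "ab" and "ba" produce different values, because each character is multiplied by 33 once for every character that follows it.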
Operation interface implementation
To operate on the hash table, the following functions are implemented:

The code is as follows:
int hash_init(HashTable *ht);                              // initialize the hash table
int hash_lookup(HashTable *ht, char *key, void **result);  // search for content by key
int hash_insert(HashTable *ht, char *key, void *value);    // insert content into the hash table
int hash_remove(HashTable *ht, char *key);                 // delete the content the key points to
int hash_destroy(HashTable *ht);                           // free all memory held by the table
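The article goes on to implement only insert and lookup. For the other three declared functions, here is a minimal sketch under stated assumptions: the block repeats the structures with an `elem_num` counter and a `Bucket **` slot array (the layout the insert and lookup code relies on), keys are assumed to be copied into the table as in `hash_insert`, values are assumed not to be owned by the table, and `HASH_TABLE_INIT_SIZE` and the `SUCCESS`/`FAILED` values are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define SUCCESS 0               /* assumed status codes */
#define FAILED  -1
#define HASH_TABLE_INIT_SIZE 8  /* hypothetical initial number of slots */

typedef struct _Bucket {
    char *key;
    void *value;
    struct _Bucket *next;
} Bucket;

typedef struct _HashTable {
    int size;          /* number of slots */
    int elem_num;      /* number of stored elements */
    Bucket **buckets;  /* array of chain heads */
} HashTable;

static unsigned int hash_str(const char *key)
{
    unsigned int hash = 0;
    while (*key != '\0') {
        hash += (unsigned char) *key++;
    }
    return hash;
}

#define HASH_INDEX(ht, key) (hash_str((key)) % (ht)->size)

int hash_init(HashTable *ht)
{
    ht->size = HASH_TABLE_INIT_SIZE;
    ht->elem_num = 0;
    /* calloc zero-fills, so every chain starts out empty (NULL) */
    ht->buckets = calloc((size_t) ht->size, sizeof(Bucket *));
    return ht->buckets == NULL ? FAILED : SUCCESS;
}

int hash_remove(HashTable *ht, char *key)
{
    /* Walk the chain through a pointer to each link, so removing the head needs no special case */
    Bucket **link = &ht->buckets[HASH_INDEX(ht, key)];

    while (*link != NULL) {
        Bucket *bucket = *link;
        if (strcmp(bucket->key, key) == 0) {
            *link = bucket->next;  /* unlink the node from its chain */
            free(bucket->key);     /* the table owns its copy of the key */
            free(bucket);          /* the value itself is not owned by the table */
            ht->elem_num -= 1;
            return SUCCESS;
        }
        link = &bucket->next;
    }
    return FAILED;
}

int hash_destroy(HashTable *ht)
{
    for (int i = 0; i < ht->size; i++) {
        Bucket *bucket = ht->buckets[i];
        while (bucket != NULL) {
            Bucket *next = bucket->next;
            free(bucket->key);
            free(bucket);
            bucket = next;
        }
    }
    free(ht->buckets);
    ht->buckets = NULL;
    ht->size = ht->elem_num = 0;
    return SUCCESS;
}
```

Walking the chain with a `Bucket **` (a pointer to the link being examined) lets `hash_remove` unlink the head of a chain and an inner node with the same assignment.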

The following uses the insert and lookup functions as examples:

The code is as follows:
int hash_insert(HashTable *ht, char *key, void *value)
{
    // Check if we need to resize the hashtable. The size of the hash table is not fixed:
    // when inserted content fills up the table, it is expanded to accommodate all elements.
    // (resize_hash_table_if_needed is a helper that is not shown here.)
    resize_hash_table_if_needed(ht);

    int index = HASH_INDEX(ht, key);  // locate the index the key maps to

    Bucket *org_bucket = ht->buckets[index];
    Bucket *bucket = (Bucket *) malloc(sizeof(Bucket));  // allocate space for the new element
    if (bucket == NULL) return FAILED;

    bucket->key = strdup(key);
    // Save the value. Here we simply point to the content to be stored instead of copying it.
    bucket->value = value;

    LOG_MSG("Insert data p: %p\n", value);

    ht->elem_num += 1;  // record the number of elements in the hash table

    // Place the new element at the head of the slot's linked list. On a collision it links to
    // the existing chain; otherwise org_bucket is NULL and correctly terminates the list.
    if (org_bucket != NULL) {
        LOG_MSG("Index collision found with org hashtable: %p\n", (void *) org_bucket);
    }
    bucket->next = org_bucket;

    ht->buckets[index] = bucket;

    LOG_MSG("Element inserted at index %d, now we have: %d elements\n",
            index, ht->elem_num);

    return SUCCESS;
}

The insert operation above is fairly simple: hash the key to locate the slot where the element should be stored and check whether that slot already holds content. If it does, a collision has occurred, and the new element is linked in at the head of the slot's existing list. Lookup locates the element's slot using the same policy. If the slot is not empty, the key of each element in its linked list is compared with the key being searched for; if a matching element is found, its value is returned, otherwise the lookup fails.

The code is as follows:
int hash_lookup(HashTable *ht, char *key, void **result)
{
    int index = HASH_INDEX(ht, key);
    Bucket *bucket = ht->buckets[index];

    if (bucket == NULL) return FAILED;

    // Walk this linked list to find the matching element. Ideally the list holds only one
    // element, so the loop runs just once; ensuring this requires a suitable hash algorithm
    // (see the discussion of hash functions above).
    while (bucket)
    {
        if (strcmp(bucket->key, key) == 0)
        {
            LOG_MSG("HashTable found key in index: %d with key: %s value: %p\n",
                    index, key, bucket->value);
            *result = bucket->value;
            return SUCCESS;
        }

        bucket = bucket->next;
    }

    LOG_MSG("HashTable lookup missed the key: %s\n", key);
    return FAILED;
}
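`hash_insert` calls `resize_hash_table_if_needed`, which the article does not show. Here is a minimal sketch, assuming a hypothetical growth policy of doubling the slot count once `elem_num` reaches `size` (a load factor of 1) and the same struct layout that `hash_insert` and `hash_lookup` use. Growing requires rehashing every element, because a key's index depends on the table size.

```c
#include <stdlib.h>

typedef struct _Bucket {
    char *key;
    void *value;
    struct _Bucket *next;
} Bucket;

typedef struct _HashTable {
    int size;          /* number of slots */
    int elem_num;      /* number of stored elements */
    Bucket **buckets;  /* array of chain heads */
} HashTable;

static unsigned int hash_str(const char *key)
{
    unsigned int hash = 0;
    while (*key != '\0') {
        hash += (unsigned char) *key++;
    }
    return hash;
}

/* Hypothetical policy: double the table once it holds as many elements as it has slots. */
static void resize_hash_table_if_needed(HashTable *ht)
{
    if (ht->elem_num < ht->size) {
        return;
    }

    int new_size = ht->size * 2;
    Bucket **new_buckets = calloc((size_t) new_size, sizeof(Bucket *));
    if (new_buckets == NULL) {
        return;  /* keep the old table: it still works, just with longer chains */
    }

    /* Rehash: move every node to the slot its key maps to under the new size. */
    for (int i = 0; i < ht->size; i++) {
        Bucket *bucket = ht->buckets[i];
        while (bucket != NULL) {
            Bucket *next = bucket->next;
            unsigned int index = hash_str(bucket->key) % (unsigned int) new_size;
            bucket->next = new_buckets[index];  /* push onto the head of the new chain */
            new_buckets[index] = bucket;
            bucket = next;
        }
    }

    free(ht->buckets);
    ht->buckets = new_buckets;
    ht->size = new_size;
}
```

The nodes themselves are relinked rather than copied, so pointers to stored values stay valid across a resize; only the slot array is reallocated.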

In PHP, arrays are implemented on top of hash tables. When elements are added to an array one after another, they have a definite order, but the hash table here spreads elements roughly evenly across its physical slots, so the elements cannot be retrieved in insertion order. PHP's implementation therefore has the Bucket struct maintain additional pointer fields that record the order between elements; this is described in detail in the next section, "HashTable in PHP". The example above is a simplified version of PHP's implementation.
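A minimal sketch of that idea follows. All names (`OBucket`, `list_next`, `list_prev`, `list_head`, `list_tail`, `ot_insert`) are hypothetical, and the table is a fixed-size, insert-only simplification rather than PHP's code. Each bucket sits on two lists at once: its slot's collision chain and a table-wide doubly linked list in insertion order. Iterating from `list_head` then visits elements in the order they were added, regardless of which slots they hashed to. In PHP 5, the corresponding Bucket fields are named pListNext and pListLast.

```c
#include <stdlib.h>
#include <string.h>

#define OT_SIZE 8  /* hypothetical fixed number of slots */

typedef struct _OBucket {
    char *key;
    void *value;
    struct _OBucket *next;       /* collision chain within one slot */
    struct _OBucket *list_next;  /* next element in insertion order */
    struct _OBucket *list_prev;  /* previous element in insertion order */
} OBucket;

typedef struct {
    OBucket *buckets[OT_SIZE];
    OBucket *list_head;  /* first inserted element */
    OBucket *list_tail;  /* last inserted element */
} OrderedTable;

/* DJBX33A-style hash reduced to a slot index. */
static unsigned int ot_hash(const char *key)
{
    unsigned int hash = 5381;
    while (*key != '\0') {
        hash = hash * 33 + (unsigned char) *key++;
    }
    return hash % OT_SIZE;
}

/* Insert a new key (no duplicate check): link it into its slot's chain AND append it to the order list. */
static int ot_insert(OrderedTable *t, const char *key, void *value)
{
    OBucket *b = malloc(sizeof *b);
    if (b == NULL) {
        return -1;
    }
    b->key = malloc(strlen(key) + 1);
    if (b->key == NULL) {
        free(b);
        return -1;
    }
    strcpy(b->key, key);
    b->value = value;

    unsigned int index = ot_hash(key);
    b->next = t->buckets[index];  /* head of the slot's collision chain */
    t->buckets[index] = b;

    b->list_prev = t->list_tail;  /* append to the insertion-order list */
    b->list_next = NULL;
    if (t->list_tail != NULL) {
        t->list_tail->list_next = b;
    } else {
        t->list_head = b;
    }
    t->list_tail = b;
    return 0;
}
```

Keeping a `list_prev` pointer as well makes the order list doubly linked, so an element can later be unlinked from it in O(1) time without scanning from the head.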

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
