PHP internals explained: the hash table in PHP


Last Update:2016-07-13 Source: Internet

Author: User



The most frequently used data types in PHP are strings and arrays, and PHP's very flexible array type is one reason the language is easy to pick up. Before describing these data types in detail, it is necessary to introduce the hash table (HashTable), a particularly critical data structure in the PHP implementation.

Hash tables are used extensively in practice. Compilers, for example, typically maintain a symbol table to store identifiers, and many high-level languages support hash tables explicitly. A hash table usually provides operations such as search, insert, and delete. In the worst case these operations perform no better than a linked list, that is, O(n). The worst case is rare, though: a reasonably designed hash function effectively avoids it, so the time complexity of these operations is usually O(1). This is why hash tables are so popular.
Because hash tables are both convenient and efficient, most dynamic languages today use them in their implementations.

To make the rest of this section easier to follow, the basic concepts that appear in a hash table implementation are listed here first. A hash table is a data structure that uses a hash function to map a specific key to a specific value, so that each key corresponds to exactly one value.

Key: The identifier used to operate on the data, such as an integer index or a string key in a PHP array.

Slot (slot/bucket): A cell in the hash table that holds data, that is, the container in which the data is actually stored.

Hash function: A function that maps a key to the position of the slot where the data should be stored.

Hash collision (hash conflict): The situation where the hash function maps two different keys to the same index.

A hash table can be understood as an extension of the array, or as an associative array. Arrays are addressed by numeric subscripts. If the range of keys is small and the keys are numeric, a plain array can serve as the hash table. If the range of keys is large, using an array directly means reserving space for every possible key, which is often unrealistic. Even when enough space is available, most of it would go unused, which is not ideal. In addition, keys are not necessarily numbers, especially in PHP. For these reasons a mapping function (the hash function) is used to map each key into a specific range:

    h(key) -> index

With a properly designed hash function, keys can be mapped into a suitable range. Because the key space can be very large (string keys, for example), mapping it into a smaller space means that two different keys may be mapped to the same index. This is the collision described above. There are two main ways to resolve hash collisions: chaining (the link method) and open addressing.
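Before turning to those two strategies, note how easily a collision arises with a weak hash function. The following is a minimal sketch, not PHP code: `sum_hash`, `slot_index`, and the table size of 8 are hypothetical names and values chosen only for illustration.

```c
#define TABLE_SIZE 8  /* hypothetical table size, for illustration only */

/* Hypothetical toy hash: the sum of the key's bytes. */
static unsigned int sum_hash(const char *key)
{
    unsigned int hash = 0;
    for (const char *cur = key; *cur != '\0'; cur++) {
        hash += (unsigned char)*cur;
    }
    return hash;
}

/* Map a key to a slot index in the range [0, TABLE_SIZE). */
static unsigned int slot_index(const char *key)
{
    return sum_hash(key) % TABLE_SIZE;
}
```

Because addition is commutative, `slot_index("ab")` and `slot_index("ba")` are equal even though the keys differ, so both keys compete for the same slot.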

Conflict resolution

Chaining (link method): Chaining resolves collisions by storing the values of a slot in a linked list. When different keys are mapped to the same slot, the linked list holds all of their values. The worst case for chaining is therefore that every key is mapped to the same slot, in which case operating on the list takes O(n) time. This is why choosing an appropriate hash function is so important. PHP's current HashTable implementation resolves collisions this way.
Open addressing: The other common way to resolve collisions is open addressing. With open addressing the slots themselves store the data directly. If, during an insert, the key maps to an index that already holds data, a collision has occurred, and the next slot is examined. If that slot is occupied as well, the search continues with the following slot until a free one is found. Lookups follow the same rule.
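The probing rule described for open addressing can be sketched as follows. This is an illustration under stated assumptions, not PHP's implementation (PHP uses chaining): the names `OATable`, `oa_insert`, `oa_lookup`, the fixed capacity of 8, and the byte-sum hash are all hypothetical, and "the next slot" is taken to mean linear probing with wrap-around.

```c
#include <stddef.h>
#include <string.h>

#define OA_SIZE 8  /* hypothetical fixed capacity */

typedef struct {
    const char *key;   /* NULL marks an empty slot; keys are borrowed, not copied */
    void *value;
} OASlot;

typedef struct {
    OASlot slots[OA_SIZE];
} OATable;

static unsigned int oa_hash(const char *key)
{
    unsigned int hash = 0;
    while (*key != '\0') {
        hash += (unsigned char)*key++;
    }
    return hash;
}

static void oa_init(OATable *t)
{
    for (int i = 0; i < OA_SIZE; i++) {
        t->slots[i].key = NULL;
        t->slots[i].value = NULL;
    }
}

/* Insert or update; returns 0 on success, -1 if every slot is occupied. */
static int oa_insert(OATable *t, const char *key, void *value)
{
    unsigned int start = oa_hash(key) % OA_SIZE;
    for (unsigned int i = 0; i < OA_SIZE; i++) {
        OASlot *slot = &t->slots[(start + i) % OA_SIZE];
        if (slot->key == NULL || strcmp(slot->key, key) == 0) {
            slot->key = key;
            slot->value = value;
            return 0;
        }
    }
    return -1;
}

/* Returns the stored value, or NULL if the key is absent. */
static void *oa_lookup(const OATable *t, const char *key)
{
    unsigned int start = oa_hash(key) % OA_SIZE;
    for (unsigned int i = 0; i < OA_SIZE; i++) {
        const OASlot *slot = &t->slots[(start + i) % OA_SIZE];
        if (slot->key == NULL) {
            return NULL;  /* reached an empty slot: the key was never stored */
        }
        if (strcmp(slot->key, key) == 0) {
            return slot->value;
        }
    }
    return NULL;
}
```

The sketch deliberately omits deletion. Clearing a slot would break the probe sequence of keys stored after it, so a real open-addressing table needs some extra marker for deleted slots.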

Implementation of a hash table

Once the principle is understood, implementing a hash table is straightforward. Only three pieces of work are needed:

Implement a hash function
Resolve collisions
Implement the operation interface

First we need a container for the hash table itself. The main thing the hash table stores is the data, but to make it easy to find out how many elements the table holds, it also keeps a size field. The second thing we need is a container for the data items. As an example, a simple hash table is implemented below. It has two basic data structures: one represents the hash table itself, and the other is a singly linked list that actually holds the data. They are defined as follows:

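    typedef struct _Bucket
    {
        char *key;              /* string key */
        void *value;            /* pointer to the stored data, of any type */
        struct _Bucket *next;   /* next element that collided into the same slot */
    } Bucket;

    typedef struct _HashTable
    {
        int size;               /* number of slots */
        int elem_num;           /* number of elements stored (used by hash_insert) */
        Bucket **buckets;       /* array of slots, each the head of a linked list */
    } HashTable;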

The definition above is similar to the one in the PHP implementation, with most of the unrelated details removed to make it easier to understand. In this section the key is a string, and the stored data can be of any type.
The Bucket structure is a singly linked list. It exists to handle several keys hashing to the same slot, which is the chaining method mentioned earlier. When multiple keys are mapped to the same index, the colliding elements are linked together.
The hash function should map different keys to different slots as far as possible. To start with, we use one of the simplest hashing algorithms: add up all the characters of the key string, then take the result modulo the size of the hash table so that the index falls within the bounds of the array.

    static int hash_str(char *key)
    {
        int hash = 0;
        char *cur = key;

        /* Add up every character of the key, stopping at the terminating '\0'. */
        while (*cur != '\0') {
            hash += (unsigned char)*cur;
            cur++;
        }

        return hash;
    }

    /* Use this macro to find the index of a key in the hash table. */
    #define HASH_INDEX(ht, key) (hash_str((key)) % (ht)->size)

This hashing algorithm is very simple and does not distribute keys well, so it would not be used in a real scenario. PHP, for example, uses an algorithm called DJBX33A. The hash functions used by MySQL, OpenSSL, and other open source software are also worth looking up for interested readers.
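For comparison with the additive hash, the core idea of DJBX33A ("times 33 with addition") can be sketched as a plain loop. The function name `djbx33a` is chosen here for illustration, and the sketch assumes a NUL-terminated key; PHP's own version in the Zend engine works on a length-counted string and is written differently, so treat this as an illustration of the idea rather than PHP's code.

```c
/* Sketch of a "times 33" hash: start at 5381, then multiply by 33 and add
   each byte of the key. The function name is hypothetical. */
static unsigned long djbx33a(const char *key)
{
    unsigned long hash = 5381;
    for (const char *cur = key; *cur != '\0'; cur++) {
        hash = hash * 33 + (unsigned char)*cur;
    }
    return hash;
}
```

Unlike the additive hash, the result depends on the order of the characters, so keys such as "ab" and "ba" no longer produce the same hash value.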

Implementing the operation interface

To operate on the hash table, the following functions are implemented:

    int hash_init(HashTable *ht);                                /* initialize the hash table */
    int hash_lookup(HashTable *ht, char *key, void **result);    /* find content by key */
    int hash_insert(HashTable *ht, char *key, void *value);      /* insert content into the hash table */
    int hash_remove(HashTable *ht, char *key);                   /* delete the content the key points to */
    int hash_destroy(HashTable *ht);                             /* release the hash table */
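The article does not show the bodies of the init and destroy functions. A minimal sketch of what they might look like follows. It assumes the table keeps an `elem_num` counter and an array of `Bucket *` slots, that keys are owned copies while values are not, and that `SUCCESS`, `FAILED`, and `HASH_TABLE_INIT_SIZE` have the hypothetical values defined here.

```c
#include <stdlib.h>

#define SUCCESS 0                 /* assumed return codes */
#define FAILED  (-1)
#define HASH_TABLE_INIT_SIZE 8    /* hypothetical initial number of slots */

typedef struct _Bucket {
    char *key;
    void *value;
    struct _Bucket *next;
} Bucket;

typedef struct _HashTable {
    int size;
    int elem_num;
    Bucket **buckets;
} HashTable;

int hash_init(HashTable *ht)
{
    ht->size = 0;
    ht->elem_num = 0;
    ht->buckets = (Bucket **)malloc(HASH_TABLE_INIT_SIZE * sizeof(Bucket *));
    if (ht->buckets == NULL) {
        return FAILED;
    }
    ht->size = HASH_TABLE_INIT_SIZE;
    for (int i = 0; i < ht->size; i++) {
        ht->buckets[i] = NULL;    /* every slot starts as an empty chain */
    }
    return SUCCESS;
}

int hash_destroy(HashTable *ht)
{
    for (int i = 0; i < ht->size; i++) {
        Bucket *cur = ht->buckets[i];
        while (cur != NULL) {
            Bucket *next = cur->next;
            free(cur->key);       /* the key is a copy owned by the table */
            free(cur);            /* the value is not owned, so it is left alone */
            cur = next;
        }
    }
    free(ht->buckets);
    ht->buckets = NULL;
    ht->size = 0;
    ht->elem_num = 0;
    return SUCCESS;
}
```

Freeing only the keys and the buckets matches a table that stores value pointers without copying the data, which leaves ownership of the values with the caller.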

The insert and lookup functions are shown below as examples:

    /* SUCCESS, FAILED and LOG_MSG are assumed to be defined elsewhere. */
    int hash_insert(HashTable *ht, char *key, void *value)
    {
        /* Check whether the table needs to grow. The hash table is not fixed
           in size: when the inserted content is about to fill its storage,
           the table is expanded to accommodate all elements. */
        resize_hash_table_if_needed(ht);

        int index = HASH_INDEX(ht, key);    /* find the index the key maps to */

        Bucket *org_bucket = ht->buckets[index];
        Bucket *bucket = (Bucket *)malloc(sizeof(Bucket));  /* space for the new element */
        if (bucket == NULL) return FAILED;

        bucket->key = strdup(key);
        if (bucket->key == NULL) {
            free(bucket);
            return FAILED;
        }
        /* Only the pointer to the value is stored; the content is not copied. */
        bucket->value = value;
        bucket->next = NULL;

        LOG_MSG("Insert data p: %p\n", value);

        ht->elem_num += 1;  /* record the number of elements now in the table */

        if (org_bucket != NULL) {   /* collision: put the new element at the head of the list */
            LOG_MSG("Index collision found with org hashtable: %p\n", org_bucket);
            bucket->next = org_bucket;
        }

        ht->buckets[index] = bucket;

        LOG_MSG("Element inserted at index %i, now we have: %i elements\n",
                index, ht->elem_num);

        return SUCCESS;
    }
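The insert function calls `resize_hash_table_if_needed`, whose body is not shown in the article. One possible sketch follows. The growth policy (double the slot array once there is at least one element per slot) is an assumption made for illustration, as is the requirement that the table has already been initialized with a positive size; the struct layout and the additive hash mirror the simplified definitions used in this section.

```c
#include <stdlib.h>

typedef struct _Bucket {
    char *key;
    void *value;
    struct _Bucket *next;
} Bucket;

typedef struct _HashTable {
    int size;
    int elem_num;
    Bucket **buckets;
} HashTable;

/* Additive hash, as in the simplified example of this section. */
static int hash_str(const char *key)
{
    int hash = 0;
    for (const char *cur = key; *cur != '\0'; cur++) {
        hash += (unsigned char)*cur;
    }
    return hash;
}

/* Assumed policy: double the slot array once elem_num reaches size. */
static void resize_hash_table_if_needed(HashTable *ht)
{
    if (ht->elem_num < ht->size) {
        return;
    }

    int new_size = ht->size * 2;
    Bucket **new_buckets = (Bucket **)malloc(new_size * sizeof(Bucket *));
    if (new_buckets == NULL) {
        return;  /* keep the old array; the chains simply grow longer */
    }
    for (int i = 0; i < new_size; i++) {
        new_buckets[i] = NULL;
    }

    /* Every bucket must be re-linked, because its index depends on the size. */
    for (int i = 0; i < ht->size; i++) {
        Bucket *cur = ht->buckets[i];
        while (cur != NULL) {
            Bucket *next = cur->next;
            int index = hash_str(cur->key) % new_size;
            cur->next = new_buckets[index];
            new_buckets[index] = cur;
            cur = next;
        }
    }

    free(ht->buckets);
    ht->buckets = new_buckets;
    ht->size = new_size;
}
```

Re-linking the existing buckets instead of copying them keeps the resize cheap: only the slot array is reallocated, and the keys and values stay where they are.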

The hash table insert operation is fairly simple: hash the key, find the position where the element should be stored, and check whether that position already holds content. If a collision occurs, the new element is linked in at the head of the existing list. Lookup follows the same strategy: find the position of the element, and if the slot is not empty, compare the key of each element in the list with the key being looked up, one after another, until a matching element is found. If none matches, the lookup fails.

    int hash_lookup(HashTable *ht, char *key, void **result)
    {
        int index = HASH_INDEX(ht, key);
        Bucket *bucket = ht->buckets[index];

        if (bucket == NULL) return FAILED;

        /* Walk the list to find the right element. Usually the list holds
           only one element, so the loop rarely runs more than once. Ensuring
           this requires a suitable hash function, as discussed earlier. */
        while (bucket)
        {
            if (strcmp(bucket->key, key) == 0)
            {
                LOG_MSG("HashTable found key in index: %i with key: %s value: %p\n",
                        index, key, bucket->value);
                *result = bucket->value;
                return SUCCESS;
            }

            bucket = bucket->next;
        }

        LOG_MSG("HashTable lookup missed the key: %s\n", key);
        return FAILED;
    }
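The remove operation is declared in the interface but not shown in the article. A minimal sketch that follows the same chain-walking strategy as the lookup is given here. It assumes the simplified struct layout of this section, the additive hash, keys that were copied on insert, and the hypothetical return codes defined at the top of the block.

```c
#include <stdlib.h>
#include <string.h>

#define SUCCESS 0     /* assumed return codes */
#define FAILED  (-1)

typedef struct _Bucket {
    char *key;
    void *value;
    struct _Bucket *next;
} Bucket;

typedef struct _HashTable {
    int size;
    int elem_num;
    Bucket **buckets;
} HashTable;

static int hash_str(const char *key)
{
    int hash = 0;
    for (const char *cur = key; *cur != '\0'; cur++) {
        hash += (unsigned char)*cur;
    }
    return hash;
}

int hash_remove(HashTable *ht, char *key)
{
    int index = hash_str(key) % ht->size;
    /* link always points at the pointer that references the current bucket,
       so the head of the chain and later elements are handled the same way. */
    Bucket **link = &ht->buckets[index];

    while (*link != NULL) {
        Bucket *cur = *link;
        if (strcmp(cur->key, key) == 0) {
            *link = cur->next;   /* unlink the bucket from the chain */
            free(cur->key);      /* the key is a copy owned by the table */
            free(cur);           /* the value is not owned and is left alone */
            ht->elem_num -= 1;
            return SUCCESS;
        }
        link = &cur->next;
    }

    return FAILED;               /* no element with this key */
}
```

Using a pointer to the link rather than a "previous bucket" variable avoids a special case for removing the first element of a chain.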

Arrays in PHP are implemented on top of the hash table. Elements added to an array keep their insertion order, whereas the physical positions of elements in a hash table are close to uniformly distributed, so the insertion order cannot be recovered from the slots alone. For this reason the bucket structure in the PHP implementation maintains additional pointer fields that record the relationship between elements. The details are covered in the following section on PHP's HashTable. The example above is a simplified version of the PHP implementation.
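The idea of keeping an extra pointer to preserve insertion order can be sketched as follows. This is not PHP's actual layout: the names `OrderedBucket`, `OrderedTable`, `list_next`, `list_head`, and `list_tail`, the fixed slot count, and the byte-sum hash are hypothetical and chosen only to illustrate the technique. Each bucket sits on two lists at once: the collision chain of its slot, and a global list in insertion order.

```c
#include <stdlib.h>

#define ORDERED_SIZE 8   /* hypothetical fixed number of slots */

typedef struct _OrderedBucket {
    const char *key;                    /* borrowed, not copied */
    void *value;
    struct _OrderedBucket *next;        /* next bucket in the same slot's chain */
    struct _OrderedBucket *list_next;   /* next bucket in insertion order */
} OrderedBucket;

typedef struct {
    OrderedBucket *buckets[ORDERED_SIZE];
    OrderedBucket *list_head;           /* first element inserted */
    OrderedBucket *list_tail;           /* most recent element inserted */
} OrderedTable;

static void ordered_init(OrderedTable *t)
{
    for (int i = 0; i < ORDERED_SIZE; i++) {
        t->buckets[i] = NULL;
    }
    t->list_head = NULL;
    t->list_tail = NULL;
}

static unsigned int ordered_hash(const char *key)
{
    unsigned int hash = 0;
    while (*key != '\0') {
        hash += (unsigned char)*key++;
    }
    return hash;
}

/* Appends without checking for duplicate keys.
   Returns 0 on success, -1 if memory allocation fails. */
static int ordered_insert(OrderedTable *t, const char *key, void *value)
{
    OrderedBucket *b = (OrderedBucket *)malloc(sizeof(OrderedBucket));
    if (b == NULL) {
        return -1;
    }
    unsigned int index = ordered_hash(key) % ORDERED_SIZE;

    b->key = key;
    b->value = value;

    /* Link into the slot's collision chain (head insertion). */
    b->next = t->buckets[index];
    t->buckets[index] = b;

    /* Append to the global insertion-order list. */
    b->list_next = NULL;
    if (t->list_tail != NULL) {
        t->list_tail->list_next = b;
    } else {
        t->list_head = b;
    }
    t->list_tail = b;
    return 0;
}
```

Walking from `list_head` through `list_next` visits the elements in the order they were inserted, regardless of which slots they hashed to.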




