PHP kernel analysis: Hash tables in PHP

Source: Internet
Author: User

The most frequently used data types in PHP are strings and arrays, and much of PHP's ease of use comes from its very flexible array type. Before describing these data types in detail, it is necessary to introduce the hash table (HashTable), a particularly critical data structure in the PHP implementation.

Hash tables are widely used in practice. A compiler, for example, typically maintains a symbol table to hold its symbols, and many high-level languages support hash tables explicitly. A hash table usually provides search, insert, and delete operations. In the worst case these perform no better than a linked list, O(n). In practice it is rarely that bad: a well-designed hash function effectively avoids the worst case, and the operations usually run in O(1) time, which is why hash tables are so popular.
Because hash tables are convenient and efficient, most dynamic language implementations use them.

To make the later content easier to follow, the basic concepts that appear in a hash table implementation are listed here first. A hash table is a data structure that uses a hash function to map a particular key to a specific value, maintaining a one-to-one correspondence between keys and values.

Key: the identifier used to operate on data, such as an integer index or a string key in a PHP array.

Slot (slot/bucket): a cell in the hash table that holds data; it is the container in which the data is actually stored.

Hash function: a function that maps a key to the location of the slot where the data should be stored.

Hash collision: the situation in which the hash function maps two different keys to the same index.

A hash table can be understood as an extension of the array, or as an associative array. An array is addressed by numeric subscripts. If the range of keys is small and the keys are numbers, a plain array can serve as the hash table. If the range of keys is too large, using an array directly would require allocating space for every possible key, which in many cases is unrealistic; even with enough space, utilization would be very low. Keys may also not be numbers at all, especially in PHP. So a mapping function (hash function) is used to map each key into a specific range:

h(key) -> index

With a well-designed hash function, keys can be mapped into a suitable range. Because the key space can be very large (string keys, for example) while the index range is much smaller, two different keys may be mapped to the same index; this is what we call a conflict (collision). There are two main methods for resolving hash conflicts: the link method (chaining) and open addressing.
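
To make the idea of a conflict concrete, the sketch below uses a deliberately naive hash that just sums the bytes of the key. The names `toy_hash` and `toy_index` are hypothetical and exist only for this illustration. Because addition is commutative, keys made of the same characters in a different order necessarily land on the same index.

```c
#include <assert.h>

/* Toy hash (hypothetical, for illustration only): sum of the key's bytes. */
static unsigned int toy_hash(const char *key)
{
    unsigned int hash = 0;
    while (*key != '\0') {
        hash += (unsigned char)*key;
        key++;
    }
    return hash;
}

/* h(key) -> index: fold the hash into the range [0, size). */
static unsigned int toy_index(const char *key, unsigned int size)
{
    assert(size > 0);
    return toy_hash(key) % size;
}
```

With a table of 8 slots, `toy_index("ab", 8)` and `toy_index("ba", 8)` both evaluate to 3 (97 + 98 = 195, and 195 % 8 = 3), so the two different keys collide.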

Conflict resolution

Link method (chaining): the link method resolves conflicts by having each slot hold a linked list. When different keys map to the same slot, they are all stored in that slot's list. In the worst case, where every key maps to the same slot, every operation degrades to traversing a linked list, with time complexity O(n). Choosing an appropriate hash function is therefore critical. The HashTable implementation in PHP currently resolves conflicts this way.
Open addressing: the other common way to resolve conflicts. With open addressing, the slot itself stores the data directly. If, on insertion, the key maps to an index that already holds data, a conflict has occurred. The next slot is then tried, and if that one is occupied too, the one after it, until an unoccupied slot is found. Lookup follows the same rule.
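
The article does not implement open addressing, so here is a minimal sketch of its simplest variant, linear probing, under a few stated assumptions: the table has a fixed size `OA_SIZE`, a `NULL` key marks an empty slot, keys are not copied, and deletion is not supported. All names (`OaSlot`, `OaTable`, `oa_insert`, `oa_lookup`) are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define OA_SIZE 8   /* fixed number of slots; an assumption for this sketch */

typedef struct {
    const char *key;   /* NULL marks an empty slot */
    void *value;
} OaSlot;

typedef struct {
    OaSlot slots[OA_SIZE];
} OaTable;

/* Naive additive hash, used only to keep the sketch short. */
static unsigned int oa_hash(const char *key)
{
    unsigned int hash = 0;
    for (; *key != '\0'; key++)
        hash += (unsigned char)*key;
    return hash;
}

/* Probe forward from the home slot until an empty slot (or the same key)
 * is found. Returns 0 on success, -1 if the table is full. */
static int oa_insert(OaTable *t, const char *key, void *value)
{
    assert(t != NULL && key != NULL);
    unsigned int start = oa_hash(key) % OA_SIZE;
    for (unsigned int i = 0; i < OA_SIZE; i++) {
        OaSlot *s = &t->slots[(start + i) % OA_SIZE];
        if (s->key == NULL || strcmp(s->key, key) == 0) {
            s->key = key;      /* the key pointer is stored, not copied */
            s->value = value;
            return 0;
        }
    }
    return -1;
}

/* Lookup follows the same probe sequence and stops at the first empty slot. */
static void *oa_lookup(const OaTable *t, const char *key)
{
    assert(t != NULL && key != NULL);
    unsigned int start = oa_hash(key) % OA_SIZE;
    for (unsigned int i = 0; i < OA_SIZE; i++) {
        const OaSlot *s = &t->slots[(start + i) % OA_SIZE];
        if (s->key == NULL)
            return NULL;
        if (strcmp(s->key, key) == 0)
            return s->value;
    }
    return NULL;
}
```

A table must start out zeroed, for example `OaTable t = {0};`. Supporting deletion would need extra bookkeeping (such as marking removed slots), because simply emptying a slot would cut the probe sequence for keys stored after it.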

Implementation of hash table

Once the principle is understood, implementing a hash table is straightforward. Only three things need to be done:
Implement a hash function
Resolve conflicts
Implement the operation interface
First we need a container for the hash table itself. What it mainly holds is the stored data; in addition, so that the number of elements in the table can be obtained easily, it keeps a size field. Second, we need a container that holds the data itself. As an example, a simple hash table is implemented below. There are two basic data structures, one for the hash table itself and one for the singly linked list that actually holds the data, defined as follows:

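typedef struct _Bucket
{
    char *key;               // string key
    void *value;             // pointer to the stored data
    struct _Bucket *next;    // next element in the same slot's list
} Bucket;

typedef struct _HashTable
{
    int size;                // number of slots
    int elem_num;            // number of elements currently stored
    Bucket **buckets;        // array of slots, each the head of a list
} HashTable;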

The definitions above are similar to the implementation in PHP, with most irrelevant details cut away to make them easier to understand. For simplicity, in this section the key is a string, and the stored data can be of any type.
The Bucket structure is a singly linked list; it exists to handle multiple keys whose hashes conflict, that is, the link method mentioned earlier. When multiple keys map to the same index, the conflicting elements are linked together.
A hash function should map different keys to different slots as far as possible. We start with the simplest hash algorithm: add up all the characters of the key string, then take the result modulo the size of the hash table so that the index falls within the range of the array.

static int hash_str(char *key)
{
    int hash = 0;

    char *cur = key;

    // Add up the value of every character in the key.
    while (*cur != '\0') {
        hash += (unsigned char)*cur;
        ++cur;
    }

    return hash;
}

// Use this macro to find the index of the key in the hash table
#define HASH_INDEX(ht, key) (hash_str((key)) % (ht)->size)

This hash algorithm is simple, but it does not distribute keys well and is not used in real-world code. PHP, for example, uses an algorithm called DJBX33A. Interested readers can also look up the hash algorithms used by MySQL, OpenSSL, and other open-source software.
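
For comparison with the additive hash, here is a minimal sketch of the DJBX33A idea ("times 33 with addition"): each step multiplies the running hash by 33 and adds the next byte. The function name `djbx33a` and its signature are assumptions made for this example, and the starting value 5381 is the constant conventionally paired with this hash. PHP's own implementation is more heavily optimized and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* DJBX33A sketch: hash = hash * 33 + c for every byte of the key.
 * Unsigned arithmetic is used so that overflow simply wraps around. */
static unsigned long djbx33a(const char *key, size_t len)
{
    assert(key != NULL || len == 0);
    unsigned long hash = 5381;
    for (size_t i = 0; i < len; i++) {
        hash = hash * 33 + (unsigned char)key[i];
    }
    return hash;
}
```

For the one-character key "a" this returns 5381 * 33 + 97 = 177670. Unlike the additive hash, the result depends on the order of the characters, so "ab" and "ba" no longer produce the same value.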

Implementation of the operation interface

To manipulate the hash table, the following operation functions are implemented:

int hash_init(HashTable *ht);                               // initialize the hash table
int hash_lookup(HashTable *ht, char *key, void **result);   // find content by key
int hash_insert(HashTable *ht, char *key, void *value);     // insert content into the hash table
int hash_remove(HashTable *ht, char *key);                  // delete the content the key points to
int hash_destroy(HashTable *ht);                            // destroy the hash table
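
The article only shows the insert and lookup functions. As a rough sketch of how the first and last functions in this list might look, the block below allocates and releases the slot array. The initial size `HASH_TABLE_INIT_SIZE` and the values chosen for `SUCCESS` and `FAILED` are assumptions, since the article uses those names without defining them.

```c
#include <assert.h>
#include <stdlib.h>

#define SUCCESS 0                 /* assumed value */
#define FAILED -1                 /* assumed value */
#define HASH_TABLE_INIT_SIZE 6    /* assumed initial number of slots */

typedef struct _Bucket {
    char *key;
    void *value;
    struct _Bucket *next;
} Bucket;

typedef struct _HashTable {
    int size;
    int elem_num;
    Bucket **buckets;
} HashTable;

/* Allocate the slot array; calloc leaves every slot NULL (an empty list). */
int hash_init(HashTable *ht)
{
    assert(ht != NULL);
    ht->elem_num = 0;
    ht->buckets = (Bucket **)calloc(HASH_TABLE_INIT_SIZE, sizeof(Bucket *));
    if (ht->buckets == NULL) {
        ht->size = 0;
        return FAILED;
    }
    ht->size = HASH_TABLE_INIT_SIZE;
    return SUCCESS;
}

/* Free every list node and its copied key. The stored values are only
 * pointed to, not owned, so they are left alone. */
int hash_destroy(HashTable *ht)
{
    assert(ht != NULL);
    for (int i = 0; i < ht->size; i++) {
        Bucket *cur = ht->buckets[i];
        while (cur != NULL) {
            Bucket *next = cur->next;
            free(cur->key);
            free(cur);
            cur = next;
        }
    }
    free(ht->buckets);
    ht->buckets = NULL;
    ht->size = 0;
    ht->elem_num = 0;
    return SUCCESS;
}
```

Using `calloc` rather than `malloc` matters here: the insert and lookup code treats a `NULL` slot as "no elements", so the slot array has to start out zeroed.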

The insert and lookup functions are shown below as examples:

int hash_insert(HashTable *ht, char *key, void *value)
{
    // Check whether the hash table needs to be resized. The table is not a
    // fixed size: when the inserted content has nearly filled its storage
    // space, the table is enlarged to accommodate all elements.
    resize_hash_table_if_needed(ht);

    int index = HASH_INDEX(ht, key);    // find the index the key maps to

    Bucket *org_bucket = ht->buckets[index];
    Bucket *bucket = (Bucket *)malloc(sizeof(Bucket));  // request space for the new element
    if (bucket == NULL) return FAILED;

    bucket->key = strdup(key);
    // Saving the value only points the pointer at the content to be stored;
    // the content itself is not copied.
    bucket->value = value;
    bucket->next = NULL;                // no successor unless a collision is found below

    LOG_MSG("Insert data p: %p\n", value);

    ht->elem_num += 1;                  // record the number of elements now in the hash table

    if (org_bucket != NULL) {           // collision occurred: place the new element at the head of the list
        LOG_MSG("Index collision found with org hashtable: %p\n", (void *)org_bucket);
        bucket->next = org_bucket;
    }

    ht->buckets[index] = bucket;

    LOG_MSG("Element inserted at index %i, now we have: %i elements\n",
        index, ht->elem_num);

    return SUCCESS;
}
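
The resize helper called at the top of the insert function is not shown in the article. The block below is one possible sketch of it, assuming that the table doubles once the element count reaches the slot count; that threshold, the growth factor, and the `int` return value are all choices made for this illustration. The types and the additive hash are repeated so the block stands on its own.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct _Bucket {
    char *key;
    void *value;
    struct _Bucket *next;
} Bucket;

typedef struct _HashTable {
    int size;
    int elem_num;
    Bucket **buckets;
} HashTable;

/* Additive hash, as in the simple example. */
static int hash_str(const char *key)
{
    int hash = 0;
    for (; *key != '\0'; key++)
        hash += (unsigned char)*key;
    return hash;
}

/* Grow when the element count reaches the slot count (assumed threshold).
 * Returns 0 on success or when nothing had to be done, -1 if allocation fails. */
static int resize_hash_table_if_needed(HashTable *ht)
{
    assert(ht != NULL && ht->size > 0);
    if (ht->elem_num < ht->size)
        return 0;                       /* still room, nothing to do */

    int new_size = ht->size * 2;
    Bucket **new_buckets = (Bucket **)calloc((size_t)new_size, sizeof(Bucket *));
    if (new_buckets == NULL)
        return -1;                      /* the old table is left intact */

    /* Every node is relinked, because its index depends on the table size. */
    for (int i = 0; i < ht->size; i++) {
        Bucket *cur = ht->buckets[i];
        while (cur != NULL) {
            Bucket *next = cur->next;
            int index = hash_str(cur->key) % new_size;
            cur->next = new_buckets[index];
            new_buckets[index] = cur;
            cur = next;
        }
    }

    free(ht->buckets);
    ht->buckets = new_buckets;
    ht->size = new_size;
    return 0;
}
```

Only the slot array is reallocated. The existing nodes are reused and moved to their new lists, so pointers to stored values remain valid across a resize.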

The insert operation is fairly simple: hash the key to find the slot where the element should be stored, then check whether that slot already has content. If a collision has occurred, the new element is linked in at the head of the existing list. Lookup follows the same strategy: find the slot for the key and, if it holds any elements, compare the key of each element in the list with the key being sought until a match is found. If none matches, the lookup fails.

int hash_lookup(HashTable *ht, char *key, void **result)
{
    int index = HASH_INDEX(ht, key);
    Bucket *bucket = ht->buckets[index];

    if (bucket == NULL) return FAILED;

    // Walk the list to find the right element. The list should usually hold
    // only one element, so the loop rarely runs more than once. Ensuring this
    // requires a suitable hash algorithm; see the earlier discussion of hash functions.
    while (bucket)
    {
        if (strcmp(bucket->key, key) == 0)
        {
            LOG_MSG("HashTable found key in index: %i with key: %s value: %p\n",
                index, key, bucket->value);
            *result = bucket->value;
            return SUCCESS;
        }

        bucket = bucket->next;
    }

    LOG_MSG("HashTable lookup missed the key: %s\n", key);
    return FAILED;
}
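
The remove function from the interface is not shown in the article. A minimal sketch of how it could work with the link method follows. It assumes, as in the insert code, that the key was copied when the element was stored and that the stored value is not owned by the table. The values of `SUCCESS` and `FAILED` are again assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SUCCESS 0    /* assumed value */
#define FAILED -1    /* assumed value */

typedef struct _Bucket {
    char *key;
    void *value;
    struct _Bucket *next;
} Bucket;

typedef struct _HashTable {
    int size;
    int elem_num;
    Bucket **buckets;
} HashTable;

/* Additive hash, as in the simple example. */
static int hash_str(const char *key)
{
    int hash = 0;
    for (; *key != '\0'; key++)
        hash += (unsigned char)*key;
    return hash;
}

/* Unlink and free the node whose key matches; FAILED if the key is absent. */
int hash_remove(HashTable *ht, char *key)
{
    assert(ht != NULL && key != NULL && ht->size > 0);
    int index = hash_str(key) % ht->size;

    /* 'link' points at the pointer that leads to the current node, so the
     * head of the list and interior nodes are handled the same way. */
    Bucket **link = &ht->buckets[index];
    while (*link != NULL) {
        Bucket *cur = *link;
        if (strcmp(cur->key, key) == 0) {
            *link = cur->next;    /* bypass the node */
            free(cur->key);       /* the key was copied on insert */
            free(cur);
            ht->elem_num -= 1;
            return SUCCESS;
        }
        link = &cur->next;
    }
    return FAILED;
}
```

Walking the list through a pointer-to-pointer avoids a special case for removing the first element of a slot, which is the common case when lists are short.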

Arrays in PHP are implemented on top of a hash table. Elements are added to an array in a definite order, whereas the hash table shown here spreads elements more or less evenly across its physical storage, so they cannot be retrieved in the order in which they were inserted. In PHP's implementation, the Bucket structure therefore maintains an additional pointer field that records the relationship between elements. The details are covered in a later section on PHP's HashTable. The example above is a stripped-down version of the implementation in PHP.
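
To illustrate how extra pointers can preserve insertion order, the sketch below threads every element onto a second, table-wide linked list in addition to its slot's collision list. This is only an illustration of the idea, not PHP's actual layout: the names `OrdBucket`, `OrdTable`, `list_next`, `list_prev`, and `ord_insert` are hypothetical, the table size is fixed, and duplicate keys are not checked.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ORD_SIZE 8   /* fixed slot count; an assumption to keep the sketch short */

typedef struct _OrdBucket {
    char *key;
    void *value;
    struct _OrdBucket *next;        /* next node in the same slot's collision list */
    struct _OrdBucket *list_next;   /* next node in insertion order */
    struct _OrdBucket *list_prev;   /* previous node in insertion order */
} OrdBucket;

typedef struct {
    OrdBucket *buckets[ORD_SIZE];
    OrdBucket *list_head;           /* first element inserted */
    OrdBucket *list_tail;           /* most recently inserted element */
} OrdTable;

static unsigned int ord_hash(const char *key)
{
    unsigned int hash = 0;
    for (; *key != '\0'; key++)
        hash += (unsigned char)*key;
    return hash;
}

/* Insert a new key. Returns 0 on success, -1 if allocation fails. */
static int ord_insert(OrdTable *t, const char *key, void *value)
{
    assert(t != NULL && key != NULL);
    OrdBucket *b = (OrdBucket *)malloc(sizeof(OrdBucket));
    if (b == NULL)
        return -1;
    size_t len = strlen(key) + 1;
    b->key = (char *)malloc(len);
    if (b->key == NULL) {
        free(b);
        return -1;
    }
    memcpy(b->key, key, len);
    b->value = value;

    /* Link into the slot's collision list (head insertion). */
    unsigned int index = ord_hash(key) % ORD_SIZE;
    b->next = t->buckets[index];
    t->buckets[index] = b;

    /* Append to the table-wide list that records insertion order. */
    b->list_next = NULL;
    b->list_prev = t->list_tail;
    if (t->list_tail != NULL)
        t->list_tail->list_next = b;
    else
        t->list_head = b;
    t->list_tail = b;
    return 0;
}
```

Starting from a zeroed table, iterating with `for (OrdBucket *b = t.list_head; b != NULL; b = b->list_next)` visits the elements in the order they were inserted, regardless of which slots they hashed to.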
