In-depth understanding of the PHP kernel (6): hash tables and PHP's hash table implementation




Original link: http://www.orlion.ga/241/

I. Hash table (HashTable)

Hash tables are used in the implementations of most dynamic languages. A hash table is a data structure that uses a hash function to map a particular key to a particular value; it maintains a one-to-one correspondence between keys and values.

Key: An identifier used to operate on data, such as a numeric index or a string key in a PHP array.

Slot (slot/bucket): A cell in the hash table that holds data, that is, the container in which the data is actually stored.

Hash function: A function that maps a key to the position of the slot where the data should be stored.

Hash conflict (hash collision): The situation in which the hash function maps two different keys to the same index.

There are two methods for resolving hash conflicts: the linking method (chaining) and the open addressing method.

1. Conflict resolution

(1) Linking method

The linking method resolves conflicts by using a linked list to hold the values of a slot: when different keys are mapped to the same slot, the values are kept in that slot's linked list. (This is the method PHP uses.)

(2) Open addressing method

With open addressing, the slot itself holds the data directly. When data is inserted and the index the key maps to already holds data, a conflict has occurred. The insert then tries the next slot, and if that slot is also occupied it keeps moving on until it finds an unoccupied one. A lookup follows the same sequence of slots.

2. Implementation of the hash table

Implementing a hash table mainly comes down to three tasks:

* Implementing the hash function

* Resolving conflicts

* Implementing the operation interface

(1) Data structure

First we need a container to store our hash table. What the hash table mainly has to hold is the data put into it; in addition, to make it easy to know how many elements are stored, a size field is kept. The second thing needed is the container that actually holds the data. Below we implement a simple hash table built on two basic data structures: one for the hash table itself, and a singly linked list that actually holds the data. They are defined as follows:

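    typedef struct _Bucket
    {
        char *key;
        void *value;
        struct _Bucket *next;
    } Bucket;

    typedef struct _HashTable
    {
        int size;          // number of slots
        int elem_num;      // number of elements currently stored
        Bucket **buckets;  // array of slots, each the head of a linked list
    } HashTable;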

The above definition is similar to the implementation in PHP. To keep things simple, the key's data type is a string, while the stored value can be of any type.

The Bucket structure is a singly linked list whose purpose is to resolve hash conflicts: when multiple keys are mapped to the same index, the conflicting elements are linked together.

(2) Hash function implementation

We use the simplest possible hashing algorithm: add up all the characters of the key string, then take the result modulo the hash table size so that the index falls within the bounds of the array.

    static int hash_str(char *key)
    {
        int hash = 0;
        char *cur = key;

        // Add each character, stopping at (and not including) the terminator
        while (*cur != '\0') {
            hash += (unsigned char) *cur;
            ++cur;
        }

        return hash;
    }

    // Use this macro to obtain the index of a key in the hash table
    #define HASH_INDEX(ht, key) (hash_str((key)) % (ht)->size)

The hash algorithm used by PHP is called DJBX33A. To operate on the hash table, the following functions are defined:

    int hash_init(HashTable *ht);                               // Initialize the hash table
    int hash_lookup(HashTable *ht, char *key, void **result);   // Find content by key
    int hash_insert(HashTable *ht, char *key, void *value);     // Insert content into the hash table
    int hash_remove(HashTable *ht, char *key);                  // Delete the content that key points to
    int hash_destroy(HashTable *ht);

The insert and lookup functions are shown below as examples:

    int hash_insert(HashTable *ht, char *key, void *value)
    {
        // Check if we need to resize the hashtable. The hash table is not of
        // fixed size: when the inserted content is about to fill its storage,
        // the table is expanded so that it can hold all elements.
        resize_hash_table_if_needed(ht);

        int index = HASH_INDEX(ht, key);    // Find the index the key maps to

        Bucket *org_bucket = ht->buckets[index];
        Bucket *bucket = (Bucket *)malloc(sizeof(Bucket));  // Allocate space for the new element
        if (bucket == NULL) return FAILED;

        bucket->key = strdup(key);
        // Save the value. Only the pointer is stored here;
        // the content it points to is not copied.
        bucket->value = value;
        bucket->next = NULL;

        LOG_MSG("Insert data p: %p\n", value);

        ht->elem_num += 1;  // Record the number of elements now in the hash table

        if (org_bucket != NULL) {
            // A collision occurred: place the new element at the head of the list
            LOG_MSG("Index collision found with org hashtable: %p\n", org_bucket);
            bucket->next = org_bucket;
        }

        ht->buckets[index] = bucket;

        LOG_MSG("Element inserted at index %i, now we have: %i elements\n",
            index, ht->elem_num);

        return SUCCESS;
    }

A lookup first locates the slot for the key. If the slot holds elements, the key of each element in the list is compared in turn with the key being searched for until a match is found; if none matches, the lookup fails.

    int hash_lookup(HashTable *ht, char *key, void **result)
    {
        int index = HASH_INDEX(ht, key);
        Bucket *bucket = ht->buckets[index];

        if (bucket == NULL) return FAILED;

        // Walk this list to find the correct element. Usually the list
        // should hold only one element, so the loop rarely runs many times.
        // Guaranteeing that requires a suitable hashing algorithm.
        while (bucket)
        {
            if (strcmp(bucket->key, key) == 0)
            {
                LOG_MSG("HashTable found key in index: %i with key: %s value: %p\n",
                    index, key, bucket->value);
                *result = bucket->value;
                return SUCCESS;
            }

            bucket = bucket->next;
        }

        LOG_MSG("HashTable lookup missed the key: %s\n", key);
        return FAILED;
    }

Arrays in PHP are implemented on top of a hash table. Elements added to an array keep their order, whereas in the hash table above the elements are spread almost evenly across the slots, so they cannot be retrieved in insertion order. For this reason, the Bucket structure in PHP's implementation maintains additional pointer fields that record the order among elements.

II. PHP hash table implementation

1. PHP's hash table implementation

The hash table is a very important data structure in PHP, and most language features are built on it. For example, variable scopes and variable storage, the implementation of classes, and much of the Zend engine's internal data are stored in hash tables.

(1) Data structure and description

To preserve the ordering relationship among the data, Zend links the elements with a doubly linked list.

(2) Hash table structure

PHP's hash table is implemented in Zend/zend_hash.c. PHP uses the following two data structures to implement it: the HashTable structure holds the basic information needed for the whole table, and the Bucket structure holds the actual data. HashTable is defined as follows:

    typedef struct _hashtable {
        uint nTableSize;            // size of the hash table; the minimum is 8 and it grows by doubling
        uint nTableMask;            // nTableSize - 1, used to optimize the index calculation
        uint nNumOfElements;        // number of elements currently in the hash table; count() returns this value directly
        ulong nNextFreeElement;     // position of the next numeric index
        Bucket *pInternalPointer;   // pointer to the element currently being traversed (one reason foreach is faster than for)
        Bucket *pListHead;          // pointer to the head element of the array
        Bucket *pListTail;          // pointer to the tail element of the array
        Bucket **arBuckets;         // the array of hash slots
        dtor_func_t pDestructor;
        zend_bool persistent;
        unsigned char nApplyCount;  // number of times this hash table is currently being accessed recursively (guards against deep recursion)
        zend_bool bApplyProtection; // whether repeated access to this hash table is restricted; when it is, recursion is limited to 3 levels
    #if ZEND_DEBUG
        int inconsistent;
    #endif
    } HashTable;

The nTableSize field indicates the capacity of the hash table; the minimum capacity at initialization is 8. First look at the hash table's initialization function:

    ZEND_API int _zend_hash_init(HashTable *ht, uint nSize, hash_func_t pHashFunction,
                        dtor_func_t pDestructor, zend_bool persistent ZEND_FILE_LINE_DC)
    {
        uint i = 3;
        //...
        if (nSize >= 0x80000000) {
            /* prevent overflow */
            ht->nTableSize = 0x80000000;
        } else {
            while ((1U << i) < nSize) {
                i++;
            }
            ht->nTableSize = 1 << i;
        }
        // ...
        ht->nTableMask = ht->nTableSize - 1;

        /* Uses ecalloc() so that Bucket* == NULL */
        if (persistent) {
            tmp = (Bucket **) calloc(ht->nTableSize, sizeof(Bucket *));
            if (!tmp) {
                return FAILURE;
            }
            ht->arBuckets = tmp;
        } else {
            tmp = (Bucket **) ecalloc_rel(ht->nTableSize, sizeof(Bucket *));
            if (tmp) {
                ht->arBuckets = tmp;
            }
        }

        return SUCCESS;
    }

For example, if the requested initial size is 10, the algorithm above adjusts it to 16. That is, the size is always rounded up to the nearest power of 2 that is not smaller than the requested size, and never below 8.

Why adjust the size this way? Let's look at how HashTable maps a hash value to a slot:

    h = zend_inline_hash_func(arKey, nKeyLength);
    nIndex = h & ht->nTableMask;

As the _zend_hash_init() function above shows, ht->nTableMask equals ht->nTableSize - 1. The bitwise & is used here instead of the modulo operator because modulo is much more expensive than a bitwise AND. This substitution works only because nTableSize is a power of 2: in that case h & (nTableSize - 1) gives the same result as h % nTableSize.

After the size of the hash table is set, storage must be requested for it. As the initialization code above shows, a different allocation function is called depending on whether the table needs to be persistent. Persistence was described earlier in the discussion of the PHP life cycle: persistent content can be accessed across multiple requests, whereas non-persistent storage is freed at the end of the request. The details are covered in the memory management chapter.

The nNumOfElements field in HashTable is easy to understand: it is updated every time an element is inserted or removed with unset, so the count() function can return quickly when asked for the number of elements in the array.

The nNextFreeElement field is very useful. First look at a piece of PHP code:

    <?php
    $a = array(10 => 'Hello');
    $a[] = 'TIPI';
    var_dump($a);

    // output
    array(2) {
      [10]=>
      string(5) "Hello"
      [11]=>
      string(4) "TIPI"
    }

In PHP, you can add elements to an array without specifying an index, in which case a number is used as the index by default, similar to an enumeration in C. The index of such an element is determined by the nNextFreeElement field. If the array already contains numeric keys, the largest numeric key used so far plus 1 is used by default. In the example above an element with key 10 already exists, so the newly inserted element gets the default index 11.

Next, look at the Bucket structure, which stores the data held in the hash table's slots:

    typedef struct bucket {
        ulong h;            // hash value of the char *key, or the user-specified numeric index
        uint nKeyLength;    // length of the key; 0 if the array index is a number
        void *pData;        // points to the value, usually a copy of the user's data; for pointer data it points to pDataPtr
        void *pDataPtr;     // for pointer data, this holds the real value and pData above points to this field
        struct bucket *pListNext;   // next element in the entire hash table
        struct bucket *pListLast;   // previous element in the entire hash table
        struct bucket *pNext;       // next element in the same hash slot
        struct bucket *pLast;       // previous element in the same hash slot
        char arKey[1];
        /*
        Stores the string key and must be the last member. Only 1 byte is
        declared here; the bytes of the key are stored starting at this
        position, which avoids the cost of a separate allocation and
        assignment. When the value is not needed, this also saves space.
        */
    } Bucket;

The comments above describe each field. The h field holds the hash value of the key. In PHP, either a string or a number can be used as an array index. A numeric index is already unique, so hashing it again would be wasteful; for numeric indexes, h holds the number itself. The nKeyLength field after h records the length of the key, and it is 0 when the index is a number. When an array is defined in PHP, a string key that looks like an integer is converted to a number, so string indexes such as '10' and '11' are no different from the numeric indexes 10 and 11.

    • The Bucket structure takes part in two doubly linked lists. The pNext and pLast pointers link the elements that sit in the same slot.

    • The pListNext and pListLast pointers, in contrast, link all the data in the entire hash table. The pListHead and pListTail fields in the HashTable structure point to the first and last elements of the entire hash table.

Operation interfaces of the hash table:

PHP provides the following kinds of operation interfaces:

    • Initialization operations, such as the zend_hash_init() function, which initialize the hash table, allocate space, and so on.

    • Lookup, insert, delete, and update interfaces, which are the more routine operations.

    • Iteration and looping interfaces, which are used to traverse the hash table.

    • Copy, sort, reverse, and destroy operations.
