PHP array/Hash table implementation and operations

Source: Internet
Author: User
Contents
1. PHP Hash table
2. PHP array definition
1. PHP Hash table

0x1: Basic concepts

Hash tables are widely used in practice. For example, a compiler usually maintains a symbol table to store identifiers, and many high-level languages support hash tables explicitly. A hash table generally provides search, insert, and delete operations. In the worst case these operations perform no better than a linked list, O(n). In practice it is rarely that bad: a well-designed hash algorithm avoids such cases, and the time complexity of these operations is usually O(1). This is why hash tables are so popular.

It is precisely because hash tables are easy to use and efficient that most dynamic language implementations use them.

A hash table is a data structure that maps a key to a value through a hash function. Each key corresponds to exactly one value.

1. Key: the identifier used to operate on the data, such as an integer index or a string key in a PHP array.
2. Slot (slot/bucket): a unit of storage in the hash table, that is, the container where data is actually stored.
3. Hash function: the function that maps a key to the location of the slot where the data should be stored.
4. Hash collision: the situation where the hash function maps two different keys to the same index.

A hash table can be understood as an extension of an array, or as an associative array. An array is addressed by numeric subscripts. If the key range is small and the keys are numbers, we can use an array directly as the hash table. If the key range is too large, using an array directly would require allocating space for every possible key, which is often unrealistic; even if the space were available, utilization would be low. Keys are also not necessarily numbers, especially in PHP. So a mapping function (the hash function) is used to map keys into a specific range:

h(key) -> index

By designing the hash function properly, we can map keys into a suitable range. Because the key space can be very large (string keys, for example), two different keys may be mapped to the same index when that space is mapped onto a smaller one. This is what we call a conflict. There are two main methods for resolving hash conflicts: the link method (chaining) and open addressing.

1. Conflict resolution: link method

With the link method, each slot holds a linked list, and all keys that map to the same slot are stored in that list. The worst case for the link method is that every key maps to the same slot: the hash table then degenerates into a linked list, operations take O(n), and the hash table loses its performance advantage. Choosing a suitable hash function is therefore critical.

Hash table implementations in most programming languages are open source, and their hash algorithms are public. A hash algorithm can distribute keys evenly, but only on the premise that the keys are random. Because the algorithm is deterministic, an attacker who knows it can construct special keys that all map to the same slot. The hash table then degenerates into a singly linked list and program performance drops sharply, along with the throughput of the application. For highly concurrent applications in particular, a large number of such requests can amount to a denial-of-service (DoS) attack on the server.

The fundamental weakness that hash collision attacks exploit is this:

Because the algorithms are public and hashing is deterministic and predictable, an attacker can launch an attack with specially constructed keys. To solve this problem, the implementation must make it hard for an attacker to construct such a sequence of keys.
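The attack described above can be illustrated with a deliberately weak hash. The following is only a sketch under stated assumptions: `weak_hash`, `slot_of`, `longest_chain`, and the table size `DEMO_SLOTS` are hypothetical names chosen for this example, and the byte-sum hash is not PHP's algorithm. It shows that once the algorithm is known, keys that all land in one slot are easy to construct (here, anagrams of one another).

```c
#include <stddef.h>

#define DEMO_SLOTS 8  /* hypothetical table size for the demo */

/* Deliberately weak hash: the sum of the key's bytes. */
static unsigned long weak_hash(const char *key)
{
    unsigned long sum = 0;
    for (; *key != '\0'; ++key) {
        sum += (unsigned char)*key;
    }
    return sum;
}

/* Slot index of a key in a table with DEMO_SLOTS slots. */
static size_t slot_of(const char *key)
{
    return (size_t)(weak_hash(key) % DEMO_SLOTS);
}

/* Length of the longest chain that would result from inserting
 * all n keys into a chained table with DEMO_SLOTS slots. */
static size_t longest_chain(const char *const *keys, size_t n)
{
    size_t counts[DEMO_SLOTS] = {0};
    size_t longest = 0;
    for (size_t i = 0; i < n; ++i) {
        size_t s = slot_of(keys[i]);
        if (++counts[s] > longest) {
            longest = counts[s];
        }
    }
    return longest;
}
```

Passing the six permutations of "abc" to `longest_chain` yields a single chain containing all six keys, because a byte sum ignores character order. Keys that an attacker did not choose, such as "a", "b", "c", "d", spread over separate slots.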

Currently, the solution to HashTable hash conflicts in PHP is the link method.

2. Conflict resolution: open addressing

Another common way to resolve conflicts is open addressing. With open addressing, data is stored directly in the slots themselves. When inserting, if the slot that the key maps to already holds data, a conflict has occurred and the next slot is tried; if that slot is also occupied, the search continues until a free slot is found. Lookup uses the same strategy. Because open addressing occupies the space of other slots when handling a conflict, later insertions become more likely to collide. The load factor of a hash table that uses open addressing therefore cannot be too high, or performance degrades.
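The probing strategy just described can be sketched as follows. This is a minimal illustration, not code from PHP: the names `OaTable`, `oa_insert`, and `oa_lookup`, the fixed capacity `OA_SLOTS`, and the multiply-by-31 hash are all assumptions made for the example, and the "next slot" rule is plain linear probing. Deletion is left out because it needs extra bookkeeping (for example, tombstone markers) to keep probe sequences intact.

```c
#include <stddef.h>
#include <string.h>

#define OA_SLOTS 8  /* hypothetical fixed capacity; this sketch never resizes */

typedef struct {
    const char *key;  /* NULL marks an empty slot */
    int value;
} OaSlot;

typedef struct {
    OaSlot slots[OA_SLOTS];
    size_t used;
} OaTable;

/* Illustrative string hash reduced to a slot index. */
static size_t oa_hash(const char *key)
{
    size_t h = 0;
    for (; *key != '\0'; ++key) {
        h = h * 31 + (unsigned char)*key;
    }
    return h % OA_SLOTS;
}

/* Insert or update. Returns 0 on success, -1 if every slot is taken. */
static int oa_insert(OaTable *t, const char *key, int value)
{
    size_t start = oa_hash(key);
    for (size_t i = 0; i < OA_SLOTS; ++i) {
        OaSlot *s = &t->slots[(start + i) % OA_SLOTS];
        if (s->key == NULL) {            /* free slot: store the entry here */
            s->key = key;
            s->value = value;
            t->used++;
            return 0;
        }
        if (strcmp(s->key, key) == 0) {  /* same key: overwrite the value */
            s->value = value;
            return 0;
        }
        /* occupied by another key: fall through and probe the next slot */
    }
    return -1;
}

/* Follow the same probe sequence. Returns 0 and sets *out if found. */
static int oa_lookup(const OaTable *t, const char *key, int *out)
{
    size_t start = oa_hash(key);
    for (size_t i = 0; i < OA_SLOTS; ++i) {
        const OaSlot *s = &t->slots[(start + i) % OA_SLOTS];
        if (s->key == NULL) {
            return -1;                   /* reached an empty slot: key is absent */
        }
        if (strcmp(s->key, key) == 0) {
            *out = s->value;
            return 0;
        }
    }
    return -1;
}
```

A table must start zero-initialized (for example `OaTable t = {0};`) so that every slot is empty. The sketch stores the caller's key pointer without copying it, so keys must outlive the table.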

The load factor is the ratio of the number of elements stored in the hash table to the size of the hash table. As a general guideline:

1. For a hash table that resolves conflicts with the link method, the load factor should preferably not exceed 1. (A load factor of 1 means the table holds as many elements as it has slots; keys stored after that will mostly cause conflicts and lengthen the linked lists, and a linked list is less efficient than direct slot access.)
2. For a hash table that uses open addressing, the load factor should preferably not exceed 0.5.
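As a small sketch of how these guidelines might be applied, the helper below computes the load factor and compares it with the two limits mentioned above. The function names `load_factor` and `should_grow` are hypothetical, and the code assumes `table_size` is greater than zero.

```c
#include <stddef.h>

/* Load factor = stored elements / number of slots (table_size must be > 0). */
static double load_factor(size_t elem_num, size_t table_size)
{
    return (double)elem_num / (double)table_size;
}

/* Rule of thumb from the text: a table using the link method should grow
 * once the load factor exceeds 1, an open-addressing table once it
 * exceeds 0.5. */
static int should_grow(size_t elem_num, size_t table_size, int open_addressing)
{
    double limit = open_addressing ? 0.5 : 1.0;
    return load_factor(elem_num, table_size) > limit;
}
```

With 8 slots, an open-addressing table would be flagged for growth at 5 elements, while a chained table would only be flagged at 9.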

0x2: Implementation of hash tables

Implementing a hash table is not difficult. There are only three main tasks:

1. Implement the hash function
2. Resolve conflicts
3. Implement the operation interfaces

Before looking at the hash table in the PHP kernel, we can implement a simple hash table by hand.

1. Basic data structure definition

#ifndef _HASH_TABLE_H_
#define _HASH_TABLE_H_ 1

typedef struct _Bucket
{
    char *key;
    void *value;
    struct _Bucket *next;
} Bucket;

typedef struct _HashTable
{
    int size;        // HashTable size/lines
    int elem_num;    // total elements count
    Bucket **buckets;
} HashTable;

#endif

2. Implementation of the hash function

A hash function should map different keys to different slots (slots or buckets) as far as possible. We start with the simplest hashing algorithm: add up all the characters of the key string, then take the result modulo the size of the hash table, so that the index falls within the range of the array.

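// hashtable.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "hashtable.h"

int hash_str(char *key);

// Sum the bytes of the key; HASH_INDEX reduces the sum to a slot index.
int hash_str(char *key)
{
    int hash = 0;
    char *cur = key;

    while(*cur != '\0')
    {
        // the cast keeps the sum non-negative where char is signed
        hash += (unsigned char)*cur;
        ++cur;
    }

    return hash;
}

// hashtable.h
#define HASH_INDEX(ht, key) (hash_str((key)) % (ht)->size)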

3. Implementation of operation interfaces

To operate the hash table, the following operation interface functions are implemented:

int hash_init(HashTable *ht);                               // initialize the hash table
int hash_lookup(HashTable *ht, char *key, void **result);   // look up content by key
int hash_insert(HashTable *ht, char *key, void *value);     // insert content into the hash table
int hash_remove(HashTable *ht, char *key);                  // remove the content that the key points to
int hash_destroy(HashTable *ht);

4. Complete source code

hashtable.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "hashtable.h"

static void resize_hash_table_if_needed(HashTable *ht);
static int hash_str(char *key);

int hash_init(HashTable *ht)
{
    ht->size     = HASH_TABLE_INIT_SIZE;
    ht->elem_num = 0;
    ht->buckets  = (Bucket **)calloc(ht->size, sizeof(Bucket *));

    if(ht->buckets == NULL) return FAILED;

    LOG_MSG("[init]\tsize: %i\n", ht->size);

    return SUCCESS;
}

int hash_lookup(HashTable *ht, char *key, void **result)
{
    int index = HASH_INDEX(ht, key);
    Bucket *bucket = ht->buckets[index];

    if(bucket == NULL) goto failed;

    // walk the chain of this slot until the key matches
    while(bucket)
    {
        if(strcmp(bucket->key, key) == 0)
        {
            LOG_MSG("[lookup]\t found %s\tindex:%i value: %p\n",
                key, index, bucket->value);
            *result = bucket->value;

            return SUCCESS;
        }

        bucket = bucket->next;
    }

failed:
    LOG_MSG("[lookup]\t key:%s\tfailed\t\n", key);
    return FAILED;
}

// Note: the table stores the caller's key pointer without copying it,
// so the key must stay valid for as long as the entry exists.
int hash_insert(HashTable *ht, char *key, void *value)
{
    // check if we need to resize the hashtable
    resize_hash_table_if_needed(ht);

    int index = HASH_INDEX(ht, key);

    Bucket *org_bucket = ht->buckets[index];
    Bucket *tmp_bucket = org_bucket;

    // check if the key exists already
    while(tmp_bucket)
    {
        if(strcmp(key, tmp_bucket->key) == 0)
        {
            LOG_MSG("[update]\tkey: %s\n", key);
            tmp_bucket->value = value;

            return SUCCESS;
        }

        tmp_bucket = tmp_bucket->next;
    }

    Bucket *bucket = (Bucket *)malloc(sizeof(Bucket));
    if(bucket == NULL) return FAILED;

    bucket->key   = key;
    bucket->value = value;
    bucket->next  = NULL;

    ht->elem_num += 1;

    if(org_bucket != NULL)
    {
        // collision: put the new bucket at the head of the chain
        LOG_MSG("[collision]\tindex:%d key:%s\n", index, key);
        bucket->next = org_bucket;
    }

    ht->buckets[index] = bucket;

    LOG_MSG("[insert]\tindex:%d key:%s\tht(num:%d)\n",
        index, key, ht->elem_num);

    return SUCCESS;
}

int hash_remove(HashTable *ht, char *key)
{
    int index = HASH_INDEX(ht, key);
    Bucket *bucket = ht->buckets[index];
    Bucket *prev   = NULL;

    if(bucket == NULL) return FAILED;

    // find the right bucket from the link list
    while(bucket)
    {
        if(strcmp(bucket->key, key) == 0)
        {
            LOG_MSG("[remove]\tkey:(%s) index: %d\n", key, index);

            if(prev == NULL)
            {
                ht->buckets[index] = bucket->next;
            }
            else
            {
                prev->next = bucket->next;
            }
            free(bucket);
            ht->elem_num -= 1;   // keep the element count in sync

            return SUCCESS;
        }

        prev   = bucket;
        bucket = bucket->next;
    }

    LOG_MSG("[remove]\t key:%s not found remove \tfailed\t\n", key);
    return FAILED;
}

int hash_destroy(HashTable *ht)
{
    int i;
    Bucket *cur = NULL;
    Bucket *tmp = NULL;

    // free every bucket of every chain, then the slot array itself
    for(i = 0; i < ht->size; ++i)
    {
        cur = ht->buckets[i];
        while(cur)
        {
            tmp = cur;
            cur = cur->next;
            free(tmp);
        }
    }
    free(ht->buckets);

    return SUCCESS;
}

static int hash_str(char *key)
{
    int hash = 0;
    char *cur = key;

    while(*cur != '\0')
    {
        // the cast keeps the sum non-negative where char is signed
        hash += (unsigned char)*cur;
        ++cur;
    }

    return hash;
}

static int hash_resize(HashTable *ht)
{
    // double the size
    int org_size = ht->size;
    Bucket **org_buckets = ht->buckets;
    Bucket **buckets = (Bucket **)calloc(org_size * 2, sizeof(Bucket *));
    int i;

    if(buckets == NULL) return FAILED;   // keep the old table on failure

    ht->size     = org_size * 2;
    ht->elem_num = 0;
    ht->buckets  = buckets;

    LOG_MSG("[resize]\torg size: %i\tnew size: %i\n", org_size, ht->size);

    for(i = 0; i < org_size; ++i)
    {
        Bucket *cur = org_buckets[i];
        Bucket *tmp;
        while(cur)
        {
            // rehash: insert again
            hash_insert(ht, cur->key, cur->value);

            // free the org bucket, but not the element
            tmp = cur;
            cur = cur->next;
            free(tmp);
        }
    }
    free(org_buckets);

    LOG_MSG("[resize] done\n");

    return SUCCESS;
}

// if the elem_num is almost as large as the capacity of the hashtable
// we need to resize the hashtable to contain enough elements
static void resize_hash_table_if_needed(HashTable *ht)
{
    if(ht->size - ht->elem_num < 1)
    {
        hash_resize(ht);
    }
}

hashtable.h

#ifndef _HASH_TABLE_H_
#define _HASH_TABLE_H_ 1

#define HASH_TABLE_INIT_SIZE 6
#define HASH_INDEX(ht, key) (hash_str((key)) % (ht)->size)

#if defined(DEBUG)
#  define LOG_MSG printf
#else
#  define LOG_MSG(...)
#endif

#define SUCCESS 0
#define FAILED -1

typedef struct _Bucket
{
    char *key;
    void *value;
    struct _Bucket *next;
} Bucket;

typedef struct _HashTable
{
    int size;        // size of the hash table
    int elem_num;    // number of stored elements
    Bucket **buckets;
} HashTable;

int hash_init(HashTable *ht);
int hash_lookup(HashTable *ht, char *key, void **result);
int hash_insert(HashTable *ht, char *key, void *value);
int hash_remove(HashTable *ht, char *key);
int hash_destroy(HashTable *ht);

#endif

main.c

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <string.h>
#include "hashtable.h"

#define TEST(tcase) printf(">>> [START CASE] " tcase "<<<\n")
#define PASS(tcase) printf(">>> [PASSED] " tcase " <<<\n")

int main(int argc, char **argv)
{
    HashTable *ht = (HashTable *)malloc(sizeof(HashTable));
    assert(ht != NULL);
    int result = hash_init(ht);

    assert(result == SUCCESS);

    /* Data */
    int  int1 = 10;
    int  int2 = 20;
    char str1[] = "Hello TIPI";
    char str2[] = "Value";
    /* to find data container */
    int *j = NULL;
    char *find_str = NULL;

    /* Test Key insert */
    TEST("Key insert");
    hash_insert(ht, "KeyInt", &int1);
    hash_insert(ht, "asdfKeyStrass", str1);
    hash_insert(ht, "K13eyStras", str1);
    hash_insert(ht, "KeyStr5", str1);
    hash_insert(ht, "KeyStr", str1);
    PASS("Key insert");

    /* Test key lookup */
    TEST("Key lookup");
    hash_lookup(ht, "KeyInt", (void **)&j);
    hash_lookup(ht, "KeyStr", (void **)&find_str);

    assert(strcmp(find_str, str1) == 0);
    assert(*j == int1);   /* compare with ==, not assign with = */
    PASS("Key lookup");

    /* Test Key update */
    TEST("Test key update");
    hash_insert(ht, "KeyInt", &int2);
    hash_lookup(ht, "KeyInt", (void **)&j);
    assert(*j == int2);
    PASS("Test key update");

    TEST(">>>     Test key not found        <<< ");
    result = hash_lookup(ht, "non-exist-key", (void **)&j);
    assert(result == FAILED);
    PASS("non-exist-key lookup");

    TEST("Test key not found after remove");
    char strMyKey[] = "My-Key-Value";
    find_str = NULL;
    hash_insert(ht, "My-Key", strMyKey);
    result = hash_remove(ht, "My-Key");
    assert(result == SUCCESS);

    result = hash_lookup(ht, "My-Key", (void **)&find_str);
    assert(find_str == NULL);
    assert(result == FAILED);
    PASS("Test key not found after remove");

    PASS(">>>     Test key not found        <<< ");

    TEST("Add many elements and make hashtable rehash");
    hash_insert(ht, "a1", &int2);
    hash_insert(ht, "a2", &int1);
    hash_insert(ht, "a3", &int1);
    hash_insert(ht, "a4", &int1);
    hash_insert(ht, "a5", &int1);
    hash_insert(ht, "a6", &int1);
    hash_insert(ht, "a7", &int1);
    hash_insert(ht, "a8", str2);
    hash_insert(ht, "a9", &int1);
    hash_insert(ht, "a10", &int1);
    hash_insert(ht, "a11", &int1);
    hash_insert(ht, "a12", &int1);
    hash_insert(ht, "a13", &int1);
    hash_insert(ht, "a14", &int1);
    hash_insert(ht, "a15", &int1);
    hash_insert(ht, "a16", &int1);
    hash_insert(ht, "a17", &int1);
    hash_insert(ht, "a18", &int1);
    hash_insert(ht, "a19", &int1);
    hash_insert(ht, "a20", &int1);
    hash_insert(ht, "a21", &int1);
    hash_insert(ht, "a22", &int1);
    hash_insert(ht, "a23", &int1);
    hash_insert(ht, "a24", &int1);
    hash_insert(ht, "a24", &int1);
    hash_insert(ht, "a24", &int1);
    hash_insert(ht, "a25", &int1);
    hash_insert(ht, "a26", &int1);
    hash_insert(ht, "a27", &int1);
    hash_insert(ht, "a28", &int1);
    hash_insert(ht, "a29", &int1);
    hash_insert(ht, "a30", &int1);
    hash_insert(ht, "a31", &int1);
    hash_insert(ht, "a32", &int1);
    hash_insert(ht, "a33", &int1);

    hash_lookup(ht, "a23", (void **)&j);
    assert(*j == int1);
    hash_lookup(ht, "a30", (void **)&j);
    assert(*j == int1);
    PASS("Add many elements and make hashtable rehash");

    hash_destroy(ht);
    free(ht);

    printf("Woohoo, It looks like HashTable works properly\n");

    return 0;
}

Compile and run

gcc -g -Wall -DDEBUG -o a.out main.c hashtable.c

0x3: Data structure

In PHP, variables, constants, classes, properties, and arrays are all implemented with hash tables. The definitions are in php-5.6.17\Zend\zend_hash.h:

typedef struct bucket {
    ulong h;                    /* Used for numeric indexing */
    uint nKeyLength;            // key length
    void *pData;                // pointer to the data saved by the Bucket
    void *pDataPtr;             // holds the data itself when the data is a pointer
    struct bucket *pListNext;   // next element in the table-wide list
    struct bucket *pListLast;   // previous element in the table-wide list
    struct bucket *pNext;       // next element in the same slot's chain
    struct bucket *pLast;       // previous element in the same slot's chain
    const char *arKey;          // string key
} Bucket;

typedef struct _hashtable {
    uint nTableSize;            // size of the HashTable
    uint nTableMask;            // equal to nTableSize - 1
    uint nNumOfElements;        // number of elements
    ulong nNextFreeElement;     // next free numeric index
    Bucket *pInternalPointer;   /* Used for element traversal: saves the current position */
    Bucket *pListHead;          // head element pointer
    Bucket *pListTail;          // tail element pointer
    Bucket **arBuckets;         // the hash slot array
    dtor_func_t pDestructor;    // similar to a destructor
    zend_bool persistent;       // how memory is allocated: PHP's memory manager or plain malloc
    unsigned char nApplyCount;  // how many times the table is currently being traversed; prevents infinite recursion
    zend_bool bApplyProtection;
#if ZEND_DEBUG
    int inconsistent;
#endif
} HashTable;
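The comment on nTableMask (equal to nTableSize - 1) hints at why table sizes that are powers of two are convenient: the slot index can be taken with a bit mask instead of a modulo. The sketch below illustrates that equivalence in isolation. The function names `index_by_mask` and `index_by_modulo` are hypothetical, and the code assumes `table_size` is a power of two.

```c
/* Slot index computed with a bit mask.
 * Valid only when table_size is a power of two. */
static unsigned long index_by_mask(unsigned long h, unsigned int table_size)
{
    unsigned int mask = table_size - 1;   /* plays the role of nTableMask */
    return h & mask;
}

/* The same index computed with the modulo operation, for comparison. */
static unsigned long index_by_modulo(unsigned long h, unsigned int table_size)
{
    return h % table_size;
}
```

For any hash value the two functions agree as long as the size is a power of two; for example, with a size of 8, a hash of 13 lands in slot 5 either way.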

Relevant Link:

http://www.imsiren.com/archives/6
http://www.php-internals.com/book/?p=chapt03/03-01-01-hashtable
https://github.com/reeze/tipi/tree/master/book/sample/chapt03/03-01-01-hashtable

2. PHP array definition

An array in PHP is actually an ordered map. A map is a type that associates values with keys. This type is optimized for several different uses, so it can be treated as:

1. A real array
2. A list (vector)
3. A hash table (an implementation of a map)
4. A dictionary
5. A set
6. A stack
7. A queue, and possibly more

The value of an array element can itself be another array, so tree structures and multi-dimensional arrays are also possible. Arrays are used very often in PHP. The biggest advantage of an array is speed: reads and writes can be completed in O(1). In a plain array every element has the same size, so once the subscript is known, the position of the corresponding element in memory can be calculated immediately and the data read or written directly.
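The address calculation mentioned above can be written out as a short sketch. The helper name `element_address` is hypothetical; it only restates what C's own array indexing already does for a contiguous array with fixed-size elements.

```c
#include <stddef.h>

/* Address of element `index` in a contiguous array whose elements
 * all have size `elem_size`: base + index * elem_size. */
static const void *element_address(const void *base, size_t index, size_t elem_size)
{
    return (const char *)base + index * elem_size;
}
```

For an `int arr[4]`, `element_address(arr, 2, sizeof arr[0])` is the same address as `&arr[2]`, and no search is needed to find it.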

Much of PHP's functionality is implemented on top of HashTable, and arrays are one example.

HashTable also has the advantages of a doubly linked list. The variables defined in PHP are stored in a symbol table, and this symbol table is in fact a HashTable in which every element is a variable of type zval *. In addition, the containers that hold user-defined functions, classes, resources, and so on are all implemented in the kernel as HashTables.
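One practical benefit of the doubly linked list is that elements can be traversed in insertion order, regardless of which slot each key hashes to, and unlinked in constant time. The sketch below models only those list links. The types `Node` and `OrderedList` and the functions `list_append`, `list_unlink`, and `list_values` are hypothetical simplifications for illustration, not the Zend implementation.

```c
#include <stddef.h>

/* Simplified element: only the insertion-order links are modelled. */
typedef struct Node {
    int value;
    struct Node *list_next;  /* plays the role of pListNext */
    struct Node *list_last;  /* plays the role of pListLast */
} Node;

typedef struct {
    Node *head;  /* plays the role of pListHead */
    Node *tail;  /* plays the role of pListTail */
} OrderedList;

/* Append at the tail so that traversal follows insertion order. */
static void list_append(OrderedList *l, Node *n)
{
    n->list_next = NULL;
    n->list_last = l->tail;
    if (l->tail != NULL) {
        l->tail->list_next = n;
    } else {
        l->head = n;
    }
    l->tail = n;
}

/* Unlink a node in O(1) by using its back pointer. */
static void list_unlink(OrderedList *l, Node *n)
{
    if (n->list_last != NULL) {
        n->list_last->list_next = n->list_next;
    } else {
        l->head = n->list_next;
    }
    if (n->list_next != NULL) {
        n->list_next->list_last = n->list_last;
    } else {
        l->tail = n->list_last;
    }
}

/* Copy up to cap values into out, in insertion order; returns the count. */
static size_t list_values(const OrderedList *l, int *out, size_t cap)
{
    size_t n = 0;
    for (const Node *cur = l->head; cur != NULL && n < cap; cur = cur->list_next) {
        out[n++] = cur->value;
    }
    return n;
}
```

Appending the values 30, 10, 20 and then traversing returns them in exactly that order; unlinking the middle node leaves 30, 20.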

Because of this, reads and writes on PHP arrays complete in O(1), which is very efficient. Compared with arrays in C++ and Java, the extra overhead is essentially the creation of the hash table. Let's look at how PHP defines an array; in the kernel this is implemented with macros.

0x1: Array initialization

Zend/zend_vm_execute.h

static int ZEND_FASTCALL  ZEND_INIT_ARRAY_SPEC_CV_CONST_HANDLER(ZEND_OPCODE_HANDLER_ARGS)
{
    USE_OPLINE

    array_init(&EX_T(opline->result.var).tmp_var);
    if (IS_CV == IS_UNUSED) {
        ZEND_VM_NEXT_OPCODE();
#if 0 || IS_CV != IS_UNUSED
    } else {
        return ZEND_ADD_ARRAY_ELEMENT_SPEC_CV_CONST_HANDLER(ZEND_OPCODE_HANDLER_ARGS_PASSTHRU);
#endif
    }
}

php-5.6.17\Zend\zend_API.c

ZEND_API int _array_init(zval *arg, uint size ZEND_FILE_LINE_DC) /* {{{ */
{
    ALLOC_HASHTABLE_REL(Z_ARRVAL_P(arg));

    _zend_hash_init(Z_ARRVAL_P(arg), size, ZVAL_PTR_DTOR, 0 ZEND_FILE_LINE_RELAY_CC);
    Z_TYPE_P(arg) = IS_ARRAY;
    return SUCCESS;
}

php-5.6.17\Zend\zend_hash.c

ZEND_API int _zend_hash_init(HashTable *ht, uint nSize, dtor_func_t pDestructor, zend_bool persistent ZEND_FILE_LINE_DC)
{
    uint i = 3;

    SET_INCONSISTENT(HT_OK);

    if (nSize >= 0x80000000) {
        /* prevent overflow */
        // when the requested size is 0x80000000 or more, the table size is set to 0x80000000
        ht->nTableSize = 0x80000000;
    } else {
        while ((1U << i) < nSize) {
            i++;
        }
        // the requested size is rounded up to a power of 2, which helps memory alignment;
        // since i starts at 3, the minimum nTableSize is 8
        ht->nTableSize = 1 << i;
    }

    ht->nTableMask = 0;    /* 0 means that ht->arBuckets is uninitialized */
    ht->pDestructor = pDestructor;    // function pointer called when an element of the HashTable is deleted or overwritten
    ht->arBuckets = (Bucket **)&uninitialized_bucket;
    ht->pListHead = NULL;
    ht->pListTail = NULL;
    ht->nNumOfElements = 0;
    ht->nNextFreeElement = 0;
    ht->pInternalPointer = NULL;
    ht->persistent = persistent;    // if persistent is TRUE, Buckets are allocated with the operating system's allocator; otherwise with PHP's memory allocator
    ht->nApplyCount = 0;
    ht->bApplyProtection = 1;
    return SUCCESS;
}

ZEND_API int _zend_hash_init_ex(HashTable *ht, uint nSize, dtor_func_t pDestructor, zend_bool persistent, zend_bool bApplyProtection ZEND_FILE_LINE_DC)
{
    int retval = _zend_hash_init(ht, nSize, pDestructor, persistent ZEND_FILE_LINE_CC);

    ht->bApplyProtection = bApplyProtection;
    return retval;
}
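The size adjustment in the initialization code above can be isolated into a small standalone function to make its behavior easier to see. This is only a sketch: the name `round_table_size` is hypothetical, and the code assumes `unsigned int` is at least 32 bits wide.

```c
/* Round a requested size up to a power of two, with a minimum of 8
 * and a cap of 0x80000000, following the loop in the code above. */
static unsigned int round_table_size(unsigned int n_size)
{
    unsigned int i = 3;                /* 1 << 3 == 8 is the smallest table */

    if (n_size >= 0x80000000U) {
        return 0x80000000U;            /* prevent overflow of the shift below */
    }
    while ((1U << i) < n_size) {
        i++;
    }
    return 1U << i;
}
```

A request for 0 or 8 elements gives a table of 8 slots, a request for 9 gives 16, and a request for 100 gives 128.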

0x2: Add a key value to the array

0x3: PHP array operation API

// Initialize a PHP array
array_init(zval *arg);
array_init_size(zval *arg, uint size);

// Assignment functions for associative arrays, equivalent to $array[$stringKey] = $value;
add_assoc_null(zval *aval, char *key);
add_assoc_bool(zval *aval, char *key, zend_bool bval);
add_assoc_long(zval *aval, char *key, long lval);
add_assoc_double(zval *aval, char *key, double dval);
add_assoc_string(zval *aval, char *key, char *strval, int dup);
add_assoc_stringl(zval *aval, char *key, char *strval, uint strlen, int dup);
add_assoc_zval(zval *aval, char *key, zval *value);
// All of the above are macros that wrap the add_assoc_*_ex functions

// Assignment functions for numerically indexed arrays, equivalent to $array[$numKey] = $value;
ZEND_API int add_index_long(zval *arg, ulong idx, long n);
ZEND_API int add_index_null(zval *arg, ulong idx);
ZEND_API int add_index_bool(zval *arg, ulong idx, int b);
ZEND_API int add_index_resource(zval *arg, ulong idx, int r);
ZEND_API int add_index_double(zval *arg, ulong idx, double d);
ZEND_API int add_index_string(zval *arg, ulong idx, const char *str, int duplicate);
ZEND_API int add_index_stringl(zval *arg, ulong idx, const char *str, uint length, int duplicate);
ZEND_API int add_index_zval(zval *arg, ulong index, zval *value);

// Assignment functions that use the next built-in numeric index, equivalent to $array[] = $value;
ZEND_API int add_next_index_long(zval *arg, long n);
ZEND_API int add_next_index_null(zval *arg);
ZEND_API int add_next_index_bool(zval *arg, int b);
ZEND_API int add_next_index_resource(zval *arg, int r);
ZEND_API int add_next_index_double(zval *arg, double d);
ZEND_API int add_next_index_string(zval *arg, const char *str, int duplicate);
ZEND_API int add_next_index_stringl(zval *arg, const char *str, uint length, int duplicate);
ZEND_API int add_next_index_zval(zval *arg, zval *value);

// Assign an array element and return it, equivalent to { $array[$key] = $value; return $value; }
ZEND_API int add_get_assoc_string_ex(zval *arg, const char *key, uint key_len, const char *str, void **dest, int duplicate);
ZEND_API int add_get_assoc_stringl_ex(zval *arg, const char *key, uint key_len, const char *str, uint length, void **dest, int duplicate);
#define add_get_assoc_string(__arg, __key, __str, __dest, __duplicate) add_get_assoc_string_ex(__arg, __key, strlen(__key)+1, __str, __dest, __duplicate)
#define add_get_assoc_stringl(__arg, __key, __str, __length, __dest, __duplicate) add_get_assoc_stringl_ex(__arg, __key, strlen(__key)+1, __str, __length, __dest, __duplicate)
ZEND_API int add_get_index_long(zval *arg, ulong idx, long l, void **dest);
ZEND_API int add_get_index_double(zval *arg, ulong idx, double d, void **dest);
ZEND_API int add_get_index_string(zval *arg, ulong idx, const char *str, void **dest, int duplicate);
ZEND_API int add_get_index_stringl(zval *arg, ulong idx, const char *str, uint length, void **dest, int duplicate);

Relevant Link:

http://thiniki.sinaapp.com/?p=155
http://www.imsiren.com/archives/250
http://www.cnblogs.com/ohmygirl/p/internal-4.html
http://weizhifeng.net/write-php-extension-part2-1.html
http://blog.csdn.net/a600423444/article/details/7073854

Copyright (c) 2016 Little5ann All rights reserved
