Implementation and operation of PHP array/hash table


Contents

1. PHP Hash Table
2. PHP Array Definition

1. PHP Hash Table

0x1: Basic Concepts

Hash tables are used extensively in practice. For example, compilers typically maintain a symbol table to store identifiers, and many high-level languages support hash tables directly. Hash tables usually provide search, insert, and delete operations. In the worst case these operations take O(n) time, the same as a linked list. That case is rare, though: a well-designed hash function avoids it, so these operations usually run in O(1) time. That is why hash tables are so popular.

Precisely because hash tables are convenient and efficient, most current dynamic language implementations use them.

A hash table is a data structure that maps a specific key to a specific value through a hash function, maintaining a one-to-one correspondence between keys and values.

A hash table can be understood as an extension of an array, or as an associative array. Arrays are addressed with numeric subscripts, so if the range of keys is small and numeric, a plain array can serve as the hash table. If the key range is large, however, using an array directly would require allocating space for every possible key, which is often unrealistic; even when there is enough memory, space utilization would be low. Keys are also not necessarily numbers, especially in PHP. So a mapping function (the hash function) is used to map each key into a specific range:

H(key) → index

A properly designed hash function maps keys into a suitable range. But because the key space can be very large (string keys, for example), mapping it into a smaller space means two different keys may map to the same index. This is called a collision. There are two main ways to resolve hash collisions: the link method (chaining) and open addressing.
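To make H(key) → index concrete, here is a minimal sketch that uses the same additive hash as the toy implementation later in this article (sum of the character codes, modulo the table size). The helper name `additive_index` and the table size of 8 are illustrative assumptions. Because addition ignores character order, "ab" and "ba" always land in the same slot, which is exactly the kind of collision described above.

```c
#include <assert.h>

/* Hypothetical helper: H(key) = (sum of the key's character codes) mod size. */
static unsigned int additive_index(const char *key, unsigned int size)
{
    unsigned int sum = 0;

    for (; *key != '\0'; ++key) {
        sum += (unsigned char)*key;   /* the order of characters does not matter */
    }
    return sum % size;
}
```

With a table of 8 slots, `additive_index("ab", 8)` and `additive_index("ba", 8)` both return 3, since 97 + 98 = 195 and 195 mod 8 = 3.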

1. Conflict Resolution: Link Method (Chaining)

The link method resolves collisions by using a linked list to hold the values in a slot: when different keys map to the same slot, their entries are kept in that slot's linked list. In the worst case, every key maps to the same slot, the hash table degenerates into a single linked list, and its operations become O(n), which erases the performance advantage of the hash table. Choosing a suitable hash function is therefore critical.

Because most programming languages' hash table implementations are open source, their hash algorithms are public. Current hash algorithms distribute keys fairly uniformly, but only on the premise that the keys are random. Because a known algorithm is deterministic, an attacker can construct special keys that all map to the same slot, degrading the hash table into a singly linked list. The program's performance then drops sharply, and so does the application's throughput. The impact is especially severe for highly concurrent applications: a large number of such requests can subject the server to a denial-of-service (DoS) attack.
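The attack is easy to see with a multiplicative string hash. Below is a simplified sketch of DJBX33A ("times 33", seeded with 5381), the hash family used by PHP 5's `zend_inline_hash_func`; the real function is unrolled and differs in details, and the names `djbx33a` and `colliding_key` are our own. The two-character blocks "Ez" and "FY" contribute the same amount (69 × 33 + 122 = 70 × 33 + 89 = 2399), so any two strings built from the same number of these blocks hash to the same value. With n blocks an attacker gets 2^n colliding keys of length 2n.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified DJBX33A: hash = hash * 33 + c, starting from 5381.
   Unsigned overflow wraps around, which is well defined in C. */
static unsigned long djbx33a(const char *key, size_t len)
{
    unsigned long hash = 5381;
    size_t i;

    for (i = 0; i < len; ++i) {
        hash = ((hash << 5) + hash) + (unsigned char)key[i];   /* hash * 33 + c */
    }
    return hash;
}

/* Hypothetical helper: build the n-th colliding key made of `blocks`
   two-character blocks, choosing "Ez" or "FY" from the bits of n.
   `out` must have room for 2 * blocks + 1 characters. */
static void colliding_key(unsigned int n, unsigned int blocks, char *out)
{
    unsigned int b;

    for (b = 0; b < blocks; ++b) {
        const char *block = ((n >> b) & 1u) ? "FY" : "Ez";
        out[2 * b] = block[0];
        out[2 * b + 1] = block[1];
    }
    out[2 * blocks] = '\0';
}
```

For example, `colliding_key(0, 3, buf)` yields "EzEzEz" and `colliding_key(5, 3, buf)` yields "FYEzFY"; both have the same `djbx33a` value, so in a chained table they would all pile into one slot.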

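The most fundamental weakness exploited by hash-collision attacks is therefore the combination described above: a public, deterministic hash function together with a collision strategy whose worst case is linear.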

Currently, PHP's HashTable resolves hash collisions with the link method (chaining).

2. Conflict Resolution: Open Addressing

The other common way to resolve collisions is open addressing. With open addressing, each slot stores the data directly. When inserting, if the key maps to an index that already holds data, there is a collision, so we look at the next slot; if that slot is also occupied, we keep looking until we find an unoccupied one. Lookups follow the same probing strategy. Because open addressing resolves collisions by occupying other slots, keys inserted later become more likely to collide, so the load factor of an open-addressing hash table must not be too high, or performance degrades.
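A minimal sketch of open addressing with linear probing, assuming a fixed capacity of 8 slots, string keys that are not copied, integer values, no deletion, and the same toy additive hash; the names `oa_table`, `oa_insert`, and `oa_lookup` are hypothetical. Each slot holds its entry directly. On a collision the insert steps to the next slot (wrapping around) until it finds the key or an empty slot, and lookup follows the same probe sequence, stopping at the first empty slot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OA_CAPACITY 8   /* hypothetical fixed capacity; a real table would grow */

typedef struct {
    const char *key;    /* NULL marks an empty slot; the key is not copied */
    int value;
} oa_slot;

typedef struct {
    oa_slot slots[OA_CAPACITY];
    size_t count;
} oa_table;

static size_t oa_hash(const char *key)
{
    size_t sum = 0;
    for (; *key != '\0'; ++key) sum += (unsigned char)*key;
    return sum % OA_CAPACITY;
}

/* Insert or update; returns 0 on success, -1 if every slot is taken. */
static int oa_insert(oa_table *t, const char *key, int value)
{
    size_t i = oa_hash(key);
    size_t probes;

    for (probes = 0; probes < OA_CAPACITY; ++probes) {
        oa_slot *s = &t->slots[i];
        if (s->key == NULL) {                 /* empty slot: claim it */
            s->key = key;
            s->value = value;
            t->count++;
            return 0;
        }
        if (strcmp(s->key, key) == 0) {       /* key already present: update */
            s->value = value;
            return 0;
        }
        i = (i + 1) % OA_CAPACITY;            /* linear probing: try the next slot */
    }
    return -1;
}

/* Returns 1 and stores the value in *out if found, 0 otherwise. */
static int oa_lookup(const oa_table *t, const char *key, int *out)
{
    size_t i = oa_hash(key);
    size_t probes;

    for (probes = 0; probes < OA_CAPACITY; ++probes) {
        const oa_slot *s = &t->slots[i];
        if (s->key == NULL) return 0;         /* an empty slot ends the probe sequence */
        if (strcmp(s->key, key) == 0) {
            *out = s->value;
            return 1;
        }
        i = (i + 1) % OA_CAPACITY;
    }
    return 0;
}
```

Inserting "ab" and then "ba" puts "ab" in slot 3 and pushes "ba" to slot 4, which also shows why a high load factor hurts: every displaced entry lengthens the probe sequences of its neighbours.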

0x2: Implementing a Hash Table

Implementing a hash table is not difficult. The main work consists of three parts:

1. Implement the hash function
2. Resolve collisions
3. Implement the operation interface

Before studying the native hash table implementation in the PHP kernel, we can first implement a simple version of a hash table by hand.

1. Basic data structure definition

#ifndef _HASH_TABLE_H_
#define _HASH_TABLE_H_ 1

typedef struct _Bucket
{
    char *key;
    void *value;
    struct _Bucket *next;     /* next bucket in the same slot (collision chain) */
} Bucket;

typedef struct _HashTable
{
    int size;                 /* hash table size (number of slots) */
    int elem_num;             /* total number of elements stored */
    Bucket **buckets;
} HashTable;

#endif

2. Hash function implementation

The hash function should map different keys to different slots (buckets) as evenly as possible. We start with one of the simplest hashing algorithms: add up all the characters of the key string, then take the result modulo the size of the hash table so that the index falls within the bounds of the array.

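/* hashtable.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "hashtable.h"

static unsigned int hash_str(const char *key);

/* Add up the character codes of the key. Unsigned arithmetic keeps the
   result non-negative, so the modulo in HASH_INDEX always gives a valid slot. */
static unsigned int hash_str(const char *key)
{
    unsigned int hash = 0;
    const char *cur = key;

    while (*cur != '\0') {
        hash += (unsigned char)*cur;
        ++cur;
    }
    return hash;
}

/* hashtable.h */
#define HASH_INDEX(ht, key) (hash_str((key)) % (ht)->size)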

3. Implementing the operation interface

To manipulate the hash table, we implement the following interface functions:

int hash_init(HashTable *ht);                                 /* initialize the hash table */
int hash_lookup(HashTable *ht, char *key, void **result);     /* find the value stored under key */
int hash_insert(HashTable *ht, char *key, void *value);       /* insert (or update) a key/value pair */
int hash_remove(HashTable *ht, char *key);                    /* delete the entry for key */
int hash_destroy(HashTable *ht);                              /* free all buckets */

4. Full source code

hashtable.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "hashtable.h"

static void resize_hash_table_if_needed(HashTable *ht);
static unsigned int hash_str(const char *key);

int hash_init(HashTable *ht)
{
    ht->size = HASH_TABLE_INIT_SIZE;
    ht->elem_num = 0;
    ht->buckets = (Bucket **)calloc(ht->size, sizeof(Bucket *));

    if (ht->buckets == NULL) return FAILED;

    LOG_MSG("[init]\tsize: %i\n", ht->size);
    return SUCCESS;
}

int hash_lookup(HashTable *ht, char *key, void **result)
{
    int index = HASH_INDEX(ht, key);
    Bucket *bucket = ht->buckets[index];

    /* walk the collision chain of this slot */
    while (bucket) {
        if (strcmp(bucket->key, key) == 0) {
            LOG_MSG("[lookup]\t found %s\tindex:%i value: %p\n", key, index, bucket->value);
            *result = bucket->value;
            return SUCCESS;
        }
        bucket = bucket->next;
    }

    LOG_MSG("[lookup]\t key:%s\tfailed\t\n", key);
    return FAILED;
}

int hash_insert(HashTable *ht, char *key, void *value)
{
    /* check if we need to resize the hashtable */
    resize_hash_table_if_needed(ht);

    int index = HASH_INDEX(ht, key);
    Bucket *org_bucket = ht->buckets[index];
    Bucket *tmp_bucket = org_bucket;

    /* if the key exists already, just update its value */
    while (tmp_bucket) {
        if (strcmp(key, tmp_bucket->key) == 0) {
            LOG_MSG("[update]\tkey: %s\n", key);
            tmp_bucket->value = value;
            return SUCCESS;
        }
        tmp_bucket = tmp_bucket->next;
    }

    Bucket *bucket = (Bucket *)malloc(sizeof(Bucket));
    if (bucket == NULL) return FAILED;

    bucket->key = key;              /* the key pointer is stored, not copied */
    bucket->value = value;
    bucket->next = NULL;
    ht->elem_num += 1;

    if (org_bucket != NULL) {
        LOG_MSG("[collision]\tindex:%d key:%s\n", index, key);
        bucket->next = org_bucket;  /* prepend to the existing chain */
    }

    ht->buckets[index] = bucket;
    LOG_MSG("[insert]\tindex:%d key:%s\tht(num:%d)\n", index, key, ht->elem_num);
    return SUCCESS;
}

int hash_remove(HashTable *ht, char *key)
{
    int index = HASH_INDEX(ht, key);
    Bucket *bucket = ht->buckets[index];
    Bucket *prev = NULL;

    /* find the right bucket in the linked list, remembering its predecessor */
    while (bucket) {
        if (strcmp(bucket->key, key) == 0) {
            LOG_MSG("[remove]\tkey:(%s) index: %d\n", key, index);
            if (prev == NULL) {
                ht->buckets[index] = bucket->next;
            } else {
                prev->next = bucket->next;
            }
            free(bucket);
            ht->elem_num -= 1;      /* keep the element count accurate */
            return SUCCESS;
        }
        prev = bucket;
        bucket = bucket->next;
    }

    LOG_MSG("[remove]\t key:%s not found remove \tfailed\t\n", key);
    return FAILED;
}

int hash_destroy(HashTable *ht)
{
    int i;
    Bucket *cur = NULL;
    Bucket *tmp = NULL;

    for (i = 0; i < ht->size; ++i) {
        cur = ht->buckets[i];
        while (cur) {
            tmp = cur;
            cur = cur->next;
            free(tmp);
        }
    }
    free(ht->buckets);
    return SUCCESS;
}

static unsigned int hash_str(const char *key)
{
    unsigned int hash = 0;
    const char *cur = key;

    while (*cur != '\0') {
        hash += (unsigned char)*cur;
        ++cur;
    }
    return hash;
}

static int hash_resize(HashTable *ht)
{
    /* double the size */
    int org_size = ht->size;
    Bucket **buckets = (Bucket **)calloc(org_size * 2, sizeof(Bucket *));
    Bucket **org_buckets = ht->buckets;
    int i;

    if (buckets == NULL) return FAILED;

    ht->size = org_size * 2;
    ht->elem_num = 0;
    ht->buckets = buckets;
    LOG_MSG("[resize]\torg size: %i\tnew size: %i\n", org_size, ht->size);

    for (i = 0; i < org_size; ++i) {
        Bucket *cur = org_buckets[i];
        Bucket *tmp;
        while (cur) {
            /* rehash: insert the element again into the bigger table */
            hash_insert(ht, cur->key, cur->value);
            /* free the old bucket, but not the element it points to */
            tmp = cur;
            cur = cur->next;
            free(tmp);
        }
    }
    free(org_buckets);

    LOG_MSG("[resize] done\n");
    return SUCCESS;
}

/* If elem_num is as large as the capacity of the hashtable,
   resize the hashtable so it has room for more elements. */
static void resize_hash_table_if_needed(HashTable *ht)
{
    if (ht->size - ht->elem_num < 1) {
        hash_resize(ht);
    }
}

hashtable.h

#ifndef _HASH_TABLE_H_
#define _HASH_TABLE_H_ 1

#define HASH_TABLE_INIT_SIZE 6
#define HASH_INDEX(ht, key) (hash_str((key)) % (ht)->size)

#if defined(DEBUG)
#  define LOG_MSG printf
#else
#  define LOG_MSG(...)
#endif

#define SUCCESS 0
#define FAILED (-1)

typedef struct _Bucket
{
    char *key;
    void *value;
    struct _Bucket *next;
} Bucket;

typedef struct _HashTable
{
    int size;           /* the size of the hash table */
    int elem_num;       /* the number of elements that have been saved */
    Bucket **buckets;
} HashTable;

int hash_init(HashTable *ht);
int hash_lookup(HashTable *ht, char *key, void **result);
int hash_insert(HashTable *ht, char *key, void *value);
int hash_remove(HashTable *ht, char *key);
int hash_destroy(HashTable *ht);

#endif

main.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#include "hashtable.h"

#define TEST(tcase) printf(">>> [START CASE] " tcase " <<<\n")
#define PASS(tcase) printf(">>> [PASSED] " tcase " <<<\n")

int main(int argc, char **argv)
{
    HashTable *ht = (HashTable *)malloc(sizeof(HashTable));
    assert(ht != NULL);
    int result = hash_init(ht);
    assert(result == SUCCESS);

    /* Data */
    int int1 = 10;
    int int2 = 20;
    char str1[] = "Hello TIPI";
    char str2[] = "Value";
    /* To find data container */
    int *j = NULL;
    char *find_str = NULL;

    /* Test Key Insert */
    TEST("Key insert");
    hash_insert(ht, "KeyInt", &int1);
    hash_insert(ht, "Asdfkeystrass", str1);
    hash_insert(ht, "K13eystras", str1);
    hash_insert(ht, "KEYSTR5", str1);
    hash_insert(ht, "Keystr", str1);
    PASS("Key insert");

    /* Test Key Lookup */
    TEST("Key lookup");
    hash_lookup(ht, "KeyInt", (void **)&j);
    hash_lookup(ht, "Keystr", (void **)&find_str);
    assert(strcmp(find_str, str1) == 0);
    assert(*j == int1);
    PASS("Key lookup");

    /* Test Key Update */
    TEST("Test key update");
    hash_insert(ht, "KeyInt", &int2);
    hash_lookup(ht, "KeyInt", (void **)&j);
    assert(*j == int2);
    PASS("Test key update");

    TEST(">>> Test key not found <<<");
    result = hash_lookup(ht, "Non-exits-key", (void **)&j);
    assert(result == FAILED);
    PASS("Non-exist-key lookup");

    TEST("Test key not found after remove");
    char strmykey[] = "My-key-value";
    find_str = NULL;
    hash_insert(ht, "My-key", strmykey);
    result = hash_remove(ht, "My-key");
    assert(result == SUCCESS);
    result = hash_lookup(ht, "My-key", (void **)&find_str);
    assert(find_str == NULL);
    assert(result == FAILED);
    PASS("Test key not found after remove");
    PASS(">>> Test key not found <<<");

    TEST("Add many elements and make Hashtable rehash");
    hash_insert(ht, "A1", &int2);  hash_insert(ht, "A2", &int1);  hash_insert(ht, "A3", &int1);
    hash_insert(ht, "A4", &int1);  hash_insert(ht, "A5", &int1);  hash_insert(ht, "A6", &int1);
    hash_insert(ht, "A7", &int1);  hash_insert(ht, "A8", str2);   hash_insert(ht, "A9", &int1);
    hash_insert(ht, "A10", &int1); hash_insert(ht, "A11", &int1); hash_insert(ht, "A12", &int1);
    hash_insert(ht, "A13", &int1); hash_insert(ht, "A14", &int1); hash_insert(ht, "A15", &int1);
    hash_insert(ht, "A16", &int1); hash_insert(ht, "A17", &int1); hash_insert(ht, "A18", &int1);
    hash_insert(ht, "A19", &int1); hash_insert(ht, "A20", &int1); hash_insert(ht, "A21", &int1);
    hash_insert(ht, "A22", &int1); hash_insert(ht, "A23", &int1); hash_insert(ht, "A24", &int1);
    hash_insert(ht, "A24", &int1); hash_insert(ht, "A24", &int1); hash_insert(ht, "A25", &int1);
    hash_insert(ht, "A26", &int1); hash_insert(ht, "A27", &int1); hash_insert(ht, "A28", &int1);
    hash_insert(ht, "A29", &int1); hash_insert(ht, "A30", &int1); hash_insert(ht, "A31", &int1);
    hash_insert(ht, "A32", &int1); hash_insert(ht, "A33", &int1);

    hash_lookup(ht, "A23", (void **)&j);
    assert(*j == int1);
    hash_lookup(ht, "A30", (void **)&j);
    assert(*j == int1);
    PASS("Add many elements and make Hashtable rehash");

    hash_destroy(ht);
    free(ht);

    printf("Woohoo, It looks like HashTable works properly\n");
    return 0;
}

Compile and run:

gcc -g -Wall -DDEBUG -o a.out main.c hashtable.c
./a.out

0x3: Data Structures

In PHP, variables, constants, classes, properties, and arrays are all stored in hash tables. The structures are defined in \php-5.6.17\Zend\zend_hash.h:

typedef struct bucket {
    ulong h;                        /* Used for numeric indexing; for string keys, the hash of arKey */
    uint nKeyLength;                /* key length */
    void *pData;                    /* pointer to the data saved in this bucket */
    void *pDataPtr;                 /* holds the data itself when it is a single pointer */
    struct bucket *pListNext;       /* next element in the whole table (insertion order) */
    struct bucket *pListLast;       /* previous element in the whole table */
    struct bucket *pNext;           /* next element in the same slot (collision chain) */
    struct bucket *pLast;           /* previous element in the same slot */
    const char *arKey;              /* string key */
} Bucket;

typedef struct _hashtable {
    uint nTableSize;                /* size of the hashtable (number of slots) */
    uint nTableMask;                /* equals nTableSize - 1 */
    uint nNumOfElements;            /* number of elements */
    ulong nNextFreeElement;         /* next free integer key, used by $array[] = $value */
    Bucket *pInternalPointer;       /* Used for element traversal: the current position */
    Bucket *pListHead;              /* head element pointer */
    Bucket *pListTail;              /* tail element pointer */
    Bucket **arBuckets;             /* the slot array holding the collision chains */
    dtor_func_t pDestructor;        /* similar to a destructor, called on each element's data */
    zend_bool persistent;           /* allocate with the system malloc or with PHP's own memory manager */
    unsigned char nApplyCount;      /* how many times the table is currently being traversed, to prevent infinite recursion */
    zend_bool bApplyProtection;     /* whether that recursion protection is enabled */
#if ZEND_DEBUG
    int inconsistent;
#endif
} HashTable;
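The two pairs of pointers in Bucket serve different purposes: pNext/pLast link buckets that share a slot (the collision chain), while pListNext/pListLast thread every bucket into one global doubly linked list in insertion order, which is why iterating a PHP array returns elements in the order they were added. The sketch below imitates that layout in a heavily simplified form, assuming integer keys only, a fixed table of 8 slots, no duplicate check, no deletion, and no pLast pointer; all names (`mini_bucket`, `mini_table`, `mini_insert`, `mini_find`, `mini_free`) are hypothetical and not part of the Zend API.

```c
#include <assert.h>
#include <stdlib.h>

#define MINI_SIZE 8                      /* power of two, like nTableSize */

typedef struct mini_bucket {
    unsigned long h;                     /* integer key, like Bucket.h */
    int value;
    struct mini_bucket *pNext;           /* next bucket in the same slot */
    struct mini_bucket *pListNext;       /* next bucket in insertion order */
    struct mini_bucket *pListLast;       /* previous bucket in insertion order */
} mini_bucket;

typedef struct {
    mini_bucket *arBuckets[MINI_SIZE];
    mini_bucket *pListHead;
    mini_bucket *pListTail;
} mini_table;

/* Insert a new key; returns 0 on success, -1 if allocation fails. */
static int mini_insert(mini_table *ht, unsigned long h, int value)
{
    unsigned long index = h & (MINI_SIZE - 1);   /* like h & nTableMask */
    mini_bucket *p = malloc(sizeof *p);

    if (p == NULL) return -1;
    p->h = h;
    p->value = value;

    /* prepend to the collision chain of this slot */
    p->pNext = ht->arBuckets[index];
    ht->arBuckets[index] = p;

    /* append to the global insertion-order list */
    p->pListNext = NULL;
    p->pListLast = ht->pListTail;
    if (ht->pListTail != NULL) ht->pListTail->pListNext = p;
    else ht->pListHead = p;
    ht->pListTail = p;
    return 0;
}

static mini_bucket *mini_find(const mini_table *ht, unsigned long h)
{
    mini_bucket *p = ht->arBuckets[h & (MINI_SIZE - 1)];
    while (p != NULL && p->h != h) p = p->pNext;
    return p;
}

/* Free every bucket by walking the insertion-order list. */
static void mini_free(mini_table *ht)
{
    mini_bucket *p = ht->pListHead;
    while (p != NULL) {
        mini_bucket *next = p->pListNext;
        free(p);
        p = next;
    }
}
```

Inserting keys 9, 1 and 4 puts 9 and 1 in the same slot (9 & 7 == 1 & 7), yet walking from pListHead still visits 9, 1, 4 in the order they were added.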


2. PHP Array Definition

An array in PHP is actually an ordered map. A map is a type that associates values with keys. This type is optimized for several different uses, so you can treat it as:

1. A real array
2. A list (vector)
3. A hash table (an implementation of a map)
4. A dictionary
5. A collection
6. A stack
7. A queue, and more

The value of an array element can itself be another array, so tree structures and multidimensional arrays are possible. PHP code uses arrays heavily, and their greatest advantage is speed: reads and writes take O(1) time on average. A plain C array achieves O(1) because every element has the same size, so given the subscript, the element's memory address can be computed immediately and read or written directly. A PHP array reaches the same average cost by hashing the key to a slot index.
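The constant-time access of a plain C array is nothing more than address arithmetic: the address of element i is base + i × sizeof(element). A tiny sketch of that formula (the helper name `element_address` is our own):

```c
#include <assert.h>
#include <stddef.h>

/* Address of element i in an array whose elements are elem_size bytes:
   base + i * elem_size, computed in one step regardless of i. */
static const void *element_address(const void *base, size_t i, size_t elem_size)
{
    return (const char *)base + i * elem_size;
}
```

For an `int arr[10]`, `element_address(arr, 7, sizeof(int))` equals `&arr[7]`. A PHP array has no such fixed layout for arbitrary keys, so it first hashes the key to a slot and then walks that slot's chain, which is why its O(1) is an average rather than a guarantee.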

Most of PHP's functionality is built on HashTable, including arrays.

HashTable also has the advantage of maintaining a doubly linked list over its elements. Variables defined in PHP are stored in a symbol table, which is itself a HashTable whose elements are zval* values. The containers that store user-defined functions, classes, resources, and so on are also implemented in the kernel as HashTables.

Therefore, reads and writes on a PHP array take O(1) time on average, which is very efficient; compared with C++ or Java, the extra cost is that a HashTable has to be created. Next, let's look at how PHP defines an array.

In the kernel, this is implemented with macros.

0x1: Array initialization

Zend/zend_vm_execute.h

static int ZEND_FASTCALL  ZEND_INIT_ARRAY_SPEC_CV_CONST_HANDLER(ZEND_OPCODE_HANDLER_ARGS)
{
    USE_OPLINE

    array_init(&EX_T(opline->result.var).tmp_var);
    if (IS_CV == IS_UNUSED) {
        ZEND_VM_NEXT_OPCODE();
#if 0 || IS_CV != IS_UNUSED
    } else {
        return ZEND_ADD_ARRAY_ELEMENT_SPEC_CV_CONST_HANDLER(ZEND_OPCODE_HANDLER_ARGS_PASSTHRU);
#endif
    }
}

\php-5.6.17\Zend\zend_API.c

ZEND_API int _array_init(zval *arg, uint size ZEND_FILE_LINE_DC) /* {{{ */
{
    ALLOC_HASHTABLE_REL(Z_ARRVAL_P(arg));

    _zend_hash_init(Z_ARRVAL_P(arg), size, ZVAL_PTR_DTOR, 0 ZEND_FILE_LINE_RELAY_CC);
    Z_TYPE_P(arg) = IS_ARRAY;
    return SUCCESS;
}

\php-5.6.17\Zend\zend_hash.c

ZEND_API int _zend_hash_init(HashTable *ht, uint nSize, dtor_func_t pDestructor, zend_bool persistent ZEND_FILE_LINE_DC)
{
    uint i = 3;

    SET_INCONSISTENT(HT_OK);

    if (nSize >= 0x80000000) {
        /* prevent overflow: a requested size of 0x80000000 or more is capped at 0x80000000 */
        ht->nTableSize = 0x80000000;
    } else {
        while ((1U << i) < nSize) {
            i++;
        }
        /* round the requested size up to a power of two; i starts at 3,
           so the minimum nTableSize is 8 */
        ht->nTableSize = 1 << i;
    }

    ht->nTableMask = 0;	/* 0 means that ht->arBuckets is uninitialized */
    ht->pDestructor = pDestructor;          /* called to destroy an element's data */
    ht->arBuckets = (Bucket**)&uninitialized_bucket;
    ht->pListHead = NULL;
    ht->pListTail = NULL;
    ht->nNumOfElements = 0;
    ht->nNextFreeElement = 0;
    ht->pInternalPointer = NULL;
    ht->persistent = persistent;            /* true: the operating system's allocator; false: PHP's memory manager */
    ht->nApplyCount = 0;
    ht->bApplyProtection = 1;
    return SUCCESS;
}

ZEND_API int _zend_hash_init_ex(HashTable *ht, uint nSize, dtor_func_t pDestructor, zend_bool persistent, zend_bool bApplyProtection ZEND_FILE_LINE_DC)
{
    int retval = _zend_hash_init(ht, nSize, pDestructor, persistent ZEND_FILE_LINE_CC);

    ht->bApplyProtection = bApplyProtection;
    return retval;
}
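The power-of-two rounding pays off when the buckets are indexed: with nTableMask = nTableSize - 1 (set once arBuckets is actually allocated), the slot index can be computed as h & nTableMask, which equals h % nTableSize for power-of-two sizes but avoids a division. A standalone sketch of both steps; the function names `round_table_size` and `slot_index` are ours, and the rounding mirrors the code above.

```c
#include <assert.h>

/* Round nSize up to a power of two with a minimum of 8, as _zend_hash_init does. */
static unsigned int round_table_size(unsigned int nSize)
{
    unsigned int i = 3;

    if (nSize >= 0x80000000u) {
        return 0x80000000u;               /* prevent overflow */
    }
    while ((1u << i) < nSize) {
        i++;
    }
    return 1u << i;
}

/* With a power-of-two size, masking with size - 1 keeps the low bits of h,
   which is the same as h % size. */
static unsigned long slot_index(unsigned long h, unsigned int nTableSize)
{
    unsigned long nTableMask = nTableSize - 1;
    return h & nTableMask;
}
```

For example, a requested size of 9 becomes 16 and a request of 1000 becomes 1024, while any request up to 8 yields 8.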

0x2: Adding Key/Value Pairs to an Array
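One detail of adding elements is how `$array[] = $value` chooses its key. In PHP 5's zend_hash.c, the integer-key insert path (`_zend_hash_index_update_or_next_insert`) uses nNextFreeElement as the key for a next-index insert, and after any integer key h is written it raises nNextFreeElement to h + 1 if h is not smaller than the current value. The sketch below reproduces only that bookkeeping, in simplified form, assuming integer keys and ignoring the actual storage; `next_free_state`, `note_index_key`, and `next_insert_key` are hypothetical names.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for the nNextFreeElement field of a HashTable. */
typedef struct {
    long nNextFreeElement;
} next_free_state;

/* Called whenever an integer key h is written ($array[h] = value). */
static void note_index_key(next_free_state *st, long h)
{
    if (h >= st->nNextFreeElement) {
        st->nNextFreeElement = (h < LONG_MAX) ? h + 1 : LONG_MAX;
    }
}

/* $array[] = value: use nNextFreeElement as the key, then advance it. */
static long next_insert_key(next_free_state *st)
{
    long h = st->nNextFreeElement;
    note_index_key(st, h);
    return h;
}
```

This matches the familiar PHP behaviour: after `$a[] = 'x'` (key 0) and `$a[5] = 'y'`, the next `$a[] = 'z'` gets key 6, and writing a smaller key such as `$a[2]` does not move the counter back.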

0x3: APIs for Manipulating PHP Arrays

/* Initialize a PHP array */
array_init(zval *arg);
array_init_size(zval *arg, uint size);

/* Associative-array assignment, equivalent to $array[$stringKey] = $value;
   these are macros that wrap the corresponding add_assoc_*_ex functions */
add_assoc_null(zval *aval, char *key);
add_assoc_bool(zval *aval, char *key, zend_bool bval);
add_assoc_long(zval *aval, char *key, long lval);
add_assoc_double(zval *aval, char *key, double dval);
add_assoc_string(zval *aval, char *key, char *strval, int dup);
add_assoc_stringl(zval *aval, char *key, char *strval, uint strlen, int dup);
add_assoc_zval(zval *aval, char *key, zval *value);

/* Numeric-index assignment, equivalent to $array[$numKey] = $value; */
ZEND_API int add_index_long(zval *arg, ulong idx, long n);
ZEND_API int add_index_null(zval *arg, ulong idx);
ZEND_API int add_index_bool(zval *arg, ulong idx, int b);
ZEND_API int add_index_resource(zval *arg, ulong idx, int r);
ZEND_API int add_index_double(zval *arg, ulong idx, double d);
ZEND_API int add_index_string(zval *arg, ulong idx, const char *str, int duplicate);
ZEND_API int add_index_stringl(zval *arg, ulong idx, const char *str, uint length, int duplicate);
ZEND_API int add_index_zval(zval *arg, ulong index, zval *value);

/* Assignment with the next built-in numeric index, equivalent to $array[] = $value; */
ZEND_API int add_next_index_long(zval *arg, long n);
ZEND_API int add_next_index_null(zval *arg);
ZEND_API int add_next_index_bool(zval *arg, int b);
ZEND_API int add_next_index_resource(zval *arg, int r);
ZEND_API int add_next_index_double(zval *arg, double d);
ZEND_API int add_next_index_string(zval *arg, const char *str, int duplicate);
ZEND_API int add_next_index_stringl(zval *arg, const char *str, uint length, int duplicate);
ZEND_API int add_next_index_zval(zval *arg, zval *value);

/* Assign an array element and return it, equivalent to {$array[$key] = $value; return $value;} */
ZEND_API int add_get_assoc_string_ex(zval *arg, const char *key, uint key_len, const char *str, void **dest, int duplicate);
ZEND_API int add_get_assoc_stringl_ex(zval *arg, const char *key, uint key_len, const char *str, uint length, void **dest, int duplicate);
#define add_get_assoc_string(__arg, __key, __str, __dest, __duplicate) add_get_assoc_string_ex(__arg, __key, strlen(__key)+1, __str, __dest, __duplicate)
#define add_get_assoc_stringl(__arg, __key, __str, __length, __dest, __duplicate) add_get_assoc_stringl_ex(__arg, __key, strlen(__key)+1, __str, __length, __dest, __duplicate)
ZEND_API int add_get_index_long(zval *arg, ulong idx, long l, void **dest);
ZEND_API int add_get_index_double(zval *arg, ulong idx, double d, void **dest);
ZEND_API int add_get_index_string(zval *arg, ulong idx, const char *str, void **dest, int duplicate);
ZEND_API int add_get_index_stringl(zval *arg, ulong idx, const char *str, uint length, void **dest, int duplicate);


Copyright (c) Little5ann All rights reserved
