A Detailed Look at the PHP HashTable Structure

HashTable is the most important and most widely used data structure in the Zend engine; it is used to store almost everything.
1.2.1 Data structure
The hashtable data structure is defined as follows:

typedef struct bucket {
    ulong h;                    /* hash value */
    uint nKeyLength;
    void *pData;                /* points to the value, a copy of the user's data */
    void *pDataPtr;
    struct bucket *pListNext;   /* pListNext and pListLast form the doubly */
    struct bucket *pListLast;   /* linked list spanning the whole HashTable */
    struct bucket *pNext;       /* pNext and pLast form the doubly linked list */
    struct bucket *pLast;       /* of buckets that share one hash slot */
    char arKey[1];              /* key */
} Bucket;

typedef struct _hashtable {
    uint nTableSize;
    uint nTableMask;
    uint nNumOfElements;
    ulong nNextFreeElement;
    Bucket *pInternalPointer;   /* used for element traversal */
    Bucket *pListHead;
    Bucket *pListTail;
    Bucket **arBuckets;         /* hash array */
    dtor_func_t pDestructor;    /* set when the HashTable is initialized; called when a bucket is destroyed */
    zend_bool persistent;       /* whether to use the C memory allocation routines */
    unsigned char nApplyCount;
    zend_bool bApplyProtection;
#if ZEND_DEBUG
    int inconsistent;
#endif
} HashTable;


In general, Zend's HashTable is a chained (linked-list) hash that is also optimized for linear traversal.

HashTable combines two data structures: a chained hash, used for fast key-value lookup, and a doubly linked list, which makes linear traversal and sorting convenient. Every bucket belongs to both structures.
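The way one bucket sits in both structures can be sketched in a few lines of C. This is a simplified illustration, not Zend's code: the names mbucket, mtable, mtable_insert, mtable_find and mtable_free are hypothetical, the table size is fixed, keys are plain integers, and duplicate keys are not handled.

```c
#include <assert.h>
#include <stdlib.h>

#define TABLE_SIZE 8   /* fixed power of two, for the sketch only */

typedef struct mbucket {
    unsigned long h;
    int value;
    struct mbucket *pNext, *pLast;          /* collision chain of one slot */
    struct mbucket *pListNext, *pListLast;  /* global insertion-order list */
} mbucket;

typedef struct mtable {
    mbucket *slots[TABLE_SIZE];
    mbucket *pListHead, *pListTail;
} mtable;

/* Insert a value: the new bucket is linked into BOTH structures. */
static mbucket *mtable_insert(mtable *t, unsigned long h, int value)
{
    mbucket *p;
    unsigned long idx = h & (TABLE_SIZE - 1);

    assert(t != NULL);
    p = calloc(1, sizeof *p);
    if (p == NULL) return NULL;
    p->h = h;
    p->value = value;

    /* 1. push onto the head of the slot's collision chain */
    p->pNext = t->slots[idx];
    if (p->pNext) p->pNext->pLast = p;
    t->slots[idx] = p;

    /* 2. append to the tail of the global list */
    p->pListLast = t->pListTail;
    if (t->pListTail) t->pListTail->pListNext = p;
    else t->pListHead = p;
    t->pListTail = p;
    return p;
}

/* Lookup walks only the collision chain of one slot. */
static mbucket *mtable_find(const mtable *t, unsigned long h)
{
    mbucket *p = t->slots[h & (TABLE_SIZE - 1)];
    while (p && p->h != h) p = p->pNext;
    return p;
}

/* Destruction walks only the global list. */
static void mtable_free(mtable *t)
{
    mbucket *p = t->pListHead;
    int i;
    while (p) {
        mbucket *next = p->pListNext;
        free(p);
        p = next;
    }
    for (i = 0; i < TABLE_SIZE; i++) t->slots[i] = NULL;
    t->pListHead = t->pListTail = NULL;
}
```

Lookup touches one short chain, while traversal follows pListNext from pListHead and therefore visits elements in insertion order regardless of which slot they hashed to.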
A few notes on this data structure:
Why does the chained hash use doubly linked lists?
An ordinary chained hash only ever operates by key, so a singly linked list is enough. Zend, however, sometimes needs to remove a given bucket from the chained hash, and with a doubly linked list this is very efficient.
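The efficiency argument can be made concrete with a small sketch. The node type and the chain_push_front and chain_unlink helpers below are hypothetical names used only for illustration; the point is that unlinking needs no search for the predecessor when each node carries a back pointer.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node {
    struct node *next;
    struct node *prev;
    int value;
} node;

/* Link n in front of the current head. */
static void chain_push_front(node **head, node *n)
{
    assert(head != NULL && n != NULL);
    n->prev = NULL;
    n->next = *head;
    if (*head) (*head)->prev = n;
    *head = n;
}

/* Unlink n given only a pointer to it: constant time, no traversal. */
static void chain_unlink(node **head, node *n)
{
    assert(head != NULL && n != NULL);
    if (n->prev) n->prev->next = n->next;
    else *head = n->next;             /* n was the head */
    if (n->next) n->next->prev = n->prev;
    n->next = n->prev = NULL;
}
```

With a singly linked chain the same removal would have to walk from the head to find the node that precedes n.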
What does nTableMask do?
Its value is used to convert a hash value into an index into the arBuckets array. When initializing a HashTable, Zend first allocates nTableSize slots for the arBuckets array, where nTableSize is the smallest 2^n that is not less than the size requested by the user, that is, 10...0 in binary. nTableMask = nTableSize - 1, that is, 01...1 in binary, so h & nTableMask always falls in [0, nTableSize - 1], and Zend uses it as the index into the arBuckets array.
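The arithmetic can be sketched as follows, assuming 32-bit sizes. round_up_pow2 and slot_index are illustrative names, not Zend functions.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest power of two that is not less than `requested` (at least 1). */
static uint32_t round_up_pow2(uint32_t requested)
{
    uint32_t size = 1;
    assert(requested <= 0x80000000u);  /* anything larger has no 32-bit answer */
    while (size < requested) {
        size <<= 1;
    }
    return size;
}

/* Map a hash value to a slot, the way h & nTableMask does. */
static uint32_t slot_index(uint32_t h, uint32_t table_size)
{
    uint32_t mask = table_size - 1;    /* e.g. 1000b - 1 = 0111b */
    return h & mask;
}
```

For a requested size of 5 the table gets 8 slots, the mask is 7, and a hash value of 13 (1101b) lands in slot 5 (0101b). The mask only works because the table size is a power of two; for other sizes a modulo operation would be needed.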
What does pDataPtr do?
Typically, when a user inserts a key-value pair, Zend copies the value and points pData at the copy. The copy requires a call to Zend's internal routine emalloc to allocate memory, which is a time-consuming operation and consumes a block somewhat larger than the value (the extra memory holds the allocator's cookie); this is quite wasteful when the value is small. Since HashTable is mostly used to hold pointer values, Zend introduces pDataPtr: when the value is as small as a pointer, Zend copies it directly into pDataPtr and points pData at pDataPtr. This avoids the emalloc call and also helps the cache hit rate.
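A minimal sketch of this small-value optimization, using plain malloc in place of emalloc. The slot type and the slot_store and slot_release helpers are hypothetical; only the field names pData and pDataPtr come from the Bucket definition.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct slot {
    void *pData;      /* always points at the stored copy of the value */
    void *pDataPtr;   /* inline storage for pointer-sized values */
} slot;

/* Store a copy of `size` bytes from `src`. Returns 0 on success, -1 if
 * the allocation fails. */
static int slot_store(slot *s, const void *src, size_t size)
{
    assert(s != NULL && src != NULL);
    if (size == sizeof(void *)) {
        /* pointer-sized value: keep it inside the slot, no allocation */
        memcpy(&s->pDataPtr, src, size);
        s->pData = &s->pDataPtr;
    } else {
        s->pData = malloc(size);
        if (s->pData == NULL) return -1;
        memcpy(s->pData, src, size);
        s->pDataPtr = NULL;
    }
    return 0;
}

/* Free the copy only if it was allocated separately. */
static void slot_release(slot *s)
{
    assert(s != NULL);
    if (s->pData != (void *)&s->pDataPtr) free(s->pData);
    s->pData = NULL;
    s->pDataPtr = NULL;
}
```

Readers of the value always go through pData, so they do not need to know which of the two storage strategies was used.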
Why is the size of arKey only 1? Why not manage the key through a pointer?
arKey is the array that holds the key, yet its declared size is only 1, which is not enough to hold a key. The following code can be found in the HashTable insertion function:

p = (Bucket *) pemalloc(sizeof(Bucket) - 1 + nKeyLength, ht->persistent);

As you can see, Zend allocates for each bucket a single block of memory large enough to hold both the bucket itself and the key. The front part is the bucket and the rear part is the key, and arKey "just happens" to be the last member of the bucket, so the key can be accessed through arKey. This technique is most common in memory management routines: when asked for memory, they actually allocate more than the requested size, and the extra front part, usually called a cookie, stores information about the block, such as its size, a previous pointer, a next pointer, and so on. Baidu's transmit program uses this method as well.
A pointer is not used to manage the key in order to save one emalloc call, and it also improves the cache hit rate. Another, necessary, reason is that in the vast majority of cases the key is fixed, so the bucket never has to be reallocated because the key grew longer. This also explains why the value is not allocated together with the bucket as a trailing array: the value can change.
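The single-allocation layout can be sketched like this, with malloc standing in for pemalloc. kbucket and kbucket_new are hypothetical names, and the sketch assumes, as an illustration only, that key_len counts the terminating NUL byte.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct kbucket {
    unsigned long h;
    unsigned int nKeyLength;
    char arKey[1];             /* must remain the last member */
} kbucket;

/* Allocate the header and the key as one block and copy the key in. */
static kbucket *kbucket_new(const char *key, unsigned int key_len)
{
    kbucket *p;
    assert(key != NULL);
    /* sizeof(kbucket) already counts one byte of arKey, hence the -1 */
    p = malloc(sizeof(kbucket) - 1 + key_len);
    if (p == NULL) return NULL;
    p->h = 0;                  /* hash computation omitted in this sketch */
    p->nKeyLength = key_len;
    memcpy(p->arKey, key, key_len);
    return p;
}
```

One free() releases header and key together. This idiom predates C99; newer code would usually declare the last member as a flexible array member (char arKey[];) to get the same layout with defined behavior.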

1.2.2 PHP arrays
One question about HashTable is still unanswered: what does nNextFreeElement do?
Unlike a generic hash, Zend's HashTable lets the user specify the hash value directly, ignoring the key or even not supplying a key at all (in which case nKeyLength is 0). HashTable also supports an append operation: the user supplies only the value, without a hash value, and Zend uses nNextFreeElement as the hash and then increments nNextFreeElement.
This behavior looks strange, because the value can no longer be looked up by key, which is not a hash at all. The key to understanding it is that PHP arrays are implemented with HashTable. Associative arrays add elements through the ordinary key-value mapping, with the user-specified string as the key. Non-associative arrays use the array index directly as the hash value and have no key. nNextFreeElement is needed when associative and non-associative entries are mixed in one array, or when array_push is used.
As for the value: PHP arrays use the generic zval structure directly, and pData points to a zval*, which, as described in the previous section, is stored directly in pDataPtr. Because zval is used directly, the elements of an array can be of any PHP type.
Array traversal, that is, foreach, each, and so on, walks the HashTable's doubly linked list, with pInternalPointer serving as the cursor that records the current position.

1.2.3 Variable symbol tables
In addition to arrays, Hashtable is also used to store many other data, such as PHP functions, variable symbols, loaded modules, class members, and so on.
A variable symbol table is equivalent to an associative array whose keys are variable names (so using very long variable names is not a good idea) and whose values are zval*.
At any moment PHP code can see two variable symbol tables: symbol_table and active_symbol_table. The former stores global variables and is called the global symbol table; the latter is a pointer to the currently active symbol table, which is normally the global symbol table. However, each time a PHP function is entered (meaning a function the user wrote in PHP code), Zend creates a local variable symbol table for that function and points active_symbol_table at it. Zend always accesses variables through active_symbol_table, which is how local variable scoping is implemented.
However, if a function accesses a variable that it has marked as global, Zend handles it specially: it creates in active_symbol_table a reference to the variable of the same name in symbol_table, first creating that variable if symbol_table does not yet contain it.
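The scoping mechanism can be illustrated with a toy model. Everything here is hypothetical: symtab is a small fixed array rather than a HashTable, an int* stands in for a zval*, and bind_global simply fails where the real engine would first create the missing global.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMBOLS 8

typedef struct symtab {
    const char *names[MAX_SYMBOLS];
    int *values[MAX_SYMBOLS];      /* stand-in for zval* */
    int count;
} symtab;

/* Look a name up in ONE table only; there is no fallback to globals. */
static int *symtab_find(const symtab *t, const char *name)
{
    int i;
    assert(t != NULL && name != NULL);
    for (i = 0; i < t->count; i++) {
        if (strcmp(t->names[i], name) == 0) return t->values[i];
    }
    return NULL;
}

static int symtab_add(symtab *t, const char *name, int *value)
{
    assert(t != NULL && name != NULL);
    if (t->count == MAX_SYMBOLS) return -1;
    t->names[t->count] = name;
    t->values[t->count] = value;
    t->count++;
    return 0;
}

/* Emulates `global $name;`: the local entry shares the global's storage. */
static int bind_global(symtab *local, const symtab *global, const char *name)
{
    int *v = symtab_find(global, name);
    if (v == NULL) return -1;      /* the real engine would create it first */
    return symtab_add(local, name, v);
}
```

Switching scope is then just repointing an "active" pointer from the global table to a function's local table and back; a variable becomes visible inside the function only after bind_global has added a shared entry for it.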

1.3 Memory and files
The resources a program owns usually include memory and files. For ordinary programs these resources are per-process: when the process ends, the operating system or the C library automatically reclaims any resources we did not release explicitly.
PHP programs are special in that they are page-based. A page also acquires resources such as memory and files while it runs, but when the page finishes, the operating system or C library may not know that those resources should be reclaimed. For example, suppose PHP is compiled into Apache as a module and Apache runs in prefork or worker mode. The Apache process or thread is then reused, so memory allocated by a PHP page stays allocated until the process cores.
To address this problem, Zend provides a set of memory allocation APIs. They behave like the corresponding C functions, except that they allocate from Zend's own memory pool and the memory can be reclaimed automatically when the page ends. In our modules, memory allocated on behalf of a page should use these APIs rather than the C routines; otherwise Zend will try to efree our memory at the end of the page, and the result is usually a crash.
emalloc()
efree()
estrdup()
estrndup()
ecalloc()
erealloc()
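The idea of page-scoped reclamation behind these functions can be sketched with a toy pool. This is not how the Zend memory manager is implemented; req_pool, req_alloc and req_shutdown are hypothetical names, and there is no counterpart to efree or erealloc. Each allocation is simply prefixed with a small header that chains it to the pool, so that everything still allocated can be released when the page ends.

```c
#include <assert.h>
#include <stdlib.h>

/* Header placed in front of every block; the union keeps the user
 * area that follows it suitably aligned for common types. */
typedef union req_header {
    union req_header *next;
    long double align_ld;
    long long align_ll;
} req_header;

typedef struct req_pool {
    req_header *head;    /* all blocks allocated during this page */
    size_t live;         /* number of blocks not yet released */
} req_pool;

/* Allocate `size` bytes that belong to the current page. */
static void *req_alloc(req_pool *pool, size_t size)
{
    req_header *b;
    assert(pool != NULL);
    b = malloc(sizeof(req_header) + size);
    if (b == NULL) return NULL;
    b->next = pool->head;
    pool->head = b;
    pool->live++;
    return (void *)(b + 1);   /* user area starts right after the header */
}

/* End of page: release everything and report how many blocks were freed. */
static size_t req_shutdown(req_pool *pool)
{
    size_t freed = 0;
    assert(pool != NULL);
    while (pool->head) {
        req_header *next = pool->head->next;
        free(pool->head);
        pool->head = next;
        freed++;
    }
    pool->live = 0;
    return freed;
}
```

The sketch also shows why the two families of routines must not be mixed: a pointer obtained from plain malloc has no header in front of it, so a pool that tried to release it would read and free the wrong address.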
In addition, Zend provides a set of macros named VCWD_xxx that replace the corresponding file APIs of the C library and the operating system. These macros support PHP's virtual working directory and should always be used in module code. See "TSRM/tsrm_virtual_cwd.h" in the PHP source code for their definitions. You may notice that none of these macros provides a close operation. That is because close acts on an already opened resource and involves no file path, so the C or operating system routine can be used directly; likewise, operations such as read and write use the C or operating system routines directly.
