A Hashtable is what data structure textbooks usually call a hash table. The basic principle is simple (if you are unfamiliar with it, any data structures textbook or a quick web search will explain it), but PHP's implementation has its own particularities. Understanding how Hashtable stores data is essential for analyzing the PHP source code, especially the implementation of the virtual machine in the Zend Engine, because it helps us form a mental picture of the complete virtual machine. It is also the foundation for other PHP data structures, such as arrays.
The Zend Hashtable implementation combines the strengths of two data structures, the doubly linked list and the vector (array), which gives PHP a very efficient mechanism for storing and looking up data.
Let's begin!
I. Data structure of Hashtable
The Hashtable implementation in the Zend Engine lives mainly in two files, zend_hash.h and zend_hash.c. Zend Hashtable consists of two main data structures: the Bucket structure and the HashTable structure. A Bucket is the container that stores one data element, while the HashTable structure provides the mechanism for managing all of these buckets.
typedef struct bucket {
    ulong h;                   /* used for numeric indexing (holds the hash of arKey for string keys) */
    uint nKeyLength;           /* key length; 0 for integer keys */
    void *pData;               /* pointer to the data saved in the bucket */
    void *pDataPtr;            /* holds the data itself when the data is a pointer */
    struct bucket *pListNext;  /* next element in the table-wide list of all buckets */
    struct bucket *pListLast;  /* previous element in the table-wide list of all buckets */
    struct bucket *pNext;      /* next element in the chain of buckets sharing this slot */
    struct bucket *pLast;      /* previous element in the chain of buckets sharing this slot */
    char arKey[1];             /* must be the last member: the key name */
} Bucket;
In Zend Hashtable, each data element (Bucket) has a key that is unique within the hashtable, so an element can be identified by its key alone. Keys come in two forms. The first is a string key, stored in arKey with length nKeyLength. Note that although arKey is declared in the structure above as a one-character array, the key is not limited to a single character. Bucket is in fact a variable-length structure: because arKey is its last member, the bucket can be allocated with extra space at the end, so that arKey together with nKeyLength holds a key of any length. This is a common C technique, often called the "struct hack". The second form is an integer (index) key: nKeyLength is then always 0, and the integer field h holds the key. In short, if nKeyLength == 0 the key is h; otherwise the key is the string arKey of length nKeyLength.
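To make the variable-length layout concrete, here is a minimal sketch. The `MiniBucket` type and the `mini_bucket_new` / `mini_bucket_is_int_key` helpers are hypothetical and keep only the key fields of a Bucket; this is not Zend's allocation code. The sketch also uses the C99 flexible array member `char arKey[];`, which is the standard-conforming form of the `arKey[1]` trick seen in the Zend source.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cut-down Bucket: only the key-related fields. */
typedef struct mini_bucket {
    unsigned long h;         /* integer key, or the hash of the string key */
    unsigned int nKeyLength; /* 0 means "the key is the integer in h" */
    char arKey[];            /* C99 flexible array member; must be last */
} MiniBucket;

/* Allocate a bucket with exactly key_len bytes of key storage after the
 * fixed fields. Pass key_len == 0 (and key == NULL) for an integer key. */
static MiniBucket *mini_bucket_new(const char *key, unsigned int key_len,
                                   unsigned long h)
{
    MiniBucket *b;
    assert(key_len == 0 || key != NULL);
    b = malloc(sizeof(MiniBucket) + key_len);
    if (b == NULL) {
        return NULL;
    }
    b->h = h;
    b->nKeyLength = key_len;
    if (key_len > 0) {
        memcpy(b->arKey, key, key_len);  /* key bytes live right after the struct */
    }
    return b;
}

/* Nonzero when the bucket uses an integer key. */
static int mini_bucket_is_int_key(const MiniBucket *b)
{
    return b->nKeyLength == 0;
}
```

Passing a length that includes the terminating NUL, as PHP 5 code typically does with `sizeof("key")`, lets arKey double as an ordinary C string.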
When nKeyLength > 0, the value of h is not meaningless: it holds the hash value of arKey. However the hash function is designed, collisions are unavoidable, meaning that different arKey strings may produce the same hash value. Buckets whose hash values map to the same index of the HashTable's arBuckets array are stored in the bucket chain for that index (see below). This chain is a doubly linked list whose previous and next elements are reached through pLast and pNext, respectively. A newly inserted bucket is placed at the front of its chain.
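The following sketch illustrates a collision chain with head insertion. It assumes a power-of-two table size, so the slot is `h & (size - 1)`, and uses the DJB "times 33" string hash; PHP 5's `zend_inline_hash_func` in zend_hash.h is an unrolled version of this hash, but the loop here is only a simplified stand-in. `TABLE_SIZE`, `ChainNode`, `djb_hash`, `chain_insert_head` and `chain_find` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 8u  /* hypothetical size; must be a power of two */

typedef struct chain_node {
    unsigned long h;           /* full hash of the key */
    const char *key;           /* key string (not owned) */
    struct chain_node *pNext;  /* next node in the same slot */
    struct chain_node *pLast;  /* previous node in the same slot */
} ChainNode;

/* DJB "times 33" hash: h = h * 33 + c, starting from 5381. */
static unsigned long djb_hash(const char *s)
{
    unsigned long h = 5381;
    while (*s) {
        h = h * 33 + (unsigned char)*s++;
    }
    return h;
}

/* Link n at the FRONT of the chain for its slot. */
static void chain_insert_head(ChainNode *slots[], ChainNode *n, const char *key)
{
    size_t idx;
    assert((TABLE_SIZE & (TABLE_SIZE - 1)) == 0);
    n->key = key;
    n->h = djb_hash(key);
    idx = n->h & (TABLE_SIZE - 1);  /* mask instead of modulo */
    n->pLast = NULL;
    n->pNext = slots[idx];
    if (n->pNext != NULL) {
        n->pNext->pLast = n;
    }
    slots[idx] = n;
}

/* Walk one chain: compare the cheap hash first, then the key itself. */
static ChainNode *chain_find(ChainNode *slots[], const char *key)
{
    unsigned long h = djb_hash(key);
    ChainNode *p = slots[h & (TABLE_SIZE - 1)];
    for (; p != NULL; p = p->pNext) {
        if (p->h == h && strcmp(p->key, key) == 0) {
            return p;
        }
    }
    return NULL;
}
```

With this hash and 8 slots, the keys "a" and "i" land in the same slot, so inserting "a" then "i" leaves "i" at the head of that chain.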
The actual data of a bucket is stored in a block of memory pointed to by pData, which is normally allocated separately. There is one exception: when the data being stored is itself a pointer, Hashtable does not allocate extra memory to hold it. Instead, it copies the pointer into pDataPtr and sets pData to the address of that member inside the bucket. This improves efficiency and reduces memory fragmentation, and it shows some of the subtlety of PHP's hashtable design. If the data in the bucket is not a pointer, pDataPtr is NULL.
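Here is a minimal sketch of the pData/pDataPtr trick, assuming (as PHP 5 does) that "the data is a pointer" is decided by comparing the data size with `sizeof(void *)`. In PHP 5 this logic lives in helper macros in zend_hash.c; the `DataBucket` type and the `data_bucket_store` / `data_bucket_free` functions below are a hypothetical restatement, not that code, and use plain `malloc` where Zend uses its own allocators.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cut-down bucket with only the data fields. */
typedef struct data_bucket {
    void *pData;     /* always points at the stored bytes */
    void *pDataPtr;  /* inline slot used when the payload is pointer-sized */
} DataBucket;

/* Copy data_size bytes from data into the bucket. Pointer-sized payloads
 * are kept inside the bucket itself (no extra allocation); anything else
 * gets its own heap block. Returns 0 on success, -1 on allocation failure. */
static int data_bucket_store(DataBucket *b, const void *data, size_t data_size)
{
    assert(b != NULL && data != NULL);
    if (data_size == sizeof(void *)) {
        memcpy(&b->pDataPtr, data, sizeof(void *));
        b->pData = &b->pDataPtr;  /* pData points back into the bucket */
    } else {
        b->pData = malloc(data_size);
        if (b->pData == NULL) {
            return -1;
        }
        memcpy(b->pData, data, data_size);
        b->pDataPtr = NULL;
    }
    return 0;
}

/* Release whatever data_bucket_store allocated. */
static void data_bucket_free(DataBucket *b)
{
    if (b->pData != &b->pDataPtr) {
        free(b->pData);  /* only the separately allocated case owns memory */
    }
    b->pData = NULL;
    b->pDataPtr = NULL;
}
```

Because the test is on size, any pointer-sized payload takes the inline path; freeing must then check which case applies before calling `free`.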
In addition, all buckets in the Hashtable are linked into one doubly linked list through pListNext and pListLast. A newly inserted bucket is appended to the end of this list, so the list records insertion order.
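The table-wide list can be sketched as follows. `ListNode`, `OrderedList`, `list_append`, `list_unlink` and `list_collect` are hypothetical names; the link fields reuse the Bucket names for readability. Appending at the tail is what lets a traversal from the head visit elements in insertion order, and having both links makes removal O(1).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node carrying only the table-wide links of a Bucket. */
typedef struct list_node {
    int value;
    struct list_node *pListNext;  /* next in insertion order */
    struct list_node *pListLast;  /* previous in insertion order */
} ListNode;

typedef struct {
    ListNode *pListHead;  /* oldest element */
    ListNode *pListTail;  /* newest element */
} OrderedList;

/* Append at the tail, so the list records insertion order. */
static void list_append(OrderedList *l, ListNode *n)
{
    assert(l != NULL && n != NULL);
    n->pListNext = NULL;
    n->pListLast = l->pListTail;
    if (l->pListTail != NULL) {
        l->pListTail->pListNext = n;
    } else {
        l->pListHead = n;  /* first element */
    }
    l->pListTail = n;
}

/* Unlink n in O(1) using both links. */
static void list_unlink(OrderedList *l, ListNode *n)
{
    if (n->pListLast != NULL) {
        n->pListLast->pListNext = n->pListNext;
    } else {
        l->pListHead = n->pListNext;
    }
    if (n->pListNext != NULL) {
        n->pListNext->pListLast = n->pListLast;
    } else {
        l->pListTail = n->pListLast;
    }
}

/* Copy values into out[] in insertion order; returns how many were copied. */
static size_t list_collect(const OrderedList *l, int out[], size_t max)
{
    size_t count = 0;
    const ListNode *p;
    for (p = l->pListHead; p != NULL && count < max; p = p->pListNext) {
        out[count++] = p->value;
    }
    return count;
}
```

For example, appending 1, 2, 3 and then unlinking the middle node leaves a traversal order of 1, 3.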
Note that a bucket generally carries no information about the size of the data it stores. Therefore, in PHP's implementation, the data stored in buckets must manage its own size.
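typedef struct _hashtable {
    uint nTableSize;             /* number of slots in arBuckets (a power of two) */
    uint nTableMask;             /* nTableSize - 1, maps a hash value to a slot */
    uint nNumOfElements;         /* number of elements currently stored */
    ulong nNextFreeElement;      /* next free integer key, used by $a[] = ... */
    Bucket *pInternalPointer;    /* internal pointer used for traversal */
    Bucket *pListHead;           /* first bucket in insertion order */
    Bucket *pListTail;           /* last bucket in insertion order */
    Bucket **arBuckets;          /* slot array; each slot heads a collision chain */
    dtor_func_t pDestructor;     /* destructor called on an element's data */
    zend_bool persistent;        /* allocated from persistent (non-request) memory */
    unsigned char nApplyCount;   /* recursion depth counter */
    zend_bool bApplyProtection;  /* whether recursion protection is enabled */
#if ZEND_DEBUG
    int inconsistent;            /* debug builds only: consistency-check state */
#endif
} HashTable;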
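The relationship between nTableSize and nTableMask can be sketched as below. The sketch assumes that a requested size is rounded up to a power of two with a floor of 8, which is how PHP 5's `zend_hash_init` behaves as far as we know; treat those constants as assumptions. `TableGeometry`, `table_geometry` and `slot_index` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical subset of the HashTable sizing fields. */
typedef struct {
    unsigned int nTableSize;  /* number of slots (power of two) */
    unsigned int nTableMask;  /* nTableSize - 1 */
} TableGeometry;

/* Round a requested size up to a power of two, with a floor of 8
 * (assumed behaviour, not copied from the Zend source). */
static TableGeometry table_geometry(unsigned int requested)
{
    TableGeometry g;
    unsigned int size = 8;
    while (size < requested && size < 0x80000000u) {
        size <<= 1;
    }
    g.nTableSize = size;
    g.nTableMask = size - 1;
    assert((g.nTableSize & g.nTableMask) == 0);  /* power-of-two invariant */
    return g;
}

/* Slot for a hash value: masking gives the same result as h % nTableSize
 * when nTableSize is a power of two, without a division. */
static unsigned int slot_index(TableGeometry g, unsigned long h)
{
    return (unsigned int)(h & g.nTableMask);
}
```

For example, a request for 10 slots yields nTableSize 16 and nTableMask 15, and hash 37 maps to slot 5.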