PHP Kernel Exploration: The Principle of Hash Table Collision Attacks

Recently, hash table collision attacks ("Hashtable collisions as DoS attack") have drawn wide attention, and many programming languages are affected. This article uses the PHP kernel source code to discuss the principle and implementation of such attacks.

Basic Principles of Hash Table Collision Attacks

A hash table is a highly efficient data structure, and many languages implement one internally. In PHP the hash table is an extremely important data structure: it is used not only to implement the array type but also to store context information inside the Zend virtual machine (the variables and functions of the execution context are stored in hash tables).

Ideally, insertion and lookup in a hash table take O(1) time: for any data item, a hash value (the key) can be computed in time independent of the table length, and that value then identifies a bucket (a slot in the hash table). This is, of course, the ideal case. Because every hash table has a finite length, different data items will inevitably produce the same hash value; when different items are assigned to the same bucket, this is called a collision. A hash table implementation must resolve collisions, and there are two general strategies. The first is to place colliding data in other buckets according to some rule. For example, with linear probing, if a collision occurs on insert, the following buckets are examined in order and the data is placed in the first unused one. The second strategy is to make each bucket not a slot that holds a single data item but a data structure that can hold several (such as a linked list or a red-black tree), so that all colliding data is organized within that structure.

Whichever collision resolution strategy is used, insertion and lookup are no longer guaranteed to be O(1). Take lookup as an example: locating the bucket from the key is not enough; the original key (the key before hashing) must also be compared with the stored one. If they differ, the search continues with the same algorithm used for insertion until either a matching entry is found or it is established that the data is not in the hash table.

PHP stores colliding data in a linked list. Therefore the average lookup cost of a PHP hash table is O(L), where L is the average length of the bucket chains, and the worst case is O(N): when every item collides, the hash table degrades into a single linked list.

[Figure: a normal PHP hash table versus a degraded one]

A hash table collision attack deliberately constructs data so that every item collides, artificially turning the hash table into a degraded linked list. Each operation on the table then costs O(N) instead of O(1), so processing the attacker's data consumes a large amount of CPU time and the system can no longer respond to requests promptly, which achieves a denial of service (DoS).
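A back-of-the-envelope cost estimate, assuming that each insert scans the existing chain for a duplicate key before adding a new bucket, and that all n keys land in the same bucket (degraded) or in distinct buckets (normal):

```latex
\begin{aligned}
T_{\text{degraded}}(n) &= \sum_{i=0}^{n-1} i = \frac{n(n-1)}{2} = O(n^2), \\
T_{\text{normal}}(n)   &= \sum_{i=0}^{n-1} O(1) = O(n).
\end{aligned}
```

For example, with n = 2^16 = 65536 colliding keys, the degraded case needs 65536 * 65535 / 2 = 2,147,450,880 key comparisons, versus roughly 65536 steps when the keys are spread out.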

As you can see, a hash collision attack requires a hash algorithm for which collisions are particularly easy to find. With a cryptographic hash such as MD5 or SHA1, constructing such data becomes much harder. Fortunately (or unfortunately), most programming languages use very simple hash algorithms for the sake of speed, so attack data can be constructed with little effort. In the next section, we analyze the Zend kernel code to see how a hash table collision attack against PHP works.
Internal Data Structures of the Zend Hash Table
PHP represents a bucket with a struct named Bucket, and all buckets whose keys map to the same slot are linked into a list. The hash table itself is represented by the HashTable struct. The source code is in Zend/zend_hash.h:

```c
typedef struct bucket {
    ulong h;                    /* Used for numeric indexing */
    uint nKeyLength;
    void *pData;
    void *pDataPtr;
    struct bucket *pListNext;
    struct bucket *pListLast;
    struct bucket *pNext;
    struct bucket *pLast;
    char arKey[1];              /* Must be last element */
} Bucket;

typedef struct _hashtable {
    uint nTableSize;
    uint nTableMask;
    uint nNumOfElements;
    ulong nNextFreeElement;
    Bucket *pInternalPointer;   /* Used for element traversal */
    Bucket *pListHead;
    Bucket *pListTail;
    Bucket **arBuckets;
    dtor_func_t pDestructor;
    zend_bool persistent;
    unsigned char nApplyCount;
    zend_bool bApplyProtection;
#if ZEND_DEBUG
    int inconsistent;
#endif
} HashTable;
```

The field names are largely self-explanatory, so only a few are highlighted here. In Bucket, h stores the hash value of the key: for an integer key it is the key itself, and for a string key it is the value computed by the string hash function (the string itself is kept in arKey). In HashTable, nTableMask is a mask that is normally set to nTableSize - 1; it is closely tied to the hash algorithm, which is described below. arBuckets points to an array of pointers, each of which is the head of a linked list of Buckets.
Hash Algorithm
The minimum capacity of a PHP hash table is 8 (2^3) and the maximum is 0x80000000 (2^31). The requested size is rounded up to a power of 2: for example, a hash table for 13 elements has length 16, and one for 100 elements has length 128. nTableMask is initialized to the rounded length minus 1. The code is in the _zend_hash_init function in Zend/zend_hash.c; the excerpt below keeps only the parts relevant to this article, with a few comments added.

```c
ZEND_API int _zend_hash_init(HashTable *ht, uint nSize, hash_func_t pHashFunction, dtor_func_t pDestructor, zend_bool persistent ZEND_FILE_LINE_DC)
{
    uint i = 3;
    Bucket **tmp;

    SET_INCONSISTENT(HT_OK);

    /* Round the length up to a power of 2 */
    if (nSize >= 0x80000000) {
        /* prevent overflow */
        ht->nTableSize = 0x80000000;
    } else {
        while ((1U << i) < nSize) {
            i++;
        }
        ht->nTableSize = 1 << i;
    }

    ht->nTableMask = ht->nTableSize - 1;

    /* ... remaining initialization omitted ... */

    return SUCCESS;
}
```

It is worth noting that the way PHP rounds the size up to a power of 2 is simple and clever, and the same approach can be reused wherever it is needed.

The hash algorithm of Zend HashTable is simple:

```
hash(key) = key & nTableMask
```

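That is, the bucket index is simply the bitwise AND of the data's original key and the HashTable's nTableMask. Because nTableSize is a power of 2, this is the same as key % nTableSize: only the low bits of the key determine the bucket.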

If the original key is a string, it is first converted into an integer with the Times33 algorithm (also known as DJBX33A), and the result is then ANDed with nTableMask:

```
hash(strkey) = times33(strkey) & nTableMask
```

The following Zend source code looks up entries in a hash table: zend_hash_index_find handles integer keys and zend_hash_find handles string keys:

```c
ZEND_API int zend_hash_index_find(const HashTable *ht, ulong h, void **pData)
{
    uint nIndex;
    Bucket *p;

    IS_CONSISTENT(ht);

    nIndex = h & ht->nTableMask;        /* select the chain for this key */

    p = ht->arBuckets[nIndex];
    while (p != NULL) {                 /* walk the collision chain */
        /* integer keys are stored with nKeyLength == 0 */
        if ((p->h == h) && (p->nKeyLength == 0)) {
            *pData = p->pData;
            return SUCCESS;
        }
        p = p->pNext;
    }
    return FAILURE;
}

ZEND_API int zend_hash_find(const HashTable *ht, const char *arKey, uint nKeyLength, void **pData)
{
    ulong h;
    uint nIndex;
    Bucket *p;

    IS_CONSISTENT(ht);

    h = zend_inline_hash_func(arKey, nKeyLength);  /* Times33 string hash */
    nIndex = h & ht->nTableMask;

    p = ht->arBuckets[nIndex];
    while (p != NULL) {
        /* cheap checks first, then compare the actual key bytes */
        if ((p->h == h) && (p->nKeyLength == nKeyLength)) {
            if (!memcmp(p->arKey, arKey, nKeyLength)) {
                *pData = p->pData;
                return SUCCESS;
            }
        }
        p = p->pNext;
    }
    return FAILURE;
}
```
