Recently, **hash table collision attacks (hashtable collisions as DoS attack)** have come up again and again, and one language after another has turned out to be affected. This article draws on the PHP kernel source code to explain the principles and implementation of this kind of attack.

**Basic principles of hash table collision attacks**

A hash table is a highly efficient data structure, and many languages implement one internally. In PHP the hash table is an extremely important data structure: it is used not only to represent the Array data type but also to store context information inside the Zend virtual machine (the variables and functions of an execution context are stored in hash tables).

Ideally, hash table insertion and lookup run in O(1) time: for any data item, a hash value (key) can be computed in time independent of the table length, and that value locates a bucket (a slot in the hash table). This is only the ideal case. Because the length of any hash table is finite, some distinct data items inevitably share the same hash value; when different items map to the same bucket, this is called a collision. A hash table implementation has to resolve collisions, and there are broadly two strategies. The first relocates colliding data to another bucket according to some rule. With linear probing, for example, if a collision occurs on insertion, the buckets after the target bucket are searched in order and the item is placed in the first unused one. In the second strategy, a bucket is not a slot that holds a single data item but a data structure that can hold several (such as a linked list or a red-black tree), and all colliding items are organized in that structure.
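To make the first strategy concrete, here is a minimal sketch of linear probing in C. It is not how PHP works (PHP uses the second strategy); the table size, the "low bits" toy hash, the empty-slot marker, and the names lp_init and lp_insert are all assumptions made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define LP_SIZE 8u        /* table length, a power of 2 (assumption) */
#define LP_EMPTY (-1L)    /* marker for an unused slot (assumption) */

/* Mark every slot of the table as unused. */
void lp_init(long table[LP_SIZE])
{
    for (size_t i = 0; i < LP_SIZE; i++) {
        table[i] = LP_EMPTY;
    }
}

/*
 * Insert a non-negative key with linear probing. Returns the slot the key
 * ended up in, or -1 if every slot is already taken by other keys.
 */
int lp_insert(long table[LP_SIZE], long key)
{
    assert(key >= 0);
    size_t home = (size_t)key & (LP_SIZE - 1);   /* toy hash: low bits */

    for (size_t step = 0; step < LP_SIZE; step++) {
        size_t slot = (home + step) & (LP_SIZE - 1);   /* wrap around */
        if (table[slot] == LP_EMPTY || table[slot] == key) {
            table[slot] = key;
            return (int)slot;
        }
    }
    return -1;
}
```

With this sketch, inserting 3, 11 and 19 in that order places them in slots 3, 4 and 5: all three keys hash to slot 3, and each later key is pushed one slot further along.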

Whichever collision resolution strategy is used, insertion and lookup are no longer guaranteed to run in O(1). Take lookup as an example: locating the bucket from the hash value is not enough; the original key (that is, the key before hashing) must also be compared for equality. If it does not match, the search continues using the same algorithm as insertion until the matching item is found or it is established that the item is not in the hash table.

PHP uses a singly linked list to store colliding data, so the average lookup complexity of a PHP hash table is O(L), where L is the average length of a bucket's linked list. The worst case is O(N): every item collides and the hash table degenerates into a single linked list.

[Figure: normal and degraded hash tables in PHP.]

A hash table collision attack uses carefully constructed data to make every item collide, artificially turning the hash table into a degenerate singly linked list. Each operation on the table then costs O(N) instead of O(1), which consumes a large amount of CPU, leaves the system unable to respond to requests promptly, and thereby achieves a denial of service (DoS).

As you can see, the precondition for a hash collision attack is that collisions are easy to find for the hash algorithm in use; with MD5 or SHA1 the attack would be essentially impractical. Fortunately (or, one could say, unfortunately), the hash algorithms used by most programming languages are very simple (for the sake of efficiency), so attack data can be constructed with almost no effort. The next section analyzes the Zend kernel code to work out how to mount a hash table collision attack against PHP.

**Internal implementation of the Zend hash table: data structure**

PHP uses a struct called Bucket to represent a bucket; all buckets with the same hash value are organized into a singly linked list. The hash table itself is represented by the HashTable struct. The relevant source code is in Zend/zend_hash.h:

    typedef struct bucket {
        ulong h;                        /* Used for numeric indexing */
        uint nKeyLength;
        void *pData;
        void *pDataPtr;
        struct bucket *pListNext;
        struct bucket *pListLast;
        struct bucket *pNext;
        struct bucket *pLast;
        char arKey[1];                  /* Must be last element */
    } Bucket;

    typedef struct _hashtable {
        uint nTableSize;
        uint nTableMask;
        uint nNumOfElements;
        ulong nNextFreeElement;
        Bucket *pInternalPointer;       /* Used for element traversal */
        Bucket *pListHead;
        Bucket *pListTail;
        Bucket **arBuckets;
        dtor_func_t pDestructor;
        zend_bool persistent;
        unsigned char nApplyCount;
        zend_bool bApplyProtection;
    #if ZEND_DEBUG
        int inconsistent;
    #endif
    } HashTable;

The field names make their purposes fairly clear, so they are not explained in detail. A few fields deserve emphasis. "h" in Bucket stores the original integer key (for a string key, the integer computed from the string by the hash function described below). nTableMask in HashTable is a mask, normally set to nTableSize - 1; it is closely tied to the hash algorithm, which is covered in detail later. arBuckets points to an array of pointers, each of which is the head pointer of a Bucket linked list.

**Hash algorithm**

The minimum capacity of a PHP hash table is 8 (2^3) and the maximum is 0x80000000 (2^31). The capacity is always rounded up to a power of 2: a hash table of 13 elements has length 16, and one of 100 elements has length 128. nTableMask is initialized to the (rounded) table length minus 1. The code is in the _zend_hash_init function in Zend/zend_hash.c; only the parts relevant to this article are shown here, with a few comments added.

    ZEND_API int _zend_hash_init(HashTable *ht, uint nSize, hash_func_t pHashFunction, dtor_func_t pDestructor, zend_bool persistent ZEND_FILE_LINE_DC)
    {
        uint i = 3;
        Bucket **tmp;

        SET_INCONSISTENT(HT_OK);

        /* Round the length up to a power of 2 */
        if (nSize >= 0x80000000) {
            /* prevent overflow */
            ht->nTableSize = 0x80000000;
        } else {
            while ((1U << i) < nSize) {
                i++;
            }
            ht->nTableSize = 1 << i;
        }

        ht->nTableMask = ht->nTableSize - 1;

        /* some code omitted... */

        return SUCCESS;
    }

It is worth noting that the way PHP rounds up to a power of 2 is quite neat and worth borrowing when you need it.
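The rounding loop can be pulled out into a small standalone function to try it in isolation. The sketch below follows the loop in the excerpt above; the function names round_up_pow2 and table_mask and the use of plain unsigned int are assumptions of this example, not Zend identifiers.

```c
#include <assert.h>

/*
 * Round nSize up to a power of 2, following the loop in _zend_hash_init.
 * Starting the exponent at 3 yields the minimum capacity of 8.
 */
unsigned int round_up_pow2(unsigned int nSize)
{
    unsigned int i = 3;

    if (nSize >= 0x80000000u) {
        return 0x80000000u;          /* cap the size to prevent overflow */
    }
    while ((1u << i) < nSize) {
        i++;
    }
    return 1u << i;
}

/* The mask is only meaningful when the table size is a power of 2. */
unsigned int table_mask(unsigned int table_size)
{
    assert(table_size != 0 && (table_size & (table_size - 1)) == 0);
    return table_size - 1;
}
```

For the two examples in the text, round_up_pow2(13) gives 16 and round_up_pow2(100) gives 128, and table_mask(16) gives 15, that is, binary 1111.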

The hash algorithm of the Zend HashTable is very simple:

    hash(key) = key & nTableMask

That is, the original key is simply bitwise ANDed with the HashTable's nTableMask.

If the original key is a string, the Times33 algorithm first converts the string into an integer, which is then bitwise ANDed with nTableMask:

    hash(strkey) = time33(strkey) & nTableMask

The following is the hash table lookup code from the Zend source:

    ZEND_API int zend_hash_index_find(const HashTable *ht, ulong h, void **pData)
    {
        uint nIndex;
        Bucket *p;

        IS_CONSISTENT(ht);

        nIndex = h & ht->nTableMask;

        p = ht->arBuckets[nIndex];
        while (p != NULL) {
            if ((p->h == h) && (p->nKeyLength == 0)) {
                *pData = p->pData;
                return SUCCESS;
            }
            p = p->pNext;
        }
        return FAILURE;
    }

    ZEND_API int zend_hash_find(const HashTable *ht, const char *arKey, uint nKeyLength, void **pData)
    {
        ulong h;
        uint nIndex;
        Bucket *p;

        IS_CONSISTENT(ht);

        h = zend_inline_hash_func(arKey, nKeyLength);
        nIndex = h & ht->nTableMask;

        p = ht->arBuckets[nIndex];
        while (p != NULL) {
            if ((p->h == h) && (p->nKeyLength == nKeyLength)) {
                if (!memcmp(p->arKey, arKey, nKeyLength)) {
                    *pData = p->pData;
                    return SUCCESS;
                }
            }
            p = p->pNext;
        }
        return FAILURE;
    }

zend_hash_index_find looks up integer keys and zend_hash_find looks up string keys. The logic is essentially the same, except that a string key is first converted to an integer by zend_inline_hash_func, which wraps the Times33 algorithm; its code is not reproduced here.
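For readers who have not met Times33 before, the idea fits in a few lines. The following is a simplified sketch of the general technique (multiply the running value by 33 and add the next byte), not a copy of zend_inline_hash_func: the seed 5381, the treatment of bytes as unsigned, and the names times33 and bucket_index are assumptions of this example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Times33 sketch: start from a seed (5381 is assumed here) and, for every
 * byte of the key, multiply the running value by 33 and add the byte.
 */
unsigned long times33(const char *key, size_t len)
{
    unsigned long hash = 5381;

    assert(key != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        hash = hash * 33 + (unsigned char)key[i];
    }
    return hash;
}

/* Bucket position of a string key: Times33 value ANDed with the mask. */
unsigned long bucket_index(const char *key, size_t len, unsigned long mask)
{
    return times33(key, len) & mask;
}
```

Because the function is so simple, colliding strings are easy to find by hand. In this sketch "Ez" and "FY" produce the same value, since 69 * 33 + 122 and 70 * 33 + 89 are both 2399, and equal prefixes of that kind can be combined into longer colliding keys.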

**Basic attack**

Knowing the algorithm behind PHP's internal hash table, we can use it to construct attack data. One of the simplest methods is to exploit the mask to create collisions. As mentioned above, the length nTableSize of a Zend HashTable is rounded up to a power of 2. Suppose we build a hash table of size 2^16: the binary representation of nTableSize is 1 0000 0000 0000 0000, and nTableMask = nTableSize - 1 is 0 1111 1111 1111 1111. We can then generate as many keys as we need, starting from 0 with a step of 2^16, which gives the following:

0000 0000 0000 0000 0000 & 0 1111 1111 1111 1111 = 0

0001 0000 0000 0000 0000 & 0 1111 1111 1111 1111 = 0

0010 0000 0000 0000 0000 & 0 1111 1111 1111 1111 = 0

0011 0000 0000 0000 0000 & 0 1111 1111 1111 1111 = 0

0100 0000 0000 0000 0000 & 0 1111 1111 1111 1111 = 0

......

In short, as long as the low 16 bits of a key are all 0, its hash value after masking is 0, so all of these keys collide at position 0.
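The masking argument is easy to check mechanically. The sketch below counts how many keys from an arithmetic sequence land in bucket 0 of a table of a given size; the function names mask_index and count_in_bucket_zero are made up for this example.

```c
#include <assert.h>

/* Bucket index as described above: the key ANDed with (table size - 1). */
unsigned long mask_index(unsigned long key, unsigned long table_size)
{
    assert(table_size != 0 && (table_size & (table_size - 1)) == 0);
    return key & (table_size - 1);
}

/* Count how many of the keys 0, step, 2 * step, ... fall into bucket 0. */
unsigned long count_in_bucket_zero(unsigned long step, unsigned long count,
                                   unsigned long table_size)
{
    unsigned long hits = 0;

    for (unsigned long i = 0; i < count; i++) {
        if (mask_index(i * step, table_size) == 0) {
            hits++;
        }
    }
    return hits;
}
```

With a table of 65536 buckets, count_in_bucket_zero(65536, 1000, 65536) reports that all 1000 keys fall into bucket 0, while count_in_bucket_zero(1, 65536, 65536) reports that only one of the consecutive keys 0 to 65535 does.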

The following attack code is based on this principle:

    <?php
    $size = pow(2, 16);

    $startTime = microtime(true);

    $array = array();
    // Keys 0, $size, 2 * $size, ...: the low 16 bits of every key are 0
    for ($key = 0, $maxKey = ($size - 1) * $size; $key <= $maxKey; $key += $size) {
        $array[$key] = 0;
    }

    $endTime = microtime(true);

    echo $endTime - $startTime, " seconds";

This code took nearly 88 seconds to complete on my VPS (single CPU, MB memory), during which CPU resources were almost completely exhausted.

By contrast, an ordinary insertion into a hash table of the same size takes only 0.036 seconds:

    <?php
    $size = pow(2, 16);

    $startTime = microtime(true);

    $array = array();
    // Consecutive keys 0, 1, 2, ...: the keys spread evenly across the buckets
    for ($key = 0; $key <= $size; $key += 1) {
        $array[$key] = 0;
    }

    $endTime = microtime(true);

    echo $endTime - $startTime, " seconds";

It can be shown that the second piece of code inserts N elements in O(N) time, whereas the first (attack) code needs O(N^2) time to insert N elements.
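The gap between O(N) and O(N^2) can also be seen by counting work instead of measuring time. The sketch below is a toy chained hash table, not the Zend implementation: it inserts keys 0, step, 2 * step, ... and counts how many existing nodes are visited while each chain is checked for a duplicate. The Node type and the function name count_chain_visits are assumptions of this example.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Node {
    unsigned long key;
    struct Node *next;
} Node;

/*
 * Insert `count` keys (0, step, 2 * step, ...) into a chained table with
 * `table_size` buckets (a power of 2). Returns the number of existing nodes
 * visited while scanning chains for a duplicate before each insertion.
 */
unsigned long long count_chain_visits(unsigned long step, unsigned long count,
                                      unsigned long table_size)
{
    assert(table_size != 0 && (table_size & (table_size - 1)) == 0);

    unsigned long long visits = 0;
    Node **buckets = calloc(table_size, sizeof *buckets);
    if (buckets == NULL) {
        return 0;
    }
    for (unsigned long i = 0; i < count; i++) {
        unsigned long key = i * step;
        unsigned long index = key & (table_size - 1);
        Node *p = buckets[index];

        while (p != NULL && p->key != key) {   /* walk the collision chain */
            visits++;
            p = p->next;
        }
        if (p == NULL) {                       /* not found: add at the head */
            Node *n = malloc(sizeof *n);
            if (n == NULL) {
                break;
            }
            n->key = key;
            n->next = buckets[index];
            buckets[index] = n;
        }
    }
    for (unsigned long b = 0; b < table_size; b++) {   /* free every chain */
        Node *p = buckets[b];
        while (p != NULL) {
            Node *next = p->next;
            free(p);
            p = next;
        }
    }
    free(buckets);
    return visits;
}
```

With 1024 buckets, count_chain_visits(1, 1024, 1024) visits no existing nodes at all, while count_chain_visits(1024, 1024, 1024) visits 0 + 1 + ... + 1023 = 523776 of them. In general the colliding case costs N(N - 1)/2 visits, which is where the O(N^2) comes from.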

**POST attack**

Of course, an attacker usually cannot modify PHP code directly, but there are still ways to construct the hash table indirectly. For example, PHP builds $_POST from the data of a received HTTP POST request; $_POST is an Array, represented internally by a Zend HashTable, so an attacker only needs to send a POST request containing a large number of colliding keys to carry out the attack. The specifics are not demonstrated here.

**Protection against POST attacks**

At present, PHP defends against POST-based hash collision attacks by limiting the number of POST parameters. PHP 5.3.9 and later add a configuration option, max_input_vars, which specifies the maximum number of parameters accepted in a single HTTP request; the default is 1000. PHP 5.3.x users can therefore upgrade to 5.3.9 to avoid this attack, and 5.2.x users can apply this patch: http://www.laruence.com/2011/12/30/2440.html.
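As a small illustration, the option can be set explicitly in php.ini. The fragment below simply restates the default mentioned above; whether a lower or higher value suits a given application is a judgment call that this article does not settle.

```ini
; php.ini (PHP >= 5.3.9)
; Maximum number of input variables accepted per request.
; 1000 is the default value described above.
max_input_vars = 1000
```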

Another defense is to handle the problem at the web server level, for example by limiting the size of the HTTP request body and the number of parameters. This is currently the most commonly used stopgap. The details depend on the web server in use.

**Other protection**

The protection described above only limits the number of POST parameters and does not solve the problem completely. For example, if a POST field carries JSON that PHP passes to json_decode, an attacker can still mount the attack by constructing one large JSON payload full of colliding keys. In theory, the problem can arise wherever PHP code builds an Array from external input. A thorough solution therefore has to start from the underlying Zend HashTable implementation. Broadly, there are two approaches. One is to limit the maximum length of each bucket's linked list. The other is to organize colliding items in another data structure, such as a red-black tree, instead of a linked list; this does not eliminate hash collisions but only lessens the impact of an attack, reducing the time to operate on N items from O(N^2) to O(N log N), at the cost that operations which are normally close to O(1) become O(log N).
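The first approach, capping the chain length, could look roughly like the sketch below. This is a hypothetical illustration on a toy chained table, not code from PHP: the limit MAX_CHAIN, the CapNode type, the function capped_insert, and its return values are all invented for this example.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_CHAIN 4u   /* hypothetical per-bucket limit, chosen arbitrarily */

typedef struct CapNode {
    unsigned long key;
    struct CapNode *next;
} CapNode;

/*
 * Insert a key into a chained table whose chains may hold at most MAX_CHAIN
 * nodes. Returns 1 if the key is stored (or was already present), 0 if the
 * insertion is rejected because the chain is full, -1 if allocation fails.
 */
int capped_insert(CapNode **buckets, unsigned long table_size,
                  unsigned long key)
{
    assert(buckets != NULL);
    assert(table_size != 0 && (table_size & (table_size - 1)) == 0);

    unsigned long index = key & (table_size - 1);
    unsigned int length = 0;

    for (CapNode *p = buckets[index]; p != NULL; p = p->next) {
        if (p->key == key) {
            return 1;                 /* key already present */
        }
        length++;
    }
    if (length >= MAX_CHAIN) {
        return 0;                     /* chain full: refuse the insertion */
    }
    CapNode *n = malloc(sizeof *n);
    if (n == NULL) {
        return -1;
    }
    n->key = key;
    n->next = buckets[index];
    buckets[index] = n;
    return 1;
}
```

With 8 buckets, the keys 0, 8, 16 and 24 all go into bucket 0 and are accepted, and a fifth colliding key such as 32 is rejected. The trade-off is that legitimate data which happens to collide is rejected too, so the limit bounds the damage without removing the underlying weakness.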

Most attacks currently in circulation use POST data, so upgrading or patching PHP in production environments is recommended. As for a fix at the data structure level, there is no news yet.
