PHP Kernel Exploration: The Principle of Hash Table Collision Attacks


This article explains, with illustrations and code, the principle behind hash table collision attacks in the PHP kernel.

Hash table collision attacks (hashtable collisions as a DoS attack) have been discussed constantly of late, and many languages have been affected. Drawing on the PHP kernel source code, this article discusses the principle and implementation of this kind of attack.

The basic principle of hash table collision attacks

A hash table is a highly efficient data structure, and many languages implement a hash table internally. A hash table in PHP is an extremely important data structure that is used not only to represent the array data type, but also to store context information within the Zend virtual machine (the variables and functions of the execution context are stored using a hash table structure).

Ideally, insertion and lookup in a hash table take O(1) time: the hash value (key) of any data item can be computed in time independent of the table length, and the item is then located in a bucket in constant time (a bucket is a slot in the hash table). This is only the ideal case, however. Because every hash table has finite length, distinct data items will inevitably share a hash value and be mapped to the same bucket; this is called a collision. A hash table implementation must resolve collisions, and there are generally two strategies. The first places the colliding item in another bucket according to some rule. With linear probing, for example, if an item collides on insertion, the buckets after the target bucket are searched in order and the item is placed in the first unused one. The second makes each bucket a data structure that can hold multiple items (such as a linked list or a red-black tree) rather than a slot for a single item, and all colliding items are organized in that structure.
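
The first strategy, linear probing, can be sketched in a few lines of C. This is only an illustration under assumed names (ProbeSlot, probe_insert and the 8-slot capacity are invented for this example) and is not how PHP resolves collisions:

```c
#include <assert.h>
#include <stddef.h>

#define PROBE_SLOTS 8u            /* toy capacity (assumption); a power of 2 */

typedef struct {
    unsigned long key;
    int used;                     /* 0 = empty slot, 1 = occupied */
} ProbeSlot;

/* Insert key using linear probing.
 * Returns the index of the slot holding the key, or -1 if the table is full. */
static int probe_insert(ProbeSlot table[PROBE_SLOTS], unsigned long key)
{
    assert(table != NULL);
    size_t start = key & (PROBE_SLOTS - 1);       /* preferred slot */
    for (size_t step = 0; step < PROBE_SLOTS; step++) {
        size_t i = (start + step) & (PROBE_SLOTS - 1);
        if (table[i].used && table[i].key == key) {
            return (int)i;                        /* key already stored */
        }
        if (!table[i].used) {                     /* first free slot wins */
            table[i].key = key;
            table[i].used = 1;
            return (int)i;
        }
    }
    return -1;                                    /* every slot is occupied */
}
```

With an empty table, inserting 3, 11 and 19 (all of which prefer slot 3) places them in slots 3, 4 and 5 respectively.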

Whichever collision resolution strategy is used, insertion and lookup are no longer guaranteed to be O(1). Taking lookup as an example, locating the bucket by the hash key is not enough: the original key (the key before hashing) must also be compared. If it does not match, the search continues using the same algorithm as insertion until a matching entry is found or the data is confirmed to be absent from the hash table.

PHP stores colliding items in a linked list, so the average lookup complexity of a PHP hash table is in fact O(L), where L is the average length of a bucket's linked list. The worst case is O(N): when all items collide, the hash table degenerates into a single linked list. (Figure: a normal hash table compared with a degenerate hash table in PHP.)

A hash table collision attack uses carefully constructed data so that every item collides, deliberately turning the hash table into a degenerate linked list. Each hash table operation then costs O(N) instead of O(1), consuming large amounts of CPU, so the system can no longer respond to requests promptly. The result is a denial-of-service (DoS) attack.
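
The effect of degeneration can be made visible by counting how many chain nodes are visited during insertion. The following is a toy chained table written for this article's argument, not Zend code; ChainTable, chain_insert, chain_cost and the 8-slot size are all assumptions of the sketch:

```c
#include <assert.h>
#include <stdlib.h>

#define CHAIN_SLOTS 8u                 /* toy table size (assumption); a power of 2 */

typedef struct ChainNode {
    unsigned long key;
    struct ChainNode *next;
} ChainNode;

typedef struct {
    ChainNode *slots[CHAIN_SLOTS];
    unsigned long steps;               /* chain nodes visited so far */
} ChainTable;

/* Insert key unless it is already present. The chain is walked first to
 * check for the key, and every visited node is counted.
 * Returns 0 on success, -1 if memory allocation fails. */
static int chain_insert(ChainTable *t, unsigned long key)
{
    assert(t != NULL);
    unsigned long idx = key & (CHAIN_SLOTS - 1);
    for (ChainNode *p = t->slots[idx]; p != NULL; p = p->next) {
        t->steps++;
        if (p->key == key) {
            return 0;
        }
    }
    ChainNode *n = malloc(sizeof *n);
    if (n == NULL) {
        return -1;
    }
    n->key = key;
    n->next = t->slots[idx];           /* link the new node at the chain head */
    t->slots[idx] = n;
    return 0;
}

/* Insert the keys 0, stride, 2*stride, ... (count keys in total) into a
 * fresh table and return the number of chain nodes visited. */
static unsigned long chain_cost(unsigned long count, unsigned long stride)
{
    ChainTable t = {{NULL}, 0};
    for (unsigned long i = 0; i < count; i++) {
        if (chain_insert(&t, i * stride) != 0) {
            break;
        }
    }
    for (unsigned int i = 0; i < CHAIN_SLOTS; i++) {   /* release all nodes */
        ChainNode *p = t.slots[i];
        while (p != NULL) {
            ChainNode *next = p->next;
            free(p);
            p = next;
        }
    }
    return t.steps;
}
```

Calling chain_cost(8, 1) spreads eight keys over eight slots and visits no nodes, while chain_cost(8, 8) sends all eight keys to slot 0 and visits 0 + 1 + ... + 7 = 28 nodes.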

As this shows, a hash collision attack presupposes a hashing algorithm for which collisions are easy to find. With MD5 or SHA1 the attack would be essentially infeasible. Fortunately (or unfortunately), most programming languages use very simple hashing algorithms for efficiency, so attack data can be constructed with little effort. The next sections analyze the relevant Zend kernel code to find a way to mount a hash table collision attack against PHP.

The internal data structures of the Zend hash table

In PHP, a structure named Bucket represents a bucket, and all buckets with the same hash value are organized into a linked list. The hash table itself is represented by the HashTable structure. The relevant source is in Zend/zend_hash.h:

typedef struct bucket {
    ulong h;                        /* Used for numeric indexing */
    uint nKeyLength;
    void *pData;
    void *pDataPtr;
    struct bucket *pListNext;
    struct bucket *pListLast;
    struct bucket *pNext;
    struct bucket *pLast;
    char arKey[1];                  /* Must be the last element */
} Bucket;

typedef struct _hashtable {
    uint nTableSize;
    uint nTableMask;
    uint nNumOfElements;
    ulong nNextFreeElement;
    Bucket *pInternalPointer;       /* Used for element traversal */
    Bucket *pListHead;
    Bucket *pListTail;
    Bucket **arBuckets;
    dtor_func_t pDestructor;
    zend_bool persistent;
    unsigned char nApplyCount;
    zend_bool bApplyProtection;
#if ZEND_DEBUG
    int inconsistent;
#endif
} HashTable;

The field names make their purpose clear, so little explanation is needed. Focus on the following fields: h in Bucket stores the original key (for a string key, the integer produced by hashing the string); nTableMask in HashTable is a mask, normally set to nTableSize - 1, and is closely tied to the hashing algorithm, as discussed below; arBuckets points to an array of pointers, each of which is the head pointer of a bucket's linked list.

Hashing algorithm

A PHP hash table has a minimum capacity of 8 (2^3) and a maximum capacity of 0x80000000 (2^31), and its length is always rounded up to a power of 2: a hash table with 13 elements has length 16, and one with 100 elements has length 128. nTableMask is initialized to the rounded length minus 1. The code is in the _zend_hash_init function in Zend/zend_hash.c; the part relevant to this article is excerpted below with a few comments added.

ZEND_API int _zend_hash_init(HashTable *ht, uint nSize, hash_func_t pHashFunction, dtor_func_t pDestructor, zend_bool persistent ZEND_FILE_LINE_DC)
{
    uint i = 3;
    Bucket **tmp;

    SET_INCONSISTENT(HT_OK);

    /* Round the length up to a power of 2 */
    if (nSize >= 0x80000000) {
        /* prevent overflow */
        ht->nTableSize = 0x80000000;
    } else {
        while ((1U << i) < nSize) {
            i++;
        }
        ht->nTableSize = 1 << i;
    }

    ht->nTableMask = ht->nTableSize - 1;

    /* Some code omitted here ... */

    return SUCCESS;
}

The way PHP rounds up to a power of 2 here is neat and worth remembering for when you need it.
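
For reuse outside the Zend code base, the rounding loop can be pulled out into a standalone function. This is a sketch of the same logic; the name round_up_table_size is made up for this example, and unsigned int is assumed to be 32 bits wide:

```c
#include <assert.h>

/* Round n_size up to a power of 2, with a minimum of 8 (2^3) and a
 * maximum of 0x80000000 (2^31), mirroring the loop in _zend_hash_init. */
static unsigned int round_up_table_size(unsigned int n_size)
{
    unsigned int i = 3;                 /* start at 2^3 = 8 */

    if (n_size >= 0x80000000u) {
        return 0x80000000u;             /* prevent overflow of the shift */
    }
    while ((1u << i) < n_size) {
        i++;
    }
    assert(i <= 31);                    /* guaranteed by the check above */
    return 1u << i;
}
```

For example, round_up_table_size(13) gives 16 and round_up_table_size(100) gives 128, so the corresponding masks are 15 and 127.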

The hash algorithm for Zend Hashtable is exceptionally simple:

hash(key) = key & nTableMask

That is, the original key is simply combined with the HashTable's nTableMask using a bitwise AND.

If the original key is a string, the times33 algorithm first converts the string to an integer, which is then ANDed with nTableMask:

hash(strkey) = times33(strkey) & nTableMask

Here is the hash table lookup code from the Zend source:

ZEND_API int zend_hash_index_find(const HashTable *ht, ulong h, void **pData)
{
    uint nIndex;
    Bucket *p;

    IS_CONSISTENT(ht);

    nIndex = h & ht->nTableMask;

    p = ht->arBuckets[nIndex];
    while (p != NULL) {
        if ((p->h == h) && (p->nKeyLength == 0)) {
            *pData = p->pData;
            return SUCCESS;
        }
        p = p->pNext;
    }
    return FAILURE;
}

ZEND_API int zend_hash_find(const HashTable *ht, const char *arKey, uint nKeyLength, void **pData)
{
    ulong h;
    uint nIndex;
    Bucket *p;

    IS_CONSISTENT(ht);

    h = zend_inline_hash_func(arKey, nKeyLength);
    nIndex = h & ht->nTableMask;

    p = ht->arBuckets[nIndex];
    while (p != NULL) {
        if ((p->h == h) && (p->nKeyLength == nKeyLength)) {
            if (!memcmp(p->arKey, arKey, nKeyLength)) {
                *pData = p->pData;
                return SUCCESS;
            }
        }
        p = p->pNext;
    }
    return FAILURE;
}

zend_hash_index_find looks up integer keys and zend_hash_find looks up string keys. The logic is essentially the same, except that a string key is first converted to an integer by zend_inline_hash_func, which implements the times33 algorithm; its code is not reproduced here.
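
For readers unfamiliar with times33, here is a simplified sketch of the idea: start from 5381 and, for each byte, multiply the running hash by 33 and add the byte. It is not the Zend source (zend_inline_hash_func unrolls the loop and computes the multiplication as a shift plus an add); the names times33 and string_bucket_index are chosen for this example, and keys are assumed to be plain ASCII:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified times33 string hash: hash = hash * 33 + byte, seeded with 5381. */
static unsigned long times33(const char *key, size_t len)
{
    assert(key != NULL);
    unsigned long hash = 5381;
    for (size_t i = 0; i < len; i++) {
        hash = hash * 33 + (unsigned char)key[i];
    }
    return hash;
}

/* Bucket index for a string key: hash the string, then apply the mask. */
static unsigned long string_bucket_index(const char *key, size_t len,
                                         unsigned long table_mask)
{
    return times33(key, len) & table_mask;
}
```

Because the function is this simple, equal hashes are easy to produce by hand: the two-character strings "Ez" and "FY" hash to the same value, since 33 * 69 + 122 and 33 * 70 + 89 are both 2399.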

Attack

Basic attack

Knowing how PHP's internal hash table works, we can use it to construct attack data. One of the simplest approaches is to exploit the mask to create collisions. As mentioned above, the Zend HashTable length nTableSize is rounded up to a power of 2. Suppose we build a hash table of size 2^16. Then nTableSize in binary is 1 0000 0000 0000 0000, and nTableMask = nTableSize - 1 is 0 1111 1111 1111 1111. Taking 0 as the initial value and 2^16 as the step, we can generate as many keys as needed, with the following results:

0000 0000 0000 0000 0000 & 0 1111 1111 1111 1111 = 0

0001 0000 0000 0000 0000 & 0 1111 1111 1111 1111 = 0

0010 0000 0000 0000 0000 & 0 1111 1111 1111 1111 = 0

0011 0000 0000 0000 0000 & 0 1111 1111 1111 1111 = 0

0100 0000 0000 0000 0000 & 0 1111 1111 1111 1111 = 0

......

In general, as long as the low 16 bits of a key are all 0, ANDing it with the mask always yields 0, so every such key collides at position 0.
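
The masking rule is easy to check in isolation. The sketch below assumes a table of size 2^16, as in the example above; bucket_index and colliding_key are hypothetical helper names, not Zend functions:

```c
#include <assert.h>

#define ATTACK_TABLE_SIZE (1ul << 16)   /* 2^16, the table size assumed above */

/* Bucket index for an integer key: key & (table_size - 1).
 * table_size must be a power of 2. */
static unsigned long bucket_index(unsigned long key, unsigned long table_size)
{
    assert(table_size != 0 && (table_size & (table_size - 1)) == 0);
    return key & (table_size - 1);
}

/* The i-th key of the colliding sequence 0, 2^16, 2 * 2^16, ... */
static unsigned long colliding_key(unsigned long i)
{
    return i * ATTACK_TABLE_SIZE;
}
```

Every value returned by colliding_key has its low 16 bits clear, so bucket_index maps all of them to 0, whereas consecutive keys such as 1, 2 and 3 land in distinct buckets.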

Here is an attack code written using this principle:

<?php
$size = pow(2, 16);
$startTime = microtime(true);
$array = array();
// Keys 0, 2^16, 2 * 2^16, ... all land in bucket 0
for ($key = 0, $maxKey = ($size - 1) * $size; $key <= $maxKey; $key += $size) {
    $array[$key] = 0;
}
$endTime = microtime(true);
echo $endTime - $startTime, " seconds";

This code took nearly 88 seconds on my VPS (single CPU, 512 MB of memory), and CPU resources were almost exhausted for the duration.

Inserting the same number of elements into an ordinary hash table takes only 0.036 seconds:

<?php
$size = pow(2, 16);
$startTime = microtime(true);
$array = array();
// Consecutive keys spread evenly across the buckets
for ($key = 0, $maxKey = ($size - 1) * $size; $key <= $size; $key += 1) {
    $array[$key] = 0;
}
$endTime = microtime(true);
echo $endTime - $startTime, " seconds";

This shows that the second script inserts N elements in O(N) time, while the first (attack) script takes O(N^2) time to insert N elements.
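
The quadratic bound follows from a simple count, assuming that each insertion first walks the whole chain of its bucket to check whether the key already exists. The insertion with index i then visits i nodes, so for n colliding keys:

```latex
\sum_{i=0}^{n-1} i = \frac{n(n-1)}{2} = O(n^2),
\qquad
n = 2^{16} \;\Rightarrow\; \frac{65536 \cdot 65535}{2} = 2\,147\,450\,880
```

That is roughly two billion node visits for the attack script, against almost none when the keys are spread evenly across the buckets.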

POST attack

Of course, an attacker can rarely modify PHP code directly, but a hash table can still be attacked indirectly. For example, PHP parses the data of an incoming HTTP POST request into $_POST, which is an array and is therefore backed internally by a Zend HashTable. An attacker only needs to send a POST request containing a large number of colliding keys. The specifics are not demonstrated here.

Protection against POST attacks

PHP's current protection against POST-based hash collision attacks is to limit the number of POST parameters. PHP 5.3.9 and later add a configuration directive, max_input_vars, which sets the maximum number of parameters accepted per HTTP request (1000 by default). PHP 5.3.x users can therefore guard against this attack by upgrading to 5.3.9. PHP 5.2.x users can apply this patch: http://www.laruence.com/2011/12/30/2440.html.
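
As a configuration fragment, the directive looks like this in php.ini; the value shown is simply the default mentioned above, and a suitable limit depends on the application:

```ini
; php.ini (PHP >= 5.3.9)
; Maximum number of input variables accepted per request
max_input_vars = 1000
```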

Another approach is to handle this at the web server level, for example by limiting the HTTP request body size and the number of parameters; this is currently the most common stopgap. The details depend on the web server and are not covered here.

Other protection

The protections above only limit the number of POST parameters and do not solve the problem completely. For example, if a POST field contains JSON that PHP passes to json_decode, an attacker can build a large JSON payload of colliding keys to the same effect. In theory, any PHP code that builds an array from external input is exposed, so a thorough fix has to start with the underlying Zend HashTable implementation. There are broadly two options: limit the maximum length of each bucket's linked list, or replace the linked list with another data structure such as a red-black tree. The latter does not eliminate hash collisions but reduces the impact of an attack, bringing the cost of operating on N items from O(N^2) down to O(N log N), at the price of ordinary operations that were close to O(1) becoming O(log N).
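
The first option, capping the length of each bucket's linked list, might look roughly like the sketch below. It is a toy table written for illustration, not a patch to Zend; LimitTable, limited_insert and the cap of 4 are all assumptions, and a real implementation would need to decide how to report the rejection to the caller:

```c
#include <assert.h>
#include <stdlib.h>

#define LIMIT_SLOTS 8u          /* toy table size (assumption); a power of 2 */
#define MAX_CHAIN_LENGTH 4u     /* hypothetical cap on nodes per bucket */

typedef struct LimitNode {
    unsigned long key;
    struct LimitNode *next;
} LimitNode;

typedef struct {
    LimitNode *slots[LIMIT_SLOTS];
} LimitTable;

/* Insert key into its bucket's chain.
 * Returns 1 if inserted, 0 if the key was already present, and -1 if the
 * insert is rejected (chain already at the cap, or out of memory). */
static int limited_insert(LimitTable *t, unsigned long key)
{
    assert(t != NULL);
    unsigned long idx = key & (LIMIT_SLOTS - 1);
    unsigned int length = 0;

    for (LimitNode *p = t->slots[idx]; p != NULL; p = p->next) {
        if (p->key == key) {
            return 0;
        }
        length++;
    }
    if (length >= MAX_CHAIN_LENGTH) {
        return -1;              /* refuse to let the chain grow further */
    }
    LimitNode *n = malloc(sizeof *n);
    if (n == NULL) {
        return -1;
    }
    n->key = key;
    n->next = t->slots[idx];
    t->slots[idx] = n;
    return 1;
}
```

Since no chain can exceed MAX_CHAIN_LENGTH nodes, every insert or lookup visits at most that many nodes, at the cost of rejecting legitimate data that happens to collide.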

POST data remains the most common attack vector, so PHP in production environments should be upgraded or patched. As for fixing the problem at the data structure level, there is no news yet.
