PHP Kernel Exploration: The Principle of Hash Table Collision Attacks

Source: Internet
Author: User

Recently, hash table collision attacks (hashtable collisions as a DoS attack) have been widely discussed, and many languages have turned out to be affected. This article draws on the PHP kernel source code to discuss the principle and implementation of this kind of attack.

The basic principle of hash table collision attack

A hash table is a very efficient data structure, and many languages implement one internally. In PHP the hash table is an extremely important data structure: it is used not only to represent the array data type but also to store contextual information inside the Zend virtual machine (the variables and functions of an execution context are stored in hash tables).

Ideally, insertion and lookup in a hash table take O(1) time: the hash value (key) of any data item can be computed in time independent of the table length, and the item is then located in a bucket in constant time (a bucket is one position in the hash table). This is only the ideal case. Because any hash table has a finite length, different data items will inevitably share the same hash value at some point; when different items map to the same bucket, this is called a collision. A hash table implementation must resolve collisions, and there are two general approaches. The first places colliding data in another bucket according to some rule, as in linear probing: if an insertion collides, the buckets after the target bucket are searched in order and the item is placed in the first unused one. The second makes each bucket a data structure that can hold multiple items (for example, a linked list or a red-black tree) rather than a slot for a single item, and all colliding items are organized in that structure.

Whichever collision resolution strategy is used, insertion and lookup are no longer guaranteed to be O(1). Take lookup as an example: locating the bucket by the hash value is not enough, because the original key (the key before hashing) must also be compared. If it does not match, the search continues with the same algorithm used for insertion until a matching item is found or it is confirmed that the data is not in the hash table.

PHP uses singly linked lists to store colliding data, so the average lookup complexity of a PHP hash table is O(L), where L is the average length of the bucket lists. The worst-case complexity is O(N), which occurs when all the data collide and the hash table degenerates into a single linked list.

(Figure: a normal hash table and a degenerate hash table in PHP.)
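The degeneration described above can be seen in a minimal sketch of a chained hash table. This is not the Zend implementation: the `node` and `table` types, the fixed table size of 8, and the step counter are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TABLE_SIZE 8u                    /* must be a power of two */
#define TABLE_MASK (TABLE_SIZE - 1u)

/* Hypothetical node type for this sketch; not the Zend Bucket. */
typedef struct node {
    unsigned long key;
    int value;
    struct node *next;
} node;

typedef struct {
    node *buckets[TABLE_SIZE];           /* head pointer of each chain */
} table;

/* Insert or update a key. Returns the number of chain nodes inspected,
   or -1 if memory allocation fails. */
static long table_put(table *t, unsigned long key, int value)
{
    assert(t != NULL);
    long steps = 0;
    node **slot = &t->buckets[key & TABLE_MASK];
    for (node *p = *slot; p != NULL; p = p->next) {
        steps++;
        if (p->key == key) {             /* key already present: update */
            p->value = value;
            return steps;
        }
    }
    node *n = malloc(sizeof *n);
    if (n == NULL) {
        return -1;
    }
    n->key = key;
    n->value = value;
    n->next = *slot;                     /* link the new node at the head */
    *slot = n;
    return steps;
}

/* Look up a key. Returns 1 if found, 0 otherwise; *steps receives the
   number of chain nodes inspected. */
static int table_get(const table *t, unsigned long key, int *value, long *steps)
{
    assert(t != NULL && value != NULL && steps != NULL);
    *steps = 0;
    for (const node *p = t->buckets[key & TABLE_MASK]; p != NULL; p = p->next) {
        (*steps)++;
        if (p->key == key) {
            *value = p->value;
            return 1;
        }
    }
    return 0;
}
```

With keys 0 to 7 every chain holds one node and a lookup inspects a single node. With keys 0, 8, 16, ..., 56 all eight nodes land in bucket 0, and finding the first inserted key means walking the whole chain.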

A hash table collision attack carefully constructs data so that all of it collides, deliberately turning the hash table into a degenerate singly linked list. Every hash table operation then becomes an order of magnitude slower, which consumes a large amount of CPU and leaves the system unable to respond to requests promptly. The result is a denial of service (DoS).

As you can see, the premise of a hash collision attack is that collisions are easy to find for the hashing algorithm in use. With MD5 or SHA1 the attack would be basically infeasible. Fortunately (or unfortunately), the hashing algorithms used by most programming languages are very simple, for efficiency reasons, so attack data can be constructed with very little effort. The next section analyzes the relevant Zend kernel code to find out how to mount a hash table collision attack against PHP.

Internal data structure of the Zend hash table

PHP uses a struct called Bucket to represent a bucket, and all buckets with the same hash value are organized into a singly linked list. The hash table itself is represented by the HashTable struct. The relevant source code is in Zend/zend_hash.h:

typedef struct bucket {
    ulong h;                        /* Used for numeric indexing */
    uint nKeyLength;
    void *pData;
    void *pDataPtr;
    struct bucket *pListNext;
    struct bucket *pListLast;
    struct bucket *pNext;
    struct bucket *pLast;
    char arKey[1];                  /* Must be last element */
} Bucket;

typedef struct _hashtable {
    uint nTableSize;
    uint nTableMask;
    uint nNumOfElements;
    ulong nNextFreeElement;
    Bucket *pInternalPointer;       /* Used for element traversal */
    Bucket *pListHead;
    Bucket *pListTail;
    Bucket **arBuckets;
    dtor_func_t pDestructor;
    zend_bool persistent;
    unsigned char nApplyCount;
    zend_bool bApplyProtection;
#if ZEND_DEBUG
    int inconsistent;
#endif
} HashTable;

The field names make their purposes fairly clear, so I will not explain them all. Focus on the following fields: h in Bucket stores the original key; nTableMask in HashTable is a mask, generally set to nTableSize - 1, which is closely tied to the hashing algorithm and is described in detail in the discussion of the hashing algorithm below; arBuckets points to an array of pointers, each element of which is the head pointer of a bucket list.

Hashing algorithm

The PHP hash table has a minimum capacity of 8 (2^3) and a maximum capacity of 0x80000000 (2^31), and its length is always rounded up to a power of 2 (for example, a hash table holding 13 elements has length 16, and one holding 100 elements has length 128). nTableMask is initialized to the rounded hash table length minus 1. The specific code is in the _zend_hash_init function in Zend/zend_hash.c; the part relevant to this article is excerpted below, with a few comments added.

ZEND_API int _zend_hash_init(HashTable *ht, uint nSize, hash_func_t pHashFunction, dtor_func_t pDestructor, zend_bool persistent ZEND_FILE_LINE_DC)
{
    uint i = 3;
    Bucket **tmp;

    SET_INCONSISTENT(HT_OK);

    /* Round the length up to a power of 2 */
    if (nSize >= 0x80000000) {
        /* prevent overflow */
        ht->nTableSize = 0x80000000;
    } else {
        while ((1U << i) < nSize) {
            i++;
        }
        ht->nTableSize = 1 << i;
    }

    ht->nTableMask = ht->nTableSize - 1;

    /* Some code omitted here ... */

    return SUCCESS;
}

It is worth mentioning that PHP's way of rounding up to a power of 2 is quite neat and is worth keeping in mind for reuse.
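The rounding loop can be lifted out of _zend_hash_init and tried on its own. The sketch below follows the same logic; the function name `round_up_pow2` is made up for this example and does not exist in the Zend source.

```c
#include <assert.h>

/* Hypothetical standalone helper that mirrors the rounding loop in
   _zend_hash_init: the result is the smallest power of two that is
   >= n, never smaller than 8 and never larger than 0x80000000. */
static unsigned int round_up_pow2(unsigned int n)
{
    unsigned int i = 3;                  /* minimum size is 2^3 = 8 */

    if (n >= 0x80000000u) {
        return 0x80000000u;              /* cap to prevent overflow */
    }
    while ((1u << i) < n) {
        i++;
    }

    unsigned int size = 1u << i;
    assert((size & (size - 1u)) == 0);   /* exactly one bit is set */
    return size;
}
```

Under this sketch, 13 rounds up to 16 and 100 rounds up to 128, matching the examples given earlier, and the mask is then simply the result minus 1.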

The hashing algorithm of the Zend HashTable is exceptionally simple:

hash(key) = key & nTableMask

That is, the original key of the data is simply bitwise-ANDed with the nTableMask of the HashTable.

If the original key is a string, the Times33 algorithm is first used to convert the string to an integer, which is then bitwise-ANDed with nTableMask.

hash(strkey) = time33(strkey) & nTableMask
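Since the article does not reproduce the Times33 code, here is a rough sketch of the idea (often called DJBX33A): multiply the running hash by 33 and add the next byte. The seed 5381 and the treatment of bytes as unsigned are assumptions of this sketch, and the names `times33` and `bucket_index` are made up; this is not the zend_inline_hash_func source.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of a "times 33" string hash: hash = hash * 33 + byte.
   The seed 5381 and unsigned byte handling are assumptions. */
static unsigned long times33(const char *key, size_t len)
{
    assert(key != NULL || len == 0);
    unsigned long hash = 5381;
    for (size_t i = 0; i < len; i++) {
        hash = hash * 33 + (unsigned char)key[i];
    }
    return hash;
}

/* Bucket position of a string key: the hash ANDed with the table mask. */
static unsigned long bucket_index(const char *key, size_t len, unsigned long mask)
{
    return times33(key, len) & mask;
}
```

The simplicity that makes this fast also makes collisions easy to find by hand. In this sketch "Ez" and "FY" hash to the same value, because 69 * 33 + 122 = 70 * 33 + 89.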

Here is the hash table lookup code from the Zend source:

ZEND_API int zend_hash_index_find(const HashTable *ht, ulong h, void **pData)
{
    uint nIndex;
    Bucket *p;

    IS_CONSISTENT(ht);

    nIndex = h & ht->nTableMask;

    p = ht->arBuckets[nIndex];
    while (p != NULL) {
        if ((p->h == h) && (p->nKeyLength == 0)) {
            *pData = p->pData;
            return SUCCESS;
        }
        p = p->pNext;
    }
    return FAILURE;
}

ZEND_API int zend_hash_find(const HashTable *ht, const char *arKey, uint nKeyLength, void **pData)
{
    ulong h;
    uint nIndex;
    Bucket *p;

    IS_CONSISTENT(ht);

    h = zend_inline_hash_func(arKey, nKeyLength);
    nIndex = h & ht->nTableMask;

    p = ht->arBuckets[nIndex];
    while (p != NULL) {
        if ((p->h == h) && (p->nKeyLength == nKeyLength)) {
            if (!memcmp(p->arKey, arKey, nKeyLength)) {
                *pData = p->pData;
                return SUCCESS;
            }
        }
        p = p->pNext;
    }
    return FAILURE;
}

Here zend_hash_index_find looks up integer keys and zend_hash_find looks up string keys. The logic is basically the same, except that a string key is first converted to an integer key by zend_inline_hash_func, which wraps the Times33 algorithm; its code is not reproduced here.

Basic attack

Knowing the algorithm of PHP's internal hash table, we can use it to construct attack data. One of the simplest methods is to create collisions by exploiting the mask. As mentioned above, the length nTableSize of a Zend HashTable is rounded up to a power of 2. Suppose we construct a hash table of size 2^16. Then the binary representation of nTableSize is 1 0000 0000 0000 0000, and nTableMask = nTableSize - 1 is 0 1111 1111 1111 1111. Starting from 0 and using 2^16 as the step, we can generate as many keys as needed, which gives the following results:

0000 0000 0000 0000 0000 & 0 1111 1111 1111 1111 = 0

0001 0000 0000 0000 0000 & 0 1111 1111 1111 1111 = 0

0010 0000 0000 0000 0000 & 0 1111 1111 1111 1111 = 0

0011 0000 0000 0000 0000 & 0 1111 1111 1111 1111 = 0

0100 0000 0000 0000 0000 & 0 1111 1111 1111 1111 = 0

......

In general, as long as the low 16 bits of a key are all 0, its hash value after masking is 0, so all such keys collide at position 0.
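The masking argument above can be checked mechanically. The sketch below generates multiples of the table size and computes the bucket each one maps to; the helper names `make_colliding_keys` and `bucket_of` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Fill keys[0..count-1] with multiples of table_size. Because
   table_size is a power of two, every such key has all of its
   mask bits equal to zero. */
static void make_colliding_keys(unsigned long *keys, size_t count,
                                unsigned long table_size)
{
    assert(keys != NULL || count == 0);
    assert(table_size != 0 && (table_size & (table_size - 1)) == 0);
    for (size_t i = 0; i < count; i++) {
        keys[i] = (unsigned long)i * table_size;
    }
}

/* Bucket position of an integer key: key & (table_size - 1). */
static unsigned long bucket_of(unsigned long key, unsigned long table_size)
{
    return key & (table_size - 1);
}
```

For a table size of 2^16 the generated keys are 0, 65536, 131072, and so on, and `bucket_of` returns 0 for every one of them, whereas a key such as 65537 maps to bucket 1.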

Here is an attack code written using this principle:

<?php

$size = pow(2, 16);

$startTime = microtime(true);

$array = array();
// Every key is a multiple of $size, so all of them land in bucket 0.
for ($key = 0, $maxKey = ($size - 1) * $size; $key <= $maxKey; $key += $size) {
    $array[$key] = 0;
}

$endTime = microtime(true);

echo $endTime - $startTime, " seconds";

This code took nearly 88 seconds to complete on my VPS (single CPU, 512 MB of memory), and CPU resources were almost exhausted during that time.

Inserting the same number of elements into an ordinary hash table takes only 0.036 seconds:

<?php

$size = pow(2, 16);

$startTime = microtime(true);

$array = array();
// Consecutive keys spread evenly over the buckets; $size elements in total.
for ($key = 0; $key < $size; $key += 1) {
    $array[$key] = 0;
}

$endTime = microtime(true);

echo $endTime - $startTime, " seconds";

It can be shown that the second snippet inserts N elements in O(N) time, while the first (attack) snippet needs O(N^2) time to insert N elements.
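The quadratic bound can be made concrete with a short count, assuming that each insert first walks the whole bucket list to check whether the key already exists (the same walk the lookup code shown earlier performs). When all keys collide, the k-th insert visits the k-1 nodes already in the list, so the total number of node visits T(N) is:

```latex
T(N) = \sum_{k=1}^{N} (k-1) = \frac{N(N-1)}{2} = O(N^2),
\qquad
T(2^{16}) = \frac{65536 \cdot 65535}{2} = 2\,147\,450\,880 .
```

That is roughly two billion node visits for the 65536 colliding keys of the attack snippet, compared with at most a handful per insert when the keys are spread evenly.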

POST attack

Of course, an attacker can rarely modify the PHP code directly, but there are ways to make PHP build a colliding hash table indirectly. For example, PHP turns the data of a received HTTP POST request into $_POST, an array that is represented internally by a Zend HashTable, so an attacker can achieve the attack simply by constructing a POST request that contains a large number of colliding keys. The specifics are not demonstrated here.

Protection against POST attacks

For hash collision attacks delivered through POST, PHP's current protection is to limit the number of POST parameters. PHP 5.3.9 and later add a configuration option, max_input_vars, which specifies the maximum number of parameters accepted in one HTTP request; the default is 1000. PHP 5.3.x users can therefore upgrade to 5.3.9 to avoid hash collision attacks. 5.2.x users can apply this patch: http://www.laruence.com/2011/12/30/2440.html.
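The idea behind such a limit can be sketched in a few lines: count the parameters in a request body before building any hash table from it, and refuse bodies that exceed the limit. This is only an illustration of the principle, not how PHP implements max_input_vars; the helper names are made up, and the body is assumed to be in the usual `name=value&name=value` form.

```c
#include <assert.h>
#include <stddef.h>

/* Count parameters in an "a=1&b=2" style body by counting the '&'
   separators. An empty body has zero parameters. */
static size_t count_params(const char *body, size_t len)
{
    assert(body != NULL || len == 0);
    if (len == 0) {
        return 0;
    }
    size_t count = 1;
    for (size_t i = 0; i < len; i++) {
        if (body[i] == '&') {
            count++;
        }
    }
    return count;
}

/* Returns 1 if the body stays within the limit, 0 if it should be
   rejected. max_input_vars here is just a parameter of this sketch. */
static int within_input_limit(const char *body, size_t len, size_t max_input_vars)
{
    return count_params(body, len) <= max_input_vars;
}
```

With a limit of 1000, even a request made entirely of colliding keys can put at most 1000 nodes in one bucket list, which bounds the quadratic cost at a harmless level.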

Another protection is to handle the problem at the web server level, for example by restricting the size of the HTTP request body and the number of parameters. This is currently the most widely used stopgap. The specifics depend on the web server and are not covered in detail here.

Other protection

The protections above only limit the number of POST parameters and do not solve the problem completely. For example, if a POST field contains JSON that PHP will pass to json_decode, an attacker can construct a large JSON payload to mount the same attack. In theory, any place where PHP code builds an array from external input can be affected, so a thorough fix has to start from the underlying Zend HashTable implementation. There are generally two approaches. One is to limit the maximum length of each bucket's list. The other is to organize colliding items in a different data structure, such as a red-black tree, instead of a linked list. This does not eliminate hash collisions; it only mitigates the attack, reducing the time to operate on N items from O(N^2) to O(N log N), at the cost that operations which are normally close to O(1) all become O(log N).

POST data attacks are still the most common form, so it is recommended that PHP in production environments be upgraded or patched. As for fixing the problem at the data structure level, there is no news yet.
