24-What is a hash collision attack?



Hash table collision attacks (hashtable collisions as a DoS attack) have been a hot topic recently, and many languages have turned out to be affected. Drawing on the PHP kernel source code, this article discusses how this kind of attack works and how it is carried out.

The basic principle of hash table collision attack

A hash table is a very efficient data structure, and many languages implement one internally. In PHP the hash table is an extremely important data structure: it is used not only to represent the array data type, but also to store contextual information inside the Zend virtual machine (the variables and functions of an execution context are stored in hash tables).

Ideally, insert and find on a hash table take O(1) time: the hash value of any data item (its key) can be computed in time independent of the table length, and the item is then located in a bucket in constant time (a bucket is one position in the hash table). This is only the ideal case. Because the length of any hash table is finite, different data items will sometimes have the same hash value and be assigned to the same bucket; this is called a collision.

A hash table implementation has to resolve collisions, and there are two general strategies. The first places the colliding item in another bucket according to some rule, such as linear probing: if an insert collides, the buckets after the target bucket are searched in order and the item goes into the first unused one. The second makes each bucket a data structure that can hold multiple items (for example, a linked list or a red-black tree) rather than a slot for a single item, and all colliding items are organized in that structure.

Whichever strategy is used, insert and find are no longer guaranteed to be O(1). Take find as an example: locating the bucket from the hash value is not enough. The original key (the key before hashing) must also be compared, and if it does not match, the lookup continues using the same algorithm as insert until a matching item is found or it is confirmed that the data is not in the hash table.
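The second strategy (a linked list per bucket) can be sketched in a few lines of C. This is a minimal illustration and not the Zend implementation: the names `node`, `table`, `table_insert` and `table_find` and the fixed size of 8 buckets are assumptions made for the example, and the `steps` counter exists only to make the cost of a lookup visible.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 8u  /* must be a power of two so the mask works */

/* One node of a bucket's singly linked collision chain. */
typedef struct node {
    unsigned long key;   /* original key, kept so lookups can compare it */
    int value;
    struct node *next;
} node;

typedef struct {
    node *buckets[TABLE_SIZE];
} table;

/* Link a caller-owned node at the head of its bucket's chain. */
static void table_insert(table *t, node *n)
{
    assert(t != NULL && n != NULL);
    unsigned long idx = n->key & (TABLE_SIZE - 1);
    n->next = t->buckets[idx];
    t->buckets[idx] = n;
}

/* Walk the chain, comparing original keys; *steps counts nodes visited. */
static node *table_find(const table *t, unsigned long key, size_t *steps)
{
    assert(t != NULL && steps != NULL);
    *steps = 0;
    for (node *p = t->buckets[key & (TABLE_SIZE - 1)]; p != NULL; p = p->next) {
        ++*steps;
        if (p->key == key) {
            return p;
        }
    }
    return NULL;
}
```

With keys 0, 8 and 16 inserted in that order, all three share bucket 0, and finding key 0 visits three nodes, while a key that sits alone in its bucket is found in one step.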

PHP stores colliding items in a singly linked list, so the average lookup complexity of a PHP hash table is O(L), where L is the average length of a bucket's list. The worst case is O(N): every item collides and the hash table degenerates into a single linked list.

(Figure: a normal hash table and a degenerate hash table in PHP.)

A hash table collision attack carefully constructs data so that every item collides, deliberately turning the hash table into a degenerate singly linked list. Each hash table operation then costs O(N) instead of O(1), which consumes a large amount of CPU, leaves the system unable to respond to requests quickly, and so achieves a denial of service (DoS).

As you can see, a hash collision attack depends on collisions being easy to find for the hash algorithm in use. With MD5 or SHA1 that is essentially infeasible, but fortunately (or unfortunately) the hash algorithms used by most programming languages are very simple for efficiency reasons, so attack data can be constructed with minimal effort. The next section analyzes the relevant Zend kernel code to show how a hash table collision attack against PHP works.

Internal implementation of the Zend hash table

PHP uses a struct called Bucket to represent a bucket, and all buckets with the same hash value are organized into a singly linked list. The hash table itself is represented by the HashTable struct. The relevant source code is in Zend/zend_hash.h:

typedef struct bucket {
    ulong h;                        /* Used for numeric indexing */
    uint nKeyLength;
    void *pData;
    void *pDataPtr;
    struct bucket *pListNext;
    struct bucket *pListLast;
    struct bucket *pNext;
    struct bucket *pLast;
    char arKey[1]; /* Must be last element */
} Bucket;

typedef struct _hashtable {
    uint nTableSize;
    uint nTableMask;
    uint nNumOfElements;
    ulong nNextFreeElement;
    Bucket *pInternalPointer;   /* Used for element traversal */
    Bucket *pListHead;
    Bucket *pListTail;
    Bucket **arBuckets;
    dtor_func_t pDestructor;
    zend_bool persistent;
    unsigned char nApplyCount;
    zend_bool bApplyProtection;
#if ZEND_DEBUG
    int inconsistent;
#endif
} HashTable;

The field names make their purposes fairly clear, so they are not explained in detail. Focus on the following fields: h in Bucket stores the original integer key (for string keys, the integer produced by hashing the string); nTableMask in HashTable is a mask, generally set to nTableSize - 1, which is closely tied to the hash algorithm and is described in detail below; arBuckets points to an array of pointers, each of which is the head pointer of a bucket list.

Hash algorithm: a PHP hash table has a minimum capacity of 8 (2^3) and a maximum capacity of 0x80000000 (2^31), and its length is rounded up to a power of 2 (for example, a hash table with 13 elements has length 16, and one with 100 elements has length 128). nTableMask is initialized to the rounded hash table length minus 1. The code is in the _zend_hash_init function in Zend/zend_hash.c; only the part relevant to this article is excerpted here, with a few comments added.

ZEND_API int _zend_hash_init(HashTable *ht, uint nSize, hash_func_t pHashFunction, dtor_func_t pDestructor, zend_bool persistent ZEND_FILE_LINE_DC)
{
    uint i = 3;
    Bucket **tmp;

    SET_INCONSISTENT(HT_OK);

    // Round the length up to a power of 2
    if (nSize >= 0x80000000) {
        /* prevent overflow */
        ht->nTableSize = 0x80000000;
    } else {
        while ((1U << i) < nSize) {
            i++;
        }
        ht->nTableSize = 1 << i;
    }
    ht->nTableMask = ht->nTableSize - 1;

    /* Some code omitted here… */

    return SUCCESS;
}

It is worth noting that PHP's way of rounding up to a power of 2 is quite neat and worth keeping in mind for when you need it.
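The rounding loop from _zend_hash_init can be lifted into a standalone helper to experiment with. The sketch below assumes a 32-bit `unsigned int`; the function names `round_up_pow2` and `table_mask` are made up for this example and do not exist in the Zend source.

```c
#include <assert.h>

/* Round n up to a power of two, with a minimum of 8 and a cap of 2^31,
   mirroring the loop in the _zend_hash_init excerpt. */
static unsigned int round_up_pow2(unsigned int n)
{
    unsigned int i = 3;  /* start at 2^3 = 8, the minimum table size */

    if (n >= 0x80000000u) {
        return 0x80000000u;  /* avoid shifting past the top bit */
    }
    while ((1u << i) < n) {
        i++;
    }
    return 1u << i;
}

/* The mask is the rounded size minus one: all low bits set. */
static unsigned int table_mask(unsigned int n)
{
    unsigned int size = round_up_pow2(n);
    assert((size & (size - 1)) == 0);  /* a power of two has a single bit set */
    return size - 1;
}
```

For instance, `round_up_pow2(13)` gives 16 and `round_up_pow2(100)` gives 128, matching the examples in the text, and `table_mask(100)` is 127.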

The hash algorithm of the Zend HashTable is simple: hash(key) = key & nTableMask

That is, the original key is simply ANDed bitwise with the HashTable's nTableMask. If the original key is a string, the times33 algorithm first converts the string to an integer, which is then ANDed with nTableMask: hash(strkey) = times33(strkey) & nTableMask

Here is the hash table lookup code from the Zend source:

ZEND_API int zend_hash_index_find(const HashTable *ht, ulong h, void **pData)
{
    uint nIndex;
    Bucket *p;

    IS_CONSISTENT(ht);

    nIndex = h & ht->nTableMask;

    p = ht->arBuckets[nIndex];
    while (p != NULL) {
        if ((p->h == h) && (p->nKeyLength == 0)) {
            *pData = p->pData;
            return SUCCESS;
        }
        p = p->pNext;
    }
    return FAILURE;
}

ZEND_API int zend_hash_find(const HashTable *ht, const char *arKey, uint nKeyLength, void **pData)
{
    ulong h;
    uint nIndex;
    Bucket *p;

    IS_CONSISTENT(ht);

    h = zend_inline_hash_func(arKey, nKeyLength);
    nIndex = h & ht->nTableMask;

    p = ht->arBuckets[nIndex];
    while (p != NULL) {
        if ((p->h == h) && (p->nKeyLength == nKeyLength)) {
            if (!memcmp(p->arKey, arKey, nKeyLength)) {
                *pData = p->pData;
                return SUCCESS;
            }
        }
        p = p->pNext;
    }
    return FAILURE;
}

zend_hash_index_find looks up integer keys and zend_hash_find looks up string keys. The logic is basically the same, except that a string key is first converted to an integer key by zend_inline_hash_func, which wraps the times33 algorithm; its code is not reproduced here.
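For readers who have not met times33 before, the following is a simplified sketch of the idea, not the Zend code itself. It assumes the common DJBX33A form (seed 5381, multiply by 33, add the next byte); the real zend_inline_hash_func is written differently (for example with an unrolled loop), and this version treats bytes as unsigned, which may differ from Zend for non-ASCII input. The names `times33` and `bucket_index` are chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified times33: hash = hash * 33 + byte, seeded with 5381.
   (hash << 5) + hash is the usual way of writing hash * 33. */
static unsigned long times33(const char *key, size_t len)
{
    assert(key != NULL || len == 0);
    unsigned long hash = 5381;
    for (size_t i = 0; i < len; i++) {
        hash = ((hash << 5) + hash) + (unsigned char)key[i];
    }
    return hash;
}

/* Bucket position for a string key: times33(key) & mask. */
static unsigned long bucket_index(const char *key, size_t len, unsigned long mask)
{
    return times33(key, len) & mask;
}
```

Because the function is so simple, string collisions are easy to find by hand: under this sketch "Ez" and "FY" hash to the same integer ('E' * 33 + 'z' and 'F' * 33 + 'Y' are both 2399), so any concatenation of those two-character blocks with the same number of blocks also collides.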

Attack

Knowing the algorithm behind PHP's internal hash table, we can use it to construct attack data. One of the simplest methods is to create collisions by exploiting the mask. As mentioned above, nTableSize of a Zend HashTable is rounded up to a power of 2. Suppose we construct a hash table of size 2^16: the binary representation of nTableSize is 1 0000 0000 0000 0000, and nTableMask = nTableSize - 1 is 0 1111 1111 1111 1111. Taking 0 as the initial value and 2^16 as the step, we can generate as many keys as needed, which gives the following:

0000 0000 0000 0000 0000 & 0 1111 1111 1111 1111 = 00001 0000 0000 0000 0000 & 0 1111 1111 1111 1111 = 00010 0000 0000 0000 0000 & 0 1111 1111 1111 1111 = 00011 0000 0000 0000 0000 & 0 1111 1111 1111 1111 = 00100 0000 0000 0000 0000 & 0 1111 1111 1111 1111 = 0……

In general, as long as the low 16 bits of a key are all 0, the value after masking is 0, so all of these keys collide at position 0. Here is attack code written using this principle:
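The same key construction can be checked outside PHP. The C sketch below generates keys 0, 2^16, 2 * 2^16, ... and verifies that they all mask to bucket 0; the helper names `make_colliding_keys` and `all_in_bucket_zero` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Fill out[0..count-1] with keys 0, size, 2*size, ... for a table of
   table_size buckets (a power of two). All of them have zero low bits. */
static void make_colliding_keys(unsigned long long table_size,
                                unsigned long long *out, size_t count)
{
    assert(table_size != 0 && (table_size & (table_size - 1)) == 0);
    for (size_t i = 0; i < count; i++) {
        out[i] = (unsigned long long)i * table_size;
    }
}

/* Return 1 if every key lands in bucket 0 under the given mask, else 0. */
static int all_in_bucket_zero(const unsigned long long *keys, size_t count,
                              unsigned long long mask)
{
    for (size_t i = 0; i < count; i++) {
        if ((keys[i] & mask) != 0) {
            return 0;
        }
    }
    return 1;
}
```

The keys only collide for the table size they were built for: with a mask of 2^16 - 1 every key maps to bucket 0, but with a mask of 2^17 - 1 the key 65536 already falls into a different bucket.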

<?php
$size = pow(2, 16);

$startTime = microtime(true);
$array = array();
for ($key = 0, $maxKey = ($size - 1) * $size; $key <= $maxKey; $key += $size) {
    $array[$key] = 0;
}
$endTime = microtime(true);

echo $endTime - $startTime, ' seconds', "\n";
?>

This code took nearly 88 seconds to run on my VPS (single CPU, 512 MB of memory), and CPU resources were almost exhausted during that time.

Inserting the same number of elements into a normal hash table takes only 0.036 seconds:

<?php
$size = pow(2, 16);

$startTime = microtime(true);
$array = array();
// Sequential keys 0 .. $size - 1: the same number of elements, no collisions
for ($key = 0; $key < $size; $key += 1) {
    $array[$key] = 0;
}
$endTime = microtime(true);

echo $endTime - $startTime, ' seconds', "\n";
?>

It can be shown that the second snippet inserts N elements in O(N) time, while the first (attack) snippet needs O(N^2) time to insert N elements.
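The difference can be made concrete by counting chain nodes instead of measuring seconds. The sketch below is a simplified model and not the Zend code: it assumes that inserting a new key first walks the key's whole bucket chain to check for a duplicate, so one insert costs as many comparisons as the chain is currently long. The function name `count_insert_steps` is invented for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Total chain nodes visited while inserting the n keys
   0, step, 2*step, ... into a chained table of table_size buckets
   (a power of two), under the model described in the lead-in. */
static unsigned long long count_insert_steps(size_t n, unsigned long long step,
                                             size_t table_size)
{
    assert(table_size != 0 && (table_size & (table_size - 1)) == 0);
    unsigned long long *chain_len = calloc(table_size, sizeof *chain_len);
    assert(chain_len != NULL);

    unsigned long long total = 0;
    unsigned long long key = 0;
    for (size_t i = 0; i < n; i++, key += step) {
        size_t idx = (size_t)(key & (table_size - 1));
        total += chain_len[idx];  /* nodes compared before the insert */
        chain_len[idx]++;         /* the new node joins the chain */
    }
    free(chain_len);
    return total;
}
```

Under this model, inserting 65536 sequential keys (step 1) into 65536 buckets visits no existing nodes at all, while inserting 65536 keys with step 65536 visits 0 + 1 + ... + 65535 = 2147450880 nodes, which is the N(N-1)/2 growth behind the O(N^2) figure.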

Of course, an attacker can rarely modify PHP code directly, but there are ways to make PHP build a hostile hash table indirectly. For example, PHP parses the data of a received HTTP POST request into $_POST, an array that is represented internally by a Zend HashTable, so an attacker can mount the attack simply by sending a POST request that contains a large number of colliding keys. The specifics are not demonstrated here.

Defense

Protection against POST attacks: for POST-based hash collision attacks, PHP's current protection is to limit the number of POST parameters. PHP 5.3.9 and later add a configuration item, max_input_vars, which sets the maximum number of parameters accepted per HTTP request and defaults to 1000. PHP 5.3.x users can therefore upgrade to 5.3.9 to avoid hash collision attacks. 5.2.x users can use this patch: http://www.laruence.com/2011/12/30/2440.html.

Another protection is to handle the problem at the web server level, for example by restricting the size of the HTTP request body and the number of parameters; this is currently the most widely used stopgap. The details depend on the web server and are not covered here.

The protections above only limit the number of POST parameters and do not solve the problem completely. For example, if a POST field holds JSON that PHP will pass to json_decode, an attacker can still craft a large JSON payload full of colliding keys. In theory, any place where PHP code builds an array from external input is exposed, so a thorough fix has to come from the underlying Zend HashTable implementation. There are generally two approaches. One is to limit the maximum length of each bucket's linked list. The other is to organize colliding items in a different data structure, such as a red-black tree, instead of a linked list. This does not eliminate hash collisions; it only mitigates the attack, reducing the time for operations on N items from O(N^2) to O(N log N), at the cost that operations which are normally close to O(1) become O(log N).
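The first approach, capping the length of each bucket's list, might look roughly like the C sketch below. It is purely illustrative and not taken from PHP: the names `entry`, `capped_table` and `capped_insert`, the 8-bucket table and the limit of 4 entries per chain are all assumptions made for the example, and a real implementation would also have to decide what to do with a rejected insert.

```c
#include <assert.h>
#include <stddef.h>

#define NBUCKETS 8u    /* power of two; tiny for illustration */
#define MAX_CHAIN 4u   /* hypothetical per-bucket limit */

typedef struct entry {
    unsigned long key;
    struct entry *next;
} entry;

typedef struct {
    entry *buckets[NBUCKETS];
} capped_table;

/* Insert a caller-owned entry; return 0 on success, -1 if the key is already
   present or the bucket's chain already holds MAX_CHAIN entries. */
static int capped_insert(capped_table *t, entry *e)
{
    assert(t != NULL && e != NULL);
    unsigned long idx = e->key & (NBUCKETS - 1);
    size_t len = 0;
    for (entry *p = t->buckets[idx]; p != NULL; p = p->next) {
        if (p->key == e->key) {
            return -1;  /* duplicate key */
        }
        len++;
    }
    if (len >= MAX_CHAIN) {
        return -1;  /* refuse: chain too long, possibly a collision flood */
    }
    e->next = t->buckets[idx];
    t->buckets[idx] = e;
    return 0;
}
```

With this cap, the first four keys that collide in a bucket are accepted and later ones are rejected, so the work per insert stays bounded by MAX_CHAIN however many colliding keys an attacker sends.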

POST data attacks are still the most common, so it is recommended that PHP installations in production be upgraded or patched. As for fixing the problem at the data structure level, there is no news so far.
