Text/Xuan soul
1. Hash and Hash collisions
A Hash function converts an input of any length into an output of a fixed length, and in practical applications this fixed-length output can stand in for the input. Hashes are usually used to verify information consistency.
The implementations of Hash functions are diverse; the SHA-x and MD-x series are the most widely used in the security field. Hash functions are also divided into keyed and unkeyed Hash functions, and the term "Hash function" generally refers to the unkeyed kind. The implementation principles of Hash algorithms are not discussed here.
Supplement: Hash function definition and properties

Because the output of a Hash function has a fixed length, multiple different inputs may produce the same output. If two input strings have the same hash value, the two strings are said to collide (a collision). In theory, each output string corresponds to an infinite number of input strings, so collisions are inevitable.
If a collision can be found, we can destroy information consistency without the receiver noticing. The process of finding an input that collides with a given input is called "Hash cracking". Note that a Hash function must be irreversible, so there is no way to compute the original input back from the Hash value. (This does not rule out brute-force cracking, for which a rainbow table is the best method; even then, there is no guarantee that the recovered data is the original data.) Poorly designed Hash algorithms make it easy to find collisions, for example (refer to: http://www.laruence.com/2011/12/30/2435.html ):
"
In PHP, if the key is a number, the value fed into the Hash is the number itself, and the bucket is generally computed as index & tableMask. tableMask ensures that the numeric index does not exceed the number of elements the array can hold; that is, it equals the array size minus 1.
The size of a PHP Hashtable is always a power of 2. For example, if you store an array of 10 elements, the actual size of the array is 16; if you store 20 elements, the actual size is 32; and for 63 elements the actual size is 64. When the number of elements you store exceeds the current capacity of the array, PHP resizes the array and rehashes it.
Now suppose we want to store 64 elements (the array may be resized along the way, but we only need to know that the final array size is 64 and the corresponding tableMask is 63, binary 0111111). If the key of the first element we store is 0, its hash value is 0. The second time we store 64, and the hash value (1000000 & 0111111) is again 0. The third time we use 128, the fourth time 192, and so on.
"
An "excellent" hash function f must meet the following three conditions:
 Given any y, finding an x such that f(x) = y is very difficult.
 Given x1, finding an x2 different from x1 such that f(x1) = f(x2) is very difficult.
 Finding any two distinct inputs x1 and x2 such that f(x1) = f(x2) is very difficult.
"Very difficult" here means that no method faster than enumeration exists. Almost all collision search methods attack the third condition, that is, they look for two different inputs that produce the same output.
There are many ways to look for hash collisions: the birthday attack, which is a generic attack; the modular differential method; the meet-in-the-middle attack, which targets hash schemes with a block-chaining structure; and the correcting-block attack, which targets hash functions based on modular arithmetic. Here I will briefly introduce the following four methods for finding a collision:
 Equal substrings method
 Birthday attack
 Meet-in-the-middle attack
 Modular differential method
The "equal substrings method" constructs collisions for hash functions with the following property: if two strings have the same hash value, then replacing one with the other at the same position inside a longer string leaves the hash value of the longer string unchanged. For example, if F("string1") = F("string2"), then "aaastring1bbb" and "aaastring2bbb" also have the same hash value. Using this property we can construct any number of collisions. For example, if the hash values of "ly" and "nz" are the same, then the hash values of "lyly", "nznz", "lynz", and "nzly" are all the same.
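The property can be checked with a minimal Python sketch. It assumes the times-33 hash (DJBX33A) that this article introduces later, truncated to 32 bits, and it swaps in the pair "Ez" / "FY", which is known to collide under that function; the function name `djbx33a` and the 32-bit width are choices made for this illustration only.

```python
def djbx33a(data: bytes, seed: int = 5381) -> int:
    """Times-33 hash, truncated to 32 bits (the width is an assumption)."""
    h = seed
    for byte in data:
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h

# Two equal-length blocks with the same hash value.
base_pair = (b"Ez", b"FY")

# Every concatenation of colliding blocks collides as well.
combos = [a + b for a in base_pair for b in base_pair]
distinct_hashes = {djbx33a(c) for c in combos}

print(combos)                 # four different strings
print(len(distinct_hashes))   # number of distinct hash values among them
```

Because the state after each block depends only on the previous state and the block, n colliding blocks give 2^n colliding strings of length 2n.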
The birthday attack does not exploit the structure of the Hash function or any algebraic weakness. It depends only on the length of the message digest, that is, the length of the Hash value. This attack gives a necessary security condition for a Hash function: the message digest must be long enough. The term "birthday attack" comes from the so-called birthday problem: how many students must a classroom hold so that the probability that at least two of them share a birthday is no less than 1/2? The answer is 23. The birthday attack is described in detail below.
Let $H: X \to Y$ be a Hash function, where $X$ and $Y$ are finite and $|X| \ge 2|Y|$. Write $|X| = m$ and $|Y| = n$. Clearly there are at least $n$ collisions; the question is how to find them. A natural method is to choose $k$ distinct elements $x_1, x_2, x_3, \dots, x_k \in X$ at random, compute $y_i = H(x_i)$ for $1 \le i \le k$, and then check whether a collision has occurred. This process is like throwing $k$ balls at random into $n$ boxes and checking whether some box contains at least two balls. The $k$ balls correspond to the $k$ random elements $x_1, x_2, x_3, \dots, x_k$, and the $n$ boxes correspond to the $n$ possible elements of $Y$. We will use this method to compute a lower bound on the collision probability, which depends only on $k$ and $n$ and not on $m$.
Because we care about a lower bound on the collision probability, we may assume that $|H^{-1}(y)| \approx m/n$ for all $y \in Y$. This assumption is reasonable because if the preimage sets $H^{-1}(y)$ ($y \in Y$) are not approximately equal in size, the probability of finding a collision increases.
Because the preimage sets $H^{-1}(y)$ ($y \in Y$) are about the same size and the $x_i$ ($1 \le i \le k$) are chosen at random, the values $y_i = H(x_i)$, $1 \le i \le k$, can be regarded as random elements of $Y$ (the $y_i$ need not be distinct). The probability that $k$ random elements $y_1, y_2, \dots, y_k \in Y$ are all distinct is easy to compute. Consider $y_1, y_2, \dots, y_k$ in order: $y_1$ is arbitrary; the probability that $y_2 \ne y_1$ is $1 - 1/n$; the probability that $y_3 \notin \{y_1, y_2\}$ is $1 - 2/n$; and so on; the probability that $y_k \notin \{y_1, y_2, \dots, y_{k-1}\}$ is $1 - (k-1)/n$.
Therefore, the probability of no collision is $(1 - 1/n)(1 - 2/n)\cdots(1 - (k-1)/n)$. If $x$ is a small real number, then $1 - x \approx e^{-x}$, which follows from the expansion $e^{-x} = 1 - x + x^2/2! - x^3/3! + \cdots$. The probability of no collision is therefore about $e^{-k(k-1)/(2n)}$. Let $\varepsilon$ be the probability of at least one collision; then $\varepsilon \approx 1 - e^{-k(k-1)/(2n)}$, which gives $k^2 - k \approx n \ln\left(1/(1-\varepsilon)^2\right)$. Dropping the $-k$ term, we have $k^2 \approx n \ln\left(1/(1-\varepsilon)^2\right)$, that is, $k \approx \sqrt{2n \ln\left(1/(1-\varepsilon)\right)}$.
If we take $\varepsilon = 0.5$, then $k \approx 1.17\sqrt{n}$. This shows that hashing only slightly more than $\sqrt{n}$ random elements of $X$ yields a collision with probability 50%. Note that a different choice of $\varepsilon$ leads to a different constant factor, but $k$ remains proportional to $\sqrt{n}$.
If $X$ is the set of all students in a classroom, $Y$ is the set of the 365 days of a non-leap year, and $H(x)$ is the birthday of student $x$, then $n = 365$, and with $\varepsilon = 0.5$ the estimate $k \approx 1.17\sqrt{n}$ gives $k \approx 22.3$. Therefore, the answer to the birthday question is 23.
The birthday attack implies a lower limit on the message digest length. A 40-bit message digest is insecure, because only $2^{20}$ (about 1 million) random Hash computations are needed to find a collision with probability at least 1/2. To defend against birthday attacks, the message digest should be at least 128 bits long, in which case a birthday attack requires about $2^{64}$ Hash computations. This is why the Secure Hash Standard chose an output length of 160 bits.
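The estimate can be compared with the exact product formula in a few lines of Python. This is only a numerical sketch of the derivation above; the function names are hypothetical and chosen for this illustration.

```python
import math

def approx_trials(n: int, eps: float = 0.5) -> float:
    """Estimate k from k ≈ sqrt(2 n ln(1 / (1 - eps)))."""
    return math.sqrt(2 * n * math.log(1 / (1 - eps)))

def collision_probability(k: int, n: int) -> float:
    """Exact value of 1 - (1 - 1/n)(1 - 2/n)...(1 - (k-1)/n)."""
    p_no_collision = 1.0
    for i in range(1, k):
        p_no_collision *= 1 - i / n
    return 1 - p_no_collision

def smallest_k(n: int, eps: float = 0.5) -> int:
    """Smallest k whose exact collision probability reaches eps."""
    k = 1
    while collision_probability(k, n) < eps:
        k += 1
    return k

print(approx_trials(365))               # estimate for the birthday problem
print(smallest_k(365))                  # exact answer to the birthday problem
print(approx_trials(2 ** 40) / 2 ** 20) # constant factor for a 40-bit digest
```

For a 40-bit digest the ratio printed on the last line is the constant in front of $\sqrt{n} = 2^{20}$, which is why roughly a million random hashes suffice.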
The "meet-in-the-middle" attack is a variant of the birthday attack. Instead of comparing hash values, it compares the intermediate variables in the chain. This type of attack mainly applies to hash schemes with a block-chaining structure. The basic principle of the meet-in-the-middle attack is to split the message into two parts. Forged first parts are processed forward, from the initial value to the intermediate stage, producing r1 intermediate variables; forged second parts are processed backward, from the hash result to the intermediate stage, producing r2 intermediate variables. The probability of finding a match at the intermediate stage is the same as the success probability of a birthday attack.
The "modular differential method" is a highly efficient hash analysis method proposed by Professor Wang Xiaoyun of Shandong University. For the modular differential algorithm, see http://wenku.baidu.com/view/f0bf451414791711cc7917b5.html?from=related.
2. HashTable and HashTable degradation
We have now seen the properties of Hash functions and why Hash attacks are feasible. I have not given the details of the attack algorithms or concrete code here, because they are not the focus of this article; if I have the opportunity, I will discuss the various Hash attack schemes and their implementations separately. Next, let's look at a data structure closely related to the Hash function: the HashTable (reference: http://en.wikipedia.org/wiki/Hash_table ).
A HashTable (hash table) is a data structure whose values are accessed directly by key.
A HashTable combines the advantages of both linked lists and arrays, so adding, deleting, modifying, and querying are all fast. In a HashTable, the Hash function maps the Key to the address of the Value (an array subscript). This differs from the Hash algorithms mentioned earlier, which guarantee data integrity in the security field. Because the result is an array subscript, the Hash value must be an integer, so the standard Hash algorithms of the information security field are not suitable here. Application development platforms generally implement their own Hash algorithms or use Hash functions common in HashTable construction, for example the DJBX33A algorithm.
As mentioned earlier, Hash collisions cannot be avoided. How does a HashTable handle them? There are two common practices: open addressing and separate chaining.
With open addressing, when the hash of a key points to an address that already holds a value (that is, a conflict occurs), a probe function is used to find another address after it, and this repeats until there is no conflict. Common probe schemes are linear probing, quadratic probing, and double hashing. A drawback of this method is that the key displaced by a conflict occupies a later address, and that address may be exactly where some other key hashes to. As a result, two keys that are not synonyms (keys whose hash values differ) may still conflict.
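The drawback described above can be seen in a minimal Python sketch of linear probing. The table size of 8, the `key % size` hash, and the function name are assumptions made for this illustration, not the implementation of any particular platform.

```python
from typing import List, Optional

def insert_linear_probing(slots: List[Optional[int]], key: int) -> int:
    """Insert key with linear probing and return the slot it ends up in."""
    size = len(slots)
    home = key % size                      # home address of the key
    for step in range(size):
        index = (home + step) % size       # probe the next address in turn
        if slots[index] is None or slots[index] == key:
            slots[index] = key
            return index
    raise RuntimeError("hash table is full")

slots: List[Optional[int]] = [None] * 8

# 0 and 8 are synonyms (both hash to address 0); 1 hashes to address 1.
positions = {key: insert_linear_probing(slots, key) for key in (0, 8, 1)}
print(positions)
```

Key 8 is pushed out of address 0 into address 1, so key 1, which is not a synonym of 0 or 8, finds its own home address occupied and has to probe further.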
Separate chaining combines an array with linked lists. When a conflict occurs, the new entry is simply appended to the linked list of the corresponding slot, as shown in Figure 12.
Figure 12
Compared with open addressing, separate chaining has the following advantages:
① Separate chaining handles conflicts simply and without clustering, that is, non-synonyms never conflict, so the average search length is short;
② Because the node space of each linked list is allocated dynamically, separate chaining is better suited to situations where the table length cannot be determined before the table is created;
③ To reduce conflicts, open addressing requires a small load factor α, which wastes a lot of space when the nodes are large. With separate chaining α ≥ 1 is acceptable, and when the nodes are large the pointer fields added by chaining are negligible, so space is saved;
④ Deleting a node from a hash table built with separate chaining is easy: simply remove the corresponding node from its linked list. In a hash table built with open addressing, the slot of a deleted node cannot simply be emptied, because that would cut off the search path to the synonym nodes stored after it. This is because in all open addressing schemes an empty address unit (that is, an open address) is the condition for a failed search. Therefore, in a hash table that handles conflicts by open addressing, a deleted node can only be marked as deleted; it cannot actually be removed.
Of course, separate chaining also has a disadvantage: the pointers require extra space. When the nodes are small, open addressing saves more space, and if the space saved on pointers is used to enlarge the hash table, the load factor decreases, which reduces conflicts in open addressing and improves the average search speed.
Taking separate chaining as an example, if all the inserted keys collide, the HashTable ends up as a single linked list. This is called HashTable degradation, as shown in Figure 13.
Figure 13
After HashTable degrades to a linked list, its performance will drop sharply.
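How sharp the drop is can be estimated with a small Python sketch of a chained hash table that counts key comparisons. It assumes integer keys mapped with key & tableMask as in the PHP description quoted earlier, a fixed table of 64 slots with no resizing, and Python lists standing in for the linked lists; the class and function names are hypothetical.

```python
class ChainedHashTable:
    """Toy separate-chaining table that counts key comparisons on insert."""

    def __init__(self, size: int = 64):
        self.mask = size - 1                       # size must be a power of 2
        self.buckets = [[] for _ in range(size)]   # one chain per slot
        self.comparisons = 0

    def insert(self, key: int, value) -> None:
        bucket = self.buckets[key & self.mask]
        for entry in bucket:                       # walk the chain
            self.comparisons += 1
            if entry[0] == key:
                entry[1] = value
                return
        bucket.append([key, value])

def comparisons_for(keys, size: int = 64) -> int:
    table = ChainedHashTable(size)
    for key in keys:
        table.insert(key, None)
    return table.comparisons

spread = comparisons_for(range(64))                     # one key per slot
degraded = comparisons_for(i * 64 for i in range(64))   # all keys in slot 0
print(spread, degraded)
```

With n colliding keys the total work grows as n(n-1)/2 comparisons instead of staying proportional to n, which is what makes the attack in the next section practical.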
3. DoS Attacks
HashTables are used in every Web application framework. The server stores the parameters submitted with each request in a HashTable for the back-end code to use. For example, in an ASP.NET application we use Request.Form[key] and Request.QueryString[key] to obtain the parameters submitted by the client. The parameters are stored in a HashTable: we pass in the parameter name as the Key, the Hash function converts it into the array subscript of the corresponding Value, and the Value is returned.
In normal use this causes no problems. Now recall the HashTable degradation problem described above. If the client uses a Hash attack against the Hash function of the Web application framework to obtain a large number of colliding keys and submits them, the HashTable degrades into a linked list, and the server may take 10 minutes or even several hours to process a single request. One PC can take down a server without any distributed attack. Of course, the attack only succeeds if the Hash mechanism used by the Web application framework is vulnerable. If it is, attackers can easily launch DoS attacks.
Next, let's look at how well popular Web frameworks in the real world defend against HashTable degradation. (Reference: http://www.nruns.com/_downloads/advisory28122011.pdf)
3.1 PHP5
The Hash function used by the PHP5 HashTable is DJBX33A.
The DJBX33A algorithm, also called the time33 algorithm, is the default Hash algorithm of PHP, Apache, Perl, and BSDDB.
The following code demonstrates the basic idea of the DJBX33A algorithm:
uint32_t time33(char const *str, int len)
{
    unsigned long hash = 0;
    for (int i = 0; i < len; i++) {
        hash = hash * 33 + (unsigned long) str[i];
    }
    return hash;
}
Why 33? The explanation is as follows: all odd numbers between 1 and 256 achieve an acceptable hash distribution, about 86% on average. Among them, numbers such as 17, 31, 33, 63, 127, and 129 have a further advantage when a large number of hash operations is performed: multiplication by these numbers can be replaced by a bit shift plus an addition or subtraction, which increases computing speed. Not every algorithm of this family uses 33 as the multiplier. For example, Nginx uses time31 and Tokyo Cabinet uses time37.
The PHP version of the DJBX33A algorithm is as follows:
inline unsigned time33(char const *str, int len)
{
    unsigned long hash = 5381;
    /* Variant with the hash unrolled eight times */
    for (; len >= 8; len -= 8) {
        hash = ((hash << 5) + hash) + *str++;
        hash = ((hash << 5) + hash) + *str++;
        hash = ((hash << 5) + hash) + *str++;
        hash = ((hash << 5) + hash) + *str++;
        hash = ((hash << 5) + hash) + *str++;
        hash = ((hash << 5) + hash) + *str++;
        hash = ((hash << 5) + hash) + *str++;
        hash = ((hash << 5) + hash) + *str++;
    }
    switch (len) {
        case 7: hash = ((hash << 5) + hash) + *str++; /* fallthrough... */
        case 6: hash = ((hash << 5) + hash) + *str++; /* fallthrough... */
        case 5: hash = ((hash << 5) + hash) + *str++; /* fallthrough... */
        case 4: hash = ((hash << 5) + hash) + *str++; /* fallthrough... */
        case 3: hash = ((hash << 5) + hash) + *str++; /* fallthrough... */
        case 2: hash = ((hash << 5) + hash) + *str++; /* fallthrough... */
        case 1: hash = ((hash << 5) + hash) + *str++; break;
        case 0: break;
    }
    return hash;
}
For the DJBX33A algorithm, we can use the "equal substrings method" described above to find collisions and carry out the attack.
Currently, the official PHP recommendation is to limit the maximum size of form submissions to defend against this attack.
3.2 ASP.NET
ASP.NET uses the Request.Form object to obtain the variables submitted by a form. Its internal Hash function is DJBX33X (Dan Bernstein's times 33, XOR).
The idea of the DJBX33X algorithm is as follows:
static ulong DJBX33X(char *arKey, uint nKeyLength)
{
    ulong h = 5381;
    char *arEnd = arKey + nKeyLength;
    while (arKey < arEnd) {
        h += (h << 5);
        h ^= (ulong) *arKey++;
    }
    return h;
}
Given the characteristics of the DJBX33X algorithm, we can use the meet-in-the-middle attack described above to find collisions.
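The attack works here because each DJBX33X step can be undone: 33 is odd and therefore has a multiplicative inverse modulo 2^32. The following Python sketch illustrates the idea under stated assumptions: the hash state is truncated to 32 bits, messages are six lowercase letters split into two halves of three, and the names `step`, `unstep`, and `meet_in_the_middle` are hypothetical. It is not the procedure used in the referenced advisory.

```python
from itertools import product
from string import ascii_lowercase

MASK = 0xFFFFFFFF
INV33 = pow(33, -1, 1 << 32)        # inverse of 33 mod 2**32 (Python 3.8+)
ALPHABET = ascii_lowercase.encode()

def step(h: int, c: int) -> int:
    """One forward round: h += h << 5; h ^= c (32-bit state assumed)."""
    return ((h * 33) & MASK) ^ c

def unstep(h: int, c: int) -> int:
    """Undo one round, given the character that was mixed in."""
    return ((h ^ c) * INV33) & MASK

def djbx33x(data: bytes, h: int = 5381) -> int:
    for c in data:
        h = step(h, c)
    return h

def meet_in_the_middle(target: int, half_len: int = 3):
    """Find a message of 2 * half_len letters whose hash equals target."""
    # Forward half: intermediate state reached by every possible first part.
    forward = {}
    for prefix in product(ALPHABET, repeat=half_len):
        forward.setdefault(djbx33x(bytes(prefix)), bytes(prefix))
    # Backward half: walk from the target back to the intermediate stage.
    for suffix in product(ALPHABET, repeat=half_len):
        h = target
        for c in reversed(suffix):
            h = unstep(h, c)
        if h in forward:                # the two halves meet in the middle
            return forward[h] + bytes(suffix)
    return None

target = djbx33x(b"secret")
found = meet_in_the_middle(target)
print(found, djbx33x(found) == target)
```

With halves this short the search mostly recovers a single preimage; larger tables on both sides produce many distinct messages with the same hash value, at a cost that grows with the table size rather than with the full message space.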
Microsoft has released a patch for this vulnerability. If you are worried that this vulnerability may affect your website, apply the patch.
3.3 Java
Java's Hash function is a variant of DJBX33A (it uses 31 instead of 33 as the multiplier, and the initial value is 0 rather than 5381), but we can still use the equal substrings method to obtain collisions for it.
Java-based Tomcat servers have this vulnerability.
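As a rough illustration, the Python sketch below reimplements the times-31 hash for ASCII strings and builds colliding keys from the blocks "Aa" and "BB", a well-known colliding pair chosen for this example and not taken from the article. The function names are hypothetical, and the result is shown as an unsigned 32-bit value, whereas Java reports a signed int.

```python
from itertools import product

def java_string_hash(text: str) -> int:
    """Times-31 hash with initial value 0, as an unsigned 32-bit value."""
    h = 0
    for ch in text:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h

def colliding_strings(blocks=("Aa", "BB"), count: int = 4):
    """All concatenations of `count` blocks; they all share one hash value."""
    return ["".join(parts) for parts in product(blocks, repeat=count)]

keys = colliding_strings(count=4)
print(len(keys), {java_string_hash(k) for k in keys})
```

Each extra block doubles the number of colliding keys, so a request carrying tens of thousands of such parameter names stays small enough to submit in a single form post.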
3.4 Others
Python, Ruby, and V8 have the same vulnerability.