Editor's note: The author of this article is a third-year student at BNU and a female geek. After user data from CSDN and other major websites was leaked, she wrote this article to share her own views on MD5 encryption. Readers who want to discuss the topic further with the author can find her on Sina Weibo at @Attola.
MD5 is one of the most widely used hash algorithms. It was proposed by Ronald L. Rivest of MIT (designed in 1991 and published in 1992) and evolved from MD4. The algorithm is widely used by websites to protect user data: it turns a user's password into a 128-bit value. The database does not store the password in plaintext. Instead, when the user logs in, the site computes the MD5 of the submitted password string and compares it with the MD5 value stored in the database, which reduces the harm to users if the password database is stolen.
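The store-and-compare flow described above can be sketched as follows. This is a minimal illustration using Python's standard hashlib module; the user table, the user name and the helper names md5_hex and check_login are hypothetical and do not come from any particular website.

```python
import hashlib

def md5_hex(password: str) -> str:
    """Return the MD5 digest of a password as 32 hexadecimal characters."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()

# Hypothetical user table: only digests are stored, never the passwords.
user_table = {"alice": md5_hex("correct horse")}

def check_login(username: str, password: str) -> bool:
    """Hash the submitted password and compare it with the stored digest."""
    stored = user_table.get(username)
    return stored is not None and stored == md5_hex(password)

print(check_login("alice", "correct horse"))  # prints True
print(check_login("alice", "wrong guess"))    # prints False
```

Each digest is 32 hexadecimal characters, which is the 128-bit value mentioned above.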
However, data protected with MD5 is not entirely safe: because hash collisions exist, an attacker can crack it by finding a string that produces the same hash value. This led to MD5 with a random salt added, which to some extent increases the difficulty of a dictionary attack.
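A minimal sketch of salted hashing is shown below. The article does not say how the salt is combined with the password, so prepending a 16-byte random salt is an assumption made for illustration, and the names salted_md5 and verify are hypothetical.

```python
import hashlib
import os
from typing import Optional, Tuple

def salted_md5(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, str]:
    """Return (salt, digest). A fresh random salt is drawn when none is given."""
    if salt is None:
        salt = os.urandom(16)  # assumed salt size, not taken from the article
    digest = hashlib.md5(salt + password.encode("utf-8")).hexdigest()
    return salt, digest

def verify(password: str, salt: bytes, digest: str) -> bool:
    """Recompute the digest with the stored salt and compare."""
    return salted_md5(password, salt)[1] == digest

# The same password hashed twice gives two different digests,
# so one precomputed dictionary no longer covers every user.
salt_1, digest_1 = salted_md5("123456")
salt_2, digest_2 = salted_md5("123456")
print(digest_1 != digest_2)
print(verify("123456", salt_1, digest_1))
```

The salt is stored next to the digest so that the login check can repeat the computation.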
Questions raised
A few years ago, someone posted this micro-blog on Sina Weibo: "A common-sense Internet security math problem... Suppose your site hashes all user passwords with MD5 (a one-way hash, not reversible), and suppose your site has ten million members. If you lose the user database, how many members will have their passwords cracked? Think about it." At the time, a friend of mine thought that all ten million passwords would be cracked, but I disagreed, because of what I knew then:
(1) The MD5 algorithm is widely used in Internet applications. MD5 is not a simple classical cipher: it cannot be run in reverse to recover the input, and can only be cracked by brute force;
(2) I had seen the same string hashed with MD5 produce different, seemingly random results (I later found that what I had seen was not plain MD5, but MD5 with a salt added);
(3) MD5 used as a password hashing algorithm is not absolutely safe, because hash collisions may occur, and the MD5 hash of a simple password can be looked up in a rainbow table;
(4) I had seen a few MD5-cracking websites (such as http://www.cmd5.com/). Most of them start by cracking hashes for users free of charge; once they have accumulated a database large enough to crack simple passwords, the decryption service begins to charge. So cracking an MD5 password should not be that simple.
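The lookup idea behind points (3) and (4) can be sketched as follows. This uses a plain precomputed dictionary, which is a simpler stand-in for a real rainbow table (a rainbow table stores hash chains to save space). The word list and the "leaked" records are made up for illustration.

```python
import hashlib

def md5_hex(text: str) -> str:
    """Return the MD5 digest of a string as hexadecimal text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# Hypothetical list of weak passwords an attacker hashes in advance.
common_passwords = ["123456", "password", "qwerty", "111111", "abc123"]
lookup = {md5_hex(p): p for p in common_passwords}

# Hypothetical leaked records: user name -> unsalted MD5 digest.
leaked = {
    "user1": md5_hex("123456"),
    "user2": md5_hex("T7#vq9!LmZ2x"),
    "user3": md5_hex("qwerty"),
}

# A digest is "cracked" only if it already appears in the lookup table.
cracked = {user: lookup[d] for user, d in leaked.items() if d in lookup}
print(cracked)  # prints {'user1': '123456', 'user3': 'qwerty'}
```

In this sketch only the accounts with weak passwords fall, which matches the intuition that losing the database does not automatically expose every member.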
Not long after a heated discussion of this question, the CSDN database leak occurred, and 6 million database records circulated freely. The Tianya forum database was also leaked, and almost all of its 20 million records were confirmed to allow login. Both sites stored user passwords in their databases unencrypted, that is, in plaintext. Incidents like these confirm how important it is to protect the user passwords stored in a website's database.
Today, MD5 is one of the most widely used algorithms for protecting user passwords.
Background knowledge
A hash function h(x) must have the following properties [1]:
Compression: for any given input x, the output y = h(x) is short;
Efficiency: for any given input x, it is easy to compute y = h(x);
One-way: the hash function h is a one-way function, that is, for almost all outputs y = h(x), it is computationally infeasible to find an input x that produces y;
Weak collision resistance: given x, it is computationally infeasible to find x' ≠ x such that h(x') = h(x);
Strong collision resistance: it is computationally infeasible to find any pair x ≠ x' such that h(x') = h(x).
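Some of these properties can be observed directly with Python's hashlib, as in the sketch below; the helper names are hypothetical. One-wayness and collision resistance are claims about what is infeasible, so a short script cannot demonstrate them. It can only show compression, repeatability, and how different the digests of two similar inputs look.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a hexadecimal string."""
    return hashlib.md5(data).hexdigest()

def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions in which two hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

# Compression: a 1-byte input and a 1,000,000-byte input both give 128 bits.
short_digest = md5_hex(b"a")
long_digest = md5_hex(b"a" * 1_000_000)
print(len(short_digest) * 4, len(long_digest) * 4)  # prints 128 128

# The same input always gives the same output.
print(md5_hex(b"hello") == md5_hex(b"hello"))  # prints True

# Inputs that differ in one character give digests that differ in many bits.
print(differing_bits(md5_hex(b"hello"), md5_hex(b"hellp")))
```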
MD5's full name is Message-Digest Algorithm 5. It was proposed in 1991 by Ronald L. Rivest of MIT, evolved from MD4, and produces a 128-bit message digest (four 32-bit words, usually written in hexadecimal). [2] MD5 is an irreversible string transformation: even with the source code and the algorithm description in hand, one cannot transform an MD5 value back into the original string.
In 1993, den Boer and Bosselaers gave a limited "pseudo-collision" result.
In 1996, a flaw was found in the design of the MD5 algorithm. Although it was not proven fatal at the time, cryptographers began recommending other algorithms (such as SHA-1).
In 2004, MD5 was shown to be insecure when hash collisions were demonstrated. [3]
In 2007, researchers found that a chosen-prefix collision attack could make a program containing malicious code produce the same MD5 value as a legitimate one.
In 2008, researchers discovered two executables that produced the same MD5 hash value.
These examples show that MD5 does not offer strong security and should not be used where security requirements are high, such as SSL encryption and digital signatures. At present, the most recommended hash algorithms are those of the SHA-2 family.
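For comparison, a SHA-2 digest can be computed through the same hashlib interface. The sketch below uses SHA-256, one member of the SHA-2 family; the helper name sha256_hex is hypothetical.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

message = b"example message"
md5_digest = hashlib.md5(message).hexdigest()
sha256_digest = sha256_hex(message)

# MD5 produces 128 bits, SHA-256 produces 256 bits.
print(len(md5_digest) * 4, len(sha256_digest) * 4)  # prints 128 256
```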
MD5 Algorithm Description
The MD5 algorithm takes an input of arbitrary length and outputs a fixed 128-bit digest. MD5 processes the input in 512-bit blocks, each divided into sixteen 32-bit sub-blocks, and the algorithm finally produces four 32-bit values that are concatenated into the 128-bit hash. The specific process of the algorithm is as follows [4]:
(1) The message is padded so that its bit length modulo 512 equals 448. In other words, the length of the message is extended to n*512+448 bits, where n is a non-negative integer and may be zero. The padding consists of a single 1 bit followed by as many 0 bits as are needed to meet this condition.
(2) A 64-bit binary representation of the message's length before padding is appended to the result. After these two steps, the message has a bit length of n*512+448+64 = (n+1)*512, an exact multiple of 512, which is what the subsequent processing requires. MD5 has four 32-bit integer parameters called chaining variables, whose initial values are: a=0x67452301, b=0xefcdab89, c=0x98badcfe, d=0x10325476.
(3) The main loop of the algorithm begins. It runs once for each 512-bit block of the message. The main loop has four rounds, which are very similar to one another, and each round performs 16 operations. Each operation applies a nonlinear function to three of a, b, c and d, then adds the result to the fourth variable, a sub-block of the message and a constant. The sum is then rotated left by a varying number of bits and added to one of a, b, c or d. Finally, the result replaces one of a, b, c or d.
(4) After the four rounds are complete, the values that a, b, c and d held before the block was processed are added to the new a, b, c and d respectively. The algorithm then continues with the next block, and the final output is the concatenation of a, b, c and d.
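Steps (1) to (4) can be sketched in pure Python as below. This is an illustrative sketch, not production code: the function names are hypothetical, and the per-operation rotation amounts, the sine-derived constants, the four nonlinear functions and the little-endian byte order come from the MD5 specification (RFC 1321), not from the description above. The result is compared against Python's hashlib as a sanity check.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF
# Left-rotation amounts for the 16 operations of each of the four rounds.
SHIFTS = ([7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 +
          [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4)
# Additive constants: floor(2^32 * |sin(i + 1)|) for i = 0..63.
CONSTANTS = [int(abs(math.sin(i + 1)) * 2**32) & MASK for i in range(64)]

def rotate_left(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK

def pad(message: bytes) -> bytes:
    """Steps (1) and (2): pad to 448 mod 512 bits, then append the length."""
    bit_length = (len(message) * 8) & 0xFFFFFFFFFFFFFFFF
    padded = message + b"\x80"  # a 1 bit followed by seven 0 bits
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    return padded + struct.pack("<Q", bit_length)

def md5_sketch(message: bytes) -> str:
    a0, b0, c0, d0 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476
    padded = pad(message)
    for offset in range(0, len(padded), 64):  # one pass per 512-bit block
        words = struct.unpack("<16I", padded[offset:offset + 64])
        a, b, c, d = a0, b0, c0, d0
        for i in range(64):  # step (3): four rounds of 16 operations
            if i < 16:
                f, g = (b & c) | (~b & d), i
            elif i < 32:
                f, g = (d & b) | (~d & c), (5 * i + 1) % 16
            elif i < 48:
                f, g = b ^ c ^ d, (3 * i + 5) % 16
            else:
                f, g = c ^ (b | ~d), (7 * i) % 16
            f = (f + a + CONSTANTS[i] + words[g]) & MASK
            a, d, c = d, c, b
            b = (b + rotate_left(f, SHIFTS[i])) & MASK
        # Step (4): add the block's result to the chaining variables.
        a0, b0 = (a0 + a) & MASK, (b0 + b) & MASK
        c0, d0 = (c0 + c) & MASK, (d0 + d) & MASK
    return struct.pack("<4I", a0, b0, c0, d0).hex()

print(md5_sketch(b"abc") == hashlib.md5(b"abc").hexdigest())  # prints True
```

The padded length is always a multiple of 64 bytes (512 bits), so a 55-byte message fits in one block while a 56-byte message needs two.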