Research on the MD5 algorithm

Source: Internet
Author: User
Overview of the algorithm

MD5 stands for Message-Digest Algorithm 5. It was developed in the early 1990s by Ronald L. Rivest of the MIT Laboratory for Computer Science and RSA Data Security Inc., and it evolved from MD2, MD3 and MD4. Its purpose is to "compress" a large message into a compact, fixed-size form (that is, to transform a byte string of any length into a fixed-length large integer) before the message is signed with a private key by digital signature software. MD2, MD4 and MD5 all take a message of arbitrary length and produce a 128-bit message digest. Although the structures of these algorithms are broadly similar, the design of MD2 differs markedly from that of MD4 and MD5, because MD2 was optimized for 8-bit machines while MD4 and MD5 target 32-bit machines. MD5 and its C source code are described in detail in Internet RFC 1321 (http://www.ietf.org/rfc/rfc1321.txt), the most authoritative document on the algorithm, which Ronald L. Rivest submitted to the IETF and which was published in April 1992; MD2 and MD4 are described in the companion RFCs 1319 and 1320.

Rivest developed the MD2 algorithm in 1989. In this algorithm, the message is first padded so that its length in bytes is a multiple of 16. A 16-byte checksum is then appended to the end of the message, and the hash value is computed over the resulting data. Later, Rogier and Chauvaud found that collisions can be produced for MD2 if the checksum is omitted. With the checksum in place, no two messages with the same MD2 digest were known at the time of writing.

To improve security, Rivest developed the MD4 algorithm in 1990. MD4 also pads the message, so that its length in bits is congruent to 448 modulo 512 (bit length mod 512 = 448). A 64-bit binary representation of the original message length is then appended. The message is processed in 512-bit blocks using the Damgård/Merkle iterative structure, and each block passes through three distinct rounds. Den Boer and Bosselaers, among others, quickly found attacks on versions of MD4 in which the first or the third round is omitted. Dobbertin then showed how to find a collision for the full MD4 in a few minutes on an ordinary personal computer (a collision means that two different inputs produce the same hash value). As a result, MD4 was abandoned.

Although MD4 has such a serious security flaw, its influence on the hash algorithms developed afterwards cannot be ignored. Besides MD5, the best known of these are SHA-1, RIPEMD and HAVAL.

A year later, in 1991, Rivest developed the more sophisticated MD5 algorithm. It adds the concept of "safety belts" to MD4. Although MD5 is slightly slower than MD4, it is more secure. The algorithm consists of four rounds, each of which differs slightly from the rounds in the MD4 design. In MD5, the size of the message digest and the padding requirements are exactly the same as in MD4. Den Boer and Bosselaers found pseudo-collisions in MD5, but beyond that no other collisions had been found at the time of writing.

Van Oorschot and Wiener considered a brute-force search for collisions in hash functions, and they estimated that a machine designed specifically to search for MD5 collisions (costing about 10 million dollars in 1994) could find a collision every 24 days on average. However, in the ten years from 1991 to 2001 no replacement such as an "MD6" appeared, which suggests that this weakness did not greatly affect the security of MD5 in that period. None of the results above was enough to cause problems in practical applications of MD5. Moreover, because MD5 can be used without paying any licensing fees, it was regarded at the time of writing as very safe for general use (that is, outside top-secret applications; and even in top-secret applications MD5 remained a very good intermediate technique).

Applications of the algorithm

A typical application of MD5 is to generate a message digest for a piece of information (a message) so that tampering can be detected. For example, many software downloads on Unix are accompanied by a file with the same name and the extension .md5. Such a file usually contains a single line of text, structured roughly like this:

MD5 (tanajiya.tar.gz) = 0ca175b9c0f726a831d895e269332461

This line serves as a digital fingerprint of the tanajiya.tar.gz file. MD5 treats the whole file as one large message and, through its irreversible transformation, produces this MD5 message digest. If the contents of the file are later changed in any way while it is being distributed (whether by deliberate modification or by transmission errors caused by an unstable line during download), recomputing the MD5 of the file yields a different digest, so you can be sure that the file you received is not the correct one. If a third-party certification authority is involved, MD5 can also prevent the author of the file from "repudiating" it; this is the so-called digital signature application.
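
As a rough illustration of this check, the sketch below recomputes a file's MD5 with Python's standard hashlib module and compares it with the value taken from a .md5 line of the form shown above. The helper names md5_of_file, parse_md5_line and file_matches are hypothetical names chosen for this example, and the line format is assumed to be exactly "MD5 (name) = hex".

```python
import hashlib
import re

def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def parse_md5_line(line):
    """Split a line like 'MD5 (name) = hex' into (name, hex)."""
    m = re.match(r"MD5 \((.+)\) = ([0-9a-fA-F]{32})\s*$", line)
    if m is None:
        raise ValueError("unrecognized .md5 line")
    return m.group(1), m.group(2).lower()

def file_matches(path, md5_line):
    """True if the recomputed digest equals the published one."""
    _, expected = parse_md5_line(md5_line)
    return md5_of_file(path) == expected
```

Reading the file in chunks keeps memory use constant, which matters for large archives such as .tar.gz files.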

MD5 is also widely used for password authentication. For example, a user's password in a UNIX system is stored in the file system only after it has been hashed with MD5 (or a similar algorithm). When the user logs in, the system computes the MD5 value of the password that was entered and compares it with the MD5 value saved in the file system to decide whether the password is correct. In this way the system can verify a login without knowing the user's password in plain text. This not only keeps the password hidden from users with system administrator privileges, but also makes the password somewhat harder to crack.
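
The login check described above might be sketched as follows. This is a simplified illustration only: the stored dictionary stands in for the password file, the names md5_hex and login_ok are hypothetical, and the sketch deliberately leaves out the salt that real Unix password schemes add.

```python
import hashlib
import hmac

def md5_hex(password):
    """Hex MD5 digest of a password string (UTF-8 encoded)."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()

# Hypothetical stand-in for the stored password file: user -> digest.
stored = {"alice": md5_hex("s3cret")}

def login_ok(user, password):
    """Compare the digest of the entered password with the stored one."""
    expected = stored.get(user)
    if expected is None:
        return False
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(md5_hex(password), expected)
```

Only digests are kept in stored, so reading that table does not directly reveal any password.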

For this reason, one of the methods most widely used by attackers to crack passwords is known as "running a dictionary". There are two ways to obtain a dictionary: one is to collect strings that are commonly used as passwords, and the other is to generate candidates by enumeration. The MD5 value of every dictionary entry is computed first, and the MD5 value of the target is then looked up in the dictionary. Suppose that a password is at most 8 bytes long and consists only of letters and digits, 26+26+10=62 characters in total. Since characters may repeat, the number of entries in an exhaustive dictionary is 62^1 + 62^2 + ... + 62^8, roughly 2.2 x 10^14, which is already an astronomical figure. Storing such a dictionary requires a disk array of at least terabyte scale, and the attack also presupposes that the MD5 value of the target account's password can be obtained. This way of storing passwords is widely used in UNIX systems, and it is one important reason why UNIX systems are considered more robust than ordinary operating systems.
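
A toy version of such a dictionary lookup, together with the size of the exhaustive dictionary, could look like the sketch below. The word list is a made-up sample, the function names are hypothetical, and the count assumes that characters may repeat, so each length k from 1 to 8 contributes 62^k candidates.

```python
import hashlib

def build_dictionary(candidates):
    """Map the MD5 hex digest of each candidate password to the candidate."""
    return {hashlib.md5(c.encode("utf-8")).hexdigest(): c for c in candidates}

def lookup(target_digest, dictionary):
    """Return the password whose digest matches, or None if it is absent."""
    return dictionary.get(target_digest)

def keyspace(alphabet_size=62, max_len=8):
    """Number of strings of length 1..max_len over the given alphabet."""
    return sum(alphabet_size ** k for k in range(1, max_len + 1))

words = ["123456", "password", "letmein"]      # tiny illustrative word list
table = build_dictionary(words)
target = hashlib.md5(b"letmein").hexdigest()   # digest taken from the target account
print(lookup(target, table))
print(keyspace())
```

The lookup itself is a single dictionary access; the cost of the attack lies in computing and storing the table.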

Algorithm description

The MD5 algorithm can be summarized as follows: MD5 processes the input in 512-bit blocks, and each block is divided into sixteen 32-bit sub-blocks. After a series of operations, the output of the algorithm consists of four 32-bit words, which are concatenated to form a 128-bit hash value.
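
The sizes involved can be checked with a few lines of Python. This is only a sketch of the data layout: the sample block is arbitrary, and reading each word low-order byte first follows RFC 1321 rather than anything stated in the text above.

```python
import hashlib
import struct

block = bytes(range(64))                   # one 512-bit (64-byte) block of sample data
words = struct.unpack("<16I", block)       # sixteen 32-bit sub-blocks

digest = hashlib.md5(b"abc").digest()      # 16 bytes, that is, 128 bits
a, b, c, d = struct.unpack("<4I", digest)  # the four 32-bit output words

print(len(words), len(digest) * 8)
```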

In the MD5 algorithm, the message must first be padded so that its length in bits is congruent to 448 modulo 512. The padded length is therefore N*512+448 bits, that is, N*64+56 bytes, where N is a non-negative integer. The padding is done as follows: a single 1 bit is appended to the message, followed by as many 0 bits as are needed to satisfy the condition above. A 64-bit binary representation of the message length before padding is then appended to the result. After these two steps, the message length is N*512+448+64 = (N+1)*512 bits, which is an exact multiple of 512. This satisfies the length requirement of the subsequent processing.
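
The padding step can be sketched in Python as follows. The function name md5_pad is a hypothetical name, the sketch handles whole bytes only, and storing the 64-bit length low-order byte first is taken from RFC 1321, since the text above does not state the byte order.

```python
import struct

def md5_pad(message: bytes) -> bytes:
    """Pad a message to a multiple of 64 bytes (512 bits)."""
    # Length before padding, in bits, reduced modulo 2**64
    bit_len = (8 * len(message)) & 0xFFFFFFFFFFFFFFFF
    # 0x80 is a single 1 bit followed by seven 0 bits
    padded = message + b"\x80"
    # Add 0 bytes until the length is 56 modulo 64 bytes (448 modulo 512 bits)
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    # Append the 64-bit length, low-order byte first
    padded += struct.pack("<Q", bit_len)
    return padded
```

For example, a 55-byte message fits in one 64-byte block after padding, while a 56-byte message needs two blocks because the 1 bit and the length field no longer fit in the first one.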

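MD5 uses four 32-bit integer parameters called chaining variables. Written low-order byte first, they are: A=0x01234567, B=0x89abcdef, C=0xfedcba98, D=0x76543210. Read as little-endian 32-bit integers, the same values are A=0x67452301, B=0xefcdab89, C=0x98badcfe, D=0x10325476.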
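
Because the digits above list each variable low-order byte first (the convention of RFC 1321), the integer values used in an actual program look different. The following sketch shows the conversion; INIT_BYTES is a name chosen for this example.

```python
import struct

# The four chaining variables as raw bytes, in the order printed above.
INIT_BYTES = bytes.fromhex("0123456789abcdeffedcba9876543210")

# The same bytes read as four little-endian 32-bit integers.
A, B, C, D = struct.unpack("<4I", INIT_BYTES)

print([hex(v) for v in (A, B, C, D)])
```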

Once these four chaining variables have been set, the algorithm enters its main loop. The loop is executed once for every 512-bit block of the message.

Copy the four chaining variables into four working variables: A to a, B to b, C to c, D to d.

The main loop has four rounds (MD4 has only three), and the rounds are very similar to one another. Each round performs 16 operations. Each operation applies a non-linear function to three of a, b, c and d, then adds the result to the fourth variable, a sub-block of the message and a constant. The sum is then rotated left by a variable number of bits and added to one of a, b, c or d. Finally, the result replaces one of a, b, c or d.
The four non-linear functions used in these operations (one for each round) are:

F(X,Y,Z) = (X & Y) | ((~X) & Z)
G(X,Y,Z) = (X & Z) | (Y & (~Z))
H(X,Y,Z) = X ^ Y ^ Z
I(X,Y,Z) = Y ^ (X | (~Z))
(& is AND, | is OR, ~ is NOT, ^ is XOR)

A note on these four functions: if the corresponding bits of X, Y and Z are independent and unbiased, then each bit of the result is also independent and unbiased.
F is a bitwise conditional function: if X then Y, otherwise Z. The function H is a bitwise parity operator.
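
The four functions translate almost directly into Python. Python integers are unbounded and ~x is negative, so this sketch masks every result to 32 bits; the MASK constant is an addition needed by the example, not part of the definitions above.

```python
MASK = 0xFFFFFFFF  # keep every result within 32 bits

def F(x, y, z):
    """Bitwise 'if x then y else z'."""
    return ((x & y) | (~x & z)) & MASK

def G(x, y, z):
    """Bitwise 'if z then x else y'."""
    return ((x & z) | (y & ~z)) & MASK

def H(x, y, z):
    """Bitwise parity of x, y and z."""
    return (x ^ y ^ z) & MASK

def I(x, y, z):
    return (y ^ (x | ~z)) & MASK
```

For instance, F(MASK, y, z) returns y and F(0, y, z) returns z, which is the selection behaviour described above.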

Let Mj denote the j-th sub-block of the message (j from 0 to 15), and let <<< s denote a left circular shift by s bits. The four operations are then:

FF(a,b,c,d,Mj,s,ti) denotes a = b + ((a + F(b,c,d) + Mj + ti) <<< s)
GG(a,b,c,d,Mj,s,ti) denotes a = b + ((a + G(b,c,d) + Mj + ti) <<< s)
HH(a,b,c,d,Mj,s,ti) denotes a = b + ((a + H(b,c,d) + Mj + ti) <<< s)
II(a,b,c,d,Mj,s,ti) denotes a = b + ((a + I(b,c,d) + Mj + ti) <<< s)

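
A minimal Python sketch of these four operations is given below, assuming that all additions are taken modulo 2^32 and that the shift is a left rotation, as in RFC 1321. The helpers rotl and step are names introduced for this example; each operation returns the new value of its first argument instead of modifying it in place.

```python
MASK = 0xFFFFFFFF

def rotl(x, s):
    """Rotate the 32-bit value x left by s bits."""
    x &= MASK
    return ((x << s) | (x >> (32 - s))) & MASK

def F(x, y, z): return ((x & y) | (~x & z)) & MASK
def G(x, y, z): return ((x & z) | (y & ~z)) & MASK
def H(x, y, z): return (x ^ y ^ z) & MASK
def I(x, y, z): return (y ^ (x | ~z)) & MASK

def step(func, a, b, c, d, mj, s, ti):
    """Return b + ((a + func(b,c,d) + mj + ti) <<< s), modulo 2**32."""
    return (b + rotl(a + func(b, c, d) + mj + ti, s)) & MASK

def FF(a, b, c, d, mj, s, ti): return step(F, a, b, c, d, mj, s, ti)
def GG(a, b, c, d, mj, s, ti): return step(G, a, b, c, d, mj, s, ti)
def HH(a, b, c, d, mj, s, ti): return step(H, a, b, c, d, mj, s, ti)
def II(a, b, c, d, mj, s, ti): return step(I, a, b, c, d, mj, s, ti)
```

A caller would write a = FF(a, b, c, d, m0, 7, 0xd76aa478) to perform one operation.
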
These four rounds (64 steps) are:

First round

FF(a,b,c,d,M0,7,0xd76aa478)
FF(d,a,b,c,M1,12,0xe8c7b756)
FF(c,d,a,b,M2,17,0x242070db)
FF(b,c,d,a,M3,22,0xc1bdceee)
FF(a,b,c,d,M4,7,0xf57c0faf)
FF(d,a,b,c,M5,12,0x4787c62a)
FF(c,d,a,b,M6,17,0xa8304613)
FF(b,c,d,a,M7,22,0xfd469501)
FF(a,b,c,d,M8,7,0x698098d8)
FF(d,a,b,c,M9,12,0x8b44f7af)
FF(c,d,a,b,M10,17,0xffff5bb1)
FF(b,c,d,a,M11,22,0x895cd7be)
FF(a,b,c,d,M12,7,0x6b901122)
FF(d,a,b,c,M13,12,0xfd987193)
FF(c,d,a,b,M14,17,0xa679438e)
FF(b,c,d,a,M15,22,0x49b40821)
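
The sixteen steps of the first round follow a regular pattern, so they can also be written as a loop. The sketch below is one possible way to do this in Python; round_one, T1 and S1 are names chosen for the example, and the rotation of the variable roles after each step is an implementation choice that reproduces the argument order in the listing above.

```python
MASK = 0xFFFFFFFF

def rotl(x, s):
    """Rotate the 32-bit value x left by s bits."""
    x &= MASK
    return ((x << s) | (x >> (32 - s))) & MASK

def F(x, y, z):
    return ((x & y) | (~x & z)) & MASK

# Constants of the first round, in the order listed above.
T1 = [0xd76aa478, 0xe8c7b756, 0x242070db, 0xc1bdceee,
      0xf57c0faf, 0x4787c62a, 0xa8304613, 0xfd469501,
      0x698098d8, 0x8b44f7af, 0xffff5bb1, 0x895cd7be,
      0x6b901122, 0xfd987193, 0xa679438e, 0x49b40821]
S1 = [7, 12, 17, 22]  # shift amounts, repeating every four steps

def round_one(state, m):
    """Apply the 16 FF steps to state (a, b, c, d) with sub-blocks m[0..15]."""
    a, b, c, d = state
    for j in range(16):
        new = (b + rotl(a + F(b, c, d) + m[j] + T1[j], S1[j % 4])) & MASK
        # FF(a,b,c,d) is followed by FF(d,a,b,c), so the roles rotate each step
        a, b, c, d = d, new, b, c
    return a, b, c, d
```

After every four steps the variables are back in their original positions, so the returned tuple is again in the order (a, b, c, d).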

Second round


