---------------

What is hash?

Important features of hash

Implementation of Hash Functions

Primary hash algorithms

Security issues of Hash Algorithms

Application of Hash Algorithm

Conclusion

---------------

A hash function takes an input of any length (also called the pre-image) and uses a hash algorithm to convert it into an output of fixed length; that output is the hash value. The conversion is a compressing mapping: the space of hash values is usually much smaller than the input space, so different inputs may hash to the same output, and the input cannot be uniquely determined from the hash value.

Mathematical expression: h = H(M), where H() is the one-way hash function, M is the plaintext of any length, and h is the fixed-length hash value.

A hash algorithm used in the information security field must also have the following key properties:

First, of course, it is one-way. Computing the hash value from a pre-image is easy and fast, but it is computationally infeasible to construct a pre-image whose hash equals a given hash value; that is, computing M = H^-1(h) is not feasible. As a result, a hash value can stand for its input in a statistical sense, which is why a hash in cryptography is also called a "message digest": you can easily produce a digest of a message, but you cannot learn more about the message from it than the digest itself.

Second, it is collision-resistant: in a statistical sense, two pre-images with the same hash value cannot be produced. Given M, it is computationally infeasible to find M' such that H(M) = H(M'); this is weak collision resistance. It is also computationally infeasible to find any pair M and M' such that H(M) = H(M'); this is strong collision resistance. Strong collision resistance is required mainly to defend against the so-called "birthday attack". In a group of 10 people, the probability that someone else shares your birthday is 2.4%, while the probability that some two people in the same group share a birthday is 11.7%. Likewise, when the pre-image space is large, the algorithm must be strong enough that such "shared birthdays" cannot be found easily.
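The two birthday figures quoted above can be reproduced with a few lines of arithmetic. The sketch below assumes 365 equally likely birthdays and counts "you" as one of the 10 people; the function names are illustrative only.

```python
def p_shares_my_birthday(group_size: int, days: int = 365) -> float:
    """Probability that at least one of the other people shares MY birthday."""
    others = group_size - 1
    return 1 - ((days - 1) / days) ** others


def p_any_shared_birthday(group_size: int, days: int = 365) -> float:
    """Probability that at least two people in the group share A birthday."""
    p_all_distinct = 1.0
    for k in range(group_size):
        p_all_distinct *= (days - k) / days
    return 1 - p_all_distinct


if __name__ == "__main__":
    print(f"someone shares my birthday: {p_shares_my_birthday(10):.1%}")
    print(f"any two share a birthday:   {p_any_shared_birthday(10):.1%}")
```

The gap between the two numbers is the point: matching one fixed target (weak collision resistance) is much harder than finding any matching pair (strong collision resistance).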

Third, the mapping must be uniformly distributed, and so must its differences. In the hash result, the numbers of 0 bits and 1 bits should be roughly equal. A change of one bit in the input should change about half of the bits in the hash result, which is called the "avalanche effect"; conversely, producing a one-bit change in the hash result should require changing at least half of the input bits. In essence, the information in every input bit must be reflected as evenly as possible in every output bit, and every output bit must depend on as many input bits as possible.
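The avalanche effect can be observed empirically. The following sketch uses Python's standard `hashlib` MD5 (assumed to be available in the local build), flips each input bit in turn, and averages how many of the 128 output bits change; the helper names are illustrative.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Number of bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def average_avalanche(message: bytes) -> float:
    """Average number of MD5 output bits that change per single-bit input flip."""
    base = hashlib.md5(message).digest()
    n_bits = 8 * len(message)
    total = 0
    for i in range(n_bits):
        flipped = bytearray(message)
        flipped[i // 8] ^= 1 << (i % 8)  # flip exactly one input bit
        total += bit_difference(base, hashlib.md5(bytes(flipped)).digest())
    return total / n_bits


if __name__ == "__main__":
    print(average_avalanche(b"avalanche effect"))  # expected to be near 64 of 128
```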

Damgård and Merkle defined the so-called compression function, which converts a fixed-length input into a shorter fixed-length output; this has had a great influence on the design of cryptographic hash functions in practice. A hash function repeatedly applies a specific compression function to each message block together with the result of the previous compression, until the entire message has been processed; the final output is the hash value of the whole message. It is widely accepted that if the compression function is secure, hashing a message of any length in this way is also secure. This is the so-called Damgård/Merkle structure.

In this structure, a message of any length is split into blocks that meet the input requirements of the compression function; the last block may need specific padding bytes at the end. The blocks are processed in order. Each block is fed to the compression function together with the output of the compression function for the previous block, except that the first block uses the hash initialization value instead. Each output becomes part of the input for the next block, and the output of the last compression is the hash of the whole message.
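The iteration described above can be sketched with a deliberately weak stand-in for the compression function. Everything named here (`toy_compress`, `BLOCK_SIZE`, `IV`, the 64-bit state) is a hypothetical choice made for illustration and is not secure; only the chaining pattern reflects the structure.

```python
import struct

BLOCK_SIZE = 16                     # bytes per message block (toy value)
IV = 0x0123456789ABCDEF             # toy 64-bit initialization value
MASK64 = (1 << 64) - 1


def toy_compress(state: int, block: bytes) -> int:
    """Toy compression: (64-bit state, 16-byte block) -> 64-bit state. NOT secure."""
    for (word,) in struct.iter_unpack("<Q", block):
        state = (state ^ word) & MASK64
        state = (state * 0x9E3779B97F4A7C15) & MASK64
        state ^= state >> 29
    return state


def pad(message: bytes) -> bytes:
    """Append 0x80, zero bytes, then the 64-bit message length in bits."""
    padded = message + b"\x80"
    padded += b"\x00" * (-(len(padded) + 8) % BLOCK_SIZE)
    return padded + struct.pack("<Q", (8 * len(message)) & MASK64)


def toy_hash(message: bytes) -> bytes:
    """Chain the compression function over all blocks, starting from IV."""
    state = IV
    padded = pad(message)
    for offset in range(0, len(padded), BLOCK_SIZE):
        state = toy_compress(state, padded[offset:offset + BLOCK_SIZE])
    return struct.pack(">Q", state)


if __name__ == "__main__":
    print(toy_hash(b"abc").hex())
```

Real designs such as MD5 follow the same loop; they differ in the block size, the state size, and above all in the strength of the compression function.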

MD5 and SHA-1 are among the most widely used hash algorithms, and both are designed on the basis of MD4.

1) MD4

MD4 (RFC 1320) was designed by Ronald L. Rivest of MIT in 1990; MD is short for Message Digest. It is suited to fast software implementation on processors with a 32-bit word length, because it is built from operations on 32-bit operands. Unlike RSA, its security does not rest on a mathematical assumption. Den Boer, Bosselaers, and Dobbertin quickly attacked two of its three rounds using analytic and differential methods, showing that it was not as secure as expected, although the full algorithm had not actually been broken at that point, and Rivest quickly improved it.

The following are examples of MD4 hash results:

MD4 ("") = 31d6cfe0d16ae931b73c59d7e0c089c0

MD4 ("a") = bde52cb31de33e46245e05fbdbd6fb24

MD4 ("abc") = a448017aaf21d8525fc10ae87aa6729d

MD4 ("message digest") = d9130a8164549fe818874806e1c7014b

MD4 ("abcdefghijklmnopqrstuvwxyz") = d79e1c308aa5bbcdeea8ed63df412da9

MD4 ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789") = 043f8582f241db351ce627e153e7f0e4

MD4 ("12345678901234567890123456789012345678901234567890123456789012345678901234567890") = e33b4ddc9c38f2199c3e7b164fcc0536

2) MD5

MD5 (RFC 1321) is Rivest's improved version of MD4, published in 1991. Like MD4, it processes the input in 512-bit blocks, and its output is the concatenation of four 32-bit words. Compared with MD4, it makes the following improvements:

1) a fourth round was added;

2) each step has a unique additive constant;

3) the function G in the second round was changed from (X ∧ Y) ∨ (X ∧ Z) ∨ (Y ∧ Z) to (X ∧ Z) ∨ (Y ∧ ~Z) to make it less symmetric;

4) each step adds in the result of the previous step, which speeds up the avalanche effect;

5) the order in which the input words are accessed in rounds 2 and 3 was changed, to make these patterns less alike;

6) the circular left-shift amounts in each round were approximately optimized to speed up the avalanche effect, and the shifts differ from round to round.

Although MD5 is more complex than MD4 and somewhat slower, it is more secure and resists analytic and differential attacks better.

The message is first split into 512-bit blocks. The last block consists of "message tail + padding (100...0) + 64-bit message length", which ensures that messages of different lengths produce different blocks. Because the length field is 64 bits, the input that MD5 can safely handle must be shorter than 2^64 bits; any length information beyond 64 bits is ignored. Four 32-bit registers are initialized to A = 0x01234567, B = 0x89abcdef, C = 0xfedcba98, D = 0x76543210 (written low-order byte first, as in RFC 1321; as 32-bit integers these are 0x67452301, 0xefcdab89, 0x98badcfe, and 0x10325476). They take part in every operation and form the final hash result.
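The padding rule can be sketched as follows. This is a minimal illustration for byte-aligned messages, with the length stored low-order byte first as RFC 1321 specifies; `md5_pad` is an illustrative name, not a library function.

```python
import struct


def md5_pad(message: bytes) -> bytes:
    """Return message + padding (1 bit, then 0 bits) + 64-bit length in bits."""
    bit_length = (8 * len(message)) & 0xFFFFFFFFFFFFFFFF  # keep only the low 64 bits
    padded = message + b"\x80"                             # the single 1 bit
    padded += b"\x00" * (-(len(padded) + 8) % 64)          # zeros up to 56 mod 64 bytes
    return padded + struct.pack("<Q", bit_length)          # little-endian length


if __name__ == "__main__":
    for n in (0, 55, 56, 64):
        print(n, "bytes ->", len(md5_pad(b"a" * n)) // 64, "block(s)")
```

Note that a 56-byte message already needs a second block: the 1 bit and the 8-byte length no longer fit in the first one.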

Each 512-bit block then enters the main loop of the algorithm as sixteen 32-bit words; the number of 512-bit blocks in the message determines the number of loop iterations. The main loop has four rounds, and each round uses a different non-linear function:

F (X, Y, Z) = (X ∧ Y) ∨ (~X ∧ Z)

G (X, Y, Z) = (X ∧ Z) ∨ (Y ∧ ~Z)

H (X, Y, Z) = X ⊕ Y ⊕ Z

I (X, Y, Z) = Y ⊕ (X ∨ ~Z)

Using these four functions, each step operates on the sixteen 32-bit words of the current 512-bit block as follows: apply one of F, G, H, or I to three of the working copies a, b, c, d of the registers and add the result to the fourth; add a 32-bit message word and a 32-bit additive constant; rotate the sum left by a fixed number of bits; finally, add one of a, b, c, d and write the result back to one of a, b, c, d. This completes one step.
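As a sketch of the step just described, the four round functions and a single step can be written with 32-bit masking. The form a = b + ((a + func(b, c, d) + X[k] + T[i]) <<< s) follows RFC 1321; the Python names (`md5_step`, `rotl32`, `MASK32`) are illustrative.

```python
MASK32 = 0xFFFFFFFF


def F(x, y, z):
    return ((x & y) | (~x & z)) & MASK32


def G(x, y, z):
    return ((x & z) | (y & ~z)) & MASK32


def H(x, y, z):
    return (x ^ y ^ z) & MASK32


def I(x, y, z):
    return (y ^ (x | ~z)) & MASK32


def rotl32(value: int, shift: int) -> int:
    """Rotate a 32-bit value left by `shift` bits."""
    return ((value << shift) | (value >> (32 - shift))) & MASK32


def md5_step(func, a, b, c, d, message_word, constant, shift):
    """One MD5 step: returns the new value of `a`."""
    total = (a + func(b, c, d) + message_word + constant) & MASK32
    return (b + rotl32(total, shift)) & MASK32


if __name__ == "__main__":
    # First step for the empty message: X[0] = 0x80, T[1] = 0xd76aa478, s = 7.
    a, b, c, d = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476
    print(hex(md5_step(F, a, b, c, d, 0x00000080, 0xD76AA478, 7)))
```

F and G act as bitwise selectors (F picks y where x is 1 and z elsewhere), which is one way to sanity-check an implementation.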

The additive constants are defined as T[i] for i = 1...64, where T[i] is the integer part of 4294967296 × abs(sin(i)), with i in radians and 4294967296 = 2^32. The sine function is used to further remove linearity from the transformation.
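The table of additive constants can be generated directly from this definition. The sketch below relies on double-precision `math.sin`, which is the usual way this table is computed; `md5_constants` is an illustrative name.

```python
import math


def md5_constants() -> list:
    """T[1..64] from RFC 1321, returned as a 0-based list of 64 integers."""
    return [int(4294967296 * abs(math.sin(i))) for i in range(1, 65)]


if __name__ == "__main__":
    table = md5_constants()
    print(", ".join(f"0x{t:08x}" for t in table[:4]))
```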

After all 512-bit blocks have been processed, the concatenation of A, B, C, and D is output as the MD5 hash. Below are some examples of MD5 hash results:

MD5 ("") = d41d8cd98f00b204e9800998ecf8427e

MD5 ("a") = 0cc175b9c0f1b6a831c399e269772661

MD5 ("abc") = 900150983cd24fb0d6963f7d28e17f72

MD5 ("message digest") = f96b697d7cb7938d525a2f31aaf161d0

MD5 ("abcdefghijklmnopqrstuvwxyz") = c3fcd3d76192e4007dfb496cca67e13b

MD5 ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789") = d174ab98d277d9f5a5611c2c9f419d9f

MD5 ("12345678901234567890123456789012345678901234567890123456789012345678901234567890") = 57edf4a22be3c955ac49da2e2107b67a

Refer to the RFC documents for detailed descriptions of the MD4 and MD5 algorithms and for their C source code.

3) SHA-1 and others

SHA-1 was designed by NIST and the NSA for use with DSA; for details, see "FIPS PUB 180-1 Secure Hash Standard" at http://www.itl.nist.gov/fipspubs. For input shorter than 2^64 bits it produces a 160-bit hash value, so it resists brute force better. SHA-1 is based on the same design principles as MD4 and is modeled on that algorithm. Because it produces a 160-bit hash value, it uses five 32-bit registers. Message blocking and padding follow the same scheme as MD5, and the main loop also has four rounds, but with 20 operations per round. The non-linear, shift, and addition operations are similar to those in MD5, but the non-linear functions, the additive constants, and the circular left shifts are designed differently; see the specification mentioned above for these details. Here are some examples of SHA-1 hash results:

SHA1 ("abc") = a9993e36 4706816a ba3e2571 7850c26c 9cd0d89d

SHA1 ("abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq") = 84983e44 1c3bd26e baae4aa1 f95129e5 e54670f1
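Example values like these can be checked against Python's standard `hashlib` module. The sketch assumes the local build exposes `md5` and `sha1` (MD4 is often unavailable in recent builds, so it is not used here); `hex_digest` is an illustrative helper.

```python
import hashlib


def hex_digest(algorithm: str, text: str) -> str:
    """Hex digest of an ASCII string under the named hashlib algorithm."""
    return hashlib.new(algorithm, text.encode("ascii")).hexdigest()


if __name__ == "__main__":
    print("SHA1(abc) =", hex_digest("sha1", "abc"))
    print("MD5(abc)  =", hex_digest("md5", "abc"))
```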

Other well-known hash algorithms include MD2, N-Hash, RIPE-MD, and HAVAL. All of the above are "pure" hash algorithms. There are two other kinds. The first is the one-way hash built on a symmetric block cipher; typical examples are the Davies-Meyer algorithm based on DES and the improved Davies-Meyer algorithm based on IDEA, both of which are currently considered secure. The second kind is based on modular arithmetic or discrete logarithms, that is, on public-key algorithms; because of its high computational cost, it has poor prospects for practical use.

Most algorithms that could not withstand analytic and differential attacks died in the laboratory. So if the popular hash algorithms fully meet the cryptographic requirements of one-wayness and collision resistance, brute force is the only way to undermine the security of the hash. To defeat weak collision resistance, we may need to try 2^128 or 2^160 different inputs, as many as there are values in the hash space; at present a high-end PC might need 10^25 years for this arduous task, and even the most powerful parallel systems could not do it in thousands of years. The "birthday attack", however, effectively reduces the number of trials needed to about 1.2 × 2^64 or 1.2 × 2^80. Strong collision resistance is therefore the key to the security of a hash algorithm.
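The factor of about 1.2 comes from the standard birthday approximation: for N equally likely hash values, roughly sqrt(2 N ln 2) random trials give a 50% chance of a collision. The sketch below assumes an ideal hash with uniformly distributed outputs; `birthday_trials` is an illustrative name.

```python
import math


def birthday_trials(hash_bits: int, probability: float = 0.5) -> float:
    """Approximate number of random inputs needed to find a collision
    with the given probability, for an ideal hash of `hash_bits` bits."""
    space = 2.0 ** hash_bits
    return math.sqrt(2.0 * space * math.log(1.0 / (1.0 - probability)))


if __name__ == "__main__":
    for bits in (128, 160):
        factor = birthday_trials(bits) / 2.0 ** (bits // 2)
        print(f"{bits}-bit hash: about {factor:.2f} * 2^{bits // 2} trials")
```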

NIST's new Advanced Encryption Standard (AES) uses keys of 128, 192, and 256 bits, so SHA-256, SHA-384, and SHA-512 were designed to match; they provide better security.

The application of the hash algorithm in information security is mainly reflected in the following three aspects:

1) File verification

We are familiar with the parity and CRC check algorithms. Neither can defend against data tampering: to some extent they can detect and correct channel errors in data transmission, but they cannot prevent malicious modification of the data.

The "digital fingerprint" property of the MD5 hash algorithm has made it the most widely used file-integrity checksum algorithm, and many Unix systems provide a command for computing MD5 checksums. It is commonly used in the following two situations:

The first is verification after file transfer. The MD5 checksum of the received file is compared with the MD5 checksum of the source file; if the two match, then in a statistical sense every symbol of the two files is identical. This detects errors introduced during transmission and, more importantly, helps ensure that the file was not maliciously modified in transit. A typical application is an FTP service, where the check confirms that a resumed transfer completed correctly, especially for files downloaded from mirror sites.
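A transfer check of this kind can be sketched with `hashlib`, reading the file in chunks so that large downloads do not have to fit in memory. The function names and the idea of a separately published checksum string are assumptions for illustration.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 65536) -> str:
    """MD5 checksum of a file, computed incrementally."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_ok(received_path: str, published_md5: str) -> bool:
    """Compare the received file's checksum with the one the source published."""
    return file_md5(received_path) == published_md5.strip().lower()
```

The comparison is only as trustworthy as the channel that delivered the published checksum.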

A better solution is code signing: along with the file, the provider supplies a digital signature over the file's hash value, made with its own code-signing key, together with its code-signing certificate. The recipient not only verifies the integrity of the file but also decides whether to accept it based on how much they trust the certificate issuer and the certificate owner. Browsers use this model when downloading and running plug-ins and Java applets.

The second is to store digital fingerprints of the binary files in a file system, in order to detect unauthorized modification. Many system management and system security packages provide a file system integrity check. After the system is first installed, a baseline database of file checksums is built; because hash checksums are short, they fit easily on a small storage medium. The checksums can then be recomputed regularly or on demand. Any mismatch with the saved value indicates that the file has been illegally modified, infected by a virus, or replaced by a trojan. Tripwire is a typical example of such an application.
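The baseline-and-recheck idea can be sketched as follows. This is a minimal illustration and not how Tripwire itself works; `build_baseline` and `find_changes` are hypothetical names, and MD5 is used only because the text does.

```python
import hashlib
from pathlib import Path


def build_baseline(root: str) -> dict:
    """Map each file's path (relative to root) to its MD5 checksum."""
    baseline = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            key = str(path.relative_to(root))
            baseline[key] = hashlib.md5(path.read_bytes()).hexdigest()
    return baseline


def find_changes(root: str, baseline: dict) -> dict:
    """Compare the current state of root against a saved baseline."""
    current = build_baseline(root)
    return {
        "modified": sorted(p for p in baseline if p in current and current[p] != baseline[p]),
        "missing": sorted(p for p in baseline if p not in current),
        "added": sorted(p for p in current if p not in baseline),
    }
```

In practice the baseline would be saved somewhere an intruder cannot rewrite it, which is the concern the next paragraph addresses.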

An even better approach is to use a MAC. "MAC" is a term closely related to hashing: it stands for Message Authentication Code, a hash value that depends on a key, so that only someone who holds the key can verify it. The digital fingerprints of a file system can then be stored on untrusted media, since they are meaningful only to those who hold the key. Moreover, when a file's fingerprint legitimately needs to change, only the key holder can compute the new value, so anyone trying to undermine file integrity cannot succeed.
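One common way to build such a keyed hash is HMAC, available in Python's standard `hmac` module. The text does not name a specific MAC construction, so HMAC with SHA-256 is an assumption here, as are the helper names.

```python
import hashlib
import hmac


def make_tag(key: bytes, data: bytes) -> str:
    """Keyed fingerprint of data; cannot be recomputed without the key."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_tag(key: bytes, data: bytes, tag: str) -> bool:
    """Check a stored tag using a constant-time comparison."""
    return hmac.compare_digest(make_tag(key, data), tag)


if __name__ == "__main__":
    key = b"example key held only by the administrator"
    tag = make_tag(key, b"contents of a system binary")
    print(verify_tag(key, b"contents of a system binary", tag))
    print(verify_tag(key, b"tampered contents", tag))
```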

2) Digital Signature

Hash algorithms are also an important part of modern cryptographic systems. Due to the slow operation speed of asymmetric algorithms, one-way hashing plays an important role in Digital Signature protocols.

In such a signature protocol, the two parties must agree in advance on a hash function and a signature algorithm that both support.

The signer first computes the hash value of the data file, which is very short (16 bytes for MD5, 20 bytes for SHA-1), and then applies the asymmetric signature operation to that hash value. To verify the signature, the other party first computes the hash value of the data file and then uses the asymmetric algorithm to verify the digital signature against it.

Statistically, this can be regarded as equivalent to digitally signing the file itself. The protocol also has other advantages:

First, the data file can be stored separately from its hash value, and the signature can be verified against the hash value without the data file itself being present.

Second, in some cases the signing key may be the same as the decryption key; that is, signing a data file is the same operation as asymmetric decryption, which is quite dangerous. A malicious attacker could take a file they want to trick you into decrypting and send it to you as a file that needs your signature. Signing the hash value of a data file instead is always safe.
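The hash-then-sign flow described above can be sketched with textbook RSA in pure Python. This is a toy under loud assumptions: the key is built from two small known Mersenne primes, there is no padding scheme, and MD5 is used only because its 128-bit digest fits below the modulus. It must not be used for real signatures; all names are illustrative.

```python
import hashlib

# Toy RSA key from two Mersenne primes. Far too small for real use.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                               # public modulus (about 150 bits)
E = 65537                               # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))       # private exponent (needs Python 3.8+)


def hash_as_int(data: bytes) -> int:
    """128-bit MD5 digest of the data, as an integer smaller than N."""
    return int.from_bytes(hashlib.md5(data).digest(), "big")


def sign(data: bytes) -> int:
    """Sign the short hash value, not the (possibly huge) data itself."""
    return pow(hash_as_int(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Recompute the hash and check it against the opened signature."""
    return pow(signature, E, N) == hash_as_int(data)


if __name__ == "__main__":
    sig = sign(b"important contract")
    print(verify(b"important contract", sig))
    print(verify(b"important contract (edited)", sig))
```

Because the private-key operation is only ever applied to a hash value, an attacker cannot use a signing request to get an arbitrary ciphertext decrypted.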

3) Authentication Protocol

The following authentication protocol, also known as "challenge-response" authentication, is a simple and secure method for cases where the transmission channel can be eavesdropped on but not tampered with.

The authenticating party sends a random string (the "challenge") to the party being authenticated. The latter hashes the random string together with its authentication password and returns the result (the "response") to the authenticating party. The authenticating party compares the received hash value with the hash it computes itself from the random string and the other party's password. If the two are the same, it can conclude, in a statistical sense, that the other party holds the password, and authentication succeeds.

The POP3 Protocol provides a typical example of this application:

S: +OK POP3 server ready <1896.697170952@dbc.mtview.ca.us>

C: APOP mrose c4c9334bac560ecc979e58001b3e22fb

S: +OK maildrop has 1 message (369 octets)

In the POP3 session above, the symmetric key shared by the two parties (the authentication password) is tanstaaf, and the server's challenge is <1896.697170952@dbc.mtview.ca.us>. The client's response to the challenge is MD5 ("<1896.697170952@dbc.mtview.ca.us>tanstaaf") = c4c9334bac560ecc979e58001b3e22fb, and this correct response authenticates the client.
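The client's response in this session can be recomputed with `hashlib`: the challenge and the shared password are concatenated with nothing in between and hashed with MD5. The function names below are illustrative; the values are those of the session above.

```python
import hashlib
import hmac


def apop_response(challenge: str, password: str) -> str:
    """MD5 hex digest of the server challenge followed directly by the password."""
    return hashlib.md5((challenge + password).encode("ascii")).hexdigest()


def server_accepts(challenge: str, password: str, response: str) -> bool:
    """Server side: recompute the expected digest and compare."""
    return hmac.compare_digest(apop_response(challenge, password), response.lower())


if __name__ == "__main__":
    challenge = "<1896.697170952@dbc.mtview.ca.us>"
    print("APOP mrose", apop_response(challenge, "tanstaaf"))
```

Since the server picks a fresh challenge for every session, a captured response cannot simply be replayed later.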

Hash algorithms have long been widely used in computer science. With the development of modern cryptography, one-way hash functions have become an important building block in the field of information security, and there is good reason to study their design theory and application methods in depth.