Introduction: What is a Hashing Algorithm?

Source: Internet
Author: User
1. What is a hash and what is it for?
Let me give you an example. To take part in social life, each of us needs some mark that identifies us. You might think your name or ID number is enough, but such identifiers are fragile: many people share the same name, and IDs can be forged. The most reliable way to represent a person would be to record their entire genetic sequence, but that is obviously impractical. Fingerprints are a good compromise: although some specialized organizations can imitate a person's fingerprints, doing so is very expensive.

Identifying files transmitted over the Internet is equally important. For example, when we download a file, it passes through many servers and routers on the way. How can we be sure the file we receive is the one we wanted? We cannot compare it byte by byte with the original, and we cannot rely on the file name or file size, which are trivially easy to fake. What we need is a fingerprint-like mark for checking the file's integrity, and that fingerprint is exactly what the hash algorithms we use today compute.

A hash algorithm, also called a hash function, is a method of creating a small digital "fingerprint" from any file. Like a fingerprint, the short value it produces identifies the file with near certainty. Every byte of the file affects this value, and it is very difficult to work backward from the value to the file. So when the original file changes, its hash value changes as well, telling the user that the current file is no longer the one they need.
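As a minimal sketch, assuming Python's standard `hashlib` module and SHA-256 as the hash (the text does not name a specific algorithm here), a file fingerprint can be computed by feeding the file to the hash in chunks. The function name `file_fingerprint` and the chunk size are illustrative choices. Flipping a single byte of the file yields a completely different fingerprint.

```python
import hashlib
import os
import tempfile


def file_fingerprint(path: str, chunk_size: int = 64 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)  # every byte of the file contributes to the digest
    return h.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original.bin")
        tampered = os.path.join(tmp, "tampered.bin")
        data = b"some downloaded content" * 1000
        with open(original, "wb") as f:
            f.write(data)
        with open(tampered, "wb") as f:
            f.write(data[:-1] + b"!")  # change only the last byte
        print(file_fingerprint(original))
        print(file_fingerprint(tampered))
```

A publisher can list the expected fingerprint next to a download link; after downloading, recomputing the fingerprint and comparing the two strings tells the user whether the file arrived intact.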

Why does this mark matter? The file download above is one example; in fact, most network deployment and version control tools now use hash algorithms to verify files. File system synchronization and backup tools also use hash values to identify file contents uniquely, so that unchanged files can be skipped and system overhead reduced. Many cloud storage services rely on this.
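The synchronization case can be sketched as follows, assuming each side keeps a manifest that maps file names to content digests. The manifest format and the helper names are hypothetical, and SHA-256 is again an assumed choice. Only files whose digest differs from the remote copy need to be transferred, which is how hashing cuts overhead.

```python
import hashlib
from typing import Dict, List


def digest(content: bytes) -> str:
    """SHA-256 hex digest of a file's content."""
    return hashlib.sha256(content).hexdigest()


def build_manifest(files: Dict[str, bytes]) -> Dict[str, str]:
    """Map each file name to the digest of its content (hypothetical manifest format)."""
    return {name: digest(content) for name, content in files.items()}


def files_to_upload(local: Dict[str, bytes], remote_manifest: Dict[str, str]) -> List[str]:
    """Return names of local files that are new or whose content changed.

    Only digests are compared, so unchanged files never need to be re-sent.
    """
    local_manifest = build_manifest(local)
    return sorted(
        name for name, d in local_manifest.items() if remote_manifest.get(name) != d
    )


if __name__ == "__main__":
    remote = build_manifest({"a.txt": b"hello", "b.txt": b"world"})
    local = {"a.txt": b"hello", "b.txt": b"world!", "c.txt": b"new"}
    print(files_to_upload(local, remote))  # ['b.txt', 'c.txt']
```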

Version control tools such as Git use hash functions like SHA-1 to identify file contents and detect updates.
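For example, Git (in its default SHA-1 object format) names a file's content by hashing a short header followed by the bytes: `blob <size>\0<content>`. The sketch below reproduces that object ID with `hashlib`; it covers only blobs, not trees or commits, and the function name `git_blob_id` is illustrative. Its result should match what `git hash-object` prints for the same content.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 object ID that Git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    print(git_blob_id(b""))                # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the empty blob)
    print(git_blob_id(b"test content\n"))
```

Because the ID depends only on content, Git can tell that a file is unchanged simply by seeing that its ID is the same as before.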

Of course, as a fingerprint, the most important use of hash algorithms is protecting high-security content such as certificates, documents, and passwords. This use relies mainly on the irreversibility of the hash algorithm, which shows up in two ways: it is infeasible to recover the original file from its fingerprint, and it is also infeasible to construct a file whose fingerprint matches a given target fingerprint. This irreversibility underpins many security frameworks, and it is also the focus of this article.

2. What are the characteristics of a hash algorithm?
A good hash algorithm has the following properties:

Fast forward computation: Given the plaintext and the hash algorithm, the hash value can be computed quickly with limited time and resources.
Hard to reverse: Given one or more hash values, it is difficult (practically impossible) to recover the plaintext in a reasonable amount of time.
Input sensitivity: If the input changes even slightly, the resulting hash value should look completely different.
Collision resistance: It is difficult to find two different plaintexts with the same hash value (a collision). That is, the probability that two different data blocks share a hash value is extremely small, and given a data block, it is extremely difficult to find another data block with the same hash value.
Different use cases, such as data structures and security, emphasize different properties.
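Collision resistance depends heavily on output length. As a sketch, assuming SHA-256 truncated to its first 2 bytes (16 bits) as a deliberately weakened hash, a simple search typically finds two different inputs with the same truncated value after a few hundred tries. That matches the birthday bound of roughly 2^(n/2) attempts for an n-bit output; the full 256-bit output would push this to about 2^128 attempts. The helper names are illustrative.

```python
import hashlib
from typing import Dict, Tuple


def truncated_hash(data: bytes, n_bytes: int) -> bytes:
    """First n_bytes of SHA-256: a deliberately weak, short hash for demonstration."""
    return hashlib.sha256(data).digest()[:n_bytes]


def find_collision(n_bytes: int) -> Tuple[bytes, bytes, int]:
    """Hash b"msg-0", b"msg-1", ... until two messages share a truncated digest.

    Returns the two colliding messages and the number of messages tried.
    By the pigeonhole principle this stops within 256**n_bytes + 1 tries.
    """
    seen: Dict[bytes, bytes] = {}
    i = 0
    while True:
        msg = f"msg-{i}".encode()
        h = truncated_hash(msg, n_bytes)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg
        i += 1


if __name__ == "__main__":
    a, b, tries = find_collision(2)  # 16-bit output
    print(a, b, tries)
    print(truncated_hash(a, 2).hex(), truncated_hash(b, 2).hex())
```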

2.1 Hashing in data structures
In data structures managed by hashing, speed matters most, and collision resistance matters much less as long as hash values are evenly distributed. In a hash map, for example, the hash of a key speeds up lookups of key-value pairs by placing each entry in an appropriate bucket, so occasional collisions are acceptable; the hash only needs to spread entries roughly evenly across the buckets. The overall performance of the structure, however, depends directly on how fast hash values are computed, so speed is especially important here. Take the String.hashCode() method in the JDK (as implemented in JDK 8) as an example:
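public int hashCode() {
    int h = hash;  // cached hash code; defaults to 0 (not yet computed)
    if (h == 0 && value.length > 0) {
        char[] val = value;  // value: the String's internal char storage
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];  // multiply-add; int overflow wraps modulo 2^32
        }
        hash = h;  // cache the result for later calls
    }
    return h;
}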

This is a very concise multiply-and-add iteration. Many other hash algorithms iterate with XOR combined with addition (or multiplication) instead, at similar speed.
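To compare both styles, here is a Python sketch (written for illustration, not JDK code) that reproduces Java's `31 * h + c` recurrence with 32-bit signed overflow, plus FNV-1a, a well-known non-cryptographic hash that iterates with XOR and multiplication. The bucket index uses a plain `h mod n`, which is a simplification: Java's `HashMap` additionally mixes the high bits of the hash before choosing a bucket.

```python
def java_string_hash(s: str) -> int:
    """Mimic Java's String.hashCode(): h = 31*h + c over UTF-16 code units, as a signed 32-bit int."""
    units = s.encode("utf-16-be")
    h = 0
    for i in range(0, len(units), 2):
        code_unit = (units[i] << 8) | units[i + 1]
        h = (31 * h + code_unit) & 0xFFFFFFFF  # wrap like Java int arithmetic
    return h - (1 << 32) if h >= (1 << 31) else h  # reinterpret as signed


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR each byte into the state, then multiply by the FNV prime."""
    h = 0x811C9DC5  # FNV offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF  # FNV prime, truncated to 32 bits
    return h


def bucket(h: int, n_buckets: int) -> int:
    """Simplified bucket choice; Python's % is non-negative for a positive modulus."""
    return h % n_buckets


if __name__ == "__main__":
    print(java_string_hash("hello"))                       # 99162322, same as "hello".hashCode()
    print(java_string_hash("Aa"), java_string_hash("BB"))  # 2112 2112: a collision, fine for a hash table
    print(hex(fnv1a_32(b"a")))                             # 0xe40c292c
    words = ["apple", "banana", "cherry", "date", "elderberry", "fig", "grape"]
    print([bucket(java_string_hash(w), 8) for w in words])
```

The "Aa"/"BB" collision illustrates the point above: a hash table tolerates such collisions, whereas a cryptographic hash must not allow them to be found easily.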

2.2 Hashing in cryptography
In cryptography, hash algorithms are mainly used for message digests and digital signatures, that is, for verifying the integrity of a whole message. Password storage is a familiar example. When we log in to Zhihu, we enter a password. If Zhihu stored passwords in plain text, a hacker who breached its database could read everyone's password and log in as them, which is extremely insecure. Instead, Zhihu can run each password through a hash algorithm and store only the resulting hash value on the server. Because the hash is irreversible, a hacker who steals the stored value still cannot recover the password from it. When you enter your password on the login page, the server recomputes its hash and compares it with the stored value; if they match, you have shown that you know the password, and you are allowed to log in. Banks work the same way: they do not store users' passwords in plain text, only their hash values. In these scenarios the requirements for collision resistance and tamper resistance are extremely high, while speed is secondary. A well-designed hash algorithm is highly collision-resistant. Taking MD5 as an example, its output is 128 bits, so by design the chance that two different inputs share the same hash value is on the order of 1/2^128, an extremely small number. Professor Wang Xiaoyun's 2004 attack showed that MD5 collisions can be found far faster than this design bound, although finding a second input that matches the hash of a specific target file remains impractical. For two similar strings, the MD5 results are as follows:
MD5("version1") = "966634ebf2fc135707d6753692bf4b1e";
MD5("version2") = "2e0e95285f08a07dea17e7ee111b21c8";

Changing just one character (the digits 1 and 2 differ in only two bits) makes the two MD5 values completely different.
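The comparison can be reproduced with Python's `hashlib`. The sketch below also counts how many of the 128 output bits differ; an ideal hash flips about half of them (around 64) for any change to the input. The helper names are illustrative, and MD5 appears here only to mirror the example above, not as a recommendation.

```python
import hashlib


def md5_hex(text: str) -> str:
    """MD5 digest of a UTF-8 string, as 32 hex characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def bit_difference(hex_a: str, hex_b: str) -> int:
    """Number of bit positions in which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


if __name__ == "__main__":
    h1 = md5_hex("version1")
    h2 = md5_hex("version2")
    print(h1)
    print(h2)
    print(bit_difference(h1, h2), "of 128 bits differ")
```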

PS: Hash algorithms are often described as encryption algorithms, but this is not accurate. Encryption always comes paired with decryption, while a hash is deliberately designed so that it cannot be reversed. Also, if no random salt is added, hashed passwords are easily cracked by dictionary attacks, because an attacker can hash a list of common passwords and look for matching values.
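A minimal sketch of salted password storage, assuming Python's `hashlib.pbkdf2_hmac` (a deliberately slow, iterated construction) rather than a single fast hash call. The iteration count, salt length, and function names are illustrative assumptions, not values from the text. A fresh random salt makes identical passwords produce different stored values, so one precomputed dictionary of hashes cannot be reused across accounts, and `hmac.compare_digest` compares keys without leaking timing information.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 200_000  # illustrative work factor; tune for your hardware


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key) for storage; the plaintext password is never stored."""
    salt = os.urandom(16)  # fresh random salt per password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Recompute the key from the login attempt and compare it in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(key, stored_key)


if __name__ == "__main__":
    salt, key = hash_password("correct horse")
    print(verify_password("correct horse", salt, key))  # True
    print(verify_password("wrong guess", salt, key))    # False
    _, key2 = hash_password("correct horse")
    print(key == key2)  # False: different salts give different stored values
```

This mirrors the login flow described earlier: the server stores only the salt and derived key, then recomputes and compares on each login attempt.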

3. How is a hash algorithm implemented?
With the development of cryptography and information security, today's encryption and hashing algorithms cannot be fully explained in a few words. Here we only present a few simple concepts for your reference.

The primary job of a hash algorithm is to summarize a large file in a few characters while ensuring that every byte affects the final result. You may already have guessed that the modulo operation can meet this need.

In fact, modular arithmetic, which maps many inputs to the same result and so cannot be uniquely reversed, is a foundation of modern cryptography: it appears almost everywhere in computer security and encryption, and hash algorithms are no exception. One of the most primitive hash algorithms simply takes the input modulo a chosen number, as in the following program.
# Construct a hash function
def simple_hash(a):
    return a % 8

# Test the hash function
print(simple_hash(233))
print(simple_hash(234))
print(simple_hash(235))

# Output:
# 1
# 2
# 3

The program above achieves the primary goal of a hash algorithm: it represents arbitrarily long content with a small amount of text (the result of the modulo is always less than 8). But you may have noticed that results computed with the modulo operation alone follow an obvious pattern (consecutive inputs give consecutive outputs), which makes it hard for the algorithm to be irreversible. So we add another operation: XOR.

Let's look at the next program, where we add an XOR step to the hash function.
# Construct a hash function, now with an XOR step
def simple_hash(a):
    return (a % 8) ^ 5

# Test the hash function
print(simple_hash(233))
print(simple_hash(234))
print(simple_hash(235))

# Output:
# 4
# 7
# 6

After adding an XOR step, the pattern in the results is much less obvious: consecutive inputs no longer give consecutive outputs.

Of course, you may feel that such an algorithm is still very insecure. If an attacker feeds in a series of systematically varied inputs and compares the outputs, they may well discover the pattern hidden in the algorithm. But there are other tricks: for example, transform the input before reducing it, or add extra operations such as bit shifts, as in the following program.
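# Construct a hash function with an extra shift and offset
def simple_hash(a):
    # % binds tighter than ^, so this is ((a + 2 + (a << 1)) % 8) ^ 5
    return (a + 2 + (a << 1)) % 8 ^ 5

# Test the hash function
print(simple_hash(233))
print(simple_hash(234))
print(simple_hash(235))

# Output:
# 0
# 5
# 6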

A hash function built this way hides its internal pattern better: we cannot easily name an input for which the function above returns 4, except by exhaustive testing. (Strictly speaking, a + (a << 1) is just 3a, so a careful analyst could still solve this toy by hand; real hash functions are designed so that no such shortcut exists.)
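The exhaustive test is easy to run for this toy function. The sketch below redefines the third function (under the illustrative name `simple_hash`, so the block is self-contained), tries every input in a small range, and keeps those that hash to 4. Because the output has only 8 possible values, matching inputs turn up almost immediately, which is why real hash functions use outputs of 128 bits or more.

```python
from typing import List


def simple_hash(a: int) -> int:
    """The third toy hash from the text: ((a + 2 + (a << 1)) % 8) ^ 5."""
    return (a + 2 + (a << 1)) % 8 ^ 5


def find_preimages(target: int, limit: int) -> List[int]:
    """Exhaustively test inputs 0..limit-1 and return those that hash to target."""
    return [a for a in range(limit) if simple_hash(a) == target]


if __name__ == "__main__":
    print(find_preimages(4, 40))  # [5, 13, 21, 29, 37]
```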

Does the algorithm above look very simple? In fact, the widely used MD5 and SHA-1, introduced below, are built from the same basic kinds of operations (modular addition, XOR, shifts and rotations), but they add many more rounds and computations to make the hash function reliable.
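To make "more rounds and computations" concrete, here is a toy sketch of the general shape that MD5 and SHA-1 share. It is not MD5, not SHA-1, and not secure. The message is padded, split into 512-bit blocks, and each block goes through several rounds of addition, XOR, and bit rotation on 32-bit words, with a running state carried from block to block. The padding follows the SHA-1 style (a 0x80 byte, zeros, then the message length in bits); the round function, rotation amounts, and the name `toy_hash` are invented for illustration.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit word left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def pad(message: bytes) -> bytes:
    """SHA-1-style padding: append 0x80, zeros, then the 64-bit big-endian bit length.

    The result is a multiple of 64 bytes (512 bits).
    """
    bit_length = (len(message) * 8) & 0xFFFFFFFFFFFFFFFF
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    return padded + bit_length.to_bytes(8, "big")


def toy_hash(message: bytes, rounds: int = 4) -> str:
    """Toy 64-bit hash built from add/xor/rotate rounds. NOT secure; illustration only."""
    state = (0x67452301, 0xEFCDAB89)  # starting constants borrowed from MD5's initial values
    data = pad(message)
    for offset in range(0, len(data), 64):
        block = data[offset:offset + 64]
        words = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
        a, b = state
        for _ in range(rounds):  # several passes over the same block
            for w in words:
                a = rotl32((a + w + b) & MASK32, 7)  # modular addition, then rotation
                b = rotl32(b ^ a, 13)                # XOR, then rotation
        # Add this block's result to the previous state (feed-forward)
        state = ((state[0] + a) & MASK32, (state[1] + b) & MASK32)
    return "".join(f"{x:08x}" for x in state)


if __name__ == "__main__":
    print(toy_hash(b"abc"))
    print(toy_hash(b"abd"))
```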

4. What are the popular hash algorithms?
Widely used hash algorithms include MD5, SHA-1, and SHA-2.

MD4 (RFC 1320) was designed by Ronald L. Rivest of MIT in 1990. MD stands for Message Digest. Its output is 128 bits. MD4 has been shown to be insecure.

MD5 (RFC 1321) is Rivest's 1991 improvement on MD4. It still processes the input in 512-bit blocks, and its output is 128 bits. MD5 is more complex than MD4; it computes a little more slowly but is more secure. MD5 has been shown not to have "strong collision resistance".

SHA (Secure Hash Algorithm) is a family of hash functions. NIST (National Institute of Standards and Technology) published the first algorithm in the family in 1993. The well-known SHA-1 was released in 1995; its output is a 160-bit hash value, which makes it more resistant to brute-force search. SHA-1 was designed on the same principles as MD4 and closely imitates it. SHA-1 has also been shown not to have "strong collision resistance".

To improve security, NIST also designed the SHA-224, SHA-256, SHA-384, and SHA-512 algorithms (collectively called SHA-2), which are similar in principle to SHA-1. The SHA-3 family, based on the Keccak algorithm, was later standardized by NIST in FIPS 202 (2015).
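Python's `hashlib` exposes most of these algorithms, which makes it easy to compare their output lengths. MD4 is left out because it is often unavailable in modern builds. A short sketch, with `digest_bits` as an illustrative helper:

```python
import hashlib

ALGORITHMS = ["md5", "sha1", "sha224", "sha256", "sha384", "sha512", "sha3_256"]


def digest_bits(name: str) -> int:
    """Output length in bits of a hashlib algorithm."""
    return hashlib.new(name).digest_size * 8


if __name__ == "__main__":
    for name in ALGORITHMS:
        sample = hashlib.new(name, b"abc").hexdigest()[:16]
        print(f"{name:>8}: {digest_bits(name):>3} bits  {sample}...")
```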

As this shows, the most important difference among these popular algorithms is "strong collision resistance": MD4, MD5, and SHA-1 no longer provide it, while no practical collision attack is known against SHA-2.