My views on Wang Xiaoyun's MD5 cracking

Last Update:2018-12-03 Source: Internet

Author: User


According to a CSDN report, Wang Xiaoyun, a Chinese mathematician, presented an algorithm at CRYPTO 2004 that can successfully crack MD5. Both gigix and Wang Xiong quoted related reports in their blogs.

MD5 is a digest algorithm, so it is theoretically impossible to recover the original text from the signature (see the explanation below). Taking "cracking" to mean recovering the original text is a misunderstanding of digest algorithms. A digest is usually used in digital signatures to confirm the authenticity of the original text, that is, that it has not been modified since it was signed. If different original texts can be made to produce the same signature, the signature may be invalid, which shows that the digest algorithm is insecure.

I have read Wang Xiaoyun's report, which RL provided. Because this is not my field, I am not familiar with the implementation of MD5 or with the theories the report cites, so I did not fully understand it. Only about one and a half pages of the report deal with cracking MD5, and the specific algorithm is not described in detail. However, two collision examples with 1024-bit original texts are given at the end, an advance on the 512-bit collision that others proposed in 1996, and the computation is said to take about an hour.

All of this is progress. Dismissing it as boasting would be rather arrogant.

Wang Xiaoyun's findings prove that there is a way to generate collisions, but as an anonymous commenter on gigix's blog said, these are only non-specific collisions, and forging a signature requires generating a specific collision. Therefore MD5 has not been completely cracked, but this is already a major breakthrough.

I do not have a book that describes specific cryptographic algorithms, such as "Applied Cryptography"; the only book I have is Lu Kaicheng's "Computer Cryptography", which mainly introduces the principles and applicable fields of several cryptographic algorithms from a mathematical perspective. Cryptography is built on advanced mathematics such as number theory, group theory, and finite fields, and my mathematics is not up to it, so I cannot lay out the theory point by point; I can only speak in general terms. ^O^

The first thing I want to talk about is why we need ciphers at all. The answer is that our usual communication environment is insecure.

What is an insecure communication environment? The insecurity shows in at least two ways: first, the content of a communication may be stolen; second, the content may be tampered with.

Ciphers are usually used to solve these two problems.

If there is a way to defeat the purpose of a cipher, that cipher can be said to be cracked.

Of course, there are other aspects of insecurity. Besides ciphers, those problems also require special protocols; they are rarely encountered and will not be covered here.

There are many kinds of ciphers, of which three are the most commonly used:
1. Symmetric ciphers
2. Asymmetric ciphers
3. Digests

Symmetric ciphers: encryption and decryption use the same key, and sometimes even the same algorithm. Examples range from the simplest XOR to the commonly used DES, Blowfish, and IDEA. They are generally used as follows:

The sender encrypts the plaintext (M) with the key (K): E = ENC(M, K)
Then the ciphertext (E) is transmitted to the receiver over an insecure network. The receiver decrypts E using the same key (K): M = DEC(E, K)

As long as the algorithm is good enough and the key (K) is kept secret, this communication is secure, because even if others know the ciphertext (E) and the ENC/DEC algorithms, they cannot learn the plaintext (M).
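The exchange above can be sketched with the simplest cipher the text mentions, XOR. This is a toy illustration only: the function names `enc` and `dec` simply mirror the article's ENC/DEC, and the key and message are made-up values.

```python
from itertools import cycle


def enc(m: bytes, k: bytes) -> bytes:
    """E = ENC(M, K): XOR every plaintext byte with the repeating key."""
    return bytes(b ^ kb for b, kb in zip(m, cycle(k)))


def dec(e: bytes, k: bytes) -> bytes:
    """M = DEC(E, K): XOR is its own inverse, so decryption reuses enc."""
    return enc(e, k)


key = b"secret"                      # shared key K (illustrative value)
message = b"meet me at noon"         # plaintext M (illustrative value)
ciphertext = enc(message, key)       # what travels over the insecure network
recovered = dec(ciphertext, key)     # the receiver applies the same key
```

Here encryption and decryption are literally the same operation, which is the "same key, even the same algorithm" case described above.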

For such a cipher, if there is a way to derive the key (K) or the plaintext (M) from the ciphertext (E) and the ENC/DEC algorithms, the cipher is said to be cracked. For example, a simple XOR algorithm is easily cracked using statistical analysis. By contrast, even DES, which has come to be considered insecure after nearly 30 years, still requires a large number of plaintext/ciphertext pairs (2 raised to a power in the tens) and a large amount of computing time to recover the key (K).
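To see why XOR counts as cracked, here is a sketch of one simple attack. Note that it is a known-plaintext attack, not the statistical analysis mentioned above, and it assumes the attacker knows the key length and a stretch of plaintext at least that long. All names and values are made up for illustration.

```python
from itertools import cycle


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR; the same call encrypts and decrypts."""
    return bytes(b ^ kb for b, kb in zip(data, cycle(key)))


def recover_key(known_plain: bytes, cipher: bytes, key_len: int) -> bytes:
    """Since E = M xor K, it follows that K = E xor M for each known byte."""
    return bytes(c ^ p for c, p in zip(cipher[:key_len], known_plain[:key_len]))


secret_key = b"k3y!"                               # unknown to the attacker
intercepted = xor_cipher(b"Dear Bob, the code is 4711", secret_key)

# The attacker guesses that the letter starts with "Dear" (4 known bytes).
guessed_key = recover_key(b"Dear", intercepted, key_len=4)
cracked = xor_cipher(intercepted, guessed_key)     # full plaintext recovered
```

Four guessed bytes are enough to expose the whole key and therefore the whole message, which is the sense in which the key can be "derived" from what the attacker sees.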

Asymmetric ciphers exist for the following reason: with a symmetric cipher, both parties must agree on a shared key (K). If the agreement process is not secure, the key may leak, and once the key of a symmetric algorithm is disclosed, all subsequent communication is broken without any further attack.

Asymmetric ciphers generally means the so-called public-key cryptographic algorithms. Examples are the most common one, RSA (created by R. L. Rivest, A. Shamir, and others, based on the principle that factoring large numbers is extremely difficult), and the recently fashionable elliptic curve algorithms. Because my mathematics is so poor, I am not clear on the specific algorithms; I only know that they work like this:

As the name suggests, these algorithms are characterized by using different keys for encryption and decryption. The procedure is as follows:

The sender generates a key pair: private key (KA) and public key (KPA)
The receiver also generates a key pair: private key (KB) and public key (KPB)
(KPA) and (KPB) are made public.
The sender computes: E = ENC(ENC(M, KA), KPB)
That is, encryption is performed twice. The receiver computes: M = DEC(DEC(E, KB), KPA)
That is, decryption is performed twice to obtain the original.
Neither party needs to know the other's private key, which avoids the insecurity caused by agreeing on a key.
An asymmetric algorithm guarantees that content encrypted with the private key can only be decrypted with the public key, and vice versa. It also guarantees that the private key cannot be derived from the public key and the ciphertext alone, which is what makes the communication secure.
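The double encryption can be sketched with textbook RSA, reading the sender's formula as E = ENC(ENC(M, KA), KPB) and the receiver's as M = DEC(DEC(E, KB), KPA). This is a toy under stated assumptions: the primes are tiny illustrative values, there is no padding, the helper names `make_keypair` and `apply_key` are made up, and the sender's modulus is deliberately smaller than the receiver's so that the intermediate value fits.

```python
def make_keypair(p: int, q: int, e: int):
    """Return ((e, n), (d, n)) for textbook RSA with primes p and q."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)              # modular inverse (Python 3.8+)
    return (e, n), (d, n)


def apply_key(value: int, key) -> int:
    """ENC and DEC are the same operation: value ** exponent mod n."""
    exponent, n = key
    return pow(value, exponent, n)


# Toy primes for illustration only; real keys are hundreds of digits long.
kpa, ka = make_keypair(61, 53, 17)   # sender: public KPA, private KA
kpb, kb = make_keypair(89, 97, 5)    # receiver: public KPB, private KB

m = 65                                             # plaintext, below sender's n
e_cipher = apply_key(apply_key(m, ka), kpb)        # E = ENC(ENC(M, KA), KPB)
m_back = apply_key(apply_key(e_cipher, kb), kpa)   # M = DEC(DEC(E, KB), KPA)
```

The inner step with KA can be undone by anyone holding KPA, so it shows who sent the message; the outer step with KPB can only be undone with KB, so only the receiver can read it.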

Of course, if there were a way to derive the private key from the public key, the algorithm would be declared cracked. At present, however, RSA at least is considered secure. It rests on the difficulty of factoring large numbers, for which no efficient algorithm is publicly known (factoring is widely believed to be hard, although, contrary to a common claim, it has not been proven NP-complete). As long as the key is long enough (the RSA modulus must be at least around 10^100, and in practice it is much larger), the computing time needed is beyond even the most advanced computers.

Digest algorithms are completely different from the two kinds above. The first two are used to prevent information from being stolen, whereas a digest algorithm aims to prove the integrity of the original text, that is, to prevent information from being tampered with. It is also known as a hash algorithm or a signature algorithm. It produces a fixed-length result (128 bits for MD5) from an original text of arbitrary length, and this result is called the "signature" (S). The signature must be very sensitive to the original text: even a small change in the original should change the signature beyond recognition. Examples include the traditional CRC algorithms, MD5, and the SHA algorithms.
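Both properties, fixed length and sensitivity, are easy to observe with Python's standard `hashlib` module. The helper name `md5_hex` and the sample inputs are illustrative choices.

```python
import hashlib


def md5_hex(text: bytes) -> str:
    """Return the MD5 signature of text as 32 hex characters (128 bits)."""
    return hashlib.md5(text).hexdigest()


short_sig = md5_hex(b"abc")
long_sig = md5_hex(b"abc" * 100_000)      # much longer input, same output size
changed_sig = md5_hex(b"abd")             # one character differs from "abc"

# Count how many of the 32 hex positions differ between the two signatures.
differing = sum(a != b for a, b in zip(short_sig, changed_sig))
```

`short_sig` and `long_sig` are both 32 hex characters long, while `differing` shows how far a one-character change spreads through the signature.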

Digest algorithms are generally used as follows:

User password verification: for example, in Linux or on some forums, when a user sets a password, the server records only the MD5 of the password, not the password itself. Later, the server verifies a submitted password by computing its MD5 and comparing it with the recorded value.
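A minimal sketch of this scheme follows. The dictionary `password_file` and both function names are hypothetical, and the code mirrors only the plain-MD5 scheme described above; real systems also add a salt and use a deliberately slow hash.

```python
import hashlib

# Hypothetical in-memory "password file": user name -> MD5 of the password.
password_file = {}


def set_password(user: str, password: str) -> None:
    """Record only the MD5 of the password, never the password itself."""
    password_file[user] = hashlib.md5(password.encode("utf-8")).hexdigest()


def check_password(user: str, attempt: str) -> bool:
    """Hash the attempt and compare it with the recorded MD5."""
    recorded = password_file.get(user)
    attempt_md5 = hashlib.md5(attempt.encode("utf-8")).hexdigest()
    return recorded is not None and recorded == attempt_md5


set_password("alice", "correct horse")
ok = check_password("alice", "correct horse")
rejected = check_password("alice", "wrong guess")
```

Note that `check_password` accepts any input whose MD5 equals the recorded value, not only the original password.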

Integrity verification of published files: for example, to prevent others from inserting viruses or trojans into a program you release, you can publish the MD5 code of the program file along with the program. Anyone can then download the program from anywhere, compute its MD5, and compare the result with the published MD5 to see whether the program has been modified by a third party.
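The check a downloader performs might look like the sketch below. The function names are made up, and a temporary file containing `abc` stands in for the downloaded program so that the example is self-contained.

```python
import hashlib
import os
import tempfile


def file_md5(path: str, chunk_size: int = 65536) -> str:
    """Compute the MD5 of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unmodified(path: str, published_md5: str) -> bool:
    """Compare the downloaded file's MD5 with the one the author published."""
    return file_md5(path) == published_md5.lower()


# Demo with a temporary file standing in for the downloaded program.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
demo_path = tmp.name
matches = is_unmodified(demo_path, "900150983CD24FB0D6963F7D28E17F72")
with open(demo_path, "ab") as f:
    f.write(b" plus a trojan")       # simulate third-party tampering
still_matches = is_unmodified(demo_path, "900150983cd24fb0d6963f7d28e17f72")
os.remove(demo_path)
```

The check only proves anything if the published MD5 itself comes from a source the attacker cannot alter.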

A secure digest algorithm must meet two requirements by design. First, it must be computationally infeasible to find two inputs that produce the same output; this is what we usually call collision resistance. Second, given an output, it must be computationally infeasible to find an input that produces it; that is, the initial state cannot be derived from the result.

Conversely, a digest algorithm that fails to meet both of these conditions is insecure. In practice it is mainly the first condition that matters, because it is easy to show in theory that the second condition is essentially always met:

A digest algorithm generates a fixed-length signature for an original text of any length. According to Shannon's information theory, once the original text exceeds a certain length, the signature cannot record all the information in it, which means that information is lost. In theory, therefore, it is impossible to restore the original text from the signature.
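The same counting argument implies that collisions must exist: there are more possible original texts than signatures. The sketch below makes this visible on a deliberately weakened digest, the first 16 bits of MD5 (an assumption chosen so the search finishes instantly). The names `tiny_digest` and `find_collision` are illustrative.

```python
import hashlib


def tiny_digest(data: bytes) -> bytes:
    """A deliberately weak 16-bit digest: the first 2 bytes of MD5."""
    return hashlib.md5(data).digest()[:2]


def find_collision():
    """Hash 0, 1, 2, ... until two different inputs share a signature.

    Only 2**16 signatures exist, so by the pigeonhole principle a repeat
    must appear within the first 2**16 + 1 inputs.
    """
    seen = {}                                  # signature -> first input
    for n in range(2**16 + 1):
        data = str(n).encode("ascii")
        sig = tiny_digest(data)
        if sig in seen:
            return seen[sig], data
        seen[sig] = data
    raise AssertionError("unreachable: more inputs than signatures")


first, second = find_collision()
```

The two returned inputs differ but share a signature, so the signature alone cannot say which of them was the original.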

Why only "in theory"? Because even if the digest algorithm is completely cracked, what can be recovered from the signature is an arbitrary original text. Note the word arbitrary: for every digest algorithm there are infinitely many original texts that share the same signature, and the real original is only one of them. Measured against this infinite set it is an infinitesimal, which is what I once said:

A probability of zero does not mean impossible.

The specific explanation is as follows: assume the original text carries information I, while the signature, whose length is limited (128 bits for MD5), carries only information i. Because i < I (unless the original text is very short), we can write I = i + I'. Because I is unbounded and i is bounded, I' is also unbounded. After the digest algorithm is applied, the information I' is lost.

Conversely, if the digest algorithm is broken, one can work backward from i. But because the information I' has been lost, any i + I'' (where I'' is arbitrary information) may produce the same signature as I (a collision). I'' ranges over an infinite set, and I' belongs to that set. This means that in theory one could pick I' out of the set of I'' and so restore the original I, but the probability is zero (1/∞ = 0).

However, this cannot be avoided, because an algorithm without collisions cannot be a digest algorithm; it can only be a lossless compression algorithm. Such an algorithm must contain all the information of the original, which means that once it is cracked, exactly the original text is restored. Its result also cannot be of fixed length, because it has to hold all the information of the original and therefore grows with the length of the original. These two points alone determine that it cannot be a good signature algorithm.

The most important point is that the purpose of a digest algorithm decides the matter: finding a collision is enough to defeat it, and there is no need to find the original text.

Consider the two earlier examples:

For the Linux user security mechanism, an attacker only needs to obtain the user password file (which records the MD5 of each password) and then generate a colliding text, that is, an input whose MD5 equals the recorded value (it need not be the same as the original password). That text can then be used as the password to log in.

The program release example, however, is much harder to attack, because the attacker must be able to generate a specific collision: insert a virus or trojan into the program and then pad it with data so that the result has the same MD5 as the original.
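The gap between the two kinds of collision can be measured on a toy digest. The sketch below uses plain brute force, not Wang Xiaoyun's method, and assumes a digest truncated to 16 bits so that both searches finish quickly. All names are illustrative.

```python
import hashlib
from itertools import count


def tiny_digest(data: bytes) -> bytes:
    """Toy 16-bit digest (first 2 bytes of MD5) so brute force finishes fast."""
    return hashlib.md5(data).digest()[:2]


def non_specific_collision():
    """Find ANY two inputs with equal digests; return (a, b, attempts)."""
    seen = {}
    for n in count():
        data = b"msg-%d" % n
        sig = tiny_digest(data)
        if sig in seen:
            return seen[sig], data, n + 1
        seen[sig] = data


def specific_collision(original: bytes):
    """Find an input matching the digest of a GIVEN original."""
    target = tiny_digest(original)
    for n in count():
        forged = b"trojan-%d" % n
        if forged != original and tiny_digest(forged) == target:
            return forged, n + 1


a, b, free_attempts = non_specific_collision()
forged, targeted_attempts = specific_collision(b"the published program")
```

On a 16-bit digest the first search typically needs a few hundred attempts and the second tens of thousands. For the full 128 bits of MD5, the same brute-force approach would need on the order of 2^64 and 2^128 attempts respectively.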

I thought about this yesterday, though. Taking MD5 as an example, generating a specific collision is still unlikely, because 128 bits of MD5 information is already fairly large. To produce a specific collision, the padding data might have to be very large, making the forged text several orders of magnitude larger than the real original, and such a forgery is meaningless.

Wang Xiaoyun's results have completely invalidated the MD5-based authentication technology used by Linux. Although it is too early to say that MD5 is completely cracked technically, from a legal point of view the results have "shaken the foundation of almost the entire digital signature field" (linhu's words; his full text follows).

What I mean by "shaking the foundation" is from the perspective of legal validity, not a purely technical one.
Here is an example I gave yesterday: if two people have identical fingerprints, and I can quickly find such pairs, then fingerprints can no longer serve as valid evidence from a legal point of view, even if the two people never impersonate each other.

Likewise, although it is still infeasible to forge a given file so that it has the same MD5 code, we can now find two different files with the same MD5 code in a short time, so the "legal significance" of MD5 as a digital signature is lost. What is a digital signature for? To give an electronic document legal force. That is why I say this discovery has shaken the foundation of digital signatures.

Supplement:

In fact, we should look at this event from a positive perspective.

Everyone has studied Ma Zhe (Marxist philosophy): contradictions always exist between opposing sides, and through them things develop in an upward trend. For example, Shamir and others cracked the MH (Merkle-Hellman) knapsack public-key cryptosystem, yet Shamir also proposed the better RSA public-key algorithm together with others. Likewise, more and more encryption methods have emerged since DES began to be considered insecure.

In the same way, the problems that Wang Xiaoyun and others have found in the hash algorithms currently in use will certainly prompt designers of new hash algorithms to take these issues into account, which will make the new algorithms more secure.

Therefore, MD5 cracking is not necessarily a bad thing.

As Laozi put it: fortune and misfortune lean on each other.

BTW. ^ O ^

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
