Over lunch yesterday, a news item on Sina caught my attention. According to the report, the database of a website called Sesame Finance had been compromised, and the passwords recorded in it had been hashed only once. I am not the white hat who broke into it, and I have not studied the leaked data carefully, but if the report is accurate, the problems it describes are very serious. So I think it is worth giving a short introduction to how passwords should be stored in a system, and why.
In this article I will briefly introduce the currently popular ways of attacking passwords, so that we can see how password protection that we normally consider "safe" is actually vulnerable to hackers. I will also introduce the defenses against these attacks, along with the password storage methods that the industry generally recognizes, so that people who build websites can store their users' passwords more securely.
A protracted war
I remember that the first time I used the internet was probably in the fall of 1999, when I was in my third year of junior high school. I had two things to do that year: prepare for the exam, and prepare for the national science experimental class. To save all the contest papers I found online, I signed up for my first mailbox, and its password was only 4 letters long. After that, incidents of stolen passwords gradually became more common: stolen Tencent QQ accounts, stolen personal mailboxes, and so on. So around 2003, websites began to enforce a minimum password length. The reason is simple: to prevent brute-force attacks.
Then more and more sophisticated attacks appeared, such as XSS, CSRF, and session fixation, and the corresponding defenses gradually improved as well, such as HttpOnly cookies and checking the Referer header of HTTP requests. This whole series of attacks and defenses revolves around the user's credentials, such as the session ID and the user's password. With them, hackers can disguise themselves as legitimate users and quietly carry out a series of illegal actions for profit.
Therefore, how to safely protect users' credentials is the issue that every website most needs to pay attention to. Wherever credentials are used, we need to protect them: when the user types the password, after the user has logged in to the site, while the credentials travel over the network, when the password is stored in the database, and so on. The methods for protecting credentials differ from one scenario to the next.
When the user enters the password, a hacker can steal the user name/password pair by monitoring the user's input. After the user logs in, if the user name/password pair is recorded in a cookie, a malicious person can attack through XSS (see my other blog post), CSRF, and similar means. While the credentials are transmitted over the network, a malicious person can steal them with a man-in-the-middle attack. As you can see, to keep user credentials secure, a website needs to protect them along several different dimensions.
For the website as a whole, it is even more important that passwords are stored correctly in the database. In general, websites protect their databases as thoroughly as possible to prevent private information from leaking or data from being maliciously changed. However, once these defenses fail, the user name/password information recorded in the database is exposed directly to the malicious user. So we store these passwords in a protected form to build the last line of defense for the user's credentials. This line of defense is particularly important, because once the way the passwords are protected is broken, the hacker can recover the user name/password pairs. For the website, this lets illegitimate users perform illegal operations with the compromised pairs, such as fraudulent transactions and account transfers. For users, since most people are used to reusing the same user name/password pair across multiple sites, a leak of credentials on one website is likely to compromise their credentials on other sites, so that accounts on several sites are stolen at once. Therefore, even if the privacy requirements of our own website are not high, we still need to store the passwords that users set properly.
Insecure password storage methods
The least secure way to store passwords is to keep them in the database in clear text. That is what our dear CSDN did. A plaintext password means that the password for a user account is recorded in the database without any processing. At the very least this has one drawback: anyone with access to the database can easily see each user name and its password. From a security standpoint, this kind of storage cannot be trusted, because there is no guarantee that someone with access to the database will not deliberately leak these user name/password pairs for profit.
Even if the database administrator can be trusted, the passwords can still leak through an inadvertent mistake. For example, if the administrator is looking at the database, steps away for a glass of water, and leaves the computer unlocked, someone else can view the table and get the list of user name/password pairs.
Of course, things can get worse: if the website has a SQL injection vulnerability, hackers can use it to read the data recorded in the user name and password columns of the database. If that data has not been processed in any way, the malicious user is handed each user's password directly.
So how should we record these passwords? Readers will readily think of computing their hash values, and this is indeed the most commonly used approach in the industry today. So can we simply pick any hashing algorithm? The answer is no, because many attack methods against hashed values already exist.
Imagine that the site has a SQL injection vulnerability. A malicious attacker could very likely use it to obtain a list of user names together with passwords that were hashed by a simple hashing algorithm. There may be tens of thousands of records, or even millions. At this point all he knows is that the passwords are hashed; he does not know which hash algorithm was used or what input went into it.
His next job is to put together a list of likely hashing algorithms and a list of the most commonly used passwords. Be assured that any hashing algorithm you can think of, an experienced hacker can certainly think of too. As for common passwords, searching the internet for "the most common passwords" or finding a password dictionary is enough. On a website with tens of thousands of users, the chance that a password like "12345qwer" appears is close to 100%.
Now his work begins: pick a commonly used password from the dictionary and hash it with each of the algorithms he has collected. Then compare the results with the hashed passwords he has just obtained. In this procedure, the result of one hash computation can be compared against the hashes of many stolen passwords at once. When the number of samples is very large, the probability of a hit becomes very high. This is one of the great advantages an attacker has: each attempt is effectively tried in parallel against a very large number of samples.
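To make the procedure concrete, here is a minimal Python sketch of such a dictionary attack. Everything in it is made up for illustration: the leaked table, the three candidate algorithms, and the three-entry "dictionary" are assumptions, not data from any real breach.

```python
import hashlib

# Hypothetical leaked table: user name -> unsalted hash.
# The site secretly used SHA-1; the attacker does not know that yet.
leaked = {
    "alice": hashlib.sha1(b"12345qwer").hexdigest(),
    "bob": hashlib.sha1(b"correct horse battery").hexdigest(),
    "carol": hashlib.sha1(b"12345qwer").hexdigest(),
}

# The attacker's guesses: likely algorithms and a tiny password dictionary.
candidate_algorithms = ["md5", "sha1", "sha256"]
common_passwords = ["password", "123456", "12345qwer"]


def dictionary_attack(leaked_hashes, algorithms, passwords):
    """Return (algorithm, {user: password}) for the first algorithm that hits."""
    # Index users by hash so that one computed hash is checked against
    # every leaked record at once.
    users_by_hash = {}
    for user, digest in leaked_hashes.items():
        users_by_hash.setdefault(digest, []).append(user)

    for algorithm in algorithms:
        cracked = {}
        for password in passwords:
            digest = hashlib.new(algorithm, password.encode("utf-8")).hexdigest()
            for user in users_by_hash.get(digest, []):
                cracked[user] = password
        if cracked:
            return algorithm, cracked
    return None, {}


algorithm, cracked = dictionary_attack(leaked, candidate_algorithms, common_passwords)
print(algorithm, cracked)
```

A single hash of "12345qwer" exposes both alice and carol, and the first hit also reveals which algorithm the site used.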
Once the output of one algorithm matches the hash of any one of the passwords, that algorithm is very likely the one the system uses. After a few more attempts with other common passwords, the algorithm you use is obvious to the hacker. If the algorithm is reversible, the attacked system can be considered completely fallen.
This attack can even be accelerated: before trying to guess the algorithm, the malicious user precomputes the hashes of the common passwords under each of the common algorithms, and then compares the stolen password hashes directly against these precomputed values. This saves the time spent computing hashes during the attack and makes the computed results reusable. This attack has a special name: the rainbow table attack.
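The precomputation idea can be sketched in Python as a plain lookup table. This is a simplification: real rainbow tables use chains of hashes to trade lookup time for storage, while the sketch below simply stores every digest. The password list and algorithm list are illustrative assumptions.

```python
import hashlib

common_passwords = ["password", "123456", "12345qwer"]
algorithms = ["md5", "sha1"]

# Built once, ahead of any attack: digest -> (algorithm, password).
precomputed = {
    hashlib.new(alg, pw.encode("utf-8")).hexdigest(): (alg, pw)
    for alg in algorithms
    for pw in common_passwords
}


def lookup(leaked_digest):
    """Return (algorithm, password) if the digest was precomputed, else None."""
    return precomputed.get(leaked_digest)


# A stolen, unsalted hash is now cracked with a dictionary lookup,
# with no hashing at all at attack time.
leaked_digest = hashlib.md5(b"12345qwer").hexdigest()
print(lookup(leaked_digest))
```

The table is reusable against any site that stores unsalted hashes with one of the listed algorithms.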
Building defenses
Does this feel a little scary? Don't worry, there are defenses against it. The battle between offense and defense is the most interesting part of the security field.
Now let's think about the conditions that this attack needs in order to succeed: a hashing algorithm that hackers can guess, and a password that hackers can guess. Both conditions are matters of probability: hackers are fairly likely to guess the algorithm, and with a large number of users it is very likely that a password like "12345qwer" exists in the system. To stop the attack, we need to make sure our site no longer satisfies these conditions.
For the first condition, our solution is not to invent a new hashing algorithm of our own. From a security standpoint that is completely unsafe, and a later section explains why. What we need is to make it hard for hackers to guess the algorithm we use, and the way to do that is to hash with a variety of algorithms. Then even the same password will produce different results, which reduces the probability that the hacker guesses the hashing algorithm. For the second condition, we can strengthen the password itself. The strengthening step needs to increase the complexity of the password as much as possible without causing password collisions (that is, two different passwords turning into the same password after strengthening). This way, even if the hacker guesses the strengthened password and the algorithm used to hash it, he still cannot tell what the original password was. The industry's recommendation is to leave this step to a standard library. The concern is the same one that argues against inventing your own cryptographic algorithm.
That leaves the hacker with one method of attack: guessing the hard way, commonly called a brute-force attack. Guessing one password at a time is a clumsy method, but as the number of guesses grows, the probability of hitting the password gradually increases. To resist this attack, we need to prevent hackers from computing password hashes quickly. A simple hash function such as MD5 can run millions or even billions of times per second on a modern device. In other words, if we use a simple hash function, an attacker can put millions of hashes per second into the comparison, and a user's password is likely to be guessed within a few tens of seconds. Therefore, the method used to hash passwords needs to be slow, to make this attack harder.
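The difference in speed is easy to measure. The following Python sketch times a single MD5 computation against PBKDF2 from the standard library. The iteration count of 100,000 and the fixed salt are assumptions chosen only for the demonstration, and the absolute numbers will vary from machine to machine.

```python
import hashlib
import time


def average_seconds(func, repeat):
    """Average wall-clock time of one call to func over `repeat` calls."""
    start = time.perf_counter()
    for _ in range(repeat):
        func()
    return (time.perf_counter() - start) / repeat


password = b"12345qwer"
salt = b"demo-salt-only"   # fixed here only so the demo is repeatable
ITERATIONS = 100_000       # assumed work factor, not a recommendation

fast = average_seconds(lambda: hashlib.md5(password).digest(), repeat=1000)
slow = average_seconds(
    lambda: hashlib.pbkdf2_hmac("sha256", password, salt, ITERATIONS),
    repeat=3,
)

print(f"MD5:    {fast * 1e6:.2f} microseconds per guess")
print(f"PBKDF2: {slow * 1e3:.2f} milliseconds per guess")
print(f"Each guess costs roughly {slow / fast:,.0f} times more")
```

A legitimate login pays the slow cost once, while an attacker pays it for every guess.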
But hackers have one more weapon in hand: parallel computing. Building a system that can compute in parallel is no longer very expensive; it may take nothing more than an ordinary GPU that supports parallel computing. As a result, even if each hash is slow to compute, a hacker can speed things up by computing the hashes of many passwords at the same time, increasing the rate of a brute-force attack by dozens of times. The workaround is simple: choose a hashing algorithm that does not lend itself to parallel computation.
It now seems that we have defended against the attack methods hackers commonly use. So let's summarize the characteristics that a password hashing algorithm needs to have:
- The hashing algorithm must be one-way. If a reversible algorithm is used, the strings obtained by the inverse calculation will usually contain only digits, letters, and common symbols. To a hacker, this is a very obvious signal that his guess at the algorithm is correct (or close). After that, he only needs to run the inverse calculation on each stored value to get the corresponding password.
- The hashing algorithm should produce as few collisions as possible. If n different passwords can produce the same hash value, the probability of that hash being cracked is n times greater.
- The hash must be slow to compute. This means not only slowing down the computation of a single hash, but also making the hash resistant to parallel computation.
- Salt. Salt is the component, mentioned just now, that the password-hashing scheme uses to select a hash function and to strengthen the password.
Introducing salt
I suspect the reader does not yet have a clear picture of the salt mentioned above. For example: what value does a salt contain? How is it used? Where is it stored?
A popular way to generate a salt is to take a random integer of 128 bits or more and convert it to a string, or to use a randomly generated UUID. When a user first sets a user name and password, the system generates a salt for that user, computes the hash of the password from the salt and the system's hashing method, and stores both the hash and the salt in the database. When the user logs in again, the system takes the password the user entered, computes the hash again with the same method and the salt generated for that user, and compares the result with the hash stored in the database. If the two hashes are equal, the password is correct and the login succeeds.
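The registration and login flow described above can be sketched in Python as follows. It is only an illustration: the choice of PBKDF2 with SHA-256, the iteration count, the helper names, and the in-memory `user_row` standing in for a database record are all assumptions.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor for the sketch


def hash_password(password, salt=None):
    """Return (salt, digest); generate a fresh 128-bit salt if none is given."""
    if salt is None:
        salt = os.urandom(16)  # 16 random bytes = 128 bits
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, stored_digest):
    """Recompute the hash with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, stored_digest)


# Registration: the salt and the hash are stored side by side.
salt, digest = hash_password("12345qwer")
user_row = {"username": "alice", "salt": salt, "password_hash": digest}

# Login: look up the row and recompute with that user's salt.
print(verify_password("12345qwer", user_row["salt"], user_row["password_hash"]))
print(verify_password("wrong-guess", user_row["salt"], user_row["password_hash"]))
```

`hmac.compare_digest` is used for the comparison so that the time taken does not reveal how many leading bytes matched.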
Because the salt is needed every time the password is processed, we usually store it in the database together with the user name/password pair. During hashing, the salt also serves as an input to the method we are using, both to select the hash function used inside it and to strengthen the user's password. This is what defeats dictionary attacks against a whole batch of passwords, as well as rainbow table attacks.
Perhaps you still don't quite see how it makes dictionary attacks and rainbow table attacks ineffective. Suppose a hacker has obtained a set of user name/password hash pairs. When attacking, he first picks a candidate password, password, computes hash(password) with his chosen hash function hash(), and compares the result with all of the password hashes. Once one matches, he has basically identified the right hashing algorithm. If a salt was used in the hashing process, however, what he obtains is a set of user name/password hash/salt triples. Because every user's salt is different, he has to compute the hash separately with each user's salt, that is, hash(salt[0], password), hash(salt[1], password), and so on. A single computation can no longer be compared against many hashes. And because the hashing algorithm is relatively slow, the time needed for a successful attack increases greatly.
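A small Python sketch shows the effect. To keep it fast, it uses a single salted SHA-256 as a stand-in for the slow hash a real system would use; the user names, the guesses, and the `salted_hash` helper are assumptions made for the example.

```python
import hashlib
import os


def salted_hash(salt, password):
    # Stand-in for the site's hash(salt, password). A real system would
    # use a deliberately slow function here instead of one SHA-256 call.
    return hashlib.sha256(salt + password.encode("utf-8")).digest()


# Three users who all chose the same weak password.
salts = {name: os.urandom(16) for name in ("alice", "bob", "carol")}
stored = {name: salted_hash(salt, "12345qwer") for name, salt in salts.items()}

# Same password, three unrelated stored hashes.
print(len(set(stored.values())))


def attack(salts, stored, guesses):
    """Try every guess against every user, counting hash computations."""
    cracked, computations = {}, 0
    for guess in guesses:
        for name, salt in salts.items():
            computations += 1  # one hash per (guess, user) pair
            if salted_hash(salt, guess) == stored[name]:
                cracked[name] = guess
    return cracked, computations


guesses = ["password", "123456", "12345qwer"]
cracked, computations = attack(salts, stored, guesses)
print(cracked, computations)
```

Without salt, 3 hash computations would have covered all three users; with per-user salts the attacker needs 3 × 3 = 9, and the cost keeps growing in proportion to the number of users.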
As already mentioned, the rainbow table attack saves time by computing the hash of every candidate password in advance. Now that the input to the hash includes a salt, which is a random string such as "js98lp6h", the candidate passwords listed in a rainbow table obviously will not contain anything of this form, and the rainbow table attack stops working.
A common misconception about salt is that it makes a single password harder to crack. It does not. In general, the salt is stored in the database alongside the hashed password. As a result, a malicious person who gets hold of the user name/password hash pairs recorded in the database gets the salt at the same time. When attempting an attack, he can then compute hashes directly from that salt combined with each candidate password in the dictionary. The defense against this attack is to slow down the hash computation, while the salt defends against the parallel kind of attack, in which one computed hash is compared against many stored hashes at once.
Choosing the right encryption method
In fact, the industry has many methods for hashing passwords, such as PBKDF2, bcrypt, and scrypt. Each has its own merits and drawbacks. So when we need to protect users' passwords, we should choose from these standard methods wherever possible. When using them, you also need to specify a number of parameters, such as the number of iterations. I suggest treating these parameters as confidential to the website itself and not putting them in the database, so that a database leak does not also expose these settings and, with them, the details of how the hashes were produced, which would reduce security.
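As one example of using a standard method, Python's standard library exposes scrypt as hashlib.scrypt (available when Python is built against OpenSSL 1.1 or later). The sketch below keeps the cost parameters in a module-level constant, standing in for site configuration kept outside the database; the parameter values and the `derive_key` helper name are assumptions for illustration, not tuned recommendations.

```python
import hashlib
import hmac
import os

# Assumed cost parameters, standing in for site configuration that is
# kept outside the database. scrypt's memory use grows with n * r,
# which is what makes massively parallel guessing expensive.
SCRYPT_PARAMS = {
    "n": 2**14,                  # CPU/memory cost
    "r": 8,                      # block size
    "p": 1,                      # parallelization factor
    "maxmem": 64 * 1024 * 1024,  # allow up to 64 MiB for the computation
    "dklen": 32,                 # length of the derived hash in bytes
}


def derive_key(password, salt):
    """Hash a password with scrypt using the site's configured parameters."""
    return hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)


salt = os.urandom(16)
stored = derive_key("12345qwer", salt)

print(hmac.compare_digest(stored, derive_key("12345qwer", salt)))
print(hmac.compare_digest(stored, derive_key("wrong-guess", salt)))
```

Because the hash itself does not record the parameters in this arrangement, changing them later requires a way to tell which users were hashed under the old settings.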
Conversely, defining your own cryptographic method is not recommended. Designing a cryptographic algorithm is a very serious undertaking. The cryptographic algorithms we know, such as the SHA family, were only declared safe after rigorous scrutiny. First, cryptography experts study the proposed scheme in great detail; then the schemes are compared and compete against one another in the industry's discussions; and only after several years of verification is a scheme finally declared safe.
Precisely because we are not cryptography experts, an algorithm we create ourselves is very insecure compared with algorithms that have gone through such rigorous testing.
One point that often causes confusion is collisions in cryptographic algorithms. We often say that MD5 is no longer considered a secure algorithm. This is because a malicious person can easily find a set of inputs that all produce the same MD5 value. That is unsafe in a range of verification scenarios, such as file checksums: malware can pass an MD5 check by making its MD5 value equal to that of the target file. Password storage, however, requires a different property, namely that the original password cannot be recovered from its hash, and in that respect MD5 still holds up. The reason it is not recommended for use on its own is that it is so fast to compute.
The copyright of this article is protected by a professional law firm. Authorization is limited to personal reprints, and a reprint must be marked as such in its title.
When reprinting, please cite the original address: http://www.cnblogs.com/loveis715/p/4417526.html
For commercial reprints, please contact me in advance: [email protected]
Save your password: from the breach of Sesame Finance