Data on servers and in databases is occasionally stolen, so important user data such as passwords must be stored in a way that others cannot recover. This article discusses how hashing works and how it protects passwords in web applications.
1. Disclaimer
Cryptography is a complex topic, and I am not an expert in it. Universities and research institutions have studied it for many years. In this article, I aim to show a secure way to store web application passwords as simply and clearly as possible.
2. What is a "hash"?
"Hashing converts a piece of data (small or large) into a relatively short piece of data, such as a string or an integer."
This is done with a one-way hash function. "One-way" means that it is very difficult, or practically impossible, to reverse the result back into the original data. A common example of a hash function is md5(), which is available in many programming languages and systems.
The code is as follows:
$data = "Hello World";
$hash = md5($data);
echo $hash; // b10a8db164e0754105b7a99be72e3fe5
The result of md5() is always a 32-character string containing only hexadecimal characters, so it can equally be represented as a 128-bit (16-byte) integer. You can pass md5() a string or data of any length and you always get a hash of the same fixed length. This also helps explain why the function is "one-way": many different inputs must map to the same output, so the input cannot be recovered from the hash alone.
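A quick way to see the fixed-length property is to hash inputs of very different sizes and compare the lengths of the results. The sketch below uses only built-in functions; the variable names are chosen just for illustration.

```php
<?php
// Hash a one-character string and a string of more than a million characters.
$short_hash = md5("a");
$long_hash  = md5(str_repeat("Hello World ", 100000));

echo strlen($short_hash), "\n"; // 32
echo strlen($long_hash), "\n";  // 32

// Passing true as the second argument returns the raw binary digest instead of hex.
echo strlen(md5("a", true)), "\n"; // 16
```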
3. Using a hash function to store passwords
Typical user registration process:
The user fills in the registration form, which includes a password field;
The program stores all of the user's information in the database;
However, the password is hashed before it is stored in the database;
The original password is not stored anywhere; it is discarded.
User login process:
The user enters a user name and password;
The program hashes the password with the same hash function used at registration;
The program looks up the user in the database and reads the stored password hash;
The program compares the two hashes; if they match, the user is granted access.
How to choose an appropriate method for hashing the password is discussed later in this article.
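The two flows above can be sketched in a few lines. This is only an illustration under stated assumptions: the $users array stands in for a real database, register() and login() are hypothetical names, and placeholder_hash() wraps sha1() purely as a stand-in for whichever hash function is eventually chosen.

```php
<?php
// Hypothetical in-memory "database": user name => stored password hash.
$users = [];

// Stand-in hash function for this sketch only, not a recommendation.
function placeholder_hash($password) {
    return sha1($password);
}

function register(array &$users, $username, $password) {
    // Only the hash is stored; the plain-text password is discarded.
    $users[$username] = placeholder_hash($password);
}

function login(array $users, $username, $password) {
    if (!isset($users[$username])) {
        return false; // unknown user
    }
    // Hash the submitted password and compare it with the stored hash.
    return placeholder_hash($password) === $users[$username];
}

register($users, 'alice', 'supersecretpassword');
var_dump(login($users, 'alice', 'supersecretpassword')); // bool(true)
var_dump(login($users, 'alice', 'wrongpassword'));       // bool(false)
```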
4. Problem 1: hash collisions
A hash collision occurs when two different inputs produce the same hash value. How likely a collision is depends on the hash algorithm used.
How can this be exploited?
For example, some older programs use crc32() to hash passwords. This algorithm produces a 32-bit integer, which means there are only 2^32 (that is, 4,294,967,296) possible outputs.
Let's hash a password:
The code is as follows:
echo crc32('supersecretpassword');
// Outputs: 323322056
Now suppose someone steals the database and obtains the hashed password. They may not be able to convert 323322056 back into 'supersecretpassword'. However, they can find another password that hashes to the same value. This only takes a simple program:
The code is as follows:
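// Remove the execution time limit, since the search may take a while
set_time_limit(0);
$i = 0;
while (true) {
    // Try the base64 encoding of each counter value as a candidate password
    if (crc32(base64_encode($i)) == 323322056) {
        echo base64_encode($i);
        exit;
    }
    $i++;
}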
This program may need to run for a while, but it will eventually return a string. That string can be used in place of 'supersecretpassword' to log in to the user's account successfully.
For example, after running the program on my computer for a short while, I got the string 'MTIxMjY5MTAwNg=='. Let's test it:
The code is as follows:
echo crc32('supersecretpassword');
// Outputs: 323322056
echo crc32('MTIxMjY5MTAwNg==');
// Outputs: 323322056
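The search above targets one specific hash value. If an attacker only wants any two strings that share a crc32() value, the work is far smaller still, which gives a feel for how cramped a 32-bit output space is. The following is a minimal sketch of that idea; find_crc32_collision() and its try limit are hypothetical names chosen for this example, and md5() is used only to produce varied candidate strings.

```php
<?php
// Hypothetical helper: returns two distinct strings with the same crc32() value,
// or null if none is found within $max_tries candidates.
function find_crc32_collision($max_tries = 2000000) {
    $seen = []; // crc32 value => candidate string that produced it
    for ($i = 0; $i < $max_tries; $i++) {
        $candidate = md5((string) $i); // arbitrary, well-mixed candidate string
        $crc = crc32($candidate);
        if (isset($seen[$crc])) {
            return [$seen[$crc], $candidate];
        }
        $seen[$crc] = $candidate;
    }
    return null;
}

$pair = find_crc32_collision();
if ($pair !== null) {
    echo $pair[0], " and ", $pair[1], " both hash to ", crc32($pair[0]), "\n";
}
```

With 2^32 possible outputs, a repeat is expected after roughly a hundred thousand candidates, so this usually finishes in well under a second.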
How can this problem be solved?
A reasonably powerful home PC can now compute billions of hashes per second, so we need a hash function with a much larger output range. md5(), for example, is more suitable: it generates 128-bit hash values, which gives 340,282,366,920,938,463,463,374,607,431,768,211,456 possible outputs. Running through that many iterations to find a collision by brute force is generally infeasible. Even so, researchers have found more efficient ways to produce MD5 collisions.
sha1() is a better alternative because it generates a longer, 160-bit hash value.
5. Problem 2: rainbow tables
Even with the collision problem solved, we are still not safe enough.
"A rainbow table is built by calculating the hash values of commonly used words and their combinations."
Such a table may hold millions or even billions of rows. Storage is cheap these days, so very large rainbow tables are practical.
Now suppose someone steals the database and obtains millions of password hashes. The attacker can simply look up each hash in the rainbow table and recover the original password. Not every hash will be in the table, but some of them certainly will.
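The lookup described above can be sketched with a plain precomputed table. Note the simplification: real rainbow tables store chains of hashes to save space, whereas this sketch keeps one entry per word, and the word list is a made-up example.

```php
<?php
// Made-up list of common passwords for this illustration.
$common_words = ['password', '123456', 'qwerty', 'letmein', 'easypassword'];

// Precompute hash => word once; the table can then be reused for every stolen hash.
$table = [];
foreach ($common_words as $word) {
    $table[sha1($word)] = $word;
}

// A hash as it would appear in a stolen database.
$stolen_hash = sha1('easypassword');

// Recovering the password is now a single array lookup.
echo isset($table[$stolen_hash]) ? $table[$stolen_hash] : 'not found';
// prints "easypassword"
```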
How can this problem be solved?
We can add a "salt" (a string of extra characters) to the password before hashing it, as in the following example:
The code is as follows:
$password = "easypassword";
// This may be found in a rainbow table
// because the password contains 2 common words
echo sha1($password); // 6c94d3b42418febd4ad747801d50a8972022f956
// Use a bunch of random characters; it can be longer than this
$salt = "f#@V)Hu^%Hgfds";
// This will NOT be found in any pre-built rainbow table
echo sha1($salt . $password); // cd56a16759623378628c0d9336af69b74d9d71a5
What we do here is prepend a salt string to each password before hashing it. As long as the salt is complex enough, the resulting hash will not be found in any pre-built rainbow table. However, this is still not safe enough.
6. Problem 3: rainbow tables (again)
Note that a rainbow table can also be built from scratch after the database has been stolen. The salt may be stolen along with the database, and the attacker can then use it to generate a new rainbow table. For example, a generic rainbow table may contain the hash of "easypassword", whereas the new rainbow table contains the hash of "f#@V)Hu^%Hgfdseasypassword".
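To make the weakness concrete, here is a minimal sketch of an attacker rebuilding a lookup table once a single site-wide salt is known. The word list is a made-up example, and the table is a plain hash-to-word map rather than a true chained rainbow table.

```php
<?php
// The site-wide salt, assumed to have been stolen together with the database.
$stolen_salt = 'f#@V)Hu^%Hgfds';
$common_words = ['password', 'qwerty', 'letmein', 'easypassword'];

// Rebuild the lookup table with the salt prepended to every candidate word.
$salted_table = [];
foreach ($common_words as $word) {
    $salted_table[sha1($stolen_salt . $word)] = $word;
}

// A salted hash as it would appear in the stolen database.
$stolen_hash = sha1($stolen_salt . 'easypassword');
echo isset($salted_table[$stolen_hash]) ? $salted_table[$stolen_hash] : 'not found';
// prints "easypassword"
```

Because every user shares the same salt, one rebuilt table works against every row in the database.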
How can this problem be solved?
We can use a unique salt for each user. One readily available candidate is the user's id from the database:
The code is as follows:
$hash = sha1($user_id . $password);
This assumes that a user's id never changes, which is usually the case.
We can also generate a random, unique salt for each user, but then we need to store that string as well:
The code is as follows:
// Generates a 22 character long random string
function unique_salt() {
    return substr(sha1(mt_rand()), 0, 22);
}
$unique_salt = unique_salt();
$hash = sha1($unique_salt . $password);
// And save the $unique_salt with the user record
// ...
This method protects us against rainbow tables, because every password is salted with a different string. The attacker would have to generate a separate rainbow table for each password, which is impractical.
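The effect of per-user salts is easy to check: two users with the same password end up with different stored hashes, yet each can still be verified with their own salt. The sketch below assumes PHP 7 or later, because it draws the salt from random_bytes() instead of mt_rand(); random_salt() and salted_hash() are hypothetical helper names.

```php
<?php
// Assumes PHP 7+: random_bytes() returns cryptographically secure random bytes.
function random_salt() {
    return bin2hex(random_bytes(11)); // 11 random bytes => 22 hex characters
}

function salted_hash($salt, $password) {
    return sha1($salt . $password);
}

// Two user records with the same password but independently generated salts.
$alice = ['salt' => random_salt()];
$alice['hash'] = salted_hash($alice['salt'], 'easypassword');
$bob = ['salt' => random_salt()];
$bob['hash'] = salted_hash($bob['salt'], 'easypassword');

// The stored hashes differ unless the two random salts happen to coincide.
var_dump($alice['hash'] === $bob['hash']);

// Verification recomputes the hash with the salt stored in the user's record.
var_dump(salted_hash($alice['salt'], 'easypassword') === $alice['hash']); // bool(true)
```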
7. Problem 4: hash speed
Most hash functions are designed with speed in mind, because they are often used to compute checksums of large data sets or files in order to verify data integrity.
How can this be exploited?
As mentioned above, a powerful PC can compute billions of hashes per second, which makes it easy to try every possible password by brute force. You might think that requiring passwords of at least 8 characters would keep them safe from brute force, but let's check whether that is really the case:
If the password can contain lowercase letters, uppercase letters, and digits, there are 62 (26 + 26 + 10) possible characters;
An 8-character password has 62^8 possible combinations, which is a little over 218 trillion;
At a rate of 1 billion hashes per second, trying them all takes about 60 hours.
6-character passwords are also quite common, and they can be cracked in under a minute. Requiring 9 or 10 characters may be safer, but some users will find that inconvenient.
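The arithmetic above can be reproduced with a short calculation. seconds_to_search() is a hypothetical helper written for this example; it assumes the attacker has to try every combination, which is the worst case.

```php
<?php
// Worst-case time, in seconds, to try every password of a given length.
function seconds_to_search($alphabet_size, $length, $hashes_per_second) {
    return pow($alphabet_size, $length) / $hashes_per_second;
}

$rate = 1e9; // 1 billion hashes per second, as in the text

printf("8 characters: %.1f hours\n", seconds_to_search(62, 8, $rate) / 3600);
printf("6 characters: %.1f seconds\n", seconds_to_search(62, 6, $rate));
```

This gives roughly 60.7 hours for 8 characters and roughly 56.8 seconds for 6 characters.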
How can this problem be solved?
Use a slower hash function.
"If you replace an algorithm that runs 1 billion times per second with one that, on the same hardware, runs only 1 million times per second, a brute-force attack takes the attacker 1,000 times longer. 60 hours turn into nearly 7 years!"
One option is to implement this yourself:
The code is as follows:
function myhash($password, $unique_salt) {
    // Start from the salted hash of the password
    $hash = sha1($unique_salt . $password);
    // Make it take 1000 times longer
    for ($i = 0; $i < 1000; $i++) {
        $hash = sha1($hash);
    }
    return $hash;
}
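To see what the extra rounds cost, the loop above can be timed with and without them. This is a rough sketch rather than a benchmark: stretched_hash() and time_hashing() are hypothetical names, the salt and password are placeholders, and the measured times depend entirely on the machine.

```php
<?php
// Variant of the stretching loop with a configurable number of extra rounds.
function stretched_hash($password, $unique_salt, $iterations) {
    $hash = sha1($unique_salt . $password);
    for ($i = 0; $i < $iterations; $i++) {
        $hash = sha1($hash);
    }
    return $hash;
}

// Measure how long $runs hash computations take for a given round count.
function time_hashing($iterations, $runs = 200) {
    $start = microtime(true);
    for ($n = 0; $n < $runs; $n++) {
        stretched_hash('verysecret', 'examplesalt', $iterations);
    }
    return microtime(true) - $start;
}

printf("0 extra rounds:    %.4f s\n", time_hashing(0));
printf("1000 extra rounds: %.4f s\n", time_hashing(1000));
```

The attacker pays the same per-guess cost as the server does, which is what makes brute force slower.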
You can also use an algorithm that supports a "cost parameter", such as Blowfish. In PHP, this can be done with the crypt() function:
The code is as follows:
function myhash($password, $unique_salt) {
    // The salt for Blowfish should be 22 characters long
    return crypt($password, '$2a$10$' . $unique_salt);
}
The second parameter of this function contains several values separated by the "$" symbol. The first value, "$2a", indicates that the Blowfish algorithm should be used. The second value, "$10", is the cost parameter. It is the base-2 logarithm of the number of iterations to run (10 => 2^10 = 1,024 iterations) and can range from 04 to 31.
For example:
The code is as follows:
function myhash($password, $unique_salt) {
    return crypt($password, '$2a$10$' . $unique_salt);
}
function unique_salt() {
    return substr(sha1(mt_rand()), 0, 22);
}
$password = "verysecret";
echo myhash($password, unique_salt());
// Result: $2a$10$dfda807d832b094184faeu1elwhtR2Xhtuvs3R9J1nfRGBCudCCzC
The resulting hash contains the algorithm identifier ($2a), the cost parameter ($10), and the 22-character salt that was used. The rest of it is the computed hash value. Let's run a test program:
The code is as follows:
// Assume this was pulled from the database
$hash = '$2a$10$dfda807d832b094184faeu1elwhtR2Xhtuvs3R9J1nfRGBCudCCzC';
// Assume this is the password the user entered to log back in
$password = "verysecret";
if (check_password($hash, $password)) {
    echo "Access Granted!";
} else {
    echo "Access Denied!";
}
function check_password($hash, $password) {
    // First 29 characters include algorithm, cost and salt
    // Let's call it $full_salt
    $full_salt = substr($hash, 0, 29);
    // Run the hash function on $password
    $new_hash = crypt($password, $full_salt);
    // Returns true or false
    return ($hash === $new_hash);
}
Run it and we will see "Access Granted!"
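The layout of the stored string, and the reason check_password() takes the first 29 characters, can be made explicit by splitting the example hash into its parts. split_blowfish_hash() is a hypothetical helper written for this sketch; the offsets assume the "$2a$" prefix and a two-digit cost, as in the example.

```php
<?php
// The example hash from this section.
$hash = '$2a$10$dfda807d832b094184faeu1elwhtR2Xhtuvs3R9J1nfRGBCudCCzC';

// Split a Blowfish crypt() string into its parts by fixed offsets.
function split_blowfish_hash($hash) {
    return [
        'algorithm' => substr($hash, 0, 3),        // "$2a"
        'cost'      => (int) substr($hash, 4, 2),  // two-digit cost, here 10
        'salt'      => substr($hash, 7, 22),       // 22-character salt
        'hash'      => substr($hash, 29),          // the computed hash value
    ];
}

$parts = split_blowfish_hash($hash);
// The cost is a base-2 logarithm, so the iteration count is 2^cost.
$parts['iterations'] = 1 << $parts['cost'];
print_r($parts);
```

The prefix (4 characters), cost (2), separator (1), and salt (22) add up to the 29 characters that are passed back to crypt() as the salt argument.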
8. Putting it together
Based on the discussion above, here is a utility class:
The code is as follows:
class PassHash {
    // Blowfish
    private static $algo = '$2a';
    // Cost parameter
    private static $cost = '$10';
    // Mainly for internal use
    public static function unique_salt() {
        return substr(sha1(mt_rand()), 0, 22);
    }
    // This will be used to generate a hash
    public static function hash($password) {
        return crypt($password,
            self::$algo .
            self::$cost .
            '$' . self::unique_salt());
    }
    // This will be used to compare a password against a hash
    public static function check_password($hash, $password) {
        $full_salt = substr($hash, 0, 29);
        $new_hash = crypt($password, $full_salt);
        return ($hash === $new_hash);
    }
}
Here is how to use it at registration:
The code is as follows:
// Include the class
require("PassHash.php");
// Read all form input from $_POST
// ...
// Do your regular form validation stuff
// ...
// Hash the password
$pass_hash = PassHash::hash($_POST['password']);
// Store all user info in the DB, excluding $_POST['password']
// Store $pass_hash instead
// ...
And here is how to use it at login:
The code is as follows:
// Include the class
require("PassHash.php");
// Read all form input from $_POST
// ...
// Fetch the user record based on $_POST['username'] or similar
// ...
// Check the password the user tried to log in with
if (PassHash::check_password($user['pass_hash'], $_POST['password'])) {
    // Grant access
    // ...
} else {
    // Deny access
    // ...
}
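On PHP 5.5 and later, the built-in password_hash() and password_verify() functions cover the same ground as the class above: they use bcrypt (the Blowfish-based scheme discussed here), generate a random salt for you, and embed the algorithm, cost, and salt in the returned string. The sketch below is an alternative to PassHash rather than part of the original approach, and the cost of 10 simply mirrors the value used earlier.

```php
<?php
// At registration: hash the password; a random salt is generated automatically.
$pass_hash = password_hash('verysecret', PASSWORD_BCRYPT, ['cost' => 10]);

// The result is a 60-character string that is stored as-is in the user record.
echo strlen($pass_hash), "\n";

// At login: verify the submitted password against the stored string.
var_dump(password_verify('verysecret', $pass_hash)); // bool(true)
var_dump(password_verify('wrongguess', $pass_hash)); // bool(false)
```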
9. Checking whether Blowfish is available
Not every system supports the Blowfish algorithm, although it is quite common by now. You can check whether your system supports it with the following code:
The code is as follows:
if (CRYPT_BLOWFISH == 1) {
    echo "Yes";
} else {
    echo "No";
}
As of PHP 5.3, however, you do not need to worry about this, because PHP ships with its own built-in implementation of the algorithm.
Conclusion
Passwords hashed this way are secure enough for most web applications. Even so, do not forget that you can still require users to choose stronger passwords, for example by enforcing a minimum length and a mix of letters, digits, and special characters.
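Such a rule could be enforced at registration with a small check like the one below. is_strong_password() and the default minimum length of 10 are assumptions made for this sketch, not values from the article, and the exact policy should be adapted to the application.

```php
<?php
// Hypothetical policy check: minimum length plus at least one letter,
// one digit, and one special (non-alphanumeric) character.
function is_strong_password($password, $min_length = 10) {
    // strlen() counts bytes, which matches characters for ASCII input.
    return strlen($password) >= $min_length
        && preg_match('/[A-Za-z]/', $password) === 1
        && preg_match('/[0-9]/', $password) === 1
        && preg_match('/[^A-Za-z0-9]/', $password) === 1;
}

var_dump(is_strong_password('easypassword'));   // bool(false): no digit or special character
var_dump(is_strong_password('c0rrect-h0rse!')); // bool(true)
```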