The first half of this article explains why key derivation functions (KDFs) matter; the second half discusses whether KDF computation is feasible in the front end.
Objective
Every so often you hear news that "site XX had its database dumped". Reports follow that analyze which passwords were most common, how many accounts used them, and so on.
It is well known that passwords are usually stored in the database as salted hash values. Even if an attacker knows the exact hash algorithm, the only option is brute force. That should be extremely laborious, yet in practice large numbers of passwords get cracked. What makes security so fragile?
It comes down to two things: how passwords are chosen, and how cheap the algorithm is to run.
Random keys vs. memorable passwords
A secret can be recorded in many places. The most common is your own head. It can also be written on something you own, such as a notebook or a card. In that case nothing has to be memorized, so it can be very long and random, for example:
    QQ:    n5Py 2r8W qGyg 4tU6
    GMail: 3TkS mVwQ hUrs wtmA
    ...
A long, meaningless string like this is a very safe password. Even if its hash value and the algorithm leak, an attacker who wants the plaintext can only exhaustively try every combination:
The leaked value is BF656DEC5DD8BA0B and the leaked algorithm is f(x). Starting exhaustive search...
    Candidate              f(x)               Result
    aaaa aaaa aaaa aaaa    02F49B3EA5592B14   ×
    aaaa aaaa aaaa aaab    BD4E960D990DA3F3   ×
    ...
    n5Py 2r8W qGyg 4tU5    4CEA28A904326A26   ×
    n5Py 2r8W qGyg 4tU6    BF656DEC5DD8BA0B   √
Even with only letters and digits, that is 62^16 combinations, on the order of 10^28 guesses. This is an astronomical number, practically impossible to exhaust, so a password of this type remains safe even after a leak.
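To make the size of that search space concrete, here is a small sketch of the arithmetic. The attacker speed of one billion guesses per second is an assumption chosen for illustration, not a figure from this article.

```javascript
// Size of the search space for a 16-character string drawn from
// upper-case letters, lower-case letters and digits (26 + 26 + 10 = 62).
const ALPHABET_SIZE = 62n;
const LENGTH = 16n;

// BigInt keeps the count exact; a plain Number would lose precision here.
const combinations = ALPHABET_SIZE ** LENGTH;

// Assumed attacker speed, for illustration only.
const GUESSES_PER_SECOND = 1_000_000_000n;
const SECONDS_PER_YEAR = 31_536_000n; // 365 days

// Whole years needed to try every combination at that speed.
const yearsToExhaust = combinations / (GUESSES_PER_SECOND * SECONDS_PER_YEAR);

console.log(`combinations: ${combinations}`);
console.log(`years at 1e9 guesses/s: ${yearsToExhaust}`);
```

Even at that assumed speed, the exhaustive search would take more than a trillion years.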
In reality, though, few people do this. A physical item has to be carried around, which is inconvenient, and losing it or having it stolen is even more trouble. The alternative is to memorize everything, but that just brings us back to "keeping it in your head".
The head is safe, but its capacity is limited. An irregular string like the ones above is hard enough to memorize once, let alone several times over. So people choose meaningful, regular strings as passwords, such as:
    iloveyou2016
    qwert12345
or combinations of a mobile phone number, a birthday, and the like. A string of this kind, which needs no rote memorization, is a password in the literal sense (a "pass word").
Convenient as a password is, its flaw is just as obvious: because it is regular, it is easier to guess. An attacker only has to test combinations of common words and will probably hit it:
The leaked value is 2B649D47C4546A3E and the leaked algorithm is f(x). Starting dictionary run...
    Candidate      f(x)               Result
    ...
    qwert yuiop    52708233CFFD6BFD   ×
    qwert asdfg    CD07933880702B97   ×
    qwert zxcvb    343F78782D73AB3A   ×
    qwert 12345    2B649D47C4546A3E   √
This process is called a dictionary attack ("running a dictionary"). A good dictionary greatly improves the odds of a hit.
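The dictionary run above can be sketched in a few lines. This is only an illustration: SHA-256 stands in for the unspecified leaked algorithm f(x), and the tiny word list is hypothetical.

```javascript
const { createHash } = require("crypto");

// Stand-in for the leaked algorithm f(x); SHA-256 is an assumption here.
function f(x) {
  return createHash("sha256").update(x, "utf8").digest("hex");
}

// Hash every candidate and return the one that matches, or null.
function runDictionary(leakedHash, candidates) {
  for (const candidate of candidates) {
    if (f(candidate) === leakedHash) {
      return candidate;
    }
  }
  return null;
}

// Hypothetical word list: common prefixes combined with common suffixes.
const prefixes = ["qwert", "iloveyou", "admin"];
const suffixes = ["yuiop", "asdfg", "zxcvb", "12345", "2016"];
const dictionary = prefixes.flatMap((p) => suffixes.map((s) => p + s));

// Pretend this value came from a dumped database.
const leaked = f("qwert12345");
console.log(runDictionary(leaked, dictionary)); // prints "qwert12345"
```

A random 16-character string would not be in any such list, which is why the same loop returns null for it.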
Algorithmic cost
With the same dictionary, speed is what matters. How many guesses per second are possible? That depends on the specific algorithm.
Take MD5: a call takes roughly 1 microsecond, which means 1 million guesses per second. (And that is single-threaded; with parallelism the number is scarier still.)
So the faster the algorithm, the better for the cracker. If each call took 10 milliseconds instead, only 100 guesses per second would be possible, which is 10,000 times slower.
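A quick calculation shows what that factor means for a whole dictionary. The dictionary size of one billion entries is an assumption for illustration; the two speeds are the ones quoted above.

```javascript
// Seconds needed to try every entry at a given guessing rate.
function secondsToRunDictionary(entries, guessesPerSecond) {
  return entries / guessesPerSecond;
}

const ENTRIES = 1e9; // assumed dictionary size, for illustration only

const fast = secondsToRunDictionary(ENTRIES, 1e6); // ~1 microsecond per guess
const slow = secondsToRunDictionary(ENTRIES, 100); // ~10 milliseconds per guess

console.log(`fast function: ${fast} s`);
console.log(`slow function: ${(slow / 86400).toFixed(1)} days`);
```

Under these assumptions the fast function finishes in under 17 minutes, while the slow one needs roughly 116 days for the same list.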
Unfortunately, the usual hash functions are fast. They are general-purpose and were not designed for password processing. When computing the checksum of a large file, for example, speed obviously matters.
It is therefore unwise to process passwords with "fast functions" such as MD5 and SHA256. This includes simple variants such as MD5(SHA256(x)), which are still fast functions. Once the hash value and algorithm leak, a dictionary attack cracks them easily.
In reality, many sites do handle passwords with fast functions, so once a database leaks it is inevitable that a large share of the passwords are recovered.
Increasing the cost
A hash function executes very quickly, but we can run it over and over a large number of times so that the total time becomes long. For example:
    function slow_sha256(x)
        for i = 0 to 100000
            x = sha256(x)
        end
        return x
    end
In cryptography, this approach is called key stretching. Many schemes exist in practice, such as PBKDF2. It does not design a new algorithm; it wraps an existing function in a way better suited to password processing. In simplified form:
    function pbkdf2(fn, ..., iter)
        for i = 0 to iter
            x = fn(x)
        end
        return x
    end
It takes an iteration parameter that specifies the number of repeated hashes: the more iterations, the longer the execution time and the harder the result is to crack.
A PBKDF (password-based key derivation function), as the name implies, is a function that takes a "password" (a regular string) as input and outputs a "key" (an irregular long string), consuming a certain amount of resources along the way. It is essentially a hash function, and its output is called the DK (derived key).
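As a rough sketch of what calling a real PBKDF2 looks like, the following uses `pbkdf2Sync` from Node.js's built-in `crypto` module. The helper name `deriveKey`, the salt string and the iteration count are illustrative assumptions, not values recommended by this article.

```javascript
const { pbkdf2Sync } = require("crypto");

// Derive a 32-byte key from a password using PBKDF2 with HMAC-SHA-256.
// The iteration count is the stretching knob: a higher value is slower
// for legitimate users and attackers alike.
function deriveKey(password, salt, iterations) {
  return pbkdf2Sync(password, salt, iterations, 32, "sha256").toString("hex");
}

// "example-salt" and 100000 are placeholder values for this sketch.
const dk = deriveKey("qwert12345", "example-salt", 100000);
console.log(dk); // 64 hex characters: the DK
```

However simple the input password is, the DK has the shape of a long irregular string.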
Front-end stretching
The more you stretch, the more secure the result, but this costs a lot of computing resources on the server. To balance security and performance, a calculation time of only tens to hundreds of milliseconds is typically chosen.
While the server strains under the computing load, today's clients generally have system resources to spare. Could users take on some of the computation?
That may sound unworkable. After all, the front end is public: wouldn't exposing password-related algorithms create security problems?
First, consider how a traditional website handles passwords. The front end usually does nothing but submit them, and the password is processed by the back end.
Now we change the front end: when the user submits a page such as registration or login, it no longer sends the original password but the DK of the password.
The back end is left unchanged. (This would of course affect existing accounts; we set that aside for now and assume a new site.)
This way, even if the user's password is simple, the corresponding DK is still a meaningless long string. Given only the hash of the DK, restoring the DK is extremely difficult, as explained at the beginning of this article.
Of course, what attackers really want is not the DK but the password. That can still be attacked: just combine the front-end and back-end algorithms into a new function:
F(x) = server_hash(client_hash(x))
Running a dictionary against this final function F can still guess the password:
    Candidate      Time   F(x)               Result
    ...
    qwert yuiop    1s     1C525DC73898A8EF   ×
    qwert asdfg    1s     F9C0A131F43F1969   ×
    qwert zxcvb    1s     08F026D689D26746   ×
    ...
But with client_hash as a barrier, cracking is greatly slowed down.
So we need:
A slow client_hash, to increase the cost of running a dictionary
A fast server_hash, to prevent the DK from leaking (only its hash is stored)
This way the vast majority of the computation moves to the front end, the back end needs only minimal processing, and the result is a strong password protection system.
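The split between a slow client_hash and a fast server_hash could look roughly like the sketch below. It runs under Node.js so that it is self-contained; the in-memory `users` map, the `register` and `login` helpers, the use of the user ID as the PBKDF2 salt, and the iteration count are all assumptions made for illustration.

```javascript
const { pbkdf2Sync, createHash, timingSafeEqual } = require("crypto");

// Slow step, meant to run in the front end. PBKDF2 needs a salt;
// this sketch simply uses the user ID. The iteration count is illustrative.
function clientHash(password, salt) {
  return pbkdf2Sync(password, salt, 100000, 32, "sha256");
}

// Fast step, run on the server: a single SHA-256 over the received DK.
function serverHash(dk) {
  return createHash("sha256").update(dk).digest();
}

// Hypothetical in-memory "database": only the hash of the DK is stored.
const users = new Map();

function register(userId, dk) {
  users.set(userId, serverHash(dk));
}

function login(userId, dk) {
  const stored = users.get(userId);
  if (!stored) return false;
  // Constant-time comparison; both buffers are 32 bytes long.
  return timingSafeEqual(stored, serverHash(dk));
}

// The front end sends the DK, never the password itself.
register("alice", clientHash("qwert12345", "alice"));
console.log(login("alice", clientHash("qwert12345", "alice"))); // true
console.log(login("alice", clientHash("qwert12346", "alice"))); // false
```

The server's work per login is one SHA-256 call, while anyone running a dictionary must pay for the full PBKDF2 on every guess.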
Countering precomputation
Because everything in the front end is public, everyone knows the client_hash algorithm. An attacker can compute the DKs of common passwords in advance and compile them into a new dictionary. After a later database dump, running this "new dictionary" saves a great deal of time.
This makes a salt necessary (in fact, PBKDF2 itself requires a salt parameter). For example, use the user ID as the salt:
    function client_hash(password, salt) {
        return pbkdf2(password, salt, sha256, 1000000);
    }
    client_hash('888888', '[email protected]')   // b80c97beaa7ca316...
    client_hash('888888', '[email protected]')   // 465e26b9d899b05f...
This way, even if the passwords are the same, the resulting DK differs from user to user. An attacker can only build a dictionary for one specific account, which greatly narrows its usefulness.
Going further, we can even add a "website ID" to the salt:
    function client_hash(password, salt) {
        return pbkdf2(password, salt, sha256, 1000000);
    }
    client_hash('888888', '[email protected]/www.site-a.com')   // 77a1b139aa93ac8b...
    client_hash('888888', '[email protected]/www.site-b.com')   // fab6b82e6a1d17d7...
This way, even for the same account and password, the DK generated on different sites is different.
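The effect of salting by account and by site can be checked with a short runnable sketch. The `makeSalt` helper, the "/" separator, the example addresses and the reduced iteration count are assumptions for illustration.

```javascript
const { pbkdf2Sync } = require("crypto");

// Iteration count reduced from the article's 1,000,000 to keep the demo quick.
function clientHash(password, salt) {
  return pbkdf2Sync(password, salt, 100000, 32, "sha256").toString("hex");
}

// Combine account ID and site ID into one salt (the "/" separator is assumed).
function makeSalt(userId, siteId) {
  return `${userId}/${siteId}`;
}

// Hypothetical accounts and sites, all sharing the same weak password.
const dkA = clientHash("888888", makeSalt("user1@example.com", "www.site-a.com"));
const dkB = clientHash("888888", makeSalt("user1@example.com", "www.site-b.com"));
const dkC = clientHash("888888", makeSalt("user2@example.com", "www.site-a.com"));

console.log(dkA !== dkB); // true: same account, different site
console.log(dkA !== dkC); // true: same site, different account
```

A dictionary precomputed for one account on one site is therefore useless against any other account or site.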
Question to consider: the ID is public. Could a hidden field be chosen as the salt for client_hash instead?
DK leak
The DK is created in the front end, and the back end discards it after hashing, so it is a temporary value. Ideally it never leaks.
In some cases a leak is still possible, for example if the server is compromised or the network transmission is eavesdropped on.
Once the DK leaks, the attacker can control that account; this is unavoidable. Fortunately, the DK is just a meaningless long string, and the attacker does not know the meaningful password behind it. Other accounts that use similar passwords are therefore spared.
To recover the password from the DK, the attacker would have to run a dictionary through the client_hash algorithm, and the cost of that remains high. Compared with the "final function F" above, only one server_hash is saved (and server_hash was fast and negligible to begin with). So even if the DK leaks, cracking the password is hardly any easier.
"The account may be stolen, but the password cannot be obtained": this is the point of front-end hashing.
An added benefit
Front-end stretching makes each login consume system resources on the user's side. This side effect can itself serve as a defense.
For an average user, a login that takes a few extra seconds may not matter, but for someone who logs in very frequently it is a huge expense. Who logs in that often? Most likely a credential-stuffing attacker: someone who obtained a batch of account/password pairs elsewhere and tries them here to see how many work.
Because we log in with the DK, the attacker must also compute the DK for every password to be tested before submitting it, which adds a lot of computational cost. This is somewhat like the proof-of-work discussed in the previous article.
Just as stretching a spring takes a lot of energy, stretching a hash takes real effort.
Optimizing the experience
With a sensible design, the wait for the stretching computation can be kept to a minimum. For example, the program can start computing the DK as soon as the user finishes entering the account and password, rather than waiting for the submit. If the site has a CAPTCHA, the computation can run while the user types it in. This greatly improves the user experience.
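One way to structure this "start early, await on submit" idea is sketched below. It runs under Node.js with the asynchronous `crypto.pbkdf2` so that it is self-contained; in a real page the two handlers would be wired to form events. The handler names, the salt and the iteration count are hypothetical.

```javascript
const { pbkdf2 } = require("crypto");
const { promisify } = require("util");

const pbkdf2Async = promisify(pbkdf2);

// Holds the in-flight derivation so the submit handler can await it.
let pendingDK = null;

// Call when the user leaves the password field (e.g. on a "blur" event).
// Calling it again after an edit simply replaces the pending result.
function onPasswordEntered(password, salt) {
  pendingDK = pbkdf2Async(password, salt, 100000, 32, "sha256")
    .then((buf) => buf.toString("hex"));
}

// Call on submit; by now the work is usually finished or well under way.
async function onSubmit() {
  if (pendingDK === null) throw new Error("password not entered yet");
  return pendingDK;
}

// Simulated flow: derivation starts, the user spends a moment on the
// CAPTCHA, then submits.
onPasswordEntered("qwert12345", "alice");
setTimeout(async () => {
  const dk = await onSubmit();
  console.log(dk.length); // 64
}, 50);
```

The time the user spends on the rest of the form overlaps with the computation, so the perceived delay at submit shrinks.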
Summary
The three types of password mentioned in this article:
Type | Security | Ease of use | Description
Meaningless long string | High | Poor | Hard to remember, so it must be stored externally, which adds a storage cost
Regular password | Low | Good | Easy to remember, but also easy to crack with a dictionary attack
Password converted to a key | Medium | Good | Same as above, but the conversion is time-consuming, which raises the cost of a dictionary attack
Two kinds of hash functions:
Fast functions (MD5, SHA256, etc.), for input data that is long and highly irregular
Slow functions (PBKDF2, bcrypt, etc.), for input data that is easy to guess
Front-end hashing can be seen as a kind of zero-knowledge proof. What is a zero-knowledge proof? A classic example:
You have a treasure vault that opens when you recite a spell. One day you want to prove to a friend that you can open it, but you don't want him to hear the spell. How can you do it?
Suppose he knows that the vault holds a one-of-a-kind treasure; if you can show it to him, that proves you can open the vault. So you don't bring him along: you open the vault alone and then show him the treasure. You have proved that you can open it without revealing the spell.
That is a zero-knowledge proof: a proof that convinces the verifier an assertion is true without revealing "any useful information".
Practical application
The front-end hashing approach is common in password-manager plug-ins. The user's password is never sent to the back end; it is only used to generate the DK. From the DK, a different password is then generated for each account. Even in the worst cases, such as a compromised server, an eavesdropped transmission, or plaintext storage in the back end, your password is not exposed. At most one account is lost, and the others are unaffected.
(In addition, the password is no longer typed into the original text box but into the plug-in's interface; the plug-in computes the DK and fills it into the text box automatically. This reduces the risk of the password leaking on the page itself, for example to malicious scripts lurking in it.)
Web Apps
In reality, however, few websites have built-in front-end KDF computation. A plug-in can call native code, which is fast and stable, whereas browsers are a mixed bag: performance varies greatly between browsers and versions, so the computation time is very unpredictable.
In addition, for a long time (the IE era) browsers had very little computing power, so people had reservations about the point of front-end computation, and the notion that "everything is computed on the back end" took hold. Today the performance of mainstream browsers is much better, and HTML5 has even introduced the WebCrypto specification, which lets JS call the browser's built-in cryptographic library directly, including PBKDF2. Because it is implemented natively, performance is very high. Here is a simple demo:
https://etherdream.github.io/FunnyScript/pbkdf2/test.html
(It tests 1 million iterations using several implementations: asm.js, Flash, and WebCrypto.)
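For reference, a PBKDF2 call through WebCrypto might look like the sketch below. It is not the code of the demo above. It uses the standard `SubtleCrypto.importKey` and `SubtleCrypto.deriveBits` calls; the function name `deriveDK`, the salt string and the iteration count are assumptions, and the first lines fall back to Node.js's `webcrypto` export so the snippet also runs outside a browser.

```javascript
// In browsers crypto.subtle is a global; older Node.js versions expose it
// through require("crypto").webcrypto instead.
const subtle =
  (globalThis.crypto && globalThis.crypto.subtle) ||
  require("crypto").webcrypto.subtle;

async function deriveDK(password, salt, iterations) {
  const enc = new TextEncoder();
  // Import the raw password bytes as PBKDF2 key material.
  const keyMaterial = await subtle.importKey(
    "raw", enc.encode(password), "PBKDF2", false, ["deriveBits"]
  );
  // Derive 256 bits with HMAC-SHA-256 as the underlying function.
  const bits = await subtle.deriveBits(
    { name: "PBKDF2", hash: "SHA-256", salt: enc.encode(salt), iterations },
    keyMaterial,
    256
  );
  // Convert the resulting ArrayBuffer to a hex string.
  return Array.from(new Uint8Array(bits))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}

// Hypothetical account/site salt; prints the DK as 64 hex characters.
deriveDK("888888", "user1@example.com/www.site-a.com", 100000).then(console.log);
```

Because the call is asynchronous, the page stays responsive while the derivation runs.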
A better KDF
As a password hash function, PBKDF2 is not the best choice, because it simply reuses an existing hash function rather than being purpose-built.
Argon2, the winner of the Password Hashing Competition in 2015, is much more advanced. It lets you set not only a time cost (iteration count) but also a space cost (memory footprint). It also supports multi-threaded computation, so an attacker has to invest comparable computing resources to crack it.
Argon2 is still quite new and is not currently included in WebCrypto. Porting it to a JavaScript version is worth trying, though.
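To give a feel for a KDF with a tunable memory cost, the sketch below uses scrypt, which ships with Node.js's built-in `crypto` module. Note the substitution: scrypt is a different memory-hard function, used here only because Argon2 is not available without a third-party library. The helper name and parameter values are illustrative assumptions.

```javascript
const { scryptSync } = require("crypto");

// scrypt parameters: N is the CPU/memory cost, r the block size and
// p the parallelism. Memory use is roughly 128 * N * r bytes,
// which is about 16 MiB with the values below.
function memoryHardHash(password, salt) {
  return scryptSync(password, salt, 32, { N: 16384, r: 8, p: 1 }).toString("hex");
}

// Placeholder password and salt for this sketch.
const dk = memoryHardHash("qwert12345", "example-salt");
console.log(dk.length); // 64
```

Raising N increases both the time and the memory each guess requires, which is the same kind of trade-off Argon2 exposes through its time and space cost settings.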
Browser password hack