The website uses a distributed deployment with read/write separation. There are multiple database servers for writes and currently one server for login and registration. For now there are 10 user tables, and each table is expected to store 20 million rows.
My questions:
First, when a user registers, how do I check whether the user name already exists? What are some good, efficient solutions?
Second, how should the program be designed so that each user is stored in the corresponding table?
Third, when a user logs in, how do I know which table the user is in?
The problem is fairly difficult and urgent. I hope you can help by inviting experts to answer. Thank you.
Reply content:
This is simple in principle but complicated in practice. I recommend reading the Wikipedia page on eventual consistency.
If you are sure the 10 tables will not change for n years, it is easy. Treat the user name as a base-36 number (26 letters + 10 digits) and take it mod 10; the remainder determines which user table to read and write.
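A rough sketch of the base-36 idea, for illustration only. The function name `base36_table_index` is made up here, and it assumes user names contain only the letters a-z (case-insensitive) and the digits 0-9. The remainder is accumulated digit by digit so that long names do not overflow an integer.

```php
<?php
// Hypothetical helper: read the user name as a base-36 number
// and return that number modulo $count.
function base36_table_index(string $username, int $count = 10): int
{
    $alphabet = '0123456789abcdefghijklmnopqrstuvwxyz';
    if ($username === '') {
        throw new InvalidArgumentException('user name must not be empty');
    }

    $remainder = 0;
    foreach (str_split(strtolower($username)) as $char) {
        $digit = strpos($alphabet, $char);
        if ($digit === false) {
            // Assumption: anything outside a-z / 0-9 is rejected.
            throw new InvalidArgumentException('unsupported character in user name');
        }
        // Horner's rule, reduced mod $count at every step.
        $remainder = ($remainder * 36 + $digit) % $count;
    }

    return $remainder;
}

echo 'user_' . base36_table_index('username1'), PHP_EOL;
```

Names containing other characters (Chinese, punctuation) do not fit this scheme, which is one reason a general hash function is the more common choice.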
But first you need to know where the pitfalls are. For the specifics, see eventual consistency ... That is all I will say on this question.
In general, a user logs in with a user name or email address plus a password, which is sent for login verification.
Suppose we have 10 tables: user_0, user_1, user_2, ... user_9.
Suppose we have several users with the user names username1, nameuser2123, 用户名12, and 逆天杀神.
How do we find out whether the user name exists when the user "逆天杀神" logs in? We first compute an int hash of the user name. It is similar to MD5: MD5 converts a piece of data into a 32-character hexadecimal string, while an int hash produces an int value. For int hashes, see: Hash algorithm - HQD_ACM's column
Then we take this int hash modulo 10 to get a number from 0 to 9. With that, we have solved the following three questions:
First, when a user registers, how do I check whether the user name already exists? What are some good, efficient solutions?
Second, how should the program be designed so that each user is stored in the corresponding table?
Third, when a user logs in, how do I know which table the user is in?
Show me code:
function get_user_table_name($username, $prefix = 'user_', $count = 10)
{
    // Hash the user name to an integer, then take it modulo the table count.
    return $prefix . (abs(crc32($username)) % $count);
}

var_dump(get_user_table_name('username1'));
var_dump(get_user_table_name('nameuser2123'));
var_dump(get_user_table_name('用户名12'));
var_dump(get_user_table_name('逆天杀神'));

/**
 * dump:
 * string(6) "user_2"
 * string(6) "user_8"
 * string(6) "user_5"
 * string(6) "user_9"
 */
Thanks for the invite.
===========
MD5 + hexdec. Combined with the answer above, that is hint enough: hash the user name, then take the modulus, and you know which table it belongs in.
After that, checking whether the user name exists takes only one lookup.
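A minimal sketch of the MD5 + hexdec variant. The function name `md5_table_name` and the choice to keep only the first 7 hex characters are my assumptions, not part of the answer above; 7 characters (28 bits) fit in a native integer on both 32-bit and 64-bit PHP, whereas feeding the whole 32-character digest to hexdec() would return a float and lose precision.

```php
<?php
// Hypothetical helper: MD5 the user name, turn a slice of the hex digest
// into an integer with hexdec(), and take it modulo the table count.
function md5_table_name(string $username, string $prefix = 'user_', int $count = 10): string
{
    $slice = substr(md5($username), 0, 7); // 7 hex chars = 28 bits
    $hash  = hexdec($slice);               // always an int at this size

    return $prefix . ($hash % $count);
}

// Registration and login call the same function, so both reach the same table.
var_dump(md5_table_name('username1'));
var_dump(md5_table_name('逆天杀神'));
```

Because the table is derived from the user name alone, the existence check at registration is one query against one table.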
-------------------------------------------------------------------
Where does this habit of opening with sarcasm come from? Would it kill you to give a proper answer to a question?
I have downvoted the useless answers (Disagree + Not helpful).
First question: if users are stored by hash, the name can be looked up quickly.
Second question: the solution is to build an index.
Third question: with an indexed query, the user does not need to know which table or which file system is being accessed. The index takes care of that.
At this scale there is no need to split tables; adding a memcached cache for user name lookups solves it. If you do want to split, hash first and then take the modulus, as the answers above already describe. Does 200 million really need split tables???
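A sketch of the user name lookup cache idea. Everything named here is hypothetical: `ArrayCache` is an in-memory stand-in for a memcached client so the example runs without a server, and `$lookupInDb` stands for whatever query checks the user table. Only positive results are cached, on the assumption that a name, once registered, stays taken; "not found" always goes to the database so a fresh registration is never hidden by a stale entry.

```php
<?php
// In-memory stand-in for a memcached client (hypothetical, get/set only).
final class ArrayCache
{
    private $items = [];

    public function get(string $key)
    {
        return $this->items[$key] ?? null;
    }

    public function set(string $key, $value): void
    {
        $this->items[$key] = $value;
    }
}

// Returns true if the user name is taken, consulting the cache first.
function username_exists(ArrayCache $cache, callable $lookupInDb, string $username): bool
{
    // Hash the name so the key stays short and free of spaces.
    $key = 'username:' . md5($username);

    if ($cache->get($key) === true) {
        return true; // cache hit, no database query
    }

    $exists = (bool) $lookupInDb($username);
    if ($exists) {
        $cache->set($key, true); // cache positives only
    }

    return $exists;
}

$cache   = new ArrayCache();
$dbCalls = 0;
$lookup  = function (string $name) use (&$dbCalls): bool {
    $dbCalls++;
    return $name === 'username1'; // pretend only this name is registered
};

var_dump(username_exists($cache, $lookup, 'username1')); // queries the "database"
var_dump(username_exists($cache, $lookup, 'username1')); // served from the cache
```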
For performance alone, if you are using Oracle, there is ASM, and even partitioning is unnecessary.
Haha, this is very simple!
First question: I will not go into it. There are many ways to check for duplicates and the answers above already cover plenty, so the asker should be able to solve it.
Second question: take the first letter of each user name and split the user table into 26 tables (if it must be ten tables, that is fine too; at most one table holds three letters). Each user name is stored in the table for its first letter. This has many benefits: with a large user base it keeps the tables basically balanced, avoids confusion, speeds up queries, and reduces redundancy.
Third question: haha, doesn't the answer to question two already solve question three? When the user logs in, just look in the table for the corresponding first letter.
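The first-letter split could be sketched like this. The function name `first_letter_table` and the numeric table suffixes are assumptions for illustration, and names that do not start with a-z are simply rejected here; a real system would need some rule for them.

```php
<?php
// Hypothetical helper: pick a table from the first letter of the user name.
// $lettersPerTable = 1 gives 26 tables (user_0 .. user_25);
// $lettersPerTable = 3 gives 9 tables (user_0 .. user_8), three letters each
// except the last, which holds only y and z.
function first_letter_table(string $username, int $lettersPerTable = 1, string $prefix = 'user_'): string
{
    if (!preg_match('/^[a-z]/i', $username)) {
        // Assumption: names starting with digits or non-Latin characters are not handled.
        throw new InvalidArgumentException('user name must start with a letter a-z');
    }

    $index = ord(strtolower($username[0])) - ord('a'); // a => 0 ... z => 25

    return $prefix . intdiv($index, $lettersPerTable);
}

var_dump(first_letter_table('username1'));
var_dump(first_letter_table('nameuser2123', 3));
```

Whether the tables end up balanced depends on how the first letters of real user names are distributed, which is usually far from even, so it is worth measuring before committing to this layout.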
..................... Divider line ....... ................
Summary:
Splitting by the first letter of the user name directly solves questions two and three, greatly improves query speed, linearly reduces per-table redundancy, and also takes care of balancing the tables.
It also helps the efficiency of the duplicate check in question one: you query only the table for the first letter, so the amount of data queried drops to 1/26 of the original. If you also optimize each table, efficiency improves further.
Haha, that's probably it!
If anything here is wrong, feel free to point it out or send me a private message ... (๑ ̀ㅂ ́) ✧
Don't over-design. What project of yours could have 200 million registered users???
If you will top out at a few tens of millions at most, don't get so fancy.
1. Duplicate check: use a cache, with the user name directly as the key, to test whether the name exists.
2. Generate a distributed unique ID, then hash the (registration) timestamp and take the modulus to pick the table, or split tables directly by time interval.
The simplest approach is to hash the user name directly to a table, which makes the table lookup in step 3 easy.
3. How do you find the table when looking up a user? It depends on what condition you are searching by. By user name, it is simple.
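For point 2, here is a rough sketch of the two timestamp-based options. Both function names are invented for this example, and the monthly interval is just one possible choice of time interval.

```php
<?php
// Option A (hypothetical): registration timestamp modulo the table count.
function table_by_timestamp_mod(int $registeredAt, int $count = 10, string $prefix = 'user_'): string
{
    return $prefix . ($registeredAt % $count);
}

// Option B (hypothetical): one table per calendar month of registration (UTC).
function table_by_month(int $registeredAt, string $prefix = 'user_'): string
{
    return $prefix . gmdate('Ym', $registeredAt);
}

$registeredAt = time();
var_dump(table_by_timestamp_mod($registeredAt));
var_dump(table_by_month($registeredAt));
```

With either option the table cannot be derived from the user name alone, so a login by user name needs an extra name-to-timestamp (or name-to-ID) lookup first. That is the practical reason hashing the user name directly is the simplest route.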