This article takes the "User Center" as an example of a "single key" business and describes architecture practices for horizontal database segmentation, which becomes necessary when data volume grows to the point where database performance drops significantly:
- How to implement horizontal segmentation
- Common problems after horizontal segmentation
- Optimization ideas and practices for typical problems
I. User Center
The User Center is a very common business service. It mainly provides user registration, login, information query, and information modification. Its core metadata is:
User (UID, login_name, passwd, sex, age, nickname, ...)
where:
- UID is the user ID and the primary key
- login_name, passwd, sex, age, nickname, ... are the user's attributes
In terms of database design, a single library with a single table can generally meet this requirement in the early days of a business. The typical architecture is:
- user-center: the User Center service, which provides a friendly RPC interface to callers
- USER-DB: stores the user data
II. User Center horizontal segmentation methods
As the amount of data keeps growing, the database needs to be segmented horizontally. The two common horizontal segmentation algorithms are the "range method" and the "hash method".
The range method splits the data horizontally across two DB instances based on the UID, the business primary key of the User Center:
- USER-DB1: stores data for UIDs from 0 to 10 million
- USER-DB2: stores data for UIDs from 10 million to 20 million
The advantages of the range method are:
- The segmentation strategy is simple: given a UID and the ranges, user-center can quickly locate the library that holds the data
- Expansion is simple: if capacity runs out, just add USER-DB3
The disadvantages of the range method are:
- The UID must be incrementing
- Data volume is uneven: a newly added USER-DB3 holds relatively little data at first
- Request volume is uneven: newly registered users are generally more active, so USER-DB2 tends to carry a higher load than USER-DB1, which leaves server utilization unbalanced
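As a rough illustration of range routing, the sketch below keeps the ranges in a small table and finds the library for a UID with a binary search. The table layout, the half-open boundaries (UID 10,000,000 itself goes to USER-DB2), and the name route_by_range are assumptions made for this example, not part of the original design.

```python
import bisect

# Hypothetical shard table: (exclusive upper UID bound, library name).
RANGES = [
    (10_000_000, "USER-DB1"),  # UIDs 0 .. 9,999,999
    (20_000_000, "USER-DB2"),  # UIDs 10,000,000 .. 19,999,999
]


def route_by_range(uid, ranges=RANGES):
    """Return the name of the library whose UID range contains uid."""
    if uid < 0:
        raise ValueError("uid must be non-negative")
    bounds = [upper for upper, _ in ranges]
    # Index of the first range whose upper bound is strictly greater than uid.
    i = bisect.bisect_right(bounds, uid)
    if i == len(ranges):
        raise LookupError(f"no library configured for uid {uid}")
    return ranges[i][1]


# Expansion appends a new range; rows already stored keep their library.
EXPANDED = RANGES + [(30_000_000, "USER-DB3")]
```

Adding USER-DB3 only appends one entry to the table, which is why this method needs no data migration when it expands.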
The hash method also uses the UID, the business primary key of the User Center, to split the data horizontally across two DB instances:
- USER-DB1: stores data for UIDs where UID modulo 2 is 1
- USER-DB2: stores data for UIDs where UID modulo 2 is 0
The advantages of the hash method are:
- The segmentation strategy is simple: given a UID and the hash, user-center can quickly locate the library that holds the data
- Data is balanced: as long as UIDs are uniformly distributed, the data is spread evenly across the libraries
- Requests are balanced: as long as UIDs are uniformly distributed, the load is spread evenly across the libraries
The disadvantages of the hash method are:
- Expansion is troublesome: if capacity runs out and a library has to be added, re-hashing may force data migration, and migrating the data smoothly is a problem that has to be solved
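To make the expansion problem concrete, the following sketch counts how many UIDs land on a different remainder when the number of libraries changes. It assumes plain UID % N routing in which the remainder identifies the library; the helper names are hypothetical.

```python
def route_by_hash(uid, n_libs):
    """Return the remainder that selects a library under UID % n_libs routing."""
    return uid % n_libs


def moved_fraction(uids, old_n, new_n):
    """Fraction of UIDs whose target remainder changes from old_n to new_n libraries."""
    uids = list(uids)
    moved = sum(
        1 for uid in uids
        if route_by_hash(uid, old_n) != route_by_hash(uid, new_n)
    )
    return moved / len(uids)


# A dense run of UIDs, as an incrementing ID generator would produce.
sample = range(1, 6001)
two_to_three = moved_fraction(sample, 2, 3)
two_to_four = moved_fraction(sample, 2, 4)
```

For this dense run of UIDs, going from 2 to 3 libraries relocates two thirds of the rows, while going from 2 to 4 relocates half of them.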
III. Problems caused by horizontal segmentation of the User Center
Once the data is segmented horizontally by UID, what problems arise for business access to the User Center as a whole?
A query on the UID attribute can be routed directly to a library. For example, a request for uid=124 is located directly after taking the modulo: with the two-library scheme above, 124 modulo 2 is 0, which is USER-DB2.
Queries on non-UID attributes, such as a query on login_name, are far more painful:
Suppose you look up login_name=shenjian. Because you do not know which library the row falls in, you often have to traverse all the libraries, and performance drops significantly as the number of libraries grows.
How to handle queries on non-UID attributes after splitting into libraries is the question this article focuses on.
IV. User Center: demand analysis for non-UID attribute queries
Any architecture design that is divorced from the business is irresponsible, so before discussing architecture, here is a brief analysis of which business queries actually hit non-UID attributes.
In the author's architectural experience over the years, User Center queries on non-UID attributes usually fall into two types of business requirements:
(1) User side, foreground access. The two most typical requirements are:
- User login: query the user's entity by login_name/phone/email; 1% of requests are of this type
- User information query: after login, query the user's entity by UID; 99% of requests are of this type
User-side queries are basically single-record queries. Traffic is heavy, the service must be highly available, and consistency requirements are high.
(2) Operation side, background access. Access patterns vary with product and operational needs: queries by age, gender, avatar, login time, registration time, and so on.
Operation-side queries are basically batch paging queries. Because this is an internal system, traffic is very low, availability requirements are modest, and consistency requirements are less strict.
What kinds of architectural solutions fit these two different business requirements?
V. User Center horizontal segmentation: architectural ideas
When the User Center holds a large amount of data and is segmented horizontally by UID, the core architectural ideas for queries on non-UID attributes are:
- For the user side, use the "establish a mapping from the non-UID attribute to the UID" architecture scheme
- For the operation side, use the "foreground and background separation" architecture scheme
VI. User Center: user-side best practices
"Index Table Method"
Idea: a UID locates the library directly, but login_name does not. If the UID can be looked up from login_name, the problem is solved.
Solution:
- Build an index table that records the login_name->uid mapping
- For access by login_name, first query the UID from the index table, then locate the corresponding library
- The index table has few attributes, so it can hold a great deal of data and generally does not need to be split across libraries
- If the amount of data does grow too large, the index table can be split into libraries by login_name
Potential shortfall: one extra database query, which costs performance
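The two-step lookup of the index table method can be sketched as follows. Plain dictionaries stand in for the sharded user tables and for the index table, and the names user_libs, login_index, register, and find_by_login_name are all hypothetical.

```python
N_LIBS = 2  # assumed number of user libraries, routed by uid % N_LIBS

user_libs = [dict() for _ in range(N_LIBS)]  # stand-ins for the sharded user tables
login_index = {}  # stand-in for the login_name -> uid index table


def register(uid, login_name, **attrs):
    """Write the user row to its library and the mapping to the index table."""
    if login_name in login_index:
        raise ValueError("login_name already taken")
    user_libs[uid % N_LIBS][uid] = {"login_name": login_name, **attrs}
    login_index[login_name] = uid


def find_by_login_name(login_name):
    """Two lookups: the index table first, then the one library the uid points at."""
    uid = login_index.get(login_name)
    if uid is None:
        return None
    return user_libs[uid % N_LIBS].get(uid)
```

The second lookup touches exactly one library, at the cost of the extra query against the index table noted above.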
"Cache Mapping Method"
Idea: querying the index table is slow; keeping the mapping in a cache performs better
Solution:
- For a login_name query, first look up the UID in the cache, then locate the database by UID
- On a cache miss, obtain the UID for the login_name by scanning all the libraries, then put it in the cache
- The login_name-to-UID mapping does not change. Once a mapping is in the cache it is never modified and never needs to be evicted, so the cache hit rate is extremely high
- If the amount of data is too large, the cache can be segmented horizontally by login_name
Potential shortfall: one extra cache query
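A minimal sketch of the cache mapping flow, assuming a dictionary as a stand-in for the cache and two in-memory libraries routed by uid % 2. The sample rows and all function names are invented for illustration.

```python
N_LIBS = 2  # assumed number of user libraries

user_libs = [
    {124: {"login_name": "shenjian"}},  # rows with uid % 2 == 0
    {7: {"login_name": "alice"}},       # rows with uid % 2 == 1
]
uid_cache = {}  # stand-in for a real cache holding login_name -> uid


def scan_for_uid(login_name):
    """Cache-miss path: scan every library for the login_name."""
    for lib in user_libs:
        for uid, row in lib.items():
            if row["login_name"] == login_name:
                return uid
    return None


def uid_for(login_name):
    """Return the uid from the cache, filling the cache after a miss."""
    uid = uid_cache.get(login_name)
    if uid is None:
        uid = scan_for_uid(login_name)
        if uid is not None:
            # The mapping never changes, so the entry is stored without expiry.
            uid_cache[login_name] = uid
    return uid


def find_by_login_name(login_name):
    """Resolve the uid through the cache, then read the one library it maps to."""
    uid = uid_for(login_name)
    return None if uid is None else user_libs[uid % N_LIBS].get(uid)
```

Only the first lookup of a given login_name pays for the full scan; later lookups are served from the cache.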
"Login_name Generate UID"
Idea: avoid the remote query and derive the UID directly from login_name
Solution:
- At registration, a designed function generates the UID from login_name, uid=f(login_name), and the row is inserted into the library selected by that UID
- For access by login_name, first compute the UID with the same function, uid=f(login_name), then route by UID to the corresponding library
Potential shortfall: designing this function takes considerable skill, and there is a risk of UID generation conflicts
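One possible shape for f is a truncated cryptographic hash. The source does not say which function is intended, so the SHA-256 choice, the 63-bit width, the 8-library routing, and the conflict check below are assumptions for illustration only.

```python
import hashlib

N_LIBS = 8  # assumed number of libraries, routed by uid % N_LIBS


def uid_from_login_name(login_name):
    """Derive a 63-bit UID from login_name (SHA-256 is an assumed choice of f)."""
    digest = hashlib.sha256(login_name.encode("utf-8")).digest()
    # Keep 8 bytes and drop one bit so the value fits a signed 64-bit column.
    return int.from_bytes(digest[:8], "big") >> 1


def library_for_login_name(login_name):
    """Route without any remote lookup: recompute the UID, then take the modulo."""
    return uid_from_login_name(login_name) % N_LIBS


users = {}  # stand-in for all user rows, keyed by uid


def register(login_name):
    """Insert a user, refusing a UID that another login_name already holds."""
    uid = uid_from_login_name(login_name)
    if uid in users and users[uid] != login_name:
        raise RuntimeError("UID conflict between two login names")
    users[uid] = login_name
    return uid
```

Because the hash is truncated, two different login names can in principle map to the same UID, which is the conflict risk mentioned above; the check in register only detects it and does not resolve it.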
"Login_name gene into UID"
Idea: if the UID cannot be generated from login_name, a "gene" can be extracted from login_name and blended into the UID
Suppose there are 8 libraries routed by uid%8. The implication is that the last 3 bits of the UID determine which library the row falls in; these 3 bits are called the "gene".
Solution:
- At registration, a designed function generates a 3-bit gene from login_name, login_name_gene=f(login_name) (the pink part of the original illustration)
- At the same time, generate a 61-bit globally unique ID as the user's identifier (the green part)
- Then take the 3-bit login_name_gene as part of the UID (the yellow part)
- Assemble the 64-bit UID from the ID and login_name_gene, and insert the row into the library selected by the UID
- For access by login_name, recompute the 3-bit gene from login_name with the same function, login_name_gene=f(login_name), and go directly to the library given by login_name_gene%8
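The bit layout described above can be sketched as follows. The gene function (low bits of a SHA-256 digest) and the in-process counter standing in for a real 61-bit global ID generator are assumptions; the source only requires that f be deterministic and return 3 bits.

```python
import hashlib
import itertools

GENE_BITS = 3            # 8 libraries, so the gene is 3 bits
N_LIBS = 1 << GENE_BITS  # 8

# Stand-in for a real 61-bit globally unique ID generator.
_id_counter = itertools.count(1)


def login_name_gene(login_name):
    """f(login_name): a 3-bit gene from the low bits of a SHA-256 digest (assumed f)."""
    digest = hashlib.sha256(login_name.encode("utf-8")).digest()
    return digest[-1] & (N_LIBS - 1)


def make_uid(login_name):
    """Assemble a 64-bit UID: 61-bit unique ID in the high bits, gene in the low 3 bits."""
    unique_id = next(_id_counter) & ((1 << 61) - 1)
    return (unique_id << GENE_BITS) | login_name_gene(login_name)


def library_for_uid(uid):
    """Route by UID: uid % 8 reads back the low 3 bits, which are the gene."""
    return uid % N_LIBS


def library_for_login_name(login_name):
    """Route by login_name: recompute the gene, no lookup of the UID needed."""
    return login_name_gene(login_name) % N_LIBS
```

Both routing paths arrive at the same library because uid % 8 depends only on the low 3 bits, and those bits were copied from the gene at registration. Uniqueness comes entirely from the high 61 bits, so a gene shared by many users causes no conflict.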
VII. User Center: operation-side best practices
On the foreground user side, business requirements are basically single-row record accesses. Establishing a mapping from the non-UID attributes login_name/phone/email to the UID is enough to solve the problem.
On the background operation side, the requirements are different: access is basically batch paging, which is computation-heavy, returns large amounts of data, and consumes considerable database performance.
If the foreground and background businesses share one set of services and one database at this point, the background's "few requests" of "inefficient" "batch queries" may occasionally push the database CPU to 100% for an instant and affect normal foreground users (for example, login time-outs).
Moreover, to satisfy the background's assorted "odd" requirements, all kinds of indexes tend to be created in the database. These indexes occupy a large amount of memory and significantly reduce the query and write performance of the user-side foreground service on uid/login_name, so processing time grows.
For this type of business, the "foreground and background separation" architecture scheme should be used:
The architecture for user-side foreground requirements stays unchanged, while the product and operation-side background requirements are split out and supported by their own independent web/service/db, which decouples the systems. For a background business that is "complex", "low concurrency", "not in need of high availability", and "tolerant of some delay":
- The service layer can be removed, with the operation background web layer accessing the DB directly through a DAO
- No reverse proxy and no cluster redundancy are required
- There is no need to access the real-time library; data can be synchronized asynchronously via MQ or offline
- When the data volume is very large, an "external index" or a "Hive" design that suits large data volumes and tolerates higher latency can be used
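The asynchronous synchronization idea can be illustrated with a toy sketch in which an in-process queue stands in for the MQ and two dictionaries stand in for the real-time library and the separate background library. Every name here is hypothetical, and a real deployment would use an actual message queue and database.

```python
import queue

front_db = {}       # stand-in for the real-time foreground library
back_db = {}        # stand-in for the separate background library
mq = queue.Queue()  # stand-in for a message queue


def update_user(uid, **attrs):
    """Foreground write path: update the real-time library, then publish the change."""
    row = front_db.setdefault(uid, {})
    row.update(attrs)
    mq.put((uid, dict(row)))


def sync_background():
    """Background consumer: drain pending changes into the background library."""
    applied = 0
    while True:
        try:
            uid, row = mq.get_nowait()
        except queue.Empty:
            return applied
        back_db[uid] = row
        applied += 1


def page_by_min_age(min_age, page, page_size):
    """Batch paging query that only ever reads the background copy."""
    matches = sorted(
        uid for uid, row in back_db.items() if row.get("age", 0) >= min_age
    )
    start = page * page_size
    return matches[start:start + page_size]
```

Until sync_background runs, the background copy lags behind the foreground, which is the delay this class of business is said to tolerate. In exchange, paging queries never touch the real-time library.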
VIII. Summary
This article has introduced the key architecture points of horizontal segmentation for the "User Center", a typical "single key" business.
Horizontal segmentation methods:
- Range method
- Hash method
Problems encountered after horizontal segmentation:
- Queries on the UID attribute can be located directly to a library; queries on non-UID attributes cannot
Typical businesses for non-UID attribute queries:
- User side, foreground access: single-record queries, heavy traffic, a service that must be highly available, and high consistency requirements
- Operation side, background access: access patterns vary with product and operational needs; queries are basically batch paging queries, and because this is an internal system, traffic is very low, availability requirements are modest, and consistency requirements are less strict
Architecture design ideas for these two types of business:
- User side: use the "establish a mapping from the non-UID attribute to the UID" architecture scheme
- Operation side: use the "foreground and background separation" architecture scheme
User foreground side, best practices for "establishing a mapping from the non-UID attribute to the UID":
- Index table method: record the login_name->uid mapping in the database
- Cache mapping method: record the login_name->uid mapping in the cache
- login_name generates UID
- login_name gene blended into UID
Operation background side, best practices for "foreground and background separation":
- Decouple the foreground and background systems at the web/service/db level, so that inefficient background queries do not cause jitter in foreground queries
- A data redundancy design can be adopted
- An "external index" (such as an ES search system) or "big data processing" (such as Hive) can be used to meet the background's unusual query requirements
IX. References
Shenjian, "Architect's Road" public account
[Repost] Single key business: database horizontal segmentation architecture practice