[Repost] Database Horizontal Sharding Architecture Practice for "Single-Key" Businesses

Last Update: 2018-01-15 Source: Internet

Author: User

Tags db2 unique id

Taking the "User Center" as an example, this article describes database horizontal sharding practices for "single-key" businesses, where database performance degrades significantly as the data volume grows. It covers:

How to implement horizontal sharding
Common problems after horizontal sharding
Approaches to and practices for solving the typical problems

I. The User Center

The User Center is a very common business service. It mainly provides user registration, login, and information query and modification. Its core metadata is:

User (UID, login_name, passwd, sex, age, nickname, ...)

where:

UID is the user ID and the primary key
login_name, passwd, sex, age, nickname, ... are the other user attributes

In the early days of a business, a single database with a single table is generally enough to meet this requirement. The typical architecture is:

user-center: the User Center service, which exposes a friendly RPC interface to callers
user-db: a single database that stores the user data

II. Horizontal Sharding Methods for the User Center

As the data volume keeps growing, the database needs to be sharded horizontally. The two common horizontal sharding algorithms are the "range method" and the "hash method".

The range method splits the data horizontally across two database instances by ranges of the UID, the business primary key of the User Center:

user-db1: stores data with UIDs from 0 to 10 million
user-db2: stores data with UIDs from 10 million to 20 million

The advantages of the range method are:

The sharding strategy is simple: given a UID, user-center can quickly determine from its range which database holds the data
Expansion is simple: if capacity runs out, just add user-db3

The disadvantages of the range method are:

The UID must be monotonically increasing
Data volume is uneven: a newly added user-db3 holds relatively little data at first
Request volume is uneven: newly registered users are generally more active, so user-db2 tends to carry a higher load than user-db1, which leaves server utilization unbalanced
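
The range method can be sketched as a lookup over sorted range boundaries. The Python below is a minimal illustration under stated assumptions, not the author's implementation: the function names `route_by_range` and `add_range` are hypothetical, and each range is treated as half-open, so UID 10,000,000 is assumed to belong to the second database.

```python
import bisect

# Exclusive upper bound of each range, and the database that owns it.
# The boundaries follow the example in the text; the names are illustrative.
RANGE_UPPER_BOUNDS = [10_000_000, 20_000_000]
RANGE_DATABASES = ["user-db1", "user-db2"]


def route_by_range(uid: int) -> str:
    """Return the database whose UID range contains uid."""
    if uid < 0:
        raise ValueError("uid must be non-negative")
    index = bisect.bisect_right(RANGE_UPPER_BOUNDS, uid)
    if index >= len(RANGE_DATABASES):
        raise LookupError(f"no database covers uid {uid}; add a new range")
    return RANGE_DATABASES[index]


def add_range(upper_bound: int, database: str) -> None:
    """Expand capacity by appending a range; existing data does not move."""
    if upper_bound <= RANGE_UPPER_BOUNDS[-1]:
        raise ValueError("new upper bound must exceed the current maximum")
    RANGE_UPPER_BOUNDS.append(upper_bound)
    RANGE_DATABASES.append(database)
```

Calling `add_range(30_000_000, "user-db3")` is all that expansion requires in this sketch, which mirrors the "simple expansion" advantage. The new database starts empty, which mirrors the uneven data volume noted above.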

The hash method also splits the data horizontally across two database instances by the UID, the business primary key of the User Center:

user-db1: stores data whose UID modulo 2 is 1
user-db2: stores data whose UID modulo 2 is 0

The advantages of the hashing method are:

The sharding strategy is simple: given a UID, user-center can quickly determine from its hash which database holds the data
Data is balanced: as long as UIDs are uniformly distributed, the data is spread evenly across the databases
Requests are balanced: as long as UIDs are uniformly distributed, the load is spread evenly across the databases

The disadvantages of the hashing method are:

Expansion is troublesome: if capacity runs out and a database has to be added, re-hashing may force data migration, and migrating that data smoothly is a problem that has to be solved
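
To make the expansion cost concrete, here is a minimal sketch that routes by UID modulo the database count and measures how many UIDs would change database when the count changes. The helper names `route_by_hash` and `moved_fraction` are hypothetical, and plain modulo is assumed as the hash, as in the example above.

```python
def route_by_hash(uid: int, database_count: int) -> int:
    """Return the index of the database that stores uid (uid modulo the count)."""
    return uid % database_count


def moved_fraction(uids, old_count: int, new_count: int) -> float:
    """Fraction of uids whose database changes when the database count changes."""
    uids = list(uids)
    moved = sum(
        1
        for uid in uids
        if route_by_hash(uid, old_count) != route_by_hash(uid, new_count)
    )
    return moved / len(uids)
```

With evenly spread UIDs, `moved_fraction(range(6000), 2, 3)` comes out at two thirds, while doubling from 2 to 4 databases moves one half. This is why re-hashing on expansion implies a data migration that has to be planned.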

III. Problems Introduced by Horizontal Sharding of the User Center

Once the data is sharded horizontally by UID, what problems arise for business access to the User Center?

A query on the UID attribute can be routed directly to a database. For example, to access the data for uid=124, taking the modulo locates the database directly (user-db2 under the two-database scheme above, since 124 modulo 2 is 0).

Queries on non-UID attributes, such as a query on the login_name attribute, are much more painful:

To access the data for login_name=shenjian, the query often has to traverse every database because there is no way to tell which database holds the data. When there are many databases, performance drops significantly.
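
The difference between the two access paths can be sketched as follows. This is an illustration only: two in-memory dictionaries stand in for the databases, the list index is assumed to equal UID modulo 2, and the sample rows and function names are hypothetical.

```python
from typing import Optional

# In-memory stand-ins for two databases sharded by uid % 2 (hypothetical data).
DATABASES = [
    {124: {"uid": 124, "login_name": "shenjian"}},  # rows with uid % 2 == 0
    {125: {"uid": 125, "login_name": "alice"}},     # rows with uid % 2 == 1
]


def find_by_uid(uid: int) -> Optional[dict]:
    """One lookup: the uid itself says which database to ask."""
    return DATABASES[uid % len(DATABASES)].get(uid)


def find_by_login_name(login_name: str) -> Optional[dict]:
    """Worst case: every database is visited and its rows are scanned."""
    for database in DATABASES:
        for user in database.values():
            if user["login_name"] == login_name:
                return user
    return None
```

`find_by_uid` touches exactly one database, whereas the cost of `find_by_login_name` grows with the number of databases, which is the problem the rest of the article addresses.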

How to handle queries on non-UID attributes after sharding is the question this article focuses on.

IV. Analysis of Non-UID Attribute Query Requirements in the User Center

Any architecture design that ignores the business is irresponsible. Before discussing the architecture, let us briefly analyze the business to see what query requirements exist on non-UID attributes.

Based on the author's years of architecture experience, the non-UID attribute queries in a User Center usually fall into two types of business requirements:

(1) User side, foreground access. There are two typical types of requests:

User login: look up the user entity by login_name/phone/email; 1% of requests are of this type
User information query: after login, look up the user entity by UID; 99% of requests are of this type

User-side queries are basically single-record queries. The traffic is high, the service must be highly available, and the consistency requirements are high.

(2) Operations side, background access. Depending on product and operations needs, the access patterns vary: queries by age, gender, avatar, login time, registration time, and so on.

Operations-side queries are basically batch paging queries. Because this is an internal system, the traffic is very low, and the availability and consistency requirements are not as strict.

What architectural solutions should be used for these two different types of business requirements?

V. Architecture Ideas for Horizontal Sharding of the User Center

When the User Center holds a large amount of data and is sharded horizontally by UID, the core architecture ideas for queries on non-UID attributes are:

For the user side, use the "map non-UID attributes to the UID" approach
For the operations side, use the "foreground and background separation" approach

VI. User Center: User-Side Best Practices

"Index Table Method"

Idea: a UID locates the database directly, but a login_name does not. If the UID can be looked up from the login_name, the problem is solved.

Solution:

Create an index table that records the login_name->uid mapping
When accessing by login_name, first query the UID from the index table, then locate the corresponding database
The index table has few attributes, so it can hold a very large number of rows and generally does not need to be sharded
If the data volume does become too large, the index table can be sharded by login_name

Potential drawback: one extra database query per access, which degrades performance
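
The index table method can be sketched with in-memory dictionaries standing in for the databases and the index table. The class `ShardedUserStore` and its method names are hypothetical, and modulo routing by UID is assumed.

```python
from typing import Optional


class ShardedUserStore:
    """In-memory stand-in for user databases sharded by uid, plus one index table."""

    def __init__(self, database_count: int) -> None:
        self.databases = [{} for _ in range(database_count)]
        # Index table: login_name -> uid. Kept unsharded, as the text suggests.
        self.login_index = {}

    def register(self, uid: int, login_name: str) -> None:
        if login_name in self.login_index:
            raise ValueError(f"login_name {login_name!r} is already taken")
        self.login_index[login_name] = uid
        self.databases[uid % len(self.databases)][uid] = {
            "uid": uid,
            "login_name": login_name,
        }

    def get_by_uid(self, uid: int) -> Optional[dict]:
        return self.databases[uid % len(self.databases)].get(uid)

    def get_by_login_name(self, login_name: str) -> Optional[dict]:
        # First query: the index table. Second query: the owning database.
        uid = self.login_index.get(login_name)
        return None if uid is None else self.get_by_uid(uid)
```

`get_by_login_name` always performs two lookups, which is the extra query named as the drawback of this method.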

"Cache Mapping Method"

Idea: querying the index table is slow; keeping the mapping in a cache performs better

Solution:

A login_name query first looks up the UID in the cache, then locates the database by UID
On a cache miss, the UID for the login_name is obtained by scanning all databases and is then put into the cache
The login_name-to-UID mapping never changes, so once an entry is in the cache it is never modified or evicted, and the cache hit rate is very high
If the data volume is too large, the cache can be sharded horizontally by login_name

Potential drawback: one extra cache query per access

"Generating the UID from login_name"

Idea: avoid the remote lookup and derive the UID directly from the login_name

Solution:

At registration, a designed function generates the UID from the login_name, uid=f(login_name), and the row is inserted into the database selected by the UID
When accessing by login_name, the UID is first recomputed with the same function, uid=f(login_name), and the request is then routed to the corresponding database by the UID

Potential drawback: designing this function takes considerable skill, and there is a risk of UID collisions
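
One possible shape for f is a truncated cryptographic hash. The sketch below is an assumption made for illustration, not the function the author has in mind: it takes the leading bits of SHA-256, and the `bits` parameter exists only so that the collision risk can be made visible with a deliberately small output space.

```python
import hashlib


def uid_from_login_name(login_name: str, bits: int = 63) -> int:
    """Derive a uid deterministically from login_name (hypothetical f)."""
    if not 1 <= bits <= 64:
        raise ValueError("bits must be between 1 and 64")
    digest = hashlib.sha256(login_name.encode("utf-8")).digest()
    # Keep the leading `bits` bits of the first 8 bytes of the digest.
    return int.from_bytes(digest[:8], "big") >> (64 - bits)


def route_by_login_name(login_name: str, database_count: int) -> int:
    """No remote lookup: recompute the uid, then route by it."""
    return uid_from_login_name(login_name) % database_count
```

Because the function is deterministic, registration and login agree on the UID without any lookup. Collisions remain possible in principle at any output width, and they become certain once there are more names than possible values: with `bits=8`, any 300 distinct names must share at least one UID.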

"Embedding a login_name Gene in the UID"

Idea: if the UID cannot be generated from the login_name, a "gene" can be extracted from the login_name and embedded in the UID

Assume there are 8 databases and routing uses uid%8. The implication is that the last 3 bits of the UID determine which database the data lands in; these 3 bits are the "gene".

Solution:

At registration, a designed function generates a 3-bit gene from the login_name, login_name_gene=f(login_name)
At the same time, a globally unique 61-bit ID is generated as the user's identifier
The 3-bit login_name_gene then becomes part of the UID
The 64-bit UID is assembled from the ID and the login_name_gene, and the row is inserted into the database selected by the UID
When accessing by login_name, the 3-bit gene is recomputed from the login_name with the same function, login_name_gene=f(login_name), and login_name_gene%8 routes directly to the database
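
The gene steps above can be sketched with simple bit operations. Everything named here is an assumption made for illustration: the gene function takes the low 3 bits of a SHA-256 digest, and a local counter stands in for a real globally unique 61-bit ID generator.

```python
import hashlib
import itertools

GENE_BITS = 3
DATABASE_COUNT = 1 << GENE_BITS  # 8 databases
GENE_MASK = DATABASE_COUNT - 1   # 0b111

# Stand-in for a globally unique 61-bit ID generator (hypothetical).
_id_counter = itertools.count(1)


def login_name_gene(login_name: str) -> int:
    """Derive a 3-bit gene from login_name (one possible choice of f)."""
    digest = hashlib.sha256(login_name.encode("utf-8")).digest()
    return digest[-1] & GENE_MASK


def new_uid(login_name: str) -> int:
    """Assemble a 64-bit uid: 61-bit unique ID in the high bits, gene in the low 3."""
    unique_id = next(_id_counter)
    if unique_id >= 1 << 61:
        raise OverflowError("unique ID no longer fits in 61 bits")
    return (unique_id << GENE_BITS) | login_name_gene(login_name)


def database_for_uid(uid: int) -> int:
    return uid % DATABASE_COUNT


def database_for_login_name(login_name: str) -> int:
    # The gene alone is enough to route; no uid lookup is needed.
    return login_name_gene(login_name) % DATABASE_COUNT
```

Because the gene occupies the low 3 bits, `database_for_uid(new_uid(name))` and `database_for_login_name(name)` always agree, while uniqueness comes from the 61-bit part rather than from the login_name.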

VII. User Center: Operations-Side Best Practices

On the foreground user side, the business requirement is basically single-row access. Establishing a mapping from the non-UID attributes login_name/phone/email to the UID solves the problem.

On the background operations side, the requirements are different: access is basically batch paging, which is computation-heavy, returns large amounts of data, and consumes considerable database resources.

If the foreground and background businesses share the same services and the same database, a "small number" of "inefficient" "batch query" requests from the background may occasionally drive the database CPU to 100% for an instant and affect normal foreground user access (for example, login timeouts).

Moreover, to satisfy the background's assorted "odd" requirements, various indexes are often added to the database. These indexes occupy a large amount of memory and significantly reduce the query and write performance of the foreground user-side service on uid/login_name, so processing time grows.

For this type of business, the "foreground and background separation" architecture should be used:

The architecture for the user-side foreground business stays unchanged, while the operations-side background business is split out and supported by its own independent web/service/db, which decouples the two systems. For a background business that is "complex in logic", "low in concurrency", "not in need of high availability" and "tolerant of some delay":

The service layer can be removed, with the operations background web layer accessing the db directly through a DAO
No reverse proxy and no cluster redundancy are needed
There is no need to access the real-time database; data can be synchronized asynchronously via MQ or offline
When the data volume is very large, an "external index" or a Hive-based design, which suits large data volumes and tolerates higher latency, can be used
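
The asynchronous synchronization idea can be sketched as follows. This is an illustration under stated assumptions rather than a production design: `queue.Queue` stands in for the MQ, dictionaries stand in for the two databases, and all function names are hypothetical. The point is that the age index, which only the background needs, lives entirely on the background copy.

```python
import queue

change_events = queue.Queue()  # stand-in for the MQ
foreground_db = {}             # real-time database used by the user side
background_db = {}             # separate copy used by the operations side
background_age_index = {}      # background-only index: age -> set of uids


def foreground_save_user(uid: int, login_name: str, age: int) -> None:
    """Write to the real-time database and publish a change event."""
    row = {"uid": uid, "login_name": login_name, "age": age}
    foreground_db[uid] = row
    change_events.put(dict(row))  # publish a copy of the row


def background_apply_pending() -> int:
    """Drain the queue into the background copy; return the number of events applied."""
    applied = 0
    while True:
        try:
            event = change_events.get_nowait()
        except queue.Empty:
            return applied
        previous = background_db.get(event["uid"])
        if previous is not None:
            background_age_index[previous["age"]].discard(event["uid"])
        background_db[event["uid"]] = event
        background_age_index.setdefault(event["age"], set()).add(event["uid"])
        applied += 1


def background_query_by_age(age: int, page: int, page_size: int) -> list:
    """Batch paging query served entirely from the background copy."""
    uids = sorted(background_age_index.get(age, ()))
    start = page * page_size
    return [background_db[uid] for uid in uids[start:start + page_size]]
```

Until `background_apply_pending` runs, the background copy lags the foreground, which is the delay this kind of business is said to tolerate. In exchange, paging queries by age never touch `foreground_db`.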

VIII. Summary

This article uses the "User Center" as a typical "single-key" business and introduces the key points of its horizontal sharding architecture.

Horizontal sharding methods: the range method and the hash method

Problem encountered after horizontal sharding:

A query on the UID attribute can be routed directly to a database, but a query on a non-UID attribute cannot

Typical businesses that query non-UID attributes:

User side, foreground access: single-record queries, high traffic, a service that must be highly available, and high consistency requirements
Operations side, background access: access patterns vary with product and operations needs and are basically batch paging queries; because this is an internal system, the traffic is very low, and the availability and consistency requirements are not as strict

Architecture design ideas for these two types of business:

User foreground side, best practices for "mapping non-UID attributes to the UID":

Index table method: record the login_name->uid mapping in the database
Cache mapping method: record the login_name->uid mapping in a cache
Generating the UID from login_name
Embedding a login_name gene in the UID

Operations background side, best practices for "foreground and background separation":

Decouple the web/service/db of the foreground and background systems so that inefficient background queries do not cause jitter in foreground queries
A data-redundancy design can be adopted
An "external index" (such as an ES search system) or "big data processing" (such as Hive) can serve the background's unusual query requirements

IX. References

Shen Jian, "Architect's Road" public account

This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
