The application scenario and solution of the----------database sub-list of high concurrency and high load solution

Last Update:2018-08-24 Source: Internet

Author: User

Tags db2 unique id

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

The application scenario and solution of database sub-Library sub-table

In a real-world business scenario, to protect the customer experience and to meet the linear growth of the business. A sub-database strategy for products that have a huge amount of data and that business will always do. However, it is very important to adopt a sub-table and sub-library strategy based on business. The following is a specific example to analyze.

• Scenario One: User Center, single key business How to do database segmentation • Scenario Two: Order center, multi-key business how to do database segmentation scenario One: User Center Database Segmentation Architecture Practice | Scenario Introduction

User Center is a very common business system, including user login, registration, information inquiry and modification services.

The user's core metadata is:

User (Uid,login_name,nickname,password,sex,age)

Where UID: User ID, primary key

Login_name,nickname,password,sex,age: Other properties of the user

In the early days of business, single-form libraries can meet business needs

Scenario One: User Center Database Segmentation Method | Scope method

When the amount of data is getting bigger and larger, the database needs to be sliced horizontally, and the common segmentation algorithms are "Scope method" and "hash Method".

Scope method: Divide the data horizontally into two DB instances based on the service UID of the User center:

The advantages of the Range method are:

• The segmentation strategy is simple, according to the UID, according to the scope, User-center can quickly locate the data in which library • Expansion is simple, if the capacity is not enough, as long as the increase USER-DB3 can range method of the insufficiency is: UID must meet the incremental characteristics

• Uneven data volumes, new USER-DB3, less data in the early stages

• Uneven requests, in general, the new registered user activity will be relatively high, so user-db2 tend to be higher than the USER-DB1 load, resulting in unbalanced server utilization

Scenario One: User Center database Segmentation Method | Hash method

Hashing: Divide the data horizontally into two DB instances based on the business UID of the User Center

The advantages of the hashing method are:

• The segmentation strategy is simple, according to the UID, according to the scope, User-center can quickly locate the data in which library

• Data Volume Equalization: As long as the UID is balanced, the distribution of the data on each library must be balanced

• Demand balancing: As long as the UID is balanced, the distribution of the load on each library must be balanced

The disadvantages of the hashing method are:

• If you need to add a library, you need to re-hash it, which can lead to data migration and make it difficult to smooth the upgrade.

Scenario One: User Center data query Demand analysis

Any architectural design that is out of business is bullying, and before the architectural discussion, the business must be briefly analyzed to see what query requirements are on the table structure.

Based on business experience, user centers often have the following types of business needs:

(1) User side, front desk access, the most typical there are two types of requirements

User login: Query user entity through Login_name/email/phone, 1% of requests belong to this type

User information query: After login, through the UID to query the user's instance, 99% requests belong to this type.

The basic characteristics of user-side query are: The basic is a single record query, access volume, service requirements are high availability, and high consistency requirements

(2) Operation side, background access. Needs to meet the needs of the product and operational level, access patterns vary, according to age, gender, login time, registration time and other properties to query. Operational side requirements of the basic characteristics are: a large number of bulk paging query requirements, access is low, the availability requirements are not high, the requirements for consistency is not so strict.

Scenario One: User Center Data query demand solution-user side 1. Index Table Method:

Idea: UID can be directly located to the database, login_name can not directly navigate to the library. Establish a mapping relationship between login_name and login_id.

Solution:

• Establish an index table to record the mapping relationship of Login_name->uid • When accessed using login_name, the index table is used to query the UID, then the corresponding Coux Sol list attributes are less, can hold very much data, generally do not need to divide the library ·　　If the amount of data is too large, you can use Login_name to sub-Library: One more database query, performance decreased by one time. 2. Cache mapping method: the way to access the index table is relatively low performance. A better performance experience can be achieved by placing the mappings in the cache.

Solution:

login_name Query first to the cache to query the UID, and then according to the UID location database • Assuming CacheMiss, using sweeping library method to obtain the login_name corresponding UID, put cache login_name to the UID mapping relationship will not Changes, mapping relationships once placed in the cache, will not be changed, no elimination, cache hit rate is high • If the amount of data is too large, the cache can be sliced by login_name

Insufficient: one more cache query.

3.login_name Generating UID

Idea: Do not make remote query, by login_name directly get UID

Solution:

• When the user registers, the design function login_name generates UID,UID=F (login_name), inserts the data by the UID sub-Library • When logging in with Login_name, the UID is computed by the function, and then the UID is routed to the corresponding database for querying.

Insufficient: login_name to UID generation function requires high, there is a risk of UID generation conflict

4.login_name gene into UID

Idea: Extracting "genes" from login_name into the UID.

Solution:

• When the user registers, the design function login_name generates 4bit genes, login_name_gene=f (login_name), such as the pink part • Simultaneously, generates a 60BIT global unique ID as the user's identity, such as the green part • Then the 4bit Login_name_gene also acts as part of the UID, such as the excrement-yellow part • Generates 64bit UID, assembled by ID and Login_name_gene, and inserts data according to the UID Sub-Library • Accessed by login_name by function by Logi N_name again recovers 4bit gene, login_name_gene=f (login_name), direct to library scene by login_name_gene%8: User Center Data query demand solution-operational side

Background operation side of the query requirements are different, basically is a batch paging query, the amount of computation and return data is large, compared to consume database performance. At this point, if the background business and the foreground business share a batch of services and the same database. Inefficient access to bulk queries that may cause a few requests in the background causes the database server CPU to be instantaneous 100%, affecting the normal access of the foreground user. In addition, because of the wide range of query requirements in the background business, it is necessary to establish multiple indexes on the database, which consume a large amount of memory and disk, resulting in a significant reduction in the query and write performance of the uid/login_name of the foreground business, and the processing time increases. For this type of business, the "back and forth" architecture should be adopted:

Scenario Two: Order Center data query Demand analysis

Or that sentence, any out-of-business architectural design is bullying, before the architectural discussion, first of all to a brief analysis of the business to see what the table structure of the query requirements.

According to business experience, the order center often has the following types of business requirements:

(1) User side, front desk access, the most typical three types of needs

Order Entity query: 90% is the requirement for order entities through OIDs.

User Order List query: Through buyer_id paging query user History order list, 9% traffic belongs to this demand.

Merchant Order List query: Check the list of merchant history orders via Seller_uid page, 1% traffic belongs to this kind of demand.

The main features of the front desk access are: High throughput, highly available service requirements, and high consistency requirements. Among them, the consistency requirement of the merchant is low and can accept a certain degree of delay.

(2) Operation side, background access. Access patterns vary according to product and operational requirements: query by time, structure, commodity and details

Background Access features: The operation side of the query is basically a batch paging query, low access, the requirements for consistency of availability is not high, allowing the second or even 10-second level of query latency.

Scenario Two: Order Center data query Demand solution

Background operation side of the query requirements are different, basically is a batch paging query, the amount of computation and return data is large, compared to consume database performance. At this point, if the background business and the foreground business share a batch of services and the same database. Inefficient access to bulk queries that may cause a few requests in the background causes the database server CPU to be instantaneous 100%, affecting the normal access of the foreground user. For this kind of business, we should adopt "front and rear separation" of the architecture scheme: front-Office Business architecture unchanged, site access, service tiering, database level segmentation.

Scenario Two: Segmentation Method of Order Center database

After defining the access requirements of the order center, the problem is turned into how the oid,buyer_id,seller_id of the front desk can be split horizontally in the database.

The following conditions need to be met:

1. Depending on the buyer_uid%n, you can locate the database

2. Depending on the oid%n, you can locate the database

3. Depending on the seller_uid%n, you can locate the database

The above business is a business scenario for a 1:n (a buyer: n orders) and N: N (a buyer: nth sellers, 1 sellers: N buyers), for the "many-to-many" business, horizontal segmentation should use the "Data redundancy method"

Scenario Two: Segmentation Method of Order Center database

• When an order is generated, through the Buyer_uid, the OID is incorporated into the library gene, written to the Db-buyer library • Asynchronously through the binlog+canal, the data is redundant to the Db-seller library through buyer D Sub-Library, seller library through the Seller_uid sub-Library, the former meet the OID and Buyer_uid query requirements, the latter to meet the needs of the Seller_uid query Scenario two: Order Center Database Segmentation Method | Data Redundancy method

Why do you want redundant data?

Internet data volume of business scenarios, often the database needs to be horizontal segmentation to reduce the amount of single-Library data.

Horizontal cutting Branch has a patitionkey, the query through Patition key can be directly located to the library, but queries on non-patitionkey may need to scan multiple libraries.

The common architecture design is to use the inverse paradigm design of data redundancy to meet the query requirements of different dimensions after the library.

For example: Order business, to users and businesses have order inquiry requirements:

Order (Oid,info_detail);

T (buyer_uid,seller_uid,oid);

If you use Buyer_uid to divide the library, seller_uid queries need to scan multiple libraries.

If you use Seller_uid to divide the library, buyer_uid queries need to scan multiple libraries.

Data redundancy can now be used to meet the query requirements on BUYER_UID and Seller_uid, respectively

T1 (buyer_uid,seller_uid,oid)

T2 (seller_uid,buyer_uid,oid)

The same data, redundant two copies, a copy of Buyer_uid to meet the needs of buyers, a copy of the Seller_uid to meet the seller's query requirements.

Scenario Two: Order Center Database Segmentation Method | How to implement data redundancy

1. Service Synchronous Double Write

The service synchronizes two writes, that is, the service layer synchronously writes redundant data.

The process is as shown on the right:

(1) Business application Alternate service layer, write data

(2) Service layer writes data to DB1

(3) Service layer writes data to DB2

(4) The service layer returns new data to the business application successfully

Advantages:

• Simple, service layer from single write, changed to two write people • Data consistency is high and the double write succeeds before returning

Disadvantages:

• Since the write has changed to two writes, the request time has increased • The data may still be inconsistent (after data is written to DB1, the service is down or restarted, the data cannot be written DB2) 3. Offline Asynchronous Double Write

In order to mask "complexity", the data is written by the offline service or the task to complete, no longer by the service layer

The process is as shown on the right:

(1) Business application Alternate service layer, write data

(2) Service layer writes data to DB1

(3) The service layer returns new data to the business application successfully

(4) The data will be written to the log in the database

(5) Offline service or task read the database log

(6) Offline service or task inserting T2 data

Advantages:

• Data bi-write and business fully decoupled • Short request processing time

Disadvantages:

• Returns a time window for inconsistent data when the business is added successfully, but ensures eventual consistency • Data consistency depends on the ability of offline services or tasks to be sunken

The application scenario and solution of the----------database sub-list of high concurrency and high load solution

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More