The application scenario and solution of database sub-Library sub-table
In a real-world business scenario, to protect the customer experience and to meet the linear growth of the business. A sub-database strategy for products that have a huge amount of data and that business will always do. However, it is very important to adopt a sub-table and sub-library strategy based on business. The following is a specific example to analyze.
• Scenario One: User Center, single key business How to do database segmentation • Scenario Two: Order center, multi-key business how to do database segmentation scenario One: User Center Database Segmentation Architecture Practice | Scenario Introduction
User Center is a very common business system, including user login, registration, information inquiry and modification services.
The user's core metadata is:
User (Uid,login_name,nickname,password,sex,age)
Where UID: User ID, primary key
Login_name,nickname,password,sex,age: Other properties of the user
In the early days of business, single-form libraries can meet business needs
Scenario One: User Center Database Segmentation Method | Scope method
When the amount of data is getting bigger and larger, the database needs to be sliced horizontally, and the common segmentation algorithms are "Scope method" and "hash Method".
Scope method: Divide the data horizontally into two DB instances based on the service UID of the User center:
The advantages of the Range method are:
• The segmentation strategy is simple, according to the UID, according to the scope, User-center can quickly locate the data in which library • Expansion is simple, if the capacity is not enough, as long as the increase USER-DB3 can range method of the insufficiency is: UID must meet the incremental characteristics
• Uneven data volumes, new USER-DB3, less data in the early stages
• Uneven requests, in general, the new registered user activity will be relatively high, so user-db2 tend to be higher than the USER-DB1 load, resulting in unbalanced server utilization
Scenario One: User Center database Segmentation Method | Hash method
Hashing: Divide the data horizontally into two DB instances based on the business UID of the User Center
The advantages of the hashing method are:
• The segmentation strategy is simple, according to the UID, according to the scope, User-center can quickly locate the data in which library
• Data Volume Equalization: As long as the UID is balanced, the distribution of the data on each library must be balanced
• Demand balancing: As long as the UID is balanced, the distribution of the load on each library must be balanced
The disadvantages of the hashing method are:
• If you need to add a library, you need to re-hash it, which can lead to data migration and make it difficult to smooth the upgrade.
Scenario One: User Center data query Demand analysis
Any architectural design that is out of business is bullying, and before the architectural discussion, the business must be briefly analyzed to see what query requirements are on the table structure.
Based on business experience, user centers often have the following types of business needs:
(1) User side, front desk access, the most typical there are two types of requirements
User login: Query user entity through Login_name/email/phone, 1% of requests belong to this type
User information query: After login, through the UID to query the user's instance, 99% requests belong to this type.
The basic characteristics of user-side query are: The basic is a single record query, access volume, service requirements are high availability, and high consistency requirements
(2) Operation side, background access. Needs to meet the needs of the product and operational level, access patterns vary, according to age, gender, login time, registration time and other properties to query. Operational side requirements of the basic characteristics are: a large number of bulk paging query requirements, access is low, the availability requirements are not high, the requirements for consistency is not so strict.
Scenario One: User Center Data query demand solution-user side 1. Index Table Method:
Idea: UID can be directly located to the database, login_name can not directly navigate to the library. Establish a mapping relationship between login_name and login_id.
Solution:
• Establish an index table to record the mapping relationship of Login_name->uid • When accessed using login_name, the index table is used to query the UID, then the corresponding Coux Sol list attributes are less, can hold very much data, generally do not need to divide the library · If the amount of data is too large, you can use Login_name to sub-Library: One more database query, performance decreased by one time. 2. Cache mapping method: the way to access the index table is relatively low performance. A better performance experience can be achieved by placing the mappings in the cache.
Solution:
login_name Query first to the cache to query the UID, and then according to the UID location database • Assuming CacheMiss, using sweeping library method to obtain the login_name corresponding UID, put cache login_name to the UID mapping relationship will not Changes, mapping relationships once placed in the cache, will not be changed, no elimination, cache hit rate is high • If the amount of data is too large, the cache can be sliced by login_name
Insufficient: one more cache query.
3.login_name Generating UID
Idea: Do not make remote query, by login_name directly get UID
Solution:
• When the user registers, the design function login_name generates UID,UID=F (login_name), inserts the data by the UID sub-Library • When logging in with Login_name, the UID is computed by the function, and then the UID is routed to the corresponding database for querying.
Insufficient: login_name to UID generation function requires high, there is a risk of UID generation conflict
4.login_name gene into UID
Idea: Extracting "genes" from login_name into the UID.
Solution:
• When the user registers, the design function login_name generates 4bit genes, login_name_gene=f (login_name), such as the pink part • Simultaneously, generates a 60BIT global unique ID as the user's identity, such as the green part • Then the 4bit Login_name_gene also acts as part of the UID, such as the excrement-yellow part • Generates 64bit UID, assembled by ID and Login_name_gene, and inserts data according to the UID Sub-Library • Accessed by login_name by function by Logi N_name again recovers 4bit gene, login_name_gene=f (login_name), direct to library scene by login_name_gene%8: User Center Data query demand solution-operational side
Background operation side of the query requirements are different, basically is a batch paging query, the amount of computation and return data is large, compared to consume database performance. At this point, if the background business and the foreground business share a batch of services and the same database. Inefficient access to bulk queries that may cause a few requests in the background causes the database server CPU to be instantaneous 100%, affecting the normal access of the foreground user. In addition, because of the wide range of query requirements in the background business, it is necessary to establish multiple indexes on the database, which consume a large amount of memory and disk, resulting in a significant reduction in the query and write performance of the uid/login_name of the foreground business, and the processing time increases. For this type of business, the "back and forth" architecture should be adopted:
Scenario Two: Order Center data query Demand analysis
Or that sentence, any out-of-business architectural design is bullying, before the architectural discussion, first of all to a brief analysis of the business to see what the table structure of the query requirements.
According to business experience, the order center often has the following types of business requirements:
(1) User side, front desk access, the most typical three types of needs
Order Entity query: 90% is the requirement for order entities through OIDs.
User Order List query: Through buyer_id paging query user History order list, 9% traffic belongs to this demand.
Merchant Order List query: Check the list of merchant history orders via Seller_uid page, 1% traffic belongs to this kind of demand.
The main features of the front desk access are: High throughput, highly available service requirements, and high consistency requirements. Among them, the consistency requirement of the merchant is low and can accept a certain degree of delay.
(2) Operation side, background access. Access patterns vary according to product and operational requirements: query by time, structure, commodity and details
Background Access features: The operation side of the query is basically a batch paging query, low access, the requirements for consistency of availability is not high, allowing the second or even 10-second level of query latency.
Scenario Two: Order Center data query Demand solution
Background operation side of the query requirements are different, basically is a batch paging query, the amount of computation and return data is large, compared to consume database performance. At this point, if the background business and the foreground business share a batch of services and the same database. Inefficient access to bulk queries that may cause a few requests in the background causes the database server CPU to be instantaneous 100%, affecting the normal access of the foreground user. For this kind of business, we should adopt "front and rear separation" of the architecture scheme: front-Office Business architecture unchanged, site access, service tiering, database level segmentation.
Scenario Two: Segmentation Method of Order Center database
After defining the access requirements of the order center, the problem is turned into how the oid,buyer_id,seller_id of the front desk can be split horizontally in the database.
The following conditions need to be met:
1. Depending on the buyer_uid%n, you can locate the database
2. Depending on the oid%n, you can locate the database
3. Depending on the seller_uid%n, you can locate the database
The above business is a business scenario for a 1:n (a buyer: n orders) and N: N (a buyer: nth sellers, 1 sellers: N buyers), for the "many-to-many" business, horizontal segmentation should use the "Data redundancy method"
Scenario Two: Segmentation Method of Order Center database
• When an order is generated, through the Buyer_uid, the OID is incorporated into the library gene, written to the Db-buyer library • Asynchronously through the binlog+canal, the data is redundant to the Db-seller library through buyer D Sub-Library, seller library through the Seller_uid sub-Library, the former meet the OID and Buyer_uid query requirements, the latter to meet the needs of the Seller_uid query Scenario two: Order Center Database Segmentation Method | Data Redundancy method
Why do you want redundant data?
Internet data volume of business scenarios, often the database needs to be horizontal segmentation to reduce the amount of single-Library data.
Horizontal cutting Branch has a patitionkey, the query through Patition key can be directly located to the library, but queries on non-patitionkey may need to scan multiple libraries.
The common architecture design is to use the inverse paradigm design of data redundancy to meet the query requirements of different dimensions after the library.
For example: Order business, to users and businesses have order inquiry requirements:
Order (Oid,info_detail);
T (buyer_uid,seller_uid,oid);
If you use Buyer_uid to divide the library, seller_uid queries need to scan multiple libraries.
If you use Seller_uid to divide the library, buyer_uid queries need to scan multiple libraries.
Data redundancy can now be used to meet the query requirements on BUYER_UID and Seller_uid, respectively
T1 (buyer_uid,seller_uid,oid)
T2 (seller_uid,buyer_uid,oid)
The same data, redundant two copies, a copy of Buyer_uid to meet the needs of buyers, a copy of the Seller_uid to meet the seller's query requirements.
Scenario Two: Order Center Database Segmentation Method | How to implement data redundancy
1. Service Synchronous Double Write
The service synchronizes two writes, that is, the service layer synchronously writes redundant data.
The process is as shown on the right:
(1) Business application Alternate service layer, write data
(2) Service layer writes data to DB1
(3) Service layer writes data to DB2
(4) The service layer returns new data to the business application successfully
Advantages:
• Simple, service layer from single write, changed to two write people • Data consistency is high and the double write succeeds before returning
Disadvantages:
• Since the write has changed to two writes, the request time has increased • The data may still be inconsistent (after data is written to DB1, the service is down or restarted, the data cannot be written DB2) 3. Offline Asynchronous Double Write
In order to mask "complexity", the data is written by the offline service or the task to complete, no longer by the service layer
The process is as shown on the right:
(1) Business application Alternate service layer, write data
(2) Service layer writes data to DB1
(3) The service layer returns new data to the business application successfully
(4) The data will be written to the log in the database
(5) Offline service or task read the database log
(6) Offline service or task inserting T2 data
Advantages:
• Data bi-write and business fully decoupled • Short request processing time
Disadvantages:
• Returns a time window for inconsistent data when the business is added successfully, but ensures eventual consistency • Data consistency depends on the ability of offline services or tasks to be sunken
The application scenario and solution of the----------database sub-list of high concurrency and high load solution