Forwarded from: original article, 2017-08-29, 58 Shen Jian, Architect's Road
Horizontal sharding of a database is an interesting topic: different types of business call for different sharding methods.
Taking the "order center" as an example, this article introduces the architectural practices for horizontally sharding the database of a "multi-key" business once growing data volume has noticeably degraded database performance.
First, what is a "multi-key" business
A "multi-key" business is one in which a single piece of metadata has several attributes that front-end online queries need to look up by.
Order Center Business Analysis
The order center is a very common "multi-key" business. It mainly provides services for querying and modifying orders, and its core metadata is:
Order(OID, buyer_uid, seller_uid, time, money, detail, ...);
where:
OID is the order ID, the primary key
buyer_uid is the buyer's UID
seller_uid is the seller's UID
time, money, detail, ... are the order's attributes
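As a rough illustration, the order record above could be modeled as follows. The field types and units are assumptions; the article only names the fields.

```python
from dataclasses import dataclass


@dataclass
class Order:
    """One order row; field names follow the article, types are assumed."""
    oid: int          # order ID, primary key
    buyer_uid: int    # buyer's user ID
    seller_uid: int   # seller's user ID
    time: int         # order time, assumed here to be a Unix timestamp
    money: int        # order amount, assumed here to be in minor units (cents)
    detail: str = ""  # free-form order detail


order = Order(oid=1, buyer_uid=666, seller_uid=888, time=1503964800, money=9900)
```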
In terms of database design, in the early days of a business a single database with a single table is generally enough to handle this requirement. The typical architecture is:
Order-center: the order center service, which provides a friendly RPC interface to callers
ORDER-DB: the data store for orders
As the order volume grows, the database needs to be sharded horizontally. Because there are query requirements on more than one key, choosing which field to shard on becomes the key technical problem:
If you shard on OID, queries by buyer_uid and seller_uid have to traverse multiple databases
If you shard on buyer_uid or seller_uid, queries by the other attributes have to traverse multiple databases
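The trade-off can be seen in a small in-memory sketch. The shard count of 16, the dict-based shards, and the function names are all assumptions made for illustration. When rows are routed by OID, a primary-key lookup touches one shard, but a query by buyer_uid has no way to know where the rows live and must visit every shard.

```python
from collections import defaultdict

NUM_SHARDS = 16  # hypothetical shard count

# shards[i] maps oid -> order row for shard i
shards = defaultdict(dict)


def insert_by_oid(order):
    """Route the row by oid % NUM_SHARDS."""
    shards[order["oid"] % NUM_SHARDS][order["oid"]] = order


def get_by_oid(oid):
    """Primary-key lookup touches exactly one shard."""
    return shards[oid % NUM_SHARDS].get(oid)


def list_by_buyer(buyer_uid):
    """buyer_uid says nothing about the shard, so every shard is scanned."""
    touched = 0
    result = []
    for shard_id in range(NUM_SHARDS):
        touched += 1
        result.extend(o for o in shards[shard_id].values()
                      if o["buyer_uid"] == buyer_uid)
    return result, touched


for oid in range(1, 101):
    insert_by_oid({"oid": oid, "buyer_uid": oid % 5, "seller_uid": oid % 7})

orders, touched = list_by_buyer(3)
```

The same scatter-gather cost appears for OID lookups if the rows are routed by buyer_uid instead, which is why the query requirements have to be sorted out first.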
In short, it is hard to find a strategy that covers everything, so before settling on a technical solution, first sort out the query requirements.
Second, analysis of the order center's attribute query requirements
Before discussing the architecture, let's briefly analyze the business to see which attributes have query requirements.
Front-end access has three typical kinds of requirement:
Order entity query: look up an order entity by OID; 90% of traffic is of this kind
User order list query: page through a user's order history by buyer_uid; 9% of traffic is of this kind
Merchant order list query: page through a merchant's order history by seller_uid; 1% of traffic is of this kind
Front-end access characteristics: high throughput and high availability requirements. Users require strong consistency when accessing their orders, while merchants' consistency requirements are lower and they can accept some delay.
Back-office access patterns vary with product and operational requirements.
Back-office access characteristics: queries from the operations side are basically batch paging queries. Because the system is internal, traffic is very low, availability requirements are modest, and consistency requirements are less strict; query latency of seconds or even around ten seconds is acceptable.
What kind of architecture should be used to meet these two different sets of requirements?
Third, an architecture that separates the front end from the back office
If the front-end business and the back-office business share the same services and the same database, a "small number" of "inefficient" "batch queries" from the back office can occasionally drive the database CPU to 100% for a moment, which affects normal front-end user access (for example, order queries time out).
The front end and the back office have different query requirements and place different demands on the system, so they should be decoupled by adopting a "front-end/back-office separation" architecture.
The front-end architecture stays unchanged: site access, service layering, and horizontal database sharding.
The back-office requirements are split out and supported by an independent web/service/db stack, which decouples the two systems. For a back-office business that is "complex", "low in concurrency", "not in need of high availability", and "tolerant of some delay":
You can remove the service layer and access the data layer directly through a DAO in the operations back-office web layer
No reverse proxy is needed, and no cluster redundancy is needed
Data can be synchronized asynchronously via MQ or offline, sacrificing some real-time freshness
You can use a design suited to large data volumes and tolerant of higher latency, such as an "external index" or Hive
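A minimal sketch of the separation, assuming an in-process queue.Queue as a stand-in for a real MQ and plain dicts as stand-ins for the two stores (all names here are hypothetical). The front-end write path publishes a change message; the back office applies those messages to its own store later, so its heavy queries never touch the front-end database, at the cost of some staleness.

```python
import queue

mq = queue.Queue()   # stand-in for a real message queue
front_db = {}        # front-end order store (sharded in practice)
back_db = {}         # independent back-office store


def front_write(order):
    """Front-end write path: persist, then publish a change message."""
    front_db[order["oid"]] = order
    mq.put(dict(order))  # copy so the consumer never shares state


def back_sync():
    """Back-office consumer: drain pending messages into its own store."""
    applied = 0
    while not mq.empty():
        order = mq.get()
        back_db[order["oid"]] = order
        applied += 1
    return applied


def back_batch_query(min_money):
    """A heavy operational query that only ever touches back_db."""
    return sorted(o["oid"] for o in back_db.values() if o["money"] >= min_money)


front_write({"oid": 1, "buyer_uid": 666, "seller_uid": 888, "money": 50})
front_write({"oid": 2, "buyer_uid": 667, "seller_uid": 888, "money": 500})
stale = back_batch_query(100)   # nothing synced yet, so the result lags
synced = back_sync()
fresh = back_batch_query(100)
```

The gap between stale and fresh is the delay that the back office is said to tolerate.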
With the back-office access requirements solved, the problem becomes: how should the front-end database be sharded horizontally across OID, buyer_uid, and seller_uid?
Queries along multiple dimensions are complex; a complex system design can be simplified step by step.
Fourth, assuming there is no seller_uid
Suppose the order center has no query requirement on seller_uid, only on OID and buyer_uid. It then degenerates into a "1-to-many" business scenario, and for a "1-to-many" business, horizontal sharding should use the "gene method".
To recap, what is a sharding gene?
Suppose the data is sharded by buyer_uid into 16 databases and routed with buyer_uid%16. Taking the value modulo 16 means, in essence, that the last 4 bits of buyer_uid determine which database a row lands in. Those 4 bits are the sharding gene.
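The equivalence between modulo 16 and the last 4 bits can be checked directly. This is a small sketch with hypothetical function names; it relies on the shard count being a power of two.

```python
NUM_SHARDS = 16                   # must be a power of two for the bit trick to hold
GENE_BITS = 4                     # log2(16)
GENE_MASK = (1 << GENE_BITS) - 1  # 0b1111


def shard_of(buyer_uid):
    """Route by modulo, as the article describes."""
    return buyer_uid % NUM_SHARDS


def gene_of(buyer_uid):
    """The low 4 bits of the uid; equal to the modulo result."""
    return buyer_uid & GENE_MASK


gene = gene_of(666)
```

For buyer_uid=666 the low 4 bits are 1010, which is 10 in decimal, the same value as 666%16.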
And what is the sharding gene method?
When the OID of an order is generated, the sharding gene is appended to the end of the OID, so that all orders of the same buyer_uid carry the same gene and land in the same shard.
As shown in the figure, the user with buyer_uid=666 places an order:
Use buyer_uid%16 to decide which database this row will be inserted into
The sharding gene is the last 4 bits of buyer_uid, namely 1010
Before generating the order's OID, first generate 60 bits (the green part in the figure) with a distributed ID generation algorithm
Append the sharding gene as the last 4 bits of the OID (the pink part in the figure) to assemble the final 64-bit OID (the blue part in the figure)
This guarantees that all orders of the same user land in the same database and that their OIDs share the same last 4 bits, so a query by buyer_uid and a query by OID can both be routed to a single database with %16.
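The steps above can be sketched as follows. The article does not say which distributed ID algorithm produces the 60-bit part, so a local counter is used here purely as a placeholder; next_prefix, make_oid, and shard_of are hypothetical names.

```python
import itertools

GENE_BITS = 4
GENE_MASK = (1 << GENE_BITS) - 1
PREFIX_BITS = 60

_counter = itertools.count(1)  # placeholder for a distributed ID generator


def next_prefix():
    """Return a unique value truncated to 60 bits (placeholder implementation)."""
    return next(_counter) & ((1 << PREFIX_BITS) - 1)


def make_oid(buyer_uid):
    """Assemble a 64-bit OID: 60-bit unique prefix, then the buyer's 4-bit gene."""
    gene = buyer_uid & GENE_MASK
    return (next_prefix() << GENE_BITS) | gene


def shard_of(key):
    """Works for both buyer_uid and OID, because they share the low 4 bits."""
    return key % 16


oids = [make_oid(666) for _ in range(3)]
```

Every OID generated for buyer_uid=666 routes to the same database as 666 itself, so neither kind of query needs to visit more than one shard.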
Fifth, assuming there is no OID
Suppose the order center has no query requirement on OID, only on buyer_uid and seller_uid. It then degenerates into a "many-to-many" business scenario, and for a "many-to-many" business, horizontal sharding should use the "data redundancy method".
As shown below:
When an order is generated, it is sharded by buyer_uid, the sharding gene is embedded in its OID, and it is written to the Db-buyer database
The data is redundantly copied into the Db-seller database through binlog+canal in an offline, asynchronous manner
The buyer database is sharded by buyer_uid and the seller database by seller_uid; the former serves queries by OID and buyer_uid, the latter serves queries by seller_uid
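A toy model of this flow, with a deque standing in for the change stream that binlog+canal would provide and lists of dicts standing in for the two sharded clusters. The shard count and all names are assumptions; no real binlog or canal API is used.

```python
from collections import deque

NUM_SHARDS = 16  # hypothetical shard count for both clusters

db_buyer = [dict() for _ in range(NUM_SHARDS)]   # sharded by buyer_uid
db_seller = [dict() for _ in range(NUM_SHARDS)]  # sharded by seller_uid
change_log = deque()  # stand-in for the change stream read offline


def create_order(order):
    """Write path: only the buyer cluster is written synchronously."""
    db_buyer[order["buyer_uid"] % NUM_SHARDS][order["oid"]] = order
    change_log.append(dict(order))


def replicate():
    """Offline worker: replay the change log into the seller cluster."""
    while change_log:
        order = change_log.popleft()
        db_seller[order["seller_uid"] % NUM_SHARDS][order["oid"]] = order


def orders_of_seller(seller_uid):
    """Seller list query: a single shard in the seller cluster."""
    shard = db_seller[seller_uid % NUM_SHARDS]
    return sorted(o["oid"] for o in shard.values() if o["seller_uid"] == seller_uid)


create_order({"oid": 1, "buyer_uid": 666, "seller_uid": 888})
create_order({"oid": 2, "buyer_uid": 667, "seller_uid": 888})
before = orders_of_seller(888)  # replication has not run yet
replicate()
after = orders_of_seller(888)
```

The empty result before replication corresponds to the delay that merchants are said to accept.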
There are several ways to make the data redundant:
Synchronous double write by the service
Asynchronous double write by the service
Offline asynchronous double write (the figure shows offline asynchronous double write)
Whichever way is used, the two steps cannot be guaranteed to be atomic, so inconsistent data is always possible. High-throughput distributed transactions remain an unsolved problem in the industry, so the direction for architectural optimization here is not to fully guarantee consistency but to detect inconsistencies as early as possible and repair them.
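The lack of atomicity is easy to reproduce in a sketch of synchronous double write. The stores are plain dicts and the failure is injected with a flag; WriteError, write_seller, and double_write are hypothetical names.

```python
db_buyer = {}
db_seller = {}


class WriteError(Exception):
    """Raised by the hypothetical seller store when a write fails."""


def write_seller(order, fail=False):
    if fail:
        raise WriteError("seller store unavailable")
    db_seller[order["oid"]] = order


def double_write(order, seller_fails=False):
    """Synchronous double write: two independent steps, no shared transaction."""
    db_buyer[order["oid"]] = order          # step 1 succeeds
    write_seller(order, fail=seller_fails)  # step 2 may fail after step 1 is done


double_write({"oid": 1, "buyer_uid": 666, "seller_uid": 888})
try:
    double_write({"oid": 2, "buyer_uid": 666, "seller_uid": 888}, seller_fails=True)
except WriteError:
    pass  # the caller sees an error, but step 1 is already persisted

missing = sorted(set(db_buyer) - set(db_seller))
```

Order 2 now exists only on the buyer side, which is exactly the kind of inconsistency that has to be detected and repaired afterwards.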
Eventual consistency is the common practice for consistency in high-throughput Internet businesses. There are three ways to ensure eventual consistency of the data:
Periodic full scan of the redundant data
Incremental log scan of the redundant data
Real-time online detection of the redundant data via messages
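The first of these, the periodic full scan, might look roughly like this. The sketch assumes the buyer-side data is the source of truth and that both sides fit in dicts keyed by OID; full_scan_repair is a hypothetical name, and a real job would scan shard by shard on a schedule.

```python
def full_scan_repair(source, replica):
    """Compare every source row with the replica and repair differences.

    Returns the sorted list of OIDs that were missing or different.
    """
    repaired = []
    for oid, row in source.items():
        if replica.get(oid) != row:
            replica[oid] = dict(row)  # overwrite with the source version
            repaired.append(oid)
    return sorted(repaired)


db_buyer = {
    1: {"oid": 1, "money": 10},
    2: {"oid": 2, "money": 20},
    3: {"oid": 3, "money": 30},
}
db_seller = {
    1: {"oid": 1, "money": 10},
    2: {"oid": 2, "money": 99},  # diverged row; order 3 is missing entirely
}

fixed = full_scan_repair(db_buyer, db_seller)
second_pass = full_scan_repair(db_buyer, db_seller)
```

The incremental log scan and the message-based check narrow the same comparison to recently changed rows, which shortens the time an inconsistency stays undetected.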
These schemes are analyzed in detail in the article on horizontal sharding for "many-to-many" businesses and are not repeated here.
Sixth, OID, buyer_uid, and seller_uid all present
From the analysis above:
If there is no seller_uid, the "multi-key" business degenerates into a "1-to-many" business, and the "gene method" should be used for sharding: shard by buyer_uid and embed the sharding gene in the OID
If there is no OID, the "multi-key" business degenerates into a "many-to-many" business, and the "data redundancy method" should be used for sharding: shard separately by buyer_uid and by seller_uid, keeping redundant copies of the data to serve queries on the different attributes
If OID, buyer_uid, and seller_uid all exist at the same time, the two schemes above can be combined to solve horizontal sharding for the "multi-key" business.
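One possible reading of the combined scheme, sketched under assumptions: the buyer cluster is sharded by buyer_uid with the gene embedded in the OID, and a redundant seller cluster is sharded by seller_uid. All names are hypothetical, the sequence number is supplied by the caller, and the redundant copy is written inline here only to keep the example short.

```python
NUM_SHARDS = 16
GENE_MASK = NUM_SHARDS - 1

db_buyer = [dict() for _ in range(NUM_SHARDS)]   # sharded by buyer_uid; OIDs carry the gene
db_seller = [dict() for _ in range(NUM_SHARDS)]  # redundant copy, sharded by seller_uid


def make_oid(seq, buyer_uid):
    """Hypothetical OID: a sequence number followed by the buyer's 4-bit gene."""
    return (seq << 4) | (buyer_uid & GENE_MASK)


def save(order):
    db_buyer[order["buyer_uid"] % NUM_SHARDS][order["oid"]] = order
    # The article makes this copy asynchronously; it is inlined here for brevity.
    db_seller[order["seller_uid"] % NUM_SHARDS][order["oid"]] = order


def by_oid(oid):
    """OID lookup: the gene in the low 4 bits selects the buyer shard."""
    return db_buyer[oid % NUM_SHARDS].get(oid)


def by_buyer(buyer_uid):
    shard = db_buyer[buyer_uid % NUM_SHARDS]
    return sorted(o["oid"] for o in shard.values() if o["buyer_uid"] == buyer_uid)


def by_seller(seller_uid):
    shard = db_seller[seller_uid % NUM_SHARDS]
    return sorted(o["oid"] for o in shard.values() if o["seller_uid"] == seller_uid)


for seq, (buyer, seller) in enumerate([(666, 888), (666, 999), (777, 888)], start=1):
    save({"oid": make_oid(seq, buyer), "buyer_uid": buyer, "seller_uid": seller})
```

Each of the three query types is answered from exactly one shard, which is the property the combination is meant to deliver.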
Seventh, summary
Solving any complex problem is a process of simplifying it and breaking it down step by step.
For a complex "multi-key" business like the order center, when the data volume is large and the database needs horizontal sharding, the back-office requirements are handled with the "front-end/back-office separation" architecture:
Decouple the web/service/db of the front-end and back-office systems, so that inefficient back-office queries do not cause jitter in front-end queries
Keep redundant data for the front end and the back office, so that each side's requirements are met separately
Use an "external index" (such as an ES search system) or "big data processing" (such as Hive) to meet the back office's unusual query requirements
For the front-end requirements, simplify: break the "multi-key" business down into a "1-to-many" business and a "many-to-many" business and solve each:
Use the "gene method" for "1-to-many" sharding: shard by buyer_uid and embed the sharding gene in the OID, which serves queries by OID and by buyer_uid
Use the "data redundancy method" for "many-to-many" sharding: shard separately by buyer_uid and by seller_uid with redundant data, which serves queries by buyer_uid and by seller_uid
If OID, buyer_uid, and seller_uid all exist at the same time, the two schemes above can be combined to solve horizontal sharding for the "multi-key" business.
Data redundancy brings consistency problems. For high-throughput Internet businesses it is hard to fully guarantee transactional consistency, so the common practice is eventual consistency.
Any architecture design divorced from the business is irresponsible. Let us keep this in mind together.
Multi-key business: database horizontal sharding architecture, solved in one go.