Multi-key business: getting database horizontal sharding architecture right in one go


Reposted from: original article, 2017-08-29, Shen Jian (58.com), "Architect's Road"

Horizontal sharding of databases is a very interesting topic: different types of business call for different ways of splitting the database horizontally.

Taking an "order center" as an example, this article describes the architectural practice of horizontally sharding the database for a "multi-key" business once growing data volume has significantly degraded database performance.

I. What is a "multi-key" business?

A so-called "multi-key" business is one where a single piece of metadata has several attributes that the online front office needs to query by.

Order Center Business Analysis

The order center is a very common "multi-key" business. It mainly provides services for querying and modifying orders, and its core metadata is:

Order(oid, buyer_uid, seller_uid, time, money, detail, ...);

where:

    • oid is the order ID and the primary key

    • buyer_uid is the buyer's UID

    • seller_uid is the seller's UID

    • time, money, detail, ... are other order attributes

In the early days of a business, a single database with a single table can usually handle these requirements. The typical architecture is:

    • order-center: the order center service, which provides a friendly RPC interface to callers

    • order-db: the database that stores the orders
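As a rough sketch of why a single table is enough at this stage, here is the order table in an in-memory SQLite database with secondary indexes on buyer_uid and seller_uid, so all three query keys are served by one table. SQLite, the table name `orders` (since `order` is an SQL keyword), the index names, and the sample rows are assumptions for illustration only.

```python
import sqlite3

# Sketch of the early single-database, single-table setup.
# Assumption: SQLite stands in for the real order-db; sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    oid        INTEGER PRIMARY KEY,
    buyer_uid  INTEGER NOT NULL,
    seller_uid INTEGER NOT NULL,
    time       INTEGER NOT NULL,
    money      INTEGER NOT NULL,
    detail     TEXT
);
-- Secondary indexes let one table serve all three query keys.
CREATE INDEX idx_orders_buyer  ON orders (buyer_uid, time);
CREATE INDEX idx_orders_seller ON orders (seller_uid, time);
""")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 666, 42, 100, 990, "book"),
     (2, 666, 43, 200, 1500, "pen"),
     (3, 777, 42, 300, 200, "cup")],
)

def get_order(oid):
    """Order entity query by primary key."""
    return conn.execute("SELECT * FROM orders WHERE oid = ?", (oid,)).fetchone()

def list_buyer_orders(buyer_uid, limit=20, offset=0):
    """One page of a buyer's order ids, newest first."""
    rows = conn.execute(
        "SELECT oid FROM orders WHERE buyer_uid = ? ORDER BY time DESC LIMIT ? OFFSET ?",
        (buyer_uid, limit, offset))
    return [r[0] for r in rows]

def list_seller_orders(seller_uid, limit=20, offset=0):
    """One page of a merchant's order ids, newest first."""
    rows = conn.execute(
        "SELECT oid FROM orders WHERE seller_uid = ? ORDER BY time DESC LIMIT ? OFFSET ?",
        (seller_uid, limit, offset))
    return [r[0] for r in rows]

print(list_buyer_orders(666))   # [2, 1]
print(list_seller_orders(42))   # [3, 1]
```

Once the table is split across many databases, each of these indexes only helps within one shard, which is where the trouble starts.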

As order volume grows, the database has to be split horizontally. Because queries are needed on more than one key, choosing which field to shard by becomes the key technical problem:

    • If you shard by oid, queries on buyer_uid and seller_uid must traverse multiple databases

    • If you shard by buyer_uid or seller_uid, queries on the other attributes must traverse multiple databases

In short, it is hard to find one strategy that covers everything. Before settling on a technical solution, first sort out the query requirements.
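The fan-out problem can be sketched with a toy router: only a query on the shard key itself maps to a single shard, and any other attribute forces a scan of every shard. The 16-shard count, the `route` helper, and the sample orders are assumptions for illustration.

```python
SHARD_COUNT = 16  # assumption: 16 shards, as in the example later in the article

def route(shard_key, query_key, value):
    """Return the shard indexes a query must visit.

    Only a query on the shard key can be routed to one shard; any other
    attribute gives the router no information, so it fans out to all shards.
    """
    if query_key == shard_key:
        return [value % SHARD_COUNT]
    return list(range(SHARD_COUNT))

# Hypothetical data: buyer 666 places 20 orders with consecutive oids.
orders = [{"oid": 1000 + i, "buyer_uid": 666, "seller_uid": 42} for i in range(20)]

# Sharded by oid, one buyer's orders end up scattered over many shards...
spread = {o["oid"] % SHARD_COUNT for o in orders}
print(len(spread))                           # 16
# ...so a buyer_uid query has to visit every shard,
print(len(route("oid", "buyer_uid", 666)))   # 16
# while an oid lookup touches exactly one.
print(route("oid", "oid", 1005))             # [13]
```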

II. Analysis of query requirements on order center attributes

Before discussing the architecture, let's briefly analyze the business to see which attributes need to be queried.

Front-office access has three most typical kinds of requirements:

    • Order entity query: look up an order entity by oid; 90% of traffic is of this kind

    • User order list query: page through a user's historical orders by buyer_uid; 9% of traffic is of this kind

    • Merchant order list query: page through a merchant's historical orders by seller_uid; 1% of traffic is of this kind

Front-office access characteristics: high throughput and high service availability are required. Users expect a highly consistent view of their own orders, while merchants have lower consistency requirements and can tolerate some delay.

Back-office access patterns vary with product and operations needs:

    • Queries by time, structure, product, detail, and so on

Back-office access characteristics: operations queries are mostly bulk paged queries. Since this is an internal system, traffic is very low, availability requirements are modest, and consistency requirements are looser; query latency of seconds or even tens of seconds is acceptable.

What architecture should be used to serve these two quite different sets of requirements?

III. Separating the front-office and back-office architectures

If front-office and back-office traffic share the same services and the same database, a handful of inefficient bulk queries from the back office can occasionally push the database CPU to 100% and disrupt normal front-office user access (for example, causing order queries to time out).

Front-office and back-office access have different query requirements and place different demands on the system, so they should be decoupled with a "front-office/back-office separation" architecture.

The front-office architecture stays unchanged: site access, service tiering, and horizontal database sharding.

Back-office requirements are extracted into an independent web/service/db stack, decoupling the two systems. For a back office that is "business-complex", "low-concurrency", "does not need high availability", and "can accept some delay":

    • The service layer can be dropped; the back-office web layer can access the data layer directly through a DAO

    • No reverse proxy and no cluster redundancy are needed

    • Data can be synchronized via MQ or offline, sacrificing some real-time freshness

    • Since higher latency is acceptable, an "external index" (e.g. ES) or "Hive" design suited to large data volumes can be used

With the back-office requirements handled, the problem becomes: how should the front-office database be sharded horizontally for oid, buyer_uid, and seller_uid?

Multi-dimensional queries are complex, and a complex system design can be simplified step by step.

IV. Assuming there is no seller_uid

For the order center, assume there are no query requirements on seller_uid, only on oid and buyer_uid. The problem then degenerates into a "1-to-many" business scenario, and for a "1-to-many" business, horizontal sharding should use the "gene method".

First, what is a shard gene?

Suppose we shard by buyer_uid into 16 databases and route with buyer_uid % 16. Taking the value modulo 16 means, in essence, that the last 4 bits of buyer_uid determine which database a row lands in. Those 4 bits are the shard gene.

Next, what is the gene method?

When an order's oid is generated, the shard gene is appended to the end of the oid, so that all orders of the same buyer_uid carry the same gene and land in the same shard.
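The equivalence between "modulo 16" and "last 4 bits" can be checked directly. This is a small sketch; the constant names are illustrative, and the identity only holds because 16 is a power of two (for non-negative ids).

```python
SHARD_BITS = 4                   # 16 shards = 2**4
SHARD_COUNT = 1 << SHARD_BITS
GENE_MASK = SHARD_COUNT - 1      # 0b1111

def shard_gene(buyer_uid):
    """The shard gene: the low SHARD_BITS bits of buyer_uid."""
    return buyer_uid & GENE_MASK

buyer_uid = 666
print(bin(buyer_uid))                        # 0b1010011010
print(format(shard_gene(buyer_uid), "04b"))  # 1010
print(buyer_uid % SHARD_COUNT)               # 10

# For a power-of-two shard count, modulo and masking agree on every non-negative id.
assert all(u % SHARD_COUNT == shard_gene(u) for u in range(10_000))
```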

As shown in the figure, when the user with buyer_uid=666 places an order:

    • buyer_uid % 16 determines which database the row is inserted into

    • The shard gene is the last 4 bits of buyer_uid, i.e. 1010

    • When generating the order oid, the leading 60 bits (green in the figure) are first produced by a distributed ID generation algorithm

    • The shard gene is then appended as the last 4 bits (pink in the figure) to assemble the final 64-bit order oid (blue in the figure)

This guarantees that all orders of the same user fall in the same database and that their oids share the same last 4 bits, so:

    • The database can be located via buyer_uid % 16

    • The database can also be located via oid % 16
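A minimal sketch of the gene method, assuming a local counter as a stand-in for the distributed ID generator (a real system would use a snowflake-style service or similar; that part is an assumption, not the article's implementation). The function names are hypothetical.

```python
import itertools

SHARD_BITS = 4
SHARD_COUNT = 1 << SHARD_BITS
GENE_MASK = SHARD_COUNT - 1
ID_BITS = 60  # width of the unique part produced by the ID generator

# Assumption: a local counter stands in for a distributed ID generator.
_seq = itertools.count(1)

def next_base_id():
    base = next(_seq)
    if base >= 1 << ID_BITS:
        raise OverflowError("60-bit ID space exhausted")
    return base

def new_oid(buyer_uid):
    """64-bit oid = 60-bit unique id followed by the buyer's 4-bit shard gene."""
    return (next_base_id() << SHARD_BITS) | (buyer_uid & GENE_MASK)

def shard_for_buyer(buyer_uid):
    return buyer_uid % SHARD_COUNT

def shard_for_oid(oid):
    return oid % SHARD_COUNT

oids = [new_oid(666) for _ in range(3)]
print([format(o & GENE_MASK, "04b") for o in oids])  # ['1010', '1010', '1010']
# Both routing paths agree, so list queries and entity queries hit the same shard.
assert all(shard_for_oid(o) == shard_for_buyer(666) for o in oids)
```

Because the low 4 bits are fixed by the buyer, uniqueness comes entirely from the 60-bit part.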

V. Assuming there is no oid

For the order center, assume there are no query requirements on oid, only on buyer_uid and seller_uid. The problem then degenerates into a "many-to-many" business scenario, and for a "many-to-many" business, horizontal sharding should use the "data redundancy method".

As shown in the figure:

    • When an order is created, it is written into the db-buyer database, sharded by buyer_uid

    • The data is then copied redundantly into the db-seller database, asynchronously and offline, via binlog + canal

    • The buyer database is sharded by buyer_uid and the seller database by seller_uid; the former serves oid and buyer_uid queries, the latter serves seller_uid queries
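The flow above can be sketched with in-memory stand-ins: a list plays the role of the buyer database's binlog, and `replay_binlog` plays the role of a canal-style consumer that copies rows into the seller shards. None of this uses canal's real API; every name here is hypothetical.

```python
from collections import defaultdict

SHARD_COUNT = 16  # assumed shard count for both clusters

# Each "database" maps shard index -> {oid: order}. In-memory stand-ins only.
buyer_db = defaultdict(dict)
seller_db = defaultdict(dict)
binlog = []  # stand-in for the buyer DB's binlog that canal would tail

def create_order(oid, buyer_uid, seller_uid, money):
    """Step 1: the service writes only to db-buyer, routed by buyer_uid."""
    order = {"oid": oid, "buyer_uid": buyer_uid, "seller_uid": seller_uid, "money": money}
    buyer_db[buyer_uid % SHARD_COUNT][oid] = order
    binlog.append(("insert", dict(order)))
    return order

def replay_binlog(offset):
    """Step 2 (asynchronous): copy new log entries into db-seller, routed by
    seller_uid. Returns the new offset so the consumer can resume."""
    for op, row in binlog[offset:]:
        if op == "insert":
            seller_db[row["seller_uid"] % SHARD_COUNT][row["oid"]] = row
    return len(binlog)

def orders_of_buyer(buyer_uid):
    shard = buyer_db[buyer_uid % SHARD_COUNT]
    return sorted(o["oid"] for o in shard.values() if o["buyer_uid"] == buyer_uid)

def orders_of_seller(seller_uid):
    shard = seller_db[seller_uid % SHARD_COUNT]
    return sorted(o["oid"] for o in shard.values() if o["seller_uid"] == seller_uid)

create_order(1, buyer_uid=666, seller_uid=42, money=990)
create_order(2, buyer_uid=666, seller_uid=43, money=1500)
print(orders_of_seller(42))  # [] : not replicated yet
offset = replay_binlog(0)
print(orders_of_seller(42))  # [1]
print(orders_of_buyer(666))  # [1, 2]
```

The window between the two steps is exactly where the seller side can lag or diverge, which is why merchants are the side that tolerates delay.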

There are several ways to make data redundant:

    • Synchronous double write by the service

    • Asynchronous double write by the service

    • Offline asynchronous double write (the approach shown in the figure)

Whichever method is used, the two steps cannot be made atomic, so data inconsistency is always possible. High-throughput distributed transactions remain an unsolved problem in the industry, so the optimization goal here is not to fully guarantee consistency, but to detect inconsistencies early and repair them.

Eventual consistency is the common approach for high-throughput Internet businesses. There are three ways to ensure eventual consistency:

    • Periodic full scan of the redundant data

    • Incremental log scan of the redundant data

    • Real-time online message detection on the redundant data

These approaches are analyzed in detail in the article on horizontal sharding of "many-to-many" businesses and are not repeated here.
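As a sketch of the first option, a periodic full scan compares the buyer-side copy (assumed here to be the source of truth) with the seller-side copy and repairs differences. The function names and data are hypothetical, and both copies are assumed to be already merged across shards into `oid -> row` maps.

```python
def reconcile(source, replica):
    """Compare the authoritative copy with the redundant copy (both: oid -> row).

    Returns sorted oid lists: missing from, stale in, and orphaned in the replica.
    """
    missing = sorted(oid for oid in source if oid not in replica)
    stale = sorted(oid for oid in source if oid in replica and replica[oid] != source[oid])
    orphan = sorted(oid for oid in replica if oid not in source)
    return missing, stale, orphan

def repair(source, replica):
    """Make the replica match the source; returns the number of rows fixed."""
    missing, stale, orphan = reconcile(source, replica)
    for oid in missing + stale:
        replica[oid] = dict(source[oid])
    for oid in orphan:
        del replica[oid]
    return len(missing) + len(stale) + len(orphan)

buyer_side = {1: {"seller_uid": 42, "money": 990},
              2: {"seller_uid": 43, "money": 1500},
              3: {"seller_uid": 42, "money": 200}}
seller_side = {1: {"seller_uid": 42, "money": 990},
               2: {"seller_uid": 43, "money": 1},   # stale copy
               4: {"seller_uid": 42, "money": 50}}  # orphan
print(reconcile(buyer_side, seller_side))  # ([3], [2], [4])
print(repair(buyer_side, seller_side))     # 3
print(seller_side == buyer_side)           # True
```

In a live system a row flagged as missing may simply not have been replicated yet, so a real scanner would likely re-check suspects after a delay before repairing them.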

VI. oid, buyer_uid, and seller_uid all present

From the analysis above:

    • Without seller_uid, the "multi-key" business degenerates into a "1-to-many" business, and the "gene method" should be used: shard by buyer_uid and add the shard gene to the oid

    • Without oid, the "multi-key" business degenerates into a "many-to-many" business, and the "data redundancy method" should be used: shard by buyer_uid and by seller_uid with redundant data, so that queries on each attribute are served

    • When oid, buyer_uid, and seller_uid are all present, combining the two schemes above solves horizontal sharding for the "multi-key" business
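A sketch of how the combined scheme might route the three front-office queries: db-buyer is sharded by buyer_uid with the gene embedded in oid, and db-seller is the redundant copy sharded by seller_uid. The cluster labels follow the article's db-buyer/db-seller naming; the `route` function and 16-shard count are assumptions.

```python
SHARD_BITS = 4
SHARD_COUNT = 1 << SHARD_BITS
GENE_MASK = SHARD_COUNT - 1

def route(query_key, value):
    """Return (cluster, shard) for a front-office query under the combined scheme."""
    if query_key == "oid":
        # The low 4 bits of oid are the buyer's shard gene.
        return ("db-buyer", value & GENE_MASK)
    if query_key == "buyer_uid":
        return ("db-buyer", value % SHARD_COUNT)
    if query_key == "seller_uid":
        # Served from the redundant copy, which may lag slightly.
        return ("db-seller", value % SHARD_COUNT)
    raise ValueError(f"unsupported front-office query key: {query_key}")

oid = (12345 << SHARD_BITS) | (666 & GENE_MASK)  # an oid created for buyer 666
print(route("oid", oid))         # ('db-buyer', 10)
print(route("buyer_uid", 666))   # ('db-buyer', 10)
print(route("seller_uid", 43))   # ('db-seller', 11)
```

Every front-office query lands on exactly one shard, and only the 1% merchant traffic reads from the eventually consistent copy.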

VII. Summary

Solving any complex problem is a process of simplification and step-by-step decomposition.

For a complex "multi-key" business like the order center, when data volume grows large and the database must be sharded horizontally, back-office requirements are handled with a "front-office/back-office separation" architecture:

    • Decouple the front-office and back-office web/service/db, so that inefficient back-office queries do not cause front-office query jitter

    • Use data redundancy between the front office and back office to serve each side's needs

    • Use an "external index" (e.g. an ES search system) or "big data processing" (e.g. Hive) to satisfy the back office's complex, ad hoc queries

For front-office requirements, simplify the design by decomposing the "multi-key" business into "1-to-many" and "many-to-many" categories:

    • Use the "gene method" for "1-to-many" sharding: shard by buyer_uid and add the shard gene to the oid, serving queries on both oid and buyer_uid

    • Use the "data redundancy method" for "many-to-many" sharding: shard by buyer_uid and by seller_uid with redundant data, serving queries on both buyer_uid and seller_uid

    • When oid, buyer_uid, and seller_uid are all present, combine the two schemes above to shard the "multi-key" business horizontally

Data redundancy introduces consistency issues. For high-throughput Internet businesses it is hard to fully guarantee transactional consistency, so the common practice is eventual consistency.

Any architecture design divorced from the business is irresponsible. Let us all keep that in mind.
