001---MySQL sub-database table

Last Update:2018-04-20 Source: Internet

Author: User

Tags message queue

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

MySQL Sub-Library sub-table first, the whole way of segmentation

1, Sub -Database sub-table: that is, the segmentation of data is through a specific condition, we stored in the same database of data scattered into multiple databases (host), in order to achieve the effect of decentralized single device load

2, the data segmentation according to its segmentation rules of the type, can be divided into the following two kinds of segmentation mode

"1" Vertical (Portrait) slicing : Splitting a single table into multiple tables/separating unrelated tables into different databases (hosts).

such as: User table, commodity SKU table, Transaction pay table, according to different business segmentation, the table is sliced into different databases.

Advantages:

(1). Clear business after splitting, clear split rules

(2). Integration or expansion between systems is easy

(3). It is easy to manage by placing the table on different machines according to cost, grade of application, type of application, etc.

(4). Design pattern of database table for the separation of static and dynamic separation and hot and cold separating

Hot and cold separation: Cold data (more queries, less variation), hot data/Active data (many variations). Separation of hot and cold data is separated from hot and cold.

(5). Simple Data Maintenance

Disadvantages:

(1). Some business tables can not be associated (Join), can only be resolved through the interface, improve the complexity of the system

(2). A single-library performance bottleneck that is not easy to scale and improve performance by each business's different limits

(3). Transaction processing Complexity

Factors to consider when designing a database table structure:

(1). Cold data is suitable for MyISAM storage engine (query fast), hot data suitable for INNODB storage engine (update fast).

(2). Master-Slave Separation: The main library is used for write operations, from the library (configurable more from the library) for query operations, to share the request pressure.

(3). For active/Hot data, memcache, Redis and other caches can be used to update the DB when a certain amount is accumulated.

2 horizontal (horizontal) Slicing: splits data in the same table into multiple databases (hosts) according to a condition based on the logical relationships of the data in the table.

Such as: the user table can be horizontally divided according to the ID User 1 table & User 2 table (table structure consistent), that is, user table data = User 1 table data + user 2 table data

Advantages:

(1). The data of Library single table is kept at a certain level, which helps to improve the performance.

(2). The segmented table has the same structure, and the application layer is less reformed, only need to increase the routing rules.

(3). Improve the stability and load capacity of the system.

Disadvantages:

(1). After slicing, the data is scattered, it is difficult to take advantage of the join operation of the database, cross-Library join performance is poor

(2). Splitting rules are difficult to abstract.

(3). The consistency of the Shard transaction is difficult to resolve.

(4). The difficulty and maintenance of data expansion is great.

above, horizontal/vertical segmentation in common:

There is a problem with distributed transactions.
There is a cross-node join problem.
There is a problem of sorting, paging across the node merge.
There are multiple data source management issues.

two, the route process and the Shard dimension of the horizontal slicing way

1. Horizontal slicing of routing processes

When designing a table, we need to determine what rules the table will use to divide the database into tables.

For example, when a new user is available, the program must determine

(1). Which table the user's information is added to;

(2). At the time of login we need to find the corresponding record in the database through the user's account.

all of this requires routing requests according to a rule, because the data required for the request is distributed across different shard tables .

Above: For input requests, the process of finding the corresponding tables and libraries through the Sub-Library table rules is called routing .

For example:

The rule of the sub-list is user_id% 4, when the user registers a new account, assuming that the user ID is 123, we can use 123 4 = 3 to determine that the account should be saved in the User3 table.

When a user with ID 123 is logged in, we can determine that it is recorded in User3 after 123 4 = 3 calculation.

2. Split-level segmented dimensions (refer to the Myca provided by the tile)

"1" follows a hash slice

Hash Shard: A field of data is hashed, divided by the total number of shards after the modulo, modulo the same data as a shard.

Hash shards are often used in situations where data is not time-sensitive, such as when all data is generated at any moment, and needs to be processed or queried.

Advantages: Data slicing is more uniform, the effect of data pressure dispersion is better.

Cons: When data is dispersed, aggregation is required for query requirements.

"2" by Time slice

Time Shard: Distributes data to different shards by time range. (This slicing method is suitable for data with obvious time characteristics.)

third, post-shard transaction processing mechanism

(i), distributed transactions

Because we slice the data of a single table and store it in multiple databases and even multiple DB instances, the transaction mechanism that relies on the database itself does not meet the needs of all scenarios.

However, we recommend that operations in a DB instance use local transactions to ensure consistency whenever possible, and a series of update operations across DB instances need to be done in different data sources based on transactional routing, and update operations between data sources need to be handled through distributed transactions .

There are three main types of distributed transaction solutions: Two-phase commit protocol, maximum effort guarantee mode and transaction compensation mechanism.

1. Two phase Submission Agreement

The two-phase commit protocol divides a distributed transaction into two phases, one for the preparation phase, one for the commit phase, and two for the transaction manager.

Based on the two-phase commit protocol, the transaction manager is able to maximize the atomicity of transactions across database operations and is the most stringent transaction implementation method in a distributed system environment.

The AppServer (for example: Websphere, Weblogic, JBoss, etc.) of the Java EE specification implement a two-phase commit protocol for relational database data sources and message queues, which can only be configured when used

However, the two-phase commit protocol also poses a performance problem and is difficult to scale horizontally because during the commit transaction, the transaction manager needs to coordinate with each participant in preparing and submitting the action, locking the resource during the preparation phase, and consuming resources during the commit phase.

However, due to the large number of participants, the time difference between the lock resource and the consuming resource is elongated, resulting in a slower response, which is more likely to result in deadlocks or uncertain results during this period. Therefore, in the Internet industry, in order to pursue performance improvement, rarely use two-phase commit protocol.

In addition, since the two-phase commit protocol is a blocking protocol and cannot respond quickly to the requester in extreme cases, a three-phase commit protocol is proposed to resolve the blocking problem of the two-phase commit protocol, but it is still necessary for the transaction manager to coordinate between the participants to complete a distributed transaction.

2. maximum effort guarantee mode

This is a very common pattern of guaranteed distributed consistency, which many developers have been using, but not aware of as a pattern. The best effort guarantee mode is for scenarios where conformance requirements are not very stringent but performance requirements are high.

This is accomplished by delaying the submission of multiple resources to the last minute when updating multiple resources, so that if there is a problem with the orchestration, all resource updates can be rolled back and the transaction remains consistent.

The only possible problem is a system problem when committing multiple resources, such as a network problem, but this is very rare, and in the event of this, real-time compensation is needed to roll back the committed transaction, which is somewhat similar to what we often call the TCC pattern.

The following is a sample of the maximum effort assurance pattern, which involves two operations, one for consuming messages from Message Queuing, and one for updating the database, which requires guaranteed distributed consistency.

(1) Start the message transaction.

(2) Start the database transaction.

(3) receive the message.

(4) Update the database.

(5) Commit the database transaction.

(6) Submit a message transaction.

At this point, from the 1th to the 4th step is not very critical, the key is the 5th and 6th steps, you need to put it together at the end of the submission, do our best to ensure the consistency of the previous business processing.

In steps 5th and 6th, when the business logic is processed, only system errors can occur, and if the 5th step fails, the message queue and the database transaction are all rolled back, consistent. If the 5th step succeeds, and the 6th step encounters a network timeout, this is the only case where the problem is likely to occur.

In this case, the consuming process of the message is not committed to the message queue, and Message Queuing may resend the message to the other message processing service, which causes the message to be consumed repeatedly, but can be processed by idempotent to guarantee the effect of eliminating duplicate messages.

Of course, when using this pattern, we have to take into account the order in which each resource is submitted. One of the anti-patterns we encounter in production practice is to nest remote calls in database transactions, and remote invocation is a time-consuming task, resulting in database transactions being stretched and finally dragging down the database.

Therefore, the above case involves a nested database transaction of the message transaction, which must be adequately evaluated and designed in order to evade the transaction risk.

3. Transaction compensation mechanism

Obviously, in a scenario with high performance requirements, a two-phase commit protocol is not a good solution, and the best effort to ensure that the pattern also causes multiple distributed operations to be nested with each other may affect each other. Here, we give the transaction compensation mechanism, which has high performance and can guarantee the final consistency of the transaction to the maximum extent possible.

After the database is divided into tables, if the multiple update operations involved in a database-wide completion, you can use local transactions within the database to ensure consistency, for multiple operations across the library, you can compensate and retry the operation in a certain time window, so that the final consistency of the transaction can be achieved, The traditional idea of breaking through a business and rolling back to a problem.

If the use of transaction compensation mechanism, in the event of a problem, we need to record the environment, information, steps, status, etc., followed by the retry mechanism to achieve final consistency, the details can refer to the "Distributed Service Architecture: Principles, Design and Combat" 2nd chapter, thoroughly understand the acid principle, cap theory, Base principle, the final consistency mode and so on.

This article comes from:

Https://www.toutiao.com/a6545626478447428103/?tt_from=weixin&utm_campaign=client_share&article_ Category=stock&timestamp=1524215325&app=news_article&utm_source=weixin&iid=28070358035&utm _medium=toutiao_android&wxshare_count=1

001---MySQL sub-database table

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More