The evolution of database architecture and the sub-list of library

Source: Internet
Author: User

As the volume of business increases in the production environment, the amount of database data can also increase dramatically. When the amount of data in the database reaches a certain level (database bottleneck), the host load of the database is very high, which can seriously affect the business, and seriously cause the database to be down.
In order to avoid this extreme situation, we should make a plan before the occurrence, to solve the problem of overloading the database data volume.
Here are the solutions that I use in my personal work:
1) Database master-slave or multi-master multi-Slave scheme
2) database splitting of hot and cold data
3) Database sub-Library sub-table operation
4) Add cache Redis or memcached to the front of the database

At the outset, some of the company's business was structured as follows (all single-node cases)

After emphasizing the serious shortcomings of the architecture: single node, serious service interruption, and so on, after the evolution of the following diagram of the architecture:

In the above architecture, we have updated our code to implement non-disruptive services while addressing some of the single node issues. Later a period of time business to do promotional activities, business volume explosion, database data volume into exponential growth. The database load has been high. The current architecture has been unable to meet the business requirements, and then put forward to the database data on the hot and cold data splitting and data read and write separation (you can consider using middleware mycat, Atlas, Maxscale, you can also consider the development of write routing rules, some framework comes with routing rules), here we use Mycat , such as the schema, which shows only the backend:

Over time, library single table has reached the limit of business endurance, which has been unable to satisfy the business in hardware or MySQL parameter tuning. At this time, the database of the Sub-Library table. Here is the main explanation of the database sub-list
There are two different options in the sub-list of the library:
1) Vertical Split
2) Horizontal Split
1, Vertical split
The so-called vertical split, is to split a single database into multiple databases, you can consider the scenario:
1) split according to business logic
2) split according to Hot and cold data
Here we first explain the split based on business logic:
Here take goods--orders--for example, when a variety of goods stored in a database and a table, over time, the amount of data increased, single-table query ability to decline, can be divided according to commodity categories, such as the company's products: High school, middle school, primary school, separated by a single table into three separate tables. If still large, continue to split, high school split into higher, sophomore, senior three. Junior high school split into the primary, second day, third day. Primary school split into small one, small two ... Little Six. (All the sub-Libraries are derived from the logic of life).
We split the data in our production environment, cold data (with the MyISAM engine), hot data (with the InnoDB engine), and the host of the MyISAM engine's database is cached more by Redis or memcache. The host of the InnoDB engine's database allocates compute-and-memory physical machines for business situations, or hybrid, and can consider using Redis for partial storage to mitigate database stress.
Advantages and disadvantages of vertical splitting:
Advantages:
1) by single-Library single-table split into multi-Library multi-table, reducing the database additions and deletions to check the pressure
2) to split the hot and cold data, reduce the cost and reasonable use of hardware
3) for a vertically split database, you should consider the extensibility of the database schema when designing the database (otherwise it will cause great resistance to later expansions)
4) The business logic is clearer and more convenient to operation and management after the business sub-Library.
Disadvantages:
1) for the Joint table query, it brings inconvenience, can invoke the way of the interface to trigger the operation of the table query, the overall system, the complexity of the increase.
2) for some databases, the existence of a single-library performance bottleneck affects the entire business situation.
3) At the same time, the complexity of the transaction is increased.
2. Horizontal Split
The so-called horizontal split, in a simple way, is to split the data in a table into multiple copies of the tables stored in different databases. For example, the user table has 9,000 data, the user table is split into User1, User2, user3 three tables stored in different databases, User1 storage of the first 3,000 data, the middle 3,000 data stored in the User2 table, the last 3,000 data into the User3 table.
Partitioned dimensions that are split horizontally (there are many algorithms to decide which horizontal split scheme to use)
1) According to the hash slice, a field is hashed, then divided by the number of shards, and finally modulo, modulo the same data as a shard, this is the hash shard.
This fragmentation is not timely. The disadvantage of data dispersion is that it needs to be aggregated if queries are needed.
2) According to the time Series algorithm slicing, some business will have obvious quarterly fluctuations, can use time algorithm. This algorithm is the uneven distribution of data.
In a production environment, some of our business uses a hybrid of hashing and time-series algorithms
When it comes to horizontal splitting, you have to mention a horizontal split routing rule.
When designing a database, it is necessary to take into account the routing rules of various tables in the database, and also to consider what routing rules the data table will follow in the future.
For example, how does a new user register, and how is this user assigned to which table? Usually the system will automatically assign a UID when registering. According to this UID can be allocated according to an algorithm. For example Uid%4, (here a table is divided into 4 tables), if 1 is divided into the first table, and so on.

Advantages and disadvantages of horizontal splitting:
1) Library single table the largest amount of data, there is a limit on the amount of data, we can allocate according to the business situation, just reach the maximum (this amount is difficult to grasp), both to improve the performance of the table and save Resources
2) because the structure of the table changes very little, There is very little change in code for development, only the routing rules need to be added.
3) The overall system stability and load are greatly improved
Disadvantage:
1) jion operation is very difficult, especially cross-Library query
2) Some of the split rules are difficult to abstract out
3) Shard Transaction consistency is difficult to resolve
4) And the database is more difficult to expand and maintain
for the above vertical and horizontal split, there are the following disadvantages:
1) jion operation difficulty
2) transactional consistency difficulty
3) management of multiple data sources becomes complex.
How do I resolve transactional consistency issues? (This importance is in the first place)
1) The consistency of the local transaction (can be used)
2) Distributed transaction processing
Distributed Transaction Processing solution:
1) Two-phase commit scenario (the most rigorous scenario, rarely used, Because the blocking protocol is causing performance issues): There are two stages of preparation (locking resources) and submission (consumption resources). This method relies on the resource manager.
2) Maximum effort guaranteed mode (most commonly, extreme situations require real-time compensation, rollback of committed data) no dependency on the resource manager
There are two things: consume messages from Message Queuing and update database operations
Start a message transaction? Start a database transaction? Receive a message? Update the database? Commit a database transaction? Commit a message transaction
when the database is updated, the abrupt interruption, the rollback, the return to the initial state, in special cases, if the commit database transaction is successful, but the submission of the message transaction fails, it will cause the message to consume again. This can be handled by the message idempotent (sometimes a lot of messages can not meet idempotent such as update operation, you may consider adding a message application state table to record the message consumption and database transaction conditions), the above extreme situation, need real-time compensation. This compensation mechanism is similar to the TCC mode Try-confirm-cancel (if the try is successful, it repeats Confirm or repeats Cancel until all succeeds), the TCC requires that the Confirm/cancel must be idempotent.

3) transaction compensation mechanism
The first two scenarios, are not the best, the transaction compensator can guarantee the performance and the best possible guarantee the consistency of the transaction. To be blunt is to break the idea of a failure on a transaction, and to specify that it be committed for a certain amount of time until it succeeds, and if the timeout is rolled back.
Problems arising from the process of sub-tables in the library
1. Expansion and migration of data
Data has been fragmented, at the same time the amount of data has reached a threshold, need to expand the cluster, are used to multiply the expansion.
Here are 5 expansion steps:
1) Add new routing rules, write to new database with new routing rules, and use old routing rules for old database. (double write, two routines by rules)
2) write the old data before the double write to the new database (a lot of data cleansing work is required).
3) Change the query according to the old Shard rule to query according to the new Shard rule.
4) write down the code of the two-write routing rules, and only follow the new rules.
5) Delete historical data written according to the old Shard rule.
2. Table query problem caused by sub-database sub-table
Example:
Sellers and buyers, sellers need to view the sale of the goods, buyers need to view their trading situation, you can consider using two tables, one to the buyer dimension, record the buyer's merchandise transactions, a seller dimension, record the seller of the merchandise transactions. That is, the data that is queried for transactions and transactions is stored separately and interfaces are provided from different systems.
Method Two: Solve by the search engine (big data)

Reference Documentation:
21593187

The evolution of database architecture and the sub-list of library

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.