The basis and practice of sub-list of "Dry goods" database

Source: Internet
Author: User
Tags joins one table unique id


evolution of the database architecture
In the era of less business data, we can use the single-machine database to meet the business use, as the volume of business requests more and more, the data in the database increased rapidly, then the single-machine database can not meet the performance requirements of the business, the database master-slave replication architecture came into being.





Master-slave replication is the separation of database writes and reads, with multiple read-only instances (slaver replication) responsible for processing read requests, and Master instances (master) handling write requests, while read-only instances maintain data consistency with the primary instance by replicating the primary instance's data. Because read-only instances can scale horizontally, so more read requests are not a problem, with the advent of cloud computing, the Big Data era, things have not been solved perfectly, when writing requests more and more, the main instance of the write request becomes the main performance bottleneck.



How to solve the above problem? If you simply add a master instance to share the write request, how the write operation synchronizes between the two primary instances to ensure data consistency, and how to avoid double writing, the problem becomes more complex. At this point, we need to use the Sub-database table (sharding), the write operation is sliced to solve, 1:

Figure 1: Typical read-write separation and sub-database table



As the pre-distributed database access Service of RDS, Cloud Middleware product DDM (distributed database middleware) completely solves the extensibility problem of the database, realizes the high concurrent access to the application transparently, realizes the reading and writing separation and the sub-database table.



Realization Scheme of Data shard
The implementation scheme of data Shard can be divided into application layer Shard and Middleware shard, and the characteristics of these two implementations are as follows: 2

Figure 2: Application layer sharding and middleware sharding



As an excellent distributed database middleware product, DDM realizes the function of reading and writing separation and data slicing, using DDM to divide the library, apply 0 changes and be completely transparent to the application.



How to divide the dividing table of a library
The segmentation of data (sharding) can be divided into two segmentation modes according to the type of its segmentation rules. One is based on different tables (or schemas) to the different database (host), this kind of segmentation can be called vertical (vertical) segmentation of data, and the other is based on the data in the table logical relationship, the data in the same table according to a certain conditions split into multiple databases (hosts), This segmentation is called horizontal (horizontal) slicing of data.



The biggest feature of vertical slicing is the simple rules, the implementation is more convenient, especially for the coupling between the business is very low, mutual impact is very small, business logic is very clear system. In this system, it is easy to split the tables used by different business modules into different databases. Depending on the table to split, the impact on the application is also smaller, the split rule will be relatively simple and clear.



Horizontal segmentation is relatively slightly more complex than vertical slicing. Because you want to split different data from the same table into different databases, splitting the rules itself is more complex than splitting the table name for the application, and later data maintenance is more complex.



Specifically, if a single library is too large, then we have to look at it because there are more tables and more data, or because there is more data in a single table. If it is because there are more tables and more data, use vertical slicing, which is divided into different libraries according to the business. If it is because the data of a single table is too large, then use horizontal segmentation, that is, the table's data in a certain rules are divided into multiple tables, or even more than one table on multiple libraries. The order of the sub-tables should be the vertical and horizontal points. Because the vertical division is simpler, it is more in line with the way we deal with real world problems.



Horizontal slicing
In contrast to vertical splitting, a horizontal split is not a classification of a table, but rather a rule of a field that is scattered across multiple libraries, with a subset of the data in each table. To put it simply, we can understand the horizontal segmentation of the data as the segmentation of the data rows, that is, some rows in the table are sliced into one database, and some other rows are sliced into other databases, 3:

Figure 3: Horizontal Slicing



Advantages of horizontal slicing
1, split rule abstraction is good, join operation can basically database do.
2, there is no single-library big data, high concurrency performance bottleneck.
3, the application side transformation less.
4, improve the stability of the system and load capacity.



Disadvantages of horizontal slicing
1, the splitting rule is difficult to abstract.
2, shard transaction consistency is difficult to solve.
3, the data many times to expand the difficulty with the maintenance amount is great.
4, cross-Library join performance is poor.



Typical shard rules for horizontal segmentation
1, the hash to take the mold
For example: Take the user ID, and then hash modulo, assigned to a different database.
2. RANGE
For example: from 0 to 100,001 tables, 10001 to 200,001 tables.
3. Time
By time slicing, for example: 6 months ago, or even a year ago, the data were cut out into another table, because over time, the data of these tables is less likely to be queried, so there is no need to "hot data" together, this is "cold and hot data separation."
The segmentation principle is generally based on the business to find the appropriate segmentation rules scattered to different libraries, 4, according to user ID modulo as a segmentation rule.

Figure 4: Slicing based on userid modulo



Vertical slicing
A database is made up of many tables, each of which corresponds to a different business, and vertical slicing refers to classifying the tables according to the business, distributing them to different databases, and then sharing the data or pressure to different libraries, 5:

Figure 5: Vertical Slicing



Advantages of vertical slicing
1, data maintenance is simple.
2, after splitting the business clear, split rules clear.
3, the system between the integration or expansion is easy.



Disadvantages of vertical slicing
1, transaction processing complex.
2, some business tables can not join, only through the interface method to solve, improve the system complexity.
3, by each kind of business different restrictions exist single-Library performance bottleneck, data expansion and performance is not easy to improve.



Because vertical slicing is to spread the table to different libraries according to the classification of the business, some business tables are too large, there are single-library read-write and storage bottlenecks, so they need to be split horizontally to solve.



Segmentation principle
The data segmentation experience is also shared here because of the difficulty of data join after data segmentation:
The first principle: can not slice as far as possible not to slice.
The second principle: if you want to slice must choose the appropriate segmentation rules, planning well in advance.
Third principle: Data segmentation minimizes the possibility of cross-library joins by data redundancy or table group.
The Forth principle: Because the database middleware is difficult to grasp the merits of data join, and it is difficult to achieve high performance, the service reads as few as possible using the multi-table join.



Problems and coping strategies after the sub-list of library
The Library sub-table is mainly used to deal with the two common scenarios of the current Internet: massive data and high concurrency. However, the sub-database is a double-edged sword, although it is good to deal with massive data and high concurrency on the database impact and pressure, but also improve the complexity of the system and maintenance costs, bring some problems.



1. Business Support
After the sub-database table, it becomes the distributed transaction, how to ensure the consistency of the data becomes a problem that must be faced. In general, so that the storage data as far as possible to achieve user consistency, to ensure that the system after a short period of self-recovery and correction, the final data to achieve consistency.



2, pagination and sorting issues
In general, the list needs to be sorted by the specified field when paging. In the case of library single table, pagination and sorting are also very easy. However, with the evolution of sub-libraries and tables, cross-library sorting and cross-table sorting problems are also encountered. For the accuracy of the final result, the data needs to be sorted and returned in different tables, and the result sets returned by the different tables are summarized and sorted again, and then returned to the user.



3. Table related Issues
In the case of library single table, joint queries are very easy. However, with the evolution of the sub-Library and the sub-table, the Federated query Encounters cross-Library association problems. Rough Workaround: Er shard: The record of the child table is stored on the same data shard as the associated parent table record. Global tables: Basic data, all libraries are copied one copy. Field redundancy: Some fields do not have to be queried by join. Sharejoin: is a simple cross-shard join, currently supporting 2 table joins, the principle is to parse the SQL statement, split into a single-table SQL statement execution, and then the data of the various nodes together.



4. Distributed Global Unique ID
In the case of library single table, it is quite simple to use the database auto-increment attribute directly to generate the primary key ID. In the context of the Sub-Library table, the data is distributed on different sub-tables and no longer with the use of the database self-growth feature, the global unique ID is required.



Sub-database sub-list case
A tax core management system, the national 34 provincial/local tax, the electronic tax bureau 15 provincial pattern.



Technical path: Core Levy + tax Service business application distributed on-cloud transformation.



Business challenges
1, data query time 3-5 seconds, slow response to seriously affect the experience
The current business logic is placed in a large number of database layer, a tax business transaction boundary is too large (40 SQL statements), involving the "declaration", "Invoice" large table-based multi-table associated transaction operations, resulting in slow business query response.
2. Rapid growth of billions of data, challenging business performance bottlenecks
Provincial Tax Bureau, the peak of tax-carrying millions user concurrency, 3000-5000tps. The present network analysis obtains the data: the core collection and management storehouse nearly 1000 tables, among them "the declaration", "the Invoice" the Business table data quantity is big, the growth is fast, is the main bottleneck table; invoice synthesis Information: Each province 1 billion level record, each year tens of millions of records level increment; Declaration Information table: billions of records of data volume.



Solution Solutions
1, vertical sub-Library, micro-service decomposition database pressure, reduce the number of single-business SQL
Based on MicroServices, the large transaction is disassembled into asynchronous small transactions, and the business logic is stripped from the database level. Split the main library data, the large table vertically split into multiple databases, a business 40 SQL reduced to 20 SQL, to decompose the database pressure.
2, data fragmentation support massive data growth, linear increase business processing speed
The single-meter billion-level record is split into 128 shards of rds-mysql with the taxpayer as the split key. To support the storage of massive data. After the database design is concise and simple, there is no foreign key between the tables in the database, no trigger is written, no stored procedure is written, and the level expansion of database records is realized.
3, read-write separation to improve query performance
DDM automates read and write separations, transparently completing the distribution of write operations and read operations, and applications without the need for special changes and processing logic. The write operation is distributed to the RDS master instance, and the read operation is automatically distributed to multiple read instances of RDS, so that the write operation does not affect the concurrency of the read operation, but only needs to increase the read-only instance as needed.



Business Benefits
1, using the DDM, easy to break the original performance bottleneck, a business operation, the original needs 3-5 seconds, now only need 1 seconds.
2, read and write operation through the automatic reading and writing of DDM, without changing the business situation, and easily improve the overall read and write concurrency ability.



The basis and practice of sub-list of "Dry goods" database


Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.