Java Internet Architecture-mysql Sub-database sub-list order generation system combat analysis

Source: Internet
Author: User

Overview

The necessity of the Sub-database table

First, let's look at why we want to do a sub-database table. In our business (Web application), the relational database itself is relatively easy to become a system performance bottleneck, the single-machine storage capacity, number of connections, processing power, etc. are very limited, the database itself "stateful" caused it is not as easy as the Web and application server expansion. So in our business, whether it is really necessary to carry out the sub-database of the table, can be considered from the above several conditions.

· Single-machine storage capacity. Whether your data volume is in a single-machine storage bottleneck. Like hungry? The user behavior data generated by the day is 24T, then in the traditional single-machine storage is certainly not enough.

· Number of connections, processing power. When we reach a certain level of user volume, the concurrency of a specific time becomes a big problem, in a high-concurrency site, the second level of hundreds of thousands of concurrency is very normal. In the ordinary single-machine database, the second thousand of thousands of operational problems are very large.

So before we do the sorting table we'd better think about whether our data volume is big enough and the concurrency is not big enough. If your answer is yes, then let's start doing it.

Several methods of sub-database sub-table

In the Sub-database table, we have several different ways: vertical sub-table, vertical sub-Library, horizontal sub-table, horizontal sub-Library sub-table. Points of the same way, the main ideas are large and small, and can be used in combination.

Vertical Sub-table

Vertical sub-table in the daily development and design is more common, simply speaking is "large table split table", Split is based on the column in the relational database. In our basic database course, our teacher teaches us that there are not too many fields in a table, even if the amount of data is not so large, we will do it for the sake of rationality of the table. For example, a table with more fields, you can create a new "extension table", the less frequently used or the length of a large field split out into the "extension table."

For example, in a product table (ID, name, price, company,), we put the commonly used (ID, name, price, company) field in a table, and the less commonly used () field in the expansion table, we query the first query characters commonly used segment, Query the extension fields if necessary. Efficiency can be effectively improved, and the amount of data we can hold in a table increases after the field is reduced.

The vertical sub-table also has a benefit that makes our microservices focus more explicit.

Note: The operation of the table is best done in the database design phase, and if you split it during the subsequent development process, you may need to make a lot of changes to the SQL statement.

Vertical Sub-Library

The basic idea of vertical partitioning is that we divide different databases according to different business modules. Vertical sub-Libraries in the "microservices" prevalent today is very popular. As we said above, this can make our microservices focus more explicit. That is, business logic is clearer.

For example, in our business, User, Product, company, etc. all belong to different business module, can put it in different database.

Horizontal Sub-table

The idea of a horizontal table is simple, it's equivalent to a stack of 100 pancakes, and then I put a basket for every 10. In the database The performance is, I a user table has 1 million data, then I 0-10w in a table, 10-20w put a table, and so on (of course, this is only one of the distribution law). This can reduce the amount of data in a single table and optimize query performance.

The horizontal sub-table can reduce the data amount of single table, and can alleviate the bottleneck of query performance to some extent. But it is still in a database, so when the query goes up to the database level, its IO bottleneck is not solved well. Therefore, it is not recommended to simply horizontal table practice.

Horizontal sub-Library sub-table

The horizontal sub-library table and the horizontal sub-table of the same idea, the same table will be a large number of data in a hierarchical storage, the difference is that the tables are divided into different databases. In the scenario of high concurrency and massive data, the sub-database table can effectively alleviate the performance bottleneck and pressure of single machine and library, and break through the bottleneck of Io, connection number and hardware resource. Of course, the cost of hardware will also be higher. At the same time, this can lead to complex technical problems and challenges (e.g., complex queries across shards, cross-shard transactions, etc.).

I combine the project practice, the level of sub-Library to do a systematic analysis, hope for everyone's level of library (including to the IoE) to provide general ideas.

The main contents include:

Horizontal Sub-Library description

Sub-Library Dimension--which field to divide the library from

Sub-Library strategy--how records are assigned to different libraries

Number of libraries-the number of initial libraries and how many libraries are growing

Route transparency--how to implement Seobinggo, support application transparency

Paging--paging case across multiple libraries how to handle

Lookup mapping-non-library fields are mapped to a library field for library access

Overall architecture--the overall technical architecture of the sub-Library

Online Step----------launch

Project Summary

Horizontal Sub-Library description

There are two types of database splits:

1) Vertical Sub-Library

There are too many tables in the database, take out part of the new library, generally according to the Business Division table, close the table to put the same database, the application of database connection can be modified, relatively simple.

2) Horizontal sub-Library

A table is too large, a single database can not be stored or access performance pressure, a table is split into multiple, each table to store part of the records, stored in different databases, the level of the library needs to be large-scale transformation of the system.

The order table is stored in Oracle database with hundreds of billions of fields, and the mode of access is complex, with the rapid growth of the business, regardless of storage space or access performance are facing enormous challenges, especially in the big promotion, the order database has become a system bottleneck.

There are usually two workarounds:

Scale up, which upgrades the physical machine where the Oracle database is located, improves memory/storage/io performance, but this upgrade is expensive and can only meet short-term needs.

Scale out, split the order library into multiple libraries, dispersed across multiple machines for storage and access, this practice supports horizontal scaling to meet long-term needs.

To take the latter approach, its order base mainly includes the Order Main Table/order schedule (record product details)/order Extension table, the horizontal sub-library is to divide the records of the 3 tables into multiple databases, the order level sub-Library effect as shown:

The original Oracle library was replaced by multiple MySQL libraries, support 1 master multi-standby and read-write separation, the master and slave through the MySQL-brought data synchronization mechanism (sla<1 seconds), all applications through the order service access order data.

The dimension of the library

The horizontal sub-library first of all to consider which field as the library dimension, the selection criteria is to avoid the application code and SQL performance is affected, which requires that the current SQL in the library, access as far as possible in a single library, or library access into a multi-library scan, read and write performance and application logic will be greatly affected.

For order splitting, the first thing you think of is split according to the user ID, the conclusion is correct, but it is better to speak the data, can not shoot the head. A good practice is to collect all the SQL first, pick the filter field that most often appears in the where statement, such as the user ID/PO ID and Merchant IDs, each of which has three cases in SQL:

Single ID filtering, such as user id=?

Multi-ID filtering, such as User ID in (?,?,?)

The ID does not appear

Then further statistics, assuming a total of 500 SQL access to the order library, 3 filter fields appear as follows:

Filter field Single ID filter multi-ID filter does not appear

User Id12040330

Order Id6080360

Merchant Id150485

The conclusion is obvious that the user ID should be chosen for the sub-Library.

Wait, this is just static analysis, each SQL access is different, so you also have to analyze the number of accesses per SQL. We analyzed the SQL that TOP15 executed the most (they accounted for the total number of executions 85%), and if they were divided by user ID, those SQL 85% fell to a single database, 13% to multiple databases, and only 2% needed to traverse all the databases, significantly better than using other IDs for the library.

Through quantitative analysis, we know that according to the user ID sub-Library is optimal, but also roughly know the impact of the existing system, such as in this example, 85% of SQL will fall into a single database, this part of the performance of the access will be optimized, to strengthen the confidence of the parties to the library.

Sub-Library strategy

After the library dimension is determined, how can the records be divided into various libraries? There are generally two ways:

Depending on the range of values, for example, a record with a user ID of 1-9999 is divided into the first library, 10000-20000 is divided into the second library, and so on.

Based on the numerical modulus, such as user ID mod n, a record with a remainder of 0 is placed in the first library, the remainder is 1 to the second library, and so on.

The pros and cons of the two methods are compared as follows:

Evaluation indicators According to the scope of the library according to the MoD sub-Library

The number of libraries is relatively small, the number of libraries can be determined according to the mode factor as the user/business grows on demand.

Access performance pre-Library quantity is small, the whole library query consumes less resources, the single database query performance is slightly worse the number of pre-library, the whole library query consumes more resources, the single database query performance is slightly better

It is easier to adjust the number of libraries, generally just add a library for new users, the old library split only affects a single library difficulty, changing the MoD factor causes the data to migrate between all libraries

Data hotspot new and old users shopping frequency is different, there are data hot issues old and new users evenly distributed to each library, no hot

In practice, in order to deal with the simple, choose MoD sub-Library more. At the same time two times the library, in order to facilitate data migration, is generally increased by multiples, such as the initial 4 libraries, two split into 8, and then 16. So for a library of data, half of the data moved to the new library, the remaining fixed, compared to only one library at a time, all the data will be large-scale changes.

In addition, the MoD library generally has a more uniform number of records per library, but there are some databases, there are super IDs, the records of these IDs far exceed other IDs, such as in the ad scene, a large advertiser may occupy a large proportion of the total. If you take a modular library by The Advertiser ID, there are a number of records for some libraries, and for these super IDs you need to provide a separate library to store the records.

Number of libraries

The number of libraries is first related to the number of records that can be processed by the library, in general, Mysql single library more than 50 million records, Oracle Single library more than 100 million records, DB pressure is very large (of course, the processing capacity and number of fields/access mode/record length has a further relationship).

In order to meet the above premise, if the number of sub-libraries is less than the purpose of decentralized storage and reduce the performance of the DB; if the number of libraries, the advantage is that each library record less, library access performance is good, but for access across multiple libraries, the application needs to access multiple libraries, if the concurrency mode, to consume valuable thread resources If it is in serial mode, the execution time will increase sharply.

The final number of libraries also directly affect the input of the hardware, generally each sub-library run on a separate physical machine, more than one library means more than one device. Therefore, the specific number of libraries, to comprehensive assessment, the general initial sub-library recommendations are divided into 4-8 libraries.

Route transparency

In a sense, the library means that the DB schema changes and inevitably affects the application, but this change is irrelevant to the business, so try to ensure that the library is transparent to the application code, and that the library logic is handled at the data access layer as much as possible. Of course it is difficult to do this completely, specifically which should be responsible for the DAL, which is responsible for the application, here are some suggestions:

For library access, such as a query condition that specifies a user ID, the SQL only needs to access a specific library. This should be automatically routed to a specific library by the DAL layer, and the application code will not be affected if the MoD factor is changed when the library is split two times.

For a simple multi-Library query, the DAL is responsible for summarizing the records returned by each database, and is still transparent to the upper application.

For multi-Library queries with aggregation operations, such as keywords with groupby/orderby/min/max/avg, it is recommended that the DAL summarize the results returned by a single library and that the upper-level applications do further processing. On the one hand the DAL fully supports a variety of cases, the implementation is very complex, on the other hand, from the practice of shop 1th, such examples are not many, in the upper application for targeted processing, more flexible.

The DAL can be further subdivided into JDBC and DAL Layer Two, based on the JDBC level to achieve the sub-library routing, system development is difficult, flexibility is low, there is no good success case; it is generally based on the persistence layer framework to further encapsulate the ddal (Distributed data access layer), to achieve the sub-Library routing, The 1th store Dal is based on the Ibatis for the upper package.

Pagination processing

After the library, some paging queries need to traverse all the libraries, which are the largest victim of the repository.

To give a page example, such as the requirement to show the order of a business in chronological order, 100 records per page, because it is by business query, need to traverse all the database, assuming the number of libraries is 8, we look at the paging processing logic:

If you take the 1th page of data, you need to take the first 100 records in chronological order from each library, 8 libraries have 800, then the 800 records are sorted in the application two times, and the first 100 are taken.

If you take the 10th page of data, you need to take the first 1000 (100*10) records from each library, 8,000 records after the summary, and then the 8,000 records two times after sorting (900,1000) records.

In the case of the sub-Library, for the K-page records, each library to take 100* (K-1) records, all the libraries add up, more records more, so the more the next paging, the system consumes more memory and execution time.

Compared to the case of no library, whichever page you take, it is easy to fetch 100 records from a single db without having to do two sorting within the application.

How to solve the problem of paging in the case of the sub-Library? There are several ways to do this:

If it is to provide paging in the foreground application, then limit the user can only see the front n pages, this restriction is also reasonable in business, generally look at the back of the page is not very meaningful (if you must look, you can ask users to narrow the scope of re-query).

If a background batch task requires data to be fetched in batches, you can increase the page size, such as 5,000 records each time, effectively reduce the number of pages (of course offline access to the general library, to avoid the impact of the main library).

In the design of the sub-Library, there are generally supporting big data platform to summarize all the sub-Library records, some paging query can consider the big data platform.

Lookup Mappings

There is only one library field, such as the user ID, but the order table has other fields that uniquely distinguish the record, such as the order ID, given an order ID, and the corresponding record must be in a library. If you blindly query all the sub-libraries, it brings unnecessary overhead, and the lookup map can find the corresponding user ID based on the order ID, thus realizing the single-Library positioning.

All order IDs and user IDs can be retrieved in advance, saved in the lookup table, the lookup tables have the same number of records and the total number of order library records, but it has only 2 fields, so neither storage nor query performance is a problem. When actually used, the lookup performance is generally optimized through distributed caching. For new orders, write a lookup table in addition to the order form.

Overall architecture

The overall technical architecture of the order generation level sub-Library is as follows:

The upper-level app accesses the database through the Order Service/Library agent and DAL.

The agent transparently implements the order service, including aggregation operations, mapping of non-user IDs to user IDs.

The lookup table is used for order ID/user ID mapping, which guarantees access to a single library, which is the memory data image of lookup, improves performance, and accesses the lookup table directly when the cache fails.

The DAL provides a route to the library, navigates to a library based on the user ID, and for docu access, the DAL supports optional concurrent access patterns and supports simple record summarization.

The lookup table initializes the data from the existing library data, which is written directly by the agent when new records are added.

On-Line steps

Order form is the core business table, its horizontal split affect a lot of business, its own technical transformation is also very large, it is easy to make a mistake, on-line, it must be carefully considered, the whole program implementation process is as follows:

First, the Oracle and MySQL two sets of libraries are parallel, all data access points to the Oracle library, and data synchronizer to split the data from Oracle to multiple MySQL sub-libraries, such as 3-minute incremental synchronization.

Set up the whole system according to the above architecture diagram, select several access examples with low real-time data (such as Access history orders), turn to MySQL sub-library access, and then gradually add more non-real-time case to test the whole system feasibility.

If the performance and functionality are no problem, turn all real-time read-write access to MySQL and discard Oracle at once.

This on-line step is the development of data Synchronizer (about 1 People weekly workload, the risk is very low), but dispersed the risk, the first step of the technical risk (Lookup/dal infrastructure transformation) and the second step of the business function risk (Oracle change MySQL syntax) separate. Two-stage on-line is a one-time success, especially the second stage on-line, more than 100 relying party application simple restart to complete the upgrade, there is no big problem in the middle.

Summarize

Here, MySQL sub-database sub-table order generation system actual combat analysis is over, and the shortcomings of the hope that we all forgive!! Think harvest words can point a concern collection forward a wave oh, thank the big boys support. (Blow a wave, 233~~)

Here are some of the experiences of programming with you:

1, more write more knock code, good code and solid basic knowledge must be practice out of

2, testing, testing and re-testing, if you do not thoroughly test your code, then I am afraid that you develop more than just the code, may also be notorious.

3, simplify programming, speed up, code coquettish, after you have finished coding, you should turn back and optimize it. In the long run, some of the improvements here or there will make it easier for later support staff.

Java Internet Architecture-mysql Sub-database sub-list order generation system combat analysis

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.