Database Sharding: Splitting Databases and Tables

Source: Internet
Author: User

    • 1. Database sharding (splitting databases and tables)
      • 1.1. Preface
        • 1.1.1. Terminology
      • 1.2. Evolution of the database architecture
      • 1.3. Problems that lead to sharding
        • 1.3.1. Too many user requests
        • 1.3.2. Database too large
        • 1.3.3. Single table too large
      • 1.4. Ways of sharding
        • 1.4.1. Vertical splitting
        • 1.4.2. Horizontal splitting
      • 1.5. Problems after sharding
        • 1.5.1. Transaction support
        • 1.5.2. Merging result sets from multiple databases (GROUP BY, ORDER BY)
        • 1.5.3. Cross-database JOIN
      • 1.6. Sharding products
      • 1.7. Why sharding is not recommended
      • 1.8. Reference
Preface

Our company recently started splitting services and partitioning data, because a single table had grown too large and was still growing by about 600,000 records per day. Before this I had read a few blog posts on sharding, but I only had a vague idea of it, and looking back, I am not sure what I actually understood.

Today I spent the afternoon studying database sharding and read many articles, so here is a summary, mostly "excerpted" from them (though I am more eager to try it out in practice later). It covers the following parts. Part 1: the problems a real website runs into as it grows. Part 2: the kinds of splitting, how vertical and horizontal splitting differ, and where each applies. Part 3: some open-source products and technologies, and their pros and cons. Part 4, probably the most important: why horizontal sharding is not recommended! This should make you cautious when planning and help you avoid the problems that splitting causes.

Terminology

Library: a database. Table: a table within a database. Splitting databases and tables ("sub-library, sub-table"): sharding.

Evolution of the database architecture

At first a single database server is enough. As requests grow, we separate writes from reads: several slave replicas handle reads, the master handles writes, and the slaves replicate updates from the master to keep the data consistent. This architecture is master-slave replication. The slaves can be scaled horizontally, so more read requests are not a problem.

But as the user base grows, write requests keep increasing. What then? Adding a second master does not solve the problem: to keep the data consistent, every write must be synchronized between the two masters, so each write is effectively performed twice and the setup becomes more complex.

In this case we need sharding to split up the write load.
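A minimal sketch of the read/write splitting described above, assuming the application routes statements itself (in practice this is often done by a driver or a proxy). `ReadWriteRouter` and the target names `master`, `replica-1`, `replica-2` are hypothetical placeholders; the router only chooses a target and does not open any connections.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the master and spread reads over replicas round-robin.

    Hypothetical helper for illustration; targets are plain names, not connections.
    """

    def __init__(self, master, replicas):
        self.master = master
        self.replicas = list(replicas)
        # Endless round-robin iterator over the replica names.
        self._replica_cycle = itertools.cycle(self.replicas)

    def route(self, sql):
        # Naive classification: statements starting with SELECT are reads,
        # everything else (INSERT/UPDATE/DELETE/DDL) is a write.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self.replicas:
            return next(self._replica_cycle)
        return self.master


router = ReadWriteRouter("master", ["replica-1", "replica-2"])
print(router.route("SELECT * FROM orders WHERE id = 1"))  # replica-1
print(router.route("select count(*) from users"))          # replica-2
print(router.route("INSERT INTO orders VALUES (1)"))       # master
```

Because replicas lag behind the master, a read issued right after a write may not see that write yet; routers often send such reads to the master.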

Problems that lead to sharding

Every problem is a problem of something being too big or too small; here we are dealing with something being too big: too many requests and too much data.

Too many user requests

A single server's TPS, memory, and I/O are limited. Solution: spread the requests across multiple servers. A user request and an SQL query are essentially the same thing, since both ask for a resource; the difference is that a user request also passes through the gateway, routing, the HTTP server, and so on.

Database too large

A single database's processing capacity is limited, the server hosting it runs out of disk space, and operations on a single database hit an I/O bottleneck. Solution: split it into more, smaller databases.

Single table too large

CRUD operations all become problems: indexes bloat and queries time out. Solution: split it into multiple smaller tables, each holding part of the data.

Ways of sharding

Broadly, sharding is either vertical or horizontal. Both terms describe how the data set is divided, and both divide physical space. Let's work through the problems listed above. First, when there are too many user requests, we add more machines (not the focus of this article).

Next, when a single database is too large, we have to check whether that is because it has many tables holding a lot of data, or because a single table holds too much data. If there are many tables with a lot of data, use vertical splitting: divide the tables into different databases by business.

If a single table holds too much data, use horizontal splitting: divide the table's rows into multiple tables by some rule, possibly spreading those tables across multiple databases. Split vertically first, then horizontally: vertical splitting is simpler and closer to how we naturally handle real-world problems.

Vertical splitting
    1. Vertical table splitting

      That is, splitting a "large table" into smaller ones by column. Typically the table has many fields, and the rarely used ones, the ones holding large data, or the long ones (such as TEXT columns) are moved to an "extension table". This applies to large tables with hundreds of columns, and to the "cross-page" problem that arises when each queried row carries too much data.

    2. Vertical database splitting

      Vertical database splitting divides a system by business: for example, users in one database, products in another, and orders in a third. After splitting, the databases are placed on multiple servers rather than one. Why? Imagine a shopping site: it performs CRUD on users, products, orders, and so on. Before splitting, all of this lands on a single database, so that database's processing capacity becomes the bottleneck. If after vertical splitting the databases still sit on one server, then as the user base grows, that server's processing capacity, disk space, memory, TPS, and so on will again become tight. So we spread them across multiple servers, which resolves these problems and avoids the limits of a single machine.

      Splitting databases by business is similar to service "governance" and "degradation": data for different businesses can be managed, maintained, monitored, and scaled separately. The database is often the most fragile part of an application system, and because it is "stateful" it is harder to scale out than web and application servers. Database connections are scarce and a single machine's processing capacity is limited, so in high-concurrency scenarios vertical database splitting can, to some extent, break through the I/O, connection-count, and single-machine hardware bottlenecks.
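Both kinds of vertical splitting can be sketched as a lookup table and a column split. This is a minimal illustration: the mapping of business tables to databases (`user_db`, `product_db`, `order_db`) and the set of rarely used or large columns moved to the extension table are hypothetical; a real system would keep them in configuration or middleware.

```python
# Vertical database splitting: each business table lives in its own database.
# Hypothetical names; in practice these would point at different servers.
TABLE_TO_DATABASE = {
    "users": "user_db",
    "products": "product_db",
    "orders": "order_db",
}

# Vertical table splitting: rarely used or large columns (e.g. TEXT) go to an
# extension table. Hypothetical column names.
EXTENSION_COLUMNS = {"description", "detail_html"}


def database_for(table):
    """Return the database that owns a business table."""
    if table not in TABLE_TO_DATABASE:
        raise ValueError(f"no database configured for table {table!r}")
    return TABLE_TO_DATABASE[table]


def split_row(row, key="id"):
    """Split one wide row into a core-table row and an extension-table row.

    Both halves keep the primary key so they can be joined back together.
    """
    core = {col: val for col, val in row.items() if col not in EXTENSION_COLUMNS}
    extension = {col: val for col, val in row.items() if col in EXTENSION_COLUMNS}
    extension[key] = row[key]
    return core, extension


core, ext = split_row({"id": 7, "name": "pen", "price": 3, "description": "a very long text"})
print(database_for("orders"))  # order_db
print(core)  # {'id': 7, 'name': 'pen', 'price': 3}
print(ext)   # {'description': 'a very long text', 'id': 7}
```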

Horizontal splitting
    1. Horizontal table splitting

      A single table with a lot of data (such as an order table) is divided into multiple tables by some rule (range, hash, etc.). However, these tables still live in the same database, so database-level I/O remains a bottleneck. Not recommended.

    2. Horizontal database and table splitting

      The data of a single table is split across multiple servers. Each server has its own database and table, but each table holds a different subset of the data. Horizontal database and table splitting effectively relieves the performance bottleneck and pressure on a single machine and a single database, breaking through the I/O, connection-count, and hardware-resource bottlenecks.

    3. Splitting rules for horizontal sharding

      1. RANGE

        IDs 0 to 10000 go in one table, 10001 to 20000 in the next, and so on.

      2. Hash modulo

        In a mall system, users and orders are usually the main tables, and the data related to them goes into attached tables sharded the same way, so no cross-database transactions or similar problems arise. Take the user ID, hash it modulo the number of databases, and assign the row to the resulting database.

      3. Geographic region

        For example, split the business by East China, South China, North China, and so on. Qiniu probably does it this way.

      4. Time

        Split by time: move data older than 6 months, or even a year, into another table. As time passes, old data is less and less likely to be queried, so there is no need to keep it together with the "hot data". This is also called "hot/cold data separation".
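The four rules above can be sketched as routing functions. This is an illustrative sketch, not any middleware's API: the shard counts, table names (`order_0`, `order_archive`, and so on), region keys, and the 183-day (about 6 months) cutoff are all hypothetical. The hash rule uses CRC32 instead of Python's built-in `hash()`, because string hashing in Python is randomized per process and a shard key must map to the same shard every time.

```python
import zlib
from datetime import date

DB_COUNT = 4        # hypothetical number of database servers
TABLES_PER_DB = 2   # hypothetical number of order tables per database


def by_range(order_id, rows_per_table=10_000):
    """RANGE: IDs 0-10000 -> order_0, 10001-20000 -> order_1, and so on."""
    index = (order_id - 1) // rows_per_table if order_id > 0 else 0
    return f"order_{index}"


def by_hash(user_id):
    """Hash modulo: all rows for one user land in the same database and table,
    so one user's orders never need a cross-database transaction."""
    slot = zlib.crc32(str(user_id).encode("utf-8")) % (DB_COUNT * TABLES_PER_DB)
    return f"db_{slot // TABLES_PER_DB}", f"order_{slot % TABLES_PER_DB}"


REGION_TO_DB = {"east": "db_east_china", "south": "db_south_china", "north": "db_north_china"}


def by_region(region):
    """Geographic region: each region's business data lives in its own database."""
    return REGION_TO_DB[region]


def by_time(created, today, hot_days=183):
    """Time: rows older than about 6 months go to a cold archive table."""
    return "order_archive" if (today - created).days > hot_days else "order_hot"


print(by_range(10_000), by_range(10_001))            # order_0 order_1
print(by_hash(42) == by_hash(42))                    # True: same key, same shard
print(by_time(date(2024, 1, 1), date(2024, 12, 1)))  # order_archive
```

One trade-off of plain hash modulo: changing `DB_COUNT` remaps most keys to a different shard, so adding servers means migrating data; range and time rules make growth easier but can concentrate new, hot rows on one shard.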

Problems faced after sharding

Transaction support

After sharding, transactions become distributed transactions. Relying on the database's own distributed transaction management to run them carries a high performance cost; letting the application coordinate them, as transactions implemented in program logic, adds a programming burden.
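To illustrate the programming burden of coordinating a transaction in application logic, here is a minimal sketch that uses two in-memory SQLite databases as stand-ins for two shards (an assumption for runnability; a real deployment would use separate MySQL servers). It does the work on both shards, then commits both, and rolls both back on any error. This is only best effort: if the process crashes after the first commit but before the second, the shards end up inconsistent, which is the gap that two-phase commit (XA) or compensation logic has to close. The `account` table and the `transfer`, `make_shard`, and `balance` helpers are hypothetical.

```python
import sqlite3


def make_shard(rows):
    """Create an in-memory stand-in for one shard with an account table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE account (user_id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
    db.executemany("INSERT INTO account VALUES (?, ?)", rows)
    db.commit()
    return db


def transfer(src_db, src_user, dst_db, dst_user, amount):
    """Move money between accounts on two shards, best effort.

    sqlite3 implicitly begins a transaction before each UPDATE, so
    rollback() undoes any change that has not been committed yet.
    """
    try:
        cur = src_db.execute(
            "UPDATE account SET balance = balance - ? WHERE user_id = ? AND balance >= ?",
            (amount, src_user, amount),
        )
        if cur.rowcount != 1:
            raise ValueError("source account missing or balance too low")
        cur = dst_db.execute(
            "UPDATE account SET balance = balance + ? WHERE user_id = ?",
            (amount, dst_user),
        )
        if cur.rowcount != 1:
            raise ValueError("destination account missing")
        # Not atomic: a crash between these two commits leaves the shards inconsistent.
        src_db.commit()
        dst_db.commit()
    except Exception:
        src_db.rollback()
        dst_db.rollback()
        raise


def balance(db, user_id):
    return db.execute("SELECT balance FROM account WHERE user_id = ?", (user_id,)).fetchone()[0]


shard_a = make_shard([(1, 100)])
shard_b = make_shard([(2, 0)])
transfer(shard_a, 1, shard_b, 2, 30)
print(balance(shard_a, 1), balance(shard_b, 2))  # 70 30
```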

Merging result sets from multiple databases (GROUP BY, ORDER BY)

Todo

Cross-database JOIN

TODO

After sharding, associations between data are restricted: we cannot JOIN tables that live in different databases, nor tables that were sharded at different granularities. A business need that one query used to satisfy may now take several queries. Rough solutions:

  • Global tables: basic reference data is copied into every database.
  • Field redundancy: duplicate some fields so they no longer need to be fetched with a JOIN.
  • System-level assembly: query each part separately, then assemble the results in the application; this is more complex.
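System-level assembly can be sketched as two queries plus an in-memory join. The sketch assumes a hypothetical `users` table in one database and an `orders` table in another, with in-memory SQLite standing in for both; the application fetches a page of orders, collects the user IDs, looks them up with one `IN (...)` query, and stitches the rows together. With field redundancy, `orders` would instead carry a copy of the user name and the second query would disappear, at the cost of keeping that copy in sync.

```python
import sqlite3

# Stand-ins for two databases that can no longer be JOINed directly.
user_db = sqlite3.connect(":memory:")
user_db.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
""")
order_db = sqlite3.connect(":memory:")
order_db.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, amount INTEGER NOT NULL);
    INSERT INTO orders VALUES (10, 1, 5), (11, 2, 7), (12, 1, 3);
""")


def orders_with_user_names(order_db, user_db, limit=10):
    """Emulate `orders JOIN users` across two databases in application code."""
    orders = order_db.execute(
        "SELECT order_id, user_id, amount FROM orders ORDER BY order_id LIMIT ?", (limit,)
    ).fetchall()
    user_ids = sorted({user_id for _, user_id, _ in orders})
    if not user_ids:
        return []
    placeholders = ",".join("?" * len(user_ids))  # e.g. "?,?" for two IDs
    names = dict(user_db.execute(
        f"SELECT user_id, name FROM users WHERE user_id IN ({placeholders})", user_ids
    ).fetchall())
    # Assemble: attach each order's user name (None if the user row is missing).
    return [(order_id, names.get(user_id), amount) for order_id, user_id, amount in orders]


print(orders_with_user_names(order_db, user_db))
# [(10, 'alice', 5), (11, 'bob', 7), (12, 'alice', 3)]
```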

Sharding products

There are quite a few sharding middleware products on the market: proxy-based ones such as MySQL Proxy and Amoeba; Hibernate Shards, based on the Hibernate framework; Sharding-JDBC, based on JDBC; Mogujie's TSharding, a Maven-plugin-style library based on MyBatis; and Cobar Client, which works by rewriting Spring's iBATIS template class.

There are also open source products from big companies:

Why sharding is not recommended

Please see my translation of an answer on Stack Overflow. It is an excellent answer from a seasoned veteran: "MySQL sharding scheme".
