Database and Table Sharding Scheme

Source: Internet
Author: User
Tags: dba lua database sharding

Why split tables in MySQL?
Almost anywhere MySQL is used, one problem appears as soon as the data volume gets large: the databases and tables have to be split (sharded).
This raises a question. Why split tables at all? Can't MySQL handle a large table?
It actually can. I worked on a project where the physical file of a single database exceeded 80 GB and a single table held more than 500 million rows. That table was a very core one: the friend-relationship table.

However, this is not the best approach. File systems such as EXT3 also have many problems with very large files, although at this scale the file system can be replaced with XFS. One problem with an oversized MySQL table remains unsolved: operations that alter the table structure become basically impossible. That is why large projects use database and table sharding.

From InnoDB's own perspective, the B-tree of a data file has only two kinds of locks: leaf-node locks and child-node locks. Consider what happens when a page splits or a new leaf is added: the table can be left unable to accept writes.
So splitting databases and tables is the better choice.

So how large should each table be after sharding?
Testing shows that read and write performance is fairly good when a single table holds fewer than 10 million rows. Leaving some buffer, keep a table whose columns are all numeric below 8 million rows, and keep a table that contains character columns below 5 million rows.

If you plan for 100 databases with 100 tables each, for example for a user business:
5 million × 100 × 100 = 50 billion records.

With these numbers in mind, planning according to the business is relatively easy.
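The helper below is a rough sketch of that arithmetic. It multiplies databases × tables × per-table row limit. It also works backward from an expected row count to the number of physical tables needed.

The row limits are the rule-of-thumb figures quoted above, not universal constants. The function names are illustrative only.

```python
# Rule-of-thumb per-table limits from the text (tune them with your own load tests).
ROWS_PER_TABLE_NUMERIC = 8_000_000  # tables whose columns are all numeric
ROWS_PER_TABLE_STRING = 5_000_000   # tables that contain character columns


def total_capacity(num_databases: int, tables_per_database: int, rows_per_table: int) -> int:
    """Rows the whole layout can hold before any single table exceeds its limit."""
    return num_databases * tables_per_database * rows_per_table


def tables_needed(expected_rows: int, rows_per_table: int) -> int:
    """Minimum number of physical tables that keeps every table under the limit."""
    return -(-expected_rows // rows_per_table)  # ceiling division on integers


if __name__ == "__main__":
    # 100 databases x 100 tables x 5 million rows per table
    print(f"{total_capacity(100, 100, ROWS_PER_TABLE_STRING):,}")  # 50,000,000,000
    # How many tables would 1 billion rows need under the 5-million-row limit?
    print(tables_needed(1_000_000_000, ROWS_PER_TABLE_STRING))  # 200
```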


I. Background

1. Storing a large volume of data requires substantial database resources;

2. Continuous growth in data volume requires database storage to be scalable;

3. While handling large data volumes, performance and high availability must still be guaranteed;

4. The existing framework does not fully solve the problem of storing large data volumes;

5. Oracle and other mass-storage solutions are expensive, so sharding databases and tables on MySQL saves costs.

II. Feasibility Analysis

1. Risk Assessment

(1) Database management requires DBA resources and standardized practices;

2. Business data scale and the impact of its changes

(1) When a medium or larger data scale can be planned in advance, one of two layouts can meet business needs. The first is a single database (one database instance, multiple tables) with read-write separation. The second is multiple databases (multiple database instances, multiple tables). The corresponding design and implementation are relatively simple and less error-prone.

(2) Sometimes the initial data scale cannot be accurately predicted but keeps growing as the business develops, so data storage must be scalable. This kind of scalability is achieved by sharding databases and tables. It requires the routing across shards to be highly scalable, which is also the hard part of sharding. This scheme proposes a step-by-step approach to the problem.
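To see why routing scalability is the hard part, consider the simplest routing rule, key mod N. Modulo routing is our assumption; the scheme does not name a specific function.

This sketch measures how many keys change shards when the shard count changes:

- Doubling from 4 to 8 shards moves exactly half of the keys. Every moved key goes from shard i to shard i + 4, which keeps scale-out by doubling manageable.
- Going from 4 to 5 shards moves 80% of the keys.

```python
def shard_of(key: int, num_shards: int) -> int:
    """Simplest routing rule: shard index = key mod N (an assumption, not the scheme's rule)."""
    return key % num_shards


def moved_fraction(keys: range, old_shards: int, new_shards: int) -> float:
    """Fraction of keys that would have to migrate when the shard count changes."""
    moved = sum(1 for k in keys if shard_of(k, old_shards) != shard_of(k, new_shards))
    return moved / len(keys)


if __name__ == "__main__":
    keys = range(100_000)
    print(moved_fraction(keys, 4, 8))  # 0.5: doubling moves half of the keys
    print(moved_fraction(keys, 4, 5))  # 0.8: adding one shard moves most keys
    # When doubling, a key on shard i either stays on i or moves to i + 4.
    assert all(shard_of(k, 8) in (shard_of(k, 4), shard_of(k, 4) + 4) for k in keys)
```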

3. Existing technology

(1) The company already has a simple database and table sharding scheme;

(2) That scheme lacks scalability;

(3) This proposal offers some scalability in the short term and a highly scalable scheme in the medium to long term.

4. Open-source projects and products

(1) MySQL Proxy: provides a MySQL protocol interface (not JDBC) and a master-slave structure, with load balancing, read-write separation, failover, and so on. Its Lua scripting is complex, and it does not support sharding for large data volumes;

(2) Amoeba: supports splitting across database instances, with the same table in each database, but does not support transactions. It is similar to MySQL Proxy, but its design drops Lua and is simpler;

(3) CobarClient, open-sourced by Alibaba Group's research institute: mainly for access to small-scale sharded database clusters, and built on iBatis. The data scale must be planned in advance, and it lacks scalability. Besides Cobar, Alibaba Group internally has a complete DAL layer that implements a full JDBC proxy;

(4) Hibernate Shards: the sharding provided by Hibernate. It supports splitting across database instances, but it is relatively complex, requires planning the data scale in advance, and does not match our framework;

(5) Guzz: supports multiple databases (virtual databases; the actual database routing rules are still custom), table splitting, read-write separation, and transparent distributed transactions across multiple databases. It is designed to support large-scale online production applications, but it would require completely replacing iBatis and is entirely inconsistent with our framework;

(6) TDDL, Taobao's DAL: strong database-splitting capability, but the data volume still has to be planned in advance and dynamic expansion is limited;

(7) The products above meet our needs to some extent, but none of them fully solves the scalability problem of our large data volume.

III. Performance Indicators

1. Compared with not introducing database and table sharding, the maximum added delay per operation is < 1 ms;

IV. Feature List and Roadmap

1. Vertical database splitting: data from different businesses is stored in different database instances;

2. Data sharding:

(a) Hash on the sharding field;

(b) Identify which data needs to be sharded, and keep related shard data in the same database instance wherever possible, for example a user's basic information, friend information, and file information;
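Here is a minimal sketch of points (a) and (b). It assumes a layout of 4 databases × 8 tables and uses CRC32 as the hash; both are hypothetical choices.

Co-location works because the user's basic information, friends, and files are all routed by the same user ID, so they land in the same database instance.

CRC32 is used instead of Python's built-in hash(). Python randomizes string hashing per process, but a routing function must return the same shard on every application server.

```python
import zlib

# Hypothetical layout: 4 database instances, 8 tables in each.
NUM_DATABASES = 4
TABLES_PER_DATABASE = 8


def route(sharding_key: str) -> tuple[int, int]:
    """Hash the sharding field and return (database index, table index)."""
    # crc32 is deterministic across processes, unlike Python's randomized str hash().
    h = zlib.crc32(sharding_key.encode("utf-8"))
    slot = h % (NUM_DATABASES * TABLES_PER_DATABASE)
    return slot // TABLES_PER_DATABASE, slot % TABLES_PER_DATABASE


if __name__ == "__main__":
    user_id = "10086"
    # All of one user's rows are routed by user_id, so they share a database instance.
    for logical_table in ("user_info", "user_friend", "user_file"):
        db, table = route(user_id)
        print(f"{logical_table} -> db {db}, table {table}")
```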

3. Short term: database and table sharding

(a) Database instance numbers increase sequentially;

(b) Table numbers within each database start from 1 and are not numbered globally;

(c) Build the access layer by intercepting at the data source (iBatis) level; the application is aware of the sharding;

(d) Applications must take the underlying data sources, distributed transactions, and their management into account;

(e) Scalability: supports scaling out only, not shrinking;
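Points (a) and (b) describe a naming convention. The sketch below maps a global slot number to a physical database and table name. It numbers database instances sequentially from 0 and restarts table numbers at 1 in each database. The user_db and user prefixes and the zero-based database numbering are assumptions for illustration.

```python
def physical_name(global_slot: int, tables_per_database: int,
                  db_prefix: str = "user_db", table_prefix: str = "user") -> str:
    """Map a global slot to 'database.table' using per-database table numbering."""
    if global_slot < 0 or tables_per_database <= 0:
        raise ValueError("slot must be >= 0 and tables_per_database > 0")
    db_no = global_slot // tables_per_database          # instances numbered 0, 1, 2, ...
    table_no = global_slot % tables_per_database + 1    # restarts at 1 in every database
    return f"{db_prefix}_{db_no}.{table_prefix}_{table_no}"


if __name__ == "__main__":
    # Two databases with two tables each.
    print([physical_name(s, 2) for s in range(4)])
    # ['user_db_0.user_1', 'user_db_0.user_2', 'user_db_1.user_1', 'user_db_1.user_2']
```

Table numbers are local to each database, and the number of tables per database stays fixed. As a result, adding a new instance only appends names such as user_db_2.user_1 and never renames existing tables, which fits the scale-out-only rule in (e).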

4. Long term: database access layer

(a) Flexible data sharding and routing rules;

(b) Support for MySQL Cluster;

(c) Read-write separation and load balancing;

(d) Availability probes;

(e) Distributed transactions;

(f) Transparency to applications;
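Item (c) of the long-term plan, combined with the vertical split in item 1, can be sketched as a small router. Each business domain gets its own master/slave group. Writes go to the master, and reads rotate round-robin over the slaves.

This is a minimal sketch under assumptions: the MasterSlaveGroup class and the host names are hypothetical. A real access layer would also need replication-lag handling, availability probes, and failover.

```python
import itertools


class MasterSlaveGroup:
    """One vertical partition: a master for writes, round-robin slaves for reads."""

    def __init__(self, master: str, slaves: list[str]):
        self.master = master
        self.slaves = list(slaves)
        self._next_slave = itertools.cycle(self.slaves) if self.slaves else None

    def host_for(self, is_write: bool) -> str:
        # Writes (and reads when no slave exists) go to the master.
        if is_write or self._next_slave is None:
            return self.master
        return next(self._next_slave)


# Vertical partitioning: each business domain has its own group (hypothetical hosts).
GROUPS = {
    "user": MasterSlaveGroup("user-master:3306", ["user-slave1:3306", "user-slave2:3306"]),
    "order": MasterSlaveGroup("order-master:3306", ["order-slave1:3306"]),
}


def host_for(business: str, is_write: bool) -> str:
    """Pick the business's cluster first (vertical split), then master or slave."""
    return GROUPS[business].host_for(is_write)


if __name__ == "__main__":
    print(host_for("user", is_write=True))    # user-master:3306
    print(host_for("user", is_write=False))   # user-slave1:3306
    print(host_for("user", is_write=False))   # user-slave2:3306
    print(host_for("order", is_write=False))  # order-slave1:3306
```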

I. How Internet Companies Currently Split Their Databases

For a newly launched Internet project, active users are few at first and concurrency is low, so companies usually store all data in a single database. As marketing efforts intensify, however, users and concurrency keep rising, and relying on one database to absorb all the access pressure is close to suicide. Once a project reaches this stage, most MySQL DBAs put the database into a read-write separated state: one master node with multiple slave nodes. The master/slave design relieves the load on a single database by distributing read operations to the slave nodes, achieving genuine read-write separation. But how long can a single master/slave setup hold out? If users and concurrency grow by orders of magnitude, it will not last long, because the load on the single master is relatively high. To solve this, MySQL DBAs apply vertical partitioning (splitting databases) on top of the master/slave pattern. Vertical partitioning means separating, by business, the tables that were originally crowded into one database and storing them in different databases, each still keeping its master/slave mode. After vertical partitioning, the master/slave setup can withstand very high concurrent access. Does that mean you can rest easy? No. Once the data volume of a business table becomes large, every CRUD operation is extremely resource-intensive for the database, from both a maintenance and a performance point of view.
Even with indexes in place, there is no hiding the fact that database performance drops as the data grows. At this point the MySQL DBA may apply horizontal partitioning (sharding). Horizontal partitioning splits one business table into multiple child tables, such as user_table0, user_table1, and user_table2. The child tables follow an agreed rule under which each stores one range of data: user_table0 stores rows 1-10000, user_table1 stores rows 10001-20000, and finally user_table2 stores rows 20001-30000. Once a business table is horizontally partitioned, the massive data of one table is distributed across N child tables for storage and maintenance. This design is common at leading Internet companies in China, as shown in Figure 1-1:

Figure 1-1 Horizontal Partitioning
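The range rule in this example (1-10000, 10001-20000, 20001-30000) can be written as a single integer division. This sketch assumes IDs start at 1, each child table holds exactly 10,000 rows, and tables are named user_table0, user_table1, and so on.

```python
ROWS_PER_TABLE = 10_000  # each child table holds one contiguous range of ids


def table_for_id(user_id: int) -> str:
    """Range partitioning: ids 1-10000 -> user_table0, 10001-20000 -> user_table1, ..."""
    if user_id < 1:
        raise ValueError("ids are assumed to start at 1")
    return f"user_table{(user_id - 1) // ROWS_PER_TABLE}"


if __name__ == "__main__":
    for uid in (1, 10_000, 10_001, 20_000, 20_001, 30_000):
        print(uid, table_for_id(uid))
```

Range rules have a trade-off. They make it easy to append a new table as IDs grow, but the newest rows concentrate in the latest table, so write load is not spread evenly. Hash rules spread the load but make resizing harder.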

The above is a brief explanation of how databases and tables are split. Now think about it carefully. Originally a single database could handle all access, but with sharding things become quite troublesome, especially for access operations. The persistence layer now has to determine the corresponding data source, and then the horizontally partitioned table within that data source. This way of accessing data is called access "routing". By common sense, the persistence layer should not be responsible for the work of the data access layer (DAL); it should only care about one-to-one operations. So Taobao's TDDL framework came into being naturally.

II. The TDDL Architecture Prototype

Taobao developed the TDDL (Taobao Distributed Data Layer) framework for its own business needs. It mainly solves two problems in sharded scenarios: access routing (between the persistence layer and the data access layer) and data synchronization between heterogeneous databases. It is a JDBC DataSource implementation based on centralized configuration, and provides database sharding, master/slave, dynamic data source configuration, and other features.

Many vendors now offer excellent, community-supported DAL products, such as Hibernate Shards and ibatis-sharding. Why, then, do I still write about TDDL? I can only helplessly answer that my company requires it: technology selection is often not my decision, and the customer is the boss. While I struggled to find TDDL documentation and introductions on Google, an inexplicable anger began to rise in me. TDDL updates slowly (its SVN had not been updated for almost a year) and has almost no community support (questions never get a response). Such a product will not go far beyond internal enterprise use, and its fate is doomed to be a sad one. Well, having complained, I will stick with it anyway. TDDL sits between the database and the persistence layer and deals directly with the database, as shown in Figure 1-2:

Figure 1-2 TDDL's position in the domain model

It is said that Taobao split its data across multiple databases long ago because the volume was too large. The application layer connected to multiple data sources, with a middle layer called DBRoute providing unified routing access to the databases. DBRoute integrated data operations so that the application layer could operate multiple databases as if they were a single data source. As the data volume grew, however, the requirements for splitting databases and tables rose. For example, once your data reaches tens of billions of rows, no single database can store it, so it is split into 2, 4, 8, 16, 32, and so on, up to 1024 or 2048. With that many splits the data can be stored, but how do you query it? This is where query middleware takes over. To the upper layer, it must let data be queried as if from one database, and just as fast (each query should complete within a few milliseconds). TDDL takes on this job (other DAL products do it better), as shown in Figure 1-3:

Figure 1-3 TDDL query strategy over sharded databases and tables
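Querying data that has been split 2, 4, ..., 1024 ways usually means a scatter-gather: send the query to every shard in parallel, then merge the partial results.

The sketch below simulates each shard with an in-memory list, as a stand-in for running the same SQL on each physical table. It merges the results ordered by id and applies a global LIMIT. It illustrates the general technique, not TDDL's actual implementation.

```python
import heapq
import itertools
from concurrent.futures import ThreadPoolExecutor


def query_shard(rows: list[dict], min_age: int, limit: int) -> list[dict]:
    """Stand-in for running the same SQL on one physical table:
    SELECT ... WHERE age >= ? ORDER BY id LIMIT ?"""
    matches = sorted((r for r in rows if r["age"] >= min_age), key=lambda r: r["id"])
    return matches[:limit]  # the global top-`limit` can only come from each shard's top-`limit`


def scatter_gather(shards: list[list[dict]], min_age: int, limit: int) -> list[dict]:
    # Scatter: query every shard in parallel.
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(lambda rows: query_shard(rows, min_age, limit), shards))
    # Gather: k-way merge of the already-sorted partial results, then apply the global LIMIT.
    merged = heapq.merge(*partials, key=lambda r: r["id"])
    return list(itertools.islice(merged, limit))


if __name__ == "__main__":
    shards = [
        [{"id": 1, "age": 30}, {"id": 4, "age": 17}],
        [{"id": 2, "age": 25}, {"id": 5, "age": 40}],
        [{"id": 3, "age": 19}],
    ]
    print([r["id"] for r in scatter_gather(shards, min_age=18, limit=3)])  # [1, 2, 3]
```

Each shard only needs to return its own first `limit` rows. This keeps the merge cheap even when the number of shards is large.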

Above, I described TDDL's query strategy in a sharded environment. Next I will copy Taobao's official description of TDDL's advantages. I cannot vouch for its accuracy, since TDDL is not fully open source and has zero community support, so read it without taking it too seriously.

Advantages of Taobao's custom TDDL:

1. Master/standby databases with dynamic switching;
2. Weighted read-write separation;
3. Single-thread read retry;
4. Centralized management and dynamic changes of data source information;
5. Built on a stable data source stripped out of JBoss;
6. Supports MySQL and Oracle databases;
7. Based on the JDBC specification, so it is easy to extend to data sources that implement JDBC;
8. No server: it exists as a client JAR, and the application connects directly to the database;
9. Flow control over read/write counts and concurrency, changeable dynamically;
10. Analyzable log printing and log flow control, changeable dynamically.
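Items 2 and 3 can be illustrated with a small read router. Reads pick a replica at random in proportion to its weight, and a failed read is retried once on a different node.

This is our own sketch of the idea, not TDDL's API. It rests on two assumptions: ConnectionError signals a dead node, and the master serves reads when no weighted replica is available.

```python
import random


class WeightedReadRouter:
    """Writes go to the master; reads pick a replica in proportion to its weight."""

    def __init__(self, master: str, replicas: list[tuple[str, int]], rng=None):
        self.master = master
        self.replicas = list(replicas)  # [(host, weight), ...]
        self.rng = rng if rng is not None else random.Random()

    def pick_read(self, exclude=frozenset()) -> str:
        candidates = [(h, w) for h, w in self.replicas if w > 0 and h not in exclude]
        if not candidates:
            return self.master  # fall back to the master when no replica qualifies
        hosts, weights = zip(*candidates)
        return self.rng.choices(hosts, weights=weights, k=1)[0]

    def write(self, run):
        return run(self.master)

    def read(self, run):
        first = self.pick_read()
        try:
            return run(first)
        except ConnectionError:
            # Retry exactly once, on a different node where possible.
            return run(self.pick_read(exclude={first}))


if __name__ == "__main__":
    router = WeightedReadRouter("master:3306", [("slave1:3306", 3), ("slave2:3306", 1)],
                                rng=random.Random(42))
    counts = {"slave1:3306": 0, "slave2:3306": 0}
    for _ in range(10_000):
        counts[router.pick_read()] += 1
    print(counts)  # roughly a 3:1 split between slave1 and slave2
```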
