Database and table sharding, as the name implies, means that data originally stored in one database is split into chunks and stored across multiple databases, and data originally stored in one table is split into chunks and stored across multiple tables. So how much do you know about database and table sharding? Below, we explain DDM's database and table sharding from two angles: what data sharding is, and how to shard.
What is Data sharding
Sharding is a direct way to get past the storage capacity limit of a single database. There are two kinds of sharding: vertical and horizontal.
Vertical sharding
Vertical sharding, also called vertical splitting, cuts the original database into multiple databases along logical table boundaries. After the split, different tables are stored in different databases.
Vertical sharding is closely tied to business architecture design. For example, the system architecture is optimized by business domain and divided into several sub-business systems with low coupling between them. The sub-business systems communicate and exchange data through interfaces.
After a vertical split, the business boundaries and the splitting rules are both clear, and the system is easy to integrate and extend. Vertical sharding is typically used in the upper-level architecture design of the database.
Figure: Vertical sharding
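To make the idea concrete, here is a minimal sketch of vertical sharding as a lookup from table to database. The table and database names are made up for illustration and do not come from DDM; the point is only that every row of a given table lives in exactly one database.

```python
# Hypothetical table-to-database mapping for a vertically sharded system.
# All table and database names here are illustrative assumptions.
TABLE_TO_DATABASE = {
    "users": "user_db",
    "user_addresses": "user_db",
    "orders": "order_db",
    "order_details": "order_db",
    "shipments": "logistics_db",
}


def database_for(table: str) -> str:
    """Return the database that holds every row of the given table."""
    try:
        return TABLE_TO_DATABASE[table]
    except KeyError:
        raise ValueError(f"no database configured for table {table!r}") from None


def tables_in(database: str) -> list[str]:
    """List the tables assigned to one database, in sorted order."""
    return sorted(t for t, db in TABLE_TO_DATABASE.items() if db == database)
```

Tables that belong to the same business domain share a database, so joins inside one domain stay local, while anything that crosses domains has to go through the interfaces between the sub-business systems.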
Horizontal sharding
Horizontal sharding, also called horizontal splitting, takes the data rows of a logical table as the unit of division: the original logical database is divided into multiple physical database shards, and the table's rows are distributed across those shards.
Horizontal sharding is mainly used when the business architecture cannot be subdivided any further, but a single table in the database holds so much data that query performance degrades. Horizontal sharding solves the single-database capacity problem and improves concurrent query performance.
Figure: Horizontal sharding
DDM performs horizontal sharding automatically, so the application does not need to care which shard a given piece of data is stored on. A logical table is split horizontally according to a sharding rule. For example, in an order tracking system we choose the order number (ORDERID) as the split key and horizontally split the "order flow table", the "order details table", and the "logistics tracking table". The split rule is to hash the key value and take the result modulo the number of shards. The shard calculation rule is as follows:
H(Key(OrderId)) = Hash(Key(OrderId)) % N
where N is the total number of data shards, and H(Key(OrderId)) is the number of the shard on which the order is stored, obtained by hashing the order number and taking the result modulo N.
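The hash-then-modulo rule can be sketched in a few lines of Python. The article does not say which hash function DDM uses, so CRC32 below is only an assumed stand-in for Hash(), and the function name shard_for is hypothetical.

```python
import zlib


def shard_for(order_id: str, shard_count: int) -> int:
    """Return the shard number in [0, shard_count) for an order number."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    # CRC32 is an assumed stand-in for Hash(); it gives the same result on
    # every run, unlike Python's built-in hash() for strings.
    return zlib.crc32(order_id.encode("utf-8")) % shard_count


if __name__ == "__main__":
    # Made-up order numbers, routed across N = 8 shards.
    for order_id in ("ORD-1001", "ORD-1002", "ORD-1003"):
        print(order_id, "-> shard", shard_for(order_id, 8))
```

Because the same order number always hashes to the same value, every row that carries that order number lands on the same shard, whichever table it belongs to.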
Figure: Data storage after sharding
How to shard
In a distributed database, storing data across sharded databases and tables readily solves the bottleneck that arises when a large single table reaches the upper limit of database storage. After sharding, however, you need to avoid the performance and resource-consumption problems caused by cross-database JOIN operations.
Therefore, when creating logical databases and logical tables, you need to decide the following based on your actual situation:
1. Should the logical table be sharded?
DDM logical tables come in three types: global tables, sharded tables, and single tables. Users can choose the most suitable logical table type according to how the table is actually used.
A single table is created, and its data stored, only on the first shard. A global table is created on every shard, and each shard stores the full data set. A sharded table is created on every shard, and its data is distributed across the shards according to the split rule.
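The difference between the three table types comes down to which shards receive a given row. The sketch below models only that placement decision; the enum, the function name, and the use of CRC32 as the split rule are assumptions for illustration, not DDM's API.

```python
import zlib
from enum import Enum


class TableType(Enum):
    SINGLE = "single"
    GLOBAL = "global"
    SHARDED = "sharded"


def target_shards(table_type: TableType, split_key_value: str, shard_count: int) -> list[int]:
    """Return the shard numbers that store one row of a table of the given type."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    if table_type is TableType.SINGLE:
        # A single table lives only on the first shard.
        return [0]
    if table_type is TableType.GLOBAL:
        # A global table keeps a full copy of the data on every shard.
        return list(range(shard_count))
    # A sharded table stores each row on exactly one shard chosen by the
    # split rule; CRC32 modulo shard_count is an assumed stand-in here.
    return [zlib.crc32(split_key_value.encode("utf-8")) % shard_count]
```

Seen this way, the trade-off is visible in the length of the returned list: a global table pays for every write on all shards in exchange for local reads everywhere, while a sharded table writes each row once.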
2. What rule should the table be split by?
Choosing the split key for a logical table is important. We recommend choosing the split key according to the actual business scenario. If different logical tables have an E-R relationship, we recommend using the same field as the split key for each of them, to avoid cross-database JOIN operations.
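A small sketch shows why a shared split key keeps joins local. The rows, the shard count, and the CRC32-based rule are all assumptions for illustration: when two related tables are both split on order_id, an order and its detail rows always hash to the same shard.

```python
import zlib

SHARD_COUNT = 4  # assumed number of shards, for illustration only


def shard_for(key_value: str) -> int:
    """Map a split-key value to a shard number (CRC32 modulo is an assumption)."""
    return zlib.crc32(key_value.encode("utf-8")) % SHARD_COUNT


# Hypothetical rows from two logical tables related through order_id.
orders = [
    {"order_id": "A100", "customer_id": "C7"},
    {"order_id": "A101", "customer_id": "C9"},
]
order_details = [
    {"order_id": "A100", "sku": "book"},
    {"order_id": "A100", "sku": "pen"},
    {"order_id": "A101", "sku": "lamp"},
]


def join_is_local(order: dict, detail: dict) -> bool:
    """True if an order and a detail row sit on the same shard when both
    tables use order_id as the split key."""
    return shard_for(order["order_id"]) == shard_for(detail["order_id"])


# Every order/detail pair that a JOIN on order_id would match.
matching_pairs = [
    (o, d) for o in orders for d in order_details if o["order_id"] == d["order_id"]
]
all_local = all(join_is_local(o, d) for o, d in matching_pairs)
```

Had the orders table been split on customer_id instead, nothing would guarantee that an order and its details share a shard, and the JOIN could turn into a cross-database operation.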
In practical use, the following suggestions are available for reference:
Tables holding fewer than 10 million records are not recommended for sharding. With appropriate indexes and a read/write splitting strategy, a single table can still handle performance problems well.
Tables holding more than 10 million records are recommended for sharding. Once the data is sharded, the performance bottleneck caused by an oversized single table is relieved and concurrency support improves. Take care to choose the right split key and plan ahead.
Business reads should use multi-table JOINs as little as possible, and a single transaction should avoid spanning shards. Query conditions should include the split key wherever possible, to avoid scanning every shard.
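The last suggestion can be illustrated with a toy router. This is a sketch under assumptions, not how DDM plans queries: it handles only equality conditions passed as a dictionary, and it uses CRC32 as the split rule.

```python
import zlib


def shards_to_query(conditions: dict, split_key: str, shard_count: int) -> list[int]:
    """Return the shards a query must visit, given its equality conditions."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    value = conditions.get(split_key)
    if value is None:
        # No split key in the conditions: every shard has to be scanned.
        return list(range(shard_count))
    # Split key present: the query can be routed to a single shard.
    # CRC32 modulo shard_count is an assumed stand-in for the split rule.
    return [zlib.crc32(str(value).encode("utf-8")) % shard_count]
```

With eight shards, a lookup by order number touches one shard, while a lookup by some other column, such as a hypothetical status field, touches all eight.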
The database middleware DDM manages the underlying database storage engines as a cluster, which makes it very convenient to use. The application does not need to care how many shards there actually are. Much as with a single database, users perform database operations through the DDM management console, and connect to the database with JDBC or other drivers, or with a SQL client, to read and write data. To learn more, you are welcome to take a look at Distributed Database Middleware (DDM).
"Practical Tips": A brief discussion of database and table sharding in distributed database middleware