Go to MySQL article "database table design, no best only fit"

Last Update:2017-06-22 Source: Internet

Author: User

Tags closure joins

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

http://mp.weixin.qq.com/s/a8klpzM5iam0_JYSw7-U4g

When we design the database, whether it will break through the conventional, to find the most suitable for their own needs of the design, the following example:

Commonly used adjacency table design, will add a parent_id field, such as Region table (country, province, city, district):

CREATE TABLE Area (

[ID] [int] not NULL,

[name] [nvarchar] () NULL,

[parent_id] [int] NULL,

[type] [int] NULL );

Name: PARENT_ID is the parent ID of the domain, the parent ID of the province is the state, the city's parent ID is the province, and so on.

Type is the class of the region: 1: Country, 2: Province, 3: City, 4: District

In the case of hierarchical comparison, it is not a problem to design a table, and it is convenient to call.

But using this adjacency table design approach does not satisfy all the requirements, and when we are unsure of the hierarchy, suppose I have the following comment structure:

Use the adjacency table to record the data for this review (comments table):

Did anyone find out, so design the table, if you want to query all the descendants of a node, it is difficult to achieve, you can use the association query to get a comment and his descendants:

SELECT c1.*, c2.* from comments C1 left OUTER joins comments c2 on c2.parent_id = c1.comment_id;

However, this query can only fetch two tiers of data. The nature of this tree is that it can be extended arbitrarily, and you need to have a way to get its depth data. For example, you might need to calculate the number of a comment branch, or calculate the total cost of a mechanical device.

In some cases, it is appropriate to use adjacency tables in your project. The advantage of adjacency table design is that it can quickly get a direct parent-child node of a given node, and it is easy to insert new nodes. If such a requirement is all about the hierarchical data of your project, then using the adjacency table will work well.

In the case of the tree model mentioned above, there are several scenarios that can be considered: path enumeration, nested sets, and closure tables. These solutions often look much more complex than adjacency tables, but they do make some operations that are more complex or inefficient using adjacency tables easier. If your project does need to provide these operations, then these designs will be a better choice for adjacency tables.

One, Path enumeration

In the comments table, we replace the original parent_id field with the path fields of type varchar. This path field stores content as the topmost ancestor of the current node to its own sequence, just like the UNIX path, you can even use '/' as the delimiter for the path.

You can query one node ancestor by comparing the path of each node. For example: to find comment # #, the path is the ancestor of 1/4/5/7 one, you can do this:

SELECT * from comments as C WHERE ' 1/4/5/7 ' Like C.path | | ‘%‘ ;

The query will match the nodes of the path to 1/4/5/%,1/4/% and 1/%, and these nodes are the ancestors of the comment # #.

You can also query all descendants of a given node by swapping the parameters on either side of the LIKE keyword. For example, query comment # #, path is ' 1/4 ' for all descendants, you can use the following statement:

SELECT * from comemnts as C WHERE c.path like ' 1/4 ' | | ‘%‘ ;

This query statement all can find the background path is: 1/4/5, 1/4/5/6, 1/4/5/7.

Once you can easily get a subtrees tree or a path from a descendant node to an ancestor node, you can easily implement more queries, such as querying the sum of values on all nodes of a subtree.

Inserting a node can also be as simple as using an adjacency table. All you need to do is copy a path to the parent node to insert the node, and append the ID of the new node to the end of the path.

The path enumeration also has some drawbacks, such as that the database does not ensure that the path is always in the correct format or that the nodes in the path do exist. A string that relies on the logic code of the application to maintain the path, and verifies that the string is expensive to correct. Regardless of the length of the varchar is set to how large, there are still limitations of length, and therefore can not support the infinite expansion of the tree structure.

two, nested sets

A nested set solution is to store information about a descendant node, rather than a direct ancestor of a node. We use two numbers to encode each node to represent this information, and we can call these two numbers nsleft and nsright.

Each node determines the values of Nsleft and Nsright as follows: The value of nsleft is less than all descendant IDs for that node, and the value of Nsright is greater than the ID of all descendants of that node. These numbers are not associated with comment_id values.

An easy way to determine these three values (Nsleft,comment_id,nsright) is to make a depth-first traversal of the tree, assign the Nsleft value incrementally in step-by-step, and then incrementally assign the value of Nsright on return. Get the following data:

Once you assign these numbers to each node, you can use them to find the ancestors and descendants of the specified node. For example, search comment # # and all its descendants, you can search for which node IDs are in the comment #4 between the nsleft and Nsright ranges, for example:

SELECT C2. * FROM comments as C1 joins comments as C2 on C2. Nsleft between C1. Nsleft

and C1. Nsright WHERE C1. comment_id = 4;

For example, searching for a comment on the nsleft and all its ancestors, you can search for the ID of the # in which nodes are between the range of the Nsright, for example:

SELECT C2. * FROM comments as C1 joins comments as C2 on C1. Nsleft between C2. Nsleft

and C2. Nsright WHERE C1. comment_id = 6;

The main advantage of using nested set design is that when you want to delete a non-leaf node, its descendants automatically replace the deleted node and become a direct descendant of its immediate ancestor node. This means that one layer has been automatically reduced.

However, some queries that appear simple in the design of adjacency tables, such as acquiring a node's immediate father or direct descendants, can become more complex in nested set designs. In a nested set, if you need to query the direct father of a node, we will do so, for example to find a direct father commenting on the:

SELECT parent. * FROM comments as C joins comments as parent on c. Nsleft between parent. Nsleft andparent. Nsright

Left OUTER joins comments as in_between on C. Nsleft between in_between. Nsleft and in_between. Nsright

and in_between. Nsleft between parent. Nsleft and parent. Nsright WHERE c. comment_id = 6

and in_between. comment_id is NULL;

It's a little complicated anyway.

Working with trees, such as inserting and moving nodes, using nested assemblies is much more complex than other designs. When inserting a new node, you need to recalculate the neighboring sibling nodes of the new insertion node, the ancestor nodes, and the siblings of its ancestors to ensure that their left and right values are larger than the value of the new node. Also, if the new node is a non-leaf node, you also need to check its descendant nodes.

If simple fast query is the most important part of the whole program, nested sets are the best choice, which is much easier and quicker than the operation of individual nodes. However, the insert and move nodes of a nested set are more complex because the left and right values need to be reassigned, and if your application needs to insert and delete nodes frequently, nested sets may not be appropriate.

Three, closure list

The closure table is a simple and elegant solution for tiered storage, which records the relationships among all the nodes in the tree, not just the direct parent-child nodes.

In designing the commenting system, we created an extra table called Tree_paths, which contains two columns, each of which points to the foreign key in comments.

Instead of using the comments table to store the structure of the tree, we store any node pairs in the tree that have (ancestor-descendant) relationships in the Treepaths table, even though the two nodes are not direct parent-child relationships, and we also add a line to the node itself.

Obtaining ancestors and descendants through the Treepaths table is more straightforward than using nested sets. For example, to get a comment on the descendants of # #, simply search the treepaths table for ancestors to be comment #4的行就行了. The same is true for descendants.

To insert a new leaf node, such as a comment on a child node of the treepaths, you should first insert yourself into your own relationship and then search for the node in the table whose descendants are commenting on the number of nodes, increasing the "ancestor one descendant" relationship of the node and the new insertion node (the new node ID should be 8):

INSERT into Treepaths (ancestor, descendant)

SELECT t. Ancestor, 8

From Treepaths as t

WHERE t. Descendant = 6

UNION all SELECT 8, 8;

To delete a leaf node, such as comment # #, you should delete all rows in the Treepaths table whose descendants are comments #7:

DELETE from treepaths WHERE descendant = 7;

To delete a complete subtree, such as comment # # and all its descendants, delete all the descendants in the Treepaths table as #4的行, and those that are descendants of the comment # # descendants.

The closure table is designed to be more straightforward than nested sets, both of which can quickly query the ancestors and descendants of a given node, but the closure table can maintain hierarchical information more easily. Both designs make it easier to query the immediate descendants and ancestors of a given node than using adjacency tables or path enumerations.

However, you can optimize the closure table to make it easier to query the direct Father node or child node: Add a Path_length field to the Treepaths table. The self-reference of a node is path_length to 0, the path_length of its immediate child node is 1, the next layer is 2, and so on. This makes it much more convenient to query them.

Summary: What kind of design should you use?

Each design has its own pros and cons, how to choose the design, depending on the application of what kind of operation is you most need performance optimization.

Hierarchical data Design Comparison

1, adjacency table is the most convenient design, and many programmers know it

2. If you are using a database that supports recursive queries with or CONNECT by PRIOR, that makes the adjacency table query more efficient.

3. The enumeration path can visually show the path between ancestors and descendants, but because it does not ensure referential integrity, the design is very fragile. The enumeration path also makes the storage of data more redundant.

4, nested sets are a smart solution, but may be too smart to ensure referential integrity. It is best to use it in a situation where the query performance requirements are high and other requirements are common.

5, the closure of the list is the most common design, and the above scheme can only allow a node to belong to more than one tree. It requires an additional table to store the relationship, and the use of space-time-saving schemes reduces the consumption of redundant computations during operation.

These designs are only part of our daily design, and we will certainly encounter more options in the development. The choice of which is to be practical, according to the needs of their projects, combined with the merits of the scheme, choose the most suitable one.

I met some developers, in order to muddle through, in the design of database tables, only consider the ability to complete the current task, not much attention to the future expansion of the problem, regardless of whether the query to consume performance. It is possible to see no impact when the amount of pre-data is small, but the amount of data is slightly more obvious (for example, you can use an outer join query, but you want to use a subquery).

I think design database is a very interesting and challenging work, it sometimes reflects how broad your horizons, sometimes it can make you sleep, in short, pain and happiness.

Go to MySQL article "database table design, no best only fit"

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More