database table schema design of tree-shaped structure
In the process of programming, we often use the tree structure to characterize some data related relations, such as enterprise subordinate departments, column structure, commodity classification, etc., usually, these tree structure need to use the database to complete the persistence. However, at present, all kinds of relational-based databases record the data stored in the form of two-dimensional table, so it is not possible to put the tree into DBMS directly, and design the appropriate schema and its corresponding CRUD algorithm is the key to realize the tree structure in the relational database.
The ideal tree structure should have the following characteristics: The data storage redundancy is small and intuitive, the retrieval traversal process is simple and efficient, and the node additions and deletions are efficient for crud operation. Inadvertently on the Internet to search for a very clever design, the original is English, look after feeling a bit of meaning, so then tidied up a bit. This article will introduce two kinds of tree structure schema design scheme: One is the intuitive and simple design idea, the other is based on the left and right value coding improvement scheme.
I. Basic data
This article lists an example of a food genealogy, which organizes food by category, color and variety, and the tree-shaped structure is as follows:
II. schema design driven by inheritance relationship
The most intuitive analysis of the tree structure is the inheritance relationship between nodes, by showing the parent node of a node, so as to establish a two-dimensional relational table, the tree table structure of this scheme is usually designed as: {node_id,parent_id}, the above data can be described as shown in:
The advantages of this scheme are obvious: design and implementation are natural, very intuitive and convenient. The disadvantage of course is also very prominent: because of the direct record of the inheritance between the nodes, so any crud operation of the tree will be inefficient, mainly rooted in the frequent "recursive" operation, the recursive process of continuous access to the database, each time the database IO will have time overhead. Of course, this solution is not useless, in the case of a relatively small tree size, we can use the caching mechanism to optimize, the tree information loaded into memory for processing, to avoid direct performance overhead of database IO operations.
Third, the schema design based on the left and right value coding
In a general database-based application, the requirements of the query are always greater than the deletions and modifications. In order to avoid the "recursive" process of tree structure query, we design a new zero-recursive query and infinite grouping of left-and right-value coding schemes based on tree's pre-sequence traversal to save the tree data.
Seeing this table structure for the first time, I believe most people do not know how the Lvalue (Lft) and Rvalue (RGT) are computed, and the table design does not seem to preserve the inheritance of parent-child nodes. But when you point your finger at the number in the table from 1 to 18, you should find something. Yes, the order in which you move your fingers is the order in which the tree is pre-ordered, as shown in. When we start from the left side of food on the root node, Mark 1, and follow the direction of the previous sequence traversal, and sequentially mark the number on the traversed path, finally we go back to the root node food and write 18 on the right.
Seeing this table structure for the first time, I believe most people do not know how the Lvalue (Lft) and Rvalue (RGT) are computed, and the table design does not seem to preserve the inheritance of parent-child nodes. But when you point your finger at the number in the table from 1 to 18, you should find something. Yes, the order in which you move your fingers is the order in which the tree is pre-ordered, as shown in. When we start from the left side of food on the root node, Mark 1, and follow the direction of the previous sequence traversal, and sequentially mark the number on the traversed path, finally we go back to the root node food and write 18 on the right.
According to this design, we can infer that all the left value is greater than 2, and the right value is less than 11 of the nodes are fruit, the structure of the whole tree is stored by the left and right values. However, this is not enough, our aim is to be able to do crud operations on the tree, that is, we need to construct a matching algorithm.
four, tree-shaped structure crud algorithm
(1) Get the descendant nodes of a node
Only one SQL statement is required to return the list of pre-order traversal of the descendant nodes of the node, taking fruit as an example: select* from Tree WHERE Lft between 2 and one order by Lft ASC. The query results are as follows:
So how many descendant nodes does a node have? Through the left and right values of the node we can circle its descendants, the total number of descendants = (rvalue – Lvalue –1)/2, take fruit as an example, the total number of descendants is: (11–2–1)/2 = 4. At the same time, in order to show the tree structure more intuitively, we need to know the level of the node in the tree, through the left and right value of the SQL query can be implemented, taking fruit as an example: SelectCount (*) from the Tree WHERE Lft <= 2 and Rgt >=11. For the sake of description, we can create a view for the tree, add a hierarchy sequence, the value of the column can be written a custom function to calculate, the function is defined as follows:
[SQL]View Plaincopy
- CREATE FUNCTION dbo. Countlayer
- (
- @node_id int
- )
- RETURNS int
- As
- Begin
- declare @result int
- Set @result = 0
- declare @lft int
- declare @rgt int
- If exists (select node_id from Tree where node_id = @node_id)
- begin
- Select @lft = lft, @rgt = RGT from Tree where node_id = @node_id
- Select @result = count (*) from Tree where Lft <= @lft and Rgt >= @rgt
- End
- return @result
- End
- GO
Based on the hierarchical calculation function, we create a view that adds a new sequence of record node levels:
[SQL]View Plaincopy
- CREATE VIEW dbo. TreeView
- As
- SELECT node_id, Name, Lft, RGT, dbo. Countlayer (node_id) as Layer from dbo. Tree ORDER by Lft
- GO
Create a stored procedure that calculates all descendant nodes of a given node and the corresponding hierarchy:
[SQL]View Plaincopy
- CREATE PROCEDURE [dbo].[ Getchildrennodelist]
- (
- @node_id int
- )
- As
- DECLARE @lft int
- DECLARE @rgt int
- If exists (select node_id from Tree where node_id = @node_id)
- begin
- Select @lft = lft, @rgt = RGT from Tree where node_id = @node_id
- SELECT * from the TreeView where Lft between @lft and @rgt ORDER by Lft ASC
- End
- GO
Now, we use the stored procedure above to compute the node fruit all descendant nodes and corresponding levels, the query results are as follows:
From the above implementation, we can see the use of the right and left value coding design, in the tree query traversal, only need to make 2 database queries, eliminate recursion, plus the query conditions are the comparison of numbers, query efficiency is very high, with the growing scale of the tree, The design scheme based on the left and right value coding will be more efficient than the traditional recursive scheme query efficiency. Of course, we have just given a simple algorithm to get the descendants of the node, and really use this tree we need to implement the insertion, delete the same level of translation nodes and other functions.
(2) Get the genealogy path of a node
Assuming that we want to obtain a tree path for a node, we need only one SQL statement for left and right value analysis, taking fruit as an example: select* from Tree WHERE Lft < 2 and Rgt > One ORDER by Lft ASC, relatively complete of stored procedures:
[SQL]View Plaincopy
- CREATE PROCEDURE [dbo].[ Getparentnodepath]
- (
- @node_id int
- )
- As
- DECLARE @lft int
- DECLARE @rgt int
- If exists (select node_id from Tree where node_id = @node_id)
- begin
- Select @lft = lft, @rgt = RGT from Tree where node_id = @node_id
- SELECT * from the TreeView where Lft < @lft and Rgt > @rgt ORDER by Lft ASC
- End
- GO
(3) Adding descendant nodes to a node
Suppose we want to add a new child node "Apple" under the Node "red", the tree will become as shown, where the red node is the new node.
Looking closely at the changes in the left and right values of the graph, I believe you should be able to infer how to write SQL scripts. We can give a relatively complete stored procedure that inserts a child node:
[SQL]View Plaincopy
- CREATE PROCEDURE [dbo].[ Addsubnode]
- (
- @node_id int,
- @node_name varchar (a)
- )
- As
- DECLARE @rgt int
- If exists (select node_id from Tree where node_id = @node_id)
- begin
- SET xact_abort on
- BEGIN transction
- Select @rgt = RGT from Tree where node_id = @node_id
- update Tree Set Rgt = Rgt + 2 where Rgt >= @rgt
- update Tree Set Lft = Lft + 2 where Lft >= @rgt
- INSERT INTO Tree (Name, Lft, RGT) values (@node_name, @rgt, @rgt + 1)
- COMMIT TRANSACTION
- SET xact_abort OFF
- End
- GO
(4) Delete a node
If we want to delete a node, all descendants of that node are also deleted, and the number of nodes deleted is: (the right value of the deleted node-the left value of the deleted node + 1)/2, while the left and right values of the remaining nodes are adjusted to be greater than the left and right values of the deleted node. Look at what happens to the tree, take beef as an example, and delete the effect as shown.
Then we can construct the corresponding stored procedure:
[SQL]View Plaincopy
- CREATE PROCEDURE [dbo].[ Delnode]
- (
- @node_id int
- )
- As
- DECLARE @lft int
- DECLARE @rgt int
- If exists (select node_id from Tree where node_id = @node_id)
- begin
- SET xact_abort on
- BEGIN transction
- Select @lft = lft, @rgt = RGT from Tree where node_id = @node_id
- Delete from Tree where Lft >= @lft and Rgt <= @rgt
- update Tree Set Lft = lft– (@rgt-@lft + 1) where Lft > @lft
- update Tree Set Rgt = rgt– (@rgt-@lft + 1) where Rgt > @rgt
- COMMIT TRANSACTION
- SET xact_abort OFF
- End
- GO
V. Summary
We can make a summary of this tree structure schema design scheme which realizes infinite grouping by left and right value coding:
(1) Advantages: In the elimination of recursive operation under the premise of the realization of infinite grouping, and the query condition is based on the comparison of shaping numbers, high efficiency.
(2) Disadvantage: the addition, deletion and modification of nodes are costly, which will involve the changes of various data in the table.
Of course, this article only gives a few more common crud algorithm implementation, we can also add themselves such as the same level of node translation, node down, node move up and other operations. Interested friends can do their own code implementation, here is not listed. It is worth noting that implementing these algorithms can be cumbersome and involves the sequential execution of many of the update statements, and if the sequential scheduling is poorly thought out, a bug would have an astonishing effect on the entire tree-shaped structure table. Therefore, in the large-scale modification of the tree structure, you can use the temporary table as an intermediary to reduce the complexity of the code, at the same time, it is strongly recommended to make a full backup of the table before making changes for a rainy day. In the vast majority of database-based application systems, this scheme is more suitable than the traditional database schema constructed by parent-child inheritance relationship.