Discovery of several tree structure database storage schemes

Source: Internet
Author: User

Recently in the development of Jsqlbox process, study the operation of tree structure, suddenly found a new tree structure database storage scheme, found on the internet, did not find the same (may be spent time is not enough), is introduced as follows: The current common tree-structured database storage scheme has the following four kinds, but there are some problems :
1) adjacency List:: Record parent node. The advantage is simple, the disadvantage is that the access subtree needs to traverse, emit many SQL, the pressure on the database is large.
2) Path enumerations: records the entire path with a string. The advantage is easy to query, the disadvantage is that when inserting a new record to manually change all the following paths of this node, it is easy to error.
3) Closure table: A special table to maintain the path, the disadvantage is that large space, operation is not intuitive.
4) Nested sets: Record the left and right values, the disadvantage is complicated and difficult to operate.
The above methods have a common disadvantage: the operation is not intuitive, can not directly see the tree structure, not conducive to development and debugging.
This article describes the method I temporarily call it "simple brute multi-column storage Method", which is a bit similar to path enumerations, but the difference is to use a lot of database columns to store a placeholder (1 or null), such as (https://github.com/drinkjava2/ multiple-columns-tree/blob/master/treemapping.jpg) The tree structure on the left, which is mapped in the database, is shown in the table on the right:

The various SQL operations are as follows:

1. Get (or delete) all child nodes under the specified node, the line number of the known node is "X", the column name is "CY": SELECT * (or delete) from TB where line>=x and line< (select min (lines) from T b where line>x and (Cy=1 or C (Y-1) =1 or C (Y-2) =1 ... or c1=1)) For example, get the D node and all its child nodes: SELECT * from TB where line>=7 and Lin e< (line) from TB where line>7 and (c2=1 or c1=1) Delete the D node and all its child nodes: Delete from TB where line>=7 and line& Lt (line) The from TB where line>7 and (C2=1 or c1=1) obtains only the subordinate sub-nodes of the D node: SELECT * from TB where line>=7 and c3=1 an  D line< (line) from TB where line>7 and (C2=1 or C1=1)) 2. Query the root node of the specified node, the row number of the known node is "X", the column name "CY": SELECT * from TB where line= (line) from TB where line<=x and c1=1) such as the root node of the I node: SELECT * from TB where line= (select Max (line) F Rom TB where line<=12 and c1=1) 3. Query the parent node at the top level of the specified node, the line number of the known node is "X", the column name "CY": SELECT * from TB where line= (select Max TB where line<x and C (Y-1) =1) For example the upper parent node of the L node: SELECT * from TB where line= (select Max (line) from TB where line<11 and C 3=1) 3. Query all parent nodes of the specified node, The line number of the known node is "X", Column name "CY": SELECT * from TB where line= (select Max (line) from TB where line<x and C (Y-1) =1) union SELECT * FROM TB where line= (line) from TB where line<x and C (Y-2) =1) ... union SELECT * FROM TB where line= (select Max (Lin e) from TB where line<x and c1=1) For example, all parent nodes of the I node: SELECT * from TB where line= (select Max (line) from TB where line<12 an D c2=1) Union SELECT * FROM TB where line= (select Max (line) from TB where line<12 and C1=1) 4. Insert a new node: Depending on the requirements, such as inserting a new one between J and K Node T:update TB set line=line+1 where Line>=10;insert into TB (LINE,ID,C4) VALUES (, ' T ', 1) This is the biggest difference from the path enumerations pattern, Insert is very convenient, only need to use SQL to all the following line number plus 1, no need to spend a lot of effort to maintain the path string, not easy to error. In addition, if the table is very large, in order to avoid the update TB set line=line+1 cause full table updates, affect performance, you can consider adding a groupid field, all nodes under the same root node share a groupid, all operations are within the GroupID group, For example, insert a new node instead: Update TB set line=line+1 where groupid=2 and Line>=8;insert into TB (GROUPID,LINE,C4) VALUES (2, 8, ' T ') because a G Roupid operation will not affect the other groupid, for complex additions and deletions can even be done in memory after the operation, delete the entire group at once and re-insert a new group.

Summarize:
The advantages of the method described above are:
1) Intuitive, easy to debug, is all tree structure database scheme is the only WYSIWYG, can directly see the shape of the tree scheme, the use of empty values make the tree structure at a glance.
2) SQL query, delete, insert is very convenient, no use of like syntax.
3) Only a single table is required
4) compatible with all databases
5) placeholders are places where the actual content to display should appear, facilitating the writing of tabular display controls such as grids

The disadvantage is 1) is not an infinite depth tree, the maximum allowable number of columns of the database is limited, usually up to 1000, which causes the depth of the tree can not exceed 1000, and considering the number of columns too much on the performance also has an impact, when used to recommend a relatively small depth limit such as 100.
2) SQL statements are long and often appear c9=1 or c8=1 or c7=1 ... or c1=1 this n-factorial query condition
3) The overall movement of the nodes of the tree is troublesome, it is necessary to translate the entire subtree or up and down, when the nodes need to move frequently, it is not recommended to use this scheme. For some only increase or decrease, infrequently mobile node applications such as forum posts and comments are more appropriate.
4) The column is very long, the space occupies a bit big.

The following is an additional content, based on the above, a simpler infinite depth tree scheme

Suddenly found that the above method is still too stupid, if not multiple columns instead of only one column to store the depth level, you can not be limited by the number of database columns, thus evolving into an infinite depth tree, although no longer have the WYSIWYG effect, but in performance and simplicity far more than the above "simple rough multi-column storage method", temporarily named it "Zhu Shi Depth Tree V2.0 method" (Note: If someone has invented this method, delete the first two words is good), the method is as follows: (Https://github.com/drinkjava2/Multiple-Columns-Tree/blob/master /treemappingv2.png) The tree structure on the left, which is mapped to the structure in the database, is shown in the table on the right, noting that the last line of each table must have an end tag, level set to 0:

1. Gets all the child nodes under the specified node, the line number of the known node is X,level y, GroupID is zselect * from TB2 where groupid=z and Line>=x and line< (select min From TB where line>x and Level<=y and groupid=z) For example, get the D node and all its child nodes: SELECT * from TB2 where groupid=1 and line>=7 and line< (select min (line) from TB2 where groupid=1 and Line>7 and level<=2) are similar, as long as a select * in SQL is replaced with a delete. Get only subordinate all sub-nodes of the D node: (query condition plus one level=y+1): SELECT * from TB2 where groupid=1 and line>=7 and Level=3 and line< (select Mi N (line) from TB2 where groupid=1 and Line>7 and level<=2) 2. Query the root node of any node, the groupid of the known node is Zselect * from TB2 where GroupI D=z and line=1 (or level=1) 3. Query the parent node at the top level of the specified node, the line number of the known node is X,level y, GroupID is zselect * from TB2 where groupid=z and line= (Selec T Max (line) from TB2 where Groupid=z and Line<x and level= (Y-1)) For example, check the upper parent node of the L node: SELECT * from TB2 where groupid=1 and L Ine= (line) from TB2 where groupid=1 and Line<11 and level=3) 4. Query all parent nodes of the specified node, the row number of the known node is x, and the depth is y:select * from TB 2 where Groupid=z and LIne= (select Max (line) from TB2 where Groupid=z and Line<x and level= (Y-1)) union SELECT * from TB2 where groupid=z and Line= (select Max (line) from TB2 where Groupid=z and Line<x and level= (Y-2)) ... union select * FROM TB2 where groupid=z and line= (line) from TB2 where Groupid=z and Line<x and level=1) For example, all parent nodes of the I node: SELECT * from TB2 where group Id=1 and Line= (select Max (line) from TB2 where groupid=1 and Line<12 and level=2) union SELECT * FROM TB2 where group Id=1 and Line= (line) from TB2 where groupid=1 and Line<12 and level=1) 5. Insert a new node: for example, insert a new node between J and K T:update TB2 Set line=line+1 where groupid=1 and Line>=10;insert into TB (groupid,line,id,level) VALUES (1,10, ' T ', 4);

Summary: The advantages of this method are:
1) is an infinite depth tree
2) Although not as the first scheme with the WYSIWYG effect, but still have intuitive, easy to debug features.
3) can make full use of SQL, query, delete, insert is very convenient, SQL than the first solution is much simpler, and did not use like fuzzy query syntax.
4) Only a single table is required.
5) compatible with all databases.
6) Small space consumption

Disadvantages are:
1) The tree node overall movement operation is a bit troublesome, applies to some only increases and decreases, infrequently moves the node the occasion such as forum posts and comments and so on. When it is really necessary to perform complex mobile node operations, one scenario is to perform the entire tree operation in memory and complete the sorting, deleting the entire old group after the operation is complete, and then inserting the new group into the database as a whole once.

January 22 Supplement:
The move operation of the node is a bit cumbersome, but it's not as hard to say as it is for query/delete/INSERT. For example, under MySQL, move the entire B-node tree to the H-node, and the operation between J and K is as follows:
Update TB2 set tempno=line*1000000 where groupid=1;
Set @nextNodeLine = (select min (line) from TB2 where groupid=1 and line>2 and level<=2);
Update TB2 set Tempno=9*1000000+line, level=level+2 where groupid=1 and line>=2 and line< @nextNodeLine;
Set @mycnt = 0;
Update TB2 Set line= (@mycnt: = @mycnt + 1) where groupid=1 order by Tempno;
The above example needs to add an integer type column named Tempno in the table, which is a lazy algorithm, although simple and clear, but the whole tree is reordered, so the efficiency is not high. It may be more appropriate to use the Adjacency list scheme in situations where frequent movement of nodes is required.

If you need to move the nodes frequently, and want to preserve the advantages of the 2 efficient query, there is one more way to add a parent node PID field and two auxiliary fields Tempno and temporder for sorting, (temporarily referred to as "depth tree V3.0 method"), This is equivalent to the combination of V2.0 and adjacency list mode, the advantage is that each mobile node, only need to change the PID, do not need a complex algorithm, can arbitrarily move, add, delete more than one node at a time, and finally unified call the following algorithm simple reordering can be done, The following example shows a complete conversion of the adjacency list mode to V2.0 mode, which is equivalent to a process of re-indexing the tree query:

CREATE TABLE TB3 (ID varchar), comments varchar, PID varchar, line Integer,level integer,tempno bigint, Temporder integer) insert into TB3 (id,comments,pid) VALUES (' A ', ' found a bug ', null); INSERT into TB3 (id,comments,pid) Valu Es (' B ', ' is a worm ', ' a '), inserts into TB3 (id,comments,pid) VALUES (' C ', ' no ', ' a '), and insert into TB3 (id,comments,pid) VALUES ( ' D ', ' A Bug ', ' a '), insert into TB3 (id,comments,pid) VALUES (' E ', ' Oh, a bug ', ' B '); insert into TB3 (ID,COMMENTS,PID) values (' F ', ' Solve it ', ' B '), insert into TB3 (id,comments,pid) VALUES (' G ', ' Careful it Bites ', ' C '); insert into TB3 (id,comments, Pid) VALUES (' H ', ' It does not bits ', ' d '), insert into TB3 (id,comments,pid) VALUES (' I ', ' found the reason ', ' D '); INSERT into TB 3 (Id,comments,pid) VALUES (' J ', ' solved ', ' h '); insert into TB3 (id,comments,pid) VALUES (' K ', ' uploaded ', ' H '); TB3 (Id,comments,pid) VALUES (' L ', ' Well done! ', ' H '); set @mycnt =0;update tb3 set line=0,level=0, tempno=0, temporder= (@ MYCNT: = @mycnt + 1) Order by Id;update Tb3 set LEvel=1, line=1 where PID is null;update tb3 set tempno=line*10000000 where line>0; Update Tb3 A, tb3 b set a.level=2, A.tempno=b.tempno+a.temporder where a.level=0 and A.pid=b.id and B.level=1;set @mycnt =0 ; Update tb3 Set line= (@mycnt: = @mycnt + 1) where level>0 order by Tempno;update Tb3 set tempno=line*10000000 where Lin e>0; Update Tb3 A, tb3 b set a.level=3, A.tempno=b.tempno+a.temporder where a.level=0 and A.pid=b.id and B.level=2;set @mycnt =0 ; Update tb3 Set line= (@mycnt: = @mycnt + 1) where level>0 order by Tempno;update Tb3 set tempno=line*10000000 where Lin e>0; Update Tb3 A, tb3 b set a.level=4, A.tempno=b.tempno+a.temporder where a.level=0 and A.pid=b.id and B.level=3;set @mycnt =0 ; Update tb3 Set line= (@mycnt: = @mycnt + 1) where level>0 order by Tempno;


The above algorithm takes advantage of the SQL function, the original may require a lot of SQL recursive query process into a finite number of times (= Tree maximum depth) of SQL operations, in order to highlight the algorithm, the above example assumes that there is only one root node, delete the GroupID and Endtag, the actual use to perfect this detail , the order by ID can also be changed to sort by another field. Because of the time relationship I will not give the V2.0 mode to adjacency list mode inverse algorithm (also known as the PID null, according to the V2.0 table upside down to the PID assignment process), but this algorithm is not important, because usually v3.0 table in each row will always have a PID).
To summarize:
Adjacency list mode: Easy to move/increment/truncate points, inconvenient to query
Depth Tree V2.0 mode: Easy to query, add/Remove points convenient, but there are efficiency problems, mobile node inconvenient
Depth Tree V3.0 mode: move/increase/cut point convenient, easy to query, the disadvantage is that each move/increment/cut point after the reconstruction line and level values for query. It is a combination of the last two patterns, and can be focused on, at any time in the two modes (modify mode and query mode) between the switch. The v3.0 method is equivalent to designing a query index for the adjacency list pattern.

Discovery of several tree structure database storage schemes

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.