Hierarchical data Model (infinite level directory) algorithm __ algorithm

Source: Internet
Author: User
Tags mysql command line
Original: http://bbs.blueidea.com/viewthread.php?tid=2780498&highlight=
Note: Although this article is based on MySQL as an example to introduce, but the same applies to other databases.

Also describes the articles for nested collection models: http://www.nirvanastudio.org/php/hierarchical-data-database.html
The author of this article: Yimin http://liyimin.net/blog
Deepen reading Celko joe-trees and hierarchies in SQL for Smarties






Management of hierarchical data in MySQL
by Mike Hillyer
Introduction
Most users have processed hierarchical data (hierarchical data) in the database, and think that the management of hierarchical data is not the purpose of relational databases. The reason for this is that the tables in the relational database are not hierarchical, they are simply flattened lists, and the hierarchical data has a parent-child relationship, and it is obvious that the tables in the relational database do not naturally exhibit their layered characteristics.
We believe that hierarchical data is a collection of data for each item with only one parent and 0 or more subkeys (except the root item, which has no parent). Tiered data exists in many database based applications, including categories in forums and mailing lists, business organization charts, classification of content management systems, and product classifications. We intend to use the following product categories for a fictitious electronic store:

These classifications are similar to the hierarchy of classifications in some of the examples mentioned above. In this paper, we will start from the traditional adjacency list (adjacency list) model, and describe 2 kinds of models for processing hierarchical data in MySQL.

adjacency Table Model
The classification data for the above example will be stored in the data table below (I have given all the data table creation, the data inserts the code, you may follow):
CREATE TABLE Category (
category_id INT auto_increment PRIMARY KEY,
Name VARCHAR is not NULL,
Parent INT DEFAULT NULL);
INSERT into category
VALUES (1, ' Electronics ', NULL), (2, ' televisions ', 1), (3, ' tube ', 2),
(4, ' LCD ', 2), (5, ' PLASMA ', 2), (6, ' portable Electronics ', 1),
(7, ' MP3 players ', 6), (8, ' FLASH ', 7),
(9, ' CD players ', 6), (Ten, ' 2 WAY radios ', 6);
SELECT * from category order by category_id;
+-------------+----------------------+--------+
|         category_id | name | Parent |
+-------------+----------------------+--------+
| 1 |   Electronics | NULL |
| 2 |     Televisions | 1 |
| 3 |     Tube | 2 |
| 4 |     LCD | 2 |
| 5 |     PLASMA | 2 |
| 6 |     Portable Electronics | 1 |
| 7 |    MP3 Players | 6 |
| 8 |    FLASH | 7 |
| 9 |    CD Players | 6 |
| 10 |    2 WAY Radios | 6 |
+-------------+-----------------------+------+
Rows in Set (0.00 sec)
In the adjacency table model, each item in the datasheet contains an indicator pointing to its parent. In this example, the parent of the topmost item is a null value (NULL). The advantage of the adjacency table model is that it is simple enough to see that Flash is the MP3 of the player, which is the child of the portable electronics, and which is the electronics child. Although the adjacency table model is fairly simple to handle in client-side coding, the model has many problems if it is pure SQL coding.
retrieving an entire tree
Typically, the first task in dealing with layered data is to render a complete tree in some form of indentation. For this reason, the usual practice in pure SQL encoding is to use a self-join:
SELECT T1.name as Lev1, t2.name as Lev2, t3.name as Lev3, t4.name as Lev4
From category as T1
Left JOIN category as t2 on t2.parent = t1.category_id
Left JOIN category as T3 on t3.parent = t2.category_id
Left JOIN category as T4 on t4.parent = t3.category_id
WHERE t1.name = ' electronics ';
+-------------+----------------------+--------------+-----+
|     Lev1 |  Lev2 | Lev3 | Lev4 |
+-------------+----------------------+--------------+------+
| Electronics | Televisions | Tube | NULL |
| Electronics | Televisions | LCD | NULL |
| Electronics | Televisions | PLASMA | NULL |
| Electronics | Portable Electronics | MP3 Players | FLASH |
| Electronics | Portable Electronics | CD Players | NULL |
| Electronics | Portable Electronics | 2 WAY Radios | NULL |
+-------------+----------------------+--------------+------+
6 rows in Set (0.00 sec)
Retrieving all leaf nodes
We can use left join (left JOIN) to retrieve all the leaf nodes in the tree (nodes without children):
SELECT T1.name from
Category as T1 left JOIN category as T2
On t1.category_id = T2.parent
WHERE t2.category_id is NULL;
+--------------+
| name |
+--------------+
| Tube |
| LCD |
| PLASMA |
| FLASH |
| CD Players |
| 2 WAY Radios |
+--------------+
retrieving a single path
We can also retrieve a single path by using the connection:
SELECT T1.name as Lev1, t2.name as Lev2, t3.name as Lev3, t4.name as Lev4
From category as T1
Left JOIN category as t2 on t2.parent = t1.category_id
Left JOIN category as T3 on t3.parent = t2.category_id
Left JOIN category as T4 on t4.parent = t3.category_id
WHERE t1.name = ' electronics ' and t4.name = ' FLASH ';
+-------------+----------------------+-------------+------+
|    Lev1 | Lev2 | Lev3 | Lev4 |
+-------------+----------------------+-------------+--------+
| Electronics | Portable Electronics | MP3 Players | FLASH |
+-------------+----------------------+-------------+-------+
1 row in Set (0.01 sec)
The main limitation of this approach is that you need to add a self connection for each layer of data, and as the level increases, the connection becomes more and more complex, and the performance of the retrieval is naturally reduced.
Limitations of adjacency Table model
It is difficult to realize the adjacency table model with pure SQL coding. Before we retrieve the path to a category, we need to know the level at which the taxonomy resides. In addition, we should be particularly careful when deleting nodes, because it is possible to isolate a Shang tree (all of his subcategories are orphaned when the portable electronics category is deleted). Some of the limitations can be solved by using client-side code or stored procedures, we can start from the bottom of the tree to get a tree or a single path, we can also delete the node to the child node to a new parent node, to prevent the generation of orphaned subtree.


nested Sets (Nested set) model
I want to focus on a different approach in this article, commonly known as a nested set model. In the nested collection model, we will look at our hierarchical data in a new way, not as a line or a point, but as a nested container. I tried to draw the electronics category in the form of a nested container:


From the image above we can see that we still maintain the level of the data, and the parent category surrounds its subcategories. In a datasheet, we represent the hierarchical nature of the data in a nested collection model by using the left value of the nested relationship that represents the node and the right value:
CREATE TABLE Nested_category (
category_id INT auto_increment PRIMARY KEY,
Name VARCHAR is not NULL,
LfT INT not NULL,
RGT INT not NULL
);
INSERT into Nested_category
VALUES (1, ' Electronics ', 1,20), (2, ' televisions ', 2,9), (3, ' tube ', 3,4),
(4, ' LCD ', 5,6), (5, ' PLASMA ', 7,8), (6, ' portable Electronics ', 10,19),
(7, ' MP3 players ', 11,14), (8, ' FLASH ', 12,13),
(9, ' CD players ', 15,16), (Ten, ' 2 WAY radios ', 17,18);
SELECT * from Nested_category order by category_id;
+--------------+----------------------+-----+------+
| category_id | name | LfT | RGT |
+--------------+----------------------+-----+------+
| 1 | Electronics | 1 | 20 |
| 2 | Televisions | 2 | 9 |
| 3 | Tube | 3 | 4 |
| 4 | LCD | 5 | 6 |
| 5 | PLASMA | 7 | 8 |
| 6 | Portable Electronics | 10 | 19 |
| 7 | MP3 Players | 11 | 14 |
| 8 | FLASH | 12 | 13 |
| 9 | CD Players | 15 | 16 |
| 10 | 2 WAY Radios | 17 | 18 |
+--------------+----------------------+-----+----+
We use LFT and RGT instead of left and right because in MySQL, leave and right are reserved words.
Http://dev.mysql.com/doc/mysql/en/reserved-words.html, there is a detailed list of MySQL reserved words.
So, how do we determine the left and right values? We start at the far left of the outer node, numbering from left to right:

This numbering is also true for a typical tree structure:

When we number the tree structure, we go from left to right, one layer at a time, and we iterate from left to right before assigning the right value to the node.
The child node assigns the left and right values to its child nodes. This method is called an improved first order traversal algorithm.
retrieving an entire tree
We can connect the parent node to the child node to retrieve the whole tree by a lft because the child node's value is always in its
Between the LFT value of the parent node and the RGT value:
SELECT Node.name
From Nested_category as node,
Nested_category as Parent
WHERE node.lft BETWEEN Parent.lft and PARENT.RGT
and parent.name = ' electronics '
Order BY NODE.LFT;
+---------------------+
| name |
+---------------------+
| Electronics |
| Televisions |
| Tube |
| LCD |
| PLASMA |
| Portable electronics|
| MP3 Players |
| FLASH |
| CD Players |
| 2 WAY Radios |
+---------------------+
Unlike previous adjacency table models, this query can work well regardless of how deep the tree is. We don't care about node's RGT value in the between clause because the parent node that uses node's RGT value is always the same as the LFT value.
Retrieving all leaf nodes
All leaf nodes are retrieved, and the method of using nested collection model is much simpler than the left join method of adjacency table model. If you look carefully at the Nested_category table, you may have noticed that the left and right values of the leaf nodes are contiguous. To retrieve a leaf node, we just look for the node that satisfies the rgt=lft+1:
SELECT Name
From Nested_category
WHERE RGT = lft + 1;
+---------------+
| name |
+---------------+
| Tube |
| LCD |
| PLASMA |
| FLASH |
| CD Players |
| 2 WAY Radios |
+---------------+
retrieving a single path
In a nested collection model, we can retrieve a single path without multiple connections:
SELECT Parent.name
From Nested_category as node,
Nested_category as Parent
WHERE node.lft BETWEEN Parent.lft and PARENT.RGT
and node.name = ' FLASH '
Order BY NODE.LFT;
+------------------------+
| name |
+------------------------+
| Electronics |
| Portable Electronics |
| MP3 Players |
| FLASH |
+------------------------+
retrieving the depth of a node
We already know how to present a whole tree, but in order to better identify the level of the node in the tree, how can we retrieve the node in the tree depth? We can add the Count function and the GROUP BY clause to the previous query statement to implement the following:
SELECT Node.name, (COUNT (parent.name) 1)
As depth
From Nested_category as node,
Nested_category as Parent
WHERE node.lft BETWEEN Parent.lft and PARENT.RGT
GROUP by Node.name
Order BY NODE.LFT;
+----------------------+--------+
| name | Depth |
+----------------------+---+
| Electronics | 0 |
| Televisions | 1 |
| Tube | 2 |
| LCD | 2 |
| PLASMA | 2 |
| Portable Electronics | 1 |
| MP3 Players | 2 |
| FLASH | 3 |
| CD Players | 2 |
| 2 WAY Radios | 2 |
+----------------------+----+
We can indent the category name according to the depth value, using the concat and repeat string function 1:
SELECT CONCAT (REPEAT ("", COUNT (parent.name) 1),
Node.name) as Name
From Nested_category as node,
Nested_category as Parent
1 [The] indentation is phpmyadmin under the display, it is recommended to run the query under the MySQL command line.
WHERE node.lft BETWEEN Parent.lft and PARENT.RGT
GROUP by Node.name
Order BY NODE.LFT;
+---------------------------+
| name |
+---------------------------+
| Electronics |
| Televisions |
| Tube |
| LCD |
| PLASMA |
| Portable Electronics |
| MP3 Players |
| FLASH |
| CD Players |
| 2 WAY Radios |
+---------------------------+
Of course, in a client application you might use the depth value to directly show the level of the data. The web Developer will traverse the tree and add <li></li> and <ul></ul> tags as the depth value increases and decreases.
Retrieving the depth of the subtree when we need the depth information for the subtree, we cannot limit the node or parent in the connection, because doing so disrupts the order of the DataSet. So, we added a third self join as a subquery to come up with the depth value of the new start of the subtree:
SELECT Node.name, (COUNT (Parent.name) (
Sub_tree.depth + 1)) as depth
From Nested_category as node,
Nested_category as parent,
Nested_category as Sub_parent,
(
SELECT Node.name, (COUNT (parent.name) 1)
As depth
From Nested_category as node,
Nested_category as Parent
WHERE node.lft BETWEEN Parent.lft and PARENT.RGT
and node.name = ' portable Electronics '
GROUP by Node.name
ORDER BY Node.lft
) as Sub_tree
WHERE node.lft BETWEEN Parent.lft and PARENT.RGT
and Node.lft BETWEEN Sub_parent.lft and SUB_PARENT.RGT
and sub_parent.name = Sub_tree.name
GROUP by Node.name
Order BY NODE.LFT;
+----------------------+-------+
| name | Depth |
+----------------------+------+
| Portable Electronics | 0 |
| MP3 Players | 1 |
| FLASH | 2 |
| CD Players | 1 |
| 2 WAY Radios | 1 |
+----------------------+---+
This query can retrieve the depth value of a tree of ideas, including the root node. The depth value here is related to the node you specified.
Retrieving direct child nodes of a node
You can imagine the classification of electronic products on your retail website. When the user clicks on the category, you will be presenting the product under the category, as well as listing the direct subcategories under the category, rather than all the categories under the category. To do this, we only present the node and its immediate child nodes, and no longer present a deeper node. For example, when rendering portableelectronics classifications, we only present MP3 players, CD players, and 2 WAY radios categories, not flash classifications. To implement it is very simple to add a HAVING clause on a previous query statement:
SELECT Node.name, (COUNT (Parent.name) (
Sub_tree.depth + 1)) as depth
From Nested_category as node,
Nested_category as parent,
Nested_category as Sub_parent,
(
SELECT Node.name, (COUNT (parent.name) 1)
As depth
From Nested_category as node,
Nested_category as Parent
WHERE node.lft BETWEEN Parent.lft and PARENT.RGT
and node.name = ' portable Electronics '
GROUP by Node.name
ORDER BY Node.lft
) as Sub_tree
WHERE node.lft BETWEEN Parent.lft and PARENT.RGT
and Node.lft BETWEEN Sub_parent.lft and SUB_PARENT.RGT
and sub_parent.name = Sub_tree.name
GROUP by Node.name
Having depth <= 1
Order BY NODE.LFT;
+----------------------+-------+
| name | Depth |
+----------------------+-------+
| Portable Electronics | 0 |
| MP3 Players | 1 |
| CD Players | 1 |
| 2 WAY Radios | 1 |
+----------------------+----+
If you do not want to render the parent node, you can change the having depth <= 1 as having depth = 1.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.