How to use mysql to store tree relationships

Last Update:2014-04-22 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

I recently encountered a storage requirement for a file system similar to my business. I have discussed how to store a tree in mysql and shared it to see if there are any better solutions.

I. Current situation

First, assume that there is such a tree with a total of nine nodes. 1 is the root node, and there are three layers in total. (Of course, the actual business is not that simple)

The original table structure is as follows:

Id	Parents_id	Name	Full_path
1	0	A	/
2	1	B	/A/B
3	1	C	/A/c
4	1	D	/A/d
5	4	E	/A/d/e
6	4	F	/A/d/f
7	5	G	/A/d/e/g
8	5	H	/A/d/e/h
9	5	I	/A/d/e/I

The following basic requirements must be met:

1. Expand the directory level from top to bottom

2. Check the full path of a directory.

3. rename the name of a path

4. Move a directory to another directory

The existing table structure can meet the above requirements:

, Id parents_id, full_path id, name $ newname id, parents_id $ new_parents_id, full_path = $ new_full_path id $ id; (modify the parent-child relationship to the new relationship)

However, the problem with the existing table structure is that there are 3rd and 4th requirements, instead of updating only one row, because of the existence of full_path, all the nodes that have been modified, the full_path of all nodes under it must be modified. This adds a lot of write operations. If the tree is deep, for example, if there are 100 layers, modify the data of 3rd layers, this means that all data in layer 97 under it needs to be modified, and the resulting write operation is a bit scary.

As shown in the column, if the name of 4 is modified, it will affect the update of 6 rows of data in a total of 4, 5, 6, 7, 8, 9. This write logic is slightly magnified.

   namex,full_path  id   full_path  id   full_path  id   full_path  id   full_path  id   full_path  id;

How can this problem be solved?

1. Remove the full_path field.

The most serious problem described above is the write logic amplification problem. After the full_path field is removed, six updates will become one update.

This optimization seems to perfectly solve the write logic amplification problem, but introduces another problem, that is, it is difficult to meet requirement 2.

The original SQL statement is:

 full_path    id$id;

However, after the full_path field is removed, it becomes:

 parents_id    id  parents_id    id  $parents_id;

For example, to obtain the full path of 9, you need the following SQL

 parents_id,name    id   parents_id,name    id   parents_id,name    id   parents_id,name    id  ;

When parents_id = 0 is final, all names are aggregated.

If front-end implementation is required for all operations, the front-end interaction with DB is required four times. The Link Cost consumed during this period will greatly prolong the overall response time, which is basically unacceptable.

If you want to use this solution, it seems that only the stored procedure is used to return all computing tasks locally to the end to ensure that one request and one return are the most efficient, however, it is highly risky to use mysql stored procedures.

2. Product Specifications

Our problems may occur when there are many layers of trees, so we can use product specifications to limit them. For example, we can only have four layers at the deepest level, in this way, the problem is contained before it occurs. (Of course, sometimes the most effective optimization solution is the least possible, and the product won't be so easy to compromise)

3. Add cache

Since the write logic is amplified, what if we change the optimization idea from reducing the number of writes to improving the write performance?

We introduce nosql storage such as redis and store id and full_path in redis. After each database update, we update redis. redis has a much higher write performance than mysql. This problem can also be solved.

However, because the update is not synchronous, data will eventually be consistent after asynchronous update, but there may be inconsistencies at a specific time point.

In addition, due to changes in the storage architecture, some compromise needs to be made on the code, whether it is a read operation or a write operation.

4. Change the overall Storage Structure

The above scheme is based on the fact that the existing table structure is not changed much. Is it possible that the situation will be different after the table structure is modified?

We introduce the hierarchical concept to the storage structure shown in the example, remove full_path, And see if the problem can be solved.

The new table structure is as follows:

Id_name (ing between id and name)

Id	Name
1	A
2	B
3	C
4	D
5	E
6	F
7	G
8	H
9	I

Relation (parent-child relationship)

Id	Chailds	Depth
1	1	0
1	2	1
1	3	1
1	4	1
1	5	2
1	6	2
1	7	3
1	8	3
1	9	3
2	2	0
3	3	0
4	4	0
4	5	1
4	7	2
4	8	2
4	9	2
5	5	0
5	7	1
5	8	1
5	9	1
6	6	0
7	7	0
8	8	0
9	9	0

The first of the two new tables does not need to be explained. The second id field stores the node id, and the chailds field stores all its subnodes (instead of directly storing the chaild, but all the child nodes at any level ), the depth field stores the hierarchical relationship between its subnodes and the current node.

Let's see if this design can meet the first four requirements:

Requirement 1: Expand directories layer by layer

 id,depth  table2  id name  table1  id$id;

Because each id stores all its subnodes, If you query all the sub-directories of table 4, select id, depth from table2 where id = 4; one SQL statement can obtain all the results, as long as the front-end is working on it.

Id	Chailds	Depth
4	4	0
4	5	1
4	6	1
4	7	2
4	8	2
4	9	2

Requirement 2: obtain the full path of a directory

 id,depth  table2  chailds  $id;

Because each id stores all subnodes, the contrast is also an SQL task. For example, to query the full path of 9, select id, depth from table2 where chailds = 9; the result is

Id	Chailds	Depth
9	9	0
5	9	1
4	9	2
1	9	3

Through the above results, the front-end computing can get a full path of 9, which is obtained by an SQL statement. The front-end does not need to interact with the db multiple times or introduce the stored procedure.

Requirement 3: Change the directory name

 table1  name  $new_name  id  $id ;

This is the easiest thing to do. You only need to change the ing table.

Requirement 4: Modify the parent-child relationship of a node

 id  table2  id$id  depth    table2  id  id  table2  id  table2  ($sql2_id,$id,$depth);

This requirement currently seems to be the most troublesome. As shown in the example, what operations are required to move 5 to 3?

I: first, find out which nodes belong to each of the five nodes.

Select id from table 2 where id = 5 and depth> 0;

Id	Chailds	Depth
1	5	2
4	5	1

II: delete these records.

Delete from table2 where id = 1 and chailds = 5;

Delete from table2 where id = 4 and chailds = 5;

III: Find out the child nodes of the new parent node.

Select id, depth from table where chailds = 3 and depth> 0;

Id	Chailds	Depth
1	3	1

IIII:

Insert a new link based on the result of III.

Insert into table2 values (1, 5, 2 );

Because the new parent node is only a child node of 1, you only need to add 5 and 1 relationships, and because 3 is the new parent node of 5, then the relationship between 5 and 1 should be "+ 1" than that between 3 ".

The nodes under all five do not need to be changed, because in this design, all nodes are the root of their own subnodes, so you only need to modify the relationship of five.

However, this solution clearly shows that the relationships to be stored are many times more than the original ones, especially when the relationship level is deep.

Iii. Summary

At present, it seems that there are no outstanding solutions at a certain cost.

Ps: the ideas in this Article refer to "SQL anti-pattern". If you are interested, you can study it.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

How to use mysql to store tree relationships

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

How to use mysql to store tree relationships

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support