Tutorial on implementing recursive query in PostgreSQL, postgresql Recursion
Introduction
I am working on a program in nilint32 (Open Source !) It is used to design and initiate surveys.
The following is an example of a survey:
Internally, it indicates the drop:
A Survey involves many questions (question ). A series of problems can be classified into (optional) one category. Our actual data structure will be a little more complex (especially the sub-question section of the sub-problem), but first it will only have question and category.
In this way, we save question and category.
Each question and category has an order_number field. It is an integer used to specify its own relative relationship with other brothers.
For example, for the above investigation:
Bar order_number is smaller than Baz.
In this way, problems under a category can appear in the correct order:
# In category.rb def sub_questions_in_order questions.order('order_number')end
In fact, we started the entire fetch investigation. Each category gets all sub-problems under it in order, and accordingly traverses the entire entity tree.
This gives the order of depth precedence for the entire tree:
The investigation of more than five layers of embedded data and more than 100 problems is very slow.
Recursive query
I have also used gems such as awesome_nested_set, but none of them support fetch across multiple models.
Later, he accidentally discovered a document saying that PostgreSQL has support for Recursive queries! Well, this can be.
Let's try to use recursive queries to solve this problem. (at this time, my brother is still very familiar with it. If it is not in place, Do not spray it ).
To perform recursive queries in Postgres, you must first define an initialization query, that is, the non-recursive part.
In this example, question and category are at the top. No parent category exists for the top-level elements, so their category_id is empty.
( SELECT id, content, order_number, type, category_id FROM questions WHERE questions.survey_id = 2 AND questions.category_id IS NULL)UNION( SELECT id, content, order_number, type, category_id FROM categories WHERE categories.survey_id = 2 AND categories.category_id IS NULL)
(This query and the subsequent query assume that the investigation with id 2 is to be obtained)
In this way, the upper-level elements are obtained.
The following section describes recursion. Follow the s document below:
The recursion part is to obtain all the sub-items of the elements obtained in the initialization part.
WITH RECURSIVE first_level_elements AS ( -- Non-recursive term ( ( SELECT id, content, order_number, category_id FROM questions WHERE questions.survey_id = 2 AND questions.category_id IS NULL UNION SELECT id, content, order_number, category_id FROM categories WHERE categories.survey_id = 2 AND categories.category_id IS NULL ) ) UNION -- Recursive Term SELECT q.id, q.content, q.order_number, q.category_id FROM first_level_elements fle, questions q WHERE q.survey_id = 2 AND q.category_id = fle.id)SELECT * from first_level_elements;
And so on. The recursive part can only obtain question. What if the first subcategory of a subitem is a classification? Postgres does not reference non-recursive items more than once. Therefore, UNION cannot be performed on the question and category result sets. Here we need to make a transformation:
WITH RECURSIVE first_level_elements AS ( ( ( SELECT id, content, order_number, category_id FROM questions WHERE questions.survey_id = 2 AND questions.category_id IS NULL UNION SELECT id, content, order_number, category_id FROM categories WHERE categories.survey_id = 2 AND categories.category_id IS NULL ) ) UNION ( SELECT e.id, e.content, e.order_number, e.category_id FROM ( -- Fetch questions AND categories SELECT id, content, order_number, category_id FROM questions WHERE survey_id = 2 UNION SELECT id, content, order_number, category_id FROM categories WHERE survey_id = 2 ) e, first_level_elements fle WHERE e.category_id = fle.id ))SELECT * from first_level_elements;
Before joining the non-recursive part, the category and question result sets are UNION.
This generates all the investigation elements:
Unfortunately, the order seems wrong.
Sort in recursive query
This problem occurs when the first-level element obtains all the second-level elements, but this is done with a breadth-first query. What is actually needed is a depth-first query.
How can this problem be solved?
Postgres can be used to create an array during query.
Then, create an array that stores the serial numbers of the fetch elements. Call this array path. The path of an element is:
Path of the parent category (if any) + order_number
If you use path to sort the result set, you can turn the query into a depth-first query!
WITH RECURSIVE first_level_elements AS ( ( ( SELECT id, content, category_id, array[id] AS path FROM questions WHERE questions.survey_id = 2 AND questions.category_id IS NULL UNION SELECT id, content, category_id, array[id] AS path FROM categories WHERE categories.survey_id = 2 AND categories.category_id IS NULL ) ) UNION ( SELECT e.id, e.content, e.category_id, (fle.path || e.id) FROM ( SELECT id, content, category_id, order_number FROM questions WHERE survey_id = 2 UNION SELECT id, content, category_id, order_number FROM categories WHERE survey_id = 2 ) e, first_level_elements fle WHERE e.category_id = fle.id ))SELECT * from first_level_elements ORDER BY path;
This is almost successful. But there are two What's your favorite song?
This is caused by the comparison ID to find the subitem:
WHERE e.category_id = fle.id
Fle includes both question and category. However, you only need to match category (because question does not have subitems ).
For each such query, a hard encoding type is required, so that you do not need to try to check whether question has any subitem:
WITH RECURSIVE first_level_elements AS ( ( ( SELECT id, content, category_id, 'questions' as type, array[id] AS path FROM questions WHERE questions.survey_id = 2 AND questions.category_id IS NULL UNION SELECT id, content, category_id, 'categories' as type, array[id] AS path FROM categories WHERE categories.survey_id = 2 AND categories.category_id IS NULL ) ) UNION ( SELECT e.id, e.content, e.category_id, e.type, (fle.path || e.id) FROM ( SELECT id, content, category_id, 'questions' as type, order_number FROM questions WHERE survey_id = 2 UNION SELECT id, content, category_id, 'categories' as type, order_number FROM categories WHERE survey_id = 2 ) e, first_level_elements fle -- Look for children only if the type is 'categories' WHERE e.category_id = fle.id AND fle.type = 'categories' ))SELECT * from first_level_elements ORDER BY path;
This looks OK. Done!
Let's take a look at the performance.
Using the script below (after a survey is created on the Interface), Brother generates 10 subproblem sequences, each of which is as deep as six layers.
survey = Survey.find(9)10.times do category = FactoryGirl.create(:category, :survey => survey) 6.times do category = FactoryGirl.create(:category, :category => category, :survey => survey) end FactoryGirl.create(:single_line_question, :category_id => category.id, :survey_id => survey.id)end
Each problem sequence looks like this:
Let's see if recursive queries are faster than the first one.
pry(main)> Benchmark.ms { 5.times { Survey.find(9).sub_questions_using_recursive_queries }}=> 36.839999999999996 pry(main)> Benchmark.ms { 5.times { Survey.find(9).sub_questions_in_order } }=> 1145.1309999999999
Faster than 31 times? Nice.