A tutorial on implementing recursive queries in PostgreSQL


Introduction

At Nilenso, I am working on an application (open source!) for designing and conducting surveys.

Here is an example of a survey:

Internally, it is represented like this:

A survey contains a number of questions. A set of questions can optionally be grouped into a category. Our actual data structure is a bit more complicated (especially around sub-questions), but for now assume there are only questions and categories.


This is how we store questions and categories.

Each question and category has an order_number field, an integer that specifies its position relative to its siblings.

For example, in the survey above, the order_number of Bar is smaller than that of Baz.

This way, the questions in a category can be fetched in the correct order:

# in category.rb

def sub_questions_in_order
  questions.order('order_number')
end

In fact, this is how we fetched the whole survey at first. Each category fetched all of its sub-questions in order, and so on recursively, traversing the entire tree.

This gives the depth-first order of the entire tree:

For a survey with more than 5 levels of nesting and more than 100 questions, this is extremely slow.
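
To make the cost of this approach concrete, here is a minimal sketch of the same traversal in plain Ruby. Everything in it is hypothetical: the Node struct, the sample rows, and children_of, which stands in for one database query per call. For simplicity it also uses a single ID space, unlike the real questions and categories tables.

```ruby
# Hypothetical in-memory stand-in for the questions and categories tables.
Node = Struct.new(:id, :type, :order_number, :parent_id)

NODES = [
  Node.new(1, :category, 1, nil),
  Node.new(2, :question, 2, nil),
  Node.new(3, :question, 1, 1),
  Node.new(4, :category, 2, 1),
  Node.new(5, :question, 1, 4),
].freeze

# Counts how many "queries" the traversal issues.
$lookups = 0

# Stand-in for one "fetch the children of this category, in order" query.
def children_of(parent)
  $lookups += 1
  parent_id = parent && parent.id
  NODES.select { |n| n.parent_id == parent_id }.sort_by(&:order_number)
end

# Depth-first traversal: each category fetches its own children in turn.
def depth_first(parent = nil)
  children_of(parent).flat_map do |node|
    node.type == :category ? [node] + depth_first(node) : [node]
  end
end

TREE_ORDER = depth_first.map(&:id)
```

In this sketch the traversal issues one lookup for the top level plus one per category, so the number of round trips grows with the number of categories in the survey.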

Recursive queries

I have also used gems such as awesome_nested_set, but as far as I know, none of them supports fetching across multiple models.

Later, I stumbled upon a document saying that PostgreSQL supports recursive queries. That sounded promising.

So I tried to solve the problem with a recursive query (my understanding was still shallow at this point, so please go easy on me).

To write a recursive query in Postgres, you first define an initial query, the non-recursive term.

In this case, that is the top-level questions and categories. Top-level elements have no parent category, so their category_id is NULL.

(
 SELECT id, content, order_number, type, category_id FROM questions
 WHERE questions.survey_id = 2 AND questions.category_id IS NULL
)
UNION
(
 SELECT id, content, order_number, type, category_id FROM categories
 WHERE categories.survey_id = 2 AND categories.category_id IS NULL
)

(This query and the following ones assume we are fetching the survey with ID 2.)

This fetches the top-level elements.

Next comes the recursive term. Going by the Postgres documentation on recursive queries, the recursive term should fetch all the children of the elements produced by the non-recursive term.

WITH RECURSIVE first_level_elements AS (
 -- Non-recursive term
 (
   SELECT id, content, order_number, category_id FROM questions
   WHERE questions.survey_id = 2 AND questions.category_id IS NULL
  UNION
   SELECT id, content, order_number, category_id FROM categories
   WHERE categories.survey_id = 2 AND categories.category_id IS NULL
 )
 UNION
 -- Recursive term
 SELECT q.id, q.content, q.order_number, q.category_id
 FROM first_level_elements fle, questions q
 WHERE q.survey_id = 2 AND q.category_id = fle.id
)
SELECT * FROM first_level_elements;

But wait, this recursive term can only fetch questions. What if a child of a top-level element is a category? Postgres does not allow the recursive term to refer to the query's own result (first_level_elements) more than once, so we cannot UNION two recursive selects, one for questions and one for categories. The query needs reworking:


WITH RECURSIVE first_level_elements AS (
 (
   SELECT id, content, order_number, category_id FROM questions
   WHERE questions.survey_id = 2 AND questions.category_id IS NULL
  UNION
   SELECT id, content, order_number, category_id FROM categories
   WHERE categories.survey_id = 2 AND categories.category_id IS NULL
 )
 UNION
 (
   SELECT e.id, e.content, e.order_number, e.category_id
   FROM
   (
    -- Fetch questions and categories
    SELECT id, content, order_number, category_id FROM questions WHERE survey_id = 2
    UNION
    SELECT id, content, order_number, category_id FROM categories WHERE survey_id = 2
   ) e, first_level_elements fle
   WHERE e.category_id = fle.id
 )
)
SELECT * FROM first_level_elements;

The question and category result sets are combined with UNION before being joined to the elements fetched so far.

This fetches all the elements of the survey:

Unfortunately, the order seems to be wrong.

Sorting within a recursive query

The problem is that although the query correctly fetches all second-level elements for the first-level elements, it does so breadth-first, whereas we need depth-first order.

How can we get that?

Postgres can build arrays at query time.

So let's build an array that holds the order numbers along the way to each fetched element. Call this array path. The path of an element is:

Path of the parent category (if any) + own order_number

Sorting the result set by path yields depth-first order!
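
The idea can be tried out in plain Ruby before moving it into SQL. This is only a sketch on made-up data: the Element struct, the sample rows, and path_of are hypothetical, and the path is built exactly as defined above (the parent's path plus the element's own order_number).

```ruby
# Hypothetical in-memory rows: name, order_number, and the parent's name.
Element = Struct.new(:name, :order_number, :parent)

ELEMENTS = [
  Element.new("Foo", 1, nil),
  Element.new("Qux", 2, nil),
  Element.new("Bar", 1, "Foo"),
  Element.new("Baz", 2, "Foo"),
  Element.new("Quux", 1, "Baz"),
].freeze

BY_NAME = ELEMENTS.map { |e| [e.name, e] }.to_h

# path = path of the parent category (if any) + own order_number
def path_of(element)
  parent = BY_NAME[element.parent]
  (parent ? path_of(parent) : []) + [element.order_number]
end

# Level by level, the way the rows are listed above (breadth-first).
BREADTH_FIRST = ELEMENTS.map(&:name)

# Sorting by path: arrays are compared element by element, and a prefix
# sorts before any longer array that extends it.
DEPTH_FIRST = ELEMENTS.sort_by { |e| path_of(e) }.map(&:name)
```

Here the breadth-first list is Foo, Qux, Bar, Baz, Quux, while sorting by path gives Foo, Bar, Baz, Quux, Qux, with every child placed directly under its parent. Postgres compares arrays element by element in the same way, which is why ORDER BY path has this effect.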

WITH RECURSIVE first_level_elements AS (
 (
   SELECT id, content, category_id, ARRAY[id] AS path FROM questions
   WHERE questions.survey_id = 2 AND questions.category_id IS NULL
  UNION
   SELECT id, content, category_id, ARRAY[id] AS path FROM categories
   WHERE categories.survey_id = 2 AND categories.category_id IS NULL
 )
 UNION
 (
   SELECT e.id, e.content, e.category_id, (fle.path || e.id)
   FROM
   (
    SELECT id, content, category_id, order_number FROM questions WHERE survey_id = 2
    UNION
    SELECT id, content, category_id, order_number FROM categories WHERE survey_id = 2
   ) e, first_level_elements fle
   WHERE e.category_id = fle.id
 )
)
SELECT * FROM first_level_elements ORDER BY path;

We are close. But "What's your favourite song?" shows up twice.

This is caused by comparing IDs to find children:

WHERE e.category_id = fle.id

fle contains both questions and categories, but we only want to match against categories (because questions cannot have children). Since questions and categories live in separate tables, a question can have the same ID as a category and be mistaken for a parent.
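
The mismatch can be reproduced in a few lines of Ruby. This is a sketch on hypothetical rows, not the survey's real data: a question and a category share ID 1, and one child element points at category_id 1.

```ruby
# Hypothetical rows; the question and the category share an ID because
# they come from separate tables.
Element = Struct.new(:id, :type, :content, :category_id)

FETCHED = [
  Element.new(1, "questions",  "How old are you?", nil),
  Element.new(1, "categories", "Music",            nil),
].freeze

CANDIDATES = [
  Element.new(2, "questions", "What's your favourite song?", 1),
].freeze

# Matching on ID alone: the child is paired with both elements.
BY_ID = FETCHED.product(CANDIDATES)
               .select { |parent, child| child.category_id == parent.id }

# Also requiring the parent to be a category leaves the one real pairing.
BY_ID_AND_TYPE = BY_ID.select { |parent, _child| parent.type == "categories" }
```

Matching on ID alone yields two parent/child pairs here, one of them with a question as the supposed parent; checking the parent's type as well leaves only the pairing with the category.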

So let's give each of these queries a hard-coded type, so that we never look for the children of a question:


WITH RECURSIVE first_level_elements AS (
 (
   SELECT id, content, category_id, 'questions' AS type, ARRAY[id] AS path FROM questions
   WHERE questions.survey_id = 2 AND questions.category_id IS NULL
  UNION
   SELECT id, content, category_id, 'categories' AS type, ARRAY[id] AS path FROM categories
   WHERE categories.survey_id = 2 AND categories.category_id IS NULL
 )
 UNION
 (
   SELECT e.id, e.content, e.category_id, e.type, (fle.path || e.id)
   FROM
   (
    SELECT id, content, category_id, 'questions' AS type, order_number FROM questions WHERE survey_id = 2
    UNION
    SELECT id, content, category_id, 'categories' AS type, order_number FROM categories WHERE survey_id = 2
   ) e, first_level_elements fle
   -- Look for children only if the type is 'categories'
   WHERE e.category_id = fle.id AND fle.type = 'categories'
 )
)
SELECT * FROM first_level_elements ORDER BY path;

This looks right. Done!

Now let's look at how it performs.

With this script (run after creating a survey through the UI), I generated 10 chains of sub-questions, each 6 levels deep.

survey = Survey.find(9)
10.times do
  category = FactoryGirl.create(:category, :survey => survey)
  6.times do
    category = FactoryGirl.create(:category, :category => category, :survey => survey)
  end
  FactoryGirl.create(:single_line_question, :category_id => category.id, :survey_id => survey.id)
end

Each chain looks like this:

Let's see whether the recursive query is any faster than the original approach.

pry(main)> Benchmark.ms { 5.times { Survey.find(9).sub_questions_using_recursive_queries } }
=> 36.839999999999996

pry(main)> Benchmark.ms { 5.times { Survey.find(9).sub_questions_in_order } }
=> 1145.1309999999999

More than 31 times faster? Not bad.
