Create a MySQL model that implements the Disqus comment template

Source: Internet
Author: User
Tags: connection pooling, postgresql, disqus

For a long time, PostgreSQL was considered an alternative to MySQL. During that time, however, there was no way to bring it up to the level that MySQL could reach. These problems have not been fully solved in recent years, but they have given rise to many interesting tools that make up for PostgreSQL's shortcomings. We use two of them at Disqus: Slony and pgbouncer. Slony lets us replicate data (and sometimes partition it), while pgbouncer solves the problem of maintaining connections and connection pooling.

In addition, let's look at the language itself: this week I had the pleasure of learning how to use recursive queries in PostgreSQL 8.4, and they are extremely powerful. That is what I really want to discuss in this article. MySQL gets the job done, and does it well, but you can only work within the confines of the engine. That is still true of PostgreSQL, but you have more options. So I would like to talk about the problem of threaded comments.

Everyone knows that Disqus is not just the biggest Django site (we have nearly 1 million visits per month), but also one of the biggest online comment systems. We provide many features for thousands of sites, the most basic of which is displaying comments as a tree of threads.

There are a number of solutions to the threading problem. The most common (and most efficient) method is modified preorder tree traversal. Simply put, it adds a left value and a right value to each comment, and these have to be updated whenever you add a comment. Another standard method (which Reddit uses very happily) is "pull everything out, then do the work in memory." In fact, Reddit is not the only one that does this.
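The left/right bookkeeping can be sketched in a few lines of Python. This is only an illustration, not Disqus's code: the sample rows and the helper names `assign_bounds` and `descendants` are made up for the example. A depth-first walk numbers each comment on the way in (left) and on the way out (right), so every reply's pair nests inside its parent's pair.

```python
from collections import defaultdict

# Hypothetical sample data: (id, parent_id) pairs, not taken from Disqus.
comments = [(1, None), (2, 1), (3, 1), (4, 2), (5, None)]

def assign_bounds(rows):
    """Return {id: (left, right)} from a depth-first walk of the forest."""
    children = defaultdict(list)
    for comment_id, parent_id in rows:
        children[parent_id].append(comment_id)

    bounds = {}
    counter = 0

    def visit(comment_id):
        nonlocal counter
        counter += 1
        left = counter                      # numbered on the way in
        for child_id in children[comment_id]:
            visit(child_id)
        counter += 1                        # numbered again on the way out
        bounds[comment_id] = (left, counter)

    for root_id in children[None]:
        visit(root_id)
    return bounds

def descendants(bounds, comment_id):
    """Ids whose (left, right) pair nests inside the given comment's pair."""
    left, right = bounds[comment_id]
    inside = [cid for cid, (l, r) in bounds.items() if left < l and r < right]
    return sorted(inside, key=lambda cid: bounds[cid][0])

bounds = assign_bounds(comments)
print(bounds[1])               # (1, 8)
print(descendants(bounds, 1))  # [2, 4, 3]
```

Reading a whole subtree is a single range check, which is why the method is efficient for reads. The cost sits on the write side: inserting a reply shifts the numbers of every comment that comes after it.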


Looking further at what PostgreSQL itself has to offer, we find two options (with 8.4 as the minimum version). One of them is a module that ships with PostgreSQL called ltree. It lets you store the full path of a node (all of its parent nodes) and query it through standard SQL statements. It is very useful when you need to sort by "oldest first", since that becomes a simple ORDER BY on the ltree column. However, as is so often the case, Disqus's situation is not that simple.
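The idea of storing the full path can be imitated in plain Python to see why sorting by it yields thread order. This sketch does not use ltree or its operators: the `paths` data and the helpers `thread_order` and `subtree` are hypothetical, and tuples of ids stand in for the stored path.

```python
# Hypothetical rows: comment id -> ids of all ancestors, ending in the comment itself.
paths = {
    1: (1,),
    2: (1, 2),
    3: (1, 3),
    4: (1, 2, 4),
    5: (5,),
}

def thread_order(paths):
    """Ids in display order: sorting by path puts each reply under its parent."""
    return sorted(paths, key=paths.get)

def subtree(paths, root_id):
    """The root and all of its descendants: every path that starts with the root's path."""
    prefix = paths[root_id]
    return [cid for cid in thread_order(paths) if paths[cid][:len(prefix)] == prefix]

print(thread_order(paths))  # [1, 2, 4, 3, 5]
print(subtree(paths, 2))    # [2, 4]
```

Because ids grow over time, ordering by a path of ids gives "oldest first" within each level without any extra work.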

Our second solution is the recursive query. It took me a long time to understand how it works, but once I did, I was fascinated by what it can do. PostgreSQL offers a number of features that MySQL does not have, such as the OVER () clause for window functions. They really are very well done.

Let's dig deeper into our problem, which turned out to be a big one. Right now, Disqus and Reddit handle threading in a way that is just as simplistic as other solutions on the web. By simplistic I don't mean that the code is bad, but that it was not optimized the way it should have been. It wasn't until someone (ahem, Obama) started using the product and everyone wanted to reply to him that we found out something was wrong. Once again we fall back on Django: we pull the comments in (even as the threads keep getting bigger) and group them through business logic.

Since 8.4, we've been able to use recursive queries to solve this problem (in many cases we had already started doing this ourselves, albeit in a slightly more complex way), and it is quite simple.

So let's take a basic example. We have a comment model and it looks a bit like this:

CREATE TABLE comments (
  id SERIAL PRIMARY KEY,
  message VARCHAR,
  author VARCHAR,
  parent_id INTEGER REFERENCES comments (id)
);
-- Top-level comments have a NULL parent_id; replies point at their parent's id.
INSERT INTO comments (message, author, parent_id)
  VALUES ('This thread is really cool!', 'David', NULL), ('Ya David, we love it!', 'Jason', 1), ('I agree David!', 'Daniel', 1), ('Gift Jason', 'Anton', 2),
  ('Very interesting post!', 'the DZ', NULL), ('You sir, are wrong', 'Chris', 5), ('Agreed', 'G', 5), ('Fo sho, yall', 'Mac', 5);

What we have done here is set up a basic comment model: the message, the author, and the parent comment (which is optional). Now let's learn how to use a recursive query to easily reorder this data into threads, sorted by ID in ascending order.


WITH RECURSIVE cte (id, message, author, path, parent_id, depth) AS (
  -- Anchor: each top-level comment starts a path holding only its own id.
  SELECT id,
    message,
    author,
    ARRAY[id] AS path,
    parent_id,
    1 AS depth
  FROM comments
  WHERE parent_id IS NULL

  UNION ALL

  -- Recursive step: append each reply's id to its parent's path.
  SELECT comments.id,
    comments.message,
    comments.author,
    cte.path || comments.id,
    comments.parent_id,
    cte.depth + 1 AS depth
  FROM comments
  JOIN cte ON comments.parent_id = cte.id
)
SELECT id, message, author, path, depth FROM cte
ORDER BY path;

Sweet, isn't it? Oh, wait, are you confused? So was I: the queries I had been looking at were far more complicated, thanks to a whole bunch of surprising bugs. PgExperts pointed us down the right path.

Now, I'm not going to dig too deep, because there are better tutorials on how recursive queries work, but the query gives us the result we were after: every comment listed in thread order.

We're dealing with a huge data set, and some comments have thousands of replies. If 99% of comments have only 100 replies, pulling them into memory is not a problem, but as the threads grow we end up wasting a lot of time. Recursive queries in PostgreSQL let us simply hand this work to the database (which sometimes handles it much faster than we do), and save us a lot of time and resources spent on network transfer and web processing.
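Handing the threading to the database and fetching only one page of it might look like the sketch below. So that it runs anywhere, it makes several assumptions that are not in the original: SQLite (from Python's standard library) stands in for PostgreSQL, a zero-padded text path replaces the integer array, the table is cut down to three columns, and `fetch_page` is a hypothetical helper.

```python
import sqlite3

# Sample rows shaped like the comments table above: (id, author, parent_id).
ROWS = [
    (1, "David", None), (2, "Jason", 1), (3, "Daniel", 1), (4, "Anton", 2),
    (5, "the DZ", None), (6, "Chris", 5), (7, "G", 5), (8, "Mac", 5),
]

# SQLite has no array type, so the path is zero-padded text that sorts
# the same way the integer array does in PostgreSQL.
THREAD_PAGE_SQL = """
WITH RECURSIVE cte (id, author, path, depth) AS (
  SELECT id, author, printf('%08d', id), 1
  FROM comments
  WHERE parent_id IS NULL
  UNION ALL
  SELECT c.id, c.author, cte.path || '.' || printf('%08d', c.id), cte.depth + 1
  FROM comments AS c
  JOIN cte ON c.parent_id = cte.id
)
SELECT id, author, depth FROM cte ORDER BY path LIMIT ? OFFSET ?
"""

def fetch_page(conn, limit, offset=0):
    """Return one page of (id, author, depth) rows in thread order."""
    return conn.execute(THREAD_PAGE_SQL, (limit, offset)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments ("
    "id INTEGER PRIMARY KEY, author TEXT, "
    "parent_id INTEGER REFERENCES comments (id))"
)
conn.executemany("INSERT INTO comments (id, author, parent_id) VALUES (?, ?, ?)", ROWS)

first_page = fetch_page(conn, limit=3)
print(first_page)  # [(1, 'David', 1), (2, 'Jason', 2), (4, 'Anton', 3)]
```

The database still walks the tree, but only the requested page crosses the network and reaches the web process, and the `depth` column is enough to indent each row when rendering.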

Here is an example that gives a more intuitive idea of how efficient this is: we've seen an improvement of nearly 500% in SQL processing time on a large data set (returning 25 results instead of 1000). That does not even include our costs at the application level. Yes, that means these SQL statements run about 5 times faster at the database tier.

All in all, as a champion of MySQL, I was struck by the performance, scale, and flexibility that Disqus has achieved with PostgreSQL. I am looking forward to discovering what else we can do with this platform, and to the challenges that await us.
