Goodbye to MongoDB, hello PostgreSQL

Source: Internet
Author: User
Olery was founded almost five years ago. It started with a single product (OleryReputation) developed by Ruby proxy. Over time, we began to work on a series of different products and applications.

Olery was founded almost five years ago. It started with a single product (Olery Reputation) developed by Ruby proxy. Over time, we began to work on a series of different products and applications.

Olery was founded almost five years ago. It started with the single product (Olery Reputation) developed by Ruby proxy. Over time, we began to work on a series of different products and applications. Today, our products include not only (Olery) Reputation, but also Olery Feedback, Hotel Review Data API, and widgets, which can be embedded into websites and more products/services in the near future.

We have increased the number of applications. Today, we have deployed more than 25 different applications (all Ruby), some of which are web applications (Rails or Sinatra), but most of them are background running programs.

What we are most proud of is what we have achieved so far, but there is always one thing behind these achievements, that is, the basic database. Since the founding of Olery, We have installed a database, which uses MySQL to store (user, contract, etc.) core data, use MongoDB to store comments and similar data (that is, which data is easily restored when data is lost ). At the beginning, such installation and operation were very good. However, as the company grew up, we began to encounter various problems, especially MongoDB problems. Some of these problems are caused by the interaction between applications and databases, and some are caused by the Databases themselves.

For example, at a certain time point, we need to delete 1 million documents from MongoDB and then re-insert the data into MongoDB. This processing method makes the entire database almost locked for several hours, and the natural service performance will be reduced. It will not be unlocked until the database is repaired (that is, the repairDatabase command is executed on MongoDB. In addition, it takes several hours to complete the restoration, and the number of hours it takes depends on the size of the database.

In another instance, we noticed the performance degradation of our applications and managed to track MongoDB clusters. However, after further checks, we cannot find the real cause of the problem. No matter how we install the tool or what command we use, we cannot find the cause. It is not until we replace the primary election of the cluster that the performance returns to normal.

This is just two examples. We already have many such cases. At the core of this problem is that this is not only the database is running, but there is no absolute indication at any time as to what causes the problem.

No Mode Problems

In addition, the core issue we face is one of the important features of mongoDB: the lack of models. The lack of patterns may sound interesting and, in some cases, beneficial. However, for many usage of the schema-free storage engine, it causes internal problems between some modes. These models are not defined by your storage engine but by the behavior of your application and its possible needs.

For example, you may have a set of string-type title fields that your application requires. This pattern is very consistent with the current situation, even if it is not clearly defined. However, if this data result changes and times out, especially if the original data is not migrated to the new data structure, this becomes a problem (some non-mode storage engines are quite problematic ). For example, you may have the following Ruby code:

Post_slug = post. title. downcase. gsub (/\ W + /,'-')

In this way, it works normally for every document that has a title field and returns a String. However, for documents that use different field names (such as post_title) or do not have a title field at all, it will not work properly. To handle this situation, you need to adjust the code to the following:

If post. title
Post_slug = post. title. downcase. gsub (/\ W + /,'-')
Else
#...
End

Another way is to define a pattern in your model. For example, Mongoid, a popular MongoDB ODM for Ruby, allows you to do this. However, when you use these tools to define a schema, you may wonder why they do not define the schema in the database. In fact, this can solve another problem: reusability. If you only have one application, defining patterns in code is not a big problem. However, if you have many applications, this will soon become a big headache.

The schema-free storage engine wants to delete restrictions on the schema to make your work easier. But the reality is that the responsibility for ensuring data consistency is put on the user's own. Sometimes a non-mode engine can work, but I bet more times are counterproductive.

Good Database Requirements

With more special requirements, Olery forced me to seek a better database to solve the problem. For systems, especially databases, we pay great attention to the following points:

  • Consistency

  • Visualization of data and system behavior

  • Correctness and clarity

  • Scalable

  • Consistency is important because it helps us set clear expectations for the system. If the data is always stored in the same way, the system can easily use the data. If the database layer requires that one column of the table must exist, the application layer does not need to check whether the column data exists. Even if the database is under high pressure, the integrity of each operation must be ensured. Nothing is more frustrating than simply inserting data. After a few minutes, you cannot find the data.

    Visibility includes two points: the system itself and the degree to which data is obtained. If a system error occurs, debugging should be easy. In turn, the user should easily find the data to be queried.

    Correctness refers to the behavior of the system as we expected. If a field is defined as a numeric type, no one can insert text like it. In this regard, MySQL is notorious. Once you do this, you will get pseudo results.

    Scalability not only targets performance, but also involves financial aspects and how well the system can respond to changing needs. A system can hardly perform well without a large amount of money or slowing down the development cycle on which the system depends.

    Move from MongoDB

    With the above requirements in mind, we began to look for a database that replaces MongoDB. The features mentioned above are usually a core set of traditional RDBM features, so we have locked two candidates: MySQL and PostgreSQL.
    Originally, MySQL was the first candidate because some of our key data is already being stored. However, MySQL also has some problems. For example, when a field is defined as int (11), you can easily insert text data into the field because MySQL will try to convert it. The following are some examples:

    Mysql> create table example ('number' int (11) not null );
    Query OK, 0 rows affected (0.08 sec)

    Mysql> insert into example (number) values (10 );
    Query OK, 1 row affected (0.08 sec)

    Mysql> insert into example (number) values ('wat ');
    Query OK, 1 row affected, 1 warning (0.10 sec)

    Mysql> insert into example (number) values ('what is this 10 nonsense ');
    Query OK, 1 row affected, 1 warning (0.14 sec)

    Mysql> insert into example (number) values ('10 ');
    Query OK, 1 row affected, 1 warning (0.09 sec)

    Mysql> select * from example;
    + -------- +
    | Number |
    + -------- +
    | 10 |
    | 0 |
    | 0 |
    | 10 |
    + -------- +
    4 rows in set (0.00 sec)

    It is worth noting that MySQL will issue a warning in these cases. However, they are only warnings. They are usually ignored (if not always.

    Contact Us

    The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

    If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

    A Free Trial That Lets You Build Big!

    Start building with 50+ products and up to 12 months usage for Elastic Compute Service

    • Sales Support

      1 on 1 presale consultation

    • After-Sales Support

      24/7 Technical Support 6 Free Tickets per Quarter Faster Response

    • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.