Don't use MongoDB? Are you sure?!

Source: Internet
Author: User
Tags benchmark connection pooling mongodb mongodb sharding

Someone sent me an article, "Don't use MongoDB", a bitter account of one team's experience. I have translated the original below so you can read it. However, I think you should also read the 10gen CTO's reply to it, and go to Reddit to see what everyone is saying. A lot of people are commenting under the CTO's reply as well, and some programmers have started reading the MongoDB source code, hehe. It seems that some of the claims made about MongoDB are not true.

The 10gen CTO was not fully aware of the matter, and replied to each point of the article in turn. I have put the gist of each reply into the translated text. The most interesting part, though, is the programmers' discussion. I suggest you take a look.

Body

For various political reasons, I have kept quiet until now, but I now feel a responsibility to the community, so I want to warn you against putting your business on MongoDB.

My team used MongoDB on a system with a huge user base (a large company with tens of thousands of users), which put MongoDB under very heavy load. Early on, we believed MongoDB would deliver the long-term performance and scaling benefits that 10gen (the company behind MongoDB) advertises. We were wrong, and this rant (a lengthy complaint) is meant to keep you from believing those so-called success stories and making the same mistake we did. If it keeps someone from being fooled, it was worth writing this much. I hope it alerts more people.

Note that in our dealings with 10gen, they were enthusiastic, helpful, and very good to work with. But that is no reason for me to keep quiet about the failures of their product.

Why am I saying this?

A database must be correct, or as correct as possible, because database mistakes are far more severe than mistakes elsewhere. That is not only because of their impact on operations, performance, cost, and the value of the data, but also because of everything that depends on them. Migrating terabytes of data in a hurry is a lot more work than fixing a logic error in code. And it is a desperate feeling to have to recover terabytes of data after a system failure when your options are limited.

A database is a complex system that is like a black box for developers. You need absolute trust in the database you use: trust it to do the right thing, and to keep your data consistent and available.

Why is MongoDB popular?

To be fair, we must admit that MongoDB is popular for the following reasons:

    • It's very easy to run
    • A very loose schema model that maps easily onto JSON-like data structures. This is very appealing to programmers (it fits the way programmers think), and programmers are usually the people who make the technology choices on a project.
    • Maturity, robustness, documentation, testing against real use cases, and so on are what system administrators and operations professionals typically look for, since they prefer mature technology. Programmers making an early choice tend to weigh these less.
    • Its single-system, low-concurrency read benchmarks are very impressive, and for inexperienced evaluators this is basically the most important thing.

Now, you may be building a casual website, a prototype, or a project where only development speed matters. To be honest, for that kind of project it doesn't matter what technology you use; just get the job done.

However, if you want to build a large-scale system on MongoDB and run real business on it, please do not use MongoDB.

Why not?

1) By default, MongoDB uses an unsafe write mode to win benchmarks

If you do not call getLastError(), MongoDB returns without confirming that the write has completed, which introduces at least two problems:

    • In a concurrent environment (connection pooling, etc.), a read issued after a write has "finished" can still fail to see it. MongoDB has no "fence" that tells you when the write has actually been committed.
    • An unknown number of save operations can be dropped, because saves are queued in various places, such as TCP buffers. Whatever is still queued is discarded when your connection to the database drops.

10gen CTO reply: This has nothing to do with benchmarks. It is an API design decision that leaves the choice to the user, because there are many ways to write.
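To make the difference between the two write modes concrete, here is a minimal sketch in Python. It is a toy in-memory model, not MongoDB's wire protocol or driver API: the classes `ToyServer` and `ToyConnection` and their methods are hypothetical names invented for illustration. It only shows why queued, unacknowledged writes can vanish without the caller noticing, while an acknowledged write either succeeds or raises.

```python
from collections import deque


class ToyServer:
    """In-memory stand-in for a database server (hypothetical, not MongoDB)."""

    def __init__(self):
        self.stored = []

    def apply(self, doc):
        self.stored.append(doc)
        return True  # the acknowledgement


class ToyConnection:
    """Buffers outgoing writes, roughly the way a socket buffer might."""

    def __init__(self, server):
        self.server = server
        self.buffer = deque()
        self.alive = True

    def send_unacknowledged(self, doc):
        # Fire-and-forget: returns as soon as the write is queued locally.
        self.buffer.append(doc)

    def send_acknowledged(self, doc):
        # Fail loudly if the connection is gone, otherwise flush the queue
        # and wait for the server to confirm this write.
        if not self.alive:
            raise ConnectionError("connection dropped before the write was confirmed")
        self.flush()
        return self.server.apply(doc)

    def flush(self):
        while self.alive and self.buffer:
            self.server.apply(self.buffer.popleft())

    def drop(self):
        # Simulate a dropped connection: queued writes are silently discarded.
        self.alive = False
        lost = len(self.buffer)
        self.buffer.clear()
        return lost


server = ToyServer()
conn = ToyConnection(server)
conn.send_acknowledged({"_id": 1})    # confirmed by the server
conn.send_unacknowledged({"_id": 2})  # only queued locally
conn.send_unacknowledged({"_id": 3})  # only queued locally
lost_writes = conn.drop()             # the caller was never told about these
```

In this model the two unacknowledged documents never reach the server, and the only code path that reports a failure is the acknowledged one. That is the trade-off the two sides are arguing about: the default is fast, but the caller has to opt in to learn whether a write landed.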

2) MongoDB loses data in alarming ways

Here is a list of the data-loss scenarios we have been through:

    • Data went missing, for reasons unknown.
    • Recovery of a damaged database did not succeed (this was before the transaction log existed).
    • Replication between master and slave nodes left the slaves missing data. Yes, there is no checksum, and yes, the data still appeared to have been replicated.
    • Replication sometimes just stops, with no error. Monitor your replication status.

10gen CTO reply: 1) We have never had a data-loss bug that we did not fix right away. Can you give us the ticket number of the issue you are referring to? We must at least be clear about what happened. If it is our problem, we will fix it right away. 2) Isn't it quite normal for recovery from a damaged database to fail? You would be better off with a master-slave setup so that each server has a backup. 3) Please tell us the ticket number; we have never received such a bug report. If it exists, it is indeed serious. 4) It is possible that no notification is given when an error condition occurs. In addition, you can monitor replication of write operations by passing w=2 to getLastError.
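Both sides agree that replication has to be watched. The sketch below shows, under stated assumptions, the two checks being discussed: flagging secondaries that have fallen too far behind, and deciding whether a write has been confirmed by at least `w` nodes (with w=2, the primary plus one secondary). The function names, the node names, and the idea of passing plain timestamps in seconds are all assumptions for illustration; how you obtain those values from a real deployment is outside this sketch.

```python
def lagging_secondaries(primary_optime, secondary_optimes, max_lag_seconds):
    """Return {name: lag_seconds} for secondaries more than max_lag_seconds behind.

    primary_optime and the values of secondary_optimes are timestamps in
    seconds of the last operation each node has applied (hypothetical inputs).
    """
    report = {}
    for name, optime in secondary_optimes.items():
        lag = primary_optime - optime
        if lag > max_lag_seconds:
            report[name] = lag
    return report


def write_is_replicated(acknowledged_nodes, w):
    """True when at least w distinct nodes (primary included) confirmed the write."""
    return len(set(acknowledged_nodes)) >= w


# Hypothetical snapshot: secondary-b stopped applying operations long ago.
report = lagging_secondaries(
    primary_optime=1000,
    secondary_optimes={"secondary-a": 995, "secondary-b": 400},
    max_lag_seconds=60,
)
only_primary = write_is_replicated(["primary"], w=2)
primary_plus_one = write_is_replicated(["primary", "secondary-a"], w=2)
```

A check like the first one would catch the "replication stopped with no error" case, because the lag keeps growing even though nothing is reported as failed.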

3) MongoDB takes a global write lock for every write operation

This is a killer when writes are frequent. If you run a blog, you may not care, because your read and write volume is low.

10gen CTO reply: The read-write lock has always been a problem, but 2.0 will be better, and 2.2 will be better still.
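Why a global write lock hurts under frequent writes can be shown with simple arithmetic. The sketch below is an idealized model, not a measurement and not a description of any MongoDB version: it assumes each write holds its lock for a fixed time, and that with finer-grained locks writes to different collections run fully in parallel. The function name, the collection names, and the durations are made up for illustration.

```python
def total_write_time(writes, global_lock):
    """Wall-clock time to finish all writes in this toy model.

    writes: list of (collection_name, milliseconds_holding_the_lock).
    global_lock=True  -> every write waits for every other write.
    global_lock=False -> only writes to the same collection wait for each other.
    """
    if global_lock:
        return sum(ms for _, ms in writes)
    per_collection = {}
    for collection, ms in writes:
        per_collection[collection] = per_collection.get(collection, 0) + ms
    return max(per_collection.values(), default=0)


# Hypothetical workload, lock hold times in milliseconds.
writes = [("users", 30), ("posts", 20), ("logs", 50), ("users", 10)]
serialized_ms = total_write_time(writes, global_lock=True)      # 110
per_collection_ms = total_write_time(writes, global_lock=False)  # 50
```

In the model, the global lock makes total time grow with the sum of all writes, while finer locking is bounded by the busiest collection. A low-write blog never notices the difference; a write-heavy system does.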

4) MongoDB sharding (partitioning) stops working under high load

Adding a shard under high load is a nightmare. Mongo either moves its chunks so fast that the resulting traffic eats the bandwidth like a DoS attack, or it refuses to move any more chunks at all. That makes a high-traffic site with heavy writes suffer.

10gen CTO reply: If the system is already beyond its capacity, moving data will certainly be difficult. I say this very clearly every time I speak: do not wait until the system has run out of performance to add a shard. That will not work.

5) mongos is unreliable

The mongod/config server/mongos architecture is reasonable and clever. Unfortunately, mongos is complete garbage. Under load, it crashes from time to time, sometimes after a few hours, sometimes after several days. Process-restart monitoring does not always help, because mongos sometimes throws an assertion that kills a critical thread while the process itself keeps running. Double fail.

Worst of all, the only workaround was to put haproxy (a load balancer) in front of a bunch of mongos instances and run a job that slowly cycles through them, periodically killing each one so that a fresh instance gets started. I'm not joking.

10gen CTO reply: We know of no such thing. Can you tell me more details?

6) MongoDB once even deleted an entire database

In MongoDB 1.6, in a replicated configuration, the wrong node (often an empty one) was sometimes chosen as the node with the most recent data. The data on the other nodes was then wiped out (I'm talking about 700GB of good data), because the empty node's data was synchronized back onto the nodes that had data. A database is never supposed to do this. In this situation the database should throw an error and let the DBA choose a reasonable action, or force the correct configuration to be used, instead of deleting all the data (that was a bad day).

They fixed the problem in 1.8, thank God.
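The behaviour the author asks for, refusing a destructive sync and handing the decision to the DBA, can be sketched in a few lines. This is a hypothetical guard written for illustration, not code from MongoDB: the names `plan_sync` and `UnsafeSyncError`, and the use of document counts as the signal, are assumptions.

```python
class UnsafeSyncError(Exception):
    """Raised when a sync would replace existing data with an empty source."""


def plan_sync(source_doc_count, target_doc_count, force=False):
    """Decide whether the target may be overwritten with the source's data.

    Refuses when the source is empty but the target holds data, unless the
    operator explicitly passes force=True to take responsibility.
    """
    if source_doc_count == 0 and target_doc_count > 0 and not force:
        raise UnsafeSyncError(
            "source is empty but target holds %d documents; "
            "refusing to sync without force=True" % target_doc_count
        )
    return "sync"


# An empty node picked as the source against a node that has data.
try:
    plan_sync(source_doc_count=0, target_doc_count=5_000_000)
    refused = False
except UnsafeSyncError:
    refused = True

normal = plan_sync(source_doc_count=5_000_000, target_doc_count=0)
forced = plan_sync(source_doc_count=0, target_doc_count=5_000_000, force=True)
```

The point of the design is that the dangerous path needs an explicit human decision (`force=True`), while the ordinary case of filling an empty node proceeds without ceremony.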

10gen CTO reply: We cannot find any such incident, nor the corresponding code. Can you give more information?

7) Releasing things that should not have been released

Embarrassing known bugs that could cause data problems made it into stable releases, and we were only told about them after they had hit us, and then only because we had bought 10gen's top-tier Platinum technical support. Their response was to send us a hot patch, something they treated internally as an RC, and then have us run that hot patch on our data.

10gen CTO reply: Regarding Platinum technical support, every issue we take on is made public, and so is the fix. Without specifics, this kind of thing is hard to discuss. We respond differently depending on the situation. We want our users' problems solved as soon as possible.

8) Replication struggles on busy servers

Replication often either hits the master like a DoS attack, or is so slow that it takes a huge amount of time and the oplog is nearly exhausted (even with a 50GB oplog).

We had a busy, large data set that we simply could not replicate because of this dynamic. It was a painful month or two of crossing our fingers (translator's note: a good-luck gesture) before we moved to a different database system.
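The oplog exhaustion described above comes down to arithmetic: a fixed-size oplog only holds a certain number of hours of writes, and a slave that needs longer than that to copy the data can never catch up. A rough sketch follows. Only the 50GB oplog size comes from the text; the write rate and sync duration are made-up numbers, and the function names are hypothetical.

```python
def oplog_window_hours(oplog_size_gb, oplog_write_gb_per_hour):
    """Hours of history a fixed-size oplog can hold at a steady write rate."""
    if oplog_write_gb_per_hour <= 0:
        raise ValueError("write rate must be positive")
    return oplog_size_gb / oplog_write_gb_per_hour


def can_catch_up(oplog_size_gb, oplog_write_gb_per_hour, initial_sync_hours):
    """True if the initial copy finishes before the oplog wraps around."""
    window = oplog_window_hours(oplog_size_gb, oplog_write_gb_per_hour)
    return initial_sync_hours < window


# 50GB oplog (from the text); 10 GB/hour and the sync times are assumptions.
window = oplog_window_hours(50, 10)
slow_sync_ok = can_catch_up(50, 10, initial_sync_hours=8)
fast_sync_ok = can_catch_up(50, 10, initial_sync_hours=3)
```

With these assumed numbers the oplog covers five hours, so a sync that needs eight hours fails no matter how often it is retried. The only levers are a larger oplog, a lower write rate, or a faster copy.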

10gen CTO reply: This looks like an overloaded server. I have mentioned this before.

But the worst problem is this:

You might say that my problems are in the past, that they have fixed all of them or will fix them in the next release, or that problem X can be mitigated by practice Y, and so on.

Unfortunately, none of that matters.

The real problem is that so many of these problems existed in the first place. Database developers must be held to a higher standard than the average programmer. In other words, your priorities should look like this:

    1. Don't lose data; be completely certain about the data.
    2. Ensure availability through sound practices
    3. Multi-node scalability
    4. Minimize latency at the 99th and 95th percentiles
    5. Raw requests per second per resource

10gen's order seems to be #5 first, then everything else in whatever order, with #1 not in the top 3.
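Priority 4 is about tail latency rather than the typical request. For readers who have not computed percentiles before, here is a small sketch using the nearest-rank definition (other definitions interpolate and give slightly different values). The function name and the sample latencies are made up for illustration.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of
    all samples at or below it. pct must be in (0, 100]."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds: mostly fast, two very slow.
latencies_ms = [12, 15, 11, 14, 13, 250, 16, 12, 13, 900]
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
```

In this made-up sample the median is 13 ms while the 95th and 99th percentiles are both 900 ms. A benchmark that reports only the median or the raw request rate would hide exactly the requests that users complain about.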

10gen CTO reply: This is obviously not true. Take a look at the code we commit and at our fixes. We never hide a bug in a release. If we cared that much about performance benchmarks, we would have put our energy into solving the locking problems so that multithreaded workloads would be faster.

MongoDB is new, and there is still a lot to polish. If you would like to meet us, you are welcome to.

These failures, and the priorities they imply, point to a fundamental corporate culture problem that will make such issues show up in release after release: they lack the design discipline that a database system requires.

Please consider these warnings carefully.
