Some complaints about MongoDB

Source: Internet
Author: User
Tags: mongodump

First, a disclaimer: our use of MongoDB is not in-depth and I do not have much experience, so the complaints below are not necessarily fair. Some of the problems may stem from incorrect usage on our part; for some we simply failed to find a suitable solution; and some of our scenarios may not be a good fit for MongoDB at all. In short, please do not treat this as a reference for choosing (or rejecting) MongoDB. If you are a MongoDB expert, please do not hesitate to correct me.

0. Background

We use MongoDB to store historical air-ticket query data for later analysis, which is a typical OLAP workload. Once written, the data is never updated, only read. Individual documents are huge: even after simplification, a JSON document runs from tens to hundreds of KB (I suspect MongoDB is not well suited to documents this large).

We initially deployed MongoDB on three nodes as a Replica Set. Two of them actually carry data; the third acts as an Arbiter and only votes.

Later, the Secondary was sent out for repair after a serious hardware fault, so the Replica Set was effectively running on a single node. After the machine came back, the Secondary could not catch up with the Primary, and rebuilding it turned out to be impractical for reasons detailed below.

The current situation: the original Primary runs as a single node (the Arbiter is now useless), its disk partition is nearly full, and migration is urgent. The Secondary is a freshly installed instance; we plan to cut over to it directly and run two standalone instances without a Replica Set.

1. Terrible disk space usage

As mentioned above, a single document runs from tens to hundreds of KB, and roughly 1 to 2 million such documents are generated every day. The database currently holds hundreds of millions of records, occupying more than 5 TB of a 6 TB partition.
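To put those numbers in perspective, here is a back-of-the-envelope sketch. The per-document sizes and daily counts below are rough figures taken from the paragraph above, not measurements:

```python
# Rough daily growth estimate from the figures above (illustrative only).
docs_per_day_low, docs_per_day_high = 1_000_000, 2_000_000
doc_kb_low, doc_kb_high = 20, 100  # "tens to hundreds of KB"; exact sizes unknown

kb_per_gb = 1024 * 1024
low_gb_per_day = docs_per_day_low * doc_kb_low / kb_per_gb
high_gb_per_day = docs_per_day_high * doc_kb_high / kb_per_gb
print(f"~{low_gb_per_day:.0f} GB to ~{high_gb_per_day:.0f} GB of raw JSON per day")
```

Even at the low end, that fills a 6 TB partition within months, which matches the urgency described here.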

Perhaps because of its Memory Mapped File design, MongoDB occupies a large amount of disk space, and there is no simple way to release it. What exactly is the situation?

Disk space that MongoDB has claimed from the operating system is never returned. The only remedy is to run repairDatabase, and repairDatabase demands free disk space roughly equal to your data size before it will proceed:

repairDatabase requires free disk space equal to the size of your current data set plus 2 gigabytes.

If you are as unlucky as we are, where MongoDB occupies 5 TB, the actual data is 3.5 TB, and no partition has even 3 TB free, then congratulations: you cannot run repairDatabase to reclaim the 1-plus TB of wasted space.
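The free-space rule quoted above is easy to check mechanically. A minimal sketch, using the sizes from our case:

```python
def can_repair(data_bytes: int, free_bytes: int) -> bool:
    """repairDatabase needs free space >= current data set size + 2 GB."""
    required = data_bytes + 2 * 1024**3
    return free_bytes >= required

TB = 1024**4
# Our situation: 3.5 TB of actual data, but no partition has even 3 TB free.
print(can_repair(data_bytes=int(3.5 * TB), free_bytes=3 * TB))  # False: cannot run
```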

What about compact? Sorry, compact does not release disk space at all:

compact requires up to 2 gigabytes of additional disk space while running. Unlike repairDatabase, compact does not free space on the file system.

We have since switched to another server, so the old instance will no longer grow, but how do I get that 1 TB back? And what happens when the MySQL instance on the same server needs the space......

2. Initial Sync

As mentioned above, our Secondary machine was sent out for repair. When it came back, the data gap was too large for it to catch up with the Primary, so it needed a full re-initialization. This is also very painful, because initial sync is so slow that the official documentation does not recommend it: by the time the sync finishes, the node may already have fallen too far behind the master again.

If your database is large, initial sync can take a long time to complete. For large databases, it might be preferable to copy the database files onto each host.

The recommended method is to copy the data files directly, but the files must be consistent, so it is best to combine this with a mechanism like an LVM snapshot and copy from the snapshot. Unfortunately, we are not using LVM but ext4 directly, and after a long session of Googling I found no way to take a snapshot of a bare ext4 filesystem (is it even supported?).

Well, it seems the only option is to stop the service and copy the files. Transferring nearly 6 TB of data over a gigabit NIC would take roughly 24 hours...... On second thought, many of those data files should be unused preallocated space, and the actual data should not be that big. Is there another way, such as mongodump?
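The 24-hour figure is easy to sanity-check. At the gigabit line rate the theoretical minimum is about 13 hours; the effective throughput of 70 MB/s below is an assumption (disk and protocol overhead), not a measurement, but it lands right around a day:

```python
data_bytes = 6 * 10**12  # ~6 TB of data files
line_rate = 125 * 10**6  # 1 Gbit/s = 125 MB/s theoretical maximum
effective = 70 * 10**6   # assumed realistic throughput with overhead

print(f"theoretical: {data_bytes / line_rate / 3600:.1f} h")  # ~13.3 h
print(f"realistic:   {data_bytes / effective / 3600:.1f} h")  # ~23.8 h
```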

Then I learned that data from mongodump can only initialize a standalone master, not a secondary in a Replica Set...... Wonderful.

So right now I do not know how to add a member to a Replica Set that has only one Primary and a large data volume, like ours.

3. Very poor performance

In our use case, MongoDB's performance is nowhere near what the various benchmarks on the Internet claim. The logs show that inserting a single document takes anywhere from 100 ms to over 1000 ms. From the server logs:

[conn205885] insert queryhistory.queryDetail ninserted:1 keyUpdates:0 locks(micros) w:18173 1930ms
[conn205882] insert queryhistory.queryDetail ninserted:1 keyUpdates:0 locks(micros) w:5385 1841ms
[conn205879] insert queryhistory.queryDetail ninserted:1 keyUpdates:0 locks(micros) w:37265 1869ms
[conn205902] insert queryhistory.queryDetail ninserted:1 keyUpdates:0 locks(micros) w:566 1870ms
[conn205900] insert queryhistory.queryDetail ninserted:1 keyUpdates:0 locks(micros) w:71031 294ms
[conn205880] insert queryhistory.queryDetail ninserted:1 keyUpdates:0 locks(micros) w:15367 335ms
[conn205911] insert queryhistory.queryDetail ninserted:1 keyUpdates:0 locks(micros) w:1464 337ms
[conn205894] insert queryhistory.queryDetail ninserted:1 keyUpdates:0 locks(micros) w:20006 348ms
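The durations are easy to pull out of such logs mechanically. A small sketch that parses the trailing millisecond field of the log format shown above (only two sample lines included here):

```python
import re

log_lines = [
    "[conn205885] insert queryhistory.queryDetail ninserted:1 keyUpdates:0 locks(micros) w:18173 1930ms",
    "[conn205900] insert queryhistory.queryDetail ninserted:1 keyUpdates:0 locks(micros) w:71031 294ms",
]

def duration_ms(line: str) -> int:
    """Extract the total operation time from the trailing '<n>ms' field."""
    m = re.search(r"(\d+)ms$", line)
    if m is None:
        raise ValueError(f"no duration in line: {line!r}")
    return int(m.group(1))

times = [duration_ms(line) for line in log_lines]
print(max(times), sum(times) / len(times))  # 1930 1112.0
```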

Query performance is even worse. A very simple query that hits an index often takes a long time, and in extreme cases the results come back only after several minutes.

Of course, I suspect the poor performance is related to the large size of individual documents; MongoDB may simply not be suited to documents this large.

4. Summary

Again, I am not claiming to be right about all of this, but these are the unpleasant things we ran into. In hindsight, I do not think we should have chosen MongoDB in the first place; it is not a good fit for our scenario. If I could start over, I think MySQL plus some table-sharding scheme would be enough for our data volume, or we could directly study distributed solutions such as Spark + Shark. I would really like to explore that direction if I get the time and opportunity.

Thanks to the original author for sharing these complaints about MongoDB.
