MongoDB optimization: some details to note

Source: Internet
Author: User
Tags: mongo shell

Here is a summary of my experience using MongoDB over this period, with a few points worth noting.

1. System and MongoDB parameter settings

The key MongoDB parameters are storageEngine and directoryPerDB, and both must be chosen at the very beginning, because they cannot be changed afterwards.

directoryPerDB stores each database in its own folder, which makes later backup and data migration much easier.

The storage engine (storageEngine) defaults to MMAPv1; the WiredTiger engine added in 3.0 is recommended. In practice WiredTiger uses about half the disk space of MMAPv1, and about one fifth for indexes; query speed is much higher; and, more importantly, the engine provides document-level locking, so reads do not have to block while a collection is inserting or updating data. The only problem is that few GUI tools on the market support querying WiredTiger storage: MongoVUE cannot find collections stored by the engine, and NoSQL Manager for MongoDB can, but requires the .NET environment. Personally I find the mongo shell and a working knowledge of MongoDB commands quite sufficient, so I strongly recommend the WiredTiger engine.
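For reference, a minimal sketch of starting mongod with these two options (the data path is a placeholder):

mongod --dbpath /data/db --storageEngine wiredTiger --directoryperdb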

2. No need to split collections horizontally

Having always worked with relational databases, where a table is commonly split up once its row count grows very large, it felt natural to assume the same trick would help with MongoDB, so our system generated sub-tables dynamically. We later found that the performance this bought MongoDB was far outweighed by the added maintenance cost.

Analyzing why table splitting helps a relational database so much: in many relational databases one table is one file, and splitting avoids the slow reads caused by a single file growing too large. MongoDB does not store data that way, so the argument does not hold.

Experience also taught me that MongoDB depends heavily on indexes; if the collections cannot all be designed up front, the later indexes have to be created by script. Here is a script that dynamically creates indexes for large MongoDB collections:

// Find every collection whose name starts with "info_" and that holds more
// than 1,000,000 documents, and create the three sort indexes on it.
(function () {
    var infos = [];
    var collNames = db.getCollectionNames();
    for (var i = 0; i < collNames.length; i++) {
        var collName = collNames[i];
        var collSize = db.getCollection(collName).count();
        if (collSize > 1000000 && collName.indexOf("info_") == 0) {
            db.getCollection(collName).ensureIndex(
                { publishDate: -1, blendedScore: -1, publishTime: -1, isRubbish: 1 },
                { name: "scoreSortIdx", background: true });
            db.getCollection(collName).ensureIndex(
                { similarNum: -1, publishTime: -1, isRubbish: 1 },
                { name: "hotSortIdx", background: true });
            db.getCollection(collName).ensureIndex(
                { publishTime: -1, isRubbish: 1 },
                { name: "timeSortIdx", background: true });
            infos.push("name: " + collName + " index creation succeeded");
        }
    }
    return infos;
}());

So dynamic index creation is still solvable, but one real pit is that sharding becomes completely impossible: sharding requires naming the collection to shard and its shard key up front, and that cannot be specified dynamically. Therefore MongoDB collections need no horizontal splitting (at least not at tens of millions of documents; beyond that, shard directly); just separate collections by business domain.
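If a single collection really does outgrow one machine, sharding is the replacement for splitting. A minimal sketch from the mongos shell, where the database name mydb and the shard key publishTime are illustrative:

sh.enableSharding("mydb")                              // enable sharding for the database
sh.shardCollection("mydb.info", { publishTime: 1 })    // shard the collection on the chosen key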

3. Use capped collections

Some people use MongoDB as a data cache, caching a fixed amount of data, yet still use ordinary collections and periodically clean out old data. A capped collection performs much better in this case.
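A minimal sketch of creating a capped collection for such a cache (the name and sizes are illustrative); the oldest documents are evicted automatically in insertion order, so no cleanup job is needed:

db.createCollection("cache", {
    capped: true,
    size: 1024 * 1024 * 100,   // maximum size in bytes (here 100 MB)
    max: 100000                // optional cap on document count
})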

4. Be sure to use a replica set in production

Many people still run a standalone instance in production. It is quick to deploy, but it leaves many of MongoDB's built-in features unused, such as automatic failover and read/write separation, and those matter greatly for later system scaling and performance tuning. In my view, MongoDB is chosen precisely when the data volume has reached a certain level and query performance matters, so I strongly recommend going live with a replica set from the start.
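A minimal sketch of initiating a three-member replica set from the mongo shell (the set name and hostnames are placeholders; each mongod must already be running with --replSet rs0):

rs.initiate({
    _id: "rs0",
    members: [
        { _id: 0, host: "mongo1:27017" },
        { _id: 1, host: "mongo2:27017" },
        { _id: 2, host: "mongo3:27017" }
    ]
})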

5. Learn to use explain

I used to be accustomed to querying with GUI tools; now I find it better to query with mongo shell commands and use explain to inspect the query plan. The hint command is also useful when searching for an optimal index.

db.info.find({
    publishDate: { $gte: 20160310, $lte: 20160320 },
    isRubbish: { $in: [0, 1] },
    title: { $regex: ".*test.*" },
    $or: [{ useId: 10 }, { groupId: 20 }]
}).explain("executionStats");
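As an example of hint, this forces a query through one of the indexes created by the script in section 2, so its plan can be compared against the optimizer's own choice (the field values are illustrative):

db.info.find({ publishTime: { $gte: 20160310 }, isRubbish: 0 })
    .hint("timeSortIdx")            // force the timeSortIdx index created earlier
    .explain("executionStats")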

6. Write-heavy systems cannot use read/write separation

Our system writes heavily, so write locks of various kinds appeared constantly (these generally block reads), while our consistency requirements were loose (mostly background writes with foreground reads, so some delay is acceptable). We therefore wanted to use the replica set for read/write separation. In real testing, though, reads on the secondaries were frequently blocked. db.currentOp() showed a recurring op:none operation requesting the global write lock, with every other operation stuck in waitingForLock:true. Googling the problem for a long time turned up no solution, until the following pit appeared in the official documentation's FAQ on concurrency:

How does concurrency affect secondaries?

In replication, MongoDB does not apply writes serially to secondaries.
Secondaries collect oplog entries in batches and then apply those
batches in parallel. Secondaries do not allow reads while applying the
write operations, and apply write operations in the order that they
appear in the oplog.

So while a MongoDB secondary is applying the primary's oplog, reads are blocked, which essentially rules out reading from secondaries in this workload; several days of effort wasted. This is why MongoDB officially does not recommend read/write separation; the pit was here all along... In fact, a write-heavy, read-light workload does not need read/write separation, because the performance bottleneck is mainly the writes; reads generally consume few resources (and the WiredTiger engine locks at the document level, so lock contention is much lower). The official recommendation is sharding, which effectively distributes writes across multiple servers, raising write throughput and letting the system scale horizontally.
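For diagnosing stalls like this, db.currentOp() accepts a filter document. A minimal sketch that lists only the operations stuck waiting for a lock:

db.currentOp({ waitingForLock: true }).inprog.forEach(function (op) {
    print(op.opid + "  " + op.op + "  " + op.ns);   // operation id, type, namespace
})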

7. Don't let the disk fill up

Start paying attention at 80% usage: if your data grows especially fast, failing to expand the disk in time can bring MongoDB down. If the data volume is large, use sharding rather than just a replica set, plan disk capacity ahead of time, and even with sharding expand early, because chunk migration is still slow.
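A minimal sketch for keeping an eye on data size from the shell, assuming a database named mydb:

var stats = db.getSiblingDB("mydb").stats();
print("dataSize (MB): " + Math.round(stats.dataSize / 1024 / 1024));
print("storageSize (MB): " + Math.round(stats.storageSize / 1024 / 1024));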

8. Security risk

MongoDB does not prompt you to set a password by default, so if you leave it unconfigured and expose MongoDB to the public network, then "congratulations": your server may already have become someone's zombie machine.
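A minimal sketch of locking an instance down (the username and password are placeholders):

// 1. Create an administrative user while auth is still off
db.getSiblingDB("admin").createUser({
    user: "admin",
    pwd: "CHANGE_ME",                          // placeholder; use a strong password
    roles: [{ role: "root", db: "admin" }]
})
// 2. Then restart mongod with authentication enabled:
//    mongod --auth --dbpath /data/db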

9. Database-level locks

MongoDB's locking mechanism differs greatly from relational databases such as MySQL (InnoDB) and Oracle. InnoDB and Oracle provide row-level locking, while MongoDB (with the MMAPv1 engine) provides only database-level locking, meaning that while a write lock is held, other read and write operations must wait.

At first glance, database-level locking looks like a serious problem under heavy concurrency, yet MongoDB still maintains high concurrency and performance, because although its lock granularity is coarse, its lock handling differs greatly from relational databases, mainly in that:

    • MongoDB has no full transaction support; atomicity applies only at the single-document level, so operations are usually fine-grained;

    • the time a MongoDB lock is actually held is the time to compute and change data in memory, which is usually fast;

    • MongoDB locks have a temporary yield mechanism: when an operation must wait on slow disk I/O, it can temporarily release the lock and reacquire it after the I/O completes.

"Usually no problem" does not mean "never a problem". Inappropriate data operations can still hold the write lock for a long time, such as the foreground index build mentioned below; when that happens, the entire database is completely blocked and can perform no reads or writes at all, which is very serious.

To solve this, avoid operations that hold the write lock for a long time. If some collection's operations are hard to avoid, consider moving that collection into a separate MongoDB database: locks in different MongoDB databases are isolated from each other, so separating the collection prevents one collection's operations from blocking everything globally.
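For example, the foreground index build mentioned above can be avoided by passing background: true, which yields the write lock periodically instead of holding it for the whole build (the collection and field names are illustrative):

db.info.ensureIndex({ publishDate: -1 }, { background: true })

A background build takes longer overall, but the database stays readable and writable while it runs.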


