Suggestions for creating indexes in NoSQL MongoDB, and explain optimization analysis

Source: Internet
Author: User



First, MongoDB indexes are very similar to MySQL indexes, and many MySQL index-optimization techniques also apply to MongoDB.
 

Second, and more importantly, these indexing recommendations are general and of limited use for any specific application.

The best indexing strategy for an application depends on many important factors, including the types of queries you run, the ratio of reads to writes, and even the amount of free memory on your server. In other words, you need to do a lot of testing and analysis against your production workload to arrive at the optimal indexing strategy; there is no good substitute for practical experience.


Index Policy

Below are some basic indexing principles.

The created index must match the query.

If you query on only a single field, index that field. For example:

db.posts.find({ slug : 'state-of-mongodb-2010' })

In this example, a unique index is best:

db.posts.ensureIndex({ slug: 1 }, {unique: true});

However, queries usually involve multiple keys plus sorting. In that case a compound index is best. For example:

db.comments.find({ tags : 'mongodb'}).sort({ created_at : -1 });

The created index is as follows:

db.comments.ensureIndex({tags : 1, created_at : -1});

Note that if we instead sorted created_at in ascending order, the index would be less efficient.

Each query uses only one index.

It is sometimes thought that a query on multiple keys can use multiple indexes; in MongoDB this is not the case.

If you have a query that selects on multiple keys and you want it to use an index effectively, create a compound index.

Make sure that all indexes are in RAM.

The shell provides a command to view the total index size:

db.comments.totalIndexSize();
65443

If your query is slow, you should check whether all the indexes are saved in RAM.

For instance, if you are running on a machine with 4 GB of RAM and have 3 GB of indexes, the indexes may not all fit in RAM.

You need to add RAM and/or verify the actual index usage.
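For a rough check in the shell, you can compare total index sizes with the machine's RAM. A minimal sketch, reusing the comments collection from the example above (db.stats() reports sizes in bytes by default):

db.comments.totalIndexSize();   // index size for a single collection, in bytes
db.stats().indexSize;           // combined index size for the current database, in bytes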

Be careful about the low selectivity of single-key indexes.

Suppose you have a field named 'status' with only two values, new and processed.

An index on status alone is a low-selectivity index.

It gives queries little advantage while occupying a large amount of space.

A better strategy, depending of course on your specific query requirements, is to create a compound index that includes the low-selectivity field.

For example, you could create a compound index on the status and created_at fields.
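A minimal sketch of that option, assuming a hypothetical orders collection that has the status and created_at fields mentioned above:

db.orders.ensureIndex({ status: 1, created_at: -1 });
db.orders.find({ status: 'new' }).sort({ created_at: -1 });   // one index serves both the filter and the sort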

Another option, again depending on your needs, is to split the data into separate collections, one per status.

Of course, these suggestions must be tested and the best solution should be selected.

Use explain. MongoDB provides an explain command to see how a query executes and whether an index is used. explain can be called from a driver or in the shell:
db.comments.find({ tags : 'mongodb'}).sort({ created_at : -1 }).explain();
A lot of useful information is returned, including the number of documents scanned and returned, the number of milliseconds taken, the indexes the optimizer tried, and the index it finally used.
 

If you have never used explain, start using it.

Understanding explain.

The explain output mainly contains the following fields:

  • cursor: the cursor type is either BasicCursor or BtreeCursor; the latter means an index was used.
  • nscanned: the number of documents scanned.
  • n: the number of documents returned by the query. You want n to be close to nscanned, to avoid a collection scan, that is, reading every document.
  • millis: the number of milliseconds the query took to complete. This is useful for comparing indexed and non-indexed performance.
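For illustration, a trimmed sketch of what such output might look like in the older (pre-3.0) explain format, for the comments query above; the values are made up:

{
    "cursor" : "BtreeCursor tags_1_created_at_-1",
    "nscanned" : 12,
    "n" : 12,
    "millis" : 0
}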
Pay attention to the read/write ratio of applications

This is important because adding indexes makes inserts, updates, and deletes slower.

If your application is read-heavy, indexes are very worthwhile.

However, if your application is write-heavy, be careful when creating indexes, since every added index affects write performance.

In general, do not add indexes indiscriminately; add them according to your queries.

There are always many considerations when adding indexes, and plenty of testing is needed to choose an appropriate indexing strategy.

Index features

Compound indexes have a number of properties to remember.

The following examples assume a compound index on fields a, b, and c, created with this statement:

db.foo.ensureIndex({a: 1, b: 1, c: 1})

 

1. The sort column must be the last column used in the index.

Okay:

  • find({a: 1}).sort({a: 1})
  • find({a: 1}).sort({b: 1})
  • find({a: 1, b: 2}).sort({c: 1})

Bad:

  • find({a: 1}).sort({c: 1})
  • Even though c is the last column of the index, a is the last column actually used, so you can only sort on a or b.
Rules 2, 3, and 4 no longer apply to MongoDB 1.6+; we recommend using version 1.6 or later.

5. MongoDB's $ne and $nin operators cannot make effective use of indexes.
  • If you only need to exclude a small number of documents, the better approach is to query the matching results from MongoDB and do the exclusion in your application, as sketched below.
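A minimal sketch of that approach in the shell, with a hypothetical users collection and status field:

// An indexed query fetches the candidates; the few unwanted documents are dropped in application code
db.users.find({ country: 'A' }).forEach(function (doc) {
    if (doc.status !== 'processed') {      // exclusion done client-side instead of with $ne
        printjson(doc);
    }
});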

 

 

==================

Index optimization for MongoDB range queries: http://blog.nosqlfan.com/html/4117.html

We know that MongoDB indexes are B-trees, very similar to MySQL indexes. So you have probably heard this advice: when creating an index, take the sort into account and try to put the fields used for sorting at the end of the index. In some cases, however, doing so will lower your query performance.

Problem

For example, we perform the following query:

db.collection.find({"country": "A"}).sort({"carsOwned": 1})

The query condition is {"country": "A"}, Which is sorted in the forward order of the carsOwned field. Therefore, the index can be easily created. You can directly create the Union index of the country and carsOwned fields. Like this:

db.collection.ensureIndex({"country": 1, "carsOwned": 1})

Let's look at a slightly more complex query:

db.collection.find({"country": {"$in": ["A", "G"]}}).sort({"carsOwned": 1})

This time we want to query the data entries whose country is A or G. The results are also sorted by the carsOwned field.

If we keep the index above and analyze this query with explain(), we will find "scanAndOrder": true in the output; in addition, nscanned may be much larger than expected, and even specifying a limit does not help.
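For example, a quick way to see this (a sketch; the field names below match the older explain output format):

db.collection.find({"country": {"$in": ["A", "G"]}}).sort({"carsOwned": 1}).explain();
// in the output, look for "scanAndOrder" : true and an nscanned value larger than expected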

Cause

What is the reason? Compare the two candidate indexes:

The one on the left is an index created in the order {"country": 1, "carsOwned": 1}; the one on the right is an index created in the order {"carsOwned": 1, "country": 1}.

If we execute the preceding query using the left-hand index, we must retrieve all the subnodes whose country value is A (on the left of that tree) as well as all the subnodes whose country value is G (on the right of that tree), and then sort the retrieved data by the carsOwned value.

That is why the explain output above shows "scanAndOrder": true: the query first performs a scan to obtain the data and then performs a separate sort operation.

If we use the right-hand index instead, the result is different. Here the sort field is not placed at the end of the index but at the front, and the filter field comes after it. As a result we traverse the index in carsOwned order, starting from the leftmost nodes of that tree; whenever the country value is A or G, the document goes straight into the result set. Once the specified limit has been reached, we can return the results immediately, because they are already arranged in ascending carsOwned order.

For the dataset above, if we need two results, the left-hand index requires scanning four records and then sorting them before returning; with the right-hand index we only need to scan two records and can return them directly (because the query traverses the index in the desired order).

Therefore, when performing a range query ($in, $gt, $lt, and so on), appending the sort field to the end of the index is usually ineffective: during a range query the result set is not ordered by the appended field, and an extra sort is still required. In this case, building the index the other way around (sort field in front, range-query field behind) may be the better choice. Of course, whether it really is better also depends on the specific dataset.

Summary

To sum up, let's look at two examples.

When the query is:

db.test.find({a:1,b:2}).sort({c:1})

Then you can directly create a compound index of {a: 1, b: 1, c: 1} or {b: 1, a: 1, c: 1}.

If the query is:

db.test.find({a:1,b:{$in:[1,2]}}).sort({c:1})

It may be more appropriate to create a compound index of {a: 1, c: 1, b: 1}. Of course, this is only one more way of thinking about it; whether to use it depends on your data.
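A hedged way to check this on your own data, reusing the test collection from the example (the explain output format varies by MongoDB version):

db.test.ensureIndex({a: 1, c: 1, b: 1});
db.test.find({a: 1, b: {$in: [1, 2]}}).sort({c: 1}).explain();
// with this index, "scanAndOrder" : true should no longer appear in the output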

Source: architects.dzone.com

========================================

What is the difference between MongoDB and traditional databases: batch insert and batch query http://blog.sina.com.cn/s/blog_56545fd301013zav.html

When answering questions online I often run into something like this: does MongoDB have an ODBC driver like MySQL? Can MongoDB retrieve field names or types the way MySQL can?

My answer is: no, because MongoDB is not MySQL. That answer may make MongoDB sound weak, but my point is that you cannot ask a good physics teacher to tutor mathematics; he may manage the basics, but it is hard for him to be just as excellent a math teacher.

 

The questions discussed today are: Batch insert and batch query.

Someone asked me yesterday how to write a batch insert for MongoDB. I have never used one: on the one hand, MongoDB is fast enough that I never felt the need to look for such a method; on the other hand, no such method appears on the official MongoDB website or in its APIs.

There are two questions here.

Question 1: Isn't there any faster batch insert?

This question is purely technical: how could MongoDB be slower than MySQL here? It comes down to the essential difference between traditional relational databases and NoSQL. Each NoSQL operation is lightweight and small, with no extra work beyond writing the data. To use an analogy: when MongoDB stores something (data), it simply throws it into the right cabinet (database), whereas MySQL has to keep communicating with the deliverer (a two-way connection) and file the data into different drawers (transactions and schema). MySQL's batch insert reduces that communication and filing overhead, while MongoDB has no such overhead to begin with. Therefore MongoDB has no concept of a batch insert. The conclusion is that ordinary inserts in MongoDB are faster than batch inserts in MySQL, or, put differently, ordinary MongoDB inserts already are batch inserts.

Question 2: Is it impossible to put multiple operations into one transaction and execute them together?

This question is more technical. Does MongoDB have transactions? Do not treat NoSQL as a relational database. Doesn't this mean MongoDB has poor data integrity and safety? This must be repeated once more: MongoDB is designed for processing large-scale data, so it is not as strict about data integrity. If you really need it, MongoDB can handle this situation through getLastError, which sacrifices some performance to confirm that a data operation succeeded. You can call it once after inserting a batch of data: if an error is reported, replay that batch and call getLastError again, which gives you both performance and data safety.
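A minimal shell sketch of that pattern, with a hypothetical items collection (getLastError is the write-acknowledgement mechanism of older MongoDB versions; newer drivers use write concerns instead):

for (var i = 0; i < 1000; i++) {
    db.items.insert({ n: i });          // individual lightweight inserts
}
var err = db.getLastErrorObj();         // acknowledge the whole batch with one round trip
if (err.err != null) {
    print("batch failed, retry it: " + err.err);
}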

 

Batch query

As for batch query: it corresponds to the batch-select concept described on the official website and can be understood as querying a batch of data at a time. Many people have used a database like this:

Statement stmt = conn.createStatement();   // conn is a java.sql.Connection
ResultSet rs = stmt.executeQuery(sql);

for (int i = 1; i < 10000; i++) {
    // read one row of data from rs (rs.next(), rs.getXxx(...))
}

Written this way, will all the data in the database be read into memory at once, or will each row be fetched from the database individually? In fact, neither. MySQL reads a portion of the data into memory, and once that portion has been consumed it reads the next portion. I have not used MySQL for a long time, but I remember there is a class in the C++ driver that reads all the data into memory, although that approach is rarely used.

MongoDB queries work like this: you read data through a cursor. If the batch size parameter is not set, MongoDB returns 101 documents by default in the first batch. Once those 101 documents have been read, that is, when you ask for the 102nd document, the driver goes back to MongoDB to fetch the next batch. That batch is not counted in documents but is capped at roughly 4 MB: the driver returns those 4 MB for the user to keep reading, and requests another 4 MB once they are exhausted. Of course, you can change this behavior with the batch size; if it is set, each round trip returns batch-size documents.
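A small sketch of controlling this from the shell, with a hypothetical items collection:

var cursor = db.items.find().batchSize(500);   // ask the server for 500 documents per batch
while (cursor.hasNext()) {
    var doc = cursor.next();                   // the driver fetches the next batch transparently when the current one runs out
}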


Reprinted; please indicate the source: http://blog.sina.com.cn/s/blog_56545fd301013zav.html

==================

How to use mongoose to traverse a MongoDB collection with 1 million+ documents: http://cnodejs.org/topic/51508570604b3d512113f1b3

==================

MongoDB optimization principles: http://nosqldb.org/topic/50cac0c8ee680fee79001566

1. Query optimization
Check whether your queries make full use of indexes. Use the explain command to examine how a query executes, and add the necessary indexes to avoid collection scans.
2. Find out your hot data size
Your dataset may be very large, but that is not what matters most. What matters is the size of your hot dataset: the data you access frequently (including frequently accessed documents and all of the indexes). With MongoDB, you had better make sure the hot data fits within your machine's memory, so that RAM can hold all of it.
3. Select the correct File System
MongoDB data files are pre-allocated, and in replication the master and the non-arbiter members of a replica set pre-create sufficiently large empty files to store the operation log. These file-allocation operations can be very slow on some file systems and block the process, so choose a file system that can allocate space quickly. The conclusion here is: do not use ext3; ext4 or XFS are better choices.
4. Select an appropriate hard disk
The choice includes the disk RAID level and the trade-off between spinning disks and SSDs.
5. Use $in in queries as sparingly as possible, especially on a sharded cluster, where it makes the query run on every shard.
If you have to use it, create an index on each shard.
One optimization for $in is to break it up into individual single-value queries; the speed can increase by 40-50 times.
6. Design the sharding key sensibly
An incrementing sharding key is suitable for range-based fields such as integers, floats, and dates, and makes range queries faster.
A random sharding key is suitable for write-heavy scenarios. In that case an incrementing key would push the load onto one shard more than the others and leave the cluster unbalanced, so the key is hashed instead, distributing writes across multiple shards.
A compound sharding key can also be considered; the general principles are that queries should be fast, cross-shard queries should be minimized, and balancing should happen as rarely as possible (a small sketch of these key styles follows below).
MongoDB limits a single document to 16 MB by default; pay attention to the sharding-key design when using GridFS.
With an unreasonable sharding key, many documents end up in the same chunk, and because GridFS stores large files, MongoDB cannot use the sharding key to move those documents onto different shards during balancing.
When that happens, MongoDB keeps reporting errors such as
[conn27669] Uncaught std::exception: St9bad_alloc, terminating, and eventually the mongod process dies.
Solution: increase the chunk size and design a reasonable sharding key.
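A hedged sketch of declaring the key styles mentioned in this section (database, collection, and field names are hypothetical; hashed shard keys require MongoDB 2.4+):

sh.enableSharding("mydb");
sh.shardCollection("mydb.events", { created_at: 1 });     // incrementing key: efficient range queries
sh.shardCollection("mydb.logs", { user_id: "hashed" });   // hashed key: spreads a write-heavy load across shards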
7. MongoDB's profiler can be used for monitoring and optimization
To check whether profiling is enabled, use the db.getProfilingLevel() command, which returns the level as 0, 1, or 2: 0 means profiling is disabled, 1 means slow operations are recorded, and 2 means all operations are recorded.
To enable profiling, run:
db.setProfilingLevel(level);  // level takes the values described above
When the level is 1, the default threshold for a slow operation is 100 ms. It can be changed with db.setProfilingLevel(level, slowms); for example, db.setProfilingLevel(1, 50) changes it to 50 ms.

View the current profiling log with db.system.profile.find().
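Putting it together, a small sketch (the 50 ms threshold and the filter are just examples; system.profile is the standard profiling collection):

db.setProfilingLevel(1, 50);                                                  // record operations slower than 50 ms
db.system.profile.find({ millis: { $gt: 50 } }).sort({ ts: -1 }).limit(5);    // latest slow operations
db.getProfilingLevel();                                                       // confirm the current level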

