Analysis of MongoDB performance optimization

Last Update:2017-01-18 Source: Internet

Author: User

Preface

How can we make software perform better? Most developers have probably asked themselves this question. Performance often determines the quality of a piece of software, and if you are building an Internet product, its performance will be tested even harder, because you face a vast number of Internet users and they are not patient. Every additional second of page load time may cost you part of your users; in other words, the longer the load time, the fewer the users. So what load time will users accept?

If a page takes more than 10s to load, users leave. If it takes between 1s and 10s, a loading prompt is needed. And how fast must a page load if we want to show no prompt at all? That's right: within 1s.

That is the product manager's point of view. What about the technician's? From this side, user volume and load time rise together: the more users you have, the more data you need to process, and the slower pages load. This makes for an interesting problem: if your product is an exciting one, your job as a technician is to improve the software's performance as the number of users grows, ideally faster than the user base grows.

MongoDB Performance Optimization

Database performance has a vital influence on the overall performance of the software. The common performance optimization methods for a MongoDB database are:

1. normalization and denormalization;

2. use of the padding factor;

3. use of indexes.

1. Normalization and denormalization

Normal forms are standards for eliminating duplicated, redundant data, so that the data in the database is better organized and disk space is used more efficiently. Satisfying a higher normal form requires first satisfying the lower ones. In the database design phase, clarifying the purpose of each collection is a very important step in tuning MongoDB performance. Based on the operations most commonly performed on the data in a collection, the key thing to decide is how far to normalize: a frequently updated collection and a frequently queried collection call for different degrees of normalization.

1.1 Normalization

1.1.1 Advantages of normalization:

1. Updates to a normalized database are usually faster;

2. After normalization there is little duplicated data, so less data needs to be modified;

3. Normalized tables are smaller and fit in memory more easily, so operations on them run faster;

4. With little redundant data, queries need fewer DISTINCT or GROUP BY statements.

1.1.2 Disadvantages of normalization:

1. Queries against normalized tables often need many joins, because no single table holds redundant, duplicated data. Even a slightly complex query may require at least one join, and often several, on a normalized schema. This raises the cost of the query and may also defeat some indexing strategies, because normalization places columns in separate tables that could have belonged to the same index had they been in one table.

1.1.3 Example of a normalized design:

Take storing a book and its authors as an example, where an author's information includes name, age, and nationality. A normalized design looks like this:

{
  "_id": ObjectId("5124b5d86041c7dca81917"),
  "title": "How to Use MongoDB",
  "author": [
    ObjectId("144b5d83041c7dca84416"),
    ObjectId("144b5d83041c7dca84418"),
    ObjectId("144b5d83041c7dca84420")
  ]
}

Here an array of author IDs is added to the book as a field. This design approach is commonly used in non-relational databases. In MongoDB, we move the author details, which do not depend directly on the book's primary key, into a separate collection and link the two by storing the authors' primary keys. To query a book together with its authors, we first query the book, then read the author IDs from it, and finally query the author collection to get the complete book and author details.

With this design, query performance is clearly not ideal, because an additional lookup is required. However, when an author's information needs to be modified, the maintenance advantage of normalization stands out: we do not need to consider the books associated with the author and simply modify the author's own document.
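The two-step lookup and the single-document update described above can be sketched in plain JavaScript. This is only an illustration: the arrays `books` and `authors` stand in for two collections, the ids are short strings rather than ObjectIds, and the names, ages, and helper functions are all made up for the example. In the mongo shell the second step would typically be a query on the author collection with `{_id: {$in: book.author}}`.

```javascript
// In-memory stand-ins for two collections (hypothetical data, string ids for brevity).
const authors = [
  { _id: "a1", name: "Author One", age: 40 },
  { _id: "a2", name: "Author Two", age: 50 },
];
const books = [
  { _id: "b1", title: "How to Use MongoDB", author: ["a1", "a2"] },
  { _id: "b2", title: "Another Book", author: ["a1"] },
];

// Step 1: find the book. Step 2: resolve its author ids in the authors collection.
function findBookWithAuthors(title) {
  const book = books.find((b) => b.title === title);
  if (!book) return null;
  const details = authors.filter((a) => book.author.includes(a._id));
  return { ...book, author: details };
}

// Updating an author touches exactly one document,
// no matter how many books reference that author.
function updateAuthorAge(authorId, age) {
  const author = authors.find((a) => a._id === authorId);
  if (author) author.age = age;
  return author ? 1 : 0; // number of documents modified
}

const modified = updateAuthorAge("a1", 41);
const result = findBookWithAuthors("How to Use MongoDB");
console.log(modified, result.author.map((a) => a.age));
```

Every book that references the author sees the new age on its next lookup, at the price of the extra query per read.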

1.2 Denormalization

1.2.1 Advantages of denormalization:

1. Joins can be avoided, since almost all the data can be found in a single table;

2. Effective indexes can be designed.

1.2.2 Disadvantages of denormalization:

1. The table contains more redundant data, and deleting data can cause some useful information in the table to be lost.

1.2.3 Example of a denormalized design:

Take storing a book and its authors again, where an author's information includes name, age, and nationality. A denormalized design looks like this:

{
  "_id": ObjectId("5124b5d86041c7dca81917"),
  "title": "How to Use MongoDB",
  "author": [
    {
      "name": "Ding Lei",
      "age": ...,
      "nationality": ...
    },
    {
      "name": "Ma Yun",
      "age": ...,
      "nationality": ...
    },
    {
      "name": "Zhang Zhaozhong",
      "age": ...,
      "nationality": "China"
    }
  ]
}

In this example we embed the authors' fields completely in the book, so a query for a book returns the full information of its authors directly. But because one author may have written many books, modifying an author's information means traversing all the books to find that author and updating every copy.
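A minimal sketch of that trade-off, again in plain JavaScript with an in-memory `books` array standing in for the collection. The titles, names, ages, and helper functions are hypothetical and exist only for this illustration.

```javascript
// Each book embeds full copies of its authors (hypothetical data).
const books = [
  { _id: "b1", title: "How to Use MongoDB",
    author: [{ name: "Author One", age: 40 }, { name: "Author Two", age: 50 }] },
  { _id: "b2", title: "Another Book", author: [{ name: "Author One", age: 40 }] },
  { _id: "b3", title: "Third Book", author: [{ name: "Author Two", age: 50 }] },
];

// Reading needs a single lookup: the author details are already in the book.
function findBook(title) {
  return books.find((b) => b.title === title) || null;
}

// Updating must scan every book and patch each embedded copy of the author.
function updateAuthorAge(name, age) {
  let touchedBooks = 0;
  for (const book of books) {
    let changed = false;
    for (const author of book.author) {
      if (author.name === name) {
        author.age = age;
        changed = true;
      }
    }
    if (changed) touchedBooks += 1;
  }
  return touchedBooks; // number of book documents rewritten
}

const touched = updateAuthorAge("Author One", 41);
console.log(touched, findBook("Another Book").author[0].age);
```

The read is a single lookup, but the write cost grows with the number of books the author appears in.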

1.3 Mixing normalization and denormalization

To balance the strengths and weaknesses of the two approaches, normalization and denormalization are often mixed. A mixed design looks like this:

{
  "_id": ObjectId("5124b5d86041c7dca81917"),
  "title": "How to Use MongoDB",
  "author": [
    {
      "_id": ObjectId("144b5d83041c7dca84416"),
      "name": "Ding Lei"
    },
    {
      "_id": ObjectId("144b5d83041c7dca84418"),
      "name": "Ma Yun"
    },
    {
      "_id": ObjectId("144b5d83041c7dca84420"),
      "name": "Zhang Zhaozhong"
    }
  ]
}

This time we extract only the most frequently used part of the author's fields and embed it in the book. When we only need the book and its authors' names, there is no need to query the author collection; a query on the book collection is enough.

This is a compromise that preserves both query efficiency and update efficiency. It is, however, harder to get right than the first two designs: the difficulty lies in choosing the fields to extract based on the actual business. In the third example, the name is clearly not a frequently modified field, so extracting it causes no problem. If the extracted field were frequently modified (age, for example), every change would still require a wide search-and-update across the books.
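The effect of the field choice can be sketched as follows. This is an illustrative model only: `authors` and `books` are in-memory arrays, and all data and helper names are assumptions made for the example. The rarely changing `name` is copied into the books, while `age` lives only in the author collection.

```javascript
// Hypothetical data: full author records plus books that embed only _id and name.
const authors = [
  { _id: "a1", name: "Author One", age: 40 },
  { _id: "a2", name: "Author Two", age: 50 },
];
const books = [
  { _id: "b1", title: "How to Use MongoDB",
    author: [{ _id: "a1", name: "Author One" }, { _id: "a2", name: "Author Two" }] },
  { _id: "b2", title: "Another Book", author: [{ _id: "a1", name: "Author One" }] },
];

// Common read: title plus author names, answered from the book alone.
function authorNames(title) {
  const book = books.find((b) => b.title === title);
  return book ? book.author.map((a) => a.name) : [];
}

// Frequent update: age is stored once, so one document is modified.
function updateAge(id, age) {
  const author = authors.find((a) => a._id === id);
  if (author) author.age = age;
  return author ? 1 : 0;
}

// Rare update: a rename must reach the author and every embedded copy.
function renameAuthor(id, name) {
  let modified = 0;
  const author = authors.find((a) => a._id === id);
  if (author) { author.name = name; modified += 1; }
  for (const book of books) {
    const copy = book.author.find((a) => a._id === id);
    if (copy) { copy.name = name; modified += 1; }
  }
  return modified;
}

const ageWrites = updateAge("a1", 41);
const renameWrites = renameAuthor("a1", "Author Uno");
console.log(ageWrites, renameWrites, authorNames("Another Book"));
```

Had `age` been the embedded field instead, every age change would cost as many writes as the rename does here.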

Of the three examples above, the first has the highest update efficiency but the lowest query efficiency, while the second has the highest query efficiency but the lowest update efficiency. In real work, we need to design the fields of each collection according to our actual needs in order to get the best overall efficiency.

2. Understanding the padding factor

What is the padding factor?

The padding factor is the growth space that MongoDB reserves for a document to expand into (this is a feature of the MMAPv1 storage engine). It exists because MongoDB stores documents sequentially, packed tightly one after another.

(Figure not reproduced here. Source: "MongoDB: The Definitive Guide")

1. Documents are packed with no extra room to grow between them.

2. When a document in this sequential layout grows, the space originally allocated to it may no longer be enough, and the document has to be moved to the end.

3. Once a modified document has been moved, subsequently inserted documents are given some padding, so that frequently modified documents have room to grow in place. If documents later stop being moved because of growth, the padding factor applied to newly inserted documents decreases accordingly.

Understanding the padding factor matters because moving documents is expensive, and frequent moves greatly increase the load on the system. In practice, arrays are the most likely cause of document growth, so if our documents will be modified frequently and grow in size, we must take the padding factor into account.
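To make the cost of document moves concrete, here is a toy model in plain JavaScript. It is not MongoDB's actual allocation algorithm: the function `simulate`, the sizes in bytes, and the rule that a record gets `size * paddingFactor` bytes when written are all simplifying assumptions for illustration.

```javascript
// Toy model only: count how many times a growing document no longer fits
// in its allocated record and has to be moved.
function simulate(paddingFactor, docSizes, growths) {
  // Each record is allocated size * paddingFactor bytes when first written.
  const records = docSizes.map((size) => ({
    size,
    allocated: Math.ceil(size * paddingFactor),
  }));
  let moves = 0;
  for (const { index, extra } of growths) {
    const rec = records[index];
    rec.size += extra;
    if (rec.size > rec.allocated) {
      // Does not fit in place: rewrite the record elsewhere with fresh padding.
      rec.allocated = Math.ceil(rec.size * paddingFactor);
      moves += 1;
    }
  }
  return moves;
}

const sizes = [100, 100, 100];
const growths = [
  { index: 0, extra: 20 },
  { index: 1, extra: 40 },
  { index: 0, extra: 20 },
];

const movesWithoutPadding = simulate(1.0, sizes, growths);
const movesWithPadding = simulate(1.5, sizes, growths);
console.log(movesWithoutPadding, movesWithPadding); // 3 0
```

With no padding every growth forces a move; with 50% padding the same updates all fit in place, at the cost of more space per record.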

So how do we improve performance if our documents grow frequently?

Two approaches:

1. Increase the initial allocation. A collection has a usePowerOf2Sizes option; when it is true, the space initially allocated to each subsequently inserted document is rounded up to a power of 2. This allocation strategy suits collections whose documents change frequently, because it gives every document more room to grow. The space is used less efficiently than before, however, and if the documents in your collection rarely move during updates, this strategy will make writes relatively slower.

2. Pad the document with data to force a larger initial allocation.

db.book.insert({
  "name": "MongoDB",
  "publishing": "Tsinghua University Press",
  "author": "John",
  "tags": [],
  // filler whose only purpose is to reserve space for later growth
  "stuff": "ggggggggggggggggggggggggggggggggggggg" +
           "ggggggggggggggggggggggggggggggggggggg" +
           "ggggggggggggggggggggggggggggggggggggg"
})

Yes, this may not look very elegant... but sometimes it works! When we later make a change that grows the document, we just delete the stuff field. The name of the field is of course up to you, and so are the filler characters inside it.
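Both approaches can be sketched in plain JavaScript. The function `powerOf2Size` is a simplified assumption of the rounding rule (it ignores any engine-specific minimum or upper limits), and `byteSize` measures the JSON text as a rough proxy, not the real BSON size. The document contents are hypothetical.

```javascript
// Approach 1, simplified: round a size in bytes up to the next power of two.
function powerOf2Size(bytes) {
  let size = 1;
  while (size < bytes) size *= 2;
  return size;
}

// Rough size proxy: bytes of the JSON text, not the real BSON size.
const byteSize = (doc) => Buffer.byteLength(JSON.stringify(doc), "utf8");

// Approach 2: a document padded with a filler field at insert time.
const padded = { name: "MongoDB", tags: [], stuff: "g".repeat(111) };

// Later growth: the tags array fills up and the filler is dropped in the same change.
const grown = { ...padded, tags: ["database", "nosql", "performance"] };
delete grown.stuff;

console.log(powerOf2Size(100), powerOf2Size(129)); // 128 256
console.log(byteSize(grown) <= byteSize(padded)); // true
```

As long as the grown document stays within the size reserved by the filler, the update can happen in place.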

3. Using indexes

I believe you already understand how much an index affects a database. If a query arrives and the query optimizer cannot find a suitable index, the database performs a full collection scan (called a full table scan in an RDBMS), and the performance impact of a full collection scan is catastrophic. A query without an index is like looking up a word in a dictionary whose words are in no particular order and which has no table of contents: you can only search page by page, line by line. One such lookup may take you a few hours, and if you were asked to look up words as often as users visit a site... well, I'm sure you would shout, "I quit!" The computer, of course, will not shout that. It has always been a diligent employee and completes every request, however demanding. So please treat your computer well and use indexes. There are two points to note when using them: 1. the fewer indexes, the better; 2. the smaller the index granularity, the better.

3.1 The fewer indexes, the better

Indexes can greatly improve query performance, so is it true that the more indexes, the better? The answer is no: fewer is better. Each time you create an index, the system adds an index table for the specified column. Whenever an indexed column is inserted into or modified, the database has to reorder that index table, and reordering is expensive. With a small number of indexes the cost is easy to bear, but as the number of indexes grows, the effect on performance is easy to imagine. So create indexes with care and make each one do as much work as possible: as long as your query requirements are met, the fewer indexes, the better.
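The write cost described above can be illustrated with a toy model. The class `IndexedCollection` below is hypothetical and far simpler than a real database index (each index is just a sorted array of values), but it shows that every insert must update every index.

```javascript
// Toy model: each index is a sorted array that must be kept in order on insert.
class IndexedCollection {
  constructor(indexedFields) {
    this.docs = [];
    this.indexes = new Map(indexedFields.map((field) => [field, []]));
    this.indexWrites = 0; // how many index structures were modified in total
  }

  insert(doc) {
    this.docs.push(doc);
    for (const [field, entries] of this.indexes) {
      // Binary search for the position that keeps the index sorted.
      let lo = 0;
      let hi = entries.length;
      while (lo < hi) {
        const mid = (lo + hi) >> 1;
        if (entries[mid] < doc[field]) lo = mid + 1;
        else hi = mid;
      }
      entries.splice(lo, 0, doc[field]);
      this.indexWrites += 1;
    }
  }
}

const lean = new IndexedCollection(["age"]);
const heavy = new IndexedCollection(["age", "no", "name"]);
const sample = [
  { age: 30, no: 2, name: "b" },
  { age: 20, no: 1, name: "c" },
  { age: 25, no: 3, name: "a" },
];
for (const doc of sample) {
  lean.insert(doc);
  heavy.insert(doc);
}
console.log(lean.indexWrites, heavy.indexWrites); // 3 9
```

Three inserts cost three index updates with one index and nine with three indexes; the extra work is paid on every write, whether or not a query ever uses those indexes.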

Implicit indexes

// Create a compound index
db.test.createIndex({"age": 1, "no": 1, "name": 1})

With this index in place, queries that sort on the age and no fields can be sorted quickly. In other words, if the fields we want to sort on form a leading part (a prefix) of an existing compound index, there is no need to create another index for them.

db.test.find().sort({"age": 1, "no": 1})

db.test.find().sort({"age": 1})

Both of the sorted queries above can use the compound index without a new index being created.
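The rule of thumb behind this can be written down as a small check. The function `sortCanUseIndex` is a hypothetical helper, not part of MongoDB, and it is simplified: it assumes the sort fields must be a prefix of the index keys with directions either all matching or all reversed, and it ignores query predicates entirely.

```javascript
// Simplified check: can a sort specification be served by a compound index?
function sortCanUseIndex(indexKeys, sortKeys) {
  const idx = Object.entries(indexKeys);
  const srt = Object.entries(sortKeys);
  if (srt.length === 0 || srt.length > idx.length) return false;

  // The sort fields must be the leading fields of the index, in the same order.
  const isPrefix = srt.every(([field], i) => idx[i][0] === field);
  if (!isPrefix) return false;

  // Directions must all match the index, or all be the exact opposite.
  const allSame = srt.every(([, dir], i) => dir === idx[i][1]);
  const allReversed = srt.every(([, dir], i) => dir === -idx[i][1]);
  return allSame || allReversed;
}

const index = { age: 1, no: 1, name: 1 };
console.log(sortCanUseIndex(index, { age: 1, no: 1 })); // true
console.log(sortCanUseIndex(index, { age: 1 }));        // true
console.log(sortCanUseIndex(index, { no: 1 }));         // false
console.log(sortCanUseIndex(index, { age: 1, no: -1 })); // false
```

A sort on `no` alone skips the leading `age` field, so this model reports that it cannot reuse the index.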

Reversed indexes

// Create a single-field index
db.test.createIndex({"age": 1})

With a single-field index we do not need to think about the direction of the indexed column when sorting, because the index can be traversed in either direction. In this example we can write the sort as {"age": -1} and performance is not affected.
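Why the direction does not matter for a single field can be seen in a small sketch: a sorted list of index entries can simply be read from either end. The array `ageIndex` and the generator `scan` are illustrative stand-ins, not MongoDB internals.

```javascript
// Hypothetical ascending index entries for the "age" field.
const ageIndex = [18, 20, 20, 25, 31];

// Walk the same sorted entries forward (1) or backward (-1); no re-sorting needed.
function* scan(index, direction) {
  if (direction === 1) {
    for (let i = 0; i < index.length; i++) yield index[i];
  } else {
    for (let i = index.length - 1; i >= 0; i--) yield index[i];
  }
}

const ascending = [...scan(ageIndex, 1)];
const descending = [...scan(ageIndex, -1)];
console.log(ascending, descending);
```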

3.2 The smaller the index granularity, the better

What does it mean that smaller granularity is better? The number of times each value is repeated in an indexed column is called its granularity here. It is closely related to the cardinality of the index: a column whose values repeat often has few distinct values. If the granularity is too coarse, the index cannot deliver its performance. For example, suppose we index an "age" column in which 50% of the values are 20. To find a 20-year-old named "Tom", we would still need to examine 50% of the data in the table, and the index does very little. So when building a compound index, put the columns with the smallest granularity (the fewest repeated values) on the left to make sure the index is as effective as possible.
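The 50% example can be reproduced with a few lines of JavaScript. The sample data and the helpers `matchFraction` and `orderByDistinctValues` are assumptions made for this sketch; the ordering heuristic simply follows the advice above and is not a complete index design rule.

```javascript
// Hypothetical sample in which half of the documents have age 20.
const people = [
  { name: "Tom", age: 20 }, { name: "Ann", age: 20 }, { name: "Bob", age: 20 },
  { name: "Eve", age: 31 }, { name: "Tom", age: 45 }, { name: "Joe", age: 52 },
];

// Fraction of documents that an equality match on one field still has to examine.
function matchFraction(docs, field, value) {
  return docs.filter((d) => d[field] === value).length / docs.length;
}

// Order candidate index fields from most distinct values to fewest.
function orderByDistinctValues(docs, fields) {
  const distinct = (field) => new Set(docs.map((d) => d[field])).size;
  return [...fields].sort((a, b) => distinct(b) - distinct(a));
}

console.log(matchFraction(people, "age", 20)); // 0.5
console.log(orderByDistinctValues(people, ["age", "name"])); // [ 'name', 'age' ]
```

In this sample `name` has more distinct values than `age`, so it narrows the search more and goes first.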

4. Summary

That is all for this article. I hope it is of some help in your study or work; if you have questions, leave a message and we can discuss them.

This article is an English version of an article originally written in Chinese and published on aliyun.com.
