MongoDB core contributor: It's not that MongoDB doesn't work — you just don't understand it!

Source: Internet
Author: User
Tags: mongodb, documentation, query

MongoDB has taken frequent hits on Hacker News recently, and many people even claim to hate it. David Mytton detailed many of its problems in his blog. But haters notwithstanding, Russell Smith offers a counterpoint: a summary drawn from years of working experience. Russell Smith has worked as an ops and large-scale website scaling consultant, has helped companies such as the Guardian and Experian, and co-founded the MongoDB London user group. As a MongoDB Master (a group of core contributors officially recognized by MongoDB who share their expertise with the community), he has worked on infrastructures handling more than 30,000 queries per second per server, with over 1 TB of active data per day.

Let's take a look at Russell's analysis of some common and uncommon MongoDB problems:

32-bit vs 64-bit

Most servers support 32-bit operating systems, and much newer hardware supports 64-bit operating systems, which can address far more RAM.

MongoDB ships in both 32-bit and 64-bit builds. Because MongoDB uses memory-mapped files, the 32-bit build is limited to about 2 GB of data per standard replica-set member (a single mongod process). If you ever expect to store more than 2 GB of data, use the 64-bit build. In a sharded installation, the 32-bit build can still be used for components that store no data.

Summary: Use the 64-bit version or understand the 32-bit version restrictions.

File Size Limit

Unlike an RDBMS, which stores data in rows and columns, MongoDB stores data as documents. Documents are stored in BSON, a binary format similar to JSON.

Like other databases, the size of a single document is limited. In old versions of MongoDB a single document was limited to 4 MB; newer versions support documents up to 16 MB. Such a limit may be annoying, but 10gen's view is: if this setting keeps bothering you, there may be a problem with your schema design; alternatively, you can use GridFS, which has no document-size limit.
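The 16 MB cap is easy to sanity-check on the client side. A minimal sketch in plain JavaScript (not driver code — JSON length only approximates the encoded BSON size, so treat this as a rough check):

```javascript
// Rough client-side check against the 16 MB document limit.
// NOTE: JSON.stringify only approximates BSON size; real drivers
// measure the actual encoded BSON.
const MAX_BSON_SIZE = 16 * 1024 * 1024; // 16 MB in current versions

function approxDocSize(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

function fitsInBson(doc) {
  return approxDocSize(doc) <= MAX_BSON_SIZE;
}

const small = { name: "russell", tags: ["ops", "scaling"] };
console.log(fitsInBson(small)); // true

// A document holding ~20 MB of text would be rejected by the server:
const huge = { blob: "x".repeat(20 * 1024 * 1024) };
console.log(fitsInBson(huge)); // false
```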

In any case, we recommend avoiding storing large binary blobs as documents and updating them frequently in the database. Services such as Amazon S3 or Rackspace Cloud Files may be better choices than overloading your own infrastructure.

To sum up: keep each document below 16 MB.

Write failures

By default, MongoDB allows very fast writes and updates. The trade-off is the lack of explicit error notification: by default most drivers do asynchronous, "unsafe" writes, meaning the driver does not immediately report errors — similar to INSERT DELAYED in MySQL. If you want to know whether a write succeeded, you must check for errors manually with getLastError.

In some cases you need the error message as soon as the error occurs; most drivers make such synchronous "safe" writes easy. But this kills the write-speed advantage MongoDB holds over traditional databases.

If you need more performance than a fully "safe" synchronous write but still want a degree of safety, you can use getLastError with 'j' to have MongoDB acknowledge only once the write has been committed to the journal. The journal is flushed to disk every 100 milliseconds, rather than the 60 seconds of the main data files.
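The difference between "unsafe" and "safe" writes can be sketched with a toy in-memory connection (all names here are illustrative stand-ins, not the real driver API):

```javascript
// Toy model of MongoDB's classic write behaviour: an "unsafe" write
// never throws; the error is only visible if you ask for it afterwards,
// which is what getLastError does.
function makeConnection() {
  let lastError = null;
  const ids = new Set();
  return {
    unsafeInsert(doc) {            // fire-and-forget: swallow errors
      if (ids.has(doc._id)) { lastError = "duplicate key"; return; }
      ids.add(doc._id);
      lastError = null;
    },
    getLastError() {               // explicit check, like db.getLastError()
      return lastError;
    },
    safeInsert(doc) {              // "safe" write = insert + check
      this.unsafeInsert(doc);
      const err = this.getLastError();
      if (err) throw new Error(err);
    },
  };
}

const conn = makeConnection();
conn.unsafeInsert({ _id: 1 });
conn.unsafeInsert({ _id: 1 });     // silently fails...
console.log(conn.getLastError());  // "duplicate key" ...unless you ask
```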

Conclusion: use safe writes, or getLastError, if you need write confirmation.

Schemaless does not mean having no schema design.

An RDBMS generally has a predefined schema: tables of rows and columns, where each field has a name and a data type. If you want to add a field to one row, you have to add the column to the entire table.

MongoDB does away with this: there is no enforced schema on collections and documents. This is great for rapid development and easy changes.

Of course, this does not mean you can ignore schema design. A well-designed schema lets you get the best performance out of MongoDB. Read the MongoDB documentation, or watch one of these presentations on schema design!

  • Schema Design Basics
  • Schema Design at scale
  • Schema Design Principles and Practice

Conclusion: design your schema and take full advantage of MongoDB's characteristics.

By default, an update modifies only a single document.

In a traditional RDBMS, an UPDATE statement applies to every matching row unless you add a LIMIT clause. MongoDB, however, applies the equivalent of "LIMIT 1" to every update by default. There is no way to "LIMIT 5", but you can remove the limit entirely:

db.people.update({age: {$gt: 30}}, {$set: {past_it: true}}, false, true)

The official drivers expose the same option as 'multi'.
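The default single-document behaviour can be sketched in plain JavaScript (an in-memory stand-in for a collection, not driver code):

```javascript
// In-memory sketch of MongoDB's default update semantics: without
// multi=true, only the FIRST matching document is changed.
function update(collection, query, set, multi) {
  let changed = 0;
  for (const doc of collection) {
    if (query(doc)) {
      Object.assign(doc, set);
      changed++;
      if (!multi) break;         // MongoDB's implicit "limit 1"
    }
  }
  return changed;
}

const people = [{ age: 35 }, { age: 40 }, { age: 25 }];
console.log(update(people, d => d.age > 30, { past_it: true }, false)); // 1
console.log(update(people, d => d.age > 30, { past_it: true }, true));  // 2
```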

Conclusion: specify multi as true to modify multiple documents.

Case Sensitive Query

String queries may not behave as you expect, because MongoDB is case-sensitive by default.

For example, db.people.find({name: 'Russell'}) and db.people.find({name: 'russell'}) return different results. The ideal solution is to normalize the data you store. You can also query with a regular expression, such as db.people.find({name: /russell/i}), but this hurts performance, since case-insensitive regular expressions cannot use an index efficiently.
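A plain-JavaScript illustration of the same behaviour:

```javascript
// Case sensitivity in miniature: an exact string match fails across
// case, while an /i regex matches at the cost of an index-friendly query.
const stored = "Russell";
console.log(stored === "russell");        // false: exact match is case-sensitive
console.log(/^russell$/i.test(stored));   // true: the regex ignores case
```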

Conclusion: queries are case-sensitive. Regular expressions work, at the cost of speed.

No type coercion of input data

When you insert data of the wrong type into a traditional database, it will generally try to convert the value to the column's predefined type. That doesn't apply in MongoDB: since documents have no predefined schema, MongoDB stores whatever you give it, with whatever type it arrived in.
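A small sketch of the consequence, with plain JavaScript objects standing in for stored documents:

```javascript
// MongoDB stores exactly what you hand it, so "25" (a string) and 25
// (a number) are different values that will not match each other in queries.
const docs = [{ age: "25" }, { age: 25 }];
const numericMatches = docs.filter(d => d.age === 25);
console.log(numericMatches.length); // 1 — the string "25" does not match
```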

Conclusion: use accurate data types.

About locks

When a resource is shared by multiple parts of the code, a lock is needed to make sure the resource is operated on in only one place at a time.

Old versions of MongoDB (pre-2.0) had a global write lock, meaning only one write could proceed across the entire server. This could stall the database when the lock was held by an overloaded operation somewhere. The situation improved significantly in 2.0 and further in the current 2.2, whose database-level locks are a big step forward. Collection-level locks are expected in an upcoming version.

Even so, Russell believes that most applications hitting this limitation are constrained by the application itself rather than by MongoDB.

Conclusion: use the latest stable version for the best performance.

About packages

Many people have run into "outdated version" problems when installing from the Ubuntu and Debian system packages. The fix is simple: use the official 10gen repository, and installation on Ubuntu and Debian becomes as smooth as on Fedora and CentOS.

Summary: use the official packages, which track the latest versions.

Use an odd number of replica set members

A replica set is an effective way to add redundancy and improve the availability of a MongoDB cluster. Data is replicated to every node and one of them is elected primary. If the primary fails, another node is elected as the new primary.

Using two machines in a replica set is tempting: it is cheaper than three and is the standard operating style for RDBMS replication.

However, a MongoDB replica set should have an odd number of voting members. With an even number, a primary failure can leave the remaining nodes read-only: electing a new primary requires a strict majority of the original member count, and the survivors may not reach it.
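The majority rule can be sketched in a few lines (a simplified model; real elections also involve priorities and heartbeats):

```javascript
// Electing a primary requires a strict majority of ALL members,
// not just of the nodes that are still reachable.
function canElect(reachableVoters, totalMembers) {
  return reachableVoters > totalMembers / 2;
}

// 2-node set: either node alone cannot win an election.
console.log(canElect(1, 2)); // false

// 4-node set split 2/2 by a network partition: neither side elects.
console.log(canElect(2, 4)); // false

// 3-node set: the 2-node side of any partition can still elect.
console.log(canElect(2, 3)); // true
```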

If you want to keep costs down while still supporting failover and extra redundancy, use an arbiter. An arbiter is a special replica set member that stores no user data, which means it can run on a very small server.

Summary: use an odd number of replica set members; an arbiter can keep the cost down.

No join statement

MongoDB does not support joins: if you need data from multiple collections, you must run multiple queries.

If you find yourself running too many queries by hand, you can usually redesign your data model to reduce the total. MongoDB documents can hold any structure, so it is easy to denormalize data and keep it consistent with how your application reads it.
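Both approaches can be sketched with in-memory arrays standing in for two collections (the collection and field names here are made up for illustration):

```javascript
// No joins: either run two queries and stitch the results in the app...
const authors = [{ _id: 1, name: "russell" }];
const posts = [{ author_id: 1, title: "MongoDB gotchas" }];

const joined = posts.map(p => ({
  ...p,
  author: authors.find(a => a._id === p.author_id).name,
}));
console.log(joined[0].author); // "russell"

// ...or denormalize, embedding the data you would otherwise join for,
// so a single query returns everything the page needs:
const denormalized = { title: "MongoDB gotchas", author: { name: "russell" } };
console.log(denormalized.author.name); // "russell"
```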

Conclusion: there are no joins; look at how to design your schema instead.

Journaling

MongoDB uses memory-mapped files and flushes them to disk every 60 seconds, which means you can lose up to 60 seconds of writes, plus the time the flush itself takes.

To avoid such data loss, MongoDB has included journaling since version 2.0 (enabled by default). Journaling shrinks the window from 60 seconds to 100 ms. If the database shuts down unexpectedly, the journal is replayed before startup so the database comes back in a consistent state. This is the closest MongoDB comes to a traditional database's durability.

Of course, journaling costs a little performance, around 5%. For most people the extra safety is well worth it.

Conclusion: It is best not to disable journaling.

No identity authentication by default

MongoDB does no authentication by default; it assumes it is running on a trusted network behind a firewall. That doesn't mean authentication is unsupported — it can easily be enabled when needed.

Conclusion: secure MongoDB with a firewall and by binding to the correct interface; authentication can also be enabled.

Lost Data in replica set

Using a replica set is an effective way to improve reliability and ease maintenance. Given that, it is crucial to understand what happens between nodes when a failure and failover occur.

Members of a replica set communicate via the oplog (a list of operations — inserts, updates, deletes — performed on the data). When one member writes to the oplog, the others replay it. If the node that was accepting new writes fails and later rejoins, it is rolled back to the last point its oplog has in common with the others. In the process, the "lost" newer writes are removed from the database by MongoDB and saved in the 'rollback' directory under your data directory for manual recovery. If you don't know about this feature, you might conclude the data is simply gone; so whenever a member recovers from a failure, check this directory. Restoring the data with the standard tools shipped with MongoDB is easy. See the official documentation for more information.

Summary: data lost during failure recovery ends up in the rollback directory.

Sharding too late

Sharding splits data across multiple machines and is used to improve performance when a replica set runs too slowly. MongoDB supports automatic sharding. But if you shard too late, trouble follows: splitting the data and migrating chunks takes time and resources, so if your servers are already nearly exhausted, you will likely be unable to shard exactly when you need it most.

The solution is simple: monitor MongoDB with a proper tool, make an honest assessment of your servers' capacity, and shard before you reach roughly 80% of it. Suitable monitoring tools include MMS, Munin (with the Mongo plugin), and CloudWatch.

If you are sure you will need sharding from the start, running on AWS or a similar cloud service makes it easier: on small instances, shutting machines down or resizing them is far more straightforward than migrating thousands of chunks.

Conclusion: sharding early avoids these problems.

You cannot change a document's shard key

In a sharded setup, the shard key is what MongoDB uses to map a document to its chunk. Once a document has been inserted, its shard key cannot be changed. The workaround is to delete the document and re-insert it, so that it can be assigned to the right chunk.

Summary: the shard key cannot be modified; if necessary, delete and re-create the document.

You cannot shard collections larger than 256 GB

Back to the sharding-too-late issue: MongoDB will not shard a collection that has already grown to 256 GB or more (earlier versions imposed even lower limits). This restriction is expected to be removed in the future; until then there is no good workaround beyond recompiling, or keeping the collection under 256 GB.

Summary: shard a collection before it reaches 256 GB.

Unique indexes and sharding

Uniqueness constraints are hard to enforce in a sharded collection: in practice, only indexes on the shard key can guarantee uniqueness. See the official documentation for more details.

Choosing the wrong shard key

MongoDB requires you to choose a shard key to partition your data. If you choose the wrong one, changing it later is very painful; the official documentation describes what the change involves.

Conclusion: read the documentation before choosing a shard key.

Unencrypted communication with MongoDB

Connections to MongoDB are unencrypted by default, which means your data could be captured and read by a third party. If MongoDB runs only inside your own non-public network, this is unlikely to be a problem.

If you access MongoDB over a public network, however, you will want the traffic encrypted. The public builds of MongoDB do not support SSL, but you can compile a custom build fairly easily, and 10gen subscribers have access to an SSL-enabled build. Fortunately, most of the official drivers support SSL, though a little extra work is needed; see the documentation.

Conclusion: when connecting over a public network, remember that traffic to MongoDB is unencrypted.

Transactions

Unlike traditional databases such as MySQL, which support atomic operations across multiple rows, MongoDB only guarantees atomic modification of a single document. One workaround is to implement asynchronous commits in the application; another is to keep more than one data store. The first approach does not fit every situation, but it is clearly better than the second.

Summary: multi-document transactions are not supported.

Slow journal preallocation

MongoDB may tell you it is ready when in fact it is still preallocating the journal. If you let the machine do the allocation and your filesystem and disks are slow, this becomes a nuisance. Normally it isn't a problem, but you can disable preallocation with the undocumented flag --nopreallocj.

Summary: journal preallocation can be slow when the machine's filesystem and disks are slow.

NUMA + Linux + MongoDB

Linux, NUMA, and MongoDB never get along well. If you run MongoDB on NUMA hardware, we recommend disabling NUMA directly. Otherwise all kinds of strange problems appear, such as periodic massive slowdowns or sharp drops in CPU usage.

Conclusion: Disable NUMA.

Process restrictions in Linux

If you hit segmentation faults when MongoDB is under load, you may find the cause is an open-file or user-process limit that is too low or left at its default. We recommend setting the limit to 4k+, though the right size depends on your situation. Read up on ulimit.
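For reference, a hedged sketch of the relevant commands (the values are illustrative; pick limits that fit your workload):

```shell
# Show the current per-process open-file limit for this shell session
ulimit -n

# Raise it for the current session (raising the hard limit needs root)
ulimit -n 4096

# To make the change permanent, set soft and hard limits for the user
# that runs mongod in /etc/security/limits.conf, e.g.:
#   mongodb  soft  nofile  4096
#   mongodb  hard  nofile  8192
```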

Conclusion: permanently raise the soft and hard open-file and user-process limits for MongoDB on Linux.

Original article: MongoDB gotchas & how to avoid them
