MongoDB core contributor: It's not that MongoDB is no good, it's that you don't understand it!

Source: Internet
Author: User
Tags: mongodb, documentation, mysql, insert

Recently MongoDB has been a frequent target on Hacker News. Many people claim to hate MongoDB; David Mytton revealed many of its existing problems on his blog. But the hated database has just as many fans, and as a rebuttal Russell Smith brings a summary of years of working experience. Russell Smith has worked as an ops and large-website scaling consultant, helping companies such as the Guardian and Experian, and co-founded the MongoDB London User Group. As a MongoDB Master (a member of 10gen's officially recognized group of core MongoDB contributors who share their expertise with the community), the infrastructure he works on handles more than 30,000 queries per second on a single server, with more than 1TB of active data every day.

Here is Russell's look at some common and uncommon problems with MongoDB:

32-bit vs 64-bit

Most servers support 32-bit operating systems, and much new hardware supports 64-bit operating systems, which can address more RAM.

MongoDB ships both a 32-bit and a 64-bit version of the database. Because MongoDB uses memory-mapped files, the 32-bit version supports storing only about 2GB of data. For a standard replica set, MongoDB has only a single process type, mongod. If you expect to store more than 2GB of data at any point, use the 64-bit version of MongoDB. If you have a sharded installation, use the 64-bit version as well.

Summary: Use the 64-bit version or understand the limitations of the 32-bit version.

File Size Limits

Unlike an RDBMS, which stores data in rows and columns, MongoDB stores data in documents. These documents are stored in BSON, a binary format similar to JSON.

As with other databases, the size of a single document is limited. In older versions of MongoDB, a single document was limited to 4MB; newer versions support documents up to 16MB. This limit may be annoying, but 10gen's opinion is: if this setting keeps bothering you, either there is a problem with your design, or you can use GridFS, which has no size limit.

The usual recommendation is to avoid storing large, frequently updated blobs in the database. Services such as Amazon S3 or Rackspace Cloud Files are often a better choice; don't load up your infrastructure when you don't need to.
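For files that genuinely must live in MongoDB, a sketch of the GridFS route uses the mongofiles tool that ships with MongoDB (the database and file names here are illustrative, and a running mongod is assumed):

```shell
mongofiles -d myfiles put backup.tar   # store a file of arbitrary size
mongofiles -d myfiles list             # list stored files
mongofiles -d myfiles get backup.tar   # retrieve it again
```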

Summary: Keep each document under 16MB and everything will be fine.

Write Failures

MongoDB allows high-speed writes and updates by default, at the cost of no explicit error notification. By default, most drivers perform asynchronous, "unsafe" writes, which means the driver does not return an error immediately, similar to MySQL's INSERT DELAYED. If you want to know whether a write succeeded, you must check for errors manually with getLastError.

In cases where you need an error message as soon as an error occurs, it is easy to enable synchronous, "safe" writes in most drivers. This, however, sacrifices the advantage that distinguishes MongoDB from traditional databases.

If you need a bit more performance than "fully safe" synchronous writes while keeping some level of safety, you can use getLastError with 'j': MongoDB then acknowledges the write only once it has been written to the journal, which is flushed to disk every 100 milliseconds rather than every 60 seconds.
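A rough mongo-shell sketch of these options (the collection name is illustrative; a running mongod is assumed, and getLastError applies to the last operation on the current connection):

```javascript
// Fire-and-forget insert: by default the shell/driver does not wait
// for the server to acknowledge the write
db.people.insert({name: "Russell"});

// Ask whether the last operation on this connection succeeded
db.runCommand({getLastError: 1});

// Acknowledge only once the write is in the journal, which is
// flushed roughly every 100 ms instead of every 60 s
db.runCommand({getLastError: 1, j: true});
```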

Summary: If you need write acknowledgment, use safe writes or getLastError.

a flexible schema is not the same as no schema

An RDBMS typically has a predefined schema: tables of rows and columns, each column with a name and data type. If you want to add a column to one row, you have to add that column to the entire table.

MongoDB removes this constraint: there is no enforced schema for collections and documents. This is useful for rapid development and easy modification.

Of course this doesn't mean you can ignore schema design; a suitable schema lets you get the best performance out of MongoDB. Read the MongoDB documentation, or watch these videos about schema design!

    • Schema Design Basics
    • Schema Design at scale
    • Schema Design Principles and practice

Summary: Design a schema and make full use of MongoDB's features.

by default, update statements modify only a single document

In a traditional RDBMS, an UPDATE statement modifies every matching row unless a LIMIT clause is used. MongoDB, however, applies the equivalent of "LIMIT 1" to every update by default. There is no way to do "LIMIT 5", but you can remove the limit entirely with the following statement:

db.people.update({age: {$gt: 30}}, {$set: {past_it: true}}, false, true)

The official drivers have a similar option: 'multi'.

Summary: Modify multiple documents by setting 'multi' to true.

queries are case-sensitive

String queries may not behave as expected, because MongoDB is case-sensitive by default.

For example, db.people.find({name: 'Russell'}) is different from db.people.find({name: 'russell'}). The ideal solution is to normalize the data you need to query. You can also query with a regular expression, such as db.people.find({name: /russell/i}), but this hurts performance.
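The matching semantics can be illustrated with plain JavaScript, since the mongo shell uses JavaScript regexes (the array of names is made up for illustration):

```javascript
var names = ["Russell", "russell", "RUSSELL"];

// Exact, case-sensitive match, like db.people.find({name: "Russell"})
var exact = names.filter(function (n) { return n === "Russell"; });

// Case-insensitive regex, like db.people.find({name: /^russell$/i})
var loose = names.filter(function (n) { return /^russell$/i.test(n); });

// exact keeps 1 name; loose keeps all 3
```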

Summary: Queries are case-sensitive; regular expressions can help, at the expense of speed.

no type coercion of input data

When you try to insert data of the wrong type into a traditional database, it typically converts the data to the predefined type. This does not happen in MongoDB, because documents have no predefined schema: MongoDB inserts whatever data you give it.
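As a sketch (the collection name is illustrative and a running mongod is assumed), note that a numeric 30 and the string "30" are different values to MongoDB:

```javascript
db.people.insert({age: 30});     // stored as a number
db.people.insert({age: "30"});   // stored as a string; a different value
db.people.find({age: 30});       // matches only the numeric document
db.people.find({age: "30"});     // matches only the string document
```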

Summary: Use accurate data types.

about Locks

When a resource is shared by multiple parts of the code, a lock is needed to ensure the resource is operated on in only one place at a time.

Older versions of MongoDB (pre 2.0) had a global write lock, meaning only one write could proceed at a time across the whole server. This could cause the database to stall behind a lock held by an overloaded operation. The issue was significantly improved in version 2.0 and further enhanced in the current 2.2 release. The database-level lock in MongoDB 2.2 is a big step forward, and equally welcome collection-level locks are scheduled for an upcoming release.

Nonetheless, Russell finds that most applications hit by this limitation suffer more from problems in the application itself than from MongoDB.

Summary: Use the latest stable version for maximum performance.

About Packages

When installing on Ubuntu and Debian systems, many people run into the problem of "obsolete versions". The solution is simple: use the official 10gen repository, and installing on Ubuntu and Debian will be as smooth as on Fedora and CentOS.

Summary: Use the official package with most recent versions.

using an even number of replica set members

A replica set is an effective way to add redundancy and improve the read performance of a MongoDB cluster. Data is replicated across all nodes, and one node is elected primary. If the primary fails, another node is voted in as the new primary.

It is tempting to use two machines in a replica set: it is cheaper than three and is the standard pattern for an RDBMS.

But in MongoDB, the number of voting members in a replica set should be odd. If you use an even number of members, the remaining nodes become read-only when the primary fails, because the nodes that are left cannot form the majority required to elect a new primary.

If you want to save money while still supporting failover and added redundancy, you can use an arbiter. An arbiter is a special replica set member that stores no user data, which means it can run on a very small server.
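A minimal sketch of a two-machine set plus an arbiter, using the standard shell helpers (hostnames are illustrative, and a running deployment is assumed):

```javascript
rs.initiate();                        // run on the first data-bearing member
rs.add("db2.example.com:27017");      // second data-bearing member
rs.addArb("arb.example.com:27017");   // arbiter: votes in elections,
                                      // stores no user data
```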

Summary: Use an odd number of replica set members; an arbiter can cut costs.

no join statement

MongoDB does not support joins: if you want to retrieve data from multiple collections, you have to run multiple queries.

If you feel you are running too many queries by hand, you can redesign your data model to reduce the overall number of queries. Documents in MongoDB can contain any type, so you can easily denormalize data, keeping it in the shape your application actually uses.
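A sketch of both approaches (collection and field names are made up, and a running mongod is assumed):

```javascript
// "Join" by hand with two queries...
var post = db.posts.findOne({title: "No joins"});
var author = db.users.findOne({_id: post.author_id});

// ...or denormalize: embed what the application reads together
db.posts.insert({
    title: "No joins",
    author: {_id: 42, name: "Russell"},     // embedded copy of user data
    comments: [{user: "jane", text: "+1"}]
});
```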

Summary: With no joins, look carefully at how you design your data model.

Journaling

MongoDB uses memory-mapped files and flushes them to disk every 60 seconds, which means you can lose up to 60 seconds of data plus whatever is pending in that flush.

To avoid data loss, MongoDB added journaling in version 2.0 (on by default). Journaling shrinks the window from 60 seconds to 100ms. If the database shuts down unexpectedly, the journal is replayed before startup to ensure the database is in a consistent state. This is also where MongoDB comes closest to a traditional database.

Of course journaling has a slight effect on performance, around 5%. For most people, though, the extra safety is well worth it.
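For the 2.x series the choice is explicit on the command line (a sketch; journaling is already the default on 64-bit builds):

```shell
mongod --journal      # make sure journaling is enabled
mongod --nojournal    # turn it off explicitly (not recommended)
```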

Summary: It is best not to turn journaling off.

no identity authentication by default

MongoDB has no authentication under its default settings; it assumes it sits in a trusted network behind a firewall. That does not mean authentication is unsupported; it can easily be turned on when needed.
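A sketch of turning authentication on in the 2.x era (start mongod with --auth; the user name and password are placeholders, and db.addUser is the 2.x-era shell helper):

```javascript
var admin = db.getSiblingDB("admin");
admin.addUser("admin", "choose-a-strong-password");  // create the admin user
admin.auth("admin", "choose-a-strong-password");     // authenticate
```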

Summary: Secure MongoDB with a firewall and by binding the correct interface; authentication can also be turned on.

lost data in a replica set

Using a replica set is an effective way to improve system reliability and ease maintenance. Because of this, it becomes crucial to understand how failures occur and how failover works between nodes.

Members of a replica set keep in sync through the oplog (a list of operations that occurred on the data: inserts, deletes, updates, and so on). When one member writes to the oplog, the other members replay it. If a node that accepted new writes fails and later rejoins, it is rolled back to the last common point in the oplog. In that process, the "lost" new data is removed from the database by MongoDB and placed in a 'rollback' directory under your data directory, waiting to be restored manually. If you don't know about this feature, you might think the data is simply lost, so whenever a member recovers from a failure, this directory must be checked. Recovering the data with the standard tools MongoDB ships is easy. See the official documentation for more information.

Summary: Data lost during recovery appears in the rollback directory.

Sharding Too Late

Sharding splits data across multiple machines, usually for performance when a replica set has become too slow. MongoDB supports automatic sharding. However, if you shard too late, problems arise: splitting data and migrating chunks takes time and resources, so if the server's resources are already exhausted, you will likely be unable to split just when you need sharding most.

The workaround is simple: use a tool to monitor MongoDB. Make an accurate assessment of your servers and shard before you reach 80% of estimated capacity. Monitoring tools include MMS, Munin (with the Mongo plugin), and CloudWatch.

If you're sure you'll need to shard from the start, better advice is to use AWS or a similar cloud service for sharding. On small servers, shutting down or resizing a machine is far more straightforward than migrating thousands of chunks of data.

Summary: Sharding early effectively avoids problems.

You cannot change the shard key of a document

In a sharded setup, the shard key is what MongoDB uses to route a document to its chunk. Once a document has been inserted, you cannot change its shard key. The workaround is to delete the document and re-insert it, allowing it to be assigned to the correct chunk.
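A sketch of the delete-and-reinsert workaround (the collection, shard key, and values are illustrative, and a running sharded cluster is assumed):

```javascript
// Assume db.orders is sharded on {customer_id: 1}
var doc = db.orders.findOne({_id: 123});
db.orders.remove({_id: doc._id});   // remove under the old shard key
doc.customer_id = "new-customer";   // set the new shard key value
db.orders.insert(doc);              // re-insert; lands in the right chunk
```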

Summary: The shard key cannot be modified; when necessary, delete the document and re-insert it.

cannot shard a collection over 256G

Back to the problem of sharding too late: MongoDB does not allow sharding a collection that has grown beyond 256G, and earlier versions set the bar even lower. The limit will surely be removed in the future, but meanwhile there is no better workaround: you can only rebuild the collection or keep its size below 256G.

Summary: Shard before a collection reaches 256G.

unique indexes and sharding

Unique constraints on an index can only be guaranteed by the shard key.

See the official documentation for more details.

selecting the wrong shard key

MongoDB requires you to select a shard key to shard the data. If you choose the wrong shard key, changing it is a big hassle.

See the documentation for how to change a shard key.
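For reference, sharding a collection looks like this in the shell (the database, collection, and key are illustrative, and a running sharded cluster is assumed):

```javascript
sh.enableSharding("mydb");
// a compound key can spread writes while keeping related data together
sh.shardCollection("mydb.events", {user_id: 1, ts: 1});
```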

Summary: Read the documentation before selecting a shard key.

unencrypted communication with MongoDB

Connections to MongoDB are unencrypted by default, which means your data could be recorded and used by third parties. This is unlikely to matter if MongoDB runs only inside your own non-WAN network.

However, if you access MongoDB over a public network, you will certainly want communication encrypted. The public builds of MongoDB do not support SSL. Fortunately, it is very simple to compile your own version, and 10gen subscribers get a specially built encrypted version. Most of the official drivers support SSL, though small hassles are unavoidable. See the documentation for details.

Summary: When using the public network connection, be aware that the communication with MongoDB is not encrypted.

Transactions

Unlike MySQL and other traditional databases that support atomic operations on multiple rows, MongoDB only supports atomic modification of a single document. One way around this is to use asynchronous commits in your application; another is to use more than one data store. Although the first approach does not apply to every situation, it is clearly better than the second.
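A sketch of what single-document atomicity buys you (the collection and fields are made up, and a running mongod is assumed):

```javascript
// Atomic: both changes inside one document succeed or fail together
db.accounts.update(
    {_id: "alice", balance: {$gte: 10}},
    {$inc: {balance: -10}, $push: {ledger: {amount: -10}}}
);

// Not atomic as a pair: each update is atomic on its own, but a crash
// between them can leave the two documents inconsistent
db.accounts.update({_id: "alice"}, {$inc: {balance: -10}});
db.accounts.update({_id: "bob"},   {$inc: {balance:  10}});
```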

Summary: No support for multi-file transactions.

journal preallocation is slow

MongoDB may tell you it is ready when in fact it is still preallocating journal files. If you let the machine allocate them itself, and your filesystem and disks happen to be slow, this becomes annoying. Normally it is not a problem, but there is an undocumented flag, --nopreallocj, that can be used to turn journal preallocation off.

Summary: If the machine's filesystem and disks are slow, journal preallocation may be very slow.

NUMA + Linux + MongoDB

Linux, NUMA and MongoDB do not always get along. If you run MongoDB on NUMA hardware, it is recommended to turn NUMA off, because strange problems follow otherwise, for example throughput dropping sharply in phases while CPU usage spikes.
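The usual mitigation on Linux is to start mongod with interleaved memory allocation (a sketch of the commonly documented approach; the config path is illustrative):

```shell
# allocate memory round-robin across NUMA nodes
numactl --interleave=all mongod --config /etc/mongodb.conf

# and stop the kernel from reclaiming memory zone-by-zone
echo 0 > /proc/sys/vm/zone_reclaim_mode
```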

Summary: Disable NUMA.

process limitations in Linux

If you get segmentation faults when MongoDB is not even fully loaded, you may find they are caused by default or too-low limits on open files or user processes. 10gen recommends setting the limits to 4k+, although the right size depends on your situation. Read up on ulimit to learn more.
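A quick way to inspect the current limits, plus an illustrative limits.conf fragment (the user name and values are assumptions; tune them to your workload):

```shell
ulimit -n   # current max open files
ulimit -u   # current max user processes

# Raise them persistently in /etc/security/limits.conf, e.g.:
#   mongod  soft  nofile  64000
#   mongod  hard  nofile  64000
#   mongod  soft  nproc   32000
#   mongod  hard  nproc   32000
```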

Summary: For MongoDB on Linux, raise the soft and hard limits on open files and user processes.
