A Beginner's Guide to MongoDB

Tags: ack, benchmark, mongodb, schema design, redis cluster

If we understood a technology at first sight as well as we do after long use, would the pitfalls still be there?

MongoDB was introduced into our system several years ago, initially because a single table in MySQL was growing too fast (tens of millions of rows per day), which tended to slow down MySQL master-slave replication. This rapidly growing, append-only data had modest consistency requirements, and the business did not need relational queries over it, so we decided to move it out. Why MongoDB? The company's DBA team happened to have just introduced this database and could help operate it, so it became the natural choice for the business team. However, before using any technical product in a production environment, it is best to gain a comprehensive understanding of its architecture and operating mechanisms.

Form

MongoDB is a NoSQL database that differs fundamentally from relational databases such as MySQL in how it stores data. The basic object MongoDB stores is the document, which is why it is called a document database, and a set of documents makes up a Collection. By analogy with SQL concepts, a Collection corresponds to a Table and a Document corresponds to a Row. A Document is expressed as BSON (Binary JSON), a binary encoding of the familiar JSON structure.
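As a small sketch (the collection and field names below are made up for illustration), a document is just a nested JSON-like structure, here built as a Python dict:

```python
import json

# A hypothetical document in an "orders" collection; in SQL terms the
# collection plays the role of a table and this document the role of a row.
order = {
    "_id": "order-1001",   # every document carries a unique _id
    "user": "alice",
    "items": [             # arrays and nested documents are natural in BSON
        {"sku": "A100", "qty": 2},
        {"sku": "B200", "qty": 1},
    ],
    "total": 42.5,
}

print(json.dumps(order, indent=2))
```

Unlike a SQL row, the nested `items` array needs no separate table or join; it travels inside the document.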

How is a Document stored internally? Each Document is saved in a Record. A Record is a piece of space allocated inside MongoDB; in addition to the contents of the Document, it may reserve some extra padding. If a Document grows when it is updated after being written, the update can take advantage of that padding space. If the business never updates or deletes documents after writing them (monitoring logs, transaction records, and the like), you can specify a no-padding record allocation policy, which saves more space.
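As a toy model of padded allocation (MongoDB's MMAPv1 engine used "power of two sized allocations"; the minimum size below is illustrative), rounding the record size up leaves slack that in-place growth can reuse instead of moving the record:

```python
def allocate_record(doc_size: int) -> int:
    """Toy model of a padded record allocation strategy: round the
    document size up to the next power of two, so later in-place growth
    can reuse the slack instead of relocating the record."""
    size = 32  # assumed minimum record size for this sketch
    while size < doc_size:
        size *= 2
    return size

doc_size = 300
record = allocate_record(doc_size)
padding = record - doc_size
print(record, padding)  # 512 bytes allocated, 212 bytes of padding
```

A 300-byte document gets a 512-byte record; an update growing it to 450 bytes fits in place, while a no-padding policy would store exactly 300 bytes and save the slack.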

With the document form understood, a few words about access operations on documents. The new WiredTiger storage engine provides concurrency control at the document level, so concurrent performance has improved. In addition, MongoDB guarantees ACID transaction semantics only for operations on a single document; when one operation involves multiple documents, transactional behavior cannot be guaranteed. Different business data has different requirements for transactional consistency, so the application developer needs to understand the consistency effects of spreading a write across different documents. For the detailed operation APIs, see the official documentation; they are not repeated here.
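A plain-Python sketch (no server involved, names made up) of the modeling consequence: fields that must change together should live in one document, because MongoDB applies an update to a single document atomically:

```python
# An account document where the balance and its pending ledger must
# stay consistent with each other.
account = {"_id": "alice", "balance": 100, "pending": []}

def debit(doc, amount):
    """Model of a single-document update: both fields change as one
    unit, the way MongoDB applies one update document atomically."""
    updated = dict(doc)
    updated["balance"] = doc["balance"] - amount
    updated["pending"] = doc["pending"] + [amount]
    return updated  # swapped in whole: no reader sees a half-applied state

account = debit(account, 30)
print(account["balance"], account["pending"])
```

Had `balance` and `pending` been split across two documents, a failure between the two writes could leave them inconsistent, and MongoDB would offer no transaction to roll both back.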

Safety

Here security refers to data safety: written data is stored durably and will not be lost. MongoDB's data safety raised a lot of controversy in its early (1.x) versions (see reference [2]).

Safety and efficiency are mutually constraining: the safer the writes, the less efficient they are, and vice versa. MongoDB was designed for scenarios with large volumes of writes and queries over data that is individually not very critical, so its default settings sit on the efficient side of the safety-efficiency trade-off.

Let's look at how a Document is handled internally after it is written to MongoDB. The MongoDB API provides write options at different safety levels, letting the caller choose flexibly according to the nature of the data.

Write to Buffer without ACK

In this mode MongoDB does not acknowledge the write request: after the client-side driver sends the write, it is considered successful as long as no network error occurs, but whether the write actually succeeded is unknown. Even if the network is fine, the data first sits in an in-memory buffer and is then written asynchronously to the journal; by default there is a 100 ms window before the journal reaches disk. Like most databases, MongoDB writes the journal first and flushes the real data files to disk asynchronously afterwards; that interval can be much longer, by default 60 seconds or whenever the journal reaches 2 GB.

Write to Buffer with ACK

This is slightly safer than the previous mode: MongoDB writes the request into the in-memory buffer and then sends back an ACK. The client thus knows that MongoDB has received the write, but the brief window before the journal reaches disk can still lead to data loss.

Write to journaling with ACK

This mode guarantees that the ACK is sent back only after the journal has at least been written to disk, so the client knows the data has reached durable storage; safety is high.

Write to Replica Buffer with ACK

This mode is for replica sets: besides flushing to disk promptly, safety can also be improved through multiple copies. Here the data is written to the in-memory buffers of at least two replicas before the ACK is sent back. Although both copies are only in memory, the probability that two instances both fail within the short (default 100 ms) journal-flush window is very low, so safety is improved.
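The four modes above map naturally onto MongoDB's write concern document (`w` is how many members must acknowledge, `j` asks to wait for the journal; the mode names here are this article's, not official ones). A plain-Python sketch of the mapping:

```python
# Write-concern settings corresponding to the four modes above.
# "w" = number of members that must acknowledge the write,
# "j" = wait until the write is in the on-disk journal.
WRITE_MODES = {
    "buffer_no_ack":      {"w": 0},             # fire and forget
    "buffer_with_ack":    {"w": 1},             # primary's memory buffer
    "journal_with_ack":   {"w": 1, "j": True},  # primary's journal on disk
    "replica_buffer_ack": {"w": 2},             # two replicas' buffers
}

for mode, concern in WRITE_MODES.items():
    print(mode, concern)
```

In a driver such as PyMongo these dictionaries become the write concern attached to a collection or a single operation.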

Understanding the different write modes lets us choose the right safety level for the nature of the data. The Efficiency section below analyzes how write efficiency differs across these modes.

Capacity

Before considering MongoDB's overall storage capacity, consider the capacity of a single Document. The JSON-like document form is inherently redundant, mainly in the field names, which are stored again in every Document. As of version 3.2, MongoDB uses the new WiredTiger as its default storage engine, which provides compression in two forms:

    • Snappy: the default compression algorithm, striking a balance between compression ratio and CPU overhead.
    • Zlib: a higher compression ratio, at the cost of higher CPU overhead.
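Python's standard library does not ship Snappy, but its `zlib` module can illustrate the ratio-versus-CPU trade-off on field-name-heavy data like a collection of similar documents (the sample data below is made up):

```python
import json
import zlib

# Many similar documents: field names repeat in every one,
# which is exactly what block compression recovers.
docs = [{"user_id": i, "status": "ok", "latency_ms": 12} for i in range(200)]
raw = json.dumps(docs).encode()

fast = zlib.compress(raw, level=1)  # cheaper CPU, larger output
best = zlib.compress(raw, level=9)  # more CPU, smaller output

print(len(raw), len(fast), len(best))
```

Both levels shrink the repetitive field names dramatically; the choice between Snappy and Zlib in WiredTiger is the same kind of dial.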

Each Document also has a maximum size limit and cannot grow indefinitely; the limit is currently 16 MB. What about files larger than 16 MB? MongoDB provides GridFS for storing them: a large file is broken up into small chunks of 255 KB each, with each chunk stored in a document. GridFS uses two collections to store the file chunks and the file metadata, respectively.
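The chunk arithmetic is straightforward; a short sketch of how many chunk documents GridFS needs for a given file size (using the 255 KB default mentioned above):

```python
import math

CHUNK_SIZE = 255 * 1024  # GridFS default chunk size: 255 KB

def gridfs_chunks(file_size: int) -> int:
    """Number of chunk documents GridFS needs for a file of file_size bytes."""
    return math.ceil(file_size / CHUNK_SIZE)

# A 20 MB file, which could never be stored as a single 16 MB-capped document:
print(gridfs_chunks(20 * 1024 * 1024))  # 81 chunks
```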

The capacity of a single machine is always limited by the size of its disks, and MongoDB's solution remains sharding: use more machines to provide greater capacity. The shard cluster uses the proxy model (as discussed in the companion article on Redis Cluster).

The data on each shard is organized in the form of chunks (similar to the slot concept in Redis Cluster) to make data migration and rebalancing within the cluster easier. Confusingly, this Chunk is not the GridFS chunk mentioned earlier; they are completely different concepts that happen to share a name (one wonders why the same term was reused for two unrelated things).
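A toy model (not the real balancer; shard names and key ranges are invented) of the idea: chunks partition the shard-key space into contiguous ranges, and routing a document means finding the chunk that covers its key:

```python
# Each chunk covers a contiguous shard-key range and lives on one shard.
# Ranges here: key < 100, 100 <= key < 200, 200 <= key < 300, key >= 300.
chunks = [
    (100, "shard-A"),
    (200, "shard-B"),
    (300, "shard-A"),
]

def route(shard_key: int) -> str:
    """Route a document to the shard owning the chunk covering its key."""
    for upper_bound, shard in chunks:
        if shard_key < upper_bound:
            return shard
    return "shard-C"  # the chunk covering [300, +inf)

print(route(42), route(150), route(250), route(9999))
```

Rebalancing then amounts to moving a chunk entry (and its documents) to another shard and updating this routing table, which is why chunks make migration cheap to reason about.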

With a cluster that supports both horizontal scaling and data rebalancing, basic data capacity is no longer a problem for MongoDB.

Efficiency

The Safety section above listed the different write modes; now let's look at how efficient writes are in each of them. Since no official benchmark data is published, the figures below come from reference [5], a write benchmark shared on the engineering blog of a company that has run MongoDB in production since 2009. I did some analysis based on those results; the full results table and charts are in the original post.

Comparing test runs with the journal on an SSD versus on a mechanical hard drive gives an intuitive feel for their performance difference under sequential writes. The biggest performance constraint on a mechanical drive is head movement, which is why the MongoDB documentation recommends placing the journal and the data files on separate disks: the head sequentially writing the journal is then undisturbed by random writes to the data files, and since data files are flushed asynchronously through the memory buffer, this arrangement has little effect on interactive write latency.

According to the test results, the latency difference between acknowledged and unacknowledged writes is essentially one extra network round trip. With journaling enabled so that writes land on disk before the ACK, latency rises by about two orders of magnitude on both SSD and mechanical disk, with the SSD's sequential writes on average about three times faster than the mechanical drive's. The average latency of writing to two replicas, however, was much higher than I expected; more precisely, it fluctuated far more than the min/max/average spread of the journaled writes. In theory, a double-copy, memory-only write should cost a single-copy write plus network transfer and some program overhead, yet the measured latency was much higher and far more variable. Whether this instability comes from an implementation flaw or from inaccurate measurement is unclear. Note also that the tested version was 2.4.1; with the current 3.2, if you plan to use this write mode, measure it yourself in an environment simulating production before drawing conclusions.

As for read performance, no generic benchmark is possible: different document models and different query conditions can perform very differently. Although MongoDB is schemaless, that does not mean the document schema needs no design; different schema designs have a significant impact on performance.
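A classic schema-design choice (all names below are illustrative) is embedding related data in one document versus referencing it across documents: embedding reads everything back in one query and keeps updates atomic, while referencing keeps documents small and avoids the 16 MB cap when the related data is unbounded:

```python
# Embedded design: one read fetches a post and its comments together.
post_embedded = {
    "_id": "post-1",
    "title": "Hello",
    "comments": [
        {"by": "alice", "text": "Nice!"},
        {"by": "bob", "text": "+1"},
    ],
}

# Referenced design: comments live in their own collection and point
# back at the post; better when comments can grow without bound.
post_ref = {"_id": "post-1", "title": "Hello"}
comments = [
    {"_id": "c1", "post_id": "post-1", "by": "alice", "text": "Nice!"},
    {"_id": "c2", "post_id": "post-1", "by": "bob", "text": "+1"},
]

# With references, the application stitches the data back together itself
# (an application-side join), costing an extra query per read path.
joined = [c for c in comments if c["post_id"] == post_ref["_id"]]
print(len(joined))
```

Which design reads faster depends entirely on the query patterns, which is exactly why "schemaless" does not excuse skipping schema design.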

Summary

When facing a new technology product or system, "Form" describes its most distinctive part and belongs to its core model, while the three dimensions of "Safety", "Capacity", and "Efficiency" comprehensively reflect its different design and implementation trade-offs; together they are comparable to the "three views" of mechanical drawing. On a first encounter with a new product or system, they make a suitable entry point for preliminary technical decisions, which further practice then tests and refines, helping you better understand and use existing technology and become a competent technology adopter.

Reference

[1] MongoDB Doc. MongoDB Manual.
[2] MongoDB White Paper. MongoDB Architecture Guide.
[3] Chen Hao. "Don't use MongoDB? Are you sure?". 2011.11
[4] David Mytton. "Does everyone hate MongoDB?". 2012.09
[5] David Mytton. "MongoDB benchmarks". 2012.08
[6] David Mytton. "MongoDB Schema Design Pitfalls". 2013.02
