MongoDB initial guide

Last Update:2016-01-20 Source: Internet

Author: User

Tags mongodb schema design



If meeting a technology were always like the first encounter, would it still seem difficult?

MongoDB has been part of our system for several years. We first adopted it because a single MySQL table was growing too fast (tens of millions of records per day), which easily slowed down MySQL master-slave replication. This kind of fast-growing transaction log table has loose data-consistency requirements and does not need to be joined with other tables in business queries. Why MongoDB? It happened that the company's DBA team had introduced the database and had people to help with operations and maintenance, which made it a natural choice for the business team. However, before using any technical product in a production environment, you had better gain a comprehensive understanding of its architecture and operating mechanism.

Form

MongoDB is a NoSQL database that differs fundamentally from MySQL in how it stores data. The basic object stored in MongoDB is the Document, which is why it is called a document database, and a set of Documents forms a Collection. In SQL terms, a Collection corresponds to a Table and a Document corresponds to a Row. A Document is expressed in BSON (Binary JSON), a binary encoding of the JSON structure that most developers already know.
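To make the Document and Collection terms concrete, here is a minimal sketch using only the Python standard library. The record, its field names, and its values are invented for illustration and do not come from any real system; BSON itself is a binary format, so the JSON text printed here only shows the logical structure.

```python
import json

# A hypothetical transaction-log record written as a Python dict.
# All field names and values are invented for illustration.
document = {
    "_id": "txn-0001",
    "user": {"id": 42, "name": "alice"},  # nested sub-document
    "amount": 19.9,
    "tags": ["payment", "mobile"],        # array field
}

# A Collection is conceptually a set of Documents,
# much as a Table is a set of Rows.
collection = [document]

# JSON text form of the Document; BSON stores the same
# structure in a binary encoding.
text = json.dumps(document, sort_keys=True)
print(text)
```

Unlike a Row, a Document can nest sub-documents and arrays, so data that would need a join in SQL can sometimes live in one Document.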

How is a Document stored internally? Each Document is saved in a Record. A Record is a block of space that MongoDB allocates; besides holding the Document content, it may reserve extra space as padding. If a written Document is later updated and grows in length, the padding can absorb the growth. If the business never updates or deletes written Documents (for example monitoring logs and transaction records), you can specify that Records be allocated without padding.
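The effect of reserving extra space in a Record can be sketched as follows. This is a simplified model, assuming a power-of-two allocation strategy as one possible way to reserve the extra space; the function names are hypothetical, and real storage engines add headers and minimum sizes that are ignored here.

```python
def record_size(doc_len: int, padding: bool = True) -> int:
    """Bytes allocated for a document of doc_len bytes (simplified model).

    With padding, the size is rounded up to the next power of two
    (an assumed strategy for this sketch). Without padding, the
    record is exactly as large as the document.
    """
    if doc_len <= 0:
        raise ValueError("doc_len must be positive")
    if not padding:
        return doc_len
    size = 1
    while size < doc_len:
        size *= 2
    return size


def fits_in_place(old_len: int, new_len: int, padding: bool = True) -> bool:
    """True if a document that grows to new_len still fits its record."""
    return new_len <= record_size(old_len, padding)


print(record_size(100))                        # allocated size with padding
print(fits_in_place(100, 120))                 # growth absorbed by the padding
print(fits_in_place(100, 120, padding=False))  # no room left to grow
```

In this model, a padded record absorbs moderate growth without relocation, while an unpadded record wastes no space but cannot grow in place, which is why the unpadded option suits write-once data.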

After learning about the Document format, let's talk about access to Documents. The new WiredTiger storage engine provides document-level concurrency control, which improves concurrent performance. In addition, MongoDB provides ACID transaction guarantees only for operations on a single Document. If an operation involves multiple Documents, transactional behavior is not guaranteed. Different business data has different requirements for transactional consistency, so application developers need to understand the possible impact of writing data across different Documents. For details about the operation API, see the official documentation [1].

Security

Here, security refers to data safety: data is stored durably and will not be lost. Data safety in early MongoDB versions (1.x) was the subject of much debate. (For details, refer to [2].)

Safety and efficiency constrain each other: the safer the system, the less efficient it is. MongoDB is designed for scenarios with large volumes of writes and queries in which the data is of relatively low importance. Its default settings therefore sit between safety and efficiency, leaning toward efficiency.

Let's first look at how MongoDB internally processes a Document that is written to it. The MongoDB API provides write options at different safety levels so that users can choose flexibly according to the nature of their data.

Write To Buffer Without ACK

In this mode, MongoDB does not acknowledge the write request. After the client calls the driver to write data, the write is treated as successful as long as no network error occurs, so the client cannot be certain that it actually succeeded. Even without network problems, once the data reaches MongoDB it is first saved in an in-memory buffer and then written asynchronously to the journal, which leaves a window of 100 ms (by default) before the data reaches the disk. Databases are generally designed to write the journal first and then flush the real data files to disk asynchronously. That flush can take much longer: in MongoDB it happens every 60 seconds or when the journal reaches 2 GB.

Write To Buffer With ACK

This is a little better than the previous mode. MongoDB receives the write request, writes it to the in-memory buffer, and then returns an Ack. The client can be sure that MongoDB received the data, but there is still a short window before the journal is flushed to disk, during which data can be lost.

Write To Journaling With ACK

This mode ensures that the write has at least reached the journal before the Ack is sent back. The client can be sure that the data is on disk at least in the journal, which is highly safe.

Write To Replica Buffer With ACK

This mode applies to replica sets with multiple members. Besides flushing to disk promptly, data safety can also be improved by writing to multiple replicas. In this mode, the Ack is returned only after the data has been written to the in-memory buffers of at least two replicas. Although the data is only in memory on both instances, the probability that both fail within the same 100 ms window is very low, so safety is improved.
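The four modes above can be summarized as write concern settings. The sketch below uses plain Python dictionaries in the shape of MongoDB write concern documents, where `w` is the number of members that must acknowledge the write and `j` asks for acknowledgment only after the journal write. The mapping of each mode name to a setting is my reading of the descriptions above, and the helper functions are hypothetical.

```python
# Write concern documents for the four modes described above.
# "w": how many members must acknowledge (0 means no Ack at all).
# "j": acknowledge only after the write has reached the journal.
WRITE_MODES = {
    "Write To Buffer Without ACK": {"w": 0},
    "Write To Buffer With ACK": {"w": 1},
    "Write To Journaling With ACK": {"w": 1, "j": True},
    "Write To Replica Buffer With ACK": {"w": 2},
}


def is_acknowledged(concern: dict) -> bool:
    """True if the client waits for a server Ack (hypothetical helper)."""
    return concern.get("w", 1) != 0


def waits_for_journal(concern: dict) -> bool:
    """True if the Ack is sent only after the journal write."""
    return bool(concern.get("j", False))


for name, concern in WRITE_MODES.items():
    print(name, concern,
          "acked:", is_acknowledged(concern),
          "journaled:", waits_for_journal(concern))
```

With the PyMongo driver, such a dictionary would typically be passed as `WriteConcern(**concern)` on a collection obtained through `with_options`; that step is omitted here so the sketch runs without a server.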

Only by understanding the different write modes can we choose the appropriate safety level for the nature of our data. The "Efficiency" section analyzes how efficiency differs across these write modes.

Capacity

Before considering the overall storage capacity of MongoDB, consider the capacity of its basic unit, the Document. The JSON-style format of a Document naturally leads to redundancy in stored data, mainly because the field names are saved again in every Document. MongoDB 3.2 uses the new WiredTiger as its default storage engine, which provides compression in two ways:

Snappy: the default compression algorithm, which balances compression ratio and CPU overhead.
Zlib: a higher compression ratio, at the cost of higher CPU overhead.
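The field-name redundancy and the effect of compression can be illustrated with the standard library alone. This is only a sketch: the documents are invented, JSON text stands in for the on-disk format, and two zlib levels stand in for a faster and a stronger compressor because Snappy is not part of the Python standard library. It does not reproduce how WiredTiger actually compresses data.

```python
import json
import zlib

# 1000 hypothetical sensor documents; every one repeats the same field names.
docs = [
    {"device_id": i, "temperature_celsius": 20 + i % 5, "status": "ok"}
    for i in range(1000)
]

# Serialize as JSON text purely for illustration.
raw = "\n".join(json.dumps(d) for d in docs).encode("utf-8")

# Level 1 favors speed, level 9 favors compression ratio.
fast = zlib.compress(raw, 1)
small = zlib.compress(raw, 9)

print("raw bytes:    ", len(raw))
print("zlib level 1: ", len(fast))
print("zlib level 9: ", len(small))
```

Because the repeated field names are highly compressible, both outputs are far smaller than the raw text, and the higher level typically trades more CPU time for a smaller result.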

Each Document still has a maximum size and cannot grow without limit. This limit is currently 16 MB. What if you need to store files larger than 16 MB? MongoDB provides GridFS for this purpose. A large file is split into small file chunks; each chunk is 255 KB and is stored in its own Document. GridFS uses two collections to store the chunks and the file metadata respectively.
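The splitting that GridFS performs can be sketched without a database. The code below is a simplified model, not the driver's implementation: it produces one metadata document and a list of chunk documents, using the field names `length`, `chunkSize`, `files_id`, `n`, and `data` in the style of the GridFS collections. The function names are hypothetical.

```python
CHUNK_SIZE = 255 * 1024  # 255 KB per chunk, as quoted in the text


def split_into_chunks(file_id, data: bytes, chunk_size: int = CHUNK_SIZE):
    """Return (metadata document, list of chunk documents) for a file."""
    files_doc = {"_id": file_id, "length": len(data), "chunkSize": chunk_size}
    chunk_docs = [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunk_docs


def reassemble(chunk_docs) -> bytes:
    """Rebuild the file by concatenating chunks in sequence order."""
    ordered = sorted(chunk_docs, key=lambda c: c["n"])
    return b"".join(c["data"] for c in ordered)


payload = bytes(1024 * 1024)  # a 1 MiB file of zero bytes
meta, chunks = split_into_chunks("file-1", payload)
print(meta["length"], len(chunks), len(chunks[-1]["data"]))
```

A 1 MiB file yields four full chunks plus a final 4096-byte chunk, and each chunk Document stays far below the 16 MB Document limit.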

The capacity of a single machine is always limited by its disk size, and MongoDB's answer is again sharding: using more machines to provide more capacity. The sharded cluster uses a proxy model (described in the article "integration and connection of Redis clusters").

Data on each shard is organized into Chunks (similar to the Slot concept in Redis Cluster) to make data migration and rebalancing within the cluster easier. Confusingly, the Chunk here is not the Chunk mentioned for GridFS (why use the same name for two completely different concepts?).

With a MongoDB cluster that supports horizontal scaling and data rebalancing, capacity is basically no longer a problem.
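How a proxy might route a request to a Chunk and its shard can be sketched with a small range table. This is a hypothetical model, assuming range-based partitioning on an integer shard key; the boundaries, shard names, and function names are invented, and a real cluster keeps this metadata in dedicated configuration servers.

```python
import bisect

# Hypothetical chunk table. Chunk i covers shard keys from
# CHUNK_LOWER_BOUNDS[i] up to, but not including, the next bound.
CHUNK_LOWER_BOUNDS = [0, 1000, 2000, 3000]
CHUNK_TO_SHARD = ["shard-a", "shard-b", "shard-a", "shard-c"]


def route(shard_key: int):
    """Return (chunk index, shard name) responsible for shard_key."""
    if shard_key < CHUNK_LOWER_BOUNDS[0]:
        raise ValueError("shard key below the first chunk")
    idx = bisect.bisect_right(CHUNK_LOWER_BOUNDS, shard_key) - 1
    return idx, CHUNK_TO_SHARD[idx]


def migrate(chunk_index: int, target_shard: str) -> None:
    """Rebalance by reassigning a whole chunk to another shard."""
    CHUNK_TO_SHARD[chunk_index] = target_shard


print(route(999))    # last key of the first chunk
print(route(1000))   # first key of the second chunk
migrate(3, "shard-b")
print(route(3500))   # same chunk, now served by another shard
```

Rebalancing moves whole Chunks between shards, so only the chunk-to-shard table changes while the key-to-chunk mapping stays the same.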

Efficiency

The "Security" section above lists the different write modes. Let's see how efficient writes are in each of them. Since no official benchmark data is provided, the following data comes from the write benchmark in reference [5], shared on the blog of a technology company that has been using MongoDB since 2009. Here I analyze and summarize those results.

The tests additionally distinguish between storing the journal on SSD and on HDD, which gives an intuitive feel for the difference in sequential write performance between the two. The biggest performance constraint of a mechanical hard disk is moving the head. That is why the official MongoDB documentation recommends storing the journal and the data files on different disks: the head that writes the journal sequentially is then not disturbed by random writes to the data files, while data-file writes are an asynchronous process buffered in memory and have little impact on interactive latency.

According to the test results, response latency differs by about a factor of two depending on whether an Ack is requested, which is essentially the wait for network transmission. With journaling enabled, latency increases by two orders of magnitude (hundreds of times), and sequential writes on SSD are about three times faster than on a mechanical hard disk. The average latency of writing to two replicas is much higher than I expected. More precisely, the latency fluctuates widely, unlike journaled disk writes, whose minimum, maximum, and average latencies are close together. In theory, writing to two replicas should cost only about twice the network overhead of the single-instance case plus some program overhead, yet the measured fluctuation is far larger than expected. In this mode, MongoDB latency varies too much and is not stable enough. It is unclear whether this is an implementation defect or a test that is not accurate enough. In addition, the tested version is 2.4.1, and I do not know how the latest version, 3.2, behaves. If you use this write mode, simulate real load in your own production environment before drawing conclusions.

As for read performance, no general benchmark is possible: different document models perform differently under different query conditions. Although MongoDB is schemaless, that does not mean you can skip designing a schema for your documents. Different schema designs have a great impact on performance.

Summary

When facing a new technical product or system, its "form" describes what is most distinctive about it and serves as its core model. The three core dimensions of "security", "capacity", and "efficiency" together reflect the different design and implementation considerations of a technical product or system, much like the three views in a mechanical drawing. For a first encounter with a new technical product or system, this is a suitable starting point for preliminary technical decisions. Follow it with practical tests to verify your thinking and understanding. In this way, we can better understand and make good use of existing technologies and become qualified technical experts.

Reference

[1] MongoDB Doc. MongoDB Manual
[2] MongoDB White Paper. MongoDB Architecture Guide
[3] Chen Hao. Never use MongoDB? Really? 2011.11
[4] David Mytton. Does everyone hate MongoDB? 2012.09
[5] David Mytton. MongoDB Benchmarks. 2012.08
[6] David Mytton. MongoDB Schema Design Pitfalls. 2013.02

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
