MongoDB summary


I used MongoDB for a while and wrote up this summary for future reference. Following my usual habit, I will give a general overview first, then pick out a few points of particular interest and discuss them one by one.

MongoDB is written in C++, and its name comes from the middle of the word "humongous"; the name alone shows that its ambition lies in handling massive amounts of data. The shortest description of this database is: scalable, high-performance, open-source, schema-free, and document-oriented. I have a personal preference for document databases, which dates back to CouchDB six months ago, because I think a document is a fitting way to describe an entity object with individual characteristics, for example a user, or an item such as a book, on a website.

Concepts:

Like mysqld, a mongod service can host multiple databases, and each database can contain multiple tables. A table here is called a collection, and each collection stores multiple documents; each document is stored on disk as BSON (Binary JSON). Unlike a relational database, each document is stored as an independent unit: you can add or delete fields in one or more documents without affecting any other document. This is what schema-free means, and it is the main advantage of document databases. And unlike a typical key-value database, the value here carries structural information, so you can read, write, and run statistics on individual fields just as in a relational database. You could say it combines the convenience and efficiency of a key-value database with the powerful features of a relational database.
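
Here is a minimal sketch of the schema-free model using pymongo; the database name "shop", the collection "books", and the documents themselves are made up for illustration.

from pymongo import MongoClient

coll = MongoClient("localhost", 27017)["shop"]["books"]

# Two documents in the same collection can carry different fields;
# no schema change is needed to add or omit a field.
coll.insert_one({"title": "Dune", "price": 9.99})
coll.insert_one({"title": "Neuromancer", "price": 7.5, "tags": ["sf", "cyberpunk"]})

# Individual fields can still be queried, as in a relational database.
for doc in coll.find({"price": {"$lt": 8}}):
    print(doc["title"])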

Index

Similar to relational databases, MongoDB can index a field, create composite indexes and unique indexes, and drop indexes. Of course, creating an index adds space overhead. My suggestion: if you treat a document as an object and use it online, it is generally enough to index only the object ID, retrieve the object's data by that ID, and put it in memcached. For background analysis, where response requirements are loose, you can query non-indexed fields directly even though that means a full table scan; if that turns out to take too much time, create another index.
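
As a sketch of these operations in pymongo (the field and collection names are illustrative, not from the original text):

from pymongo import MongoClient, ASCENDING

coll = MongoClient()["shop"]["books"]

coll.create_index("title")  # single-field index
coll.create_index([("author", ASCENDING), ("title", ASCENDING)])  # composite index
coll.create_index("isbn", unique=True)  # unique index
coll.drop_index("title_1")  # drop an index by its generated name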

By default, each table has one unique index, _id. If _id is not specified when data is inserted, the service generates one automatically. To make full use of this existing index and reduce space overhead, it is best to use a naturally unique key as _id; the object's own ID, such as a product ID, is usually a good fit.
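
A small sketch of this practice, assuming a hypothetical product ID as the natural key:

from pymongo import MongoClient

products = MongoClient()["shop"]["products"]

# The business key doubles as _id, so the built-in unique index is reused
# and no separate index on the product ID is needed.
products.insert_one({"_id": "P-10023", "name": "keyboard", "price": 19.9})
print(products.find_one({"_id": "P-10023"}))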

Capped collection

A capped collection is a special kind of table. The command to create one is:

 
db.createCollection("mycoll", {capped: true, size: 100000})

You specify a certain amount of space when the table is created, and subsequent inserts keep appending data within this pre-allocated file. If the data outgrows the space, the system wraps around to the head of the file and continues inserting, overwriting the oldest data. This structure guarantees efficient insertion and querying. Deleting a single record is not allowed, and updates are restricted: a record cannot grow beyond its original size. This type of table is very efficient. It suits scenarios where data only needs to be kept for a while, such as the session information of users logged on to a website, or the monitoring logs of some program: in both cases the data may safely be overwritten after a certain period.
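
The same capped collection can be created and used from pymongo; this is a sketch with illustrative names and sizes:

from pymongo import MongoClient

db = MongoClient()["monitor"]
log = db.create_collection("mycoll", capped=True, size=100000)  # size in bytes

# Inserts append in order; once the pre-allocated space fills up,
# the oldest records are silently overwritten.
for i in range(5):
    log.insert_one({"event": "heartbeat", "seq": i})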

Replication and sharding

MongoDB's replication architecture is similar to MySQL's. Besides the master-slave and master-master configurations, there is also a replica pairs configuration, which works like master-slave in normal operation; once the master fails, the application automatically connects to the slave. Setting up replication is easy too. I have used the master-slave configuration myself: just start one service with the --master parameter and the other with the --slave and --source parameters, and synchronization is in place.
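
For example, the two services might be started like this (a sketch of the legacy master-slave setup; the ports and data paths are made up):

mongod --master --port 27017 --dbpath /data/master
mongod --slave --source localhost:27017 --port 27018 --dbpath /data/slave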

Sharding is the headache. Once the data volume is large, sharding becomes unavoidable, and sharding in MySQL has been a nightmare for countless DBAs. This is where a document database shows the easy distribution it shares with key-value databases: whether or not a sharding service is already in place, adding and removing nodes is very easy. However, MongoDB is not yet mature in this respect; as of this writing, sharding is only available in an alpha2 release (MongoDB v1.1), and presumably many problems remain to be solved, so for now we can only look forward to it.

Performance

In my use case, with tens of millions of document objects and nearly 10 GB of data, queries by the indexed ID are not slow at all, and even queries on non-indexed fields come out as complete winners. MySQL cannot cope with ad-hoc queries on arbitrary fields at that data volume, and MongoDB's query performance surprised me. Write performance is equally satisfying: writing millions of records finished in under 10 minutes, much faster than the CouchDB I had tried before. On top of that, MongoDB was nowhere near being a CPU killer during my observations.

GridFS

GridFS is an interesting file-system-like feature of MongoDB. It uses large file space to store a large number of small files, which is especially effective for the masses of small files common on websites, such as user avatars. It is also very convenient to use, behaving much like an ordinary file system.
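
A minimal sketch of GridFS usage through pymongo's gridfs module; the database name and file contents are made up:

from pymongo import MongoClient
import gridfs

fs = gridfs.GridFS(MongoClient()["web"])

# Store a small file (say, a user avatar) and read it back.
file_id = fs.put(b"...png bytes...", filename="avatar_42.png")
data = fs.get(file_id).read()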

Use the right database for the right job

The use cases mentioned in the MongoDB documentation include real-time analytics, logging, and full-text search, and some users in China also use MongoDB to store and analyze website logs. However, I think MongoDB is not suitable for website logs beyond a certain scale. The biggest problem is that it takes up too much space: 1 GB of log data can swell to several GB once stored, so a single hard disk cannot even hold a few days of logs. Besides, sharding has to be considered at that volume, and MongoDB's sharding is still immature. Since logs are never updated, only appended, and operations on logs usually concentrate on just one or two columns, the most suitable database for log analysis is a column-store database, especially one designed for data warehousing, such as Infobright.

Because MongoDB does not support transactions, systems with strict transactional requirements (such as a banking system) certainly cannot use it. As for why MongoDB takes up so much space, the official FAQ explains it as follows:

1. Pre-allocation of space: to avoid creating too many disk fragments, whenever space runs short MongoDB requests a large chunk of disk space in advance, and the allocation grows exponentially from 64 MB to 128 MB, 256 MB, and so on, until it reaches 2 GB, the maximum size of a single file. As the data volume grows, you can see these files of ever-increasing size in the data directory.

2. Space occupied by field names: to keep each record's structural information available for queries, MongoDB stores every field's key together with its value in BSON form. When the values are not large relative to the keys, for example with numeric data, this overhead is at its worst. One way to reduce the footprint is to keep field names as short as possible, but this is a trade-off between ease of use and space. I once suggested to the author that field names be kept in an index, with each name represented by a single byte, so that field-name length would no longer matter. But the author's concern is not unreasonable: that scheme would require replacing the index values with the original names in every query result before sending it to the client, and the substitution is quite time-consuming. The current implementation simply trades space for time. (A small sketch after this list illustrates the size difference.)

3. Deleting a record does not release its space: this is easy to understand. To avoid large-scale migration of data after a deletion, the original record's space is not reclaimed; it is only marked as deleted and can be reused later.

4. You can run db.repairDatabase() periodically to compact the records, but this process is slow.
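
The field-name overhead from point 2 is easy to see by encoding two equivalent documents, a sketch assuming a pymongo version recent enough to expose bson.encode:

import bson

long_names = bson.encode({"temperature": 21, "humidity": 40})
short_names = bson.encode({"t": 21, "h": 40})

# Every document repeats its field names, so the short-named
# version occupies noticeably fewer bytes.
print(len(long_names), len(short_names))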

Because the official documentation already covers every aspect in detail, I have not quoted the original text or code here; I have only summarized some experience from my own usage. If you are interested, you may as well look up the topics that interest you directly in the documentation. There are also good introductions on some excellent blogs.

Finally, a document database is a bit of a shape-shifter: whenever the occasion suits, it shows off its strengths as either a relational database or a key-value database.

Case study:

Yesterday the Python program that accesses MongoDB started to report errors, frequently throwing AssertionError exceptions. After checking, only queries against the master raised the exception; the slave was normal, so it could be determined that the master's data had become corrupted.

Repair Process:

1. Running db.repairDatabase() on the master did not help;

2. Stop slave synchronization;

3. Run mongodump on the slave to back up the data;

4. Run mongorestore on the master to load the backed-up data; the --drop parameter can be used to drop the original tables first (see the command sketch after this list);

5. Resume slave synchronization.
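
Steps 3 and 4 might look like this with the standard tools (the hosts and paths here are made up):

mongodump --host slave-host:27017 --out /backup/dump
mongorestore --host master-host:27017 --drop /backup/dump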
