Mailbox: Performance tuning and cluster migration of the mail database


During Mailbox's rapid growth, one persistent performance problem was MongoDB's database-level write lock: the time spent waiting on that lock showed up directly as latency for users of the service. To address this long-standing problem, we decided to migrate one heavily used MongoDB collection (storing mail-related data) to a separate cluster. We estimated this would cut lock wait time by about 50%, let us add more shards, and allow us to optimize and manage the different types of data independently.

We started with the MongoDB documentation and quickly found the cloneCollection command. The unfortunate discovery, however, was that it cannot be used on a sharded collection; neither can renameCollection. After ruling out other options (for performance reasons), we wrote a Python script to copy the data and another script to compare the source and target data. Along the way we found a lot of interesting things, for example that copying a large collection with gevent and PyMongo took half the time of mongodump (written in C++), even with the MongoDB client and server on the same host. This effort eventually grew into Hydra, a toolset for MongoDB migrations that is now open source. The first step was building the initial snapshot of the MongoDB collection.

Problem 1: Dismal copy performance

Early on, I ran an experiment to find the maximum throughput we could get through the MongoDB API, using a simple program built on the MongoDB C++ driver. Partly because I was tired of C++, and partly because I hoped that my colleagues, most of whom are proficient in Python, could reuse or adapt the code for other purposes, I did not pursue the C++ route further. What I did find is that, for small amounts of data on the same task, the C++ version was 5-10 times faster than a naive Python implementation.

So my investigation returned to Python, the default language at Dropbox. Besides, when a client issues a long series of remote requests, such as queries to mongod, it spends most of its time waiting for the server to respond, and copy_collection.py (my MongoDB collection replication tool) did not appear to need many CPU-intensive operations. The low CPU usage of the initial copy_collection.py confirmed this.

Next came issuing MongoDB requests concurrently from copy_collection.py. Initial tests with worker threads were disappointing. I then tried worker processes communicating through a Python Queue object. Performance was still poor, because the overhead of the IPC outweighed the gains from increased concurrency. Using pipes and other IPC mechanisms did not help much either.

Next, we tried asynchronous MongoDB queries from a single Python thread to see how much performance that could buy. gevent is one of the most popular libraries for this approach, so we tried it. gevent patches standard Python modules, such as socket, to operate asynchronously. The nice part is that you can write asynchronous code that reads just like synchronous code.
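In rough outline, the copy code looks like the sketch below. This is illustrative rather than the exact Hydra code: the collection handles passed in and the use of replace_one for the write are assumptions.

```python
import gevent
from gevent import monkey
monkey.patch_all()  # patch socket and friends so PyMongo calls yield instead of blocking


def copy_document(source_collection, destination_collection, _id):
    """Copy one document, identified by its _id, from the source to the destination."""
    doc = source_collection.find_one({'_id': _id})
    if doc is not None:
        destination_collection.replace_one({'_id': _id}, doc, upsert=True)


def copy_documents(source_collection, destination_collection, ids):
    """Spawn one greenlet per document in this batch and wait for them all to finish."""
    greenlets = [
        gevent.spawn(copy_document, source_collection, destination_collection, _id)
        for _id in ids
    ]
    gevent.joinall(greenlets)
```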

This code copies documents from the source MongoDB collection to the target based on their _id field, the unique identifier of every MongoDB document. copy_documents spawns greenlets that run copy_document() to replicate individual documents. When a greenlet performs a blocking operation, such as any request to MongoDB, it yields control to another greenlet that is ready to run. Because all greenlets execute in the same thread and process, no locking between them is normally needed.

With gevent we got better throughput than with either a worker thread pool or a worker process pool. The performance of each approach is summarized below:

Approach                         Performance
single process, no gevent        520 documents/sec
thread worker pool               652 documents/sec
process worker pool              670 documents/sec
single process, with gevent      2,381 documents/sec

Combining gevent with worker processes (one per shard) scaled throughput roughly linearly. The key to using worker processes effectively is to do as little IPC as possible.
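As a rough illustration (not Hydra's actual process management), one worker process per shard could be launched like this; the shard host names and the copy_shard function are placeholders:

```python
import multiprocessing


def copy_shard(shard_host, namespace):
    # Connect directly to this shard and run the gevent-based copier over its
    # documents; keeping all document data inside the process avoids IPC.
    ...


if __name__ == '__main__':
    shard_hosts = ['shard0.example.com', 'shard1.example.com', 'shard2.example.com']
    workers = [
        multiprocessing.Process(target=copy_shard, args=(host, 'mail.messages'))
        for host in shard_hosts
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
```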

Problem 2: Replicating changes made after the snapshot

Because MongoDB does not support transactions, reading a large dataset that is being modified at the same time can give inconsistent results. For example, if you read through the entire dataset with find(), your result set may look like:

included: documents saved before your find() began
included or not included: documents saved or modified while your find() was running
not included: documents inserted after your find() began

In addition, to minimize the downtime when the Mailbox backend is pointed at the new cluster, it was critical to minimize the time needed to apply changes from the source cluster to the new cluster.

Like most storage systems with asynchronous replication, MongoDB uses an operation log (oplog) to record the inserts, updates, and deletes on a mongod instance so that they can be applied to all of that instance's replicas. Given a snapshot, the oplog records every change made after the snapshot was taken.

So the job becomes applying the source cluster's oplog records on the target cluster. From Kristina Chodorow's tutorial blog post we learned the oplog format. Given the serialized format, inserts and deletes are easy to apply; updates are the hard part.
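For reference, tailing a shard's oplog from Python looks roughly like the sketch below. The host name is a placeholder, and a real migration would start from the snapshot's timestamp rather than the newest entry.

```python
import pymongo

client = pymongo.MongoClient('shard0.example.com', 27017)
oplog = client.local.oplog.rs

# Start after the newest entry; a real migration starts at the snapshot's timestamp.
last = next(oplog.find().sort('$natural', pymongo.DESCENDING).limit(1))
ts = last['ts']

cursor = oplog.find({'ts': {'$gt': ts}},
                    cursor_type=pymongo.CursorType.TAILABLE_AWAIT)
for op in cursor:
    ts = op['ts']
    # op['op'] is 'i' (insert), 'u' (update) or 'd' (delete); op['ns'] is the
    # namespace; op['o'] is the document or the change itself.
    print(op['op'], op['ns'])
```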

The structure of update ops in the oplog is not very friendly: MongoDB 2.2 uses duplicate keys in these entries, which cannot be rendered by the mongo shell, let alone by most MongoDB drivers. After some deliberation we chose a simple workaround: for each update op, re-fetch the document from the source instance using the _id embedded in the op, and copy it over the destination document. Because this re-fetch is done only for updates, the replica cannot mirror every intermediate version of the source (an intermediate version such as v2 may never appear on the destination), but the replica and the source still converge to the same state.
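Applying a single oplog entry with this workaround might look like the following simplified sketch (illustrative; the real Hydra code handles more cases):

```python
def apply_op(source_coll, dest_coll, op):
    """Apply one oplog entry to the destination, re-fetching the source document for updates."""
    if op['op'] == 'i':                                   # insert
        dest_coll.replace_one({'_id': op['o']['_id']}, op['o'], upsert=True)
    elif op['op'] == 'd':                                 # delete
        dest_coll.delete_one({'_id': op['o']['_id']})
    elif op['op'] == 'u':                                 # update
        _id = op['o2']['_id']                             # the criteria document holds the _id
        doc = source_coll.find_one({'_id': _id})
        if doc is None:
            # The document was deleted on the source after this op was logged;
            # mirror that state instead of replaying the raw update.
            dest_coll.delete_one({'_id': _id})
        else:
            dest_coll.replace_one({'_id': _id}, doc, upsert=True)
```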

There was also a performance problem on the target cluster: even with a separate process handling each shard's ops, applying the ops serially still could not keep up with Mailbox's requirements.

Applying ops in parallel was therefore a must, but guaranteeing correctness is not easy. In particular, operations on the same _id must be executed in order. A Python set is used to track the _ids of modification ops that are currently being applied: when copy_collection.py is already applying an op for a given document, any further op on that document (a modification or anything else) is blocked until the earlier operation finishes.

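A stripped-down version of that bookkeeping could look like this sketch (not the exact copy_collection.py logic; the polling interval and names are arbitrary):

```python
import gevent

ids_in_flight = set()   # _ids whose ops are currently being applied


def apply_serialized(_id, apply_fn):
    """Run apply_fn for this _id only after any earlier op on the same _id has finished."""
    while _id in ids_in_flight:
        gevent.sleep(0.001)          # yield until the greenlet holding this _id is done
    ids_in_flight.add(_id)
    try:
        apply_fn()                   # apply the insert/update/delete for this document
    finally:
        ids_in_flight.remove(_id)
```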

Verifying replicated data

Comparing the replica with the source data would normally be a simple operation, but doing it across multiple processes and multiple namespaces is a real challenge. And because the data is being modified continuously, there is even more to consider:

Our first approach was to use compare_collections.py (a tool we wrote for comparing data) to verify the most recently modified documents, flag any inconsistencies, and re-check them later. This does not work for deleted documents, however, because there is no last-modified timestamp to go by.

Our second thought was "eventual consistency", which is popular in asynchronous replication setups such as MongoDB replica sets and MySQL master/slave replication. Given enough retries (and barring a catastrophic failure), the source data and the replica will eventually agree. So we repeated the comparisons, with a steadily increasing backoff between successive retries. Some issues remain, such as a value oscillating between two states, but for the data being migrated this approach caused no problems.
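The retry logic is conceptually just comparison plus growing sleeps, along the lines of this sketch (illustrative; the retry count and delays are made up):

```python
import time


def documents_match(source_coll, dest_coll, _id):
    return source_coll.find_one({'_id': _id}) == dest_coll.find_one({'_id': _id})


def eventually_consistent(source_coll, dest_coll, _id, retries=5, delay=1.0):
    """Return True if the two copies of the document agree within a bounded number of retries."""
    for _ in range(retries):
        if documents_match(source_coll, dest_coll, _id):
            return True
        time.sleep(delay)   # give oplog application a chance to catch up
        delay *= 2          # back off further on each successive retry
    return documents_match(source_coll, dest_coll, _id)
```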

Before performing the final cutover from the old MongoDB cluster to the new one, we needed to be sure that the most recent ops had been applied, so we added a command-line option to compare_collections.py that compares the documents modified by the last N ops. This effectively catches any remaining inconsistencies, and it does not take long: checking several hundred thousand ops on a single shard takes only a few minutes, which also eases the pressure on the comparison and retry path.

Contingency handling

Although we handled errors in several ways (retries, catching expected exceptions, logging), there were still unexpected failures during the final tests before the production migration. There were occasional network problems, and one particular set of documents consistently caused mongos to drop its connection to copy_collection.py, which then had to re-establish its connection to mongod.

After trying for a while, we realized we could not come up with targeted fixes for these problems quickly, so we switched our focus to fast recovery instead: we logged the _ids of the documents that compare_collections.py flagged, and built a tool to re-copy just those documents.

The final migration

During the production migration, copy_collection.py built the initial snapshot of tens of millions of emails and replayed billions of MongoDB ops. The whole replication process took about 9 hours, well within the 24-hour limit we had set for performing the initial snapshot and building indexes. We ran copy_collection.py three times in all, and verified the data three times.

The final cutover itself required very little MongoDB-related work (a matter of minutes). During a short maintenance window we used compare_collections.py to compare the documents touched by the last 500,000 ops on each shard. After confirming that the final operations had introduced no inconsistencies, we ran some related tests, pointed the Mailbox backend at the new cluster, and reopened the service to users. We received no user reports of problems after the switch; a migration that users never notice is the best kind of success. The result of the migration:

Time spent in write locks dropped by more than 50% (in line with our original estimate).

Open Source Hydra

Hydra is the collection of all the tools described above, and it is now open source on GitHub.
