Using MongoDB's splitVector Command to Implement Concurrent Data Migration


Abstract: Data migration is a very common scenario in database operations. Migration can be full or incremental. To pursue speed, a full migration is usually done concurrently, typically with record-level concurrency, so the table (collection) to be exported must first be split (partitioned) according to the desired degree of concurrency.

Background

Data migration is a very common scenario in database operations. Migration can be full or incremental. To pursue speed, a full migration is usually done concurrently, typically with record-level concurrency, so the table (collection) to be exported must first be split (partitioned) according to the desired degree of concurrency. A common practice is to find a number of partition points with a series of skip-plus-limit queries, and then export the resulting partitions concurrently. In fact, MongoDB also has a splitVector command that is particularly well suited to partitioning a collection. This article describes how to use this command to partition a collection and thereby migrate data concurrently.
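The skip-plus-limit approach mentioned above can be sketched in a few lines. Against a real server each split point would come from `db.coll.find().sort({_id: 1}).skip(i * chunkLen).limit(1)`; the minimal Node.js simulation below (with a hypothetical helper name `splitPointsBySkip`) runs the same selection over an in-memory sorted key array to show the idea:

```javascript
// Sketch (assumption: keys are already sorted, as an index scan would return
// them). Picks every chunkLen-th key as a partition boundary, mirroring
// repeated skip(i * chunkLen) + limit(1) queries against a collection.
function splitPointsBySkip(sortedKeys, nChunks) {
  const points = [];
  const chunkLen = Math.ceil(sortedKeys.length / nChunks);
  for (let i = 1; i < nChunks; i++) {
    const idx = i * chunkLen;           // equivalent of skip(i * chunkLen)
    if (idx >= sortedKeys.length) break;
    points.push(sortedKeys[idx]);       // equivalent of limit(1)
  }
  return points; // nChunks partitions need at most nChunks - 1 split points
}

// Example: 12 keys split into 4 roughly equal partitions.
console.log(splitPointsBySkip([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], 4));
// prints [ 4, 7, 10 ]
```

Note that each skip must scan and discard all preceding documents on the server, which is why this approach gets slow on large collections.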

Introduction to the Command

The splitVector command was originally an internal command used for chunk splitting in sharded clusters: mongos sends it to a shard to compute the split points of a chunk before actually splitting it. The command can also be used on an ordinary replica set, where we can treat an entire collection as a single chunk and use the command to compute split points for that chunk, thereby partitioning the collection.

The use of splitVector is not described in the official documentation, since it is an internal command, but its usage can be seen with the help option:

db.runCommand({splitVector: "test.test", help: 1})
{
 "help" : "help for: splitVector Internal command.\nexamples:\n  { splitVector : \"blog.post\" , keyPattern:{x:1} , min:{x:10} , max:{x:20}, maxChunkSize:200 }\n  maxChunkSize unit in MBs\n  May optionally specify 'maxSplitPoints' and 'maxChunkObjects' to avoid traversing the whole chunk\n  \n  { splitVector : \"blog.post\" , keyPattern:{x:1} , min:{x:10} , max:{x:20}, force: true }\n  'force' will produce one split point even if data is small; defaults to false\nNOTE: This command may take a while to run",
 "lockType" : 0,
 "ok" : 1
}

As the help document shows, the command is generally used like this:

db.runCommand({splitVector: "blog.post", keyPattern: {x: 1}, min: {x: 10}, max: {x: 20}, maxChunkSize: 200})

Next, each parameter and its meaning:

- splitVector (string): the full name ("db.collection") of the collection to operate on.
- keyPattern (document): the partition key used for chunk splitting; it must have an index. In a sharded cluster this is the shard key; on a replica set the primary-key _id index is usually specified.
- min (document): optional; the lower bound of the range to partition. Defaults to MinKey if not specified.
- max (document): optional; the upper bound of the range to partition. Defaults to MaxKey if not specified.
- maxChunkSize (integer): optional, but one of maxChunkSize and force must be specified; the maximum size in MB of each chunk after partitioning.
- maxSplitPoints (integer): optional; an upper limit on the number of split points returned.
- maxChunkObjects (integer): optional; the maximum number of documents per chunk after partitioning. Defaults to 250000.
- force (boolean): optional, but one of force and maxChunkSize must be specified. By default, if the chunk's data size is smaller than maxChunkSize, it is not split; with force: true the chunk is forcibly split at its middle point and a single split point is returned. Defaults to false.
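To actually migrate concurrently, the split points the command returns must be turned into ranges that workers can export with `find({_id: {$gte: min, $lt: max}})`. The sketch below uses a hypothetical helper `buildRanges` and assumes the result shape splitVector returns, an array of key documents in a `splitKeys` field; `null` stands in for MinKey/MaxKey for simplicity:

```javascript
// Sketch (assumption): turn a splitVector result's splitKeys array into
// [min, max) ranges for concurrent export workers. n split keys yield
// n + 1 ranges; null represents the open MinKey/MaxKey ends.
function buildRanges(splitKeys) {
  const bounds = [null, ...splitKeys.map(k => k._id), null];
  const ranges = [];
  for (let i = 0; i < bounds.length - 1; i++) {
    ranges.push({ min: bounds[i], max: bounds[i + 1] });
  }
  return ranges;
}

// Example with the documented result shape:
const res = { splitKeys: [{ _id: 100 }, { _id: 200 }], ok: 1 };
console.log(buildRanges(res.splitKeys));
// prints [ { min: null, max: 100 }, { min: 100, max: 200 }, { min: 200, max: null } ]
```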

With so many parameters, how should the command be used, and how do we know what the result means? There is no more detailed documentation, so we have to dig into the source code.

Principle

The principle of splitVector is to traverse the index specified by keyPattern and, according to the specified maxChunkSize, find n split points such that each new chunk after splitting is about half the size of maxChunkSize. If the current size of the collection is smaller than maxChunkSize, or the collection is empty, an empty set of split points is returned. If force: true is specified, the maxChunkSize argument is ignored and a split is forced at the middle point of the collection, producing exactly one split point.
When looking for split points, the number of documents each chunk should contain after splitting is first estimated from the collection's average document size:
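That estimate, and the index walk driven by it, can be sketched as follows. This is a simplified simulation of the server-side logic, not the actual implementation: the average document size is dataSize / numRecords, each chunk should hold about maxChunkSize / 2 bytes, so a split key is emitted roughly every keyCount documents while scanning the index:

```javascript
// Sketch (assumption: simplified from the server's internal split-point
// search). sortedKeys stands in for an index scan over keyPattern.
function findSplitPoints(sortedKeys, dataSizeBytes, maxChunkSizeBytes) {
  const n = sortedKeys.length;
  // Empty or smaller than maxChunkSize: no split points.
  if (n === 0 || dataSizeBytes < maxChunkSizeBytes) return [];
  const avgObjSize = dataSizeBytes / n;
  // Each chunk targets about maxChunkSize / 2 bytes, i.e. keyCount documents.
  const keyCount = Math.max(1, Math.floor(maxChunkSizeBytes / (2 * avgObjSize)));
  const points = [];
  // Walk the "index" and emit a split key every keyCount documents.
  for (let i = keyCount; i < n; i += keyCount) {
    points.push(sortedKeys[i]);
  }
  return points;
}

// 1000 docs of 1 KB each (about 1 MB total), maxChunkSize of 256 KB
// => keyCount = 128, so a split key every 128th document.
const keys = Array.from({ length: 1000 }, (_, i) => i);
console.log(findSplitPoints(keys, 1000 * 1024, 256 * 1024).length); // prints 7
```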

