MongoDB: Fast, Balanced Data Import

Requirements and environment:

A project needed to upgrade a sharded MongoDB cluster from 2.6 to 3.0 and switch to the WiredTiger (WT) storage engine. In the 2.6 environment, sharding was enabled on many collections, and the data volume was large.

We chose to migrate with mongodump and mongorestore.

Problems:

The restore step had two problems:

1) Importing a large volume of data is slow and takes a long time.

2) Chunks end up unevenly distributed, so the balancer needs a long time to even them out.

Analysis:

To understand these problems I did some preliminary research into MongoDB and read several good blog posts, and found the following causes:

When MongoDB imports a large amount of data, chunks have to be split and new chunks allocated as the import proceeds, and both take time. In addition, the automatic balancer cannot spread chunks evenly across the shards while the import is running. As a result, I/O load differs widely between shards during the import, and the cluster's total I/O capacity is not used efficiently.

Workaround:

The workaround is to pre-split the chunks in advance and move them evenly to each shard. For this scenario I wrote a simple Python script:

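```python
#!/usr/bin/python
# Generate a mongo shell script that pre-splits a sharded collection into
# chunks and assigns them round-robin to the shards.

# Input: basic information about the collection
ns = "crawler.logsData"          # namespace: database.collection
shard_key = "_id"                # shard key (must be numeric for the arithmetic below)
shards = ["shard01", "shard02", "shard03", "shard04"]
min_key = 237223617              # smallest shard-key value in the data
max_key = 264171274              # largest shard-key value in the data
avg_doc_size = 2340              # average document size in bytes
chunk_size = 64 * 1024 * 1024    # chunk size in bytes (64 MB)
fragment = 0.9                   # fill each chunk to 90% of chunk_size


def split_chunk(ns, shard_key, shards, min_key, max_key,
                avg_doc_size, chunk_size, fragment):
    fname = './' + ns + '.js'
    # 'w' instead of 'a' so that re-running the script does not append
    # duplicate split/moveChunk commands to an existing file
    with open(fname, 'w') as f:
        f.write("db = db.getSiblingDB('admin')" + '\n')
        # number of documents that fit in one chunk; the loop uses it as a
        # span of key values, i.e. it assumes roughly one document per key value
        docs_per_chunk = int(chunk_size * fragment / avg_doc_size)
        key_value = min_key + docs_per_chunk
        shard_counter = 0
        shard_len = len(shards)
        while key_value < max_key:
            # split at key_value, then move the chunk that now starts there
            str_split_chunk = ('db.runCommand({split: "%s", middle: {%s: %d}})'
                               % (ns, shard_key, key_value))
            str_move_chunk = ('db.runCommand({moveChunk: "%s", find: {%s: %d}, to: "%s"})'
                              % (ns, shard_key, key_value, shards[shard_counter]))
            # round-robin over the shards
            shard_counter = (shard_counter + 1) % shard_len
            key_value = key_value + docs_per_chunk
            f.write(str_split_chunk + '\n')
            f.write(str_move_chunk + '\n')


split_chunk(ns, shard_key, shards, min_key, max_key,
            avg_doc_size, chunk_size, fragment)
```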

Step 1

Fill in the parameters in the script above and run it. It generates a JavaScript file containing the split and moveChunk commands; in this example the file is named crawler.logsData.js.
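The values of min_key, max_key and avg_doc_size have to come from the source data. One way to read them, sketched here for pymongo against the 2.6 source cluster, is to take the lowest and highest shard-key values with sorted find_one() calls and the average document size from the avgObjSize field of the collStats command. The function names are hypothetical, and the sketch assumes a numeric shard key.

```python
def read_boundaries(db, coll_name, shard_key="_id"):
    """Query a pymongo Database for the data the split script needs.

    Returns (coll_stats, lowest_doc, highest_doc). Only the shard key is
    projected, so the two find_one() calls fetch small documents.
    """
    coll = db[coll_name]
    projection = {shard_key: 1}
    lowest = coll.find_one({}, projection, sort=[(shard_key, 1)])
    highest = coll.find_one({}, projection, sort=[(shard_key, -1)])
    stats = db.command("collStats", coll_name)
    return stats, lowest, highest


def split_params(stats, lowest, highest, shard_key="_id"):
    """Turn the query results into the script's min_key/max_key/avg_doc_size."""
    return {
        "min_key": lowest[shard_key],
        "max_key": highest[shard_key],
        "avg_doc_size": int(stats["avgObjSize"]),
    }
```

Usage, with a hypothetical host name: `db = MongoClient("mongodb://source-mongos:27017")["crawler"]` (after `from pymongo import MongoClient`), then `params = split_params(*read_boundaries(db, "logsData"))`.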

Step 2

Run the generated file with the following command to pre-split the chunks and distribute them evenly:

time mongo admin -u username -p 'passwd' crawler.logsData.js
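As an alternative to generating a .js file and running it in the mongo shell, the same split and moveChunk admin commands could be sent from Python. The sketch below assumes pymongo and a connection to a mongos; build_commands and run_commands are hypothetical names, and the plan argument is a list of (split key, target shard) pairs computed the same way as in the script. As in the generated file, each split is immediately followed by a moveChunk of the chunk that now starts at the split key.

```python
def build_commands(ns, shard_key, plan):
    """Turn (split_key, target_shard) pairs into (name, ns, options) tuples."""
    commands = []
    for key, shard in plan:
        commands.append(("split", ns, {"middle": {shard_key: key}}))
        commands.append(("moveChunk", ns, {"find": {shard_key: key}, "to": shard}))
    return commands


def run_commands(admin_db, commands):
    """Send each command to the admin database of a mongos.

    Database.command(name, value, **kwargs) builds {name: value, **kwargs};
    pymongo raises OperationFailure if a command fails, which stops the loop.
    """
    for name, ns, options in commands:
        admin_db.command(name, ns, **options)
```

Usage, with a hypothetical host name: `admin = MongoClient("mongodb://mongos-host:27017").admin`, then `run_commands(admin, build_commands("crawler.logsData", "_id", plan))`. Keeping command construction separate from execution makes it easy to print or review the commands before touching the cluster.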

Step 3

Run the following command to import the data into the collection crawler.logsData:

time mongorestore --host **** --port * --db crawler --collection logsData -u username -p "passwd" /home/user/crawler/logsData.bson
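After the import it is worth confirming that the chunks really are spread evenly. On a 3.0 cluster the chunk metadata lives in the chunks collection of the config database, where each document records its namespace (ns) and owning shard (shard). The sketch below, with hypothetical helper names, counts chunks per shard from such documents; in the mongo shell, db.logsData.getShardDistribution() gives a similar per-shard summary.

```python
from collections import Counter


def chunk_counts(chunk_docs):
    """Count chunks per shard from config.chunks documents.

    Shards that own no chunks for the namespace do not appear in the result.
    """
    return Counter(doc["shard"] for doc in chunk_docs)


def chunk_spread(counts):
    """Difference between the most and least loaded shard (0 = perfectly even)."""
    return max(counts.values()) - min(counts.values())
```

Usage, with a hypothetical host name: `docs = MongoClient("mongodb://mongos-host:27017").config.chunks.find({"ns": "crawler.logsData"}, {"shard": 1})`, then `chunk_spread(chunk_counts(docs))`.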

Note:

Steps 1 and 2 can be done ahead of time, without waiting for the migration window. The time otherwise spent allocating and balancing chunks is saved during the actual migration, and fewer chunk splits occur while the data is being imported.

In my test, mongorestore imported 26 GB of data in 14 minutes.
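For a rough sense of scale, and assuming 1 GB = 1024 MB (the article does not say whether 26 GB is the dump size or the on-disk size), the measured time corresponds to an average import rate of about 32 MB/s:

$$\frac{26 \times 1024\ \text{MB}}{14 \times 60\ \text{s}} = \frac{26624\ \text{MB}}{840\ \text{s}} \approx 31.7\ \text{MB/s}$$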

Further thoughts:

Building on this test, we also considered whether the same method can be used for ongoing maintenance. If we know a collection's monthly data growth in advance, we can pre-split and distribute chunks for the next month's data ahead of time.

This removes the need to rely on the MongoDB balancer, because data is written according to our plan: I/O load is balanced across the shards, and the overall write efficiency of the sharded cluster improves.
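The monthly pre-allocation idea can be sketched the same way. Assuming, as the script does, a numeric, increasing shard key with roughly one document per key value, plus a hypothetical estimate of how many documents arrive per month, the helper below computes split points for the next month's key range, treating last_key the way the script treats min_key. It continues the round-robin from where the previous month stopped, so the assignment stays balanced across successive months. The function name and parameters are illustrative, not from the original article.

```python
def plan_next_month(last_key, docs_per_month, docs_per_chunk, shards, next_shard=0):
    """Split points for keys expected over the next month.

    last_key:       highest shard-key value currently in the collection
    docs_per_month: estimated number of new documents (= new key values)
    docs_per_chunk: int(chunk_size * fragment / avg_doc_size), as in the script
    next_shard:     index of the shard that should receive the next chunk

    Returns (plan, next_shard): plan is a list of (split_key, shard) pairs and
    next_shard is the index the following month should resume from.
    """
    end_key = last_key + docs_per_month
    plan = []
    key = last_key + docs_per_chunk
    i = next_shard
    while key < end_key:
        plan.append((key, shards[i % len(shards)]))
        i += 1
        key += docs_per_chunk
    return plan, i % len(shards)
```

The returned plan can be turned into split and moveChunk commands exactly as in the script, and the returned index saved for the following month's run.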
