MongoDB-Based Distributed Data Storage


Note: This article is a by-product of studying MongoDB distributed data storage. By following the steps below, the data in one large table can be distributed across several mongod servers.

In MongoDB 1.6, the auto-sharding feature is basically stable and can be used in a production environment. Because the sharding is automatic, MongoDB uses mongos (the auto-sharding routing module, used to build large-scale, horizontally scalable database clusters into which machines can be added dynamically) to build the cluster automatically and to store each table's partitions on the individual shard nodes.

A MongoDB cluster consists of a number of shards (each made up of mongod processes), one or more mongos routing processes, and one or more config servers.

(Note: The test cases in this article require a 64-bit mongo build; I have never gotten them to work with a 32-bit build.)

The following are some vocabulary descriptions:
Shards: Each shard consists of one or more servers that store data with mongod processes (mongod is MongoDB's core data process). Typically, each shard runs multiple servers to improve availability; these mongod processes together form a replica set within the shard.

Chunks: A chunk is a range of data from a particular collection. A chunk is described by the triple (collection, minKey, maxKey): it holds the data whose shard-key values fall between minKey and maxKey. For example, if the maximum chunk size is 100 MB and a chunk reaches or exceeds that size, it is split into two new chunks. When a shard holds an excessive amount of data, chunks are migrated to other shards.

Config Servers: The config servers store the cluster's metadata, including basic information about each server and shard as well as the chunk information. Each config server holds a complete copy of the chunk information.
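For example, once the cluster set up below is running, every chunk appears as a document in the config database that the config servers hold. A quick way to peek at one (the output is only illustrative; the actual _id, range bounds, and shard will differ):

> use config
switched to db config
> db.chunks.findOne()
{
  "_id" : "dnt_mongodb.posts1-_id_MinKey",
  "ns" : "dnt_mongodb.posts1",
  "min" : { "_id" : { $minKey : 1 } },
  "max" : { "_id" : 19 },
  "shard" : "shard0001"
}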

Next, let's look at the test environment to be configured:

Simulate two shard services and one config service, all running on the 10.0.4.85 machine but on different ports:
Shard1: 27020
Shard2: 27021
Config server: 27022
Mongos uses the default port 27017 when started

Create the following folders on disks C, D, and E:

mongodb\bin

mongodb\db

 

Then open a CMD window and start the mongod process from each folder in turn:

c:\mongodb\bin\mongod --dbpath c:\mongodb\db\ --port 27020

d:\mongodb\bin\mongod --dbpath d:\mongodb\db\ --port 27021

e:\mongodb\bin\mongod --configsvr --dbpath e:\mongodb\db\ --port 27022 (Note: this is the config server)

 

When mongos is started, it listens on port 27017 by default.

e:\mongodb\bin\mongos --configdb 10.0.4.85:27022
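If port 27017 is already occupied on the machine, mongos accepts the usual --port switch; the rest of this article assumes the default port:

e:\mongodb\bin\mongos --configdb 10.0.4.85:27022 --port 27018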

 

Then open the mongo shell:

e:\mongodb\bin>mongo (just press Enter; sometimes specifying a port here causes the addshard command below to fail)

> use admin
switched to db admin
> db.runCommand({ addshard: "10.0.4.85:27020", allowLocal: 1, maxSize: 2, minKey: 1, maxKey: 10 })

-- Add a shard. The maxSize unit is MB; such a small value is used here only to demonstrate the sharding effect.

{ "shardAdded" : "shard0000", "ok" : 1 }
> db.runCommand({ addshard: "10.0.4.85:27021", allowLocal: 1, minKey: 1000 })
{ "shardAdded" : "shard0001", "ok" : 1 }

Note: To remove a shard, use the following command:

db.runCommand({ removeshard: "localhost:10000" });
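As far as I know, removeshard first drains the shard's chunks onto the remaining shards; running the same command again reports the draining state until it completes. An illustrative first response (exact fields may vary by version):

> db.runCommand({ removeshard: "localhost:10000" })
{ "msg" : "draining started successfully", "state" : "started", "shard" : "shard0000", "ok" : 1 }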

 

> db.runCommand({ listshards: 1 }); -- view the list of shard nodes

{
  "shards" : [
    {
      "_id" : "shard0000",
      "host" : "10.0.4.85:27020"
    },
    {
      "_id" : "shard0001",
      "host" : "10.0.4.85:27021"
    }
  ],
  "ok" : 1
}

 

 

Next, create the corresponding database and enable sharding on it; here the auto-sharding database is dnt_mongodb:

> config = connect("10.0.4.85:27022")
> config = config.getSisterDB("config")
> dnt_mongodb = db.getSisterDB("dnt_mongodb");
dnt_mongodb
> db.runCommand({ enablesharding: "dnt_mongodb" })
{ "ok" : 1 }

Note: Once sharding is enabled for a database, mongos places its different collections on different shards. Unless a collection is split (which is set up below), all of the data in a collection stays on a single shard.

> db.printShardingStatus();

--- Sharding Status ---
sharding version: { "_id" : 1, "version" : 3 }
shards:
    { "_id" : "shard0000", "host" : "10.0.4.85:27020" }
    { "_id" : "shard0001", "host" : "10.0.4.85:27021" }
databases:
    { "_id" : "admin", "partitioned" : false, "primary" : "config" }
    { "_id" : "dnt_mongodb", "partitioned" : true, "primary" : "shard0000" }

 

> db.runCommand({ shardcollection: "dnt_mongodb.posts1", key: { _id: 1 }, unique: true })
{ "collectionsharded" : "dnt_mongodb.posts1", "ok" : 1 }

-- Use the shardcollection command to split the collection. The shard key here is the automatically generated _id [which must be a unique index].
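Sharding on _id is just the simplest choice; any indexed field (or combination of fields) can serve as the shard key. For example, a hypothetical posts2 collection could be split by author first and _id second:

> db.runCommand({ shardcollection: "dnt_mongodb.posts2", key: { userid: 1, _id: 1 } })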

To shard GridFS data, use the following settings:

db.runCommand({ shardcollection: "dnt_mongodb.attach_gfstream.chunks", key: { files_id: 1 } })
{ "ok" : 1 }

For more information, see http://eshilin.blog.163.com/blog/static/13288033020106215227346/

 

> db.printShardingStatus()

--- Sharding Status ---
sharding version: { "_id" : 1, "version" : 3 }
shards:
    { "_id" : "shard0000", "host" : "localhost:27020" }
    { "_id" : "shard0001", "host" : "localhost:27021" }
databases:
    { "_id" : "admin", "partitioned" : false, "primary" : "config" }
    { "_id" : "user001", "partitioned" : true, "primary" : "shard0000" }
    dnt_mongodb.posts1 chunks:
        { "_id" : { $minKey : 1 } } -->> { "_id" : { $maxKey : 1 } } on : shard0000 { "t" : 1000, "i" : 0 }

 

Next, I use a tool to bulk-import data into the posts1 collection of the dnt_mongodb database, about 160,000 records in total. While the import runs, mongos prints information like the following:

Tue Sep 07 12:13:15 [conn14] autosplitting dnt_mongodb.posts1 size: 47273960 shard: ns:dnt_mongodb.posts1 at: shard0000:10.0.4.85:27020 lastmod: 1|0 min: { _id: MinKey } max: { _id: MaxKey } on: { _id: 19 } (splitThreshold 47185920)
Tue Sep 07 12:13:15 [conn14] config change: { _id: "4_85-2010-09-07T04:13:15-0", server: "4_85", time: new Date(1283832795994), what: "split", ns: "dnt_mongodb.posts1", details: { before: { min: { _id: MinKey }, max: { _id: MaxKey } }, left: { min: { _id: MinKey }, max: { _id: 19 } }, right: { min: { _id: 19 }, max: { _id: MaxKey } } } }
Tue Sep 07 12:13:16 [conn14] moving chunk (auto): ns:dnt_mongodb.posts1 at: shard0000:10.0.4.85:27020 lastmod: 1|1 min: { _id: MinKey } max: { _id: 19 } to: shard0001:10.0.4.85:27021 # objects: 0
Tue Sep 07 12:13:16 [conn14] moving chunk ns: dnt_mongodb.posts1 moving (ns:dnt_mongodb.posts1 at: shard0000:10.0.4.85:27020 lastmod: 1|1 min: { _id: MinKey } max: { _id: 19 }) shard0000:10.0.4.85:27020 -> shard0001:10.0.4.85:27021
Tue Sep 07 12:13:23 [WriteBackListener] ~ScopedDBConnection: _conn != null
Tue Sep 07 12:13:23 [WriteBackListener] ERROR: splitIfShould failed: ns: dnt_mongodb.posts1 findOne has stale config
Tue Sep 07 12:13:28 [WriteBackListener] autosplitting dnt_mongodb.posts1 size: 54106804 shard: ns:dnt_mongodb.posts1 at: shard0000:10.0.4.85:27020 lastmod: 2|1 min: { _id: 19 } max: { _id: MaxKey } on: { _id: 71452 } (splitThreshold 47185920)
Tue Sep 07 12:13:28 [WriteBackListener] config change: { _id: "4_85-2010-09-07T04:13:28-1", server: "4_85", time: new Date(1283832808738), what: "split", ns: "dnt_mongodb.posts1", details: { before: { min: { _id: 19 }, max: { _id: MaxKey } }, left: { min: { _id: 19 }, max: { _id: 71452 } }, right: { min: { _id: 71452 }, max: { _id: MaxKey } } } }
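The import tool itself is not shown; as a rough stand-in, a mongo shell loop like the following (with made-up document fields, padded to roughly the average object size seen below) would generate enough data to trigger the same auto-splitting:

> use dnt_mongodb
switched to db dnt_mongodb
> for (var i = 0; i < 160000; i++) {
...   db.posts1.insert({ _id: i, title: "post " + i, body: new Array(1200).join("x") })
... }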

 

After automatic sharding completes, you can use mongo to check the result:

> use dnt_mongodb
switched to db dnt_mongodb
> show collections
posts1
system.indexes
> db.posts1.stats()
{
  "sharded" : true,
  "ns" : "dnt_mongodb.posts1",
  "count" : 161531,
  "size" : 195882316,
  "avgObjSize" : 1212.6608267143768,
  "storageSize" : 231467776,
  "nindexes" : 1,
  "nchunks" : 5,
  "shards" : {
    "shard0000" : {
      "ns" : "dnt_mongodb.posts1",
      "count" : 62434,
      "size" : 54525632,
      "avgObjSize" : 873.3323509626165,
      "storageSize" : 65217024,
      "numExtents" : 10,
      "nindexes" : 1,
      "lastExtentSize" : 17394176,
      "paddingFactor" : 1,
      "flags" : 1,
      "totalIndexSize" : 2179072,
      "indexSizes" : {
        "_id_" : 2179072
      },
      "ok" : 1
    },
    "shard0001" : {
      "ns" : "dnt_mongodb.posts1",
      "count" : 99097,
      "size" : 141356684,
      "avgObjSize" : 1426.4476623913943,
      "storageSize" : 166250752,
      "numExtents" : 12,
      "nindexes" : 1,
      "lastExtentSize" : 37473024,
      "paddingFactor" : 1,
      "flags" : 1,
      "totalIndexSize" : 3424256,
      "indexSizes" : {
        "_id_" : 3424256
      },
      "ok" : 1
    }
  },
  "ok" : 1
}

 

As the results above show, the 160,000 records were spread across the two shards: 62,434 on shard0000 and 99,097 on shard0001. Next, let's look at how the chunks are distributed across the two shards (the error message visible in the original screenshot, 'incorrect input string format', appears because the runtime environment differs from the environment the program was compiled for: one is 64-bit and the other 32-bit):
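Since the screenshot is not reproduced here, the chunk ranges per shard can also be listed straight from the config database (config.chunks stores the ns, min, max, and shard of every chunk):

> use config
switched to db config
> db.chunks.find({ ns: "dnt_mongodb.posts1" }).forEach(function (c) {
...   print(c.shard + ": " + tojson(c.min) + " -->> " + tojson(c.max))
... })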

 

As you can see, the data is automatically split into ranges, somewhat like a SQL Server partitioned table, except that here the partitioning happens automatically (so far I have not found a way to manually specify the upper and lower bounds of a range; if any reader knows how, please tell me). In this test there were five chunks, four of which sat on shard0001; the exact distribution can change from test to test, as can the number of records assigned to each shard. In addition, during MongoDB's chunk-moving process a folder is generated on shard0000 containing some BSON files, named after the table plus a date, for example:

Post-cleanup.2010-09-07T04-13-31.1.bson

This file mainly contains the database and table structure along with the related records; I believe it is intended for data recovery and backup.
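If such a .bson file ever needs to be loaded back, mongorestore can restore a single-collection dump (the file path below is only a placeholder):

e:\mongodb\bin\mongorestore -h 10.0.4.85:27017 -d dnt_mongodb -c posts1 e:\mongodb\db\posts1-cleanup.bson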

 

OK, that's all for today.
