MongoDB is a document-oriented NoSQL database developed by the 10gen team. Over the past year or so, more and more large websites have put MongoDB into production; the better-known ones include Foursquare, bit.ly, SourceForge, boxed and so on. MongoDB provides auto-sharding, which lets users build a distributed MongoDB cluster with only a simple configuration.
MongoDB's auto-sharding provides:
· Automatic rebalancing when load and data distribution become uneven across shards
· Easy addition and removal of nodes
· Automatic failover
· Scaling out to thousands of nodes
A MongoDB sharded cluster consists of three parts:
1. Shards
A shard is a node that stores the actual data. Each shard can be a single mongod instance, or a replica set made up of several mongod instances. To get automatic failover inside each shard, MongoDB officially recommends that every shard be a replica set.
2. Config Servers
To split a collection into multiple chunks stored across multiple shards, you need to specify a shard key for the collection, for example {name:1}, {_id:1}, or {lastname:1, firstname:1}. The shard key determines which chunk a record belongs to. For example, if the range 1 < shard key < 100 forms one chunk, and that chunk is stored on shard1, then every record whose shard key falls in that range is stored on shard1. The config servers store: the configuration information for all shard nodes, the shard key range of each chunk, the distribution of chunks across the shards, and the sharding configuration of every database and collection in the cluster.
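The relationship between shard key ranges, chunks, and shards can be sketched as a small lookup table. This is only a toy model, not how the config servers are implemented: the names chunkTable and findShardForKey, the shard names, and the ranges are invented for illustration, and half-open ranges [min, max) are assumed.

```javascript
// Toy model of the chunk metadata kept by the config servers.
// Each chunk covers the half-open shard key range [min, max) and lives on one shard.
// The ranges and shard names below are made up for illustration.
var chunkTable = [
  { min: -Infinity, max: 1,        shard: "shard2" },
  { min: 1,         max: 100,      shard: "shard1" },
  { min: 100,       max: Infinity, shard: "shard2" }
];

// Return the name of the shard whose chunk range contains the given key.
function findShardForKey(table, key) {
  for (var i = 0; i < table.length; i++) {
    var chunk = table[i];
    if (key >= chunk.min && key < chunk.max) {
      return chunk.shard;
    }
  }
  throw new Error("no chunk covers key " + key);
}
```

Because the ranges cover the whole key space without overlapping, every key maps to exactly one chunk and therefore to exactly one shard.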
3. Routing Process
MongoDB's binary package includes a program called mongos, which serves as the routing process for a MongoDB cluster. It acts as a transparent proxy: it receives a query or update request from the client, asks the config servers which shard the record should be read from or written to, connects to that shard to perform the operation, and finally returns the result to the client. The client simply sends to the routing process the same query or update requests it would otherwise send to mongod, without worrying about which shard a record is stored on.
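The proxy behaviour described above can be illustrated with an in-memory stand-in. This is a rough sketch under simplifying assumptions, not mongos itself: ToyRouter and its methods are hypothetical names, each shard is just a plain object keyed by _id, and the chunk table is passed in directly instead of being fetched from config servers.

```javascript
// Toy routing process: clients call insert/findById on the router and never
// need to know which shard holds a document.
function ToyRouter(chunkTable, shardNames) {
  this.chunkTable = chunkTable; // array of { min, max, shard }, ranges are [min, max)
  this.shards = {};             // shard name -> in-memory store keyed by _id
  for (var i = 0; i < shardNames.length; i++) {
    this.shards[shardNames[i]] = {};
  }
}

// Look up which shard owns the chunk covering this _id.
ToyRouter.prototype.locate = function (id) {
  for (var i = 0; i < this.chunkTable.length; i++) {
    var chunk = this.chunkTable[i];
    if (id >= chunk.min && id < chunk.max) {
      return chunk.shard;
    }
  }
  throw new Error("no chunk covers _id " + id);
};

// Forward an insert to the owning shard and report where it went.
ToyRouter.prototype.insert = function (doc) {
  var shardName = this.locate(doc._id);
  this.shards[shardName][doc._id] = doc;
  return shardName;
};

// Forward a lookup to the owning shard; return null when nothing is stored.
ToyRouter.prototype.findById = function (id) {
  var store = this.shards[this.locate(id)];
  return Object.prototype.hasOwnProperty.call(store, id) ? store[id] : null;
};
```

A caller would write something like router.insert({_id: 5, name: "a"}) and later router.findById(5), exactly as if it were talking to a single store.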
Next I'll show you how to build a simple MongoDB cluster to test MongoDB's auto-sharding functionality.
This MongoDB cluster will contain two shards, one config server, and one routing process. We will use MongoDB 1.6.5 for this test, which can be downloaded from: http://www.mongodb.org/downloads
First, we create a data directory for two shards and one config server:
sudo mkdir -p /data0/mongo/shard1 /data0/mongo/shard2 /data0/mongo/config
Then we start, in turn, two mongod processes as the shards, one mongod process as the config server, and one mongos process as the routing process:
sudo mongod --port 27017 --fork --logpath /var/log/mongo_shard1.log --dbpath /data0/mongo/shard1 --shardsvr
sudo mongod --port 27018 --fork --logpath /var/log/mongo_shard2.log --dbpath /data0/mongo/shard2 --shardsvr
sudo mongod --port 27217 --fork --logpath /var/log/mongo_config.log --dbpath /data0/mongo/config --configsvr
sudo mongos --port 27417 --fork --logpath /var/log/mongos.log --configdb 127.0.0.1:27217 --chunkSize 1
Among the mongos startup parameters, chunkSize specifies the size of a chunk in MB. The default is 200MB; to make the effect of sharding easier to observe in this test, we set the chunk size to 1MB.
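A little arithmetic shows why the smaller chunk size matters for a test of this scale. The helper below is a hypothetical illustration, not part of MongoDB: it gives only a rough lower bound, and the real number of chunks depends on how and when chunks are actually split.

```javascript
// Rough lower bound on the number of chunks needed to hold dataMB of data
// when a chunk may grow to at most chunkSizeMB before it has to be split.
function minChunks(dataMB, chunkSizeMB) {
  if (chunkSizeMB <= 0) {
    throw new Error("chunk size must be positive");
  }
  return Math.max(1, Math.ceil(dataMB / chunkSizeMB));
}

// About 200MB of test data with the default 200MB chunk size: could fit in one chunk.
var chunksAtDefault = minChunks(200, 200);
// The same data with a 1MB chunk size: at least 200 chunks to spread around.
var chunksAtOneMB = minChunks(200, 1);
```

With a single chunk there is nothing for the cluster to move between shards, whereas a couple of hundred chunks give the rebalancing plenty to work with.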
Next, we use the mongo shell to log in to mongos and add the shard nodes:
mongo --port 27417
MongoDB shell version: 1.6.5
connecting to: 127.0.0.1:27417/test
> use admin
switched to db admin
> db.runCommand({addshard: "127.0.0.1:27017"})
{ "shardAdded" : "shard0000", "ok" : 1 }
> db.runCommand({addshard: "127.0.0.1:27018"})
{ "shardAdded" : "shard0001", "ok" : 1 }
Here we enable sharding for the database "foo" and set the shard key of the collection "col" to "{_id:1}" to test the sharding function:
> db.runCommand({enablesharding: "foo"})
{ "ok" : 1 }
> db.runCommand({shardcollection: "foo.col", key: {_id: 1}})
{ "collectionsharded" : "foo.col", "ok" : 1 }
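To exercise the sharded collection, test data has to be loaded through mongos. A loop along the following lines could do it. This is a sketch only, and not necessarily the script used for this test: insertTestData is a hypothetical helper, the document shape is invented, and collection is assumed to be any object with an insert(doc) method, such as db.col in the mongo shell.

```javascript
// Insert `count` documents, each carrying a padding string of `payloadBytes`
// characters, into `collection`. Returns the number of documents inserted.
function insertTestData(collection, count, payloadBytes) {
  // Build a string of exactly payloadBytes "x" characters.
  var payload = new Array(payloadBytes + 1).join("x");
  for (var i = 0; i < count; i++) {
    collection.insert({ _id: i, payload: payload });
  }
  return count;
}

// Possible use from a mongo shell connected to mongos (assumed session):
//   > use foo
//   > insertTestData(db.col, 200000, 1024)   // roughly 200MB of payload
```

Running db.stats() from another shell while the loop is in progress gives a view of how the data is spread across the shards.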
To test how well sharding balances data, I inserted about 200MB of data in several batches, using db.stats() to check the distribution of data during the insertion. When the amount of data was small, all chunks were stored on shard0000. As more data was inserted, the data began to spread out, with mongos rebalancing it between the shards. Right after the insertion finished at 200MB, there was about 135MB of data on shard0000 and about 65MB on shard0001; after a while, the amount on shard0000 had dropped to 115MB and the amount on shard0001 had risen to 85MB.
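The gradual shift from 135MB/65MB toward a more even split is what one would expect if chunks are migrated one at a time from the fuller shard to the emptier one. The simulation below is a simplified model, not MongoDB's actual balancer: rebalance is a hypothetical function, each 1MB chunk is treated as one unit, and the threshold of 8 is an arbitrary value chosen for illustration.

```javascript
// Toy balancer: repeatedly move one chunk from the shard holding the most
// chunks to the shard holding the fewest, until the gap drops below `threshold`.
// `chunkCounts` maps shard name -> number of chunks and is not modified.
function rebalance(chunkCounts, threshold) {
  if (threshold < 2) {
    // With a smaller threshold, a single chunk could bounce back and forth forever.
    throw new Error("threshold must be at least 2");
  }
  var counts = {};
  for (var name in chunkCounts) {
    counts[name] = chunkCounts[name];
  }
  var moves = 0;
  while (true) {
    var most = null;
    var least = null;
    for (var s in counts) {
      if (most === null || counts[s] > counts[most]) most = s;
      if (least === null || counts[s] < counts[least]) least = s;
    }
    // Stop when there are no shards or the imbalance is small enough.
    if (most === null || counts[most] - counts[least] < threshold) {
      break;
    }
    counts[most] -= 1;  // migrate one chunk off the fullest shard
    counts[least] += 1; // ... onto the emptiest shard
    moves += 1;
  }
  return { counts: counts, moves: moves };
}

var balanced = rebalance({ shard0000: 135, shard0001: 65 }, 8);
```

In this model the migration stops as soon as the two shards are within the threshold of each other, so the final split is close to even but not necessarily exact.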
MongoDB's auto-sharding has been production-ready since version 1.6. More than half a year later, most companies are still watching from the sidelines and do not dare to use it in production, so there is not yet much reference material about it on the Internet. We will continue to share more of our experience with MongoDB in the future.