Reference: http://www.lanceyan.com/tech/arch/mongodb_shard1.html
I. Introduction to MongoDB Sharding
Early in a system's life, the volume of data causes few problems, but as the data keeps growing, a single machine sooner or later hits a hardware bottleneck. MongoDB positions itself as an architecture for massive data, so of course it must answer the question of what to do when the data gets huge: "sharding" is its answer to exactly this problem. How do traditional databases handle massive reads and writes? One word sums it up: partition. This is clear in the architecture diagram below, which Taobao's Yue Xuqiang presented on InfoQ:
650) this.width=650; "title=" Clipboard.png "src=" http://s3.51cto.com/wyfs02/M01/75/22/wKioL1YzNV3QH_ Enaaerqxdop7k266.jpg "alt=" Wkiol1yznv3qh_enaaerqxdop7k266.jpg "/>
In that diagram there is TDDL, a Taobao data access layer component. Its main role is SQL parsing and routing: based on the application's request, it parses the SQL, determines which business database and which table the query should access, and returns the results. In more detail:
650) this.width=650; "title=" 2.png "src=" Http://s3.51cto.com/wyfs02/M00/75/24/wKiom1YzNUGQUFeeAAFCk54sGLQ120.jpg " alt= "Wkiom1yznugqufeeaafck54sglq120.jpg"/>
Having said so much about traditional database architecture, how does NoSQL do it? To scale MySQL automatically, you need a data access layer plus application code for expansion, and adding databases, removing them, and backing them up all have to be controlled by programs; once there are many database nodes, maintenance becomes a real headache. MongoDB, however, handles all of this through its own internal mechanisms! Let us see what mechanisms MongoDB provides for routing and sharding:
650) this.width=650; "title=" 3.png "src=" Http://s3.51cto.com/wyfs02/M02/75/24/wKiom1YzNXXC88VfAAFAlXxMgCk553.jpg " alt= "Wkiom1yznxxc88vfaafalxxmgck553.jpg"/>
You can see four components: mongos, config server, shard, and replica set.
mongos: the entry point for requests to the database cluster. All requests are coordinated through mongos, so there is no need to add a router in the application; mongos itself is a request dispatch center, responsible for forwarding each data request to the corresponding shard server. In a production environment there is usually more than one mongos serving as the request entry point, so that the cluster does not stop handling requests when one of them goes down.
Config server: as the name implies, it stores the cluster's configuration, i.e. all of the database metadata (routing and shard information). mongos itself does not persist the shard servers' locations or the routing information; it only caches them in memory, while the config server actually stores the data. On first start or after a restart, mongos loads its configuration from the config server, and if the configuration changes, the config server notifies all mongos instances to update their state so they can keep routing accurately. In production there are usually multiple config servers, because the shard routing metadata absolutely must not be lost; even if one of them goes down, as long as one survives, the MongoDB cluster will not fail.
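To make this concrete: the metadata the config server holds can be inspected from a mongos shell in the special config database. A quick illustrative sketch (output abbreviated; the exact documents depend on your cluster):
mongos> use config
switched to db config
mongos> db.shards.find()            // the registered shard servers
mongos> db.chunks.find().limit(1)   // chunk ranges and which shard owns them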
Shard: this is the legendary shard itself. As mentioned above, even a very capable machine has a ceiling; as in war, one person, however heroic, cannot defeat an enemy division alone. As the saying goes, three cobblers together outdo a Zhuge Liang: this is where the strength of the team shows. It is the same on the Internet, where many ordinary machines together can do what one big machine cannot. For example:
650) this.width=650; "title=" 4.png "src=" Http://s3.51cto.com/wyfs02/M01/75/22/wKioL1YzNcbSWYrHAADct7J8acA360.jpg " alt= "Wkiol1yzncbswyrhaadct7j8aca360.jpg"/>
Suppose one machine stores a collection, Collection1, holding 1 TB of data: too much pressure! Split across 4 machines, each holds 256 GB, and the load concentrated on a single machine is spread out. Someone might ask: why not just add a bigger disk to one machine instead of splitting across four? Leaving storage space aside, a running database also faces disk read/write, network I/O, CPU, and memory bottlenecks. Once the shard rules are set up in a MongoDB cluster, mongos automatically forwards each data operation to the corresponding shard machine. In a production environment, the shard key must be chosen well: it determines how evenly the data is divided across the shard machines. If one machine ends up with 1 TB while the others hold nothing, you would be better off not sharding at all!
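As an aside, if no field distributes writes evenly, MongoDB 2.4 also supports hashed shard keys, which spread documents pseudo-randomly across chunks. A hedged example (the collection name testdb.logs is hypothetical, chosen only for illustration):
mongos> sh.shardCollection("testdb.logs", { _id: "hashed" })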
Some terminology for sharding:
Shard key: when setting up sharding, you select one or more keys from the collection as the basis for splitting the data; this is called the shard key (or compound shard key when several fields are combined). Once the shard key is set, MongoDB will not allow documents that lack the shard key to be inserted, although the type of the shard key value may differ between documents.
Chunk: within a shard server, MongoDB further divides the data into chunks, each representing a portion of that shard server's data. Chunks serve the following two purposes (see the shell sketch after this list):
Splitting: when a chunk grows beyond the configured chunk size, a MongoDB background process splits it into smaller chunks, avoiding oversized chunks.
Balancing: in MongoDB, the balancer is a background process responsible for migrating chunks between shards, balancing the load across the shard servers.
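Both behaviors can be inspected and tuned from the mongos shell. A minimal sketch (64 MB is the default chunk size in this MongoDB version; the value below is only an example):
mongos> use config
mongos> db.settings.save({ _id: "chunksize", value: 64 })   // chunk split threshold, in MB
mongos> sh.getBalancerState()        // is the balancer currently enabled?
mongos> sh.setBalancerState(true)    // turn chunk migration on or off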
II. Environment Setup
1. The experimental environment
650) this.width=650; "title=" 5.png "src=" Http://s3.51cto.com/wyfs02/M00/75/24/wKiom1YzNaeg1CPLAAF4qF7HfQE648.jpg " alt= "Wkiom1yznaeg1cplaaf4qf7hfqe648.jpg"/>
2. Synchronize the time
[[email protected] ~]# ntpdate 202.120.2.101
[[email protected] ~]# ntpdate 202.120.2.101
[[email protected] ~]# ntpdate 202.120.2.101
[[email protected] ~]# ntpdate 202.120.2.101
3. Installing MongoDB
Note: MongoDB has already been installed on the other three machines earlier, so those steps are not repeated here.
[[email protected] ~]# yum localinstall -y mongo-10gen-2.4.14-mongodb_1.x86_64.rpm mongo-10gen-server-2.4.14-mongodb_1.x86_64.rpm
4. Configure the config server
[[email protected] ~]# vim /etc/mongod.conf
dbpath = /data/mongodb
configsvr = true
[[email protected] ~]# mkdir -p /data/mongodb
[[email protected] ~]# chown -R mongod.mongod /data/mongodb
[[email protected] ~]# service mongod start
Starting mongod: about to fork child process, waiting until server is ready for connections.
forked process: 130046
all output going to: /var/log/mongo/mongod.log
child process started successfully, parent exiting [OK]
[[email protected] ~]# ss -tnlp | grep mongod
LISTEN 0 128 *:28019 *:* users:(("mongod",130046,10))
LISTEN 0 128 *:27019 *:* users:(("mongod",130046,9))
Note: configsvr = true makes mongod listen on the config server port 27019 (plus the HTTP status port 28019) instead of the default 27017, which is what ss shows above.
5. Configure mongos
Note: this machine was used in the earlier replica set experiment, so the replica set configuration added there must be removed before configuring it as follows.
[[email protected] ~]# vim /etc/mongod.conf
#dbpath = /data/mongodb
configdb = 192.168.1.6:27019
[[email protected] ~]# mongos -f /etc/mongod.conf
Sat Sep 26 21:50:57.282 warning: running with 1 config server should be done only for testing purposes and is not recommended for production
about to fork child process, waiting until server is ready for connections.
forked process: 7485
all output going to: /var/log/mongo/mongod.log
child process started successfully, parent exiting
[[email protected] ~]# ss -tnlp | grep mongos
LISTEN 0 128 *:28017 *:* users:(("mongos",7485,8))
LISTEN 0 128 *:27017 *:* users:(("mongos",7485,6))
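Note the warning in the output above: a single config server is acceptable only for testing. A production sketch of the configdb line, assuming two additional, hypothetical config servers at 192.168.1.11 and 192.168.1.12 (mongos in this version accepts exactly one or three config servers):
configdb = 192.168.1.6:27019,192.168.1.11:27019,192.168.1.12:27019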
6. Configure and start the shard nodes
The node1 and node2 nodes were also used in the earlier replica set experiments; remove the configuration added there, then configure as follows.
[[email protected] ~]# vim /etc/mongod.conf
dbpath = /data/mongodb
[[email protected] ~]# service mongod start
[[email protected] ~]# vim /etc/mongod.conf
dbpath = /data/mongodb
[[email protected] ~]# service mongod start
7. Connect to mongos
[[email protected] ~]# mongo --host 192.168.1.8
MongoDB shell version: 2.4.14
connecting to: 192.168.1.8:27017/test
mongos> sh.status()
--- Sharding Status ---
  sharding version: { "_id" : 1, "version" : 3, "minCompatibleVersion" : 3, "currentVersion" : 4, "clusterId" : ObjectId("5606a2c30b2c688bc754432d") }
  shards:
  databases:
    { "_id" : "admin", "partitioned" : false, "primary" : "config" }

Add the shard nodes:

mongos> sh.addShard("192.168.1.9:27017")
{ "shardAdded" : "shard0000", "ok" : 1 }
mongos> sh.addShard("192.168.1.10:27017")
{ "shardAdded" : "shard0001", "ok" : 1 }
mongos> sh.status()
--- Sharding Status ---
  sharding version: { "_id" : 1, "version" : 3, "minCompatibleVersion" : 3, "currentVersion" : 4, "clusterId" : ObjectId("5606a2c30b2c688bc754432d") }
  shards:
    { "_id" : "shard0000", "host" : "192.168.1.9:27017" }
    { "_id" : "shard0001", "host" : "192.168.1.10:27017" }
  databases:
    { "_id" : "admin", "partitioned" : false, "primary" : "config" }
    { "_id" : "test", "partitioned" : false, "primary" : "shard0000" }
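sh.addShard() and sh.status() are shell helpers; the same information can be obtained from the underlying admin command, shown here as a quick sketch:
mongos> db.adminCommand({ listShards: 1 })   // lists the shards registered in the cluster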
Shard the testdb database, using the compound key { age: 1, name: 1 } on the testcoll collection (age alone has few distinct values, so name is added to raise the key's cardinality):
Mongos> sh.enablesharding ("TestDB") { "OK" : 1 }mongos> use Testdbswitched to db testdbmongos> sh.shardcollection ("Testdb.testcoll", {Age: 1, NAME: 1}) { "collectionsharded" : "Testdb.testcoll", "OK" : 1 }mongos > sh.status ()--- Sharding Status --- sharding version: { "_id " : 1, " version " : 3, " Mincompatibleversion " : 3, " CurrentVersion " : 4, " Clusterid " : objectid (" 5606a2c30b2c688bc754432d ")} shards: { "_id" : "shard0000", "host" : "192.168.1.9:27017" } { "_id" : "shard0001", "host" : "192.168.1.10:27017" } databases: { "_id" : "admin", "partitioned" : false, "PRIMARY" : "conFig " } { " _id " : " test ", " partitioned " : false, "PRIMARY" : "shard0000" } { "_id" : "TestDB", " Partitioned " : true, " primary " : " shard0001 " } testdb.testcoll shard key: { "Age" : 1, "Name" : 1 } chunks: shard0001 1 { "Age" : { " $minKey : 1 }, Name : { $minKey : 1 } } -- >> { "Age" : { "$maxKey" : 1 }, "Name" : { " $maxKey " : 1 } } on : shard0001 timestamp (1, 0)
Insert data to see whether it gets distributed across the shards:
mongos> for (i = 1; i <= 1000000; i++) db.testcoll.insert({ name: "user" + i, age: (i % 150), address: "xinyang" })
From another machine, connect to mongos and check how the inserted data was stored:
[[email protected] ~]# mongo --host 192.168.1.8
MongoDB shell version: 2.4.14
connecting to: 192.168.1.8:27017/test
mongos> use testdb
switched to db testdb
mongos> sh.status()
--- Sharding Status ---
  sharding version: { "_id" : 1, "version" : 3, "minCompatibleVersion" : 3, "currentVersion" : 4, "clusterId" : ObjectId("5606a2c30b2c688bc754432d") }
  shards:
    { "_id" : "shard0000", "host" : "192.168.1.9:27017" }
    { "_id" : "shard0001", "host" : "192.168.1.10:27017" }
  databases:
    { "_id" : "admin", "partitioned" : false, "primary" : "config" }
    { "_id" : "test", "partitioned" : false, "primary" : "shard0000" }
    { "_id" : "testdb", "partitioned" : true, "primary" : "shard0001" }
      testdb.testcoll
        shard key: { "age" : 1, "name" : 1 }
        chunks:
          shard0000  1
          shard0001  3
        { "age" : { "$minKey" : 1 }, "name" : { "$minKey" : 1 } } -->> { "age" : 1, "name" : "user1" } on : shard0000 Timestamp(2, 0)
        { "age" : 1, "name" : "user1" } -->> { "age" : 64, "name" : "user92614" } on : shard0001 Timestamp(2, 2)
        { "age" : 64, "name" : "user92614" } -->> { "age" : 149, "name" : "user899" } on : shard0001 Timestamp(2, 3)
        { "age" : 149, "name" : "user899" } -->> { "age" : { "$maxKey" : 1 }, "name" : { "$maxKey" : 1 } } on : shard0001 Timestamp(1, 4)
mongos> db.testcoll.find({ age: { $gt: 140 } }).limit(3)
{ "_id" : ObjectId("5606b672ef76b4db97a7d713"), "name" : "user100041", "age" : 141, "address" : "xinyang" }
{ "_id" : ObjectId("5606b672ef76b4db97a7d7a9"), "name" : "user100191", "age" : 141, "address" : "xinyang" }
{ "_id" : ObjectId("5606b672ef76b4db97a7d83f"), "name" : "user100341", "age" : 141, "address" : "xinyang" }
mongos> sh.getBalancerState()
true
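For a per-shard breakdown of document counts and data sizes, the shell also provides a helper (output omitted; available in the mongo shell of this era, to the best of my knowledge):
mongos> db.testcoll.getShardDistribution()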
Summary:
This was a simple demonstration of MongoDB's sharding mechanism. The experiment leaves a few problems to solve:
If a shard node fails, its data is lost, so each shard node should itself be a replica set to increase availability (a sketch follows below).
Likewise, the config server needs to be made highly available.
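A hedged sketch of the first fix, assuming a replica set named rs0 whose members include 192.168.1.9 and a hypothetical second host at 192.168.1.13 (the "setname/host,host" form is the standard sh.addShard() syntax for replica-set shards); for the config servers, see the three-server configdb sketch in step 5 above:
mongos> sh.addShard("rs0/192.168.1.9:27017,192.168.1.13:27017")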
Those are the problem points I can think of, though many more remain, and my own knowledge is limited, so I will not go further here. The figure below shows an architecture that achieves highly available sharding with three machines, but in practice I do not recommend running it this way: in any distributed system it is better to add more hosts, and if you have few machines, use a centralized solution instead. After all, relational database technology is very mature, while NoSQL has only caught fire in recent years; whether to adopt it also depends on your company's engineering capability. Personally, I believe a new technology should never be adopted blindly: look at its shortcomings, not just its advantages, and choose software according to your own needs. The most suitable choice, not "the best", is what matters! I ran this experiment on four virtual machines on a computer with 4 GB of RAM and an i5-5200U CPU, and it still nearly ground to a halt, so size your MongoDB experiments to your own hardware; this post is only meant to get you started.
650) this.width=650; "title=" 6.png "src=" Http://s3.51cto.com/wyfs02/M00/75/24/wKiom1YzNcqjHGoaAAG5UMOnqlc205.jpg " alt= "Wkiom1yzncqjhgoaaag5umonqlc205.jpg"/>
This article is from the "Bread" blog, make sure to keep this source http://cuchadanfan.blog.51cto.com/9940284/1708143