MongoDB Sharding: Chunks and Shard Key Analysis

Source: Internet
Author: User
Tags: mongodb, sharding, mongo shell
--------------------------------------------------------------------------------

0. Start three shard servers

Port plan: shard server 1: 38010, shard server 2: 38011, shard server 3: 38012, config server: 40000, route process (mongos): 50000.

First create the data directories (one per shard, so the data and logs are easy to read) and a log directory:

[root@localhost ~]# mkdir -p /data/shard/s0
[root@localhost ~]# mkdir -p /data/shard/s1
[root@localhost ~]# mkdir -p /data/shard/s2
[root@localhost ~]# mkdir -p /data/shard/log

Start the three mongod shard servers (--directoryperdb puts each database's files in a separate folder):

/apps/mongo/bin/mongod --dbpath=/data/shard/s0 --shardsvr --port 38010 --directoryperdb --rest
/apps/mongo/bin/mongod --dbpath=/data/shard/s1 --shardsvr --port 38011 --directoryperdb --rest
/apps/mongo/bin/mongod --dbpath=/data/shard/s2 --shardsvr --port 38012 --directoryperdb --rest

To run them in the background instead, use:

/apps/mongo/bin/mongod --dbpath=/data/shard/s0 --shardsvr --port 38010 --directoryperdb --logpath=/data/shard/log/s0.log --logappend --fork --rest
/apps/mongo/bin/mongod --dbpath=/data/shard/s1 --shardsvr --port 38011 --directoryperdb --logpath=/data/shard/log/s1.log --logappend --fork --rest
/apps/mongo/bin/mongod --dbpath=/data/shard/s2 --shardsvr --port 38012 --directoryperdb --logpath=/data/shard/log/s2.log --logappend --fork --rest

--------------------------------------------------------------------------------
1. Start the config server (port 40000)

Create its data directory, then start it:

mkdir -p /data/shard/config
/apps/mongo/bin/mongod --dbpath=/data/shard/config --configsvr --port 40000 --directoryperdb --rest

In practice it should run in the background:

/apps/mongo/bin/mongod --dbpath=/data/shard/config --configsvr --port 40000 --logpath=/data/shard/log/config.log --fork --directoryperdb --rest

--------------------------------------------------------------------------------
2. Start the route process (mongos, port 50000)

/apps/mongo/bin/mongos --port 50000 --configdb 127.0.0.1:40000 --chunksize 1

--chunksize 1 (MB) sets the chunk size; 1 MB is used here so the results of chunk splitting are easy to observe.

To run mongos in the background:

/apps/mongo/bin/mongos --port 50000 --configdb 127.0.0.1:40000 --chunksize 50 --logpath=/data/shard/log/route.log --fork

--------------------------------------------------------------------------------
At this point no shards have been added to the sharding cluster yet. You can see that the admin database of the sharding cluster lives on the config server and is not sharded.

--------------------------------------------------------------------------------
3. Connect to mongos and configure sharding

Log on to the route process with the mongo shell and switch to the admin database (remember this step):

/apps/mongo/bin/mongo --port 50000
use admin

Add the shard nodes (each shard could also be a replica set):

db.runCommand({addshard: "127.0.0.1:38010", allowLocal: true})
db.runCommand({addshard: "127.0.0.1:38011", allowLocal: true})
db.runCommand({addshard: "127.0.0.1:38012", allowLocal: true})
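To confirm the shards were registered, you can list them from the admin database on mongos. This check is my addition, not part of the original walkthrough; listshards and db.printShardingStatus() are standard mongo shell features, though the exact output format depends on the MongoDB version:

use admin
db.runCommand({listshards: 1})    // should return shard0000, shard0001 and shard0002
db.printShardingStatus()          // summary of shards, sharded databases and chunk ranges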

allowLocal: true permits shards running on localhost; this is only for development and must not be used in production.

--------------------------------------------------------------------------------
The shard configuration is recorded in the shards collection of the config database on the config server.

--------------------------------------------------------------------------------
4. Configure the database mydb to enable sharding
use admin
db.runCommand({enablesharding: "mydb"})

After sharding is enabled for mydb, the configuration is stored in the databases collection of the config database on the config server.

--------------------------------------------------------------------------------
5. Configure the collection to be sharded: enable sharding on the users collection with _id as the shard key
db.runCommand({shardcollection: "mydb.users", key: {_id: 1}})

After sharding is enabled for mydb.users, the configuration is stored in the collections collection of the config database on the config server.
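As a quick way to look at this metadata yourself (not in the original post), you can query the config database through mongos. config.databases, config.collections and config.chunks are the standard config collections on MongoDB versions of this era, though their exact fields vary between releases:

use config
db.databases.find()                   // databases, whether they are partitioned, and their primary shard
db.collections.find()                 // sharded collections and their shard keys
db.chunks.find({ns: "mydb.users"})    // chunk ranges for mydb.users and the shard holding each chunk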

--------------------------------------------------------------------------------
You can see that the routing server stores no configuration data (it does not even have a dbpath); it only caches the configuration held by the config server.

--------------------------------------------------------------------------------
6. Sharding cluster data insertion test (incremental shard key)

use admin
db.runCommand({shardcollection: "mydb.users", key: {_id: 1}})

Switch to mydb and insert 600,000 documents:

use mydb
for (var i = 1; i <= 600000; i++) db.users.insert({age: i, name: "Mary", addr: "Guangzhou", country: "China"})

Wait a few minutes and the data in the users collection is distributed across the shards; the sharding works. First look at the chunks: this test uses a 1 MB chunk size, and each chunk records the range of shard-key values it covers plus the name of the shard that holds it. The original post shows screenshots taken during the insert:

- after about 210,000 documents, inserts go mainly to shard0002;
- after about 280,000 documents, inserts go mainly to shard0001;
- after about 400,000 documents, inserts go to shard0002 again;
- after about 610,000 documents (?), inserts go to shard0000.

All 600,000 documents are inserted successfully, but within the first few minutes the data is not evenly distributed across the shards. A few hours later the number of chunks on each shard server is the same, which shows that the mongos router balances the shards in the background. Because the shard key is an ObjectId, which behaves like a monotonically increasing key, inserts cannot be routed evenly to all shards: the write load is concentrated on one shard at a time, so the 600,000 documents are not evenly distributed right after being written. The mongos router therefore rebalances the shards in the background until each shard holds roughly the same number of chunks.

--------------------------------------------------------------------------------
B: Insert another 600,000 documents with an incremental key into the cluster, which now already holds data in 42 chunks.

The screenshots cover the period from about three minutes in until the 600,000 inserts complete, and again after ten minutes of settling. Roughly 370,000 of the inserts went to shard0001, while about 180,000 and 50,000 went to shard0002 and shard0000 respectively: the insert load is uneven. Because this is an incremental key, writes are always concentrated on a single chunk, and therefore on a single shard.
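A simple way to follow the balancing described above (my addition, not from the original post) is to count the chunks of mydb.users per shard in the config database. The shard names shard0000 to shard0002 are the ones this walkthrough produces:

use config
db.chunks.count({ns: "mydb.users", shard: "shard0000"})
db.chunks.count({ns: "mydb.users", shard: "shard0001"})
db.chunks.count({ns: "mydb.users", shard: "shard0002"})
// Once the balancer has caught up, the three counts should be roughly equal.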
--------------------------------------------------------------------------------
7. Sharding cluster data insertion test (random shard key)

use admin
db.runCommand({shardcollection: "mydb.les", key: {ram: 1}})

Switch to mydb and insert 600,000 documents:

use mydb
for (var i = 1; i <= 600000; i++) db.les.insert({name: "irelandken_Zhen", age: i, addr: "Guangzhou", country: "China", ram: Math.random()})

Here the shard key is a random decimal in [0, 1). Three minutes later the distribution is very uneven; fifteen minutes later it is even. Why does this still happen with a random shard key? At the start the config server had no chunks for this collection: the first chunk, covering the whole key range, sits on the primary shard, and chunks are only split as data is inserted. At that stage the newly split chunks still live mostly on the primary shard shard0000, so the inserts are concentrated there.

There are now 83 chunks. What happens if we insert another 600,000 documents?

for (var i = 1; i <= 600000; i++) db.les.insert({name: "Zhen", age: i, addr: "Zhuhai", country: "China", ram: Math.random()})

After these 600,000 documents: as you might guess, the random shard key finally shows its effect. The chunks on the three shards stay even, because the random key values of the new documents hit the existing chunks on all three shards evenly. The mongos router's policy is simple: if the shards are unbalanced, it moves chunks between them until each shard holds roughly the same number of chunks.
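To watch the chunk counts for mydb.les evolve between the two inserts, a small shell loop over config.shards works as a convenience check (my sketch, not part of the original test):

use config
db.shards.find().forEach(function (s) {
    print(s._id + ": " + db.chunks.count({ns: "mydb.les", shard: s._id}) + " chunks");
});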
Now clear the collection and repeat the 600,000 random-key insert:

use mydb
db.les.remove()

Emptying the collection (as long as it is not dropped) does not delete the existing 153 chunks.

Insert 600,000 random-key documents into the now-empty collection, which still has its 153 chunks:

for (var i = 1; i <= 600000; i++) db.les.insert({name: "Jack", age: i, addr: "Beijing", country: "China", ram: Math.random()})
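To verify that remove() keeps the chunk metadata, you can compare the total chunk count for mydb.les before and after this insert (a quick check of my own, assuming the 153 chunks reported above):

use config
db.chunks.count({ns: "mydb.les"})    // about 153 before the re-insert; run again afterwards to compare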

This time the entire insert process is quite even. The chunk count increases by only two, meaning almost all of the 600,000 documents "hit" existing chunks.

--------------------------------------------------------------------------------
8. Summary of tests 6 and 7
1: If the config server has no chunk information for a new collection yet, the insert load is uneven regardless of whether the shard key is incremental or random; it may even be concentrated on a single machine, and mongos only balances the data afterwards.
2: The mongos router balances the shards in the background until each shard holds the same number of chunks.
3: Once the cluster is already balanced (each shard holds an equal number of chunks), writes with a random shard key spread very evenly across the shards, while writes with an incremental shard key still cause serious load imbalance.
--------------------------------------------------------------------------------
9. Remove a shard server and reclaim its data

db.runCommand({"removeshard": "127.0.0.1:38010"})

Because 127.0.0.1:38010 is the primary shard of the databases test and mydb:

/* 1 */ {"_id": "test", "partitioned": true, "primary": "shard0000"}
/* 2 */ {"_id": "mydb", "partitioned": true, "primary": "shard0000"}

you must first move their primary shard by hand. Move the primary shard of test to 127.0.0.1:38011:

mongos> db.runCommand({"moveprimary": "test", "to": "127.0.0.1:38011"})
{"primary": "shard0001:127.0.0.1:38011", "ok": 1}

Move the primary shard of mydb to 127.0.0.1:38011 as well:

mongos> db.runCommand({"moveprimary": "mydb", "to": "127.0.0.1:38011"})
{"primary": "shard0001:127.0.0.1:38011", "ok": 1}

Once nothing depends on the shard being removed, run the removeshard command again:

mongos> db.runCommand({"removeshard": "127.0.0.1:38010"})
{"msg": "removeshard completed successfully", "state": "completed", "shard": "shard0000", "ok": 1}

mongos drains the shard's data, spreading it across the remaining shards, and then removes the shard from the sharding cluster. Calling the command repeatedly while the drain is in progress reports its progress (see the polling sketch after section 10 below). The whole operation is transparent to users and requires no downtime.

--------------------------------------------------------------------------------
10. Add a shard server

Connect to mongos again. Because every shard already holds pieces of the test and mydb collections, the newly added mongod must not contain those same databases (if the node to be added contains, say, a mydb database with data, that data must be deleted first). A shard server added to mongos must not contain databases that already exist on the other shards.

use admin
db.runCommand({addshard: "127.0.0.1:38010", allowLocal: true})

After the new shard is added, mongos rebalances again, spreading the data evenly across all shards. Judging from the logs of each node, only the shard servers and the route process are busy; the config server merely seems to synchronize the route process's configuration. Since the route process persists no data and the config server stores the configuration, the route process acts like a facade: to the outside it looks like a single database server.
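Going back to the removeshard drain in section 9: since removeshard is asynchronous, one way to follow its progress is to poll the command until it reports completion. This is only a sketch; the exact fields of the intermediate "draining" response (for example a remaining chunk count) differ between MongoDB versions, and the drain cannot finish while the shard is still the primary shard of some database, which is why moveprimary is run first:

use admin
var res = db.runCommand({removeshard: "127.0.0.1:38010"});
while (res.ok && res.state != "completed") {
    printjson(res);          // intermediate responses describe the draining progress
    sleep(5000);             // mongo shell helper: wait 5 seconds between polls
    res = db.runCommand({removeshard: "127.0.0.1:38010"});
}
printjson(res);              // {"msg": "removeshard completed successfully", "state": "completed", ...}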
