MongoDB Basics (9): Sharding
Sharding stores data across multiple servers. MongoDB uses sharding to support deployments with very large datasets and high-throughput operations. A single server is limited in several respects, such as CPU, I/O, RAM, and storage space. To solve the scaling problem, databases offer two approaches: vertical scaling and sharding.
Vertical scaling: add more CPU, RAM, and storage resources. This is limited by the hardware, and some cloud providers only offer relatively small instance sizes.
Sharding (horizontal scaling): divide the dataset and distribute the data across multiple servers. Each shard is an independent database, and together the shards make up a single logical database. (Similar to Windows dynamic disk striping.)
The sharded cluster architecture in MongoDB is as follows:
A sharded cluster has three components: shards, query routers, and config servers.
Shards: store the data and provide high availability and data consistency. In a production sharded cluster, each shard is a replica set.
Query routers: the mongos instances. Client applications talk to the query router rather than to the shards directly; the router directs operations to the appropriate shards and returns the results to the client. A sharded cluster can contain multiple query routers to spread the client request load.
Config servers: store the cluster metadata, including the mapping of the cluster's data to the shards. Query routers use this metadata to target operations to specific shards. A production sharded cluster requires three config servers.
Note: for testing, you can configure a single config server.
MongoDB distributes data (shards) at the collection level. Sharding partitions a collection's data by the shard key. The shard key is an indexed field or indexed compound field that exists in every document of the collection. MongoDB divides the shard key values into chunks and distributes the chunks evenly across the shards. To divide the shard key values, MongoDB uses either range-based or hash-based partitioning. (For more information, see shard keys.)
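As a quick illustration of the two partitioning styles, here is how each would be declared with sh.shardCollection; the namespace mydb.users and the field user_id are made-up examples for this sketch, not part of the deployment below:

mongos> sh.shardCollection("mydb.users", { "user_id": 1 })          // range-based: adjacent key values stay in the same chunk
mongos> sh.shardCollection("mydb.users", { "user_id": "hashed" })   // hash-based: adjacent key values are spread across chunks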
Sharded cluster deployment:
MongoDB servers: (Red Hat Enterprise Linux 6 64-bit + MongoDB 3.0.2)
192.168.1.11 mongodb11.kk.net 27017
192.168.1.12 mongodb12.kk.net 27018
192.168.1.13 mongodb13.kk.net 27019
192.168.1.14 mongodb14.kk.net 27020
The test structure is as follows:
Note: before configuring, make sure that the members you want to join the cluster can connect to each other.
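One simple way to guarantee that the members can resolve each other (an assumed setup detail, mirroring the host list above, not a step from the original write-up) is to put identical entries in /etc/hosts on every node:

192.168.1.11 mongodb11.kk.net
192.168.1.12 mongodb12.kk.net
192.168.1.13 mongodb13.kk.net
192.168.1.14 mongodb14.kk.net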
[1. Configure the config server] (on the 192.168.1.14 server)
Config servers store the cluster metadata, so configure them first. A config server is a mongod started with the --configsvr parameter. If there are multiple config servers, each of them stores a complete copy of the cluster metadata.
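For reference, a minimal sketch of starting the same config server straight from the command line, equivalent to the mongod.conf settings in step 1.2 below (the paths are assumed to match this deployment):

mongod --configsvr --port 27020 --bind_ip 192.168.1.14 --dbpath /var/lib/mongo/configdb --logpath /var/log/mongodb/mongod.log --fork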
1.1 Create the database directory configdb:
[root@mongodb14 ~]# mkdir /var/lib/mongo/configdb
[root@mongodb14 ~]# chown mongod:mongod /var/lib/mongo/configdb/
1.2. Configure the startup parameter file:
[root@mongodb14 ~]# vi /etc/mongod.conf
192.168.1.14:
logpath=/var/log/mongodb/mongod.log
pidfilepath=/var/run/mongodb/mongod.pid
logappend=true
fork=true
port=27020
bind_ip=192.168.1.14
dbpath=/var/lib/mongo/configdb
configsvr=true
1.3. Restart the mongod service:
[root@mongodb14 ~]# service mongod restart
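As an optional sanity check (not one of the original steps), ping the config server once it is back up:

[root@mongodb14 ~]# mongo 192.168.1.14:27020 --eval "db.adminCommand('ping')"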
[2. Configure the router] (on the 192.168.1.11 server)
2.1 Start the mongos instance and connect it to the config server (for more information, see: mongos):
# Use mongos to connect to the config server, and specify a local port (otherwise the default 27017 is used).
# Port 27017 on this server is already taken by mongod, so set the mongos port to 27016.
[root@mongodb11 ~]# mongos --configdb mongodb14.kk.net:27020 --port 27016
In a production environment, if there are multiple config servers, mongos specifies all of them at the same time:
mongos --configdb mongodb14.kk.net:27020,mongodb15.kk.net:27020,mongodb16.kk.net:27020 ...
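To keep mongos running as a daemon, a hedged sketch using --fork is shown below; the log path is an assumption chosen to mirror the mongod settings above:

mongos --configdb mongodb14.kk.net:27020 --port 27016 --fork --logpath /var/log/mongodb/mongos.log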
[3. Add shard members to the cluster] (add shard members on the servers ending in .11, .12, and .13; 192.168.1.11 is shown as the example)
3.1. Configure the startup parameter file:
[root@redhat11 ~]# vi /etc/mongod.conf
192.168.1.11:
logpath=/var/log/mongodb/mongod.log
pidfilepath=/var/run/mongodb/mongod.pid
logappend=true
fork=true
port=27017
bind_ip=192.168.1.11
dbpath=/var/lib/mongo
shardsvr=true

192.168.1.12:
logpath=/var/log/mongodb/mongod.log
pidfilepath=/var/run/mongodb/mongod.pid
logappend=true
fork=true
port=27018
bind_ip=192.168.1.12
dbpath=/var/lib/mongo
shardsvr=true

192.168.1.13:
logpath=/var/log/mongodb/mongod.log
pidfilepath=/var/run/mongodb/mongod.pid
logappend=true
fork=true
port=27019
bind_ip=192.168.1.13
dbpath=/var/lib/mongo
shardsvr=true
3.2 Restart the mongod service:
[root@mongodb11 ~]# service mongod restart
3.3 Add each shard member through the mongos instance (remove or delete any existing user data before adding):
[root@mongodb11 ~]# mongo 192.168.1.11:27016
mongos> sh.addShard("mongodb11.kk.net:27017")
mongos> sh.addShard("mongodb12.kk.net:27018")
mongos> sh.addShard("mongodb13.kk.net:27019")
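To confirm the shards were registered (an optional check, not in the original steps), either of these should list all three shards:

mongos> sh.status()
mongos> db.adminCommand({ listShards: 1 })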
3.4 Done! Connect to the config server (where the prompt is configsvr>) to view the cluster information:
configsvr> show dbs
configsvr> use config
configsvr> show collections
configsvr> db.mongos.find()
{ "_id" : "mongodb11.kk.net:27016", "ping" : ISODate("2015-05-23T11:16:47.624Z"), "up" : 1221, "waiting" : true, "mongoVersion" : "3.0.2" }
configsvr> db.shards.find()
{ "_id" : "shard0000", "host" : "mongodb11.kk.net:27017" }
{ "_id" : "shard0001", "host" : "mongodb12.kk.net:27018" }
{ "_id" : "shard0002", "host" : "mongodb13.kk.net:27019" }
configsvr> db.databases.find()
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "mydb", "partitioned" : false, "primary" : "shard0000" }
{ "_id" : "test", "partitioned" : false, "primary" : "shard0000" }
[4. Enable sharding for a database]
4.1 At this point, connect to mongos and check the database and collection stats (nothing is sharded yet):
mongos> db.stats()
mongos> db.tab.stats()
4.2 Activate the sharding function for the database:
[root@mongodb11 ~]# mongo 192.168.1.11:27016
mongos> sh.enableSharding("test")
# or:
[root@mongodb11 ~]# mongo 192.168.1.11:27016
mongos> use admin
mongos> db.runCommand({ enableSharding: "test" })
4.3 Check the database sharding state; partitioned has changed to true:
configsvr> use config
switched to db config
configsvr> db.databases.find()
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "mydb", "partitioned" : true, "primary" : "shard0000" }
{ "_id" : "test", "partitioned" : true, "primary" : "shard0000" }
Enabling sharding on a database does not move any data by itself; the collections still have to be sharded.
[5. Enable sharding for a collection]
There are several things to consider before enabling it:
1. Choose the field to use as the shard key. (For more information, see: Considerations for Selecting Shard Keys.)
2. If the collection already contains data, an index must be created on the shard key field before sharding. If the collection is empty, MongoDB creates the index when collection sharding (sh.shardCollection) is activated.
3. Shard the collection with the sh.shardCollection function:
sh.shardCollection("<database>.<collection>", shard-key-pattern)
mongos> sh.shardCollection("test.tab", { "_id": "hashed" })
Test:
mongos> for (var i=1; i<100000; i++) { db.kk.insert({"id": i, "myName" : "kk"+i, "myDate" : new Date()}); }
mongos> show collections
mongos> db.kk.find()
mongos> db.kk.createIndex({ "id": "hashed" })
mongos> db.kk.getIndexes()
mongos> sh.shardCollection("test.kk", { "id": "hashed" })
mongos> db.stats()
mongos> db.kk.stats()
Because chunk migration takes time, check the data distribution again a little later:
Total number of rows: 99999
mongos> db.kk.count()
99999
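A convenient alternative view (an extra check; output omitted here) is the shell helper that prints per-shard document and chunk counts:

mongos> db.kk.getShardDistribution()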
mongos> db.printShardingStatus();
--- Sharding Status ---
  sharding version: {
    "_id" : 1,
    "minCompatibleVersion" : 5,
    "currentVersion" : 6,
    "clusterId" : ObjectId("556023c02c2ebfdfbc8d39eb")
  }
  shards:
    { "_id" : "shard0000", "host" : "mongodb11.kk.net:27017" }
    { "_id" : "shard0001", "host" : "mongodb12.kk.net:27018" }
    { "_id" : "shard0002", "host" : "mongodb13.kk.net:27019" }
  balancer:
    Currently enabled: yes
    Currently running: no
    Failed balancer rounds in last 5 attempts: 0
    Migration Results for the last 24 hours:
      1334 : Success
      2 : Failed with error 'could not acquire collection lock for test.kk to migrate chunk [{ : MinKey },{ : MaxKey }) :: caused by :: Lock for migrating chunk [{ : MinKey }, { : MaxKey }) in test.kk is taken.', from shard0000 to shard0001
  databases:
    { "_id" : "admin", "partitioned" : false, "primary" : "config" }
    { "_id" : "mydb", "partitioned" : true, "primary" : "shard0000" }
    { "_id" : "test", "partitioned" : true, "primary" : "shard0000" }
      test.kk
        shard key: { "id" : "hashed" }
        chunks:
          shard0000  667
          shard0001  667
          shard0002  667
        too many chunks to print, use verbose if you want to force print
    { "_id" : "events", "partitioned" : false, "primary" : "shard0002" }
mongos>
The chunk counts here are:
shard0000  667
shard0001  667
shard0002  667
Initially shard0000 held everything while shard0001 and shard0002 had 0 chunks; the balancer migrated chunks until the distribution stabilized, after which it no longer changes.
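While this is happening you can watch the balancer with these optional shell helpers (not part of the original steps):

mongos> sh.getBalancerState()     // true if the balancer is enabled
mongos> sh.isBalancerRunning()    // true while a migration round is in progress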
mongos> db.kk.stats()
{
  "sharded" : true,
  "paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. It remains hard coded to 1.0 for compatibility only.",
  "userFlags" : 1,
  "capped" : false,
  "ns" : "test.kk",
  "count" : 99999,
  "numExtents" : 19,
  "size" : 11199888,
  "storageSize" : 44871680,
  "totalIndexSize" : 10416224,
  "indexSizes" : { "_id_" : 4750256, "id_hashed" : 5665968 },
  "avgObjSize" : 112,
  "nindexes" : 2,
  "nchunks" : 2001,
  "shards" : {
    "shard0000" : {
      "ns" : "test.kk",
      "count" : 33500,
      "size" : 3752000,
      "avgObjSize" : 112,
      "numExtents" : 7,
      "storageSize" : 22507520,
      "lastExtentSize" : 11325440,
      "paddingFactor" : 1,
      "paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. It remains hard coded to 1.0 for compatibility only.",
      "userFlags" : 1,
      "capped" : false,
      "nindexes" : 2,
      "totalIndexSize" : 3605616,
      "indexSizes" : { "_id_" : 1913184, "id_hashed" : 1692432 },
      "ok" : 1
    },
    "shard0001" : {
      "ns" : "test.kk",
      "count" : 32852,
      "size" : 3679424,
      "avgObjSize" : 112,
      "numExtents" : 6,
      "storageSize" : 11182080,
      "lastExtentSize" : 8388608,
      "paddingFactor" : 1,
      "paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. It remains hard coded to 1.0 for compatibility only.",
      "userFlags" : 1,
      "capped" : false,
      "nindexes" : 2,
      "totalIndexSize" : 3343984,
      "indexSizes" : { "_id_" : 1389920, "id_hashed" : 1954064 },
      "ok" : 1
    },
    "shard0002" : {
      "ns" : "test.kk",
      "count" : 33647,
      "size" : 3768464,
      "avgObjSize" : 112,
      "numExtents" : 6,
      "storageSize" : 11182080,
      "lastExtentSize" : 8388608,
      "paddingFactor" : 1,
      "paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. It remains hard coded to 1.0 for compatibility only.",
      "userFlags" : 1,
      "capped" : false,
      "nindexes" : 2,
      "totalIndexSize" : 3466624,
      "indexSizes" : { "_id_" : 1447152, "id_hashed" : 2019472 },
      "ok" : 1
    }
  },
  "ok" : 1
}
mongos>
Data distribution of the individual shards from the output above:
"Shard0000" "count": 33500
"Shard0001" "count": 32852
"Shard0002" count ": 33647
That is 99999 rows in total, exactly right, and the data distribution is very even.
(Use a reasonably large amount of test data, otherwise the effect is hard to see. At first I tested with a small dataset of fewer than 1000 rows, and nothing seemed to happen; I thought something was wrong and ended up waiting for another two hours!~)
Reference: Sharding Introduction
(The steps in the official documentation are not laid out very clearly, and this took quite a while. There are some blog posts online too, but they are mostly just the bloggers' own summaries; for a newcomer, the operations are not described in enough detail.)