MongoDB Basics (9): Sharding

Tags: mongodb, server, database, sharding



Sharding stores data across multiple servers. MongoDB uses sharding to support deployments with very large datasets and high-throughput operations. A single server is limited in many ways: CPU, I/O, RAM, storage space, and so on. To solve the scaling problem, databases offer two approaches: vertical scaling and sharding (horizontal scaling).


Vertical scaling: add more CPU, RAM, and storage resources to a single server. This is limited by the hardware itself, and some cloud providers only offer relatively small instance configurations.


Sharding (horizontal scaling): divide the dataset and distribute the data across multiple servers. Each shard is an independent database, and together the shards make up a single logical database. (Similar to striping in Windows dynamic disks.)


A MongoDB sharded cluster is structured as follows:


A sharded cluster has three components: shards, query routers, and config servers.

 

Shards: store the data and ensure high availability and data consistency. In a production sharded cluster, each shard is a replica set.

 

Query routers: the mongos instances. Client applications talk to the router rather than to the shards directly; the query router processes operations, routes them to the appropriate shards, and returns the results to the client. A sharded cluster can contain multiple query routers to spread the client request load.

 

Config servers: store the cluster metadata, including the mapping of the cluster's data to the shards. Query routers use this metadata to target operations to specific shards. A sharded cluster requires three config servers.


Note: For testing, you can configure a single config server.


MongoDB shards data at the collection level. Sharding partitions a collection's data by the shard key. The shard key is an indexed single field or an indexed compound field that exists in every document of the collection. MongoDB divides the shard key values into chunks and distributes the chunks evenly across the shards, using either range-based or hash-based partitioning. (For more information, see Shard Keys.)
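For example, the partitioning style follows from the shard key pattern passed to sh.shardCollection(). A minimal sketch for illustration only (the mydb.events / mydb.users namespaces and the ts / uid fields are made up, not from this walkthrough):

mongos> sh.shardCollection("mydb.events", { "ts": 1 })        // range partition: adjacent ts values land in the same chunk
mongos> sh.shardCollection("mydb.users", { "uid": "hashed" }) // hash partition: writes spread evenly across chunks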


Sharded cluster deployment:

MongoDB servers (Red Hat Enterprise Linux 6 64-bit + MongoDB 3.0.2):

192.168.1.11  mongodb11.kk.net  27017
192.168.1.12  mongodb12.kk.net  27018
192.168.1.13  mongodb13.kk.net  27019
192.168.1.14  mongodb14.kk.net  27020


The test layout is as follows: the three shards run on 192.168.1.11, 12, and 13; the config server runs on 192.168.1.14 (port 27020); and the mongos router runs on 192.168.1.11 (port 27016).


Note: Before configuring anything, make sure the members you want to join the cluster can connect to each other.



[1. Configure config servers] (on the 192.168.1.14 server)

Config servers store the cluster metadata, so configure them first. A config server is a mongod instance started with the --configsvr parameter (configsvr = true in the configuration file). When there are multiple config servers, each one stores a complete copy of the cluster metadata.

1.1. Create the database directory configdb:

[root@mongodb14 ~]# mkdir /var/lib/mongo/configdb
[root@mongodb14 ~]# chown mongod:mongod /var/lib/mongo/configdb/

1.2. Configure the startup parameter file:

[root@mongodb14 ~]# vi /etc/mongod.conf

# 192.168.1.14
logpath = /var/log/mongodb/mongod.log
pidfilepath = /var/run/mongodb/mongod.pid
logappend = true
fork = true
port = 27020
bind_ip = 192.168.1.14
dbpath = /var/lib/mongo/configdb
configsvr = true


1.3. Restart the mongod service:

[root@mongodb14 ~]# service mongod restart
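To confirm the config server came up in its configsvr role, you can connect to it and check the parsed startup options (a quick sanity check added here, not part of the original steps):

[root@mongodb14 ~]# mongo mongodb14.kk.net:27020
configsvr> db.serverCmdLineOpts().parsed   // should show configsvr and the configdb dbpath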


[2. Configure the router] (on the 192.168.1.11 server)

2.1. Start the mongos instance and connect it to the config servers: (for more information, see mongos)

# Start mongos against the config server and give it a local port; otherwise it
# defaults to 27017, which is already used by the mongod on this server, so run
# mongos on port 27016 instead.
[root@mongodb11 ~]# mongos --configdb mongodb14.kk.net:27020 --port 27016

In a production environment with multiple config servers, mongos can be pointed at all of them at once:

mongos --configdb mongodb14.kk.net:27020,mongodb15.kk.net:27020,mongodb16.kk.net:27020
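Once mongos is running, a quick way to confirm it can reach the cluster metadata is to connect and print the sharding status (the shard list is still empty at this point):

[root@mongodb11 ~]# mongo 192.168.1.11:27016
mongos> sh.status()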


[3. Add shard members to the cluster] (add the shards on hosts 192.168.1.11, 12, and 13; 192.168.1.11 is shown as the example)

3.1. Configure the startup parameter file:

[root@redhat11 ~]# vi /etc/mongod.conf

# 192.168.1.11
logpath = /var/log/mongodb/mongod.log
pidfilepath = /var/run/mongodb/mongod.pid
logappend = true
fork = true
port = 27017
bind_ip = 192.168.1.11
dbpath = /var/lib/mongo
shardsvr = true

# 192.168.1.12
logpath = /var/log/mongodb/mongod.log
pidfilepath = /var/run/mongodb/mongod.pid
logappend = true
fork = true
port = 27018
bind_ip = 192.168.1.12
dbpath = /var/lib/mongo
shardsvr = true

# 192.168.1.13
logpath = /var/log/mongodb/mongod.log
pidfilepath = /var/run/mongodb/mongod.pid
logappend = true
fork = true
port = 27019
bind_ip = 192.168.1.13
dbpath = /var/lib/mongo
shardsvr = true

 


3.2. Restart the mongod service:

[root@mongodb11 ~]# service mongod restart

3.3. Add each shard member through the mongos instance (remove any existing user data from the shards before adding them):

[root@mongodb11 ~]# mongo 192.168.1.11:27016
mongos> sh.addShard("mongodb11.kk.net:27017")
mongos> sh.addShard("mongodb12.kk.net:27018")
mongos> sh.addShard("mongodb13.kk.net:27019")
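As a cross-check (not shown in the original session), the registered shards can also be listed through mongos itself:

mongos> use admin
mongos> db.runCommand({ listShards: 1 })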

3.4. Done! Connect and inspect the cluster information in the config database:

configsvr> show dbs
configsvr> use config
configsvr> show collections
configsvr> db.mongos.find()
{ "_id" : "mongodb11.kk.net:27016", "ping" : ISODate("2015-05-23T11:16:47.624Z"), "up" : 1221, "waiting" : true, "mongoVersion" : "3.0.2" }
configsvr> db.shards.find()
{ "_id" : "shard0000", "host" : "mongodb11.kk.net:27017" }
{ "_id" : "shard0001", "host" : "mongodb12.kk.net:27018" }
{ "_id" : "shard0002", "host" : "mongodb13.kk.net:27019" }
configsvr> db.databases.find()
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "mydb", "partitioned" : false, "primary" : "shard0000" }
{ "_id" : "test", "partitioned" : false, "primary" : "shard0000" }


[4. Enable sharding for a database]

4.1. At this point, you can connect to mongos and view database or collection stats (nothing is sharded yet):

mongos> db.stats()
mongos> db.tab.stats()

4.2. Activate sharding for the database:

[root@mongodb11 ~]# mongo 192.168.1.11:27016
mongos> sh.enableSharding("test")

# Or:
[root@mongodb11 ~]# mongo 192.168.1.11:27016
mongos> use admin
mongos> db.runCommand({ enableSharding: "test" })

4.3. Check the database metadata again; partitioned has changed to true:

configsvr> use config
switched to db config
configsvr> db.databases.find()
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "mydb", "partitioned" : true, "primary" : "shard0000" }
{ "_id" : "test", "partitioned" : true, "primary" : "shard0000" }


Enabling sharding on a database does not distribute any data by itself; the collections still need to be sharded.

[5. Enable sharding for a collection]

There are several issues to consider before enabling it:

1. Select the field to use as the shard key. (For more information, see Considerations for Selecting Shard Keys.)

2. If the collection already contains data, an index on the shard key field must be created before sharding. If the collection is empty, MongoDB creates the index when the collection is sharded (sh.shardCollection).

3. Shard the collection with the sh.shardCollection function:


sh.shardCollection("<database>.<collection>", shard-key-pattern)


mongos> sh.shardCollection("test.tab", { "_id": "hashed" })


Test:

mongos> for (var i = 1; i < 100000; i++) { db.kk.insert({ "id": i, "myName": "kk" + i, "myDate": new Date() }); }
mongos> show collections
mongos> db.kk.find()
mongos> db.kk.createIndex({ "id": "hashed" })
mongos> db.kk.getIndexes()
mongos> sh.shardCollection("test.kk", { "id": "hashed" })
mongos> db.stats()
mongos> db.kk.stats()
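Note that because test.kk already contained data, the hashed index had to be created before sh.shardCollection() could run. When a collection is still empty, a hashed shard key also lets MongoDB pre-split chunks up front. A hedged sketch (the test.demo namespace and the numInitialChunks value are made up for illustration):

mongos> use admin
mongos> db.runCommand({ shardCollection: "test.demo", key: { "id": "hashed" }, numInitialChunks: 6 }) // pre-split an empty collection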


Because partitioning the data takes time, check the data distribution again a little later:

Total number of rows: 99999

mongos> db.kk.count()
99999

mongos> db.printShardingStatus();
--- Sharding Status ---
  sharding version: {
    "_id" : 1,
    "minCompatibleVersion" : 5,
    "currentVersion" : 6,
    "clusterId" : ObjectId("556023c02c2ebfdfbc8d39eb")
  }
  shards:
    {  "_id" : "shard0000",  "host" : "mongodb11.kk.net:27017" }
    {  "_id" : "shard0001",  "host" : "mongodb12.kk.net:27018" }
    {  "_id" : "shard0002",  "host" : "mongodb13.kk.net:27019" }
  balancer:
    Currently enabled:  yes
    Currently running:  no
    Failed balancer rounds in last 5 attempts:  0
    Migration Results for the last 24 hours:
      1334 : Success
      2 : Failed with error 'could not acquire collection lock for test.kk to migrate chunk [{ : MinKey },{ : MaxKey }) :: caused by :: Lock for migrating chunk [{ : MinKey }, { : MaxKey }) in test.kk is taken.', from shard0000 to shard0001
  databases:
    {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
    {  "_id" : "mydb",  "partitioned" : true,  "primary" : "shard0000" }
    {  "_id" : "test",  "partitioned" : true,  "primary" : "shard0000" }
      test.kk
        shard key: { "id" : "hashed" }
        chunks:
          shard0000  667
          shard0001  667
          shard0002  667
        too many chunks to print, use verbose if you want to force print
    {  "_id" : "events",  "partitioned" : false,  "primary" : "shard0002" }


Chunk counts per shard:

shard0000  667
shard0001  667
shard0002  667

Initially all the chunks were on shard0000, while shard0001 and shard0002 had none. As the balancer migrates chunks, the distribution evens out and eventually stabilizes.
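While migrations are in progress, the balancer state can be checked from mongos (a small addition, not part of the original session):

mongos> sh.getBalancerState()   // is the balancer enabled?
mongos> sh.isBalancerRunning()  // is a balancing round in progress right now?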

mongos> db.kk.stats()
{
  "sharded" : true,
  "paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. It remains hard coded to 1.0 for compatibility only.",
  "userFlags" : 1,
  "capped" : false,
  "ns" : "test.kk",
  "count" : 99999,
  "numExtents" : 19,
  "size" : 11199888,
  "storageSize" : 44871680,
  "totalIndexSize" : 10416224,
  "indexSizes" : {
    "_id_" : 4750256,
    "id_hashed" : 5665968
  },
  "avgObjSize" : 112,
  "nindexes" : 2,
  "nchunks" : 2001,
  "shards" : {
    "shard0000" : {
      "ns" : "test.kk",
      "count" : 33500,
      "size" : 3752000,
      "avgObjSize" : 112,
      "numExtents" : 7,
      "storageSize" : 22507520,
      "lastExtentSize" : 11325440,
      "paddingFactor" : 1,
      "paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. It remains hard coded to 1.0 for compatibility only.",
      "userFlags" : 1,
      "capped" : false,
      "nindexes" : 2,
      "totalIndexSize" : 3605616,
      "indexSizes" : {
        "_id_" : 1913184,
        "id_hashed" : 1692432
      },
      "ok" : 1
    },
    "shard0001" : {
      "ns" : "test.kk",
      "count" : 32852,
      "size" : 3679424,
      "avgObjSize" : 112,
      "numExtents" : 6,
      "storageSize" : 11182080,
      "lastExtentSize" : 8388608,
      "paddingFactor" : 1,
      "paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. It remains hard coded to 1.0 for compatibility only.",
      "userFlags" : 1,
      "capped" : false,
      "nindexes" : 2,
      "totalIndexSize" : 3343984,
      "indexSizes" : {
        "_id_" : 1389920,
        "id_hashed" : 1954064
      },
      "ok" : 1
    },
    "shard0002" : {
      "ns" : "test.kk",
      "count" : 33647,
      "size" : 3768464,
      "avgObjSize" : 112,
      "numExtents" : 6,
      "storageSize" : 11182080,
      "lastExtentSize" : 8388608,
      "paddingFactor" : 1,
      "paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. It remains hard coded to 1.0 for compatibility only.",
      "userFlags" : 1,
      "capped" : false,
      "nindexes" : 2,
      "totalIndexSize" : 3466624,
      "indexSizes" : {
        "_id_" : 1447152,
        "id_hashed" : 2019472
      },
      "ok" : 1
    }
  },
  "ok" : 1
}


Per-shard document counts from the output above:

shard0000: "count" : 33500
shard0001: "count" : 32852
shard0002: "count" : 33647


The total of 99999 documents is exactly right, and the data is distributed very evenly across the shards.
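A more compact way to see the same per-shard breakdown (not used in the original, but available in the mongo shell) is getShardDistribution():

mongos> db.kk.getShardDistribution()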

(Use as much test data as possible; otherwise sharding will show no visible effect. At first I tested with a small dataset of fewer than 1000 rows and saw nothing happen. I thought something was wrong and ended up waiting another two hours!)



Reference: Sharding Introduction

(The steps in the official documentation are not laid out very clearly, and this took me quite a while. The blog posts on the Internet are mostly just their authors' own summaries; for a newcomer, the steps are not detailed enough.)



