What is sharding
As a database application's data volume and throughput grow, the pressure on a single machine increases: a heavy query load can exhaust a single CPU, and a large data set strains single-machine storage, eventually exhausting the system's memory and pushing the load onto disk I/O.
MongoDB sharding is a way of storing data across multiple servers to support huge data sets and high-throughput operations. Sharding meets the demands of large MongoDB data volumes: when one MongoDB server is not enough to store massive data or provide acceptable read and write throughput, the data can be divided across multiple servers, so that the database system can store and process more data.
Advantages of MongoDB sharding
Sharding provides a way to cope with high throughput and large data volumes:
- Sharding reduces the number of requests each shard needs to process, so the cluster can increase its storage capacity by scaling horizontally. For example, when inserting a document, the application only needs to access the shard that stores that document.
- Sharding reduces the amount of data stored on each shard.
The advantage of sharding is a near-linear growth architecture: it improves data availability and improves the query performance of large database servers. Sharding is used when a single MongoDB server's storage becomes a bottleneck, when a single server's performance becomes a bottleneck, or when a large application needs more memory than one machine can provide.

The composition of a MongoDB shard cluster
- Shard: a shard server stores the actual chunks of data. In a real production environment, a shard role is made up of several servers forming a replica set, to prevent a single point of failure.
- Config server: stores the configuration information for the entire shard cluster, including chunk metadata.
- Routers (mongos): the front-end routing layer. Clients connect here, which makes the entire cluster look like a single database, so front-end applications can use it transparently.
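Because mongos hides the routing, an application session looks exactly like a session against a standalone database. A minimal sketch of this transparency, assuming the python database and user collection that are created later in this walkthrough, and a mongos listening on its default port:

```
// Connected to any mongos instance (default port 27017), so the prompt is "mongos>".
// Reads and writes are routed to the correct shard without the client knowing:
mongos> use python
mongos> db.user.find({ "id": 100 })                            // routed to whichever shard owns id 100
mongos> db.user.insert({ "id": 50001, "name": "jack50001" })   // also routed by the shard key
```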
Environment preparation
| 172.16.10.26 | 172.16.10.27 | 172.16.10.29 |
| --- | --- | --- |
| mongos (27017) | mongos (27017) | mongos (27017) |
| config (30000) | config (30000) | config (30000) |
| shard1 primary node (40001) | shard1 secondary node (40001) | shard1 arbiter node (40001) |
| shard2 arbiter node (40002) | shard2 primary node (40002) | shard2 secondary node (40002) |
| shard3 secondary node (40003) | shard3 arbiter node (40003) | shard3 primary node (40003) |
Deploying a MongoDB Shard cluster
The deployment plan: use three servers, install the MongoDB database on each, and create five instances per server (mongos, config, shard1, shard2, shard3). The three instances of the same name on the different servers form a replica set, consisting of a primary node, a secondary node, and an arbiter node. mongos does not need a replica set at all; config does form a replica set, but without designating primary or arbiter members. The operations on the three servers differ slightly, but most steps are repetitive and exactly the same.
Install supporting software and MongoDB
yum install openssl-devel -y
tar zxf mongodb-linux-x86_64-rhel70-4.0.0.tgz -C /usr/local
mv /usr/local/mongodb-linux-x86_64-rhel70-4.0.0 /usr/local/mongodb    # unpacking completes the installation
Create a data store directory and a log store directory
The routing server does not store data, so it does not need a data directory; only the config, shard1, shard2, and shard3 directories are needed. Permissions must be granted on the log files after they are created.
mkdir -p /data/mongodb/logs/
mkdir /etc/mongodb/
mkdir /data/mongodb/config/
mkdir /data/mongodb/shard{1,2,3}
touch /data/mongodb/logs/shard{1,2,3}.log
touch /data/mongodb/logs/mongos.log
touch /data/mongodb/logs/config.log
chmod 777 /data/mongodb/logs/*.log
Create an administrative user, modify directory permissions
useradd -M -u 8000 -s /sbin/nologin mongo
chown -R mongo.mongo /usr/local/mongodb
chown -R mongo.mongo /data/mongodb
Setting environment variables
echo "PATH=/usr/local/mongodb/bin:$PATH" >> /etc/profilesource /etc/profile
System memory Optimization
ulimit -n 25000
ulimit -u 25000
sysctl -w vm.zone_reclaim_mode=0
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag    # note: these optimizations are temporary and are lost after a reboot
Deploy the config server: create the configuration file
# vim /etc/mongodb/config.conf
pidfilepath = /data/mongodb/logs/config.pid    # pid file location
dbpath = /data/mongodb/config/                 # data file location
logpath = /data/mongodb/logs/config.log        # log file location
logappend = true
bind_ip = 0.0.0.0                              # listening address
port = 30000                                   # port number
fork = true
replSet = configs                              # replica set name
configsvr = true
maxConns = 20000                               # maximum number of connections
Send the configuration file to the other servers
scp /etc/mongodb/config.conf root@172.16.10.27:/etc/mongodb/
scp /etc/mongodb/config.conf root@172.16.10.29:/etc/mongodb/
Start the config instances
mongod -f /etc/mongodb/config.conf    # run the same command on all three servers
Configure the replica set (this can be done on any of the servers)
mongo --port 30000    # recommended on all three servers, to make role changes easy to observe
config={_id:"configs",members:[{_id:0,host:"172.16.10.26:30000"},{_id:1,host:"172.16.10.27:30000"},{_id:2,host:"172.16.10.29:30000"}]}    // define the replica set
rs.initiate(config)    // initialize the replica set
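To confirm that the config replica set initialized correctly, the member states can be inspected from the same shell. A minimal sketch using the standard rs.status() helper (the expected output described in the comments is abridged):

```
// Inside the mongo shell connected to port 30000:
rs.status()
// Expect "set" : "configs" and three members whose "stateStr" values
// settle to one PRIMARY and two SECONDARYs once the election completes.
```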
Deploy the shard1 shard server: create the configuration file
# vim /etc/mongodb/shard1.conf
pidfilepath = /data/mongodb/logs/shard1.pid
dbpath = /data/mongodb/shard1/
logpath = /data/mongodb/logs/shard1.log
logappend = true
journal = true
quiet = true
bind_ip = 0.0.0.0
port = 40001
fork = true
replSet = shard1
shardsvr = true
maxConns = 20000
Send the configuration file to the other servers
scp /etc/mongodb/shard1.conf root@172.16.10.27:/etc/mongodb/
scp /etc/mongodb/shard1.conf root@172.16.10.29:/etc/mongodb/
Launch Shard1 Instance
mongod -f /etc/mongodb/shard1.conf    # run the same command on all three servers
Configure the shard1 replica set
When creating the shard replica sets, note that they cannot be initiated from just any server: initiating the replica set from the server that is designated as the arbiter node will fail. Taking the shard1 shard server as an example, the replica set can be created on the 172.16.10.26 or 172.16.10.27 servers, but creation on 172.16.10.29 will fail, because 172.16.10.29 is designated as shard1's arbiter node in the replica set configuration.
mongo --port 40001    # recommended on all three servers, to make role changes easy to observe
use admin
config={_id:"shard1",members:[{_id:0,host:"172.16.10.26:40001",priority:2},{_id:1,host:"172.16.10.27:40001",priority:1},{_id:2,host:"172.16.10.29:40001",arbiterOnly:true}]}
rs.initiate(config)
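To check which member won the election and which is the arbiter, the standard rs.isMaster() helper can be run from the same shell; a minimal sketch (output described in the comments is abridged):

```
// Inside the mongo shell connected to port 40001:
rs.isMaster()
// "primary" should report 172.16.10.26:40001 (it has the highest priority, 2),
// "hosts" lists the data-bearing members, and "arbiters" lists 172.16.10.29:40001.
```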
Deploy the shard2 shard server: create the configuration file
# vim /etc/mongodb/shard2.conf
pidfilepath = /data/mongodb/logs/shard2.pid
dbpath = /data/mongodb/shard2/
logpath = /data/mongodb/logs/shard2.log
logappend = true
journal = true
quiet = true
bind_ip = 0.0.0.0
port = 40002
fork = true
replSet = shard2
shardsvr = true
maxConns = 20000
Send the configuration file to the other servers
scp /etc/mongodb/shard2.conf root@172.16.10.27:/etc/mongodb/
scp /etc/mongodb/shard2.conf root@172.16.10.29:/etc/mongodb/
Launch Shard2 Instance
mongod -f /etc/mongodb/shard2.conf    # run the same command on all three servers
Configure the shard2 replica set (on a non-arbiter server)
mongo --port 40002    # recommended on all three servers, to make role changes easy to observe
use admin
config={_id:"shard2",members:[{_id:0,host:"172.16.10.26:40002",arbiterOnly:true},{_id:1,host:"172.16.10.27:40002",priority:2},{_id:2,host:"172.16.10.29:40002",priority:1}]}
rs.initiate(config)
Deploy the shard3 shard server: create the configuration file
# vim /etc/mongodb/shard3.conf
pidfilepath = /data/mongodb/logs/shard3.pid
dbpath = /data/mongodb/shard3/
logpath = /data/mongodb/logs/shard3.log
logappend = true
journal = true
quiet = true
bind_ip = 0.0.0.0
port = 40003
fork = true
replSet = shard3
shardsvr = true
maxConns = 20000
Send the configuration file to the other servers
scp /etc/mongodb/shard3.conf root@172.16.10.27:/etc/mongodb/
scp /etc/mongodb/shard3.conf root@172.16.10.29:/etc/mongodb/
Launch Shard3 Instance
mongod -f /etc/mongodb/shard3.conf    # run the same command on all three servers
Configure the shard3 replica set (on a non-arbiter server)
mongo --port 40003    # recommended on all three servers, to make role changes easy to observe
use admin
config={_id:"shard3",members:[{_id:0,host:"172.16.10.26:40003",priority:1},{_id:1,host:"172.16.10.27:40003",arbiterOnly:true},{_id:2,host:"172.16.10.29:40003",priority:2}]}
rs.initiate(config)
Deploy the routing server: create the configuration file
# vim /etc/mongodb/mongos.conf
pidfilepath = /data/mongodb/logs/mongos.pid
logpath = /data/mongodb/logs/mongos.log
logappend = true
bind_ip = 0.0.0.0
port = 27017
fork = true
configdb = configs/172.16.10.26:30000,172.16.10.27:30000,172.16.10.29:30000
maxConns = 20000
Send the configuration file to the other servers
scp /etc/mongodb/mongos.conf root@172.16.10.27:/etc/mongodb/
scp /etc/mongodb/mongos.conf root@172.16.10.29:/etc/mongodb/
Launch the mongos instance
mongos -f /etc/mongodb/mongos.conf    # run the same command on all three servers
# note: the binary here is "mongos", not "mongod"
Enable sharding feature
mongo    # the default port is 27017, so no port number is needed here
mongos> use admin
mongos> sh.addShard("shard1/172.16.10.26:40001,172.16.10.27:40001,172.16.10.29:40001")
mongos> sh.addShard("shard2/172.16.10.26:40002,172.16.10.27:40002,172.16.10.29:40002")
mongos> sh.status()    // check the cluster status
// only two shard servers are added here; the remaining one will be added later
Test the shard feature: set the chunk size
mongos> use config
switched to db config
mongos> db.settings.save({"_id":"chunksize","value":1})    // a 1 MB chunk size makes the experiment convenient; otherwise massive amounts of data would have to be inserted
WriteResult({ "nMatched" : 0, "nUpserted" : 1, "nModified" : 0, "_id" : "chunksize" })
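The setting can be read back from the config database to confirm it took effect; a minimal sketch:

```
mongos> use config
mongos> db.settings.find()
// { "_id" : "chunksize", "value" : 1 }
```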
Simulate writing data
mongos> use python
switched to db python
mongos> show collections
mongos> for(i=1;i<=50000;i++){db.user.insert({"id":i,"name":"jack"+i})}
// loop-write 50,000 documents into the user collection of the python database
WriteResult({ "nInserted" : 1 })
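A quick sanity check that all 50,000 documents arrived; a minimal sketch:

```
mongos> db.user.count()
// 50000
mongos> db.user.find({ "id": 50000 })
// { "_id" : ObjectId("..."), "id" : 50000, "name" : "jack50000" }
```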
Enable Database sharding
mongos> sh.enableSharding("python")
// sharding is targeted: you can specify which databases or collections need to be sharded, since not all data requires sharding
Create an index for the collection
The rule for creating the index: the values should not repeat too heavily; ideally they are unique, like a sequence number. A field with very high repetition, such as gender, is not suitable as the index.
mongos> db.user.createIndex({"id":1})    // use "id" as the index
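The index can be verified before sharding the collection; a minimal sketch using the standard getIndexes() helper:

```
mongos> db.user.getIndexes()
// expect two entries: the default { "_id" : 1 } index and the new { "id" : 1 } index
```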
Enable Table sharding
mongos> sh.shardCollection("python.user",{"id":1})
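Once the collection is sharded, the data distribution per shard can be inspected; a minimal sketch using getShardDistribution(), a standard shell helper for sharded collections:

```
mongos> use python
mongos> db.user.getShardDistribution()
// prints, for each shard, the data size, document count, and number of chunks,
// plus totals; useful for watching the balancer spread the 50,000 documents
```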
View the shard status
mongos> sh.status()
--- Sharding Status ---
...    // output omitted
shards:
  { "_id" : "shard1", "host" : "shard1/172.16.10.26:40001,172.16.10.27:40001", "state" : 1 }
  { "_id" : "shard2", "host" : "shard2/172.16.10.27:40002,172.16.10.29:40002", "state" : 1 }
...    // output omitted
chunks:
  shard1 3
  shard2 3
  { "id" : { "$minKey" : 1 } } -->> { "id" : 9893 } on : shard1 Timestamp(2, 0)
  { "id" : 9893 } -->> { "id" : 19786 } on : shard1 Timestamp(3, 0)
  { "id" : 19786 } -->> { "id" : 29679 } on : shard1 Timestamp(4, 0)
  { "id" : 29679 } -->> { "id" : 39572 } on : shard2 Timestamp(4, 1)
  { "id" : 39572 } -->> { "id" : 49465 } on : shard2 Timestamp(1, 4)
  { "id" : 49465 } -->> { "id" : { "$maxKey" : 1 } } on : shard2 Timestamp(1, 5)
Manually add the remaining shard server and check whether the shard distribution changes
mongos> use admin
switched to db admin
mongos> sh.addShard("shard3/172.16.10.26:40003,172.16.10.27:40003,172.16.10.29:40003")
mongos> sh.status()
--- Sharding Status ---
...    // output omitted
shards:
  { "_id" : "shard1", "host" : "shard1/172.16.10.26:40001,172.16.10.27:40001", "state" : 1 }
  { "_id" : "shard2", "host" : "shard2/172.16.10.27:40002,172.16.10.29:40002", "state" : 1 }
  { "_id" : "shard3", "host" : "shard3/172.16.10.26:40003,172.16.10.29:40003", "state" : 1 }
...    // output omitted
chunks:
  shard1 2
  shard2 2
  shard3 2
  { "id" : { "$minKey" : 1 } } -->> { "id" : 9893 } on : shard3 Timestamp(6, 0)
  { "id" : 9893 } -->> { "id" : 19786 } on : shard1 Timestamp(6, 1)
  { "id" : 19786 } -->> { "id" : 29679 } on : shard1 Timestamp(4, 0)
  { "id" : 29679 } -->> { "id" : 39572 } on : shard3 Timestamp(5, 0)
  { "id" : 39572 } -->> { "id" : 49465 } on : shard2 Timestamp(5, 1)
  { "id" : 49465 } -->> { "id" : { "$maxKey" : 1 } } on : shard2 Timestamp(1, 5)
After the new shard was added, the cluster rebalanced the data across all three shards. Likewise, when a shard server is removed, the data is rebalanced again across the remaining shards. MongoDB's data handling is very flexible.
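Draining a shard is done with the standard removeShard admin command; a minimal sketch (the command must be issued repeatedly until the returned "state" reaches "completed"):

```
mongos> use admin
mongos> db.adminCommand({ removeShard: "shard3" })
// the first call returns "state" : "started"; the balancer begins draining shard3's chunks
mongos> db.adminCommand({ removeShard: "shard3" })
// later calls report "state" : "ongoing" with a remaining-chunk count, and finally
// "state" : "completed" once all chunks have moved off the shard; if the shard is
// the primary shard for any database, movePrimary would also be required
```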