Part I
Before building a MongoDB cluster, there are a few questions to consider:
1. What is the purpose of the cluster? Is it to keep multiple copies of the data for better fault tolerance and availability, to scale out storage for large data sets, or both?
If it is for multiple copies, build a replication cluster (a replica set). If it is to handle big data, build a sharding cluster. If it is both, make each shardsvr a replica set.
2. What is sharding? How does it differ from replication?
In a nutshell, a replica set is the basic unit of service in MongoDB. To the user, a standalone instance and a replica set look the same: each acts as a single service node. The replica set, however, keeps multiple copies of the data and elects its primary on the server side, so the data is safer. A sharding cluster contains 3 roles: mongos, configsvr, and shardsvr. Within the cluster, mongos plays the part of the master: it is the query router that serves requests from outside. Each shardsvr plays the part of a slave and stores the sharded data. The configsvr records the shard metadata, that is, the routing information. Each shardsvr and the configsvr is a replica set, while mongos is a routing process that stores no data of its own. Refer to the official documentation for descriptions of replication and sharding:
replication: https://docs.mongodb.com/manual/replication/
sharding: https://docs.mongodb.com/manual/sharding/
3. What does the architecture of our cluster look like?
Once questions 1 and 2 are understood, the kind of cluster to build follows from the available hardware. Each shardsvr replica set acts as one slave, so create as many shardsvr replica sets as you need data nodes. The configsvr holds the routing information, and all the machines can be members of a single configsvr replica set. As for mongos, a single node is enough for internal use. If you need it to run stably, run several mongos instances (mongos itself does not form a replica set).
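The layout described above can be written down as plain data before any server is touched. The following is a minimal JavaScript sketch, not part of the original setup: the function name describeTopology, the host names node1 to node3, and the replica set names are assumptions made for illustration, and the ports are the MongoDB defaults for each role.

```javascript
// Sketch of a sharding cluster layout as plain data.
// Assumptions: one single-member shardsvr replica set per host,
// one configsvr replica set spanning all hosts, one mongos on the first host.
function describeTopology(hosts) {
  return {
    shards: hosts.map((host, i) => ({
      replSetName: "shardsvr" + (i + 1), // hypothetical naming scheme
      members: [host + ":27018"],        // default shardsvr port
    })),
    configServers: {
      replSetName: "configsvr0",
      members: hosts.map((host) => host + ":27019"), // default configsvr port
    },
    mongos: [hosts[0] + ":27017"],       // default mongos port
  };
}

// Hypothetical host names, for illustration only.
const topology = describeTopology(["node1", "node2", "node3"]);
console.log(JSON.stringify(topology, null, 2));
```

Adding a data node then means adding one entry to the host list, which yields one more shardsvr replica set and one more configsvr member.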
Part II
Here is the actual deployment:
I have 5 servers on which to run MongoDB, and a batch of data. These 5 machines also run other frameworks such as Spark and Hadoop. Since Spark and Hadoop here are already single points of failure (master and secondary? There are none; a two-person deployment always ends up with a single point of failure), mongos runs on a single node as well. The data is stored across all 5 machines. Because the data volume is large and disk space is tight (the disks are grouped as RAID 5, leaving each server a little over 1 TB), backups and stability are deliberately left out: keeping 2 copies would leave little room, and HDFS and other data need space too. The architecture can then be built as follows: each shardsvr is its own separate replica set.
Deployment steps:
1. Create the configuration files:
a) configsvr
systemLog:
  destination: file
  path: "/home/cloud/platform/logs/mongodb/configsvr.log"
  logAppend: true
storage:
  dbPath: "/home/cloud/platform/data/configdata"
  journal:
    enabled: true
setParameter:
  enableLocalhostAuthBypass: false
processManagement:
  fork: true
replication:
  replSetName: "configsvr0"
sharding:
  clusterRole: "configsvr"
b) shardsvr
systemLog:
  destination: file
  path: "/home/cloud/platform/logs/mongodb/shardsvr.log"
  logAppend: true
storage:
  dbPath: "/home/cloud/platform/data/sharddata"
  journal:
    enabled: true
setParameter:
  enableLocalhostAuthBypass: false
processManagement:
  fork: true
replication:
  replSetName: "shardsvr1"
sharding:
  clusterRole: "shardsvr"
c) mongos
systemLog:
  destination: file
  path: "/home/cloud/platform/logs/mongodb/mongos.log"
  logAppend: true
net:
  bindIp: 192.168.12.161
  port: 27017
setParameter:
  enableLocalhostAuthBypass: false
processManagement:
  fork: true
sharding:
  configDB: "configsvr0/192.168.12.161:27019,192.168.12.162:27019,192.168.12.163:27019,192.168.12.164:27019,192.168.12.169:27019"
Note: The configuration differs slightly on each machine; refer to the official documentation when modifying it. replSetName is the replica set name: every member of the same replica set must use the same value, and different replica sets must use different values.
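The naming rule can be checked mechanically before the files are copied out. This is a small sketch under assumptions: the helper findDuplicateReplSetNames and the name "shardsvr2" are hypothetical and only illustrate the rule that two different replica sets must not share a replSetName.

```javascript
// Return every replSetName that is used by more than one replica set.
// Each entry in `replicaSets` describes one replica set, not one member.
function findDuplicateReplSetNames(replicaSets) {
  const seen = new Set();
  const duplicates = new Set();
  for (const rs of replicaSets) {
    if (seen.has(rs.replSetName)) duplicates.add(rs.replSetName);
    seen.add(rs.replSetName);
  }
  return [...duplicates];
}

// "shardsvr2" is an assumed name for a second shard.
const planned = [
  { replSetName: "configsvr0", role: "configsvr" },
  { replSetName: "shardsvr1", role: "shardsvr" },
  { replSetName: "shardsvr2", role: "shardsvr" },
];
console.log(findDuplicateReplSetNames(planned)); // prints []
```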
Next are the startup scripts:
a) shardsvr
#!/bin/bash
# Use this to initiate:
# rs.initiate({_id: "shardsvr1", members: [{_id: 0, host: "192.168.12.161:27018"}]})
/home/cloud/platform/mongodb-3.4.5/bin/mongod --config /home/cloud/platform/mongodb-3.4.5/shardserver.conf
b) configsvr
#!/bin/bash
# Use this to initiate:
# rs.initiate({_id: "configsvr0", configsvr: true, members: [{_id: 0, host: "192.168.12.161:27019"}, {_id: 1, host: "192.168.12.162:27019"}, {_id: 2, host: "192.168.12.163:27019"}, {_id: 3, host: "192.168.12.164:27019"}, {_id: 4, host: "192.168.12.169:27019"}]})
MONGO_HOME=/home/cloud/platform/mongodb-3.4.5
${MONGO_HOME}/bin/mongod --config ${MONGO_HOME}/configserver.conf
c) mongos
#!/bin/bash
# mongos doesn't need to initiate
# sh.enableSharding("dbname") to create database
# sh.shardCollection("dbname.tablename", {id: "hashed"}) to create a shard table split by id
/home/cloud/platform/mongodb-3.4.5/bin/mongos --config /home/cloud/platform/mongodb-3.4.5/mongosserver.conf
2. Startup process
A. Copy the scripts and configuration files to each machine.
B. Start each shardsvr, then log in to it and initialize it:
1. Run start-shardsvr.sh.
2. Run bin/mongo --host ${hostip} --port ${hostport}. The default port is 27018 for shardsvr, 27019 for configsvr, and 27017 for mongos. The shardsvr and configsvr configuration files above do not specify a port, so the defaults apply.
3. Run rs.initiate({_id: "shardsvr1", members: [{_id: 0, host: "192.168.12.161:27018"}]}) to initialize the replica set.
4. Run rs.status() to check the shardsvr status.
C. Start all configsvr instances, log in to any one of them with mongo --host --port (configsvr default port: 27019), and run the initialization:
rs.initiate({
  _id: "configsvr0",
  configsvr: true,
  members: [
    {_id: 0, host: "192.168.12.161:27019"},
    {_id: 1, host: "192.168.12.162:27019"},
    {_id: 2, host: "192.168.12.163:27019"},
    {_id: 3, host: "192.168.12.164:27019"},
    {_id: 4, host: "192.168.12.169:27019"}
  ]
})
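Writing the members array by hand gets error-prone as the host list grows. As a rough sketch, the document passed to rs.initiate can be generated from a host list. The helper name buildReplSetConfig is an assumption, not a MongoDB API; only the shape of the returned object follows the commands used in this post.

```javascript
// Build the configuration document that rs.initiate() takes.
// Members get consecutive _id values in the order the hosts are listed.
function buildReplSetConfig(name, hosts, port, isConfigServer) {
  const config = {
    _id: name,
    members: hosts.map((host, i) => ({ _id: i, host: host + ":" + port })),
  };
  // Only a config server replica set carries the configsvr flag.
  if (isConfigServer) config.configsvr = true;
  return config;
}

const configHosts = [
  "192.168.12.161",
  "192.168.12.162",
  "192.168.12.163",
  "192.168.12.164",
  "192.168.12.169",
];
const configSvrConfig = buildReplSetConfig("configsvr0", configHosts, 27019, true);
const shardConfig = buildReplSetConfig("shardsvr1", ["192.168.12.161"], 27018, false);
console.log(JSON.stringify(configSvrConfig, null, 2));
```

In the mongo shell, the resulting object would be handed to rs.initiate in place of the hand-written literal.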
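D. Start mongos. Each shardsvr replica set has to be registered through mongos with sh.addShard before data can be sharded, for example sh.addShard("shardsvr1/192.168.12.161:27018"). After that, operations can be run on mongos, for example to check the cluster status:
printShardingStatus()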
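Shards are registered with mongos through sh.addShard, which takes a string of the form replSetName/host:port. The sketch below only builds those strings and prints the shell commands to run. The helper name shardConnectionString is an assumption, and only "shardsvr1" on 192.168.12.161 comes from this post; further shards would be listed the same way with their own names.

```javascript
// Format a shard as "replSetName/host1:port,host2:port" for sh.addShard().
function shardConnectionString(replSetName, members) {
  return replSetName + "/" + members.join(",");
}

// One entry per shardsvr replica set; extend the list for the other machines.
const shardsToAdd = [
  { replSetName: "shardsvr1", members: ["192.168.12.161:27018"] },
];

const addShardCommands = shardsToAdd.map(
  (s) => 'sh.addShard("' + shardConnectionString(s.replSetName, s.members) + '")'
);
addShardCommands.forEach((cmd) => console.log(cmd));
// prints: sh.addShard("shardsvr1/192.168.12.161:27018")
```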
After that it is ordinary mongo shell operation: mongos can be used like a standalone MongoDB instance, and the operations are basically the same except for creating sharded collections.
Create a sharded table as follows:
sh.enableSharding("dbname")  // enable sharding on the database
sh.shardCollection("dbname.tablename", {"_id": "hashed"})  // create a shard table hashed by _id
Note that "_id" is the shard key here, and its values cannot repeat. To hash on another field instead, replace "_id" in the command with that field's name; MongoDB will still create the _id field automatically and index it.
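Switching the hashed field only changes the key document passed to sh.shardCollection. A minimal sketch of that substitution follows. The helper names hashedShardKey and shardByHash are assumptions, the field "path" is just an example, and the sh object is passed in as a parameter so the snippet can run outside the mongo shell against a recording stub.

```javascript
// Build the key document for a hashed shard key on the given field.
function hashedShardKey(field) {
  return { [field]: "hashed" };
}

// Enable sharding on the database, then shard the collection by hash.
// `sh` is injected: the shell's own helper in practice, a stub here.
function shardByHash(sh, namespace, field) {
  const dbName = namespace.split(".")[0];
  sh.enableSharding(dbName);
  sh.shardCollection(namespace, hashedShardKey(field));
}

// Stand-in for the shell's `sh` object that only records the calls made.
const calls = [];
const stubSh = {
  enableSharding: (db) => calls.push(["enableSharding", db]),
  shardCollection: (ns, key) => calls.push(["shardCollection", ns, key]),
};

shardByHash(stubSh, "dbname.tablename", "path");
console.log(JSON.stringify(calls));
// prints: [["enableSharding","dbname"],["shardCollection","dbname.tablename",{"path":"hashed"}]]
```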
To add an index (ensureIndex is the older alias of createIndex):
db.collectionname.ensureIndex({"indexcolumn": 1})
Part III
Java API tips
Get the connection:
val mongo = new MongoClient("192.168.12.161", 27017)
val db = mongo.getDatabase("TestDB")
val collection = db.getCollection("origin2")
Insert data:
val d = new Document
d.append("path", x.getPath)
d.append("name", x.getName)
d.append("Content", filterHtml(Source.fromFile(x, detector(x)).getLines().toArray.mkString("\n")))
...
collection.insertMany(..., new InsertManyOptions().ordered(false))
During an insert, if an "_id" is duplicated, the default behavior is to abort the current insert and throw an exception: the documents before the duplicate are inserted, and the ones after it are not. Passing new InsertManyOptions().ordered(false) inserts all the documents that are not duplicates and then throws an exception.
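The difference between the two modes is easiest to see on a small batch. The following is an in-memory simulation only. It does not call the MongoDB driver, and the function simulateInsertMany is a hypothetical stand-in that mimics the duplicate "_id" behavior described above.

```javascript
// Simulate inserting `docs` into a collection that already holds `existingIds`.
// ordered = true  : stop at the first duplicate _id (the default behavior).
// ordered = false : skip duplicates and keep inserting the rest.
function simulateInsertMany(existingIds, docs, ordered) {
  const ids = new Set(existingIds);
  const inserted = [];
  const duplicates = [];
  for (const doc of docs) {
    if (ids.has(doc._id)) {
      duplicates.push(doc._id);
      if (ordered) break;
      continue;
    }
    ids.add(doc._id);
    inserted.push(doc._id);
  }
  // A real driver would report the duplicates by throwing an exception.
  return { inserted, duplicates };
}

const batch = [{ _id: 1 }, { _id: 2 }, { _id: 2 }, { _id: 3 }];
console.log(JSON.stringify(simulateInsertMany([], batch, true)));
// prints: {"inserted":[1,2],"duplicates":[2]}
console.log(JSON.stringify(simulateInsertMany([], batch, false)));
// prints: {"inserted":[1,2,3],"duplicates":[2]}
```

With ordered set to false, document 3 still gets in even though the duplicate came before it, which is usually what a bulk load wants.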
Building a MongoDB 3.4 sharding cluster and simple use of the Java API