High-Availability MongoDB cluster

Source: Internet
Author: User
Tags: failover, mongodb, sharding

1. Preamble

MongoDB is a scalable, high-performance, open-source, schema-free, document-oriented database written in C++. Its features include:

    • Collection-oriented storage: well suited to storing objects and JSON-style data.

    • Dynamic queries: MongoDB supports rich query expressions. Queries are written as JSON-style documents and can easily reach into objects and arrays embedded in a document.

    • Full index support: indexes can cover embedded objects and arrays in a document. The query optimizer analyzes the query expression and generates an efficient query plan.

    • Query monitoring: MongoDB includes a monitoring tool for analyzing the performance of database operations.

    • Replication and automatic failover: MongoDB supports data replication between servers, in master-slave mode as well as server-to-server replication. The primary purpose of replication is to provide redundancy and automatic failover.

    • Efficient storage: supports binary data and large objects (such as photos).

    • Auto-sharding for cloud-level scalability: automatic sharding lets the database cluster scale horizontally, and additional machines can be added dynamically.

2. Background

The main goal of MongoDB is to combine the advantages of key-value stores (high performance and high scalability) with those of the traditional RDBMS (relational database). Suitable use cases for MongoDB:

    • Website data: MongoDB is well suited to real-time inserts, updates and queries, and provides the replication and scalability that real-time website data storage requires.

    • Caching: because of its high performance, MongoDB also works well as a caching layer in an information infrastructure. After a system restart, the persistent cache built on MongoDB keeps the underlying data sources from being overloaded.

    • Large, low-value data: storing some kinds of data in a traditional relational database can be expensive. Before MongoDB, many programmers simply stored such data in plain files.

    • Highly scalable scenarios: MongoDB is well suited to databases made up of dozens or hundreds of servers.

    • Storage of objects and JSON data: MongoDB's BSON data format is well suited to storing and querying documents.

  Note: This article introduces a highly available MongoDB cluster and does not discuss HDFS on the Hadoop platform. Choose the storage system that fits your company's actual business needs.

Of course, there are also scenarios for which MongoDB is not suitable:

    • Highly transactional systems, such as banking or accounting systems. A traditional relational database is still a better fit for applications that need many atomic, complex transactions.

    • Traditional business intelligence applications: a BI database built for a specific problem yields highly optimized queries. For such applications a data warehouse (such as Hive in the Hadoop suite) may be the more appropriate choice.

    • Problems that require SQL.

3. Build

3.1. Environment Preparation

Download the Linux installation package from the MongoDB website and extract it to a directory of your choice. Because resources are limited, we use replica sets plus sharding to achieve high availability. The structure diagram looks like this:

The elements of the figure are as follows.

    • Shard server: replica sets ensure that every data node has backup, automatic failover, and automatic recovery.

    • Configuration server: 3 configuration servers ensure the integrity of the metadata.

    • Routing process: 3 routing processes balance the load and improve client access performance.

    • 3 shard processes: shard11, shard12 and shard13 form a replica set that provides the functionality of shard1 in the sharded cluster.

    • 3 shard processes: shard21, shard22 and shard23 form a replica set that provides the functionality of shard2 in the sharded cluster.

    • 3 configuration server processes and 3 router processes.

Building a MongoDB sharding cluster requires three roles: the shard server, the configuration server, and the routing process (mongos).

  Shard Server

A shard server is a shard that stores the actual data. Each shard can be a single mongod instance or a replica set made up of several mongod instances. To get automatic failover within each shard, MongoDB officially recommends that each shard be a replica set.

  Configuration Server

To store a collection across multiple shards, you specify a shard key for the collection; the shard key determines which chunk a record belongs to. The configuration servers store the following information: the configuration of each shard node, the shard key range of each chunk, the distribution of chunks across the shards, and the sharding configuration of every database and collection in the cluster.

  Routing Process

The routing process is the front-end router that clients connect to. It first asks the configuration servers which shard a record should be read from or written to, then connects to that shard to perform the operation, and finally returns the result to the client. The client simply sends its ordinary mongod query or update request, unchanged, to the routing process and does not need to know which shard holds the records.
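The lookup that the routing process performs can be sketched in a few lines. This is only an illustration, not MongoDB's actual implementation: the chunk table, its split point of 50000, and the function name `routeByShardKey` are all hypothetical. It assumes each chunk covers shard key values from `min` (inclusive) to `max` (exclusive).

```javascript
// Hypothetical, simplified chunk table of the kind a configuration server holds.
// Each chunk covers shard key values in [min, max) and lives on one shard.
const chunks = [
  { min: -Infinity, max: 50000, shard: "shard1" },
  { min: 50000, max: Infinity, shard: "shard2" },
];

// Return the name of the shard that owns the chunk containing keyValue.
function routeByShardKey(chunkTable, keyValue) {
  const chunk = chunkTable.find((c) => keyValue >= c.min && keyValue < c.max);
  if (!chunk) {
    throw new Error("no chunk covers shard key value " + keyValue);
  }
  return chunk.shard;
}

console.log(routeByShardKey(chunks, 42));    // shard1
console.log(routeByShardKey(chunks, 75000)); // shard2
```

The client never sees this table; it only talks to the routing process, which is why shards can be added or chunks moved without changing client code.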

According to the architecture diagram, 16 machines would be needed in theory. Because resources are limited, directories on a single host stand in for the physical machines. This carries a risk: if that one machine goes down, every service configured on it goes down with it. The configuration table is given below:

Server   Host           Services and Ports
1        10.211.55.28   shard11:10011  shard21:10021  configsvr:10031  mongos:10040
2        10.211.55.28   shard12:10012  shard22:10022  configsvr:10032  mongos:10042
3        10.211.55.28   shard13:10013  shard23:10023  configsvr:10033  mongos:10043

3.2. Environment variables

Configure the environment variables for MongoDB. Open the profile and add the following lines:

vi /etc/profile
export MONGO_HOME=/root/mongodb-linux-x86_64-2.6.7
export PATH=$PATH:$MONGO_HOME/bin

Save and exit, then run the following command so that the settings take effect immediately:

. /etc/profile

3.3. Configure Shard + Replica Sets

We start each of the shard1 processes separately and set their replica set name to shard1; the shard2 processes are handled the same way. The configuration files used to launch the processes are given below.

    • shard11.conf

dbpath=/mongodb/data/shard11
logpath=/mongodb/log/shard11.log
pidfilepath=/mongodb/pid/shard11.pid
directoryperdb=true
logappend=true
replSet=shard1
port=10011
fork=true
shardsvr=true
journal=true

    • shard12.conf

dbpath=/mongodb/data/shard12
logpath=/mongodb/log/shard12.log
pidfilepath=/mongodb/pid/shard12.pid
directoryperdb=true
logappend=true
replSet=shard1
port=10012
fork=true
shardsvr=true
journal=true

    • shard13.conf

dbpath=/mongodb/data/shard13
logpath=/mongodb/log/shard13.log
pidfilepath=/mongodb/pid/shard13.pid
directoryperdb=true
logappend=true
replSet=shard1
port=10013
fork=true
shardsvr=true
journal=true

    • shard21.conf

dbpath=/mongodb/data/shard21
logpath=/mongodb/log/shard21.log
pidfilepath=/mongodb/pid/shard21.pid
directoryperdb=true
logappend=true
replSet=shard2
port=10021
fork=true
shardsvr=true
journal=true

    • shard22.conf

dbpath=/mongodb/data/shard22
logpath=/mongodb/log/shard22.log
pidfilepath=/mongodb/pid/shard22.pid
directoryperdb=true
logappend=true
replSet=shard2
port=10022
fork=true
shardsvr=true
journal=true

    • shard23.conf

dbpath=/mongodb/data/shard23
logpath=/mongodb/log/shard23.log
pidfilepath=/mongodb/pid/shard23.pid
directoryperdb=true
logappend=true
replSet=shard2
port=10023
fork=true
shardsvr=true
journal=true

    • config1.conf

dbpath=/mongodb/config/config1
logpath=/mongodb/log/config1.log
pidfilepath=/mongodb/pid/config1.pid
directoryperdb=true
logappend=true
port=10031
fork=true
configsvr=true
journal=true

    • config2.conf

dbpath=/mongodb/config/config2
logpath=/mongodb/log/config2.log
pidfilepath=/mongodb/pid/config2.pid
directoryperdb=true
logappend=true
port=10032
fork=true
configsvr=true
journal=true

    • config3.conf

dbpath=/mongodb/config/config3
logpath=/mongodb/log/config3.log
pidfilepath=/mongodb/pid/config3.pid
directoryperdb=true
logappend=true
port=10033
fork=true
configsvr=true
journal=true

    • route.conf

configdb=mongo:10031,mongo:10032,mongo:10033
pidfilepath=/mongodb/pid/route.pid
port=10040
chunkSize=1
logpath=/mongodb/log/route.log
logappend=true
fork=true

    • route2.conf

configdb=mongo:10031,mongo:10032,mongo:10033
pidfilepath=/mongodb/pid/route2.pid
port=10042
chunkSize=1
logpath=/mongodb/log/route2.log
logappend=true
fork=true

    • route3.conf

configdb=mongo:10031,mongo:10032,mongo:10033
pidfilepath=/mongodb/pid/route3.pid
port=10043
chunkSize=1
logpath=/mongodb/log/route3.log
logappend=true
fork=true
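Since the six shard files differ only in the member name, the replica set name and the port, they can also be generated rather than written by hand. The following is a minimal sketch under that assumption; `renderShardConf` is a hypothetical helper, and the paths and option names are taken from the shard files listed above.

```javascript
// Render the options of one shard member as an ini-style configuration file.
// Paths and option names follow the shard*.conf files in this article.
function renderShardConf(name, replSet, port) {
  const options = {
    dbpath: `/mongodb/data/${name}`,
    logpath: `/mongodb/log/${name}.log`,
    pidfilepath: `/mongodb/pid/${name}.pid`,
    directoryperdb: true,
    logappend: true,
    replSet: replSet,
    port: port,
    fork: true,
    shardsvr: true,
    journal: true,
  };
  // One "key=value" pair per line, terminated by a newline.
  return Object.entries(options)
    .map(([key, value]) => `${key}=${value}`)
    .join("\n") + "\n";
}

// Members of both replica sets, as in the configuration table.
const shardMembers = [
  ["shard11", "shard1", 10011],
  ["shard12", "shard1", 10012],
  ["shard13", "shard1", 10013],
  ["shard21", "shard2", 10021],
  ["shard22", "shard2", 10022],
  ["shard23", "shard2", 10023],
];

for (const [name, replSet, port] of shardMembers) {
  console.log(`# /etc/${name}.conf`);
  console.log(renderShardConf(name, replSet, port));
}
```

Generating the files keeps the name, port and replica set of each member consistent, which is where hand-copied files tend to go wrong.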

Note: The directories referenced in the configuration files must exist; create any that do not.
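One way to make sure the directories exist is a small script such as the sketch below (a shell `mkdir -p` per directory achieves the same). The directory list is derived from the paths used in the configuration files; the helper names `requiredDirs` and `ensureDirs` are hypothetical, and the example deliberately runs against a temporary scratch root rather than `/`.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Directories referenced by the configuration files, relative to a root.
function requiredDirs() {
  const shards = ["shard11", "shard12", "shard13", "shard21", "shard22", "shard23"];
  const configs = ["config1", "config2", "config3"];
  return [
    ...shards.map((s) => path.join("mongodb", "data", s)),
    ...configs.map((c) => path.join("mongodb", "config", c)),
    path.join("mongodb", "log"),
    path.join("mongodb", "pid"),
  ];
}

// Create every directory under root; directories that already exist are left alone.
function ensureDirs(root) {
  for (const dir of requiredDirs()) {
    fs.mkdirSync(path.join(root, dir), { recursive: true });
  }
}

// Assumption: a scratch root is used for this demonstration.
// On the real host the root would be "/".
const scratchRoot = fs.mkdtempSync(path.join(os.tmpdir(), "mongo-dirs-"));
ensureDirs(scratchRoot);
console.log(requiredDirs().length + " directories ensured under " + scratchRoot);
```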

3.4. Start the processes in batch

The following script starts all processes in one batch:

mongod -f /etc/shard11.conf
mongod -f /etc/shard12.conf
mongod -f /etc/shard13.conf
mongod -f /etc/shard21.conf
mongod -f /etc/shard22.conf
mongod -f /etc/shard23.conf
mongod -f /etc/config1.conf
mongod -f /etc/config2.conf
mongod -f /etc/config3.conf
mongos -f /etc/route.conf
mongos -f /etc/route2.conf
mongos -f /etc/route3.conf

3.5. Parameter description

dbpath: data storage directory
logpath: log file path
logappend: write the log in append mode
replSet: name of the replica set
port: port used by the MongoDB process; the default is 27017
fork: run the process in the background
journal: enable the journal (write-ahead log)
smallfiles: use smaller data files; add this parameter when insufficient disk space is reported

Other parameters

pidfilepath: file that holds the process ID, convenient for stopping MongoDB
directoryperdb: store each database in its own directory, named after the database
bind_ip: IP address that MongoDB binds to
oplogSize: maximum size of the MongoDB operation log in MB; the default is 5% of the free disk space
noprealloc: do not preallocate storage
shardsvr: run as a shard node
configsvr: run as a configuration server node
configdb: list of configuration server nodes given to the routing node
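One step this walkthrough does not show: mongod processes started with replSet typically need their replica set initiated once before the set elects a primary and can be registered as a shard. The sketch below only builds the configuration documents; `buildReplSetConfig` is a hypothetical helper, and the host name `mongo` and the ports are taken from the configuration files in this article.

```javascript
// Build a replica set configuration document with one entry per member.
// Member _id values are numbered from 0 in port order.
function buildReplSetConfig(setName, host, ports) {
  return {
    _id: setName,
    members: ports.map((port, index) => ({ _id: index, host: `${host}:${port}` })),
  };
}

const shard1Config = buildReplSetConfig("shard1", "mongo", [10011, 10012, 10013]);
const shard2Config = buildReplSetConfig("shard2", "mongo", [10021, 10022, 10023]);

console.log(JSON.stringify(shard1Config, null, 2));
console.log(JSON.stringify(shard2Config, null, 2));

// In a mongo shell connected to one member of the set
// (for example: mongo mongo:10011), the document would be passed to
// rs.initiate(shard1Config), and likewise shard2Config on a shard2 member.
```

After initiation, `rs.status()` in the same shell shows which member became primary.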

3.6. Configure the sharded collection and shard key

First, log in to a routing node; here we use the one listening on port 10040. Enter the following commands:

mongo mongo:10040
use admin
db.runCommand({addshard: "shard1/mongo:10011,mongo:10012,mongo:10013"})
db.runCommand({addshard: "shard2/mongo:10021,mongo:10022,mongo:10023"})
db.runCommand({listshards: 1})  // list the shards
db.runCommand({enablesharding: "friends"})  // enable sharding for the friends database
db.runCommand({shardcollection: "friends.user", key: {id: 1}, unique: true})  // shard the user collection; the shard key is id and it is unique

3.7. Verification

At this point the whole cluster is built. Next we test its high availability.

First, view the status of the cluster:

You can see that the cluster already contains data; this is the data I used for testing. Note that MongoDB only starts distributing data across shards once it reaches a certain volume, so the amounts I inserted are fairly large: each test inserts 100,000 records.
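Why the test needs so many records can be estimated roughly. With chunkSize=1 in the route configuration, a chunk is about 1 MB, so the data only spreads over both shards once it spans several chunks. The calculation below is a back-of-the-envelope sketch: the average document size of 100 bytes is an assumption, `estimateChunkCount` is a hypothetical helper, and the real splitting behaviour of MongoDB is more involved than a plain division.

```javascript
// Rough estimate: total data size divided by the chunk size, at least one chunk.
function estimateChunkCount(docCount, avgDocBytes, chunkSizeMB) {
  const totalBytes = docCount * avgDocBytes;
  const chunkBytes = chunkSizeMB * 1024 * 1024;
  return Math.max(1, Math.ceil(totalBytes / chunkBytes));
}

// Assumed average document size: 100 bytes.
console.log(estimateChunkCount(100000, 100, 1)); // 10
console.log(estimateChunkCount(1000, 100, 1));   // 1
```

Under these assumptions a thousand records fit in a single chunk and stay on one shard, while 100,000 records give the balancer several chunks to distribute.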

Next, I kill the shard11 service to see what happens:

The shard11 process has now been killed. We then connect to the routing node on port 10040 and run db.user.stats() to view the status; it shows that the cluster is still working normally. As shown below:

You can also insert another 100,000 records through the routing node to check that writes still succeed. The insert script is as follows:

for (var i = 1; i <= 100000; i++) db.user.save({id: i, value1: "1234567890", value2: "1234567890", value3: "1234567890", value4: "1234567890"});
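The reason the cluster keeps working after shard11 is killed is that a replica set can elect a primary as long as a strict majority of its voting members is still reachable. A minimal sketch of that rule follows; the function name `canElectPrimary` is hypothetical.

```javascript
// A replica set can elect a primary only while a strict majority
// of its voting members is reachable.
function canElectPrimary(votingMembers, reachableMembers) {
  return reachableMembers > Math.floor(votingMembers / 2);
}

console.log(canElectPrimary(3, 2)); // true: shard11 killed, shard12 and shard13 remain
console.log(canElectPrimary(3, 1)); // false: two of the three members are down
```

With three members per shard, the setup tolerates the loss of one member per replica set; losing a second member leaves the remaining one unable to accept writes.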
4. Summary

That is all for this article. If you run into problems while working through it, you can join the group discussion or send me an email, and I will do my best to answer. Let us encourage each other!
