Shards in MongoDB

Source: Internet
Author: User
Tags: mongodb, support

Sharding lets you add more machines to cope with growing load and data without impacting your applications. Sharding is the process of splitting data and spreading it across multiple machines, so you can store more data and handle larger loads without needing a powerful mainframe.

MongoDB supports auto-sharding, which removes the need for manual partitioning: the cluster automatically splits the data and balances the load. The basic idea of sharding is to divide a collection into small chunks and scatter them across several shards, each responsible for only part of the total data. The application does not need to know which shard holds which data, or even that the data has been split at all, because a routing process (mongos) runs in front of the shards. The router knows where all the data is stored, so applications can connect to it and issue requests normally; as far as the application is concerned, it is talking to an ordinary mongod. The router knows the mapping between data and shards and forwards each request to the correct shard; when the responses come back, it collects them and returns the result to the application.

Without sharding, the client connects to a mongod process; with sharding, the client connects to a mongos process, and mongos hides the details of the shards from the application. From the application's point of view there is no difference, so application code does not need to change when you need to scale out.

When to shard:

(1) The machine does not have enough disk space.

(2) A single mongod can no longer keep up with the write load.

(3) You want to keep a larger working set in memory to improve performance.

In general, start without sharding and convert to a sharded cluster when needed.

1. The shard key

When you set up sharding, you need to select a key from the collection as the basis for splitting the data. This key is called the shard key.

For example, suppose a collection of documents represents people. If name is chosen as the shard key, the first shard might hold documents whose names begin with a-f, the second g-p, and the third q-z. As shards are added or removed, MongoDB rebalances the data so that traffic is spread evenly across the shards and the amount of data on each stays within a reasonable range (for example, a shard receiving heavier traffic may end up holding less data than one receiving lighter traffic).
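The routing idea above can be sketched with a toy simulation. This is illustrative Python only, not MongoDB code: a hypothetical router that, like mongos, maps shard-key ranges to shards and forwards requests, so the caller never needs to know which shard holds which data.

```python
# Toy mongos-like router: maps ranges of a shard key to shards.
# All names and boundaries here are illustrative assumptions.
import bisect

class ToyRouter:
    def __init__(self, boundaries, shards):
        # boundaries[i] is the exclusive upper bound of shard i's key range,
        # e.g. ["g", "q"] gives ranges a-f, g-p, q-z
        self.boundaries = boundaries
        self.shards = shards  # plain dicts standing in for shard servers

    def shard_for(self, key):
        return self.shards[bisect.bisect_left(self.boundaries, key)]

    def insert(self, key, doc):
        self.shard_for(key)[key] = doc

    def find(self, key):
        return self.shard_for(key).get(key)

router = ToyRouter(boundaries=["g", "q"], shards=[{}, {}, {}])
router.insert("alice", {"name": "alice"})        # lands on shard 0 (a-f)
router.insert("refactor", {"name": "refactor"})  # lands on shard 2 (q-z)
print(router.find("refactor"))
```

The caller only ever talks to `router`, mirroring how an application only ever talks to mongos.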

1.1. Sharding an existing collection

Suppose you have a collection storing log data and you now want to shard it. You enable sharding and tell MongoDB to use "timestamp" as the shard key; at this point all the data still sits on a single shard. You can insert data freely, but it all lands on that one shard. Then you add a second shard. Once it is up and running, MongoDB splits the collection into two chunks. Each chunk contains all the documents within a certain range of shard-key values: say one chunk contains documents with timestamps before 2015.11.11, and the other contains documents after 2015.11.11. One of the chunks is then migrated to the new shard. From then on, a new document with a timestamp before 2015.11.11 is added to the first chunk; otherwise it goes to the second.
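The split step can be sketched as follows. This is an illustrative simulation, not MongoDB internals: it simply splits a batch of documents into two chunks at the median shard-key value, which is the rough idea behind chunk splitting.

```python
# Illustrative sketch: split one chunk of documents into two at the
# median shard-key value. Function and field names are assumptions.
def split_chunk(docs, shard_key):
    docs = sorted(docs, key=lambda d: d[shard_key])
    mid = docs[len(docs) // 2][shard_key]            # split point
    low = [d for d in docs if d[shard_key] < mid]    # first chunk
    high = [d for d in docs if d[shard_key] >= mid]  # second chunk
    return mid, low, high

logs = [{"timestamp": t} for t in (2013, 2014, 2015, 2016, 2017, 2018)]
split_at, first, second = split_chunk(logs, "timestamp")
print(split_at)  # 2016: earlier docs in one chunk, later docs in the other
```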

1.2. Ascending shard keys vs. random shard keys

The choice of shard key determines how insert operations are distributed across shards. If you choose a key such as "timestamp", whose value only ever grows, all new data will keep going to a single shard (the one whose range covers dates after 2015.11.11). Even if you add a new shard and the data is split, all writes still end up on one server. As more shards are added, MongoDB may split the range after 2015.11.11 into, say, 2015.11.11-2021.11.11, but any document newer than 2021.11.11 is still inserted into the last chunk. This is unsuitable for high write loads, although queries on such a key can be very efficient. If the write load is high and you want to spread it evenly across shards, you must choose a shard key whose values are evenly distributed: in the log example, a hash of the timestamp, or a key with no inherent pattern such as "logMessage", meets this condition.
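The contrast can be shown with a small simulation. This is illustrative Python, not MongoDB code; the range boundaries and hashing scheme are assumptions chosen only to demonstrate the effect.

```python
# Illustrative sketch: monotonically increasing keys all land in the
# last range, while hashing the same keys spreads writes across shards.
import hashlib

NUM_SHARDS = 3
RANGE_BOUNDS = [1000, 2000]  # ranges: <1000, 1000-1999, >=2000

def range_shard(key):
    # Range-based placement, like an ascending "timestamp" shard key
    for i, bound in enumerate(RANGE_BOUNDS):
        if key < bound:
            return i
    return len(RANGE_BOUNDS)

def hashed_shard(key):
    # Hash-based placement, like sharding on a hash of the timestamp
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

timestamps = range(5000, 5100)  # monotonically increasing inserts
range_hits = {range_shard(t) for t in timestamps}
hash_hits = {hashed_shard(t) for t in timestamps}
print(range_hits)  # {2} - every write hits the last shard
print(hash_hits)   # spread across multiple shards
```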

Whether the shard key jumps around randomly or increases steadily, it is important that its values vary. For example, if a "logLevel" key has only the 3 values "DEBUG", "WARN" and "ERROR", MongoDB cannot use it to split the data into more than 3 chunks (because there are only 3 possible values). If a key varies too little but you still want to use it as the shard key, you can combine it with a key that varies widely to create a compound shard key, such as "logLevel" plus "timestamp".
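The cardinality argument is easy to see numerically. A minimal illustration (the sample values are assumptions):

```python
# Illustrative sketch: a low-cardinality key alone yields few distinct
# shard-key values, but pairing it with a wider-ranging key multiplies
# the number of possible split points.
from itertools import product

log_levels = ["DEBUG", "WARN", "ERROR"]
timestamps = [20150101, 20150102, 20150103, 20150104]

single_key_values = set(log_levels)                  # shard key: logLevel
compound_key_values = set(product(log_levels, timestamps))  # logLevel + timestamp

print(len(single_key_values))    # 3  - at most 3 chunks
print(len(compound_key_values))  # 12 - many more possible split points
```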

Selecting a shard key and creating an index are much alike, because the principles behind the two are similar. In fact, the shard key is often the collection's most frequently used index.

1.3. How the shard key affects operations

End users should not be able to tell whether a collection is sharded, but it helps to understand how queries behave differently depending on the shard key chosen.

Suppose again that the collection represents people and is sharded on "name" across 3 shards, covering the first-letter ranges a-f, g-p and q-z. The following queries behave in different ways:

db.people.find({"name": "Refactor"})

mongos sends this query directly to the q-z shard and, once it receives the response, forwards it straight to the client.

db.people.find({"name": {"$lt": "L"}})

mongos first sends the query to the a-f and g-p shards, then forwards the results to the client.

db.people.find().sort({"email": 1})

mongos queries all the shards and merge-sorts the results as they are returned, ensuring they arrive in the correct order.

mongos uses cursors to fetch data from each server, so it does not have to wait for all the data before sending batches of results to the client.
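The merge-sort over per-shard cursors can be sketched with a lazy merge. This is an illustrative analogy, not mongos internals; the sample documents are assumptions.

```python
# Illustrative sketch: merging already-sorted per-shard results lazily,
# the way a router can merge sorted cursors without waiting for all data.
import heapq

shard_a = [{"email": "a@x"}, {"email": "m@x"}]  # sorted results from shard A
shard_b = [{"email": "c@x"}, {"email": "z@x"}]  # sorted results from shard B

# heapq.merge yields items on demand, one comparison at a time
merged = heapq.merge(shard_a, shard_b, key=lambda d: d["email"])
print([d["email"] for d in merged])  # ['a@x', 'c@x', 'm@x', 'z@x']
```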

db.people.find({"email": "[email protected]"})

mongos does not track the "email" key, so it does not know which shard the query should go to. It therefore sends the query to every shard in sequence.

When a document is inserted, mongos sends it to the appropriate shard based on the value of its "name" key.
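The routing decisions for the queries above can be summarized in one function. Again this is an illustrative sketch under assumed range boundaries, not the actual mongos algorithm.

```python
# Illustrative sketch: which shards must receive a query? A query that
# constrains the shard key can be targeted; one that does not must be
# scattered to every shard.
SHARD_RANGES = [("a", "g"), ("g", "q"), ("q", "{")]  # '{' sorts just after 'z'

def target_shards(query, shard_key="name"):
    if shard_key not in query:
        return list(range(len(SHARD_RANGES)))       # scatter to all shards
    value = query[shard_key]
    if isinstance(value, dict) and "$lt" in value:  # range query
        return [i for i, (lo, _) in enumerate(SHARD_RANGES)
                if lo < value["$lt"]]
    return [i for i, (lo, hi) in enumerate(SHARD_RANGES)
            if lo <= value < hi]                    # exact match: one shard

print(target_shards({"name": "refactor"}))      # [2]       - targeted
print(target_shards({"name": {"$lt": "l"}}))    # [0, 1]    - partial range
print(target_shards({"email": "x@y"}))          # [0, 1, 2] - scatter-gather
```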

2. Creating shards

Creating a sharded cluster takes two steps: starting the actual servers, and deciding how to split the data.

A sharded cluster typically has 3 components:

(1) Shards

A shard is a container that holds a subset of the data. It is either a single mongod server (for development and testing) or a replica set (for production use). Even when a shard consists of multiple servers, only one is the primary; the others hold copies of the same data.

(2) mongos

mongos is the router process that ships with MongoDB. It routes all requests and aggregates the results. It stores no data or configuration information itself, although it caches the config servers' information.

(3) Config servers

Config servers store the cluster's configuration: the mapping between data and shards. Since mongos stores nothing permanently, it needs somewhere to keep the shard configuration, and it synchronizes this data from the config servers.

If you are already running MongoDB, you have a ready-made shard (your current mongod can be the first one).

2.1. Starting the servers

2.1.1. Starting the config server

The config server must be started first, because mongos depends on the configuration information it holds. A config server starts just like an ordinary mongod:

# mkdir -p /opt/mongo/config
# mongod --dbpath /opt/mongo/config/ --port 20000

A config server does not need much space or many resources (200MB of actual data occupies roughly 1KB of configuration space).

2.1.2. Starting the mongos process (the application's connection point)

The routing server does not need a data directory, but you must tell it where the config server is:

# mongos --port 30000 --configdb localhost:20000

Shard management is done through mongos.

2.1.3. Adding shards

A shard is just an ordinary mongod instance (or a replica set):

# mkdir -p /opt/mongo/shard1
# mongod --dbpath /opt/mongo/shard1/ --port 10000

Now connect to mongos to add the shard to the cluster. Start a shell connected to mongos:

# mongo localhost:30000/admin
MongoDB shell version: 2.6.6
connecting to: localhost:30000/admin
mongos>

Add the shard with the addShard command:

mongos> db.runCommand({addShard: "localhost:10000", allowLocal: true})
{ "shardAdded": "shard0000", "ok": 1 }
mongos>

Add one more shard:

# mkdir -p /opt/mongo/shard2
# mongod --dbpath /opt/mongo/shard2 --port 10001

mongos> db.runCommand({addShard: "localhost:10001", allowLocal: true})
{ "shardAdded": "shard0001", "ok": 1 }

When running on localhost, the "allowLocal" key must be set; MongoDB tries to avoid accidentally configuring a cluster locally due to misconfiguration.

Whenever you want to add a shard, just run addShard and MongoDB takes care of integrating it into the cluster.

2.2. Splitting the data

MongoDB does not distribute stored data automatically; you must first enable sharding at both the database level and the collection level. Below, the rgf collection of the test database is sharded with "name" as the shard key.

2.2.1. Enabling sharding on a database

Which server do you enable sharding on?

First, try running the command on the config server:

# mongo localhost:20000
MongoDB shell version: 2.6.6
connecting to: localhost:20000/test
Server has startup warnings:
> use admin
switched to db admin
> db.runCommand({"enablesharding": "test"})
{
    "ok": 0,
    "errmsg": "no such cmd: enablesharding",
    "code": 59,
    "bad cmd": {
        "enablesharding": "test"
    }
}
>

The config server does not support this command.

(1) Enable sharding on the database ("enablesharding") via the routing server

Run the command on the routing server instead:

mongos> db.runCommand({"enablesharding": "test"})
{ "ok": 1 }
mongos>

The test database now has sharding enabled. When a database is sharded, its collections can be stored on different shards; this is also a precondition for sharding those collections.

(2) Sharding a collection ("shardcollection")

After sharding is enabled at the database level, you can shard a collection with the shardcollection command:

mongos> db.runCommand({"shardcollection": "test.rgf", "key": {"name": 1}})
{
    "proposedKey": {
        "name": 1
    },
    "curIndexes": [
        {
            "v": 1,
            "key": {
                "_id": 1
            },
            "name": "_id_",
            "ns": "test.rgf"
        }
    ],
    "ok": 0,
    "errmsg": "please create an index that starts with the shard key before sharding."
}

In this case, you first need to create an index on the shard key:

> db.rgf.ensureIndex({"name": 1})
{
    "createdCollectionAutomatically": false,
    "numIndexesBefore": 1,
    "numIndexesAfter": 2,
    "ok": 1
}

Then run the shardcollection command again:

mongos> db.runCommand({"shardcollection": "test.rgf", "key": {"name": 1}})
{ "collectionsharded": "test.rgf", "ok": 1 }

The rgf collection of the test database is now sharded, with "name" as the shard key. Data added to rgf will automatically be distributed across the shards based on the value of "name".

Note: inserts should be performed through mongos.

3. Production Configuration

A successful production sharded deployment requires the following:

(1) Multiple config servers

(2) Multiple mongos servers

(3) Each shard is a replica set

(4) A correctly set write concern (w)

3.1. Running multiple config servers

Building multiple configuration servers is straightforward:

# mkdir -p /opt/mongo/config{1,2,3}
# mongod --dbpath /opt/mongo/config1/ --port 20001
# mongod --dbpath /opt/mongo/config2/ --port 20002
# mongod --dbpath /opt/mongo/config3/ --port 20003

When starting mongos, connect it to all 3 config servers:

# mongos --configdb localhost:20001,localhost:20002,localhost:20003

Config servers use a two-phase commit mechanism, rather than MongoDB's usual asynchronous replication, to maintain their copies of the cluster state; this guarantees consistency. It also means that if a config server goes down, the cluster's configuration information becomes read-only. Clients can still read and write data, but data cannot be rebalanced until all the config servers are back up.
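The two-phase commit idea can be sketched in miniature. This is a heavily simplified illustration (no real networking, no persistence, names are assumptions): a change is applied only if every server can accept it, otherwise nothing changes anywhere.

```python
# Illustrative sketch of two-phase commit between config servers:
# all-or-nothing application of a configuration change.
def two_phase_commit(servers, change):
    # Phase 1 (prepare): ask every server whether it can apply the change
    if not all(s["up"] for s in servers):
        return False                     # any unreachable server vetoes
    # Phase 2 (commit): everyone voted yes, apply the change everywhere
    for s in servers:
        s["config"].update(change)
    return True

servers = [{"up": True, "config": {}} for _ in range(3)]
print(two_phase_commit(servers, {"chunk1": "shard0000"}))  # True

servers[1]["up"] = False                 # one config server is down
print(two_phase_commit(servers, {"chunk2": "shard0001"}))  # False: no change
```

This mirrors the behavior described above: with one config server down, configuration changes stop, while already-committed state remains readable.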

3.2. Multiple mongos processes

There is no limit on the number of mongos processes. It is recommended to run one mongos process per application server, so that each application server talks to a local mongos; if that server goes down, no application is left trying to talk to a mongos that no longer exists.

