How MongoDB Sharding Partitions Data

Source: Internet
Author: User

With the growth of the mobile Internet, huge amounts of unstructured data are being generated. This places new demands not only on how databases store big data but also on how they query and analyze it, and these demands clearly exceed what a single server can handle, so building a cluster becomes inevitable. Clusters are notoriously complex, and MongoDB's sharding is one of the features that can help us solve these problems.

Sharding

Sharding is the mechanism MongoDB provides for splitting a large collection into partitions and storing them on different servers. Compared with other partitioning schemes, MongoDB does almost everything for us automatically: with a simple configuration telling MongoDB how to distribute the data, it keeps the data balanced across the servers, and when servers are added or removed it automatically migrates the existing data.

Sharding provides the following three advantages:

1. It abstracts the cluster, making the cluster "invisible" to the application.

MongoDB ships with a dedicated routing process called mongos. mongos is the single entry point to the cluster: it routes each client request to the correct server or set of servers in the cluster, assembles the responses it receives, and sends the result back to the client.
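
To make the routing idea concrete, here is a toy, in-memory sketch of what a mongos-like router does conceptually, assuming the collection is range-sharded on a username field. The class ToyRouter, the shard names, and the four ranges are hypothetical; the real mongos reads chunk metadata from the cluster's config servers and speaks the MongoDB wire protocol, none of which is modeled here. The point is the difference between a targeted query (the filter contains the shard key, so one shard is enough) and a scatter-gather query (every shard must be asked and the partial results merged).

```python
class ToyRouter:
    """Conceptual sketch of a mongos-like router (hypothetical, not MongoDB's code)."""

    def __init__(self, ranges):
        # ranges: list of (min_key, max_key, shard_name); min inclusive, max exclusive.
        self.ranges = ranges
        self.shards = {name: [] for _, _, name in ranges}

    def owner(self, key):
        """Return the shard whose range contains this shard key value."""
        for low, high, name in self.ranges:
            if low <= key < high:
                return name
        raise KeyError(f"no shard owns {key!r}")

    def insert(self, doc):
        # Writes are targeted: the shard key picks exactly one owning shard.
        self.shards[self.owner(doc["username"])].append(doc)

    def find(self, query):
        """Return (matching documents, shards contacted)."""
        if "username" in query:
            targets = [self.owner(query["username"])]  # targeted query
        else:
            targets = list(self.shards)  # scatter-gather to every shard
        results = []
        for name in targets:
            results.extend(
                d for d in self.shards[name]
                if all(d.get(k) == v for k, v in query.items())
            )
        # Assemble the partial results into one response for the client.
        return sorted(results, key=lambda d: d["username"]), targets
```

With hypothetical ranges such as [("a", "f", "shard1"), ("f", "n", "shard2"), ("n", "t", "shard3"), ("t", "{", "shard4")] ("{" sorts right after "z"), a lookup by username contacts a single shard, while a lookup by some other field, such as a city, has to visit all four.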

2. It keeps the cluster readable and writable at all times.

MongoDB ensures the cluster's availability and reliability in several ways. By combining sharding with replication, MongoDB spreads the data across multiple servers while also keeping a copy of every piece of data on other servers, so that when a server fails, another replica can immediately take over its share of the work.
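
The failover idea can be illustrated with a deliberately simplified model of one shard's replica set. Everything here is hypothetical (ToyReplicaSet, Member, and the last_applied counter standing in for how far a member has replicated); MongoDB's real election protocol is Raft-based and involves terms, heartbeats, and member priorities. The sketch keeps two properties that do hold in MongoDB: a primary can only be elected while a majority of members is reachable, and without a primary the set cannot accept writes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Member:
    name: str
    last_applied: int  # toy stand-in for how far this member has replicated
    healthy: bool = True


@dataclass
class ToyReplicaSet:
    members: List[Member]
    primary: Optional[str] = None

    def elect(self) -> Optional[str]:
        healthy = [m for m in self.members if m.healthy]
        # Require a strict majority of members to be up before electing a primary;
        # without one, the set has no primary and cannot accept writes.
        if len(healthy) * 2 <= len(self.members):
            self.primary = None
        else:
            # Prefer the healthy member with the most up-to-date copy of the data.
            self.primary = max(healthy, key=lambda m: m.last_applied).name
        return self.primary

    def fail(self, name: str) -> None:
        """Mark a member as down and hold a new election if it was the primary."""
        for m in self.members:
            if m.name == name:
                m.healthy = False
        if name == self.primary:
            self.elect()
```

In a three-member set, losing the primary leaves two healthy members, a majority, so one of them takes over; losing a second member leaves no majority, and the set has no primary until a member comes back.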

3. It makes the cluster easy to scale.

When the system needs more storage space or resources, MongoDB lets us expand its capacity as needed.

Implementing data partitioning

A shard is one or more servers in the cluster that store a subset of a collection's data. In a production environment, a shard is typically a replica set.

Shard key: MongoDB uses the shard key to decide how a collection's data is divided and which data must move between shards. For example, if we choose the user name (username) field as the shard key and one of the existing ranges is ["P", "Z"), then the user "Wufengtinghai" falls into that range, and the document will ultimately be stored on the shard responsible for it.
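
Finding the shard for a given shard key value amounts to a search over sorted range boundaries. The following is a minimal sketch, assuming a hypothetical table of four ranges (the boundaries "A", "H", "P", "Z" and the shard names are illustrative, chosen so that ["P", "Z") matches the example above). MongoDB's real chunk table spans from MinKey to MaxKey so every value has an owner; this sketch raises an error for values that sort before its first boundary instead.

```python
import bisect

# Hypothetical range table for a collection sharded on "username".
# Range i covers [RANGE_MINS[i], RANGE_MINS[i + 1]); the last range is open-ended.
RANGE_MINS = ["A", "H", "P", "Z"]
RANGE_OWNERS = ["shard1", "shard2", "shard3", "shard4"]


def shard_for(username: str) -> str:
    """Return the shard whose range contains this shard key value."""
    # bisect_right returns the index of the first range minimum greater than
    # username; the range just before it is the one that contains username.
    idx = bisect.bisect_right(RANGE_MINS, username) - 1
    if idx < 0:
        raise KeyError(f"{username!r} sorts before the first range")
    return RANGE_OWNERS[idx]
```

Here shard_for("Wufengtinghai") resolves to the ["P", "Z") range and therefore to "shard3". Because lower bounds are inclusive, a value equal to a boundary, such as "P", also belongs to the range that starts there.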

Assigning data to shards

Data can be assigned to shards in different ways, and understanding those differences deepens our understanding of how MongoDB works and how to use it.

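One range per shard

The simplest way to distribute data is to give each shard a single range. Suppose we have four shards storing user information; each shard is then assigned one contiguous range of user names, for example Shard 1 holding ["a", "f") (names from "a" up to but not including "f") and Shard 2 holding ["f", "n"), with Shards 3 and 4 covering the rest of the alphabet.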

This scheme is easy to understand, but it can cause a lot of trouble in a large, busy system. If many users register with names whose first letter falls in ["a", "f"), Shard 1 grows larger than the others, so some of its documents must be moved to Shard 2. We can do this by shrinking Shard 1's range to ["a", "c") and expanding Shard 2's range to ["c", "n").


But what if moving that data overloads Shard 2? Suppose Shards 1 and 2 each hold 500 GB of data, while Shards 3 and 4 each hold 300 GB. Balancing means every shard should end up with 400 GB, and because each shard owns a single contiguous range, excess data can only be pushed to a neighbor: Shard 1 moves 100 GB to Shard 2, Shard 2 then moves 200 GB to Shard 3, and Shard 3 moves 100 GB to Shard 4. This chain of moves transfers 400 GB in total, and since all of it travels between servers in the cluster, that is a large amount of data to move.
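
The arithmetic of this chain can be checked with a short sketch. With one contiguous range per shard, data can only flow across the boundary between neighboring shards, and the amount crossing boundary i is how far the first i shards together sit above (or below) their combined fair share. The function name cascade_moves is hypothetical, and shard sizes are plain numbers in GB.

```python
from itertools import accumulate


def cascade_moves(loads_gb):
    """Moves needed to even out shards that each own one contiguous range.

    loads_gb[i] is the size of shard i + 1, in key order. Data can only cross
    the boundary between neighboring shards. Returns (from_shard, to_shard, gb)
    tuples using 1-based shard numbers.
    """
    n = len(loads_gb)
    target = sum(loads_gb) / n
    moves = []
    # After balancing, the first i shards must hold exactly i * target, so
    # whatever their current total exceeds (or falls short of) that amount
    # has to cross boundary i.
    for i, prefix in enumerate(accumulate(loads_gb[:-1]), start=1):
        excess = prefix - i * target
        if excess > 0:
            moves.append((i, i + 1, excess))
        elif excess < 0:
            moves.append((i + 1, i, -excess))
    return moves
```

For [500, 500, 300, 300] this yields moves of 100 GB (Shard 1 to 2), 200 GB (Shard 2 to 3), and 100 GB (Shard 3 to 4), 400 GB in all. Calling it with [500, 500, 500, 500, 0] models adding an empty fifth shard to four 500 GB shards and yields 100 + 200 + 300 + 400 = 1,000 GB of moves.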


What if we need a new shard to scale out horizontally? Suppose every shard now holds 500 GB. To bring all five shards to 400 GB each, we must move 400 GB from Shard 4 to Shard 5, 300 GB from Shard 3 to Shard 4, 200 GB from Shard 2 to Shard 3, and 100 GB from Shard 1 to Shard 2, which moves a full 1,000 GB (1 TB) of data!
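
The same reasoning generalizes. Assume n shards that each hold D and own one contiguous range, and add one empty shard at the end of the key order. Boundary i (between shard i and shard i + 1, for i = 1, ..., n) must carry everything the first i shards hold beyond their new fair share:

```latex
\begin{aligned}
\text{target per shard} &= \frac{nD}{n+1} \\
\text{data crossing boundary } i &= iD - i\,\frac{nD}{n+1} = \frac{iD}{n+1}, \qquad i = 1, \dots, n \\
M_{\text{one range}} &= \sum_{i=1}^{n} \frac{iD}{n+1}
  = \frac{D}{n+1} \cdot \frac{n(n+1)}{2}
  = \frac{nD}{2}
\end{aligned}
```

With n = 4 and D = 500 GB this gives 1,000 GB. Because nD is the total amount of stored data, adding a single shard under this scheme moves half of everything in the cluster, regardless of how many shards there are.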


As the number of shards and the amount of data grow, this nightmare only gets worse, which is why MongoDB does not use this approach.

Multiple ranges per shard

With multiple ranges per shard, we can split Shard 1's data into two ranges: ["a", "d"), holding 400 GB, and ["d", "f"), holding 100 GB. We can do the same with Shard 2, splitting it into ["f", "j") and ["j", "n") (again 400 GB and 100 GB). Now we only need to move Shard 1's ["d", "f") data to Shard 4 and Shard 2's ["j", "n") data to Shard 3, which leaves every shard at 400 GB after moving just 200 GB.


If we add a new shard, we can take 100 GB from each existing shard (assuming, as before, that each of the four holds 500 GB) and move it to the new shard, so only 400 GB of data needs to move instead of 1 TB.
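
The saving can be stated generally. Assume n shards that each hold D, with each shard's data split into ranges fine-grained enough that it can hand over a piece of exactly D/(n+1) (in the example, 100 GB = 500 GB / 5). Every existing shard sends its surplus straight to the new shard, so nothing has to cascade through neighbors:

```latex
\begin{aligned}
\text{surplus per existing shard} &= D - \frac{nD}{n+1} = \frac{D}{n+1} \\
M_{\text{multi-range}} &= n \cdot \frac{D}{n+1} = \frac{nD}{n+1} < D
\end{aligned}
```

With n = 4 and D = 500 GB this is 400 GB, compared with the 1 TB needed when each shard owns a single range, and it never exceeds the size of one shard, however many shards the cluster has.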


This is the approach MongoDB takes: each shard holds many shard key ranges, called chunks, and when the data in a shard grows too large, MongoDB automatically splits the shard key ranges and moves part of that shard's data to other shards.
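
The splitting step can be sketched as a recursive split at the median key. This is a toy model under loud assumptions: split_chunk is hypothetical, size is measured as a document count rather than bytes, and recent MongoDB releases have changed how and when chunks are split, so treat it as an illustration of the idea rather than of any particular version. A range can only be cut between distinct shard key values, so when too many documents share one value the sketch leaves the range whole; MongoDB calls an oversized range that cannot be split "jumbo".

```python
import bisect


def split_chunk(low, high, keys, max_docs):
    """Toy range splitter (hypothetical; real MongoDB splits by data size).

    The range covers [low, high) and holds the sorted shard key values `keys`.
    While a range holds more than max_docs keys, split it at its median key.
    Returns a list of (low, high, keys) ranges in key order.
    """
    if len(keys) <= max_docs:
        return [(low, high, keys)]
    split_point = keys[len(keys) // 2]
    # All keys equal to the split point must land in the upper range.
    cut = bisect.bisect_left(keys, split_point)
    if cut == 0:
        # Every key below the median equals the split point; this toy gives up
        # rather than search for another split point ("jumbo" in MongoDB terms).
        return [(low, high, keys)]
    return (split_chunk(low, split_point, keys[:cut], max_docs)
            + split_chunk(split_point, high, keys[cut:], max_docs))
```

Splitting ["a", "n") holding six user names with max_docs=2 produces four contiguous ranges, each holding at most two names; the new boundaries become candidates for the balancer to move between shards.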

