MongoDB Shard Management

Source: Internet
Author: User
Tags: mongo shell


In MongoDB (version 3.2.9), a sharded cluster is a way to scale a database system horizontally: the data set is distributed across different shards, each of which holds only part of the data. MongoDB guarantees that there is no duplicate data between shards, and that the union of the data stored by all shards is the complete data set. A sharded cluster spreads the data set, and therefore the load, across multiple shards; each shard handles reads and writes for its part of the data, making full use of each shard's system resources and improving the throughput of the database system.

The data set is split into chunks, each containing multiple docs, and the chunks are distributed across the sharded cluster. MongoDB tracks which shard stores which chunks; this distribution information is the shard metadata, stored in the config database on the config servers. Typically three config servers are used, and the config database must be identical on all of them. mongos can query the config database directly to view the shard metadata, and the mongo shell provides sh helper functions to view the sharded cluster's metadata safely.

Querying any single shard returns only the subset of the data set that resides on that shard, not the entire data set. Applications only need to connect to mongos and read and write through it; mongos automatically routes read and write requests to the appropriate shards. mongos makes the underlying sharding transparent to the application: from the application's point of view, it is accessing the entire data set.

One, Primary shard

In a sharded cluster, not every collection is distributed. A collection is distributed across different shards only after it is explicitly sharded with sh.shardCollection(). For non-sharded collections (un-sharded collections), the data is stored only on the primary shard. By default, the primary shard is the shard on which the database was originally created, and it stores the data of the database's non-sharded collections.

Each database in a sharded cluster has a primary shard that holds all the un-sharded collections for that database. Each database has its own primary shard.

For example, suppose a sharded cluster has three shards, shard1, shard2, and shard3, and a database blog is created on shard1. If the database blog is sharded, MongoDB automatically creates a database blog with the same structure on shard2 and shard3; the primary shard of the database blog is shard1.

As shown in the figure, the primary shard of Collection2 is Shard A.

Use the movePrimary command to change a database's primary shard; the non-sharded collections are moved from the current primary shard to the new one.

db.runCommand({movePrimary: "test", to: "shard0001"})

After you change a database's primary shard with the movePrimary command, the configuration on the config servers is up to date, but the configuration cached by mongos becomes stale. MongoDB provides the flushRouterConfig command to force mongos to fetch the latest configuration from the config servers and refresh its cache.

db.adminCommand({"flushRouterConfig": 1})

Two, Shard metadata

Do not connect directly to the config servers to view the sharded cluster's metadata. It is important, and safer, to view the config database by connecting through mongos, or to use the sh helper functions.

Use the sh helper function:

sh.status()

Or connect to mongos and view the collections in the config database:

mongos> use config

1. The shards collection stores shard information.

db.shards.find()

Shard data is stored in the replica set or the standalone mongod specified by the host field.

{
    "_id": "shard_name",
    "host": "replica_set_name/host:port",
    "tags": ["shard_tag1", "shard_tag2"]
}

2. The databases collection stores information for every database in the sharded cluster, whether or not the database is sharded.

db.databases.find()

If sh.enableSharding("db_name") has been run on a database, its partitioned field is true; the primary field specifies the database's primary shard.

{
    "_id": "test",
    "primary": "rs0",
    "partitioned": true
}

3. The collections collection stores information for every sharded collection; non-sharded collections (un-sharded collections) are not included.

key: the shard key of the collection

db.collections.find()
{
    "_id": "test.foo",
    "lastmodEpoch": ObjectId("57dcd4899bd7f7111ec15f16"),
    "lastmod": ISODate("1970-02-19T17:02:47.296Z"),
    "dropped": false,
    "key": {
        "_id": 1
    },
    "unique": true
}

4. The chunks collection stores chunk information.

ns: the sharded collection, in the form db_name.collection_name

min and max: the minimum and maximum values of the shard key in the chunk

shard: the shard where the chunk resides

db.chunks.find()
{
    "_id": "test.foo-_id_minkey",
    "lastmod": Timestamp(1, 1),
    "lastmodEpoch": ObjectId("57dcd4899bd7f7111ec15f16"),
    "ns": "test.foo",
    "min": {
        "_id": 1
    },
    "max": {
        "_id": 3087
    },
    "shard": "rs0"
}
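To make the role of this metadata concrete, here is a small Python sketch (my own illustration, not MongoDB source code) of how a router like mongos can use chunk min/max ranges to decide which shard serves a given shard key value. The chunk documents mirror the config.chunks structure above; MinKey/MaxKey are stood in by infinities.

```python
# Hypothetical routing sketch: chunk documents mirror config.chunks.
MINKEY = float("-inf")  # stand-in for MongoDB's MinKey
MAXKEY = float("inf")   # stand-in for MongoDB's MaxKey

chunks = [
    {"ns": "test.foo", "min": {"_id": MINKEY}, "max": {"_id": 1},      "shard": "rs0"},
    {"ns": "test.foo", "min": {"_id": 1},      "max": {"_id": 3087},   "shard": "rs0"},
    {"ns": "test.foo", "min": {"_id": 3087},   "max": {"_id": MAXKEY}, "shard": "rs1"},
]

def route(ns, shard_key_value):
    """Return the shard owning the chunk whose [min, max) range covers the key."""
    for chunk in chunks:
        if chunk["ns"] == ns and chunk["min"]["_id"] <= shard_key_value < chunk["max"]["_id"]:
            return chunk["shard"]
    raise LookupError("no chunk covers this key")

print(route("test.foo", 42))    # falls in [1, 3087) -> rs0
print(route("test.foo", 5000))  # falls in [3087, MaxKey) -> rs1
```

This is only the range-lookup idea; the real mongos also caches this metadata locally, which is why flushRouterConfig exists.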

5. The changelog collection records sharded cluster operations, including chunk splits and migrations and shard add/remove operations.

The what field indicates the operation type; for example, multi-split indicates a chunk split:

"what": "addShard", "what": "shardCollection.start", "what": "shardCollection.end", "what": "multi-split"

6. The tags collection records shard tags and the corresponding shard key ranges.

{
    "_id": { "ns": "records.users", "min": { "zipcode": "10001" } },
    "ns": "records.users",
    "min": { "zipcode": "10001" },
    "max": { "zipcode": "10281" },
    "tag": "NYC"
}

7. The settings collection records the balancer state and the chunk size; the default chunk size is 64MB.

{ "_id": "chunksize", "value": 64 }
{ "_id": "balancer", "stopped": false }

8. The locks collection records distributed locks, guaranteeing that only one mongos instance at a time can perform administrative tasks on the sharded cluster.

When a mongos acts as the balancer, it acquires the distributed lock by inserting a document resembling the following into the config.locks collection:

{
    "_id": "balancer",
    "process": "example.net:40000:1350402818:16807",
    "state": 2,
    "ts": ObjectId("507daeedf40e1879df62e5f3"),
    "when": ISODate("2012-10-16T19:01:01.593Z"),
    "who": "example.net:40000:1350402818:16807:Balancer:282475249",
    "why": "doing balance round"
}

Three, Removing shards

When you remove a shard, you must make sure its data is moved to other shards: use the balancer to migrate the chunks of sharded collections, and change the primary shard of the database for non-sharded collections.

1. Migrate the sharded collection data

Step1, make sure the balancer is enabled.

sh.setBalancerState(true);

Step2, migrate the sharded collection's chunks to other shards

use admin
db.adminCommand({"removeShard": "shard_name"})

The removeShard command migrates chunks from the given shard to the other shards; if the shard holds many chunks, the migration can take a long time.

Step3, check the chunk migration status

use admin
db.runCommand({removeShard: "shard_name"})

Run removeShard again to view the migration status; the remaining field shows the number of chunks left to migrate.

{
    "msg": "draining ongoing",
    "state": "ongoing",
    "remaining": {
        "chunks": ...,
        "dbs": 1
    },
    "ok": 1
}

Step4, the chunk migration completes

use admin
db.runCommand({removeShard: "shard_name"})
{
    "msg": "removeshard completed successfully",
    "state": "completed",
    "shard": "shard_name",
    "ok": 1
}
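Since draining can run for a long time, a script typically reissues removeShard until the state becomes completed. The Python sketch below is purely illustrative: run_command is a stand-in for db.runCommand against a real cluster, stubbed here with canned replies shaped like the responses above, so only the polling loop is demonstrated.

```python
import time

# Stubbed replies shaped like the removeShard responses shown above.
responses = iter([
    {"msg": "draining started successfully", "state": "started", "ok": 1},
    {"msg": "draining ongoing", "state": "ongoing",
     "remaining": {"chunks": 12, "dbs": 1}, "ok": 1},
    {"msg": "draining ongoing", "state": "ongoing",
     "remaining": {"chunks": 3, "dbs": 1}, "ok": 1},
    {"msg": "removeshard completed successfully", "state": "completed",
     "shard": "shard_name", "ok": 1},
])

def run_command(cmd):
    # Stand-in for db.runCommand; a real script would talk to mongos here.
    return next(responses)

def drain_shard(shard_name, poll_seconds=0):
    """Issue removeShard repeatedly until its state becomes 'completed'."""
    while True:
        reply = run_command({"removeShard": shard_name})
        if reply["state"] == "completed":
            return reply
        time.sleep(poll_seconds)  # a real script would wait between polls

result = drain_shard("shard_name")
print(result["msg"])  # removeshard completed successfully
```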

2. Handle the non-sharded data

Step1, find the non-sharded data

Non-sharded data falls into two cases:

    • The database itself is not sharded: sh.enableSharding("db_name") has not been run on it, and its partitioned field in the config database is false.
    • The database contains non-sharded collections, that is, collections whose primary shard is the current shard.

use config
db.databases.find({$or: [{"partitioned": false}, {"primary": "shard_name"}]})

For databases with partitioned=false, all of their data is stored on the current shard. For databases with partitioned=true and primary="shard_name", their non-sharded collections (un-sharded collections) are stored on the current shard, and the primary shard of those databases must be changed.

Step2, change the database's primary shard

db.runCommand({movePrimary: "db_name", to: "new_shard"})

Four, Adding shards

Because a shard stores part of the data set, for high availability it is recommended to use a replica set as a shard, even if the replica set has only one member. Connect to mongos and use the sh helper function to add a shard.

sh.addShard("replica_set_name/host:port")

Using a standalone mongod as a shard is not recommended:

sh.addShard("host:port")

Five, Jumbo chunks

In some cases a chunk keeps growing past the chunk size limit and becomes a jumbo chunk: all the docs in the chunk have the same shard key value, so MongoDB cannot split it. If the chunk continues to grow, the chunk distribution becomes uneven and turns into a performance bottleneck.

Chunk migration has limits: MongoDB cannot move a chunk if the number of documents in it exceeds 250,000, or exceeds 1.3 times the result of dividing the configured chunk size by the average document size. The default chunk size is 64MB. A chunk over the limit is marked by MongoDB as a jumbo chunk, and MongoDB cannot move it to another shard.
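The migration limit above is easy to compute. The following Python sketch (the helper names are my own, not a MongoDB API) checks whether a chunk's document count stays under both the 250,000 absolute cap and the 1.3 × (chunk size / average document size) cap:

```python
# Sketch of the chunk migration limit: a chunk is movable only if its document
# count is at most 250,000 AND at most 1.3 * (chunk size / avg document size).
MAX_DOCS_ABSOLUTE = 250_000

def max_movable_docs(chunk_size_bytes, avg_doc_size_bytes):
    """Document-count ceiling for a movable chunk."""
    return min(MAX_DOCS_ABSOLUTE, 1.3 * chunk_size_bytes / avg_doc_size_bytes)

def is_movable(doc_count, chunk_size_mb=64, avg_doc_size_bytes=1024):
    return doc_count <= max_movable_docs(chunk_size_mb * 1024 * 1024, avg_doc_size_bytes)

# With the default 64MB chunk size and 1KB average docs, the ceiling is
# 1.3 * 65536 = 85196.8 docs, well under the 250,000 absolute cap.
print(is_movable(50_000))   # True
print(is_movable(100_000))  # False: over 1.3 * chunkSize / avgDocSize
```

This also shows why raising the chunksize setting (as done below) makes a jumbo chunk movable again: the second cap scales with the configured chunk size.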

1. Viewing jumbo chunks

Use sh.status() to find jumbo chunks; a jumbo chunk is marked with the jumbo flag:

{ "x": 2 } -->> { "x": 3 } on: shard-a Timestamp(2, 2) jumbo

2. Distributing jumbo chunks

Jumbo chunks cannot be split and cannot be distributed automatically by the balancer; they must be distributed manually.

Step1, stop the balancer

sh.setBalancerState(false)

Step2, increase the chunk size setting

Since MongoDB does not move chunks that exceed the limit, temporarily increase the chunk size setting so the jumbo chunks fall back under the limit, then distribute them evenly across the sharded cluster.

use config
db.settings.save({"_id": "chunksize", "value": 1024})

Step3, move the jumbo chunk

sh.moveChunk("db_name.collection_name", {sharded_field: "value_in_chunk"}, "new_shard_name")

Step4, re-enable the balancer

sh.setBalancerState(true)

Step5, refresh the mongos configuration cache

Force mongos to synchronize configuration from the config servers and refresh its cache.

use admin
db.adminCommand({flushRouterConfig: 1})

Six, The balancer

The balancer is run by mongos; that is, mongos is responsible not only for routing queries to the appropriate shard but also for balancing chunks. In general, MongoDB handles balancing automatically. Check the balancer state via config.settings, or with the sh helper function:

sh.getBalancerState()

A return value of true indicates that the balancer is enabled and the system handles balancing automatically. Disable the balancer with the sh helper function:

sh.setBalancerState(false)

Disabling the balancer cannot immediately terminate a running chunk migration. When a mongos becomes the balancer, it acquires the balancer lock; check it in the config.locks collection:

use config
db.locks.find({"_id": "balancer"})

sh.isBalancerRunning()

If state is 2, the balancer is active; if state is 0, the balancer has been stopped.

Balancing migrates chunks from one shard to another, or first splits a large chunk into smaller chunks and then migrates them to other shards. Splitting and migrating chunks increases the I/O load on the system, so it is best to limit the balancer's activity to periods when the system is idle: set the balancer's active time window so that it only splits and migrates chunks within the specified time interval.

use config
db.settings.update(
    {"_id": "balancer"},
    {"$set": {"activeWindow": {"start": "23:00", "stop": "04:00"}}},
    true
)
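One subtlety of an activeWindow like 23:00-04:00 is that it wraps past midnight. This Python sketch (my own helper, not part of MongoDB) shows the two-case check a scheduler needs to decide whether a clock time falls inside such a window:

```python
from datetime import time

def in_active_window(now, start=time(23, 0), stop=time(4, 0)):
    """True if `now` falls inside the [start, stop) window, handling wrap."""
    if start <= stop:
        # simple window within one day, e.g. 01:00-06:00
        return start <= now < stop
    # window wraps past midnight, e.g. 23:00-04:00
    return now >= start or now < stop

print(in_active_window(time(23, 30)))  # True: inside 23:00-04:00
print(in_active_window(time(3, 0)))    # True: still before 04:00
print(in_active_window(time(12, 0)))   # False: midday is outside the window
```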

The objects the balancer splits and moves are chunks, and it only guarantees that the number of chunks is balanced across shards; the number of docs per chunk is not necessarily balanced. Some chunks may contain many docs while others contain few, or even none. Therefore, choose the shard key carefully: a field that satisfies the vast majority of queries and also distributes docs evenly is the best choice for the shard key.

