Deep Understanding of MongoDB Sharding Management

Source: Internet
Author: User
Tags: mongodb, time interval, access database, mongo shell

Objective

In MongoDB (version 3.2.9), a sharded cluster is a way to scale the performance of the database system horizontally. The dataset is distributed across different shards, each shard storing only part of the dataset. MongoDB guarantees that no data is duplicated between shards, and that the union of the data on all shards is the complete dataset. A sharded cluster spreads the data over multiple shards, each responsible for reading and writing only its own part, making full use of every shard's system resources and improving the throughput of the database system.

The dataset is split into chunks, each containing multiple docs, and the chunks are distributed across the sharded cluster. MongoDB tracks the distribution of chunks over the shards, that is, which chunks each shard stores; this is called the sharding metadata, and it is stored in the config database on the config servers. A cluster typically uses 3 config servers, and the config database must be exactly the same on all of them. A mongos can access the config database directly to view the sharding metadata, and the mongo shell provides sh helper functions to view the metadata of the sharded cluster safely.

Querying any single shard returns only the subset of the collection stored on that shard, not the entire dataset. The application only needs to connect to a mongos and read and write through it; the mongos automatically routes read and write requests to the corresponding shards. Through mongos, MongoDB makes the underlying sharding transparent to the application: from the application's point of view, it is accessing the entire dataset.
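
For example, from the mongo shell the application talks only to a mongos; a minimal sketch (the mongos hostname and the test.foo namespace are placeholders):

// connect the shell to a mongos, not to a shard:
//   mongo --host mongos1.example.net --port 27017
// an ordinary query; mongos routes it to the shard holding the matching doc
db.getSiblingDB("test").foo.find({ "_id": 100 })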

One, the primary shard

In a sharded cluster, not every collection is distributed. A collection is spread across different shards only after it has been explicitly sharded with sh.shardCollection(). Non-sharded collections (un-sharded collections) are stored only on the primary shard; by default, the primary shard is the shard on which the database was originally created. Each database has one primary shard.

Each database in a sharded cluster has a primary shard that holds all the un-sharded collections for that database. Each database has its own primary shard.

For example, suppose a sharded cluster has three shards: shard1, shard2, shard3, and the database blog is created on shard1. If the database blog is sharded, MongoDB automatically creates the database blog on shard2 and shard3 as well; the primary shard of the database blog remains shard1.

(Figure: the primary shard of Collection2 is ShardA.)
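
A hedged sketch of the commands behind this example (the collection name posts is an assumption; any collection of the blog database works the same way):

// enable sharding on the database; its primary shard stays shard1
sh.enableSharding("blog")
// explicitly shard one collection on a shard key; all other collections
// in blog remain un-sharded and live only on the primary shard
sh.shardCollection("blog.posts", { "_id": 1 })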

Use the movePrimary command to change a database's primary shard; the non-sharded collections are moved from the current shard to the new primary shard.

db.runCommand({ movePrimary: "test", to: "shard0001" })

After you use the movePrimary command to change the primary shard of a database, the configuration information on the config servers is up to date, but the configuration cached by each mongos becomes stale. MongoDB provides the flushRouterConfig command to force a mongos to fetch the latest configuration from the config servers and refresh its cache.

db.adminCommand({ "flushRouterConfig": 1 })

Two, the sharding metadata

Do not connect directly to the config servers to view the sharding metadata of the cluster; this is very important. Instead, view the config database safely through a mongos connection, or use the sh helper functions.

View it with the sh helper function:

sh.status()

Or connect to a mongos and view the collections in the config database:

mongos> use config

1, the shards collection stores the shard information

db.shards.find()

The shard's data is stored in the replica set or standalone mongod specified by the host field.

{"
 _id": "Shard_name", "
 Host": "Replica_set_name/host:port",
 "tag": [Shard_tag1,shard_tag2] 
}

2, the databases collection stores information about every database in the sharded cluster, whether or not the database is sharded

db.databases.find()

If sh.enableSharding("db_name") has been executed on a database, its partitioned field value is true; the primary field specifies the database's primary shard.

{
 "_id": "test",
 "primary": "rs0",
 "partitioned": true
}

3, the collections collection stores information for every sharded collection, excluding non-sharded collections (un-sharded collections)

The key field is the shard key:

db.collections.find()

{
 "_id": "test.foo",
 "lastmodEpoch": ObjectId("57dcd4899bd7f7111ec15f16"),
 "lastmod": ISODate("1970-02-19T17:02:47.296Z"),
 "dropped": false,
 "key": {
  "_id": 1
 },
 "unique": true
}

4, the chunks collection stores the chunk information

ns: the sharded collection the chunk belongs to, in the form db_name.collection_name

min and max: the minimum and maximum values of the shard key in the chunk

shard: the shard where the chunk resides

db.chunks.find()

{
 "_id": "test.foo-_id_MinKey",
 "lastmod": Timestamp(1, 1),
 "lastmodEpoch": ObjectId("57dcd4899bd7f7111ec15f16"),
 "ns": "test.foo",
 "min": {
  "_id": 1
 },
 "max": {
  "_id": 3087
 },
 "shard": "rs0"
}
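
Since every chunk doc records its shard, a small aggregation over config.chunks shows how a collection's chunks are spread; a sketch, run from the config database and using the test.foo namespace from the sample above:

// count chunks per shard for one sharded collection
db.chunks.aggregate([
 { $match: { "ns": "test.foo" } },
 { $group: { "_id": "$shard", "nChunks": { $sum: 1 } } }
])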

5, the changelog collection records the operations of the sharded cluster, including chunk splits, chunk migrations, and shard additions or removals

The what field indicates the type of operation; for example, multi-split indicates a chunk split:

"What": "Addshard", "
What": "Shardcollection.start",
"what": "Shardcollection.end", 
"what": " Multi-split ",

6, the tags collection records shard tags and the corresponding shard key ranges

{"
 _id": {"ns": "Records.users", "min": {"ZipCode": "10001"}},
 "ns": "Records.users",
 "min": {"ZIPC Ode ":" 10001 "},
 " Max ": {" ZipCode ":" 10281 "},"
 tag ":" NYC "
}
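
A doc like the one above is normally created with the sh tag helpers rather than written by hand; a minimal sketch (the shard name shard0000 is an assumption):

// tag a shard, then pin a shard key range to that tag
sh.addShardTag("shard0000", "NYC")
sh.addTagRange("records.users", { "zipcode": "10001" }, { "zipcode": "10281" }, "NYC")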

7, the settings collection records the balancer state and the chunk size; the default chunk size is 64MB.

{"_id": "Chunksize", "value":
{"_id": "Balancer", "Stopped": false}

8, the locks collection records the distributed lock, which ensures that only one mongos instance can perform administrative tasks in the sharded cluster at a time.

When a mongos acts as the balancer, it acquires the distributed lock by inserting a doc into config.locks.

The locks collection stores a distributed lock. This ensures that only one mongos instance can perform administrative tasks on the cluster at once. The mongos acting as balancer takes the lock by inserting a document resembling the following into the locks collection.

{
 "_id": "balancer",
 "process": "example.net:40000:1350402818:16807",
 "state": 2,
 "ts": ObjectId("507daeedf40e1879df62e5f3"),
 "when": ISODate("2012-10-16T19:01:01.593Z"),
 "who": "example.net:40000:1350402818:16807:Balancer:282475249",
 "why": "doing balance round"
}

Three, removing a shard

When you remove a shard, you must make sure its data is moved to the other shards first: for sharded collections, the balancer migrates the chunks away; for non-sharded collections, the primary shard of their database must be changed.

1, migrate the sharded collection data

Step 1, make sure the balancer is enabled:

sh.setBalancerState(true);

Step 2, migrate all sharded collection data to the other shards:

use admin
db.adminCommand({ "removeShard": "shard_name" })

The removeShard command migrates the chunks on the current shard to the other shards; if there are many chunks on the shard, the migration process can take a long time.

Step 3, check the status of the chunk migration:

use admin
db.runCommand({ removeShard: "shard_name" })

Run the removeShard command again to view the state of the migration; the remaining field shows the number of chunks still to be moved.

{"
  msg": "Draining ongoing",
 "state": "ongoing",
 "remaining": {
  "chunks": "",
  "DBS": 1
 },
    "OK": 1
}
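
Because draining can take a long time, a small polling loop is convenient; a sketch, assuming the shard name used above (removeShard is safe to call repeatedly):

// poll removeShard until the shard has finished draining
var res = db.adminCommand({ removeShard: "shard_name" });
while (res.state != "completed") {
 printjson(res.remaining);   // chunks and dbs still to be moved
 sleep(60 * 1000);           // wait one minute between checks
 res = db.adminCommand({ removeShard: "shard_name" });
}
print("removeshard completed");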

Step 4, the chunk migration completes:

use admin
db.runCommand({ removeShard: "shard_name" })

{
 "msg": "removeshard completed successfully",
 "state": "completed",
 "shard": "shard_name",
 "ok": 1
}

2, move the non-sharded databases

Step 1, find the databases that are not sharded.

A database counts as not sharded in two cases:

1, the database itself has not been sharded, that is, sh.enableSharding("db_name") has never been run on it, so its partitioned field in the config database is false;

2, the database contains non-sharded collections, that is, the current shard is the primary shard of those collections' database.

use config
db.databases.find({ $or: [{ "partitioned": false }, { "primary": "shard_name" }] })

For a database with partitioned=false, all of its data is stored on the current shard. For a database with partitioned=true and primary="shard_name", its non-sharded collections (un-sharded collections) are stored on this shard, and you must change the primary shard of their database.

Step 2, change the primary shard of the database:

db.runCommand({ movePrimary: "db_name", to: "new_shard" })

Four, adding a shard

Because each shard stores only part of the dataset, it is recommended to use a replica set as a shard to ensure high availability of the data, even if the replica set contains only one member. Connect to a mongos and use the sh helper function to add the shard:

Sh.addshard ("Replica_set_name/host:port")

Using a standalone mongod as a shard is not recommended:

Sh.addshard ("Host:port")

Five, jumbo chunks

In some cases a chunk keeps growing and exceeds the chunk size limit, becoming a jumbo chunk. The cause is that all docs in the chunk have the same shard key value, so MongoDB cannot split the chunk. If such chunks keep growing, the chunk distribution becomes uneven and a performance bottleneck results.

There are limits on chunk migration: a chunk cannot be moved if it contains more than 250,000 docs, or more than 1.3 times the number of docs given by dividing the configured chunk size by the average doc size. The default chunk size is 64MB. A chunk that exceeds the limit is marked jumbo by MongoDB, and MongoDB cannot migrate a jumbo chunk to another shard.

MongoDB cannot move a chunk if the number of documents in the chunk exceeds either 250000 documents or 1.3 times the result of dividing the configured chunk size by the average document size.
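
As a worked example of this limit (the numbers are assumptions): with the default 64MB chunk size and an average doc size of 512 bytes, the cutoff is 1.3 x 64MB / 512B ≈ 170,393 docs, well below the 250,000 ceiling. A sketch that computes the limit from collection stats (namespace test.foo assumed):

// estimate how many docs a chunk may hold before it becomes unmovable
var stats = db.getSiblingDB("test").foo.stats();   // avgObjSize is in bytes
var chunkBytes = 64 * 1024 * 1024;                 // configured chunk size, default 64MB
var limit = Math.min(250000, 1.3 * chunkBytes / stats.avgObjSize);
print("migration limit: about " + Math.floor(limit) + " docs per chunk");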

1, view jumbo chunks

Use sh.status() to find jumbo chunks; they are flagged with jumbo:

 {"X": 2}-->> {"X": 3} on:shard-a Timestamp (2, 2) jumbo

2, distribute jumbo chunks

Jumbo chunks cannot be split and cannot be distributed automatically by the balancer; they must be distributed manually.

Step 1, turn off the balancer:

sh.setBalancerState(false)

Step 2, increase the chunk size configuration value.

Since MongoDB does not allow moving chunks larger than the limit, you must temporarily increase the chunk size configuration value in order to distribute the jumbo chunks evenly across the sharded cluster.

use config
db.settings.save({ "_id": "chunksize", "value": 1024 })

Step 3, move the jumbo chunks:

sh.moveChunk("db_name.collection_name", { sharded_field: "value_in_chunk" }, "new_shard_name")

Step 4, enable the balancer:

sh.setBalancerState(true)

Step 5, refresh the mongos configuration cache.

Force the mongos to synchronize the configuration information from the config servers and refresh its cache:

use admin
db.adminCommand({ flushRouterConfig: 1 })

Six, the balancer

The balancer is run by mongos; that is, mongos is not only responsible for routing queries to the corresponding shards, but also for balancing the chunks. In general, MongoDB handles data balancing automatically. You can view the balancer's state through config.settings, or through the sh helper function:

sh.getBalancerState()

A return value of true indicates that the balancer is running and the system handles data balancing automatically. The sh helper function can also turn the balancer off:

sh.setBalancerState(false)

Turning the balancer off does not immediately terminate a chunk migration that is already running. When a mongos becomes the balancer, it requests the balancer lock; to check, query the config.locks collection:

use config
db.locks.find({ "_id": "balancer" })
// or
sh.isBalancerRunning()

If state=2, the balancer lock is held and the balancer is active; if state=0, the balancer is off.

The balancing process actually migrates chunks from one shard to another, or splits a large chunk into smaller chunks and then migrates the small chunks to other shards. Chunk migration and splitting increase the system's IO load, so it is best to limit the balancer's activity to times when the system is idle. You can set the balancer's active time window to restrict balancing and chunk migration to a specified time interval:

use config
db.settings.update(
 { "_id": "balancer" },
 { "$set": { "activeWindow": { "start": "23:00", "stop": "04:00" } } },
 true
)

The objects the balancer splits and moves are chunks. The balancer only guarantees that the number of chunks on each shard is balanced; it does not guarantee that the number of docs in each chunk is balanced. Some chunks may contain a large number of docs while others contain few docs, or even none. Therefore, the shard key should be chosen carefully: a field that satisfies the vast majority of queries and also distributes docs evenly is the best choice of shard key.
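
For instance, a hedged sketch using the records.users collection from the tags example: sharding on zipcode alone piles every doc with the same zipcode into one chunk, inviting jumbo chunks, whereas a compound key still targets queries by zipcode but spreads docs within each zipcode:

// coarse key: every doc with the same zipcode lands in the same chunk
// sh.shardCollection("records.users", { "zipcode": 1 })
// compound key: zipcode keeps queries targeted, _id spreads the docs
sh.shardCollection("records.users", { "zipcode": 1, "_id": 1 })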

Summary

That is the entire content of this article. I hope it brings some help to your study or work; if you have questions, feel free to leave a message.
