MongoDB Learning Notes 9: Sharding

Source: Internet
Author: User

Sharding is the process of splitting data up and distributing it across different machines; the term partitioning is sometimes used for the same concept. By spreading data across machines, you can store more data and handle a heavier load without requiring a powerful mainframe.
"Autosharding in MongoDB"
MongoDB runs a routing process, named mongos, in front of the shards. The router knows where all the data is stored, so applications can connect to it and issue requests as usual. mongos hides the details of sharding from the application.
When to Shard?
· The current machine is running out of disk space.
· A single mongod can no longer keep up with the write load.
· You want to keep a larger proportion of the data in memory to improve performance.
"Shard Key"
When you set up sharding, you choose a key from the collection and use that key's values as the basis for splitting the data. This key is called the shard key.
As shards are added (or removed), MongoDB rebalances the data so that the traffic to each shard is roughly even and the amount of data on each shard stays within a reasonable range.
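The rebalancing idea can be sketched in plain JavaScript. This is only an illustration under simplifying assumptions, not MongoDB's actual balancer: the `rebalance` function and the shard names are hypothetical, and balance is measured purely by the number of pieces of data each shard holds.

```javascript
// Hypothetical balancer sketch: chunkCounts maps shard name -> number of
// data pieces on that shard. Repeatedly move one piece from the fullest
// shard to the emptiest shard until the counts differ by at most one.
function rebalance(chunkCounts) {
  const counts = { ...chunkCounts };
  const moves = [];
  const names = Object.keys(counts);
  if (names.length === 0) return { counts, moves };
  while (true) {
    let from = names[0];
    let to = names[0];
    for (const n of names) {
      if (counts[n] > counts[from]) from = n;
      if (counts[n] < counts[to]) to = n;
    }
    if (counts[from] - counts[to] <= 1) break;
    counts[from] -= 1;
    counts[to] += 1;
    moves.push({ from, to });
  }
  return { counts, moves };
}

// A new, empty shard joins a cluster where one shard holds six pieces.
const result = rebalance({ shard0: 6, shard1: 0 });
console.log(result.counts); // { shard0: 3, shard1: 3 }
console.log(result.moves.length); // 3
```

The sketch copies its input instead of mutating it, so the list of moves can be inspected before anything is "applied".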
"Shard an existing collection"
Suppose you have a collection that stores logs and you now want to shard it. You enable sharding and tell MongoDB to use "timestamp" as the shard key. At first all the data lives on a single shard: you can insert whatever you like, but it always ends up on that one shard.
Then you add a shard. Once the new shard is up and running, MongoDB splits the collection into two pieces, called chunks. Each chunk contains all the documents for a certain range of shard key values. Assume, for example, that one chunk holds the documents with timestamps before June 27, 2003, and the other holds the documents with timestamps on or after June 27, 2003. One of the chunks is then moved to the new shard.
If a new document's timestamp is before June 27, 2003, it is added to the first chunk; otherwise it is added to the second chunk.
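The range-based placement described above can be sketched in plain JavaScript. This is a minimal illustration, not MongoDB's implementation: the `chunks` array, the `findChunk` helper, and the shard names are hypothetical, and the split point simply follows the June 27, 2003 example.

```javascript
// Hypothetical in-memory model of two ranges split on "timestamp".
// Each range covers [min, max): min is inclusive, max is exclusive.
const splitPoint = new Date("2003-06-27T00:00:00Z").getTime();

const chunks = [
  { min: -Infinity, max: splitPoint, shard: "shard0" },
  { min: splitPoint, max: Infinity, shard: "shard1" },
];

// Return the range that contains the given shard key value.
function findChunk(chunks, keyValue) {
  return chunks.find((c) => keyValue >= c.min && keyValue < c.max);
}

const older = findChunk(chunks, new Date("2003-01-15T00:00:00Z").getTime());
const newer = findChunk(chunks, new Date("2003-08-01T00:00:00Z").getTime());
console.log(older.shard); // the document lands in the first range
console.log(newer.shard); // the document lands in the second range
```

Using half-open ranges means a timestamp exactly on the split point belongs to exactly one range, so no document can match two ranges or none.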
"Creating Shards"
Setting up sharding takes two steps: starting the actual servers and then deciding how to shard the data.
A sharded setup typically has three components:
· Shard
A shard is a container that holds a subset of a collection's data. It can be a single mongod server (for development and testing) or a replica set (for production). So even when a shard consists of several servers, only one of them is the master, and the others hold copies of the same data.
· mongos
mongos is the router process that ships with every MongoDB distribution. It routes all requests and then aggregates the results. It does not store any data or configuration information itself (but it does cache information from the config servers).
· Config server
The config server stores the configuration of the cluster: which data lives on which shard. mongos does not store anything permanently, so the sharding configuration needs a home; mongos syncs this data from the config server.
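The "route the request, then aggregate the results" role of the router can be sketched as follows. This is a simplified illustration in plain JavaScript, not how mongos is implemented: the `shards` object and the `scatterGather` function are hypothetical, and each shard is modeled as a plain array of documents.

```javascript
// Hypothetical stand-ins for two shards: each is just an array of documents.
const shards = {
  shard0: [{ _id: 1, level: "info" }, { _id: 2, level: "error" }],
  shard1: [{ _id: 3, level: "error" }, { _id: 4, level: "debug" }],
};

// Send the same predicate to every shard, then merge and sort the partial
// results, roughly what a router has to do when a query cannot be narrowed
// down to a single shard.
function scatterGather(shards, predicate) {
  const merged = [];
  for (const docs of Object.values(shards)) {
    merged.push(...docs.filter(predicate));
  }
  return merged.sort((a, b) => a._id - b._id);
}

const errors = scatterGather(shards, (doc) => doc.level === "error");
console.log(errors.map((doc) => doc._id)); // [ 2, 3 ]
```

The caller sees one merged result and never learns which shard each document came from, which is the sense in which the router hides sharding from the application.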
"Start Server"
Start the config server and mongos first. The config server must be started first because mongos uses the configuration information on it. The config server is started just like a normal mongod:
$ mkdir -p ~/dbs/config
$ ./mongod --dbpath ~/dbs/config --port 20000
The config server does not need much space or many resources (200MB of actual data takes up roughly 1KB of config space).
Now you can start a mongos process for the application to connect to. This routing server does not need a data directory, but it does need to know where the config server is:
$ ./mongos --port 30000 --configdb localhost:20000
Shard administration is usually done through mongos.
Adding a Shard
A shard is just an ordinary mongod instance (or replica set):
$ mkdir -p ~/dbs/shard1
$ ./mongod --dbpath ~/dbs/shard1 --port 10000
Now connect to the mongos you just started and add the shard to the cluster. Start a shell connected to mongos:
$ ./mongo localhost:30000/admin
MongoDB shell version: 1.6.0
url: localhost:30000/admin
connecting to localhost:30000/admin
type "help" for help
>
Once you have confirmed that you are connected to mongos rather than to a mongod, add the shard with the addshard command:
> db.runCommand({addshard : "localhost:10000", allowLocal : true})
{
    "added" : "localhost:10000",
    "ok" : true
}
When running a shard on localhost, the "allowLocal" key must be set. MongoDB tries to keep you from configuring a cluster on localhost by mistake, so this key tells it that you are only developing and know what you are doing. In a production environment, deploy the shards on separate machines.
Whenever you want to add another shard, run addshard again. MongoDB takes care of integrating the new shard into the cluster.
"Sharding Data"
MongoDB will not automatically distribute every piece of data you store: sharding has to be turned on explicitly at both the database and the collection level. The following example shards the bar collection of the foo database on the "_id" key. First, enable sharding for foo:
> db.runCommand({"enablesharding" : "foo"})
Once a database is sharded, its collections can be stored on different shards; this is also a prerequisite for sharding any of its collections.
After sharding is enabled at the database level, you can shard a collection with the shardcollection command:
> db.runCommand({"shardcollection" : "foo.bar", "key" : {"_id" : 1}})
The collection is now sharded on "_id": as data is added, it is automatically distributed across the shards according to its "_id" value.
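The two-step order (database first, then collection) can be captured in a small helper. This is a sketch in plain JavaScript: `shardingCommands` is a hypothetical function, not part of MongoDB, and it only builds the command documents shown above without sending them anywhere.

```javascript
// Hypothetical helper: build the two admin commands needed to shard a
// collection, in the order they must run (database first, then collection).
function shardingCommands(namespace, key) {
  const dot = namespace.indexOf(".");
  if (dot <= 0 || dot === namespace.length - 1) {
    throw new Error("expected a namespace of the form db.collection");
  }
  const database = namespace.slice(0, dot);
  return [
    { enablesharding: database },
    { shardcollection: namespace, key: key },
  ];
}

const commands = shardingCommands("foo.bar", { _id: 1 });
console.log(JSON.stringify(commands));
// In a shell connected to mongos, each document would be passed to
// db.runCommand(...) on the admin database; here they are only printed.
```

The database name is split off at the first dot only, since everything after it is treated as the collection name.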
"Production Configuration"
A successful production sharding setup requires the following:
· Multiple config servers.
· Multiple mongos servers.
· Each shard is a replica set.
· w is set correctly.
For example, set up three config servers:
$ mkdir -p ~/dbs/config1 ~/dbs/config2 ~/dbs/config3
$ ./mongod --dbpath ~/dbs/config1 --port 20001
$ ./mongod --dbpath ~/dbs/config2 --port 20002
$ ./mongod --dbpath ~/dbs/config3 --port 20003
Then, when you start mongos, connect it to all three config servers:
$ ./mongos --configdb localhost:20001,localhost:20002,localhost:20003
The config servers use a two-phase commit, rather than MongoDB's ordinary asynchronous replication, to maintain separate copies of the cluster configuration. This guarantees a consistent cluster state. It also means that if one config server goes down, the cluster configuration becomes read-only. Clients can still read and write data, but data cannot be rebalanced until all the config servers are back up.
There is no limit on the number of mongos processes. The recommendation is to run exactly one mongos process per application server. That way each application server talks to a local mongos, and if the server goes down, no application is left trying to talk to a mongos that is no longer there.
"A Sturdy Shard"
In a production environment, each shard should be a replica set, so that the failure of a single server does not take down the whole shard. A replica set can be added as a shard with the addshard command; you only need to specify the name of the replica set and a seed.
For example, to add the replica set foo, which contains a server at prod.example.com:27017 (among other servers), run the following command:
> db.runCommand({"addshard" : "foo/prod.example.com:27017"})
If prod.example.com goes down, mongos knows that it is connected to a replica set and will use the new primary.
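The "name/seed" string passed to addshard can be taken apart as shown below. This is a sketch in plain JavaScript: `parseShardSpec` is a hypothetical helper written for illustration, not a MongoDB function, and it assumes every host in the string carries an explicit port.

```javascript
// Hypothetical parser for the "setName/host1:port,host2:port" form used
// with addshard. A plain "host:port" string has no replica set name.
function parseShardSpec(spec) {
  const slash = spec.indexOf("/");
  const setName = slash === -1 ? null : spec.slice(0, slash);
  const hostList = slash === -1 ? spec : spec.slice(slash + 1);
  const seeds = hostList.split(",").map((entry) => {
    const [host, port] = entry.split(":");
    return { host, port: Number(port) };
  });
  return { setName, seeds };
}

const replicaShard = parseShardSpec("foo/prod.example.com:27017");
const singleShard = parseShardSpec("localhost:10000");
console.log(replicaShard.setName, replicaShard.seeds[0].host);
console.log(singleShard.setName, singleShard.seeds[0].port);
```

Only one seed is needed in the string because the remaining members can be discovered from the replica set itself, which is why the example command names a single server.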
"Manage Shards"
Sharding information is mostly stored in the config database, which can be accessed from any process connected to mongos.
"Config Collections"
The code in the following sections assumes that you are connected to mongos in the shell and have already run use config.
⒈ Shards
You can find all the shards in the shards collection:
> db.shards.find()
{ "_id" : "shard0", "host" : "localhost:10000" }
{ "_id" : "shard1", "host" : "localhost:10001" }
⒉ Databases
The databases collection lists the databases that exist on the shards, along with some information about each one:
> db.databases.find()
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "foo", "partitioned" : false, "primary" : "shard1" }
{ "_id" : "x", "partitioned" : false, "primary" : "shard0" }
{
    "_id" : "test",
    "partitioned" : true,
    "primary" : "shard0",
    "sharded" : {
        "test.foo" : {
            "key" : {"x" : 1},
            "unique" : false
        }
    }
}
The output lists every available database along with some basic information:
· "_id", string
The "_id" is the name of the database.
· "partitioned", boolean
If true, sharding has been enabled for the database.
· "primary", string
This value corresponds to a shard "_id" and indicates where the database's home is. A database always has a home, whether or not it is sharded. In a sharded setup, a shard is chosen at random when the database is created, and that home is where the database starts creating its files. A sharded database will use other servers as well, but it starts on this shard.
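The fields described above can be read programmatically. The sketch below works on plain JavaScript objects modeled on the db.databases.find() output, assuming lowercase field names; `shardedCollections` is a hypothetical helper, and nothing here talks to a real cluster.

```javascript
// Sample documents modeled on the databases collection output
// (assumption: field names are lowercase).
const databases = [
  { _id: "admin", partitioned: false, primary: "config" },
  { _id: "foo", partitioned: false, primary: "shard1" },
  {
    _id: "test",
    partitioned: true,
    primary: "shard0",
    sharded: { "test.foo": { key: { x: 1 }, unique: false } },
  },
];

// Hypothetical helper: list every sharded collection together with its
// shard key and the home shard of the database it belongs to.
function shardedCollections(databases) {
  const result = [];
  for (const db of databases) {
    if (!db.partitioned || !db.sharded) continue;
    for (const [ns, info] of Object.entries(db.sharded)) {
      result.push({ ns, key: info.key, primary: db.primary });
    }
  }
  return result;
}

const sharded = shardedCollections(databases);
console.log(JSON.stringify(sharded));
```

Databases with "partitioned" set to false are skipped entirely, since they live only on their home shard.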
⒊ Chunks
Chunk information is stored in the chunks collection. This is where it gets interesting: you can see exactly how the data is split up across the cluster:
> db.chunks.find()
{
    "_id" : "test.foo-x_MinKey",
    "lastmod" : { "t" : 1276636243000, "i" : 1 },
    "ns" : "test.foo",
    "min" : {
        "x" : { $minKey : 1 }
    },
    "max" : {
        "x" : { $maxKey : 1 }
    },
    "shard" : "shard0"
}
This is what a collection with a single chunk looks like: the chunk's range runs from -∞ (MinKey) to ∞ (MaxKey).
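A useful property of these range documents is that, taken together, they cover the whole key space from -∞ to ∞ with no gaps. The sketch below checks that in plain JavaScript. It is an illustration only: `coversKeyRange` is a hypothetical helper, the second sample range is invented, and -Infinity and Infinity stand in for MinKey and MaxKey so that plain numbers can be compared.

```javascript
// Ranges modeled on the chunks collection, with -Infinity / Infinity
// standing in for MinKey / MaxKey (an assumption made for this sketch).
const chunkDocs = [
  { ns: "test.foo", min: { x: -Infinity }, max: { x: 100 }, shard: "shard0" },
  { ns: "test.foo", min: { x: 100 }, max: { x: Infinity }, shard: "shard1" },
];

// Check that the ranges, sorted by lower bound, cover the whole key space
// with no gaps or overlaps: each must start where the previous one ended.
function coversKeyRange(chunks, field) {
  if (chunks.length === 0) return false;
  const sorted = [...chunks].sort((a, b) =>
    a.min[field] < b.min[field] ? -1 : a.min[field] > b.min[field] ? 1 : 0
  );
  if (sorted[0].min[field] !== -Infinity) return false;
  for (let i = 1; i < sorted.length; i++) {
    if (sorted[i].min[field] !== sorted[i - 1].max[field]) return false;
  }
  return sorted[sorted.length - 1].max[field] === Infinity;
}

const complete = coversKeyRange(chunkDocs, "x");
const withGap = coversKeyRange(
  [
    { min: { x: -Infinity }, max: { x: 50 } },
    { min: { x: 100 }, max: { x: Infinity } },
  ],
  "x"
);
console.log(complete, withGap); // true false
```

A single range from -Infinity to Infinity, like the one in the output above, also passes the check.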
