1. Sharding
Concept: MongoDB offers another kind of cluster, sharding, which is designed to keep up with large growth in data volume.
When MongoDB stores massive amounts of data, a single machine may not be able to hold it all or provide acceptable read and write throughput. In that case we can split the data across multiple machines, so the database system can store and process more data.
1. Introduction to sharding
Sharding is the process of splitting data up and distributing it across different machines; it is sometimes called partitioning. Because the data is spread across different machines, no single powerful mainframe is needed in order to store more data and handle a larger load.
Manual sharding is possible with almost any database software: the application maintains connections to several different database servers, each of which is completely independent. The application manages storing different data on different servers and must send each query and write to the correct server. This approach works, but it is hard to maintain; adding nodes to the cluster or removing nodes from it is difficult, and adjusting the data distribution and load patterns is not easy. MongoDB supports automatic sharding, which removes the burden of managing shards by hand: the cluster splits the data and balances the load automatically.

2. MongoDB auto-sharding
The basic idea of MongoDB sharding is to split a collection into small chunks. These chunks are scattered across several shards, each of which stores only part of the total data. The application does not have to know which shard holds which data, or even that the data has been split at all. For this to work, a routing process named mongos runs in front of the shards. The router knows where all the data is stored, so the application can connect to it and send requests; to the application it looks like an ordinary mongod. The router knows which shard each piece of data lives on and forwards requests to the correct shards; when the responses come back, the router collects them and sends the result back to the application. Without sharding, the client connects to a mongod process; with sharding, the client connects to a mongos process, and mongos hides the details of the sharding, so from the application's perspective there is no difference between a sharded and a non-sharded deployment. When you need to scale out, you do not have to modify the application's code.
You need a shard when:
A. The machine's disk is not big enough.
B. A single mongod can no longer meet the performance requirements of the data.
C. You want to keep a large amount of data in memory to improve performance.
In general, start without sharding and convert to a sharded cluster when needed.

3. Shard keys
When you set up a shard, you need to select a key from the collection and use its values as the basis for splitting the data. This key is called the shard key. Suppose a collection of documents represents people: if "name" is chosen as the shard key, one shard might hold documents whose names begin with A-F, the second G-P, and the third Q-Z. As shards are added or removed, MongoDB rebalances the data so that the load on each shard stays even and the amount of data stays within a reasonable range (for example, a shard that receives a lot of traffic may end up storing less data than a shard that receives little).

4. Sharding an existing collection
Suppose there is a collection that stores logs and you now want to shard it. We enable the sharding feature and tell MongoDB to use "timestamp" as the shard key; at first all the data sits on a single shard. You can insert data at will, but it always lands on that one shard. Then a shard is added. Once this shard is set up and running, MongoDB splits the collection into two halves, called chunks. Each chunk contains all the documents whose shard-key values fall within a certain range: one chunk might contain documents with timestamps before 2011.11.11, and the other documents from 2011.11.11 onwards. One of the chunks is then moved to the new shard. If a new document's timestamp is before 2011.11.11, it is added to the first chunk, otherwise to the second.
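In mongos-shell terms, the setup just described would look roughly like the following sketch (the commands themselves are covered in section 2 below; the database name "test" and collection name "logs" are placeholders for this log example):
mongos> use admin
mongos> db.runCommand({ "enablesharding" : "test" })                                       // allow collections in the test database to be sharded
mongos> db.runCommand({ "shardcollection" : "test.logs", "key" : { "timestamp" : 1 } })    // shard the logs collection, using "timestamp" as the shard key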
5. Choosing a shard key
The shard key determines how insert operations are distributed among the shards. If a key such as "timestamp" is selected, its value keeps growing and, without intervention, all new data is sent to one shard (the one whose range contains the dates after 2011.11.11). Even if a new shard is added and the data is split again, everything is still written to a single server. As more shards are added, MongoDB might split off a chunk covering 2011.11.11 to 2021.11.11, but documents newer than 2021.11.11 would again all be inserted into the last chunk. This is not suitable for high write loads, although range queries on the shard key remain very efficient.
If the write load is high and you want to spread it evenly across the shards, you have to choose an evenly distributed shard key. In the log example, a hash of the timestamp, or a key with no pattern such as "logMessage", would meet this condition.
Whether the shard key jumps around randomly or increases steadily, it is important that its values vary enough. For example, if a "logLevel" key has only 3 values, "DEBUG", "WARN" and "ERROR", MongoDB cannot use it as a shard key to split the data across more than 3 shards (because there are only 3 values). If a key varies too little but you still want to use it as a shard key, you can combine it with a key that varies a lot to create a compound shard key, for example "logLevel" plus "timestamp" (see the sketch after this section).
Choosing a shard key is a lot like choosing an index, and the two follow similar principles; in fact, the shard key usually is the most commonly used index.

6. Effect of the shard key on operations
End users should not be able to tell whether a collection is sharded, but it helps to understand how queries behave differently depending on the shard key. Suppose again a collection that represents people, sharded on "name" across 3 shards whose initial letters range over A-Z. Consider the following queries:
db.people.find({"name": "Refactor"})
mongos sends this query directly to the q-z shard and, after receiving the response, returns it directly to the client.
db.people.find({"name": {"$lt": "L"}})
mongos sends the query first to the a-f and g-p shards, then forwards the results to the client.
db.people.find().sort({"email": 1})
mongos queries every shard and merge-sorts the results to guarantee the correct order. mongos uses cursors to fetch data from each server, so it does not have to wait until all the data is available before sending batches of results to the client.
db.people.find({"email": "..."})
mongos does not track the "email" key, so it does not know which shard the query should be sent to, and therefore sends the query to all shards.
If you insert a document, mongos sends it to the appropriate shard based on the value of its "name" key.
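For the compound shard key mentioned in section 5, the declaration might look like the following sketch (the namespace "test.logs" and the field names are illustrative only, not taken from a real deployment):
mongos> use admin
mongos> db.runCommand({ "shardcollection" : "test.logs", "key" : { "logLevel" : 1, "timestamp" : 1 } })   // low-cardinality key first, high-cardinality key second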
2. How do I shard?
1. There are two steps to creating a sharded setup: starting the actual servers, and then deciding how to split the data. A sharded cluster typically has 3 components:
a. Shards. A shard is a container that stores a subset of the collection's data. A shard can be a single mongod server (for development and testing) or a replica set (for production). In a replica set there are multiple servers, only one of which is the primary, and the other servers hold copies of the same data.
b. mongos. mongos is the router process that ships with MongoDB. It routes all requests and then aggregates the results. It does not store data or configuration information itself; it caches the information held by the config server.
c. Config servers. The config server stores the cluster's configuration information: which data lives on which shard. mongos does not store anything persistently, so the shard configuration needs to live somewhere; mongos synchronizes this data from the config server.

2. Starting the servers
Start the config server and mongos first. The config server must be started before mongos, because mongos uses the configuration information on it. The config server is started like an ordinary mongod:
mongod --dbpath "F:\mongo\dbs\config" --port 20000 --logpath "F:\mongo\logs\config\MongoDB.txt" --rest
The config server does not need much space or many resources (200 MB of actual data takes up roughly 1 KB of configuration space).
Next, set up the mongos process for applications to connect to. This routing server needs no data directory, but you must tell it where the config server is:
mongos --port 30000 --configdb 127.0.0.1:20000 --logpath "F:\mongo\logs\mongos\mongodb.txt"
Shard management is usually done through mongos.
Adding a shard means adding an ordinary mongod instance (or replica set):
mongod --dbpath "F:\mongo\dbs\shard" --port 10000 --logpath "F:\mongo\logs\shard\MongoDB.txt" --rest
mongod --dbpath "F:\mongo\dbs\shard1" --port 10001 --logpath "F:\mongo\logs\shard1\MongoDB.txt" --rest
Now connect to the mongos you just started and add the shards to the cluster. Start the shell and connect to mongos (make sure you connect to mongos and not to a mongod), then add each shard with the addshard command:
>mongo 127.0.0.1:30000
mongos> db.runCommand(
... {
...   "addshard" : "127.0.0.1:10000",
...   "allowLocal" : true
... } )
Sat Jul 10:46:38 uncaught exception: error { "$err" : "can't find a shard to put new db on", "code" : 10185 }
mongos> use admin
switched to db admin
mongos> db.runCommand(
... {
...   "addshard" : "127.0.0.1:10000",
...   "allowLocal" : 1
... } )
{ "shardAdded" : "shard0000", "ok" : 1 }
mongos> db.runCommand(
... {
...   "addshard" : "127.0.0.1:10001",
...   "allowLocal" : 1
... } )
{ "shardAdded" : "shard0001", "ok" : 1 }
When the shards run on the local machine, the allowLocal key must be set to 1. MongoDB tries to avoid clusters that end up configured on a single machine by mistake, so you have to tell it explicitly that this is only a development setup and that you know exactly what you are doing. In a production environment the shards would be deployed on different machines. Whenever you want to add a shard, run addshard; MongoDB takes care of integrating the shard into the cluster.

Sharding the data (still operating on the mongos connection)
MongoDB does not distribute every piece of stored data automatically; sharding has to be enabled at the database level and then at the collection level first.
mongos> db.runCommand({ "enablesharding" : "test" })   // enable the sharding feature on the test database
Sharding a database means its collections can be stored on different shards, and it is also a precondition for sharding those collections. Once sharding is enabled at the database level, you can use the shardcollection command to shard a collection:
mongos> db.runCommand({ "shardcollection" : "test.refactor", "key" : { "name" : 1 } })   // shard the refactor collection of the test database; the shard key is "name"
If you now add data to the refactor collection, it is automatically distributed across the shards according to the value of "name".
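One way to check that the data really is being distributed is to insert some documents and look at the sharding status. The following is only a sketch (the loop and document contents are made up for illustration); the collection's stats() output and db.printShardingStatus() show which shards hold the data and how the chunks are split:
mongos> use test
mongos> for (var i = 0; i < 100000; i++) { db.refactor.insert({ "name" : "user" + i }) }   // hypothetical sample data
mongos> db.refactor.stats()            // for a sharded collection, shows a per-shard breakdown of the document counts
mongos> db.printShardingStatus()       // lists the shards, the sharded databases and the chunk ranges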
3. Operation Example:
1. Start the configuration server
D:\mongodb\mongo1\bin>mongod --dbpath=D:\mongodb\mongo1\config --port 20000 --logpath=D:\mongodb\mongo1\logs\config\mongodb.txt --rest
2. Start the mongos router process (shard management is usually done through mongos).
D:\mongodb\mongo1\bin>mongos --port 30000 --configdb 127.0.0.1:20000 --logpath=D:\mongodb\mongo1\logs\mongos\mongodb.txt
3. Create the shards, i.e. ordinary mongod instances
Shard 1:
D:\mongodb\mongo1\bin>mongod --port 10000 --dbpath=D:\mongodb\mongo1\db1 --logpath=D:\mongodb\mongo1\logs\shard\mongodb.txt --rest
Shard 2:
D:\mongodb\mongo1\bin>mongod --port 10001 --dbpath=D:\mongodb\mongo1\db2 --logpath=D:\mongodb\mongo1\logs\shard1\mongodb.txt --rest
4. Connect to the mongos you just started and add the shards to the cluster. Start the shell and connect to mongos:
D:\mongodb\mongo1\bin>mongo 127.0.0.1:30000
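From this shell, the remaining steps are the same commands shown in section 2; as a sketch (reusing the test database and refactor collection from the earlier example):
mongos> use admin
switched to db admin
mongos> db.runCommand({ "addshard" : "127.0.0.1:10000", "allowLocal" : 1 })              // register shard 1
mongos> db.runCommand({ "addshard" : "127.0.0.1:10001", "allowLocal" : 1 })              // register shard 2
mongos> db.runCommand({ "enablesharding" : "test" })                                     // enable sharding for the test database
mongos> db.runCommand({ "shardcollection" : "test.refactor", "key" : { "name" : 1 } })   // shard test.refactor on "name"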