Bugsnag's experience running a MongoDB sharded cluster
Bugsnag is a startup that provides real-time bug tracking and detection services for mobile app developers. Bugsnag stores document data on the terabyte scale in MongoDB, and has used MongoDB for its business data since the first version of the product. Recently Simon Maynard, a Bugsnag engineer, shared his experience of running a MongoDB sharded cluster on his blog and open-sourced several commonly used scripts.
Tag-aware sharding
Tag-aware sharding is a feature introduced in MongoDB 2.2. It lets you control data placement manually so that data is stored on a suitable shard. You tag the shards and then map ranges of the shard key to those tags. At Bugsnag, every page load touches the users collection, for example to check whether a user is logged in. When an application writes a large amount of data to the primary shard, all user requests may become slow. To solve this, Bugsnag tagged all sharded collections so that they live on the large shards, leaving the users collection on a smaller machine where user data can be served directly from memory. For details on how to use tag-aware sharding, refer to Asya's blog.
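To make the tag-to-range mapping concrete, here is a minimal sketch of how tags restrict where data may live. It is a plain Python model, not MongoDB's balancer code. The shard names, namespaces, and tag names are hypothetical, and it assumes a numeric shard key with ranges that are inclusive at the lower bound and exclusive at the upper bound.

```python
# Hypothetical cluster layout: which tags each shard carries.
SHARD_TAGS = {
    "shard-small": {"small"},
    "shard-big-1": {"large"},
    "shard-big-2": {"large"},
}

# Hypothetical tag ranges: (namespace, lower inclusive, upper exclusive, tag).
# An unbounded range pins a whole collection to one tag.
TAG_RANGES = [
    ("app.users", float("-inf"), float("inf"), "small"),
    ("app.events", float("-inf"), float("inf"), "large"),
]


def eligible_shards(namespace, key):
    """Return the sorted shard names allowed to hold `key` of `namespace`."""
    for ns, lower, upper, tag in TAG_RANGES:
        if ns == namespace and lower <= key < upper:
            return sorted(s for s, tags in SHARD_TAGS.items() if tag in tags)
    # Data not covered by any tag range may be placed on any shard.
    return sorted(SHARD_TAGS)


print(eligible_shards("app.users", 42))
print(eligible_shards("app.events", 42))
```

In a real cluster the tags and ranges would be created with the shell helpers for tagging shards and adding tag ranges, and the balancer would then migrate chunks accordingly.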
Empty chunks
When old data is deleted, empty chunks are left behind on the shards, which leads to unbalanced shards. The balancer only evens out the number of chunks per shard and ignores how much data each chunk holds. MongoDB 2.6 added a mergeChunks command that merges empty chunks with contiguous chunks. However, this command is not run automatically, so Bugsnag wrote a script that walks the chunks in order and merges the empty ones automatically.
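The chunk walk described above could look roughly like the following sketch. It is not Bugsnag's script. It assumes the caller has already collected, for each chunk in shard-key order, its shard, its min and max bounds, and a document count (the `count` field is a hypothetical input), and it only merges runs of two or more empty chunks that are contiguous and on the same shard. It returns mergeChunks command documents rather than running them.

```python
def empty_chunk_merges(namespace, chunks):
    """Build mergeChunks commands for runs of contiguous empty chunks.

    `chunks` is a list of dicts with keys "shard", "min", "max" and "count",
    sorted by "min". Only runs of at least two empty chunks are merged.
    """
    commands = []
    run = []  # current run of contiguous empty chunks on one shard

    def flush():
        if len(run) >= 2:
            commands.append({
                "mergeChunks": namespace,
                "bounds": [run[0]["min"], run[-1]["max"]],
            })
        run.clear()

    for chunk in chunks:
        extends_run = (
            run
            and chunk["count"] == 0
            and chunk["shard"] == run[-1]["shard"]
            and chunk["min"] == run[-1]["max"]
        )
        if not extends_run:
            flush()
        if chunk["count"] == 0:
            run.append(chunk)
    flush()
    return commands


chunks = [
    {"shard": "A", "min": {"x": 0}, "max": {"x": 10}, "count": 0},
    {"shard": "A", "min": {"x": 10}, "max": {"x": 20}, "count": 0},
    {"shard": "A", "min": {"x": 20}, "max": {"x": 30}, "count": 5},
    {"shard": "B", "min": {"x": 30}, "max": {"x": 40}, "count": 0},
    {"shard": "A", "min": {"x": 40}, "max": {"x": 50}, "count": 0},
]
print(empty_chunk_merges("app.events", chunks))
```

Each returned document would be sent as an admin command through mongos. Keeping command construction separate from execution makes it easy to review the planned merges first.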
Big chunks
A big chunk is one whose size exceeds the configured chunk size. Bugsnag wrote a script to find big chunks and bring them back down to size. The script uses Mongoid, a Ruby library for MongoDB, to connect to the mongod and mongos instances, and splits the big chunks so that the data set stays evenly distributed across the cluster.
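As a rough illustration of the detection step (Bugsnag's own script is written in Ruby and is not reproduced here), the sketch below filters chunks by measured size and builds a split command for each oversized one. The chunk sizes are assumed to have been measured beforehand, for example with the dataSize command, and the 64 MB threshold is the default chunk size of MongoDB releases of that era. Your cluster may be configured differently.

```python
DEFAULT_MAX_CHUNK_BYTES = 64 * 1024 * 1024  # assumed default chunk size (64 MB)


def split_commands(namespace, chunk_sizes, max_bytes=DEFAULT_MAX_CHUNK_BYTES):
    """Build a split command for every chunk larger than `max_bytes`.

    `chunk_sizes` is an iterable of (min_key, max_key, size_in_bytes) tuples.
    Using "bounds" asks MongoDB to split the identified chunk in half.
    """
    return [
        {"split": namespace, "bounds": [lower, upper]}
        for lower, upper, size in chunk_sizes
        if size > max_bytes
    ]


measured = [
    ({"x": 0}, {"x": 10}, 10 * 1024 * 1024),
    ({"x": 10}, {"x": 20}, 200 * 1024 * 1024),
]
print(split_commands("app.events", measured))
```

A chunk that is still too large after one split would simply be picked up again on the next run.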
Orphaned documents
Under normal circumstances a cluster contains no orphaned documents. However, certain failures during a chunk migration can leave orphaned documents behind. Orphaned documents can be deleted safely. In MongoDB 2.6 you can use the cleanupOrphaned command to delete them from a shard. For more information about orphaned documents, refer to the blog post by MongoDB engineers.
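cleanupOrphaned works through the shard key space in steps and reports where it stopped, so it is normally called in a loop. The sketch below shows that loop in Python under a few assumptions: `run_admin_command` is a hypothetical callable that sends a command document to the admin database of the shard's primary mongod and returns the reply as a dict, and `max_rounds` is an added safety limit that is not part of the command.

```python
def cleanup_orphans(run_admin_command, namespace, max_rounds=1000):
    """Call cleanupOrphaned repeatedly until the whole key range is covered.

    Returns the number of rounds that were executed.
    """
    next_key = {}  # an empty document starts from the lowest shard key
    rounds = 0
    while next_key is not None and rounds < max_rounds:
        result = run_admin_command({
            "cleanupOrphaned": namespace,
            "startingFromKey": next_key,
        })
        if result.get("ok") != 1:
            raise RuntimeError("cleanupOrphaned failed: %r" % (result,))
        # No stoppedAtKey in the reply means the cleanup has finished.
        next_key = result.get("stoppedAtKey")
        rounds += 1
    return rounds
```

With a driver, `run_admin_command` could be a thin wrapper around the admin database's command method. Passing it in as a parameter keeps the loop easy to test without a live cluster.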
The moveChunk directory
Files in the moveChunk directory are temporary files generated while chunks are migrated during balancing. Once a migration has finished, these files can be deleted. Bugsnag uses a scheduled task to clear the directory regularly. MongoDB also supports turning this feature off, which you can try out for yourself.
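A scheduled cleanup of this kind can be very small. The following sketch deletes files older than a given age under a directory. The seven-day retention period is an arbitrary assumption, not a value from Bugsnag's setup, and the path to pass in depends on where your mongod keeps its data.

```python
import os
import time


def purge_old_files(directory, max_age_days=7, now=None):
    """Delete files under `directory` last modified over `max_age_days` ago.

    Returns the list of removed paths. Subdirectories are left in place.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getmtime(path) < cutoff:
                os.remove(path)
                removed.append(path)
    return removed
```

Run from cron or a similar scheduler, this keeps recent migration files around for a while in case they are needed for recovery.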
Monitoring the sharded environment
Shell commands
- db.collection.getShardDistribution(): shows how a collection's data is distributed across the shards of a sharded cluster. You can use it to spot when a collection on one shard grows noticeably larger than on the other shards.
- db.stats(): prints database statistics for each shard. You can use it to track data size. Pass 1024*1024*1024 as the argument to display sizes in GB.
- sh.status(): displays how chunks are distributed across the whole cluster. It can be used to check whether the data is evenly distributed.
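Once per-shard sizes have been read off the output of these commands, spotting a shard that has grown out of line is simple arithmetic. The sketch below is one possible check. The rule of flagging shards above twice the median is an arbitrary assumption, and the shard names and sizes are made up.

```python
from statistics import median

GB = 1024 * 1024 * 1024  # same scale factor as passed to db.stats()


def outlier_shards(bytes_per_shard, factor=2.0):
    """Return sorted shard names whose size exceeds `factor` x the median."""
    if not bytes_per_shard:
        return []
    threshold = factor * median(bytes_per_shard.values())
    return sorted(s for s, size in bytes_per_shard.items() if size > threshold)


sizes = {"shard-a": 10 * GB, "shard-b": 12 * GB, "shard-c": 40 * GB}
print(outlier_shards(sizes))
```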
mongostat
mongostat is a status monitoring tool that ships with MongoDB. When a MongoDB cluster runs into a problem, you can run mongostat --discover to check the performance metrics of the machines in the cluster.
The author concludes that running a MongoDB sharded cluster is not difficult, but that small problems do crop up from time to time. The Bugsnag blog has many more posts about MongoDB for readers who want to learn more.