MongoDB Capacity Planning and Hardware Configuration


MongoDB is a memory-oriented database that works best when the entire working set is loaded in memory; in other words, RAM should be larger than the working set.

This article is translated from Chad Tindel's English blog post: http://www.mongodb.com/blog/post/capacity-planning-and-hardware-provisioning-mongodb-ten-minutes.

Most MongoDB deployments run on clusters of multiple servers, which makes capacity planning and provisioning more complex than for traditional databases. In a presentation on hardware provisioning at MongoDB World, systems architect Chad Tindel shared best practices with implementation teams for sizing MongoDB deployments.

Here are two key concepts of the MongoDB architecture that will help you follow the presentation (a minimal setup sketch follows the list):

    • Sharding. Sharding is the technique by which MongoDB partitions data across servers. MongoDB can automatically balance data between shards, and shards can be added or removed without taking the database offline.
    • Replication. To ensure high availability, MongoDB maintains redundant copies of data. Replication is built into MongoDB and works across WANs without the need for specialized networks.
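
As a concrete illustration of both concepts, here is a minimal mongo shell sketch; the replica set name, host names, database, and shard key below are placeholders, not details from the presentation:

    // Initiate a three-member replica set (run once, connected to one member).
    rs.initiate({
      _id: "rs0",
      members: [
        { _id: 0, host: "node1:27017" },
        { _id: 1, host: "node2:27017" },
        { _id: 2, host: "node3:27017" }
      ]
    })

    // Connected to a mongos router: enable sharding for a database,
    // then distribute a collection across shards by a shard key.
    sh.enableSharding("mydb")
    sh.shardCollection("mydb.logs", { ts: 1 })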

At the beginning of the talk, Chad offered some important recommendations to keep in mind, and then turned to some user stories:

    • List your performance requirements up front. Determine how much data you need to store. Determine the working-set size by analyzing your queries to estimate how much data you read at a time. Calculate the number of requests per second you expect in production, set the required uptime percentage, and decide how much latency you can tolerate.
    • Plan a proof-of-concept (POC) test.

      MongoDB's scalability allows you to run your application with 10%-20% of the data on 10%-20% of the hardware. Using this method, you can develop your schema and index designs, understand your query patterns, and then refine your working-set estimate. Test performance on a single machine, then add replication and sharding if necessary. Use this configuration as a sandbox to test successive revisions of the application.

      You can estimate the size of your working set by running the following command in MongoDB (a sketch of interpreting the output follows this list):

      db.runCommand({ serverStatus: 1, workingSet: 1 })

    • Test with real-world workloads. Scale up the proof-of-concept test, but do not deploy until you have tested extensively with real-world data and performance requirements.
    • Monitor constantly and adjust as requirements change. Growth in the number of users usually brings more queries and a larger working set, and new indexes also enlarge the working set. The application's read/write ratio may change over time. Tools like MongoDB Management Service (MMS) and mongoperf help you monitor changes in system performance. As requirements change, you can adjust your hardware configuration. Note: with MongoDB, you can add and remove shards or replica-set members without shutting down the database.
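
To turn the workingSet output above into a byte figure, multiply the reported page count by the page size. A minimal sketch, assuming the MMAPv1-era fields pagesInMemory and overSeconds and a 4KB page size (this metric was removed in MongoDB 3.0):

    // Estimate working-set size from serverStatus (MMAPv1-era metric).
    var ws = db.runCommand({ serverStatus: 1, workingSet: 1 }).workingSet;
    var pageSizeBytes = 4096;                        // assumed OS page size
    var gb = ws.pagesInMemory * pageSizeBytes / Math.pow(1024, 3);
    print("Working set ≈ " + gb.toFixed(2) + " GB, measured over " +
          ws.overSeconds + " seconds");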

Chad then walked through two real user stories.

Case #1: A Spanish bank

The bank wants to store logs for 6 months, and each month's data requires 3TB of space, so 6 months of data need 6 × 3TB = 18TB of space. They know they will analyze the last month's logs, so the working set is one month of data (3TB) plus indexes (1TB), for a 4TB working set.

For the proof-of-concept environment, they chose to use about 10% of the data (2TB). Production requires a 4TB working set against 18TB of data; applying that ratio to the 2TB proof-of-concept data set (4TB/18TB × 2TB) gives a proof-of-concept working set of about 444GB. The user could provide at most 128GB of RAM per server, so the proof of concept uses 4 shards with 128GB each: 4 × 128GB = 512GB, which covers the 444GB requirement. Three replica-set members per shard, for read availability and redundancy, gives 4 × 3 = 12 physical machines. Two application servers run mongos, and three config servers run on virtual machines in the proof-of-concept configuration.
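
The proof-of-concept sizing is simple proportional arithmetic; here is the same calculation as a shell sketch (decimal units, 1TB = 1000GB, which reproduces the 444GB figure):

    // Proof-of-concept sizing for the bank (figures from the case study).
    var totalDataGB = 18000;                               // 18TB of production data
    var prodWsGB    = 4000;                                // 4TB production working set
    var pocDataGB   = 2000;                                // 2TB proof-of-concept data
    var pocWsGB     = prodWsGB / totalDataGB * pocDataGB;  // ≈ 444GB
    var ramPerHost  = 128;                                 // GB of RAM per server
    var shards      = Math.ceil(pocWsGB / ramPerHost);     // 4 shards
    print(shards + " shards × " + ramPerHost + "GB = " +
          shards * ramPerHost + "GB RAM, " + shards * 3 + " machines");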

To accommodate the 4TB working set and 18TB of data on 128GB servers in production, they chose to deploy 36 shards, each with 128GB of memory (RAM) and 512GB of available storage. In total: 36 × 128GB = 4.6TB of RAM and 36 × 512GB = 18TB of available storage. As in the proof-of-concept system above, they run mongos on 2 application servers and 3 config servers on virtual machines. As above, each shard is a three-node replica set, so 36 shards × 3 replica-set nodes = 108 physical machines.
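
The production numbers follow the same pattern; a compact sketch of the totals:

    // Production sizing: pool RAM across shards until the working set fits.
    var shards = 36, ramGB = 128, storageGB = 512, replicas = 3;
    print("RAM: "      + (shards * ramGB / 1000) + "TB");      // ≈ 4.6TB
    print("Storage: "  + (shards * storageGB / 1000) + "TB");  // ≈ 18.4TB (the talk rounds to 18TB)
    print("Machines: " + (shards * replicas));                 // 108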

Note: MongoDB allows the proof-of-concept and production configurations to use the same standard hardware building block, with 128GB of memory (RAM) and 512GB of available storage. This lets users test a node in the proof-of-concept cluster before adding an identical production node to the production cluster.

Case #2: A large online retailer

The retailer wants to migrate their product catalog from SQL Server to MongoDB. MongoDB has a great advantage for storing catalog data: unlike a SQL database, MongoDB's dynamic schema lets each product carry its own set of attributes, with no need for blank placeholder fields on products that do not share those attributes.
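
To make the dynamic-schema point concrete, here is a hypothetical sketch of two catalog documents with different attribute sets in one collection (the collection and field names are illustrative, not from the case study):

    // Two products in the same collection, each with its own attributes.
    db.products.insertOne({
      _id: "SKU-12345",
      name: "Standing Desk",
      categories: ["desks"],
      heightRangeCm: [72, 120],        // attribute that only desks carry
      surfaceMaterial: "bamboo"
    })
    db.products.insertOne({
      _id: "SKU-67890",
      name: "2TB Hard Drive",
      categories: ["hard drives"],
      rpm: 7200,                       // attributes that only drives carry
      interfaceType: "SATA III"
    })
    // A SQL table would need NULL columns for heightRangeCm, rpm, and so on.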

The retailer serves customers from its own data centers on the East and West Coasts, so they chose to run an active/active configuration. They write new or modified catalog entries in bulk during the quietest hours of the night; at peak times, only read operations are performed.

They have 4,000,000 stock-keeping units, each stored as a JSON document averaging 30KB. They need to serve queries that look up a specific product by _id or by category, such as "desks" or "hard drives". Most category requests retrieve about 72 documents on average. A search engine crawler that follows every link in the category tree will retrieve 200 documents.

The numbers show that each product appears in 2 categories on average. The retailer initially wanted to shard by category, storing a copy of each product in every category in which it appears. 4,000,000 products × an average of 2 categories × 30KB per stock unit: 8,000,000 × 30KB = 240GB, plus a 30GB index, for 270GB of data in total.
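
The same catalog arithmetic as a shell sketch:

    // Catalog sizing (figures from the case study; decimal units).
    var skus = 4000000, avgCategories = 2, docKB = 30, indexGB = 30;
    var dataGB = skus * avgCategories * docKB / 1e6;   // 240GB
    print("Total: " + (dataGB + indexGB) + "GB");      // 270GB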

MongoDB consulting engineers recommended that the retailer use an unsharded replica set, because the application demands high read performance but only performs isolated bulk writes during idle hours. The entire 270GB working set fits within the 384GB of available memory on the large Cisco UCS servers the retailer wanted to use, so there is no need to partition the in-memory working set across shards, which simplifies the deployment architecture.

MongoDB engineers recommended a 4-node replica set spread across the 2 data centers, with an arbiter in a third location to break election ties in case the primary server fails. This way, any server in either data center can be shut down or fail while the remaining servers keep functioning properly. The "nearest" read preference lets the driver select the closest server based on recent measured latency.
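
In the mongo shell, the "nearest" read preference can be set per cursor or per connection; a minimal sketch (the collection name is illustrative):

    // Route reads to the lowest-latency replica-set member.
    db.products.find({ categories: "desks" }).readPref("nearest")

    // Or set the default for the current connection:
    db.getMongo().setReadPref("nearest")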

Surprisingly, however, the retailer decided to deploy the system on their company's VMware cloud platform rather than on the large Cisco servers. Their IT department would not provision any VMware node with more than 64GB of memory (RAM). To work within the 64GB limit, they deployed 3 shards with 4 nodes each (two on the East Coast, two on the West Coast, plus an arbiter): 64GB × 3 = 192GB. Although this cannot hold the full 270GB working set in memory, they decided to accept the consequences of the resulting page faults. They retain the option of adding a fourth shard to keep more of the working set in memory.
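
The trade-off they accepted is easy to quantify:

    // Fraction of the working set that fits in RAM on the VMware deployment.
    var ramGB = 3 * 64;                                            // 3 shards × 64GB = 192GB
    var workingSetGB = 270;
    print((100 * ramGB / workingSetGB).toFixed(0) + "% in RAM");   // ≈ 71%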

Hardware provisioning lessons from these case studies
    1. For read-intensive applications, size your servers so that the entire working set fits in memory, and replicate for higher availability.
    2. If your servers' memory (RAM) cannot hold the working set, shard to pool the RAM of multiple replica sets.
    3. Build the proof-of-concept system from the same server hardware as the production deployment. That way you can configure and test a server in the proof-of-concept system, then grow a replica set or add a shard as needed and move it directly into the production system.
