(Repost) Preparing for your first MongoDB deployment: capacity planning and monitoring

Source: Internet
Author: User

If you have finished developing your new MongoDB application and are now ready to deploy it into production, you and your operations team will need to discuss some key questions:

    • What is the best deployment practice?
    • What key metrics do we need to monitor to ensure that the application meets its required service levels?
    • How do we know when it is time to add shards?
    • What tools are available to back up and restore the database?
    • How do we secure access to all this new real-time big data?

This article covers hardware selection, scaling, high availability (HA), and monitoring. Before getting into the details, let's first address one of the most common questions:

What is the difference between deploying MongoDB and deploying an RDBMS?

You will find that MongoDB, as a document database, shares many of the same concepts, operations, policies, and processes with the relational databases you are already familiar with. Processes and best practices for monitoring, indexing, tuning, and backup all carry over to MongoDB. And if you want to begin training on your own, free online courses for developers and DBAs are available from MongoDB University.

System performance and capacity planning are two important topics that any deployment needs to address, whether it runs an RDBMS or a NoSQL database. As part of planning, you should establish baselines for data volume, system load, performance (throughput and latency), and capacity utilization. These baselines should reflect the workloads you expect the database to handle in production, and they should be revised periodically as the number of users, application features, performance SLAs, or other factors change.

The baselines will help you understand when the system is operating as designed, and when problems begin to emerge that may affect the quality of the user experience or other critical factors.

The key deployment elements, including hardware, scaling, and high availability, are discussed below, along with what you should monitor to maintain optimal system performance.

Know your working set

When allocating the hardware budget for a MongoDB deployment, RAM should be at or near the top of the list.

MongoDB makes extensive use of RAM for low-latency database operations. In MongoDB, all data is read and manipulated through memory-mapped files. Reading data from memory is measured in nanoseconds, while reading from disk is measured in milliseconds, so reading from memory is roughly 100,000 times faster than reading from disk.
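As a rough worked example of the arithmetic behind that figure, the sketch below assumes 100 ns for a memory read and 10 ms for a random disk read. Both latencies are illustrative assumptions, not measurements from this article, and real values depend on the hardware.

```python
# Illustrative latencies only; real values depend on the hardware.
MEMORY_READ_NS = 100   # assumed RAM access time, in nanoseconds
DISK_READ_MS = 10      # assumed random disk read time, in milliseconds

NS_PER_MS = 1_000_000  # nanoseconds in one millisecond


def speedup(memory_ns: float, disk_ms: float) -> float:
    """Return how many times faster a memory read is than a disk read."""
    return (disk_ms * NS_PER_MS) / memory_ns


ratio = speedup(MEMORY_READ_NS, DISK_READ_MS)
print(f"memory is about {ratio:,.0f}x faster than disk")
```

With these assumed numbers the ratio comes out at 100,000, which matches the order of magnitude quoted above.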

The set of data and indexes accessed most frequently during normal operation is called the working set, and ideally it should fit in RAM. The working set may be only a small part of the whole database, such as the application data related to the most recent events or to the most popular products.

A page fault occurs when MongoDB attempts to access data that has not been loaded into RAM. If free memory is available, the operating system locates the page on disk and loads it directly into memory. If no free memory is available, however, the operating system must first write a page from memory out to disk and then read the requested page into memory. This process is slower than accessing data that is already in memory.

Some operations can inadvertently evict a large part of the working set from memory, which can seriously hurt performance. For example, a query that scans every document in a database that is larger than the server's RAM will cause those documents to be read into memory and the working set to be written out to disk. Defining appropriate indexes for your queries during the schema design phase of the project greatly reduces this risk. MongoDB's explain operation provides information about query plans and index usage.

The output of MongoDB's serverStatus command includes a useful workingSet document, which provides an estimated size of the instance's working set. The operations team can track the number of pages the instance accessed over a given period, as well as the time elapsed between the oldest and the newest document in the working set. By tracking these metrics, you can detect when the working set is approaching the current RAM limit and act proactively to make sure the system scales.
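A minimal sketch of how such a document might be compared against installed RAM follows. The field names pagesInMemory and overSeconds are assumed here from the layout of the working set document in MongoDB releases of this article's era, and the 4 KB page size is also an assumption; check both against your server's actual output. The function name and the sample values are hypothetical.

```python
PAGE_SIZE_BYTES = 4096  # assumed OS page size; check your platform


def working_set_summary(working_set: dict, ram_bytes: int) -> dict:
    """Summarize a working-set style document against installed RAM.

    `working_set` is assumed to look like
    {"pagesInMemory": int, "overSeconds": int}.
    """
    estimated_bytes = working_set["pagesInMemory"] * PAGE_SIZE_BYTES
    return {
        "estimated_bytes": estimated_bytes,
        "ram_fraction": estimated_bytes / ram_bytes,
        "window_seconds": working_set["overSeconds"],
    }


# Hypothetical sample values, not real server output.
sample = {"pagesInMemory": 1_500_000, "overSeconds": 900}
summary = working_set_summary(sample, ram_bytes=8 * 1024**3)
print(f"{summary['ram_fraction']:.0%} of RAM touched in the last "
      f"{summary['window_seconds']} s")
```

Tracking the fraction over time, rather than looking at a single sample, is what shows whether the working set is trending toward the RAM limit.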

MongoDB Management Service (MMS) and mongostat can help users monitor memory usage; both are discussed in more detail below.

Storage and disk I/O

MongoDB does not require shared storage (such as a storage area network). MongoDB is able to use locally attached storage and solid state drives (SSDs).

Most disk access patterns in MongoDB are not sequential, so customers can see substantial performance gains from SSDs. We have observed good results and strong performance with SATA SSDs and with PCI-attached SSDs. Commodity SATA spinning drives are comparable to higher-cost spinning drives because of MongoDB's non-sequential access patterns, so budget is better spent on more RAM or on SSDs than on more expensive spinning drives.

While data files benefit from SSDs, MongoDB's journal files are a good candidate for fast conventional disks because of their highly sequential write profile.

Most MongoDB deployments should use RAID-10. RAID-5 and RAID-6 do not provide sufficient performance. RAID-0 provides good write performance, but it offers limited read performance and insufficient fault tolerance. A MongoDB deployment gets strong data availability from replica sets (discussed below), but users should still weigh RAID and other factors to meet the desired availability SLA.

Although a MongoDB system should be designed so that its working set fits in memory, disk I/O remains a key performance consideration. MongoDB periodically flushes writes to disk and commits to the journal, so the underlying disk subsystem can become overwhelmed under a heavy write load. The iostat command can be used to reveal high disk utilization and excessive write queuing.

CPU selection: speed or cores?

MongoDB performance is usually not CPU-bound. Because MongoDB rarely encounters workloads that can take advantage of a large number of cores, servers with faster clock speeds are preferable to servers with many cores at slower clock speeds.

Whatever the system, it is important to measure CPU utilization. If CPU utilization is high but there are no other problems such as disk saturation or page faults, something unusual may be going on in the system. For example, a MapReduce job stuck in an infinite loop, or a query that sorts and filters a large number of documents in the working set without a good index, can cause a spike in CPU utilization without triggering disk problems or page faults. The tools used to monitor CPU utilization are described below.

Scaling the database: when and how?

MongoDB provides horizontal scale-out through a technique called sharding. Sharding distributes data across multiple physical partitions, called shards. It enables a MongoDB deployment to overcome the hardware limitations of a single server, including RAM and disk I/O bottlenecks, without adding complexity to the application.

MongoDB Auto-sharding and application transparency

It is far easier to implement sharding before system resources become constrained, so capacity planning and proactive monitoring are important to scaling an application successfully.

Users should consider deploying a sharded MongoDB cluster in the following scenarios:

    • RAM limitation: the size of the system's active working set will soon exceed the maximum RAM capacity of the system.
    • Disk I/O limitation: the system has a large amount of write activity, and the operating system cannot write data fast enough to meet demand, and/or I/O bandwidth limits how fast data can be written to disk.
    • Storage limitation: the data set approaches or exceeds the storage capacity of a single node in the system.
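The three scenarios above can be turned into a simple capacity check. The sketch below is only an illustration: the NodeStats fields, the sharding_reasons function, and the 80% thresholds are all hypothetical choices, and the right thresholds depend on your own baselines.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeStats:
    """Hypothetical capacity snapshot for a single node."""
    working_set_bytes: int
    ram_bytes: int
    disk_util_pct: float   # for example, disk utilization from iostat
    data_bytes: int
    storage_bytes: int


def sharding_reasons(stats: NodeStats,
                     ram_threshold: float = 0.8,
                     disk_util_threshold: float = 80.0,
                     storage_threshold: float = 0.8) -> List[str]:
    """Return which of the three limits the node is approaching."""
    reasons = []
    if stats.working_set_bytes >= ram_threshold * stats.ram_bytes:
        reasons.append("RAM")
    if stats.disk_util_pct >= disk_util_threshold:
        reasons.append("disk I/O")
    if stats.data_bytes >= storage_threshold * stats.storage_bytes:
        reasons.append("storage")
    return reasons


GIB = 1024**3
# Hypothetical node: working set close to RAM, disk and storage comfortable.
node = NodeStats(working_set_bytes=60 * GIB, ram_bytes=64 * GIB,
                 disk_util_pct=45.0,
                 data_bytes=400 * GIB, storage_bytes=1024 * GIB)
print(sharding_reasons(node))
```

An empty list means none of the assumed thresholds has been crossed; any non-empty result is a prompt to start planning, not an instruction to shard immediately.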

One of the goals of sharding is to distribute data uniformly across multiple servers. If the utilization of server resources is not approximately equal across servers, there may be an underlying problem that warrants investigation. For example, a poorly chosen shard key can result in unbalanced data distribution. In that case most, if not all, of the queries will be directed to the single mongod that manages that data.

In addition, MongoDB may attempt to redistribute documents to achieve a better balance across servers. While redistribution eventually produces a more desirable distribution of documents, rebalancing the data involves a substantial amount of work, which can itself affect performance enough that the SLA is not met.
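One simple way to spot uneven distribution is to compare the largest shard with the average. The sketch below assumes you have already collected a document count per shard by some means; the imbalance_ratio function, the shard names, and the counts are hypothetical.

```python
def imbalance_ratio(docs_per_shard: dict) -> float:
    """Return the largest shard's document count divided by the mean count.

    A value near 1.0 means the shards hold similar amounts of data.
    """
    counts = list(docs_per_shard.values())
    if not counts or sum(counts) == 0:
        raise ValueError("need at least one shard with data")
    mean = sum(counts) / len(counts)
    return max(counts) / mean


# Hypothetical counts: one shard holds most of the documents.
sample = {"shard0000": 9_000_000, "shard0001": 500_000, "shard0002": 500_000}
ratio = imbalance_ratio(sample)
print(f"largest shard holds {ratio:.1f}x the average")
```

The same ratio can be computed for data size or query volume per shard, which is often more telling than document counts when document sizes vary widely.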

By running the db.currentOp() command, you can see what the cluster is doing at the moment, including the rebalancing of documents across shards.

To ensure that data is distributed evenly across all shards in the cluster, it is important to choose a good shard key. The MongoDB documentation includes a tutorial on how to select a good shard key.

High availability of MongoDB replica sets

MongoDB uses native replication to maintain multiple copies of data within a replica set. Replica sets help prevent downtime by detecting failures (server, network, OS, or database) and failing over automatically. It is recommended that all MongoDB deployments be configured with replication.

Self-healing recovery with MongoDB replica sets

Changes to the database on the primary node are replicated to the secondary nodes through a log called the oplog. The oplog contains an ordered set of idempotent operations that are replayed on the secondaries. The size of the oplog is configurable and defaults to 5% of the available free disk space.

As illustrated in the following diagram, by placing replicas appropriately you can provide fault tolerance against server, rack, and data center failures as well as network partitions.

Replication lag should be monitored as part of normal operations. It represents the time it takes for a write on the primary to be replicated to a secondary. Some lag is normal, but growing replication lag can cause problems. Typical causes include network latency, connectivity problems, and disk latency (for example, a secondary whose throughput is lower than the primary's).

Replication status and replication lag can be retrieved with the replSetGetStatus command.
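As a sketch of how lag can be derived from that status output, the function below works on a plain dictionary shaped like a trimmed-down replSetGetStatus result. It assumes each member carries name, stateStr, and optimeDate fields; the replication_lag function, the host names, and the timestamps are hypothetical, and the code does not connect to a server.

```python
from datetime import datetime, timedelta


def replication_lag(status: dict) -> dict:
    """Map each secondary's name to its lag behind the primary."""
    members = status["members"]
    primary = next((m for m in members if m["stateStr"] == "PRIMARY"), None)
    if primary is None:
        raise ValueError("no primary found in status document")
    return {
        m["name"]: primary["optimeDate"] - m["optimeDate"]
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


# Hypothetical, trimmed-down status document.
now = datetime(2016, 1, 1, 12, 0, 0)
sample_status = {
    "members": [
        {"name": "db1:27017", "stateStr": "PRIMARY", "optimeDate": now},
        {"name": "db2:27017", "stateStr": "SECONDARY",
         "optimeDate": now - timedelta(seconds=2)},
        {"name": "db3:27017", "stateStr": "SECONDARY",
         "optimeDate": now - timedelta(seconds=45)},
    ]
}
lags = replication_lag(sample_status)
for name, lag in lags.items():
    print(name, lag.total_seconds(), "seconds behind")
```

Comparing each value with a baseline, rather than with zero, helps separate the normal small lag from the growing lag that signals a problem.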

Log overview

As part of any deployment, you should monitor the application and database logs for errors and other system information. It is important to correlate the application logs with the database logs in order to determine whether activity in the application is ultimately responsible for other problems in the system. For example, a spike in user writes may increase the volume of writes to MongoDB, which in turn may overwhelm the underlying storage system. Without correlating application and database logs, it may take longer to determine that the increase in write volume comes from the application rather than from some process running in MongoDB.

MongoDB Monitoring Tools

MongoDB includes a variety of monitoring tools that enable you to actively manage the operation and performance of your system.

MongoDB Management Service (MMS)

MongoDB Management Service (MMS) provides cloud-based monitoring and backup services that help users optimize clusters, diagnose performance issues, and mitigate operational risk. The MMS backup service will be discussed in a future article.

MMS monitoring supports charts, custom dashboards, and custom alerts. MMS requires minimal installation and configuration. Users install a local agent that tracks hundreds of key health metrics related to database usage across all MongoDB instances, including:

    • Op counters: the number of operations performed per second
    • Memory: the amount of data MongoDB is using
    • Lock percent: the percentage of time spent holding the write lock
    • Background flush: the average time taken to flush data to disk
    • Connections: the number of connections currently open to MongoDB
    • Queues: the number of operations waiting to run
    • Page faults: the number of page faults to disk
    • Replication: the length of the primary's oplog and the replication lag
    • Journal: the amount of data written to the journal

These metrics are securely reported to the MMS service, where they are processed, aggregated, used for alerting, and visualized in a browser. Users can easily assess the health of their clusters across a variety of performance indicators.
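To make the "per second" part of the op counters concrete, the sketch below converts two samples of cumulative counters into rates. It assumes the counters only ever increase between samples, as the opcounters section of serverStatus does while the server stays up; the ops_per_second function and the sample numbers are hypothetical, and this is not a description of how MMS computes its charts.

```python
def ops_per_second(before: dict, after: dict, interval_seconds: float) -> dict:
    """Convert two cumulative counter samples into per-second rates."""
    return {
        op: (after[op] - before[op]) / interval_seconds
        for op in after
        if op in before
    }


# Hypothetical samples taken 60 seconds apart.
first = {"insert": 1_000, "query": 50_000, "update": 2_000, "delete": 100}
second = {"insert": 1_600, "query": 53_000, "update": 2_300, "delete": 100}
rates = ops_per_second(first, second, interval_seconds=60)
print(rates)
```

A negative rate would indicate that the counters were reset between samples, for example by a restart, and that pair of samples should be discarded.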

Hardware monitoring

Munin-node is an open source program that monitors hardware and reports metrics such as disk and RAM usage. MMS collects this data from Munin-node and presents it in the MMS dashboard alongside other data. Because every application and deployment is unique, users should create alerts for spikes in disk utilization, major changes in network activity, and increases in average query length or response time.

Database profiler

MongoDB provides a profiler that can record fine-grained information about database operations. The profiler can log information for all events, or only for events that run longer than a configurable threshold. Profiling data is stored in a capped collection, where users can easily search for relevant events; querying this collection may be easier than trying to parse log files.
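The kind of search described above can be sketched on plain dictionaries. The example assumes each profiling entry has op, ns, and millis fields, as entries in the profiler's collection do; the slow_operations function, the namespaces, and the 100 ms threshold are chosen only for illustration.

```python
def slow_operations(profile_entries, threshold_ms=100):
    """Return entries slower than the threshold, slowest first."""
    slow = [e for e in profile_entries if e["millis"] > threshold_ms]
    return sorted(slow, key=lambda e: e["millis"], reverse=True)


# Hypothetical profiling entries, reduced to three fields.
entries = [
    {"op": "query", "ns": "shop.orders", "millis": 250},
    {"op": "update", "ns": "shop.carts", "millis": 12},
    {"op": "query", "ns": "shop.products", "millis": 1300},
]
slow = slow_operations(entries)
for entry in slow:
    print(entry["ns"], entry["op"], entry["millis"], "ms")
```

Against a live server the same filter and sort would normally be expressed as a query on the profiling collection, so that only the matching entries are transferred.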

Other monitoring tools

There are a variety of monitoring tools that allow you to gain a deep understanding of the MongoDB system from other aspects.

    • mongotop is a tool shipped with MongoDB that tracks and reports the current read and write activity of a MongoDB cluster.
    • mongostat is another tool shipped with MongoDB. It provides a comprehensive overview of all operations, including counts of updates and inserts, page faults, index misses, and many other important metrics related to system health.
    • Linux tools such as iostat, vmstat, netstat, and sar can also provide valuable insight into a MongoDB system.
    • For users on Windows, Performance Monitor (a Microsoft Management Console snap-in) is a very useful tool for measuring a variety of metrics.

For more information about monitoring tools and what to monitor, see the Monitoring Database Systems page in the MongoDB documentation.

Configure MongoDB

Users should store configuration options in the MongoDB configuration file. This allows sysadmins to apply a consistent configuration across the cluster. The configuration file supports all the options available on the MongoDB command line. Installations and upgrades should be automated with popular tools such as Chef and Puppet; the MongoDB community provides and maintains sample scripts for these tools.

A basic MongoDB configuration file resembles the following:

    fork = true
    bind_ip = 127.0.0.1
    port = 27017
    quiet = true
    dbpath = /srv/mongodb
    logpath = /var/log/mongodb/mongod.log
    logappend = true
    journal = true
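One way to check that configuration really is consistent across the cluster is to parse each node's file and compare the options. The sketch below assumes the simple "key = value" format shown above; parse_config and config_drift are hypothetical helper names, and the two inline configurations stand in for files fetched from two nodes.

```python
def parse_config(text: str) -> dict:
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        options[key.strip()] = value.strip()
    return options


def config_drift(reference: dict, other: dict) -> dict:
    """Return {option: (reference_value, other_value)} for options that differ."""
    keys = reference.keys() | other.keys()
    return {
        k: (reference.get(k), other.get(k))
        for k in sorted(keys)
        if reference.get(k) != other.get(k)
    }


# Hypothetical configurations from two nodes.
node_a = parse_config("""
fork = true
port = 27017
dbpath = /srv/mongodb
journal = true
""")
node_b = parse_config("""
fork = true
port = 27018
dbpath = /srv/mongodb
""")
drift = config_drift(node_a, node_b)
print(drift)
```

An option that is missing on one side is reported with None for that side, which makes an omitted setting as visible as a changed one.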

You can learn more about MongoDB configuration options in the documentation. Up-to-date recommendations on operating system, file system, and storage device configuration, as well as other system-related topics, are maintained on the Production Notes page of the MongoDB documentation.

Conclusion

In this article we have shown that the concepts, operations, and processes used to deploy relational databases apply directly to MongoDB, and we have covered best practices for hardware selection, deployment, and monitoring. A more detailed discussion of all these topics is available in the MongoDB Operations Guide (a PDF file).
