Policies of Nagios Monitoring Server (1)

Source: Internet
Author: User


Policy 1: Monitored Object Selection

In a large network, the monitored objects may include servers, firewalls, switches, routers, and other devices, as well as the services running on them. However, we do not need to put every object into the monitoring system. Monitoring test systems, for example, only causes trouble: whoever is on the receiving end gets alarm text messages throughout the night. Selecting the right objects to monitor is therefore a prerequisite for effective monitoring. My personal suggestion is that only objects of high importance that cannot be stopped at will, such as online transaction systems, are worth monitoring. Of course, the users of a server always want you to monitor it, even if it is not that important.

Policy 2: Fault Alarm Method Selection

The boss hopes that we will sit at the computer tirelessly, but that is just wishful thinking. The monitoring system must be given a proper fault alarm mechanism. Common alarm mechanisms currently include email, SMS, MSN, and web page display. Of these, I find SMS the best. While we sleep at night we cannot keep checking email, but a text message can wake us up and tell us that a fault has occurred, before the boss and the users notice it. For organizations without their own SMS channel, leasing an SP service is the safer way. Other methods, such as mobile Feixin, have not been well tested and are not suitable for key business operations. In addition, I use a small trick: the monitoring platform sends me a text message every afternoon, whether or not there is a fault, so that I know the text message interface is working.
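The daily test message can be reduced to one small decision: has today's message already gone out, and is it afternoon yet? The following is a minimal Python sketch of that check, not part of Nagios. The 15:00 send time, the `heartbeat_due` function, and the `send_sms` stand-in are all assumptions made for illustration; a real setup would call whatever SMS gateway the organization leases.

```python
from datetime import date, datetime, time
from typing import Optional

# Assumption: "every afternoon" is taken to mean 15:00 local time.
HEARTBEAT_TIME = time(15, 0)


def heartbeat_due(now: datetime, last_sent: Optional[date]) -> bool:
    """Return True if it is past the send time and no test message went out today."""
    already_sent_today = last_sent == now.date()
    return now.time() >= HEARTBEAT_TIME and not already_sent_today


def send_sms(text: str) -> None:
    """Hypothetical stand-in for the real SMS gateway call."""
    print(f"SMS: {text}")


# Example: it is 15:30 and the last test message was sent yesterday.
now = datetime(2024, 1, 5, 15, 30)
if heartbeat_due(now, last_sent=date(2024, 1, 4)):
    send_sms("monitoring heartbeat: SMS interface OK")
```

If the afternoon message fails to arrive, the SMS path itself is suspect, which is exactly the situation in which a real fault would otherwise go unnoticed.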

Policy 3: Fault Alarm Timeliness and Interval Selection

Because of uncontrollable factors such as network communication, false positives can occur. Sending an alarm after a single failed check is therefore not a good policy. Experience shows that waiting until a check has failed three or four times in a row before sending a message does not delay troubleshooting. If an alarm is triggered on every failed check, the text message storage on your phone fills up quickly, and you will not sleep well.
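The idea of alarming only after several consecutive failures can be sketched in a few lines of Python. This is an illustration of the logic rather than Nagios code; the `FailureCounter` class and its field names are hypothetical. In Nagios itself, the `max_check_attempts` directive of a host or service definition plays a comparable role.

```python
from dataclasses import dataclass


@dataclass
class FailureCounter:
    """Tracks consecutive failed checks for one monitored object."""

    threshold: int = 4  # assumed value; the text suggests three or four
    consecutive_failures: int = 0

    def record(self, check_ok: bool) -> bool:
        """Record one check result; return True when an alarm should fire."""
        if check_ok:
            # Any successful check resets the streak, so isolated
            # network glitches never reach the threshold.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        # Fire exactly once, when the streak first reaches the threshold.
        return self.consecutive_failures == self.threshold


counter = FailureCounter(threshold=4)
# One isolated failure, a recovery, then a sustained outage.
results = [False, True, False, False, False, False, False]
alarms = [counter.record(ok) for ok in results]
print(alarms)
```

Only the fourth failure in a row raises the alarm; the isolated failure at the start is absorbed as a likely false positive.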

Once a fault alarm has been sent, it is generally repeated until the fault is fixed and the object returns to normal, at which point a message similar to "*** is OK!" is sent. Setting the interval between alarms also takes some thought. If it is too short, it keeps eating into your text message fees; if it is too long, it may not be enough to wake a sleeping person. And if no one handles the fault or stops the notification, the alarm keeps being sent.

So what is a suitable range? My practice is to trigger an alarm after four consecutive failed checks, with an alarm interval of 10 minutes. The alarm is sent eight times in total and then stopped. If no one has dealt with the fault by the third alarm, I follow up by phone; if there is still no response, monitoring of the object is cancelled and the event is recorded.
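The numbers above translate into a simple timeline for a fault that nobody handles. The Python sketch below lays it out; `AlarmPolicy` and `alarm_schedule` are hypothetical names, and placing the cancellation right after the eighth alarm is one reading of the text, stated here as an assumption. In Nagios, the repeat interval corresponds to the `notification_interval` directive.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AlarmPolicy:
    interval_minutes: int = 10  # gap between repeated alarms
    max_alarms: int = 8         # stop after this many alarms
    phone_after_alarm: int = 3  # follow up by phone after this alarm


def alarm_schedule(policy: AlarmPolicy) -> List[Tuple[int, str]]:
    """Return (minutes since first alarm, action) pairs for an unhandled fault."""
    schedule = []
    for n in range(1, policy.max_alarms + 1):
        minute = (n - 1) * policy.interval_minutes
        schedule.append((minute, f"sms #{n}"))
        if n == policy.phone_after_alarm:
            schedule.append((minute, "phone call"))
    # Assumption: once the last alarm goes unanswered, give up,
    # stop monitoring the object, and log the event.
    last_minute = (policy.max_alarms - 1) * policy.interval_minutes
    schedule.append((last_minute, "cancel monitoring and record event"))
    return schedule


schedule = alarm_schedule(AlarmPolicy())
for minute, action in schedule:
    print(f"t+{minute:>3} min: {action}")
```

With these defaults the whole sequence spans 70 minutes from the first alarm, which bounds both the text message cost and the time a fault can stay unacknowledged.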

