I keep this blog mainly to share notes and to save myself from scrambling around later to find things. For a while now I have been very interested in the open-source monitoring software Nagios. I have deployed it at scale in a production environment, and in a burst of imagination I have even put its individual sub-modules to work as standalone applications. Here are some notes to start with.
There are plenty of introductions to Nagios online, so I won't write another one. Instead, here is a picture of the current production deployment: about 300 devices and more than 3,000 monitoring modules. Judging by current performance, I estimate it could run up to 6,000 modules.
This post shows how to use the check_http feature through Nagios's NRPE module to automatically downgrade application services, along with email notifications. The goal is to cut the duration of a large-scale failure from tens of minutes to about a minute, or even seconds.
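To make the idea concrete, here is a minimal Python sketch of the kind of HTTP check that check_http performs, plus a simple rule for deciding when to downgrade. It is not the configuration used in this deployment. The function names (`probe`, `classify`, `should_downgrade`), the 1 s / 3 s thresholds, and the "three CRITICAL results in a row" rule are all assumptions for illustration. Only the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) follow the standard Nagios plugin convention.

```python
import time
import urllib.error
import urllib.request

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def probe(url, timeout=5.0):
    """Fetch url once; return (status_code or None, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            code = resp.status
    except urllib.error.HTTPError as exc:
        code = exc.code          # 4xx/5xx still carry a status code
    except (urllib.error.URLError, OSError):
        code = None              # connection refused, timeout, DNS failure
    return code, time.monotonic() - start


def classify(status_code, elapsed, warn=1.0, crit=3.0):
    """Map a probe result to (nagios_code, message).

    warn/crit are response-time thresholds in seconds (assumed values).
    """
    if status_code is None:
        return CRITICAL, "HTTP CRITICAL - no response"
    if status_code >= 500:
        return CRITICAL, "HTTP CRITICAL - status %d" % status_code
    if status_code >= 400:
        return WARNING, "HTTP WARNING - status %d" % status_code
    if elapsed >= crit:
        return CRITICAL, "HTTP CRITICAL - %.3fs response time" % elapsed
    if elapsed >= warn:
        return WARNING, "HTTP WARNING - %.3fs response time" % elapsed
    return OK, "HTTP OK - status %d, %.3fs" % (status_code, elapsed)


def should_downgrade(recent_codes, threshold=3):
    """Hypothetical rule: downgrade after `threshold` CRITICALs in a row."""
    tail = recent_codes[-threshold:]
    return len(tail) == threshold and all(c == CRITICAL for c in tail)
```

A wrapper script would call `classify(*probe(url))`, print the message, and exit with the returned code so that NRPE can pass the result back to Nagios. Requiring several consecutive CRITICAL results before downgrading is one way to avoid reacting to a single slow request; the right threshold depends on the check interval and on how quickly the failure needs to be contained.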
The topology map is shown below. It differs somewhat from the actual topology (readers with trypophobia, view with caution):
Digging deeper into the NRPE module of Nagios