3. Open source system monitoring software
Zabbix VS Nagios VS Open-Falcon
The technology stack for operations monitoring was outlined above. In practice, though, some open source monitoring packages are very comprehensive and cover everything from data collection to data display. If you are a small team and do not want to build your own monitoring platform, these open source packages are a good choice.
Zabbix
Zabbix (pronounced zæbix) is an enterprise-level open source solution that provides distributed system monitoring and network monitoring through a web interface.
Zabbix can monitor a wide range of network parameters to keep server systems running safely, and it provides a flexible notification mechanism so that system administrators can quickly locate and resolve problems.
Zabbix consists of two parts: the Zabbix server and an optional component, the Zabbix agent.
The Zabbix server monitors remote server and network status and collects data through SNMP, the Zabbix agent, ping, port monitoring, and similar methods. It runs on platforms such as Linux, Solaris, HP-UX, AIX, FreeBSD, OpenBSD, and OS X.
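To make "port monitoring" concrete, here is a minimal sketch of an agentless TCP port check of the kind described above. It is not Zabbix code; the function name `port_is_open` and the default timeout are assumptions chosen for illustration.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds.

    Hypothetical helper for illustration only; a real monitoring server would
    also record response time and retry before raising an alarm.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Nothing has to be installed on the monitored host for a check like this, which is what makes it "agentless"; the trade-off is that it only shows whether the port accepts connections, not how the service behind it is performing.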
As an enterprise-level open source distributed monitoring solution, Zabbix can collect millions of metrics from tens of thousands of servers, virtual machines, network devices, and so on. It offers the features of common commercial monitoring software (host performance monitoring, network device performance monitoring, database performance monitoring, monitoring of FTP and other common protocols, multiple alarm methods, and detailed reports and charts). It supports automatic discovery of network devices and servers, and distributed monitoring with centralized display and management of the distributed monitoring points. It is also highly extensible: the server provides a common interface, so you can develop and refine all kinds of monitoring yourself.
Important components of Zabbix:
Zabbix server: the core component. It receives the data reported by agents and organizes all configuration, statistical, and operational data.
Database storage: stores all configuration information and all data collected by Zabbix.
Web interface: the Zabbix GUI.
Proxy: an optional component, commonly used in distributed environments with many monitored nodes. The proxy collects part of the data and forwards it to the server, which reduces the load on the server.
Agent: deployed on the monitored host. It collects local data such as CPU, memory, and database metrics and sends it to the server or a proxy.
Advantages:
All in one: deployment is very convenient.
The server places very low demands on host performance.
Automatic discovery of servers and network devices.
Distributed monitoring with centralized web management.
Supports both agent-based and agentless collection. Hosts are monitored through the agent or IPMI; network and storage devices are monitored through SNMP. The agent supports common UNIX and Windows operating systems.
Comprehensive functionality: data collection, data storage, data display, and event alarms.
Open interfaces and strong extensibility; plug-ins are easy to write.
Disadvantages:
Database bottleneck: with MySQL as the underlying storage, reading and writing large volumes of data puts very heavy pressure on the database.
An agent has to be installed on the monitored hosts.
Poor support for container monitoring; you have to extend it yourself.
Supplemental comparison:
Zabbix is an enterprise-level open source operations platform that provides distributed system monitoring and network monitoring through a web interface. It is also the most widely used monitoring software among Internet companies in China: more than 85% of the users Cloud Intelligence has encountered use Zabbix as their monitoring solution.
Easy to get started, simple to use, powerful, and open source: that is the most intuitive assessment of Zabbix. Zabbix is easy to manage and configure and can generate reasonably attractive graphs. Its automatic discovery feature greatly reduces the daily management workload. Its rich data collection methods and APIs let users collect data flexibly, and its distributed architecture lets it monitor more devices. In theory, the plug-in architecture Zabbix provides can meet any enterprise requirement.
User group: more than 85% of companies in the broader Internet sector.
Advantages:
1. Enterprise-level distributed open source monitoring software that supports multiple platforms
2. Simple installation and deployment, with flexible integration of many data collection plug-ins
3. Powerful: complex multi-condition alarms can be configured
4. Built-in graphing: collected data can be drawn as charts
5. Provides a range of APIs and supports calling scripts
6. When a problem occurs, commands can be executed remotely and automatically (the agent must be granted permission to execute them)
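As an illustration of the API point above: the Zabbix API is a JSON-RPC 2.0 interface, normally served from `api_jsonrpc.php` under the frontend URL. The sketch below only builds request bodies and sends nothing. `apiinfo.version` and `host.get` are real API methods, but the helper name `build_rpc_request` is hypothetical, and authentication is left out on purpose because how it is passed depends on the Zabbix version.

```python
import json
from typing import Any


def build_rpc_request(method: str, params: Any, request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 request body for the given API method.

    Hypothetical helper: it only builds the payload. Sending it (an HTTP POST
    with Content-Type application/json-rpc) and authenticating are not shown.
    """
    body = {
        "jsonrpc": "2.0",   # protocol version required by JSON-RPC 2.0
        "method": method,   # e.g. "apiinfo.version" or "host.get"
        "params": params,   # method-specific parameters
        "id": request_id,   # lets the caller match a response to its request
    }
    return json.dumps(body)


# Ask for the API version (this method takes no parameters).
version_request = build_rpc_request("apiinfo.version", [])

# Ask for the ID and name of every host visible to the caller.
hosts_request = build_rpc_request("host.get", {"output": ["hostid", "host"]}, request_id=2)
```

Because every call has the same envelope, scripts that automate Zabbix usually wrap it once, as here, and vary only `method` and `params`.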
Disadvantages:
1. Batch modification is inconvenient.
2. Although the community is mature, Chinese-language material is relatively scarce and service support is limited.
3. It is easy to get started and to set up basic monitoring, but deeper requirements demand a thorough knowledge of Zabbix and a lot of custom development, which is considerably harder.
4. There are many system-level alarm settings; without filtering they produce a flood of alarm emails. Custom alarms have to be configured by hand, which is a cumbersome process.
5. There is no data summary function. For example, you cannot view the average value across a group of servers without custom development.
6. Data reports require dedicated custom development.
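The kind of summary mentioned in point 5 can be sketched outside the monitoring tool once the values have been exported. The following is a minimal illustration rather than a Zabbix feature: the function name `average_by_group`, the `(group, host, value)` tuple layout, and the sample numbers are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def average_by_group(samples: Iterable[Tuple[str, str, float]]) -> Dict[str, float]:
    """Average one metric per host group.

    Each sample is a (group, host, value) tuple, for example the latest CPU
    utilization of each host. The layout is hypothetical.
    """
    grouped = defaultdict(list)
    for group, _host, value in samples:
        grouped[group].append(value)
    return {group: mean(values) for group, values in grouped.items()}


# Made-up CPU utilization figures (percent) for illustration.
cpu_samples = [
    ("web", "web-1", 40.0),
    ("web", "web-2", 60.0),
    ("db", "db-1", 10.0),
]
group_averages = average_by_group(cpu_samples)
```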
Nagios
The name Nagios stands for "Nagios Ain't Gonna Insist On Sainthood"; the project was originally called NetSaint.
Nagios is a free, open source network monitoring tool that can effectively monitor the status of Windows, Linux, and Unix hosts, network devices such as switches and routers, printers, and so on. When a system or service becomes abnormal, it sends an email or SMS alarm to notify operations staff immediately, and it sends a recovery notification by email or SMS once the status returns to normal.
Nagios monitors the operating status of systems and networks. It can monitor specified local or remote hosts and services, and it provides notification when something goes wrong.
Nagios runs on Linux/Unix platforms and provides an optional browser-based web interface where system administrators can view network status, system problems, and logs.
It is a free and open source IT infrastructure monitoring system. It is powerful and flexible, and can effectively monitor the status of Windows, Linux, VMware, and Unix hosts as well as switches, routers, and other network devices. The core functions of Nagios are monitoring and alarming. Its alarming is very good, but its graphical display is very poor. Nagios is also highly flexible, which means many functions have to be implemented through plug-ins. People without a strong technical background will find it hard to get started; experienced operations engineers, of course, will pick it up quickly.
The features of Nagios are as follows:
Monitoring of network services (SMTP, POP3, HTTP, NNTP, PING, etc.);
Monitoring of host resources (processor load, disk utilization, etc.);
A simple plug-in design that lets users easily add their own service checks;
Parallel service checks;
The ability to define a network hierarchy: "parent" host definitions express the relationships between network hosts, and these relationships are used to detect and distinguish hosts that are down from hosts that are unreachable;
Notification of contacts when a service or host problem occurs and when it is resolved (via email, SMS, or a user-defined method);
Definable event handlers that run when a service or host event occurs, so that problems can be dealt with proactively;
Automatic log rotation;
Support for redundant monitoring hosts;
An optional web interface for viewing current network status, notification and problem history, log files, and so on.
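The "parent" host idea in the list above can be sketched as follows. This is a simplified illustration of the down/unreachable distinction, not Nagios source code: the function name `classify_hosts`, the dictionary layout, and the host names are assumptions, and the parent graph is assumed to contain no cycles.

```python
from typing import Dict, List


def classify_hosts(parents: Dict[str, List[str]], check_ok: Dict[str, bool]) -> Dict[str, str]:
    """Label each host UP, DOWN, or UNREACHABLE.

    parents maps a host to the hosts that sit between it and the monitoring
    server; check_ok holds the raw result of each host check. A host whose
    check fails is UNREACHABLE if it has parents and none of them is UP;
    otherwise it is DOWN.
    """
    states: Dict[str, str] = {}

    def state_of(host: str) -> str:
        if host in states:
            return states[host]
        if check_ok[host]:
            states[host] = "UP"
        else:
            host_parents = parents.get(host, [])
            if host_parents and all(state_of(p) != "UP" for p in host_parents):
                # Every path to this host is broken, so its own state is unknown.
                states[host] = "UNREACHABLE"
            else:
                # At least one parent is reachable, so the host itself has failed.
                states[host] = "DOWN"
        return states[host]

    for host in check_ok:
        state_of(host)
    return states


# Hypothetical topology: monitoring server -> router -> switch -> web.
topology = {"router": [], "switch": ["router"], "web": ["switch"]}
results = {"router": True, "switch": False, "web": False}
host_states = classify_hosts(topology, results)
```

In this example only the switch is reported as DOWN, which points the operator at the real cause instead of raising a separate alarm for every host behind it.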
Supplemental comparison:
Nagios is an open source enterprise-level monitoring system. It can perform basic system monitoring of CPU, disk, network, and other parameters, as well as monitoring of basic services such as SMTP, POP3, HTTP, and NNTP. In addition, by installing plug-ins and writing monitoring scripts, users can implement application monitoring and deploy a hierarchical monitoring architecture for large numbers of monitored hosts and objects.
The defining feature of Nagios is its powerful management core. Although its job is to monitor services and hosts, Nagios itself contains none of that checking code: all monitoring and alarm functions are carried out by plug-ins.
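Because the checks live in plug-ins, it helps to see what one looks like. A Nagios plug-in is simply a program that prints one line of status text (optionally followed by `|` and performance data) and reports its result through its exit code: 0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN. The disk check below is a minimal sketch under those conventions; the function names, the 80%/90% thresholds, and the `used_pct` label are assumptions for illustration.

```python
import shutil
from typing import Tuple

# Exit codes defined by the Nagios plug-in convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_disk(used_percent: float, warn: float = 80.0, crit: float = 90.0) -> Tuple[int, str]:
    """Map a disk usage percentage to a (status code, output line) pair."""
    if used_percent >= crit:
        status, label = CRITICAL, "CRITICAL"
    elif used_percent >= warn:
        status, label = WARNING, "WARNING"
    else:
        status, label = OK, "OK"
    # Text before "|" is for humans; text after it is performance data
    # in the form label=value[unit];warn;crit.
    message = (
        f"DISK {label} - {used_percent:.1f}% used"
        f" | used_pct={used_percent:.1f}%;{warn};{crit}"
    )
    return status, message


def check_disk(path: str = "/") -> int:
    """Run the check against a real filesystem and return the exit code."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    status, message = evaluate_disk(100.0 * usage.used / usage.total)
    print(message)
    return status
```

A deployed plug-in would end by passing the return value of `check_disk()` to `sys.exit()`, since the exit code is what Nagios reads. Keeping the threshold logic in `evaluate_disk` separate from the filesystem call makes it easy to test without touching a real disk.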
User group: enterprises with complex IT environments
Advantages:
1. Failed servers, applications, and devices can be restarted automatically, and logs are rotated automatically
2. Flexible configuration: custom shell scripts can be used, and a distributed monitoring mode is supported
3. Supports redundant monitoring hosts and offers a variety of alarm settings
4. The configuration file can be reloaded by command without interrupting Nagios
Disadvantages:
1. The event console is very weak, and plug-ins are not easy to use
2. Weak handling of performance, traffic, and similar metrics
3. No historical data is available; only alarm events can be seen, which makes it hard to trace the cause of a failure
4. Configuration is complicated, and beginners have to invest considerable time, effort, and cost
Open-Falcon
Open-Falcon is an enterprise-level monitoring system for Internet companies, open sourced by Xiaomi's operations department. It is currently used by Xiaomi, Kingsoft Cloud, Meituan, JD Finance, and Ganji.com. Open-Falcon can be divided into two parts: the graphing component and the alarm component. The graphing component is responsible for collecting, receiving, storing, archiving, sampling, querying, and displaying data (Dashboard/Screen), and it can work on its own as a storage and display solution for time-series data. The alarm component is responsible for alarm strategy configuration (portal), alarm judgment (judge), alarm processing (alarm/sender), user group management (uic), and so on, and it can also work independently. The structure is as follows: