The Big Book of Operations Tools: Open Source Platforms

Tags rrdtool snmp opennms opentsdb

From http://cio.it168.com/a2015/1128/1782/000001782714_all.shtml

[IT168 Technology] In "Operations Requirements," the first article in the Big Book of Operations Tools series, Cloud Wisdom summarized the operations needs of cloud-era enterprises. The sixth of those needs, a "strong demand for open source," comes mainly from operations staff, especially senior technical experts, who like to keep everything under their own control. That calls for open source operations tools.

Most of the open source operations tools popular today, such as Zabbix and Nagios, come from abroad. These products are very powerful, but they demand a high level of technical skill and lack sufficient Chinese documentation and local service support, so ordinary operations staff find them hard to use.

As a result, Chinese IT vendors such as Xiaomi and TalkingData have open-sourced the operations systems they developed in-house. Cloud Wisdom, a representative commercial monitoring service provider, is also gradually open-sourcing parts of its Monitoring Bao product, so that operations staff and developers get easy deployment and ease of use while remaining free to do custom (secondary) development around their own business needs.

Below is Cloud Wisdom's detailed review of open source monitoring products:

Zabbix

Recommended stars: ★★★★★

Zabbix is an enterprise-grade open source operations platform that provides distributed system monitoring and network monitoring through a web-based interface. It is currently the most widely used monitoring software among Chinese Internet companies: more than 85% of the users Cloud Wisdom has encountered build their monitoring solution on Zabbix.

Cloud Wisdom's most direct assessment of Zabbix is that it is easy to get started with, powerful, and free and open source. Zabbix is easy to manage and configure and produces good-looking graphs. Its auto-discovery feature greatly reduces day-to-day management work, its rich set of data collection methods and its API give users flexibility in how data is gathered, and its distributed architecture supports monitoring a larger number of devices. In theory, the plug-in architecture Zabbix provides can meet any enterprise need.

User base: more than 85% of Internet-related enterprises.

Advantages:

1. Enterprise-grade, distributed, open source monitoring software with multi-platform support;

2. Easy to install and manage;

3. Powerful, flexible monitoring, including complex multi-condition alerts;

4. A variety of data collection plug-ins that integrate flexibly;

5. Built-in graphing: collected data can be plotted as graphs;

6. Can also call scripts, which is very convenient;

7. Provides a range of APIs, making it the most customizable of these monitoring tools;

8. Can automatically run remote commands when a problem occurs (execute permission must be set on the agent);
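
The API mentioned in the list above is exposed by Zabbix as JSON-RPC 2.0 over HTTP. The following is a minimal sketch of what building such a request looks like. The helper name `build_zabbix_request` and the host in `ZABBIX_URL` are assumptions made for illustration; `apiinfo.version` is a standard method that needs no authentication. The sketch only constructs the request body and does not contact a server.

```python
import json

# Hypothetical server address; the JSON-RPC endpoint path is api_jsonrpc.php.
ZABBIX_URL = "http://zabbix.example.com/api_jsonrpc.php"


def build_zabbix_request(method, params, request_id=1):
    """Return the JSON body for one Zabbix API call (JSON-RPC 2.0)."""
    payload = {
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "id": request_id,
    }
    return json.dumps(payload)


# apiinfo.version takes no parameters, so params is an empty list.
body = build_zabbix_request("apiinfo.version", [])
print(body)
```

In practice the body would be sent as an HTTP POST to the endpoint with a JSON content type, and most other methods additionally require an authentication token obtained by logging in first.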

Disadvantages:

1. Batch modification of items is inconvenient;

2. The community is mature, but Chinese-language material is relatively scarce and service support is limited;

3. It is easy to get started and cover basic monitoring, but deeper requirements call for thorough knowledge of Zabbix and a large amount of custom (secondary) development, which is difficult;

4. There are many system-level alert settings, so unfiltered alert mail piles up quickly, and alerts for custom items must be configured by hand, which is tedious;

5. No data aggregation feature: for example, you cannot view the average across a group of servers without custom development;

6. Reports require dedicated custom development;

Nagios

Recommended stars: ★★★★☆

Nagios, formerly known as NetSaint, is an open source enterprise-class monitoring system launched in 1999 and developed and maintained by Ethan Galstad. Nagios can perform basic system monitoring of CPU, disk, network, and other parameters, and can also monitor basic service types including SMTP, POP3, HTTP, and NNTP. In addition, by installing plug-ins and writing monitoring scripts, users can implement application monitoring and deploy a hierarchical monitoring architecture covering a large number of hosts and many kinds of objects.

The most important characteristic of Nagios is that its developers designed it as a monitoring hub. Although its job is to monitor services and hosts, Nagios itself contains none of the checking code: all monitoring and alerting is done by plug-ins.
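
Because the checks live in plug-ins, a plug-in is simply a program that prints a one-line status and reports one of four conventional exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The following is a minimal sketch of a disk usage check written in that style. The function names and the 80%/90% thresholds are assumptions chosen for illustration, not part of any shipped Nagios plug-in.

```python
import shutil

# Exit codes defined by the Nagios plug-in convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(used_percent, warn=80.0, crit=90.0):
    """Map a usage percentage to an (exit_code, status_line) pair."""
    if used_percent >= crit:
        return CRITICAL, f"DISK CRITICAL - {used_percent:.1f}% used"
    if used_percent >= warn:
        return WARNING, f"DISK WARNING - {used_percent:.1f}% used"
    return OK, f"DISK OK - {used_percent:.1f}% used"


def check_disk_usage(path="/"):
    """Measure disk usage for path and evaluate it against the thresholds."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, "DISK UNKNOWN - filesystem reports zero size"
    return evaluate(100.0 * usage.used / usage.total)


code, line = check_disk_usage()
print(line)  # a real plug-in would then terminate with sys.exit(code)
```

Nagios reads the printed line as the check output and the process exit code as the state, which is why the core needs no knowledge of what is being measured.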

User base: more than 1 million users worldwide. Many multinational enterprises and organizations use it (Siemens, Philips, Yahoo, Sony, AOL, and others), and it is especially suited to enterprises with complex IT environments.

Advantages:

1. Automated operations: failed servers, applications, and devices can be restarted automatically;

2. Flexible configuration with many monitoring items; checks can be custom shell scripts, and the distributed monitoring mode makes it well suited to large networks;

3. Automatic log rotation;

4. Supports redundant monitoring hosts;

5. Good correlation between service events and host events;

6. The configuration file can be reloaded with a command without interrupting Nagios;

7. A wide variety of alert settings;

Disadvantages:

1. Very weak event console;

2. Weak handling of performance, traffic, and similar metrics;

3. No historical data: only alert events are visible, which makes it hard to trace the cause of a failure;

4. Complex configuration; beginners must invest considerable time and effort;

5. Plug-in usability is poor;

Ganglia

Recommended stars: ★★★★☆

Ganglia is an open source cluster monitoring project started at UC Berkeley and designed to monitor thousands of nodes. It is a scalable distributed monitoring system for high-performance computing platforms. It has been ported to a wide range of operating systems and processor architectures and is currently in use on thousands of clusters around the world.

User base: suited to users running server clusters.

Advantages:

1. Well suited to monitoring system performance: the graphs make each node's working state easy to see, which plays an important role in tuning and allocating system resources sensibly and improving overall system performance;

2. Supports browser access, but cannot monitor node hardware metrics;

3. Suitable for large-scale cluster environments;

4. Easy to deploy; no extra configuration needs to be added per machine;

5. A single server can manage tens of thousands of machines through a tiered hierarchy;

6. Monitoring items can be customized; data is displayed both as tables and as graphs, and a mobile view is supported.

Disadvantages:

1. No built-in message notification system;

2. No alerting mechanism, so problems cannot be reported promptly;

Zenoss

Recommended stars: ★★★★☆

Zenoss Core is the open source edition of Zenoss; the commercial edition is Zenoss Enterprise. As enterprise-grade intelligent monitoring software, Zenoss Core lets IT administrators monitor the state and health of the network infrastructure from a single web console. Its strength comes from a detailed inventory and configuration management database (CMDB), which discovers and manages the assets in a company's IT environment (servers, networks, and other infrastructure devices). Zenoss also provides an event and fault management system tied to the CMDB, which helps make event and alert handling more efficient.

Zenoss combines open source and commercialization well: it absorbs the benefits of open source software, while commercial operation ensures reliable ongoing service.

Advantages:

1. Zenoss's standout feature is its dashboard, which can be configured with many portlets (widgets).

2. Each user's interface is managed separately, so customizing a dashboard does not affect other users.

3. Powerful monitoring coverage (servers, routers and switches, firewalls, storage, databases, middleware).

4. The service pool uses Docker, which makes it easier for users to update and manage the console.

5. Data storage architecture: uses HBase-based OpenTSDB to store data for any time period.

6. Good integration of status monitoring, performance monitoring, and resource management, with a solid reporting mechanism.

7. An intuitive, professional management interface that users find very attractive.

Disadvantages:

1. Resource requirements are high: even when managing only a few devices, it consumes considerable hardware resources such as memory.

2. For Windows systems, the open source edition offers only SNMP; WMI-based checks of CPU, disk, hardware, software, and performance are available only in the paid edition.

Hyperic HQ

Recommended stars: ★★★☆☆

Hyperic HQ is a Java-based monitoring and management platform for web infrastructure that provides visibility into the various technology stacks in a production environment. The key elements of its architecture are the HQ Server, which handles centralized management and persistent storage, and the HQ Agent, which provides the monitoring and control foundation on each monitored platform.

User base: typically used in large computing environments. Its core value is the ability to automatically and easily manage and control thousands of software resources across hundreds of machines. The inventory covers operating systems, application servers, application components, and other software components.

Advantages:

1. Excellent auto-discovery: an asset inventory can be produced with a single click.

2. Monitoring through 75+ resource plug-ins.

3. Maximizes availability: alerts and control actions allow problems to be corrected before they occur.

4. Tracks performance, configuration, and security changes.

Disadvantages:

1. Metrics are provided by default and cannot be customized without development work.

2. Some basic functions are missing, so strong custom (secondary) development skills are required.

OpenNMS

Recommended stars: ★★★☆☆

OpenNMS is an enterprise-grade, Java/XML-based distributed network and system monitoring and management platform. It is a capable tool for managing a network: it shows the status and configuration of the endpoints and servers on the network and provides the information needed to manage it conveniently.

OpenNMS focuses on three areas: service polling, data collection, and event and notification management.

Advantages:

1. Impressive customizable dashboards.

2. Already widely adopted, with more than 15,000 plug-ins available to choose from.

3. Practical search: you can search for nodes by specific service, such as DNS or POP3, and by asset data fields, including location, operating system, and operational status.

4. Comprehensive reporting, with a large number of prebuilt templates and the ability to run ad hoc reports.

Disadvantages:

1. The interface is not intuitive to users

Cacti

Recommended stars: ★★★☆☆

Cacti is a complete network traffic monitoring and graphing solution that builds on RRDtool's data storage and graphing capabilities. It provides a fast poller, advanced graph templates, multiple data collection methods, and user management. Its intuitive, easy-to-use interface handles everything from LAN-sized installations to complex networks with hundreds of devices. Administrators can specify which trees, hosts, and graphs each user may view, authenticate users through LDAP, and add their own templates, which makes Cacti very powerful.

Advantages:

1. Good-looking interface; its main purpose is collecting historical data and drawing graphs;

2. The tree view is highly configurable, so frequently viewed graphs can be placed first;

3. Fine-grained user permission settings;

Disadvantages:

1. The default polling interval is 5 minutes; polling more frequently triggers some bugs;

2. Settings in the web interface are hard to find;

3. Adding a custom graph is cumbersome;

Monitoring Bao (Jiankongbao)

Recommended stars: ★★★★★

Monitoring Bao is Cloud Wisdom's SaaS product for IT performance monitoring. It covers website monitoring, server monitoring, middleware monitoring, database monitoring, application monitoring, API monitoring, and page performance monitoring. It is offered in a free edition, a premium edition, and an enterprise edition, and currently has about 400,000 users. The Monitoring Bao app is also the only product in China that provides mobile monitoring.

User base: hundreds of thousands of users across e-commerce, mobile Internet, advertising and media, online games, education, and healthcare. Industry leaders including Xiaomi, Momo, High-gold, Yonyou, Kingsoft, cattle, Jumei Youpin, Lufax, Ping An of China, the CCB Credit Card Center, Chunyu Doctor, swimming, State Grid, China Telecom, Didi Dache, Spring Airlines, and Phoenix use Monitoring Bao, as do more than 30% of China's top 100 Internet companies.

Advantages:

1. As the earliest provider of a SaaS-based network monitoring platform, Monitoring Bao offers a free standard service for entry-level users, while enterprise users can purchase just the monitoring and alerting resources they need, which keeps operations costs as low as possible.

2. Using more than 300 distributed monitoring nodes around the world, Monitoring Bao actively monitors and analyzes network stability and availability. It supports HTTP(S), FTP, ping, UDP, TCP, SMTP, traceroute, and other protocols, measures CDN effectiveness and DNS status, and provides network-wide performance trend analysis.

3. It captures in-depth server performance metrics in real time on Linux, Unix, and Windows systems and on cloud platforms. Supported system metrics include CPU utilization, CPU load average, memory usage, disk I/O, disk space utilization, network traffic, and process statistics, along with more than 30 kinds of application services. Cloud host monitoring can be turned on without complex configuration. For application service monitoring, Monitoring Bao supports common application types including Apache, Lighttpd, Nginx, Tomcat, IIS, Memcache, and Redis; at the storage layer it supports health and performance monitoring for Hadoop, MySQL, MongoDB, SQL Server, and Oracle.

4. Monitoring Bao is currently the only network monitoring product that supports API monitoring. It simulates user flows through API calls and monitors six request types in real time (GET, POST, PUT, DELETE, HEAD, OPTIONS). It supports JSON, XML, text, and response status verification, as well as importing Postman scripts.

5. Docker monitoring is another feature exclusive to Monitoring Bao. It monitors the CPU, memory, network traffic, and swap status of Docker containers in real time, so developers and operations staff can see clearly what resources their containers consume.

6. Monitoring Bao provides page performance management. It defines page performance indicators based on international standards, checks the status and correctness of loaded elements, analyzes the response time of the full user page load, pinpoints problem elements, and offers optimization recommendations.

7. Timely, effective alert notification is critical to operations. Monitoring Bao lets you set alert thresholds according to your SLA and sends notifications immediately. It covers the most comprehensive set of notification channels: email, SMS, voice call, URL callback, app push, and more. It also provides graded alert notification, pushing different alerts to different people according to the severity of the event, which supports tiered management in the enterprise.

8. Monitoring Bao has open-sourced its smart agent. Users can customize the agent to their business needs while the security of their data is assured.

9. Monitoring Bao offers private deployment to meet the private-network monitoring needs of enterprises and the financial industry.

10. Its team includes senior enterprise IT service experts from Compuware, CA, IBM, and other companies, has more than 5 years of experience delivering localized enterprise-grade SaaS services, and is backed by a technical service team of more than a hundred people.
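
The graded alert notification described in item 7 can be pictured as a routing table from severity level to recipients and channels. The following is a minimal sketch under that reading. The level names, recipient names, and the `ROUTES` table are hypothetical and are not taken from Monitoring Bao itself; only the channel types come from the list above.

```python
# Hypothetical routing table: severity level -> (recipients, channels).
ROUTES = {
    "info": (["oncall-engineer"], ["email"]),
    "warning": (["oncall-engineer"], ["email", "app_push"]),
    "critical": (["oncall-engineer", "team-lead"], ["sms", "voice_call"]),
}


def route_alert(level, message):
    """Return one (recipient, channel, message) tuple per delivery."""
    # Unknown levels fall back to the lowest-severity route.
    recipients, channels = ROUTES.get(level, ROUTES["info"])
    return [(r, c, message) for r in recipients for c in channels]


for delivery in route_alert("critical", "checkout API latency above SLA"):
    print(delivery)
```

Keeping the routing in data rather than code is one way to let each team adjust who is notified at each level without touching the alerting logic.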

Disadvantage: the free edition supports only 6 monitoring points, includes 100 free SMS messages per month, and monitors at a 30-minute interval.

Open-falcon

Recommended stars: ★★★☆☆

Open-falcon is an Internet-oriented, enterprise-grade open source monitoring product developed by the Xiaomi operations team. It grew out of the needs of Internet companies and draws on years of operations experience together with the experience and feedback of SREs, SAs, and developers.

Figure: Open-falcon architecture

User base: released in May 2015. A discussion group of several hundred people has formed, and dozens of enterprises currently use it to varying degrees.

Advantages:

1. Powerful, flexible data collection: auto-discovery; support for falcon-agent and SNMP; support for data actively pushed by users; support for user-defined plug-ins; a data model similar to OpenTSDB's (timestamp, endpoint, metric, key-value tags).

2. Horizontal scalability: supports collecting, evaluating alerts on, storing, and querying billions of data points per cycle.

3. Efficient alert policy management: an efficient portal, policy templates, template inheritance and overriding, multiple alert methods, and callback support.

4. User-friendly alert settings: maximum number of alerts, alert levels, recovery notifications, alert pausing, different thresholds for different time periods, and maintenance windows.

5. High-performance graph component: a single machine supports reporting, archiving, and storing 2 million metrics (at a 1-minute interval).

6. Efficient historical data query component: uses RRDtool's data archiving strategy to return a year of history for hundreds of metrics within seconds.

7. Dashboard: multi-dimensional data display and user-defined screens.

8. High availability: the system has no core single point of failure, is easy to operate and maintain, is easy to deploy, and scales horizontally.

9. Plug-in monitoring framework: through various plug-ins it currently supports Linux host monitoring (with many metrics), Windows host monitoring, MySQL monitoring, Redis monitoring, Memcache monitoring, RabbitMQ monitoring, and switch monitoring.
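
To make the OpenTSDB-like data model in item 1 concrete, the following is a minimal sketch of one data point carrying a timestamp, endpoint, metric, and key-value tags. The class name `DataPoint`, the `value` field, and the `series_key` format are assumptions made for illustration; they are not Open-falcon's actual wire format.

```python
import json
import time
from dataclasses import dataclass, field, asdict


@dataclass
class DataPoint:
    """One sample in an OpenTSDB-like model: who, what, when, plus tags."""
    endpoint: str            # the reporting host or device
    metric: str              # the name of the measurement
    timestamp: int           # Unix time in seconds
    value: float             # the measured value (field name is an assumption)
    tags: dict = field(default_factory=dict)  # free-form key-value labels

    def series_key(self):
        """Identify the time series by endpoint, metric, and sorted tags."""
        tag_part = ",".join(f"{k}={v}" for k, v in sorted(self.tags.items()))
        return f"{self.endpoint}/{self.metric}/{tag_part}"


point = DataPoint(
    endpoint="web-01",
    metric="http.latency_ms",
    timestamp=int(time.time()),
    value=12.5,
    tags={"idc": "bj", "service": "checkout"},
)
print(point.series_key())
print(json.dumps(asdict(point)))
```

The point of the tags is that one metric name can fan out into many series (per data center, per service) without inventing a new metric name for each combination.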

Disadvantage: given Xiaomi's reputation and the strength of its operations team, and given that Open-falcon is fairly complete, open, and free, we believe it will come to hold a very strong position in operations monitoring in China. However, because it was released only recently, monitoring plug-ins for many basic services (such as Tomcat and Apache) are not yet available and many features are still being refined. It also lacks dedicated support staff: there is an open community, but problems are resolved relatively slowly.

OWL

Recommended stars: ★★☆☆☆

OWL is a distributed, enterprise-grade monitoring solution developed by the operations team at the big data company TalkingData. It monitors IT infrastructure resources and supports monitoring of other kinds of data. It integrates the languages and tools operations staff favor (such as Python and shell) and also serves developers, making it easy to add business monitoring metrics flexibly.

Figure: OWL architecture

Because TalkingData is a big data analytics company, the design of OWL takes full account of big data algorithms and distributed storage, which makes alerting more flexible, data analysis richer, and business monitoring more convenient.

User base: used internally at TalkingData. It is expected to be open-sourced at the end of the year, and many operations engineers have already started paying attention to it.

Advantages:

1. Floating alert rules based on complex algorithms: OWL supports not only fixed alert thresholds but also floating ones. Once the preset threshold is reached, the threshold is raised automatically, which reduces the number of notifications sent; after the system returns to normal, OWL automatically reverts to the previous threshold;

2. Flexible, convenient user-defined reports: every user of the monitoring system (network engineers, system engineers, DBAs, DevOps, and so on) can customize their own chart workbench;

3. Truly visual asset management: the latest version of OWL retains an earlier feature, a simulated rack diagram that shows the real assets together with each host's monitoring status, so location and status are visible at a glance;

4. Easy-to-deploy agent with process guarding: OWL's monitoring agent does not depend on the OS, is easy to deploy, supports a variety of plug-ins, and uses a twin-process mechanism to guard its own process;

5. Horizontally scalable underlying storage: OWL uses HBase, which scales out well, with a TSDB layer on top. This sacrifices some flexibility in how data can be queried, but provides relatively good transparency at the storage layer;
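
The floating threshold in item 1 can be sketched as a small state machine: each alert raises the bar for the next one, and a return to normal resets it. OWL's actual algorithm is not described in this article, so the fixed `step` increment and the class name `FloatingThreshold` below are assumptions chosen to illustrate the idea.

```python
class FloatingThreshold:
    """Alert threshold that rises after each alert and resets on recovery."""

    def __init__(self, base, step):
        self.base = base      # preset threshold used while the system is healthy
        self.step = step      # how far to raise it after each alert (assumed rule)
        self.current = base

    def observe(self, value):
        """Return True if this sample should trigger an alert."""
        if value >= self.current:
            self.current += self.step   # the next alert needs a worse value
            return True
        if value < self.base:
            self.current = self.base    # recovered: revert to the preset threshold
        return False


threshold = FloatingThreshold(base=80, step=5)
samples = [70, 82, 83, 86, 90, 60, 81]
alerts = [v for v in samples if threshold.observe(v)]
print(alerts)
```

With a fixed threshold of 80, five of these seven samples would alert; the floating rule suppresses the sample that is no worse than what was already reported, which is the noise reduction the item describes.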

Disadvantage: because the product has not been publicly released, it can only be judged from the published introduction. The product does not yet appear very mature, and its features center on TalkingData's own operations needs, including visual asset management. Apart from alerting, which may go deeper, its other features are similar to Open-falcon's.

That concludes Cloud Wisdom's comparison of the open source operations monitoring tools popular in China. Open source products require little up-front investment and are flexible to use, but their management costs, learning curve, and security make it hard for them to win acceptance from large enterprises and fast-growing Internet companies. Commercial operations products therefore still hold a large share of the enterprise market in China. In a follow-up article, "The Big Book of Operations Tools: Commercial Software," we will compare the strengths and weaknesses of commercial operations tools.
