Monitoring: Selecting a Monitoring System

Source: Internet
Author: User

I. BACKGROUND


With the rapid development of the Internet, cloud businesses of every kind keep growing. Five years ago, the number of Internet companies running tens of thousands of servers probably did not exceed 200. In recent years that number has been rising at a staggering speed, and with the emergence of virtualization and container technology in particular, fleets at the 100,000-server level should gradually become common as well. Before the troops move, monitoring goes first. The importance of monitoring needs no explanation: monitoring has been likened to the eyes of operations, and an operator without monitoring is effectively working blind. But what exactly does a powerful monitoring system need to do, how should it present information so that people readily accept it, and what makes it difficult to implement? Have you thought about this series of questions?


II. ANALYSIS


For a powerful monitoring system, the following features are essential.

1. Data batch acquisition and optimization processing (raw data collection and processing)

2. Centralized display (charts, numeric values, and personalized views)

3. Alarms (targeted alarms, alarm escalation, and alarm suppression)

4. Separation of permissions (user permission isolation)

5. Audits (security audits)


III. DETAILS

1. Data acquisition and processing

Like cooking, you first need to buy the ingredients before you start. A monitoring system needs ingredients too: data. It has to collect huge amounts of data from all over the place, such as CPU, memory, network, and disk metrics, along with other data like server uptime and OS information. Once collected, the data is stored in a central database and then processed uniformly. This raises a question: does all of the data really need to be stored in the database? If it does, space can become a problem. You may think disk hardware is cheap now, but if the raw data is not processed, its growth is beyond your imagination.

Let's do a simple calculation. Say we have 6,000 monitoring items (60 hosts with 100 items each), each collected every 60 seconds. That means we collect 6000/60 = 100 values per second, so the database receives 100 new values per second. These values need to be kept in the database for a certain amount of time, usually one month, six months, or a year. Each new value and its index entry take up some disk space. Suppose we keep data for the minimum period of 30 days. At 100 values per second, 30 days accumulates (30*24*3600)*100 = 259,200,000 values, or about 260M. Depending on the database engine and the type of value received (float, integer, string, log file, etc.), the disk space for a single value may vary from 40 bytes to hundreds of bytes. Typically, each value takes approximately 50 bytes. In our case, 260M values require 260M * 50 bytes = 13GB of disk space. In other words, 60 hosts with 100 monitoring items each use about 13GB in one month. By the same reasoning, 6,000 hosts with 100 monitoring items each, kept for six months, need 600 times 13GB, that is, 7,800GB. With 60,000 or 160,000 hosts the total grows in proportion, and an average of 100 monitoring items per host is already a low estimate.
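The arithmetic above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the function name and parameters are made up for illustration, and the 50 bytes per value is the rough average quoted above, not a measured figure.

```python
# Rough storage estimate for raw (unthinned) monitoring data.
# All names here are illustrative; 50 bytes/value is the article's rough average.

SECONDS_PER_DAY = 24 * 3600


def estimate_storage(hosts, items_per_host, interval_s, retention_days,
                     bytes_per_value=50):
    """Return (values_per_second, total_values, total_bytes)."""
    values_per_second = hosts * items_per_host / interval_s
    total_values = values_per_second * retention_days * SECONDS_PER_DAY
    return values_per_second, total_values, total_values * bytes_per_value


# 60 hosts, 100 items each, one sample per 60 s, kept for 30 days
rate, values, size = estimate_storage(60, 100, 60, 30)
print(f"{rate:.0f} values/s, {values:,.0f} values, {size / 1e9:.2f} GB")

# 6,000 hosts, 100 items each, kept for 180 days: 600 times the first case
_, _, big_size = estimate_storage(6000, 100, 60, 180)
print(f"{big_size / 1e9:,.0f} GB")
```

The exact results are 12.96 GB and 7,776 GB; the 13GB and 7,800GB in the text come from rounding 259,200,000 up to 260M first.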

Do you have to keep every minute of data? Personally, I think that apart from some key business data (such as billing), most of the rest can be thinned out. For example, if data is collected once a minute, the chart has one point per minute. As time passes, though, we may not need that much detail and only want to see a trend. We can generally do this: after one week, keep one point per hour, and after one month, keep one point per day or per half day. Each retained point stores the average, maximum, and minimum of its time period.
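As a minimal sketch of this thinning step, the function below groups per-minute samples into fixed-size time buckets and keeps only the average, maximum, and minimum of each bucket. The function name, the `(timestamp, value)` tuple layout, and the synthetic data are assumptions made for illustration; a real monitoring system would do this inside its own storage layer.

```python
from collections import defaultdict


def downsample(samples, bucket_seconds=3600):
    """Thin out raw samples.

    samples: iterable of (unix_timestamp, value) pairs.
    Returns a list of (bucket_start, average, maximum, minimum),
    sorted by bucket_start.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align each timestamp to the start of its bucket
        buckets[ts - ts % bucket_seconds].append(value)
    return [
        (start, sum(vals) / len(vals), max(vals), min(vals))
        for start, vals in sorted(buckets.items())
    ]


# Two hours of synthetic per-minute samples: 120 raw points become 2 hourly points
raw = [(minute * 60, float(minute % 60)) for minute in range(120)]
hourly = downsample(raw)
```

Passing `bucket_seconds=86400` or `43200` gives the one-point-per-day or per-half-day series for data older than a month.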


2. Data presentation and analysis

A monitoring system at this scale collects a large amount of server data every day, but merely collecting and storing data is useless. We need to display the data centrally and dig into it to turn it into business value. Through data mining and analysis we can guide our online production environment and do cost budgeting and trend analysis, for example: server selection, fault diagnosis, traffic analysis, user distribution, the effect of in-game events, resource requirements, and so on. These are a great guide for cost reduction, fault handling, and online operations. Of course, this leads into big data, which we won't analyze in depth here. Haha, mostly because I haven't reached that level yet.

3. Alarm

Suppose we need to monitor whether a game process is alive. How would we do that? The common practice is to write a script and run it as a scheduled task that checks whether the process exists; once it detects that the process is gone, it sends SMS and e-mail notifications. But there are many problems with this primitive approach:

1. If every machine needs its own script and scheduled task, you may spend every day shuttling between scripts and scheduled tasks.

2. If servers need maintenance, you have to go to each machine to stop its scheduled task, and afterwards go to each machine again to start it back up.

3. If a new requirement comes up one day and you need to restart the process automatically once an outage is detected, you are in for yet another round of scripts and scheduled tasks.

Is any good ops person willing to go around in circles like this every day?
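For concreteness, the primitive per-machine check criticized above could look roughly like the sketch below. It assumes a Linux host where the `pgrep` command is available; the `notify` function is a hypothetical stand-in for a real SMS or e-mail gateway, and the process name `game_server` is made up.

```python
import subprocess


def process_is_running(name):
    """Return True if a process with exactly this name exists (uses pgrep)."""
    result = subprocess.run(
        ["pgrep", "-x", name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # pgrep exits with status 0 when at least one process matched
    return result.returncode == 0


def notify(message):
    # Hypothetical placeholder: a real script would call an SMS/e-mail gateway
    print("ALERT:", message)


def check_once(name, is_running=process_is_running, send=notify):
    """Run one check; send a notification if the process is missing."""
    if is_running(name):
        return True
    send(f"process {name!r} is not running")
    return False

# A scheduled task on each machine would call, for example:
#   check_once("game_server")
```

Every one of the three problems listed above shows up here: the script and its schedule live on each machine, pausing it for maintenance means touching each machine, and adding an automatic restart means editing and redeploying it everywhere.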


4. Separation of Privileges

Once the number of machines reaches a certain scale, the number of people using them has reached a certain scale as well. At that point, if people in different departments should only see information for the machines in the projects they are responsible for, you need to think about separating permissions. Permission separation covers a whole series of operations, including viewing, use, alarms, alarm muting, template settings, and so on.
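One simple way to picture this is a per-project grant table, sketched below. The `User` class, the action names (`view`, `mute`, and so on), and the host and project names are all assumptions for illustration, not the model of any particular monitoring system.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    # Maps project name -> set of allowed actions,
    # e.g. {"game-a": {"view", "alarm", "mute", "template"}}
    grants: dict = field(default_factory=dict)


def is_allowed(user, project, action):
    """True if the user holds this action on this project."""
    return action in user.grants.get(project, set())


def visible_hosts(user, host_projects):
    """host_projects maps host -> project; return the hosts the user may view."""
    return sorted(
        host for host, project in host_projects.items()
        if is_allowed(user, project, "view")
    )


hosts = {"web-01": "game-a", "web-02": "game-a", "db-01": "game-b"}
alice = User("alice", {"game-a": {"view", "mute"}})
```

With this data, `visible_hosts(alice, hosts)` lists only the `game-a` machines, and `is_allowed(alice, "game-a", "template")` is false because viewing and muting do not imply template changes.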


5. Audit

Auditing is a must for any system: every operation by a user of the system must be traceable, so the system has to record which user did what and at what time.
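A minimal sketch of such a record is shown below, assuming an append-only list of JSON lines as the log. The field names and the `append_audit` helper are hypothetical; a real system would write to durable, tamper-resistant storage.

```python
import json
import time


def append_audit(log, user, action, target, now=None):
    """Append one audit entry (who, what, on which object, when) to the log.

    log is a list of JSON strings; entries are only ever added,
    never edited or removed.
    """
    entry = {
        "time": time.time() if now is None else now,
        "user": user,
        "action": action,
        "target": target,
    }
    log.append(json.dumps(entry, sort_keys=True))
    return entry


audit_log = []
append_audit(audit_log, "alice", "mute_alarm", "web-01", now=0)
append_audit(audit_log, "bob", "edit_template", "linux-base", now=60)
```

The `now` parameter exists only so the example is reproducible; in normal use the current time is recorded.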

At this point, we have analyzed what a powerful monitoring system must have. Next we will look at the monitoring systems currently available to choose from. This selection is limited to open source systems; why open source will be discussed later.


This article is from the "Breeze pavilion----Heart boundless Love boundless" blog; please be sure to keep this source: http://51ctoo.blog.51cto.com/5742811/1697170
