Common monitoring concepts and terminology explained

Last Update: 2017-01-17 Source: Internet

Author: User



Server performance monitoring refers to monitoring the operating state and performance indicators of the server system. For the specific indicators, please refer to: What performance indicators can be monitored on the server?

With custom monitoring, you can monitor any data you choose, such as memcached, Java virtual machines, the number of online forum users, and more.

Custom alarm settings are supported for a variety of monitoring projects, so you can set alarm thresholds flexibly, for example a ping response time above 200 ms for 3 consecutive checks, or an average server CPU load above 10 over the last 5 minutes.
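A minimal sketch of the "3 consecutive checks above 200 ms" rule, assuming the alarm is evaluated each time a new sample arrives. The function name `should_alarm` and its parameters are hypothetical and are not part of any monitoring product's API.

```python
def should_alarm(samples_ms, threshold_ms=200.0, consecutive=3):
    """Return True if the latest `consecutive` samples all exceed the threshold.

    samples_ms: response times in milliseconds, oldest first.
    """
    if len(samples_ms) < consecutive:
        # Not enough data yet to decide.
        return False
    recent = samples_ms[-consecutive:]
    return all(sample > threshold_ms for sample in recent)


# Three slow checks in a row trigger the alarm.
print(should_alarm([120.0, 250.0, 300.0, 210.0]))
# One fast check in the last three resets it.
print(should_alarm([250.0, 300.0, 150.0]))
```

Requiring several consecutive breaches, rather than a single one, keeps one slow ping from raising an alarm.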

The URL callback feature allows you to send alert notifications to the URL you specify, giving you more flexibility in handling alarm messages.
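As an illustration of what a URL callback could look like on the sending side, the sketch below posts an alert as JSON using only the Python standard library. The payload fields and the function names are assumptions made for this example; the article does not describe the real callback format.

```python
import json
import urllib.request


def build_callback_request(callback_url, alert):
    """Build (but do not send) an HTTP POST request carrying the alert as JSON."""
    body = json.dumps(alert).encode("utf-8")
    return urllib.request.Request(
        callback_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_callback(callback_url, alert, timeout=5.0):
    """Send the alert and return the HTTP status code of the receiver."""
    request = build_callback_request(callback_url, alert)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Hypothetical alert payload; the field names are illustrative only.
alert = {"item": "ping", "status": "alarm", "response_time_ms": 250}
request = build_callback_request("http://example.invalid/alert-hook", alert)
print(request.get_method(), request.full_url)
```

The receiving URL can then route the message wherever it is needed, for example into a ticket system or a chat channel.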

Ping monitoring performs ICMP ping checks against the specified server and reports its availability, response time, packet loss rate, and so on.

TCP monitoring is the monitoring of the server's specified port availability and response time through the TCP protocol.
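A single TCP check can be sketched as a timed connection attempt. This is only an illustration, assuming that "available" means the TCP handshake completes within the timeout; the function name `tcp_check` is hypothetical.

```python
import socket
import time


def tcp_check(host, port, timeout=3.0):
    """Return (available, response_time_ms) for one TCP connection attempt.

    response_time_ms is None when the connection fails or times out.
    """
    start = time.monotonic()
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:
        return False, None
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return True, elapsed_ms
```

For example, `tcp_check("127.0.0.1", 80)` reports whether something is listening on local port 80 and how long the connection took.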

FTP monitoring refers to the monitoring of the availability and response time of FTP (File Transfer Protocol) servers.

A network operator is a provider of network access services. The domestic (Chinese) network operators used to be China Unicom, China Mobile, China Telecom, China Netcom, China Railcom (CRC), and China Satcom; after the mergers, the current domestic operators are China Mobile, China Unicom, and China Telecom.

Site monitoring refers to external monitoring of a site or server over a specific standard network protocol. It includes a variety of types; please refer to: What types does site monitoring include?

Service performance monitoring refers to monitoring the operational status and indicators of service software such as Apache, MySQL, Nginx, and Lighttpd.

HTTP monitoring is the monitoring of the availability and response time of the site through the HTTP protocol.

DNS monitoring refers to monitoring the availability and response time of DNS (Domain Name System) name resolution servers.

UDP monitoring is the monitoring of the availability and response time of a specified port on the server via the UDP protocol.

SMTP monitoring refers to monitoring the availability and response time of SMTP mail servers.

++++++++++++++ Availability Rate
The availability rate is the percentage of the total time during which a website or server can be accessed normally.
For example, if the home page can be accessed normally throughout a day, its availability rate for that day is 100%.

If the home page is inaccessible for 9 minutes of a day, and a day has 1440 minutes in total, then its availability rate is:
((1440-9)/1440) * 100%, which is about 99.37%.

Date         Availability   Failure time
2017-01-27   100%           -
2017-01-26   99.37%         9 min
2017-01-25   100%           -
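The availability formula above can be checked with a few lines of Python. This is a sketch only; the function name `availability_rate` is hypothetical. The exact result for the example is 99.375%, which the table shows as 99.37%.

```python
def availability_rate(total_minutes, failure_minutes):
    """Return availability as a percentage of the total time."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    return (total_minutes - failure_minutes) / total_minutes * 100.0


# One day has 1440 minutes; the home page was unreachable for 9 of them.
rate = availability_rate(1440, 9)
print(rate)
```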

++++++++++++++ Packet Loss Rate
The packet loss rate is the ratio of the number of packets lost to the number of packets sent.
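As a small worked example of this ratio, the sketch below derives the lost packets from the sent and received counts and expresses the rate as a percentage. The function name and the sample counts are illustrative assumptions.

```python
def packet_loss_rate(sent, received):
    """Return the packet loss rate as a percentage of the packets sent."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    if not 0 <= received <= sent:
        raise ValueError("received must be between 0 and sent")
    lost = sent - received
    return lost / sent * 100.0


# 10 ping packets sent, 9 replies received: 1 packet lost.
print(packet_loss_rate(10, 9))
```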

++++++++++++++ Response Time

First of all, the shorter the response time the better, since it means that users can access your site or server faster.
Response times are color-coded as follows:

Green: the response time is in the normal range and relatively fast;
Blue: the response time is a bit slow and deserves attention;
Yellow: the response time is relatively slow and deserves attention;
Red: the response time is very slow, and you need to find ways to optimize it.

So what exactly is the response time? It is the time from the moment a user sends a request to a site or server until the target content has been downloaded to the client.
For web page/HTTP site monitoring, the response time covers only the web page itself: DNS resolution, establishing the network connection with the site server, server-side processing, and downloading the page content. Each check records a detailed snapshot, and you can use this data to analyze how to optimize performance. You can refer to: HTTP response time detailed analysis
DNS resolution         1.3 ms
Establish connection   19.59 ms
Server computing       26.24 ms
Download content       278.60 ms
Note that the response time of a web page does not include the download time of other components in the page, such as CSS and JavaScript files.
For ping monitoring, the response time is the time reported by the ping command on the command line, which is what we often call the ping value.
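To show how the HTTP timing snapshot above might be used for optimization, the sketch below adds up the four phases and finds the slowest one. It assumes that the total response time is simply the sum of the phases; the dictionary keys and the function name are hypothetical.

```python
def summarize_phases(phases_ms):
    """Return (total_ms, share_by_phase_percent, slowest_phase_name)."""
    total = sum(phases_ms.values())
    shares = {name: ms / total * 100.0 for name, ms in phases_ms.items()}
    slowest = max(phases_ms, key=phases_ms.get)
    return total, shares, slowest


# Values taken from the snapshot in the text, in milliseconds.
snapshot = {
    "dns_resolution": 1.3,
    "establish_connection": 19.59,
    "server_computing": 26.24,
    "download_content": 278.60,
}
total_ms, shares, slowest = summarize_phases(snapshot)
print(round(total_ms, 2), slowest)
```

In this snapshot the download phase dominates, so reducing the page size or enabling compression would help more than tuning DNS.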

++++++++++++++++ CPU Usage
CPU usage is the ratio of the time the CPU is in use to the total CPU run time.
The Linux/Unix operating systems divide CPU utilization into:
User time: the percentage of time spent executing user processes;
System time: the percentage of time spent executing kernel processes and interrupts;
Wait IO: the percentage of time the CPU is idle while waiting for IO;
Idle: the percentage of time the CPU is idle;
User time + System time + Wait IO = Total usage. The Windows operating system distinguishes only an in-use state and an idle state, and CPU usage is the percentage of time in the in-use state.
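The breakdown above can be sketched as a calculation over two samples of cumulative CPU time counters. This follows the article's definition (total usage = user + system + wait IO) and is a simplification: on Linux, /proc/stat exposes further fields such as nice, irq, softirq, and steal, which are ignored here. The function name and dictionary keys are hypothetical.

```python
FIELDS = ("user", "system", "iowait", "idle")


def cpu_usage(prev, curr):
    """Return per-state percentages between two cumulative counter samples.

    prev, curr: dicts with the keys in FIELDS, holding cumulative time counts.
    """
    deltas = {name: curr[name] - prev[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("no CPU time elapsed between samples")
    percent = {name: delta / total * 100.0 for name, delta in deltas.items()}
    # Total usage as defined in the text: user + system + wait IO.
    percent["total_usage"] = percent["user"] + percent["system"] + percent["iowait"]
    return percent


# Hypothetical samples taken a short time apart.
before = {"user": 1000, "system": 400, "iowait": 100, "idle": 8500}
after = {"user": 1030, "system": 410, "iowait": 110, "idle": 8550}
print(cpu_usage(before, after)["total_usage"])
```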

+++++++++++++++ Failure Rate
The failure rate is the proportion of a given period during which the project was in a failed state.
For example, a project is monitored every 2 minutes, so it is checked 5 times in 10 minutes, and 3 monitoring points (A, B, C) take part in each check. With the monitoring results shown in the table below:
Failure rate = (2+0+0+0+0)/10 = 20%

                     1st check     2nd check   3rd check   4th check   5th check
Monitoring point A   unavailable   available
Monitoring point B   unavailable   available   available
Fault?               yes           no          no          no
Fault duration       2 minutes     0 minutes   0 minutes   0 minutes   0 minutes

Note:
Fault definition: a check is recorded as a project fault only if all monitoring points report a fault in that check.
Fault duration: each time a check results in a fault, the fault duration increases by one monitoring interval.
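The two rules in the note can be sketched in code as follows. The per-point results below are hypothetical sample data chosen to match the fault durations in the example (2, 0, 0, 0, 0 minutes); they are not a reconstruction of the original table, and the function name is an assumption.

```python
def failure_rate(checks, interval_minutes):
    """Return the failure rate as a percentage of the monitored period.

    checks: one list per check, holding True for each monitoring point
            that found the project available and False otherwise.
    """
    if not checks:
        raise ValueError("at least one check is required")
    # A check counts as a project fault only if every point is unavailable.
    fault_minutes = sum(
        interval_minutes for points in checks if not any(points)
    )
    period_minutes = interval_minutes * len(checks)
    return fault_minutes / period_minutes * 100.0


# Hypothetical results for points A, B, C over 5 checks at a 2-minute interval.
checks = [
    [False, False, False],  # all points unavailable: project fault
    [True, True, False],    # at least one point available: no fault
    [False, True, True],
    [True, True, True],
    [True, False, True],
]
print(failure_rate(checks, 2))
```

Requiring all monitoring points to fail filters out problems that affect only one region or one network operator.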

++++++++++++++++++ Architecture Composition
The application architecture diagram has at most five layers in a fixed order, from top to bottom: site layer, network layer, service layer, storage layer, physical layer. The project types in each layer are listed in the table below:

Site layer       HTTP, Web page performance management
Network layer    FTP, SMTP, Ping, traceroute, DNS, TCP, UDP
Service layer    Apache, Lighttpd, Nginx, Memcache, Tomcat, IIS
Storage layer    MySQL, MongoDB, Redis, SQL Server, Oracle
Physical layer   Server performance

+++++++++++++++++ Average Availability Rate
The average availability rate is the average of the availability rates of the selected monitoring points.
For example, on the "Availability statistics - monitoring point data" page a user selects Northwest & Telecom, which includes 3 monitoring points: Xi'an Telecom 99.86%, Urumqi Telecom 100%, Lanzhou Telecom 100%. Then:
Average availability rate = (99.86%+100%+100%)/3 = 99.95%
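The same calculation as a short sketch, using the three monitoring points from the example. The function name and dictionary layout are illustrative assumptions.

```python
def average_availability(rates_percent):
    """Return the arithmetic mean of per-monitoring-point availability rates."""
    if not rates_percent:
        raise ValueError("at least one monitoring point is required")
    return sum(rates_percent) / len(rates_percent)


# Availability per monitoring point, in percent, from the example above.
northwest_telecom = {
    "Xi'an Telecom": 99.86,
    "Urumqi Telecom": 100.0,
    "Lanzhou Telecom": 100.0,
}
average = average_availability(list(northwest_telecom.values()))
print(round(average, 2))
```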

This article is from the "Development and operation of the" blog; reprinting is declined.


