When writing an application, we usually record logs for post-mortem analysis: in most cases the log is read some time after a problem has occurred, which makes it a static form of analysis. Often, however, we need to understand how the whole system is running right now, or at a given moment: how many services are currently available, what the response times are, how they change over time, and how often errors occur. This dynamic, near-real-time information is important for monitoring the health of the entire system.
For some applications, such as web services that expose external interfaces, monitoring the real-time operation of the whole system is particularly important. It is like the resource monitor of an operating system: if we can see the CPU, memory and other resources in real time or near real time, we can respond quickly and optimize the system. Some advanced features also depend on these real-time figures. A circuit-breaker ("fusing") mechanism for services, for example, needs real-time statistics on the error ratio and response time, and only real-time monitoring can supply that data so that such robustness features can be implemented.
A few days ago, an interview article on InfoQ made the following point: "Profiling is particularly important. If you have a particularly powerful profiling system, you will know where in the entire system, and on which machine, a lot of CPU, memory, disk IO, network bandwidth or other resources are being spent, and so where optimization will bring the greatest benefit."
The same holds for web service monitoring: where, and on which machine, how much CPU and memory is used, the response time of each service, the frequency of errors, and so on. Once this information is recorded, we can see the dynamic behavior of the service at runtime, which makes it easier to identify errors or locate the points worth optimizing.
The simplest approach is to place instrumentation points at key places in the application, or at the entry point of every operation, send the sampled information to a message queue or an in-memory database, and then have another system read, analyze and display it.
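The idea of an instrumentation point can be sketched as follows. This is a minimal illustration, not part of Metrics.NET: the `Sample` and `Instrumentation` types and the `Measure` helper are hypothetical names, and a `ConcurrentQueue` stands in for the message queue or in-memory database that a separate system would read from.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;

// Hypothetical sample record; the field names are illustrative only.
public sealed class Sample
{
    public Sample(string operation, long elapsedMs, bool failed)
    {
        Operation = operation;
        ElapsedMs = elapsedMs;
        Failed = failed;
    }

    public string Operation { get; }
    public long ElapsedMs { get; }
    public bool Failed { get; }
}

public static class Instrumentation
{
    // Stand-in for a message queue or in-memory database (an assumption of this sketch).
    public static readonly ConcurrentQueue<Sample> Samples = new ConcurrentQueue<Sample>();

    // Wraps an operation: times it, notes whether it threw, and enqueues one sample.
    public static T Measure<T>(string operation, Func<T> body)
    {
        var watch = Stopwatch.StartNew();
        bool failed = true;
        try
        {
            T result = body();
            failed = false;
            return result;
        }
        finally
        {
            watch.Stop();
            Samples.Enqueue(new Sample(operation, watch.ElapsedMilliseconds, failed));
        }
    }
}
```

A service entry point would wrap its body in `Instrumentation.Measure("operationName", () => ...)`, while a separate consumer drains `Samples` with `TryDequeue` and aggregates the results, so the measured code path only pays for one enqueue.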
In Java there is an open source project called Metrics, which captures JVM-level and application-level performance parameters. Its author, Coda Hale, has given a talk explaining what Metrics is and why metrics are necessary in an application system; the video and the slides are available on YouTube and SlideShare respectively. The .NET port of the project is called Metrics.NET.
Before introducing Metrics.NET, we need to look at the basic metric types and their common usage scenarios in order to know how to use them.
1. Metric types
Metrics offers 5 basic metric types: Gauges, Counters, Histograms, Meters and Timers.
Gauges
A gauge is the simplest metric type: it simply returns a value, and is used to record the instantaneous value of some object or quantity.
For example, we can define a gauge to record the number of cities in which a service is currently available:
Metric.Gauge("Service Cities Count", () => Cities.Count, new Unit("a"));
Counters
A counter is a simple 64-bit counter that can be incremented and decremented.
For example, we can define two counters: one to count all service requests, and one to count the requests currently being processed.
/// <summary>
/// keep the total count of the requests
/// </summary>
private readonly Counter totalRequestsCounter = Metric.Counter("Requests", Unit.Requests);

/// <summary>
/// count the current concurrent requests
/// </summary>
private readonly Counter concurrentRequestsCounter = Metric.Counter("SampleMetrics.ConcurrentRequests", Unit.Requests);
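Both counters can then be incremented at the start of request processing:
this.concurrentRequestsCounter.Increment(); // increment number of concurrent requests
this.totalRequestsCounter.Increment(); // increment total number of requests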
When a request has been processed, decrement the count of requests currently being processed by one:
this.concurrentRequestsCounter.Decrement(); // decrement number of concurrent requests
A counter of this kind can also be used to count how many users are currently online, or how many sessions on the server are still valid.
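To show why the decrement belongs in a `finally` block, here is a self-contained sketch of the two-counter pattern. `SimpleCounter` and `RequestTracker` are hypothetical stand-ins written for this illustration; they are not the Metrics.NET `Counter` class, and only mimic its increment/decrement behavior with `Interlocked`.

```csharp
using System;
using System.Threading;

// Illustrative 64-bit counter; not the Metrics.NET implementation.
public sealed class SimpleCounter
{
    private long current;

    public long Value => Interlocked.Read(ref current);
    public void Increment() => Interlocked.Increment(ref current);
    public void Decrement() => Interlocked.Decrement(ref current);
}

public sealed class RequestTracker
{
    public SimpleCounter Total { get; } = new SimpleCounter();
    public SimpleCounter Concurrent { get; } = new SimpleCounter();

    // Counts the request on entry and always releases the concurrent slot,
    // even when the handler throws.
    public T Handle<T>(Func<T> request)
    {
        Total.Increment();
        Concurrent.Increment();
        try
        {
            return request();
        }
        finally
        {
            Concurrent.Decrement();
        }
    }
}
```

Without the `finally`, a request that fails with an exception would leave the concurrent count permanently one too high, and the value would drift upward over time.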
Meters
A meter is an increment-only counter that is typically used to measure the rate at which a series of events occurs. It provides the mean rate as well as exponentially smoothed average rates over 1-minute, 5-minute and 15-minute windows.
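An exponentially smoothed rate of this kind can be sketched as below. This is a simplified illustration under stated assumptions, not the Metrics.NET code: the `EwmaRate` class is hypothetical, it is not thread-safe, and it expects the caller to invoke `Tick` once per fixed interval (a real meter would drive this from a clock).

```csharp
using System;

// Illustrative exponentially weighted moving average of an event rate.
public sealed class EwmaRate
{
    private readonly double alpha;       // smoothing factor derived from the window
    private readonly double tickSeconds; // length of one tick interval
    private long uncounted;              // events marked since the last tick
    private double rate;                 // smoothed events per second
    private bool initialized;

    public EwmaRate(double windowSeconds, double tickSeconds)
    {
        this.tickSeconds = tickSeconds;
        this.alpha = 1.0 - Math.Exp(-tickSeconds / windowSeconds);
    }

    public double RatePerSecond => rate;

    // Records n events in the current interval.
    public void Mark(long n = 1)
    {
        uncounted += n;
    }

    // Folds the events of the interval that just ended into the smoothed rate.
    public void Tick()
    {
        double instantRate = uncounted / tickSeconds;
        uncounted = 0;
        if (initialized)
        {
            rate += alpha * (instantRate - rate);
        }
        else
        {
            rate = instantRate; // the first interval seeds the average
            initialized = true;
        }
    }
}
```

With a 60-second window and 5-second ticks, each quiet interval decays the rate by a factor of `exp(-5/60)`, so recent activity dominates and old bursts fade gradually instead of dropping out abruptly.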
For example, to measure the request rate, such as the number of requests arriving per second, we only need to define a meter:
/// <summary>
/// measure the rate at which requests come in
/// </summary>
private readonly Meter meter = Metric.Meter("Requests", Unit.Requests, TimeUnit.Seconds);
Wherever a request is handled, call the Mark method:
this.meter.Mark(); // signal a new request to the meter
Similarly, to measure the rate of service errors, such as the number of errors per hour, we can define another meter:
/// <summary>
/// measure the rate of service exception
/// </summary>
private readonly Meter errorMeter = Metric.Meter("Error", Unit.Errors, TimeUnit.Hours);
Then, if an exception occurs while a request is being processed, call the errorMeter's Mark method:
this.errorMeter.Mark(); // signal a new error to the meter
Histograms
A histogram measures the distribution of values in a stream of data. It can compute the minimum and maximum, the mean, the variance, and quantiles such as the median or the 75th, 90th, 95th, 98th and 99th percentiles, that is, the value below which that share of the data lies.
For example, suppose we want to measure the length distribution of the request parameters passed to the service. We can define a histogram:
/// <summary>
/// keep a histogram of the input data of our request method
/// </summary>
private readonly Histogram histogramOfData = Metric.Histogram("ResultsExample", Unit.Items);
Then, where the request is handled, call its Update method to record the value:
this.histogramOfData.Update(request.Length, methodName); // update the histogram with the input data
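To make the quantile figures concrete, the sketch below computes them with the nearest-rank method over every recorded value. It is an illustration only: `SimpleHistogram` is a hypothetical class, and keeping all values in memory is an assumption of this sketch that would not suit a long-running service, where a bounded sample of the stream is normally used instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative histogram that stores every value; not the Metrics.NET implementation.
public sealed class SimpleHistogram
{
    private readonly List<long> values = new List<long>();

    public int Count => values.Count;
    public long Min => values.Min();
    public long Max => values.Max();
    public double Mean => values.Average();

    public void Update(long value)
    {
        values.Add(value);
    }

    // Nearest-rank percentile: the smallest recorded value such that
    // at least p percent of all recorded values are less than or equal to it.
    public long Percentile(double p)
    {
        if (values.Count == 0) throw new InvalidOperationException("no values recorded");
        if (p <= 0 || p > 100) throw new ArgumentOutOfRangeException(nameof(p));

        var sorted = values.OrderBy(v => v).ToList();
        int rank = (int)Math.Ceiling(p * sorted.Count / 100.0);
        return sorted[rank - 1];
    }
}
```

Reading the 95th or 99th percentile alongside the mean is what exposes a slow tail: a handful of very large values barely moves the mean but shows up clearly in the upper percentiles.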
Timers
A timer is a combination of a histogram and a meter: it records both the rate at which requests occur and how long they take to process.
You can define a timer:
/// <summary>
/// measure the time rate and duration of requests
/// </summary>
private readonly Timer timer = Metric.Timer("Requests", Unit.Requests);
To use it, call the timer's NewContext method:
using (this.timer.NewContext(i.ToString())) // measure until disposed
{
    ...
}
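The "measure until disposed" pattern can be sketched with a small disposable context built on `Stopwatch`. This is a hypothetical `SimpleTimer` written for illustration, not the Metrics.NET `Timer`: it only collects durations, whereas a real timer would feed them into a histogram and a meter.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Illustrative timer whose context records the elapsed time when disposed.
public sealed class SimpleTimer
{
    private readonly List<TimeSpan> durations = new List<TimeSpan>();
    private readonly object gate = new object();

    // Snapshot of all durations recorded so far.
    public IReadOnlyList<TimeSpan> Durations
    {
        get { lock (gate) return durations.ToArray(); }
    }

    // Starts timing now; the measurement ends when the returned object is disposed.
    public IDisposable NewContext() => new Context(this);

    private void Record(TimeSpan elapsed)
    {
        lock (gate) durations.Add(elapsed);
    }

    private sealed class Context : IDisposable
    {
        private readonly SimpleTimer owner;
        private readonly Stopwatch watch = Stopwatch.StartNew();
        private bool disposed;

        public Context(SimpleTimer owner)
        {
            this.owner = owner;
        }

        public void Dispose()
        {
            if (disposed) return; // record each context only once
            disposed = true;
            watch.Stop();
            owner.Record(watch.Elapsed);
        }
    }
}
```

Because `using` disposes the context even when the body throws, the duration of failed requests is recorded as well, which keeps the timing data from being biased toward successful calls.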
2. Output of metric data
After collecting all this data, we need to display or save it in real time. Metrics.NET provides a variety of reporting interfaces, including the bundled Metrics.NET.FlotVisualization, output to the monitoring system Graphite, output to the open source distributed time series database InfluxDB, and output to Elasticsearch. Configuration is also very simple. For example, to display the metrics directly on an HTTP page, simply set the appropriate endpoint at initialization:
Metric.Config
    .WithHttpEndpoint("http://localhost:1234/metrics/")
    .WithAllCounters()
    .WithInternalMetrics()
    .WithReporting(config => config
        .WithConsoleReport(TimeSpan.FromSeconds(30)));
Then open http://localhost:1234/metrics/ in a browser to see a dashboard of the collected metrics in near real time.
The built-in dashboard is rather basic. In practice, we typically store the collected data in the distributed time series database InfluxDB, and then use the open source charting tool Grafana to present it in real time, which makes it possible to build a dynamic, near-real-time performance monitoring system.
3. Summary
This article describes how to use instrumentation points and measurement tools to monitor application performance in real time, and introduces the Metrics.NET measurement tool for .NET. Unlike traditional logging, this real-time or near-real-time sampling and monitoring of the key indicators of a running system provides a dynamic perspective for application operations and performance optimization, and helps us better understand the performance parameters and state of an application or service in production.
Metrics sampling should intrude on the original system as little as possible, so it is generally best to store the sampled results in a message queue or an in-memory database and display them from there. The sampling frequency is another important factor to consider, because real-time sampling of a larger system generates a large amount of data.
InfluxDB seems to be available only on non-Windows platforms, so this article does not fully demonstrate how to build a complete real-time application monitoring system with Metrics.NET, InfluxDB and Grafana. With the relevant documentation at hand, however, it should not be difficult to set up.
Hopefully this article helps you understand how to monitor application performance in real time and how to build a dashboard of application performance parameters.
4. References
- http://www.infoq.com/cn/articles/interview-alibaba-zhaohaiping
- https://github.com/etishor/Metrics.NET
- http://www.cnblogs.com/shanyou/p/4004711.html
- http://influxdb.com/
- http://grafana.org/
- http://xiaorui.cc/tag/influxdb/