Java Runtime Monitoring, Part 3: Monitoring the performance and availability of an application's ecosystem (1)

Last Update: 2017-02-27 Source: Internet

Author: User

In Parts 1 and 2 of this three-article series, I introduced techniques and patterns for monitoring Java applications, focusing on the JVM and application classes. In this final installment, I present techniques for collecting performance and availability data from an application's dependencies, such as the underlying operating system, the network, and the application's back-end database. At the end of the article, I discuss patterns for managing the collected data and ways to report and visualize it.

Spring-based collectors

In Part 2, I implemented a basic Spring-based component model for managing monitoring services. The rationale for and benefits of this model are:

XML-based configuration makes it easier to manage the large parameter sets needed to configure more complex performance data collectors.

A separation-of-concerns structure lets you use simpler components that interact with one another through Spring's dependency injection.

Spring provides simple collection beans with a lifecycle consisting of initialization, start, and stop operations, plus the option to expose a Java Management Extensions (JMX) management interface on the beans so that they can be controlled, monitored, and troubleshot at run time.

I cover more details of the Spring-based collectors in each of the following sections.
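The initialization, start, and stop lifecycle described above can be sketched in plain Java. This is a minimal illustration, not the article's collector code: the class name `SimpleCollector`, its method names, and the `periodMillis` parameter are hypothetical. In a Spring XML configuration, a bean definition's `init-method` and `destroy-method` attributes could point at methods like `init` and `stop`.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical collector bean; names are illustrative and are not taken
// from the article's source code.
final class SimpleCollector {
    private final long periodMillis;   // collection period, a natural candidate for XML configuration
    private final AtomicLong collections = new AtomicLong();
    private ScheduledExecutorService scheduler;

    SimpleCollector(long periodMillis) {
        this.periodMillis = periodMillis;
    }

    // Initialization: acquire resources, but do not collect yet.
    synchronized void init() {
        scheduler = Executors.newSingleThreadScheduledExecutor();
    }

    // Start: schedule the periodic collection.
    synchronized void start() {
        if (scheduler == null) {
            throw new IllegalStateException("init() has not been called");
        }
        scheduler.scheduleAtFixedRate(this::collect, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    // Stop: cancel scheduled work and release the worker thread.
    synchronized void stop() {
        if (scheduler != null) {
            scheduler.shutdownNow();
            scheduler = null;
        }
    }

    // A real collector would gather metrics here and pass them to a tracer.
    void collect() {
        collections.incrementAndGet();
    }

    // A read-only attribute of the kind a management interface might expose.
    long getCollectionCount() {
        return collections.get();
    }
}
```

Keeping the three phases separate lets a container create and configure the bean first and begin collection only once every dependency has been injected.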

Monitoring hosts and operating systems

Java applications always run on underlying hardware and on an operating system that supports the JVM. A critical component of a comprehensive monitoring infrastructure is the ability to collect performance, health, and availability metrics from the hardware and the OS, typically through the OS. This section covers a number of techniques for obtaining such data and tracing it to the application performance management (APM) system through the ITracer class introduced in Part 1.

Typical OS performance metrics

The following summary lists typical metrics that apply across a wide range of operating systems. Although the details of data collection vary considerably and the data must be interpreted in the context of the given OS, these metrics are essentially equivalent on most standard hosts:

CPU usage: Represents how busy the CPUs on a given host are. It is typically expressed as a percentage; at the lowest level, it is CPU busy time as a percentage of a specific period of elapsed clock time. A host can have multiple CPUs, and a CPU can contain multiple cores, but the OS usually abstracts each core as a CPU. For example, a host with two dual-core CPUs is reported as having four CPUs. Metrics can typically be collected per CPU or as a total resource utilization that represents the aggregate usage of all processors. Whether to monitor each CPU separately or only the aggregate usually depends on the nature of the software and its internal architecture. Standard multithreaded Java applications typically balance their load across all CPUs by default, so aggregate monitoring is appropriate. In some cases, however, individual OS processes are pinned to specific CPUs, and aggregate metrics may not provide the appropriate level of granularity.

CPU usage is usually split into four categories:

System: Processor time spent executing system or OS kernel-level activity

User: Processor time spent executing user activity

I/O wait: Processor time spent idle while waiting for an I/O request to complete

Idle: Time during which there is no processor activity

Two other related metrics are the run queue length (the backlog of requests waiting for CPU time) and context switches (instances of the processor's time allocation switching from one process to another).
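The four-way split can be derived from two samples of cumulative CPU time counters, such as the tick counts many operating systems expose. The sketch below assumes a caller has already read those counters in the order system, user, I/O wait, idle; the class name `CpuBreakdown` and that ordering are assumptions made for illustration, and counter resets between samples are not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative helper: converts two samples of cumulative CPU counters
// into percentages of the elapsed interval.
final class CpuBreakdown {
    // Assumed counter order for this sketch.
    private static final String[] NAMES = {"system", "user", "iowait", "idle"};

    // Returns each category's share (0 to 100) of the time between the two samples.
    static Map<String, Double> percentages(long[] earlier, long[] later) {
        if (earlier.length != NAMES.length || later.length != NAMES.length) {
            throw new IllegalArgumentException("expected " + NAMES.length + " counters per sample");
        }
        long[] delta = new long[NAMES.length];
        long total = 0;
        for (int i = 0; i < NAMES.length; i++) {
            delta[i] = later[i] - earlier[i];
            total += delta[i];
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (int i = 0; i < NAMES.length; i++) {
            // Report zero when no time elapsed rather than dividing by zero.
            result.put(NAMES[i], total == 0 ? 0.0 : 100.0 * delta[i] / total);
        }
        return result;
    }

    // Busy time is taken here as system plus user; I/O wait counts as not busy.
    static double busyPercent(long[] earlier, long[] later) {
        Map<String, Double> p = percentages(earlier, later);
        return p.get("system") + p.get("user");
    }
}
```

The same calculation works per CPU or for the host as a whole, depending on which counters are passed in.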

Memory: The simplest memory metric is the percentage of physical memory that is available or in use. Other considerations are virtual memory, memory allocation and deallocation rates, and finer-grained metrics that indicate which areas of memory are in use.
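As one way to obtain the percentage of physical memory in use, the following sketch parses text in the format of Linux's `/proc/meminfo` and uses its `MemTotal` and `MemAvailable` fields. It is Linux-specific, the class name `MemoryUsage` is hypothetical, and treating `MemAvailable` as the complement of memory in use is a simplifying assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative parser for "Name:   value kB" lines as found in /proc/meminfo.
final class MemoryUsage {
    // Maps each field name to its numeric value (kB for most fields).
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> values = new HashMap<>();
        for (String line : lines) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // not a "name: value" line
            }
            String[] parts = line.substring(colon + 1).trim().split("\\s+");
            try {
                values.put(line.substring(0, colon).trim(), Long.parseLong(parts[0]));
            } catch (NumberFormatException e) {
                // Skip lines whose value is missing or not numeric.
            }
        }
        return values;
    }

    // Percentage of physical memory in use, computed as (total - available) / total.
    static double percentInUse(List<String> lines) {
        Map<String, Long> v = parse(lines);
        Long total = v.get("MemTotal");
        Long available = v.get("MemAvailable");
        if (total == null || available == null || total <= 0) {
            throw new IllegalArgumentException("MemTotal or MemAvailable is missing");
        }
        return 100.0 * (total - available) / total;
    }
}
```

On a Linux host, the input could come from `Files.readAllLines(Paths.get("/proc/meminfo"))`; other operating systems need a different source for the same two numbers.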

Disk and I/O: Disk metrics are a simple (but critical) report of the disk space available or in use on each logical or physical disk device, together with the read and write rates for those devices.
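For the disk space portion of these metrics, the standard `java.io.File` API is enough for a rough sketch. The class name `DiskSpace` is hypothetical, and read and write rates are not covered here because they require an OS-specific source.

```java
import java.io.File;

// Illustrative helper that reports disk space usage per file system root.
final class DiskSpace {
    // Percentage of the partition containing 'path' that is in use.
    // getTotalSpace() returns 0 when the path does not name a partition.
    static double percentUsed(File path) {
        long total = path.getTotalSpace();
        if (total <= 0) {
            return 0.0;
        }
        long usable = path.getUsableSpace();
        return 100.0 * (total - usable) / total;
    }

    // Prints one line per root, for example "/" on Unix or each drive on Windows.
    static void report() {
        File[] roots = File.listRoots();
        if (roots == null) {
            return; // the set of roots could not be determined
        }
        for (File root : roots) {
            System.out.printf("%s %.1f%% used%n", root.getPath(), percentUsed(root));
        }
    }
}
```

Because `getUsableSpace()` reflects what the current JVM can write, the result can differ slightly from what OS tools report as free space.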

Network: The data transfer rate and error rate on network interfaces, usually broken down by higher-level network protocol categories such as TCP and IP.
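Transfer and error rates are usually derived from cumulative interface counters sampled at two points in time. The sketch below parses a per-interface line in the format of Linux's `/proc/net/dev` and computes a byte rate and an error percentage. It is Linux-specific, the class name `NetDevSample` is hypothetical, and it breaks data down by interface rather than by protocol.

```java
// Illustrative holder for the counters of one interface line in /proc/net/dev.
final class NetDevSample {
    final String name;
    final long rxBytes, rxPackets, rxErrors;
    final long txBytes, txPackets, txErrors;

    private NetDevSample(String name, long[] f) {
        this.name = name;
        // Receive columns come first: bytes, packets, errs, ...
        this.rxBytes = f[0];
        this.rxPackets = f[1];
        this.rxErrors = f[2];
        // Transmit columns start at index 8: bytes, packets, errs, ...
        this.txBytes = f[8];
        this.txPackets = f[9];
        this.txErrors = f[10];
    }

    // Parses a line such as "eth0: 1000 10 1 0 0 0 0 0 2000 20 0 0 0 0 0 0".
    static NetDevSample parse(String line) {
        int colon = line.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("not an interface line: " + line);
        }
        String[] fields = line.substring(colon + 1).trim().split("\\s+");
        if (fields.length < 11) {
            throw new IllegalArgumentException("too few counters: " + line);
        }
        long[] values = new long[fields.length];
        for (int i = 0; i < fields.length; i++) {
            values[i] = Long.parseLong(fields[i]);
        }
        return new NetDevSample(line.substring(0, colon).trim(), values);
    }

    // Average rate per second between two cumulative counter readings.
    static double perSecond(long earlier, long later, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        return (later - earlier) * 1000.0 / elapsedMillis;
    }

    // Errors as a percentage of packets; zero when no packets were seen.
    static double errorPercent(long errors, long packets) {
        return packets == 0 ? 0.0 : 100.0 * errors / packets;
    }
}
```

A collector would keep the previous sample for each interface and trace the computed rates at every collection period.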

Processes and process groups: All of the metrics described above can be expressed as totals of activity for a given host. They can also be broken down into the same metrics for individual processes or groups of related processes. Monitoring resource usage by process helps explain the proportion of resources that each application or service on the host consumes. Some applications instantiate only one process; in other cases, a service such as Apache 2 Web Server instantiates a group of processes that collectively represent one logical service.
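On Java 9 and later, the standard `ProcessHandle` API offers one rough way to attribute CPU time to groups of processes. The sketch below groups processes by executable path, which is only an approximation of a logical service; the class name `ProcessGroups` is hypothetical, and processes whose details the JVM cannot read are counted under a placeholder name with zero CPU time.

```java
import java.time.Duration;
import java.util.Map;
import java.util.TreeMap;

// Illustrative grouping of visible processes by executable path (Java 9+).
final class ProcessGroups {
    // Sums the accumulated CPU time of all visible processes per command.
    static Map<String, Duration> cpuTimeByCommand() {
        Map<String, Duration> totals = new TreeMap<>();
        ProcessHandle.allProcesses().forEach(process -> {
            ProcessHandle.Info info = process.info();
            // Both values are optional: the OS may not expose them to this JVM.
            String command = info.command().orElse("(unknown)");
            Duration cpu = info.totalCpuDuration().orElse(Duration.ZERO);
            totals.merge(command, cpu, Duration::plus);
        });
        return totals;
    }

    public static void main(String[] args) {
        cpuTimeByCommand().forEach((command, cpu) ->
                System.out.println(command + " " + cpu.toMillis() + " ms"));
    }
}
```

Sampling these totals periodically and tracing the differences gives a per-group CPU consumption rate comparable to the host-level metrics.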

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
