Comparison of several distributed call chain monitoring components
In a microservices architecture, services are split along different dimensions, and a single request often involves multiple services. Internet applications are built from different sets of software modules, which may be developed by different teams, implemented in different programming languages, and deployed on thousands of servers spanning multiple data centers. As a result, you need tools that help you understand system behavior and analyze performance problems, so that you can quickly locate and resolve failures when they occur.
Distributed call chain monitoring components emerged in this environment. The best known is Dapper, described in a paper published by Google. Dapper was developed to gather more information about the behavior of complex distributed systems and present it to Google's developers. Such distributed systems are particularly cost-effective, because large fleets of low-end servers make an economical platform for Internet services. To understand the behavior of a distributed system in this context, you need to monitor related actions across different applications and different servers.
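To make "related actions across different applications and servers" concrete, here is a minimal sketch of the trace and span model that Dapper-style tracers use. The class and method names (TraceModelSketch, Span, newRoot, newChild) are made up for illustration and do not come from any of the tools discussed here. The idea is only that every span of one request shares a trace ID and that each span records its parent.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicLong;

class TraceModelSketch {

    /** One timed unit of work. Every span of a request carries the same traceId. */
    static final class Span {
        final long traceId;
        final long spanId;
        final Long parentId; // null marks the root span of the trace
        final String name;

        Span(long traceId, long spanId, Long parentId, String name) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.parentId = parentId;
            this.name = name;
        }

        @Override
        public String toString() {
            return "trace=" + traceId + " span=" + spanId + " parent=" + parentId + " " + name;
        }
    }

    // Simple counter for the demo; real tracers typically use random IDs.
    private static final AtomicLong NEXT_ID = new AtomicLong(1);

    /** Starts a new trace: the root span has no parent. */
    static Span newRoot(String name) {
        long id = NEXT_ID.getAndIncrement();
        return new Span(id, id, null, name);
    }

    /** Continues an existing trace: same traceId, new spanId, parent recorded. */
    static Span newChild(Span parent, String name) {
        return new Span(parent.traceId, NEXT_ID.getAndIncrement(), parent.spanId, name);
    }

    public static void main(String[] args) {
        // Hypothetical request path: gateway -> order service -> database.
        Span gateway = newRoot("gateway: GET /order");
        Span order = newChild(gateway, "order-service: loadOrder");
        Span db = newChild(order, "database: SELECT");
        for (Span s : Arrays.asList(gateway, order, db)) {
            System.out.println(s);
        }
    }
}
```

In a real deployment the trace ID and parent span ID travel with the request (for example in HTTP headers), so that spans recorded on different servers can be stitched back into one call chain.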
Most APM (Application Performance Management) tools on the market borrow their theoretical model from the Google Dapper paper. This article focuses on the following APM components:
Pinpoint
Pinpoint is an APM tool for large-scale distributed systems written in Java, a distributed tracing component open-sourced by Naver in Korea. GitHub address: naver/pinpoint ("Pinpoint is an open source APM (Application Performance Management) tool for large-scale distributed systems written in Java"). Anyone interested in performance analysis in the Java world should take a look at this project: it uses the javaagent mechanism to inject bytecode, which lets it add a trace ID and collect performance data. Tools such as New Relic and OneAPM use a similar mechanism for performance analysis on the Java platform.
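The javaagent mechanism mentioned above can be sketched with the standard java.lang.instrument API. This is not Pinpoint's code: the class names and the "com/example/" prefix are assumptions for illustration, and the transformer only records which classes it would instrument instead of rewriting them. A real agent would modify the class bytes (usually with a bytecode library) and would be packaged in a jar whose manifest declares Premain-Class.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class AgentSketch {

    // Names of classes the transformer has seen and would instrument.
    static final List<String> SEEN = new CopyOnWriteArrayList<>();

    /** Called by the JVM for every class being loaded once registered. */
    static final class LoggingTransformer implements ClassFileTransformer {
        private final String prefix; // internal form, e.g. "com/example/"

        LoggingTransformer(String prefix) {
            this.prefix = prefix;
        }

        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            if (className != null && className.startsWith(prefix)) {
                // A real agent would return rewritten bytecode here.
                SEEN.add(className);
            }
            return null; // null tells the JVM to keep the original bytecode
        }
    }

    /** Entry point the JVM invokes before main() when -javaagent is given. */
    public static void premain(String agentArgs, Instrumentation inst) {
        String prefix = (agentArgs == null || agentArgs.isEmpty()) ? "com/example/" : agentArgs;
        inst.addTransformer(new LoggingTransformer(prefix));
    }

    public static void main(String[] args) {
        // Exercise the transformer directly, without starting a JVM agent.
        LoggingTransformer t = new LoggingTransformer("com/example/");
        t.transform(null, "com/example/OrderService", null, null, new byte[0]);
        t.transform(null, "java/lang/String", null, null, new byte[0]);
        System.out.println("would instrument: " + SEEN);
    }
}
```

Because the hook runs at class-load time, the application's source code and build stay untouched, which is why this approach is the least intrusive of the ones compared here.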
Skywalking
GitHub address: wu-sheng/sky-walking. This was open-sourced by a Chinese developer, Wu Sheng. It is also a system for tracing, alerting on, and analyzing the business operations of Java distributed application clusters, and it has more than 400 stars on GitHub. Its functionality is somewhat weaker than Pinpoint's and its plug-ins are not as rich, but it is still a commendable effort.
Zipkin
Official website: OpenZipkin, "A distributed tracing system". GitHub address: openzipkin/zipkin ("Zipkin is a distributed tracing system"). Zipkin was open-sourced by Twitter and is also modeled on Dapper. In Java applications, Zipkin relies on a component called Brave to collect performance data inside the application. Brave GitHub address: https://github.com/openzipkin/brave. This component implements a series of Java interceptors that trace the call path of HTTP/servlet requests and database accesses. You then complete performance data collection for a Java application by adding these interceptors to configuration files such as Spring's. Zipkin gathers the timing data needed to troubleshoot latency problems in a microservices architecture, and covers data collection, storage, lookup, and presentation.
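The interceptor approach can be illustrated with a small sketch that wraps a handler and records how long each call takes. This is not Brave's API: the names (InterceptorSketch, traced, Timing, REPORTED) are hypothetical, and a plain java.util.function.Function stands in for an HTTP or database call so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class InterceptorSketch {

    /** One recorded measurement: what was called and how long it took. */
    static final class Timing {
        final String name;
        final long nanos;

        Timing(String name, long nanos) {
            this.name = name;
            this.nanos = nanos;
        }
    }

    // Stand-in for a reporter that would ship data to a tracing backend.
    // Not thread-safe; good enough for a single-threaded demo.
    static final List<Timing> REPORTED = new ArrayList<>();

    /** Wraps a handler so every call is timed, even when it throws. */
    static <I, O> Function<I, O> traced(String name, Function<I, O> handler) {
        return input -> {
            long start = System.nanoTime();
            try {
                return handler.apply(input);
            } finally {
                REPORTED.add(new Timing(name, System.nanoTime() - start));
            }
        };
    }

    public static void main(String[] args) {
        // The business handler itself contains no tracing code.
        Function<String, String> handler = path -> "200 OK for " + path;
        Function<String, String> tracedHandler = traced("GET /order", handler);

        System.out.println(tracedHandler.apply("/order"));
        for (Timing t : REPORTED) {
            System.out.println(t.name + " took " + t.nanos + " ns");
        }
    }
}
```

The business handler is unchanged; only the wiring that puts the interceptor in front of it has to be edited, which is why this approach touches configuration files and not source code.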
CAT
GitHub address: dianping/cat (Central Application Tracking). This was open-sourced by Dianping; it is quite feature-rich and is used by a number of companies in China. However, it implements tracing by hard-coding instrumentation points ("buried points") into the application code, which is intrusive. This has advantages and disadvantages: the advantage is that you can instrument exactly the places you care about, which is more targeted; the disadvantage is that existing systems must be changed, which many development teams are reluctant to do.
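To show what a hand-written instrumentation point looks like, here is a minimal sketch in which the tracing calls sit directly inside a business method. The Transaction class below is a hypothetical stand-in written for this example and is not CAT's actual API; placeOrder and its pricing rule are likewise invented.

```java
class ManualInstrumentationSketch {

    /** Hypothetical stand-in for a tracing client's transaction object. */
    static final class Transaction {
        final String type;
        final String name;
        private final long startNanos = System.nanoTime();
        String status = "unset";
        long durationNanos = -1;

        Transaction(String type, String name) {
            this.type = type;
            this.name = name;
        }

        /** Marks the end of the measured section. */
        void complete() {
            durationNanos = System.nanoTime() - startNanos;
        }
    }

    // Kept only so the demo can inspect the most recent transaction.
    static Transaction last;

    /** Business method with tracing code written into it by hand. */
    static int placeOrder(int quantity) {
        Transaction t = new Transaction("Service", "placeOrder"); // instrumentation point
        last = t;
        try {
            if (quantity <= 0) {
                throw new IllegalArgumentException("quantity must be positive");
            }
            int total = quantity * 10; // the actual business logic
            t.status = "success";
            return total;
        } catch (RuntimeException e) {
            t.status = "error: " + e.getMessage();
            throw e;
        } finally {
            t.complete();
        }
    }

    public static void main(String[] args) {
        System.out.println("total = " + placeOrder(3));
        System.out.println(last.type + "/" + last.name + " status=" + last.status);
    }
}
```

Most of the method body is now tracing scaffolding. That is the trade-off described above: the developer decides exactly what is measured, but every instrumented method has to be edited.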
Comparison
Among Pinpoint, Zipkin, and CAT, I recommend them in the order Pinpoint -> Zipkin -> CAT.
The reason is simple: the three tools intrude on the program's source code and configuration files to an increasing degree, in that order:
Pinpoint: Requires essentially no changes to source code or configuration files; you only need to specify the javaagent parameter in the start command, which is the most convenient option for operations staff;
Zipkin: Requires changes to configuration files such as Spring's and web.xml, which is relatively troublesome;
CAT: Because instrumentation points must be added to the source code, the work is unlikely to be done by operations staff alone and requires deep involvement from developers, many of whom are reluctant to add such code.
Compared with traditional monitoring software such as Zabbix, APM focuses on analyzing performance bottlenecks in internal execution and in calls between systems, which makes it easier to pinpoint the specific cause of a problem. Traditional monitoring software only provides scattered monitoring points and metrics, so even when an alarm fires you may not know where the problem is.