Like other large and medium-sized Internet applications, the microblogging platform is made up of many distributed components. Every HTTP request a user sends to the application servers through a browser or mobile client passes through many business systems or system components and leaves footprints along the way. Scattered across those components, however, this data is of limited use for troubleshooting or process optimization. In such a typical cross-process/cross-thread scenario, collecting and analyzing these logs in one place becomes particularly important. In addition, collecting performance data for each footprint and applying flow control or degradation to subsystems according to policy are important factors in keeping the microblogging platform highly available. The Watchman system was built with these goals in mind: track the full call chain of every request, collect performance data for each service on that chain, compute metrics from the data, compare them against performance targets (SLAs), and feed the results back into flow control. In the industry, Twitter's Zipkin and Taobao's Eagle Eye system are similar systems.
Such a system usually has several design goals:
Low invasiveness (non-invasiveness): As a non-business component, it should intrude on other business systems as little as possible, or not at all, and remain transparent to its users. This greatly reduces the burden on developers and lowers the barrier to adoption.
Flexible application policy (application-policy): The scope and granularity of the collected data can be decided as needed.
Timeliness (time-efficient): Every stage, from data collection and generation, through calculation and processing, to presentation or feedback control, should be as fast as possible.
Decision support (decision-support): Whether the data can play a role at the decision-support level, especially from a DevOps perspective.
Watchman System Architecture diagram
How does the Watchman system meet these design goals?
Since the aim is to track the call chain and collect data, the common practice is to write logs from instrumentation points placed in the code. This means intrusive code changes everywhere data is needed, and possibly new dependencies as well. Taobao's Eagle Eye system, for example, logs data and passes the request context (request-context) through instrumentation points on both sides of a cross-process remote call (stub and skeleton).
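To make the cost of manual instrumentation concrete, here is a minimal Java sketch of logging and passing a request ID on both sides of a remote call. It is an illustration only, not Eagle Eye's actual API: the class name, the `X-Request-Id` header, and both helper methods are hypothetical, and the "remote call" is simulated by handing a header map from one method to the other.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

class ManualTracingSketch {
    // Hypothetical header name used to carry the request ID between processes.
    static final String TRACE_HEADER = "X-Request-Id";

    // Client side (stub): copy the request ID into the outgoing headers and log the call.
    static Map<String, String> clientSend(String requestId, String service) {
        Map<String, String> headers = new HashMap<>();
        headers.put(TRACE_HEADER, requestId);
        System.out.println("[client] requestId=" + requestId + " calling " + service);
        return headers;
    }

    // Server side (skeleton): restore the request ID from the incoming headers and log it.
    static String serverReceive(Map<String, String> headers, String service) {
        String requestId = headers.get(TRACE_HEADER);
        System.out.println("[server] requestId=" + requestId + " handling " + service);
        return requestId;
    }

    public static void main(String[] args) {
        String requestId = UUID.randomUUID().toString();
        // The header map stands in for the wire format of a real remote call.
        Map<String, String> headers = clientSend(requestId, "timeline");
        String seen = serverReceive(headers, "timeline");
        System.out.println("same request on both sides: " + requestId.equals(seen));
    }
}
```

Every call site that should be traced needs code of this kind, which is the intrusiveness the paragraph above describes.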
By contrast, the watchman-runtime component uses bytecode enhancement to weave the enhanced logic in during class loading (load-time weaving), so that the request context is passed across processes and threads. For the cross-thread case, the javaagent of the watchman-enhance component modifies the JDK's own thread pool implementations (ThreadPool and other Executor classes) while the application starts up and loads classes. When client code submits a task (execute or submit), the incoming Runnable/Callable object is wrapped in an implementation with tracing capability (proxy pattern), which inherits the request context (request-context) from the parent thread or initializes a new one, as shown in the following illustration:
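Alongside the illustration, the wrapping step can be sketched in plain Java. This is a minimal sketch under stated assumptions, not Watchman's implementation: the wrapper is applied by hand here, whereas the text describes it being woven into the JDK thread pools by a javaagent, and the names `TracingRunnable`, `REQUEST_CONTEXT`, and `contextSeenByWorker` are hypothetical. The request context is reduced to a single request ID held in a `ThreadLocal`.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class TracingRunnableSketch {
    // Hypothetical holder for the per-thread request context (here just a request ID).
    static final ThreadLocal<String> REQUEST_CONTEXT = new ThreadLocal<>();

    // Proxy that captures the submitting thread's context and restores it in the worker.
    static final class TracingRunnable implements Runnable {
        private final Runnable delegate;
        private final String parentContext;

        TracingRunnable(Runnable delegate) {
            this.delegate = delegate;
            // The constructor runs on the submitting (parent) thread.
            this.parentContext = REQUEST_CONTEXT.get();
        }

        @Override
        public void run() {
            String previous = REQUEST_CONTEXT.get();
            REQUEST_CONTEXT.set(parentContext); // inherit the context from the parent thread
            long start = System.nanoTime();
            try {
                delegate.run();
            } finally {
                long micros = (System.nanoTime() - start) / 1_000;
                System.out.println("requestId=" + parentContext + " elapsedMicros=" + micros);
                // Restore the worker's earlier state so pooled threads do not leak context.
                if (previous == null) {
                    REQUEST_CONTEXT.remove();
                } else {
                    REQUEST_CONTEXT.set(previous);
                }
            }
        }
    }

    // Runs a task on a pool thread and returns the context that task observed.
    static String contextSeenByWorker(String requestId) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            REQUEST_CONTEXT.set(requestId);
            String[] seen = new String[1];
            Future<?> done = pool.submit(new TracingRunnable(() -> seen[0] = REQUEST_CONTEXT.get()));
            done.get(); // wait for the task; also makes the write to seen[0] visible here
            return seen[0];
        } finally {
            REQUEST_CONTEXT.remove();
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("worker saw: " + contextSeenByWorker("req-42"));
    }
}
```

With load-time weaving, the `new TracingRunnable(...)` call would not appear in client code at all; the enhanced thread pool would apply the wrapper inside execute or submit, which is what keeps the approach low-invasive.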