Description: These are reading notes on the Google paper "Dapper, a Large-Scale Distributed Systems Tracing Infrastructure"; a complete translation can be found here. Related reading: the paper "Uncertainty in Aggregate Estimates from Sampled Distributed Traces" and Zipkin, Twitter's open-source system modeled on Google's Dapper.
Dapper was originally designed to trace how online service systems process requests. In a search system, for example, a single user request is handled by several subsystems, and that handling happens on different machines and even in different clusters. When request handling goes wrong, it is important to identify the problem quickly and pinpoint which component is at fault; this is exactly the problem Dapper solves.
Tracing of system behavior must be continuous, because anomalies are unpredictable and may be hard to reproduce, and it must be ubiquitous and pervasive, otherwise important points may be missed. From this, Dapper has three primary design goals: low overhead, application-level transparency, and scalability. In addition, the tracing data it produces must be available for rapid analysis, which helps users see the state of their online services in near real time.
Implementation Method

Low overhead: not every request is traced; requests are sampled, and the trace data is sampled a second time when it is collected.

Application-level transparency: the tracing code is inserted into shared base libraries such as the threading, control-flow, and RPC libraries. At Google, applications all use the same threading, control-flow, and RPC libraries, so modifying only these libraries is enough to provide tracing. When a thread handles a traced request, Dapper associates a trace context with the thread via thread-local storage. The trace context holds span-related attributes such as the trace and span IDs. For asynchronous work, Google developers typically build callbacks with a common control-flow library and dispatch them to a thread pool or an executor. Dapper guarantees that every callback stores its creator's trace context, and when the callback is invoked that context is re-associated with the executing thread; in this way Dapper can trace asynchronous processing as well. For traced RPC calls, the span and trace IDs are also passed from the client to the server. Functionally, this code is responsible mainly for span creation, sampling, and writing logs to local disk. Because many applications depend on it, maintenance and bug fixing are difficult, so it must be extremely stable and robust, and it also has to be lightweight; in fact this part of the code is implemented in C++ and totals fewer than 1000 lines.
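Dapper's actual instrumentation lives in Google's C++ base libraries; purely as an illustration of the context-propagation idea described above, here is a minimal Python sketch of thread-local trace context and callback wrapping (all names are made up, not Dapper's real API):

```python
import threading

# Thread-local slot that holds the trace context of the request the
# current thread is working on.
_local = threading.local()

class TraceContext:
    """Span-related attributes a Dapper-like tracer pins to a thread."""
    def __init__(self, trace_id, span_id, parent_id=None):
        self.trace_id = trace_id
        self.span_id = span_id
        self.parent_id = parent_id

def current_context():
    return getattr(_local, "ctx", None)

def set_context(ctx):
    _local.ctx = ctx

def wrap_callback(fn):
    """Capture the creator's trace context so that an asynchronous callback,
    dispatched later to a thread pool or executor, is traced as part of the
    same request."""
    creator_ctx = current_context()
    def wrapped(*args, **kwargs):
        set_context(creator_ctx)   # re-associate the context on the worker thread
        try:
            return fn(*args, **kwargs)
        finally:
            set_context(None)
    return wrapped
```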
Dapper also lets users obtain a Tracer object directly and attach their own custom annotations with arbitrary content. To prevent users from over-logging, Dapper provides a user-configurable parameter that caps the amount of annotation data.
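To make the cap concrete, here is a small self-contained sketch; the class name, byte-based budget, and default limit are assumptions for illustration, not Dapper's actual mechanism:

```python
class AnnotationBuffer:
    """Holds user annotations for one span, dropping them once a
    configurable byte budget is exhausted (the cap value is illustrative)."""
    def __init__(self, max_bytes=1024):
        self.max_bytes = max_bytes
        self.items = []
        self.used = 0

    def add(self, message):
        size = len(message.encode("utf-8"))
        if self.used + size > self.max_bytes:
            return False   # over the per-span cap: silently drop the annotation
        self.items.append(message)
        self.used += size
        return True
```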
Tracing requires that each request be tagged with a unique ID (a 64-bit integer in Dapper) that identifies it. In Dapper, a trace (the record of one request's processing) is in fact a tree whose nodes are called spans; the root node is called the root span. The paper illustrates this with a figure.

Note that a single span may contain information from multiple hosts; in fact, every RPC span contains information from both the client and the server. However, the clocks of the client and the server are skewed relative to each other. The paper does not say exactly how this is resolved, only that the following fact can be used to establish upper and lower bounds on the timestamps: an RPC request is sent by the client before it is received by the server, and the response is sent by the server before it is received by the client.

Dapper's data collection works as follows: span data is first written to local log files, then collected and written into BigTable, with each trace stored as one row of the table. BigTable's sparse table structure is well suited to storing traces, because each trace may contain an arbitrary number of spans. The whole collection pipeline is out-of-band: it consists of independent processes completely unrelated to request processing, so handling of the request itself is not affected. If collection were in-band, with trace data carried back in the RPC response messages, it would affect the application's network behavior; moreover, RPC calls may not be perfectly nested, and some RPCs may return before the calls they depend on have returned.

Dapper provides APIs that allow users to access the trace data directly. Google's in-house developers can build general-purpose or application-specific analysis tools on top of these APIs, which has had a surprisingly large effect on Dapper's usefulness and influence.
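As a sketch of the structures just described, the following models a trace tree of spans keyed by random 64-bit IDs, with the four RPC timestamps whose ordering bounds the clock skew; the field and function names are illustrative, not Dapper's actual schema:

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional

def new_id() -> int:
    """Dapper uses 64-bit integers as trace and span IDs; drawing them at
    random makes collisions extremely unlikely."""
    return random.getrandbits(64)

@dataclass
class Span:
    trace_id: int
    span_id: int
    parent_id: Optional[int]          # None marks the root span
    name: str
    annotations: List[str] = field(default_factory=list)
    # An RPC span carries timestamps from both sides of the call. The facts
    # "client sends before the server receives" and "server replies before
    # the client receives" bound the clock skew between the two hosts.
    client_send: float = 0.0
    server_recv: float = 0.0
    server_send: float = 0.0
    client_recv: float = 0.0

def new_root_span(name: str) -> Span:
    return Span(trace_id=new_id(), span_id=new_id(), parent_id=None, name=name)

def new_child_span(parent: Span, name: str) -> Span:
    return Span(trace_id=parent.trace_id, span_id=new_id(),
                parent_id=parent.span_id, name=name)
```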
Tracing Overhead

If the additional overhead of tracing is too high, users will simply turn it off, so low overhead is essential. Sampling reduces overhead, but naive sampling can produce unrepresentative results; Dapper uses an adaptive sampling mechanism to satisfy both performance and representativeness requirements. Trace generation overhead matters most for Dapper, because collection and analysis can be temporarily switched off: as long as the data has been generated, it can still be collected and analyzed later. During trace generation, the largest cost is creating and destroying spans and annotations. Creating and destroying a root span takes 204 ns on average, while an ordinary span takes only 176 ns; the difference is that the root span must also generate a globally unique trace ID. If a span is not sampled, the cost of adding an annotation to it is almost negligible, about 9 ns; if it is sampled, the average cost is 40 ns. These measurements were taken on a 2.2 GHz x86 server. Writing to local disk is the most expensive operation in the Dapper runtime, but the writes can be asynchronous and batched, so in practice they only affect applications with very high throughput.
Reading the trace data out through the Dapper daemon also incurs some overhead. According to the observations reported, however, the Dapper daemon's CPU usage always stays below 0.3%, its memory footprint is very small, and the added network overhead is light: each span averages only 426 bytes, and trace collection accounts for less than 0.01% of the production systems' total network traffic.
For applications where a single request may generate a large amount of trace data, overhead is further reduced by sampling. The tracing overhead must always stay at a very low level so that users can enable it with confidence. The initial sampling strategy was simply to select one trace out of every 1024 requests. For heavily requested services this still captures plenty of valuable information, but for lightly loaded services it can make the sampling frequency too low and miss important information. The strategy was therefore changed to sample by time, guaranteeing a fixed number of sampled traces per unit of time, which gives better control over both the sampling frequency and the overhead.
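A small sketch of the two policies just described, with a token-bucket style limiter standing in for the time-based sampler; the class and parameter names are assumptions, not Dapper's actual sampler:

```python
import time

class RateLimitedSampler:
    """Sample at most a fixed number of traces per unit of time, so lightly
    loaded services still get traced while busy ones stay cheap."""
    def __init__(self, traces_per_second: float = 1.0):
        self.rate = traces_per_second
        self.allowance = traces_per_second
        self.last = time.monotonic()

    def should_sample(self) -> bool:
        now = time.monotonic()
        # Refill the budget in proportion to elapsed time, capped at the rate.
        self.allowance = min(self.rate,
                             self.allowance + (now - self.last) * self.rate)
        self.last = now
        if self.allowance >= 1.0:
            self.allowance -= 1.0
            return True
        return False

def uniform_sampler(trace_id: int, rate: int = 1024) -> bool:
    """The original policy: trace roughly one request in every 1024."""
    return trace_id % rate == 0
```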
Applications

Users can access trace data directly through DAPI (the Dapper "Depot API"). DAPI provides three access methods: (1) access by trace ID; (2) large-scale bulk access through a MapReduce job, in which the user only implements a function that takes one Dapper trace as its argument and performs the desired processing, and the framework calls that function for every trace in the specified time window (a sketch of the pattern follows below); (3) access by index, since Dapper indexes the trace data. Because trace IDs are randomly generated, users usually need to look traces up by service name or machine name; in fact, Dapper's index is a composite of (service name, machine name, timestamp).
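Here is a rough sketch of the bulk-access pattern, with a plain loop standing in for the MapReduce driver; `for_each_trace`, the trace layout, and the example data are illustrative, not the real DAPI:

```python
def for_each_trace(traces, user_fn):
    """Stand-in for the DAPI MapReduce driver: the user supplies one
    function that receives a single trace, and the framework applies it
    to every trace in the requested time window."""
    return [user_fn(trace) for trace in traces]

def slowest_span(trace):
    """Example user function: report the longest span in one trace."""
    return max(trace["spans"], key=lambda s: s["end"] - s["start"])["name"]

# Tiny fabricated input, in place of traces fetched from the trace depot.
traces = [
    {"trace_id": 1, "spans": [
        {"name": "frontend", "start": 0.000, "end": 0.120},
        {"name": "backend",  "start": 0.010, "end": 0.090},
    ]},
]
print(for_each_trace(traces, slowest_span))   # ['frontend']
```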
Most users use Dapper through an interactive web interface. A typical session looks like this:
1. The user enters the service and time window of interest, selects how execution patterns should be identified (here, by span name), and chooses the metric they care about most (here, service latency).
2. The page shows a performance summary of all distributed execution patterns for the given service; the user can sort them as needed and then select one for a detailed view.
3. Once the user has selected an execution pattern, a graphical representation of it is shown, and the user can click to select the part they care about.
4. The system displays a histogram based on the metric chosen in step 1 and the execution pattern chosen in step 3. Here it is a histogram of the latency distribution for Getdocs; the user can click an example on the right to select a specific execution trace to inspect.
5. Detailed information about that execution trace is displayed above a timeline that the user can expand or collapse to see the cost of each component of the execution, where green represents processing time and blue represents time spent on the network.
Lessons Learned

During development, Dapper helps users improve performance (analyze request latency and identify unnecessary serialization on the critical path), perform correctness checks (verify that user requests are actually sent to the right service), understand the system (request processing may depend on many other systems; Dapper helps users understand overall latency and redesign for minimal dependencies), and test (new code is verified for correct behavior and performance through Dapper traces before release). For example, by using Dapper the Ads Review team reduced their latency by two orders of magnitude.
The tracing system is also integrated with the exception monitoring system: if an exception occurs within the context of a sampled Dapper trace, the corresponding trace and span IDs are attached as metadata to the exception report, and the front-end exception monitoring service provides links back to the tracing system. This helps users understand what was happening when the exception occurred.
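A minimal sketch of how such metadata attachment might look, assuming a simple dictionary-based report format; the function and field names are hypothetical:

```python
from typing import Optional

def exception_report(exc: Exception,
                     trace_id: Optional[int],
                     span_id: Optional[int]) -> dict:
    """Build an exception report. If the failing request was being sampled
    by the tracer, attach its trace and span IDs so the exception
    monitoring front end can link straight back to the trace."""
    report = {"type": type(exc).__name__, "message": str(exc)}
    if trace_id is not None:
        report["trace_id"] = trace_id
        report["span_id"] = span_id
    return report

# Example: an exception raised while handling a sampled request.
print(exception_report(ValueError("bad query"), trace_id=42, span_id=7))
```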
Addressing long-tail latency. Dapper helps users analyze latency problems in complex system environments. Transient degradation of network performance does not affect system throughput, but it has a significant impact on latency. Many expensive query patterns are caused by unexpected interactions between services, and Dapper makes discovering this kind of problem very easy.
Inferring service dependencies. Google maintains a very large number of clusters, each hosting a variety of tasks, and there may be dependencies between tasks. Each team needs to know exactly which services its tasks depend on in order to identify bottlenecks or move services. Dependencies between services are complex and dynamic and are hard to determine from configuration files alone, but by combining Dapper's trace data with the DAPI MapReduce interface, inter-service dependencies can be determined automatically.
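As an illustration of the idea, here is a sketch that derives dependency edges from parent/child spans; the record layout and service names are assumptions, not the actual DAPI schema:

```python
from collections import defaultdict

def infer_dependencies(spans):
    """Count service-to-service edges from parent/child span pairs. Each
    span record is assumed (for this sketch) to carry the name of the
    service that produced it."""
    by_id = {s["span_id"]: s for s in spans}
    edges = defaultdict(int)
    for s in spans:
        parent = by_id.get(s.get("parent_id"))
        if parent and parent["service"] != s["service"]:
            edges[(parent["service"], s["service"])] += 1
    return dict(edges)

spans = [
    {"span_id": 1, "parent_id": None, "service": "web-frontend"},
    {"span_id": 2, "parent_id": 1,    "service": "search-backend"},
    {"span_id": 3, "parent_id": 2,    "service": "bigtable"},
]
print(infer_dependencies(spans))
# {('web-frontend', 'search-backend'): 1, ('search-backend', 'bigtable'): 1}
```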
Helping network administrators analyze application-level network activity across clusters, and identify the causes of expensive network requests.
Many storage systems are shared. GFS, for example, has many users: some access GFS directly, while others access it indirectly, for instance through BigTable. Without Dapper, such shared systems would be very hard to debug; with the data Dapper provides, the owner of a shared service can easily rank its users by metrics such as network load and request time.
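A hedged sketch of that kind of attribution, assuming a simplified trace layout; the field names, service names, and byte counts are made up for illustration:

```python
from collections import defaultdict

def load_by_top_level_caller(traces, shared_service="gfs"):
    """Roll a shared service's traffic up to the application at the root of
    each trace, so the service owner can rank its users by load."""
    totals = defaultdict(int)
    for trace in traces:
        caller = trace["root_service"]             # application at the top of the trace
        for span in trace["spans"]:
            if span["service"] == shared_service:  # work done inside the shared service
                totals[caller] += span["bytes"]
    return dict(totals)

traces = [
    {"root_service": "websearch", "spans": [{"service": "gfs", "bytes": 4096}]},
    {"root_service": "ads",       "spans": [{"service": "bigtable", "bytes": 512},
                                            {"service": "gfs", "bytes": 1024}]},
]
print(load_by_top_level_caller(traces))   # {'websearch': 4096, 'ads': 1024}
```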
Firefighting. Not all firefighting can be done with Dapper. For example, a Dapper user fighting a fire needs access to the freshest data, but may not have time to write new DAPI code or wait for periodic reports. For a service experiencing high latency, Dapper's user interface is not well suited to quickly locating the latency bottleneck; however, it is possible to interact directly with the Dapper daemon, which makes it easy to collect the freshest data. In a catastrophic failure there is usually no need to look at aggregate statistics, since a single example trace explains the problem. For shared storage services, though, aggregated information is very important: Dapper's aggregate data can be used for post-mortem analysis, but if the aggregation cannot be completed within about 10 minutes of the problem appearing, its usefulness is greatly diminished. So for shared services, Dapper is not as effective in firefighting as one would like.
By opening the data up to users, Dapper stimulated their creativity, and many unexpected applications were built. Applications without tracing support only need to be recompiled against the new libraries to gain tracing, so migration is easy.
There are, however, some areas where improvements are needed:
Coalescing effects. We usually assume that the various subsystems process one request at a time. In some cases, however, requests are buffered and a single operation is then performed on a group of them (such as a disk write). In this situation, the traced request is not really being processed on its own.
Tracing batch workloads. Dapper was designed for online serving systems, with the original intent of understanding the chain of system behavior triggered by a single user request to Google. Offline data processing has the same kind of need, though; in that case, the trace ID could be associated with some other meaningful unit of work, such as a key (or key range) in the input data.
Finding the root cause. Dapper can quickly find which part of the system is the bottleneck, but it is less effective at finding the root cause. For example, a request may be slow not because of anything it does itself, but because many other requests are queued ahead of it. Users can report application-level parameters, such as queue sizes, to the tracing system to help with this.
Logging kernel-level information. There are many tools for tracing and profiling kernel execution, but it is difficult to gracefully bind kernel-level information directly to an application-level trace context. A compromise solution is planned: take snapshots of kernel activity parameters at the application level and associate them with the active span.