Diagnosing and debugging complex systems is a complicated business. It often takes several levels of diagnostic data to figure out the possible causes of a latency problem.

A distributed system is made up of many servers that depend on each other to serve a user's request, and at any moment a process in the system may be handling a large number of requests. In highly concurrent servers there is no easy way to isolate the events that belong to one request's lifetime, and therefore no good visibility into what happened in response to that request.

As Go has become a popular language for writing servers in recent years, we have become aware of the need to understand what happens inside a Go process over the lifetime of a request. Many runtime activities occur during program execution: scheduling, memory allocation, garbage collection, and so on. But in the past it was not possible to associate user code with runtime events, or to help users analyze how those runtime events affect their performance.

Site Reliability Engineers (SREs) may ask for help in diagnosing a latency problem and want someone to help them understand and optimize a particular server. Even for people who are proficient in Go, estimating the impact of the runtime on the specific situation they are experiencing is quite involved. There is no easy way to explain why some requests show very high latency. Distributed tracing lets us pinpoint which handler needs a closer look, but then we have to dig deeper: was it the GC, the scheduler, or I/O that made us wait so long to serve a request?

To someone outside the system, such as an SRE, all that is visible when latency strikes is that they waited longer than expected (356 ms) to receive a response; they know nothing beyond that.

![](https://raw.githubusercontent.com/studygolang/gctt-images/master/debugging-latency/1.png)

*A user makes a GET /messages request and waits 356 ms to receive a response.*

From the point of view of developers who have access to distributed traces, it is possible to see the breakdown of the latency and how much each service contributes to the total. With distributed tracing we have much better visibility into the situation.

![](https://raw.githubusercontent.com/studygolang/gctt-images/master/debugging-latency/2.png)

*Latency details seen through the distributed traces you collect.*

In this case, serving /messages required three internal RPCs: auth.AccessToken, cache.Lookup and spanner.Query. We can see how much each RPC contributes to the latency, and this time auth.AccessToken took longer than usual. We have successfully narrowed the problem down to a specific service. Now we can look at the source of auth.AccessToken to find the code responsible for the high latency, or we can check whether the problem reproduces on other instances of the auth service.
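The per-RPC breakdown in the trace above is possible because each internal call runs inside its own child span. As a rough illustration only (the client calls below are hypothetical stand-ins, not the code behind the real /messages service), a handler instrumented with OpenCensus might look roughly like this:

```go
package main

import (
	"context"

	"go.opencensus.io/trace"
)

// Stand-ins for the real clients in the article; only the span names matter here.
func accessToken(ctx context.Context) error  { return nil }
func cacheLookup(ctx context.Context) error  { return nil }
func spannerQuery(ctx context.Context) error { return nil }

// handleMessages wraps each internal RPC in its own child span, which is what
// lets the tracing backend attribute latency to auth, cache and Spanner.
func handleMessages(ctx context.Context) error {
	ctx, span := trace.StartSpan(ctx, "/messages")
	defer span.End()

	authCtx, s := trace.StartSpan(ctx, "auth.AccessToken")
	err := accessToken(authCtx)
	s.End()
	if err != nil {
		return err
	}

	cacheCtx, s := trace.StartSpan(ctx, "cache.Lookup")
	err = cacheLookup(cacheCtx)
	s.End()
	if err != nil {
		return err
	}

	queryCtx, s := trace.StartSpan(ctx, "spanner.Query")
	err = spannerQuery(queryCtx)
	s.End()
	return err
}

func main() {
	_ = handleMessages(context.Background())
}
```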
With Go 1.11, we will have additional support in the execution tracer to point out runtime events within the scope of an RPC call. With this new feature, users can gather more information about what happened during the lifetime of a call. In this case we focus on the section inside the auth.AccessToken span. A total of 30+18µs was spent on networking, 5µs on blocking system calls, 21µs on garbage collection, and 123µs on actually executing the handler, most of which went to serialization and deserialization.

![](https://raw.githubusercontent.com/studygolang/gctt-images/master/debugging-latency/3.png)

By looking at this level of detail, we can finally say that we were unlucky enough to run into the GC during the RPC call, and that we spent a surprisingly long time on serialization/deserialization. Engineers can then point to recent changes to the auth.AccessToken message to explain and fix the performance problem. They can also check whether garbage collection regularly affects this request on the critical path and optimize the way memory is allocated.

## Go 1.11

In Go 1.11, Go's execution tracer introduces some new concepts, APIs and tracing features:

- user events and user annotations, see [runtime/trace](http://tip.golang.org/pkg/runtime/trace);
- association between user code and the runtime;
- the possibility of correlating execution tracer data with distributed traces.

The execution tracer introduces two higher-level concepts, *region* and *task*, so users can instrument their code.

A region is an area of code for which you want to collect tracing data; a region begins and ends within the same goroutine. A task, on the other hand, is a logical group that related regions are associated with; the start and end of a task can be in different goroutines. We expect users to start an execution tracer task for each distributed tracing span, create regions, enable the execution tracer as soon as a problem occurs, record some data, analyze the output, and eventually instrument their RPC frameworks end to end.

## Hands-on testing

Although these new features will only be available in Go 1.11, you can already [follow the instructions to install Go from source](http://golang.org/doc/install/source). I also recommend trying them out together with distributed tracing; I recently added support for creating an execution tracer task for each span created by [OpenCensus](https://github.com/census-instrumentation/opencensus-go).

```go
import (
	"runtime/trace"

	octrace "go.opencensus.io/trace" // aliased so it does not collide with runtime/trace
)

ctx, span := octrace.StartSpan(ctx, "/messages")
defer span.End()

trace.WithRegion(ctx, "connection.init", conn.init)
```

If you are using the [gRPC](https://github.com/census-instrumentation/opencensus-go/tree/master/examples/grpc) or [HTTP](https://github.com/census-instrumentation/opencensus-go/tree/master/examples/http) integrations, you don't need to create the spans manually, because they are created for you automatically. Inside your handler, you can simply use the incoming context with runtime/trace.

Register the [pprof.Trace](https://golang.org/pkg/net/http/pprof/#Trace) handler so you can collect execution tracer data whenever you need it:

```go
import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers, including /debug/pprof/trace
)

go func() {
	log.Println(http.ListenAndServe("localhost:6060", nil))
}()
```

When you need execution tracer data, record it right away and start the visualizer:

```
$ curl http://server:6060/debug/pprof/trace?seconds=5 -o trace.out
$ go tool trace trace.out
2018/05/04 10:39:59 Parsing trace...
2018/05/04 10:39:59 Splitting trace...
2018/05/04 10:39:59 Opening browser. Trace viewer is listening on http://127.0.0.1:51803
```
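The curl command assumes the pprof endpoint is reachable from where you are debugging. If it is not, the same kind of trace can be written from inside the process with runtime/trace directly. The following is a minimal sketch of that alternative (the fixed five-second window and the trace.out file name are arbitrary choices for illustration, not something the article prescribes):

```go
package main

import (
	"log"
	"os"
	"runtime/trace"
	"time"
)

func main() {
	// Write the execution trace to a local file instead of serving it over HTTP.
	f, err := os.Create("trace.out")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if err := trace.Start(f); err != nil {
		log.Fatal(err)
	}
	// Record for a bounded window; in a real server you would tie this to a
	// debug endpoint or a signal rather than a fixed sleep.
	time.Sleep(5 * time.Second)
	trace.Stop()
}
```

Either way, the resulting trace.out is opened with the same `go tool trace` viewer.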
Then, at /usertasks, you can see the latency distribution of the execution tracer tasks created for helloworld.Greeter.SayHello.

![](https://raw.githubusercontent.com/studygolang/gctt-images/master/debugging-latency/4.png)

*Latency distribution of the RPC task.*

You can click on the 3981µs bucket to analyze in more detail what happened during the lifetime of that particular RPC. Likewise, /userregions lists the regions that were collected. You can see the connection.init region and the corresponding number of records. (Note that connection.init was added manually to the gRPC framework source code for demonstration purposes; more fine-grained instrumentation is still work in progress.)

![](https://raw.githubusercontent.com/studygolang/gctt-images/master/debugging-latency/5.png)

*Latency distribution of the regions.*

If you click on any of the links, it gives you more details about the regions that fall into that latency bucket. In the following example, we look at the regions in the 1000µs bucket.

![](https://raw.githubusercontent.com/studygolang/gctt-images/master/debugging-latency/6.png)

*A region in the 1000µs bucket spends extra time waiting on the GC and the scheduler.*

Here you see the fine-grained latency details. You can see that the 1309µs region overlapped with garbage collection, which added overhead on the critical path in the form of GC work and scheduling. Apart from that, executing the handler and handling the blocking system call took roughly the same amount of time.

## Limitations

Although the new execution tracer features are powerful, there are some limitations:

- A region can only start and end in the same goroutine. The execution tracer currently cannot automatically record data that spans multiple goroutines, so regions have to be instrumented manually (see the sketch below). The next big step is to add fine-grained instrumentation to standard packages such as RPC frameworks and net/http.
- The execution tracer's output format is hard to parse, and `go tool trace` is the only standard tool that understands it.
- There is no simple way to automatically correlate execution tracer data with distributed tracing data, so we collect them separately and correlate them afterwards.
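To make the first limitation concrete, here is a minimal sketch of the manual instrumentation using the new runtime/trace API; the task and region names are invented for illustration. The task follows the request across goroutines through the context, while each region must begin and end in the goroutine that created it:

```go
package main

import (
	"context"
	"runtime/trace"
	"sync"
)

// This assumes a trace is already being recorded, for example through the
// /debug/pprof/trace handler shown earlier.
func handleRequest() {
	// A task groups everything done on behalf of one logical operation, such
	// as a single RPC; unlike a region, it may span several goroutines.
	ctx, task := trace.NewTask(context.Background(), "demo.Request")
	defer task.End()

	trace.WithRegion(ctx, "prepare", func() {
		// Work done in the current goroutine.
	})

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		// The task is carried here through ctx, but this region must begin
		// and end inside this goroutine; it cannot cross the boundary.
		defer trace.StartRegion(ctx, "background.work").End()
		// Work done in the worker goroutine.
	}()
	wg.Wait()
}

func main() {
	handleRequest()
}
```

When no trace is being recorded, these annotations are designed to be cheap, so they can stay in the code path permanently.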
## Conclusion

Go is working to become a powerful runtime for online services. With the data from the execution tracer, we are one step closer to having high visibility into our production servers, and to more actionable data when problems arise.

via: https://medium.com/observability/debugging-latency-in-go-1-11-9f97a7910d68
Author: JBD  Translator: Krystollia  Proofreader: polaris1119
This article was translated by GCTT and proudly presented by the Go Language Chinese Network.
Would you also like to join the ranks of translators and make your own contribution to open source? You are welcome to join GCTT!
The translation work and the published translations are for learning and exchange purposes only. Translations are produced under the terms of the CC-BY-NC-SA license. If our work infringes on your rights, please contact us promptly.
Reposting under the CC-BY-NC-SA license is welcome; please mark and keep the original/translation links and the author/translator information in the text.