Debugging performance issues in Go programs: CPU

Note: This article is a translation of "Debugging performance issues in Go programs", written by Dmitry Vyukov on 05/10/2014-07:06.
Note: The original text is long, so only this part is translated here; the remaining parts will be translated in follow-up articles.

Suppose you have a Go program and want to improve its performance. Several tools can help with this task. They can identify various types of hotspots (CPU, IO, memory), and hotspots are where you need to concentrate in order to significantly improve performance. Another outcome is also possible: the tools can help you spot obvious performance defects in the program. For example, you prepare an SQL statement before each query when you could prepare it once at program startup. Another example is an O(N^2) algorithm that has somehow slipped in where an obvious O(N) one exists and is expected. To identify such cases, you need to sanity-check what you see in profiles. In the first case, for example, significant time spent preparing SQL statements would be the red flag.
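As a hypothetical illustration of the second kind of defect, the sketch below checks a slice for duplicates in two ways. The function names hasDuplicateQuadratic and hasDuplicateLinear are invented for this example and do not come from the original article. The quadratic version is the kind of code that would show up unexpectedly high in a profile when a linear pass is available.

```go
package main

import "fmt"

// hasDuplicateQuadratic compares every pair of elements: O(N^2) time.
func hasDuplicateQuadratic(xs []int) bool {
	for i := range xs {
		for j := i + 1; j < len(xs); j++ {
			if xs[i] == xs[j] {
				return true
			}
		}
	}
	return false
}

// hasDuplicateLinear remembers the values seen so far in a map:
// O(N) expected time at the cost of O(N) extra memory.
func hasDuplicateLinear(xs []int) bool {
	seen := make(map[int]struct{}, len(xs))
	for _, x := range xs {
		if _, ok := seen[x]; ok {
			return true
		}
		seen[x] = struct{}{}
	}
	return false
}

func main() {
	xs := []int{3, 1, 4, 1, 5}
	fmt.Println(hasDuplicateQuadratic(xs), hasDuplicateLinear(xs))
}
```

Both functions return the same answers; only the cost grows differently with the input size, which is what a CPU profile would reveal.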

It is also important to understand the various bounding factors for performance. For example, if the program communicates over a 100 Mbps network link and already uses more than 90 Mbps, there is not much you can do in the program to improve its performance. Similar bounding factors exist for disk IO, memory consumption, and computational tasks.

With this in mind, we can look at these available tools.

Note: These tools can interfere with each other. For example, precise memory profiling skews CPU profiles, and goroutine blocking profiling affects the scheduler trace. Use the tools in isolation to get more accurate information.

Note: All descriptions here are based on the Go 1.3 release.

CPU Profiler

The Go runtime contains a built-in CPU profiler, which shows what percentage of CPU time each function consumes. There are 3 ways to access it:

    1. The simplest way is to use the -cpuprofile flag of the go test command. For example, the following command:
      $ go test -run=none -bench=ClientServerParallel4 -cpuprofile=cprof net/http
      writes a CPU profile of the given benchmark to the cprof file.
      Then:
      $ go tool pprof --text http.test cprof
      prints a list of the hottest functions.
      Several output types are available; the most useful are --text, --web and --list. Run go tool pprof to get the complete list.
      The obvious drawback of this option is that it works only for tests.
    2. The net/http/pprof package. This is the ideal solution for network servers. You only need to import net/http/pprof and collect profiles with the following command:
      $ go tool pprof --text mybin http://myserver:6060/debug/pprof/profile
    3. Manual profile collection. You need to import runtime/pprof and add the following code to the main function:
      // flagCpuprofile is assumed to be a *string command-line flag.
      if *flagCpuprofile != "" {
          f, err := os.Create(*flagCpuprofile)
          if err != nil {
              log.Fatal(err)
          }
          // Start sampling; the profile is flushed when main returns.
          if err := pprof.StartCPUProfile(f); err != nil {
              log.Fatal(err)
          }
          defer pprof.StopCPUProfile()
      }

The profile is written to the specified file; visualize it the same way as in the first option.

The --web option renders the profile as a call graph in the browser (the example image from the original article is not reproduced here).

You can investigate a single function with the --list=funcname option. For example, the following profile shows that the time was spent in the append call:

    .      .   93: func (bp *buffer) WriteRune(r rune) error {
    .      .   94:     if r < utf8.RuneSelf {
    5      5   95:         *bp = append(*bp, byte(r))
    .      .   96:         return nil
    .      .   97:     }
    .      .   98:
    .      .   99:     b := *bp
    .      .  100:     n := len(b)
    .      .  101:     for n+utf8.UTFMax > cap(b) {
    .      .  102:         b = append(b, 0)
    .      .  103:     }
    .      .  104:     w := utf8.EncodeRune(b[n:n+utf8.UTFMax], r)
    .      .  105:     *bp = b[:n+w]
    .      .  106:     return nil
    .      .  107: }

The original article links to detailed information on the pprof tool and a description of the numbers in the graph.

The profiler uses 3 special entries when it cannot unwind the stack: GC, System and ExternalCode. GC is time spent in garbage collection; see the Memory Profiler and Garbage Collector Trace sections for optimization suggestions. System is time spent in the goroutine scheduler, stack management code, and other auxiliary runtime code. ExternalCode is time spent in native dynamic libraries.

Here are some tips on how to interpret the information you see in profiles.

If a lot of time is spent in the runtime.mallocgc function, the program is probably making an excessive number of small memory allocations. The profile will tell you where the allocations come from. See the Memory Profiler section for suggestions on how to optimize this case.
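One common way to cut allocation counts, sketched here under assumptions, is to size a slice once instead of letting append grow it repeatedly. The function names squaresGrow and squaresPrealloc and the package-level sink variable are hypothetical; testing.AllocsPerRun is used only to make the difference measurable.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is a package-level variable so the slices escape to the heap
// and the allocations are not optimized away.
var sink []int

// squaresGrow lets append grow the backing array as needed, which
// reallocates and copies several times for large n.
func squaresGrow(n int) []int {
	var out []int
	for i := 0; i < n; i++ {
		out = append(out, i*i)
	}
	return out
}

// squaresPrealloc sizes the backing array once up front.
func squaresPrealloc(n int) []int {
	out := make([]int, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, i*i)
	}
	return out
}

func main() {
	const n = 10000
	grow := testing.AllocsPerRun(100, func() { sink = squaresGrow(n) })
	pre := testing.AllocsPerRun(100, func() { sink = squaresPrealloc(n) })
	fmt.Printf("allocations per call: grow=%.0f prealloc=%.0f\n", grow, pre)
}
```

The exact counts depend on the Go version and its slice growth policy, so treat the printed numbers as indicative only.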

If a lot of time is spent in channel operations, sync.Mutex code, other synchronization primitives, or the System component, the program is probably suffering from contention. Consider restructuring the program to eliminate frequently accessed shared resources. Common techniques include sharding/partitioning, local buffering/batching, and copy-on-write.

If a lot of time is spent in syscall.Read/Write, the program is probably making an excessive number of small reads and writes. Wrapping os.File or net.Conn in bufio readers and writers can help in this case.

If a lot of time is spent in the GC component, the program either allocates too many transient objects or the heap size is so small that garbage collections happen too frequently. See the Garbage Collector Trace and Memory Profiler sections for optimization suggestions.
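One way to reduce the number of short-lived objects, shown here only as a sketch, is to reuse scratch buffers through sync.Pool. The names bufPool and formatPair are hypothetical and chosen for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool recycles bytes.Buffer values between calls.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// formatPair builds "key=value" using a pooled scratch buffer.
func formatPair(key, value string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // a pooled buffer may still hold data from a previous use
	defer bufPool.Put(buf)

	buf.WriteString(key)
	buf.WriteByte('=')
	buf.WriteString(value)
	return buf.String() // String copies, so the result stays valid after Put
}

func main() {
	fmt.Println(formatPair("lang", "go"))
}
```

For the other cause, a heap that is too small, the collection frequency is governed by the GOGC environment variable or runtime/debug.SetGCPercent; which setting is appropriate depends on how much memory the program can afford.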

Note: The CPU profiler does not currently work on Darwin.
Note: On Windows you need to install Cygwin, Perl and Graphviz to generate svg/web profiles.
Note: On Linux you can also try the perf system profiler. It cannot unwind Go stacks, but it can profile and unwind cgo/SWIG code and the kernel, so it is useful for getting insight into native and kernel performance bottlenecks.
