Golang: Locating and Optimizing a GC Problem, Case (i)


Background:

Read a large file of 5 million rows, merge the data of each row as it is read, then run some logic and algorithmic processing on the merged result and store it in Redis.

File format:

Each line holds three tab-separated fields: URL address, 32-character user ID, number of clicks.

http://jingpin.pgc.panda.tv/hd/xiaopianpian.html AAAAAAAAAAAAAAAAAAAA 5
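As a quick orientation, a minimal sketch of pulling the three tab-separated fields out of one line (the variable names are illustrative, not from the original code):

        parts := strings.Split(strings.TrimSpace(line), "\t")
        url, userID := parts[0], parts[1]   // URL and 32-character user ID
        clicks, _ := strconv.Atoi(parts[2]) // click count as an integer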

Specific scenarios:

This part looks at the simplest case of large-file processing: spawning one goroutine per line read to merge that line's data. We will walk through how to locate and optimize the problem that results.

Symptoms:

Integrating the data for the entire file serially takes only 4 to 5 seconds, but handling each line in its own goroutine takes anywhere from tens of seconds to tens of minutes.

The code is as follows:

simple.go

package main

import (
        "bufio"
        "fmt"
        "io"
        "io/ioutil"
        "log"
        "net/http"
        _ "net/http/pprof"
        "os"
        "singleflight"
        "strconv"
        "strings"
        "sync"
        "time"
        "utils"
)

var (
        wg     sync.WaitGroup
        mu     sync.RWMutex // global lock
        single = &singleflight.Task{}
)

func main() {
        defer func() {
                if err := recover(); err != nil {
                        fmt.Println(err)
                }
        }()
        go func() {
                log.Println(http.ListenAndServe("localhost:8080", nil))
        }()
        file := "/data/origin_data/part-r-00000"
        if fp, err := os.Open(file); err != nil {
                panic(err)
        } else {
                start := time.Now()
                defer fp.Close()
                defer func() { // report total time spent
                        fmt.Println("time cost:", time.Now().Sub(start))
                }()
                // per-URL counts of distinct clicking users
                hostNums := hostsStat()
                buf := bufio.NewReader(fp)
                hostToFans := make(map[string]utils.MidList) // [url][]userID
                for {
                        line, err := buf.ReadString('\n')
                        if err != nil {
                                if err == io.EOF { // reached end of file
                                        fmt.Println("meet the end")
                                        break // exit the read loop
                                }
                                panic(err)
                        }
                        // handle each line in its own goroutine
                        wg.Add(1)
                        go handleLine(line, hostToFans, hostNums)
                }
                wg.Wait()
                fmt.Println("*************************handle file data complete************************")
        }
}

// handleLine merges the data of a single row
func handleLine(line string, hostToFans map[string]utils.MidList, hostNums map[string]int) {
        defer wg.Done()
        line = strings.TrimSpace(line)
        components := strings.Split(line, "\t")
        // first check whether the URL belongs to a valid site
        schemes := strings.Split(components[0], "/")
        if utils.In_array(utils.ValidPlatforms, schemes[2]) == false {
                fmt.Println("invalid url: ", components[0])
                return
        }
        mu.RLock()
        if _, ok := hostToFans[components[0]]; ok {
                mu.RUnlock()
                clickTimes, _ := strconv.Atoi(components[2])
                mu.Lock()
                hostToFans[components[0]] = hostToFans[components[0]].Append(components[1], clickTimes)
                mu.Unlock()
        } else { // a URL seen for the first time
                mu.RUnlock()
                startElement := false // marks whether this goroutine created the URL's initial entry
                // singleflight prevents goroutines that hit the same URL before its user list
                // is initialized from all running the make code; only one of them executes it
                single.Do(components[0], func() (interface{}, error) {
                        mu.RLock()
                        if _, ok := hostToFans[components[0]]; ok {
                                // double-check: under high concurrency several writers for the
                                // same URL may reach here; avoid reallocating the space
                                mu.RUnlock()
                                return nil, nil
                        }
                        mu.RUnlock()
                        mu.Lock()
                        clickTimes, _ := strconv.Atoi(components[2])
                        hostToFans[components[0]] = utils.NewMidList(hostNums[components[0]])
                        hostToFans[components[0]] = hostToFans[components[0]].Append(components[1], clickTimes)
                        mu.Unlock()
                        startElement = true
                        return nil, nil
                })
                if !startElement {
                        mu.Lock()
                        clickTimes, _ := strconv.Atoi(components[2])
                        hostToFans[components[0]] = hostToFans[components[0]].Append(components[1], clickTimes)
                        mu.Unlock()
                }
        }
}

// hostsStat reads the per-URL stats file, which lists the number of
// distinct users for every URL
func hostsStat() map[string]int {
        hostStats := "./scripts/data/stat.txt"
        bytes, _ := ioutil.ReadFile(hostStats)
        // ....some code....
        return hostNums
}

Shortly after this code starts, the machine's memory consumption climbs to 90%+; after a while CPU usage drops very low, yet the load stays high. So a bold guess: the GC is stalling the process. pprof and gctrace later confirmed this idea. Readers not yet comfortable with pprof and gctrace can see the author's earlier article, "Golang: how to troubleshoot and locate GC problems."
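For anyone reproducing this, GODEBUG=gctrace=1 is the stock runtime switch: it makes the Go runtime print one summary line per collection (pause times, heap sizes, GC CPU share) to stderr, which is how a guess like the one above can be confirmed:

        GODEBUG=gctrace=1 ./simple 2> gctrace.log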

In fact, the first version of the code already paid attention to memory consumption in a few places. For example, when initializing the user-ID collection for each URL, it borrows groupcache's singleflight to guarantee the space is never allocated more than once, and each URL's user-ID slice is created with make at a precomputed size. Even so, the code above clearly has problems.
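The article does not reproduce its singleflight code; for context, here is a minimal sketch in the spirit of groupcache's singleflight (the author's adaptation names the type Task rather than Group). Only the first caller for a given key runs fn; concurrent callers for the same key block and share the result:

        import "sync"

        type call struct {
                wg  sync.WaitGroup
                val interface{}
                err error
        }

        type Group struct {
                mu sync.Mutex       // protects m
                m  map[string]*call // in-flight calls, keyed by URL
        }

        // Do runs fn once per set of concurrent callers sharing key; latecomers
        // wait for the first caller to finish and then reuse its result.
        func (g *Group) Do(key string, fn func() (interface{}, error)) (interface{}, error) {
                g.mu.Lock()
                if g.m == nil {
                        g.m = make(map[string]*call)
                }
                if c, ok := g.m[key]; ok { // someone is already running fn for this key
                        g.mu.Unlock()
                        c.wg.Wait()
                        return c.val, c.err
                }
                c := new(call)
                c.wg.Add(1)
                g.m[key] = c
                g.mu.Unlock()

                c.val, c.err = fn()
                c.wg.Done()

                g.mu.Lock()
                delete(g.m, key)
                g.mu.Unlock()
                return c.val, c.err
        }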

Through pprof, I found that most of the program's execution time was spent on GC, as shown in the profile graph: the nodes outlined in red are all GC-related functions. So the question becomes: why does GC take so long?
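A graph like that comes straight from the pprof endpoints the program registers on localhost:8080; the paths and commands below are the stock net/http/pprof and go tool pprof ones:

        go tool pprof http://localhost:8080/debug/pprof/profile   # 30s CPU sample by default
        (pprof) top 20   # frames with the most samples; GC work shows up as
                         # runtime.gcBgMarkWorker, runtime.scanobject, runtime.mallocgc, ...
        (pprof) web      # render the call graph (requires graphviz)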

In most cases GC is triggered when allocated memory reaches a threshold, and here memory usage sits steadily at 90%+. So why does the process use so much memory? I kept trying to analyze the problem with pprof's heap and profile views, fruitlessly, until one look at goroutine status through pprof showed the number of live goroutines reaching hundreds of thousands, sometimes close to 1.5 million. That explains part of the problem: if a single goroutine occupies 3KB, then with counts at that level the goroutines alone occupy around 4GB of memory even if they do nothing. The experiment machine has only 8GB, so memory inevitably fills up and frequent GC stalls the process.
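The goroutine explosion is easy to observe: either fetch http://localhost:8080/debug/pprof/goroutine, or poll runtime.NumGoroutine from inside the program. A minimal watcher sketch (the 5-second interval is arbitrary; needs "log", "runtime", and "time" in the import list):

        go func() {
                for range time.Tick(5 * time.Second) {
                        log.Println("live goroutines:", runtime.NumGoroutine())
                }
        }()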

So the first step must be to control the number of live goroutines rather than let them grow without bound. In the loop that reads the file, add a line counter so that at each threshold the reader takes a short break, slowing the rate at which goroutines are created. With this restriction the process no longer stalls, and the total execution time is stable between 20 and 30 seconds.

iterator := 0
for {
        line, err := buf.ReadString('\n')
        if err != nil {
                if err == io.EOF { // reached end of file
                        fmt.Println("meet the end")
                        break // exit the read loop
                }
                panic(err)
        }
        // Each line is still handled in its own goroutine, but logic is added here to
        // keep concurrency from consuming so much CPU and memory that GC stalls the
        // whole process: after roughly every 100k lines read, rest a moment to lower
        // the number of goroutines alive at the same time.
        iterator++
        if iterator <= 120000 {
                wg.Add(1)
                go handleLine(line, hostToFans, hostNums)
        } else {
                iterator = 1
                <-time.After(130 * time.Millisecond) // the pauses add about 5s in total
                wg.Add(1)
                go handleLine(line, hostToFans, hostNums)
        }
}
wg.Wait()
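For comparison, an alternative the article does not use: a buffered channel acting as a counting semaphore caps the number of in-flight handlers directly, instead of guessing at a sleep length. A sketch, where the capacity of 100000 is illustrative:

        sem := make(chan struct{}, 100000) // at most 100k handleLine goroutines alive
        for {
                line, err := buf.ReadString('\n')
                if err != nil {
                        if err == io.EOF {
                                break
                        }
                        panic(err)
                }
                sem <- struct{}{} // blocks once the cap is reached
                wg.Add(1)
                go func(l string) {
                        defer func() { <-sem }()
                        handleLine(l, hostToFans, hostNums) // handleLine calls wg.Done itself
                }(line)
        }
        wg.Wait()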

Comparing against the serial run, even though concurrency is now stable, the run is still much slower than serial execution even after subtracting the sleep time, so there is clearly room for optimization. This is where pprof's profile and heap views come into play. Take a look at the following two graphs, showing memory consumption and CPU consumption:

CPU time analysis

Memory Footprint Analysis

Only part of each graph is captured, but from them we can find a place to optimize. You can see that strings.Split is both time-consuming and memory-hungry, mainly because it allocates slices. Looking back at the code, at least the strings.Split used to check whether a URL is a valid site is unnecessary: it not only adds running time, it also allocates slices whose churn feeds the GC. So rewrite that check:

// first check whether the host belongs to a valid platform
if is_valid_platform(utils.ValidPlatforms, components[0]) == false {
        fmt.Println("invalid url: ", components[0])
        return nil, errors.New("invalid url")
}

func is_valid_platform(platforms []string, hostUrl string) bool {
        for _, platform := range platforms {
                if strings.Index(hostUrl, platform) != -1 {
                        return true
                }
        }
        return false
}

This removes the allocations caused by the unnecessary slices. After the change, the whole task stabilizes at around 15s, or 10s after subtracting the sleep time. At this point the optimization is nearly done, but one more place is worth a look.

The heap analysis graph above also shows that singleflight.(*Task).Do consumes quite a bit of memory, and it costs a lot of CPU time too, as follows:

Apart from some runtime functions it ranks highest, even though it only runs when a new URL first appears. Looking at singleflight's main structure, you will find it uses pointer fields, and pointers force the GC to do a second traversal, slowing the whole GC down. The singleflight usage here cannot be removed, but where possible, use fewer pointers.
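To make the "fewer pointers" advice concrete, a toy illustration (both types are invented for this example): during marking the GC must follow every pointer it sees, so a slice of plain values gives it nothing to chase, while a slice of pointers is scanned element by element:

        type stat struct {
                id     [32]byte // fixed-size value, holds no pointer
                clicks int
        }

        flat := make([]stat, 0, 1000000)   // pointer-free backing array: cheap for the GC
        boxed := make([]*stat, 0, 1000000) // a million pointers: each one scanned and followed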

This article only sketches the idea and process of the optimization; I hope to write out the full version of the optimization plan next time.
