A summary of step-by-step Go performance optimizations from developing Baa

Source: Internet
Author: User

The Go ecosystem already has plenty of web frameworks, but none of them matched what we had in mind. We wanted a concise, efficient core framework that provides routing, context, middleware, and dependency injection, and that refuses to use regular expressions or reflection, so we started building the Baa framework. The first version implemented the features in the simplest way possible. It was basically usable, but performance was terrible, and so the road to optimization began.

Ideally, an article like this would show before-and-after benchmark results for each step, to give the reader the most direct impression. I will admit up front that I was lazy and did not go back to produce those comparison charts.

Reject regular expressions and reflection

This was a basic principle when we built the framework: the whole implementation uses neither the regexp package nor the reflect package. This is the foundation of our pursuit of performance. Another benefit is that, with no magic involved, the implementation is easy to understand and the whole framework stays simple.

Use sync.Pool to reuse objects

This technique is covered in the article I translated previously, Cockroachdb GC optimization Summary, and the author of the Go Language Bible describes it as well: sync.Pool lets us reuse objects between GC cycles, avoiding frequent object creation and memory allocation. When chasing performance we want to minimize memory allocations, ideally down to zero, and this is one of the most important ways to do so.

The following snippet is from Baa:

b.pool = sync.Pool{
    New: func() interface{} {
        return newContext(nil, nil, b)
    },
}

Getting an object when it is needed:

c := b.pool.Get().(*Context)
c.reset(w, r)

Returning it after use:

b.pool.Put(c)
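The Get, reset, Put cycle above can be tried in isolation. The following is a minimal sketch, not Baa's code: requestCtx, ctxPool, and handle are hypothetical names made up for illustration, and the initial capacity of 8 is an arbitrary assumption.

```go
package main

import (
	"fmt"
	"sync"
)

// requestCtx is a hypothetical stand-in for a per-request context object.
type requestCtx struct {
	path   string
	params []string
}

// reset clears per-request state but keeps the params backing array.
func (c *requestCtx) reset(path string) {
	c.path = path
	c.params = c.params[:0]
}

// ctxPool hands out recycled *requestCtx values.
// New is only called when the pool has nothing to give back.
var ctxPool = sync.Pool{
	New: func() interface{} {
		return &requestCtx{params: make([]string, 0, 8)}
	},
}

// handle simulates serving one request with a pooled context.
func handle(path string) string {
	c := ctxPool.Get().(*requestCtx)
	c.reset(path) // a pooled object may carry state from an earlier request
	c.params = append(c.params, "id")
	out := fmt.Sprintf("%s has %d param(s)", c.path, len(c.params))
	ctxPool.Put(c) // return the object so a later request can reuse it
	return out
}

func main() {
	fmt.Println(handle("/users/1"))
	fmt.Println(handle("/posts/2"))
}
```

Resetting right after Get matters: the pool makes no promise about which object comes back, so any leftover state has to be cleared before use.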

Use an array to optimize a slice

A slice is essentially a variable-length array: when the stored data outgrows its capacity, memory is reallocated and the data is copied over. If the length keeps changing, this can cause repeated reallocation. In certain scenarios we can back the slice with a fixed-length array to avoid those allocations.

var nameArr [1024]string
pNames := nameArr[0:0]
pNames = append(pNames, "val")

pNames is a slice, but all data operations happen on the array nameArr, so no memory is reallocated during use as long as the length stays within the array's capacity.
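To see the effect, one can check whether the slice still shares storage with the array after appending. This is a small sketch built around the pseudocode above; the helper fillNames is a hypothetical name introduced only for this demonstration.

```go
package main

import "fmt"

// fillNames appends n values to a slice backed by a fixed-size array and
// reports whether the slice still shares the array's storage afterwards.
func fillNames(n int) (length int, sameBacking bool) {
	var nameArr [1024]string
	pNames := nameArr[0:0] // length 0, capacity 1024
	for i := 0; i < n; i++ {
		pNames = append(pNames, "val")
	}
	if len(pNames) == 0 {
		return 0, true
	}
	// Comparing element addresses tells us whether append moved the data.
	return len(pNames), &pNames[0] == &nameArr[0]
}

func main() {
	fmt.Println(fillNames(10))   // prints "10 true": still inside nameArr
	fmt.Println(fillNames(2000)) // prints "2000 false": append had to allocate
}
```

The second call shows the limit of the trick: once the array is full, append silently falls back to allocating a new backing array.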

The pseudocode above no longer exists in Baa; the fixed-length array has been replaced by the following technique.

Slice can also be reused

Reusing a slice is basically the same as the array optimization above: allocate a larger capacity up front and try not to exceed it during use. There is no need to worry if it turns out to be too small, since the slice grows automatically, at the cost of a memory allocation.

The following snippet is from Baa:

// newContext create a http context
func newContext(w http.ResponseWriter, r *http.Request, b *Baa) *Context {
    c := new(Context)
    c.Resp = NewResponse(w, b)
    c.baa = b
    c.pNames = make([]string, 0, 32)
    c.pValues = make([]string, 0, 32)
    c.handlers = make([]HandlerFunc, len(b.middleware), len(b.middleware)+3)
    copy(c.handlers, b.middleware)
    c.reset(w, r)
    return c
}

// reset ...
func (c *Context) reset(w http.ResponseWriter, r *http.Request) {
    c.Resp.reset(w)
    c.Req = r
    c.hi = 0
    c.handlers = c.handlers[:len(c.baa.middleware)]
    c.pNames = c.pNames[:0]
    c.pValues = c.pValues[:0]
    c.store = nil
}

Note c.pNames and c.pValues in newContext and in reset: slice[:0] reuses the previous slice and avoids reallocating memory. The capacity of 32 above is an empirical value, chosen to cover most cases without being too large.
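The behavior of slice[:0] is easy to verify on its own. The sketch below uses a hypothetical params type (not Baa's Context) and shows that truncating keeps the capacity, so later appends write into the same storage.

```go
package main

import "fmt"

// params is a hypothetical holder for route parameter names.
type params struct {
	names []string
}

// newParams allocates the backing array once, with room for 32 names.
func newParams() *params {
	return &params{names: make([]string, 0, 32)}
}

// reset truncates the slice to length 0 while keeping its capacity.
func (p *params) reset() {
	p.names = p.names[:0]
}

// reuse fills the slice, resets it, and fills it again.
// It returns the length and capacity right after the reset,
// plus the first element after the refill.
func reuse() (lenAfterReset, capAfterReset int, first string) {
	p := newParams()
	p.names = append(p.names, "id", "name")
	p.reset()
	lenAfterReset, capAfterReset = len(p.names), cap(p.names)
	p.names = append(p.names, "slug") // overwrites the old slot 0
	return lenAfterReset, capAfterReset, p.names[0]
}

func main() {
	fmt.Println(reuse()) // prints "0 32 slug"
}
```

One thing to keep in mind with this pattern: the old elements stay in the backing array until overwritten, so they remain reachable by the garbage collector.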

Rewrite the router using a radix tree

The 黑夜路人 group recently discussed a question: are algorithms and data structures actually useful in day-to-day work? To be honest, in most situations they do not come up much, but here is a case where they do.

In the first version, routes were stored in a map and route matching was a range over it: simple and clear, but naturally the performance was poor. Following the design of the macaron and echo frameworks, both of which use a radix tree (基数树), we did the same. The implementation details differ between them, and ours differ too, but the idea is basically the same. For the specifics, see the wiki entry on radix trees and router.go in Baa's router.
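For readers who have not met the data structure, here is a minimal radix tree sketch for static paths only. It is not Baa's router: the node type, the string value standing in for a handler, and the lack of parameter or wildcard support are all simplifying assumptions, and paths are assumed to be non-empty.

```go
package main

import "fmt"

// node is one edge-compressed tree node. prefix is the path fragment
// on the edge leading into it.
type node struct {
	prefix   string
	value    string // hypothetical handler name; empty means no route ends here
	children []*node
}

// commonPrefixLen returns how many leading bytes a and b share.
func commonPrefixLen(a, b string) int {
	i := 0
	for i < len(a) && i < len(b) && a[i] == b[i] {
		i++
	}
	return i
}

// insert adds path below n, splitting an edge where two paths diverge.
func (n *node) insert(path, value string) {
	for _, c := range n.children {
		l := commonPrefixLen(path, c.prefix)
		if l == 0 {
			continue
		}
		if l < len(c.prefix) {
			// Split c: keep the shared part in c, push the rest into a new child.
			rest := &node{prefix: c.prefix[l:], value: c.value, children: c.children}
			c.prefix, c.value, c.children = c.prefix[:l], "", []*node{rest}
		}
		if l == len(path) {
			c.value = value
		} else {
			c.insert(path[l:], value)
		}
		return
	}
	n.children = append(n.children, &node{prefix: path, value: value})
}

// lookup returns the value stored for path, if any.
func (n *node) lookup(path string) (string, bool) {
	for _, c := range n.children {
		if len(path) >= len(c.prefix) && path[:len(c.prefix)] == c.prefix {
			if len(path) == len(c.prefix) {
				return c.value, c.value != ""
			}
			return c.lookup(path[len(c.prefix):])
		}
	}
	return "", false
}

func main() {
	root := &node{}
	root.insert("/users", "listUsers")
	root.insert("/user/profile", "showProfile")
	fmt.Println(root.lookup("/users"))
	fmt.Println(root.lookup("/user/profile"))
}
```

Inserting the second path splits the "/users" edge into "/user" with children "s" and "/profile", so a lookup compares each shared fragment only once instead of testing every registered route.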

string performance is not great

Many articles have already recommended using []byte instead of string where possible, and we do the same here.
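The article does not show where []byte replaced string, so the following is only one illustrative case, with a hypothetical appendKey helper: strings are immutable, so each concatenation builds a new string, whereas a []byte buffer can be appended to in place and reused across calls.

```go
package main

import (
	"fmt"
	"strconv"
)

// appendKey writes "user:<id>" into buf and returns the extended slice.
// Passing buf[:0] from a previous call reuses its backing array.
func appendKey(buf []byte, id int) []byte {
	buf = append(buf, "user:"...)
	return strconv.AppendInt(buf, int64(id), 10)
}

func main() {
	buf := make([]byte, 0, 64) // allocated once, reused for every key
	for id := 1; id <= 3; id++ {
		buf = appendKey(buf[:0], id)
		fmt.Printf("%s\n", buf)
	}
}
```

Whether this pays off depends on the workload; converting the result back with string(buf) copies the bytes, so the gain mostly comes when the data can stay as []byte.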

Ranging over a map is inefficient

Ranging over a map is an order of magnitude slower than ranging over a slice, so you will find that we replaced a large number of maps with slices. In the code example in the "Slice can also be reused" section, pNames and pValues replace the original map[string]string, because ranging over a map is inefficient.
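As an illustration of swapping a map[string]string for two parallel slices, here is a small sketch. The paramList type and its methods are hypothetical and only mirror the idea behind pNames and pValues. Lookup is a linear scan, which is reasonable when there are only a handful of entries.

```go
package main

import "fmt"

// paramList stores key/value pairs in two parallel slices
// instead of a map[string]string.
type paramList struct {
	names  []string
	values []string
}

// add appends one pair; names[i] always belongs to values[i].
func (p *paramList) add(name, value string) {
	p.names = append(p.names, name)
	p.values = append(p.values, value)
}

// get scans the names slice and returns the matching value.
func (p *paramList) get(name string) (string, bool) {
	for i := range p.names {
		if p.names[i] == name {
			return p.values[i], true
		}
	}
	return "", false
}

func main() {
	p := &paramList{}
	p.add("id", "42")
	p.add("slug", "hello")
	fmt.Println(p.get("slug")) // prints "hello true"
}
```

Unlike a map, the slices also keep insertion order, and they can be truncated with [:0] for reuse.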

Iteration has a cost too

Iterating over a slice is fast, but it is still iteration and still has a cost, and we were obsessed with extreme performance. For route matching, we give every route pattern a single-byte index: if the first byte does not match, there is no need to compare the remaining characters.

Route entry creation:

// newRoute create a route item
func newRoute(pattern string, handles []HandlerFunc, router *Router) *Route {
    r := new(Route)
    r.pattern = pattern
    r.alpha = pattern[0]
    r.handlers = handles
    r.router = router
    r.children = make([]*Route, 0)
    return r
}

Route entry matching:

// findChild find child static route
func (r *Route) findChild(b byte) *Route {
    var i int
    var l = len(r.children)
    for ; i < l; i++ {
        if r.children[i].alpha == b && !r.children[i].hasParam {
            return r.children[i]
        }
    }
    return nil
}

Note that r.alpha is used to avoid iteration as much as possible, which further improves performance.

defer is only a convenience

On the road to extreme performance I went almost crazy. Testing step by step, I found that removing defer also improves performance a little. An article from the 雨痕学堂 public account mentions this too: defer carries extra overhead to guarantee that deferred calls run even when a panic occurs. Most of the time we can do the cleanup by hand at the end of the function instead of using defer, which makes things a little faster still.
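The trade-off can be shown with a mutex-protected counter. This is a sketch with hypothetical names (counter, incDefer, incManual), not code from Baa. Note also that Go 1.14 and later reduced the cost of most defer calls considerably, so it is worth measuring before giving up the safety of defer.

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a hypothetical type guarded by a mutex.
type counter struct {
	mu sync.Mutex
	n  int
}

// incDefer unlocks via defer: the unlock still runs if the body
// panics or later gains an early return.
func (c *counter) incDefer() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
	return c.n
}

// incManual unlocks explicitly: no defer bookkeeping, but every
// return path has to remember to unlock.
func (c *counter) incManual() int {
	c.mu.Lock()
	c.n++
	n := c.n
	c.mu.Unlock()
	return n
}

func main() {
	c := &counter{}
	fmt.Println(c.incDefer())  // prints 1
	fmt.Println(c.incManual()) // prints 2
}
```

Both methods behave the same on the happy path; the manual version only trades robustness for a small amount of speed.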

Function calls have overhead too

We were getting closer to the target, but a small gap remained, and we got even crazier. In the end we actually did it: we removed some frequently called functions and did their work directly inside the caller, because we found that even a plain function call has overhead.

pprof is a powerful tool

Throughout the process, how do you analyze performance problems step by step and locate what to optimize? go test -cpuprofile, go test -memprofile, and go test -bench are the best tools. After every change, run the benchmarks to see the results and look at the profile for the performance analysis.
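As a sketch of that workflow, the program below benchmarks a hypothetical joinPath function. In a real project the benchmark would live in a _test.go file as BenchmarkJoinPath and be run through go test; here testing.Benchmark is used so that the example runs as a plain program. The file names cpu.out and mem.out in the comments are arbitrary choices.

```go
package main

import (
	"fmt"
	"testing"
)

// joinPath is a hypothetical function whose cost we want to measure.
func joinPath(a, b string) string {
	return a + "/" + b
}

// sink keeps the result alive so the call is not optimized away.
var sink string

// benchJoinPath has the shape of an ordinary Go benchmark.
// In a _test.go file it would be named BenchmarkJoinPath and run with:
//
//	go test -bench=. -benchmem -cpuprofile=cpu.out -memprofile=mem.out
//
// The resulting profiles can then be inspected with:
//
//	go tool pprof cpu.out
func benchJoinPath(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = joinPath("users", "42")
	}
}

func main() {
	// testing.Benchmark runs a single benchmark outside of "go test".
	res := testing.Benchmark(benchJoinPath)
	fmt.Println(res.String(), res.MemString())
}
```

The printed line reports iterations, time per operation, and allocations per operation, which are the numbers to compare before and after each change.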

Summary

This article briefly summarizes the techniques used during optimization, along with some code examples. There are more ways to apply them that you will discover through your own experience. Discussion and criticism are welcome.
