There are already many web frameworks in the Go ecosystem, but none of them quite fit our ideas. We wanted a simple, efficient core framework that provides routing, context, middleware, and dependency injection, and that refuses to use regular expressions and reflection, so we started building the BAA framework. The first version was written in the most straightforward way possible: basically usable, but with terrible performance, and so the road of optimization began.
Ideally, each step in this article would come with before-and-after benchmark results to give readers the most intuitive picture of the gains. I have to criticize myself first: out of laziness, I did not go back and produce those comparison charts.
Rejecting regexp and reflection
This is a basic principle of this framework: the entire implementation uses neither the regexp nor the reflect package. That is the foundation of our pursuit of performance. Another benefit is that without magic, the implementation is very easy to understand, which keeps the whole framework simple.
Using sync.Pool to reuse objects
This technique is described in my previously translated article on CockroachDB GC optimization, and it is also covered in The Go Programming Language: sync.Pool lets objects be reused between GC cycles, avoiding frequent object creation and memory allocation. In the pursuit of performance, we want to minimize, or even reach zero, memory allocations, and this is one of the most important techniques.
The following code snippet is in the BAA:
b.pool = sync.Pool{
	New: func() interface{} {
		return newContext(nil, nil, b)
	},
}
When used:
c := b.pool.Get().(*Context)
c.reset(w, r)
When finished:
b.pool.Put(c)
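Pieced together, the pattern looks like the following self-contained sketch. The Context fields and the handle function here are simplified stand-ins for BAA's real types, kept only to show the Get/reset/Put cycle:

```go
package main

import (
	"fmt"
	"sync"
)

// Context is a hypothetical request context standing in for BAA's Context.
type Context struct {
	pNames []string
}

// reset clears per-request state while keeping the allocated backing arrays.
func (c *Context) reset() {
	c.pNames = c.pNames[:0]
}

var pool = sync.Pool{
	New: func() interface{} {
		return &Context{pNames: make([]string, 0, 32)}
	},
}

// handle simulates serving one request with a pooled context.
func handle() *Context {
	c := pool.Get().(*Context)
	c.reset() // the object may be recycled; always clear stale state
	c.pNames = append(c.pNames, "id")
	return c
}

func main() {
	c := handle()
	fmt.Println(len(c.pNames), cap(c.pNames)) // 1 32
	pool.Put(c)                               // return the object for reuse
}
```

Because the pool may hand back a previously used object, reset must clear every field; a stale field here is a classic source of subtle bugs.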
Using arrays to optimize slices
A slice is essentially a variable-length array that dynamically reallocates memory and migrates its data as the stored length outgrows the capacity. If the length keeps changing, memory may be reallocated constantly; in certain scenarios, we can use a fixed-length array to optimize those allocations away.
var nameArr [1024]string
pNames := nameArr[0:0]
pNames = append(pNames, "val")
pNames is a slice, but the data operations always happen on the backing array nameArr, so no memory is reallocated during the entire process of use.
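The claim is easy to verify: appending within the array's capacity never moves the data. A quick standalone check (verification code, not from BAA):

```go
package main

import "fmt"

func main() {
	// A fixed-length array provides the backing storage; the slice view
	// grows within it without ever triggering a reallocation.
	var nameArr [1024]string
	pNames := nameArr[0:0]
	for i := 0; i < 100; i++ {
		pNames = append(pNames, "val")
	}
	// Capacity never changes and element 0 still lives inside nameArr.
	fmt.Println(len(pNames), cap(pNames), &pNames[0] == &nameArr[0]) // 100 1024 true
}
```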
The array-based pseudocode above no longer exists in BAA; the fixed-length array has been replaced by the technique in the next section.
Slices can also be reused
Slice reuse is essentially the same idea as the array optimization above: allocate a generous capacity up front and try not to exceed it during use. Don't worry, though; if the capacity is not enough, the slice grows automatically, at the cost of one memory allocation.
The following code snippet is in the BAA:
// newContext create a http context
func newContext(w http.ResponseWriter, r *http.Request, b *Baa) *Context {
	c := new(Context)
	c.Resp = NewResponse(w, b)
	c.baa = b
	c.pNames = make([]string, 0, 32)
	c.pValues = make([]string, 0, 32)
	c.handlers = make([]HandlerFunc, len(b.middleware), len(b.middleware)+3)
	copy(c.handlers, b.middleware)
	c.reset(w, r)
	return c
}

// reset ...
func (c *Context) reset(w http.ResponseWriter, r *http.Request) {
	c.Resp.reset(w)
	c.Req = r
	c.hi = 0
	c.handlers = c.handlers[:len(c.baa.middleware)]
	c.pNames = c.pNames[:0]
	c.pValues = c.pValues[:0]
	c.store = nil
}
Note c.pNames and c.pValues in newContext versus c.pNames and c.pValues in reset: slice[:0] reuses the previous slice and avoids reallocating memory. As for the length 32 above, it is a value based on experience, chosen to satisfy most cases without being too large.
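The effect of slice[:0] can be seen in isolation. This sketch (independent of BAA) shows that re-appending after truncation reuses the same backing array:

```go
package main

import "fmt"

func main() {
	// Allocate once with generous capacity, then reuse via s[:0].
	s := make([]string, 0, 32)
	s = append(s, "a", "b", "c")
	p := &s[0]

	s = s[:0] // length back to zero; capacity and backing array retained
	s = append(s, "x")

	// Same backing array: no new allocation happened.
	fmt.Println(cap(s), &s[0] == p) // 32 true
}
```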
Rewriting routes using a radix tree
A question discussed a while back in the 黑夜路人 chat group: are algorithms and data structures actually useful in everyday work? To be honest, much of the time they really are not, but here is one scene where they are.
In the first version, the routes were stored in a map and matching was a range over it: simple and clear, but the performance was naturally poor. Referencing the designs of the macaron and echo frameworks, both of which use a radix tree, we did the same; our implementation details differ from theirs, but the idea is basically unchanged. For specifics, see the wiki entry on radix trees and the router part of BAA, router.go.
String performance is not great
As many articles have explained, use []byte instead of string wherever possible; we do the same here.
Ranging over a map is inefficient
Ranging over a map is an order of magnitude slower than ranging over a slice, so you will find that we replaced a large number of maps with slices. The pNames and pValues in the slice-reuse code example above replace the original map[string]string precisely because ranging over a map is inefficient.
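The shape of that replacement can be sketched as two parallel slices. The params type and its methods below are hypothetical names for illustration, not BAA's exact API:

```go
package main

import "fmt"

// params stores route parameters as two parallel slices instead of a
// map[string]string; for the handful of parameters a route carries,
// a linear scan over a slice beats map iteration and hashing.
type params struct {
	names  []string
	values []string
}

// set appends a name/value pair.
func (p *params) set(name, value string) {
	p.names = append(p.names, name)
	p.values = append(p.values, value)
}

// get scans the names slice and returns the matching value, or "".
func (p *params) get(name string) string {
	for i := range p.names {
		if p.names[i] == name {
			return p.values[i]
		}
	}
	return ""
}

func main() {
	var p params
	p.set("id", "42")
	p.set("name", "baa")
	fmt.Println(p.get("id"), p.get("name")) // 42 baa
}
```

Combined with the slice[:0] reuse shown earlier, both slices keep their backing arrays across requests, so parameter storage costs no allocations on the hot path.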
Iteration has a cost
Slice iteration is fast, but it is still iteration, and iteration has overhead; when chasing extreme performance to the point of madness, that matters. For route matching, we store a single-byte index for every route pattern: if the first byte does not match, there is no need to continue comparing the rest of the characters.
Route entry creation:
// newRoute create a route item
func newRoute(pattern string, handles []HandlerFunc, router *Router) *Route {
	r := new(Route)
	r.pattern = pattern
	r.alpha = pattern[0]
	r.handlers = handles
	r.router = router
	r.children = make([]*Route, 0)
	return r
}
Route entry matches:
// findChild find child static route
func (r *Route) findChild(b byte) *Route {
	var i int
	var l = len(r.children)
	for ; i < l; i++ {
		if r.children[i].alpha == b && !r.children[i].hasParam {
			return r.children[i]
		}
	}
	return nil
}
Note how r.alpha is compared first, avoiding full pattern comparisons wherever possible to squeeze out a further bit of performance.
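The effect of the one-byte index can be seen in a reduced form. This sketch (simplified from the snippets above, with hypothetical types) rejects most children on a single byte comparison before any string work happens:

```go
package main

import (
	"fmt"
	"strings"
)

// route keeps the first byte of its pattern as a cheap index so that
// non-matching children are skipped without comparing full strings.
type route struct {
	pattern string
	alpha   byte
}

// findChild returns the child whose pattern prefixes path, or nil.
func findChild(children []*route, path string) *route {
	b := path[0]
	for _, c := range children {
		if c.alpha != b { // one-byte filter: most children stop here
			continue
		}
		if strings.HasPrefix(path, c.pattern) {
			return c
		}
	}
	return nil
}

func main() {
	children := []*route{
		{pattern: "users", alpha: 'u'},
		{pattern: "posts", alpha: 'p'},
	}
	if r := findChild(children, "posts/1"); r != nil {
		fmt.Println(r.pattern) // posts
	}
}
```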
defer is only a convenience
On the road to extreme performance I was almost going crazy. During step-by-step testing, I found that removing defer also buys some performance. An article from the 雨痕学堂 WeChat public account mentions the same issue: defer carries extra overhead to guarantee that deferred calls run even during a panic. Most of the time we can simply do the cleanup ourselves at the end of the function, avoid the defer machinery, and be a little faster.
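The trade-off looks like this in miniature (a generic example, not BAA code; note that recent Go releases have significantly reduced defer's overhead, so measure before copying this):

```go
package main

import (
	"fmt"
	"sync"
)

var (
	mu      sync.Mutex
	counter int
)

// incrDefer unlocks via defer: safe even if the body panics, but it
// pays for registering the deferred call.
func incrDefer() {
	mu.Lock()
	defer mu.Unlock()
	counter++
}

// incr unlocks explicitly at the end: on a short, panic-free hot path
// this skips the defer machinery entirely.
func incr() {
	mu.Lock()
	counter++
	mu.Unlock()
}

func main() {
	incrDefer()
	incr()
	fmt.Println(counter) // 2
}
```

The explicit version is only safe when nothing between Lock and Unlock can panic or return early; otherwise the lock leaks, which is exactly the failure mode defer exists to prevent.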
Function calls are also overhead
Closer to the target, but with a small gap still left, we got crazier and actually did it: we removed some frequently called functions and did their work directly inline in the calling function, because we found that even a bare function call is, damn it, overhead.
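As a contrived illustration (not BAA code): the helper below may or may not be inlined by the compiler, while the manually expanded loop guarantees there is no call at all. Go's compiler does inline small leaf functions automatically, so this is worth doing only where a profile proves the call shows up:

```go
package main

import "fmt"

// getParam is a tiny helper; calling it in a hot loop pays call
// overhead unless the compiler inlines it.
func getParam(names, values []string, name string) string {
	for i := range names {
		if names[i] == name {
			return values[i]
		}
	}
	return ""
}

func main() {
	names := []string{"id"}
	values := []string{"42"}

	// Via the helper:
	v1 := getParam(names, values, "id")

	// The same logic manually expanded at the call site, as the article
	// describes doing for frequently called paths:
	v2 := ""
	for i := range names {
		if names[i] == "id" {
			v2 = values[i]
			break
		}
	}
	fmt.Println(v1, v2) // 42 42
}
```

The cost is duplication and readability, so this belongs at the very end of an optimization effort, after the cheaper wins above are exhausted.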
pprof is a killer tool
Throughout the whole process, how do you analyze where the performance problems are and locate what to optimize? go test -cpuprofile, go test -memprofile, and go test -bench are the best tools: after every modification, run the bench to see the results and the profile to see the performance analysis.
Summary
This article has briefly summarized the various techniques used in the optimization process, along with some code examples. More usage patterns are best experienced firsthand; exchanges and criticism are welcome.