Go Learning Notes, Organized


This note consolidates the basics and some key points from my recent Go learning, including Go performance optimization and points about the underlying source code.

I. Concurrency

1. A Go goroutine is similar to a combination of a thread and a coroutine, which maximizes execution efficiency and exploits multi-core processing power.

2. Typically, multiple processes are used for distributed load balancing and to reduce the garbage-collection pressure on any single process; multiple threads (LWPs) compete for more processor resources; and coroutines improve the utilization of processor time slices.

3. Compared with the system's default MB-scale thread stacks, a goroutine's custom stack initially needs only 2 KB, so thousands of concurrent tasks can be created. The custom stack grows on demand and can scale up to GB size when needed.

4. Go may create many threads at runtime, but only a limited number of them execute concurrent tasks at any given time. That number defaults to the number of processor cores and can be changed with the runtime.GOMAXPROCS function or the GOMAXPROCS environment variable.

5. Channel:

1) A channel is equivalent to a concurrency-safe queue;

2) A goroutine leak means a goroutine is blocked sending or receiving and is never awakened; the garbage collector does not collect such resources, so they sleep in the queue indefinitely, forming a resource leak;

6. Synchronization:

1) Channels are not meant to replace locks; each has its own usage scenarios. Channels tend to solve concurrency at the level of the processing architecture and logic, while locks protect the safety of local data.

2) When a mutex is embedded as an anonymous field, the related methods must be implemented with pointer receivers, otherwise the copy made on each call defeats the locking;

3) Mutex granularity should be kept to the minimum scope, and the lock released early;
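A minimal sketch of points 2) and 3): the Counter type below is hypothetical; what matters is that its methods use pointer receivers, so every call locks the same embedded Mutex rather than a copy:

```go
package main

import (
	"fmt"
	"sync"
)

// Counter embeds a Mutex as an anonymous field. Methods that touch
// the lock must use pointer receivers: a value receiver would copy
// the Counter (and its Mutex) on every call, so each call would lock
// a different copy and the protection would be lost.
type Counter struct {
	sync.Mutex
	n int
}

func (c *Counter) Inc() { // pointer receiver: shares one Mutex
	c.Lock()
	c.n++
	c.Unlock() // release early; keep the critical section minimal
}

func (c *Counter) Value() int {
	c.Lock()
	defer c.Unlock()
	return c.n
}

func main() {
	var c Counter
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Inc()
		}()
	}
	wg.Wait()
	fmt.Println(c.Value()) // 100
}
```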

7. Recommendations:

1) When performance requirements are high, avoid deferring the Unlock call;

2) For concurrent reads and writes, sync.RWMutex performs better;

3) To protect reads and writes of a single datum, try atomic operations;

4) Test rigorously, and enable the data race detector (go test -race) whenever possible;
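For recommendation 3), a sketch using sync/atomic to protect a single counter without a mutex (countConcurrent is an illustrative helper, not a library function):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countConcurrent increments a shared counter from n goroutines using
// atomic operations instead of a mutex; for a single word of data this
// avoids lock overhead entirely.
func countConcurrent(n int) int64 {
	var hits int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			atomic.AddInt64(&hits, 1)
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&hits)
}

func main() {
	fmt.Println(countConcurrent(1000)) // 1000
}
```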

II. Testing/Monitoring/Performance Tuning

Besides high performance, cross-platform support, a modular design philosophy (the "simplicity above all" design principle), language-level concurrency, and other advantages, Go has one more prominent strength: a rich toolchain that provides good support for code testing, performance monitoring, and tuning.

1. Testing

1) Testability is also a reflection of code quality;

2) Unit tests can feed their results into code review as a screening basis, avoiding the tedium that turns code review into a formality;

3) Code coverage can be measured with, for example, the command go test -cover -covermode=count -coverprofile=cover.out, and the results can be viewed in a browser with go tool cover -html=cover.out;

4) By writing test code and using the command go test -bench ., benchmarks can probe the performance bottlenecks of specific parts of a module.

2. Monitoring

1) Performance bottlenecks are often predictable, so targeted benchmark code can be written and then analyzed with the appropriate tools;

2) By writing test code and using a command such as the following (see go help testflag for more parameters):

go test -run none -bench . -memprofile mem.out -cpuprofile cpu.out -blockprofile block.out net/http

you can collect performance indicators (CPU, memory, blocking) and probe the bottlenecks of specific modules or methods in a targeted way.

Note that the generated profiling result files should be opened and inspected with go tool pprof;

3) Use Go's bundled performance-monitoring tool, pprof (originally written in Perl):

a) runtime/pprof: after importing it, call the relevant runtime functions in code to profile locally and write out the results, which are then viewed with the go tool pprof command;

b) net/http/pprof: an HTTP wrapper around runtime/pprof, enabled by adding import _ "net/http/pprof";

4) GC monitoring: run the program with the environment variable GODEBUG=gctrace=1;

3. Other tools:

go-torch: similar in function to pprof, but produces more intuitive flame graphs;

goreporter: generates a Go code quality assessment report;

dingo-hunter: a static analyser for finding deadlocks in Go programs;

flen: gets function length information in a Go package;

go/ast: package ast declares the types used to represent syntax trees for Go packages;

gocyclo: calculates the cyclomatic complexity of functions in Go source code;

gometalinter: runs Go lint tools concurrently and normalizes their output;

go vet: examines Go source code and reports suspicious constructs;

ineffassign: detects ineffectual assignments in Go code;

safesql: a Go static analysis tool to protect against SQL injection;

4. Some other design/coding optimizations:

http://johng.cn/go-optimize-brief/

1) Memory optimization:

a) Merge small objects into a struct for a single allocation, to reduce the number of memory allocations;

b) Allocate buffers with sufficient size up front, and reuse them where appropriate;

c) When creating slices and maps with make, specify the capacity from the estimated size;

d) In long call stacks, avoid requesting many temporary objects;

e) Avoid frequently creating temporary objects;
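A sketch of point c): passing the estimated size to make means append never has to re-allocate, and the same hint helps maps (buildSized is an illustrative helper):

```go
package main

import "fmt"

// buildSized appends n elements into a slice whose capacity was
// specified up front, so append never re-allocates the backing array.
func buildSized(n int) []int {
	s := make([]int, 0, n)
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	return s
}

func main() {
	const n = 10000

	s := buildSized(n)
	fmt.Println(len(s), cap(s)) // 10000 10000

	// Maps accept the same size hint, reducing rehashing while filling.
	m := make(map[int]int, n)
	for i := 0; i < n; i++ {
		m[i] = i
	}
	fmt.Println(len(m)) // 10000
}
```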

2) Concurrency optimization:

a) For highly concurrent task processing, use a goroutine pool;

b) Avoid calling synchronous system interfaces under high concurrency;

c) Avoid mutual exclusion on shared objects under high concurrency;
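A minimal goroutine-pool sketch for point a): a fixed number of workers drain a shared job channel instead of spawning one goroutine per task (squareSum and the worker count are illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

// squareSum processes n jobs with a fixed pool of workers draining a
// shared channel, bounding concurrency instead of one goroutine per job.
func squareSum(n, workers int) int {
	jobs := make(chan int, n)
	results := make(chan int, n)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs {
				results <- j * j // the "work"
			}
		}()
	}

	for i := 1; i <= n; i++ {
		jobs <- i
	}
	close(jobs) // lets the worker loops terminate
	wg.Wait()
	close(results)

	sum := 0
	for r := range results {
		sum += r
	}
	return sum
}

func main() {
	fmt.Println(squareSum(100, 4)) // 338350 = 1² + 2² + … + 100²
}
```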

3) Other optimizations:

a) Avoid cgo, or reduce the number of cgo calls;

b) Reduce conversions between []byte and string; prefer processing data as []byte;

c) For string concatenation, prefer bytes.Buffer;
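A sketch of points b) and c): building a string through bytes.Buffer performs one final conversion instead of re-allocating on every += (concat is an illustrative helper):

```go
package main

import (
	"bytes"
	"fmt"
)

// concat joins parts with bytes.Buffer: the buffer grows its backing
// array amortized, and the string conversion happens exactly once.
func concat(parts []string) string {
	var buf bytes.Buffer
	for _, p := range parts {
		buf.WriteString(p) // appends bytes, no intermediate strings
	}
	return buf.String()
}

func main() {
	// The naive form allocates a new string on every iteration:
	s := ""
	for _, p := range []string{"go", "routine", "s"} {
		s += p
	}
	fmt.Println(s == concat([]string{"go", "routine", "s"})) // true
}
```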

III. Memory Allocation

1. Languages with a built-in runtime often abandon traditional memory allocation in favor of managing memory themselves. This enables operations such as pre-allocation and memory pooling, avoiding the performance cost of frequent system calls. Of course, an important reason is also to cooperate better with the garbage collector;

2. The basic strategy of Go memory allocation:

1) Request a large chunk of memory (e.g. 1 MB) from the operating system at a time, to reduce system calls;

2) Pre-cut the requested chunk into small blocks of specific sizes, forming linked lists;

3) To allocate memory for an object, simply take a small block from the list of the appropriate size;

4) When an object's memory is reclaimed, its small block is returned to the original list for reuse;

5) If too much memory sits idle, try returning some of it to the operating system to reduce overall overhead;

3. The memory allocator only manages memory blocks and does not care about object state. It does not actively reclaim memory; instead, the garbage collector triggers the allocator's reclamation after finishing its sweep;

4. The memory allocator divides the memory blocks it manages into two types:

1) span: a large chunk of memory consisting of multiple contiguous pages;

2) object: a span cut into smaller blocks of a specific size, each of which can store one object;

5. The allocator distinguishes spans of different sizes by page count. For example, spans are stored in a management array (of fixed length 60) indexed by page count, which is used to look them up when needed;

6. The allocator tries to pack several small objects into one object block to save memory;

7. The memory allocator consists of three components:

1) cache: each worker thread at runtime is bound to a cache for lock-free object allocation (looking up a memory block by size class happens here);

2) central: provides spans, already cut into objects, as backup for all caches;

3) heap: manages idle spans and requests new memory from the operating system when needed;

8. Allocation process:

1) Compute the size class of the object to be allocated;

2) Find a span of the same size class in the cache.alloc array;

3) Take an available object from span.freelist;

4) If span.freelist is empty, obtain a new span from central;

5) If central.nonempty is empty, obtain a span from heap.free/freelarge (with 32 KB as the boundary) and cut it into an object linked list;

6) If the heap has no idle span of suitable size, request a new memory block from the operating system;

9. Release process:

1) Objects marked as collectible are returned to the owning span.freelist;

2) The span is put back into central, so any cache can acquire it again;

3) If all objects in a span have been reclaimed, it is returned to the heap to be re-cut and reused;

4) The heap is scanned periodically for spans that have been idle a long time, and their memory is released;

5) The above does not cover large objects, which are allocated from and returned to the heap directly;

10. Normally, the compiler is responsible for storing objects in registers and on the stack whenever possible, which helps performance and reduces pressure on the garbage collector;

11. The Go compiler supports escape analysis: at compile time it builds a call graph to analyze whether a local variable will be referenced outside its scope, and thereby decides whether the variable can be allocated directly on the stack or must go on the heap;
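A sketch of what escape analysis decides (compile with go build -gcflags=-m to see the per-variable report; the two functions are illustrative):

```go
package main

import "fmt"

// stackAlloc returns a value copy, so x can live on the stack and
// disappears when the function returns.
func stackAlloc() int {
	x := 42
	return x
}

// heapAlloc returns a pointer to its local variable, so escape
// analysis decides that x "escapes to heap" and allocates it there.
func heapAlloc() *int {
	x := 42
	return &x
}

func main() {
	// Build with `go build -gcflags=-m` to see the compiler print its
	// escape-analysis decisions for each variable.
	fmt.Println(stackAlloc(), *heapAlloc()) // 42 42
}
```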

12. Memory reclamation:

1) We say "reclamation" rather than "release" because the core idea of the entire memory allocator is memory reuse;

2) For efficiency, reclamation naturally does not look at individual objects but uses the span as its basic unit. Guided by the scan marks in the bitmap, objects are gradually returned to their original spans, which are finally handed back to central or the heap for reuse.

3) Whether memory was newly requested from the operating system or reclaimed by sweeping, as long as a span is in the heap, an attempt is made to merge it with adjacent idle spans on either side to form a larger free block;

13. Memory Release:

1) In the runtime's entry to main.main, a dedicated monitoring task, sysmon, is launched; periodically (roughly every 5 minutes) it checks the heap for idle memory blocks;

2) It traverses all spans in free and freelarge; if a span has been idle longer than the threshold, its associated physical memory is released;

3) The so-called physical memory release actually calls madvise to inform the operating system (on *nix) that a range of memory is no longer in use, advising the kernel to reclaim the corresponding physical pages;

IV. Garbage Collection

1. Go's garbage collection strategy is mark-and-sweep;

2. The garbage collector is the part of Go that has seen the most improvement effort; all the changes aim to shorten STW (stop-the-world) time and improve program responsiveness;

3. Starting with Go 1.5, concurrent tri-color marking was added; "concurrent" here means garbage collection executes concurrently with user logic;

4. Tri-color marking principle:

White: objects to be collected

Gray: objects being processed

Black: live objects

1) At first, all objects are white (white but not yet confirmed garbage, so they cannot be collected directly);

2) Scanning finds reachable objects, marks them gray, and puts them into the work queue (gcWork, a high-performance cached queue);

3) A gray object is taken from the queue; the objects it references are marked gray and enqueued, and the object itself is marked black;

4) A write barrier watches for object memory modifications and re-colors the affected objects or puts them back into the queue;

5) When all scanning and marking are complete, what remains is either white or black, representing objects to be collected and live objects respectively; the sweep only needs to reclaim the memory of white objects;

5. Although marking executes concurrently, user logic threads are still paused during the STW phases of garbage collection. (Go 1.7 greatly optimized GC performance, and in 1.8 even bad cases pause for only about 100 µs.) The pause time depends on the number of temporary objects: the more temporary objects, the longer the pause and the higher the CPU consumption;

V. Concurrency Scheduling

1. Building a higher level of abstraction on top of processes and threads, with a built-in runtime, is the most popular practice in modern languages;

2. Components of the concurrency scheduling model:

1) Processor (P): acts like a CPU core, limiting how many concurrent tasks can execute simultaneously;

2) Goroutine (G): everything in the process runs as a G, including runtime services and the main.main entry function;

3) System thread (M): bound to a P, it executes G tasks in a loop. By modifying registers, M points its execution stack at G's stack memory, allocates stack frames in that space, and executes the task function;

3. Although P and M together form the execution environment, their counts do not correspond one to one. Typically the number of P is relatively constant, defaulting to the number of CPU cores, though it may be more or fewer (runtime.GOMAXPROCS), while M is created on demand by the scheduler.

4. Although runtime.GOMAXPROCS can change the number of P while the program is running, the cost is high: stopTheWorld && startTheWorld;

5. The system monitoring thread (sysmon) is crucial to memory allocation, garbage collection, and concurrency scheduling. Its main roles are as follows:

1) Release span physical memory blocks that have been idle for more than 5 minutes;

2) Force a garbage collection if none has run for more than 2 minutes;

3) Add netpoll results that have gone unprocessed for a long time to the task queue;

4) Issue preemptive scheduling to G tasks that have been running for a long time;

5) Recover Ps that have been blocked in a syscall for a long time;

6. Preemptive scheduling only sets a preemption flag on the target G; when the task calls a function, compiler-inserted instructions check the flag to decide whether the current task should yield;

7. stopTheWorld: user logic must be paused at a safe point, or many unexpected problems will arise. Therefore, stopTheWorld also works by "notification": it issues a preemption request to all running G tasks so that they pause;

8. Deferred calls (defer): a deferred invocation is far more than a single call instruction; it involves object allocation, caching, and multiple function calls. In scenarios with high performance requirements, defer should be avoided;
