The Beauty of Go Concurrency (Reproduced)

Source: Internet
Author: User

Original: http://qing.blog.sina.com.cn/2294942122/88ca09aa33002ele.html

Introduction

Multicore processors are becoming commonplace. Is there an easy way for the software we write to unleash the power of multiple cores? The answer is: yes. With the rise of languages such as Golang, Erlang, and Scala, a new concurrency pattern is becoming clearer. Just as with procedural and object-oriented programming, a good programming pattern needs an extremely concise kernel, plus rich extensions built on top of it, to solve the variety of problems in the real world. This article uses the Go language as an example to explain that kernel and its extensions.

The Kernel of the Concurrency Model

The kernel of this concurrency model needs only goroutines (coroutines) and channels. Goroutines execute code; channels pass events between goroutines.

Concurrent programming has always been difficult. To write a good concurrent program, we have to understand threads, locks, semaphores, barriers, and even the way the CPU updates its caches. They all have strange tempers, and traps are everywhere. I avoid manipulating these low-level concurrency primitives myself whenever I can.

A concise concurrency model does not need these complex low-level elements; goroutines and channels are sufficient.

A goroutine is a lightweight thread. In procedural programming, calling a procedure means waiting for it to finish and return. When starting a goroutine, however, you do not wait for it to finish: the call returns immediately. Goroutines are so lightweight that Go can run hundreds of thousands of them in a single process while still maintaining high performance. On an ordinary platform, once a process has thousands of threads, the CPU becomes busy with context switching and performance drops sharply. Creating threads at will is a bad idea, but we can create goroutines in large numbers.

A channel is the data-transfer conduit between goroutines. A channel can pass data among many goroutines; the data can be a reference or a concrete value. There are two ways to use a channel:

· A goroutine can try to put data into a channel; if the channel is full, the goroutine is suspended until the channel has room for the data.

· A goroutine can try to request data from a channel; if the channel has no data, the goroutine is suspended until the channel returns data.

Thus, while transmitting data, a channel also controls the execution of goroutines. It works a bit like an event-driven mechanism and a bit like a blocking queue. These two concepts are very simple, and every language platform has a corresponding implementation; there are libraries for both Java and C.
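A minimal sketch of these two rules (the code and names here are ours, for illustration): a send blocks when the channel is full, and a receive blocks when it is empty.

```go
package main

import "fmt"

func main() {
	// A channel with buffer capacity 1: the first send succeeds at once;
	// a second send would suspend this goroutine until someone receives.
	ch := make(chan int, 1)
	ch <- 42 // does not block: the buffer has room
	// A receive on an empty channel would likewise suspend the goroutine;
	// here the buffered value is available immediately.
	fmt.Println(<-ch)
}
```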

As long as you have goroutines and channels, you can solve concurrency problems gracefully; no other concurrency-related concepts are needed. So how do we use these two sharp knives to solve all kinds of practical problems?

Extensions of the Concurrency Model

Goroutines, unlike threads, can be created in large numbers. Opening this door, we can expand into new usages: we can build generators, have functions return "services", run loops concurrently, and even share variables. But new usages bring new thorny problems: goroutines can leak, and inappropriate use can hurt performance. The various usages and their pitfalls are described below. The demo code is written in Go, because it is simple and direct and supports everything we need.

Generator

Sometimes we need a function that produces data continuously. Such a function might read a file, read from the network, generate an auto-incrementing sequence, or generate random numbers. It takes some known parameters, such as a file path, and is then called repeatedly to return new data.

Taking random numbers as an example, let us build a random-number generator that executes concurrently.

The non-concurrent approach looks like this:

// function rand_generator_1, returns an int
func rand_generator_1() int {
	return rand.Int()
}

The above is a function that returns an int. If the call to rand.Int() took a long time, the caller of the function would hang along with it. So we can create a goroutine dedicated to executing rand.Int().

// function rand_generator_2, returns a channel
func rand_generator_2() chan int {
	// Create a channel
	out := make(chan int)
	// Create a goroutine
	go func() {
		for {
			// Write data to the channel; waits if no one is reading
			out <- rand.Int()
		}
	}()
	return out
}

func main() {
	// Generate random numbers as a service
	rand_service_handler := rand_generator_2()
	// Read a random number from the service and print it
	fmt.Printf("%d\n", <-rand_service_handler)
}

The function above executes rand.Int() concurrently. It is worth noting that the function's return value can be understood as a "service". Whenever we need random data, we can access this service; it has already prepared the data for us, so there is no waiting. If we call the service infrequently, one goroutine is enough to meet our needs. But what if we need heavy access? We can use the multiplexing technique described below: start several generators and merge them into one large service.

Calling a generator returns a "service" that can be accessed continuously for data. Generators are widely used for reading data, generating IDs, and even timers. This is a very concise way of thinking about structuring a program concurrently.

Multiplexing

Multiplexing is the technique of handling multiple queues at once. Apache uses one process per connection, so its concurrency is limited. Nginx uses multiplexing so that one process handles multiple connections, giving it better concurrency. In the goroutine world multiplexing is also needed, though for a different purpose: it merges several similar small services into one large service.


Let us use multiplexing to make a higher-concurrency random-number generator.

// function rand_generator_3, returns a channel
func rand_generator_3() chan int {
	// Create two random-number generator services
	gen1 := rand_generator_2()
	gen2 := rand_generator_2()
	// Create the output channel
	out := make(chan int)
	// One goroutine per source
	go func() {
		for {
			// Read data from generator 1 and consolidate it
			out <- <-gen1
		}
	}()
	go func() {
		for {
			// Read data from generator 2 and consolidate it
			out <- <-gen2
		}
	}()
	return out
}

The above is a high-concurrency version of the random-number generator built with multiplexing. By consolidating two random-number generators, this version has twice the capacity of the previous one. Although goroutines can be created in large numbers, the many goroutines still compete for the output channel. Go provides the select keyword to address this, and each platform has its own tricks. Increasing the buffer size of the output channel is a common solution.
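The select keyword mentioned above lets one goroutine serve several channels without the per-source forwarding goroutines. A sketch under our own illustrative names (merge and source are not from the original code):

```go
package main

import "fmt"

// merge multiplexes two input channels onto one output using select,
// so a single goroutine serves both sources.
func merge(a, b <-chan int) <-chan int {
	out := make(chan int)
	go func() {
		for {
			select {
			case v := <-a:
				out <- v
			case v := <-b:
				out <- v
			}
		}
	}()
	return out
}

// source is a stand-in generator in the style of rand_generator_2,
// emitting an increasing sequence beginning at start.
func source(start int) <-chan int {
	ch := make(chan int)
	go func() {
		for i := start; ; i++ {
			ch <- i
		}
	}()
	return ch
}

func main() {
	m := merge(source(0), source(100))
	// Read a few multiplexed values; interleaving depends on scheduling.
	for i := 0; i < 4; i++ {
		fmt.Println(<-m)
	}
}
```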

Multiplexing can consolidate multiple channels, improving both performance and ease of handling. Used together with the other patterns here, it is very powerful.

Futures

The future is a very useful technique; we often use futures when working with threads. With threads, we can start a task, get back a future, and later wait on it for the result. But with goroutines, futures can go further: the input parameters can be futures too.


When invoking a function, the arguments are usually prepared beforehand, and the same is true when starting a goroutine. But if we make a passed parameter a channel, we can call the function before its parameters are ready. This design provides great freedom and concurrency: the two processes of invoking the function and preparing its parameters are fully decoupled. Below is an example of accessing a database with this technique.

// A query structure
type query struct {
	// Parameter channel
	sql chan string
	// Result channel
	result chan string
}

// Execute the query
func execQuery(q query) {
	// Start the goroutine
	go func() {
		// Get the input
		sql := <-q.sql
		// Access the database; write to the result channel
		q.result <- "get " + sql
	}()
}

func main() {
	// Initialize the query
	q := query{make(chan string, 1), make(chan string, 1)}
	// Execute the query; note that the parameters need not be ready yet
	execQuery(q)
	// Prepare the parameters
	q.sql <- "select * from table"
	// Get the result
	fmt.Println(<-q.result)
}

The code above uses the future technique: not only is the result obtained in the future, the parameters are too. When the parameters are ready, execution proceeds automatically. The difference between a future and a generator is that a future returns one result, while a generator can be called repeatedly. Another notable point is that the parameter channel and result channel are defined as fields of a struct rather than returning the result channel. This increases cohesion, with the benefit that it combines well with multiplexing.

Futures can be combined with the other techniques here. Via multiplexing, multiple results can be monitored, returning automatically as soon as one is ready. Futures can also be combined with generators: the generator continuously produces data, and futures process it piece by piece. Futures can even be chained end to end, forming a parallel pipe-filter, which can be used to read, write, and transform data streams.
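Monitoring multiple futures with select can be sketched like this (asyncSquare is our illustrative name, not from the article):

```go
package main

import "fmt"

// asyncSquare returns a "future": a result channel that is filled in later.
func asyncSquare(x int) chan int {
	result := make(chan int, 1) // buffer of 1, so the worker never blocks
	go func() { result <- x * x }()
	return result
}

func main() {
	r1 := asyncSquare(3)
	r2 := asyncSquare(4)
	// select returns whichever future completes first.
	select {
	case v := <-r1:
		fmt.Println("r1 finished first:", v)
	case v := <-r2:
		fmt.Println("r2 finished first:", v)
	}
}
```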

The future is a powerful tool: at call time you need not care whether the data is ready or whether the return value has been computed. The components of the program run automatically once their data is ready.

Concurrent Loops

Loops are often performance hot spots. If the performance bottleneck is on the CPU, there is a 90% chance the hot spot is inside a loop body. So if the loop body can be executed concurrently, performance rises accordingly.

Making a loop concurrent is easy: simply start a goroutine in each iteration, and the loop bodies execute concurrently. To know when the whole loop has finished, set up a counter channel before the call; each loop body puts an element on the counter when it completes, and the caller waits on the counter until every iteration is done.

// Set up the counter
sem := make(chan int, N)
// The loop
for i, xi := range data {
	// Start a goroutine for this iteration
	go func(i int, xi float64) {
		doSomething(i, xi)
		// Count this iteration as done
		sem <- 0
	}(i, xi)
}
// Wait for the loop to finish
for i := 0; i < N; i++ {
	<-sem
}

The above is an example of a concurrent loop, using the counter to wait for completion. If you combine it with the future technique mentioned earlier, you need not even wait: you can postpone the wait until the results are really needed, and only then check whether the data is complete.

Concurrent loops improve performance and leverage multiple cores to address CPU hot spots. It is precisely because goroutines can be created in large numbers that they can be used inside loop bodies; with threads you would need a thread pool and similar machinery to avoid creating too many threads, whereas goroutines are much simpler.
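In present-day Go, sync.WaitGroup from the standard library is the idiomatic stand-in for the counting channel above. A sketch with our own names (squareAll and its body substitute for doSomething):

```go
package main

import (
	"fmt"
	"sync"
)

// squareAll runs the loop body for every element concurrently,
// using sync.WaitGroup as the completion counter.
func squareAll(data []float64) []float64 {
	results := make([]float64, len(data))
	var wg sync.WaitGroup
	for i, xi := range data {
		wg.Add(1)
		go func(i int, xi float64) {
			defer wg.Done()
			results[i] = xi * xi // stand-in for doSomething
		}(i, xi)
	}
	wg.Wait() // block until every iteration has signalled completion
	return results
}

func main() {
	fmt.Println(squareAll([]float64{1, 2, 3, 4}))
}
```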

Chain Filter

As mentioned earlier, futures connected end to end form a concurrent pipe-filter, which can do a great deal. If every filter is made from the same function, there is an easy way to connect them.

Since every filter goroutine can run concurrently, this structure is very favorable in multicore environments. Below is an example of generating prime numbers with this pattern.

// A concurrent prime sieve
package main

// Send the sequence 2, 3, 4, ... to channel 'ch'.
func generate(ch chan<- int) {
	for i := 2; ; i++ {
		ch <- i // Send 'i' to channel 'ch'.
	}
}

// Copy the values from channel 'in' to channel 'out',
// removing those divisible by 'prime'.
func filter(in <-chan int, out chan<- int, prime int) {
	for {
		i := <-in // Receive value from 'in'.
		if i%prime != 0 {
			out <- i // Send 'i' to 'out'.
		}
	}
}

// The prime sieve: daisy-chain filter processes.
func main() {
	ch := make(chan int) // Create a new channel.
	go generate(ch)      // Launch the generate goroutine.
	for i := 0; i < 10; i++ {
		prime := <-ch
		print(prime, "\n")
		ch1 := make(chan int)
		go filter(ch, ch1, prime)
		ch = ch1
	}
}

The program above creates 10 filters, each sieving one prime, so it outputs the first 10 primes.

Chain-filter creates a concurrent filter chain with simple code. The approach has a further benefit: each channel is accessed by only two goroutines, so there is no fierce contention and performance is better.

Shared variables

Communication between goroutines can only go through channels. But we are used to shared variables, and shared variables often make code more concise. For example, a server has two states, on and off, and other parties only want to get or change that state. How can this be done? Such a variable can be served through a pair of channels and maintained by a goroutine.


The following example shows how to implement a shared variable this way.

// A shared variable consists of a read channel and a write channel
type shared_var struct {
	reader chan int
	writer chan int
}

// The goroutine that maintains the shared variable
func shared_var_watchdog(v shared_var) {
	go func() {
		// Initial value
		var value int = 0
		for {
			// Listen on the read and write channels to provide the service
			select {
			case value = <-v.writer:
			case v.reader <- value:
			}
		}
	}()
}

func main() {
	// Initialize, and start the maintenance goroutine
	v := shared_var{make(chan int), make(chan int)}
	shared_var_watchdog(v)
	// Read the initial value
	fmt.Println(<-v.reader)
	// Write a value
	v.writer <- 1
	// Read the newly written value
	fmt.Println(<-v.reader)
}

In this way, a goroutine-safe shared variable is implemented on top of goroutines and channels: define a write channel and write to it to update the variable; define a read channel and read from it to get the current value. Both channels are maintained by a single goroutine, which guarantees data consistency.
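For comparison with the channel-based shared variable above, Go also offers mutexes as a more direct way to guard shared state. A sketch (counter is our illustrative type, not from the article):

```go
package main

import (
	"fmt"
	"sync"
)

// counter guards its value with a sync.Mutex instead of a watchdog goroutine.
type counter struct {
	mu sync.Mutex
	v  int
}

func (c *counter) add(n int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.v += n
}

func (c *counter) get() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.v
}

func main() {
	var c counter
	var wg sync.WaitGroup
	// 100 goroutines increment concurrently; the mutex keeps it consistent.
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); c.add(1) }()
	}
	wg.Wait()
	fmt.Println(c.get())
}
```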

In general, it is not recommended that goroutines interact through shared variables, but used in this way, shared variables are advisable in some situations. Many platforms have more native shared-variable support; which implementation is better is a matter of opinion. Using goroutines and channels, you can also build the common concurrent data structures, such as locks, which we will not repeat here.

Goroutine Leaks

Goroutines, like memory, are a system resource. Memory has automatic garbage collection; goroutines, however, have no corresponding reclamation mechanism. Years from now, when goroutines are widespread, will goroutine leaks join memory leaks as programmers' eternal pain? In general, a goroutine is destroyed when it finishes executing. Goroutines consume memory, and if they leak, the impact is as serious as a memory leak: the program slows down and drags the machine down with it.

C and C++ are not languages with automatic memory reclamation, yet with good programming habits the problem can be managed. The same holds for goroutines: good habits are enough.

There are only two situations in which a goroutine cannot end. In one, the goroutine wants to read from a channel, but no one ever writes to it; perhaps the channel has simply been forgotten. In the other, the goroutine wants to write to a channel, but because no one ever reads from it, the goroutine can never proceed. The following sections discuss how to avoid each.

First, the case where a goroutine wants to read from a channel that nobody writes to. The solution is simple: add a timeout mechanism. For anything uncertain to return, a timeout must be added to avoid waiting forever. It is not necessary to use a timer to terminate the goroutine: you can also expose an exit-alert channel externally, which any other goroutine can use to tell this one to terminate.
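A sketch of the exit-alert channel just described (worker and the channel names are ours): closing the quit channel tells the goroutine to stop, so it can always terminate instead of waiting forever.

```go
package main

import "fmt"

// worker reads from in until the quit channel is closed, so the caller
// can always terminate it and no goroutine is left waiting forever.
func worker(in <-chan int, quit <-chan struct{}) {
	for {
		select {
		case v := <-in:
			fmt.Println("got", v)
		case <-quit:
			fmt.Println("worker exiting")
			return
		}
	}
}

func main() {
	in := make(chan int)
	quit := make(chan struct{})
	done := make(chan struct{})
	go func() { worker(in, quit); close(done) }()
	in <- 1
	close(quit) // tell the worker to stop
	<-done      // wait until it has actually exited
}
```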

Second, the case where a goroutine wants to write to a channel but the write blocks. The solution is also simple: buffer the channel. This works only when the channel receives a bounded number of writes: if a channel is known to receive at most N values, set its buffer to N. Then the channel never blocks, and the goroutine naturally cannot leak. You could also set the buffer to something effectively unbounded, but that risks a memory leak. Once the goroutine finishes, the channel's memory loses its last reference and is garbage-collected automatically.

func never_leak(ch chan int) {
	// Initialize the timeout channel, buffered to 1
	timeout := make(chan bool, 1)
	// Start the timeout goroutine; because the buffer is 1, it cannot leak
	go func() {
		time.Sleep(1 * time.Second)
		timeout <- true
	}()
	// Monitor the channel; thanks to the timeout, this cannot leak either
	select {
	case <-ch:
		// a read from ch has occurred
	case <-timeout:
		// the read from ch has timed out
	}
}

The above is an example of avoiding leaks: use a timeout to avoid read blocking, and use buffering to avoid write blocking.

As with long-lived objects in memory, we need not worry much about leaks in long-lived goroutines: first, they exist for the life of the program; second, there are few of them. The ones to watch are temporarily created goroutines, which are numerous and short-lived, often created inside loops; apply the methods above to keep them from leaking. Goroutines are a double-edged sword: misused, they not only fail to improve performance but can bring the program down. Yet, as with memory, although the risk of leaks exists, it diminishes the more familiar you become with them.

Implementations of the Concurrency Model

With concurrent programming on the rise today, support for coroutines and channels has become part of every platform. Although each has its own names for them, they all meet the basic requirements of coroutines: concurrent execution and mass creation. The author summarizes below how they are realized.

Here are some common languages and platforms that already support coroutines.

Golang and Scala, among the newest languages, were born with complete coroutine-based concurrency. Erlang, the most veteran concurrent programming language, has been rejuvenated. Other second-tier languages have almost all added support in newer versions.

Surprisingly, the world's three most mainstream platforms, C, C++, and Java, do not provide language-level native coroutine support. They are burdened with a thick history that cannot be changed. But there are other ways for them to use coroutines.

The Java platform has several ways to implement coroutines:

· Modify the virtual machine: patch the JVM to implement coroutines; this works well but forfeits the benefit of cross-platform portability.

· Modify bytecode: enhance the bytecode after compilation, or use a new JVM language; this slightly complicates compilation.

· Use JNI: ship JNI code in a jar package; easy to use, but not cross-platform.

· Simulate coroutines with threads: this makes coroutines heavyweight and fully dependent on the JVM's thread implementation.

Bytecode modification is the most common approach, because it balances performance and portability. Scala, the most representative JVM language, supports coroutine-style concurrency very well. Akka, the popular actor-model library for Java, also implements its coroutines by modifying bytecode.

For the C language, coroutines are like threads: they can be implemented with various system calls. As a relatively high-level concept, coroutines have too many implementations to discuss here; mainstream ones include libpcl, coro, and lthread.

For C++, there are Boost implementations as well as other open-source libraries. There is also μC++, which provides concurrency extensions on top of C++.

As you can see, this programming model is widely supported across language platforms and is no longer a niche. If you want to use it, you can add it to your toolbox at any time.

Conclusion

This article has discussed an extremely concise concurrency model. With only two basic components, goroutines and channels, it provides rich functionality and solves a wide variety of practical problems. This model is also being widely implemented, and has become a trend. The power of this concurrency model is surely far from exhausted, and more, ever more concise, usages will emerge. Perhaps one day the number of CPU cores will rival the number of neurons in the brain, and by then we will have to rethink concurrency models all over again.

About the author


Outskirts, researcher at EMC China Research Institute, focusing on big data, cloud computing and other fields

Weibo: http://weibo.com/yankaycom
