Here is the original. Unfortunately, the article only raises questions and does not explicitly offer solutions to them. Still, an article that provokes this kind of reflection should not be passed over. I also have to admit that high-level abstractions for distributed systems seem easier to describe in the functional paradigm (though not necessarily in practice).
———— Translation Divider Line ————
Channels Are Not Enough
Or why pipelining is not that easy.
The Go concurrency model: brave and clever.
By @kachayev
Overview
Go is designed to make it easier to build concurrent systems, so it has goroutines to run independent computations and channels for communication between them. We've all heard this story before. All the examples and guides look good: we can create a new channel, send data to it, read from it, and there are even beautiful and elegant select statements (incidentally, why are we still using statements in the 21st century?), blocking reads, buffering...
Thesis: 99% of the time, I don't really care whether the response was delivered through a channel or a magical unicorn brought it on its horn.
It's really cool when you're writing a guide for beginners! But it's painful when you're trying to implement a large, complex system. Channels are too primitive. They are low-level building blocks, and I very much doubt that you want to deal with them in your daily work.
Look at "Advanced Patterns" and "Pipelines". It's not that simple, is it? There are so many things to think about and always remember: when and how to close channels, how to propagate errors, and how to free resources. I'm not complaining because I tried to implement something and failed. I'm complaining because I deal with these things every day.
You may say that beginners don't need to understand all the details. But... do these patterns describe anything that is really "advanced"? Unfortunately, the answer is no. They are basic and common.
Take a closer look at the pipeline problem. Is it really a pipeline? No: "... the MD5 checksum is computed for each path under the directory, and the result is stored in a map[string]string ...". This is just a pmap (parallel map), or a pmap with bounded parallelism backed by a pooled executor. And a pmap should not require me to type so many lines of code. Want to see a real pipeline? I describe one at the end of the article (see the section "Building a Twitter Analyzer").
So what about the patterns?
To develop real-world applications quickly, we need abstractions at a higher level than raw channels. Channels are just the transport layer. We need application-layer abstractions to write programs (compare with the OSI model). Otherwise you will find yourself constantly tangled in the low-level details of your channel network, trying to work out why it sporadically fails in production, with no effective way to find the cause. Look at how Erlang OTP addresses the same kind of problem: it shields you from low-level message-passing code.
What's wrong with low-level code? Here's an awesome article, "Edward C++Hands":
Having scissors for hands is not all that bad. Edward has many talents: for example, he can create stunning dog hairdos. Don't get me wrong, there were many stunning dog hairdos on display (I mean elegant and simple C++ code), but most of the content was about how to avoid cutting yourself and how to give first aid when you do.
At the Kyiv Go meetup I went through the same situation: 20 lines of clean, readable code on a single slide, containing one non-trivial race condition and one possible runtime panic. Was this obvious to everyone in the audience? No. At least half of them did not see it.
The cause of the pain?
Well, let's try to collect some such patterns: from work experience, from books, from other languages (yes, guys, I know it's a little hard to believe, but many other languages are designed for concurrency too).
Rob Pike talks about fan-in and fan-out. They are useful in many cases, but they are still about the network of channels, not about your application. In any case, let's take a look (shamelessly stolen from here).
    import "sync"

    func merge(cs ...<-chan int) <-chan int {
        var wg sync.WaitGroup
        out := make(chan int)

        // Start an output goroutine for each input channel in cs. output
        // copies values from c to out until c is closed, then calls wg.Done.
        output := func(c <-chan int) {
            for n := range c {
                out <- n
            }
            wg.Done()
        }
        wg.Add(len(cs))
        for _, c := range cs {
            go output(c)
        }

        // Start a goroutine to close out once all the output goroutines
        // are done. This must start after the wg.Add call.
        go func() {
            wg.Wait()
            close(out)
        }()
        return out
    }
Hmm... <-chan int. That is not abstract enough to reuse in my app (for example, by moving it into a library), and it's not so obvious that I want to write it from scratch every time I need it. So how do I make it reusable? <-chan interface{}? Welcome to the land of type casts and runtime panics. If you want to implement a high-level fan-in (merge), you have to give up type safety. The same (unfortunately) goes for the other patterns.
What I really want is:
    func merge[T](cs ...<-chan T) <-chan T
Yes, I know that Go doesn't have generics, because who needs them?
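For readers on a newer toolchain: Go 1.18 and later do support type parameters, so the wished-for signature can be written for real (as `[T any]` rather than `[T]`). The following is a minimal sketch of a type-safe fan-in under that assumption, not the author's code. `Merge` mirrors the function above, and `fromSlice` is a hypothetical helper that exists only to feed the demo.

```go
package main

import (
	"fmt"
	"sync"
)

// Merge fans in any number of channels with the same element type T.
// The output channel is closed once every input channel is closed.
func Merge[T any](cs ...<-chan T) <-chan T {
	var wg sync.WaitGroup
	out := make(chan T)

	wg.Add(len(cs))
	for _, c := range cs {
		go func(c <-chan T) {
			defer wg.Done()
			for v := range c {
				out <- v
			}
		}(c)
	}

	// Close out only after all forwarding goroutines have finished.
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// fromSlice is a hypothetical helper: it emits the given values and closes.
func fromSlice[T any](vs ...T) <-chan T {
	ch := make(chan T)
	go func() {
		defer close(ch)
		for _, v := range vs {
			ch <- v
		}
	}()
	return ch
}

func main() {
	sum := 0
	for v := range Merge(fromSlice(1, 2, 3), fromSlice(10, 20)) {
		sum += v
	}
	fmt.Println(sum)
}
```

The element type is inferred at the call site, so merging channels of different element types is rejected at compile time instead of failing in a type assertion at run time.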
What's the weather like now?
Back to patterns. Let's analyze a hypothetical project that is close to real-world server-side development. We need a server that accepts a request with a U.S. state and returns the weather information collected from OpenWeatherMap. For example:
    $ http localhost:4912/weather?q=CA
    HTTP/1.1 200 OK
    Access-Control-Allow-Credentials: true
    Access-Control-Allow-Methods: GET, POST
    Access-Control-Allow-Origin: *
    Connection: keep-alive
    Content-Type: application/json; charset=utf-8

    [
        {
            "clouds": {"all": ...},
            "id": 5391959,
            "main": {"temp": 288.89, "temp_max": 291.48, "temp_min": 286.15},
            "name": "San Francisco",
            "weather": [
                {"description": "Mist", "icon": "50d", "id": 701, "main": "Mist"}
            ]
        },
        {
            "clouds": {"all": ...},
            "id": 5368361,
            "main": {"temp": 292.83, "temp_max": 296.15, "temp_min": 289.15},
            "name": "Los Angeles",
            "weather": [
                {"description": "Mist", "icon": "50d", "id": 701, "main": "Mist"}
            ]
        }
    ]
pmap
Let's start with something we already know. So, we received the request ?q=CA. I don't want to go into where the list of cities comes from: we can use a database, an in-memory cache, or anything else reasonable. Let's say we have a magical findCities(state) function that returns chan City (the usual Go representation of a lazy sequence). And then what? For every city we have to call the OpenWeatherMap API and collect the results into a single map[City]Weather. We've already talked about this pattern. This is a pmap. I want my code to look like this:
    chanCities := findCities(state)
    resolver := func(name City) Weather { return openweathermap.AskFor(name) }
    weather := chanCities.Par.Map(resolver)
or, with a bound on the number of concurrent calls:
    chanCities := findCities(state)
    pool := NewWorkers(a)
    resolver := func(w Worker, name City) Weather { return w.AskFor(name) }
    weather := chanCities.Par.BoundedMap(pool, resolver)
I want all those <-done synchronizations and sacred select statements to be completely hidden.
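As a rough illustration of what such a helper could hide, here is a sketch of a bounded parallel map, assuming Go 1.18+ type parameters. The name `BoundedMap` is borrowed from the wish above, but its signature (a worker count instead of a pool object) is an assumption of this sketch, and the "weather" lookup in `main` is a stand-in rather than a real OpenWeatherMap call.

```go
package main

import (
	"fmt"
	"sync"
)

// BoundedMap applies f to every key read from in, using at most `workers`
// goroutines, and collects the results into a map. It returns after in is
// closed and every call to f has finished.
func BoundedMap[K comparable, V any](in <-chan K, workers int, f func(K) V) map[K]V {
	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		out = make(map[K]V)
	)
	wg.Add(workers)
	for i := 0; i < workers; i++ {
		go func() {
			defer wg.Done()
			for k := range in {
				v := f(k) // run the slow call outside the lock
				mu.Lock()
				out[k] = v
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return out
}

func main() {
	cities := make(chan string)
	go func() {
		defer close(cities)
		for _, c := range []string{"San Francisco", "Los Angeles", "San Diego"} {
			cities <- c
		}
	}()

	var in <-chan string = cities
	// Stand-in for the API call: the "weather" is just the name length.
	weather := BoundedMap(in, 2, func(name string) int { return len(name) })
	fmt.Println(weather)
}
```

The caller sees only a channel in and a map out. Error propagation and cancellation are left out on purpose, and they are exactly the parts that make a hand-written version grow.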
Futures & Promises
Getting the current weather can take a long time, for example when you have a long list of cities. Of course, you don't want to duplicate API calls, so you need to manage parallel requests somehow:
    func Collect(state string) Weather {
        calc, ok := calculations.Get(state) // check if it's in progress
        if !ok {
            calc = calculations.Run(state) // run otherwise
        }
        return calc.Wait() // wait until done
    }
This is called a future/promise. From the wiki:
They describe an object that acts as a proxy for a result that is initially unknown, usually because the computation of its value is not yet complete.
I've heard many people say that a future in Go is as simple as:
    f := make(chan int, 1)
This is wrong, because all the waiters should get the result (implementing broadcast to subscribers and value changes on top of channels is a whole other headache). And this version is also wrong:
    f := make(chan int, 1)
    v := <-f
    f <- v
    // use v here

because it is impossible to manage resources this way. If someone drops the f <- v part from their code, I wish you good luck finding the bug.
It's not so complicated to deliver the value directly to all the waiters of a promise (I'm not sure this code is bug-free):
    import "sync"

    type PromiseDelivery chan interface{}

    type Promise struct {
        sync.RWMutex
        value   interface{}
        waiters []PromiseDelivery
    }

    // Deliver stores the value and hands it to every registered waiter.
    func (p *Promise) Deliver(value interface{}) {
        p.Lock()
        defer p.Unlock()
        p.value = value
        for _, w := range p.waiters {
            locW := w
            go func() {
                locW <- value
            }()
        }
    }

    // Value returns the delivered value, blocking until it is available.
    func (p *Promise) Value() interface{} {
        p.Lock() // guard value and waiters against a concurrent Deliver
        if v := p.value; v != nil {
            p.Unlock()
            return v
        }
        delivery := make(PromiseDelivery)
        p.waiters = append(p.waiters, delivery)
        p.Unlock()
        return <-delivery
    }

    func NewPromise() *Promise {
        return &Promise{
            value:   nil,
            waiters: []PromiseDelivery{},
        }
    }
How do you use it?
    p := NewPromise()
    go func() {
However, there we have interface{} and type casts again. What do I actually want?
    // Somewhere in a well-tested library, or even in the stdlib
    type PromiseDelivery[T] chan T

    type Promise[T] struct {
        sync.RWMutex
        value   T
        waiters []PromiseDelivery[T]
    }

    func (p *Promise[T]) Deliver(value T)
    func (p *Promise[T]) Value() T
    func NewPromise[T]() *Promise[T]

    // My code:
    v := NewPromise[int]()
    go func() {
        v.Deliver("woooow!") // compile-time error
        v.Deliver(42)
    }()
    v.Value() // blocks and returns 42, not interface{}
Yes, of course, no one needs generics. What the hell am I talking about?
You can also avoid locking with p.Lock() by using select to handle both the deliver and the wait operations in a single goroutine. You can also introduce a .ValueWithTimeout method, which is extremely useful for end users. And there are many, many more "you can...". Yet we are talking about roughly 20 lines of code (which will probably grow every time you discover more details of future/promise interaction). Do I really need to know (or even think about) the channels that deliver the value for me? No!
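One possible shape for such a library type, sketched here with Go 1.18+ type parameters (an assumption, since the article predates them). Instead of one channel per waiter it closes a single `done` channel, which wakes every waiter at once. `ErrTimeout` and the exact method set are choices made for this sketch, with `ValueWithTimeout` named after the method mentioned above.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// ErrTimeout is returned by ValueWithTimeout when no value arrives in time.
var ErrTimeout = errors.New("promise: timed out")

// Promise is a write-once container. Closing done wakes up all waiters.
type Promise[T any] struct {
	once  sync.Once
	done  chan struct{}
	value T
}

func NewPromise[T any]() *Promise[T] {
	return &Promise[T]{done: make(chan struct{})}
}

// Deliver stores the value. Only the first call has any effect.
func (p *Promise[T]) Deliver(v T) {
	p.once.Do(func() {
		p.value = v
		close(p.done)
	})
}

// Value blocks until a value has been delivered.
func (p *Promise[T]) Value() T {
	<-p.done
	return p.value
}

// ValueWithTimeout waits at most d for the value.
func (p *Promise[T]) ValueWithTimeout(d time.Duration) (T, error) {
	select {
	case <-p.done:
		return p.value, nil
	case <-time.After(d):
		var zero T
		return zero, ErrTimeout
	}
}

func main() {
	p := NewPromise[int]()
	go p.Deliver(42)
	fmt.Println(p.Value()) // prints 42
}
```

With this version `p.Deliver("woooow!")` on a `Promise[int]` is a compile-time error, and the caller never touches a channel directly.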
Pub/sub
Suppose we want to build a real-time service. Our clients can now open a WebSocket connection with a q=CA request and instantly receive weather changes in California. It would look like this:
    // deliverer
    calculation.WhenDone(func(state string, w Weather) {
        broker.Publish("CA", w)
    })

    // client
    ch := broker.Subscribe("CA")
    for update := range ch {
        w.Write(update.Serialize())
    }
This is a typical pub/sub (publish/subscribe). You can learn about it from the Advanced Go Concurrency Patterns presentation, and even find ready-to-use implementations. The problem is that they are all based on interface{} and type casts.
Is it possible to have something like this:
    broker := NewBroker[string, Weather]()
    // so that
    broker.Subs(...) // with a key that is not a string: compilation failure
    // and
    broker.Subs("CA") // returns (chan Weather), not (chan interface{})
Of course! If you are brave enough to copy and paste code between projects and tweak it everywhere.
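With type parameters (Go 1.18+, which did not exist when this was written) the copy-and-paste step is no longer required. The following is a minimal sketch, not a production broker: the buffer size of 16 and the drop-when-full policy are arbitrary assumptions, there is no unsubscribe, and the `Weather` type is a stand-in.

```go
package main

import (
	"fmt"
	"sync"
)

// Broker is a minimal typed pub/sub hub: K is the topic type, V the message type.
type Broker[K comparable, V any] struct {
	mu   sync.RWMutex
	subs map[K][]chan V
}

func NewBroker[K comparable, V any]() *Broker[K, V] {
	return &Broker[K, V]{subs: make(map[K][]chan V)}
}

// Subscribe returns a receive-only channel of V for the given topic.
func (b *Broker[K, V]) Subscribe(topic K) <-chan V {
	ch := make(chan V, 16) // buffer size is an arbitrary choice
	b.mu.Lock()
	b.subs[topic] = append(b.subs[topic], ch)
	b.mu.Unlock()
	return ch
}

// Publish offers msg to every subscriber of topic. The message is dropped for
// subscribers whose buffer is full, so a slow client cannot block the publisher.
func (b *Broker[K, V]) Publish(topic K, msg V) {
	b.mu.RLock()
	defer b.mu.RUnlock()
	for _, ch := range b.subs[topic] {
		select {
		case ch <- msg:
		default:
		}
	}
}

// Weather is a stand-in for the article's weather type.
type Weather struct{ Temp float64 }

func main() {
	broker := NewBroker[string, Weather]()
	ch := broker.Subscribe("CA") // ch has type <-chan Weather
	// broker.Subscribe(12) would not compile: 12 is not a string.
	broker.Publish("CA", Weather{Temp: 288.89})
	fmt.Println((<-ch).Temp)
}
```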
Map/filter
Suppose we want to give our users more flexibility and introduce a new query parameter: show, whose value can be all|temp|wind|icon.
You would probably start with the simplest approach:
    ch := broker.Subscribe("CA")
    for update := range ch {
        temps := []Temp{}
        for _, t := range update.Temp {
            temps = append(temps, t)
        }
        w.Write(temps)
    }
However, after writing ten such methods you will realize that this is neither modular nor fun. What you probably need is:
    ch := broker.Subscribe("CA").Map(func(w Weather) Temp { return w.Temp })
    for update := range ch {
        w.Write(update)
    }
Wait, did I just say that a channel is a functor? Just like a future/promise.
    p := NewPromise().Map(func(w Weather) Temp { return w.Temp })
    go func() {
        p.Deliver(Weather{Temp{42}})
    }()
    p.Value().(Temp) // Temp, not Weather
This means I can reuse the same code for futures and channels. You could also end up with something like transducers. I often use this technique in ClojureScript code:
    (->> (send url) ;; returns chan, puts a single value {:status 200 :result ...} to it
         (async/filter< #(= 200 (:status %))) ;; check that :status is 200
         (async/map< :result)) ;; expose to end user
    ;; note that it will close all channels (including the implicit intermediate one) properly
When I can simply call x.Map(transformation) and get a value of the same kind back, do I really need to care whether x is a channel or a future? And in that case, why can I write make(chan int) but not make(Future int)?
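A sketch of the channel half of that idea, assuming Go 1.18+ type parameters. Go methods cannot declare their own type parameters, so the `x.Map(f)` method form is not expressible and `Map` has to be a free function. The `Temp` and `Weather` types below are stand-ins for the article's types.

```go
package main

import "fmt"

// Map returns a channel that yields f(v) for every v received from in.
// The result is closed when in is closed.
func Map[A, B any](in <-chan A, f func(A) B) <-chan B {
	out := make(chan B)
	go func() {
		defer close(out)
		for v := range in {
			out <- f(v)
		}
	}()
	return out
}

// Filter forwards only the values for which keep returns true.
func Filter[A any](in <-chan A, keep func(A) bool) <-chan A {
	out := make(chan A)
	go func() {
		defer close(out)
		for v := range in {
			if keep(v) {
				out <- v
			}
		}
	}()
	return out
}

// Temp and Weather are stand-ins for the article's types.
type Temp float64
type Weather struct{ Temp Temp }

func main() {
	updates := make(chan Weather)
	go func() {
		defer close(updates)
		updates <- Weather{Temp: 288.89}
		updates <- Weather{Temp: 292.83}
	}()

	var in <-chan Weather = updates
	temps := Map(in, func(w Weather) Temp { return w.Temp })
	for t := range Filter(temps, func(t Temp) bool { return t > 290 }) {
		fmt.Println(t)
	}
}
```

Closing propagates through the intermediate channels automatically, but a consumer that stops reading early leaves the helper goroutines blocked. Handling that needs a cancellation signal, which this sketch leaves out.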
Request/reply
Suppose our users like this service and use it heavily. Then you need to introduce a simple API limit: the number of requests per IP per day. It's easy to keep these counts in a single map[string]int. The Go documentation says: "Do not communicate by sharing memory; instead, share memory by communicating." Well, that sounds like a good idea.
    req := make(chan string)

    go func() { // wow, look here: it's an actor!
        m := map[string]int{}
        for r := range req {
            if v, ok := m[r]; !ok {
                m[r] = 1
            } else {
                m[r] = v + 1
            }
        }
    }()

    go func() {
        req <- "127.0.0.2"
    }()

    go func() {
        req <- "127.0.0.1"
    }()
It's easy. You can now count the number of requests per IP. Not only that... you can also ask for permission to execute a request.
    type Req struct {
        ip   string
        resp chan int
    }

    func NewRequest(ip string) *Req {
        return &Req{ip, make(chan int)}
    }

    requests := make(chan *Req)

    go func() {
        m := map[string]int{}
        for r := range requests {
            if v, ok := m[r.ip]; !ok {
                m[r.ip] = 1
            } else {
                m[r.ip] = v + 1
            }
            r.resp <- m[r.ip]
        }
    }()

    go func() {
        r := NewRequest("127.0.0.2")
        requests <- r
        fmt.Println(<-r.resp)
    }()

    go func() {
        r := NewRequest("127.0.0.1")
        requests <- r
        fmt.Println(<-r.resp)
    }()
I'm not going to ask you again for a generic solution (one without hardcoded string and int). Instead, I'd like you to check: is this code correct? Is it really that simple?
Are you sure that r.resp <- m[r.ip] is a good idea? No, definitely not! I don't want everyone to wait for one very slow client, do I? And what happens if I have many slow clients? Maybe I need to do something about this.
And is the requests <- r part simple? What if my actor (server) is overloaded and can't respond? Maybe I need to handle timeouts here...
From time to time I also need specific initialization and cleanup procedures... both with timeouts. And I need to be able to hold requests until initialization is complete.
And what about call priorities? For example, when I need to implement a Dump method to analyze the system, but don't want to pause all users while collecting the required data.
And also... just look at gen_server in Erlang. Whatever the answer is, I want it implemented once, as a library with good documentation and high test coverage. In 98% of cases I don't want to see make(chan int, ?) at all, and I don't want to think about which number should replace the ?.
99% of the time, I don't really care whether the response was delivered through a channel or a magical unicorn brought it on its horn.
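To make two of those concerns concrete, here is a hedged sketch of the counter actor wrapped behind a small API. It addresses only the slow-client and overloaded-server questions: the reply channel is buffered so the actor never blocks on a client, and the caller gives up after a deadline. `Counter`, `Hit` and `ErrTimeout` are names invented for this sketch, and initialization, cleanup, priorities and shutdown are deliberately left out (the actor goroutine here runs forever).

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrTimeout is returned when the actor does not answer in time.
var ErrTimeout = errors.New("counter: request timed out")

type request struct {
	ip   string
	resp chan int // buffered, so the actor never blocks on a slow client
}

// Counter is an actor that owns the per-IP request counts.
type Counter struct{ requests chan request }

func NewCounter() *Counter {
	c := &Counter{requests: make(chan request)}
	go func() {
		counts := map[string]int{}
		for r := range c.requests {
			counts[r.ip]++
			r.resp <- counts[r.ip] // cannot block: resp has capacity 1
		}
	}()
	return c
}

// Hit registers one request from ip and returns the updated count, or
// ErrTimeout if the actor does not accept and answer within d.
func (c *Counter) Hit(ip string, d time.Duration) (int, error) {
	timeout := time.After(d)
	r := request{ip: ip, resp: make(chan int, 1)}
	select {
	case c.requests <- r:
	case <-timeout:
		return 0, ErrTimeout
	}
	select {
	case n := <-r.resp:
		return n, nil
	case <-timeout:
		return 0, ErrTimeout
	}
}

func main() {
	c := NewCounter()
	c.Hit("127.0.0.1", time.Second)
	n, err := c.Hit("127.0.0.1", time.Second)
	fmt.Println(n, err)
}
```

Callers see a method with a timeout and an error value. The channels, their buffer sizes and the select statements stay inside the type, which is the point the section is making.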
And many more
There are many other common concurrency scenarios. I think you get the idea.
Suffering
You may say that these patterns are not common. But... I had to implement most of them in my projects. Every! Single! Time! Maybe I'm just unlucky, and your projects will be as simple as a beginner's guide.
I know that most of you will say, "The world is hard, and programming is painful." I'll keep pushing: at least some languages show examples of how to solve these problems, or at least try to. The type systems of Haskell and Scala make it possible to build powerful high-level abstractions, and even custom control flow, for dealing with concurrency. Clojure, in the other camp, uses dynamic typing to encourage building and sharing high-level abstractions. Rust has channels and generics.
Make it work, then make it elegant, then make it reusable.
Now the first step is done. What's next? Don't get me wrong, Go is a forward-looking language: channels and goroutines are much better than, say, pthreads. But should we really stop here?
Addendum: Building a Twitter Analyzer
About a real pipeline.
You've probably seen Twitter Analytics; it's really great. Let's say it didn't exist yet and we needed our own tool: given a username, count how many users have seen (at least theoretically) their tweets. How would we do it? It's not hard: read the user's timeline, filter out all retweets and replies, then request the retweeters of each remaining tweet, request the list of followers for each retweeter, merge all the retweeters' followers together, and add the user's own followers. The results I want from this step are:
map[TweetId][]Username (retweeters) and map[Username][]Username. These are enough to build the fancy table that is presented to the requester.
There are some technical details you should be aware of:
- The Twitter API requires OAuth for every call and has strict rate limits (450 calls per user every 15 minutes). To deal with this limit, we will use a predefined list of OAuth tokens (for example, 50) organized into a pool of workers, where each worker pauses before it hits the limit.
- Most Twitter API calls paginate their results via since_id or max_id, so you can't rely on a single request to get the full result.
Here is an example of a rough implementation. Note that you don't need to understand everything in this file. On the contrary, if you can't understand it, it just means we're doing the right thing.
So what do we have now?
- Several computation stages: FinalReducer, FollowersReader, RetweetersReader, TimelineReader.
- Self-addressed messages. All stages are recursive because of pagination, which means each stage sends messages both to the next stage and to itself. This makes cancellation hard to handle. You cannot even tell when a stage is complete.
- Early propagation. There are at least two cases. First, to collect []Username by TweetId, we need to send the collected information directly from RetweetersReader to FinalReducer. Second, we know from the very beginning that we need the initial user's followers, so that username should be passed from TimelineReader to the RetweetersReader step.
- Intermediate reduction. FollowersReader is not just a pipe. It filters out the usernames we have already seen (because we don't want to do the same work twice).
- Long-running workers. In many cases you cannot wait for the workers to exit, for example when you implement a service that responds to many clients at the same time.