Handling 1 Million Requests per Minute with Go
I have worked for more than 15 years at different software companies in the anti-spam, anti-virus, and anti-malware industries, so I understand how complex such systems can become as they process massive amounts of data on a daily basis.
I am currently the CEO of Smsjunk.com and chief architect at KnowBe4 (http://knowbe4.com/), both of them network security companies.
Interestingly, over the past 10 years as a software engineer, most of the web backend code I have been exposed to was developed in Ruby on Rails. Please don't get me wrong, I like Ruby on Rails, and I think it is a lovely framework, but after a while you get used to thinking and designing systems the Ruby way, and you forget how efficient and simple a software architecture could have been if you could leverage multi-threading, parallelization, fast execution, and a small memory footprint. I was a C/C++, Delphi, and C# user for many years, and I have come to realize that using the right tool makes things much simpler.
I am not interested in the internet's endless language-and-framework debates, because I believe that the effectiveness of a solution and the maintainability of its code depend largely on how simple your architecture can be.
Practical Issues
While building a telemetry analysis system, we ran into a real problem: handling POST requests from millions of endpoints. The web handler receives a JSON document that may contain a collection of many payloads, which we need to write to Amazon S3 so that our map-reduce system can process the data later.
Traditionally, we would build a worker-tier architecture for the background processing, using components such as:
Sidekiq
Resque
Delayed Job
Elastic Beanstalk worker tier
RabbitMQ
and so on,
and set up two different clusters, one for the web front end that receives the data and another that performs the actual work, so that we could scale the background processing capacity independently.
But from the start of the project, our team felt we should do this in Go, because during the discussions we saw that this could potentially be a system with very large traffic. I had been using Go for almost two years, and we had developed a few systems with it at work, but none of them had ever seen a load of this scale.
We started by defining a few structures for the web POST request payload and a method to upload it to our S3 store.
type PayloadCollection struct {
    WindowsVersion string    `json:"version"`
    Token          string    `json:"token"`
    Payloads       []Payload `json:"data"`
}

type Payload struct {
    // [REDACTED]
}
func (p *Payload) UploadToS3() error {
    // the storageFolder method ensures that there are no name collisions in
    // case we get the same timestamp in the key name
    storage_path := fmt.Sprintf("%v/%v", p.storageFolder, time.Now().UnixNano())

    bucket := S3Bucket

    b := new(bytes.Buffer)
    encodeErr := json.NewEncoder(b).Encode(p)
    if encodeErr != nil {
        return encodeErr
    }

    // everything we post to the S3 bucket should be marked 'private'
    var acl = s3.Private
    var contentType = "application/octet-stream"

    return bucket.PutReader(storage_path, b, int64(b.Len()), contentType, acl, s3.Options{})
}
A naive use of goroutines
At first, we implemented a very naive POST handler, trying to parallelize the job processing with a simple goroutine per payload:
func payloadHandler(w http.ResponseWriter, r *http.Request) {

    if r.Method != "POST" {
        w.WriteHeader(http.StatusMethodNotAllowed)
        return
    }

    // read the body into a string for JSON decoding
    var content = &PayloadCollection{}
    err := json.NewDecoder(io.LimitReader(r.Body, MaxLength)).Decode(&content)
    if err != nil {
        w.Header().Set("Content-Type", "application/json; charset=UTF-8")
        w.WriteHeader(http.StatusBadRequest)
        return
    }

    // go through each payload and queue items individually to be posted to S3
    for _, payload := range content.Payloads {
        go payload.UploadToS3() // <----- DON'T do this
    }

    w.WriteHeader(http.StatusOK)
}
For moderate loads, this code would work fine for most people, but it quickly proved unsuitable at scale. When we deployed the first version to production, we found that the actual traffic was far larger than anything we had anticipated; we had badly underestimated the data load.
The approach above is problematic in several ways: there is no way to control how many goroutines get created, and since we were receiving 1 million POST requests per minute, the code crashed very quickly.
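As an aside (not the route we took next), the textbook way to cap this kind of fan-out in Go is a counting semaphore built from a buffered channel. Here is a minimal sketch of that idea; maxUploads is an illustrative package-level limit, and error handling is omitted:

// a buffered channel used as a counting semaphore: at most
// maxUploads uploads may be in flight at any one time
var sem = make(chan struct{}, maxUploads)

// inside the handler, instead of the bare go statement:
for _, payload := range content.Payloads {
    sem <- struct{}{} // acquire a slot; blocks once maxUploads are running
    go func(p Payload) {
        defer func() { <-sem }() // release the slot when the upload is done
        p.UploadToS3()
    }(payload)
}

Even with such a cap, the uploads still tie up the request path, which is one reason to look for a queue-based design instead.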
Try again
We needed a different approach. From the very beginning, we had been discussing how to keep the request handler's lifetime short and do the processing in the background. Of course, this is what you must do in the Ruby on Rails world, otherwise you block all the available web processors, whether you use Puma, Unicorn, or Passenger (let's not get into the JRuby discussion here). Then we would have reached for common solutions such as Resque, Sidekiq, SQS, and so on; there are many ways to accomplish this task.
So the second iteration used a buffered channel: we could queue up some jobs first and upload them to S3 later, and since we could control the maximum size of the queue and had plenty of memory available, we thought it would be fine to buffer the jobs in the channel queue.
var Queue chan Payload

func init() {
    Queue = make(chan Payload, MAX_QUEUE)
}
func payloadHandler(w http.ResponseWriter, r *http.Request) {
    ...
    // go through each payload and queue items individually to be posted to S3
    for _, payload := range content.Payloads {
        Queue <- payload
    }
    ...
}
And to actually dequeue the jobs and process them, we used something similar to this:
func StartProcessor() {
    for {
        select {
        case job := <-Queue:
            job.UploadToS3() // <-- STILL NOT GOOD
        }
    }
}
Frankly, I have no idea what we were thinking; this must have been the result of too many Red Bulls late at night. The solution didn't buy us anything: we had traded flawed concurrency for a buffered queue that merely postponed the problem. Our synchronous processor uploaded only one payload to S3 at a time, and since the rate of incoming requests was far greater than the rate at which a single goroutine could upload to S3, the buffered channel quickly reached its capacity and blocked the request handlers from queuing more items.
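To put rough, illustrative numbers on it (these are not measurements): 1 million requests per minute is roughly 16,000 requests per second, each possibly carrying several payloads. If one S3 upload takes on the order of tens of milliseconds, a single consumer goroutine drains at best a few dozen jobs per second, so even a very generous MAX_QUEUE fills within seconds, after which every Queue <- payload send blocks its handler.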
We were simply avoiding the problem, and we had started a countdown to our system's death: within minutes of deploying this flawed version, the system's latency kept increasing at a constant rate.
A better solution
We decided to use a common pattern with Go channels and build a two-tier channel system: one channel for queuing the jobs, and another to control how many workers operate on the job queue concurrently.
The idea was to parallelize the uploads to S3 at a sustainable rate, one that would neither cripple the machine nor trigger connection errors from S3. So we opted for a job/worker pattern. If you are familiar with languages such as Java or C#, think of this as the Go way of implementing a worker thread pool using channels.
var (
    MaxWorker = os.Getenv("MAX_WORKERS")
    MaxQueue  = os.Getenv("MAX_QUEUE")
)

// Job represents the job to be run
type Job struct {
    Payload Payload
}

// a buffered channel that we can send work requests on
var JobQueue chan Job

// Worker represents the worker that executes the job
type Worker struct {
    WorkerPool chan chan Job
    JobChannel chan Job
    quit       chan bool
}

func NewWorker(workerPool chan chan Job) Worker {
    return Worker{
        WorkerPool: workerPool,
        JobChannel: make(chan Job),
        quit:       make(chan bool)}
}
// Start method starts the run loop for the worker, listening for a quit channel
// in case we need to stop it
func (w Worker) Start() {
    go func() {
        for {
            // register the current worker into the worker queue
            w.WorkerPool <- w.JobChannel

            select {
            case job := <-w.JobChannel:
                // we have received a work request
                if err := job.Payload.UploadToS3(); err != nil {
                    log.Errorf("Error uploading to S3: %s", err.Error())
                }

            case <-w.quit:
                // we have received a signal to stop
                return
            }
        }
    }()
}
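The other half of this pattern is a dispatcher that starts the workers and forwards each queued job to an idle worker. What follows is a minimal sketch of that side, consistent with the Worker code above; the Dispatcher type and its maxWorkers field are illustrative names rather than production code, and JobQueue is assumed to have been created at startup with make(chan Job, queueSize):

type Dispatcher struct {
    // a pool of worker job channels that idle workers register on
    WorkerPool chan chan Job
    maxWorkers int
}

func NewDispatcher(maxWorkers int) *Dispatcher {
    return &Dispatcher{
        WorkerPool: make(chan chan Job, maxWorkers),
        maxWorkers: maxWorkers,
    }
}

// Run starts the workers and then begins dispatching queued jobs
func (d *Dispatcher) Run() {
    for i := 0; i < d.maxWorkers; i++ {
        worker := NewWorker(d.WorkerPool)
        worker.Start()
    }
    go d.dispatch()
}

// dispatch blocks until a worker is idle (has registered its job
// channel on the pool) and then hands the next job to it
func (d *Dispatcher) dispatch() {
    for job := range JobQueue {
        jobChannel := <-d.WorkerPool
        jobChannel <- job
    }
}

On the web side, the handler then does nothing more than JobQueue <- Job{Payload: payload} for each payload and returns immediately, while the workers drain the queue at a rate bounded by the number of workers.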