Original source: http://www.codedata.cn/hackne ...
I have worked for more than 15 years at different software companies in the anti-spam, anti-virus, and anti-malware industries, and I understand how complex such systems can become as they process massive amounts of data every day.
I am currently the CEO of Smsjunk.com and the chief architect at KnowBe4, both cybersecurity companies.
Interestingly, over the past 10 years as a software engineer, most of the web backend code I have worked on has been built with Ruby on Rails. Please don't get me wrong, I like Ruby on Rails, and I think it's a fine framework, but after you spend long enough thinking and designing systems the Ruby way, you forget how efficient and simple a software architecture could have been if you could take advantage of multi-threading, parallelization, fast execution, and a small memory footprint. I was a C/C++, Delphi, and C# developer for many years, and I've come to realize that using the right tool for the job makes things much simpler.
I'm not here to join the endless language-and-framework debates on the internet. I believe the effectiveness of a solution and the maintainability of its code depend largely on how simple you can keep your architecture.
The problem
While building a telemetry and analytics system, we ran into a real problem: handling POST requests coming from millions of endpoints. The web request handler receives a JSON document containing a collection of payloads, which we need to write to Amazon S3 so that our map-reduce system can process the data later.
Traditionally, we would build a background worker-tier architecture out of components such as:
- Sidekiq
- Resque
- DelayedJob
- Elastic Beanstalk Worker Tier
- RabbitMQ
- and so on
and set up two different clusters, one for the web front end that receives the data and one for the workers that do the actual processing, so that the background workload can be scaled up and down independently. A sketch of what that conventional route might look like appears below.
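For illustration only, here is a rough sketch of what the web tier's side of that conventional setup could look like in Go, publishing each payload to RabbitMQ for a separate worker cluster to consume. The streadway/amqp client, the connection URL, and the queue name are all our assumptions; the original post shows no such code.

```go
package main

import (
	"encoding/json"
	"log"

	"github.com/streadway/amqp"
)

// enqueue publishes one payload to a durable "payloads" queue. A separate
// worker cluster would consume from this queue and perform the S3 uploads.
// (In a real service you would dial once and reuse the connection.)
func enqueue(payload interface{}) error {
	conn, err := amqp.Dial("amqp://guest:guest@localhost:5672/")
	if err != nil {
		return err
	}
	defer conn.Close()

	ch, err := conn.Channel()
	if err != nil {
		return err
	}
	defer ch.Close()

	q, err := ch.QueueDeclare("payloads", true, false, false, false, nil)
	if err != nil {
		return err
	}

	body, err := json.Marshal(payload)
	if err != nil {
		return err
	}

	return ch.Publish("", q.Name, false, false, amqp.Publishing{
		ContentType: "application/json",
		Body:        body,
	})
}

func main() {
	if err := enqueue(map[string]string{"example": "payload"}); err != nil {
		log.Fatal(err)
	}
}
```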
But from the very start of the project, our team thought we should do this work in Go, because during our discussions it became clear this could be a system with very large traffic. I had been using Go for almost two years, and we had developed a few systems with it at work, but none of them had ever faced a load this large.
We started by defining the structs for the POST payload that the web requests would carry, and a method to upload it to S3.
```go
type PayloadCollection struct {
	WindowsVersion string    `json:"version"`
	Token          string    `json:"token"`
	Payloads       []Payload `json:"data"`
}

type Payload struct {
	// [redacted]
}

func (p *Payload) UploadToS3() error {
	// the storageFolder method ensures that there are no name collisions in
	// case we get the same timestamp in the key name
	storage_path := fmt.Sprintf("%v/%v", p.storageFolder(), time.Now().UnixNano())

	bucket := S3Bucket

	b := new(bytes.Buffer)
	encodeErr := json.NewEncoder(b).Encode(p)
	if encodeErr != nil {
		return encodeErr
	}

	// Everything we post to the S3 bucket should be marked 'private'
	var acl = s3.Private
	var contentType = "application/octet-stream"

	return bucket.PutReader(storage_path, b, int64(b.Len()), contentType, acl, s3.Options{})
}
```
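(The snippet above omits its imports. Judging from bucket.PutReader and s3.Options, it appears to use a goamz-style S3 client; the import block below is our guess, not something shown in the original post.)

```go
import (
	"bytes"
	"encoding/json"
	"fmt"
	"time"

	"github.com/AdRoll/goamz/s3" // assumed: its Bucket.PutReader matches the call above
)
```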
A naive approach with goroutines
At first we implemented a very naive version of the POST handler, trying to parallelize the work with one simple goroutine per payload:
```go
func payloadHandler(w http.ResponseWriter, r *http.Request) {

	if r.Method != "POST" {
		w.WriteHeader(http.StatusMethodNotAllowed)
		return
	}

	// Read the body into a string for json decoding
	var content = &PayloadCollection{}
	err := json.NewDecoder(io.LimitReader(r.Body, MaxLength)).Decode(&content)
	if err != nil {
		w.Header().Set("Content-Type", "application/json; charset=UTF-8")
		w.WriteHeader(http.StatusBadRequest)
		return
	}

	// Go through each payload and queue items individually to be posted to S3
	for _, payload := range content.Payloads {
		go payload.UploadToS3() // <----- DON'T DO THIS
	}

	w.WriteHeader(http.StatusOK)
}
```
For moderate loads this code would work for most people, but it quickly proved unsuitable for large traffic. When we deployed the first version to production, we found the real situation far exceeded our expectations: the traffic was much larger than anticipated, and we had badly underestimated the data volume.
The approach above is flawed in several ways. There is no way to control how many goroutines get created, and since we were receiving 1 million POST requests per minute, this code was bound to crash very quickly.
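For context only, a common generic way to put a ceiling on goroutine creation is a semaphore channel. This is not what we did (our actual fix follows below), and the limit of 64 is an arbitrary number for illustration:

```go
// Generic illustration, not the fix used later in this post: a semaphore
// channel caps how many uploads may run concurrently.
var sem = make(chan struct{}, 64) // arbitrary illustrative limit

func boundedUpload(payloads []Payload) {
	for _, payload := range payloads {
		sem <- struct{}{} // blocks while 64 uploads are already in flight
		go func(p Payload) {
			defer func() { <-sem }() // free a slot when this upload finishes
			p.UploadToS3()
		}(payload)
	}
}
```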
Try again
We needed a different way out. From the very beginning we had been discussing how to keep the lifetime of a request handler short and move the processing to the background. Of course, this is what you must do in Ruby on Rails, otherwise you block all the available web processors, whether you are using Puma, Unicorn, or Passenger (let's not get into the JRuby discussion here). Then we would have reached for the common solutions, such as Resque, Sidekiq, SQS, and so on. There are many ways to accomplish this.
So the second iteration used a buffered channel, where we could queue up some jobs and then upload them to S3. Since we could control the maximum size of the queue and had plenty of memory available, we figured it would be fine to buffer the jobs in the channel queue.
```go
var Queue chan Payload

func init() {
	Queue = make(chan Payload, MAX_QUEUE)
}

func payloadHandler(w http.ResponseWriter, r *http.Request) {
	...
	// Go through each payload and queue items individually to be posted to S3
	for _, payload := range content.Payloads {
		Queue <- payload
	}
	...
}
```
Then, to actually dequeue the jobs and process them, we used something similar to this:
```go
func StartProcessor() {
	for {
		select {
		case job := <-Queue:
			job.UploadToS3() // <-- STILL NOT GOOD!
		}
	}
}
```
Frankly speaking, I have no idea what we were thinking; this must have been the result of drinking Red Bull late at night. The approach didn't buy us anything: we had traded flawed concurrency for a buffered queue that merely pushed the problem back. Our synchronous processor uploaded only one payload to S3 at a time, and since the rate of incoming requests was far greater than the rate at which a single processor could upload to S3, the buffered channel quickly filled up and blocked the request handlers from queuing more items.
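To make the failure mode concrete, here is a tiny self-contained illustration (ours, not from the original post) of buffered-channel semantics: sends succeed while there is buffer space, then block until someone receives.

```go
package main

import "fmt"

func main() {
	q := make(chan int, 2) // buffered channel with capacity 2

	q <- 1 // returns immediately: buffer has room
	q <- 2 // returns immediately: buffer is now full
	fmt.Println(len(q), cap(q)) // prints: 2 2

	// q <- 3 // would block forever: the buffer is full and there is no
	//        // receiver, which is exactly how payloadHandler got stuck
}
```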
We were simply avoiding the problem, and in doing so started a countdown to our system's death. Within minutes of deploying this flawed version, our latency kept increasing at a constant rate.
A better solution
We decided to use a common pattern built on Go channels: a two-tier channel system, one channel to queue the jobs, and another to control how many workers operate on the job queue concurrently.
The idea was to parallelize the uploads to S3 at a sustainable rate, one that would neither cripple the machine nor trigger connection errors from S3. So we opted for a Job/Worker pattern. If you are familiar with languages like Java or C#, think of this as the Go way of implementing a worker thread pool using channels instead.
```go
var (
	MaxWorker = os.Getenv("MAX_WORKERS")
	MaxQueue  = os.Getenv("MAX_QUEUE")
)

// Job represents the job to be run
type Job struct {
	Payload Payload
}

// A buffered channel that we can send work requests on.
var JobQueue chan Job

// Worker represents the worker that executes the job
type Worker struct {
	WorkerPool chan chan Job
	JobChannel chan Job
	quit       chan bool
}

func NewWorker(workerPool chan chan Job) Worker {
	return Worker{
		WorkerPool: workerPool,
		JobChannel: make(chan Job),
		quit:       make(chan bool)}
}

// Start method starts the run loop for the worker, listening for a quit
// channel in case we need to stop it
func (w Worker) Start() {
	go func() {
		for {
			// register the current worker into the worker queue.
			w.WorkerPool <- w.JobChannel

			select {
			case job := <-w.JobChannel:
				// we have received a work request.
				if err := job.Payload.UploadToS3(); err != nil {
					log.Errorf("Error uploading to S3: %s", err.Error())
				}

			case <-w.quit:
				// we have received a signal to stop
				return
			}
		}
	}()
}

// Stop signals the worker to stop listening for work requests.
func (w Worker) Stop() {
	go func() {
		w.quit <- true
	}()
}
```
We modified our web request handler to create a Job instance with the payload and send it into the JobQueue channel for the workers to pick up.
```go
func payloadHandler(w http.ResponseWriter, r *http.Request) {

	if r.Method != "POST" {
		w.WriteHeader(http.StatusMethodNotAllowed)
		return
	}

	// Read the body into a string for json decoding
	var content = &PayloadCollection{}
	err := json.NewDecoder(io.LimitReader(r.Body, MaxLength)).Decode(&content)
	if err != nil {
		w.Header().Set("Content-Type", "application/json; charset=UTF-8")
		w.WriteHeader(http.StatusBadRequest)
		return
	}

	// Go through each payload and queue items individually to be posted to S3
	for _, payload := range content.Payloads {

		// let's create a job with the payload
		work := Job{Payload: payload}

		// Push the work onto the queue.
		JobQueue <- work
	}

	w.WriteHeader(http.StatusOK)
}
```
During the initialization of the web service, we create a Dispatcher and call Run() to create the pool of workers and start listening for jobs appearing on the JobQueue.
```go
dispatcher := NewDispatcher(MaxWorker)
dispatcher.Run()
```
Below is the implementation of the dispatcher:
```go
type Dispatcher struct {
	// A pool of worker channels that are registered with the dispatcher
	WorkerPool chan chan Job
	maxWorkers int
}

func NewDispatcher(maxWorkers int) *Dispatcher {
	pool := make(chan chan Job, maxWorkers)
	return &Dispatcher{WorkerPool: pool, maxWorkers: maxWorkers}
}

func (d *Dispatcher) Run() {
	// starting n number of workers
	for i := 0; i < d.maxWorkers; i++ {
		worker := NewWorker(d.WorkerPool)
		worker.Start()
	}

	go d.dispatch()
}

func (d *Dispatcher) dispatch() {
	for {
		select {
		case job := <-JobQueue:
			// a job request has been received
			go func(job Job) {
				// try to obtain a worker job channel that is available.
				// this will block until a worker is idle
				jobChannel := <-d.WorkerPool

				// dispatch the job to the worker job channel
				jobChannel <- job
			}(job)
		}
	}
}
```
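If the chan chan Job indirection is hard to follow, here is a minimal, self-contained sketch of just that handoff (our illustration; the names and counts are arbitrary): each worker parks its private job channel in the pool to signal that it is idle, and the dispatcher pulls an idle worker's channel out of the pool and sends the job down it.

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	pool := make(chan chan int, 3) // the "worker pool": a channel of job channels
	var wg sync.WaitGroup

	for w := 0; w < 3; w++ {
		go func(id int) {
			jobs := make(chan int) // this worker's private job channel
			for {
				pool <- jobs  // register as idle by parking the channel in the pool
				job := <-jobs // block until the dispatcher hands us a job
				fmt.Printf("worker %d got job %d\n", id, job)
				wg.Done()
			}
		}(w)
	}

	wg.Add(9)
	for j := 0; j < 9; j++ {
		jobs := <-pool // blocks until some worker has registered as idle
		jobs <- j      // hand the job to that worker
	}
	wg.Wait()
}
```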
Note that we pass in the maximum number of workers to be instantiated in the worker pool. Since this project uses Amazon Elastic Beanstalk with a dockerized Go environment, we tried hard to follow the twelve-factor methodology and read configuration values from environment variables, which makes the system easy to configure in production. That way we can control the number of workers and the maximum size of the job queue, and quickly tune those values without redeploying the cluster.
```go
var (
	MaxWorker = os.Getenv("MAX_WORKERS")
	MaxQueue  = os.Getenv("MAX_QUEUE")
)
```
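One caveat worth noting: os.Getenv returns strings, so in practice these values have to be parsed before they can size the pool and the queue. A minimal sketch of that parsing step follows; the fallback defaults are our own illustrative assumptions, not values from the post.

```go
// Sketch: turning the environment variables into usable integers.
// The fallback defaults (100 workers, 10000 queue slots) are assumptions.
func configureFromEnv() *Dispatcher {
	maxWorkers, err := strconv.Atoi(os.Getenv("MAX_WORKERS"))
	if err != nil || maxWorkers <= 0 {
		maxWorkers = 100
	}

	maxQueue, err := strconv.Atoi(os.Getenv("MAX_QUEUE"))
	if err != nil || maxQueue <= 0 {
		maxQueue = 10000
	}

	JobQueue = make(chan Job, maxQueue)
	return NewDispatcher(maxWorkers)
}
```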
Immediately after we deployed this code, we saw our latency drop dramatically and our ability to handle requests increase significantly.
A few minutes after our Elastic Load Balancers were fully warmed up, we could see our Elastic Beanstalk application serving close to 1 million requests per minute, often hitting the full million at traffic peaks.
We had barely deployed the new code when the number of servers dropped from 100 down to about 20.
After we tuned the cluster and the auto-scaling settings, we were able to lower it even further, to just four EC2 c4.large instances, with Elastic Auto-Scaling set to spin up a new instance whenever CPU usage stayed above 90% for five minutes.
Conclusion
In my book, simplicity always wins. We could have designed a complex system with many queues and background workers and a more complicated deployment, but instead we decided to leverage Elastic Beanstalk's auto-scaling capability and the fast, simple approach to concurrency that Go gives us.
It's not every day that a cluster of just four machines, probably far less powerful than my current MacBook Pro, handles 1 million POST requests per minute while writing the data to Amazon S3.
There is always the right tool for the job. When your Ruby on Rails system needs very powerful request handling, try stepping a little outside the Ruby ecosystem for simpler, more effective solutions.
Translated from: Handling 1 Million Requests per Minute with Golang