I have been using Qiniu's storage service to generate image thumbnails, blurred images, and WEBP previews of videos. Now I need to move the storage to S3, which means I have to write this image and video processing myself. This article sketches the general approach.
Analyze the requirements
First, look at how the Qiniu interface handles images. For example, to capture the frame at the first second of a video, thumbnail the resulting picture, and finally store it under a new key, the command can be written as vframe/jpg/offset/1|imageMogr2/thumbnail/400x|saveas/xxx
You can see the three operations are separated by the | symbol, much like a UNIX pipe. The whole string above counts as one cmd; a single API request can carry several cmds at once, separated by semicolons. After processing, the results are returned in a callback, for example:
{"id": "xxxxx", "Pipeline": "XXX", "code": 0, "desc": "The FOP was completed successfully", "Reqid": " xtsaafnxubr5j10u "," Inputbucket ":" xxx "," Inputkey ":" xxxxx "," items ": [{" cmd ":" Vframe/jp g/offset/1|imagemogr2/thumbnail/400x|saveas/ Zmftzs1wcml2yxrlom1vbwvudc9jb3zlci9zbmfwl3zpzgvvl2m5yzdjzjq5ltu3ngqtngzjms1izdfkltrkyjzkmzlkzwy1ni8wlza= "," Co De ": 0," desc ":" The FOP was completed successfully "," hash ":" Fhdn6v8ei4vw4xjgalsfxutvmeiv ", "Key": "XX", "Returnold": 0}, {"cmd": "Vframe/jpg/offset/1|imagemogr2/thum bnail/400x|imagemogr2/blur/45x8|saveas/ zmftzs1wcml2yxrlom1vbwvudc9jb3zlci9zbmfwl3zpzgvvl2m5yzdjzjq5ltu3ngqtngzjms1izdfkltrkyjzkmzlkzwy1ni8wlzbfymx1cg= = "," code ": 0," desc ":" The FOP was completed successfully "," hash ":" Fgnirzrcsa7tzx1x Vsb_4d5tiak3 "," key ":" XXX "," Returnold ": 0 } ]}
Decompose the requirements
The program needs a few parts:

1. An HTTP interface that accepts a task, throws it into a queue, and returns a job ID. Workers process the tasks asynchronously; the number of workers and the concurrency of each worker are configurable, and workers have a retry mechanism.
2. Resolving the work from the job payload: parse out each cmd, ideally execute the cmds in parallel, and record the result of each cmd.
3. Each cmd contains multiple operations chained with a pipe, where the output of the previous operation is the input of the next.
Part 1 can be separated from parts 2 and 3, since it is fairly independent. I have written such a worker model before, following the article Handling 1 Million Requests per Minute with Go. In short, it uses a Go channel as the queue; I added beanstalkd as another queue provider. One further improvement: the article only lets you configure the number of workers, so I added a parameter that sets how many tasks each worker can execute in parallel (see the sketch below). The rest of this post mainly discusses solutions for parts 3 and 2.
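This is not the original worker code, just a minimal sketch of the idea: a channel stands in for the queue, and a buffered channel acts as a semaphore so each worker runs at most nParallel jobs at once. All names here are mine:

```go
package main

import (
	"fmt"
	"sync"
)

// runWorkers starts nWorkers workers pulling jobs from a shared channel.
// Each worker limits its own concurrency to nParallel via a semaphore.
// Retries and the beanstalkd provider are omitted.
func runWorkers(jobs <-chan string, nWorkers, nParallel int) {
	var wg sync.WaitGroup
	for w := 0; w < nWorkers; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			sem := make(chan struct{}, nParallel)
			var inner sync.WaitGroup
			for job := range jobs {
				sem <- struct{}{} // acquire a concurrency slot
				inner.Add(1)
				go func(job string) {
					defer func() { <-sem; inner.Done() }()
					fmt.Printf("worker %d handled %s\n", id, job)
				}(job)
			}
			inner.Wait() // drain in-flight jobs before exiting
		}(w)
	}
	wg.Wait()
}

func main() {
	jobs := make(chan string)
	go func() {
		for i := 0; i < 20; i++ {
			jobs <- fmt.Sprintf("job-%d", i)
		}
		close(jobs)
	}()
	runWorkers(jobs, 3, 4) // 3 workers, up to 4 parallel tasks each
}
```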
Pipe
Referring to the pipe library, the usage is as follows:
```go
p := pipe.Line(
	pipe.ReadFile("test.png"),
	resize(300, 300),
	blur(0.5),
)
output, err := pipe.CombinedOutput(p)
if err != nil {
	fmt.Printf("%v\n", err)
}
buf := bytes.NewBuffer(output)
img, _ := imaging.Decode(buf)
imaging.Save(img, "test_a.png")
```
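The resize and blur stages above are not part of the pipe library. Here is one way they might be written, assuming pipe.TaskFunc for custom stages and the github.com/disintegration/imaging package; this is my guess at an implementation, not the original code:

```go
import (
	"github.com/disintegration/imaging"
	"gopkg.in/pipe.v2"
)

// resize builds a custom pipe stage: decode the image from the stage's
// stdin, resize it, and encode the result as PNG to stdout.
func resize(width, height int) pipe.Pipe {
	return pipe.TaskFunc(func(s *pipe.State) error {
		img, err := imaging.Decode(s.Stdin)
		if err != nil {
			return err
		}
		out := imaging.Resize(img, width, height, imaging.Lanczos)
		return imaging.Encode(s.Stdout, out, imaging.PNG)
	})
}

// blur does the same with a gaussian blur of the given sigma.
func blur(sigma float64) pipe.Pipe {
	return pipe.TaskFunc(func(s *pipe.State) error {
		img, err := imaging.Decode(s.Stdin)
		if err != nil {
			return err
		}
		return imaging.Encode(s.Stdout, imaging.Blur(img, sigma), imaging.PNG)
	})
}
```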
This is quite convenient. I build a Cmd struct, use regular expressions to match the parameters of each Operation, put them into an []Op slice, and finally execute them. The struct and methods look like this:
```go
type Cmd struct {
	cmd    string
	saveas string
	ops    []Op
	err    error
}

type Op interface {
	getPipe() pipe.Pipe
}

type ResizeOp struct {
	width, height int
}

func (c ResizeOp) getPipe() pipe.Pipe {
	return resize(c.width, c.height)
}

// usage
cmdStr := `file/test.png|thumbnail/x300|blur/20x8`
cmd := Cmd{cmdStr, "test_b.png", nil, nil}
cmd.parse()
cmd.doOps()
```
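The parse method is not shown above. A hypothetical version, matching each operation with a regular expression as described; only the thumbnail/&lt;w&gt;x&lt;h&gt; form is handled, and file and blur would follow the same pattern:

```go
import (
	"regexp"
	"strconv"
	"strings"
)

var thumbnailRe = regexp.MustCompile(`^thumbnail/(\d*)x(\d*)$`)

// parse splits the cmd on "|" and turns each matched operation into an Op.
func (c *Cmd) parse() {
	for _, op := range strings.Split(c.cmd, "|") {
		if m := thumbnailRe.FindStringSubmatch(op); m != nil {
			w, _ := strconv.Atoi(m[1]) // an empty side parses as 0, e.g. "x300"
			h, _ := strconv.Atoi(m[2])
			c.ops = append(c.ops, ResizeOp{width: w, height: h})
			continue
		}
		// ...match other operations (file/..., blur/WxS) here
	}
}
```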
sync.WaitGroup
With single-cmd handling done, the next problem is running multiple cmds in parallel. There is no need to overthink this: sync.WaitGroup solves it nicely. Step by step, let's look at how this struct is used:
```go
func main() {
	cmds := []string{}
	for i := 0; i < 10000; i++ {
		cmds = append(cmds, fmt.Sprintf("cmd-%d", i))
	}
	results := handleCmds(cmds)
	fmt.Println(len(results)) // 10000
}

func doCmd(cmd string) string {
	return fmt.Sprintf("cmd=%s", cmd)
}

func handleCmds(cmds []string) (results []string) {
	fmt.Println(len(cmds)) // 10000

	var count uint64
	group := sync.WaitGroup{}
	lock := sync.Mutex{}
	for _, item := range cmds {
		// count plus one
		group.Add(1)
		go func(cmd string) {
			result := doCmd(cmd)
			atomic.AddUint64(&count, 1)

			lock.Lock()
			results = append(results, result)
			lock.Unlock()

			// count minus one
			group.Done()
		}(item)
	}
	// block until the count drops to zero
	group.Wait()

	fmt.Printf("count=%d\n", count) // 10000
	return
}
```
The group is essentially a counter: while the count > 0, group.Wait() blocks until the count reaches 0. One more thing to note: results = append(results, result) is not thread-safe. results is clearly shared here, so a lock is needed to keep it synchronized; otherwise the final len(results) will not be 10000.
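An alternative worth noting: if each goroutine writes to its own index of a pre-allocated slice, the mutex can be dropped entirely, because the WaitGroup already guarantees that all writes finish before the slice is read. A small variation of handleCmds (mine, not from the original):

```go
func handleCmds(cmds []string) []string {
	results := make([]string, len(cmds))
	group := sync.WaitGroup{}
	for i, item := range cmds {
		group.Add(1)
		go func(i int, cmd string) {
			defer group.Done()
			results[i] = doCmd(cmd) // each goroutine owns its own index
		}(i, item)
	}
	group.Wait() // all writes are done before results is returned
	return results
}
```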
With that, I built a BenchCmd to hold the Cmds:
```go
type BenchCmd struct {
	cmds      []Cmd
	waitGroup sync.WaitGroup
	errs      []error
	lock      sync.Mutex
}

func (b *BenchCmd) doCmds() {
	for _, item := range b.cmds {
		b.waitGroup.Add(1)
		go func(cmd Cmd) {
			cmd.parse()
			err := cmd.doOps()

			b.lock.Lock()
			b.errs = append(b.errs, err)
			b.lock.Unlock()

			b.waitGroup.Done()
		}(item)
	}
	b.waitGroup.Wait()
}
```
The final call is like this:
```go
var cmds []Cmd
cmd_a := Cmd{`file/test.png|thumbnail/x300|blur/20x8`, "test_a.png", nil, nil}
cmd_b := Cmd{`file/test.png|thumbnail/500x1000|blur/20x108`, "test_b.png", nil, nil}
cmd_c := Cmd{`file/test.png|thumbnail/300x300`, "test_c.png", nil, nil}

cmds = append(cmds, cmd_a)
cmds = append(cmds, cmd_b)
cmds = append(cmds, cmd_c)

bench := BenchCmd{
	cmds:      cmds,
	waitGroup: sync.WaitGroup{},
	lock:      sync.Mutex{},
}
bench.doCmds()
fmt.Println(bench.errs)
```
This is only a preliminary experiment, and the thinking is not yet comprehensive; it merely imitates the API. Qiniu probably doesn't do it this way: their coupling should be lower, and each cmd may well have its own processing cluster. The pipe library cannot help with that for now, since its current limitation is that all operations of a cmd must run within a single process.