Tianchi Middleware Competition: Sharing Ideas from the Golang Version of the Service Mesh

Tags: epoll, evio, etcd, k8s




[Figure: Istio service mesh architecture]

In this Tianchi Middleware Performance Competition, my results in both the preliminary and the semi-final happened to be fifth place. Unexpectedly, with Golang being the "rare species", I was lucky enough to land in the top 10, squeezed in among the C/C++ and Java heavyweights.



Compared with the semi-final's "Single-machine Million-level Message Queue Storage Design", the preliminary round's "Service Mesh for Dubbo" was relatively simple. My final score was 6983 points. Since most of my fellow Golang players got stuck around the 6000-point mark in the official round's 512-concurrency stress test, I'd like to share some experience from this Golang version and the pits I stepped into.



Because work keeps me too busy, I could only attack the competition on weekends. Next, I plan to tidy up and share my ideas for the semi-final's "Single-machine Million-level Message Queue Storage Design"; I personally feel my implementation was also rather unusual among the finalist teams.



What's Service Mesh?



Service mesh is a different way of implementing service governance, one that requires no changes to the service itself. By deploying a proxy (sidecar) alongside each service, all inbound and outbound traffic is intercepted and processed by the proxy, so the service governance capabilities needed in a microservice scenario can be handled by the proxy, which greatly reduces the difficulty and cost of transforming services. And since the proxy sits as an intermediary between two services, it can also play the role of protocol converter, so that services built on different technology frameworks and communication protocols can interoperate with each other, something that is difficult to achieve under a traditional microservice framework.



The officially provided evaluation framework consists of 5 Docker instances (the blue boxes), which run etcd, the consumer service, the provider services, and the agents respectively. The provider is the service provider, the consumer is the service consumer, and the consumer consumes the services the provider offers. The agents are proxies for the consumer and provider services; every consumer or provider is accompanied by an agent. etcd is the registry, used to record service registration information. As you can see, the consumer and provider do not communicate directly, but always go through their agents. This seemingly superfluous hop is precisely what brings important changes to the evolution of microservice architecture.






For more information about service mesh, refer to the following articles:


    • What is a Service Mesh?
    • What's a service mesh? And why do I need one? (Chinese translation)
    • Talking about Service Mesh, the new generation of microservice technology


Competition Requirements


    • Service registration and discovery
    • Protocol conversion (this is also the key to interoperability between different languages and frameworks)
    • Load balancing
    • Rate limiting, degradation, circuit breaking, and security authentication (not required)


Of course, what matters most for the agent proxy is universality and extensibility: by adding different protocol conversions it can support more application services. Finally, the agent proxy's resource footprint must be small, because the agent and the service coexist; once the service stops responding, even the best-performing agent is meaningless.
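As an illustration of the registration requirement, here is a minimal sketch against etcd's clientv3 API. The key layout /dubbomesh/&lt;service&gt;/&lt;host:port&gt; and the TTL are my own assumptions, not the contest's exact scheme, and the import path differs across etcd versions:

    import (
        "context"
        "time"

        clientv3 "go.etcd.io/etcd/client/v3"
    )

    // registerService writes this agent's address under a leased key, so the
    // entry disappears automatically if the agent dies and stops renewing it.
    func registerService(service, addr string) error {
        cli, err := clientv3.New(clientv3.Config{
            Endpoints:   []string{"127.0.0.1:2379"},
            DialTimeout: 5 * time.Second,
        })
        if err != nil {
            return err
        }
        lease, err := cli.Grant(context.Background(), 10) // 10s TTL (illustrative)
        if err != nil {
            return err
        }
        if _, err = cli.Put(context.Background(), "/dubbomesh/"+service+"/"+addr, "",
            clientv3.WithLease(lease.ID)); err != nil {
            return err
        }
        // Keep the lease alive for the lifetime of the agent.
        _, err = cli.KeepAlive(context.Background(), lease.ID)
        return err
    }

The consumer agent's discovery side is the mirror image: read the prefix once at startup and cache the endpoints, since the contest topology is static.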



Why Golang?



Personally, I think the language choice for a service mesh comes down to C++ or Golang, and should follow the company's technology stack. If you pursue extreme performance, C++ is preferred, since it avoids GC problems. Because the service mesh call chain is longer than traditional RPC, the agent proxy needs to be lightweight, stable, and performant.



As for why the technology choice was Golang: it was not just an opportunity to exercise my Golang; there were, of course, some other reasons:


    • Experience already accumulated by big companies, such as Ant Financial's SOFA Mesh, Sina's Motan Mesh, and so on.
    • Kubernetes and Docker are very hot in the microservice field, and agents will surely be deployed with k8s later on, so Go, with its high affinity to that ecosystem, is a good choice.
    • Go has goroutines and a high-quality network library, so it should have the upper hand in performance.


Anatomy of the Optimization Points



The officials provide a Java demo based on Netty; because it is a blocking version, its performance is not high. Of course, this is a boon for the Java players, who can get started quickly; the other languages start relatively slower and have to re-implement everything.



No matter the language, everyone's optimization ideas are the same. Here I share Kirito Xu Jingfeng's very detailed summary (Java version): "Tianchi Middleware Contest Dubbo Mesh Optimization Summary (QPS from 1000 to 6850)", which you can use as a reference.



The following diagram basically covers all the optimization work on the agent; the parts marked with green arrows are what users can implement themselves.





    • Make the entire flow asynchronous, non-blocking, and lock-free, with all requests taking the form of asynchronous callbacks. This is also the biggest point of improvement.

    • Implement the HTTP request parsing yourself.

    • Use the simplest possible custom protocol for communication between agents.

    • Reuse ByteBuffers in network transmission.

    • Batch-pack requests for communication between agents.

For example, the batch-packing loop keeps pulling requests off the worker queue until either the batch is full or the queue is momentarily empty:


forBlock:
for {
    httpReqList[reqCount] = req
    agentReqList[reqCount] = &AgentRequest{
        Interf:    req.interf,
        Method:    req.callMethod,
        ParamType: ParamType_String,
        Param:     []byte(req.parameter),
    }
    reqCount++
    if reqCount == *config.HttpMergeCountMax {
        break
    }
    select {
    case req = <-workerQueue:
    default:
        break forBlock
    }
}


    • Provider load balancing: weighted round-robin and least response time (the effect was not very obvious).

    • TCP connection load balancing: support selecting the TCP connection with the fewest pending requests.

    • Batch-encode Dubbo requests.

    • TCP parameter optimization: enable TCP_NODELAY (disable Nagle's algorithm), and adjust the TCP send and receive buffer sizes.

      if err := syscall.SetsockoptInt(fd, syscall.IPPROTO_TCP, syscall.TCP_NODELAY, *config.Nodelay); err != nil {
          logger.Error("cannot disable Nagle's algorithm", err)
      }
      if err := syscall.SetsockoptInt(fd, syscall.SOL_SOCKET, syscall.SO_SNDBUF, *config.TCPSendBuffer); err != nil {
          logger.Error("set sendbuf fail", err)
      }
      if err := syscall.SetsockoptInt(fd, syscall.SOL_SOCKET, syscall.SO_RCVBUF, *config.TCPRecvBuffer); err != nil {
          logger.Error("set recvbuf fail", err)
      }
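As a side note, if you hold a standard *net.TCPConn rather than a raw fd, the same options can be applied through its SyscallConn (a minimal sketch; the literal buffer sizes are illustrative, not the competition's values):

    // Sketch: applying the socket options above to an established *net.TCPConn.
    rawConn, err := tcpConn.SyscallConn()
    if err != nil {
        logger.Error("get raw conn fail", err)
        return
    }
    rawConn.Control(func(fd uintptr) {
        syscall.SetsockoptInt(int(fd), syscall.IPPROTO_TCP, syscall.TCP_NODELAY, 1)
        syscall.SetsockoptInt(int(fd), syscall.SOL_SOCKET, syscall.SO_SNDBUF, 64*1024)
        syscall.SetsockoptInt(int(fd), syscall.SOL_SOCKET, syscall.SO_RCVBUF, 64*1024)
    })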


A Bitter History of Networking (Warm-up Round, 256-Concurrency Stress Test: 4400~4500)



Because Go has goroutines and a high-quality network library, and the cost of switching between goroutines is small, the recommended networking pattern in Go for most scenarios is to give each connection its own goroutine for reading and writing.
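A minimal sketch of this goroutine-per-connection idiom (an echo server, not the competition code):

    package main

    import (
        "log"
        "net"
    )

    func handle(conn net.Conn) {
        defer conn.Close()
        buf := make([]byte, 4096)
        for {
            n, err := conn.Read(buf) // parks the goroutine on the netpoller, not an OS thread
            if err != nil {
                return
            }
            if _, err := conn.Write(buf[:n]); err != nil {
                return
            }
        }
    }

    func main() {
        ln, err := net.Listen("tcp", ":8080")
        if err != nil {
            log.Fatal(err)
        }
        for {
            conn, err := ln.Accept()
            if err != nil {
                continue
            }
            go handle(conn) // one goroutine per connection
        }
    }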






This version of the network model achieved fairly respectable results, with QPS peaking around 4400~4500. A brief summary of this network choice:


    • Because Go has goroutines, concurrency can be solved in the goroutine-per-connection style.
    • On Linux, Go's network library also uses epoll as the lowest-level event notification mechanism.
    • Go's networking internals still involve "context switches", except that the switching is done by the runtime scheduler.


A Bitter History of Networking (Official Round, 512-Concurrency Stress Test)



However, in the official round's 512-concurrency stress test, our program failed to improve steadily, hovering around 5500~5600, and its CPU usage was rather high, reaching roughly 100%.



Analysis of the keys to a high score:


    • The consumer agent bears heavy pressure, so the consumer agent must be relieved of load.
    • Because the consumer's own performance is poor, and the consumer and its agent coexist in one Docker instance (4C 8G), only by avoiding resource contention can extreme performance be achieved.
    • During the stress test, the consumer's CPU usage reaches about 350%.
    • To avoid contending with the consumer for resources, the consumer agent's resource usage must be squeezed down to the extreme.


Through the above analysis, we identified the core objective of optimization: reduce the consumer agent's resource overhead as much as possible.



A. Optimization Plan 1: Goroutine Pool + Task Queue (abandoned)



This is a relatively simple and common optimization idea, similar to a thread pool. Although it brought some improvement, it did not achieve the desired effect; CPU usage was still as high as about 70~80%. Goroutines are cheap, but under high concurrency there is still a certain context-switching cost, and I could only look elsewhere for a performance breakthrough.
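For clarity, the shape of this plan as a minimal sketch (Task is a stand-in for the real request type):

    // A fixed pool of workers consumes from a shared queue instead of
    // spawning a goroutine per request.
    type Task func()

    func startPool(workers, queueSize int) chan<- Task {
        queue := make(chan Task, queueSize)
        for i := 0; i < workers; i++ {
            go func() {
                for task := range queue {
                    task() // process one request
                }
            }()
        }
        return queue
    }

Capping the worker count bounds the number of runnable goroutines, but every handoff through the channel still wakes the scheduler, which is presumably part of the context-switching cost mentioned above.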



After careful consideration, I finally decided to try a reactor network model similar to Netty's. I won't repeat the study of Netty's architecture here; I recommend the summaries my colleague Flash shared on his blog.



B. Optimization Plan 2: Reactor Network Model



Before choosing this path, I consulted a few good friends and was invariably met with ridicule. Of course, they couldn't understand my predicament of having less than 50% of a CPU's resources to exploit, and in the end I resolutely set off down this alternative road.



After a quick survey, I found an open-source third-party library, evio, which looked quite decent (2000+ GitHub stars, yet hardly any PRs). But real practice ran into too many pits, and its functionality is very bare-bones. I couldn't help sighing that Java players are so lucky to have Netty! A big reason for Java's success is that its ecosystem is so mature; the Go language will need time to hone its own, and high-quality resources are still too few.



Of course, this is not to dismiss evio; it can serve as a good resource for learning networking. Here is its brief introduction on GitHub:


evio is an event loop networking framework that is fast and small. It makes direct epoll and kqueue syscalls rather than using the standard Go net package, and works in a similar manner as libuv and libevent.


Note: kqueue is an I/O multiplexing mechanism on FreeBSD (and other BSD systems); it is worth studying.
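For reference, canonical evio usage looks roughly like this (adapted from the evio README; the echo logic is illustrative):

    package main

    import "github.com/tidwall/evio"

    func main() {
        var events evio.Events
        // Data fires on the event loop whenever a connection has input;
        // returning out queues the reply, with no per-connection goroutine.
        events.Data = func(c evio.Conn, in []byte) (out []byte, action evio.Action) {
            out = in // echo
            return
        }
        if err := evio.Serve(events, "tcp://localhost:5000"); err != nil {
            panic(err.Error())
        }
    }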



To achieve the ultimate performance, I made many modifications to evio:


    • Support active (outbound) connections (by default only passive, inbound connections are supported)
    • Support multiple protocols
    • Reduce the number of invalid wakeups
    • Support asynchronous writes to increase throughput
    • Fix many bugs under Linux that caused performance problems


The transformed network model also achieved a good effect, reaching a score of 6700+, but that was still not enough; further breakthroughs were needed.



C. Reusing the EventLoop



Combing through the optimized network model once more:






An EventLoop can be understood as an IO thread. Before this optimization, a separate EventLoop was used for each hop of network communication: C -> CA, CA -> PA, PA -> P. If the inbound IO goroutine and the outbound IO goroutine share the same goroutine, the CPU-switching overhead can be reduced further. So I made one last optimization to the network model: reuse the EventLoop, and dispatch the different request logic by judging the connection type.


func createAgentEvent(loops int, workerQueues []chan *AgentRequest, processorsNum uint64) *Events {
    events := &Events{}
    events.NumLoops = loops

    events.Serving = func(srv Server) (action Action) {
        logger.Info("agent server started (loops: %d)", srv.NumLoops)
        return
    }

    events.Opened = func(c Conn) (out []byte, opts Options, action Action) {
        if c.GetConnType() != config.ConnTypeAgent {
            return GlobalLocalDubboAgent.events.Opened(c)
        }
        lastCtx := c.Context()
        if lastCtx == nil {
            c.SetContext(&AgentContext{})
        }
        opts.ReuseInputBuffer = true
        logger.Info("agent opened: laddr: %v: raddr: %v", c.LocalAddr(), c.RemoteAddr())
        return
    }

    events.Closed = func(c Conn, err error) (action Action) {
        if c.GetConnType() != config.ConnTypeAgent {
            return GlobalLocalDubboAgent.events.Closed(c, err)
        }
        logger.Info("agent closed: %s: %s", c.LocalAddr(), c.RemoteAddr())
        return
    }

    events.Data = func(c Conn, in []byte) (out []byte, action Action) {
        if c.GetConnType() != config.ConnTypeAgent {
            return GlobalLocalDubboAgent.events.Data(c, in)
        }
        if in == nil {
            return
        }
        agentContext := c.Context().(*AgentContext)

        data := agentContext.is.Begin(in)
        for {
            if len(data) > 0 {
                if agentContext.req == nil {
                    agentContext.req = &AgentRequest{}
                    agentContext.req.conn = c
                }
            } else {
                break
            }
            leftover, err, ready := parseAgentReq(data, agentContext.req)
            if err != nil {
                action = Close
                break
            } else if !ready {
                data = leftover
                break
            }
            index := agentContext.req.RequestID % processorsNum
            workerQueues[index] <- agentContext.req
            agentContext.req = nil
            data = leftover
        }
        agentContext.is.End(data)
        return
    }
    return events
}


Reusing the EventLoop brought a fairly solid performance improvement. With the EventLoop count of every stage set to 1, the final 512-concurrency stress test showed CPU usage of only about 50%.



Some Optimization Attempts at the Go Language Level



In the final stage, I could only hunt somewhat frantically for detail-level gains, so I made some attempts at the language level:


    • Replace the go channel with a ring buffer for task distribution


In scenarios distributing tasks under high concurrency, the ring buffer gives a small performance improvement over the channel, but in engineering terms I personally still recommend the go channel as the more elegant way (a sketch follows below).
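As a rough illustration of the trade-off, a single-producer/single-consumer ring buffer needs only two atomic counters. This is a minimal sketch under those SPSC assumptions; a production version would add padding against false sharing, and func() is just a stand-in task type:

    import "sync/atomic"

    // ringBuffer is a minimal single-producer/single-consumer queue.
    // size must be a power of two so that index&mask wraps correctly.
    type ringBuffer struct {
        buf  []func()
        mask uint64
        head uint64 // next slot to read (consumer-owned)
        tail uint64 // next slot to write (producer-owned)
    }

    func newRingBuffer(size uint64) *ringBuffer {
        return &ringBuffer{buf: make([]func(), size), mask: size - 1}
    }

    // offer is called only by the single producer.
    func (r *ringBuffer) offer(task func()) bool {
        tail := atomic.LoadUint64(&r.tail)
        if tail-atomic.LoadUint64(&r.head) == uint64(len(r.buf)) {
            return false // full
        }
        r.buf[tail&r.mask] = task
        atomic.AddUint64(&r.tail, 1) // publish the slot
        return true
    }

    // poll is called only by the single consumer; returns nil when empty.
    func (r *ringBuffer) poll() func() {
        head := atomic.LoadUint64(&r.head)
        if head == atomic.LoadUint64(&r.tail) {
            return nil // empty
        }
        task := r.buf[head&r.mask]
        atomic.AddUint64(&r.head, 1)
        return task
    }

Unlike a channel, polling an empty ring buffer returns immediately rather than parking the goroutine, which saves scheduler wakeups at the cost of spinning.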


    • Go's built-in encoding/json package is implemented via reflection, and its performance is a known sore point


Assemble the JSON data yourself using strings; the more data under test, the more time saved.
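A minimal sketch of the idea (the field names are illustrative, not the contest's exact wire format, and it assumes the inputs need no JSON escaping):

    import "strings"

    // buildInvocationJSON hand-assembles a fixed-shape JSON object,
    // avoiding encoding/json's reflection entirely.
    func buildInvocationJSON(path, method, parameter string) string {
        var b strings.Builder
        b.Grow(len(path) + len(method) + len(parameter) + 48)
        b.WriteString(`{"path":"`)
        b.WriteString(path)
        b.WriteString(`","method":"`)
        b.WriteString(method)
        b.WriteString(`","parameter":"`)
        b.WriteString(parameter)
        b.WriteString(`"}`)
        return b.String()
    }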


    • Goroutine-to-thread binding

      runtime.LockOSThread()
      defer runtime.UnlockOSThread()
    • Modify the scheduler's default time slice and compile Go yourself (no effect)


Summary


    • Taking the unconventional path, I spent a great deal of time transforming the network layer; the effort paid off and the result was gratifying.
    • Golang's performance is good enough, and it deserves in-depth study.
    • Performance optimization never strays far from a few standard moves: asynchrony, lock elimination, reuse, zero-copy, batching, and so on.


Finally, let me throw out a few Go networking questions I'd like to keep exploring and discuss with everyone; experienced friends, please feel free to offer pointers:


    1. Under scarce resources, how should one choose a network model that handles highly concurrent requests? (Say, 10k concurrent long connections or short connections.)
    2. And how should one choose at the scale of millions of connections?


Please indicate the source when reprinting. You are welcome to follow my WeChat public account: Yapp Technology Wheels.

