Go in Practice: Building a High-Concurrency Message Push System for Tens of Millions of Online Users (from Qihoo)

Source: Internet
Author: User
Tags: gopher, zookeeper
1. Background

Go's adoption keeps rising, and with it the community's interest in hands-on Go experience. Go is a very good fit for distributed systems with high concurrency, complex communication patterns, and heavy business logic: the development experience is good, service stability is adequate, and performance meets requirements. This article is organized from the talk given by Zhou Yang of Qihoo at the Gopher China 2015 conference (PPT download: "Go language building high concurrent message push system practice PPT (from Qihoo) [ATTACHMENT DOWNLOAD]"), which used a massive online message push system as its example to explore the problems encountered when building high-concurrency push systems in Go, and the practical techniques summed up along the way.

2. Go's advantages for basic service development

Taking the message push system as an example: it currently serves 50+ internal products and the millions of apps on the open development platform, maintains real-time long connections at the hundreds-of-millions scale, delivers billions of messages per day, can complete a million-device broadcast within one minute, and handles peaks in the tens of thousands of messages per second. It runs on 400 physical machines, with more than 3,000 instances distributed across 9 independent clusters, spanning nearly 10 IDCs at home and abroad.
After two years of iteration, the system's functionality has been extended: it supports chat-scenario businesses and stably serves a variety of chat apps, multiplexes multiple apps over a single channel, supports upstream (client-to-server) traffic, supports callbacks of different strengths, and provides customized message push and forwarding services for smart hardware products.

On single-machine performance: in the test environment, with long connections only (after system parameter tuning), the achievable numbers often depend on the disconnect rate. With stable connections, heartbeats unaffected, and internal QPS at an acceptable level under load testing, a single machine reached 3 million (300W) long connections. Online, a single machine actually carries up to 1.6 million (160W) long connections, split across two instances. Online QPS depends on egress bandwidth, protocol weight, access-network conditions, and business logic, but with the factors affecting I/O removed and no encryption applied, pure-performance QPS reaches 2–5W; with heavier encryption it drops. (Figure _11.png in the original slides.)

In addition, the message push system is logic-heavy; the whole push function is completed through the interaction shown in the figure. On the access side, the client SDK first contacts the dispatcher service: the client uploads some data during server selection, and based on the state of the incoming services the dispatcher returns the IPs or domain names of suitable servers. The client then applies a policy to those IPs according to current network conditions, caches them, and establishes a long connection to the chosen server via the cached IP.
Before the business and architecture were decomposed, this long-connection (room) service carried very heavy logic: it had to interact with essentially every service behind it, while itself holding millions of long connections. Its logic centers on internal communication, external communication, and the external connections themselves.

First, when a user establishes a long connection, the service must authenticate the user's identity, and also support authentication for the company's various products and for security- and callback-related businesses. Second, after authentication, it communicates with the back-end coordinator service and the in-memory registry to bind the user to their connection (the registration operation); a single connection involves interaction logic for unbinding, multiple bindings, one connection bound to multiple users, and so on. A user's connection may also flash-disconnect (the network switches and the server fails to detect the break in time), so messages must be migrated before the stale connection is torn down; all of these operations live in this logic.

As the central service, it also interacts with the back-end coordinator for upstream data: users may upload various data on connect (audio, or a simple data stream), which is returned to the relevant access party through the coordinator's callback. For upstream client data, the service applies security policies, whitelists, and IP throttling, then writes to its own zookeeper/keeper for communication. On another back-end path, when a user comes online the service may load messages through the storage access layer (the Saver service), which loads and stores messages. The message system also has business logic of its own, such as per-product and per-protocol message-loading policies, and temporary caching of broadcast content during global broadcasts.
All in all, this service is the crux of the whole system. Refactoring it in C would let the architecture be disassembled cleanly, but it would add communication overhead, because the logic always has to live somewhere. With Go, all of this logic can stay concentrated at the very front, right where the interaction happens, which is why Go suits this kind of heavy logic so well.

At the API access layer, a center service is responsible for all app providers; a provider performs simple authentication through the center service and then sends messages into the cluster. For example, to send a unicast to a user, the center first queries the register to locate the user, then communicates with the router service to obtain the identity of the registered connection channel and the server holding it, then communicates with that server, which finally delivers over the long connection. For heavier work such as a nationwide broadcast, the center service breaks the whole task into a series of subtasks, calls the connection and Saver services within each subtask to fetch the relevant online and offline users, and then pushes collectively to the room service, so the entire cluster is under great pressure at that moment. As you can see, the system's communication is complex, and even after decomposition the structure carries very heavy logic. (Figure _22.png in the original slides.)

Although the logic is heavy, the program itself is basically linear. Essentially, a goroutine is opened for each user, and all the logic (such as the registration operation) is done within two loops; any blocking is confined to that client's own goroutine.
Normally the heartbeat must be answered promptly, so the main loop handles heartbeats with non-blocking I/O: other work is dispatched for centralized control and management and its results come back asynchronously, and the key invariant of the whole loop is that the service answers the client's ping packets in time. The logic therefore stays clean, concentrated in two loops, and reads linearly whenever you come back to the code.

3. Comparing the development experience in Go and C

(Figure _33.png in the original slides.) When the team hit bottlenecks and did not know how much more efficiency Go could deliver, they wrote a C version for comparison. The C version used the one-loop-per-thread principle, opening a fixed number of threads according to the data-processing needs; since no thread's I/O may block, each thread uses asynchronous I/O with its own event loop. But one thread serving tens of thousands of users creates a problem: each user's current state (registering, loading messages, talking to the coordinator) must be recorded and maintained, so writing the program becomes an exercise in arranging state machines. Anyone modifying the program later must consider whether newly added logic breaks the previously arranged state combinations, and whether the existing paths still run. The conclusion: rather than abandon Go, reduce costs by optimizing performance and disassembling the architecture, and do not write this especially heavy logic in C.

4. Challenges encountered in practice

(Figure _44.png in the original slides.) The problems encountered: every machine's memory sat at 50–60 GB, peaking at 69 GB, and GC pauses eventually reached 3–6 seconds.
The first version of the system ran at 1 million connections per machine for five months; internal communication and external data rates were very low, with only some unicast messages (a bit over 200 per day), so QPS was only a few per second, and broadcasts happened only two or three times a month. Then the business began using the push channel to deliver non-message content as well, and these instructions kept the whole system's load at a much higher sustained QPS.

Bottlenecks encountered: buffers and objects scattered across goroutine I/O paths were never reused; poor network conditions caused surges; GC pauses reached 2–3 seconds, which hurt access QPS, and every 2–3 minutes some request would get stuck. When internal communication was heavy and each component waited on responses, the business side would treat the timeout as a failure and retry, putting more pressure on the system, and the system entered a vicious circle: memory spiked, I/O blocked, and goroutines piled up.

5. Feasible coping strategies

Experience 1: Go program development needs to find a balance: use the convenience of goroutines, but also centralize appropriately. When each request becomes a goroutine, further goroutines should be opened inside each request goroutine to decouple logic; then a task pool is used to merge requests centrally, and a connection pool plus pipelining exploits full-duplex connections to raise QPS. (Figure _55.png in the original slides.)

The first step was to transform the communication library. When an I/O operation such as register is invoked directly in the program, short connections cannot be used: even after tuning system performance parameters, only about 100,000 ports are available for normal communication.
Short-connection communication itself is not a problem, but short connections create many objects: the client's encode buffer, decode buffer, request object, and response object, plus their server-side counterparts. Using an open-source RPC with all of these buffers causes trouble, so the communication library went through iterations. The second version used a connection pool: what looks on the surface like a blocking I/O call actually takes a connection out of the pool, sends the request to the server, reads the response, and puts the connection back. Many resources (buffers, request, response, server, client) can then be reused via pooling. All objects are memory-reused, but since the connection is live, the time a connection is held is determined by the server's response time: take a connection, write the request, wait for and read the response, return the connection.

The third version added a pipeline. Pipelining brings some extra overhead; here it means the connection is full-duplex and multiplexed: anyone can write at any time, each request blocks on its own channel, a connection is assigned from the bottom layer, and after use the connection is released for others to write. This exploits TCP's full-duplex nature to drive QPS up. With this centralized handling the RPC library achieves good results; a startup today could simply choose gRPC.

For a system like message push, there is a problem if you cannot control every link, and code you did not write yourself is also hard to use well. For example, with an off-the-shelf RPC, determining the type of an error is awkward: in the simplest case the returned error is a string, so to figure out whether it is a code problem, a network problem, or an error message that must be forwarded onward, the business logic layer ends up doing string matching on RPC errors.
The RPC layer can reach high QPS, and there is still room to optimize; how heavy encoding/decoding on the connection needs to be depends on the business. Once the RPC library's efficiency hits its ceiling, what remains is reducing the number of RPC calls. When RPC data is sent, the whole block is written to the RPC connection and the connection is released to others immediately, so to reduce call counts, write as much data as possible each time. On top of the connection pool, a task pool was built per business interface: data is received into the task pool, requests are packaged there, and finally one RPC call carries multiple items. This reduces the number of instantaneous writes on RPC connections and lowers the probability of serialization.

Bulk calls are a business-level optimization, and the RPC interface must support batching. After batching, if request QPS is small, fewer goroutines are busy, and simply opening more goroutines does not improve efficiency. When the network is bad and receives block, goroutine backlog leads to congestion, and without flow control, memory collapses; this is why some machines' memory would explode after a few days and never come back down. Reducing call counts this way does not make raw performance spectacular, but the task pool is where flow control can live: when the queue exceeds a certain length, apply policy: retry important interfaces, drop unimportant ones. Flow control could be done below the RPC layer, but RPC cannot recognize the interface, so it cannot decide whether to discard or take an interface-specific action when the flow-control policy triggers. The task pool plus connection pool and pipeline maximizes the throughput of the whole system (not its QPS).
Experience 2: Go development should pursue the limits of cost optimization, and be cautious about importing the "common sense" of high-performance services from other language ecosystems. Weigh memory pools and object pools against code readability and overall efficiency: these techniques can increase serialization in some cases. Pooled memory must be locked (and lock-free alternatives carry their own extra cost), and the program's readability drifts toward C: malloc everywhere, free everywhere, reset before free, with problems only surfacing after all of these operations are in place.

The optimization strategy here imitates the slab structure used by memcache: a memory pool organized by size class. (Figure _66.png in the original slides.) The array on the left is effectively a set of free lists of memory chunks bucketed by size. During protocol decoding the length is not known up front and must be computed dynamically, so when the requested chunk turns out too small, it is returned and a chunk from a larger bucket is requested instead. Adding the memory pool reduced some machine overhead, but the program's readability dropped severely.

For the object pool strategy, the sync library itself provides an API, but objects must be cleaned when taken out, including draining channels, to prevent stale data; this too adds overhead. In fact the CPU is idle most of the time, and only broadcasts push it high. With both strategies added, the program's serialization increases and memory lives longer across GC cycles, while QPS does not necessarily rise.

6. Operations that play to Go's characteristics

Doing routine operations work on a Go system requires some Go-specific common sense. The first things to check online are whether goroutines have leaked and whether blocking is high.
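The size-class memory pool from Experience 2 above, with its reset-before-free discipline, might be sketched like this. This is a simplified illustration under assumptions (power-of-two buckets backed by `sync.Pool`), not the production allocator:

```go
package main

import (
	"fmt"
	"sync"
)

// bucketPool mimics the memcache-style size-class allocator described
// above: one free list per power-of-two size, so a too-small buffer goes
// back to its bucket and a chunk from a larger bucket is taken instead.
type bucketPool struct {
	buckets []sync.Pool // buckets[i] holds []byte of capacity 1<<i
}

func newBucketPool(maxShift int) *bucketPool {
	p := &bucketPool{buckets: make([]sync.Pool, maxShift+1)}
	for i := range p.buckets {
		size := 1 << i
		p.buckets[i].New = func() interface{} { return make([]byte, 0, size) }
	}
	return p
}

// shiftFor returns the smallest power-of-two exponent whose size fits n.
func shiftFor(n int) int {
	s := 0
	for 1<<s < n {
		s++
	}
	return s
}

// Get returns a zero-length slice with at least n bytes of capacity.
func (p *bucketPool) Get(n int) []byte {
	return p.buckets[shiftFor(n)].Get().([]byte)[:0]
}

// Put resets the slice and returns it to the bucket matching its capacity,
// the reset-before-free step the text warns must never be forgotten.
func (p *bucketPool) Put(b []byte) {
	p.buckets[shiftFor(cap(b))].Put(b[:0])
}

func main() {
	pool := newBucketPool(16)  // buckets from 1 byte up to 64 KiB
	buf := pool.Get(100)       // served from the 128-byte bucket
	buf = append(buf, "payload"...)
	fmt.Printf("len=%d cap=%d\n", len(buf), cap(buf)) // len=7 cap=128
	pool.Put(buf)
}
```

The readability cost the text mentions is visible even here: every call site must pair `Get` with `Put`, and a buffer held past its `Put` silently corrupts whoever draws it next.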
Because such problems are sometimes invisible, the team built unified management and visualized monitoring for online instances, and Go's toolchain makes this kind of debugging convenient.

First, profiling visualization: when a problem peaks you can look back through history (memory, goroutine counts) and compare a process's state before and after two deployments. For example, during one analysis a group of products was split into a separate cluster, and that cluster consistently used 4–5 GB more memory than the others even though the program was identical; opening the profiling graphs showed the cause immediately. The difference was in a buffer: two years earlier, a strategy to avoid re-copying had opened a buffer per product, a million entries each. The separated cluster was the open platform, serving tens of thousands of apps, so its buffers were obviously larger. With the profiling, goroutine, GC-time, and related metrics Go provides, all kinds of problems can be monitored.

Second, communication visualization: long-connection calls are basically RPC calls, so if the RPC library, Redis library, and MySQL library are under your control, the entire system is under control. The team embedded code in the RPC and Redis libraries to count QPS, network bandwidth consumption, and error conditions, then used load testing to confirm that each optimization actually affected performance: a system that is not measured cannot be optimized, and measurement also surfaces latent problems. Communication visualization means planting your own code in the RPC and Redis libraries; which RPC library you choose matters less than being able to modify and monitor it.
Visualization also extends to load testing. Since a load test must produce real-time data, 100 machines were set aside, one of them driving the test, with performance parameters visible through a backend and the data cross-checked through the instrumented RPC library. After a test, each goroutine's statistics are aggregated: business QPS, protocol version, connection-establishment time, and connections per second; these detailed performance parameters reveal the system's latent problems, so a load-testing platform should include statistics functions. The team built a simple test backend where machines can be selected for testing. A single test machine can mask problems (its own network issues or a saturated CPU go undetected), so it is best to select 10+ machines, each opening its own segment of connections.

(Figure _77.png in the original slides.) Splitting a machine's load across more processes can reduce GC time, but it increases operations pressure. Splitting essentially spreads GC across processes: each pause is shorter but pauses happen more often, so the problem is not truly solved; multi-process saves time per pause, but the lag becomes more frequent and gradual. The system can instead be split horizontally along the resources it uses: by business (Assistant, Guard, Browser), by function (push, chat, embedded products), and by IDC (ZWT, BJSC, BJDT, BJCC, SHGT, SHJC, SHHM, Amazon Singapore). Splitting brings management costs, so (Zookeeper + deployd) / (keeper + agent) was introduced to manage each node.
Normally, ops uses zookeeper to manage each process's dynamic configuration files. The second part is profiling-like data: the backend queries each process for real-time monitoring of each interface; address-book data also flows through the backend, and the keeper nodes carry the configuration. This function can be abstracted: ideally the client has an SDK and the central node has a keeper, so that configuration files can be managed, and profiling output plus the information collected by the team's own instrumented libraries can be aggregated, stored in local data files or folders, and served to the backend through an interface. Services are started over the network and management is centralized in keeper rather than split between the backend and keeper, so for keeper synchronization the team would consider using something open source. For ordinary configuration files the team wrote tools supporting map structures in key-value form, essentially a convert tool. The rest is profiling, which communicates with keeper and the nodes. Keeper startup uses an agent to start the process and passes over the keeper central node's port; when keeper matches a node exactly, the configuration is sent down; if it does not match, it is dropped.

7. PPT download

This article was organized from Zhou Yang's technical talk at the Gopher China conference; we hope it is helpful. For the PPT version of the presentation, please download: "Go language building high concurrent message push system practice PPT (from Qihoo) [Attachment DOWNLOAD]".