1. Preface
Go's adoption keeps rising, and with it the community's attention to hands-on Go experience. Go is well suited to distributed systems with high concurrency, complex communication and interaction, and heavy business logic, and it offers a good development experience, service stability at scale, and performance that meets demand.
This article is organized from Zhou Yang's talk at the Gopher China 2015 conference (slides: "Go language building high concurrent message push system practice PPT [attachment download]"). Using a mass online message push system as the example, it explores the problems encountered when building high-concurrency push systems in Go, and the practical techniques the team has accumulated.
2. The advantages of Go in basic service development
Take the message push system as an example. Today it serves 50+ internal products and the apps of an open platform with millions of developers. Real-time long connections run at the hundreds-of-millions scale and daily message volume at the billions scale; a million-device broadcast completes within one minute; peak throughput is in the tens of thousands of QPS. The system runs on 400 physical machines, with more than 3,000 instances distributed across 9 independent clusters spanning nearly 10 IDCs at home and abroad.
After two years of iteration, the system's functionality has been extended: it supports chat-scenario business and stably serves a variety of chat apps; a single channel can be reused by multiple apps; long connections carry upstream data; callbacks of different strengths are supported; and customized push and forwarding services are provided for smart-hardware products.
On single-machine performance: in the test environment, with long connections only (after system parameter tuning), the numbers mostly depend on the disconnect rate. With stable connections and heartbeat timing unaffected, stress tests reached 3 million long connections per machine at an acceptable level of internal QPS. In production, a single machine actually carries up to 1.6 million long connections, split across two instances. Online QPS is tied to egress bandwidth, protocol weight, access-network conditions, and business logic. With the factors affecting I/O excluded and no encryption (pure-protocol performance), QPS reaches roughly 20,000 to 50,000 per machine, and drops as more encryption is added.
In addition, the push system is logic-heavy; the whole system completes the push function through the interactions shown in the diagram. On the access side, the client SDK first contacts the dispatcher server: the client uploads some data during server selection, and based on the state of the relevant services, the dispatcher returns the IPs or domain names of suitable servers. The client caches this IP policy according to its current network conditions, then establishes a long connection to one of the cached IPs.
The long-connection (room) service carried very heavy logic before the business and architecture were disassembled: it has to interact with essentially every service behind it, while itself holding millions of long connections. Its logic centers on internal communication, external communication, and managing the connections themselves.
First, when a user establishes a long connection, the service must authenticate the user's identity, and also support authentication for the company's various products, security policies, and callback-related business. Second, once identity is authenticated, the service talks to the back-end register service (in-memory storage) to bind the user to this connection (the registration operation); a single connection involves the interaction logic of unbinding, rebinding, binding multiple users, and so on.
A user's connection may also flash-disconnect (the network switches and the server fails to detect the break in time), so messages must be migrated before the stale connection is torn down; all of these operations live in this logic. The service interacts with the back-end coordinator service as well: data flows both ways, and users may upload various data on connect (audio, or a simple data stream), which is handed to the coordinator and returned through callbacks. For a client's upstream data, the service applies security policies, whitelists, and IP throttling, then writes to its own zookeeper/keeper for communication. There is further back-end logic: for example, loading messages when a user comes online goes through the storage access layer (the saver service), which loads and stores messages. The message system has business logic of its own, too, such as per-product message-loading policies in the protocol, including temporary caching of broadcast content during global broadcasts.
All in all, this service is the crux of the whole system. Refactoring it in C would disassemble the architecture nicely, but the logic would not go away, so it would only add communication overhead. In Go, all of this logic concentrates at the very front, right where the interaction and communication happen, which is exactly the kind of heavy logic Go suits.
At the API access layer, a center service faces all of the app providers, which perform simple authentication through it before their messages are sent into the cluster. For a unicast to one user, for example, center first queries the register service for the user: once it obtains the user's registered connection-channel identity and server from the router service, it communicates with that server, which delivers over the long connection. Center also carries heavy work such as nationwide broadcasts: the task must be broken into a series of sub-tasks, and for every sub-task the connection services and the saver service are called to fetch the relevant online and offline users, which are then pushed in bulk to the room service; at that moment the whole cluster is under great pressure. The system's communication, then, is quite complex, and even after disassembly the structure carries very heavy logic.
Although the logic is heavy, the program is basically linear. Essentially, a goroutine is opened for each user's connection, and all logic runs inside two loops (the registration operation included), with blocking reads driven by the client. Since heartbeats must be answered promptly, the main loop handles heartbeats itself with non-blocking I/O: writes are centralized, managed, and operated in one place, then returned asynchronously, and the key requirement of the whole cycle is to respond to the client's ping packets in time. The logic therefore stays well organized, concentrated in two loops, and reads linearly whenever you come back to the code.
3. Comparing the development experience of Go and C
When the team hit bottlenecks and did not know how much more efficiency Go could deliver, they also wrote a C version. The C implementation followed the one-loop-per-thread principle: a fixed number of threads is opened according to the data-processing needs of the business, and since no thread's I/O may block, asynchronous I/O is used, with each thread owning an event loop. But one thread serving tens of thousands of users creates a problem: each user's current state (registering, loading messages, talking to the coordinator) must be recorded and maintained, so writing the program becomes an exercise in arranging states. If someone else then modifies the code, they must consider whether the newly added logic disturbs the previously arranged state combinations and whether the existing flows can still run. In the end, the team preferred optimized Go with a disassembled structure over writing this especially heavy logic as a C state machine.
4. Challenges encountered in practice
Problems encountered: every machine's memory sat at 50-60 GB, peaking at 69 GB, and GC pauses eventually reached 3-6 seconds. The first version of the system took five months to reach one million connections per machine. At that point, internal communication and external data traffic were very infrequent: only some unicast messages, two-hundred-odd a day, so QPS was only a few per second, and broadcasts happened only two or three times a month. Then other business lines began using the push channel to deliver non-message instructions, and these kept the whole system's load at a persistently higher QPS.
Bottlenecks encountered:
- Buffers and objects scattered across goroutines, used once and then discarded;
- Unrestrained goroutine creation: a bad network environment causes goroutine counts to surge;
- GC pauses of 2-3 seconds hurt access QPS; every 2-3 minutes a request gets stuck, and since components and business parties respond by retrying, the retries add load and the system enters a vicious circle;
- Memory spikes, I/O blocking, and goroutine surges feeding each other.
5. Feasible ways to respond
Experience 1
Go program development needs to find a balance between the convenience of goroutines and appropriately centralized processing. When every request becomes a goroutine, some of the logic inside each goroutine should be decoupled into further goroutines; a task pool then merges requests centrally, and a connection pool plus pipelining exploits full-duplex connections to raise QPS.
The first step was to transform the communication library, in which the program registers and executes I/O operations by direct call; short connections cannot be used. Even after tuning system performance parameters, only about 100,000 ports are available for normal communication. Short-connection communication is not a problem in itself, but each short connection creates many objects (encoding buffer, decoding buffer, the server side's encoding and decoding buffers, request object, response object, and the server side's request and response objects). The short-connection version also used an open-source RPC library, and with so many kinds of buffers, problems appeared.
The communication library went through iterations. The second version used a connection pool: on the surface a call still looks like blocking I/O, but underneath a connection is taken from the pool, the request is sent to the server, the response is read, and the connection is put back. Many resources (buffers, request, response, server and client objects) are pooled and reused, giving memory reuse for all of these objects. In practice, though, the connection is held for the whole round trip: you take a connection, write the request, then wait to read the response, so the server's response time determines how long the connection is occupied. The third version therefore implemented pipelining. Pipelining brings some extra overhead, but here it means the connection is full-duplex and multiplexed: anyone can write at any time, each request blocks on its own channel, a connection is assigned underneath, and the connection is released for others to write immediately. This uses TCP's full-duplex nature to push up QPS. With this centralized treatment an RPC library performs much better; a start-up today could simply choose gRPC. But for a system like message push, there is a problem if you cannot control every link: other people's code is hard to adapt. For example, with a third-party RPC you may need to determine the type of an error, and in the simplest case the returned error is a string; to tell whether it was an encoding problem, a network problem, or an error message returned by the peer, the business logic layer ends up doing string matching on RPC errors.
Even with the RPC layer performing well, there is still room to optimize; how hard to push encoding and decoding on the wire depends on the needs of the business. Once the RPC library itself has hit its efficiency ceiling, what remains is reducing the number of RPC calls. When RPC data is packed full, each call writes a whole block of data to the RPC connection and releases the connection to others immediately; so to reduce the call count, write as much data per call as possible.
On top of the connection pool, a task pool was built per business interface (different interfaces get different task pools). The task pool accumulates incoming data, packages multiple items into one request, and finally makes a single RPC call carrying multiple data items. This lowers the instantaneous occupancy of RPC connections and reduces serialization. Batch calls are a business-level optimization, and the RPC interfaces must support batching. Note that simply opening more goroutines does not improve efficiency: in a bad network, blocked receives make goroutine counts pile up, and without flow control, memory collapses under the goroutines, which is why some machines' memory would balloon for days and never come down. Reducing the call count this way does not make raw performance dramatically higher, but the task pool is where flow control can live: when a queue exceeds a certain length a policy kicks in, with retries for important interfaces and drops for unimportant ones. Flow control could be done below the RPC layer, but RPC cannot recognize interfaces, so it cannot decide whether to discard or how to handle a given interface when the flow-control policy triggers. Connection pool + pipeline under a task pool maximizes the throughput of the whole system (throughput, not QPS).
Experience 2
In Go development, pursue the limits of cost optimization, but introduce the common techniques of high-performance services from other language ecosystems only cautiously.
Weigh memory pools and object pools against code readability and overall efficiency. These techniques increase the program's serial degree in some cases: pooled memory must be locked, and lock-free schemes carry their own extra costs. The program's readability drifts toward C: every allocation becomes an explicit malloc, every site needs a free, every free needs a reset first, and all kinds of problems surface only after the fact. The optimization chosen here imitates the slab-style data frames of memcache: a memory pool built in that mold.
The array on the left of the diagram is really a set of free lists that bucket memory blocks by size. During protocol decoding, the length is not known up front and must be computed dynamically, so when the block applied for turns out too small, it is put back and a block from a larger bucket is requested. Adding the memory pool reduced some per-machine overhead, but the readability of the program was severely reduced.
For the object-pool strategy, the sync library itself provides the API. The cost is that every object must be cleaned when it is taken out, including clearing its channels to prevent stale data from being used, which adds overhead. In fact, most of the time the CPU is idle and only broadcast periods run hot; with both strategies applied, the program's serial degree rises and memory and GC times change, but QPS does not necessarily rise.
6. Operations and maintenance that fit Go's characteristics
Routine operations on a Go system require some Go-specific common sense. The first thing to check online is whether goroutines are leaking or heavily blocked. Because such problems are sometimes invisible, the team built unified management and visualization of monitoring across online instances. Go provides a set of tools that make debugging quite convenient. First, profiling visualization: historical peaks in memory, goroutine counts, and thread counts can be inspected, and the state after two different deployments can be compared. For example, after one group of products was split into a separate cluster during an analysis, that cluster was always 4-5 GB of memory above the others even though the program was identical; opening the comparison chart showed the difference immediately. It was in one buffer: two years earlier, a strategy to prevent re-copying had opened a one-million-entry buffer for each product, and this cluster, an open platform with tens of thousands of apps, naturally had much larger buffers once values were filled in. All such problems can be watched through the profiling data Go provides: goroutine counts, local GC times, and related quantities.
Second, communication visualization. Long-connection traffic is basically RPC calls, so if the RPC library, the Redis library, and the MySQL library are made robust, the whole system stays under control. The team embedded code in the RPC and Redis libraries to count QPS, network bandwidth consumption, and every kind of error condition. Then, through various stress tests, they could confirm whether a planned optimization actually affected performance: a system that is not measured cannot be optimized, and measurement also surfaces latent problems. Communication visualization means instrumenting your own code into the RPC and Redis libraries; which RPC library you choose matters less than being able to modify and monitor it.
Stress testing should be visualized as well. A stress test is useless without real-time data, so the team used 100 machines, one driving the load, watched the performance parameters through the dashboard, and cross-checked the figures through the instrumented RPC library. After a run, every process's statistics are aggregated: business QPS, protocol versions, connection-establishment times, and connections per second; these detailed performance parameters reveal the system's latent problems, so a load-testing platform should include statistics aggregation. The team built a simple test dashboard where machines can be selected for a run. With a single driving machine, network problems or that machine's own CPU ceiling can mask issues, so it is best to use ten or more machines in a stress test, each opening its share of the connections.
On the operations side, splitting a process reduces GC time but increases operational pressure. Splitting into multiple processes essentially spreads one machine's GC across several processes: each pause is shorter, but there are more of them, so the underlying problem is not solved. The system can instead be split horizontally along the resources it uses: by business (assistant, guard, browser), by function (push, chat, embedded products), and by IDC (ZWT, BJSC, BJDT, BJCC, SHGT, SHJC, SHHM, Amazon Singapore). Splitting brings management cost, so (Zookeeper+deployd) / (keeper+agent) management of each node was introduced.
Normally, ops uses zookeeper to manage each process's dynamic configuration files. The second part is the profiling data: the dashboard polls every process and monitors each interface in real time; address-book data is also requested through the dashboard, and configuration is done both at the keeper nodes and in the dashboard. This can be abstracted: ideally the client has an SDK and the central node runs a keeper, so that configuration files can be managed, and the profiling data and metrics collected by the team's own libraries can be gathered, aggregated, stored in local data files or folders, and served to the dashboard through an interface. Services are started over the network, with management concentrated in keeper rather than split between the dashboard and keeper, so for keeper synchronization something open-source is worth considering. The team wrote tools that convert normal configuration files, including map structures, into key-value form, effectively a convert tool. Profiling rides the same channel, communicating between keeper and the nodes, so the profiling data stays current. Startup works through an agent: the agent starts the process and points it at the keeper central node's port to report its information; when keeper exactly matches the node it distributes the configuration, and if there is no match, it is discarded.
7. Lecture PPT download
This article is organized from Zhou Yang's talk at the Gopher China conference; we hope it helps. For the PPT version of the presentation, see: "Go language building high concurrent message push system practice PPT [attachment download]".
8. More articles about push technology
"iOS Push service APNs Detailed: design ideas, technical principles and defects, etc."
"Android Side message Push Summary: The principle of implementation, heartbeat, problems encountered, etc."
Literacy Patch: Understanding the MQTT Communication protocol
"A full Android push demo based on MQTT Communication Protocol"
Interview with IBM Technical Manager: the development and status of the MQTT protocol
"Ask for Android message push: GCM, XMPP, mqtt Three kinds of scenarios of the pros and cons"
The analysis of mobile real-time message push technology
Literacy stickers: Talking about the principles and differences of real-time messaging push in iOS and Android background
Absolute dry: The key point of push service technology based on Netty to realize mass access
Mobile IM Practice: Google message push service (GCM) research (from)
Why, Im tools such as QQ do not use GCM service push message? 》
Technology practice sharing of large-scale high concurrency architecture for Aurora Push Systems
"From HTTP to Mqtt: A practice overview of app data communication based on location services"
"Meizu 25 million long-connected real-time messaging push architecture technology practice sharing"
An interview with Meizu architect: The experience of real-time message push system with massive long connections
"In-depth talk about Android message push this little thing"
"Message push practices for implementing hybrid mobile apps based on WebSocket (with code examples)"
"A secure and extensible subscription/push service implementation approach based on long connections"
"Practice sharing: How to build a high-availability mobile messaging push system?" 》
"Go language constructs high concurrent message push system practice "
>> More similar articles ...