In the previous post on simple, reliable UDP transmission, basic reliability was achieved, but dynamic flow control was not implemented. At the time, the goal was simplicity and a quick implementation. A side benefit is that a server doing less computation can support more users: a single-threaded process on the server can support tens of thousands of user connections, for example. A few low-traffic services ran on it for several months without any obvious problems.
Later, after one more service was migrated, the gray-release test showed that clients were slow to fetch one particular piece of data, much slower than with the original TCP. Comparing the two, we found that this service's data had been compressed when it was sent over TCP, but was no longer compressed after the migration to reliable UDP, so many packages of 100 KB or 200 KB went out uncompressed. A version that compresses the data was therefore released. Once compression went live, things improved significantly overall: many packages shrank to a few dozen KB or less. However, some large packages were still 150 KB after compression. On a good network a user might finish the download in about one second. But in most cases, or on average user networks, these large packages showed a big gap: over the old TCP link the transfer completed in about 3 to 4 seconds, while over reliable UDP it took more than 10 seconds. A gap that large was simply unacceptable...
It was mid-December at the time. I had already requested two weeks of leave and bought my tickets. There was no way around it: the problem was there, and the business had to keep moving. The original plan was to have everything launched and stable within Q4. So, leave or no leave, I had to put in overtime at the office, hoping to optimize things to an acceptable level before I left. In addition, because the fix was needed urgently, it could only be done on the server side. Adjusting the overall protocol would mean waiting for a new client release, which could take a couple of months.
I analyzed detailed debug logs from several clients and from the server. (Collecting the logs was not easy; some were gathered by colleagues at home, on ejia wide or little-known ADSL networks.) The logs showed that, apart from a few packets retransmitted in response to a NACK, bulk retransmission only happened after the retransmission timer expired, which takes a long time, and that receiving an ACK from the client did not trigger any retransmission at all. The client's periodic ACK is already slow, and the server made no use of this slow feedback when it arrived, so retransmission was bound to be slow. The first optimization was therefore: on receiving an ACK, slide the sliding window forward and immediately retransmit the packets that the ACK does not confirm. With this first optimization in place, transfers in the affected network environments got faster by several seconds.
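The ACK handling described above can be sketched roughly as follows. This is only an illustration, not the actual server code: the class name, the `send_func` callback, and the ACK layout (a cumulative sequence number plus a list of extra sequence numbers received out of order) are all assumptions made for the example, and "packets the ACK does not confirm" is interpreted here as holes below the highest sequence number the client reports.

```python
from collections import OrderedDict


class SendWindow:
    """Server-side send window keyed by sequence number (hypothetical layout)."""

    def __init__(self, send_func):
        self.send_func = send_func    # callable(seq, payload), e.g. wraps sendto()
        self.unacked = OrderedDict()  # seq -> payload, in send order

    def send(self, seq, payload):
        """Send a packet and remember it until the client confirms it."""
        self.unacked[seq] = payload
        self.send_func(seq, payload)

    def on_ack(self, cumulative_ack, received_above=()):
        """Handle a client ACK and return the seqs retransmitted right away.

        cumulative_ack: every seq <= this value has arrived.
        received_above: seqs beyond cumulative_ack that the client also
                        reports as received (assumed ACK format).
        """
        # 1. Slide the window forward: forget everything the ACK confirms.
        for seq in [s for s in self.unacked if s <= cumulative_ack]:
            del self.unacked[seq]
        for seq in received_above:
            self.unacked.pop(seq, None)

        # 2. Anything still unconfirmed below the highest seq the client has
        #    seen is treated as lost and resent now, without waiting for the
        #    retransmission timer.
        highest_seen = max(received_above, default=cumulative_ack)
        holes = [s for s in self.unacked if s < highest_seen]
        for seq in holes:
            self.send_func(seq, self.unacked[seq])
        return holes
```

For example, if packets 1 to 6 are in flight and the client acknowledges everything up to 2 plus packets 4 and 5, the window drops 1, 2, 4 and 5, resends 3 immediately, and keeps 6 pending because nothing yet suggests it was lost.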
The logs also showed that each time data was sent, every packet allowed by the sliding window was pushed out in one go, yet after the client had received a few dozen packets, the ones that followed were lost. Suppose 500 UDP packets are sent at once and each packet carries 500 bytes: that is 500 × 500 bytes = 250,000 bytes, roughly 250 KB. Multiplied by 8, that is about 2 Mbit, so delivering it within one second needs about 2 Mbps of bandwidth. Many users have less than 2 Mbps, and many share their connection and normally get only bandwidth in the kbps range. So after dozens, or at most around a hundred, packets the burst exceeds the bandwidth of the user's end and the rest is dropped by the carrier's network equipment. This naturally led to the second optimization: when the server sends data, it stops after 100 packets (or 50) and waits until the program next schedules it to continue sending. This keeps the burst within the bandwidth of ordinary users, so fewer packets are lost and efficiency goes up. After this change, transfers got faster by several more seconds. (Of course, the ideal fix would be to modify both client and server: adjust the protocol, make the ACK more responsive, and then control the sending rate through the sliding window alone. But modifying the client was not feasible at that point, and without a protocol change this is all the server can do.)
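A minimal sketch of this per-round send cap, together with the burst arithmetic, might look like the following. The names `PacedSender`, `on_tick` and `burst_bits` are made up for illustration, and the sketch assumes some outer loop or timer calls `on_tick()` once per scheduling round.

```python
from collections import deque

MAX_PACKETS_PER_ROUND = 100  # the post mentions 100 (or 50) per round


def burst_bits(packets, bytes_per_packet):
    """Size of one burst in bits, e.g. 500 packets of 500 bytes."""
    return packets * bytes_per_packet * 8


class PacedSender:
    """Sends queued packets in capped batches instead of one big burst."""

    def __init__(self, send_func, max_per_round=MAX_PACKETS_PER_ROUND):
        self.send_func = send_func        # callable(packet), e.g. wraps sendto()
        self.max_per_round = max_per_round
        self.pending = deque()            # packets waiting for their turn

    def queue(self, packets):
        """Add packets that the sliding window currently allows to be sent."""
        self.pending.extend(packets)

    def on_tick(self):
        """Send at most max_per_round packets; return how many were sent.

        Whatever is left stays queued until the next call.
        """
        sent = 0
        while self.pending and sent < self.max_per_round:
            self.send_func(self.pending.popleft())
            sent += 1
        return sent


if __name__ == "__main__":
    # 500 packets * 500 bytes * 8 = 2,000,000 bits, about 2 Mbit per burst.
    print(burst_bits(500, 500))
```

With a cap of 100 packets of 500 bytes, one round carries 400,000 bits, so the resulting rate depends directly on how often `on_tick()` runs; that is why the timer interval matters.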
This approach still has a drawback. Although it paces the sending, the interval between two sending rounds is one second, which is too long. For historical reasons, the timer granularity in the process code is one second. After going through the code several times, I concluded that the timer could be made finer with only minor changes. Of course, my biggest fear was that the timers used by many classes in other modules would break after the change... The basic principle of this timer implementation is that every loop iteration reads the current time and records it in seconds. If the value differs from the previous one, at least one second has passed, so the timer check runs and calls the handler of every timer that has timed out.
The precision can be improved with a small change: read and record the time in milliseconds instead. Even so, the timer cannot be accurate to 1 millisecond, because each loop iteration also calls epoll_wait(), which may block for several tens of milliseconds. The timer precision achievable this way therefore cannot be finer than the epoll_wait() timeout (which the program sets to probably a few tens of milliseconds). So in the end I set the precision to 100 ms. Timer initialization gets a newly added interface that takes the timeout in milliseconds, and the old interface, in seconds, is retained. This way, apart from the reliable UDP code, no other module needs to be touched (their timing, of course, stays at one-second resolution). Keeping compatibility this way is the most convenient option and greatly reduces the chance of errors.
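The timer change can be sketched as below. This is an illustrative Python model only, not the original code: `TimerManager`, `add_timer_ms`, `add_timer` and `poll` are hypothetical names, and the injectable `now_ms` clock exists purely so the behavior can be checked without sleeping. The key points from the text are kept: the loop compares a coarse time value (here 100 ms ticks instead of seconds) against the previous one, and the old seconds-based entry point remains as a thin wrapper.

```python
import time

TICK_MS = 100  # timer granularity chosen in the post


class TimerManager:
    """Polling timer checked once per main-loop iteration."""

    def __init__(self, now_ms=None):
        # now_ms is an injectable clock returning milliseconds (assumption,
        # added for testability); the default uses a monotonic clock.
        self._now_ms = now_ms or (lambda: int(time.monotonic() * 1000))
        self._last_tick = self._now_ms() // TICK_MS
        self._timers = []  # list of (deadline_ms, callback)

    def add_timer_ms(self, timeout_ms, callback):
        """New interface: timeout given in milliseconds."""
        self._timers.append((self._now_ms() + timeout_ms, callback))

    def add_timer(self, timeout_sec, callback):
        """Old interface kept for existing callers: timeout in seconds."""
        self.add_timer_ms(timeout_sec * 1000, callback)

    def poll(self):
        """Call after each epoll_wait()-style wait; returns timers fired.

        Timers are only examined when the 100 ms tick value has changed,
        mirroring the original "has the second changed?" check.
        """
        now = self._now_ms()
        tick = now // TICK_MS
        if tick == self._last_tick:
            return 0
        self._last_tick = tick
        due = [t for t in self._timers if t[0] <= now]
        self._timers = [t for t in self._timers if t[0] > now]
        for _, callback in due:
            callback()
        return len(due)
```

In a real event loop, `poll()` would run right after each wait returns, so the effective precision is bounded by both `TICK_MS` and the wait timeout.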
After these optimizations, the service statistics showed that transmission efficiency was very close to that of the previous TCP link. In most cases the service's data could be fetched within three seconds, and the remaining gap was only a few percentage points. I went on vacation with peace of mind.