Watch out! TCP clients connecting to a server on the same machine

Tags: unix domain socket, redis server

Last week, during a performance test, we ran into a problem.
We have a Redis server listening on port 1234 of 0.0.0.0, and another process on the same machine frequently opens short-lived connections to it. This caused two problems:
1. A large number of connections in the TIME_WAIT state;
2. The CPU usage of the process initiating the connections is close to 100%.
Both of these seriously degraded the performance of our gateway. Before analyzing the specific causes, let me first make a recommendation: for connections within the same machine, prefer a UNIX domain socket over TCP!
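As a concrete illustration of that recommendation, here is a minimal sketch of a UNIX domain socket client in C. The socket path /tmp/redis.sock and the PING request are assumptions for illustration (Redis, for example, can listen on a path via its unixsocket option); they are not details from the original setup.

```c
/* Minimal sketch of a UNIX domain socket client.
 * Assumes a server is listening on /tmp/redis.sock (illustrative path). */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

int main(void)
{
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, "/tmp/redis.sock", sizeof(addr.sun_path) - 1);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        close(fd);
        return 1;
    }

    /* Send a trivial request; a real client would speak the server's protocol. */
    const char *ping = "PING\r\n";
    write(fd, ping, strlen(ping));

    char buf[64];
    ssize_t n = read(fd, buf, sizeof(buf) - 1);
    if (n > 0) { buf[n] = '\0'; printf("reply: %s", buf); }

    close(fd);
    return 0;
}
```

The point is that the whole addressing problem discussed below (ports, four-tuples, TIME_WAIT on loopback) disappears: a UNIX domain socket is addressed by a filesystem path, not by an IP/port pair.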

The explanation needs no measurement data; theoretical analysis is enough, provided you understand well enough how the Linux kernel stack handles the IP layer and schedules soft interrupts. With that background, the rest is simple.
First, let's look at problem 1. There is not much to say about TIME_WAIT: whichever end actively closes the connection may end up in the TIME_WAIT state. Whether such sockets linger on Linux depends on a few factors: first, whether both ends have TCP timestamps enabled; and if so, whether tcp_tw_recycle is enabled on the server. If it is, the TIME_WAIT sockets disappear quickly; in other words, for recycle to take effect, timestamps must be enabled. Without timestamps, a large number of sockets will pile up in the TIME_WAIT state.
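To see which case you are in, here is a small sketch that reads those two sysctls from /proc, assuming a Linux system with /proc mounted. Note that net.ipv4.tcp_tw_recycle was removed in later kernels (4.12+), so that file may not exist on your machine.

```c
/* Sketch: print the tcp_timestamps and tcp_tw_recycle sysctls, if present. */
#include <stdio.h>

static void show(const char *path)
{
    FILE *f = fopen(path, "r");
    if (!f) { printf("%s: not available\n", path); return; }
    int val = -1;
    if (fscanf(f, "%d", &val) == 1)
        printf("%s = %d\n", path, val);
    fclose(f);
}

int main(void)
{
    show("/proc/sys/net/ipv4/tcp_timestamps");
    show("/proc/sys/net/ipv4/tcp_tw_recycle");
    return 0;
}
```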
In the Linux kernel protocol stack, all traffic destined for the local machine is eventually routed to the loopback device, and if you do not explicitly choose a source IP address, both the source and the destination IP are 127.0.0.1. If the service port is fixed, the server can then accept at most 65535 - 1 connections from that client; minus 1 because the server has already bound the service port, so the client cannot bind it again. This is reasonable: by the uniqueness of the four-tuple, a service can accept only about 65,534 connections from any one specific client IP address. The problem is that when the demand is huge, this is clearly not enough. Keep in mind that what matters for a server is the total number of concurrent connections, and a single client machine is unlikely to initiate more than 60,000 of them, so TCP's choice is reasonable in most cases: a 16-bit port number is just right, because the protocol header cannot be too large, otherwise the payload ratio drops, which network transmission clearly cannot afford. However, when a machine connects to itself, there is no network transmission at all, and there are certainly many scenarios that need far more connections than this; TCP is simply not suited to such occasions.
For connections within the same machine there is no network transmission delay; the throughput limit is set only by local resources, so demanding 100,000 or even more concurrent connections is reasonable, but TCP cannot satisfy it, because with only a 16-bit port number and a fixed destination port it allows just 65,534 connections. How do we work around this? We know that all of 127.0.0.0/8 belongs to loopback, so we can use different source IP addresses. There are two ways to do that: either the client binds a source IP of 127.x.y.z (see the sketch below), or the traffic is SNATed to 127.x.y.z; either way the server can now accept a huge number of connections. But this is not the final answer. Why use TCP at all? TCP was designed for network transmission: its flow control deals with different hosts and with an unpredictable network. On the local machine none of that is a problem, so for local connections it is best to use a local IPC socket, such as a UNIX domain socket.
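Here is a minimal sketch of the first option: binding the client to an alternative loopback source address before connecting. The address 127.0.0.2 is an illustrative choice, and port 1234 is taken from the setup above; on Linux this works because the entire 127.0.0.0/8 range is routed to the loopback device by default.

```c
/* Sketch: bind the client socket to 127.0.0.2 before connecting to
 * 127.0.0.1:1234, so each extra source address opens another ~64k
 * worth of four-tuples against the same destination. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in src;
    memset(&src, 0, sizeof(src));
    src.sin_family = AF_INET;
    src.sin_port = 0;                       /* let the kernel pick an ephemeral port */
    inet_pton(AF_INET, "127.0.0.2", &src.sin_addr);
    if (bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0) {
        perror("bind");
        close(fd);
        return 1;
    }

    struct sockaddr_in dst;
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(1234);
    inet_pton(AF_INET, "127.0.0.1", &dst.sin_addr);
    if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
        perror("connect");
        close(fd);
        return 1;
    }

    printf("connected from 127.0.0.2 to 127.0.0.1:1234\n");
    close(fd);
    return 0;
}
```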
Now look at problem 2. A TCP packet for a local connection eventually reaches the loopback device's xmit function, which simply raises a soft interrupt on the current CPU and lets it run at the next opportunity. There is a good chance this happens in the context of the sending process itself: the sending process performs the send in its own context, and the soft interrupt then borrows that context to run the receive path. The locking overhead becomes obvious, because the large number of TIME_WAIT socket insertions and deletions requires frequently locking the hash tables, and all of that cost is billed to the sending process, which is not fair.
Note that in the Linux kernel, softirqs execute in two kinds of context: in whatever context happens to be running right after a hardware interrupt, and in a per-CPU kernel thread (ksoftirqd). The former shows up in the si percentage of the top command and is effectively charged to whichever process was interrupted.
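If you want to see where that softirq work lands, here is a small sketch that prints the per-CPU NET_RX counters from /proc/softirqs (a Linux-specific file); the path and the NET_RX label are kernel conventions, not details from the original post. Comparing the counters before and after a test run shows which CPUs processed the receive-side soft interrupts.

```c
/* Sketch: print the header and the NET_RX row of /proc/softirqs. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/softirqs", "r");
    if (!f) { perror("/proc/softirqs"); return 1; }

    char line[1024];
    while (fgets(line, sizeof(line), f)) {
        /* The first line lists the CPUs; NET_RX is the receive softirq. */
        if (strstr(line, "CPU") || strstr(line, "NET_RX"))
            fputs(line, stdout);
    }
    fclose(f);
    return 0;
}
```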
