TCP parameter optimization on Windows

Source: Internet
Author: User

1 TCP connection states

Let's start with the states a TCP connection passes through while it is being set up and shut down. The life of a TCP connection is a sequence of state transitions, triggered by user calls, specific packets, and timeouts. The states are as follows:

  • CLOSED : The initial state, indicating that there is no connection.
  • LISTEN : A server-side socket is listening for connection requests from remote TCP ports.
  • SYN_SENT : Waiting for an acknowledgment after sending a connection request. When a client socket calls connect, it first sends a SYN packet, enters the SYN_SENT state, and then waits for the server to send the second packet of the three-way handshake.
  • SYN_RECEIVED : A connection request has been received and an acknowledgment has been sent back together with this side's own connection request; the socket now waits for the final acknowledgment. This is normally an intermediate state of the three-way handshake, indicating that the server-side socket has received the client's SYN packet and replied to it.
  • ESTABLISHED : The connection has been established and data can be transferred.
  • FIN_WAIT_1 : The side that actively closes the connection is waiting for the peer to return an ACK packet. A socket in the ESTABLISHED state that actively closes the connection sends a FIN packet to the peer (indicating that it has no more data to send) and enters FIN_WAIT_1, where it waits for the peer's ACK. From then on it can still read data but can no longer send. Under normal circumstances the peer returns the ACK immediately, whatever state it is in, so FIN_WAIT_1 is rarely observed.
  • FIN_WAIT_2 : The side that actively closes the connection has received the peer's ACK packet and is waiting for the peer to send its FIN packet. A socket in FIN_WAIT_1 enters FIN_WAIT_2 when it receives the peer's ACK. Because a socket in FIN_WAIT_2 has to wait for the peer's FIN, this state is seen fairly often. If a packet carrying both FIN and ACK arrives while the socket is in FIN_WAIT_1, the socket goes directly to TIME_WAIT without passing through FIN_WAIT_2.
  • TIME_WAIT : The side that actively closes the connection has received the peer's FIN packet and returned an ACK packet (the peer has no more data to send either, so from now on data can be neither read nor sent). It then waits long enough (2MSL) to make sure the peer receives the ACK (allowing for a lost ACK and for stray packets), and finally returns to the CLOSED state, releasing its network resources.
  • CLOSE_WAIT : The side that passively closes the connection is waiting to close it. After receiving the peer's FIN packet (indicating that the peer has no more data to send), it returns the corresponding ACK packet and enters CLOSE_WAIT. In this state it can keep sending any data it still has to the peer, but there is nothing more to read.
  • LAST_ACK : Once the side that passively closes the connection has finished sending its data in the CLOSE_WAIT state, it sends a FIN packet to the peer (indicating that it has no more data to send) and waits for the peer to return an ACK packet. When the ACK arrives, it returns to the CLOSED state, releasing its network resources.
  • CLOSING : A relatively rare exceptional state. Normally, after sending a FIN packet, a side first receives the peer's ACK (or receives it together with the FIN) and only then receives the peer's FIN. CLOSING means that after sending its FIN the side has not yet received the peer's ACK but has already received the peer's FIN. Two situations can lead to this state: first, if both sides close the connection at the same time, both may send FIN packets simultaneously; second, if the ACK packet is lost and the peer's FIN packet is sent soon afterwards, the FIN arrives before the ACK.

[Figure: TCP connection state transitions]

2 How TCP connections are closed

Establishing a TCP connection takes a three-way handshake, while closing it takes a four-way handshake and is divided into an active close and a passive close. This is because a TCP connection is full-duplex: my closing the connection towards you does not mean that you have closed the connection towards me, so each side must shut down its own direction separately. When one side has finished sending its data, it can send a FIN packet to terminate the connection in that direction, indicating that it has no more data to send. The side that receives the FIN has nothing more to read but can still send data. Take a client that actively closes the connection as an example:

    1. The client sends a FIN packet to the server, indicating that the client is actively closing the connection, and enters the FIN_WAIT_1 state, waiting for the server to return an ACK packet. From now on the client cannot send data to the server but can still read data.
    2. After the server receives the FIN packet, it sends an ACK packet to the client and enters the CLOSE_WAIT state. From now on the server has nothing more to read but can continue to send data to the client. When the client receives the server's ACK packet, it enters the FIN_WAIT_2 state and waits for the server to send its FIN packet.
    3. After the server has finished sending its data, it sends a FIN packet to the client and enters the LAST_ACK state, waiting for the client to return an ACK packet. From now on the server can neither read nor send data.
    4. After the client receives the FIN packet, it sends an ACK packet to the server and enters the TIME_WAIT state. It waits long enough (2MSL) to make sure the server receives the ACK packet and finally returns to the CLOSED state, releasing its network resources. When the server receives the client's ACK packet, it returns to the CLOSED state and releases its network resources.
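
The four steps above can be observed with a half-close on a loopback connection. The following is a minimal sketch, not part of the original procedure: the function name half_close_demo is made up for illustration, and both ends run in one process for simplicity. The client's shutdown(SHUT_WR) sends its FIN, after which the client can still read the server's reply.

```python
import socket

def half_close_demo():
    """Client half-closes (sends FIN) and can still read the server's reply."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    try:
        client.sendall(b"request")
        client.shutdown(socket.SHUT_WR)      # step 1: client sends FIN
        received = b""
        while True:                          # step 2: server reads until end of stream
            chunk = server.recv(1024)
            if not chunk:                    # empty read: the client's FIN has arrived
                break
            received += chunk
        server.sendall(b"response")          # server may still send after the FIN
        server.close()                       # step 3: server sends its own FIN
        reply = b""
        while True:                          # step 4: client reads until the server's FIN
            chunk = client.recv(1024)
            if not chunk:
                break
            reply += chunk
        return received, reply
    finally:
        client.close()
        listener.close()

result = half_close_demo()
```

Because the client closed first, it is the client's socket that ends up in TIME_WAIT after this exchange.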

From setup to shutdown, a TCP connection goes through the following state transitions (assuming the client initiates the connection and also actively closes it):

    • Client

CLOSED -> SYN_SENT -> ESTABLISHED -> FIN_WAIT_1 -> FIN_WAIT_2 -> TIME_WAIT -> CLOSED

    • Server

CLOSED -> LISTEN -> SYN_RECEIVED -> ESTABLISHED -> CLOSE_WAIT -> LAST_ACK -> CLOSED
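
The client-side and server-side paths can be sketched as a small transition table. This is only an illustration of the states described in section 1: the event names (connect, recv_syn_ack, and so on) are labels invented here, not part of any API, and only the transitions discussed in this article are included.

```python
# (current state, event) -> next state. Event names are illustrative labels.
TRANSITIONS = {
    ("CLOSED", "connect"): "SYN_SENT",            # client sends SYN
    ("SYN_SENT", "recv_syn_ack"): "ESTABLISHED",  # client replies with ACK
    ("CLOSED", "listen"): "LISTEN",
    ("LISTEN", "recv_syn"): "SYN_RECEIVED",       # server replies with SYN+ACK
    ("SYN_RECEIVED", "recv_ack"): "ESTABLISHED",
    ("ESTABLISHED", "close"): "FIN_WAIT_1",       # active close: send FIN
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",
    ("FIN_WAIT_1", "recv_fin_ack"): "TIME_WAIT",  # FIN and ACK in one packet
    ("FIN_WAIT_1", "recv_fin"): "CLOSING",        # FIN arrives before the ACK
    ("CLOSING", "recv_ack"): "TIME_WAIT",
    ("FIN_WAIT_2", "recv_fin"): "TIME_WAIT",      # reply with the last ACK
    ("TIME_WAIT", "timeout_2msl"): "CLOSED",
    ("ESTABLISHED", "recv_fin"): "CLOSE_WAIT",    # passive close: reply with ACK
    ("CLOSE_WAIT", "close"): "LAST_ACK",          # send FIN when done sending
    ("LAST_ACK", "recv_ack"): "CLOSED",
}

def walk(start, events):
    """Return the list of states visited when the events occur in order."""
    states = [start]
    for event in events:
        states.append(TRANSITIONS[(states[-1], event)])
    return states

client_path = walk("CLOSED", ["connect", "recv_syn_ack", "close",
                              "recv_ack", "recv_fin", "timeout_2msl"])
server_path = walk("CLOSED", ["listen", "recv_syn", "recv_ack",
                              "recv_fin", "close", "recv_ack"])
```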

3 Impact on the server and the client

With the states and the shutdown procedure understood in detail, it becomes clear that the TIME_WAIT state is a real troublemaker. The side that actively closes the connection enters TIME_WAIT after sending the last ACK packet and must wait 2MSL before its network resources can be released, regardless of whether the peer has received the ACK. MSL is the Maximum Segment Lifetime, the longest time a packet can survive on the Internet; a packet that exceeds it disappears from the network. Operating systems typically set 2MSL to 4 minutes, with a minimum of 30 seconds, so TIME_WAIT generally lasts between 30 seconds and 4 minutes. This is required by the TCP/IP protocol and was put there deliberately by its designers, so it cannot be eliminated. There are two main reasons for the existence of the TIME_WAIT state:

    1. Reliably terminate the full-duplex TCP connection. When a TCP connection is closed, the final ACK packet is sent by the active closer. If that ACK is lost, the passive closer retransmits its FIN packet, so the active closer must keep state information in order to resend the ACK. Without that state, the active closer would already be back in CLOSED and would answer the retransmitted FIN with an RST packet, which the passive closer interprets as an error (in Java, a SocketException with the message "connection reset" is thrown). To terminate a full-duplex TCP connection cleanly, the loss of any packet in the four-way handshake must be tolerated, so the active closer has to keep its state by entering TIME_WAIT.
    2. Make sure stray packets have disappeared from the network, so that packets from the previous connection cannot reappear and disturb a new one. A TCP packet may be held up by a router fault; meanwhile the sender may retransmit it after a timeout, and once the router recovers, the delayed packet is still delivered to its destination. Such a packet is called a lost duplicate. If a new TCP connection using the same IP addresses and ports is established immediately after the old one is closed, stray packets from the old connection may reappear and affect the new connection. To avoid this, TCP does not allow a new connection to be initiated with the IP address and port of a connection that is in TIME_WAIT. Only after 2MSL, when all stray packets of the previous connection are guaranteed to have vanished from the network, can a new connection be established safely.

For the client, each connection occupies one port, and the system allows fewer than 65,000 usable ports (and that many only after TCP parameter optimization). If the client opens too many connections and actively closes them (assuming ports are not reused and it is not connecting to multiple servers), a large number of connections end up in TIME_WAIT after closing, each waiting 2MSL before releasing its network resources (including its port). The client then cannot create new connections because no ports are available.
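
A rough way to see the effect of port exhaustion is to divide the number of usable ports by the time each one is held in TIME_WAIT. The sketch below is a simplified model, assuming a single destination, no port reuse, and that the client always closes first; the function name is made up for illustration. The port ranges and delays are the default and tuned values discussed in sections 7 and 8.

```python
def max_connection_rate(first_port, last_port, time_wait_seconds):
    """Sustainable new connections per second before ephemeral ports run out."""
    usable_ports = last_port - first_port + 1
    return usable_ports / time_wait_seconds

# Default settings: ports 1024-5000, TIME_WAIT held for 240 seconds.
default_rate = max_connection_rate(1024, 5000, 240)
# Tuned settings: ports 1024-65534, TIME_WAIT held for 30 seconds.
tuned_rate = max_connection_rate(1024, 65534, 30)
```

Under these assumptions the default configuration sustains only about 17 new connections per second, while the tuned one sustains roughly 2150.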

For a server (especially one handling many concurrent short-lived connections), all connections from clients use the same port, the listening port. Each connection is distinguished by a five-tuple: source IP address, source port, transport-layer protocol number (protocol type), destination IP address, and destination port. In theory, therefore, the server is not limited by the number of system ports. However, the server does limit the number of connections per port: it records each connection on the port in a hash table and is also bounded by the maximum number of open file descriptors. So if the server actively closes connections, a large number of them will likewise sit in TIME_WAIT after closing, waiting 2MSL before their network resources (including hash-table entries and file descriptors) are released. The server then cannot accept new connections because of the hash-table and file-descriptor limits, performance drops sharply, and throughput keeps fluctuating severely. There are three ways to respond to this situation:

    1. Have the client actively close the connection wherever possible. Each client's concurrency is relatively low, so this creates no performance bottleneck.
    2. Optimize the server's TCP parameters to balance the maximum amount of network resources against the speed at which they are consumed and reclaimed.
    3. Rewrite the TCP protocol and re-implement the underlying code. This is difficult, and it can compromise the stability and security of the system.

4 TcpWindowSize

The value of TcpWindowSize is the TCP window size. The TCP receive window (the TCP receive buffer) defines the maximum number of bytes the sender may transmit without receiving an acknowledgment from the receiver. The larger the value, the fewer acknowledgments are needed and the more efficient the communication between the two ends. A small value reduces the likelihood that the sender times out while waiting for an acknowledgment, but it increases network traffic and lowers effective throughput. TCP dynamically adjusts the window to an integer multiple of the maximum segment size (MSS) agreed between sender and receiver. The MSS is determined when the connection is set up. Because the receive window is adjusted to a multiple of the MSS, a larger share of the segments sent during a transfer are full-sized, which raises network throughput.

By default, TCP tries to optimize the window size based on the MSS, starting at 16 KB, with a maximum of 64 KB. The maximum value of TcpWindowSize is normally 65535 bytes (64 KB). The maximum segment size on Ethernet is 1460 bytes, and the largest multiple of 1460 that fits below 64 KB is 64240 bytes (44 × 1460), so you can set TcpWindowSize to 64240 in the registry as a performance-optimized value for high-bandwidth networks. Here's how:

Browse to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters registry subkey. Under the Parameters subkey, create or modify the REG_DWORD value named TcpWindowSize, which ranges from 0 to 65535, and set it to 64240.
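
The rule "largest multiple of the MSS that does not exceed the window limit" is easy to check with a few lines of arithmetic. This is a small sketch with a made-up helper name; the 1460-byte MSS and the 65535-byte limit are the figures used above. Note that it yields 64240, which is 44 × 1460.

```python
def largest_mss_multiple(limit_bytes, mss_bytes):
    """Largest integer multiple of the MSS that does not exceed the limit."""
    return (limit_bytes // mss_bytes) * mss_bytes

window = largest_mss_multiple(65535, 1460)
segments_per_window = window // 1460
```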

5 Tcp1323Opts

To make more efficient use of high-bandwidth networks, you can use a TCP window much larger than the one described above. This feature, introduced in Windows 2000 and Windows Server 2003, is called TCP window scaling. It raises the previous limit of 65535 bytes (64 KB) to 1073741824 bytes (1 GB). On connections with a high bandwidth-delay product (such as satellite links), the window may need to be larger than 64 KB. With TCP window scaling, more data can be in flight between acknowledgments, which increases network throughput and performance. The time needed for a send and its acknowledgment to make the round trip is known as the round-trip time (RTT). TCP window scaling only takes effect if both ends of the TCP connection have it enabled.

TCP also has a timestamp option, which improves the RTT estimate by measuring it more frequently. This is particularly useful for estimating the RTT of connections over long-distance WAN links and for adjusting the TCP retransmission timeout more precisely. The timestamp option in the TCP header provides two fields: one records the time the segment was sent, and the other records the time received from the peer. Timestamps are especially useful together with TCP window scaling, where large amounts of data are sent before an acknowledgment arrives. Enabling timestamps adds only 12 bytes to the header of each packet, so the impact on network traffic is small.

Whether data integrity or maximum throughput matters more is something that has to be evaluated. In some environments, such as video streaming, a larger TCP window is what counts most and data integrity comes second. In such an environment TCP window scaling can be enabled without timestamps. Each option takes effect only when both the sending and the receiving side have enabled it.

However, timestamps can cause problems behind NAT: if a port that was used earlier is reused and the timestamp the server recorded for it is greater than the timestamp in the SYN of the new connection, the server ignores the SYN, and the user finds that the three-way handshake cannot be completed. Initially a small TCP window is used, and the window size is then increased according to an internal algorithm. Here's how:
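
Whether a link needs a window larger than 64 KB can be estimated from its bandwidth-delay product. The sketch below uses a hypothetical helper, and the link figures (10 Mbit/s with a 600 ms round trip, roughly a satellite link) are illustrative assumptions rather than values from this article.

```python
MAX_UNSCALED_WINDOW = 65535  # largest window without TCP window scaling, in bytes

def bandwidth_delay_product(bits_per_second, rtt_ms):
    """Bytes that must be in flight to keep the link full for one round trip."""
    return bits_per_second * rtt_ms // 8000  # /8 bits per byte, /1000 ms per second

bdp = bandwidth_delay_product(10_000_000, 600)
needs_window_scaling = bdp > MAX_UNSCALED_WINDOW
```

Here the link can hold 750000 bytes in flight, far more than an unscaled window allows, so throughput would be capped well below the link speed without window scaling.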

Browse to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters registry subkey. Under the Parameters subkey, create or modify the REG_DWORD value named Tcp1323Opts. Its meaning is: 0 (default) disables both TCP window scaling and timestamps; 1 enables only TCP window scaling; 2 enables only timestamps; 3 enables both. Once Tcp1323Opts is set to enable TCP window scaling, the TcpWindowSize registry value described above can be raised to as much as 1 GB. For best performance, set it to a multiple of the MSS; the recommended value is 256960 bytes.
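
The four Tcp1323Opts values behave like two independent bit flags, which the following sketch makes explicit. The function and key names are invented for illustration.

```python
def describe_tcp1323opts(value):
    """Decode a Tcp1323Opts setting into the two options it controls."""
    if value not in (0, 1, 2, 3):
        raise ValueError("Tcp1323Opts must be 0, 1, 2 or 3")
    return {
        "window_scaling": bool(value & 1),  # bit 0: TCP window scaling
        "timestamps": bool(value & 2),      # bit 1: TCP timestamps
    }

both_enabled = describe_tcp1323opts(3)
```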

6 TCP control block table

For each TCP connection, the control variables are stored in a block of memory called the TCP control block (TCB). The size of the TCB table is controlled by the registry value MaxHashTableSize. On a system with many active connections, a larger table reduces the time the system spends locating an entry in the TCB table. Partitioning the TCB table reduces contention for access to it, so increasing the number of partitions improves TCP performance, especially on multiprocessor systems. The registry value NumTcbTablePartitions controls the number of partitions; the default is the square of the number of processors. TCBs are normally pre-allocated in memory so that no time is wasted allocating and releasing them as connections are repeatedly opened and closed. This helps memory management, but it also limits the number of TCP connections allowed at the same time. The registry value MaxFreeTcbs determines the number of connections before TCBs in the idle wait state become reusable; on NT-based systems it was often set above the default to make sure enough pre-allocated TCBs were available. Starting with Windows 2000, a new feature reduces the likelihood of running out of pre-allocated TCBs: if more connections are in the wait state than the MaxFreeTWTcbs setting allows, all connections that have been waiting for longer than 60 seconds are forcibly closed and made available again. With this feature built into Windows 2000 and Windows Server 2003, MaxFreeTcbs is no longer used to optimize performance. Specific operation:

Browse to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters registry subkey. Under the Parameters subkey, create or modify the REG_DWORD value named MaxHashTableSize. It ranges from 1 to 65536 and must be a power of 2 (2^N); the default is 512 and the recommended setting is 8192. Then create or modify the REG_DWORD value named NumTcbTablePartitions under the same subkey. It also ranges from 1 to 65536 and must be a power of 2 (2^N); the default is the square of the number of processors, and the recommended setting is 4 times the number of processor cores.
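
Since both values must be powers of 2, a candidate such as "4 times the number of cores" may need adjusting first. The sketch below checks a value and rounds it up when necessary; rounding up (rather than down) is an assumption made here, not a rule stated in this article, and the helper names are hypothetical.

```python
def is_power_of_two(n):
    """True if n is 1, 2, 4, 8, ..."""
    return n > 0 and n & (n - 1) == 0

def round_up_to_power_of_two(n):
    """Smallest power of two that is greater than or equal to n (n >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1 << (n - 1).bit_length()

# Example: a 6-core machine gives 4 * 6 = 24, which is not a power of two.
partitions = round_up_to_power_of_two(4 * 6)
```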

7 TcpTimedWaitDelay

The value of TcpTimedWaitDelay is the time the system must wait before it releases a closed TCP connection and reuses its resources. This interval is the TIME_WAIT state described earlier (2MSL, twice the maximum segment lifetime). A large number of connections in the TIME_WAIT state can severely reduce concurrency and throughput. Reducing this value lets the system release closed connections more quickly and so frees more resources for new connections, which matters especially for servers with many concurrent short-lived connections.

The default value is 240, meaning the system waits 4 minutes before releasing the resources. The minimum value the system supports is 30, a wait of 30 seconds. Specific operation:

Browse to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters registry subkey. Under the Parameters subkey, create or modify the REG_DWORD value named TcpTimedWaitDelay, which ranges from 30 to 300; the recommended value is 30.

8 MaxUserPort

The value of MaxUserPort is the highest port number TCP/IP can assign when an application asks the system for an available port. If errors occur while connections are being established, the cause may be a shortage of anonymous (ephemeral) ports, especially if the system opens a large number of ports to connect to web services, databases, or other remote resources.

The default value is 5000 (decimal), which is also the minimum the system allows. By default Windows reserves the port numbers from 1024 to 5000 for anonymous (ephemeral) ports. For higher concurrency, set this value to at least 32768, or even to the theoretical maximum of 65534, especially on clients that simulate high concurrency in a test environment. Specific operation:

Browse to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters registry subkey. Under the Parameters subkey, create or modify the REG_DWORD value named MaxUserPort, which ranges from 5000 to 65534 with a default of 5000; the recommended value is 65534.
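
Instead of editing each value by hand, the settings can be collected in a .reg file and imported with the Registry Editor. The sketch below only builds the file's text; the helper name build_reg_file is made up for illustration, and the two values shown are the recommendations from sections 7 and 8. A .reg file stores REG_DWORD values as eight hexadecimal digits.

```python
TCPIP_PARAMETERS = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"

def build_reg_file(key, dword_values):
    """Return the text of a .reg file that sets REG_DWORD values under one key."""
    lines = ["Windows Registry Editor Version 5.00", "", "[" + key + "]"]
    for name, value in dword_values.items():
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError(name + " does not fit in a REG_DWORD")
        lines.append('"%s"=dword:%08x' % (name, value))
    return "\r\n".join(lines) + "\r\n"

reg_text = build_reg_file(TCPIP_PARAMETERS, {
    "TcpTimedWaitDelay": 30,   # seconds a closed connection stays in TIME_WAIT
    "MaxUserPort": 65534,      # highest ephemeral port number
})
```

Writing reg_text to a file such as tcp-tuning.reg (a hypothetical name) gives something that can be reviewed before it is applied, which is safer than changing a production server's registry interactively.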

9 Dynamic backlog

Dynamic backlog lets the system adjust its configuration automatically to accept large bursts of connection requests. If many connection requests arrive at the same time, dynamic backlog automatically increases the number of pending connections the system supports (that is, connections the client has requested but the server has not yet processed). The total number of TCP connections consists of the established connections plus the pending ones. This can reduce the number of connection failures. When the system lacks processing capacity and the supported number of pending connections is too small, client connection requests are rejected outright.

By default Windows does not enable dynamic backlog. It can be turned on and configured as follows:

Browse to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\AFD\Parameters registry subkey, and under the Parameters subkey create or modify the REG_DWORD values with the following names.

    • EnableDynamicBacklog: a value of 1 turns dynamic backlog on.
    • MinimumDynamicBacklog: a value of 128 means that the minimum number of pending connections supported is 128.
    • MaximumDynamicBacklog: a value of 2048 means that the maximum number of pending connections supported is 2048. For servers with many concurrent short-lived connections, set the maximum to 1024 or higher.
    • DynamicBacklogGrowthDelta: a value of 128 means that the number of supported pending connections grows in steps of 128. That is, when capacity runs low it is raised from 128 in increments of 128 until the configured maximum, such as 2048, is reached.
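
How the three numeric settings interact can be sketched as a simple growth sequence. This is a simplified model of the description above, not the actual algorithm used by the AFD driver, and the function name is made up for illustration.

```python
def backlog_growth(minimum, maximum, delta):
    """Successive backlog sizes as the backlog grows from minimum to maximum."""
    sizes = [minimum]
    while sizes[-1] < maximum:
        # Grow by one delta, but never beyond the configured maximum.
        sizes.append(min(sizes[-1] + delta, maximum))
    return sizes

steps = backlog_growth(minimum=128, maximum=2048, delta=128)
```

With the values listed above, the backlog passes through 16 sizes, from 128 up to 2048.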

10 KeepAliveTime

The value of KeepAliveTime controls how often the system checks whether an idle connection is still intact. If a connection has been inactive for this period, the system sends a keep-alive signal, and the peer answers it if the network is healthy and the receiver is active. Consider reducing the value if you need to notice quickly that a receiver has been lost. If there are many connections that stay idle for a long time but receivers are rarely lost, you may want to increase the value to reduce overhead.

By default, the system sends a keep-alive message when an idle connection has shown no activity for 7200000 milliseconds (2 hours). A common recommendation is to set the value to 1800000 milliseconds, so that lost connections are detected within 30 minutes. Specific operation:

Browse to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters registry subkey. Under the Parameters subkey, create or modify the REG_DWORD value named KeepAliveTime and set it to the appropriate number of milliseconds.

11 KeepAliveInterval

The value of KeepAliveInterval is how often the system repeats the keep-alive signal when the peer has not answered the previous one. If there is still no response, the connection is dropped once the number of consecutive keep-alive signals exceeds the value of TcpMaxDataRetransmissions (described below). If the network is poor and longer response times must be allowed, consider increasing the value to reduce overhead. If you need to confirm as soon as possible that a receiver has been lost, consider reducing this value or the value of TcpMaxDataRetransmissions.

By default, the system waits 1000 milliseconds (1 second) before resending a keep-alive signal that has not been answered. This can be changed to suit specific needs:

Browse to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters registry subkey. Under the Parameters subkey, create or modify the REG_DWORD value named KeepAliveInterval and set it to the appropriate number of milliseconds.
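
Taken together, KeepAliveTime, KeepAliveInterval, and TcpMaxDataRetransmissions determine roughly how long a dead peer can go unnoticed on an idle connection. The sketch below is an approximation based on the description above (idle time plus one interval per unanswered probe); the function name is hypothetical.

```python
def dead_peer_detection_ms(keep_alive_time_ms, keep_alive_interval_ms,
                           max_data_retransmissions):
    """Approximate time until an idle connection to a dead peer is dropped."""
    probing_ms = keep_alive_interval_ms * max_data_retransmissions
    return keep_alive_time_ms + probing_ms

default_ms = dead_peer_detection_ms(7_200_000, 1000, 5)
tuned_ms = dead_peer_detection_ms(1_800_000, 1000, 5)
```

The idle period dominates in both cases, which is why lowering KeepAliveTime has far more effect on detection time than adjusting the interval.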

12 TcpMaxDataRetransmissions

The value of TcpMaxDataRetransmissions is the number of times TCP retransmits an unacknowledged data segment on an existing connection. If the network is poor, you may need to increase the value to keep communication working and make sure the receiver gets the data. If the network is good, or if data loss is mostly caused by the receiver going away, you can reduce the value to cut the time and cost of confirming that the receiver is lost.

By default, the system retransmits an unacknowledged data segment 5 times. This can be changed to suit specific needs. Specific operation:

Browse to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters registry subkey. Under the Parameters subkey, create or modify the REG_DWORD value named TcpMaxDataRetransmissions, which ranges from 0 to 4294967295 with a default of 5, and set it according to the actual situation.

13 TcpMaxConnectRetransmissions

The value of TcpMaxConnectRetransmissions is the number of times TCP retransmits an unacknowledged connection request (SYN) before it gives up on the connection attempt. The retransmission timeout doubles with each successive attempt. In Windows Server 2003 the default number of retransmissions is 2, and the default initial timeout is 3 seconds (held in the registry value TcpInitialRtt). The timeout can be increased accordingly on slower WAN connections. Different environments may call for different settings, which need to be tested under real conditions. Do not set the timeout too high, or connection attempts will effectively never time out. Specific operation:

Browse to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters registry subkey. Under the Parameters subkey, create or modify the REG_DWORD value named TcpMaxConnectRetransmissions, which ranges from 0 to 255 with a default of 2, and set it according to the actual situation. Then create or modify the REG_DWORD value named TcpInitialRtt under the same subkey, again according to the actual situation.
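
Because the timeout doubles on every attempt, the total time before a connection attempt fails grows quickly with the retransmission count. The following sketch works this out from the two settings; the function name is made up for illustration, and it assumes that the wait after the last retransmission also counts towards the total.

```python
def connect_timeouts(initial_rtt_seconds, max_connect_retransmissions):
    """Per-attempt timeouts and the total wait before the connect attempt fails."""
    # One original SYN plus the configured number of retransmissions.
    attempts = max_connect_retransmissions + 1
    timeouts = [initial_rtt_seconds * 2 ** i for i in range(attempts)]
    return timeouts, sum(timeouts)

timeouts, total = connect_timeouts(3, 2)
```

With the defaults (3 seconds, 2 retransmissions) the waits are 3, 6, and 12 seconds, 21 seconds in total. Raising the count to 5 would already push the total past three minutes.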

14 TcpAckFrequency

The value of TcpAckFrequency indicates how often the system sends an acknowledgment. If the value is 2, the system sends an acknowledgment after 2 segments have been received, or after 1 segment has been received and no further segment arrives within 200 milliseconds. If the value is 3, the system sends an acknowledgment after 3 segments have been received, or after 1 or 2 segments have been received and no further segment arrives within 200 milliseconds, and so on. To shorten response time by eliminating the acknowledgment delay, set the value to 1, in which case the system acknowledges every segment immediately. If the connection is mainly used to transfer large amounts of data and a 200-millisecond delay does not matter, the value can be increased to reduce the acknowledgment overhead.

By default, the system sets the value to 2, which acknowledges every other segment. The valid range of this value is 0 to 255, where 0 means that the default of 2 is used. It can be changed to suit specific needs. Specific operation:

Browse to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\xx registry subkey (xx is determined by the network adapter). Under the xx subkey, create or modify the REG_DWORD value named TcpAckFrequency. The range is from 1 to 13 and the default is 2; set the value according to the number of segments you want received before an acknowledgment is sent. The recommended setting is 5 for a 100 Mbit network and 13 for a gigabit network.
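
The saving from a higher TcpAckFrequency can be estimated by counting acknowledgments for a burst of segments. The sketch below is a simplified model: it assumes the segments arrive back to back, so the 200-millisecond timer only ever fires for a final incomplete group. The function name is hypothetical.

```python
import math

def acks_for_burst(segments, ack_frequency):
    """Acknowledgments sent for a back-to-back burst of received segments."""
    # One ACK per full group, plus one delayed ACK for any leftover segments.
    return math.ceil(segments / ack_frequency)

acks_per_frequency = {f: acks_for_burst(1000, f) for f in (1, 2, 5, 13)}
```

For 1000 segments this gives 1000 acknowledgments at a setting of 1, 500 at the default of 2, 200 at 5, and 77 at 13.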
