Linux Kernel TCP/IP Parameter Analysis and Tuning

Source: Internet
Author: User

Reposted from: http://www.itxuexiwang.com/a/liunxjishu/2016/0225/167.html?1456482565

As shown in the figure, a TCP connection goes through three stages: 1. the three-way handshake; 2. data transfer; 3. the four-way teardown ("four waves").

SYN (Synchronize Sequence Numbers): this flag is only meaningful while the three-way handshake is establishing a connection. It signals a new TCP connection request.

ACK (Acknowledgement): the acknowledgement flag. It confirms the peer's segment, and at the same time tells the peer that all data up to the acknowledgement number has been received.

FIN (Finish): used to end a TCP session. The sender has no more data to send, but its end of the connection stays open and can still receive data until the peer sends its own FIN.
The 11 TCP states on the server and client sides are explained one by one below.

1) LISTEN: the server first opens a socket and listens on it. Its state is LISTEN. /* The socket is listening for incoming connections. Waiting for a connection request from any remote TCP and port. */

2) SYN_SENT: the client application calls connect() to perform an active open. The client TCP sends a SYN to request a connection, and the state becomes SYN_SENT. /* The socket is actively attempting to establish a connection. Waiting for a matching connection request after having sent a connection request. */

3) SYN_RECV: the server acknowledges the client's SYN with an ACK and at the same time sends its own SYN to the client. The state then becomes SYN_RECV. /* A connection request has been received from the network. Waiting for a confirming acknowledgement after having both received and sent a connection request. */ (This state is very short-lived, so it is hard to catch with netstat.)

4) ESTABLISHED: an open connection. Both sides can exchange data. /* The socket has an established connection. An open connection; data received can be delivered to the user. */

5) FIN_WAIT1: on the actively closing side, the application calls close(). Its TCP sends a FIN to actively close the connection and enters FIN_WAIT1. /* The socket is closed, and the connection is shutting down. Waiting for a connection termination request from the remote TCP, or an acknowledgement of the termination request previously sent. */ (FIN_WAIT1 appears only on the actively closing side. FIN_WAIT1 and FIN_WAIT2 both really mean "waiting for the peer's FIN". They differ as follows. When a socket in ESTABLISHED wants to close the connection actively, it sends a FIN to the peer and enters FIN_WAIT1. When the peer answers with an ACK, it moves to FIN_WAIT2. Under normal conditions the peer acknowledges immediately whatever its own situation, so FIN_WAIT1 is rarely seen, while FIN_WAIT2 can sometimes be seen with netstat.)

6) CLOSE_WAIT: on the passively closing side, when TCP receives a FIN it sends an ACK in response and enters CLOSE_WAIT. The received FIN is also passed up to the application as an end-of-file. /* The remote end has shut down, waiting for the socket to close. Waiting for a connection termination request from the local user. */

7) FIN_WAIT2: the actively closing side receives the ACK and enters FIN_WAIT2. /* Connection is closed, and the socket is waiting for a shutdown from the remote end. Waiting for a connection termination request from the remote TCP. */

8) LAST_ACK: some time after the passive close, the application that received the end-of-file calls close(). Its TCP then sends its own FIN, enters LAST_ACK, and waits for the peer's ACK. /* The remote end has shut down, and the socket is closed. Waiting for acknowledgement. Waiting for an acknowledgement of the connection termination request previously sent to the remote TCP. */

9) TIME_WAIT: the actively closing side receives the peer's FIN, sends an ACK, and enters TIME_WAIT. /* The socket is waiting after close to handle packets still in the network. Waiting long enough to be sure the remote TCP received the acknowledgement of its connection termination request. */ (This state mainly occurs on the actively closing side. That side has received the FIN and sent the final ACK, and it returns to the usable CLOSED state only after waiting 2MSL.)

10) CLOSING: relatively rare; it occurs when both sides close at the same time. /* Both sockets are shut down but we still don't have all our data sent. Waiting for an acknowledgement of the connection termination request from the remote TCP. */

11) CLOSED: the passively closing side enters CLOSED when it receives the final ACK, and the connection ends. /* The socket is not being used. No connection state at all. */
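The transitions described above can be collected into a table. The following is a minimal, simplified sketch of that table. The event labels are informal names made up for this illustration, not kernel identifiers. Simultaneous open, RST handling, and every timeout except 2MSL are deliberately left out.

```python
# Simplified TCP state-transition table for the 11 states above.
# Keys are (current state, event); values are the next state.
# Event labels are hypothetical, informal names used only for illustration.
TRANSITIONS = {
    ("CLOSED", "listen"): "LISTEN",
    ("CLOSED", "connect / send SYN"): "SYN_SENT",
    ("LISTEN", "recv SYN / send SYN+ACK"): "SYN_RECV",
    ("SYN_SENT", "recv SYN+ACK / send ACK"): "ESTABLISHED",
    ("SYN_RECV", "recv ACK"): "ESTABLISHED",
    ("ESTABLISHED", "close / send FIN"): "FIN_WAIT1",      # active close
    ("ESTABLISHED", "recv FIN / send ACK"): "CLOSE_WAIT",  # passive close
    ("FIN_WAIT1", "recv ACK"): "FIN_WAIT2",
    ("FIN_WAIT1", "recv FIN / send ACK"): "CLOSING",       # simultaneous close
    ("FIN_WAIT2", "recv FIN / send ACK"): "TIME_WAIT",
    ("CLOSING", "recv ACK"): "TIME_WAIT",
    ("CLOSE_WAIT", "close / send FIN"): "LAST_ACK",
    ("LAST_ACK", "recv ACK"): "CLOSED",
    ("TIME_WAIT", "2MSL timeout"): "CLOSED",
}


def walk(start, events):
    """Apply events in order and return every state visited."""
    states = [start]
    for event in events:
        key = (states[-1], event)
        if key not in TRANSITIONS:
            raise ValueError(f"no transition from {states[-1]} on {event!r}")
        states.append(TRANSITIONS[key])
    return states


if __name__ == "__main__":
    active = walk("ESTABLISHED", ["close / send FIN", "recv ACK",
                                  "recv FIN / send ACK", "2MSL timeout"])
    passive = walk("ESTABLISHED", ["recv FIN / send ACK", "close / send FIN",
                                   "recv ACK"])
    print(" -> ".join(active))   # only the active closer passes TIME_WAIT
    print(" -> ".join(passive))
```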

The TIME_WAIT state occurs only on the side that actively closes the connection.
When the actively closing side receives the passive side's FIN, it replies with an ACK and moves from FIN_WAIT2 to TIME_WAIT. It must then wait twice the MSL (Maximum Segment Lifetime, the longest time a segment can exist in the network) before the connection finally moves to CLOSED. On RHEL, the kernel currently keeps a connection in TIME_WAIT for 60 seconds.
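The usual way to see how many connections are in TIME_WAIT is netstat or ss. The sketch below is an alternative that reads the raw kernel tables directly. It assumes a Linux host, where /proc/net/tcp and /proc/net/tcp6 list one socket per line with the state as a hex code in the fourth column. The code-to-name mapping follows the kernel's include/net/tcp_states.h.

```python
import collections
import os

# Hex state codes as they appear in /proc/net/tcp (include/net/tcp_states.h).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(text):
    """Count sockets per TCP state in the contents of /proc/net/tcp(6)."""
    counts = collections.Counter()
    for line in text.splitlines()[1:]:          # first line is a header
        fields = line.split()
        if len(fields) > 3:
            code = fields[3].upper()
            counts[TCP_STATES.get(code, code)] += 1
    return counts


if __name__ == "__main__":
    total = collections.Counter()
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        if os.path.exists(path):                # present on Linux only
            with open(path) as f:
                total += count_states(f.read())
    print("TIME_WAIT sockets:", total["TIME_WAIT"])
```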

State changes during the TCP three-way handshake:
1. Client: SYN -> Server
The client sends a SYN to the server, and the client state becomes SYN_SENT.
2. Server: SYN + ACK -> Client
The server receives the SYN and replies with SYN+ACK, and the server state changes LISTEN -> SYN_RECV.
3. Client: ACK -> Server
The client receives the server's SYN+ACK and replies with an ACK; the client state changes SYN_SENT -> ESTABLISHED.
When the server receives that ACK, its state has gone LISTEN -> SYN_RECV -> ESTABLISHED.
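The handshake can be observed from user space on the loopback interface. In the minimal sketch below, the kernel carries out all three steps. connect() returns once step 3 has been sent. accept() only takes a connection whose handshake is already complete from the queue. The function name is made up for this example.

```python
import socket


def handshake_demo():
    """Open a loopback TCP connection; return (bytes received, ports match)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("127.0.0.1", 0))   # port 0: the kernel picks a free port
        srv.listen(1)                # passive open: the socket is in LISTEN
        port = srv.getsockname()[1]
        # Active open: connect() sends the SYN and returns after the SYN+ACK
        # has arrived and the final ACK has gone out (client is ESTABLISHED).
        with socket.create_connection(("127.0.0.1", port), timeout=5) as cli:
            # accept() dequeues a connection the kernel has already set up;
            # it does not take part in the handshake itself.
            conn, peer = srv.accept()
            with conn:
                cli.sendall(b"ping")
                data = b""
                while len(data) < 4:
                    chunk = conn.recv(4 - len(data))
                    if not chunk:
                        break
                    data += chunk
                return data, peer[1] == cli.getsockname()[1]


if __name__ == "__main__":
    print(handshake_demo())   # (b'ping', True)
```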

Kernel parameters involved in the first handshake:

net.ipv4.tcp_syn_retries = 5
· (The maximum number of times initial SYNs for an active TCP connection attempt will be retransmitted. This value should not be higher than 255. The default value is 5, which corresponds to approximately 180 seconds.) Since Linux 3.7 the default is 6, which corresponds to roughly 127 seconds.
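The retry counts turn into wall-clock time through exponential backoff, as this simplified model shows. The model assumes the wait starts at an initial retransmission timeout of 1 second (current kernels) or 3 seconds (older kernels), doubles after each retransmission, is capped at 120 seconds, and that the connect attempt fails after the wait that follows the last retransmission. It is an approximation, not the kernel's exact timer code.

```python
def syn_retry_window(retries, initial_rto=1.0, max_rto=120.0):
    """Approximate seconds before an unanswered connect() gives up.

    Simplified model: wait `initial_rto` after the first SYN, double the wait
    after every retransmission (capped at `max_rto`), and give up after the
    wait that follows the last of `retries` retransmissions.
    """
    total, rto = 0.0, initial_rto
    for _ in range(retries + 1):   # original SYN plus each retransmission
        total += rto
        rto = min(rto * 2, max_rto)
    return total


if __name__ == "__main__":
    print(syn_retry_window(6))       # 127.0  (6 retries, 1 s initial RTO)
    print(syn_retry_window(5, 3.0))  # 189.0  (older kernels: "about 180 s")
    print(syn_retry_window(1))       # 3.0    (tcp_syn_retries = 1)
```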

Parameters involved in the second handshake:

First, the kernel keeps a queue for the SYNs it receives from clients. As long as there is enough memory, each new SYN is queued there while the server waits for the client's final ACK. Once the queue is full, new connection requests are no longer accepted. The parameter that sets the length of this queue is:

net.ipv4.tcp_max_syn_backlog
· (The maximum number of queued connection requests which have still not received an acknowledgement from the connecting client. If this number is exceeded, the kernel will begin dropping requests. The default value of 256 is increased to 1024 when the memory present in the system is adequate or greater (>= 128 MB), and reduced to 128 for those systems with very low memory (<= 32 MB). It is recommended that if this needs to be increased above 1024, TCP_SYNQ_HSIZE in include/net/tcp.h be modified to keep TCP_SYNQ_HSIZE*16 <= tcp_max_syn_backlog, and the kernel be recompiled.)
The default is 1024. On high-concurrency servers with enough memory, it is recommended to raise it to net.ipv4.tcp_max_syn_backlog = 16384.
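Before changing a queue limit, it helps to read the current value. Every sysctl key maps to a file under /proc/sys, with dots becoming path separators. The helper names below are made up for this sketch, and it assumes a Linux host. Related to the backlog: the accept-queue length an application passes to listen() is silently capped at net.core.somaxconn, so the two keys are read together here.

```python
import os


def sysctl_path(name):
    """Map a key such as net.ipv4.tcp_max_syn_backlog to its /proc/sys file."""
    return os.path.join("/proc/sys", *name.split("."))


def read_sysctl(name):
    """Return the current value as a string, or None if the key is absent."""
    try:
        with open(sysctl_path(name)) as f:
            return f.read().strip()
    except FileNotFoundError:      # unknown key, or not running on Linux
        return None


if __name__ == "__main__":
    for key in ("net.ipv4.tcp_max_syn_backlog", "net.core.somaxconn"):
        print(key, "=", read_sysctl(key))
```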

Second, there is SYN+ACK retransmission. When the server's SYN+ACK to the client gets no response, the server retransmits it. The parameter controlling this is:

net.ipv4.tcp_synack_retries
· (The maximum number of times a SYN/ACK segment for a passive TCP connection will be retransmitted. This number should not be higher than 255.)
The default value is 5. On older kernels this corresponds to about 180 seconds; with the 1-second initial retransmission timeout of current kernels it is about a minute. It is recommended to change it to:
net.ipv4.tcp_synack_retries = 1
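Lowering tcp_synack_retries interacts with tcp_max_syn_backlog, and Little's law (average queue length = arrival rate × time spent in the queue) gives a rough estimate. This sketch assumes the worst case, a flood of spoofed SYNs whose final ACK never arrives, so every half-open entry stays for the full SYN+ACK retransmission window. It also assumes a 1-second initial timeout that doubles on each retry. The rate of 1000 SYNs per second is an illustrative number, not a measurement.

```python
def synack_window(retries, initial_rto=1.0):
    """Seconds a half-open entry survives: the first SYN+ACK plus `retries`
    retransmissions with doubling timeouts (simplified model, no cap)."""
    return initial_rto * (2 ** (retries + 1) - 1)


def half_open_entries(syn_rate, retries, initial_rto=1.0):
    """Little's law: queue length = arrival rate x time spent in the queue."""
    return syn_rate * synack_window(retries, initial_rto)


if __name__ == "__main__":
    for retries in (5, 1):
        # 5 retries: 63000 entries, far beyond a backlog of 16384;
        # 1 retry: 3000 entries.
        print(retries, half_open_entries(1000, retries))
```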

Third, SYN cookies are a modification of the server side of the three-way handshake, designed specifically to defend against SYN flood attacks. When the server receives a SYN and returns a SYN+ACK, it does not allocate a dedicated data area for the request. Instead, it computes a cookie value from the SYN and uses it as the initial sequence number of its SYN+ACK. When the ACK arrives, the server checks the ACK against that cookie value. Only if the ACK is valid does it allocate the data area for the new connection. The corresponding kernel parameter is:

net.ipv4.tcp_syncookies = {0|1}
· (Enable TCP syncookies. The kernel must be compiled with CONFIG_SYN_COOKIES. Send out syncookies when the SYN backlog queue of a socket overflows. The syncookies feature attempts to protect a socket from a SYN flood attack. This should be used as a last resort, if at all. This is a violation of the TCP protocol, and conflicts with other areas of TCP such as TCP extensions. It can cause problems for clients and relays. It is not recommended as a tuning mechanism for heavily loaded servers to help with overloaded or misconfigured conditions. For recommended alternatives see tcp_max_syn_backlog, tcp_synack_retries, and tcp_abort_on_overflow.)
tcp_syncookies is used together with tcp_max_syn_backlog to defend against SYN flood attacks.
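The cookie idea described above can be illustrated with a toy model. This is not the Linux algorithm. The real kernel cookie also encodes an MSS value and uses a different keyed hash. The secret, the function names, and the coarse time counter here are all hypothetical. What the sketch shows is the key property: the server stores nothing on SYN, yet it can validate the final ACK. This works because that ACK carries seq = client ISN + 1 and ack = cookie + 1.

```python
import hashlib
import hmac
import os

SECRET = os.urandom(16)   # hypothetical per-boot secret known only to the server
MOD = 2 ** 32             # TCP sequence numbers are 32-bit


def _cookie(conn, client_isn, counter):
    """32-bit keyed MAC over the 4-tuple, the client's ISN and a time counter."""
    msg = repr((conn, client_isn, counter)).encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def syn_ack_isn(conn, client_isn, counter):
    """On SYN: use the cookie as the server ISN; nothing is queued."""
    return _cookie(conn, client_isn, counter)


def ack_is_valid(conn, seq, ack, counter, max_age=1):
    """On the final ACK: recover client ISN and cookie from the header and
    recompute, accepting cookies minted up to `max_age` counter ticks ago."""
    client_isn = (seq - 1) % MOD
    cookie = (ack - 1) % MOD
    return any(_cookie(conn, client_isn, counter - age) == cookie
               for age in range(max_age + 1))


if __name__ == "__main__":
    conn = ("10.0.0.1", 40000, "10.0.0.2", 80)     # illustrative addresses
    isn = syn_ack_isn(conn, client_isn=1000, counter=5)
    print(ack_is_valid(conn, 1001, (isn + 1) % MOD, counter=6))  # genuine ACK
    print(ack_is_valid(conn, 1001, (isn + 2) % MOD, counter=6))  # forged ACK
```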


Kernel parameters involved during data transfer:

net.ipv4.tcp_keepalive_intvl = 15
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_time = 120

With these three settings, suppose no data has been exchanged between server and client for 120 seconds. The kernel then sends the first keepalive probe, and another every 15 seconds after that. After 3 unanswered probes it gives up and drops the connection.
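Two points are worth spelling out here. First, keepalive applies only to sockets on which the application has enabled SO_KEEPALIVE. Second, on Linux the three sysctls can be overridden per socket with TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT. The sketch below shows both. The function names are illustrative, and the per-socket options are skipped on platforms whose Python socket module does not expose them.

```python
import socket


def keepalive_detection_time(idle, interval, probes):
    """Worst-case seconds from the last data until a dead peer is dropped:
    `idle` before the first probe, then `probes` probes `interval` apart."""
    return idle + interval * probes


def enable_keepalive(sock, idle=120, interval=15, probes=3):
    """Turn keepalive on and override the tcp_keepalive_* sysctls per socket."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # TCP_KEEPIDLE / TCP_KEEPINTVL / TCP_KEEPCNT are Linux socket options;
    # skip any that this platform's socket module does not define.
    for name, value in (("TCP_KEEPIDLE", idle), ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", probes)):
        if hasattr(socket, name):
            sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, name), value)


if __name__ == "__main__":
    print(keepalive_detection_time(120, 15, 3))   # 165
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        enable_keepalive(s)
        print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
```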
State changes during the four-way teardown
(the client initiates the close):
1. Client: FIN (M) -> Server
The client sends a FIN to the server to request shutdown; the client state changes ESTABLISHED -> FIN_WAIT1.

2. Server: ACK (M+1) -> Client
The server sends an ACK after receiving the FIN; the server state changes ESTABLISHED -> CLOSE_WAIT.
The client receives the server's ACK and moves FIN_WAIT1 -> FIN_WAIT2. It then waits for the server to finish sending data and send its own FIN.

3. Server: FIN (N) -> Client
The server sends its FIN; its state changes ESTABLISHED -> CLOSE_WAIT -> LAST_ACK.

4. Client: ACK (N+1) -> Server
The client receives the FIN and sends the final ACK. Its state goes ESTABLISHED -> FIN_WAIT1 -> FIN_WAIT2 -> TIME_WAIT, and after 2MSL it times out to CLOSED.
When the server receives that ACK, its state has gone ESTABLISHED -> CLOSE_WAIT -> LAST_ACK -> CLOSED.
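The four steps can be driven from user space on loopback. The client's shutdown(SHUT_WR) sends the FIN of step 1 without closing the socket. This makes the "half-close" visible: while the client sits in FIN_WAIT2, the server in CLOSE_WAIT can still send data to it, and the client can still receive it. The function names are illustrative, and the ACKs in steps 2 and 4 are sent by the kernels, not by this code.

```python
import socket


def read_until_eof(sock):
    """recv() until it returns b"", which is how a received FIN shows up."""
    data = b""
    while True:
        chunk = sock.recv(1024)
        if not chunk:
            return data
        data += chunk


def half_close_demo():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("127.0.0.1", 0))
        srv.listen(1)
        cli = socket.create_connection(srv.getsockname(), timeout=5)
        conn, _ = srv.accept()
        with cli, conn:
            cli.sendall(b"request")
            # Step 1: client sends FIN. The server kernel ACKs it (step 2):
            # client -> FIN_WAIT2, server -> CLOSE_WAIT.
            cli.shutdown(socket.SHUT_WR)
            request = read_until_eof(conn)
            # The half-closed connection still carries data to the client.
            conn.sendall(b"reply to " + request)
            # Step 3: the server closes and sends its FIN (LAST_ACK).
            conn.close()
            # Step 4: the client kernel ACKs that FIN; the client, as the
            # active closer, is the side that ends up in TIME_WAIT.
            reply = read_until_eof(cli)
    return request, reply


if __name__ == "__main__":
    print(half_close_demo())   # (b'request', b'reply to request')
```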

The term 2MSL used above means twice the Maximum Segment Lifetime:
· The TIME_WAIT state is also called the 2MSL wait state.
· Every implementation must choose a value for the maximum segment lifetime (MSL). It is the maximum amount of time any segment can exist in the network before being discarded.
· RFC 793 specifies the MSL as 2 minutes. Common implementation values, however, are 30 seconds, 1 minute, or 2 minutes. Recall that the limit on the lifetime of an IP datagram is based on the number of hops, not a timer.
· Given an MSL for an implementation, the rule is: when TCP performs an active close and sends the final ACK, that connection must stay in the TIME_WAIT state for twice the MSL.
· This lets TCP resend the final ACK in case this ACK is lost (in which case the other end will time out and retransmit its final FIN).
· Another effect of this 2MSL wait is that while the TCP connection is in the 2MSL wait, the socket pair defining that connection cannot be reused.
· Any delayed segments that arrive for a connection while it is in the 2MSL wait are discarded. Since the connection defined by the socket pair in the 2MSL wait cannot be reused, when we do establish a valid connection we know that delayed segments from an earlier incarnation of this connection cannot be misinterpreted as being part of the new connection.
· The client, which performs the active close, enters the 2MSL wait. The server does not. This means that if we terminate a client and restart it immediately, the new client cannot reuse the same local port number.
· Servers, however, use well-known ports. If we terminate a server that has a connection established and immediately try to restart it, the server cannot assign its well-known port number to its endpoint.

Put simply, the side that sends the first FIN also sends the last ACK, and it then waits a certain time before closing. The TIME_WAIT (2MSL) state exists to cover the loss of that final ACK. If the ACK is lost, the server, still in LAST_ACK, times out and retransmits its FIN, and the client must still be around to answer it. When tuning the server around 2MSL, what we want is for TIME_WAIT connections to be reusable and to be cleaned up quickly.
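The restart problem in the last bullet above is normally solved in the application with SO_REUSEADDR, not with a sysctl. (By contrast, tcp_tw_reuse concerns connections the host itself initiates.) The sketch below has the server perform the active close, so the server side of the old connection is left in FIN_WAIT2 or TIME_WAIT. It then shows that a new listener can bind the same port because SO_REUSEADDR is set. The function name is made up for this example. Without the option, on Linux the second bind would typically fail with EADDRINUSE.

```python
import socket


def restart_on_same_port():
    """Server closes first, its process 'exits', and a new listener rebinds."""
    def listener(port):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(("127.0.0.1", port))
        s.listen(1)
        return s

    srv = listener(0)
    port = srv.getsockname()[1]
    cli = socket.create_connection(("127.0.0.1", port), timeout=5)
    conn, _ = srv.accept()
    conn.close()            # the server performs the active close
    cli.recv(1)             # client sees EOF (the server's FIN) ...
    cli.close()             # ... and sends its own FIN; server side -> TIME_WAIT
    srv.close()             # the old server "process" goes away
    new_srv = listener(port)  # succeeds because SO_REUSEADDR is set
    new_srv.close()
    return True


if __name__ == "__main__":
    print(restart_on_same_port())   # True
```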

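The parameters that control fast recycling and reuse of TIME_WAIT connections are:

net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
Note: do not enable these parameters on an LVS-NAT server. tcp_tw_recycle in particular breaks connections from clients behind NAT, and it was removed entirely in Linux 4.12.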
If the server has a large number of TIME_WAIT connections, you can reduce the tcp_fin_timeout parameter (default 60). Strictly speaking, on Linux this parameter limits how long an orphaned connection may stay in FIN_WAIT2; the TIME_WAIT duration itself is fixed at 60 seconds in the kernel. This problem usually comes with exhaustion of local ports, so the local port range also needs to be widened:

net.ipv4.tcp_fin_timeout = 20
· How many seconds to wait for a final FIN packet before the socket is forcibly closed. This is strictly a violation of the TCP specification, but required to prevent denial-of-service (DoS) attacks. The default value in 2.4 kernels is 60, down from 180 in 2.2.

net.ipv4.ip_local_port_range = 1024 65534
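Port exhaustion can be quantified with a back-of-the-envelope calculation. It assumes this host performs the active close, so each closed connection holds its local port in TIME_WAIT for 60 seconds. It also assumes all connections go to one destination (IP, port) pair and that no TIME_WAIT reuse is enabled. Under those assumptions, the port range bounds the sustained rate of new outgoing connections.

```python
def max_new_conns_per_sec(port_low, port_high, time_wait_secs=60):
    """Upper bound on new outgoing connections per second to ONE destination
    (IP, port) when every local port sits in TIME_WAIT for `time_wait_secs`."""
    ports = port_high - port_low + 1
    return ports / time_wait_secs


if __name__ == "__main__":
    print(int(max_new_conns_per_sec(1024, 65534)))   # 1075 (range above)
    print(int(max_new_conns_per_sec(32768, 60999)))  # 470 (usual Linux default)
```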

and the maximum number of TIME_WAIT sockets:

net.ipv4.tcp_max_tw_buckets = 20000
· The maximum number of sockets in TIME_WAIT state allowed in the system. This limit exists only to prevent simple denial-of-service attacks. The default value of NR_FILE*2 is adjusted depending on the memory in the system. If this number is exceeded, the socket is closed and a warning is printed.
TIME_WAIT sockets beyond this limit are closed immediately.

TCP buffering parameters
net.ipv4.tcp_mem = 873800 8388608 8388608

Defines the memory used by the whole TCP stack, counted in pages. The three values are low, pressure and high:

· low: while TCP uses fewer pages than this, it does not try to free memory; below this value there is no memory pressure. (Ideally this matches the second value of tcp_wmem, the default buffer size, multiplied by the maximum number of concurrent connections and divided by the page size: 131072 * 300 / 4096.)
· pressure: when TCP uses more pages than this, it tries to stabilize its memory usage and enters pressure mode. It leaves pressure mode when usage drops below low. (Ideally this is the maximum total buffer size TCP may use: 204800 * 300 / 4096.)
· high: the number of pages all TCP sockets together may use to queue buffered data. Beyond this value, further TCP buffer allocations are refused, which is why it should not be set too conservatively (512000 * 300 / 4096). The value given there is very large: it can handle 2.5 times the expected number of connections, or let existing connections carry 2.5 times as much data.
· In general, these values are calculated from the amount of system memory at boot.
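Unlike tcp_rmem and tcp_wmem, tcp_mem is counted in pages, not bytes, so the formulas above end in "/ 4096". This sketch reproduces that arithmetic and converts pages back to MiB. It assumes 4 KiB pages, as the formulas do. On a real system the page size can be read with os.sysconf("SC_PAGE_SIZE").

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, matching the formulas above


def pages_for(buffer_bytes, connections, page_size=PAGE_SIZE):
    """Pages needed if `connections` sockets each buffer `buffer_bytes`."""
    return buffer_bytes * connections // page_size


def pages_to_mib(pages, page_size=PAGE_SIZE):
    return pages * page_size / 2 ** 20


if __name__ == "__main__":
    for name, per_socket in (("low", 131072), ("pressure", 204800),
                             ("high", 512000)):
        pages = pages_for(per_socket, 300)   # 300 concurrent connections
        print(f"{name}: {pages} pages = {pages_to_mib(pages):.1f} MiB")
    # The article's own high value, 8388608 pages, is 32768 MiB (32 GiB).
    print(pages_to_mib(8388608))
```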

net.ipv4.tcp_rmem = 4096 87380 8388608
Defines the memory used for TCP receive buffers, per socket, in bytes:
The first value is the minimum: even when memory is tight, each TCP socket is guaranteed at least this much receive buffer space.
The second value is the default. It overrides net.core.rmem_default, the receive buffer size defined for all protocols.
The third value is the maximum: the largest receive buffer a TCP socket can use.

net.ipv4.tcp_wmem = 4096 65536 8388608

Defines the memory used for TCP send buffers, per socket, in bytes, with the same minimum, default and maximum layout.
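The second values of tcp_rmem and tcp_wmem are what a fresh TCP socket starts with. The kernel then autotunes each buffer between the minimum and the maximum. This sketch reads the sizes back through SO_RCVBUF and SO_SNDBUF. It relies on three points of Linux behavior. First, an explicit setsockopt() is capped by net.core.rmem_max or wmem_max. Second, the value reported afterwards is double the request, to allow for bookkeeping overhead. Third, setting a size explicitly turns off autotuning for that socket. The helper name is made up for this example.

```python
import socket


def buffer_sizes(rcvbuf=None, sndbuf=None):
    """Return the kernel-reported (SO_RCVBUF, SO_SNDBUF) of a new TCP socket,
    optionally after requesting explicit sizes."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        if rcvbuf is not None:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
        if sndbuf is not None:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)
        return (s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
                s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))


if __name__ == "__main__":
    # Defaults: on Linux, typically tcp_rmem[1] and tcp_wmem[1].
    print("defaults:", buffer_sizes())
    # Explicit request: on Linux the reported value is double the request.
    print("explicit:", buffer_sizes(65536, 65536))
```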

Some other parameters
net.ipv4.tcp_max_orphans = 262144
· The maximum number of orphaned (not attached to any user file handle) TCP sockets allowed in the system. When this number is exceeded, the orphaned connections are reset and a warning is printed. This limit exists only to prevent simple denial-of-service attacks. Lowering this limit is not recommended. Network conditions might require you to increase the number of orphans allowed, but note that each orphan can eat up to ~64 KB of unswappable memory. The default initial value is set equal to the kernel parameter NR_FILE. This initial default is adjusted depending on the memory in the system.
In other words, this is the maximum number of TCP sockets in the system that are not associated with any process (user file handle). If it is exceeded, such orphaned connections are reset immediately and a warning is printed.
The limit exists only to resist simple DoS attacks, so do not rely on it too much or lower it artificially. If memory allows, this value should be increased.
The larger this value, the better the resistance to such attacks.
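The "~64 KB of unswappable memory" per orphan quoted above puts a worst-case price on raising the limit. This sketch multiplies it out for the two values used in this article. The 64 KiB figure is the documentation's upper bound, not a measured per-socket cost.

```python
def orphan_memory_bytes(max_orphans, per_orphan=64 * 1024):
    """Worst-case unswappable memory if every allowed orphan uses ~64 KiB."""
    return max_orphans * per_orphan


if __name__ == "__main__":
    for n in (32768, 262144):
        print(n, orphan_memory_bytes(n) / 2 ** 30, "GiB")   # 2.0 and 16.0
```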

Our company once had an incident on the Ad Server back-end servers. Packets were being dropped because the connection-tracking table was full, so the corresponding parameters were raised (the default is 65536):

net.ipv4.ip_conntrack_max = 196608
net.ipv4.netfilter.ip_conntrack_max = 196608
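On newer kernels the old ip_conntrack module has been replaced by nf_conntrack. The equivalent keys are net.netfilter.nf_conntrack_max (also exposed as net.nf_conntrack_max), with the live entry count in net.netfilter.nf_conntrack_count. When the table is full, the kernel logs "table full, dropping packet". This sketch compares the count with the limit. It assumes the nf_conntrack module is loaded; otherwise the files are absent and it reports that.

```python
def read_first_int(path):
    """Return the first integer in a /proc file, or None if unavailable."""
    try:
        with open(path) as f:
            return int(f.read().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def conntrack_usage(count, maximum):
    """Fraction of the connection-tracking table in use."""
    return count / maximum


if __name__ == "__main__":
    count = read_first_int("/proc/sys/net/netfilter/nf_conntrack_count")
    maximum = read_first_int("/proc/sys/net/netfilter/nf_conntrack_max")
    if count is None or maximum is None:
        print("nf_conntrack not loaded (or not Linux)")
    else:
        usage = conntrack_usage(count, maximum)
        print(f"{count}/{maximum} entries in use ({usage:.0%})")
```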


The parameters listed here are the ones commonly used in production by teacher Old Boy:
net.ipv4.tcp_syn_retries = 1
net.ipv4.tcp_synack_retries = 1
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 15
net.ipv4.tcp_retries2 = 5
net.ipv4.tcp_fin_timeout = 2
net.ipv4.tcp_max_tw_buckets = 36000
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_orphans = 32768
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 16384
net.ipv4.tcp_wmem = 8192 131072 16777216
net.ipv4.tcp_rmem = 32768 131072 16777216
net.ipv4.tcp_mem = 786432 1048576 1572864
net.ipv4.ip_local_port_range = 1024 65000
net.ipv4.ip_conntrack_max = 65536
net.ipv4.netfilter.ip_conntrack_max = 65536
net.ipv4.netfilter.ip_conntrack_tcp_timeout_established = 180
net.core.somaxconn = 16384
net.core.netdev_max_backlog = 16384
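A list like this normally goes into /etc/sysctl.conf and is applied with sysctl -p. The keys are case-sensitive and must be lowercase. Before applying, it can help to check which keys the running kernel actually has. For example, tcp_tw_recycle is gone since Linux 4.12, and the ip_conntrack keys are absent on kernels that use nf_conntrack. This dry-run sketch parses a short excerpt of the list and compares it with /proc/sys without changing anything. The function names are hypothetical.

```python
import os

# An excerpt of the list above in /etc/sysctl.conf syntax.
CONFIG = """\
# comments start with '#' or ';'
net.ipv4.tcp_syn_retries = 1
net.ipv4.tcp_synack_retries = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_rmem = 32768 131072 16777216
net.core.somaxconn = 16384
"""


def parse_sysctl_conf(text):
    """Parse 'key = value' lines, skipping blanks and comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed line: {raw!r}")
        settings[key.strip()] = " ".join(value.split())  # normalize spacing
    return settings


def compare_with_running(settings, root="/proc/sys"):
    """Dry run: report how each wanted value differs from the live one."""
    report = {}
    for key, wanted in settings.items():
        path = os.path.join(root, *key.split("."))
        try:
            with open(path) as f:
                current = " ".join(f.read().split())  # /proc uses tabs
        except FileNotFoundError:
            report[key] = "missing on this kernel"
            continue
        except OSError as exc:
            report[key] = f"unreadable ({exc.strerror})"
            continue
        report[key] = "ok" if current == wanted else f"{current} -> {wanted}"
    return report


if __name__ == "__main__":
    for key, status in compare_with_running(parse_sysctl_conf(CONFIG)).items():
        print(f"{key}: {status}")
```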

Kernel parameter tuning has to follow the specific application scenario and the hardware of the business. The parameters listed here are only common starting points. Understand what each one means from its definition, then adjust it for your own production environment.

