Linux kernel, TCP/IP, socket parameter tuning

Source: Internet
Author: User

/proc/sys/net Directory

All TCP/IP parameters are located in the /proc/sys/net directory. Changes made there are temporary and are lost when the system restarts. The important parameters include the following:

Each entry gives the parameter (path and file), its description, and, where available, its default and optimized values.

/proc/sys/net/core/rmem_default
The default socket receive buffer size (in bytes).
Optimized value: 256960

/proc/sys/net/core/rmem_max
The maximum socket receive buffer size (in bytes).
Optimized value: 513920

/proc/sys/net/core/wmem_default
The default socket send buffer size (in bytes).
Optimized value: 256960

/proc/sys/net/core/wmem_max
The maximum socket send buffer size (in bytes).
Optimized value: 513920

/proc/sys/net/core/netdev_max_backlog
The maximum number of packets allowed to queue when a network interface receives packets faster than the kernel can process them.
Optimized value: 2000

/proc/sys/net/core/somaxconn
The maximum length of the listen queue for each port in the system. This is a global parameter.
Optimized value: 2048

/proc/sys/net/core/optmem_max
The maximum ancillary buffer size allowed for each socket.
Optimized value: 81920

/proc/sys/net/ipv4/tcp_mem
Determines how the TCP stack reacts to memory usage. Each value is in memory pages (usually 4 KB). The first value is the lower limit of memory usage. The second value is the threshold at which memory pressure mode starts to put pressure on buffer usage. The third value is the upper limit of memory usage; at this level packets may be dropped to reduce memory use. These values can be increased for a larger BDP (bandwidth-delay product); remember that the units are pages, not bytes.
Default value: 94011 125351 188022
Optimized value: 131072 262144 524288

/proc/sys/net/ipv4/tcp_rmem
Defines the memory a socket uses for receive-buffer auto-tuning. The first value is the minimum number of bytes allocated for a socket's receive buffer. The second value is the default size; for TCP sockets it overrides net.core.rmem_default. The third value is the maximum size the receive buffer can grow to through auto-tuning; it does not override net.core.rmem_max.
Default value: 4096 87380 4011232
Optimized value: 8760 256960 4088000

/proc/sys/net/ipv4/tcp_wmem
Defines the memory a socket uses for send-buffer auto-tuning. The first value is the minimum number of bytes allocated for a socket's send buffer. The second value is the default size; for TCP sockets it overrides net.core.wmem_default. The third value is the maximum size the send buffer can grow to through auto-tuning; it does not override net.core.wmem_max.
Default value: 4096 16384 4011232
Optimized value: 8760 256960 4088000

/proc/sys/net/ipv4/tcp_keepalive_time
How long (in seconds) a connection must be idle before TCP starts sending keepalive probes to check that it is still valid.
Optimized value: 1800

/proc/sys/net/ipv4/tcp_keepalive_intvl
The interval (in seconds) between probes when a keepalive probe gets no response.

/proc/sys/net/ipv4/tcp_keepalive_probes
The maximum number of keepalive probes sent before the TCP connection is declared dead.
Optimized value: 3

/proc/sys/net/ipv4/tcp_sack
Enables selective acknowledgment (1 = enabled). It improves performance by acknowledging packets received out of order, so the sender retransmits only the missing segments. This option should be enabled for WAN communication, although it increases CPU usage.
Optimized value: 1

/proc/sys/net/ipv4/tcp_fack
Enables forward acknowledgment, which works together with selective acknowledgment (SACK) to reduce congestion. It should also be enabled.
Optimized value: 1

/proc/sys/net/ipv4/tcp_timestamps
Enables TCP timestamps (which add 12 bytes to the TCP header), allowing RTT to be calculated more accurately than from retransmission timeouts (see RFC 1323). This option should be enabled for better performance.
Optimized value: 1

/proc/sys/net/ipv4/tcp_window_scaling
Enables window scaling as defined in RFC 1323, which supports TCP windows larger than 64 KB (up to 1 GB). It must be enabled (1 = enabled) to use large windows, and it takes effect only when both ends of the TCP connection enable it.
Optimized value: 1

/proc/sys/net/ipv4/tcp_syncookies
Whether TCP SYN cookies are enabled. The kernel must be compiled with CONFIG_SYN_COOKIES. SYN cookies keep a socket from being overloaded when there are too many connection attempts.
Optimized value: 1

/proc/sys/net/ipv4/tcp_tw_reuse
Whether sockets in the TIME-WAIT state may be reused for new TCP connections.
Optimized value: 1

/proc/sys/net/ipv4/tcp_tw_recycle
Enables faster recycling of TIME-WAIT sockets.
Optimized value: 1

/proc/sys/net/ipv4/tcp_fin_timeout
For sockets closed by this side, how long (in seconds) TCP stays in the FIN-WAIT-2 state. The peer may have disconnected, may never close its side of the connection, or its process may have died unexpectedly.
Optimized value: 30

/proc/sys/net/ipv4/ip_local_port_range
The range of local port numbers that TCP and UDP may use.
Default value: 32768 61000
Optimized value: 1024 65000

/proc/sys/net/ipv4/tcp_max_syn_backlog
The maximum number of connection requests, not yet acknowledged by the peer, that can be held in the queue. If the server is frequently overloaded, try increasing this number.
Optimized value: 2048

/proc/sys/net/ipv4/tcp_low_latency
Lets the TCP/IP stack favor low latency over high throughput. This option should normally be disabled.

/proc/sys/net/ipv4/tcp_westwood
Enables a sender-side congestion control algorithm (TCP Westwood) that keeps an estimate of throughput and tries to optimize overall bandwidth utilization. It should be enabled for WAN traffic.

/proc/sys/net/ipv4/tcp_bic
Enables Binary Increase Congestion control for fast, long-distance networks, making better use of links running at gigabit speeds. It should be enabled for WAN traffic.
(tcp_westwood and tcp_bic exist only in older 2.6 kernels; newer kernels select the algorithm through net.ipv4.tcp_congestion_control.)


/etc/sysctl.conf file

/etc/sysctl.conf is an interface for changing a running Linux system. It contains advanced options for the TCP/IP stack and the virtual memory system and can be used to control Linux network configuration. Because changes to /proc/sys/net are temporary, it is recommended to add TCP/IP parameter changes to /etc/sysctl.conf, save the file, and run "/sbin/sysctl -p" to apply them immediately. The specific changes, based on the values above, are:

net.core.rmem_default = 256960
net.core.rmem_max = 513920
net.core.wmem_default = 256960
net.core.wmem_max = 513920
net.core.netdev_max_backlog = 2000
net.core.somaxconn = 2048
net.core.optmem_max = 81920
net.ipv4.tcp_mem = 131072 262144 524288
net.ipv4.tcp_rmem = 8760 256960 4088000
net.ipv4.tcp_wmem = 8760 256960 4088000
net.ipv4.tcp_keepalive_time = 1800
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_sack = 1
net.ipv4.tcp_fack = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_reuse = 1
# tcp_tw_recycle was removed in Linux 4.12; omit this line on newer kernels
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_fin_timeout = 30
net.ipv4.ip_local_port_range = 1024 65000
net.ipv4.tcp_max_syn_backlog = 2048


There are two main interfaces to tunable kernel variables: the sysctl command and the /proc file system (process-independent information in /proc is being moved to sysfs). The sysctl parameters of the IPv4 stack are mainly net.core.* and net.ipv4.*, and the corresponding /proc directories are /proc/sys/net/core and /proc/sys/net/ipv4. A parameter appears only if the kernel was compiled with the corresponding feature.
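The mapping between the two interfaces is mechanical: each dot in a sysctl key becomes a directory level under /proc/sys. The minimal Python sketch below, assuming a Linux host, shows that mapping and compares values from a sysctl.conf-style fragment with the running kernel. The helper names (sysctl_key_to_path, parse_sysctl_conf, read_current) are illustrative, not part of any tool.

```python
from pathlib import Path
from typing import Dict, Optional

PROC_SYS = Path("/proc/sys")


def sysctl_key_to_path(key: str) -> Path:
    """Map a dotted sysctl key such as 'net.core.rmem_max' to its file under /proc/sys."""
    return PROC_SYS.joinpath(*key.split("."))


def parse_sysctl_conf(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines in /etc/sysctl.conf syntax, skipping blanks and comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a "key = value" line
        # Normalize whitespace so "8760  256960 4088000" compares cleanly.
        settings[key.strip()] = " ".join(value.split())
    return settings


def read_current(key: str) -> Optional[str]:
    """Return the running value of a sysctl, or None if it cannot be read here."""
    try:
        return " ".join(sysctl_key_to_path(key).read_text().split())
    except OSError:
        return None  # parameter absent from this kernel, or not running on Linux


wanted = parse_sysctl_conf("""
net.core.rmem_max = 513920
net.ipv4.tcp_rmem = 8760 256960 4088000
""")
for key, value in wanted.items():
    print(f"{key}: wanted {value}, current {read_current(key)}")
```

Running `sysctl -p` does the equivalent write for every key; this sketch only reads.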

Kernel parameters should be adjusted with care, because they usually affect the overall performance of the system. At startup the kernel initializes these variables based on the system's available resources, and the resulting values usually satisfy ordinary performance requirements.

Applications communicate with remote hosts through socket system calls, and each socket has a read buffer and a write buffer. The read buffer holds data sent by the remote host; if it is full, further incoming data is dropped. The write buffer holds data waiting to be sent to the remote host; if it is full, the application blocks when writing. The buffer sizes are therefore limited and worth tuning.

Default socket buffer sizes:
/proc/sys/net/core/rmem_default corresponds to net.core.rmem_default
/proc/sys/net/core/wmem_default corresponds to net.core.wmem_default
These are the default read and write buffer sizes for all socket types, but a particular socket type can set its own values to override them. For example, TCP sockets use /proc/sys/net/ipv4/tcp_rmem and tcp_wmem instead.

Maximum socket buffer sizes:
/proc/sys/net/core/rmem_max corresponds to net.core.rmem_max
/proc/sys/net/core/wmem_max corresponds to net.core.wmem_max
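An application can request its own buffer size with setsockopt(). A minimal Python sketch follows; request_rcvbuf is a hypothetical helper. Per socket(7), Linux doubles the value passed to SO_RCVBUF to leave room for bookkeeping overhead and caps the request at rmem_max, so the size read back is usually not the number requested. For TCP sockets, setting SO_RCVBUF explicitly also turns off receive-buffer auto-tuning for that socket.

```python
import socket


def request_rcvbuf(sock: socket.socket, size: int) -> int:
    """Ask for a receive buffer of `size` bytes and return what the kernel granted."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    # For a TCP socket the initial value comes from tcp_rmem, not rmem_default.
    default = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    granted = request_rcvbuf(s, 65536)
    print("default:", default, "granted for 65536:", granted)
```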

/proc/sys/net/core/netdev_max_backlog corresponds to net.core.netdev_max_backlog
This parameter defines the maximum number of packets in a device's input queue when the interface receives packets faster than the kernel can process them.

/proc/sys/net/core/somaxconn corresponds to net.core.somaxconn
The upper limit for the accept-queue backlog that can be passed to the listen() system call; larger values are silently capped to it. Connection requests that arrive while the queue is full are dropped.
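A minimal Python sketch of that cap, assuming Linux: a server asking listen() for a backlog of 2048 only gets min(2048, somaxconn). The helper names effective_backlog and read_somaxconn are illustrative; the fallback to Python's socket.SOMAXCONN constant (the C library's compile-time value) is an assumption for systems without /proc.

```python
import socket
from pathlib import Path

SOMAXCONN_PATH = Path("/proc/sys/net/core/somaxconn")


def effective_backlog(requested: int, somaxconn: int) -> int:
    """listen(2): a backlog larger than somaxconn is silently capped to somaxconn."""
    return min(requested, somaxconn)


def read_somaxconn(fallback: int = socket.SOMAXCONN) -> int:
    """Read the running somaxconn value, falling back to the libc constant."""
    try:
        return int(SOMAXCONN_PATH.read_text())
    except (OSError, ValueError):
        return fallback


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", 0))  # any free loopback port
    wanted = 2048
    srv.listen(wanted)
    print("requested", wanted, "effective", effective_backlog(wanted, read_somaxconn()))
```

Raising somaxconn alone does not help unless the application also passes a large backlog to listen().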

/proc/sys/net/core/optmem_max corresponds to net.core.optmem_max
The maximum ancillary buffer size for each socket.

TCP/IPv4 kernel parameters:
The protocol and address type are specified when a socket is created. TCP socket buffer sizes are controlled by TCP's own settings rather than by the core buffer settings.
/proc/sys/net/ipv4/tcp_rmem corresponds to net.ipv4.tcp_rmem
/proc/sys/net/ipv4/tcp_wmem corresponds to net.ipv4.tcp_wmem
These are the TCP socket read and write buffer settings. Each has three values: the minimum, the default, and the maximum buffer size. Buffers sized automatically by TCP are not limited by the core settings, but a size that an application requests explicitly with setsockopt() is still capped by net.core.rmem_max or net.core.wmem_max.
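Whether the maximum in tcp_rmem or tcp_wmem is large enough depends on the bandwidth-delay product (BDP): a single connection cannot move more than one window of data per round trip. The minimal Python sketch below shows the arithmetic; the 1 Gbit/s link and 100 ms RTT are assumed example figures, and the helper names are illustrative.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> int:
    """Bandwidth-delay product: bytes in flight needed to keep the path full."""
    return round(bandwidth_bps / 8 * rtt_s)


def max_throughput_bps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on one connection's throughput (bits/s) for a given window."""
    return window_bytes * 8 / rtt_s


link = 1e9   # assumed 1 Gbit/s path
rtt = 0.100  # assumed 100 ms round-trip time

print(bdp_bytes(link, rtt))                    # 12500000
print(max_throughput_bps(4088000, rtt) / 1e6)  # 327.04 (Mbit/s)
```

On such a path, the optimized maximum of 4088000 bytes caps one connection at roughly 327 Mbit/s; filling the link would need a window of about 12.5 MB.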

The kernel parameter tcp_mem (net.ipv4.tcp_mem) also has three values that define the range for memory management. While the number of pages is below the first value, TCP does not consider itself under memory pressure. At the second value, TCP enters memory pressure mode. The third value is the maximum number of pages that all TCP sockets together may use; beyond it, further packets are dropped. These values count the memory allocated to sockets system-wide, in pages.
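Because tcp_mem is counted in pages rather than bytes, it helps to convert the thresholds before comparing them with available RAM. A minimal Python sketch, assuming 4 KB pages unless told otherwise (os.sysconf("SC_PAGE_SIZE") reports the real page size on Linux); describe_tcp_mem is an illustrative helper name.

```python
import os
from typing import Dict


def describe_tcp_mem(values: str, page_size: int = 4096) -> Dict[str, float]:
    """Convert the three tcp_mem thresholds (in pages) to MiB."""
    low, pressure, high = (int(v) for v in values.split())

    def to_mib(pages: int) -> float:
        return pages * page_size / (1024 * 1024)

    return {"low_mib": to_mib(low), "pressure_mib": to_mib(pressure), "high_mib": to_mib(high)}


print(describe_tcp_mem("131072 262144 524288"))
# {'low_mib': 512.0, 'pressure_mib': 1024.0, 'high_mib': 2048.0}
print(describe_tcp_mem("131072 262144 524288", page_size=os.sysconf("SC_PAGE_SIZE")))
```

With 4 KB pages, the optimized tcp_mem values let all TCP sockets together use up to 2 GiB before packets are dropped.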

Figure: the structure of a socket.

/proc/sys/net/ipv4/tcp_window_scaling corresponds to net.ipv4.tcp_window_scaling
Manages TCP window scaling. The receive window advertised in the TCP header is a 16-bit field, so without scaling the window cannot exceed 64 KB; window scaling must be enabled for larger windows.
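The scale option is a left shift applied to the 16-bit window field, and RFC 1323 (now RFC 7323) limits the shift count to 14, which is where the 1 GB ceiling comes from. The minimal Python sketch below computes the smallest shift that lets a given buffer be advertised; it illustrates the arithmetic only and is not the exact code path Linux uses when it picks a scale at connection setup. window_scale_for is an illustrative name.

```python
MAX_UNSCALED_WINDOW = 0xFFFF  # 16-bit window field in the TCP header
MAX_WSCALE = 14               # largest shift count allowed by RFC 1323/7323


def window_scale_for(buffer_bytes: int) -> int:
    """Smallest shift s such that 65535 << s covers buffer_bytes (capped at 14)."""
    shift = 0
    while shift < MAX_WSCALE and (MAX_UNSCALED_WINDOW << shift) < buffer_bytes:
        shift += 1
    return shift


for size in (65535, 256960, 4088000):
    s = window_scale_for(size)
    print(size, "-> shift", s, "max window", MAX_UNSCALED_WINDOW << s)

print("absolute ceiling:", MAX_UNSCALED_WINDOW << MAX_WSCALE)  # 1073725440 bytes, about 1 GB
```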

/proc/sys/net/ipv4/tcp_sack corresponds to net.ipv4.tcp_sack
Manages TCP selective acknowledgment, which lets the receiver report the sequence numbers missing from the byte stream. This reduces the number of segments that must be retransmitted after a loss, so SACK is useful when segments are lost frequently.

/proc/sys/net/ipv4/tcp_dsack corresponds to net.ipv4.tcp_dsack
An improvement to SACK that lets the sender detect unnecessary retransmissions.

/proc/sys/net/ipv4/tcp_fack corresponds to net.ipv4.tcp_fack
Refines SACK and improves TCP congestion control.

TCP connection management:
/proc/sys/net/ipv4/tcp_max_syn_backlog corresponds to net.ipv4.tcp_max_syn_backlog
Each connection request (SYN) must wait in a queue until the local server accepts it. This variable controls the length of the TCP SYN queue for each port; requests beyond this length are dropped.

/proc/sys/net/ipv4/tcp_syn_retries corresponds to net.ipv4.tcp_syn_retries
The number of times the kernel retransmits the initial SYN of an outgoing connection attempt before giving up. A lower value detects failed connections to a remote host sooner. It can be changed to 3.
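Each retransmission doubles the timeout, so the retry count translates into a give-up time. A minimal Python sketch, assuming a 1 s initial retransmission timeout that doubles after each timeout and ignoring the kernel's 120 s cap on a single timeout (which only matters for much larger retry counts); syn_give_up_seconds is an illustrative name.

```python
def syn_give_up_seconds(retries: int, initial_rto: float = 1.0) -> float:
    """Seconds until connect() fails: the first SYN plus `retries` retransmissions,
    each timeout twice as long as the previous one, then one final timeout."""
    return initial_rto * (2 ** (retries + 1) - 1)


print(syn_give_up_seconds(6))  # 127.0, about two minutes (common default of 6)
print(syn_give_up_seconds(3))  # 15.0 with the suggested value of 3
```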

/proc/sys/net/ipv4/tcp_retries1 corresponds to net.ipv4.tcp_retries1
The number of times TCP retransmits a packet on an established connection before it asks the network layer to update the route (if possible) before each further retransmission.

/proc/sys/net/ipv4/tcp_retries2 corresponds to net.ipv4.tcp_retries2
Controls how many times the kernel retransmits data to a remote host on an established connection before giving up. A lower value detects a dead remote host sooner, so the server can release the connection faster. It can be changed to 5.

TCP connection keepalive:
/proc/sys/net/ipv4/tcp_keepalive_time corresponds to net.ipv4.tcp_keepalive_time
If a connection stays idle for the number of seconds specified by this parameter, the kernel starts sending probes to the client.

/proc/sys/net/ipv4/tcp_keepalive_intvl corresponds to net.ipv4.tcp_keepalive_intvl
This parameter specifies the interval, in seconds, between the keepalive probes the kernel sends to the remote host.

/proc/sys/net/ipv4/tcp_keepalive_probes corresponds to net.ipv4.tcp_keepalive_probes
This parameter specifies how many probes the kernel sends to check whether the remote host is still alive. If the client has still not responded after that many probes, the kernel concludes that it is unreachable, closes the connection, and releases the associated resources.
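Taken together, the three keepalive parameters bound how long a dead peer can go unnoticed: idle time plus interval times probe count. A minimal Python sketch of that arithmetic, plus a per-socket override using the Linux socket options TCP_KEEPIDLE, TCP_KEEPINTVL, and TCP_KEEPCNT (skipped where the platform does not expose them). The 7200/75/9 figures are the kernel defaults documented in tcp(7); the 75 s interval paired with this article's 1800 s and 3 probes is an assumption. The helper names are illustrative.

```python
import socket


def keepalive_detect_seconds(idle: int, interval: int, probes: int) -> int:
    """Worst-case time from the last packet until an unresponsive peer is declared dead."""
    return idle + interval * probes


def enable_keepalive(sock: socket.socket, idle: int, interval: int, probes: int) -> None:
    """Turn on SO_KEEPALIVE and, where available, override the system-wide
    tcp_keepalive_* values for this one socket."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    names = ("TCP_KEEPIDLE", "TCP_KEEPINTVL", "TCP_KEEPCNT")
    if all(hasattr(socket, n) for n in names):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


print(keepalive_detect_seconds(7200, 75, 9))  # 7875: kernel defaults
print(keepalive_detect_seconds(1800, 75, 3))  # 2025: this article's time and probes, assumed interval

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    enable_keepalive(s, idle=1800, interval=75, probes=3)
```

Keepalive probes are only sent on sockets that enable SO_KEEPALIVE; the sysctls merely supply the defaults.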

/proc/sys/net/ipv4/ip_local_port_range corresponds to net.ipv4.ip_local_port_range
Specifies the range of local ports available to TCP and UDP.
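Every outgoing connection from one local address to the same remote address and port needs its own local port, so the size of this range bounds how many such connections can exist at once. A minimal Python sketch comparing the default and optimized ranges from the table above and reading the running value on Linux; the helper names are illustrative.

```python
from pathlib import Path
from typing import Optional, Tuple


def port_count(low: int, high: int) -> int:
    """Number of ports in an inclusive range such as ip_local_port_range."""
    return high - low + 1


def parse_port_range(text: str) -> Tuple[int, int]:
    low, high = (int(v) for v in text.split())
    return low, high


def current_port_range() -> Optional[Tuple[int, int]]:
    try:
        return parse_port_range(Path("/proc/sys/net/ipv4/ip_local_port_range").read_text())
    except (OSError, ValueError):
        return None  # not Linux, or the file is unreadable


print(port_count(32768, 61000))  # 28233 with the default range
print(port_count(1024, 65000))   # 63977 with the optimized range
print(current_port_range())
```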

TCP connection recycling:
/proc/sys/net/ipv4/tcp_max_tw_buckets corresponds to net.ipv4.tcp_max_tw_buckets
This parameter sets the maximum number of TIME_WAIT sockets the system may hold. If this number is exceeded, additional TIME_WAIT sockets are destroyed immediately.

/proc/sys/net/ipv4/tcp_tw_reuse corresponds to net.ipv4.tcp_tw_reuse
This parameter enables TIME_WAIT reuse, allowing sockets in TIME_WAIT to be used for new TCP connections.

/proc/sys/net/ipv4/tcp_tw_recycle corresponds to net.ipv4.tcp_tw_recycle
This parameter enables fast recycling of TIME_WAIT sockets. Use it with care: it breaks connections from clients behind NAT, and it was removed in Linux 4.12.

/proc/sys/net/ipv4/tcp_fin_timeout corresponds to net.ipv4.tcp_fin_timeout
Sets how long a socket waits in FIN_WAIT_2 before moving to CLOSED.

The maximum number of routes allowed by the kernel.

Forwarding of packets between interfaces.

The maximum number of hops a packet can traverse.

Virtual Memory parameters:

Before Linux kernel 2.6.25, the maximum number of open file handles per process, set with ulimit -n (setrlimit(RLIMIT_NOFILE)), could not exceed NR_OPEN (1024*1024), just over one million, unless the kernel was recompiled. Since 2.6.25 the kernel exposes this maximum through /proc/sys/fs/nr_open, where it can be changed. The limit cannot be raised directly from the shell, because PAM sets the hard limit from limits.conf at login, and the ulimit command can only adjust values below that hard limit.
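A process can inspect both layers itself: the kernel-wide ceiling in /proc/sys/fs/nr_open and its own soft and hard RLIMIT_NOFILE values. A minimal Python sketch using the standard resource module; it raises the soft limit only up to the existing hard limit, which an unprivileged process is allowed to do. The helper names are illustrative.

```python
import resource
from pathlib import Path
from typing import Optional, Tuple


def read_nr_open() -> Optional[int]:
    """Kernel-wide ceiling for RLIMIT_NOFILE (Linux 2.6.25 and later)."""
    try:
        return int(Path("/proc/sys/fs/nr_open").read_text())
    except (OSError, ValueError):
        return None


def raise_nofile_soft_limit() -> Tuple[int, int]:
    """Raise this process's soft open-file limit to its hard limit, if allowed."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != hard:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            pass  # e.g. some systems reject an unlimited soft limit; keep the old one
    return resource.getrlimit(resource.RLIMIT_NOFILE)


print("nr_open:", read_nr_open())
print("RLIMIT_NOFILE (soft, hard):", raise_nofile_soft_limit())
```

Raising the hard limit itself still requires privileges or a change in limits.conf.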

View the socket status in Linux:
cat /proc/net/sockstat    # IPv4 sockets only

sockets: used 137
TCP: inuse orphan 0 tw 3272 alloc mem 46
UDP: inuse 1 mem 0
RAW: inuse 0
FRAG: inuse 0 memory 0
sockets: used: total number of sockets in use across all protocols
TCP: inuse: number of TCP sockets in use (listening). Its value ≤ netstat -lnt | grep ^tcp | wc -l
TCP: orphan: number of orphaned TCP connections that belong to no process (useless TCP sockets waiting to be destroyed)
TCP: tw: number of TCP connections waiting to be closed (TIME_WAIT). Its value equals netstat -ant | grep TIME_WAIT | wc -l
TCP: alloc (allocated): number of TCP sockets that have been allocated (established, with sk_buff requested). Its value equals netstat -ant | grep ^tcp | wc -l
TCP: mem: socket buffer usage, in pages. In a test with scp transferring at 4803.9 KB/s, the value was 11, while netstat -ant showed Recv-Q = 0 and Send-Q ≈ 400 for the corresponding port 22 connection.
UDP: inuse: number of UDP sockets in use
FRAG: number of IP fragments in use
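Each line of /proc/net/sockstat is a protocol name followed by "name value" pairs, so it is easy to monitor from a script. A minimal Python sketch that parses that layout and converts TCP mem from pages to bytes; the SAMPLE text uses hypothetical values for illustration only, and parse_sockstat and tcp_mem_bytes are illustrative names.

```python
import os
from pathlib import Path
from typing import Dict, Optional

# Hypothetical sample values, for illustration only.
SAMPLE = """sockets: used 137
TCP: inuse 12 orphan 0 tw 3272 alloc 20 mem 46
UDP: inuse 1 mem 0
RAW: inuse 0
FRAG: inuse 0 memory 0
"""


def parse_sockstat(text: str) -> Dict[str, Dict[str, int]]:
    """Turn 'TCP: inuse 12 orphan 0 ...' lines into {'TCP': {'inuse': 12, ...}}."""
    stats: Dict[str, Dict[str, int]] = {}
    for line in text.splitlines():
        proto, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        # Fields come in "name value" pairs.
        stats[proto.strip()] = {fields[i]: int(fields[i + 1]) for i in range(0, len(fields) - 1, 2)}
    return stats


def tcp_mem_bytes(stats: Dict[str, Dict[str, int]], page_size: Optional[int] = None) -> int:
    """TCP 'mem' is reported in pages; multiply by the page size to get bytes."""
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")
    return stats["TCP"]["mem"] * page_size


stats = parse_sockstat(SAMPLE)
print(stats["TCP"]["tw"], tcp_mem_bytes(stats, page_size=4096))  # 3272 188416

live = Path("/proc/net/sockstat")
if live.exists():
    print(parse_sockstat(live.read_text()))
```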


