Linux kernel parameters in detail

Source: Internet
Author: User
Tags: ack

Kernel parameters

Long-term updates

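SYN_RECV

The server has received the SYN and replied with SYN+ACK, but has not yet received the client's final ACK.
1.net.ipv4.tcp_synack_retries
Default value 5. On Linux the retry interval doubles each time, so a half-open connection is held for 1+2+4+8+16+32 = 2^6-1 = 63s.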
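The 63s figure can be checked with a few lines of Python. This is a minimal sketch that assumes a 1s initial timeout that doubles after every unanswered SYN+ACK, plus one final wait after the last retry; the function name is made up for illustration.

```python
def synack_wait_seconds(retries: int, initial_timeout: int = 1) -> int:
    """Total seconds a half-open connection is kept before it is dropped.

    Assumption: the timeout starts at `initial_timeout` seconds and doubles
    after each unanswered SYN+ACK, with one last wait after the final retry.
    """
    # retries + 1 waits: one after the first SYN+ACK, one after each retry
    return sum(initial_timeout * 2 ** i for i in range(retries + 1))


print(synack_wait_seconds(5))  # 63
print(synack_wait_seconds(2))  # 7
```

Lowering tcp_synack_retries from 5 to 2 would, under the same assumptions, free a half-open slot after 7s instead of 63s.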

2.net.ipv4.tcp_syncookies
Should be set to 1 to protect against SYN flood attacks.
A TCP connection in SYN_RECV is called a half-open connection and is stored in the SYN queue. A large number of SYN_RECV connections overflows the queue, and the kernel then silently drops subsequent requests; this is the SYN flood attack.
With syncookies enabled, when the SYN queue is full TCP builds a special sequence number (the cookie) from the source address and port, the destination address and port, and a timestamp, and sends it back in the SYN+ACK. An attacker never responds; a legitimate client returns the cookie in its ACK, and the server can establish the connection from the cookie alone (even though the connection is not in the SYN queue).
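The idea of a cookie derived from the connection's addresses, ports and a time value can be sketched as follows. This is an illustration only, not the kernel's actual algorithm: the secret, the HMAC-SHA256 construction and the function names are all assumptions made for the example.

```python
import hashlib
import hmac

SECRET = b"hypothetical-server-secret"  # assumption: a per-server secret key


def make_cookie(src: str, sport: int, dst: str, dport: int, counter: int) -> int:
    """Derive a 32-bit value (the size of a TCP sequence number) from the
    connection's addresses, ports and a coarse time counter."""
    msg = f"{src}:{sport}>{dst}:{dport}@{counter}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def check_ack(src: str, sport: int, dst: str, dport: int,
              counter: int, ack_number: int) -> bool:
    """A genuine client acknowledges cookie + 1. The server recomputes the
    cookie for the current and the previous counter; no queue entry is needed."""
    for c in (counter, counter - 1):
        expected = (make_cookie(src, sport, dst, dport, c) + 1) & 0xFFFFFFFF
        if expected == ack_number:
            return True
    return False


cookie = make_cookie("198.51.100.9", 40000, "203.0.113.5", 80, counter=100)
ack = (cookie + 1) & 0xFFFFFFFF
print(check_ack("198.51.100.9", 40000, "203.0.113.5", 80, 100, ack))
```

Because the server can recompute the cookie from the ACK packet itself, it does not have to remember anything about the connection between the SYN and the ACK.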
Do not use tcp_syncookies to handle normal heavy connection load, because syncookies are a compromise on the TCP protocol and are not rigorous. For legitimate high load, three TCP parameters can be tuned instead:
1.tcp_synack_retries: reduce the number of retries
2.tcp_max_syn_backlog: increase the number of SYN connections that can be queued
3.tcp_abort_on_overflow: reject connections outright when the server cannot keep up
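Before tuning these three parameters it helps to see their current values. A minimal sketch, assuming the usual mapping of a sysctl name to a file under /proc/sys (dots become slashes); the helper name read_sysctl is made up for this example.

```python
from pathlib import Path


def read_sysctl(name: str, root: str = "/proc/sys"):
    """Return the raw value of a sysctl such as 'net.ipv4.tcp_synack_retries',
    or None if the corresponding file is missing or unreadable."""
    path = Path(root) / name.replace(".", "/")
    try:
        return path.read_text().strip()
    except OSError:
        return None


for key in ("net.ipv4.tcp_synack_retries",
            "net.ipv4.tcp_max_syn_backlog",
            "net.ipv4.tcp_abort_on_overflow"):
    print(key, "=", read_sysctl(key))
```

On a machine without /proc/sys the loop simply prints None for each key.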

CLOSE_WAIT

The passively closing side enters CLOSE_WAIT after it receives a FIN and sends the ACK. In most cases a socket lingers in this state because the application did not call close() in time; by default the state can last 2 hours.

TIME_WAIT

On the side that actively closes the connection, the socket waits in TIME_WAIT for 2MSL.
1.net.ipv4.tcp_fin_timeout
Default value 60. This is how long TCP stays in FIN_WAIT2 before the connection is closed directly, so lowering tcp_fin_timeout helps reduce the number of TIME_WAIT sockets. Note: a socket half-closed with shutdown(SHUT_WR) is also in FIN_WAIT2, but the timeout does not apply to it.

2.net.ipv4.tcp_tw_recycle
Default value 0. Setting it to 1 enables fast recycling of TIME_WAIT sockets. (The option was removed in Linux 4.12.)
If tcp_timestamps is also on, the kernel caches the latest timestamp seen from each peer; if a later request carries a timestamp smaller than the cached one, it is considered invalid and the packet is dropped. Clients behind NAT or a load balancer share one address but have different clocks, so their packets can be dropped.
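The NAT problem is easy to reproduce in a toy model. The sketch below is a simplification, not kernel code: it keeps one cached timestamp per peer IP and rejects anything older, and the function name and addresses are invented for the example.

```python
def accept_syn(cache: dict, peer_ip: str, tsval: int) -> bool:
    """Accept a SYN only if its timestamp is not older than the last one
    seen from the same peer IP (simplified per-host timestamp check)."""
    last = cache.get(peer_ip)
    if last is not None and tsval < last:
        return False  # looks like an old duplicate, so it is dropped
    cache[peer_ip] = tsval
    return True


cache = {}
nat_ip = "203.0.113.7"  # two clients share this public address

print(accept_syn(cache, nat_ip, 5000))  # client A, clock at 5000 -> True
print(accept_syn(cache, nat_ip, 1200))  # client B, clock at 1200 -> False
print(accept_syn(cache, nat_ip, 5001))  # client A again -> True
```

Client B is a perfectly healthy host, but because its clock started later than client A's, every SYN it sends through the shared address looks stale to the server.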

3.net.ipv4.tcp_tw_reuse
Default value 0. Whether a socket in TIME_WAIT may be reused for a new connection.
This option is safer than tcp_tw_recycle; from the protocol's point of view, reuse is safe.
Conditions for reuse, as found online:

1. The tcp_timestamps option must be enabled (on the client as well)
2. A TIME_WAIT socket can be reused only when more than 1s has passed since its last packet was received

4.net.ipv4.tcp_timestamps
Default value 1. TCP adds a timestamp option, which protects against sequence number wraparound and allows more accurate RTT calculation; enable it for better performance.

5.net.ipv4.tcp_max_tw_buckets
Default value 180000. The upper bound on the number of TIME_WAIT sockets.
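To see how close a host is to that limit, the sockets in each state can be counted from /proc/net/tcp, whose fourth column holds the state as a hex code (06 is TIME_WAIT). A minimal sketch; the sample text is made up and truncated to the columns the parser needs.

```python
from collections import Counter

# Hex state codes as they appear in the "st" column of /proc/net/tcp
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(proc_net_tcp: str) -> Counter:
    """Count sockets per TCP state from the text of /proc/net/tcp."""
    counts = Counter()
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) > 3:
            code = fields[3].upper()
            counts[TCP_STATES.get(code, code)] += 1
    return counts


# Illustrative sample only; on a real host read the file /proc/net/tcp
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue
   0: 0100007F:0050 00000000:0000 0A 00000000:00000000
   1: 0100007F:0050 0100007F:C350 06 00000000:00000000
   2: 0100007F:0050 0100007F:C351 06 00000000:00000000
"""

print(count_states(SAMPLE)["TIME_WAIT"])  # 2
```

The same counter also reports SYN_RECV and CLOSE_WAIT totals, which are the other states discussed above.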

TCP Optimizations

1.net.ipv4.tcp_moderate_rcvbuf
Default value 1. Whether the kernel automatically tunes the TCP receive buffer size. If a program sets SO_SNDBUF or SO_RCVBUF itself, the kernel no longer auto-tunes those connections.
2.net.ipv4.tcp_adv_win_scale
Default value 2. 1/(2^tcp_adv_win_scale) of the buffer is set aside as the application read buffer, so the largest receive sliding window can reach only 3/4 of the receive buffer.

3.net.ipv4.tcp_rmem
4096 87380 6291456
The first value is the minimum size of the receive buffer, in bytes
The second is the initial receive buffer size, used to initialize the sock's sk_rcvbuf; it overrides rmem_default
The third is the maximum receive buffer size, the upper limit when sk_rcvbuf is adjusted

4.net.core.rmem_default
Default receive window size (bytes) for all protocols
A reasonable receive window (rwnd) depends on the BDP (bandwidth-delay product). Assuming a bandwidth of 100Mbps and a delay of 100ms:
BDP = (100Mbps/8) * (100ms/1000) = 1.25MB
Because of the extra overhead reserved by tcp_adv_win_scale, a reasonable buffer size is:
BDP/(1-1/(2^tcp_adv_win_scale))
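Plugging the example numbers into the two formulas gives a concrete buffer size. A small sketch, with function names chosen for illustration:

```python
def bdp_bytes(bandwidth_bps: float, delay_seconds: float) -> float:
    """Bandwidth-delay product in bytes."""
    return bandwidth_bps / 8 * delay_seconds


def recv_buffer_bytes(bandwidth_bps: float, delay_seconds: float,
                      adv_win_scale: int = 2) -> float:
    """Buffer size needed so the usable window still covers the BDP after
    1/(2^adv_win_scale) of the buffer is reserved for the application."""
    return bdp_bytes(bandwidth_bps, delay_seconds) / (1 - 1 / 2 ** adv_win_scale)


bdp = bdp_bytes(100e6, 0.100)             # 100Mbps, 100ms
buf = recv_buffer_bytes(100e6, 0.100, 2)  # tcp_adv_win_scale = 2
print(round(bdp))  # 1250000
print(round(buf))  # 1666667
```

With tcp_adv_win_scale=2 the buffer has to be about a third larger than the BDP, roughly 1.67MB for a 1.25MB window.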

5.net.ipv4.tcp_window_scaling
Default value 1. Whether window scaling is enabled. It must be enabled to support windows larger than 64KB.

6.net.ipv4.tcp_sack
Default value 1. Whether SACK (selective ACK) is enabled. The receiver reports which data segments it has received, which solves the problem that fast retransmit cannot tell whether later data also needs to be retransmitted.

Congestion window

The congestion window (cwnd) is an internal parameter of the sender, used to avoid network congestion.
During slow start, the initial cwnd is 3*MSS in Linux 2.6 kernels; to improve transmission efficiency, Linux 3.0 and later use 10*MSS. It can be changed with the following commands:

# ip route show
default via 10.9.0.1 dev eth0
10.9.0.0/- dev eth0 proto kernel scope link src 10.9.22.239
172.17.0.0/- dev docker0 proto kernel scope link src 172.17.42.1
# ip route change default via 10.9.0.1 dev eth0 initcwnd 10
# ip route show
default via 10.9.0.1 dev eth0 initcwnd 10
10.9.0.0/- dev eth0 proto kernel scope link src 10.9.22.239
172.17.0.0/- dev docker0 proto kernel scope link src 172.17.42.1

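Why a larger initial window helps can be seen by counting round trips in an idealized slow start. The sketch below assumes that cwnd doubles every round trip, that nothing is lost, that the receiver's window is never the limit, and an MSS of 1460 bytes; all of these are simplifications chosen for the example.

```python
import math


def round_trips(payload_bytes: int, initcwnd: int, mss: int = 1460) -> int:
    """Round trips needed to deliver payload_bytes in idealized slow start."""
    segments_left = math.ceil(payload_bytes / mss)
    cwnd, rtts = initcwnd, 0
    while segments_left > 0:
        segments_left -= cwnd  # send a full window this round trip
        cwnd *= 2              # slow start doubles the window
        rtts += 1
    return rtts


print(round_trips(64 * 1024, initcwnd=3))   # 4
print(round_trips(64 * 1024, initcwnd=10))  # 3
```

For a 64KB response the larger initial window saves one round trip, which on a 100ms path is 100ms of latency.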
Out of socket memory

Causes

1. There are many orphan sockets
2. TCP sockets have used up their allocated memory
TCP socket memory is counted in pages; the default Linux page size is 4096 bytes

getconf PAGESIZE

1./proc/sys/net/ipv4/tcp_mem
365664 487552 731328
The kernel does not intervene while TCP uses fewer than 365664 pages
When TCP uses more than 487552 pages, the kernel enters "memory pressure" mode
When TCP uses more than 731328 pages, the kernel reports: Out of socket memory
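Since tcp_mem is expressed in pages, it is worth converting the thresholds to bytes before judging them. A minimal sketch, assuming a 4096-byte page; the state names returned by the helper are informal labels for this example, and treating everything up to the second threshold as "normal" is a simplification.

```python
PAGE_SIZE = 4096  # bytes; assumption, confirm with `getconf PAGESIZE`


def pages_to_mib(pages: int) -> float:
    """Convert a page count to MiB."""
    return pages * PAGE_SIZE / 2 ** 20


def tcp_mem_state(pages_in_use: int, tcp_mem=(365664, 487552, 731328)) -> str:
    """Classify TCP memory use against the three tcp_mem thresholds."""
    low, pressure, high = tcp_mem
    if pages_in_use > high:
        return "out of socket memory"
    if pages_in_use > pressure:
        return "memory pressure"
    return "normal"  # simplification: covers both below low and low..pressure


for pages in (365664, 487552, 731328):
    print(pages, "pages =", pages_to_mib(pages), "MiB")
# 365664 pages = 1428.375 MiB
# 487552 pages = 1904.5 MiB
# 731328 pages = 2856.75 MiB
```

With these defaults the kernel only reports "Out of socket memory" once TCP holds close to 2.8GiB of buffers.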

2./proc/net/sockstat

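sockets: used 307
TCP: inuse 13 orphan 0 tw 0 alloc 34 mem 6
UDP: inuse 6 mem 2
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0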

You can check the current socket memory status by comparing these values with tcp_mem.
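The comparison is easier once /proc/net/sockstat is parsed into numbers. A minimal sketch; the sample text is illustrative, and on a real host the text would be read from /proc/net/sockstat.

```python
def parse_sockstat(text: str) -> dict:
    """Parse 'PROTO: key value key value ...' lines into nested dicts."""
    stats = {}
    for line in text.splitlines():
        proto, _, rest = line.partition(":")
        fields = rest.split()
        if not fields:
            continue  # skip blank or malformed lines
        stats[proto.strip()] = {
            key: int(value) for key, value in zip(fields[::2], fields[1::2])
        }
    return stats


# Illustrative sample in the format of /proc/net/sockstat
SAMPLE = """\
sockets: used 307
TCP: inuse 13 orphan 0 tw 0 alloc 34 mem 6
UDP: inuse 6 mem 2
"""

tcp = parse_sockstat(SAMPLE)["TCP"]
print(tcp["mem"], tcp["orphan"], tcp["tw"])  # 6 0 0
```

The TCP mem value is in pages, so it can be compared directly with the three tcp_mem thresholds; orphan and tw give the orphan and TIME_WAIT counts.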

Orphan sockets

1. Connections in the FIN_WAIT1 and LAST_ACK states are orphan sockets
2. FIN_WAIT2 connections are counted in the tw statistic, not as orphan sockets
3. CLOSE_WAIT connections are counted neither in the orphan statistic nor in the tw statistic

Fs

1.fs.file-max
The maximum number of file handles the system can open
fs.file-max is the system-wide limit on open file handles; ulimit -n limits the number a single user process can open
What value is appropriate? Recommended: memory/10k

grep MemTotal /proc/meminfo | awk '{printf("%d",$2/10)}'
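The memory/10k rule can also be computed from the text of /proc/meminfo, where MemTotal is reported in kB, so dividing that number by 10 gives one handle per 10KB of RAM. A minimal sketch; the function name and the sample value are made up for illustration.

```python
import re


def recommended_file_max(meminfo_text: str) -> int:
    """Apply the memory/10k rule of thumb to the text of /proc/meminfo."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal line not found")
    return int(match.group(1)) // 10  # MemTotal is in kB


# Illustrative sample; on a real host read the file /proc/meminfo
SAMPLE = "MemTotal:       16384000 kB\nMemFree:         1024000 kB\n"
print(recommended_file_max(SAMPLE))  # 1638400
```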

2.fs.file-nr (read-only)
1984 0 65535
The first value is the number of file handles currently allocated; the third is the maximum number of file handles the system can allocate (the same as file-max)
file-nr is generally smaller than the output of lsof | wc -l, but of the same order of magnitude.

Vm

1.vm.dirty_background_ratio
Default value 10
The kernel's background flusher threads (pdflush on older kernels) are responsible for syncing all file-system-related dirty pages to disk; they start writing back when dirty pages exceed 10% of memory.

kswapd

The kswapd daemon is responsible for keeping free memory available. It monitors the kernel's pages_high and pages_low watermarks: if free memory falls below pages_low, kswapd starts scanning and tries to reclaim 32 pages at a time until free memory is greater than pages_high.
kswapd performs the following actions:

1. If the page is unmodified, it puts the page on the free list
2. If the page is modified and backed by a file system, it writes the page to disk
3. If the page is modified and not backed by a file system, it writes the page to the swap device
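The watermark loop and the three per-page actions can be modeled in a few lines. This is a toy model of the description above, not kernel code; the function names and the return strings are invented for the example.

```python
def reclaim_action(dirty: bool, file_backed: bool) -> str:
    """What happens to a page that is selected for reclaim."""
    if not dirty:
        return "put on free list"
    return "write to disk" if file_backed else "write to swap"


def pages_to_reclaim(free_pages: int, pages_low: int, pages_high: int,
                     batch: int = 32) -> int:
    """Pages reclaimed in one wake-up: nothing happens unless free memory is
    below pages_low, then batches of 32 until it exceeds pages_high."""
    reclaimed = 0
    if free_pages < pages_low:
        while free_pages + reclaimed <= pages_high:
            reclaimed += batch
    return reclaimed


print(pages_to_reclaim(100, pages_low=128, pages_high=256))  # 160
print(pages_to_reclaim(200, pages_low=128, pages_high=256))  # 0
print(reclaim_action(dirty=True, file_backed=False))         # write to swap
```

The gap between pages_low and pages_high is what keeps kswapd from waking up again immediately after each reclaim pass.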

buff: the amount of physical memory used as buffers for read() and write()
cache: the amount of physical memory mapped into process address space

1. When a large amount of data is read from disk into memory (bi), the cache value keeps increasing
2. A rising swpd value in vmstat indicates that kswapd is writing dirty pages to swap space (so)
3. A falling buff value means that kswapd is continually reclaiming memory

Copyright notice: This is an original article by the blog author and may not be reproduced without the author's permission.
