Linux kernel parameter diagram

Source: Internet
Author: User

http://blog.csdn.net/maimang1001/article/details/34941471

https://www.suse.com/documentation/sles11/book_sle_tuning/data/sec_tuning_network_buffers.html

http://dirtysalt.info/os.html

Linux Table of Contents
    • 1. Vmlinuz
    • 2. TCP IO
      • 2.1. Create connection
      • 2.2. Packet Reception
      • 2.3. Packet transmission
      • 2.4. Congestion control
    • 3. Kernel Panic
    • 4. Disk IO
    • 5. Program return value issues
    • 6. DP8 NIC Problem
    • 7. Modify Resource Limits
    • 8. CPU temperature is too high
    • 9. Sync Hangup
    • 10. Replace glibc
    • 11. Allow sudo to run without a TTY
    • 12. SSH Proxy
    • 13. Modify the maximum number of open file handles
    • 14. apt-get Hang
    • 15. Implement a semaphore with a lock
1 Vmlinuz

vmlinuz is a bootable, compressed kernel. "vm" stands for "Virtual Memory": Linux supports virtual memory, unlike older operating systems such as DOS with its 640KB memory limit, and it can use hard disk space as virtual memory, hence the name. vmlinuz is the executable Linux kernel. It is located at /boot/vmlinuz, which is usually a symbolic link. vmlinux is the uncompressed kernel, and vmlinuz is the compressed form of vmlinux.

There are two ways to build vmlinuz. The first is to compile the kernel with "make zImage" and then copy the result: "cp /usr/src/linux-2.4/arch/i386/linux/boot/zImage /boot/vmlinuz". zImage is suitable for small kernels and exists for backward compatibility. The second is to compile the kernel with "make bzImage" and then copy the result: "cp /usr/src/linux-2.4/arch/i386/linux/boot/bzImage /boot/vmlinuz". bzImage is a compressed kernel image. Note that bzImage is not compressed with bzip2; the "bz" in the name is easy to misread, but it means "big zImage", and the "b" stands for "big".

Both zImage (vmlinuz) and bzImage (vmlinuz) are compressed with gzip. They are not just compressed files: gzip decompression code is embedded at the beginning of both, so you cannot unpack vmlinuz with gunzip or gzip -dc. The kernel file contains a miniature gzip that decompresses the kernel and boots it. The difference is that the old zImage decompresses the kernel into low memory (the first 640K), while bzImage decompresses it into high memory (above 1M). If the kernel is small, either zImage or bzImage can be used, and the system runs the same after booting either way. A large kernel must use bzImage and cannot use zImage.

2 TCP IO

http://www.ece.virginia.edu/cheetah/documents/papers/TCPlinux.pdf

2.1 Create Connection

The following is taken from the article "Summary of Linux TCP queue parameters":

A quick look at connection setup: the client sends a SYN packet to the server, the server replies with SYN+ACK and saves the connection, now in the SYN_RECV state, in the half-open (SYN) queue. The client returns an ACK to complete the three-way handshake, and the server moves the connection, now in the ESTABLISHED state, into the accept queue, where it waits for the application to call accept(). Establishing a connection therefore involves two queues:

    • The half-open queue, which holds connections in the SYN_RECV state. Its length is set by net.ipv4.tcp_max_syn_backlog.
    • The accept queue, which holds connections in the ESTABLISHED state. Its length is min(net.core.somaxconn, backlog), where backlog is the argument passed to listen(sockfd, backlog).
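
The accept queue rule above can be written down as a minimal Python sketch. The function names `effective_accept_queue` and `read_sysctl_int` are hypothetical helpers introduced only for illustration; the second one assumes a Linux system where sysctl values are exposed under /proc/sys.

```python
def effective_accept_queue(backlog, somaxconn):
    """Accept queue length: the listen() backlog capped by net.core.somaxconn."""
    return min(somaxconn, backlog)


def read_sysctl_int(name, proc_root="/proc/sys"):
    """Read an integer sysctl such as 'net.core.somaxconn' (Linux only).

    Assumption: the dotted name maps to a path under /proc/sys by
    replacing '.' with '/', which holds for the parameters named here.
    """
    path = proc_root + "/" + name.replace(".", "/")
    with open(path) as handle:
        return int(handle.read().split()[0])
```

For example, `effective_accept_queue(1024, read_sysctl_int("net.core.somaxconn"))` would show how much of a requested backlog of 1024 the running system actually grants.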

In addition, to cope with SYN flooding (the client sends only SYN packets to start handshakes and never answers with the ACK that completes them, filling the server's half-open queue so that it cannot handle normal handshake requests), Linux implements a mechanism called SYN cookies. It is controlled by net.ipv4.tcp_syncookies; set it to 1 to turn it on. Simply put, a SYN cookie encodes the connection information in the ISN (initial sequence number) returned to the client. The server then does not need to keep the half-open connection in the queue; instead it restores the connection information from the ISN echoed back in the client's subsequent ACK and completes the connection. This prevents the half-open queue from being filled up by attacking SYN packets. A handshake that the client never completes is simply ignored.

2.2 Packet Reception

The whole process is as follows:

  • Linux uses the sk_buff data structure to describe a packet.
  • When the NIC detects a packet arriving, it allocates an sk_buff from kernel memory (sk_buffs) and uses the DMA engine to place the packet into the sk_buff. #note: the NIC detects packet arrival and packet transmission by actively polling, not by being triggered.
  • The sk_buff is added to the rx_ring ring buffer. If rx_ring is full, the packet is dropped.
  • When the DMA engine finishes, the NIC notifies the kernel by raising an interrupt to the CPU.
  • The kernel passes the packet to the IP layer, which assembles the data into an IP packet. If the IP packet carries TCP, it is placed in the socket backlog; if the socket backlog is full, the IP packet is dropped. copy packet data to IP buffer to form IP packet #note: once this step is complete, the IP layer can release the sk_buff structure.
  • The TCP layer takes the TCP packet from the socket backlog. copy IP packet to TCP recv buffer to form TCP packet
  • The TCP recv buffer is handed to the application layer. copy TCP recv buffer to app buffer to form app packet
  • The relevant kernel parameters are:
    • /proc/sys/net/ipv4/tcp_rmem # TCP recv buffer size
    • /proc/sys/net/core/netdev_max_backlog # length of the backlog queue for received packets; this is separate from the backlog of the accept queue

The following is excerpted from these articles:

    • Summary of Linux TCP queue parameters
    • https://www.suse.com/documentation/sles11/book_sle_tuning/data/sec_tuning_network_buffers.html

Linux added a recv buffer auto-tuning mechanism in 2.6.17. The actual size of the recv buffer floats automatically between a minimum and a maximum value in order to balance performance against resource use, so in most cases it is not recommended to set the recv buffer manually to a fixed value.

When net.ipv4.tcp_moderate_rcvbuf is set to 1, auto-tuning is in effect, and the recv buffer of each TCP connection is described by the 3-tuple net.ipv4.tcp_rmem = (min, default, max). The recv buffer is initially set to <default>, which overrides the net.core.rmem_default setting. It is then adjusted dynamically between the minimum and maximum values according to actual conditions. With auto-tuning turned on, the maximum value of net.ipv4.tcp_rmem should be set to the BDP (bandwidth-delay product).

When net.ipv4.tcp_moderate_rcvbuf is set to 0, or the socket option SO_RCVBUF is set, auto-tuning is turned off. The default size of the recv buffer is then given by net.core.rmem_default, unless net.ipv4.tcp_rmem is set, in which case its default value takes precedence. The largest recv buffer that can be requested through the setsockopt() system call is limited by net.core.rmem_max. With auto-tuning turned off, it is recommended to set the default buffer size to the BDP.
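
As a rough illustration of sizing the buffer from the BDP, here is a minimal Python sketch. The helper names `bdp_bytes` and `tcp_rmem_line` are hypothetical, and the link speed, round-trip time, and min/default values in the usage note are example figures, not recommendations from the articles above.

```python
def bdp_bytes(bandwidth_bits_per_sec, rtt_ms):
    """Bandwidth-delay product in bytes.

    bits/s * ms / 1000 gives bits in flight; dividing by 8 gives bytes.
    Integer arithmetic is used to avoid floating-point rounding.
    """
    return bandwidth_bits_per_sec * rtt_ms // 8000


def tcp_rmem_line(minimum, default, maximum):
    """Format a (min, default, max) triple the way net.ipv4.tcp_rmem expects."""
    return "%d %d %d" % (minimum, default, maximum)
```

For a 1 Gbit/s path with a 50 ms round trip, `bdp_bytes(1000000000, 50)` gives 6250000 bytes, which could then be used as the max field, for example `tcp_rmem_line(4096, 87380, 6250000)`.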

Auto-tuning of the send buffer was implemented very early; it is always on and there is no parameter to control it. If tcp_wmem is specified, net.core.wmem_default is overridden by tcp_wmem. The send buffer adjusts automatically between the minimum and maximum values of tcp_wmem. If setsockopt() is called to set the socket option SO_SNDBUF, auto-tuning of the send buffer is turned off, tcp_wmem is ignored, and the maximum value of SO_SNDBUF is limited by net.core.wmem_max.
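
Setting the buffer size explicitly from a program looks roughly like the following Python sketch. `request_recv_buffer` is a hypothetical helper name; the calls themselves are the standard `socket.setsockopt` and `socket.getsockopt`. On Linux the value read back usually differs from the requested one, because the kernel doubles it for bookkeeping and caps the request at net.core.rmem_max.

```python
import socket


def request_recv_buffer(sock, size):
    """Ask for a fixed SO_RCVBUF and return the size the kernel reports.

    Setting SO_RCVBUF turns off recv buffer auto-tuning for this socket.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
```

The same pattern with `socket.SO_SNDBUF` applies to the send buffer.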

2.3 Packet Transmission

The whole process is as follows:

    • The application layer copies data into the TCP send buffer, blocking if there is not enough space. copy app buffer to TCP send buffer as app packet
    • When the TCP send buffer holds data, or an ACK has to be sent, the TCP layer assembles an IP packet and pushes it to the IP layer. copy TCP send buffer to IP send buffer as TCP packet
    • The IP layer allocates an sk_buff from kernel memory, wraps the IP data into packet data, and puts it (as a pointer) into the qdisc, whose queue length is controlled by txqueuelen. If the queue is full, the send blocks and the blocking is fed back to the TCP layer. copy IP send buffer to packet data as IP packet
    • When the NIC driver detects data in the qdisc, it calls the NIC DMA engine to send the packet. After the send completes, the NIC interrupts the CPU so that the packet data memory is released back to kernel memory.
    • The relevant kernel parameters are:
      • /proc/sys/net/ipv4/tcp_wmem, which is very similar to tcp_rmem.
      • #note: by analogy with the above, the related parameters are net.core.wmem_default and net.core.wmem_max.

#note: With the help of Wangyx, I found the qdisc queue length parameter txqueuelen in the ifconfig output: txqueuelen = 1000.

$ ifconfig
eth0      Link encap:Ethernet  HWaddr 12:31:40:00:49:d1
          inet addr:10.170.78.31  Bcast:10.170.79.255  Mask:255.255.254.0
          inet6 addr: fe80::1031:40ff:fe00:49d1/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:13028359 errors:0 dropped:0 overruns:0 frame:0
          TX packets:9504902 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2464083770 (2.4 GB)  TX bytes:20165782073 (20.1 GB)
          Interrupt:25

The following is excerpted from "Summary of Linux TCP queue parameters":

The qdisc (queueing discipline) sits between the IP layer and the ring buffer of the network card. As we already know, the ring buffer is a simple FIFO queue, which keeps the NIC driver layer simple and fast. The qdisc implements advanced traffic management features, including traffic classification, prioritization, and traffic shaping (rate shaping). You can configure the qdisc with the tc command.

The queue length of the qdisc is set by txqueuelen. Unlike the queue length for received packets, which is controlled by the kernel parameter net.core.netdev_max_backlog, txqueuelen is associated with the NIC.
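
Since txqueuelen is reported per NIC in the ifconfig output, it can be pulled out with a small Python sketch like the one below. `parse_txqueuelen` is a hypothetical helper; the regular expression assumes the value appears either as "txqueuelen:1000" (the older ifconfig layout shown above) or as "txqueuelen 1000" (the layout used by newer net-tools).

```python
import re


def parse_txqueuelen(ifconfig_output):
    """Return the first txqueuelen value found in ifconfig output, or None."""
    match = re.search(r"txqueuelen[:\s]\s*(\d+)", ifconfig_output)
    return int(match.group(1)) if match else None
```

Passing the text of the ifconfig output above yields 1000.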

2.4 Congestion Control

    • The initial state is slow start.
    • cwnd (congestion window): the maximum number of packets that can be sent at one time.
    • ssthresh (slow start threshold): the threshold at which slow start ends.
    • MSS (maximum segment size): the largest segment size, which depends on the MTU of the underlying network.
    • Why is a chunked download over multiple TCP connections faster than a download over a single connection?
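
How cwnd and ssthresh interact can be illustrated with a toy Python sketch. This is the simplified textbook model (cwnd doubles each round trip during slow start, then grows by one segment per round trip), with no packet loss; it is an assumption for illustration and not the actual Linux implementation. `cwnd_trace` is a hypothetical name.

```python
def cwnd_trace(ssthresh, rounds, initial_cwnd=1):
    """Return the congestion window (in segments) at the start of each round trip.

    Below ssthresh the window doubles per round (slow start); at or above
    ssthresh it grows by one segment per round (congestion avoidance).
    Loss handling is deliberately left out of this sketch.
    """
    cwnd = initial_cwnd
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)
        else:
            cwnd += 1
    return trace
```

With `cwnd_trace(8, 6)` the window grows as 1, 2, 4, 8, 9, 10. Each connection starts from a small window in this model, which is one common explanation for why several parallel connections ramp up faster in aggregate than one.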
3 Kernel Panic

4 Disk IO
    • Linux IO Stack Diagram
    • Linux Storage Stack Diagram
5 Program return value issues

First look at the following Java program:

/* coding:utf-8
 * Copyright (C) dirlt */
public class X {
    public static void main(String[] args) {
        System.exit(1);
    }
}

This Java program is then called from Python, and the return value is printed:

#!/usr/bin/env python
#coding:utf-8
#Copyright (C) dirlt
import os
print(os.system('java X'))

The return value is 256 rather than 1. The explanation is this:

A 16-bit number, whose low byte is the signal number that killed the process, and whose high byte is the exit status (if the signal number is zero); the high bit of the low byte is set if a core file was produced.

However, when the following Python program is run, echo $? reports a return value of 0 rather than 256, because only the low 8 bits of the exit status are kept (256 & 0xFF = 0):

#!/usr/bin/env python
#coding:utf-8
#Copyright (C) dirlt
code = 256
exit(code)
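
The two observations can be checked with a small Python sketch of the 16-bit status layout quoted above. `decode_wait_status` and `truncated_exit_code` are hypothetical helper names; the bit layout is the one described in the quoted explanation and applies to POSIX systems.

```python
def decode_wait_status(status):
    """Split a 16-bit wait status into (exit_code, signal_number, core_dumped)."""
    exit_code = (status >> 8) & 0xFF      # high byte: exit status
    signal_number = status & 0x7F         # low 7 bits: terminating signal
    core_dumped = bool(status & 0x80)     # high bit of the low byte
    return exit_code, signal_number, core_dumped


def truncated_exit_code(code):
    """Exit status as the parent sees it: only the low 8 bits survive."""
    return code & 0xFF
```

So the 256 returned by os.system decodes to exit code 1 with no signal, and a program that calls exit(256) is seen by the shell as exiting with 0.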
6 DP8 NIC Problem

At the time, network traffic on DP8 dropped from a very large value to a very small one. Checking /proc/net/netstat showed that the following counters on DP8 differed greatly from those on other machines (by 1 to 2 orders of magnitude):

    • TCPDirectCopyFromPrequeue
    • TCPHPHitsToUser
    • TCPDSACKUndo
    • TCPLossUndo
    • TCPLostRetransmit
    • TCPFastRetrans
    • TCPSlowStartRetrans
    • TCPSackShiftFallback
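
Comparing these counters between two machines can be automated with a Python sketch along the following lines. It assumes the usual /proc/net/netstat layout, where each group is a pair of lines: a header line of counter names and a line of values, both starting with the same prefix such as "TcpExt:". `parse_netstat` and `large_gaps` are hypothetical helper names, and the factor of 10 is an arbitrary threshold standing in for "an order of magnitude".

```python
def parse_netstat(text):
    """Parse /proc/net/netstat style text into {'TcpExt.Name': value}."""
    counters = {}
    lines = [line for line in text.splitlines() if line.strip()]
    # Lines come in (names, values) pairs sharing the same prefix.
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        _, numbers = values.split(":", 1)
        for name, number in zip(names.split(), numbers.split()):
            counters[prefix.strip() + "." + name] = int(number)
    return counters


def large_gaps(first, second, factor=10):
    """Names of counters that differ by at least `factor` between two hosts."""
    shared = sorted(set(first) & set(second))
    return [
        name for name in shared
        if max(first[name], second[name]) >= factor * max(min(first[name], second[name]), 1)
    ]
```

Feeding it the contents of /proc/net/netstat from DP8 and from a healthy machine would list the counters worth a closer look.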

After that, the following clues were found in dmesg:

$ dmesg | grep eth0
[15.635160] eth0: Broadcom NetXtreme II BCM5716 1000Base-T (C0) PCI Express f
[15.736389] bnx2: eth0: using MSIX
[15.738263] ADDRCONF(NETDEV_UP): eth0: link is not ready
[37.848755] bnx2: eth0 NIC Copper Link is Up, Mbps full duplex
[37.850623] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[1933.934668] bnx2: eth0: using MSIX
[1933.936960] ADDRCONF(NETDEV_UP): eth0: link is not ready
[1956.130773] bnx2: eth0 NIC Copper Link is Up, at Mbps full duplex
[1956.132625] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[4804526.542976] bnx2: eth0 NIC Copper Link is Down
[4804552.008858] bnx2: eth0 NIC Copper Link is Up, at Mbps full duplex

The log line "[4804552.008858] bnx2: eth0 NIC Copper Link is Up" shows the link coming back up after going down, and the speed of the NIC on DP8 was then recognized as only 100 Mbps.

The possible causes are as follows:

    • The network cable or RJ45 connector is of poor quality or aging, or the connector was not crimped properly, leaving the cable with a poor contact or a short circuit. Re-crimp the connector or replace the cable; a reliable Cat 6 cable with Cat 6 connectors is recommended.
    • Under Local Area Connection > right-click > Properties > Configure > Advanced > Speed & Duplex, the setting is wrong; change it to auto-negotiation or 1000 Mbps full duplex.
    • A hardware device connected to the NIC, such as a switch or router, is faulty, or the device itself is only 100 Mbps (when gigabit is connected to 100 Mbps, gigabit is backward compatible and falls back to 100 Mbps).
    • Electromagnetic interference can also drop the link to 100 Mbps, so avoid running network cable through the same conduit as electrical wiring wherever possible (thanks to forum member tchack).

Our network cables are all provided by World XX Union, so their quality should be good. The following two cases need to be ruled out first:

    • A network cable problem (test: swap in a different cable).
    • The switch port that DP8 is connected to is faulty (test: move DP8's cable to another port on the switch).
7 Modifying Resource Limits

Temporary changes can be made with ulimit. To change the limits permanently, edit /etc/security/limits.conf:

hadoop - nofile 102400
hadoop - nproc 40960
8 CPU temperature is too high

I ran into this problem on an Ubuntu PC; the obvious symptom was that everything ran slowly. The following log then appeared in syslog:

May  2 18:24:21 umeng-ubuntu-pc kernel: [1188.717609] CPU1: Core temperature/speed normal
May  2 18:24:21 umeng-ubuntu-pc kernel: [1188.717612] CPU0: Package temperature above threshold, cpu clock throttled (total events = 137902)
May  2 18:24:21 umeng-ubuntu-pc kernel: [1188.717615] CPU2: Package temperature above threshold, cpu clock throttled (total events = 137902)
May  2 18:24:21 umeng-ubuntu-pc kernel: [1188.717619] CPU1: Package temperature above threshold, cpu clock throttled (total events = 137902)
May  2 18:24:21 umeng-ubuntu-pc kernel: [1188.717622] CPU3: Package temperature above threshold, cpu clock throttled (total events = 137902)
9 Sync Hangup
    • kill -KILL fails to kill process: http://lists.freebsd.org/pipermail/freebsd-questions/2008-september/182821.html
    • Linux-Kernel Archive: BUG: sync's hangup forever in call_rwsem_down_read_failed: http://lkml.indiana.edu/hypermail/linux/kernel/1011.2/04099.html
10 Replacing glibc
    • linux - How to recover after deleting the symbolic link libc.so.6? - Stack Overflow: http://stackoverflow.com/questions/12249547/how-to-recover-after-deleting-the-symbolic-link-libc-so-6

@2013-05-23 https://docs.google.com/a/umeng.com/document/d/12dzJ3OhVlrEax3yIdz0k08F8tM8DDQva1wdrD3K49PI/edit The glibc version had a problem, but the attempt to fix it on dp45 ran into a problem of its own.

My planned sequence of operations was this:

    1. Copy dp20's glibc to my own directory: /home/dp/dirlt/libc-2.11.so
    2. Back up dp45's glibc: mv /lib64/libc-2.12.so /lib64/libc-2.12.bak.so (note that /lib64 also contains a symbolic link, libc.so.6 -> libc-2.12.so, and this is the file that programs actually look up)
    3. cp /home/dp/dirlt/libc-2.11.so /lib64/libc-2.12.so

However, after step 2, cp no longer worked, and neither did ls or other commands. The reason is simple: after step 2, libc.so.6 no longer pointed to an existing file, and basic commands such as cp and ls depend on this shared library.

~ $ ldd /bin/cp
        linux-vdso.so.1 =>  (0x00007fff9717f000)
        libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 (0x00007f5efb804000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f5efb5fc000)
        libacl.so.1 => /lib/x86_64-linux-gnu/libacl.so.1 (0x00007f5efb3f3000)
        libattr.so.1 => /lib/x86_64-linux-gnu/libattr.so.1 (0x00007f5efb1ee000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f5efae2f000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f5efac2a000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f5efba2d000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f5efaa0d000)

@2013-08-03

A copy of the C library is found in an unexpected directory | blog: http://blog.i-al.net/2013/03/a-copy-of-the-c-library-was-found-in-an-unexpected-directory/

The link above gives a way to upgrade the GLIBC

    • sudo su - root # first switch to the root account
    • mv libc.so librt.so /root # move glibc and the related .so files into root's home directory; be careful not to move the symbolic link files
    • LD_PRELOAD=/root/libc.so:/root/librt.so bash # at this point bash cannot find glibc and the other .so files, so LD_PRELOAD is needed to preload them
    • apt-get install # install and upgrade glibc with apt-get inside this bash
11 Allow sudo to Run Without a TTY

Edit the /etc/sudoers file and comment out the following line:

Defaults requiretty
12 SSH Proxy

http://serverfault.com/questions/37629/how-do-i-do-multihop-scp-transfers

    • The destination machine is D, the port is 16021, the user is X
    • The springboard machine is T, the port is 18021, the user is Y
    • The client needs to establish a trust relationship with x@D and y@T
    • Method A
      • A link is established between T and D with a forwarding port P configured, so that all traffic to T:P is forwarded to D:16021
      • On T, run ssh -L "*:5502:D:16021" <user>@<host> # the forwarding port is 5502
        • -o ServerAliveInterval=60 # I believe the unit is seconds. With this option, keepalive messages are exchanged with the server every 60s, so the connection is not dropped even when no data is transferred for a long time.
      • ssh -p 5502 x@T or scp -P 5502 <file> x@T:<path-at-D>
    • Method B
      • scp can be given a ProxyCommand that uses the nc command to reach D
      • scp -o ProxyCommand="ssh -p 18021 y@T 'nc D 16021'" <file> x@D:<path-at-D>
13 Modifying the Maximum Number of Open File Handles
    • http://blog.csdn.net/superchanon/article/details/13303705
    • http://unix.stackexchange.com/questions/127777/how-to-configure-the-process-open-file-limit-of-a-user
    • https://www.kernel.org/doc/Documentation/sysctl/fs.txt

First, you need to modify the system limit

    • /proc/sys/fs/file-max # maximum number of open file handles across all processes
    • /proc/sys/fs/nr_open # maximum number of open file handles for a single process
    • /proc/sys/fs/file-nr # number of file handles currently open on the system

Then modify the per-user (per-process) limits:

    • /etc/security/limits.conf
    • ulimit
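
Both levels can be inspected from a program, as in this minimal Python sketch. `parse_file_nr` and `process_nofile_limit` are hypothetical helper names; the sketch assumes /proc/sys/fs/file-nr holds three numbers (allocated handles, allocated but unused handles, and the system maximum), and it uses the standard `resource` module, which is available on Unix only.

```python
import resource


def parse_file_nr(text):
    """Parse the three fields of /proc/sys/fs/file-nr."""
    allocated, unused, maximum = (int(field) for field in text.split())
    return {"allocated": allocated, "unused": unused, "max": maximum}


def process_nofile_limit():
    """Return the (soft, hard) open-file limit of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

The soft value returned by `process_nofile_limit()` is what `ulimit -n` reports in a shell.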
14 apt-get Hang

When using apt-get on Ubuntu, something abnormal may happen and we terminate apt-get directly. But this can leave apt-get itself in an abnormal state, so that it can no longer be started. Looking at the process list shows some suspicious processes:

$ ps aux | grep "apt"
root      3587  0.0  0.0  36148 22800 ?        Ds   Oct08   0:00 /usr/bin/dpkg --status-fd --unpack --auto-deconfigure /var/cache/apt/archives/sgml-data_2.0.4_all.deb
root      9579  0.0  0.0  35992 22744 ?        Ds   Oct19   0:00 /usr/bin/dpkg --status-fd --unpack --auto-deconfigure /var/cache/apt/archives/iftop_0.17-16_amd64.deb
root     25957  0.0  0.0  36120 22796 ?        Ds   Nov05   0:00 /usr/bin/dpkg --status-fd --unpack --auto-deconfigure /var/cache/apt/archives/iftop_0.17-16_amd64.deb /var/cache/apt/archives/iotop_0.4-1_all.deb
dp       30586  0.0  0.0   7628  1020 pts/2    S+   08:59   0:00 grep --color=auto apt

The parent of these processes is init and their state is uninterruptible sleep, so kill -9 cannot terminate them; the only way to solve the problem is to reboot the machine. On this question, see the answers to "How to stop 'uninterruptible' process on Linux?" - Stack Overflow http://stackoverflow.com/questions/767551/how-to-stop-uninterruptible-process-on-linux

    • Simple answer: you cannot. Longer answer: the uninterruptable sleep means the process will not be woken up by signals. It can be only woken up by what it's waiting for. When I get such situations eg. with CD-ROM, I usually reset the computer by using suspend-to-disk and resuming.
    • The D state basically means that the process is waiting for disk I/O, or other block I/O that can't be interrupted. Sometimes this means the kernel or device is feverishly trying to read a bad block (especially from an optical disk). Sometimes it means there's something else. The process cannot be killed until it gets out of the D state. Find out what it is waiting for and fix that. The easy way is to reboot. Sometimes removing the disk in question helps, but that can be rather dangerous: unfixable catastrophic hardware failure if you don't know what you're doing (read: smoke coming out).
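
Spotting such processes in ps output can be sketched in Python as follows. `uninterruptible` is a hypothetical helper; it assumes the standard `ps aux` column order (USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND) and treats any STAT value beginning with "D" as uninterruptible sleep.

```python
def uninterruptible(ps_output):
    """Return (pid, command) for every process in the D state."""
    found = []
    for line in ps_output.splitlines():
        # Split into at most 11 fields so the command keeps its spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or malformed row
        if fields[7].startswith("D"):
            found.append((int(fields[1]), fields[10]))
    return found
```

Run against the listing above, it would report the three dpkg processes and skip the grep itself.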
15 Implementing a Semaphore with a Lock
typedef struct sema {
    lock_t lock;   // protects count and q
    int count;     // a negative value is the number of waiting processes
    queue q;       // contexts of the processes blocked on this semaphore
} sema_t;

void init_sema(sema_t* sema, int init_cnt) {
    init_lock(&sema->lock);
    init_queue(&sema->q);
    sema->count = init_cnt;
}

void p(sema_t* sema) {
    lock(&sema->lock);
    sema->count--;
    if (sema->count < 0) {
        push(&sema->q, current_process_context());
        unlock(&sema->lock);
        swtch(); // switch to another process.
        return;
    }
    unlock(&sema->lock);
}

void v(sema_t* sema) {
    lock(&sema->lock);
    sema->count++;
    if (sema->count <= 0) {
        pcs_ctx* ctx = pop(&sema->q);
        unlock(&sema->lock);
        enqueue(&running_queue, ctx); // make the waiter runnable again
        return;
    }
    unlock(&sema->lock);
}
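
The same idea can be tried out in user space with a Python sketch. This is an analogue, not the kernel code: a `threading.Lock` stands in for the lock, a deque of `threading.Event` objects stands in for the queue of process contexts, and waiting on an event stands in for swtch(). The class name `LockSemaphore` is hypothetical.

```python
import collections
import threading


class LockSemaphore:
    """Counting semaphore built from one lock and a queue of waiters."""

    def __init__(self, count):
        self._lock = threading.Lock()
        self._count = count
        self._waiters = collections.deque()

    def p(self):
        """Acquire: decrement the count and block if it went negative."""
        with self._lock:
            self._count -= 1
            if self._count >= 0:
                return
            wakeup = threading.Event()
            self._waiters.append(wakeup)
        # Wait outside the lock so that v() can run and wake us up.
        wakeup.wait()

    def v(self):
        """Release: increment the count and wake one waiter if any exist."""
        with self._lock:
            self._count += 1
            if self._count <= 0:
                self._waiters.popleft().set()
```

Because a waiter is queued under the same lock that guards the count, a count of -n always means exactly n queued waiters, which is why v() can pop without checking for an empty queue.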
