Common Linux network errors and their analysis
The original plan was to cover errors and monitoring separately, but the two turned out to be tightly coupled: errors are found through monitoring items, and diagnosing their cause also depends on monitoring items. So the two are simply merged here.
For many errors, the error message alone does not tell you exactly where the problem is; it may be ambiguous or even misleading. The best approach is "Show me the code".
What follows is a brief introduction to network-related debugging and inspection methods, including error-related content.
There are several ways to view network status: the netlink mechanism, which ss uses (see the earlier netstat vs. ss comparison of status-viewing commands), and procfs. The former is more efficient, while the latter is more convenient to read.
Only the procfs side is described here. The relevant kernel implementation is in net/core/net-procfs.c, so if you are unfamiliar with the contents of a file under /proc/net, you can look up the statistics directly in the kernel source.
Viewing dropped packets
Packets can be dropped for many reasons. For example, if a switch's uplink or downlink port is saturated, or the link is faulty, the switch may drop the packet; load-balancing devices, both hardware and software, can drop packets too.
Here we only look at the possible causes of packet loss on the local machine.
The operating system cannot keep up and drops the data
There are two cases. In the first, the network card finds that the operating system is not keeping up and drops packets; you can read the following file:
$ cat /proc/net/dev
Each network interface has one row of statistics. Counting the interface name as the first column, the 4th column (errs) is the number of packets received with errors, and the 5th column (drop) is the number of packets that were received but then dropped.
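The column lookup described above can also be done programmatically. The following is a minimal Python sketch, not part of the original article; the function name parse_net_dev and the sample text are illustrative assumptions, and the layout assumed is the usual two header lines followed by one "name: counters" row per interface.

```python
def parse_net_dev(text):
    """Return {interface: {"rx_errs": int, "rx_drop": int}} from /proc/net/dev text."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) < 4:
            continue  # not a statistics row
        # After the interface name: bytes, packets, errs, drop, ...
        stats[name.strip()] = {"rx_errs": int(fields[2]), "rx_drop": int(fields[3])}
    return stats


# Hypothetical sample in the layout of /proc/net/dev (values are made up).
SAMPLE_NET_DEV = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  eth0:   50000     400    3    7    0     0          0         0    20000     300    0    0    0     0       0          0
"""

rx = parse_net_dev(SAMPLE_NET_DEV)
```

On a real machine, the text would come from open("/proc/net/dev").read(). Splitting on the colon first keeps the parser working even when the interface name and the byte counter are not separated by a space.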
The second case concerns network card drivers implemented with the traditional non-NAPI interface. Each CPU has a queue, and when the number of packets buffered in that queue exceeds net.core.netdev_max_backlog, the NIC driver drops the packet. See the following file:
$ cat /proc/net/softnet_stat
Each CPU has one row of statistics, printed in hexadecimal, and the second column is the number of packets dropped on the corresponding CPU.
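Because the counters in this file are hexadecimal, they are easy to misread by eye. The sketch below is an illustration only; the function name and the sample rows are assumptions. It converts the second column of each row to a decimal drop count.

```python
def parse_softnet_dropped(text):
    """Return the dropped-packet counter (2nd column, hex) of each row as an int."""
    dropped = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        dropped.append(int(fields[1], 16))
    return dropped


# Hypothetical sample: two rows, one per CPU, with made-up values.
SAMPLE_SOFTNET = """\
0001a2b3 00000000 00000005 00000000 00000000 00000000 00000000 00000000 00000000
0000ffff 0000000a 00000000 00000000 00000000 00000000 00000000 00000000 00000000
"""

drops = parse_softnet_dropped(SAMPLE_SOFTNET)
```

With this sample the second row has dropped 0xa, that is 10, packets. On a real machine the text would be read from /proc/softnet_stat or /proc/net/softnet_stat.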
The application cannot keep up and the operating system drops packets
The kernel keeps two counters:
ListenOverflows: incremented by 1 when a new connection request arrives while the socket's listen queue is full, that is, when the application is not accepting connections fast enough;
ListenDrops: incremented by 1 in the case above, and also when there is not enough memory to allocate the socket-related data structures for a new connection; other exceptional cases increment it as well.
These correspond to the 21st column (ListenOverflows) and 22nd column (ListenDrops) of the TcpExt lines in the following file. The column positions can differ between kernel versions, so check them against the header row. They can be viewed with the following command:
$ cat /proc/net/netstat | awk '/TcpExt/{print $21,$22}'
If you use the netstat command (netstat -s), then once packets have been dropped you will see lines containing "times the listen queue of a socket overflowed" and "SYNs to LISTEN sockets ignored", with the count at the start of each line; if a value is 0, the corresponding line is not printed.
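Looking the counters up by name avoids depending on fixed column numbers. The following Python sketch is illustrative only; the function name and the shortened sample are assumptions. It pairs each header line of /proc/net/netstat with the value line that follows it.

```python
def parse_proc_netstat(text):
    """Return {"TcpExt": {counter_name: value, ...}, ...} from /proc/net/netstat text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    result = {}
    # The file consists of pairs: a line of names, then a line of values.
    for header, values in zip(lines[0::2], lines[1::2]):
        h_prefix, _, names = header.partition(":")
        v_prefix, _, numbers = values.partition(":")
        if h_prefix != v_prefix:
            continue  # unexpected layout, skip this pair
        result[h_prefix] = dict(zip(names.split(), (int(v) for v in numbers.split())))
    return result


# Hypothetical, heavily shortened sample (a real file has many more counters).
SAMPLE_NETSTAT = """\
TcpExt: SyncookiesSent ListenOverflows ListenDrops
TcpExt: 0 12 15
IpExt: InNoRoutes InTruncatedPkts
IpExt: 0 0
"""

counters = parse_proc_netstat(SAMPLE_NETSTAT)
```

After parsing, counters["TcpExt"]["ListenOverflows"] and counters["TcpExt"]["ListenDrops"] hold the two values discussed above, regardless of which columns they occupy.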
Out of memory
Let's look directly at the code. In the kernel, the corresponding code is as follows.
bool tcp_check_oom(struct sock *sk, int shift)
{
    bool too_many_orphans, out_of_socket_memory;

    too_many_orphans = tcp_too_many_orphans(sk, shift);
    out_of_socket_memory = tcp_out_of_memory(sk);

    if (too_many_orphans)
        net_info_ratelimited("too many orphaned sockets\n");
    if (out_of_socket_memory)
        net_info_ratelimited("out of memory -- consider tuning tcp_mem\n");
    return too_many_orphans || out_of_socket_memory;
}

static int tcp_out_of_resources(struct sock *sk, bool do_reset)
{
    struct tcp_sock *tp = tcp_sk(sk);
    int shift = 0;

    /* If peer does not open window for long time, or did not transmit
     * anything for long time, penalize it. */
    if ((s32)(tcp_time_stamp - tp->lsndtime) > 2*TCP_RTO_MAX || !do_reset)
        shift++;

    /* If some dubious ICMP arrived, penalize even more. */
    if (sk->sk_err_soft)
        shift++;

    if (tcp_check_oom(sk, shift)) {
        /* Catch exceptional cases, when connection requires reset.
         *      1. Last segment was sent recently. */
        if ((s32)(tcp_time_stamp - tp->lsndtime) <= TCP_TIMEWAIT_LEN ||
            /*  2. Window is closed. */
            (!tp->snd_wnd && !tp->packets_out))
            do_reset = true;
        if (do_reset)
            tcp_send_active_reset(sk, GFP_ATOMIC);
        tcp_done(sk);
        NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPABORTONMEMORY);
        return 1;
    }
    return 0;
}
As shown above, memory can be reported as insufficient in two scenarios:
There are too many orphan sockets, which is common on some front-end servers.
The memory allocated to TCP really is too small, so it runs out.
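To make the penalty logic in tcp_out_of_resources easier to follow, here is a small user-space model of how shift is computed. It is a sketch only, not kernel code: the function name penalty_shift, its parameters, and the use of seconds instead of jiffies are all assumptions made for illustration, with rto_max_seconds standing in for TCP_RTO_MAX.

```python
def penalty_shift(idle_seconds, do_reset, has_soft_error, rto_max_seconds=120):
    """Model of the shift value computed before the out-of-memory check.

    idle_seconds    -- time since the last segment was sent (models
                       tcp_time_stamp - tp->lsndtime)
    do_reset        -- the do_reset argument passed by the caller
    has_soft_error  -- models a non-zero sk->sk_err_soft
    rto_max_seconds -- stand-in for TCP_RTO_MAX (assumed default)
    """
    shift = 0
    # Long silence, or a caller that did not ask for a reset: penalize once.
    if idle_seconds > 2 * rto_max_seconds or not do_reset:
        shift += 1
    # A pending soft error (for example from a dubious ICMP): penalize again.
    if has_soft_error:
        shift += 1
    return shift


no_penalty = penalty_shift(idle_seconds=1, do_reset=True, has_soft_error=False)
one_penalty = penalty_shift(idle_seconds=1, do_reset=False, has_soft_error=False)
two_penalties = penalty_shift(idle_seconds=300, do_reset=True, has_soft_error=True)
```

The result is always 0, 1, or 2, and it is this value that is passed on to the orphan check.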
Not enough memory
This case is easier to troubleshoot: you only need to check how much memory is allocated to TCP and how much is currently in use. Note that most configuration items are in bytes, whereas this one is in pages, typically 4 KB each.
First, look at how much memory is allocated to TCP.
$ cat /proc/sys/net/ipv4/tcp_mem
183474 244633 366948
In short, the three values are the threshold below which TCP is under no memory pressure, the threshold at which it enters pressure mode, and the upper memory limit; an error is reported when the last value is reached.
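Since the values are in pages, converting them to bytes makes them easier to compare with other settings. The following is a minimal sketch assuming a 4096-byte page; the function name and dictionary keys are illustrative, not kernel names.

```python
def parse_tcp_mem(text, page_size=4096):
    """Convert the three tcp_mem values from pages to bytes.

    page_size is an assumption (4 KB); pass the real page size if it differs.
    """
    low, pressure, high = (int(v) for v in text.split())
    return {
        "low_bytes": low * page_size,
        "pressure_bytes": pressure * page_size,
        "high_bytes": high * page_size,
    }


limits = parse_tcp_mem("183474 244633 366948")
high_mib = limits["high_bytes"] / (1024 * 1024)
```

For the values shown above, the upper limit of 366948 pages corresponds to 1503019008 bytes, roughly 1.4 GiB. On Linux the actual page size can be obtained in Python with os.sysconf("SC_PAGE_SIZE").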
Next, look at the memory currently in use.
$ cat /proc/net/sockstat
sockets: used 855
TCP: inuse orphan 1 tw 0 alloc mem 3
UDP: inuse mem 10
UDPLITE: inuse 0
RAW: inuse 1
FRAG: inuse 0 memory 0
The mem field is the number of pages in use. If it is small compared with the tcp_mem configuration, the problem is probably caused by orphan sockets.
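The comparison between the mem field and tcp_mem can be sketched as follows. This is an illustration under assumptions: the function names and the sample text are made up, and the three-way classification is a simplification of the kernel's actual pressure handling.

```python
def parse_sockstat(text):
    """Return {"TCP": {"inuse": 37, ...}, ...} from /proc/net/sockstat text."""
    result = {}
    for line in text.splitlines():
        proto, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "NAME: key value ..." line
        tokens = rest.split()
        # Tokens alternate between a key and its integer value.
        result[proto.strip()] = {k: int(v) for k, v in zip(tokens[0::2], tokens[1::2])}
    return result


def tcp_memory_state(mem_pages, tcp_mem):
    """Classify TCP memory use (in pages) against the three tcp_mem thresholds."""
    low, pressure, high = tcp_mem
    if mem_pages > high:
        return "over limit"
    if mem_pages > pressure:
        return "above pressure threshold"
    return "below pressure threshold"


# Hypothetical sample text in the sockstat layout.
SAMPLE_SOCKSTAT = """\
sockets: used 855
TCP: inuse 37 orphan 14 tw 8 alloc 39 mem 9
UDP: inuse 4 mem 10
"""

tcp = parse_sockstat(SAMPLE_SOCKSTAT)["TCP"]
state = tcp_memory_state(tcp["mem"], (183474, 244633, 366948))
```

With only 9 pages in use against a limit of 366948, memory is clearly not the bottleneck in this sample, which points towards the orphan check instead.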
Orphan sockets
Let's start with what an orphan socket is: simply put, it is a socket that is not associated with any file descriptor. For example, when the application calls close() to close a connection, the socket becomes an orphan, but the sock structure remains for a while, until the TCP protocol has finished closing the connection.
An orphan socket is of no use to the application, so the kernel wants to keep the number of orphans as small as possible. For short-lived requests such as HTTP, orphans are more likely to occur.
The maximum number of orphans allowed by the system and the current number of orphans can be viewed as follows:
$ cat /proc/sys/net/ipv4/tcp_max_orphans
32768
$ cat /proc/net/sockstat
...
TCP: inuse 37 orphan 14 tw 8 alloc 39 mem 9
...
You may find that the orphan count in sockstat is much smaller than tcp_max_orphans, even though the kernel reports too many orphaned sockets.
In fact, the code shows that there is a shift offset whose value lies in the range [0, 2].
static inline bool tcp_too_many_orphans(struct sock *sk, int shift)
{
    struct percpu_counter *ocp = sk->sk_prot->orphan_count;
    int orphans = percpu_counter_read_positive(ocp);

    if (orphans << shift > sysctl_tcp_max_orphans) {
        orphans = percpu_counter_sum_positive(ocp);
        if (orphans << shift > sysctl_tcp_max_orphans)
            return true;
    }
    return false;
}
That is, in some scenarios orphans are penalized: the orphan count is treated as 2x or even 4x its real value, which explains the observation above.
If this is the case, you can adjust the tcp_max_orphans value appropriately for your situation.
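The effect of the shift can be worked through with a few numbers. The sketch below models only the comparison in tcp_too_many_orphans; the Python function names are illustrative assumptions, and the per-CPU counter handling is left out.

```python
def too_many_orphans(orphans, shift, max_orphans):
    """Model of the kernel comparison: (orphans << shift) > tcp_max_orphans."""
    return (orphans << shift) > max_orphans


def effective_orphan_limit(max_orphans, shift):
    """Largest orphan count that still passes the check for a given shift."""
    return max_orphans >> shift


MAX_ORPHANS = 32768  # the tcp_max_orphans value shown in the article

# Effective limits for shift 0, 1 and 2.
limits_by_shift = [effective_orphan_limit(MAX_ORPHANS, s) for s in range(3)]

# 10000 orphans pass without a penalty but fail with shift == 2.
unpenalized = too_many_orphans(10000, 0, MAX_ORPHANS)
penalized = too_many_orphans(10000, 2, MAX_ORPHANS)
```

With tcp_max_orphans set to 32768, a shift of 2 lowers the effective limit to 8192 orphans, so the message can appear while the count in sockstat is still well below the configured maximum.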
Summary
Besides a genuine memory shortage, the kernel's penalty mechanism can also cause the orphan check to trip early, producing a false positive.
This article is from the "New One" blog; please contact the author before reproducing it.