Operation not permitted for 72 hours

Tags: stack trace
0. Problem description

While testing the product, we found that the main process vpnd kept logging the following error:
2013-07-18 13:05:13 www.1.com/192.168.200.220:65527 write UDPv4 []: Operation not permitted (code=1)
2013-07-18 13:05:13 www.1.com/192.168.200.220:65527 write UDPv4 []: Operation not permitted (code=1)
2013-07-18 13:05:13 www.1.com/192.168.200.220:65527 write UDPv4 []: Operation not permitted (code=1)
2013-07-18 13:05:13 www.1.com/192.168.200.220:65527 write UDPv4 []: Operation not permitted (code=1)
2013-07-18 13:05:13 www.1.com/192.168.200.220:65527 write UDPv4 []: Operation not permitted (code=1)

This kind of error appears "sometimes, but not always, and not never". I tested for two days over the weekend without a single error, yet on Monday it failed again. Problems like this are the worst kind.
My first step when troubleshooting is always Google. If nothing turns up, I dig in myself, in my own style, as deep as it takes. Once I have a solution, I post it on all of my blogs, so that the next person who googles the error finds one more answer. This is how knowledge accumulates in the internet age.
After a long time on Google, nothing conclusive came up. Most descriptions of this error blame iptables dropping packets. I really wanted to find an answer that had nothing to do with iptables; intuition told me that such an answer would basically be the real cause (my iptables rules were not dropping anything at all), but I could not find one. So I had to work it out myself, and thus the 72 hours began. Over the past three days I stayed up late two nights in a row and barely wanted to speak to anyone during the day. If it had not been for the drink I had over my homework last night, it might have taken closer to 10 hours; either way, these 72 hours are worth remembering.
1. It turns out to be something else, not iptables

ftrace is a good thing. As I have mentioned before, it is simple to use and supported by most stock distribution kernels. So I started collecting debugging information:

mount -t debugfs debug /sys/kernel/debug

Then let's see what is available. available_tracers did not contain the function tracer, so the kernel call stack could not be traced directly. Looking at the events instead, I found kfree_skb. Once a packet is dropped it must be freed, so this event can be used:

echo skb:kfree_skb >/sys/kernel/debug/tracing/set_event
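As a quick sanity check, the tracers and events that the running kernel actually exposes can be listed first (a minimal sketch, assuming debugfs is mounted at /sys/kernel/debug as above):

cat /sys/kernel/debug/tracing/available_tracers
grep kfree_skb /sys/kernel/debug/tracing/available_events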

However, a packet that successfully reaches a user socket, or is successfully sent out through the NIC, is also freed, so those two kinds of free have to be filtered out. But how? With tracing enabled, cat /sys/kernel/debug/tracing/trace produces a lot of output like this:
<...>-14297 [002] 1286.492153: kfree_skb: skbaddr=ffff88007c1e0100 protocol=2048 location=<kfree_skb call address>
What to do? Since stack traces were not available, the only option was to look up each "kfree_skb call address" in /proc/kallsyms. Note that /proc/kallsyms will not contain an exact match: kallsyms lists the start addresses of kernel functions, while the trace shows the address of a call site inside a function, so you have to find the nearest function whose address lies just below the call address.
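A minimal sketch of that lookup (illustrative only; paste the call address printed by the trace into ADDR; since kallsyms addresses are fixed-width lowercase hex, plain string comparison is enough to find the last symbol at or below it):

ADDR=ffffffff812b2887    # the "kfree_skb call address" copied from the trace
grep ' [tT] ' /proc/kallsyms | sort | awk -v a=$ADDR '$1 <= a { last = $0 } END { print last }'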
Following this approach, I identified the "normal" kfree_skb call sites one by one and echoed them into the following file:
/sys/kernel/debug/tracing/events/skb/kfree_skb/filter
Finally, I got a long filter expression:

echo "location!=0xffffffffa026d04a && location!=0xffffffffa026c82a &&location!=0xffffffff812d6a6f && location !=0xffffffff812a092c &&location!=0xffffffff812aa03e && location != 0xffffffff812f726f &&location!=0xffffffffa01ee3af &&location!=0xffffffff812aaf82 && location!=0xffffffff812965a3 &&location != 0xffffffff812f622c &&location!=0xffffffff812c8324 && location!=0xffffffff812aace1 && location!=0xffffffff81294ef7 && location!=0xffffffff812a12b8 &&location!=0xffffffffa026d04a" >/sys/kernel/debug/tracing/event/skb/kfree_skb/filter 

Then enable the event:

echo 1 >/sys/kernel/debug/tracing/events/skb/kfree_skb/enable

To keep the information flowing, I wrote the following command. It is not pretty, because I only wanted to locate the problem quickly and did not care about style:

while true; do echo $(date); cat trace|grep -v FUNCTION|grep -v "tracer: nop"|grep -v "#              | |       |"|grep -v "#";>trace;sleep 4;done

The purpose of this command is to keep printing the latest trace entries and to clear the old ones after each pass (a cleaner equivalent is sketched below).
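For reference, a cleaner loop with the same behavior, run from the tracing directory (a sketch; it prints any new kfree_skb events every few seconds and then clears the ring buffer):

cd /sys/kernel/debug/tracing
while true; do
    date
    grep -v '^#' trace      # drop the header lines, keep only real events
    > trace                 # clear the ring buffer
    sleep 4
done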
Then came a long wait for the problem to reappear. Hour after hour... and finally it showed up, with a flood of output:
<vpnd>-XX [002] 2876.582821: kfree_skb: skbaddr=ffff88007c1e0100 protocol=2048 location=ffffffff812b2887
Looking it up in kallsyms, the address turned out to be inside nf_hook_slow, which is the core function of Netfilter. Could it really be iptables after all? I still did not believe it, but there was no more detail to be had. If only I could get the call stack!
2. More detailed information is required.

Obviously, a kernel built without function-trace support cannot provide more detail through ftrace. What I really wanted was the call stack at the moment nf_hook_slow drops the packet. Recompiling the kernel would certainly solve that, but it felt like too drastic a step for the running setup, so I tried to hack around it myself and wrote the following module:

#include <linux/types.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/string.h>
#include <linux/smp_lock.h>
#include <linux/skbuff.h>
#include <linux/netdevice.h>
#include <linux/netfilter.h>

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Zhaoya <marywangran@126.com>");
MODULE_DESCRIPTION("xtables: DNAT from IP-pool");

/* Address of the original nf_hook_slow, taken from /proc/kallsyms */
int (*orig_nf_hook_slow)(u_int8_t, unsigned int, struct sk_buff *,
        struct net_device *, struct net_device *,
        int (*okfn)(struct sk_buff *), int) = (void *)0xffffffff8126a910;

/* Buffer that keeps a runnable copy of the original function body */
char code[200] = {0};

/* Used to call the saved copy of the original nf_hook_slow */
int (*_nf_hook_slow)(u_int8_t, unsigned int, struct sk_buff *,
        struct net_device *, struct net_device *,
        int (*okfn)(struct sk_buff *), int);

/* Replacement: call the saved original, then dump the stack when the
 * packet did not pass */
int new_nf_hook_slow(u_int8_t pf, unsigned int hook, struct sk_buff *skb,
        struct net_device *indev, struct net_device *outdev,
        int (*okfn)(struct sk_buff *), int hook_thresh)
{
        int ret = _nf_hook_slow(pf, hook, skb, indev, outdev, okfn, hook_thresh);
        if (ret != 1)
                dump_stack();
        return ret;
}

static int __init pool_target_init(void)
{
        /* 0xffffffff8126a910 is the address of the original nf_hook_slow.
         * Its code must be copied out and saved elsewhere before it is
         * overwritten. */
        char *addr = (char *)orig_nf_hook_slow;
        /* The kernel text is mapped read-only, so the page-table entry
         * covering it has to be made writable first. */
        unsigned long *pte = (unsigned long *)0xfffffffff126a161;
        unsigned long value;

        lock_kernel();
        value = *pte;
        value |= 0x02;          /* set the writable bit */
        *pte = value;
        /* Save the original function body into code[]. 200 bytes is a bit
         * more than the distance to the next function, so the final ret is
         * certainly included; copying a little extra does no harm. */
        memcpy(code, orig_nf_hook_slow, sizeof(code));
        /* Overwrite the start of the original function with the new one */
        memcpy(addr, new_nf_hook_slow, 40);
        /* From now on, call the saved copy */
        _nf_hook_slow = (void *)code;
        unlock_kernel();
        return 0;
}

static void __exit pool_target_exit(void)
{
        char *addr = (char *)0xffffffff8126a910;

        lock_kernel();
        memcpy(addr, code, sizeof(code));
        unlock_kernel();
}

module_init(pool_target_init);
module_exit(pool_target_exit);

The code is messy. It was hacked out of the original NAT module (reusing its Makefile was convenient), so even the naming makes little sense. I loaded it: error, panic!!
...
At this point I still did not have the courage to compile a new kernel
...
My frustration finally boiled over, and I decided to recompile the kernel!
3. Reconfigure and compile the kernel and all modules

I started at two o'clock in the afternoon and reconfigured the kernel. The new options are:
CONFIG_FRAME_POINTER=y
CONFIG_HAVE_FTRACE_NMI_ENTER=y
CONFIG_FTRACE_NMI_ENTER=y
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
CONFIG_FTRACE_SYSCALLS=y
CONFIG_STACK_TRACER=y
CONFIG_WORKQUEUE_TRACER=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_FUNCTION_PROFILER=y
CONFIG_FTRACE_MCOUNT_RECORD=y

With these options the function tracer is supported. Then I kicked off the build with make-kpkg. Three hours later it still had not finished; my virtual machine simply could not cope. At quitting time I hurried home, earlier than ever before, and used the ride to change my mood; my nerves turned out to be pretty good. On the way I finished the volume of "The Story of the Romans" about the last struggle, and I came away admiring Constantine.
After dinner the thrilling part began. I connected to the office machine over VPN and found the build had finished, so around nine o'clock I committed it to SVN and prepared to debug with the new kernel. In my excitement I had drunk half a bottle of wine, and doing anything after drinking is dangerous, not only driving: while upgrading the kernel I also replaced grub, the machine failed to come back after the reboot, and the office box has no IPMI remote management. A sad ending to the evening... all I could do was go to bed early.
4. New ftrace and stack trace

This morning I got to the office, set up the new environment, and booted the new kernel; at last I could see what was going on. With the function tracer now built in, the kfree_skb event can be combined with a per-event stack dump.
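A minimal sketch of how the tracing was set up on the new kernel (the stacktrace option is a standard ftrace knob; the exact commands used that morning are an assumption):

cd /sys/kernel/debug/tracing
echo stacktrace > trace_options                 # dump a call stack with every event
echo 1 > events/skb/kfree_skb/enable            # re-arm the kfree_skb event

Once the problem reproduced, the trace was full of stacks like these: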
<...>-27369 [002] 2876.582821: kfree_skb: skbaddr=ffff88007c1e0100 protocol=2048 location=ffffffff812ad914
<...>-27369 [002] 2876.582825: <stack trace>
=> kfree_skb
=> nf_hook_slow
=> T.889
=> ip_output
=> ip_local_out
=> ip_push_pending_frames
=> udp_push_pending_frames
=> udp_sendmsg
<...>-27369 [002] 2876.583069: kfree_skb: skbaddr=ffff88007c1e0c00 protocol=2048 location=ffffffff812ad914
<...>-27369 [002] 2876.583074: <stack trace>
=> kfree_skb
=> nf_hook_slow
=> T.889
=> ip_output
=> ip_local_out
=> ip_push_pending_frames
=> udp_push_pending_frames
=> udp_sendmsg
<idle>-0 [002] 11979.770335: kfree_skb: skbaddr=ffff88002f3f5200 protocol=2048 location=ffffffff812ad914
<idle>-0 [002] 11979.770338: <stack trace>
=> kfree_skb
=> nf_hook_slow
=> ip_local_deliver
=> ip_rcv_finish
=> ip_rcv
=> netif_receive_skb
=> br_handle_frame_finish
=> br_nf_pre_routing_finish
... followed by a large number of br_nf_pre_routing_finish stacks, i.e. packets dropped by Netfilter after reaching ip_local_deliver.
At first I thought only packets sent by vpnd were being dropped; now it turned out that packets received over the bridge were being dropped too (in that case the "process" is the kernel itself, i.e. the idle process 0; it is not really idle work, the packets are handled in softirq context that happens to run on top of the idle task). From the stacks above, the hook points involved at ip_local_deliver and ip_output should be INPUT and POSTROUTING respectively, and I had not added any iptables DROP rules on either of those chains. Knowing Netfilter fairly well, I suspected some hook function hidden inside the kernel was responsible, and finding out which one meant following the code. Since ftrace produces an enormous amount of output and runs into the Schrödinger problem (observing the system changes its behavior), I decided to read the code statically.
I had assumed at first that the packet loss came from the vpnd process, but the ftrace output showed that a large number of the drops happened in softirq context under the idle process, that is, on the receive path. A second observation: looking at /proc/net/snmp, the Ip OutDiscards field increased whenever the problem reproduced, i.e. together with "operation not permitted", and OutDiscards is only incremented in a small number of well-defined places. Even before the full analysis, these two observations pointed at two things: first, ip_local_out really was dropping packets; second, some global condition was probably behind the drops, because packets were being lost not only on the send path. A quick way to watch that counter is sketched below.
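A minimal sketch for keeping an eye on OutDiscards (it is one of the fields on the Ip: lines of /proc/net/snmp, so pairing the header row with the value row makes it readable):

awk '/^Ip:/ { n++; if (n == 1) { split($0, hdr) } else { for (i = 2; i <= NF; i++) print hdr[i], $i } }' /proc/net/snmp | grep -i outdiscards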
Note: if you look closely at the stacks above, you will see that the br_nf_pre_routing_finish -> br_handle_frame_finish -> netif_receive_skb path differs from the stock kernel. That is because I modified nf_bridge: the goal was to have the bridge call the iptables PREROUTING chain and then decide, based on the mark, whether to perform a bridge-level redirect.
5. Traversing Netfilter's INPUT and POSTROUTING code

Walking through the code is painful. The task is simply to find the hook functions hidden in the kernel at the INPUT and POSTROUTING points and work out under what circumstances they drop a packet. Eventually __nf_conntrack_confirm caught my attention: confirm is called when a packet is about to leave the protocol stack, which is where a conntrack entry in the NEW state is inserted into the conntrack hash table, and this confirm is hooked at exactly INPUT and POSTROUTING, the two points where a packet leaves the stack. And sure enough, there is a drop in it:

int __nf_conntrack_confirm(struct sk_buff *skb)
{
        ...
        /* If an entry with the same tuple is already in the hash table,
         * in either direction, the packet is dropped. */
        hlist_nulls_for_each_entry(h, n, &net->ct.hash[hash], hnnode)
                if (nf_ct_tuple_equal(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
                                      &h->tuple))
                        goto out;
        hlist_nulls_for_each_entry(h, n, &net->ct.hash[repl_hash], hnnode)
                if (nf_ct_tuple_equal(&ct->tuplehash[IP_CT_DIR_REPLY].tuple,
                                      &h->tuple))
                        goto out;
        ...
out:
        NF_CT_STAT_INC(net, insert_failed);
        spin_unlock_bh(&nf_conntrack_lock);
        return NF_DROP;
}

I stopped reading the code here, because I had found something I could observe; had that not worked out, I would have kept reading. The key line is:
NF_CT_STAT_INC(net, insert_failed);
This increments a kernel counter named insert_failed, which can be observed through procfs under /proc/net/stat (a small watch loop is sketched below).
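A minimal sketch of watching just that column (insert_failed is the 10th field of each per-CPU row in /proc/net/stat/nf_conntrack; the values are hexadecimal):

while true; do
    date
    awk 'NR > 1 { print "cpu" NR-2 ": insert_failed=" $10 }' /proc/net/stat/nf_conntrack
    sleep 2
done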
6. Locate the insert_failed counter.

Looking at /proc/net/stat/nf_conntrack revealed the following:
XXXX-SSL-GATEWAY:/proc/net # cat stat/nf_conntrack
entries searched found new invalid ignore delete delete_list insert insert_failed drop early_drop icmp_error expect_new expect_create expect_delete
00002236 00692e45 024a9466 001e1713 4928ab5 00008007 001cad74 001c9b8d 001e087b 00000e7f 00000000 00000000 000004a1 00000000 00000000 00000000
00002236 0067bda- 0241465c 001db7e0 00008853 00007756 001f1fe9 001f1265 001daab2 00000d19 00000000 00000000 0000049b 00000000 00000000 00000000
00002236 00656b68 02332e99 001d3312 00008026 00007795 001d9498 001d8a18 001d2575 00000d8f 00000000 00000000 00000482 00000000 00000000
00002236 0062d343 02261842 001cccef 20177ca 792a 001c4ac9 001c3e58 001cbff6 00000ceb 00000000 00000000 000004a5 00000000 00000000 00000000
# The problem reproduced within this time window.
XXXX-SSL-GATEWAY:/proc/net # cat stat/nf_conntrack
entries searched found new invalid ignore delete delete_list insert insert_failed drop early_drop icmp_error expect_new expect_create expect_delete
201721c9 006b559d 0259312e 001ecf50 4158e14 00008023 001d83fa 001d7298 001ec0b8 00000e7f 00000000 00000000 000004a3 00000000 00000000 00000000
201721c9 0069b52e 024fdc33 001e6eb1 20178be5 00007760 001fb89d 001faa7c 001e6181 00000dab 00000000 00000000 000004a1 00000000 00000000
201721c9 006765ca 02416a16 001de7ab 2017835f 201777a8 001e49e8 001e3f81 001dda05 00000d98 00000000 00000000 00000485 00000000 00000000 00000000
201721c9 0064d066 0233c2d7 001d7e55 20177f83 2017792c 001cfbb9 001cef3c 001d715c 00000ceb 00000000 00000000 000004a5 00000000 00000000 00000000
Note how the insert_failed counters keep increasing. What does that mean? It means that __nf_conntrack_confirm really is dropping packets.
The problem is located. Now, how to solve it? Read on.
7. Discard ftrace and start using the conntrack tool

At this point ftrace was no longer needed; I was confident about the rest of the work, and I wanted to stop disturbing the system. So I turned it off, so as not to startle Schrödinger's cat:

echo 0 >/sys/kernel/debug/tracing/events/skb/kfree_skb/enable
umount /sys/kernel/debug

For debugging conntrack, conntrack-tools is the tool of choice, so I ran:

conntrack -E -e DESTROY

This monitors conntrack DESTROY events. But I hit another obstacle: because the system carries so many connections, it errored out immediately:
WARNING: We have hit ENOBUFS! We are losing events.
This message means that the current netlink socket buffer size is too small.
Please, check --buffer-size in conntrack(8) manpage.
conntrack v1.0.0 (conntrack-tools): Operation failed: No buffer space available
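As the message suggests, one option would have been to enlarge the netlink buffer and narrow the event stream to the peer in question (a sketch; --buffer-size and the filter options are described in conntrack(8)):

conntrack -E -e DESTROY -p udp --orig-src 192.168.200.220 --buffer-size 2097152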
Instead, I wrote yet another command:

while true;do echo $(date);cat ip_conntrack|grep 192.168.200.220|grep udp;echo ;echo;sleep 2;done

This continuously displays only the entries for the IP address and protocol involved in the "operation not permitted" errors. At the very moment the problem reproduced:
Thursday, July 18, 2013 13:02:15 CST
udp 17 179 src=192.168.200.220 dst=192.168.101.242 sport=16796 dport=61194 packets=161510 bytes=27474312 src=192.168.101.242 dst=192.168.200.220 sport=61194 dport=16796 packets=140004 bytes=36911621 [ASSURED] mark=100 secmark=0 use=2

Thursday, July 18, 2013 13:02:18 CST
Thursday, July 18, 2013 13:02:22 CST
The conntrack entry suddenly disappeared. That should not happen: it would only expire after another 179 seconds, so how could it have been deleted? The next step would normally be to go back to the code and find every place that deletes a conntrack entry, but I did not think that was necessary, because I already knew the answer: at an earlier stage, our product had started calling conntrack -F to work around another problem...
In fact, even if I had not known about that conntrack -F call, I could still have deduced that someone was calling it. Looking back at the ftrace output, every burst of drops caused by udp_sendmsg is accompanied by a large burst of drops caused by br_nf_pre_routing_finish, and vice versa. For a while I wondered whether one of them was causing the other; it turned out they are unrelated. What does that prove? It proves that a large number of conntrack entries must have been deleted at the same moment. If the kernel itself deleted that much state at once it would be a design defect, and the designers would not do something so crude; normally entries expire one by one. So someone must have invoked the delete operation by hand through the interface the system provides. The culprit is conntrack -F!
8. No way to debug it, just think it through

The problem was not fully explained yet. I still had to work out exactly how a packet comes to be dropped in __nf_conntrack_confirm, so that other factors could be ruled out as far as possible. There is no way to debug this; you have to think it through, and the prerequisite is being thoroughly familiar with Netfilter's conntrack processing. A timing diagram would show it best; in words, the sequence is roughly this: conntrack -F flushes the whole table while the UDP tunnel is still busy in both directions; the next outgoing packet (vpnd's udp_sendmsg on one CPU) and the next incoming packet (the receive softirq on another CPU) each create a fresh, unconfirmed conntrack entry for the same connection; whichever of the two reaches __nf_conntrack_confirm second finds an entry with the same tuple already in the hash table and is dropped.
Recall the rule in __nf_conntrack_confirm: if a conntrack marked NEW finds the same tuple already present in the two-way hash table, the packet is dropped.

A connection is bidirectional, but it is represented by a single conntrack entry; that is the key to the problem. By comparison, TCP's stateful conntrack handling looks watertight.
This problem is genuinely hard to reproduce. I tried calling conntrack -F by hand many times; it did not trigger every time, sometimes not even after several attempts in a row. It only reproduces when the timing lands exactly on those points, which is why it is a "sometimes, but not always, and not never" problem!
It is also clear that the problem is more likely to occur on a multi-processor machine, because there the reproduction depends not only on how the UDP-sending process is scheduled but also on the network receive processing running on the other processors.
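A minimal sketch of the manual reproduction attempts described above (purely illustrative; the tunnel has to be carrying UDP traffic in both directions at the same moment, and the window is narrow, so it only fires now and then):

while true; do
    conntrack -F        # flush the whole conntrack table, as the product code did
    sleep 1
done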
9. Embarrassing Solution

So, is the problem solved? Yes, but the solution is embarrassing: simply remove the conntrack -F call. It has absolutely no technical content. Still, as a technology enthusiast I learned a lot in these 72 hours, above all that I should stop making simple problems complicated. When something goes wrong, the first question should be "what have I added or changed recently?", because any change made during the problematic period can lead to all kinds of strange issues.
Spending this much effort to locate and fix such a bug really is embarrassing, but for someone infatuated with technology, knowing exactly why is what matters.
10. Experience

I have no grand lessons. The only one is this: when adding a feature, thoroughly understand every technical detail that feature relies on, otherwise new problems may be introduced. Never leave a technical question half-understood; get to the bottom of it.
11. About the title

I chose this title so that others can find this article more easily, because at first I spent a lot of time googling this very problem and none of the write-ups, domestic or foreign, got to the point. I hope this article reaches you before your own 72 hours are up. After all, I too hoped to find the answer online first, and only when that chance was exhausted did I go and find it myself. I believe most people work the same way, and I hope everyone keeps trying this way, because it is lazy people who move the world forward: a lazy person is not idle, he just refuses to reinvent the wheel; but when he has no choice and must build the wheel himself, his goal is that no other lazy person should ever have to build it again.
12. Life within the 72 hours

I feel like a dual-core processor. I managed to stay keyed up on this problem for 72 hours and still came away with a few other gains:
1. I finished the ninth volume of "The Story of the Romans", the last struggle, and felt the sorrow of Diocletian.
2. I looked up material on Constantine and came to understand why he ended the Roman Empire and ushered in the Middle Ages.
3. Relying on the advantage of terrain can beat having the right timing and the right people.
