A kernel module cannot be unloaded after it creates a kernel thread with kernel_thread

Source: Internet
Author: User

First, the general flow in which the thread is created. The module's initialization function calls netlink_kernel_create to register its own netlink protocol and then returns. The function that receives netlink messages is fcluster_netlink_recv, and the real initialization work is done only after a netlink message arrives. The thread is also created in fcluster_netlink_recv. The call to kernel_thread is:

[cpp] kernel_thread(fcluster_rcv_handoff, NULL, 0);

With this flow, unloading the module fails with "module is in use". Running lsmod shows that the module's reference count is 1 (a module can be unloaded only when its reference count is 0), but it does not show who holds the reference.

After repeated tests, the problem turned out to be tied to where the kernel thread is created: if the call to kernel_thread is moved into fcluster_init, the reference count is 0 and the module can be unloaded. So the module cannot be unloaded either because kernel_thread is called at the wrong time or because it is called with the wrong parameters. To narrow this down, I added debugging code for each case.

In the first test, kernel_thread is called in fcluster_init. Code that prints the module reference count was added to fcluster_rcv_handoff (the kernel thread function), fcluster_init, and fcluster_netlink_recv. The counts at these three locations were 0, 1, and 1. In this case the module can be unloaded.
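The reference count that lsmod prints can also be read programmatically, which is handy when repeating tests like the ones above. The following is a minimal user-space sketch, assuming the usual /proc/modules line layout (name, size, use count, and so on) and assuming the module is named fcluster (the post only gives function names, so the module name is a guess). The helper names module_use_count and lookup_use_count are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one /proc/modules line ("name size usecount ...").
 * Returns the use count if the line describes `name`, otherwise -1.
 * Hypothetical helper, not part of any library. */
int module_use_count(const char *line, const char *name)
{
    char mod[64];
    unsigned long size;
    int refs;

    assert(line != NULL && name != NULL);

    /* The first three whitespace-separated fields are all we need. */
    if (sscanf(line, "%63s %lu %d", mod, &size, &refs) != 3)
        return -1;
    return strcmp(mod, name) == 0 ? refs : -1;
}

/* Scan /proc/modules for `name`.
 * Returns -1 if the module is not listed or the file cannot be read. */
int lookup_use_count(const char *name)
{
    char line[512];
    int refs = -1;
    FILE *f = fopen("/proc/modules", "r");

    if (f == NULL)
        return -1;
    while (refs < 0 && fgets(line, sizeof line, f) != NULL)
        refs = module_use_count(line, name);
    fclose(f);
    return refs;
}
```

Calling lookup_use_count("fcluster") before attempting to unload would return 1 in the failing case described above and 0 in the working case, provided the module name assumption holds.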
In the second test, kernel_thread is called in fcluster_netlink_recv, with the same printing code in fcluster_rcv_handoff (the kernel thread function), fcluster_init, and fcluster_netlink_recv. This time the counts at the three locations were 1, 1, and 1, and the module could not be unloaded.

This is strange. The reference count is 1 in both fcluster_init and fcluster_netlink_recv, so why does the final count depend on which function creates the kernel thread? When the kernel thread is created in fcluster_init, the module's reference count becomes 0 after the function returns. When it is created in fcluster_netlink_recv, the count is still 1 after the function returns.

The source code is the best tool for finding the cause, so I went straight to the kernel source and read load_module, do_fork, and netlink_sendmsg. Why these functions? fcluster_init is called during module loading, and the module-loading system call sys_init_module does most of its work through load_module. kernel_thread creates the kernel thread by calling do_fork. netlink_sendmsg is the send function for netlink messages, and it ends up calling fcluster_netlink_recv, which is registered as the netlink_rcv member of the netlink_sock.

Starting with load_module: it calls module_unload_init, which sets the module's reference count to 1. This holds a reference during initialization so that the module cannot be unloaded while it is still initializing (that is very unlikely, but the module's initialization function may sleep).
After load_module has done the hard work of loading the module (/* Do all the hard work */), sys_init_module calls the initialization function provided by the module through do_one_initcall. So inside fcluster_init (my module's initialization function) the reference count is 1. When does it become 0? sys_init_module then calls module_put, which decrements the count. So if kernel_thread is called in fcluster_init, the reference count becomes 0 once fcluster_init has returned and loading has finished, which is what the test showed. That is still not conclusive, though, because we do not yet know whether do_fork keeps the count at 1 depending on when it is called.

Next, netlink_sendmsg. The call path is fairly simple, and nothing in it adds 1 to the module reference count. So the count must already be 1 before netlink_sendmsg is called, and the only earlier step is the creation of the socket in user space. Looking at netlink_create confirms this. Part of its code is:

[cpp] if (nl_table[protocol].registered && try_module_get(nl_table[protocol].module)) module = nl_table[protocol].module;

Here module is my kernel module; see netlink_kernel_create for details. The reference is dropped only when the socket is released (see netlink_release), so the module stays pinned as long as the socket created in user space has not been released.

That points to the cause. When user space calls socket to create a socket of the netlink protocol type that my module provides, netlink_create runs and adds 1 to the reference count of the module I registered.
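The reference-count lifecycle traced above can be replayed with a small user-space model. This is only a sketch of the bookkeeping as the post describes it, not kernel code: struct model_module and every model_* function are invented for illustration, and the comments name the kernel functions they stand in for.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space model only; not the kernel's struct module. */
struct model_module {
    int refcnt;
};

/* Stands in for module_unload_init: the loader holds one reference. */
void model_load_begin(struct model_module *m) { m->refcnt = 1; }

/* Stands in for try_module_get. */
void model_get(struct model_module *m) { m->refcnt++; }

/* Stands in for module_put. */
void model_put(struct model_module *m)
{
    assert(m->refcnt > 0);
    m->refcnt--;
}

/* A module can be unloaded only when nobody references it. */
bool model_can_unload(const struct model_module *m)
{
    return m->refcnt == 0;
}

/* Replay the sequence from the text and return the final count:
 * load, init returns, then optionally a user-space netlink socket
 * is created and optionally released again. */
int model_replay(bool socket_created, bool socket_released)
{
    struct model_module m;

    model_load_begin(&m);   /* count is 1 while the init function runs */
    model_put(&m);          /* sys_init_module drops the init-time reference */

    if (socket_created)
        model_get(&m);      /* netlink_create pins the module */
    if (socket_created && socket_released)
        model_put(&m);      /* netlink_release unpins it */

    return m.refcnt;
}
```

In this model, model_replay(true, false) leaves the count at 1, which matches the "module is in use" case: the socket exists but has never been released.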
When user space calls sendmsg to send a netlink message, netlink_sendmsg runs and then calls fcluster_netlink_recv. If kernel_thread is called in fcluster_netlink_recv, the kernel thread is created from the current user process. The files opened by that process include the netlink socket, whose creation added 1 to my module's reference count. Because the flags argument passed when creating the kernel thread is 0, the new kernel thread adds 1 to the reference count of this socket (see copy_files). When user space later closes the socket, the socket's reference count is not yet 0, so the socket is not released and my module's reference count is not decremented. That is why the module cannot be unloaded.

To verify this reasoning, I ran "lsof -p 5599" (5599 is the ID of the kernel thread) in the case where the module can be unloaded and in the case where it cannot. In the case where it cannot, the output contains an additional row of type sock.

How can this be solved? Close all open files in the newly created kernel thread, or pass the CLONE_FILES flag when creating the kernel thread, or call daemonize in the kernel thread function. For the flag, see copy_files:

[cpp] #define CLONE_FILES 0x00000400 /* set if open files shared between processes */

Solving this took a lot of time, but along the way I learned about module loading, sending and receiving netlink messages, netlink socket creation, and kernel thread creation, so it was well worth it. When I have time, I will write up my analysis of load_module and netlink and post it. This problem is hard to describe clearly, so if anything is unclear, please get in touch.
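The effect of creating the thread without CLONE_FILES can be reproduced in user space, where fork() likewise gives the child its own copy of the descriptor table. The sketch below is an analogy under stated assumptions, not the kernel code path: it uses an AF_UNIX socketpair in place of the netlink socket, a forked child in place of the kernel thread, and a made-up function name peer_sees_eof. It checks whether the socket is really released (the peer sees end-of-file) once the parent closes its descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the peer sees EOF after the parent closes its end,
 * 0 if the child's inherited copy still keeps the socket open,
 * -1 on error. Hypothetical helper for illustration only. */
int peer_sees_eof(int child_closes_copy)
{
    int sv[2], ready[2];
    int eof = -1;
    char c;
    pid_t pid;

    assert(child_closes_copy == 0 || child_closes_copy == 1);

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (pipe(ready) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    pid = fork();               /* descriptor table is copied, not shared */
    if (pid < 0) {
        close(sv[0]); close(sv[1]);
        close(ready[0]); close(ready[1]);
        return -1;
    }
    if (pid == 0) {
        close(sv[1]);           /* the child never holds the peer end */
        close(ready[0]);
        if (child_closes_copy)
            close(sv[0]);       /* the "close all open files" fix */
        if (write(ready[1], "x", 1) != 1)
            _exit(1);
        for (;;)
            pause();            /* stay alive like a long-running thread */
    }

    close(ready[1]);
    if (read(ready[0], &c, 1) == 1) {   /* wait until the child is set up */
        close(sv[0]);                   /* the user process closes its socket */
        sv[0] = -1;
        /* recv returns 0 only if every copy of the other end is closed. */
        eof = (recv(sv[1], &c, 1, MSG_DONTWAIT) == 0);
    }

    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    if (sv[0] >= 0)
        close(sv[0]);
    close(sv[1]);
    close(ready[0]);
    return eof;
}
```

With peer_sees_eof(0) the child keeps its inherited copy, so closing the descriptor in the parent does not release the socket; with peer_sees_eof(1) the child drops its copy first and the close takes effect, which mirrors the first of the fixes listed above.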
