Ways of exchanging data between user space and kernel space under Linux

Last Update:2018-07-30 Source: Internet

Author: User

Tags sendmsg dmesg

This series consists of two articles that describe in detail nine ways of exchanging data between user space and kernel space on Linux: kernel boot parameters, module parameters and sysfs, sysctl, system calls, netlink, procfs, seq_file, debugfs, and relayfs. Each is illustrated with a concrete example to help readers master these techniques.
This article is the first in the series. It covers kernel boot parameters, module parameters and sysfs, sysctl, system calls, and netlink, and shows how each is used with the help of the accompanying example programs.

I. Introduction

On a multitasking system that uses virtual memory, the kernel and applications generally have separate address spaces, so exchanging data between the kernel and an application, or between one application and another, requires a dedicated mechanism. Inter-process communication (IPC) is the well-known mechanism for exchanging data between applications. Most readers probably understand IPC well but know less about how an application exchanges data with the kernel. This article describes in detail the ways in which the Linux kernel and applications can exchange data: kernel boot parameters, module parameters and sysfs, sysctl, system calls, netlink, procfs, seq_file, debugfs, and relayfs.

II. Kernel boot parameters

Linux lets the bootloader pass boot parameters to the kernel, which gives kernel developers a way to hand data to the kernel and control its startup behavior.

The usual approach is to define a function that parses the parameter and then register it with the kernel using the kernel-supplied macro __setup. The macro is defined in linux/init.h, so that header file must be included to use it:

__setup("para_name=", parse_func)

para_name is the parameter name and parse_func is the function that parses the parameter's value; it is responsible for converting that value into the value of the corresponding kernel variable and setting the variable. The kernel provides the functions get_option and get_options for parsing integer parameter values: the former handles a value that is a single integer, while the latter handles a series of integers separated by commas. When the value is a string, the developer must write a custom parsing function. The kernel program kern-boot-params.c in the source code package illustrates all three cases: an integer, a comma-separated list of integers, and a string. To test the program, copy it into a directory in the source tree of the kernel you want to use. To avoid confusion with other parts of the kernel, the author recommends creating a new directory, for example examples, under the root of the kernel source tree, copying the program into it, renaming it setup_example.c, and creating a Makefile for the directory:

obj-y = setup_example.o

This one line is all the Makefile needs. Next, modify the Makefile in the root of the source tree by changing the line

core-y          := usr/

to

core-y          := usr/ examples/

Note: if you create a directory or rename the file under names different from those above, adjust the Makefile changes described above accordingly. You can then build a new kernel following the usual kernel build steps. After building it and adding a LILO or GRUB boot entry for it, boot the new kernel and use the LILO or GRUB editing feature to append the following parameter string to the kernel's boot command line:

setup_example_int=1234 setup_example_int_array=100,200,300,400 setup_example_string=thisisatest

The parameter string can of course also be written directly into the kernel command line for the new kernel in the LILO or GRUB configuration file. Readers can try other parameter values to test the feature.

The following is the output on the author's system when the above parameter line is used:

setup_example_int=1234
setup_example_int_array=100,200,300,400
setup_example_int_array includes 4 intergers
setup_example_string=thisisatest

Readers can use

dmesg | grep setup

to view the output of the program.
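To make the comma-separated integer case more concrete, here is a minimal user-space sketch of the kind of parsing such a function performs. It is not kernel code: parse_int_list is a hypothetical helper written for illustration, and it only mirrors the convention of the kernel's get_options of storing the number of parsed values in slot 0 of the array.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical user-space helper (not a kernel API): parse a string such
 * as "100,200,300" into ints[1..] and store the number of values parsed
 * in ints[0]. nints is the total capacity of ints[], including slot 0.
 * Returns a pointer to the first character that was not consumed.
 */
static const char *parse_int_list(const char *str, int nints, int *ints)
{
    int count = 0;

    assert(str != NULL && ints != NULL && nints >= 1);

    while (count + 1 < nints) {
        char *end;
        long value;

        errno = 0;
        value = strtol(str, &end, 0);
        if (end == str || errno != 0)
            break;                  /* no digits, or value out of range */
        ints[++count] = (int)value;
        str = end;
        if (*str != ',')
            break;                  /* the list ends at the first non-comma */
        str++;                      /* skip the comma */
    }
    ints[0] = count;
    return str;
}
```

Called on "100,200,300,400" with room for at least five ints, this sketch stores 4 in ints[0] and the four values in ints[1] to ints[4].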

III. Module parameters and sysfs

A kernel subsystem or device driver can be compiled directly into the kernel or built as a module. If it is compiled into the kernel, parameters can be passed to it through kernel boot parameters as described in the previous section. If it is built as a module, parameters can be passed on the command line when the module is inserted, or module data can be set and read at run time through sysfs.

sysfs is a memory-based file system built on ramfs. It provides a way to expose kernel data structures, their attributes, and the links between attributes and data structures to user space. Because it is tightly integrated with the kobject subsystem, kernel developers do not need to use it directly; the kernel's subsystems use it instead. A user can read and set the parameters the kernel exposes through sysfs with ordinary file operations, but only once sysfs has been mounted:

$ mkdir -p /sysfs
$ mount -t sysfs sysfs /sysfs

Do not confuse sysfs with sysctl. sysctl exposes a set of kernel control parameters whose purpose is to let the user control the kernel's behavior, whereas sysfs merely exposes the hierarchy and attributes of the kernel's kobject objects for the user to view, so most of sysfs is read-only. A module is exported to sysfs as a kobject, and its parameters are exported as module attributes. The kernel gives module authors a flexible way to control this: they can choose whether a module parameter is visible in sysfs and whether users may write to it. Users can then view and set module parameters through sysfs, which lets them control the module's behavior while it is running.

For a module, variables declared as static can be set from the command line, but to be visible under sysfs they must be declared explicitly with the macro module_param. The macro takes three arguments. The first is the parameter name, which is the name of an already defined variable. The second is the variable type; the available types are byte, short, ushort, int, uint, long, ulong, charp, and bool or invbool, which correspond to the C types char, short, unsigned short, int, unsigned int, long, unsigned long, char *, and int respectively. Users can also define a custom type XXX, provided they define param_get_XXX, param_set_XXX, and param_check_XXX. The third argument specifies the access permissions. If it is 0, the parameter does not appear in the sysfs file system. Otherwise it may be any combination of S_IRUSR, S_IWUSR, S_IRGRP, S_IWGRP, S_IROTH, and S_IWOTH, which correspond to user read, user write, group read, group write, other read, and other write, the same as the access permission settings of a file.

The kernel module module-param-exam.c in the source code package is an example of exchanging data between user space and kernel space using module parameters and sysfs. The module has three parameters that can be set from the command line. The following is a sample run on the author's system:

$ insmod ./module-param-exam.ko my_invisible_int=10 my_visible_int=20 mystring="Hello,world"
my_invisible_int =
my_visible_int =
mystring = 'Hello,world'
$ ls /sys/module/module_param_exam/parameters/
mystring  my_visible_int
$ cat /sys/module/module_param_exam/parameters/mystring
Hello,world
$ cat /sys/module/module_param_exam/parameters/my_visible_int
$ echo > /sys/module/module_param_exam/parameters/my_visible_int
$ cat /sys/module/module_param_exam/parameters/my_visible_int
2000
$ echo "abc" > /sys/module/module_param_exam/parameters/mystring
$ cat /sys/module/module_param_exam/parameters/mystring
abc
$ rmmod module_param_exam
my_invisible_int = 10
my_visible_int = 2000
mystring = 'abc'
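The cat and echo commands above can also be issued from a C program, since a module parameter under sysfs is accessed with ordinary file operations. The sketch below is one possible way to do that; param_read and param_write are hypothetical helper names, and the code assumes each parameter file holds a single line of text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read the first line of a parameter file into buf
 * and strip the trailing newline. Returns 0 on success, -1 on failure.
 */
static int param_read(const char *path, char *buf, size_t len)
{
    FILE *fp;

    assert(path != NULL && buf != NULL && len > 0);

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    if (fgets(buf, (int)len, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/*
 * Hypothetical helper: write a new value to a parameter file.
 * Returns 0 on success, -1 on failure.
 */
static int param_write(const char *path, const char *value)
{
    FILE *fp;
    int ok;

    assert(path != NULL && value != NULL);

    fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    ok = fputs(value, fp) >= 0;
    /* Buffered data is flushed on close, so a failed write can surface here. */
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

With the example module loaded, param_write("/sys/module/module_param_exam/parameters/my_visible_int", "2000") followed by param_read on the same path would correspond to the echo and cat steps shown above, subject to the permissions given to module_param.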

IV. sysctl

sysctl is an effective way for users to set and read kernel configuration parameters at run time. With it, the user can change or read the kernel's configuration parameters at any time while the kernel is running. These parameters usually also appear under the /proc/sys directory of the proc file system, so a user application can read and write them directly through the files in that directory. For example, the user can run

cat /proc/sys/net/ipv4/ip_forward

to find out whether the kernel IP layer allows IP packets to be forwarded, and can run

echo 1 > /proc/sys/net/ipv4/ip_forward

to make the kernel IP layer forward IP packets, that is, to configure the machine as a router or gateway. Linux distributions generally also provide a system tool, sysctl, that can set and read kernel configuration parameters. The tool relies on the proc file system, however, so the kernel must support the proc file system for the tool to work. The following is an example of using the sysctl tool to read and set a kernel configuration parameter:

$ sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 0
$ sysctl -w net.ipv4.ip_forward=1
net.ipv4.ip_forward = 1
$ sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 1

Note that the parameter net.ipv4.ip_forward is actually translated into the corresponding proc file /proc/sys/net/ipv4/ip_forward. The option -w means that a kernel configuration parameter is being set; with no option, the parameter is read. The user can run sysctl -a to read all kernel configuration parameters. For more information on the sysctl tool, see the man page sysctl(8).
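The name-to-file translation described above can be sketched in a few lines of C. This is only an illustration of the idea, not the sysctl tool's actual implementation: sysctl_name_to_path is a hypothetical helper, and it assumes that every dot in the name is a path separator.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: convert a dotted sysctl name such as
 * "net.ipv4.ip_forward" into "/proc/sys/net/ipv4/ip_forward".
 * Returns 0 on success, -1 if the output buffer is too small.
 */
static int sysctl_name_to_path(const char *name, char *path, size_t len)
{
    static const char prefix[] = "/proc/sys/";
    char *p;
    int n;

    assert(name != NULL && path != NULL);

    n = snprintf(path, len, "%s%s", prefix, name);
    if (n < 0 || (size_t)n >= len)
        return -1;                  /* output buffer too small */

    /* Leave the prefix alone; turn each '.' in the name into '/'. */
    for (p = path + strlen(prefix); *p != '\0'; p++)
        if (*p == '.')
            *p = '/';
    return 0;
}
```

The resulting path can then be opened and read or written like any other file, which is what the cat and echo commands earlier in this section do.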

However, sysctl itself does not require the proc file system. Without it, the kernel configuration parameters can still be set and read through the sysctl system call provided by the kernel.

The source code package contains a working example program that shows how to use sysctl in kernel space and in user space. The header file sysctl-exam.h defines the sysctl entry IDs, which both the user-space application and the kernel module need in order to operate on and register the sysctl entries. The kernel module is implemented in the file sysctl-exam-kern.c. In it, each sysctl entry corresponds to a struct ctl_table structure, which defines: the ID of the entry to register (field ctl_name); its name under proc (field procname); the corresponding kernel variable (field data, which must be assigned a pointer); the maximum length allowed for the entry (field maxlen, used mainly for string kernel variables, so that a string longer than the maximum is truncated when the entry is set); the entry's access permissions under the proc file system (field mode); the handler function used when the entry is set through proc (field proc_handler, which should be &proc_dointvec for an integer kernel variable and &proc_dostring for a string kernel variable); and the string handling strategy (field strategy, generally &sysctl_string).

A sysctl entry can also be a directory. In that case its mode field should be set to 0555, otherwise the sysctl entries under it cannot be accessed through the sysctl system call, and its child field points to all the entries under the directory entry. Multiple entries in the same directory do not need to be registered one by one: they can be organized into an array of struct ctl_table and registered in a single call, but the last element of the array must then be an empty terminator, i.e.

{
	.ctl_name = 0
}

sysctl entries are registered with the function register_sysctl_table(struct ctl_table *, int). The first parameter is a pointer to the struct ctl_table entry or array of entries defined above. The second parameter is the position at which to insert into the sysctl table: 0 to insert at the end, nonzero to insert at the beginning. The kernel organizes all sysctl entries into sysctl tables.

When the module is unloaded, the function unregister_sysctl_table(struct ctl_table_header *) must be used to unregister the sysctl entries that were registered with register_sysctl_table. On success, register_sysctl_table returns a pointer to a struct ctl_table_header, the header of the sysctl table, which the unregister function uses to remove the corresponding sysctl entries. The user-space application sysctl-exam-user.c uses the sysctl system call to view and set the sysctl entries registered by the kernel module above. (If the user's kernel supports the proc file system, these entries can of course also be viewed and set directly with file operations such as cat and echo.)

The following is sample output from the author running the module and the application:

$ insmod ./sysctl-exam-kern.ko
$ cat /proc/sys/mysysctl/myint
0
$ cat /proc/sys/mysysctl/mystring
$ ./sysctl-exam-user
mysysctl.myint = 0
mysysctl.mystring = ""
$ ./sysctl-exam-user "Hello, World"
old value: mysysctl.myint = 0
new value: mysysctl.myint =
old value: mysysctl.mystring = ""
new value: mysysctl.mystring = "Hello, World"
$ cat /proc/sys/mysysctl/myint
$ cat /proc/sys/mysysctl/mystring
Hello, World
$

V. System calls

System calls are the interface the kernel provides to applications. Most operations an application performs on the underlying hardware are done through system calls; getting and setting the system time, for example, requires calling gettimeofday and settimeofday respectively. In fact, every system call involves data exchange between the kernel and the application, such as the file system functions read and write, or setsockopt and getsockopt, which set and read options of the network protocol stack. This section does not explain how to add a new system call; it explains how to use existing system calls to meet a user's data transfer needs.

In general, the user can create a pseudo device to serve as the data exchange channel between the application and the kernel. The most common practice is to use a pseudo character device, implemented as follows:

1. Define the functions needed to operate the character device and set up struct file_operations

struct file_operations is very large. For ordinary data exchange it is enough to define the open, read, write, ioctl, mmap, and release functions, which correspond to the user-space file operations open, read, write, ioctl, mmap, and close. Example prototypes of these functions are as follows:

ssize_t exam_read(struct file *file, char __user *buf, size_t count, loff_t *ppos)
{
...
}
ssize_t exam_write(struct file *file, const char __user *buf, size_t count, loff_t *ppos)
{
...
}
int exam_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long argv)
{
...
}
int exam_mmap(struct file *file, struct vm_area_struct *vma)
{
...
}
int exam_open(struct inode *inode, struct file *file)
{
...
}
int exam_release(struct inode *inode, struct file *file)
{
...
}

After defining these functions, define and fill in struct file_operations:

struct file_operations exam_file_ops = {
	.owner = THIS_MODULE,
	.read = exam_read,
	.write = exam_write,
	.ioctl = exam_ioctl,
	.mmap = exam_mmap,
	.open = exam_open,
	.release = exam_release,
};

2. Register the defined pseudo character device and associate it with the struct file_operations above:

int exam_char_dev_major;
exam_char_dev_major = register_chrdev(0, "exam_char_dev", &exam_file_ops);

Note the first parameter of register_chrdev: if it is 0, the kernel chooses the major device number for the pseudo character device being registered, and the function returns the major number actually allocated. A return value less than 0 indicates that registration failed, so the caller must check the return value and handle failure. To use this function, the header file linux/fs.h must be included.

The source code package gives a typical example of exchanging data between user space and kernel space in this way. It consists of three files: the header file syscall-exam.h defines the ioctl commands; syscall-exam-user.c is the user-space application, which exchanges data with the kernel module through the file operations mmap and ioctl; and syscall-exam-kern.c is the kernel module, which implements a pseudo character device in order to exchange data with user-space applications. For the application syscall-exam-user to run correctly, a device node for the pseudo character device must be created after the module syscall-exam-kern is inserted. The following command creates it:

$ mknod /dev/mychrdev c `dmesg | grep "char device mychrdev" | sed 's/.*major is //g'` 0

The user can then read and write /dev/mychrdev with cat and echo, while the application syscall-exam-user uses mmap to read the data and ioctl to get information about the character device and to truncate the data. This is just one example of how existing system calls can be used to implement the data exchange a user needs.

The following is an example of the results of the author running the module:

$ insmod ./syscall-exam-kern.ko
char device mychrdev is registered, major is 254
$ mknod /dev/mychrdev c `dmesg | grep "char device mychrdev" | sed 's/.*major is //g'` 0
$ cat /dev/mychrdev
$ echo "abcdefghijklmnopqrstuvwxyz" > /dev/mychrdev
$ cat /dev/mychrdev
abcdefghijklmnopqrstuvwxyz
$ ./syscall-exam-user
User process: syscall-exam-us(1433)
Available space: 65509 bytes
Data len: 27 bytes
Offset in physical: cc0 bytes
content by mychrdev
ABCDEFGHIJKLMNOPQRSTUVWXYZ
$ cat /dev/mychrdev
ABCDE
$
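The user-space half of the mmap path described in this section can be sketched as follows. This is not the code of syscall-exam-user.c, which is not reproduced in this article: read_via_mmap is a hypothetical helper, and whether a given device accepts a mapping of the requested length depends on how its mmap handler is written.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map the first map_len bytes of path read-only and
 * copy at most out_len - 1 of them into out as a NUL-terminated string.
 * Returns 0 on success, -1 on failure.
 */
static int read_via_mmap(const char *path, size_t map_len,
                         char *out, size_t out_len)
{
    void *addr;
    size_t n;
    int fd;

    assert(path != NULL && out != NULL && out_len > 0 && map_len > 0);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    addr = mmap(NULL, map_len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (addr == MAP_FAILED)
        return -1;

    n = map_len < out_len - 1 ? map_len : out_len - 1;
    memcpy(out, addr, n);
    out[n] = '\0';

    munmap(addr, map_len);
    return 0;
}
```

The same function works on a regular file, which makes it easy to try out before pointing it at a device node such as /dev/mychrdev.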

VI. Netlink

Netlink is a special kind of socket that is unique to Linux. It is similar to AF_ROUTE in BSD but far more powerful. In the current latest Linux kernel (2.6.14), netlink is used for communication between applications and the kernel by: the routing daemon (NETLINK_ROUTE), the 1-wire subsystem (NETLINK_W1), user-space socket protocols (NETLINK_USERSOCK), the firewall (NETLINK_FIREWALL), socket monitoring (NETLINK_INET_DIAG), the netfilter log (NETLINK_NFLOG), IPsec security policy (NETLINK_XFRM), SELinux event notification (NETLINK_SELINUX), the iSCSI subsystem (NETLINK_ISCSI), process auditing (NETLINK_AUDIT), forwarding information base lookup (NETLINK_FIB_LOOKUP), the netlink connector (NETLINK_CONNECTOR), the netfilter subsystem (NETLINK_NETFILTER), the IPv6 firewall (NETLINK_IP6_FW), DECnet routing messages (NETLINK_DNRTMSG), kernel event notification to user space (NETLINK_KOBJECT_UEVENT), and generic netlink (NETLINK_GENERIC).

Netlink is a very good way to transfer data between the kernel and user applications. A user-space application reaches the powerful features of netlink through the standard socket API, while kernel code must use a dedicated kernel API.

Netlink has the following advantages over system calls, ioctl, and the /proc file system:

1. To use netlink, the user only needs to add a new netlink protocol type definition to include/linux/netlink.h, for example #define NETLINK_MYTEST 17. The kernel and the user-space application can then immediately exchange data using that protocol type through the socket API. By contrast, a system call requires adding a new system call, ioctl requires adding a device or file, which takes a lot of code, and the proc file system requires adding new files or directories under /proc, which makes the already cluttered /proc even more cluttered.

2. Netlink is an asynchronous communication mechanism. Messages passed between the kernel and user space are stored in a socket buffer queue; sending a message only places it in the receiver's socket receive queue, without waiting for the receiver to receive it. System calls and ioctl, by contrast, are synchronous mechanisms, and if the data transferred is too long it affects scheduling granularity.

3. The kernel part of a netlink user can be implemented as a module, and there is no compile-time dependency between the application part and the kernel part. System calls do have such a dependency: the implementation of a new system call must be statically linked into the kernel and cannot live in a module, and an application that uses the new system call depends on the kernel at compile time.

4. Netlink supports multicast. A kernel module or application can multicast a message to a netlink group, and any kernel module or application that belongs to that group receives it. The kernel's event notification mechanism to user space uses this feature: any application interested in kernel events can receive the events sent by that subsystem. The use of this mechanism is described in a later article.

5. The kernel can use netlink to initiate a session, whereas system calls and ioctl can only be invoked by a user application.

6. Netlink uses the standard socket API and is therefore easy to use, whereas system calls and ioctl require specialized training.

Using netlink in user space

A user-space application can easily use a netlink socket through the standard socket API functions socket(), bind(), sendmsg(), recvmsg(), and close(). Their man pages describe in detail how these functions are used; this article only explains how a netlink user should call them. Note that an application using netlink must include the header file linux/netlink.h, as well as sys/socket.h, which sockets require.

To create a netlink socket, the user calls socket() with the following parameters:

socket(AF_NETLINK, SOCK_RAW, netlink_type)

The first parameter must be AF_NETLINK or PF_NETLINK, which are the same thing on Linux and indicate that netlink is to be used. The second parameter must be SOCK_RAW or SOCK_DGRAM. The third parameter specifies the netlink protocol type, such as the user-defined protocol type NETLINK_MYTEST described earlier. NETLINK_GENERIC is a generic protocol type intended for users, so it can be used directly without adding a new protocol type. The protocol types predefined by the kernel are:

#define NETLINK_ROUTE           0       /* Routing/device hook */
#define NETLINK_W1              1       /* 1-wire subsystem */
#define NETLINK_USERSOCK        2       /* Reserved for user mode socket protocols */
#define NETLINK_FIREWALL        3       /* Firewalling hook */
#define NETLINK_INET_DIAG       4       /* INET socket monitoring */
#define NETLINK_NFLOG           5       /* netfilter/iptables ULOG */
#define NETLINK_XFRM            6       /* IPsec */
#define NETLINK_SELINUX         7       /* SELinux event notifications */
#define NETLINK_ISCSI           8       /* Open-iSCSI */
#define NETLINK_AUDIT           9       /* Auditing */
#define NETLINK_FIB_LOOKUP      10
#define NETLINK_CONNECTOR       11
#define NETLINK_NETFILTER       12      /* netfilter subsystem */
#define NETLINK_IP6_FW          13
#define NETLINK_DNRTMSG         14      /* DECnet routing messages */
#define NETLINK_KOBJECT_UEVENT  15      /* Kernel messages to userspace */
#define NETLINK_GENERIC         16

Each netlink protocol type can have up to 32 multicast groups, each represented by one bit. With netlink multicast, sending a message to every member of a group takes only one system call, which greatly reduces the number of system calls for applications that need to deliver a message to many recipients.
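Since each multicast group is one bit of a 32-bit word, group membership can be handled with simple bit operations. The sketch below assumes that groups are numbered from 1 to 32 and that group n corresponds to bit n - 1; the helper names nl_group_mask and nl_groups_join are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the bit that represents multicast group
 * number `group`, assuming groups are numbered 1..32 and group n maps
 * to bit n - 1.
 */
static uint32_t nl_group_mask(unsigned int group)
{
    assert(group >= 1 && group <= 32);
    return (uint32_t)1 << (group - 1);
}

/* Hypothetical helper: add one group to an existing set of groups. */
static uint32_t nl_groups_join(uint32_t groups, unsigned int group)
{
    return groups | nl_group_mask(group);
}
```

A mask built this way is the kind of value that would be stored in a 32-bit group field to subscribe to several groups at once.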

The function bind() binds an open netlink socket to a netlink source socket address. The netlink socket address structure is as follows:

struct sockaddr_nl
{
  sa_family_t    nl_family;
  unsigned short nl_pad;
  __u32          nl_pid;
  __u32          nl_groups;
};

The field nl_family must be set to AF_NETLINK or PF_NETLINK. The field nl_pad is currently unused and should always be set to 0. The field nl_pid is the ID of the process that receives or sends the message: set it to 0 if the message is to be handled by the kernel or is a multicast message, otherwise set it to the ID of the process that is to handle the message. The field nl_groups specifies multicast groups; bind uses it to add the calling process to the multicast groups specified in the field, and if it is 0 the caller joins no multicast group.

The nl_pid field of the address passed to bind should be set to the process ID of the calling process, which serves as the local address of the netlink socket. However, when several threads of one process each use a netlink socket, nl_pid can be set to some other value, such as:

pthread_self() << 16 | getpid();

The field nl_pid is therefore not necessarily a process ID; it is just an identifier used to tell different receivers or senders apart, and the user can set it as needed. The function bind is called as follows:

bind(fd, (struct sockaddr *)&nladdr, sizeof(struct sockaddr_nl));
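Putting the socket() and bind() steps together, a user-space program might open and bind a netlink socket roughly as follows. This is a minimal sketch under stated assumptions: nl_open is a hypothetical helper, and the protocol, nl_pid value, and group mask are left to the caller.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>

/*
 * Hypothetical helper: create a netlink socket for `protocol` and bind
 * it to the local address (pid, groups). Returns the file descriptor,
 * or -1 on failure.
 */
static int nl_open(int protocol, unsigned int pid, unsigned int groups)
{
    struct sockaddr_nl nladdr;
    int fd;

    assert(protocol >= 0);

    fd = socket(AF_NETLINK, SOCK_RAW, protocol);
    if (fd < 0)
        return -1;

    memset(&nladdr, 0, sizeof(nladdr));    /* also zeroes nl_pad */
    nladdr.nl_family = AF_NETLINK;
    nladdr.nl_pid = pid;                   /* local identifier, e.g. getpid() */
    nladdr.nl_groups = groups;             /* 0: join no multicast group */

    if (bind(fd, (struct sockaddr *)&nladdr, sizeof(nladdr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A call such as nl_open(NETLINK_ROUTE, getpid(), 0) uses one of the predefined protocol types listed earlier; a custom type such as NETLINK_MYTEST would only work on a kernel that defines and registers it.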

fd is the file descriptor returned by the earlier socket call, and nladdr is an address of type struct sockaddr_nl. To send a netlink message to the kernel or to another user-space application, the destination netlink socket address must be filled in; here the fields nl_pid and nl_groups specify the process ID and the multicast groups of the message receiver respectively. If nl_pid is 0, the receiver is the kernel or a multicast group. If nl_groups is 0, the message is a unicast message; otherwise it is a multicast message. Sending a netlink message with sendmsg also involves the structures struct msghdr, struct nlmsghdr, and struct iovec. struct msghdr must be set up as follows:

struct msghdr msg;
memset(&msg, 0, sizeof(msg));
msg.msg_name = (void *)&(nladdr);
msg.msg_namelen = sizeof(nladdr);

Here nladdr is the netlink address of the message receiver.

struct nlmsghdr is netlink's own message header. The netlink kernel implementation uses it to multiplex and demultiplex all the protocol types defined over netlink, and for some other control purposes, so it is also known as the netlink control block. An application must therefore supply this header when sending a netlink message.

struct nlmsghdr
{
  __u32 nlmsg_len;   /* Length of message */
  __u16 nlmsg_type;  /* Message type */
  __u16 nlmsg_flags; /* Additional flags */
  __u32 nlmsg_seq;   /* Sequence number */
  __u32 nlmsg_pid;   /* Sending process PID */
};

The field nlmsg_len specifies the total length of the message, including the length of the data that immediately follows the structure as well as the size of the structure itself. The field nlmsg_type is used by the application to define message types internally; it is opaque to the netlink kernel implementation, so it is set to 0 in most cases. The field nlmsg_flags sets message flags; the available flags include:

/* Flags values */
#define NLM_F_REQUEST           1       /* It is request message.       */
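To tie the three structures together, the following sketch builds a netlink message and hands it to sendmsg. It is an illustration under stated assumptions rather than the article's own example program: nl_build and nl_send_to_kernel are hypothetical helpers, the message type is left at 0, and the destination is the kernel (nl_pid 0, no multicast group). The NLMSG_SPACE, NLMSG_LENGTH, and NLMSG_DATA macros come from linux/netlink.h.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>
#include <linux/netlink.h>

/*
 * Hypothetical helper: allocate a netlink message consisting of a
 * struct nlmsghdr followed by `len` bytes of payload.
 * The caller frees the result with free().
 */
static struct nlmsghdr *nl_build(const void *payload, unsigned int len,
                                 unsigned int seq)
{
    struct nlmsghdr *nlh;

    assert(payload != NULL || len == 0);

    nlh = calloc(1, NLMSG_SPACE(len));
    if (nlh == NULL)
        return NULL;

    nlh->nlmsg_len = NLMSG_LENGTH(len);    /* header plus payload */
    nlh->nlmsg_type = 0;                   /* application-defined */
    nlh->nlmsg_flags = NLM_F_REQUEST;
    nlh->nlmsg_seq = seq;
    nlh->nlmsg_pid = (unsigned int)getpid();
    if (len > 0)
        memcpy(NLMSG_DATA(nlh), payload, len);
    return nlh;
}

/*
 * Hypothetical helper: send one message to the kernel as unicast.
 * Returns the result of sendmsg().
 */
static ssize_t nl_send_to_kernel(int fd, struct nlmsghdr *nlh)
{
    struct sockaddr_nl nladdr;
    struct iovec iov;
    struct msghdr msg;

    memset(&nladdr, 0, sizeof(nladdr));
    nladdr.nl_family = AF_NETLINK;         /* nl_pid 0: kernel; nl_groups 0: unicast */

    iov.iov_base = nlh;
    iov.iov_len = nlh->nlmsg_len;

    memset(&msg, 0, sizeof(msg));
    msg.msg_name = (void *)&nladdr;
    msg.msg_namelen = sizeof(nladdr);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    return sendmsg(fd, &msg, 0);
}
```

What the kernel does with such a message depends entirely on the protocol type of the socket and on the kernel-side code that receives it.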

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
