Using /proc to implement kernel-to-user-space communication

Source: http://yfydz.cublog.cn

1. Preface

Linux kernel space and user space can communicate by reading and writing files under the "/proc" directory. If all you need is to control kernel parameters, rather than transfer larger amounts of data, "/proc" is a good fit. Another way for the kernel and user space to communicate is through read/write or ioctl on a kernel device, which will be covered later.

2. /proc Overview

The /proc directory is a file system simulated by the system; it does not exist on disk. The files in it expose kernel parameter information and fall into two categories: files that are both readable and writable, which live under "/proc/sys", and read-only files, which are the other directories and files outside "/proc/sys". This is only a convention, of course; writable /proc files can be created in other directories as well.

No special tools are needed to work with /proc. To user space these are ordinary files: use the shell's "cat" command to view a file and "echo" to write to it.

Since Linux kernel 2.4, creating files under /proc has become very easy. Earlier versions required you to construct the file operations structure yourself; now a few simple function calls are enough. A /proc file is created with create_proc_entry() and deleted with remove_proc_entry(); a new directory is created with proc_mkdir(). These functions are defined in fs/proc/generic.c. Normally you do not call create_proc_entry() directly but go through one of its wrapper functions.

3. Read-only /proc files

Read-only /proc files require the kernel build option CONFIG_PROC_FS to be set.
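
Before turning to the kernel side, it may help to see what the user side described in the overview amounts to. The following is a minimal user-space sketch (not kernel code) of what "cat" and "echo" do with such a file. The helper names proc_read_line() and proc_write_line() are made up for this illustration, and the path is whatever the caller passes in.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read the first line of a /proc-style file into buf, roughly what `cat`
 * shows for a single-value file. The trailing newline is stripped.
 * Returns 0 on success, -1 on failure. (Hypothetical helper.) */
int proc_read_line(const char *path, char *buf, size_t size)
{
    FILE *fp;

    assert(buf != NULL && size > 0);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    if (fgets(buf, (int)size, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Write value plus a newline to a /proc-style file, roughly what
 * `echo value > path` does. Returns 0 on success, -1 on failure.
 * (Hypothetical helper.) */
int proc_write_line(const char *path, const char *value)
{
    FILE *fp = fopen(path, "w");

    if (fp == NULL)
        return -1;
    if (fprintf(fp, "%s\n", value) < 0) {
        fclose(fp);
        return -1;
    }
    /* The data is flushed on close, so a rejected write surfaces here. */
    return fclose(fp) == 0 ? 0 : -1;
}
```

For instance, proc_read_line("/proc/sys/net/ipv4/ip_forward", buf, sizeof buf) would fetch the forwarding flag discussed in section 4, and proc_write_line() on the same path would set it, assuming that file exists on the running kernel and the process has sufficient privileges.
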

3.1 Creating a read-only /proc file

A read-only /proc file is created with create_proc_read_entry() or create_proc_info_entry(), normally called during module initialization:

    /* include/linux/proc_fs.h */
    static inline struct proc_dir_entry *create_proc_read_entry(const char *name,
        mode_t mode, struct proc_dir_entry *base,
        read_proc_t *read_proc, void *data)
    {
        struct proc_dir_entry *res = create_proc_entry(name, mode, base);
        if (res) {
            res->read_proc = read_proc;
            res->data = data;
        }
        return res;
    }

This function takes five parameters:
name: name of the file to be created
mode: file mode
base: the parent directory
read_proc: a function pointer to the function that produces the file content on read
data: user parameter pointer passed to the read_proc function

    static inline struct proc_dir_entry *create_proc_info_entry(const char *name,
        mode_t mode, struct proc_dir_entry *base, get_info_t *get_info)
    {
        struct proc_dir_entry *res = create_proc_entry(name, mode, base);
        if (res) res->get_info = get_info;
        return res;
    }

This function takes four parameters:
name: name of the file to be created
mode: file mode
base: the parent directory
get_info: a function pointer to the function that produces the file content on read. It takes fewer parameters than the read_proc function above.

The kernel predefines several directories, /proc/net, /proc/bus, /proc/fs and /proc/driver, declared in fs/proc/root.c:

    struct proc_dir_entry *proc_net, *proc_bus, *proc_root_fs, *proc_root_driver;

3.2 Deleting a read-only /proc file

A read-only /proc file is deleted with remove_proc_entry(), normally called when the module is removed. The function is defined in fs/proc/generic.c:

    void remove_proc_entry(const char *name, struct proc_dir_entry *parent)

This function takes two parameters:
name: name of the file to be deleted
parent: the parent directory

3.3 Creating and deleting network-related /proc files

For network parameters (/proc/net), the kernel provides the wrapper functions proc_net_create() and proc_net_remove() to create and delete files under /proc/net:

    static inline struct proc_dir_entry *proc_net_create(const char *name,
        mode_t mode, get_info_t *get_info)
    {
        return create_proc_info_entry(name, mode, proc_net, get_info);
    }

    static inline void proc_net_remove(const char *name)
    {
        remove_proc_entry(name, proc_net);
    }

proc_net is the predefined pointer to the "/proc/net" directory, so these two functions can be called directly when creating read-only files for the network subsystem.

3.4 Example

net/ipv4/af_inet.c:
    ...
    // create the /proc/net/netstat file
    proc_net_create("netstat", 0, netstat_get_info);
    ...

The netstat_get_info() function is defined in net/ipv4/proc.c. Its parameter list is fixed:

    // buffer: the output buffer; all data to be output is written into it
    // start:  returns the position in the buffer where the data to hand back begins
    // offset: the read offset from the beginning of the buffer; the actual start
    //         value is computed from buffer and offset
    // length: the buffer length; the buffer is allocated by the kernel itself, so
    //         the code must check that the data written does not exceed this limit
    int netstat_get_info(char *buffer, char **start, off_t offset, int length)
    {
        int len, i;
        // len records the length of the data written into the buffer; every write must be added to it
        len = sprintf(buffer,
            "TcpExt: SyncookiesSent SyncookiesRecv SyncookiesFailed"
            " EmbryonicRsts PruneCalled RcvPruned OfoPruned"
            " OutOfWindowIcmps LockDroppedIcmps ArpFilter"
            " TW TWRecycled TWKilled"
            " PAWSPassive PAWSActive PAWSEstab"
            " DelayedACKs DelayedACKLocked DelayedACKLost"
            " ListenOverflows ListenDrops"
            " TCPPrequeued TCPDirectCopyFromBacklog"
            " TCPDirectCopyFromPrequeue TCPPrequeueDropped"
            " TCPHPHits TCPHitsToUser"
            " TCPPureAcks TCPHPAcks"
            " TCPRenoRecovery TCPSackRecovery"
            " TCPSACKReneging"
            " TCPFACKReorder TCPSACKReorder TCPRenoReorder TCPTSReorder"
            " TCPFullUndo TCPPartialUndo TCPDSACKUndo TCPLossUndo"
            " TCPLoss TCPLostRetransmit"
            " TCPRenoFailures TCPSackFailures TCPLossFailures"
            " TCPFastRetrans TCPForwardRetrans TCPSlowStartRetrans"
            " TCPTimeouts"
            " TCPRenoRecoveryFail TCPSackRecoveryFail"
            " TCPSchedulerFailed TCPRcvCollapsed"
            " TCPDSACKOldSent TCPDSACKOfoSent TCPDSACKRecv TCPDSACKOfoRecv"
            " TCPAbortOnSyn TCPAbortOnData TCPAbortOnClose"
            " TCPAbortOnMemory TCPAbortOnTimeout TCPAbortOnLinger"
            " TCPAbortFailed TCPMemoryPressures\n"
            "TcpExt:");
        for (i = 0; i < offsetof(struct linux_mib, __pad) / sizeof(unsigned long); i++)
            len += sprintf(buffer + len, " %lu",
                           fold_field((unsigned long *)net_statistics,
                                      sizeof(struct linux_mib), i));

        len += sprintf(buffer + len, "\n");

        // the requested offset lies beyond the data: nothing left to return
        if (offset >= len)
        {
            *start = buffer;
            return 0;
        }
        // compute the start pointer of the data
        *start = buffer + offset;
        len -= offset;
        // make sure the returned length does not exceed the requested length
        if (len > length)
            len = length;
        if (len < 0)
            len = 0;
        return len;
    }

4. Read/write /proc files

Read/write /proc files require the kernel build option CONFIG_SYSCTL to be set. They are normally placed under the /proc/sys directory. The kernel parameters behind these files must be global variables or dynamically allocated memory; they cannot be temporary (local) variables.

4.1 Creation function

A read/write /proc file is created by registering it with register_sysctl_table(). The function is defined in kernel/sysctl.c and declared as follows:

    struct ctl_table_header *register_sysctl_table(ctl_table *table, int insert_at_head);

The function returns a pointer to a struct ctl_table_header, which is needed later to release the table. Its first parameter, table, is the sysctl control table, defined as follows:

    /* include/linux/sysctl.h */
    typedef struct ctl_table ctl_table;
    struct ctl_table
    {
        int ctl_name;               /* numeric ID of the value */
        const char *procname;       /* name */
        void *data;                 /* the corresponding kernel parameter */
        int maxlen;                 /* storage size of the parameter */
        mode_t mode;                /* permission mode: rwxrwxrwx */
        ctl_table *child;           /* subdirectory table */
        proc_handler *proc_handler; /* callback that handles reading and writing the data */
        ctl_handler *strategy;      /* read/write callback used to pre-process the data;
                                       it runs before the read or write itself. Return value
                                       < 0: error; == 0: OK, continue with the read or write;
                                       > 0: the read/write has already been completed inside
                                       the function, return directly */
        struct proc_dir_entry *de;  /* /proc control block pointer */
        void *extra1;               /* extra parameters, often used for minimum and maximum values */
        void *extra2;
    };

Note that the sixth member of this structure, the subdirectory table, is what makes the table a tree.

The second parameter of register_sysctl_table(), insert_at_head, selects how the table is inserted into the list: at the head or at the tail.

Filling in struct ctl_table correctly is therefore the key step, and the most important member is proc_handler, the function that handles input and output of the data. For an entry that is a file rather than a directory, this member is indispensable. In earlier kernel versions these handlers had to be written by hand; from 2.4 on, the kernel provides functions that cover most data input and output needs:

    // Handle string data
    extern int proc_dostring(ctl_table *, int, struct file *,
                             void *, size_t *);
    // Handle an integer vector
    extern int proc_dointvec(ctl_table *, int, struct file *,
                             void *, size_t *);
    // Handle an integer vector, but the init process is treated slightly differently
    extern int proc_dointvec_bset(ctl_table *, int, struct file *,
                                  void *, size_t *);
    // Handle an integer vector with minimum and maximum bounds
    extern int proc_dointvec_minmax(ctl_table *, int, struct file *,
                                    void *, size_t *);
    // Handle an unsigned long vector with minimum and maximum bounds
    extern int proc_doulongvec_minmax(ctl_table *, int, struct file *,
                                      void *, size_t *);
    // Handle an integer vector; the user data is a number of seconds and is
    // converted to a jiffies value, often used for timing control
    extern int proc_dointvec_jiffies(ctl_table *, int, struct file *,
                                     void *, size_t *);
    // Handle an unsigned long vector; the user data is a number of milliseconds
    // and is converted to a jiffies value, often used for timing control
    extern int proc_doulongvec_ms_jiffies_minmax(ctl_table *table, int,
                                                 struct file *, void *, size_t *);

For example, the following code is taken from net/ipv4/netfilter/ip_conntrack_standalone.c:

    static ctl_table ip_ct_sysctl_table[] = {
        {NET_IPV4_NF_CONNTRACK_MAX, "ip_conntrack_max",
         &ip_conntrack_max, sizeof(int), 0644, NULL,
         &proc_dointvec},
        {NET_IPV4_NF_CONNTRACK_BUCKETS, "ip_conntrack_buckets",
         &ip_conntrack_htable_size, sizeof(unsigned int), 0444, NULL,
         &proc_dointvec},
        {NET_IPV4_NF_CONNTRACK_TCP_TIMEOUT_SYN_SENT, "ip_conntrack_tcp_timeout_syn_sent",
         &ip_ct_tcp_timeout_syn_sent, sizeof(unsigned int), 0644, NULL,
         &proc_dointvec_jiffies},
        ......
        {0}
    };

    static ctl_table ip_ct_netfilter_table[] = {
        {NET_IPV4_NETFILTER, "netfilter", NULL, 0, 0555, ip_ct_sysctl_table, 0, 0, 0, 0},
        {NET_IP_CONNTRACK_MAX, "ip_conntrack_max",
         &ip_conntrack_max, sizeof(int), 0644, NULL,
         &proc_dointvec},
        {0}
    };

    static ctl_table ip_ct_ipv4_table[] = {
        {NET_IPV4, "ipv4", NULL, 0, 0555, ip_ct_netfilter_table, 0, 0, 0, 0},
        {0}
    };

    static ctl_table ip_ct_net_table[] = {
        {CTL_NET, "net", NULL, 0, 0555, ip_ct_ipv4_table, 0, 0, 0, 0},
        {0}
    };

    static int init_or_cleanup(int init)
    {
        ...
        ip_ct_sysctl_header = register_sysctl_table(ip_ct_net_table, 0);
        ...
    }

Some /proc/sys files need more complex control: writing the parameter is really a trigger that sets off a series of operations. The default handler functions are then not enough, and you have to write the proc_handler and strategy functions of the ctl_table structure yourself. Take the /proc/sys/net/ipv4/ip_forward file: its kernel parameter is ipv4_devconf.forwarding, and changing it changes the forwarding attribute of every network interface. It is defined as follows:

    /* net/ipv4/sysctl_net_ipv4.c */
    static
    int ipv4_sysctl_forward(ctl_table *ctl, int write, struct file *filp,
                            void *buffer, size_t *lenp)
    {
        // save the current forwarding value
        int val = ipv4_devconf.forwarding;
        int ret;
        // perform the /proc/sys read or write; after a write, forwarding already holds the new value
        ret = proc_dointvec(ctl, write, filp, buffer, lenp);
        // on a write that changed forwarding, apply the new value to the
        // forwarding attribute of all network interfaces
        if (write && ipv4_devconf.forwarding != val)
            inet_forward_change(ipv4_devconf.forwarding);
        return ret;
    }

    static int ipv4_sysctl_forward_strategy(ctl_table *table, int *name, int nlen,
                                            void *oldval, size_t *oldlenp,
                                            void *newval, size_t newlen,
                                            void **context)
    {
        int new;
        if (newlen != sizeof(int))
            return -EINVAL;
        if (get_user(new, (int *)newval))
            return -EFAULT;
        if (new != ipv4_devconf.forwarding)
            inet_forward_change(new);
        // If the new value were assigned to forwarding here, the function could return a
        // value greater than 0; since it is not assigned, it can only return 0 so that
        // the caller continues.
        // This strategy function does not really seem necessary, though; the proc_handler
        // function above can handle everything.
        return 0; /* caller does change again and handles oldval */
    }

    ctl_table ipv4_table[] = {
        ......
        {NET_IPV4_FORWARD, "ip_forward",
         &ipv4_devconf.forwarding, sizeof(int), 0644, NULL,
         &ipv4_sysctl_forward, &ipv4_sysctl_forward_strategy},
        ......

4.2 Release function

A read/write /proc file is released with unregister_sysctl_table(). The function is defined in kernel/sysctl.c and declared as follows:

    void unregister_sysctl_table(struct ctl_table_header *header);

Its parameter is the struct ctl_table_header pointer returned at creation time. It is normally called from the module's cleanup function.

5. Conclusion

/proc programming in the kernel is now quite simple. /proc is well suited to controlling individual kernel parameters, but not to transferring large amounts of data.
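
The remark about bulk data can be related to the get_info contract shown for netstat_get_info() in section 3.4: the handler formats its whole output into the kernel-supplied buffer on every call and hands back only the slice that starts at offset. The following is a user-space sketch of that contract, not kernel code. demo_get_info() and demo_read_all() are hypothetical names, the "DemoExt" output is invented, and a plain long stands in for off_t.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a 2.4-style get_info handler. It writes the
 * complete output into buffer each time, then reports the slice beginning
 * at offset, at most length bytes long. */
int demo_get_info(char *buffer, char **start, long offset, int length)
{
    int len;

    len = sprintf(buffer, "DemoExt: A B C\n");
    len += sprintf(buffer + len, "DemoExt: %d %d %d\n", 1, 2, 3);

    if (offset >= len) {        /* nothing left to read */
        *start = buffer;
        return 0;
    }
    *start = buffer + offset;   /* first byte the caller has not seen yet */
    len -= (int)offset;
    if (len > length)           /* never hand back more than was asked for */
        len = length;
    if (len < 0)
        len = 0;
    return len;
}

/* Plays the role of the caller: invoke the handler repeatedly with a growing
 * offset and copy each slice into out. Returns the total number of bytes. */
int demo_read_all(char *out, int out_size, int chunk)
{
    char page[256];             /* stands in for the kernel-allocated buffer */
    char *start;
    long offset = 0;
    int n;

    assert(out != NULL && out_size > 0 && chunk > 0);
    while ((n = demo_get_info(page, &start, offset, chunk)) > 0) {
        if (offset + n >= out_size)
            break;              /* keep room for the terminating NUL */
        memcpy(out + offset, start, (size_t)n);
        offset += n;
    }
    out[offset] = '\0';
    return (int)offset;
}
```

Each call in the loop regenerates the full text just to return one slice of it, which is acceptable for a handful of counters but is one way to see why this interface is a poor fit for large transfers.
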
