Linux Netlink Mechanism

Last Update:2018-12-07 Source: Internet

Author: User


Netlink is a special kind of socket that is unique to Linux. It is similar to AF_ROUTE in BSD but far more powerful. In the latest Linux kernel at the time of writing (2.6.14), many subsystems use Netlink to communicate between applications and the kernel, including: the routing daemon (NETLINK_ROUTE), the 1-wire subsystem (NETLINK_W1), user-mode socket protocols (NETLINK_USERSOCK), the firewall (NETLINK_FIREWALL), socket monitoring (NETLINK_INET_DIAG), netfilter logging (NETLINK_NFLOG), IPsec security policy (NETLINK_XFRM), SELinux event notification (NETLINK_SELINUX), the iSCSI subsystem (NETLINK_ISCSI), process auditing (NETLINK_AUDIT), forwarding information table lookup (NETLINK_FIB_LOOKUP), the Netlink connector (NETLINK_CONNECTOR), the netfilter subsystem (NETLINK_NETFILTER), the IPv6 firewall (NETLINK_IP6_FW), DECnet routing messages (NETLINK_DNRTMSG), kernel event notification to user space (NETLINK_KOBJECT_UEVENT), and generic Netlink (NETLINK_GENERIC).

Netlink is a very good way to exchange data in both directions between the kernel and user applications. Applications use the full power of Netlink through the standard socket API, while kernel code must use a dedicated kernel API.

Netlink has the following advantages over system calls, ioctl, and the /proc file system:

1. To use Netlink, you only need to add a new Netlink protocol type definition to linux/netlink.h, for example, #define NETLINK_MYTEST 17. The kernel and user-space applications can then exchange data over this protocol type through the socket API immediately. By contrast, a system call requires adding a new system call, ioctl requires adding a device or file, which takes a lot of code, and the proc file system requires adding a new file or directory under /proc, which clutters the already crowded /proc further.

2. Netlink is an asynchronous communication mechanism. Messages passed between the kernel and user-space applications are stored in socket buffer queues. The sender only places the message in the receive queue of the receiver's socket and does not wait for the receiver to process it. System calls and ioctl are synchronous mechanisms; if the data transferred is too long, scheduling granularity suffers.

3. The kernel part of a Netlink user can be implemented as a module, and the application part has no compile-time dependency on the kernel part. A new system call, by contrast, must be statically linked into the kernel and cannot be implemented in a module, and applications that use the new system call depend on the kernel at compile time.

4. Netlink supports multicast. A kernel module or application can multicast a message to a Netlink group, and any kernel module or application that belongs to that group receives it. The kernel event notification mechanism uses this feature: any application interested in kernel events can receive the events sent by the subsystem. A later article introduces the use of this mechanism.

5. The kernel can use Netlink to initiate a session, whereas system calls and ioctl can only be invoked by user applications.

6. Netlink uses the standard socket API, so it is easy to use, whereas system calls and ioctl require specialized training.

Use Netlink in user mode

User-mode applications can use Netlink sockets through the standard socket API: socket(), bind(), sendmsg(), recvmsg(), and close(). See the manual pages for details on these functions; this article only describes how Netlink users should call them. Note that applications using Netlink must include the header file linux/netlink.h and, of course, the header required for sockets, sys/socket.h.

To create a Netlink socket, call socket() with the following parameters:

socket(AF_NETLINK, SOCK_RAW, netlink_type)

The first parameter must be AF_NETLINK or PF_NETLINK; in Linux they are the same value, and it indicates that Netlink is to be used. The second parameter must be SOCK_RAW or SOCK_DGRAM. The third parameter specifies the Netlink protocol type, for example the user-defined protocol type NETLINK_MYTEST. NETLINK_GENERIC is a general-purpose protocol type intended for users, so you can use it directly without adding a new protocol type. The protocol types predefined by the kernel are:

#define NETLINK_ROUTE 0 /* Routing/device hook */
#define NETLINK_W1 1 /* 1-wire subsystem */
#define NETLINK_USERSOCK 2 /* Reserved for user mode socket protocols */
#define NETLINK_FIREWALL 3 /* Firewalling hook */
#define NETLINK_INET_DIAG 4 /* INET socket monitoring */
#define NETLINK_NFLOG 5 /* netfilter/iptables ULOG */
#define NETLINK_XFRM 6 /* ipsec */
#define NETLINK_SELINUX 7 /* SELinux event notifications */
#define NETLINK_ISCSI 8 /* Open-iSCSI */
#define NETLINK_AUDIT 9 /* auditing */
#define NETLINK_FIB_LOOKUP 10
#define NETLINK_CONNECTOR 11
#define NETLINK_NETFILTER 12 /* netfilter subsystem */
#define NETLINK_IP6_FW 13
#define NETLINK_DNRTMSG 14 /* DECnet routing messages */
#define NETLINK_KOBJECT_UEVENT 15 /* Kernel messages to userspace */
#define NETLINK_GENERIC 16

Each Netlink protocol type can have up to 32 multicast groups, and each group is represented by one bit. With Netlink multicast, a message reaches every member of a group with a single system call, which greatly reduces the number of system calls for applications that must deliver a message to several receivers.
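The one-bit-per-group rule can be illustrated with a small sketch. The helper name nl_group_mask is hypothetical, and the sketch assumes that group number n corresponds to bit n-1 of the 32-bit mask, with 0 meaning "no group":

```c
#include <stdint.h>

/* Hypothetical helper: map a multicast group number (1..32) to its bit
 * in a 32-bit group mask. Anything outside that range yields 0, which
 * stands for "no group". Assumes group n is represented by bit n-1. */
static uint32_t nl_group_mask(unsigned int group)
{
    if (group == 0 || group > 32)
        return 0;
    return (uint32_t)1 << (group - 1);
}
```

Several groups can then be combined with a bitwise OR, for example nl_group_mask(1) | nl_group_mask(3).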

bind() is used to bind an open Netlink socket to a Netlink source socket address. The address structure of a Netlink socket is as follows:

struct sockaddr_nl
{
    sa_family_t nl_family;
    unsigned short nl_pad;
    __u32 nl_pid;
    __u32 nl_groups;
};

The nl_family field must be set to AF_NETLINK or PF_NETLINK. The nl_pad field is currently unused, so it must always be set to 0. The nl_pid field is the ID of the process that receives or sends the message: set it to 0 if the message is meant for the kernel or for a multicast group, otherwise set it to the ID of the process that handles the message. The nl_groups field specifies multicast groups. bind() adds the calling process to the multicast groups specified by this field; if it is 0, the caller joins no multicast group.

The nl_pid field of the address passed to bind() should be set to the ID of the calling process, which serves as the local address of the Netlink socket. However, when several threads of one process use Netlink sockets, nl_pid can be set to some other value, for example:

pthread_self() << 16 | getpid();

The nl_pid field therefore need not be a process ID. It only distinguishes different receivers or senders, and you can set it as needed. bind() is called as follows:

bind(fd, (struct sockaddr *)&nladdr, sizeof(struct sockaddr_nl));
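Putting socket() and bind() together, a minimal user-space sketch might look like the following. The helper names nl_fill_local_addr and nl_open_bound are hypothetical, and using the process ID as nl_pid is just the simple single-threaded choice described above:

```c
#include <sys/socket.h>
#include <linux/netlink.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: fill a local Netlink address with our own ID
 * and the bit mask of multicast groups to join (0 = none). */
static void nl_fill_local_addr(struct sockaddr_nl *addr, uint32_t pid,
                               uint32_t groups)
{
    memset(addr, 0, sizeof(*addr));   /* also clears nl_pad */
    addr->nl_family = AF_NETLINK;
    addr->nl_pid = pid;
    addr->nl_groups = groups;
}

/* Hypothetical helper: create a Netlink socket for `protocol` and bind
 * it to this process. Returns the descriptor, or -1 with errno set. */
static int nl_open_bound(int protocol, uint32_t groups)
{
    struct sockaddr_nl local;
    int fd = socket(AF_NETLINK, SOCK_RAW, protocol);

    if (fd < 0)
        return -1;

    nl_fill_local_addr(&local, (uint32_t)getpid(), groups);
    if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0) {
        int saved = errno;            /* keep bind()'s error code */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

A caller would check the result before use, for example: int fd = nl_open_bound(NETLINK_USERSOCK, 0); if (fd < 0) perror("netlink");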

Here fd is the file descriptor returned by the earlier socket() call, and nladdr is an address of type struct sockaddr_nl. To send a Netlink message to the kernel or to another user application, you fill in the destination Netlink socket address. In that case nl_pid and nl_groups are the receiver's process ID and multicast group, respectively. If nl_pid is 0, the receiver is the kernel or a multicast group. If nl_groups is 0, the message is unicast; otherwise it is multicast. Sending a Netlink message with sendmsg() also involves the structures struct msghdr, struct nlmsghdr, and struct iovec. The struct msghdr must be set up as follows:

struct msghdr msg;
memset(&msg, 0, sizeof(msg));
msg.msg_name = (void *)&(nladdr);
msg.msg_namelen = sizeof(nladdr);

Here nladdr is the Netlink address of the message receiver.

struct nlmsghdr is the Netlink message header. The Netlink kernel implementation uses it to multiplex and demultiplex all the protocol types defined on Netlink and for other control purposes, which is why it is also called the Netlink control block. An application must therefore supply this header with every Netlink message it sends.

struct nlmsghdr
{
    __u32 nlmsg_len;   /* Length of message */
    __u16 nlmsg_type;  /* Message type */
    __u16 nlmsg_flags; /* Additional flags */
    __u32 nlmsg_seq;   /* Sequence number */
    __u32 nlmsg_pid;   /* Sending process PID */
};

The nlmsg_len field specifies the total length of the message, that is, the size of this structure plus the length of the data that follows it. The nlmsg_type field lets the application define its own message types; the Netlink kernel code does not interpret it, so in most cases it is set to 0. The nlmsg_flags field sets message flags. The available flags include:

/* Flags values */
#define NLM_F_REQUEST 1 /* It is request message. */
#define NLM_F_MULTI 2 /* Multipart message, terminated by NLMSG_DONE */
#define NLM_F_ACK 4 /* Reply with ack, with zero or error code */
#define NLM_F_ECHO 8 /* Echo this request */
/* Modifiers to GET request */
#define NLM_F_ROOT 0x100 /* Specify tree root */
#define NLM_F_MATCH 0x200 /* Return all matching */
#define NLM_F_ATOMIC 0x400 /* Atomic GET */
#define NLM_F_DUMP (NLM_F_ROOT | NLM_F_MATCH)
/* Modifiers to NEW request */
#define NLM_F_REPLACE 0x100 /* Override existing */
#define NLM_F_EXCL 0x200 /* Do not touch, if it exists */
#define NLM_F_CREATE 0x400 /* Create, if it does not exist */
#define NLM_F_APPEND 0x800 /* Add to end of list */

The NLM_F_REQUEST flag indicates that the message is a request. All messages first initiated by an application should set it.

The NLM_F_MULTI flag indicates that the message is one part of a multipart message. The following parts can be obtained with the macro NLMSG_NEXT.

The NLM_F_ACK flag asks the receiver to reply with an acknowledgement that carries zero or an error code. The sequence number and process ID associate the response with its request.

The NLM_F_ECHO flag asks for the request to be echoed back to the sender.

The NLM_F_ROOT flag is used by the data retrieval operations of many Netlink protocols. It indicates that the requested table should be returned to the application as a whole rather than entry by entry. A request with this flag usually produces response messages with NLM_F_MULTI set. Note that a request with this flag is protocol-specific, so the nlmsg_type field must specify the protocol type.

The NLM_F_MATCH flag indicates that the protocol-specific request wants only a subset of the data, selected by a protocol-specific filter.

The NLM_F_ATOMIC flag indicates that the data returned by the request should be collected atomically, which prevents it from being modified while it is gathered.

The NLM_F_DUMP flag is a convenience combination of NLM_F_ROOT and NLM_F_MATCH, as its definition shows.

The NLM_F_REPLACE flag replaces an existing entry in the table.

The NLM_F_EXCL flag is used together with NLM_F_CREATE and NLM_F_APPEND; the request fails if the entry already exists.

The NLM_F_CREATE flag indicates that an entry should be created in the specified table.

The NLM_F_APPEND flag adds the new entry to the end of the table.
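Flags are combined with a bitwise OR. As a hedged sketch of how a request header might be prepared and how a reply could be paired with it, the helpers below (nl_init_request and nl_reply_matches are hypothetical names) set NLM_F_REQUEST | NLM_F_ACK and compare the sequence number and process ID, following the description above. Whether a given protocol echoes both fields unchanged is an assumption to verify for that protocol:

```c
#include <sys/socket.h>
#include <linux/netlink.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: prepare the header of a payload-less request
 * that asks the receiver to acknowledge it. */
static void nl_init_request(struct nlmsghdr *nlh, uint16_t type,
                            uint32_t seq, uint32_t pid)
{
    memset(nlh, 0, sizeof(*nlh));
    nlh->nlmsg_len = NLMSG_LENGTH(0);              /* header only */
    nlh->nlmsg_type = type;
    nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;  /* OR the flags */
    nlh->nlmsg_seq = seq;
    nlh->nlmsg_pid = pid;
}

/* Hypothetical helper: nonzero if `reply` carries the same sequence
 * number and process ID as `request`. */
static int nl_reply_matches(const struct nlmsghdr *request,
                            const struct nlmsghdr *reply)
{
    return reply->nlmsg_seq == request->nlmsg_seq &&
           reply->nlmsg_pid == request->nlmsg_pid;
}
```

Incrementing seq for every request keeps replies to different requests distinguishable.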

The kernel needs to read and modify these flags. For ordinary use they can simply be set to 0; only advanced applications that need complex operations (such as netfilter and routing daemons) set them. The nlmsg_seq and nlmsg_pid fields are used to track messages: the former is the sequence number and the latter is the process ID of the message's originator. The following is an example:

#define MAX_MSGSIZE 1024
char buffer[] = "An example message";
struct nlmsghdr *nlhdr;
nlhdr = (struct nlmsghdr *)malloc(NLMSG_SPACE(MAX_MSGSIZE));
strcpy(NLMSG_DATA(nlhdr), buffer);
/* The payload length includes the terminating NUL copied by strcpy() */
nlhdr->nlmsg_len = NLMSG_LENGTH(strlen(buffer) + 1);
nlhdr->nlmsg_pid = getpid(); /* self pid */
nlhdr->nlmsg_flags = 0;

The struct iovec structure lets a single system call send several messages. The following shows how it is used:

struct iovec iov;
iov.iov_base = (void *)nlhdr;
iov.iov_len = nlhdr->nlmsg_len;
msg.msg_iov = &iov;
msg.msg_iovlen = 1;

After these steps, the message can be sent with the following statement:

sendmsg(fd, &msg, 0);
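The sending steps above can be gathered into one sketch. The helper names nl_build_text and nl_send are hypothetical, error handling is minimal, and the payload is assumed to be a NUL-terminated string:

```c
#include <sys/socket.h>
#include <linux/netlink.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: allocate a Netlink message whose payload is the
 * string `text`, including its terminating NUL. The caller frees it. */
static struct nlmsghdr *nl_build_text(const char *text, uint32_t seq)
{
    size_t payload = strlen(text) + 1;
    struct nlmsghdr *nlh = calloc(1, NLMSG_SPACE(payload));

    if (nlh == NULL)
        return NULL;
    nlh->nlmsg_len = NLMSG_LENGTH(payload);   /* header + payload */
    nlh->nlmsg_type = 0;
    nlh->nlmsg_flags = 0;
    nlh->nlmsg_seq = seq;
    nlh->nlmsg_pid = (uint32_t)getpid();
    memcpy(NLMSG_DATA(nlh), text, payload);
    return nlh;
}

/* Hypothetical helper: send `nlh` to the receiver identified by
 * dst_pid and dst_groups (both 0 addresses the kernel).
 * Returns the result of sendmsg(): bytes sent, or -1 on error. */
static ssize_t nl_send(int fd, struct nlmsghdr *nlh, uint32_t dst_pid,
                       uint32_t dst_groups)
{
    struct sockaddr_nl dst;
    struct iovec iov;
    struct msghdr msg;

    memset(&dst, 0, sizeof(dst));
    dst.nl_family = AF_NETLINK;
    dst.nl_pid = dst_pid;
    dst.nl_groups = dst_groups;

    iov.iov_base = nlh;
    iov.iov_len = nlh->nlmsg_len;

    memset(&msg, 0, sizeof(msg));
    msg.msg_name = &dst;
    msg.msg_namelen = sizeof(dst);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    return sendmsg(fd, &msg, 0);
}
```

Checking the return value of nl_send() matters in practice, since sendmsg() fails with -1 if, for example, the descriptor is invalid or no receiver is listening at the destination address.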

To receive a message, an application allocates a buffer large enough to hold the message header and the message data, and then fills in struct msghdr. Once the buffer is set up, the application can call recvmsg() directly.

#define MAX_NL_MSG_LEN 1024
struct sockaddr_nl nladdr;
struct msghdr msg;
struct iovec iov;
struct nlmsghdr *nlhdr;
nlhdr = (struct nlmsghdr *)malloc(MAX_NL_MSG_LEN);
iov.iov_base = (void *)nlhdr;
iov.iov_len = MAX_NL_MSG_LEN;
memset(&msg, 0, sizeof(msg)); /* clear the fields that are not set below */
msg.msg_name = (void *)&(nladdr);
msg.msg_namelen = sizeof(nladdr);
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
recvmsg(fd, &msg, 0);

Note: fd is the Netlink socket descriptor returned by the socket() call.

After the message is received, nlhdr points to the header of the received message, nladdr holds the address of the message's sender, and the macro NLMSG_DATA(nlhdr) returns a pointer to the data part of the message.
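The receive snippet ignores the return value of recvmsg(). A slightly more defensive sketch, using a hypothetical helper named nl_recv, also checks for errors and for a buffer that was too small (reported through MSG_TRUNC in msg_flags):

```c
#include <sys/socket.h>
#include <linux/netlink.h>
#include <string.h>

/* Hypothetical helper: receive one datagram from a Netlink socket.
 * On success returns the number of bytes stored in buf and leaves the
 * peer address reported by recvmsg() in *src.
 * Returns -1 on error or if the datagram did not fit into buf. */
static ssize_t nl_recv(int fd, void *buf, size_t len,
                       struct sockaddr_nl *src)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg;
    ssize_t n;

    memset(&msg, 0, sizeof(msg));
    memset(src, 0, sizeof(*src));
    msg.msg_name = src;
    msg.msg_namelen = sizeof(*src);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    n = recvmsg(fd, &msg, 0);
    if (n < 0)
        return -1;
    if (msg.msg_flags & MSG_TRUNC)   /* buffer was too small */
        return -1;
    return n;
}
```

After a successful call, an nl_pid of 0 in *src indicates that the message came from the kernel, in line with the address conventions described earlier.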

linux/netlink.h defines some macros that make message handling easier, including:

#define NLMSG_ALIGNTO 4
#define NLMSG_ALIGN(len) (((len) + NLMSG_ALIGNTO - 1) & ~(NLMSG_ALIGNTO - 1))

The macro NLMSG_ALIGN(len) returns the smallest value that is not less than len and is aligned to NLMSG_ALIGNTO bytes.

#define NLMSG_LENGTH(len) ((len) + NLMSG_ALIGN(sizeof(struct nlmsghdr)))

The macro NLMSG_LENGTH(len) computes the actual message length for a data part of len bytes. It is generally used when allocating a message buffer.

#define NLMSG_SPACE(len) NLMSG_ALIGN(NLMSG_LENGTH(len))

The macro NLMSG_SPACE(len) returns the smallest aligned value that is not less than NLMSG_LENGTH(len). It is also used when allocating a message buffer.
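A short worked example may make the difference between the three macros concrete. The function name nl_print_sizes is hypothetical; the arithmetic relies on struct nlmsghdr being 16 bytes (4 + 2 + 2 + 4 + 4), which is already a multiple of NLMSG_ALIGNTO:

```c
#include <sys/socket.h>
#include <linux/netlink.h>
#include <stdio.h>

/* Print the aligned payload size, the message length, and the buffer
 * space needed for a payload of `len` bytes. */
static void nl_print_sizes(int len)
{
    printf("payload=%d align=%d length=%d space=%d\n", len,
           (int)NLMSG_ALIGN(len), (int)NLMSG_LENGTH(len),
           (int)NLMSG_SPACE(len));
}
```

For a 5-byte payload, nl_print_sizes(5) prints payload=5 align=8 length=21 space=24: the header contributes 16 bytes, so the message length is 21, and rounding up to the next multiple of 4 gives 24 bytes of buffer space.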

#define NLMSG_DATA(nlh) ((void *)(((char *)nlh) + NLMSG_LENGTH(0)))

The macro NLMSG_DATA(nlh) returns the start address of the data part of the message. It is used both to set and to read the message data.

#define NLMSG_NEXT(nlh, len) ((len) -= NLMSG_ALIGN((nlh)->nlmsg_len), \
    (struct nlmsghdr *)(((char *)(nlh)) + NLMSG_ALIGN((nlh)->nlmsg_len)))

The macro NLMSG_NEXT(nlh, len) returns the start address of the next message and also reduces len to the total length of the remaining messages. It is generally used when a message is sent or received in several parts.

#define NLMSG_OK(nlh, len) ((len) >= (int)sizeof(struct nlmsghdr) && \
    (nlh)->nlmsg_len >= sizeof(struct nlmsghdr) && \
    (nlh)->nlmsg_len <= (len))

The macro NLMSG_OK(nlh, len) checks whether a complete message with a valid length lies within the remaining len bytes.

#define NLMSG_PAYLOAD(nlh, len) ((nlh)->nlmsg_len - NLMSG_SPACE((len)))

The macro NLMSG_PAYLOAD(nlh, len) returns the length of the payload.
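These macros are typically used together to walk a buffer that holds one or more messages, such as the buffer filled by recvmsg(). The following is a minimal sketch; the function name nl_walk is hypothetical, and real code would process each payload where the comment indicates:

```c
#include <sys/socket.h>
#include <linux/netlink.h>
#include <stddef.h>

/* Walk every complete message in the first `len` bytes of buf.
 * Returns the number of messages found; *payload_bytes receives the
 * sum of their payload lengths. */
static int nl_walk(void *buf, int len, size_t *payload_bytes)
{
    struct nlmsghdr *nlh = buf;
    int count = 0;

    *payload_bytes = 0;
    while (NLMSG_OK(nlh, len)) {
        /* NLMSG_DATA(nlh) points at this message's payload here. */
        *payload_bytes += NLMSG_PAYLOAD(nlh, 0);
        count++;
        nlh = NLMSG_NEXT(nlh, len);   /* also decreases len */
    }
    return count;
}
```

The loop stops as soon as NLMSG_OK() reports that the remaining bytes cannot hold a complete message, so a truncated trailing message is never dereferenced.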

The function close() closes an open Netlink socket.

Netlink kernel API

The Netlink kernel implementation is in the file net/netlink/af_netlink.c. A kernel module that uses Netlink must include the header file linux/netlink.h. Using Netlink in the kernel requires a dedicated API, which is completely different from the way user-mode applications use Netlink. To add a new Netlink protocol type, you must modify linux/netlink.h. The current Netlink implementation already includes a general-purpose protocol type, NETLINK_GENERIC, for convenience; you can use it directly without adding a new protocol type. As mentioned above, adding a new Netlink protocol type only requires adding a definition such as the following to linux/netlink.h:

#define NETLINK_MYTEST 17

Once this definition is added, the protocol type can be referenced anywhere in the kernel.

In the kernel, a Netlink socket is created by calling the following function:

struct sock *
netlink_kernel_create(int unit, void (*input)(struct sock *sk, int len));

The unit parameter is the Netlink protocol type, such as NETLINK_MYTEST. The input parameter is the Netlink message handler defined by the kernel module; it is called whenever a message arrives at this Netlink socket. The sk parameter of input is the struct sock pointer returned by netlink_kernel_create(). struct sock is the kernel's internal representation of a socket; sockets created by user-mode applications also have a struct sock in the kernel. The following is an example of an input function:

void input(struct sock *sk, int len)
{
    struct sk_buff *skb;
    struct nlmsghdr *nlh = NULL;
    u8 *data = NULL;

    while ((skb = skb_dequeue(&sk->sk_receive_queue)) != NULL) {
        /* Process the Netlink message pointed to by skb->data */
        nlh = (struct nlmsghdr *)skb->data;
        data = NLMSG_DATA(nlh);
        /* Process the Netlink message with its header pointed to by
         * nlh and its data pointed to by data
         */
        kfree_skb(skb); /* release the buffer once it has been handled */
    }
}

The function input() is called while the sending process executes sendmsg(), so that the message is handled promptly. However, if the message is very long, handling it this way lengthens the sendmsg() system call. In that case, a kernel thread can be dedicated to receiving messages, and input() only wakes that thread up, so that sendmsg() returns quickly.

The call skb = skb_dequeue(&sk->sk_receive_queue) takes a message off the receive queue of the socket sk. It returns a struct sk_buff, and skb->data points to the actual Netlink message.

The function skb_recv_datagram(nl_sk) also receives messages on the Netlink socket nl_sk. Unlike skb_dequeue(), it puts the calling process to sleep on the wait queue nl_sk->sk_sleep when the socket's receive queue is empty, so it must be used in process context. The kernel thread mentioned above can receive messages this way.

In that case the function input() can be as simple as the following:

void input(struct sock *sk, int len)
{
    wake_up_interruptible(sk->sk_sleep);
}

When sending a Netlink message from the kernel, the source and destination addresses must also be set. Messages in the kernel are managed by struct sk_buff, and linux/netlink.h defines a macro:

#define NETLINK_CB(skb) (*(struct netlink_skb_parms *)&((skb)->cb))

This macro makes it convenient to set the message addresses. For example:

NETLINK_CB(skb).pid = 0;
NETLINK_CB(skb).dst_pid = 0;
NETLINK_CB(skb).dst_group = 1;

The pid field is the ID of the sending process, that is, the source address; for the kernel it is 0. The dst_pid field is the ID of the receiving process, that is, the destination address; if the destination is a group or the kernel, it is set to 0. The dst_group field is the destination group address; if the destination is a process or the kernel, dst_group should be set to 0.

In the kernel, a module calls netlink_unicast() to send a unicast message:

int netlink_unicast(struct sock *sk, struct sk_buff *skb, u32 pid, int nonblock);

The sk parameter is the socket returned by netlink_kernel_create(). The skb parameter holds the message: its data field points to the Netlink message to be sent, and its control block holds the message's address information, which the macro NETLINK_CB(skb) above makes easy to set. The pid parameter is the ID of the receiving process. The nonblock parameter selects non-blocking behavior: if it is 1, the function returns immediately when no receive buffer is available; if it is 0, the function sleeps until one becomes available.

A kernel module or subsystem can also use netlink_broadcast() to send a broadcast message:

void netlink_broadcast(struct sock *sk, struct sk_buff *skb, u32 pid, u32 group, int allocation);

The first three parameters are the same as for netlink_unicast(). The group parameter is the set of multicast groups that receive the message; each bit of it represents one multicast group, so to send to several groups, set it to the bitwise OR of their group IDs. The allocation parameter is the kernel memory allocation type, normally GFP_ATOMIC or GFP_KERNEL. GFP_ATOMIC is used in atomic context (where sleeping is not allowed), and GFP_KERNEL in non-atomic context.

The Netlink socket created by netlink_kernel_create() is released in the kernel with the function sock_release():

void sock_release(struct socket *sock);

Note that netlink_kernel_create() returns a struct sock, so sock_release() should be called like this:

sock_release(sk->sk_socket);

Here sk is the return value of netlink_kernel_create().

The source code package contains a Netlink example that consists of a kernel module, netlink-exam-kern.c, and two applications, netlink-exam-user-recv.c and netlink-exam-user-send.c. The kernel module must be inserted into the kernel first. Then run the user-mode receiver in one terminal and the user-mode sender in another. The sender reads the text file named by its argument and sends it to the kernel module as the content of Netlink messages. The kernel module receives each message and saves it in a kernel buffer, which it also exports through procfs, so all of the content can be viewed in /proc/netlink_exam_buffer. The kernel module also sends the message on to the user-mode receiver, which prints the received content to the screen.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
