Linux Netlink Mechanism

Last Update: 2015-08-31   Source: Internet

Author: User

Netlink is a special socket type unique to Linux. It is similar to BSD's AF_ROUTE but far more powerful, and in the latest Linux kernel at the time of writing (2.6.14) it is used by many applications to communicate with the kernel, including: the routing daemon (NETLINK_ROUTE), the 1-wire subsystem (NETLINK_W1), user-space socket protocols (NETLINK_USERSOCK), the firewall (NETLINK_FIREWALL), socket monitoring (NETLINK_INET_DIAG), netfilter logging (NETLINK_NFLOG), IPsec security policies (NETLINK_XFRM), SELinux event notification (NETLINK_SELINUX), the iSCSI subsystem (NETLINK_ISCSI), process auditing (NETLINK_AUDIT), forwarding information base lookups (NETLINK_FIB_LOOKUP), the Netlink connector (NETLINK_CONNECTOR), the netfilter subsystem (NETLINK_NETFILTER), the IPv6 firewall (NETLINK_IP6_FW), DECnet routing messages (NETLINK_DNRTMSG), kernel event notification to user space (NETLINK_KOBJECT_UEVENT), and generic Netlink (NETLINK_GENERIC).

Netlink is a very good way to transfer data in both directions between the kernel and user applications. User-space applications can use its full power through the standard socket API, while kernel code must use a dedicated kernel API.

Compared with system calls, ioctl, and the /proc file system, Netlink has the following advantages:

1. To use Netlink, the user only needs to add a new Netlink protocol type definition to include/linux/netlink.h, such as #define NETLINK_MYTEST 17. The kernel and user-space applications can then immediately exchange data over this protocol type through the socket API. By contrast, a system call requires adding a new system call, ioctl requires adding a device or file, both of which take a lot of code, and the proc file system requires adding new files or directories under /proc, which makes the already cluttered /proc even more cluttered.

2. Netlink is an asynchronous communication mechanism: messages passed between the kernel and user-space applications are queued in socket buffers, and sending a message only places it on the receive queue of the receiver's socket, without waiting for the receiver to process it. System calls and ioctl, by contrast, are synchronous, and if the data being passed is large, they can hurt scheduling granularity.

3. The kernel part of Netlink can be implemented as a module, and there is no compile-time dependency between the application part and the kernel part. System calls do have such dependencies: a new system call must be statically linked into the kernel and cannot be implemented in a module, and applications that use the new system call depend on the kernel at compile time.

4. Netlink supports multicast: a kernel module or application can multicast a message to a Netlink group, and every kernel module or application in that group receives it. The kernel-to-user-space event notification mechanism uses this feature, so any application interested in kernel events can receive the events that subsystem sends. The use of this mechanism will be described in a later article.

5. The kernel can use Netlink to initiate a session, whereas system calls and ioctl can only be initiated by the user application.

6. Netlink uses the standard socket API, so it is easy to use, whereas system calls and ioctl require specialized knowledge.

Using Netlink from User Space

User-space applications can easily use Netlink sockets through the standard socket APIs socket(), bind(), sendmsg(), recvmsg(), and close(). See the man pages for the details of these functions; this article only explains how they should be used with Netlink. Note that an application using Netlink must include the header file linux/netlink.h, and of course the socket header sys/socket.h is also required.

To create a Netlink socket, call socket() with the following parameters:

socket(AF_NETLINK, SOCK_RAW, netlink_type)

The first parameter must be AF_NETLINK or PF_NETLINK; on Linux they are the same value and both select Netlink. The second parameter must be SOCK_RAW or SOCK_DGRAM. The third parameter specifies the Netlink protocol type, such as the user-defined type NETLINK_MYTEST mentioned earlier. NETLINK_GENERIC is a generic protocol type intended for users, so they can use it directly without adding a new protocol type. The protocol types predefined by the kernel are:

#define NETLINK_ROUTE           0   /* Routing/device hook */
#define NETLINK_W1              1   /* 1-wire subsystem */
#define NETLINK_USERSOCK        2   /* Reserved for user mode socket protocols */
#define NETLINK_FIREWALL        3   /* Firewalling hook */
#define NETLINK_INET_DIAG       4   /* INET socket monitoring */
#define NETLINK_NFLOG           5   /* netfilter/iptables ULOG */
#define NETLINK_XFRM            6   /* ipsec */
#define NETLINK_SELINUX         7   /* SELinux event notifications */
#define NETLINK_ISCSI           8   /* Open-iSCSI */
#define NETLINK_AUDIT           9   /* auditing */
#define NETLINK_FIB_LOOKUP      10
#define NETLINK_CONNECTOR       11
#define NETLINK_NETFILTER       12  /* netfilter subsystem */
#define NETLINK_IP6_FW          13
#define NETLINK_DNRTMSG         14  /* DECnet routing messages */
#define NETLINK_KOBJECT_UEVENT  15  /* Kernel messages to userspace */
#define NETLINK_GENERIC         16
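As a minimal sketch, assuming a Linux system with the standard headers, the socket() call above can be wrapped with basic error handling. The helper name nl_open() is hypothetical; NETLINK_ROUTE is a convenient protocol to try because the kernel always provides it, but any type from the list above can be passed.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Hypothetical helper: open a raw Netlink socket for one protocol type.
 * Returns the new file descriptor, or -1 on failure. */
static int nl_open(int protocol)
{
    /* AF_NETLINK and PF_NETLINK have the same value on Linux. */
    int fd = socket(AF_NETLINK, SOCK_RAW, protocol);
    if (fd < 0)
        perror("socket(AF_NETLINK)");
    return fd;
}
```

For example, nl_open(NETLINK_ROUTE) opens a routing socket, while a negative protocol number is rejected by socket() and the helper returns -1.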

Each Netlink protocol type can have up to 32 multicast groups, each represented by one bit. Thanks to multicast, sending a message to every member of a group takes only one system call, which greatly reduces the number of system calls for applications that need to deliver the same message to many recipients.
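Because each group is a single bit, a group number maps to one mask bit, and joining several groups means OR-ing their bits together. The sketch below assumes the usual convention that group n (1 to 32) corresponds to bit n - 1 of a mask such as nl_groups; the helper names are hypothetical.

```c
#include <linux/netlink.h>

/* Hypothetical helper: map a group number (1..32) to its mask bit.
 * Group 1 is bit 0, group 32 is bit 31; invalid groups give 0. */
static __u32 nl_group_bit(unsigned int group)
{
    if (group < 1 || group > 32)
        return 0;
    return 1u << (group - 1);
}

/* Combine several group numbers into one mask with bitwise OR. */
static __u32 nl_group_mask(const unsigned int *groups, int count)
{
    __u32 mask = 0;
    for (int i = 0; i < count; i++)
        mask |= nl_group_bit(groups[i]);
    return mask;
}
```

For instance, groups 1, 2, and 5 give the mask 0x13.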

The bind() function binds an open Netlink socket to a Netlink source socket address. The Netlink socket address structure is:

struct sockaddr_nl
{
  sa_family_t    nl_family;
  unsigned short nl_pad;
  __u32          nl_pid;
  __u32          nl_groups;
};

The nl_family field must be set to AF_NETLINK or PF_NETLINK. The nl_pad field is currently unused and should always be 0. The nl_pid field is the ID of the process that receives or sends the message: if the message is for the kernel or is a multicast message, set it to 0; otherwise set it to the ID of the process that should handle the message. The nl_groups field specifies multicast groups: bind() joins the calling process to the groups set in this field, and a value of 0 means the caller joins no multicast group.

The nl_pid field of the address passed to bind() should be set to the calling process's ID, which acts as the local address of the Netlink socket. However, when several threads of one process each use a Netlink socket, nl_pid can be set to a different value for each, for example:

pthread_self() << 16 | getpid();

So nl_pid is not necessarily a process ID; it is simply an identifier that distinguishes different receivers or senders, and users can set it to suit their needs. bind() is called as follows:

bind(fd, (struct sockaddr *) &nladdr, sizeof(struct sockaddr_nl));
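A minimal sketch of filling in struct sockaddr_nl and calling bind(), assuming only the standard Linux headers. The helper names nl_make_addr() and nl_bind() are hypothetical; zeroing the structure first also takes care of nl_pad.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Hypothetical helper: build a Netlink address. pid is the port ID
 * (often the process ID); groups is a multicast group mask. */
static struct sockaddr_nl nl_make_addr(__u32 pid, __u32 groups)
{
    struct sockaddr_nl addr;
    memset(&addr, 0, sizeof(addr));   /* also clears nl_pad */
    addr.nl_family = AF_NETLINK;
    addr.nl_pid = pid;
    addr.nl_groups = groups;
    return addr;
}

/* Hypothetical helper: bind fd to the local address (pid, groups).
 * Returns 0 on success, -1 on error (errno is set by bind()). */
static int nl_bind(int fd, __u32 pid, __u32 groups)
{
    struct sockaddr_nl local = nl_make_addr(pid, groups);
    return bind(fd, (struct sockaddr *) &local, sizeof(local));
}
```

A call such as nl_bind(fd, getpid(), 0) uses the process ID as the local address and joins no multicast group.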

Here fd is the file descriptor returned by the earlier socket() call, and nladdr is a struct sockaddr_nl. To send a Netlink message to the kernel or to another user-space application, the destination Netlink socket address must be filled in; in that address, nl_pid and nl_groups are the receiving process's ID and multicast groups respectively. If nl_pid is 0, the receiver is the kernel or a multicast group; if nl_groups is 0, the message is unicast, otherwise it is multicast. Sending a Netlink message with sendmsg() also involves struct msghdr, struct nlmsghdr, and struct iovec. struct msghdr should be set up as follows:

struct msghdr msg;
memset(&msg, 0, sizeof(msg));
msg.msg_name = (void *) &(nladdr);
msg.msg_namelen = sizeof(nladdr);

Here nladdr is the Netlink address of the message receiver.

struct nlmsghdr is the Netlink socket's own message header. The Netlink kernel implementation uses it to multiplex and demultiplex all Netlink protocol types and for some other control purposes, which is why it is also called the Netlink control block. An application must therefore supply this header whenever it sends a Netlink message.

struct nlmsghdr
{
  __u32 nlmsg_len;   /* Length of message */
  __u16 nlmsg_type;  /* Message type */
  __u16 nlmsg_flags; /* Additional flags */
  __u32 nlmsg_seq;   /* Sequence number */
  __u32 nlmsg_pid;   /* Sending process PID */
};

The nlmsg_len field specifies the total length of the message, that is, the size of this header plus the data part that immediately follows it. The nlmsg_type field carries an application-defined message type, which is opaque to the Netlink kernel implementation, so it is usually set to 0. The nlmsg_flags field holds message flags; the available flags are:

/* Flags values */
#define NLM_F_REQUEST  1      /* It is request message. */
#define NLM_F_MULTI    2      /* Multipart message, terminated by NLMSG_DONE */
#define NLM_F_ACK      4      /* Reply with ack, with zero or error code */
#define NLM_F_ECHO     8      /* Echo this request */
/* Modifiers to GET request */
#define NLM_F_ROOT     0x100  /* specify tree root */
#define NLM_F_MATCH    0x200  /* return all matching */
#define NLM_F_ATOMIC   0x400  /* atomic GET */
#define NLM_F_DUMP     (NLM_F_ROOT|NLM_F_MATCH)
/* Modifiers to NEW request */
#define NLM_F_REPLACE  0x100  /* Override existing */
#define NLM_F_EXCL     0x200  /* Do not touch, if it exists */
#define NLM_F_CREATE   0x400  /* Create, if it does not exist */
#define NLM_F_APPEND   0x800  /* Add to end of list */

NLM_F_REQUEST marks the message as a request; it should be set on every message that the application initiates.

NLM_F_MULTI indicates that the message is one part of a multipart message; the following parts can be reached with the macro NLMSG_NEXT.

NLM_F_ACK asks the receiver to reply with an acknowledgment carrying zero on success or an error code; the sequence number and process ID let the sender match the acknowledgment to its request.

NLM_F_ECHO asks for this request to be echoed back to the sender.

NLM_F_ROOT is used by the data-retrieval operations of many Netlink protocols to request that the whole data table be returned to the user application, rather than one entry at a time. A request with this flag usually produces response messages with NLM_F_MULTI set. Note that when this flag is set the request is protocol-specific, so the protocol-specific request type must be given in nlmsg_type.

NLM_F_MATCH indicates that the protocol-specific request needs only a subset of the data, selected by a protocol-specific filter.

NLM_F_ATOMIC asks for the returned data to be collected atomically, so that it cannot be modified while it is being retrieved.

NLM_F_DUMP is not a separate flag bit; as its definition above shows, it is simply the combination NLM_F_ROOT | NLM_F_MATCH.

NLM_F_REPLACE replaces an existing entry in the data table.

NLM_F_EXCL is used together with NLM_F_CREATE and NLM_F_APPEND: the request fails if the entry already exists.

NLM_F_CREATE indicates that the entry should be created in the specified table if it does not already exist.

NLM_F_APPEND adds the new entry at the end of the table.
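To make the flag descriptions concrete, here is a sketch, assuming the definitions from linux/netlink.h, that composes the flags for a request creating a table entry with an acknowledgment, and then interprets the reply. An acknowledgment arrives as an NLMSG_ERROR message whose data part is a struct nlmsgerr, and an error value of 0 means success. The helper names are hypothetical, and which flags a given request type honors depends on the protocol.

```c
#include <errno.h>
#include <linux/netlink.h>

/* Hypothetical helper: flags for a "create entry" request that asks
 * for an acknowledgment. */
static __u16 nl_new_entry_flags(int replace_existing)
{
    __u16 flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_CREATE;
    /* NLM_F_EXCL: fail if the entry exists; NLM_F_REPLACE: overwrite it. */
    flags |= replace_existing ? NLM_F_REPLACE : NLM_F_EXCL;
    return flags;
}

/* Hypothetical helper: interpret a reply to an NLM_F_ACK request.
 * Returns 0 for a positive ack, a negative errno for an error, and 1
 * if the message is not an NLMSG_ERROR message at all. */
static int nl_ack_status(const struct nlmsghdr *h)
{
    if (h->nlmsg_type != NLMSG_ERROR)
        return 1;
    const struct nlmsgerr *err = (const struct nlmsgerr *) NLMSG_DATA(h);
    return err->error;   /* 0 on success, -errno on failure */
}
```

With the values defined above, nl_new_entry_flags(0) is 0x605 (REQUEST | ACK | CREATE | EXCL) and nl_new_entry_flags(1) is 0x505.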

The kernel reads and modifies these flags. For ordinary use the user sets nlmsg_flags to 0; only some advanced applications (such as netfilter and the routing daemon) need them for complex operations. The nlmsg_seq and nlmsg_pid fields are used by applications to track messages: the former is the sequence number and the latter is the ID of the process that sent the message. Here is an example:

#define MAX_MSGSIZE 1024
char buffer[] = "An example message";
struct nlmsghdr *nlhdr;
nlhdr = (struct nlmsghdr *) malloc(NLMSG_SPACE(MAX_MSGSIZE));
memset(nlhdr, 0, NLMSG_SPACE(MAX_MSGSIZE));          /* clear type, seq, etc. */
strcpy(NLMSG_DATA(nlhdr), buffer);
nlhdr->nlmsg_len = NLMSG_LENGTH(strlen(buffer) + 1); /* include the NUL */
nlhdr->nlmsg_pid = getpid();  /* self pid */
nlhdr->nlmsg_flags = 0;

Note: nlhdr must be declared as a pointer, struct nlmsghdr *nlhdr, because it points to the buffer returned by malloc(). MAX_MSGSIZE (1024) is the maximum size of the data part; the buffer holds struct nlmsghdr followed by the data, and both are padded to 4-byte boundaries by the following macros:

#define NLMSG_ALIGNTO     4
#define NLMSG_ALIGN(len)  (((len) + NLMSG_ALIGNTO - 1) & ~(NLMSG_ALIGNTO - 1))
#define NLMSG_HDRLEN      ((int) NLMSG_ALIGN(sizeof(struct nlmsghdr)))
#define NLMSG_LENGTH(len) ((len) + NLMSG_ALIGN(NLMSG_HDRLEN))
#define NLMSG_SPACE(len)  NLMSG_ALIGN(NLMSG_LENGTH(len))

struct iovec lets a single system call transfer data from several buffers, so multiple messages can be sent at once. Here is an example of its use:

struct iovec iov;
iov.iov_base = (void *) nlhdr;
iov.iov_len = nlhdr->nlmsg_len;
msg.msg_iov = &iov;
msg.msg_iovlen = 1;

After these steps, the message can be sent directly with:

sendmsg(fd, &msg, 0);
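Putting the previous steps together, a self-contained sketch might look like this. The helper names nl_build() and nl_send() are hypothetical; nl_build() copies the terminating NUL so that the receiver can treat the data as a C string, and it fills every header field explicitly.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <linux/netlink.h>

/* Hypothetical helper: lay out a Netlink message carrying text in buf.
 * Returns the header, or NULL if buf is too small. */
static struct nlmsghdr *nl_build(void *buf, size_t bufsize,
                                 const char *text, __u32 pid, __u32 seq)
{
    size_t payload = strlen(text) + 1;          /* include the NUL */
    if (NLMSG_SPACE(payload) > bufsize)
        return NULL;
    memset(buf, 0, NLMSG_SPACE(payload));       /* zero header and padding */
    struct nlmsghdr *nlh = (struct nlmsghdr *) buf;
    nlh->nlmsg_len = NLMSG_LENGTH(payload);     /* header + data */
    nlh->nlmsg_type = 0;                        /* application-defined */
    nlh->nlmsg_flags = 0;
    nlh->nlmsg_seq = seq;
    nlh->nlmsg_pid = pid;                       /* sender's ID */
    memcpy(NLMSG_DATA(nlh), text, payload);
    return nlh;
}

/* Hypothetical helper: send one message to (dst_pid, dst_groups).
 * dst_pid = 0 and dst_groups = 0 address the kernel. */
static ssize_t nl_send(int fd, struct nlmsghdr *nlh,
                       __u32 dst_pid, __u32 dst_groups)
{
    struct sockaddr_nl dst;
    struct iovec iov;
    struct msghdr msg;

    memset(&dst, 0, sizeof(dst));
    dst.nl_family = AF_NETLINK;
    dst.nl_pid = dst_pid;
    dst.nl_groups = dst_groups;

    iov.iov_base = nlh;
    iov.iov_len = nlh->nlmsg_len;

    memset(&msg, 0, sizeof(msg));
    msg.msg_name = &dst;
    msg.msg_namelen = sizeof(dst);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    return sendmsg(fd, &msg, 0);                /* -1 on error */
}
```

For example, after nlh = nl_build(buf, sizeof(buf), "hello", getpid(), 1), the call nl_send(fd, nlh, 0, 0) addresses the message to the kernel.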

To receive a message, an application allocates a buffer large enough to hold the message header and the data part, fills in struct msghdr, and then calls recvmsg() directly:

#define MAX_NL_MSG_LEN 1024
struct sockaddr_nl nladdr;
struct msghdr msg;
struct iovec iov;
struct nlmsghdr *nlhdr;
nlhdr = (struct nlmsghdr *) malloc(MAX_NL_MSG_LEN);
iov.iov_base = (void *) nlhdr;
iov.iov_len = MAX_NL_MSG_LEN;
memset(&msg, 0, sizeof(msg));   /* no control data, no flags */
msg.msg_name = (void *) &(nladdr);
msg.msg_namelen = sizeof(nladdr);
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
recvmsg(fd, &msg, 0);

Note: fd is the Netlink socket descriptor returned by socket().

After the message is received, nlhdr points to the header of the received message, nladdr holds the address of the message's sender, and the macro NLMSG_DATA(nlhdr) returns a pointer to the message's data part.
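Since recvmsg() fills nladdr with the sender's address, a receiver can use it to filter messages; for example, a receiver that only trusts the kernel can reject messages whose nl_pid is not 0. The sketch below wraps recvmsg() this way; the helper nl_recv() and its kernel_only parameter are hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <linux/netlink.h>

/* Hypothetical helper: receive one datagram into buf and report the
 * sender's address in *from. If kernel_only is non-zero, datagrams
 * whose sender nl_pid is not 0 (not the kernel) are rejected.
 * Returns the number of bytes received, or -1 on error or rejection. */
static ssize_t nl_recv(int fd, void *buf, size_t len,
                       struct sockaddr_nl *from, int kernel_only)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg;
    ssize_t n;

    memset(&msg, 0, sizeof(msg));
    memset(from, 0, sizeof(*from));
    msg.msg_name = from;
    msg.msg_namelen = sizeof(*from);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    n = recvmsg(fd, &msg, 0);
    if (n < 0)
        return -1;
    if (kernel_only && from->nl_pid != 0)
        return -1;                    /* sent by another process */
    return n;
}
```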

linux/netlink.h defines several macros that make message processing easier, including:

#define NLMSG_ALIGNTO     4
#define NLMSG_ALIGN(len)  (((len) + NLMSG_ALIGNTO - 1) & ~(NLMSG_ALIGNTO - 1))

The macro NLMSG_ALIGN(len) returns the smallest multiple of NLMSG_ALIGNTO (4 bytes) that is not less than len.

#define NLMSG_LENGTH(len) ((len) + NLMSG_ALIGN(sizeof(struct nlmsghdr)))

The macro NLMSG_LENGTH(len) computes the actual message length, header included, when the data part is len bytes long. It is the value stored in nlmsg_len, and it is also used when sizing message buffers.

#define NLMSG_SPACE(len)  NLMSG_ALIGN(NLMSG_LENGTH(len))

The macro NLMSG_SPACE(len) returns the smallest aligned value that is not less than NLMSG_LENGTH(len); it is also used to allocate message buffers.
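A worked example of these three macros: struct nlmsghdr is 16 bytes (4 + 2 + 2 + 4 + 4) and already 4-byte aligned, so for a 5-byte payload NLMSG_LENGTH(5) is 16 + 5 = 21 and NLMSG_SPACE(5) rounds that up to 24, leaving 3 bytes of padding. The hypothetical helper below computes that padding.

```c
#include <stddef.h>
#include <linux/netlink.h>

/* Hypothetical helper: number of padding bytes NLMSG_SPACE adds after
 * a len-byte payload so the next message starts 4-byte aligned. */
static size_t nl_padding(size_t len)
{
    return NLMSG_SPACE(len) - NLMSG_LENGTH(len);
}
```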

#define NLMSG_DATA(nlh)  ((void *) (((char *) nlh) + NLMSG_LENGTH(0)))

The macro NLMSG_DATA(nlh) returns the start address of the message's data part; it is needed both to fill in and to read the message data.

#define NLMSG_NEXT(nlh, len)  ((len) -= NLMSG_ALIGN((nlh)->nlmsg_len), \
                               (struct nlmsghdr *) (((char *) (nlh)) + NLMSG_ALIGN((nlh)->nlmsg_len)))

The macro NLMSG_NEXT(nlh, len) returns the address of the next message and decreases len to the total length of the remaining messages. It is typically used when several messages are sent or received in one buffer, for example the parts of a multipart message.

#define NLMSG_OK(nlh, len)  ((len) >= (int) sizeof(struct nlmsghdr) && \
                             (nlh)->nlmsg_len >= sizeof(struct nlmsghdr) && \
                             (nlh)->nlmsg_len <= (len))

The macro NLMSG_OK(nlh, len) checks that the message at nlh is a complete, well-formed message that fits within the remaining len bytes.

#define NLMSG_PAYLOAD(nlh, len)  ((nlh)->nlmsg_len - NLMSG_SPACE((len)))

The macro NLMSG_PAYLOAD(nlh, len) returns the length of the payload that follows the header and the first len bytes of data (with len rounded up to the 4-byte alignment); NLMSG_PAYLOAD(nlh, 0) is the length of the whole data part.
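The loop below is a sketch of how these macros usually combine when one recvmsg() call returns several messages, for example the parts of an NLM_F_MULTI reply: NLMSG_OK guards each step, NLMSG_NEXT advances, and an NLMSG_DONE message ends the sequence. The function name nl_walk() and its callback are hypothetical.

```c
#include <stddef.h>
#include <linux/netlink.h>

/* Hypothetical helper: visit every Netlink message packed in buf
 * (len bytes, e.g. the return value of recvmsg()). Stops at NLMSG_DONE
 * or at a truncated/malformed header. Returns the number of messages
 * passed to handle (which may be NULL). */
static int nl_walk(void *buf, int len,
                   void (*handle)(const struct nlmsghdr *nlh, void *ctx),
                   void *ctx)
{
    int count = 0;
    struct nlmsghdr *nlh;

    for (nlh = (struct nlmsghdr *) buf; NLMSG_OK(nlh, len);
         nlh = NLMSG_NEXT(nlh, len)) {
        if (nlh->nlmsg_type == NLMSG_DONE)
            break;                    /* end of a multipart reply */
        if (handle)
            handle(nlh, ctx);
        count++;
    }
    return count;
}
```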

The close() function closes an open Netlink socket.

Netlink Kernel API

The kernel implementation of Netlink is in the file net/netlink/af_netlink.c, and a kernel module that wants to use Netlink must also include the header file linux/netlink.h. The kernel uses Netlink through a dedicated API that is completely different from the one used by user-space applications. A new Netlink protocol type can only be added by modifying linux/netlink.h; however, the current Netlink implementation already includes the generic protocol type NETLINK_GENERIC, which users can use directly without adding a new protocol type. As mentioned earlier, to add a new Netlink protocol type, the user simply adds a definition such as the following to linux/netlink.h:

#define NETLINK_MYTEST  17

Once this definition is added, the protocol can be referenced anywhere in the kernel.

In the kernel, a Netlink socket is created by calling the following function:

struct sock *
netlink_kernel_create (int unit, void (*input) (struct sock *sk, int len));

The parameter unit is the Netlink protocol type, such as NETLINK_MYTEST, and input is the Netlink message handler defined by the kernel module; it is called when a message arrives at the Netlink socket. The sk argument passed to input is the struct sock pointer returned by netlink_kernel_create(). struct sock is the kernel's core data structure for representing a socket, and sockets created by user-space applications are also represented in the kernel by a struct sock. Here is an example input function:

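void input(struct sock *sk, int len)
{
  struct sk_buff *skb;
  struct nlmsghdr *nlh = NULL;
  u8 *data = NULL;

  while ((skb = skb_dequeue(&sk->sk_receive_queue)) != NULL) {
    /* Process the Netlink message pointed to by skb->data */
    nlh = (struct nlmsghdr *) skb->data;
    data = NLMSG_DATA(nlh);
    /* Process the Netlink message with the header pointed to by
     * nlh and the data pointed to by data
     */
    kfree_skb(skb);   /* release the buffer once it is processed */
  }
}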

input() is called from the sender's sendmsg() so that the message is handled promptly, but if the message is particularly long, this processing increases the execution time of the sendmsg() system call. In that case a dedicated kernel thread can take over message reception: input() only wakes that thread up, so sendmsg() returns quickly.

The call skb = skb_dequeue(&sk->sk_receive_queue) takes a message off the receive queue of socket sk and returns a struct sk_buff; skb->data points to the actual Netlink message.

skb_recv_datagram() can also receive messages on a Netlink socket nl_sk. Unlike skb_dequeue(), if the socket's receive queue is empty, it puts the calling process to sleep on the wait queue nl_sk->sk_sleep, so it must be used in process context; the kernel thread mentioned above can receive messages this way.

The following input function is an example of this usage:

void input(struct sock *sk, int len)
{
  wake_up_interruptible(sk->sk_sleep);
}

When sending a Netlink message from the kernel, the destination and source addresses must also be set. In the kernel, messages are managed through struct sk_buff, and linux/netlink.h defines the macro

#define NETLINK_CB(skb)  (*(struct netlink_skb_parms *) &((skb)->cb))

to make setting a message's addresses convenient. Here is an example of setting a message's addresses:

NETLINK_CB(skb).pid = 0;
NETLINK_CB(skb).dst_pid = 0;
NETLINK_CB(skb).dst_group = 1;

The pid field is the ID of the sending process, that is, the source address; for the kernel it is 0. dst_pid is the ID of the receiving process, that is, the destination address; if the destination is a group or the kernel, it is set to 0. dst_group is the destination group address; if the destination is a process or the kernel, dst_group should be set to 0.

In the kernel, a module sends a unicast message by calling netlink_unicast():

int netlink_unicast(struct sock *sk, struct sk_buff *skb, u32 pid, int nonblock);

The parameter sk is the socket returned by netlink_kernel_create(). skb holds the message: its data field points to the Netlink message to be sent, and its control block holds the message's address information, which the NETLINK_CB(skb) macro above makes easy to set. pid is the ID of the receiving process, and nonblock selects non-blocking behavior: if it is 1, the function returns immediately when no receive buffer space is available; if it is 0, the function may sleep until space becomes available.

A kernel module or subsystem can also send a broadcast message with netlink_broadcast():

int netlink_broadcast(struct sock *sk, struct sk_buff *skb, u32 pid, u32 group, int allocation);

The first three parameters are the same as for netlink_unicast(), except that pid identifies the sender, and a socket bound with that ID does not receive its own broadcast. The parameter group specifies the multicast groups that receive the message, one bit per group, so to send to several groups, set it to the bitwise OR of their bits. allocation is the kernel memory allocation type, usually GFP_ATOMIC or GFP_KERNEL: GFP_ATOMIC is used in atomic context (where sleeping is not allowed) and GFP_KERNEL in non-atomic context.
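On the user-space side, a process receives messages that a kernel module multicasts (for example with NETLINK_CB(skb).dst_group = 1 as above) by binding with the matching bit set in nl_groups. A minimal sketch, assuming the hypothetical protocol type NETLINK_MYTEST (17) from this article is served by a loaded kernel module; the helper nl_subscribe() is also hypothetical, and joining a multicast group may require root privileges depending on the protocol.

```c
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

#ifndef NETLINK_MYTEST
#define NETLINK_MYTEST 17   /* hypothetical protocol type from this article */
#endif

/* Hypothetical helper: open a Netlink socket for protocol and join
 * multicast group (1..32) by setting bit group-1 in nl_groups.
 * Returns the socket descriptor, or -1 on any error. */
static int nl_subscribe(int protocol, unsigned int group)
{
    struct sockaddr_nl local = {0};
    int fd;

    if (group < 1 || group > 32)
        return -1;
    fd = socket(AF_NETLINK, SOCK_RAW, protocol);
    if (fd < 0)
        return -1;
    local.nl_family = AF_NETLINK;
    local.nl_pid = getpid();             /* local address */
    local.nl_groups = 1u << (group - 1); /* join this group */
    if (bind(fd, (struct sockaddr *) &local, sizeof(local)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

After int fd = nl_subscribe(NETLINK_MYTEST, 1) succeeds, messages broadcast to group 1 can be read with recvmsg() as described earlier.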

In the kernel, sock_release() releases a Netlink socket created by netlink_kernel_create():

void sock_release(struct socket *sock);

Note that netlink_kernel_create() returns a struct sock, so sock_release() should be called like this:

sock_release(sk->sk_socket);

where sk is the return value of netlink_kernel_create().

The source code package contains an example of using Netlink: a kernel module, netlink-exam-kern.c, and two applications, netlink-exam-user-recv.c and netlink-exam-user-send.c. First insert the kernel module into the kernel, then run the user-space receiver in one terminal and the user-space sender in another. The sender reads the text file given as its argument and sends its contents as a Netlink message to the kernel module. The kernel module saves the message in a kernel buffer and also exports it through the proc interface, so the user can view all the content in /proc/netlink_exam_buffer. The kernel also forwards the message to the user-space receiver, which prints the received content to the screen.

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.