Interaction between Linux user space and kernel space: Netlink

Author: Kendo
2006-9-3

This is a study note that walks through the implementation of imp2, the sample source code for communication between Linux kernel space and user space. The source code can be downloaded from:
http://www-128.ibm.com/developerworks/cn/linux/l-netlink/imp2.tar.gz

References
Implementation and analysis of communication between Linux kernel space and user space
http://www-128.ibm.com/developerworks/cn/linux/l-netlink?ca=dwcn-newsletter-linux
Data exchange between user space and kernel space in Linux
http://www-128.ibm.com/developerworks/cn/linux/l-kerns-usrs/

Theory

In Linux kernels later than 2.4, almost all communication between interrupt context and user-space processes is implemented with netlink sockets. The iproute2 network management tools, for example, interact with the kernel entirely through netlink, and netfilter, the well-known kernel packet filtering framework, has also switched to netlink in its latest version. Netlink will be one of the main methods of communication between Linux user space and the kernel.

Netlink communication is addressed by an identifier that corresponds to a process, generally the process ID. When one end of the communication is in interrupt context (the kernel), this identifier is 0. When both ends are user-space processes, netlink sockets are used much like a message queue; when one end is in interrupt context, the usage differs. The most notable feature of netlink sockets is this support for interrupt context: to receive user-space data in kernel space, you no longer need to start a kernel thread yourself. Instead, a receive function that you specify in advance is called through a soft interrupt.

Because a soft interrupt rather than a kernel thread receives the data, the data is received promptly.

When a netlink socket is used for communication between kernel space and user space, the socket is created in user space in much the same way as an ordinary socket, but it is created differently in kernel space. The two sides are described in turn below.

User space

User-space applications communicate with the kernel through the standard socket API: the functions socket(), bind(), sendmsg(), recvmsg(), and close() all work with netlink sockets.
To create a netlink socket, call socket() with the following parameters:

socket(AF_NETLINK, SOCK_RAW, netlink_type)

The protocol family for netlink is AF_NETLINK. The second parameter must be SOCK_RAW or SOCK_DGRAM. The third parameter specifies the netlink protocol type, which can be a custom type or one of the types predefined by the kernel:


#define NETLINK_ROUTE          0       /* Routing/device hook                          */
#define NETLINK_W1             1       /* 1-wire subsystem                             */
#define NETLINK_USERSOCK       2       /* Reserved for user mode socket protocols      */
#define NETLINK_FIREWALL       3       /* Firewalling hook                             */
#define NETLINK_INET_DIAG      4       /* INET socket monitoring                       */
#define NETLINK_NFLOG          5       /* netfilter/iptables ULOG */
#define NETLINK_XFRM           6       /* ipsec */
#define NETLINK_SELINUX        7       /* SELinux event notifications */
#define NETLINK_ISCSI          8       /* Open-iSCSI */
#define NETLINK_AUDIT          9       /* auditing */
#define NETLINK_FIB_LOOKUP     10
#define NETLINK_CONNECTOR      11
#define NETLINK_NETFILTER      12      /* netfilter subsystem */
#define NETLINK_IP6_FW         13
#define NETLINK_DNRTMSG        14      /* DECnet routing messages */
#define NETLINK_KOBJECT_UEVENT 15      /* Kernel messages to userspace */

#define NETLINK_GENERIC        16

The descriptor returned by socket() can then be passed to other calls such as bind():

static int skfd;
skfd = socket(PF_NETLINK, SOCK_RAW, NL_IMP2);
if (skfd < 0)
{
      printf("can not create a netlink socket\n");
      exit(1);
}

bind() needs a protocol address to bind to. A netlink socket address is described by struct sockaddr_nl:

struct sockaddr_nl
{
  sa_family_t    nl_family;
  unsigned short nl_pad;
  __u32          nl_pid;
  __u32          nl_groups;
};

The member nl_family is the protocol family AF_NETLINK. The member nl_pad is currently unused and should always be set to 0. The member nl_pid is the ID of the process that receives or sends the message: set it to 0 if the message is to be handled by the kernel or sent to a multicast group; otherwise set it to the ID of the process that handles the message. The member nl_groups specifies multicast groups; bind() adds the calling process to the groups given in this field. If it is 0, the caller joins no multicast group:

struct sockaddr_nl local;

memset(&local, 0, sizeof(local));
local.nl_family = AF_NETLINK;
local.nl_pid = getpid();    /* set the PID to our own PID */
local.nl_groups = 0;
/* bind the socket */
if (bind(skfd, (struct sockaddr *)&local, sizeof(local)) != 0)
{
      printf("bind() error\n");
      return -1;
}

User space can send messages to the kernel with the send family of functions, such as sendto() and sendmsg(). The peer address passed to the send function is also described by struct sockaddr_nl. It differs slightly from the local address: because the peer is the kernel, nl_pid must be set to 0:

struct sockaddr_nl kpeer;
memset(&kpeer, 0, sizeof(kpeer));
kpeer.nl_family = AF_NETLINK;
kpeer.nl_pid = 0;
kpeer.nl_groups = 0;

The next question is the layout of the messages sent to the kernel. An IP packet consists of an IP header followed by IP data; likewise, a netlink message consists of a netlink message header followed by data. The netlink message header is described by struct nlmsghdr:

struct nlmsghdr
{
  __u32 nlmsg_len;   /* Length of message */
  __u16 nlmsg_type;  /* Message type*/
  __u16 nlmsg_flags; /* Additional flags */
  __u32 nlmsg_seq;   /* Sequence number */
  __u32 nlmsg_pid;   /* Sending process PID */
};

The nlmsg_len field specifies the total length of the message, that is, the size of this structure plus the length of the data that follows it. It is normally computed with the NLMSG_LENGTH macro provided by netlink: pass the length of the data to be sent, and the macro adds the aligned size of the header:

/* Calculate the length of the message including the header */
#define NLMSG_LENGTH(len) ((len) + NLMSG_ALIGN(sizeof(struct nlmsghdr)))
/* Byte alignment */
#define NLMSG_ALIGN(len) (((len) + NLMSG_ALIGNTO - 1) & ~(NLMSG_ALIGNTO - 1))

Netlink provides many more macros like these, which make writing netlink programs much more convenient.
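As a worked example of the length macros, the sketch below evaluates NLMSG_LENGTH and NLMSG_SPACE from the user-space header <linux/netlink.h>. The function names message_length, message_space, and print_sizes are illustrative only and are not part of imp2.

```c
#include <assert.h>
#include <stdio.h>
#include <linux/netlink.h>

/* Value to store in nlmsg_len: aligned header plus payload_len bytes. */
unsigned int message_length(unsigned int payload_len)
{
    return NLMSG_LENGTH(payload_len);
}

/* Bytes the message occupies in a buffer once padded to NLMSG_ALIGNTO. */
unsigned int message_space(unsigned int payload_len)
{
    unsigned int space = NLMSG_SPACE(payload_len);

    /* Padding can only add bytes, never remove them. */
    assert(space >= message_length(payload_len));
    return space;
}

/* Print both values for one payload size. */
void print_sizes(unsigned int payload_len)
{
    printf("payload %u: NLMSG_LENGTH = %u, NLMSG_SPACE = %u\n",
           payload_len, message_length(payload_len),
           message_space(payload_len));
}
```

With the 16-byte struct nlmsghdr and 4-byte NLMSG_ALIGNTO, a 5-byte payload gives NLMSG_LENGTH = 21 and NLMSG_SPACE = 24: nlmsg_len records the unpadded length, while the buffer must have room for the padded one.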

The nlmsg_type field defines the message type for the application's own use. It is opaque to the netlink core, so in most cases it is set to 0. The nlmsg_flags field sets message flags; it can normally be 0, and only some advanced applications (such as netfilter and routing daemons) need it for more complex operations. The fields nlmsg_seq and nlmsg_pid let the application track messages: the former is the sequence number and the latter is the ID of the process that sent the message.

struct msg_to_kernel    /* custom message: contains only the netlink message header */
{
      struct nlmsghdr hdr;
};

struct msg_to_kernel message;
memset(&message, 0, sizeof(message));
message.hdr.nlmsg_len = NLMSG_LENGTH(0);    /* only a request is sent, with no extra data, so the data length is 0 */
message.hdr.nlmsg_flags = 0;
message.hdr.nlmsg_type = IMP2_U_PID;        /* set the custom message type */
message.hdr.nlmsg_pid = local.nl_pid;       /* set the PID of the sender */

With the local address, the peer address, and the data in place, the message can be sent to the kernel:

/* send the request */
sendto(skfd, &message, message.hdr.nlmsg_len, 0,
       (struct sockaddr *)&kpeer, sizeof(kpeer));
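The imp2 request carries no payload, so the header alone is sent. For a message that does carry data, the header and payload have to be laid out in one buffer. The following is a minimal sketch of that layout using the standard macros; build_nl_message is a hypothetical helper, not part of imp2, and it assumes the caller passes a buffer aligned for struct nlmsghdr.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <linux/netlink.h>

/*
 * Fill buf with one netlink message: header followed by payload.
 * Returns the value stored in nlmsg_len, or 0 if buf is too small.
 */
size_t build_nl_message(void *buf, size_t buf_size, unsigned short type,
                        unsigned int sender_pid,
                        const void *payload, size_t payload_len)
{
    struct nlmsghdr *nlh = buf;

    assert(buf != NULL);
    if (NLMSG_SPACE(payload_len) > buf_size)
        return 0;

    /* Zero the header, the payload area, and the trailing padding. */
    memset(buf, 0, NLMSG_SPACE(payload_len));
    nlh->nlmsg_len = NLMSG_LENGTH(payload_len);
    nlh->nlmsg_type = type;
    nlh->nlmsg_flags = 0;
    nlh->nlmsg_seq = 0;
    nlh->nlmsg_pid = sender_pid;
    if (payload_len > 0)
        memcpy(NLMSG_DATA(nlh), payload, payload_len);
    return nlh->nlmsg_len;
}
```

The returned length is what would be passed as the third argument of sendto(), in the same way that message.hdr.nlmsg_len is used above.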

After sending the request, call one of the recv family of functions to receive data from the kernel. The received data consists of the netlink message header followed by the payload:

/* The received data consists of the netlink message header and the custom data structure */
struct u_packet_info
{
      struct nlmsghdr hdr;
      struct packet_info icmp_info;
};
struct u_packet_info info;
while (1)
{
      kpeerlen = sizeof(struct sockaddr_nl);
      /* receive the data returned from kernel space */
      rcvlen = recvfrom(skfd, &info, sizeof(struct u_packet_info),
                        0, (struct sockaddr *)&kpeer, &kpeerlen);

      /* process the received data */
      ......
}
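The loop above reads straight into a fixed structure. A more defensive receiver can check each header before touching the payload. The sketch below shows one way to do that with the standard NLMSG_OK, NLMSG_NEXT, and NLMSG_DATA macros; count_messages is a hypothetical helper and is not taken from imp2.

```c
#include <assert.h>
#include <string.h>
#include <linux/netlink.h>

/*
 * Walk every netlink message in buf[0..len) and count those whose type
 * equals wanted_type. The payload pointer of the last match is stored in
 * *payload (left untouched when nothing matches or payload is NULL).
 */
int count_messages(void *buf, int len, unsigned short wanted_type,
                   void **payload)
{
    struct nlmsghdr *nlh;
    int matches = 0;

    assert(buf != NULL);
    /* NLMSG_OK rejects truncated headers and lengths that overrun len;
     * NLMSG_NEXT advances to the next aligned header and shrinks len. */
    for (nlh = buf; NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) {
        if (nlh->nlmsg_type != wanted_type)
            continue;
        if (payload)
            *payload = NLMSG_DATA(nlh);
        matches++;
    }
    return matches;
}
```

In a receive loop, buf and len would be the buffer passed to recvfrom() and its return value.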

Finally, close() closes the open netlink socket. Because the program loops forever receiving and processing kernel messages, it exits only when it receives a termination signal from the user. Closing the socket is therefore done in the custom signal handler sig_int:

/* signal handler that performs the program's exit actions */
static void sig_int(int signo)
{
      struct sockaddr_nl kpeer;
      struct msg_to_kernel message;

      memset(&kpeer, 0, sizeof(kpeer));
      kpeer.nl_family = AF_NETLINK;
      kpeer.nl_pid = 0;
      kpeer.nl_groups = 0;

      memset(&message, 0, sizeof(message));
      message.hdr.nlmsg_len = NLMSG_LENGTH(0);
      message.hdr.nlmsg_flags = 0;
      message.hdr.nlmsg_type = IMP2_CLOSE;
      message.hdr.nlmsg_pid = getpid();

      /* tell the kernel, through nlmsg_type, that the application is closing */
      sendto(skfd, &message, message.hdr.nlmsg_len, 0, (struct sockaddr *)(&kpeer), sizeof(kpeer));

      close(skfd);
      exit(0);
}

This exit handler sends an "I have exited" message to the kernel, then calls close() to close the netlink socket and exits the program.
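A common alternative, which imp2 does not use, is to keep the handler minimal and do the cleanup in the main loop. The sketch below only records that SIGINT arrived; the names stop_requested, on_sigint, install_sigint_handler, and stop_was_requested are hypothetical. It assumes the receive loop treats an EINTR failure from recvfrom() as a cue to test the flag, then sends the close message and closes the socket itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set by the handler, polled by the receive loop. */
static volatile sig_atomic_t stop_requested = 0;

static void on_sigint(int signo)
{
    (void)signo;
    stop_requested = 1;
}

/*
 * Install the handler without SA_RESTART so that a blocking recvfrom()
 * fails with EINTR and the loop gets a chance to test the flag.
 */
int install_sigint_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGINT, &sa, NULL);
}

/* Nonzero once SIGINT has been delivered. */
int stop_was_requested(void)
{
    return stop_requested != 0;
}
```

The trade-off is a slightly longer main loop in exchange for a handler that only writes a flag.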

Kernel space

Mirroring the application, the kernel side also has three main tasks:
- create a netlink socket
- receive and process the data sent from user space
- send data to user space

The API function netlink_kernel_create creates a netlink socket and registers a callback function that receives and processes messages from user space:

struct sock *
netlink_kernel_create(int unit, void (*input)(struct sock *sk, int len));

The parameter unit is the netlink protocol type, such as NL_IMP2. The parameter input is the netlink message handler defined by the kernel module; it is called whenever a message arrives at this netlink socket. The sk argument passed to input is the same struct sock pointer that netlink_kernel_create() returned. struct sock is the kernel's internal representation of a socket; sockets created by user-space applications also have a struct sock in the kernel.

static int __init init(void)
{
      rwlock_init(&user_proc.lock);   /* initialize the read/write lock */

      /* create a netlink socket; the protocol type is the custom NL_IMP2, and kernel_receive is the receive handler */
      nlfd = netlink_kernel_create(NL_IMP2, kernel_receive);
      if (!nlfd)      /* creation failed */
      {
            printk("can not create a netlink socket\n");
            return -1;
      }

      /* register a netfilter hook */
      return nf_register_hook(&imp2_ops);
}

module_init(init);

User space sends two custom message types to the kernel, IMP2_U_PID and IMP2_CLOSE, for the request and the close notification respectively. The kernel_receive function handles both:


DECLARE_MUTEX(receive_sem);     /* initialize the semaphore */
static void kernel_receive(struct sock *sk, int len)
{
    do
    {
        struct sk_buff *skb;
        if (down_trylock(&receive_sem))     /* try to acquire the semaphore */
            return;
        /* take each skb off the receive queue, then run basic length checks */
        while ((skb = skb_dequeue(&sk->receive_queue)) != NULL)
        {
            {
                struct nlmsghdr *nlh = NULL;

                if (skb->len >= sizeof(struct nlmsghdr))
                {
                    /* the nlmsghdr structure sits at the start of the data */
                    nlh = (struct nlmsghdr *)skb->data;
                    if ((nlh->nlmsg_len >= sizeof(struct nlmsghdr))
                        && (skb->len >= nlh->nlmsg_len))
                    {
                        /* The length checks passed; now handle the application's custom message types. The main job is to store the user PID, so that the kernel knows whom to send messages to. */
                        if (nlh->nlmsg_type == IMP2_U_PID)      /* request */
                        {
                            write_lock_bh(&user_proc.lock);
                            user_proc.pid = nlh->nlmsg_pid;
                            write_unlock_bh(&user_proc.lock);
                        }
                        else if (nlh->nlmsg_type == IMP2_CLOSE) /* the application is closing */
                        {
                            write_lock_bh(&user_proc.lock);
                            if (nlh->nlmsg_pid == user_proc.pid)
                                user_proc.pid = 0;
                            write_unlock_bh(&user_proc.lock);
                        }
                    }
                }
            }
            kfree_skb(skb);
        }
        up(&receive_sem);       /* release the semaphore */
    } while (nlfd && nlfd->receive_queue.qlen);
}

Because the kernel module may be entered by several processes at the same time, the function uses a semaphore and a lock for mutual exclusion. skb = skb_dequeue(&sk->receive_queue) takes a message off the receive queue of the socket sk and returns a struct sk_buff, whose skb->data points to the actual netlink message.
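The validation and dispatch logic of kernel_receive can be exercised outside the kernel. The sketch below is a user-space model only, with no sk_buff and no locking: data_len stands in for skb->len, struct user_proc_model stands in for user_proc, and the numeric values of IMP2_U_PID and IMP2_CLOSE are made up for illustration (the real ones are defined in the imp2 header).

```c
#include <assert.h>
#include <stddef.h>
#include <linux/netlink.h>

/* Hypothetical values; the real ones live in the imp2 header. */
enum { IMP2_U_PID = 0, IMP2_CLOSE = 1 };

/* Stand-in for the module's user_proc (locking omitted in this model). */
struct user_proc_model {
    unsigned int pid;
};

/*
 * Returns 1 if the message was well formed and handled, 0 if it was dropped.
 * data_len plays the role of skb->len.
 */
int handle_message(struct user_proc_model *up, const void *data,
                   size_t data_len)
{
    const struct nlmsghdr *nlh = data;

    assert(up != NULL && data != NULL);
    /* The buffer must hold at least a full header. */
    if (data_len < sizeof(*nlh))
        return 0;
    /* The claimed length must cover the header and fit in the buffer. */
    if (nlh->nlmsg_len < sizeof(*nlh) || data_len < nlh->nlmsg_len)
        return 0;

    if (nlh->nlmsg_type == IMP2_U_PID) {
        up->pid = nlh->nlmsg_pid;       /* remember whom to send to */
    } else if (nlh->nlmsg_type == IMP2_CLOSE) {
        if (nlh->nlmsg_pid == up->pid)  /* only the registered process */
            up->pid = 0;
    } else {
        return 0;                       /* unknown type */
    }
    return 1;
}
```

Note that a close message from a process other than the registered one leaves the stored PID unchanged, matching the check in kernel_receive.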

The program registers a netfilter hook. The hook function, get_icmp, intercepts ICMP packets and calls send_to_user to pass data to the user-space process. The data sent is the variable info, of type struct packet_info, which has two members: the source and destination addresses. The netfilter hook is not the focus of this note and is skipped here.
send_to_user sends the data to the user-space process through the API function netlink_unicast:

int netlink_unicast(struct sock *sk, struct sk_buff *skb, u32 pid, int nonblock);

The parameter sk is the socket returned by netlink_kernel_create(). The parameter skb holds the message to be sent: its data field points to the netlink message, and its control block stores the message's address information. The parameter pid is the PID of the receiving process. The parameter nonblock selects non-blocking behavior: if it is 1, the function returns immediately when no receive buffer is available; if it is 0, the function sleeps until one is.
A message sent to a user-space process consists of the netlink message header, the data section, and control fields. The control fields contain the destination and source addresses that the kernel must set when it sends a netlink message. Messages in the kernel are managed through sk_buff, and linux/netlink.h defines the NETLINK_CB macro to make setting the message addresses easier:

#define NETLINK_CB(skb)         (*(struct netlink_skb_parms*)&((skb)->cb))

For example:

NETLINK_CB(skb).pid = 0;
NETLINK_CB(skb).dst_pid = 0;
NETLINK_CB(skb).dst_group = 1;

The field pid is the ID of the sending process, that is, the source address; for the kernel it is 0. The field dst_pid is the ID of the receiving process, that is, the destination address; set it to 0 if the destination is a group or the kernel. The field dst_group is the destination group address; set it to 0 if the destination is a single process or the kernel.


static int send_to_user(struct packet_info *info)
{
    int ret;
    int size;
    unsigned char *old_tail;
    struct sk_buff *skb;
    struct nlmsghdr *nlh;
    struct packet_info *packet;

    /* total message length: message header plus data */
    size = NLMSG_SPACE(sizeof(*info));

    /* allocate a new socket buffer */
    skb = alloc_skb(size, GFP_ATOMIC);
    if (!skb)       /* allocation failed */
        return -1;
    old_tail = skb->tail;

    /* initialize the netlink message header */
    nlh = NLMSG_PUT(skb, 0, 0, IMP2_K_MSG, size - sizeof(*nlh));
    /* skip the message header and point to the data area */
    packet = NLMSG_DATA(nlh);
    /* zero the data area */
    memset(packet, 0, sizeof(struct packet_info));
    /* fill in the data to be sent */
    packet->src = info->src;
    packet->dest = info->dest;

    /* message length: how far the skb tail has advanced */
    nlh->nlmsg_len = skb->tail - old_tail;
    /* set the control fields */
    NETLINK_CB(skb).dst_groups = 0;

    /* send the data */
    read_lock_bh(&user_proc.lock);
    ret = netlink_unicast(nlfd, skb, user_proc.pid, MSG_DONTWAIT);
    read_unlock_bh(&user_proc.lock);

    return ret;

nlmsg_failure:      /* NLMSG_PUT jumps here when the skb has no room */
    if (skb)
        kfree_skb(skb);
    return -1;
}

The function initializes the netlink message header, fills in the data area, and sets the control fields, all of which are carried in the sk_buff. It then calls netlink_unicast() to send the data.
It relies on an important netlink macro, NLMSG_PUT, which initializes the netlink message header:

#define NLMSG_PUT(skb, pid, seq, type, len) \
({ if (skb_tailroom(skb) < (int)NLMSG_SPACE(len)) goto nlmsg_failure; \
   __nlmsg_put(skb, pid, seq, type, len); })

static __inline__ struct nlmsghdr *
__nlmsg_put(struct sk_buff *skb, u32 pid, u32 seq, int type, int len)
{
      struct nlmsghdr *nlh;
      int size = NLMSG_LENGTH(len);

      nlh = (struct nlmsghdr *)skb_put(skb, NLMSG_ALIGN(size));
      nlh->nlmsg_type = type;
      nlh->nlmsg_len = size;
      nlh->nlmsg_flags = 0;
      nlh->nlmsg_pid = pid;
      nlh->nlmsg_seq = seq;
      return nlh;
}

Note that the macro jumps to the label nlmsg_failure with goto, so every function that uses the macro must define that label.
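The "macro jumps to a label in the caller" pattern can be tried in user space. The sketch below is an analogy only, not kernel code: struct toy_buf is a made-up stand-in for an sk_buff, and TOY_PUT, toy_put, and append_record are hypothetical names. Like NLMSG_PUT, the macro relies on the GNU C statement-expression extension, so it assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for an sk_buff: a fixed buffer with a write cursor. */
struct toy_buf {
    unsigned char data[32];
    size_t used;
};

/* Reserve len bytes, or jump to the caller's put_failure label. */
#define TOY_PUT(b, len) \
    ({ if (sizeof((b)->data) - (b)->used < (size_t)(len)) goto put_failure; \
       toy_put((b), (len)); })

/* Advance the cursor and return the start of the reserved area. */
static unsigned char *toy_put(struct toy_buf *b, size_t len)
{
    unsigned char *p = b->data + b->used;

    b->used += len;
    return p;
}

/* Returns 0 on success, -1 if the record did not fit. */
int append_record(struct toy_buf *b, const void *src, size_t len)
{
    unsigned char *dst;

    assert(b != NULL);
    dst = TOY_PUT(b, len);
    memcpy(dst, src, len);
    return 0;

put_failure:
    /* Every function that uses TOY_PUT must define this label. */
    return -1;
}
```

A function that used TOY_PUT without defining put_failure would fail to compile, which is the same constraint NLMSG_PUT places on its callers.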

Use the sock_release function in the kernel to release the Netlink socket created by the function netlink_kernel_create:

void sock_release(struct socket * sock);

The program releases the netlink socket and the netfilter hook in the module exit function:

static void __exit fini(void)
{
      if (nlfd)
      {
            sock_release(nlfd->socket);     /* release the netlink socket */
      }
      nf_unregister_hook(&imp2_ops);        /* remove the netfilter hook */
}
