Linux Advanced Character Driver Operations: ioctl and Blocking I/O

Source: Internet
Author: User

Linux device drivers: ioctl in advanced character driver operations

Besides reading and writing the device, most drivers also need to provide an interface for controlling the hardware: for example, querying the resolutions a framebuffer device supports, reading the time from an RTC device, or driving a GPIO pin high or low. These hardware control operations are usually implemented through the ioctl method.

1. Prototype Introduction

The prototype for ioctl in user space is:

int ioctl(int fd, unsigned long cmd, ...);

The dots in the prototype do not represent a variable number of arguments, but rather a single optional parameter, traditionally identified as char *argp. The dots are only there to prevent type checking at compile time. The actual nature of the third parameter depends on the specific control command being issued (the second parameter). Some commands take no argument, some take an integer value, and some take a pointer to other data. Using a pointer is the way to pass arbitrary data to the ioctl call; the device can then exchange any amount of data with user space.
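To make the role of the third argument concrete, here is a small user-space sketch using two standard Linux commands rather than driver-specific ones: FIOCLEX, which takes no argument, and FIONREAD, which takes a pointer that the kernel writes through. The helper names (set_close_on_exec, is_close_on_exec, bytes_available) are hypothetical and exist only for this illustration.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* FIOCLEX takes no third argument: it just sets close-on-exec on fd. */
int set_close_on_exec(int fd)
{
    return ioctl(fd, FIOCLEX);
}

/* Reports whether close-on-exec is set (1), clear (0), or -1 on error. */
int is_close_on_exec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    return flags < 0 ? -1 : (flags & FD_CLOEXEC) != 0;
}

/* FIONREAD takes a pointer: the kernel stores the number of unread
 * bytes through it. Returns that count, or -1 on error. */
int bytes_available(int fd)
{
    int n = 0;

    if (ioctl(fd, FIONREAD, &n) < 0)
        return -1;
    return n;
}
```

Called on the read end of a pipe, bytes_available reports how much has been written but not yet read. Nothing stops a caller from passing the wrong kind of third argument, which is exactly the missing type check described above.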

The prototype of ioctl in kernel space is:

int (*ioctl)(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg);

This is the signature used by older kernels. Since Linux 2.6.36 the file_operations member is unlocked_ioctl, which drops the inode argument and returns long; the cmd and arg parameters are unchanged.

The inode and filp pointers are the values corresponding to the file descriptor fd passed by the application, and are the same parameters that are passed to the open method. The cmd argument is passed from the user unchanged, and the optional arg argument is passed in the form of an unsigned long, regardless of whether the user gave it as an integer or a pointer. If the calling program does not pass a third argument, the arg value received by the driver is undefined. The compiler cannot warn you about this, because type checking is turned off for the extra argument.

2. Choosing ioctl cmd numbers

Before implementing ioctl, define the set of commands the driver will accept. The easy way is to use a simple set of numbers, such as 0 to 9. That usually works, but it is best avoided: ioctl command numbers should be unique across the system, so that a valid command sent to the wrong device is rejected instead of being acted on.

In Linux, the cmd value is split into several bit fields to help create unique commands: type (the magic number), ordinal number, direction of transfer, and argument size. When defining commands, consult include/asm/ioctl.h and Documentation/ioctl-number.txt. The header defines the macros used to build command numbers, and ioctl-number.txt lists the type values already used in the kernel; to keep commands unique, avoid overlapping with them.

Here is a brief introduction to these bit fields:

type

The magic number. Choose one number (after consulting ioctl-number.txt) and use it throughout the driver. This field is 8 bits wide (_IOC_TYPEBITS).

number

The ordinal (sequential) number. It is 8 bits wide (_IOC_NRBITS).

direction

The direction of data transfer, if the command involves a data transfer. The possible values are _IOC_NONE (no data transfer), _IOC_READ, _IOC_WRITE, and _IOC_READ|_IOC_WRITE (data is transferred in both directions). Data transfer is seen from the application's point of view: _IOC_READ means reading from the device, so the driver must write to user space. Note that this field is a bit mask, so _IOC_READ and _IOC_WRITE can be extracted using a logical AND operation.

size

The size of the user data involved. The width of this field depends on the architecture, but it is usually 13 or 14 bits. You can find its value for your architecture in the macro _IOC_SIZEBITS. Using the size field is not mandatory, and the kernel does not check it, but it is a good idea. Using it properly helps detect errors in user-space programs and lets you keep backward compatibility if you ever need to change the size of the relevant data item. If you need larger data structures, however, you can simply ignore the size field. How this field is used is shown shortly.

The following example defines a set of ioctl commands:

/* Use 'k' as magic number */
#define SCULL_IOC_MAGIC 'k'
/* Please use a different 8-bit number in your code */

#define SCULL_IOCRESET _IO(SCULL_IOC_MAGIC, 0)
/*
 * S means "Set" through a ptr,
 * T means "Tell" directly with the argument value
 * G means "Get": reply by setting through a pointer
 * Q means "Query": response is on the return value
 * X means "eXchange": switch G and S atomically
 * H means "sHift": switch T and Q atomically
 */
#define SCULL_IOCSQUANTUM _IOW(SCULL_IOC_MAGIC, 1, int)
#define SCULL_IOCSQSET _IOW(SCULL_IOC_MAGIC, 2, int)
#define SCULL_IOCTQUANTUM _IO(SCULL_IOC_MAGIC, 3)
#define SCULL_IOCTQSET _IO(SCULL_IOC_MAGIC, 4)
#define SCULL_IOCGQUANTUM _IOR(SCULL_IOC_MAGIC, 5, int)
#define SCULL_IOCGQSET _IOR(SCULL_IOC_MAGIC, 6, int)
#define SCULL_IOCQQUANTUM _IO(SCULL_IOC_MAGIC, 7)
#define SCULL_IOCQQSET _IO(SCULL_IOC_MAGIC, 8)
#define SCULL_IOCXQUANTUM _IOWR(SCULL_IOC_MAGIC, 9, int)
#define SCULL_IOCXQSET _IOWR(SCULL_IOC_MAGIC, 10, int)
#define SCULL_IOCHQUANTUM _IO(SCULL_IOC_MAGIC, 11)
#define SCULL_IOCHQSET _IO(SCULL_IOC_MAGIC, 12)

#define SCULL_IOC_MAXNR 14

For more about macros such as _IOWR, refer to their definitions in the header file.
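The bit fields can also be pulled back out of a command number with the _IOC_DIR, _IOC_TYPE, _IOC_NR, and _IOC_SIZE accessor macros from <linux/ioctl.h>. The sketch below builds three commands and decodes them in user space; the DEMO_* names, struct ioc_fields, and ioc_decode are hypothetical and used only for illustration.

```c
#include <linux/ioctl.h>

#define DEMO_IOC_MAGIC 'k'                          /* hypothetical magic number */
#define DEMO_IOCRESET  _IO(DEMO_IOC_MAGIC, 0)       /* no data transfer */
#define DEMO_IOCSVALUE _IOW(DEMO_IOC_MAGIC, 1, int) /* user space writes an int */
#define DEMO_IOCGVALUE _IOR(DEMO_IOC_MAGIC, 2, int) /* user space reads an int */

/* The four bit fields packed into one command number. */
struct ioc_fields {
    unsigned int dir;
    unsigned int type;
    unsigned int nr;
    unsigned int size;
};

/* Splits a command number into its fields with the kernel's own macros. */
struct ioc_fields ioc_decode(unsigned int cmd)
{
    struct ioc_fields f = {
        .dir  = _IOC_DIR(cmd),
        .type = _IOC_TYPE(cmd),
        .nr   = _IOC_NR(cmd),
        .size = _IOC_SIZE(cmd),
    };
    return f;
}
```

Decoding DEMO_IOCSVALUE gives type 'k', number 1, direction _IOC_WRITE, and size sizeof(int), which is the information a driver can use to validate a command before acting on it.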

3. Return value of ioctl

An ioctl implementation is usually a switch statement on cmd, and the return value depends on what each case branch does. What should be returned for an undefined cmd? I suggest -EINVAL, indicating an invalid argument; note that POSIX specifies -ENOTTY for an inappropriate ioctl command, and many drivers return that instead. Also, when there are many case branches it is easy to forget a break, so that execution falls through into the following case and produces wrong results.
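The dispatch pattern can be sketched in plain user-space C. This is not a real driver entry point: toy_ioctl, the TOY_* commands, and toy_value are hypothetical stand-ins chosen so the code can be compiled and run outside the kernel. It returns -ENOTTY for commands that belong to another driver and -EINVAL for unknown commands, following the suggestion above.

```c
#include <errno.h>
#include <linux/ioctl.h>

#define TOY_IOC_MAGIC 't'                      /* hypothetical magic number */
#define TOY_IOCRESET  _IO(TOY_IOC_MAGIC, 0)
#define TOY_IOCTVALUE _IO(TOY_IOC_MAGIC, 1)    /* Tell: arg is the new value */
#define TOY_IOCQVALUE _IO(TOY_IOC_MAGIC, 2)    /* Query: value is the return */
#define TOY_IOC_MAXNR 2

static int toy_value = 42;

long toy_ioctl(unsigned int cmd, unsigned long arg)
{
    long retval = 0;

    /* Reject commands meant for another driver or out of range. */
    if (_IOC_TYPE(cmd) != TOY_IOC_MAGIC)
        return -ENOTTY;
    if (_IOC_NR(cmd) > TOY_IOC_MAXNR)
        return -ENOTTY;

    switch (cmd) {
    case TOY_IOCRESET:
        toy_value = 42;
        break;                  /* without this, TOY_IOCTVALUE would run too */
    case TOY_IOCTVALUE:
        toy_value = (int)arg;   /* integer argument: use it directly */
        break;
    case TOY_IOCQVALUE:
        retval = toy_value;
        break;
    default:                    /* right magic and number, wrong encoding */
        return -EINVAL;
    }
    return retval;
}
```

Every case ends in break or return, so a missing break is easy to spot in review.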

4. The arg parameter of ioctl

Some ioctl commands do not need the arg parameter, but most need to pass data between the application and the kernel. When arg is an integer, it is simple: use it directly. When it is a pointer, more care is needed.

Data is usually exchanged between user space and the kernel with copy_from_user and copy_to_user, which move data safely. They can be used in the ioctl method too, but ioctl data items are often small, and these two functions are somewhat heavyweight for them, so other methods can be used for the transfer.

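int access_ok(int type, const void *addr, unsigned long size);

This function checks whether a given user-space address range satisfies a specific access requirement (type is VERIFY_READ or VERIFY_WRITE); it only checks and copies no data. It returns nonzero if the access is allowed and 0 otherwise, in which case the driver should return -EFAULT. Since Linux 5.0 the type argument has been removed and the call is access_ok(addr, size). Once access_ok has succeeded, data can be transferred with the following interfaces: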

put_user(datum, ptr)
__put_user(datum, ptr)

These macros write datum to user space. They are relatively fast and should be called instead of copy_to_user whenever a single value is being transferred. The macros are written to accept any type of pointer, as long as it is a user-space address. The size of the data transfer depends on the type of the ptr argument and is determined at compile time using the compiler builtins sizeof and typeof. As a result, if ptr is a char pointer, one byte is transferred, and likewise for 2, 4, and possibly 8 bytes.

put_user checks that the process is able to write to the given memory address. It returns 0 on success and -EFAULT on error. __put_user performs less checking (it does not call access_ok), but it can still fail if the memory pointed to is not writable by the user. Therefore, __put_user should only be used when the memory region has already been verified with access_ok.

As a general rule, call __put_user to save a few cycles when implementing a read method, or when copying several items, in which case access_ok is called just once before the first data transfer.

get_user(local, ptr)
__get_user(local, ptr)

These macros retrieve a single datum from user space. They behave like put_user and __put_user, but transfer data in the opposite direction. The value retrieved is stored in the local variable local, and the return value indicates whether the operation succeeded. Again, __get_user should only be used if the address has already been verified with access_ok.




Linux device drivers: blocking I/O in advanced character driver operations

Suppose a process calls read to get data. What should happen when there is no data to read: return immediately, or wait until data arrives? Likewise, suppose a process calls write to send data to the device. If the buffer is full or the device is busy, should the call return immediately or keep waiting until the device is writable? In these cases the usual default is to put the process to sleep until the request can be satisfied. This part describes how a driver handles such situations.

Sleep

What does it mean to sleep? A sleeping process temporarily gives up the CPU and is not scheduled again until some condition occurs.

It is easy to put a process to sleep in a driver, but a few rules need special attention.

Never sleep in an atomic context. This means the driver cannot sleep while holding a spinlock, seqlock, or RCU lock.
Never sleep with interrupts disabled. You cannot sleep in an interrupt handler.
You may sleep while holding a semaphore, but any other process waiting for that semaphore sleeps as well, so keep such sleeps short.
After being awakened, check that the condition you were waiting for is really true, because you do not know what happened while you were asleep.
Before sleeping, make sure something will wake you up; otherwise, do not sleep.

How to sleep and wake up

A sleeping process is placed on a wait queue. A wait queue head can be declared and initialized statically as follows:

DECLARE_WAIT_QUEUE_HEAD(name);

Or dynamically, as follows:

wait_queue_head_t my_queue;
init_waitqueue_head(&my_queue);

When a process needs to sleep, it can call one of the following:

// Puts the process into an uninterruptible sleep; generally avoid this.
wait_event(queue, condition)
// Can be interrupted by a signal. Check the return value: nonzero means the sleep
// was probably interrupted by a signal, and the driver should return -ERESTARTSYS.
wait_event_interruptible(queue, condition)
// The following two wait for a limited time and return 0 once the timeout expires.
wait_event_timeout(queue, condition, timeout)
wait_event_interruptible_timeout(queue, condition, timeout)


To wake a sleeping process, some other thread of execution calls one of the wake-up functions:

// wake_up wakes all processes waiting on the given queue; wake_up_interruptible wakes
// only those in an interruptible sleep. By convention, wake_up is paired with
// wait_event and wake_up_interruptible with wait_event_interruptible.
void wake_up(wait_queue_head_t *queue);
void wake_up_interruptible(wait_queue_head_t *queue);

Blocking and nonblocking operation

The sleep mechanism described above implements blocking I/O. Sometimes the caller instead requires the call to return immediately, whether or not I/O is possible; this is nonblocking operation. For example, a read finds no data, but the caller does not want to wait and wants the call to return at once.

Nonblocking I/O is indicated by the O_NONBLOCK flag in filp->f_flags. The flag is defined in <linux/fcntl.h>, which is automatically included by <linux/fs.h>, and it can be specified at open time.

I/O is blocking by default (when O_NONBLOCK is not specified), and read and write implementations must then follow these rules:

If a process calls read but no data is (yet) available, the process must block. The process is awakened as soon as some data arrives, and that data is returned to the caller, even if there is less than the amount requested in the count argument to the method.
If a process calls write and there is no space in the buffer, the process must block, and it must be on a different wait queue from the one used for reading. When some data has been written to the hardware device and space becomes free in the output buffer, the process is awakened and the write call succeeds, although the data may be only partially written if there is not room in the buffer for the count bytes that were requested.

Both rules assume that there are input and output buffers; in practice, almost every device driver has them. Buffering improves access efficiency and prevents data loss.

If O_NONBLOCK is specified, access is nonblocking and read and write behave differently: when the call would otherwise have to block, it simply returns -EAGAIN. Only the read, write, and open file operations are affected by the nonblocking flag.
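From the application's side, the -EAGAIN behaviour looks as follows. This user-space sketch uses a pipe rather than a custom driver, and sets O_NONBLOCK with fcntl on an already-open descriptor instead of at open time; set_nonblocking and read_now are hypothetical helper names.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Adds O_NONBLOCK to an already-open descriptor. Returns 0 or -1. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Reads without sleeping. Returns the number of bytes read, 0 at end of
 * file, -EAGAIN if no data is available yet, or another negative errno. */
ssize_t read_now(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);

    return n < 0 ? -errno : n;
}
```

On an empty pipe whose write end is still open, read_now returns -EAGAIN instead of putting the caller to sleep; once data has been written, the same call returns the bytes that are available.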

Here is a simple read implementation that handles both blocking and nonblocking access (comments mark the key places):

static ssize_t scull_p_read(struct file *filp, char __user *buf, size_t count, loff_t *f_pos)
{
        struct scull_pipe *dev = filp->private_data;

        if (down_interruptible(&dev->sem))
                return -ERESTARTSYS;

        while (dev->rp == dev->wp) { /* nothing to read */
                up(&dev->sem); /* release the lock */
                /* Nonblocking access: return -EAGAIN immediately. */
                if (filp->f_flags & O_NONBLOCK)
                        return -EAGAIN;
                PDEBUG("\"%s\" reading: going to sleep\n", current->comm);
                /* Blocking access: sleep until there is something to read. */
                if (wait_event_interruptible(dev->inq, (dev->rp != dev->wp)))
                        return -ERESTARTSYS; /* signal: tell the fs layer to handle it */
                /* otherwise loop, but first reacquire the lock */
                if (down_interruptible(&dev->sem))
                        return -ERESTARTSYS;
        }
        /* ok, data is there, return something */
        /* Read the data normally below. */
        if (dev->wp > dev->rp)
                count = min(count, (size_t)(dev->wp - dev->rp));
        else /* the write pointer has wrapped, return data up to dev->end */
                count = min(count, (size_t)(dev->end - dev->rp));
        if (copy_to_user(buf, dev->rp, count)) {
                up(&dev->sem);
                return -EFAULT;
        }
        dev->rp += count;
        if (dev->rp == dev->end)
                dev->rp = dev->buffer; /* wrapped */
        up(&dev->sem);

        /* finally, awake any writers and return */
        wake_up_interruptible(&dev->outq);
        PDEBUG("\"%s\" did read %li bytes\n", current->comm, (long)count);
        return count;
}
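The wrap-around arithmetic that limits count in the read method can be exercised on its own in user space. The sketch below isolates just that calculation; struct ring and ring_read_chunk are hypothetical names, and the struct only mirrors the four pointers the read method uses.

```c
#include <stddef.h>

/* Minimal stand-in for the pointer fields used by the read method. */
struct ring {
    char *buffer;   /* start of the buffer */
    char *end;      /* one past the last byte of the buffer */
    char *rp;       /* where to read next */
    char *wp;       /* where to write next */
};

/* How many bytes one read may return: never more than requested, and
 * never past the end of the buffer once the write pointer has wrapped. */
size_t ring_read_chunk(const struct ring *r, size_t count)
{
    size_t avail;

    if (r->rp == r->wp)
        return 0;                           /* empty: a reader would sleep */
    if (r->wp > r->rp)
        avail = (size_t)(r->wp - r->rp);    /* data is contiguous */
    else
        avail = (size_t)(r->end - r->rp);   /* writer wrapped: stop at end */
    return count < avail ? count : avail;
}
```

When the write pointer has wrapped, a single read returns only the bytes up to the end of the buffer; the caller picks up the rest with a second read after rp has been reset to the start.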



 
Exclusive waits

As described earlier, when a process calls wake_up, all processes waiting on that queue are made runnable. This is usually not a problem, but in some cases it is known ahead of time that only one of the awakened processes will succeed in obtaining the desired resource, and the rest will simply go back to sleep. If many processes are waiting, waking them all only for them to sleep again wastes resources and can degrade system performance. To handle this situation, the kernel provides an "exclusive wait" option. Processes in an exclusive wait are awakened one at a time.
