Write your first Linux kernel module (currently proofread to miscellaneous devices)

Source: Internet
Author: User

Tags: linux, kernel, module, linux_kernel 3.0

Ever wanted to start hacking the kernel but have no clue how to begin? Let us show you how it's done.
Kernel programming is often considered black magic. In Arthur C. Clarke's sense, it probably is. The Linux kernel is quite different from user space: many abstractions are gone, and you have to take extra care, because a small bug in your code can affect the entire system. There is no easy way to do floating-point arithmetic, the stack is both fixed and small, and the code you write is always asynchronous, so you need to think about concurrency. Nonetheless, the Linux kernel is just a very large and complex C program that is open for everyone to read, learn from and improve, and you can be part of it.
Probably the simplest way to start kernel programming is to write a module: a piece of code that can be dynamically loaded into and removed from the kernel. There are limits to what modules can do. For example, they cannot add or remove fields in commonly used data structures such as process descriptors. In all other respects they are full-fledged kernel-level code, and they can always be compiled into the kernel if needed (which lifts all of these restrictions). A module can also be developed and compiled entirely outside the Linux source tree (unsurprisingly, this is called an out-of-tree build), which is very handy if you just want to play around and don't intend to submit your changes to the mainline kernel.
In this tutorial, we will develop a simple kernel module that creates a /dev/reverse device. A string written to the device is read back with the word order reversed ("Hello World" becomes "World Hello"). This is a popular programmer interview puzzle, and you may get some bonus points for being able to implement it at the kernel level. A few words of warning before you start: a bug in your module can crash the system and cause data loss (unlikely, but possible). Make sure you back up all your important data before you start, or, better yet, experiment in a virtual machine.
Avoid root privileges whenever possible

By default, /dev/reverse is accessible to root only, so you would have to run your test programs with sudo. To fix this, create a /lib/udev/rules.d/99-reverse.rules file containing:
SUBSYSTEM=="misc", KERNEL=="reverse", MODE="0666"
Don't forget to reinsert the module afterwards. Making device nodes accessible to non-root users is generally not a good idea, but it is quite useful during development. Needless to say, running test binaries as root is not a good idea either.
Structure of the module
Most Linux kernel modules are written in C (apart from low-level architecture-specific parts), and it is recommended that you keep your module in a single file (reverse.c). We put the full source code on GitHub (a commented copy of the source has also been uploaded and is free to download); here we will look at some snippets of it. First, we include some common headers and describe the module using predefined macros:
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Valentine Sinitsyn <[email protected]>");
MODULE_DESCRIPTION("In-kernel phrase reverser");
Everything is straightforward here, except for MODULE_LICENSE(): it is not a mere label. The kernel strongly favours GPL-compatible code, so if you set the license to something that is not GPL-compatible (such as "Proprietary"), certain kernel functions will not be available to your module.
When do I have to write a kernel module?
Kernel programming is fun, but writing (and especially debugging) kernel code in a real-world project requires a certain amount of skill. In general, you should descend to the kernel level only if there is no other way to solve your problem. Chances are you can stay in user space if:
You are developing a USB driver: have a look at libusb.
You are developing a file system: try FUSE.
You are extending Netfilter: libnetfilter_queue may help you.
In general, native kernel code will perform better, but for many projects this performance loss is not important.
Since kernel programming is always asynchronous, there is no main() function that Linux runs sequentially to execute your module. Instead, you provide callbacks for various events, like this:
static int __init reverse_init(void)
{
    printk(KERN_INFO "reverse device has been registered\n");
    return 0;
}

static void __exit reverse_exit(void)
{
    printk(KERN_INFO "reverse device has been unregistered\n");
}

module_init(reverse_init);
module_exit(reverse_exit);
Here, we define the functions to be called when the module is inserted and removed. Only the first one is required. For now, they simply print a message to the kernel ring buffer (accessible from user space via the dmesg command); KERN_INFO is a log level (note that there is no comma after it). __init and __exit are attributes: pieces of metadata attached to functions or variables. Attributes are rarely seen in user-space C code, but they are common in the kernel. Everything marked __init is reclaimed after initialization (remember the "Freeing unused kernel memory ..." message?). __exit indicates that the function can safely be optimized away when the code is built statically into the kernel. Finally, the module_init() and module_exit() macros register the reverse_init() and reverse_exit() functions as the life-cycle callbacks of our module. The actual function names are not important: you can call them init() and exit() or start() and stop() if you wish. They are declared static and are therefore invisible to other modules. In fact, any function in the kernel is invisible unless it is explicitly exported. However, prefixing your function names with the module name is a common convention among kernel programmers.
These are the bare bones; let's make things more interesting. Modules can accept parameters, like this:
# modprobe foo bar=1
The modinfo command displays all the parameters a module accepts, and they are also available as files under /sys/module/<module_name>/parameters. Our module will need a buffer to store phrases, so let's make its size user-configurable. Add the following three lines below MODULE_DESCRIPTION():
static unsigned long buffer_size = 8192;
module_param(buffer_size, ulong, (S_IRUSR | S_IRGRP | S_IROTH));
MODULE_PARM_DESC(buffer_size, "Internal buffer size");
Here, we define a variable to store the value, wrap it into a parameter, and make it readable by everyone via sysfs. The parameter's description (the last line) appears in the output of modinfo.
Since the user can set buffer_size directly, we need to reject invalid values in reverse_init(). You should always check data that comes from outside the kernel: if you don't, you open the kernel up to panics or even security holes.
static int __init reverse_init(void)
{
    if (!buffer_size)
        return -1;
    printk(KERN_INFO
        "reverse device has been registered, buffer size is %lu bytes\n",
        buffer_size);
    return 0;
}
A non-zero return value from the module's init function indicates failure.

The Linux kernel source is the ultimate reference for everything you need while developing modules. However, it is very large, and you may have trouble finding what you are looking for. Fortunately, there are tools that make it easier to navigate a large code base. First, there is cscope, a venerable tool that runs in a terminal. Simply run make cscope && cscope in the top-level directory of the kernel sources. Cscope integrates well with Vim and Emacs, so you can use it without leaving your favourite editor.
If terminal-based tools are not your cup of tea, visit
http://lxr.free-electrons.com. It is a web-based kernel navigation tool. It does not have as many features as cscope (for example, you can't easily find the usages of a function), but it still provides enough for quick lookups.
Now it is time to compile the module. You need the header files for the kernel version you are running (linux-headers or an equivalent package) and build-essential (or a similar package). Next, create the boilerplate Makefile:
obj-m += reverse.o
all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

You can now run make to build your first module. If you typed everything correctly, you will find a reverse.ko file in the current directory. Insert it into the kernel with sudo insmod reverse.ko, then run:

$ dmesg | tail -1
[ 5905.042081] reverse device has been registered, buffer size is 8192 bytes
Congratulations! However, for now this line is lying: there is no device node yet. Let's fix that.
Miscellaneous devices
In Linux, there is a special character device type called "miscellaneous" (or simply "misc"). It is designed for small device drivers with a single entry point, which is exactly what we need. All misc devices share the same major number (10), so a single driver (drivers/char/misc.c) can look after all of them, and they are told apart by their minor numbers. In all other respects, they are just normal character devices.
To register a minor number (and an entry point) for the device, you declare a struct miscdevice, fill in its fields (note the syntax), and call misc_register() with a pointer to this structure. For this to work, you also need to include the linux/miscdevice.h header file:
static struct miscdevice reverse_misc_device = {
    .minor = MISC_DYNAMIC_MINOR,
    .name = "reverse",
    .fops = &reverse_fops
};
static int __init reverse_init(void)
{
    ...
    misc_register(&reverse_misc_device);
    printk(KERN_INFO ...
}
Here, we request the first available (dynamic) minor number for the device named "reverse"; the ellipsis stands for omitted code that we have already seen. Don't forget to unregister the device when the module is removed:
static void __exit reverse_exit(void)
{
    misc_deregister(&reverse_misc_device);
    ...
}

The 'fops' field stores a pointer to a struct file_operations (declared in linux/fs.h), and this is the entry point for our module. reverse_fops is defined as:

static struct file_operations reverse_fops = {
    .owner = THIS_MODULE,
    .open = reverse_open,
    ...
    .llseek = noop_llseek
};
Again, reverse_fops contains a set of callbacks (also known as methods) to be executed when user-space code opens the device, reads from it, writes to it, or closes the file descriptor. If you omit any of them, a sensible fallback is used instead. That is why we explicitly set the llseek method to noop_llseek(), which (as the name implies) does nothing. The default implementation changes the file pointer, and we don't want our device to be seekable for now (this will be your homework for today).
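The method-table idea is not specific to the kernel. As a rough user-space sketch (struct ops, do_seek(), default_seek() and noop_seek() are names made up for this illustration, not kernel APIs), a dispatcher can fall back to a default whenever a callback is left unset:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical method table, loosely modelled on struct file_operations. */
struct ops {
    long (*seek)(long pos, long offset);
};

/* Fallback used when the table leaves .seek unset: moves the position. */
static long default_seek(long pos, long offset)
{
    return pos + offset;
}

/* Explicit "do nothing" method: the position never changes. */
static long noop_seek(long pos, long offset)
{
    (void)offset;
    return pos;
}

/* Dispatcher: call the provided method, or the fallback if it is NULL. */
static long do_seek(const struct ops *ops, long pos, long offset)
{
    assert(ops != NULL);
    if (ops->seek)
        return ops->seek(pos, offset);
    return default_seek(pos, offset);
}
```

With an empty table the position moves, while a table whose .seek is set to noop_seek keeps it in place, which mirrors the reason the tutorial sets .llseek explicitly.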
I open at the close
Let's implement the methods. We will allocate a new buffer for each file descriptor that is opened, and free it when the descriptor is closed. This is not really safe: if a user-space application leaks descriptors (perhaps intentionally), it may hog memory and render the system unusable. You should always consider such possibilities in the real world, but for this tutorial it is acceptable.
We need a structure to describe the buffer. The kernel provides many generic data structures: linked lists (which are doubly linked), hash tables, trees, and so on. However, buffers are usually implemented from scratch. We will call ours "struct buffer":
struct buffer {
    char *data, *end, *read_ptr;
    unsigned long size;
};
data is a pointer to the string this buffer stores, and end points to the first byte after the end of the string. read_ptr is where read() should start reading the data from. The buffer size is stored for completeness; for now, we don't use this field. You should not assume that the users of your structure will initialize all of these correctly, so it is better to encapsulate buffer allocation and deallocation in functions. They are usually named buffer_alloc() and buffer_free().
static struct buffer *buffer_alloc(unsigned long size)
{
    struct buffer *buf;
    buf = kzalloc(sizeof(*buf), GFP_KERNEL);
    if (unlikely(!buf))
        goto out;
    ...
out:
    return buf;
}
Kernel memory is allocated with kmalloc() and freed with kfree(); the kzalloc() flavour sets the memory to all zeroes. Unlike the standard malloc(), its kernel counterpart takes flags in its second argument that specify the type of memory requested. Here, GFP_KERNEL means that we need normal kernel memory (not in the DMA or high-memory zones) and that the function may sleep (reschedule the process) if needed. sizeof(*buf) is a common way to get the size of a structure that is accessed through a pointer.
You should always check the return value of kmalloc(): dereferencing a NULL pointer will result in a kernel panic. Also note the use of the unlikely() macro. It (and the opposite likely() macro) is widely used in the kernel to signify that a condition is almost always false (or true). It does not affect the control flow, but it helps modern processors boost performance through branch prediction.
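For readers who want to try these hints outside the kernel, here is a minimal user-space sketch. The two macros have the same shape as the kernel's definitions and rely on the GCC/Clang builtin __builtin_expect(); checked_length() is a made-up helper used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the kernel's macros; requires GCC or Clang. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Returns the length of s, or -1 for the (rare) NULL argument. */
static long checked_length(const char *s)
{
    long n = 0;

    if (unlikely(!s))       /* hint: this branch is rarely taken */
        return -1;
    while (likely(s[n]))    /* hint: most bytes are not the terminator */
        n++;
    return n;
}
```

The hints only influence how the compiler lays out the code; the function returns the same values with or without them.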
Finally, note the gotos. They are often considered evil; however, the Linux kernel (and some other system software) uses them to implement a centralized function exit. This results in less deeply nested and more readable code, and works much like the try-catch blocks found in higher-level languages.
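To make the centralized-exit pattern concrete, here is a user-space sketch of an allocator in the same style. It is an illustration under stated assumptions, not the module's actual code: calloc() and free() stand in for kzalloc() and kfree(), and the out_free label is a name chosen for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* User-space stand-in for the tutorial's struct buffer. */
struct buffer {
    char *data, *end, *read_ptr;
    unsigned long size;
};

/* Allocate a buffer; on failure, undo earlier steps and return NULL. */
static struct buffer *buffer_alloc(unsigned long size)
{
    struct buffer *buf;

    assert(size > 0);
    buf = calloc(1, sizeof(*buf));      /* zeroed, like kzalloc() */
    if (!buf)
        goto out;

    buf->data = calloc(1, size);
    if (!buf->data)
        goto out_free;

    buf->size = size;
    buf->end = buf->data;               /* empty: nothing stored yet */
    buf->read_ptr = buf->data;
out:
    return buf;                         /* NULL if the first step failed */

out_free:
    free(buf);                          /* second step failed: roll back */
    return NULL;
}

/* Release a buffer; passing NULL is allowed and does nothing. */
static void buffer_free(struct buffer *buf)
{
    if (!buf)
        return;
    free(buf->data);
    free(buf);
}
```

Each failure point jumps to the label that undoes exactly what has been done so far, so the success path stays flat.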
With buffer_alloc() and buffer_free() in place, the implementation of the open and close methods becomes very simple.
static int reverse_open(struct inode *inode, struct file *file)
{
    int err = 0;
    file->private_data = buffer_alloc(buffer_size);
    ...
    return err;
}
struct file is a standard kernel data structure that stores information about an open file, such as the current file position (file->f_pos), flags (file->f_flags), or open mode (file->f_mode). Another field, file->private_data, is used to associate the file with some arbitrary data. Its type is void *, and it is opaque to the kernel outside the file's owner. We store the buffer there.
If the buffer allocation fails, we signal this to the calling user-space code by returning a negative value (-ENOMEM). The C library that makes the open(2) system call (probably glibc) will detect this and set errno appropriately.
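Seen from user space, the same convention looks like this. The sketch below is only an illustration: probe_open() is a hypothetical helper that turns the C library's errno back into a kernel-style negative error code.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Open path read-only and close it again straight away.
 * Returns 0 on success or a negative errno value, kernel-style.
 */
static int probe_open(const char *path)
{
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -errno;  /* libc stored the kernel's error code in errno */
    close(fd);
    return 0;
}
```

For a path that does not exist, the helper returns -ENOENT; if our module's open method failed to allocate its buffer, a caller like this would see -ENOMEM instead.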
Learn to read and write
The "read" and "write" methods are where the real work is done. When data is written to the buffer, we discard its previous contents and reverse the phrase in place, without any temporary storage. The read method simply copies the data from the kernel buffer to user space. But what should the reverse_read() method do if there is no data in the buffer yet? In user space, the read() call would block until data is available. In the kernel, you have to wait. Fortunately, there is a mechanism for this, called "wait queues".
The idea is simple. If the current process needs to wait for some event, its descriptor (a struct task_struct, accessible as "current") is put into a non-runnable (sleeping) state and added to a queue. Then schedule() is called to select another process to run. The code that generates the event uses the queue to wake up the waiters by putting them back into the TASK_RUNNING state. The scheduler will pick one of them at some point in the future. Linux has several non-runnable process states, most notably TASK_INTERRUPTIBLE (a sleep that can be interrupted by a signal) and TASK_KILLABLE (a sleeping process that can be killed). All of this has to be handled correctly, and wait queues do it for you.
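If you know pthreads, a condition variable is a reasonable user-space analogy for a wait queue. The sketch below is an analogy only, not kernel code; struct mailbox and its functions are hypothetical names. Note that the consumer re-checks its condition in a loop after every wake-up.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical one-slot mailbox; a user-space analogy, not kernel code. */
struct mailbox {
    pthread_mutex_t lock;
    pthread_cond_t ready;   /* plays the role of the wait queue */
    int full;
    int value;
};

static void mailbox_init(struct mailbox *m)
{
    pthread_mutex_init(&m->lock, NULL);
    pthread_cond_init(&m->ready, NULL);
    m->full = 0;
    m->value = 0;
}

/* Producer: store a value and wake up anyone sleeping in mailbox_get(). */
static void mailbox_put(struct mailbox *m, int value)
{
    pthread_mutex_lock(&m->lock);
    m->value = value;
    m->full = 1;
    pthread_cond_broadcast(&m->ready);
    pthread_mutex_unlock(&m->lock);
}

/* Consumer: sleep until the slot is full, then take the value. */
static int mailbox_get(struct mailbox *m)
{
    int value;

    pthread_mutex_lock(&m->lock);
    while (!m->full)    /* re-check: another consumer may have won */
        pthread_cond_wait(&m->ready, &m->lock);
    value = m->value;
    m->full = 0;
    pthread_mutex_unlock(&m->lock);
    return value;
}
```

In a real program, mailbox_put() and mailbox_get() would run in different threads; if the value is already there, mailbox_get() returns without sleeping.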
A natural place to store the head of our read wait queue is struct buffer, so start by adding a wait_queue_head_t read_queue field to it. You should also include linux/sched.h. A wait queue head can be declared statically with the DECLARE_WAIT_QUEUE_HEAD() macro. In our case, dynamic initialization is needed, so add this line to buffer_alloc():
    init_waitqueue_head(&buf->read_queue);
We wait for data to become available, that is, for the read_ptr != end condition to become true. We also want the wait to be interruptible (say, by Ctrl+C). So the "read" method should start like this:
static ssize_t reverse_read(struct file *file, char __user *out,
        size_t size, loff_t *off)
{
    struct buffer *buf = file->private_data;
    ssize_t result;
    while (buf->read_ptr == buf->end) {
        if (file->f_flags & O_NONBLOCK) {
            result = -EAGAIN;
            goto out;
        }
        if (wait_event_interruptible
            (buf->read_queue, buf->read_ptr != buf->end)) {
            result = -ERESTARTSYS;
            goto out;
        }
    }
    ...
We loop until data is available and use wait_event_interruptible() (it is a macro, not a function, which is why the queue is passed by value) to wait if it isn't. If wait_event_interruptible() is interrupted, it returns a non-zero value, which we translate to -ERESTARTSYS. This code means that the system call should be restarted. The file->f_flags check accounts for files opened in non-blocking mode: if there is no data, we return -EAGAIN.
We can't use if() instead of while(), because there may be many processes waiting for the data. When the write method wakes them up, the scheduler chooses the one to run in an unpredictable way, so by the time this code gets a chance to execute, the buffer may be empty again. Now we need to copy the data from buf->data to user space. The copy_to_user() kernel function does just that:
    size = min(size, (size_t) (buf->end - buf->read_ptr));
    if (copy_to_user(out, buf->read_ptr, size)) {
        result = -EFAULT;
        goto out;
    }
The call can fail if the user-space pointer is wrong; if this happens, we return -EFAULT. Remember not to trust anything that comes from outside the kernel!
    buf->read_ptr += size;
    result = size;
out:
    return result;
}

This simple arithmetic is needed so that the data can be read in chunks of arbitrary size. The method returns the number of bytes read or an error code.
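The pointer arithmetic can be tried out in user space. The following sketch is only an illustration: buffer_read() is a hypothetical helper, struct buffer is reduced to the fields it needs, and memcpy() stands in for copy_to_user().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal user-space stand-in for the fields reverse_read() uses. */
struct buffer {
    char *data, *end, *read_ptr;
};

/*
 * Copy up to size unread bytes into out and advance read_ptr.
 * Returns the number of bytes copied (0 once everything is consumed).
 */
static size_t buffer_read(struct buffer *buf, char *out, size_t size)
{
    size_t avail;

    assert(buf->read_ptr <= buf->end);
    avail = (size_t)(buf->end - buf->read_ptr);
    if (size > avail)
        size = avail;       /* same clamp as min(size, end - read_ptr) */
    memcpy(out, buf->read_ptr, size);   /* kernel: copy_to_user() */
    buf->read_ptr += size;
    return size;
}
```

Reading an 8-byte phrase with a 3-byte request and then a 100-byte request yields 3 bytes, then the remaining 5, and after that every call returns 0.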
The write method is simpler and shorter. First, we check that the buffer has enough space, then we use the copy_from_user() function to get the data. After that, the read_ptr and end pointers are reset and the buffer contents are reversed:
    buf->end = buf->data + size;
    buf->read_ptr = buf->data;
    if (buf->end > buf->data)
        reverse_phrase(buf->data, buf->end - 1);
Here, reverse_phrase() does all the heavy lifting. It relies on the reverse_word() function, which is quite short and marked inline. This is another common optimization; however, you should not overuse it, because aggressive inlining makes the kernel image unnecessarily large.
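The module's own versions of these two functions live in the full source. As a hedged sketch of one common way to do the job (reverse every word, then reverse the whole range), a user-space version could look like this; the signatures below are assumptions made for the illustration, with both pointers inclusive as in the call above.

```c
#include <assert.h>
#include <string.h>

/* Reverse the bytes in [start, end], both ends inclusive. */
static inline void reverse_word(char *start, char *end)
{
    for (; start < end; start++, end--) {
        char tmp = *start;
        *start = *end;
        *end = tmp;
    }
}

/* Reverse the word order in [start, end] in place; words split on ' '. */
static void reverse_phrase(char *start, char *end)
{
    char *word = start, *space;

    assert(start <= end);

    /* Step 1: reverse each word on its own. */
    while ((space = memchr(word, ' ', (size_t)(end - word + 1))) != NULL) {
        if (space > word)
            reverse_word(word, space - 1);
        word = space + 1;
    }
    reverse_word(word, end);

    /* Step 2: reverse the whole range, which puts each word back in order. */
    reverse_word(start, end);
}
```

Applied to "Hello World", step 1 gives "olleH dlroW" and step 2 gives "World Hello", with no temporary storage beyond a single char.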
Finally, we need to wake up the processes waiting for data on read_queue, as described earlier. wake_up_interruptible() does just that:
    wake_up_interruptible(&buf->read_queue);
Phew! You now have a kernel module that at least compiles successfully. Now it is time to test it.
Debugging Kernel Code
Perhaps the most common kernel debugging method is printing. You can use plain printk() (presumably with the KERN_DEBUG log level) if you wish. However, there are better ways. Use pr_debug(), or dev_dbg() if you are writing a device driver that has its own "struct device": they support the dynamic debug (dyndbg) feature and can be enabled or disabled on request (see Documentation/dynamic-debug-howto.txt). For pure development messages, use pr_devel(), which becomes a no-op unless DEBUG is defined. To enable DEBUG for our module, include:
CFLAGS_reverse.o := -DDEBUG
in the Makefile. After that, use the dmesg command to view the debug messages generated by pr_debug() or pr_devel(). Alternatively, you can send debug messages directly to the console. To do this, either set the console_loglevel kernel variable to 8 or greater (echo 8 > /proc/sys/kernel/printk) or temporarily print the debug message in question at a high log level such as KERN_ERR. Naturally, you should remove debug statements of this kind before you publish your code.
Note that kernel messages appear on the console, not in a terminal emulator window such as xterm; this is why you will find recommendations against doing kernel development in an X environment.
Surprise, Surprise!
Compile the module and load it into the kernel:
$ make
$ sudo insmod reverse.ko buffer_size=2048
$ lsmod
reverse 2419 0
$ ls -l /dev/reverse
crw-rw-rw- 1 root root, 15:53 /dev/reverse
Everything seems to be in place. Now, to test how the module works, we will write a small program that reverses its first command-line argument. The main() function (without error checking) may look like this:
int fd = open("/dev/reverse", O_RDWR);
write(fd, argv[1], strlen(argv[1]));
read(fd, argv[1], strlen(argv[1]));
printf("Read: %s\n", argv[1]);
Run the program:
$ ./test 'A quick brown fox jumped over the lazy dog'
Read: dog lazy the over jumped fox brown quick A
It works! Play with it a little: try passing single-word or single-letter phrases, empty or non-English strings (if you have a suitable keyboard layout set up), and anything else.
Now let's make things a bit trickier. We will create two processes that share the file descriptor (and hence the kernel buffer). One will continuously write strings to the device, and the other will read them. The fork(2) system call is used in the example below, but pthreads would work as well. I have also omitted the code that opens and closes the device, as well as error checking (again):
char *phrase = "A quick brown fox jumped over the lazy dog";
if (fork())
    /* Parent is the writer */
    while (1)
        write(fd, phrase, len);
else
    /* Child is the reader */
    while (1) {
        read(fd, buf, len);
        printf("Read: %s\n", buf);
    }
What do you expect this program to output? Here is what I got on my laptop:
Read: dog lazy the over jumped fox brown quick A
Read: A kcicq brown fox jumped over the lazy dog
Read: A kciuq nworb xor jumped fox brown quick A
Read: A kciuq nworb xor jumped fox brown quick A
...
What is going on here? It's a race. We assumed that read and write were atomic, that is, executed in one go from beginning to end. However, the kernel is a concurrent beast, and it can easily reschedule the process that is running the kernel-mode part of the write operation somewhere inside the reverse_phrase() function. If the process that does the read() is scheduled before the writer gets a chance to finish, it will see the data in an inconsistent state. Bugs like this are really hard to debug. But how do we fix it?
Basically, we need to ensure that no read method can be executed until the write method returns. If you have ever programmed a multithreaded application, you have probably seen synchronization primitives (locks) such as mutexes and semaphores. Linux has them as well, but there are nuances. Kernel code can run in process context (working "on behalf" of user-space code, as our methods do) and in interrupt context (for example, in an IRQ handler). If you are in process context and the lock you need is already taken, you simply sleep and retry until you succeed. You can't sleep in interrupt context, so the code spins in a loop until the lock becomes available. The corresponding primitive is called a spinlock, but in our case a simple mutex, an object that only one process can "hold" at any given time, is sufficient. Real-world code could also use a read-write semaphore for performance reasons.
Locks always protect some data (in our case, a "struct buffer" instance), and it is very common to embed them in the structure they protect. So we add a mutex (struct mutex lock) to "struct buffer". We must also initialize the mutex with mutex_init(); buffer_alloc() is a good place for this. Code that uses mutexes must also include linux/mutex.h.
A mutex is much like a traffic light: it is useless unless drivers look at it and obey the signals. So we need to update reverse_read() and reverse_write() to acquire the mutex before doing anything with the buffer and to release it when they are done. Let's have a look at the read method; write works just the same way:
static ssize_t reverse_read(struct file *file, char __user *out,
        size_t size, loff_t *off)
{
    struct buffer *buf = file->private_data;
    ssize_t result;
    if (mutex_lock_interruptible(&buf->lock)) {
        result = -ERESTARTSYS;
        goto out;
    }
We acquire the lock at the very beginning of the function. mutex_lock_interruptible() either grabs the mutex and returns, or puts the process to sleep until the mutex becomes available. As before, the _interruptible suffix means that the sleep can be interrupted by a signal.
    while (buf->read_ptr == buf->end) {
        mutex_unlock(&buf->lock);
        /* ... wait_event_interruptible() here ... */
        if (mutex_lock_interruptible(&buf->lock)) {
            result = -ERESTARTSYS;
            goto out;
        }
    }
Here is our "wait for data" loop. You should never sleep waiting for data while holding the mutex: the writer would not be able to take it to store the data, and the resulting situation is called a "deadlock". So, if there is no data, we release the mutex and call wait_event_interruptible(). When it returns, we reacquire the mutex and continue as usual:
    if (copy_to_user(out, buf->read_ptr, size)) {
        result = -EFAULT;
        goto out_unlock;
    }
    ...
out_unlock:
    mutex_unlock(&buf->lock);
out:
    return result;
Finally, the mutex is unlocked when the function ends, or if an error occurs while the mutex is held. Recompile the module (don't forget to reload it) and run the second test again. You should see no corrupted data now.
What's next?
Now you have a taste of kernel hacking. We have only scratched the surface, and there is much more to explore. Our first module was intentionally simple, but the concepts you have learned stay the same in more complex scenarios. Concurrency, method tables, registering callbacks, putting processes to sleep and waking them up are things every kernel hacker should be comfortable with, and now you have seen all of them in action. Maybe your kernel code will end up in the mainline Linux source tree some day; if that happens, write to us!

