Linux Block device drivers <3>

Transferred from: http://blog.chinaunix.net/uid-15724196-id-128141.html

Chapter 3

+---------------------------------------------------+
| Write a block device driver |
+---------------------------------------------------+
| Zhao Lei |
| Email[email protected]|
+---------------------------------------------------+
| The copyright of this article belongs to the original author. |
| You are free to reprint this article, but the original copyright information must be retained. |
| For commercial use, be sure to contact the original author first. |
| Copyright disputes arising from unauthorized use are the sole responsibility of the infringer. |
+---------------------------------------------------+

In the previous chapter we discussed MM's wardrobe problem and successfully changed her into new clothes that are light as a feather and thin as gauze.
In this chapter we go one step further, that is: take them off altogether.
The goal is to suit our taste better and to get to know MM more thoroughly (readers who prefer uniforms excepted).
The price is that this chapter is somewhat more complicated.

The noop scheduler is certainly simple, simpler even than our driver; its 120 lines of code in 2.6.27 make that point well enough.
But obviously, however simple it is, as long as it exists we regard it as a burden.
We are not going to argue at length here about the benefits and difficulties of doing without an I/O scheduler, or about how it brings us in line with international practice.
After all, we are not talking about gasoline prices, and we are not PetroChina. We care more about actually doing something useful for the driver.

But the I/O scheduler is not so easy to take off, because we have in fact also been using something that comes bundled with it: the request queue.
That is why our programs in the first two chapters could be so simple.
In detail, the request queue request_queue has a member variable make_request_fn. Let's look at its definition:
struct request_queue
{
        ...
        make_request_fn *make_request_fn;
        ...
};
where make_request_fn is actually:
typedef int (make_request_fn) (struct request_queue *q, struct bio *bio);
In other words, the member is a pointer to a function.
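
The typedef above names a function type, and the struct member is a pointer to that type. The following user-space sketch shows the same pattern in isolation. The struct bodies, demo_make_request() and demo_call() are stand-ins invented for this illustration; they are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types for illustration only; the real kernel structures are far larger. */
struct bio { int id; };
struct request_queue;

/* Same shape as the kernel typedef: this names a function TYPE, not a pointer. */
typedef int (make_request_fn)(struct request_queue *q, struct bio *bio);

struct request_queue {
        make_request_fn *make_request_fn;   /* the member is a POINTER to that type */
};

/* Hypothetical handler: any function with a matching signature can be stored. */
static int demo_make_request(struct request_queue *q, struct bio *bio)
{
        (void)q;
        return bio->id;                     /* return something observable */
}

/* Hypothetical helper: install the handler, then call through the pointer. */
static int demo_call(int id)
{
        struct request_queue q = { .make_request_fn = demo_make_request };
        struct bio b = { .id = id };

        assert(q.make_request_fn != NULL);
        return q.make_request_fn(&q, &b);
}
```

Calling demo_call(7) goes through the pointer into demo_make_request(), which is the same indirection the block layer relies on.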

If the above remark is confusing to the reader, then please pull up a bench and let us begin the story.

An ordinary access to the block layer, such as a request to read a piece of data from a block device, usually prepares a bio and then calls the generic_make_request() function.
The caller is lucky, because he usually does not need to care how generic_make_request() does its job; he only needs to know that this magical function will take care of everything for him.
We are not so lucky: a block device driver designer who does not know what goes on inside generic_make_request() may well leave the driver's users feeling unsafe.

RTFSC is still the effective way to find out what happens inside generic_make_request(), but here are some hints.
In generic_make_request() we can find __generic_make_request(bio),
and in __generic_make_request() we can find the line ret = q->make_request_fn(q, bio).
Lazily skipping all the key steps of the puzzle, here is a conclusion that the author believes but the reader need not:
generic_make_request() ultimately completes the request described by the bio by calling the request_queue.make_request_fn function.

With the story told, we can now explain why that puzzling data structure was listed at the start.
For a block device driver, it is the request_queue.make_request_fn function that handles every request on the device.
In other words, once we implement request_queue.make_request_fn, the main mission of the block device driver is nearly complete.
In this chapter, what we have to do is:
1: Make request_queue.make_request_fn point to a make_request function of our own design
2: Write that make_request function

If the reader is now bravely reaching for the keyboard, the author will certainly put on airs and ask:
Has your spirit of inquiry been hauled off by the chengguan?
If the reader is baffled by this, the author will add a second question:
The first two chapters clearly did not implement any make_request function, so how did the driver work back then?
And then the author clears his throat and answers his own question.

The first two chapters did not use a make_request function of their own, but when we used blk_init_queue() to obtain the request_queue,
the all-powerful system knew we were a low-income household and handed us a relief package: the famous __make_request() function.
request_queue.make_request_fn pointed to __make_request(), so every request to the block device was routed to __make_request().

__make_request() is no pushover. It immediately called in its brother, the I/O scheduler, to help, and as a result the bio requests were processed by the I/O scheduler.
Meanwhile __make_request() did not sit idle itself: it took the salted fish that is the bio, sniffed it, licked it, put it in its mouth and chewed, spat out the bones and scales,
and then lovingly fed it mouth to mouth to the driver author through the do_request function (that is, the first parameter of blk_init_queue()).
This explains how we handled block device requests through simp_blkdev_do_request() in the first two chapters.

We admit that __make_request() means well. By chewing the bio salted fish into a request on the request_queue and feeding it to the do_request function, it gave us the following benefits:
1: request.buffer is not in high memory
This means we did not need to think about mapping high memory into the kernel's virtual address space.
2: The memory behind request.buffer is contiguous
Therefore we did not need to consider whether the memory corresponding to request.buffer is split into several segments.
These benefits seemed to come naturally, just as certain do-nothing "relevant departments" find it natural that people pay taxes to feed them,
but soon we will see a situation that is not so natural.

If the reader is an MM, she may find it very romantic for a handsome guy to chew the salted fish soft and feed it over lovingly (in which case the author hopes she will get in touch),
but for most male IT workers, unless there is a question of orientation, well ...
So now we would rather kick __make_request() aside and chew the bio salted fish ourselves.
Of course, kicking __make_request() aside also means doing without the I/O scheduler.

Kicking __make_request() aside is easy: use blk_alloc_queue() instead of blk_init_queue() to obtain the request_queue.
In other words, we change the original
simp_blkdev_queue = blk_init_queue(simp_blkdev_do_request, NULL);
to
simp_blkdev_queue = blk_alloc_queue(GFP_KERNEL);
and that is it.

As for simp_blkdev_do_request(), which only ever chewed leftovers covered in saliva, we throw it away as well:
delete the simp_blkdev_do_request() function from beginning to end.

At the same time, since we are now going bare, the thin underwear we took such pains to change into in the previous chapter is no longer needed either.
That is, the elevator_init() part added in the previous chapter is also deleted, namely the following lines:
old_e = simp_blkdev_queue->elevator;
if (IS_ERR_VALUE(elevator_init(simp_blkdev_queue, "noop")))
        printk(KERN_WARNING "Switch elevator failed, using default\n");
else
        elevator_exit(old_e);

So far we have successfully gotten rid of __make_request(), but in order to chew the bio ourselves we need to add a few things:
First, we assign our own bio-handling function to the request_queue. This is done with blk_queue_make_request(); add this line after blk_alloc_queue():
blk_queue_make_request(simp_blkdev_queue, simp_blkdev_make_request);
Then implement our own simp_blkdev_make_request() function,
and then compile.
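
The effect of swapping the handler can be sketched in user space. This is a toy model under stated assumptions, not kernel code: every model_* name is hypothetical, the structures keep only the one field the demo needs, and the "default" handler merely stands in for __make_request().

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; only what the demo needs. */
struct bio { int handled_by; };
struct request_queue;
typedef int (make_request_fn)(struct request_queue *q, struct bio *bio);
struct request_queue { make_request_fn *make_request_fn; };

enum { BY_DEFAULT_PATH = 1, BY_DRIVER = 2 };

/* Plays the role of __make_request(): the handler we got without asking. */
static int model_default_make_request(struct request_queue *q, struct bio *bio)
{
        (void)q;
        bio->handled_by = BY_DEFAULT_PATH;
        return 0;
}

/* Plays the role of simp_blkdev_make_request(): the driver handles the bio itself. */
static int model_driver_make_request(struct request_queue *q, struct bio *bio)
{
        (void)q;
        bio->handled_by = BY_DRIVER;
        return 0;
}

/* Plays the role of generic_make_request(): just call through the pointer. */
static int model_submit(struct request_queue *q, struct bio *bio)
{
        assert(q->make_request_fn != NULL);
        return q->make_request_fn(q, bio);
}

/* Build a queue either way and report who ended up handling the bio. */
static int model_who_handles(int driver_installs_own_handler)
{
        struct request_queue q = { model_default_make_request };
        struct bio b = { 0 };

        if (driver_installs_own_handler)
                q.make_request_fn = model_driver_make_request;
        model_submit(&q, &b);
        return b.handled_by;
}
```

The submitting side never changes; only the pointer stored in the queue decides whether the default path or the driver's own function sees the bio.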

In case the modifications described above leave the reader less than confident, here is the modified simp_blkdev_init() function:
static int __init simp_blkdev_init(void)
{
        int ret;

        /* allocate a bare queue: no __make_request(), no I/O scheduler */
        simp_blkdev_queue = blk_alloc_queue(GFP_KERNEL);
        if (!simp_blkdev_queue) {
                ret = -ENOMEM;
                goto err_alloc_queue;
        }
        /* route every bio on this queue to our own handler */
        blk_queue_make_request(simp_blkdev_queue, simp_blkdev_make_request);

        simp_blkdev_disk = alloc_disk(1);
        if (!simp_blkdev_disk) {
                ret = -ENOMEM;
                goto err_alloc_disk;
        }

        strcpy(simp_blkdev_disk->disk_name, SIMP_BLKDEV_DISKNAME);
        simp_blkdev_disk->major = SIMP_BLKDEV_DEVICEMAJOR;
        simp_blkdev_disk->first_minor = 0;
        simp_blkdev_disk->fops = &simp_blkdev_fops;
        simp_blkdev_disk->queue = simp_blkdev_queue;
        set_capacity(simp_blkdev_disk, SIMP_BLKDEV_BYTES>>9);
        add_disk(simp_blkdev_disk);

        return 0;

err_alloc_disk:
        blk_cleanup_queue(simp_blkdev_queue);
err_alloc_queue:
        return ret;
}
The label err_init_queue has also been renamed err_alloc_queue here; hopefully readers will have no questions about that.

As the beginning of this chapter warned, its content would be on the complex side, yet by now it looks as if we are done.
In fact our progress so far is about ... half!
The comforting part is that all that remains is our simp_blkdev_make_request() function.

First, the function prototype:
static int simp_blkdev_make_request(struct request_queue *q, struct bio *bio);
This function processes one bio request.
It takes struct request_queue *q and struct bio *bio as parameters; the information about the request is in the bio parameter,
while struct request_queue *q has not been processed by __make_request(), which means we cannot use q the way we did in the previous chapters.
So the thing to focus on here is: bio.

We are still not going to say much here about the layout of bio and bio_vec, for the same reason as before: we do not want to clash with the pile of articles that Google will turn up.
We will say just one thing:
a bio corresponds to a request for one contiguous region on the block device, and the bio_vec entries it contains describe the memory segments that correspond to that request.
So simp_blkdev_make_request() is essentially a loop that handles each bio_vec in the bio.
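
The idea of one contiguous device range served by several separate memory segments can be tried out in user space. The following is a minimal sketch, not kernel code: struct model_vec and struct model_bio are simplified stand-ins invented here, the 4096-byte "disk" is an arbitrary assumption, and there is no page mapping because ordinary pointers are always accessible in a normal process.

```c
#include <assert.h>
#include <string.h>

#define MODEL_SECTOR_SHIFT 9            /* 512-byte sectors */
#define MODEL_DISK_BYTES   4096         /* arbitrary size for the demo */

/* Simplified stand-ins: a segment is just a pointer and a length. */
struct model_vec { unsigned char *mem; unsigned int len; };
struct model_bio {
        unsigned long long sector;      /* first sector on the device */
        int is_write;                   /* 1 = write to disk, 0 = read */
        unsigned int vcnt;              /* number of segments */
        struct model_vec *vecs;
};

static unsigned char model_disk[MODEL_DISK_BYTES];

/* Walk the segments in order; returns the number of bytes moved.
 * The caller is assumed to have range-checked the request already. */
static unsigned int model_transfer(struct model_bio *bio)
{
        size_t off = (size_t)(bio->sector << MODEL_SECTOR_SHIFT);
        unsigned char *dsk_mem;
        unsigned int done = 0, i;

        assert(off <= MODEL_DISK_BYTES);
        dsk_mem = model_disk + off;

        for (i = 0; i < bio->vcnt; i++) {
                struct model_vec *v = &bio->vecs[i];

                if (bio->is_write)
                        memcpy(dsk_mem, v->mem, v->len);
                else
                        memcpy(v->mem, dsk_mem, v->len);
                dsk_mem += v->len;      /* next segment continues on the device */
                done += v->len;
        }
        return done;
}

/* Write two separate buffers at sector 2, read them back into two others. */
static int model_round_trip(void)
{
        unsigned char a[512], b[512], ra[512], rb[512];
        struct model_vec wv[2] = { { a, sizeof a }, { b, sizeof b } };
        struct model_vec rv[2] = { { ra, sizeof ra }, { rb, sizeof rb } };
        struct model_bio w = { 2, 1, 2, wv };
        struct model_bio r = { 2, 0, 2, rv };

        memset(a, 0xAA, sizeof a);
        memset(b, 0xBB, sizeof b);
        memset(ra, 0, sizeof ra);
        memset(rb, 0, sizeof rb);

        if (model_transfer(&w) != 1024 || model_transfer(&r) != 1024)
                return 0;
        return memcmp(a, ra, 512) == 0 && memcmp(b, rb, 512) == 0;
}
```

The two source buffers are unrelated in memory, yet they land back to back on the model disk, starting at byte offset 2 << 9 = 1024.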

In the driver, this magical loop looks like this:
dsk_mem = simp_blkdev_data + (bio->bi_sector << 9);

bio_for_each_segment(bvec, bio, i) {
        void *iovec_mem;

        switch (bio_rw(bio)) {
        case READ:
        case READA:
                iovec_mem = kmap(bvec->bv_page) + bvec->bv_offset;
                memcpy(iovec_mem, dsk_mem, bvec->bv_len);
                kunmap(bvec->bv_page);
                break;
        case WRITE:
                iovec_mem = kmap(bvec->bv_page) + bvec->bv_offset;
                memcpy(dsk_mem, iovec_mem, bvec->bv_len);
                kunmap(bvec->bv_page);
                break;
        default:
                printk(KERN_ERR SIMP_BLKDEV_DISKNAME
                        ": unknown value of bio_rw: %lu\n",
                        bio_rw(bio));
#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 24)
                bio_endio(bio, 0, -EIO);
#else
                bio_endio(bio, -EIO);
#endif
                return 0;
        }
        dsk_mem += bvec->bv_len;
}
The starting sector of the request and its length in bytes are stored in bio.bi_sector and bio.bi_size.
We first use bio.bi_sector to find where this request starts in the memory backing our block device, and store that address in dsk_mem.
Then we walk through each bio_vec in the bio, using the system-provided bio_for_each_segment macro.

The code inside the loop looks somewhat familiar: it simply handles the request according to its type. READA means read-ahead; well-designed read-ahead requests can improve I/O efficiency,
a bit like prefetch() for memory. We will not go into detail here, because that could fill a whole article; for our memory-based block device driver,
it is enough to handle it exactly like a read request.

Before and after the familiar memcpy we find two new faces, kmap and kunmap.
This also proves that salted fish is harder to chew than mushy meat.
The memory in a bio_vec is described by a page *, which means the page may be in high memory and cannot be accessed directly.
In that case the usual approach is to map it with kmap into the non-linearly mapped region and access it there. Of course, remember to give the mapping back when you are done,
and do not hog it just because you have plenty of memory; in fact, on i386 the more memory there is, the scarcer the non-linearly mapped region becomes.
For the details of high memory, please Google as well. In any case, my impression is that Intel keeps finding hardware limitations to hand programmers as extra work.
Fortunately, the limits of the 64-bit machines now becoming popular should not be so easy to hit, or at least I think so.

The default branch of the switch handles every other case. Our handling is simple: print an error message, then call bio_endio() to tell the upper layer that this bio failed.
But this wicked bio_endio() function changed in 2.6.24. If our driver were part of the kernel tree, we would simply update the bio_endio() call in step with it.
That is obviously not the case, and we want this driver to work with kernels both before and after 2.6.24, so we use conditional compilation to compare kernel versions.
Also, because we use the LINUX_VERSION_CODE and KERNEL_VERSION macros, we need to add #include <linux/version.h>.
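
The version comparison itself is plain integer arithmetic, which a small user-space sketch can show. MODEL_KERNEL_VERSION below copies the shift-and-add form that, to my knowledge, <linux/version.h> used in kernels of this period; the MODEL_ prefix and the helper model_bio_endio_argc() are inventions for this example, and a real driver must use the kernel's own macros at compile time.

```c
#include <assert.h>

/* Assumed to match the classic KERNEL_VERSION(a, b, c) definition. */
#define MODEL_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Hypothetical helper: how many arguments bio_endio() takes for a given
 * version code, following the rule described in the text
 * (three before 2.6.24, two from 2.6.24 on). */
static int model_bio_endio_argc(long version_code)
{
        assert(version_code >= 0);
        return version_code < MODEL_KERNEL_VERSION(2, 6, 24) ? 3 : 2;
}
```

Because each component occupies its own byte, comparing two version codes as integers orders them the same way as comparing the dotted versions, as long as the minor and patch numbers stay below 256.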

At the end of each iteration, the number of bytes handled in that iteration is added to dsk_mem, so that dsk_mem points to the block device data corresponding to the next bio_vec.

Readers may be starting to chafe at the fact that this chapter is still not over. Yes, it will be over soon, but we still have to add a few small pieces around the loop:
1: The variable declarations before the loop:
struct bio_vec *bvec;
int i;
void *dsk_mem;
2: Before the loop, check whether the request goes beyond the end of the block device:
if ((bio->bi_sector << 9) + bio->bi_size > SIMP_BLKDEV_BYTES) {
        printk(KERN_ERR SIMP_BLKDEV_DISKNAME
                ": bad request: block=%llu, count=%u\n",
                (unsigned long long)bio->bi_sector, bio->bi_size);
#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 24)
        bio_endio(bio, 0, -EIO);
#else
        bio_endio(bio, -EIO);
#endif
        return 0;
}
3: After the loop, end this bio and return success:
#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 24)
bio_endio(bio, bio->bi_size, 0);
#else
bio_endio(bio, 0);
#endif
return 0;
bio_endio() reports the result of processing the bio. In kernels from 2.6.24 on, the first parameter is the bio being processed and the second is 0 on success or -errno on failure.
In kernels before 2.6.24 there is also an unsigned int bytes_done in the middle, giving the number of bytes that have been completed.
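
The range check in step 2 can be studied on its own in user space. The sketch below is one possible way to write it so that the arithmetic cannot wrap around; the function name, the fixed-width types and the 16 MiB device size are assumptions made for this example, not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_DISK_BYTES (16ULL * 1024 * 1024)  /* assumed device size */

/* Returns 1 if [sector * 512, sector * 512 + size) lies inside the device,
 * 0 otherwise. disk_bytes is assumed to be a whole number of sectors. */
static int model_request_in_range(uint64_t sector, uint32_t size,
                                  uint64_t disk_bytes)
{
        uint64_t start;

        assert((disk_bytes & 511) == 0);

        /* Compare in sectors first, so the shift below cannot overflow. */
        if (sector > (disk_bytes >> 9))
                return 0;
        start = sector << 9;

        /* start <= disk_bytes here, so the subtraction cannot wrap. */
        return size <= disk_bytes - start;
}
```

With a huge sector number such as 1ULL << 55, the direct form (sector << 9) + size would wrap to a small value in 64-bit arithmetic and pass the comparison, whereas this version rejects it. Whether such a value can ever reach a driver is a separate question that this sketch does not try to answer.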

Now we can breathe a long sigh of relief: we are finished.
As usual, here is the complete code of simp_blkdev_make_request():
static int simp_blkdev_make_request(struct request_queue *q, struct bio *bio)
{
        struct bio_vec *bvec;
        int i;
        void *dsk_mem;

        /* reject requests that run past the end of the device */
        if ((bio->bi_sector << 9) + bio->bi_size > SIMP_BLKDEV_BYTES) {
                printk(KERN_ERR SIMP_BLKDEV_DISKNAME
                        ": bad request: block=%llu, count=%u\n",
                        (unsigned long long)bio->bi_sector, bio->bi_size);
#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 24)
                bio_endio(bio, 0, -EIO);
#else
                bio_endio(bio, -EIO);
#endif
                return 0;
        }

        /* address of the first requested byte in the backing memory */
        dsk_mem = simp_blkdev_data + (bio->bi_sector << 9);

        /* copy one segment at a time */
        bio_for_each_segment(bvec, bio, i) {
                void *iovec_mem;

                switch (bio_rw(bio)) {
                case READ:
                case READA:
                        iovec_mem = kmap(bvec->bv_page) + bvec->bv_offset;
                        memcpy(iovec_mem, dsk_mem, bvec->bv_len);
                        kunmap(bvec->bv_page);
                        break;
                case WRITE:
                        iovec_mem = kmap(bvec->bv_page) + bvec->bv_offset;
                        memcpy(dsk_mem, iovec_mem, bvec->bv_len);
                        kunmap(bvec->bv_page);
                        break;
                default:
                        printk(KERN_ERR SIMP_BLKDEV_DISKNAME
                                ": unknown value of bio_rw: %lu\n",
                                bio_rw(bio));
#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 24)
                        bio_endio(bio, 0, -EIO);
#else
                        bio_endio(bio, -EIO);
#endif
                        return 0;
                }
                dsk_mem += bvec->bv_len;
        }

        /* report success for the whole bio */
#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 24)
        bio_endio(bio, bio->bi_size, 0);
#else
        bio_endio(bio, 0);
#endif

        return 0;
}

Readers can replace the previous chapter's simp_blkdev_do_request() function directly with the simp_blkdev_make_request() function from this chapter,
then replace the previous chapter's simp_blkdev_init() with the one from this chapter, and add #include <linux/version.h> at the top of the file;
that gives the final code for this chapter.

Before concluding this chapter, let's try it out.
First compile and load:
# make
make -C /lib/modules/2.6.18-53.el5/build SUBDIRS=/root/test/simp_blkdev/simp_blkdev_step3 modules
make[1]: Entering directory '/usr/src/kernels/2.6.18-53.el5-i686'
  CC [M]  /root/test/simp_blkdev/simp_blkdev_step3/simp_blkdev.o
  Building modules, stage 2.
  MODPOST
  CC      /root/test/simp_blkdev/simp_blkdev_step3/simp_blkdev.mod.o
  LD [M]  /root/test/simp_blkdev/simp_blkdev_step3/simp_blkdev.ko
make[1]: Leaving directory '/usr/src/kernels/2.6.18-53.el5-i686'
# insmod simp_blkdev.ko
#
Then, using the method from the previous chapter, look at this device's information in sysfs:
# ls /sys/block/simp_blkdev
dev  holders  range  removable  size  slaves  stat  subsystem  uevent
#
We find that the queue subdirectory is gone from our driver's sysfs directory.
That is no surprise; it would be spooky if it were still there.

In this chapter we implemented our own make_request function to handle bios, thereby getting rid of both the I/O scheduler and the generic __make_request() processing of bios.
Since the data of our block device lives in memory, involves no DMA and needs no seeking, this should be the most suitable way to handle a block device of this kind.
Most similar drivers in Linux handle requests the way this chapter does, but for most block device drivers backed by physical disks, a suitable I/O scheduler can improve performance.
In addition, the bounce mechanism contained in __make_request() is a real help for block device drivers that need to do DMA.
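
Why request ordering matters for a disk that has to seek, and not for memory, can be seen with a toy calculation. This is only an illustrative model with made-up names and numbers: it measures head travel as the distance between sector numbers and sorts with qsort(); it is not how any real I/O scheduler is implemented.

```c
#include <assert.h>
#include <stdlib.h>

/* Total distance a model disk head travels when serving the requests
 * in the given order, starting from sector `start`. */
static unsigned long model_head_travel(const unsigned long *sectors, size_t n,
                                       unsigned long start)
{
        unsigned long pos = start, total = 0;
        size_t i;

        assert(sectors != NULL || n == 0);
        for (i = 0; i < n; i++) {
                total += sectors[i] > pos ? sectors[i] - pos : pos - sectors[i];
                pos = sectors[i];
        }
        return total;
}

/* Ascending comparison of sector numbers for qsort(). */
static int model_cmp(const void *a, const void *b)
{
        unsigned long x = *(const unsigned long *)a;
        unsigned long y = *(const unsigned long *)b;

        return (x > y) - (x < y);
}

/* Crude stand-in for reordering: serve the requests in ascending order. */
static void model_sort(unsigned long *sectors, size_t n)
{
        qsort(sectors, n, sizeof *sectors, model_cmp);
}
```

For the arrival order 900, 10, 800, 20, 700 with the head at sector 0, the model head travels 4040 sectors in arrival order and 900 after sorting. A memory-backed device like ours pays no such travel cost, which is why skipping the scheduler costs it nothing.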

Quantitative change leads to qualitative change, but a qualitative change is usually more complicated than a quantitative one.
By the same token, getting MM out of her dress altogether is far harder than the previous chapter's task of getting her to change into a thinner one.
Still, with some coaxing and cajoling we finally got there, and paid for it in sweat:
this chapter is considerably more complex than the previous one.

If this chapter has unfortunately made the reader's head swell, here is some good news by way of compensation:
by convention, the next chapter or two will offer some lighter material so that the reader can get a proper rest.

<To be continued>
