Write a block device driver 11,12

http://blogold.chinaunix.net/u3/108239/showart.php?id=2144635

Chapter 11

+---------------------------------------------------+
| Write a block device driver |
+---------------------------------------------------+
| Zhao Lei |
| Email: [Email protected] |
+---------------------------------------------------+
| The copyright of this article belongs to the original author. |
| You are free to reprint it, but the original copyright information must be retained. |
| For commercial use, be sure to obtain the original author's authorization first; |
| the infringer alone is responsible for any copyright disputes arising from unauthorized use. |
+---------------------------------------------------+

In this chapter we continue preparing the block device driver to use high memory.
The preparation here does not add or change any functionality;
it only cleans up some code that has started to look a little complicated.

Readers with programming experience probably know that much of programming is not typing in new code, but cutting and pasting existing code.
This is because while programming we keep discovering design problems, or realize that a better structure is possible, and then of course we restructure.
Ideally, we would settle on the best structure at the very start and avoid later changes,
but reality often runs counter to the ideal. The key is to correct problems promptly once we find them, instead of covering them up the way some government departments do.
Wine grows more fragrant with age, but garbage only grows smellier; if we cannot get the design perfect at first, we should at least have the courage to correct it.

By now the reader may have guessed that we are going to modify simp_blkdev_make_request(), because it has grown rather large:
every time we modified it in the previous chapters, we had to list a big chunk of code to show the result.
That is not the main reason, however. Rather than shortening the function for its own sake, we care more about splitting it up to improve the readability of the code.

In fact, splitting simp_blkdev_make_request() also prepares for the high memory support to come:
accessing high memory inevitably involves page mapping, and page mapping is handled in this function,
so we want to move that part into a separate function instead of repeatedly modifying one big function.
It is also partly the author's personal preference: even for a one-character change in a function, the author checks the whole function from beginning to end
to make sure the change has no other effects, which explains why the author prefers small, simple functions.
Of course, this preference is not necessarily a good thing; for example, when choosing an LCD TV a couple of days ago, the author leaned towards a plain display plus a set-top box...

Readers who have followed along this far should know simp_blkdev_make_request() by heart,
so we list the modified code directly:
static int simp_blkdev_trans_oneseg(struct page *start_page,
		unsigned long offset, void *buf, unsigned int len, int dir)
{
	void *dsk_mem;

	/* Low memory pages are permanently mapped, so page_address() suffices */
	dsk_mem = page_address(start_page);
	if (!dsk_mem) {
		printk(KERN_ERR SIMP_BLKDEV_DISKNAME
			": get page's address failed: %p\n", start_page);
		return -ENOMEM;
	}
	dsk_mem += offset;

	if (!dir)
		memcpy(buf, dsk_mem, len);	/* read: device -> buffer */
	else
		memcpy(dsk_mem, buf, len);	/* write: buffer -> device */

	return 0;
}

static int simp_blkdev_trans(unsigned long long dsk_offset, void *buf,
		unsigned int len, int dir)
{
	unsigned int done_cnt;
	struct page *this_first_page;
	unsigned int this_off;
	unsigned int this_cnt;

	done_cnt = 0;
	while (done_cnt < len) {
		/* Iterate each data segment */
		this_off = (dsk_offset + done_cnt) & ~SIMP_BLKDEV_DATASEGMASK;
		this_cnt = min(len - done_cnt,
			(unsigned int)SIMP_BLKDEV_DATASEGSIZE - this_off);

		this_first_page = radix_tree_lookup(&simp_blkdev_data,
			(dsk_offset + done_cnt) >> SIMP_BLKDEV_DATASEGSHIFT);
		if (!this_first_page) {
			printk(KERN_ERR SIMP_BLKDEV_DISKNAME
				": search memory failed: %llu\n",
				(dsk_offset + done_cnt)
				>> SIMP_BLKDEV_DATASEGSHIFT);
			return -ENOENT;
		}

		if (IS_ERR_VALUE(simp_blkdev_trans_oneseg(this_first_page,
			this_off, buf + done_cnt, this_cnt, dir)))
			return -EIO;

		done_cnt += this_cnt;
	}

	return 0;
}

static int simp_blkdev_make_request(struct request_queue *q, struct bio *bio)
{
	int dir;
	unsigned long long dsk_offset;
	struct bio_vec *bvec;
	int i;
	void *iovec_mem;

	switch (bio_rw(bio)) {
	case READ:
	case READA:
		dir = 0;
		break;
	case WRITE:
		dir = 1;
		break;
	default:
		printk(KERN_ERR SIMP_BLKDEV_DISKNAME
			": unknown value of bio_rw: %lu\n", bio_rw(bio));
		goto bio_err;
	}

	/* Reject requests that run past the end of the device */
	if ((bio->bi_sector << SIMP_BLKDEV_SECTORSHIFT) + bio->bi_size
		> simp_blkdev_bytes) {
		printk(KERN_ERR SIMP_BLKDEV_DISKNAME
			": bad request: block=%llu, count=%u\n",
			(unsigned long long)bio->bi_sector, bio->bi_size);
		goto bio_err;
	}

	dsk_offset = bio->bi_sector << SIMP_BLKDEV_SECTORSHIFT;

	bio_for_each_segment(bvec, bio, i) {
		/* Map the bvec's page, which may itself be in high memory */
		iovec_mem = kmap(bvec->bv_page);
		if (!iovec_mem) {
			printk(KERN_ERR SIMP_BLKDEV_DISKNAME
				": map iovec page failed: %p\n", bvec->bv_page);
			goto bio_err;
		}

		/* Copy between the device and this segment of the bio */
		if (IS_ERR_VALUE(simp_blkdev_trans(dsk_offset,
			iovec_mem + bvec->bv_offset, bvec->bv_len, dir))) {
			kunmap(bvec->bv_page);	/* don't leak the mapping */
			goto bio_err;
		}

		kunmap(bvec->bv_page);

		dsk_offset += bvec->bv_len;
	}

#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 24)
	bio_endio(bio, bio->bi_size, 0);
#else
	bio_endio(bio, 0);
#endif

	return 0;

bio_err:
#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 24)
	bio_endio(bio, 0, -EIO);
#else
	bio_endio(bio, -EIO);
#endif
	return 0;
}

Functionally, the code is no different from the original.
We have merely extracted simp_blkdev_trans(), which transfers data between the block device and a contiguous memory buffer,
and simp_blkdev_trans_oneseg(), which does the same job but only for data that stays within a single block-device data block.

With this split, the structure of the program becomes clearer:
simp_blkdev_make_request() determines the direction of the transfer, checks that the bio request is valid, walks each bvec in the bio, and maps the memory page of each bvec,
then hands the rest of the work to simp_blkdev_trans().
simp_blkdev_trans() splits the request so that data spanning several block-device data blocks is handled correctly, and along the way looks up the first page of each data block,
then calls on simp_blkdev_trans_oneseg() to finish the job.
simp_blkdev_trans_oneseg() is the lucky one: most of the groundwork has already been done, and like a leader at a tree-planting ceremony who only adds the last shovel of soil,
it gets the warm applause. In fact, simp_blkdev_trans_oneseg() just obtains the memory address corresponding to the page pointer and copies the specified length of data in the given direction.
It need not care whether the data length exceeds the bounds of a block-device data block, just as the leader does not care how the tree got there.
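The segment-splitting step of simp_blkdev_trans() is easy to check in isolation. The sketch below is a user-space model, not driver code: the radix tree lookup and the copy are left out, and split_by_segment(), struct seg_chunk and the MODEL_* constants are hypothetical names, assuming 4 KiB pages and a segment order of 2 (16 KiB data blocks). It only records, for each piece of a request, which data block it falls into, where it starts inside that block and how long it is.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space mirror of the driver's geometry,
 * assuming 4 KiB pages and SIMP_BLKDEV_DATASEGORDER == 2. */
#define MODEL_PAGE_SHIFT	12
#define MODEL_SEG_ORDER		2
#define MODEL_SEG_SHIFT		(MODEL_PAGE_SHIFT + MODEL_SEG_ORDER)
#define MODEL_SEG_SIZE		(1ULL << MODEL_SEG_SHIFT)

/* One piece of a request that stays inside a single data block. */
struct seg_chunk {
	unsigned long long seg_index;	/* radix tree key in the driver */
	unsigned int off;		/* offset inside the data block */
	unsigned int cnt;		/* bytes in this piece */
};

/* Split [dsk_offset, dsk_offset + len) the way simp_blkdev_trans() does:
 * each piece ends at a data block boundary or at the end of the request.
 * Returns the number of pieces written to out. */
static size_t split_by_segment(unsigned long long dsk_offset, unsigned int len,
			       struct seg_chunk *out, size_t max_chunks)
{
	unsigned int done_cnt = 0;
	size_t n = 0;

	while (done_cnt < len) {
		unsigned long long pos = dsk_offset + done_cnt;
		/* same value the driver gets from & ~SIMP_BLKDEV_DATASEGMASK */
		unsigned int this_off = (unsigned int)(pos & (MODEL_SEG_SIZE - 1));
		unsigned int room = (unsigned int)MODEL_SEG_SIZE - this_off;
		unsigned int this_cnt = len - done_cnt < room ? len - done_cnt : room;

		assert(n < max_chunks);
		out[n].seg_index = pos >> MODEL_SEG_SHIFT;
		out[n].off = this_off;
		out[n].cnt = this_cnt;
		n++;

		done_cnt += this_cnt;
	}
	return n;
}
```

For example, a 1024-byte request starting 512 bytes before the end of block 0 comes out as two pieces: 512 bytes at offset 15872 of block 0, then 512 bytes at offset 0 of block 1.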

We won't run experiments in this chapter either, because there is not much to test.
As for whether the code compiles, the author has tried it; interested readers are welcome to check whether that claim holds.

As a prelude to supporting high memory, the previous chapter and this one have made some rather puzzling changes.
But the preparations are now complete, and our program has a solid foundation for supporting high memory.
In the next chapter we get to the point and implement this long-awaited feature.

<To be continued>


Chapter 12

In this chapter we will implement support for high memory.

When you are dating a girl, chatting, shopping, hiking, watching movies and playing chess each seem to have little to do with marriage,
but as the days and years go by, she may subconsciously come to see you as part of her life,
and in the end the outcome feels so natural that even the proposal seems somewhat superfluous.

Learning is much the same: we study each piece of knowledge seriously and try to work out every answer for ourselves.
That alone may not make us experts, but no expert got there without a long period of serious study;
expertise is the result of hard study and careful thinking.

Our program is no different: after the preparatory work of the previous chapters, we are probably not far from the goal,
and all that remains is to implement it.

First, modify alloc_diskmem(): add the __GFP_HIGHMEM flag to the gfp_mask passed to alloc_pages(), the function that allocates the memory.
This makes the allocator prefer high memory when allocating memory blocks for the block device.
The modified function is as follows:
int alloc_diskmem(void)
{
	int ret;
	int i;
	struct page *page;

	INIT_RADIX_TREE(&simp_blkdev_data, GFP_KERNEL);

	for (i = 0; i < (simp_blkdev_bytes + SIMP_BLKDEV_DATASEGSIZE - 1)
		>> SIMP_BLKDEV_DATASEGSHIFT; i++) {
		/* __GFP_HIGHMEM allows the allocator to hand out high memory */
		page = alloc_pages(GFP_KERNEL | __GFP_ZERO | __GFP_HIGHMEM,
			SIMP_BLKDEV_DATASEGORDER);
		if (!page) {
			ret = -ENOMEM;
			goto err_alloc;
		}

		ret = radix_tree_insert(&simp_blkdev_data, i, page);
		if (IS_ERR_VALUE(ret))
			goto err_radix_tree_insert;
	}
	return 0;

err_radix_tree_insert:
	__free_pages(page, SIMP_BLKDEV_DATASEGORDER);
err_alloc:
	free_diskmem();
	return ret;
}
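The loop bound in alloc_diskmem() is a round-up division: a partially used last data block still costs a whole alloc_pages() call. Here is a small user-space check of that arithmetic, assuming 4 KiB pages and SIMP_BLKDEV_DATASEGORDER == 2 (its value is not shown in this chapter); seg_count() and the MODEL_* names are hypothetical.

```c
#include <assert.h>

/* Hypothetical mirror of the driver's data block geometry,
 * assuming 4 KiB pages and SIMP_BLKDEV_DATASEGORDER == 2. */
#define MODEL_PAGE_SHIFT	12
#define MODEL_SEG_ORDER		2
#define MODEL_SEG_SHIFT		(MODEL_PAGE_SHIFT + MODEL_SEG_ORDER)
#define MODEL_SEG_SIZE		(1ULL << MODEL_SEG_SHIFT)

/* Number of alloc_pages() calls made for a disk of `bytes` bytes:
 * round up, so a partial last data block still gets a full allocation. */
static unsigned long long seg_count(unsigned long long bytes)
{
	/* guard the "+ MODEL_SEG_SIZE - 1" against wrap-around */
	assert(bytes <= ~0ULL - (MODEL_SEG_SIZE - 1));
	return (bytes + MODEL_SEG_SIZE - 1) >> MODEL_SEG_SHIFT;
}
```

Under those assumptions, size=500M needs 32000 allocations of four pages each, and size=1300M needs 83200.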

But we are not done yet: having obtained high memory, we also have to be able to use it.
It's like bringing home a hot-tempered girlfriend: that is only the beginning, and the harder part is keeping her from slamming the door and storming out half an hour later.
So we continue with the code that uses the memory, namely simp_blkdev_trans_oneseg().

Previously this function was very simple: the memory came from low memory, which is guaranteed to be permanently mapped in the kernel's address space,
so converting a page pointer to a memory pointer took just one call to page_address().
For high memory, it's not that simple.

First, a high memory page must be mapped into the kernel's address space (outside the linear mapping) before it can be accessed, and the mapping should be released afterwards, so that people don't curse our program the way they curse a public servant who leaves unpaid IOUs behind.
The kmap() and kunmap() functions solve this problem.

Next, we must consider another boundary problem: page boundaries.
Because kmap() maps only one physical page at a time, when the data to be accessed crosses a page boundary within a block-device memory block,
we need to detect this and handle it by calling kmap() and kunmap() repeatedly, accessing each page in turn.
We can handle this in much the same way as the previous chapter handled accesses that span multiple block-device memory blocks.
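The page-boundary handling can be modelled the same way. The sketch below is again user-space only, with a hypothetical split_by_page() and struct page_chunk and an assumed 4 KiB page size; it computes the pieces a kmap()-per-page loop would copy, each piece corresponding to one kmap()/kunmap() pair.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page geometry for this sketch: 4 KiB pages. */
#define MODEL_PAGE_SHIFT	12
#define MODEL_PAGE_SIZE		(1UL << MODEL_PAGE_SHIFT)

/* One kmap()/kunmap() round: which page of the data block, where in it,
 * and how many bytes to copy while it is mapped. */
struct page_chunk {
	unsigned long page_index;	/* added to start_page in the driver */
	unsigned int off;		/* offset within that page */
	unsigned int cnt;		/* bytes copied from/to that page */
};

/* Split an access of `len` bytes at byte `offset` of a data block into
 * per-page pieces. Returns the number of pieces written to out. */
static size_t split_by_page(unsigned long offset, unsigned int len,
			    struct page_chunk *out, size_t max_chunks)
{
	unsigned int done_cnt = 0;
	size_t n = 0;

	while (done_cnt < len) {
		unsigned long pos = offset + done_cnt;
		unsigned int this_off = (unsigned int)(pos & (MODEL_PAGE_SIZE - 1));
		unsigned int room = (unsigned int)MODEL_PAGE_SIZE - this_off;
		unsigned int this_cnt = len - done_cnt < room ? len - done_cnt : room;

		assert(n < max_chunks);
		out[n].page_index = pos >> MODEL_PAGE_SHIFT;
		out[n].off = this_off;
		out[n].cnt = this_cnt;
		n++;

		done_cnt += this_cnt;
	}
	return n;
}
```

A 1024-byte access starting at byte 3584 of a data block, for instance, needs two mappings: 512 bytes at the end of page 0 and 512 bytes at the start of page 1.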

Alternatively, we could use the vmap() function,
which maps multiple physical pages with scattered addresses into one contiguous virtual address range;
it works just as well for the physically contiguous pages we use as block-device storage.
The problem is that vmap() does considerably more work internally, which means it consumes more CPU time.
Moreover, with vmap() we would have to map all the pages of the memory block at once,
while a typical access touches only some of them, which is another performance loss.
Therefore, we chose kmap() and let the program handle cross-page accesses itself.
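The second argument against vmap() can be put in numbers. This is a rough model that only counts mapped pages and ignores the per-call CPU cost; pages_mapped_with_kmap() and pages_mapped_with_vmap() are hypothetical helpers, and the geometry (4 KiB pages, 4 pages per data block) is assumed.

```c
#include <assert.h>

/* Assumed geometry: 4 KiB pages, 4 pages per data block
 * (SIMP_BLKDEV_DATASEGORDER == 2). */
#define MODEL_PAGE_SHIFT	12
#define MODEL_SEG_ORDER		2

/* Pages a kmap()-per-page loop maps for an access of len > 0 bytes
 * starting at byte `offset` of a data block: only the pages it touches. */
static unsigned long pages_mapped_with_kmap(unsigned long offset,
					    unsigned int len)
{
	unsigned long first, last;

	assert(len > 0);
	first = offset >> MODEL_PAGE_SHIFT;
	last = (offset + len - 1) >> MODEL_PAGE_SHIFT;
	return last - first + 1;
}

/* Pages one vmap() of the whole data block would map,
 * no matter how much of it the request actually uses. */
static unsigned long pages_mapped_with_vmap(void)
{
	return 1UL << MODEL_SEG_ORDER;
}
```

With that geometry, a single 512-byte sector maps one page with kmap() but would map all four pages of the data block with vmap().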

Following this idea, we wrote a new simp_blkdev_trans_oneseg():
static int simp_blkdev_trans_oneseg(struct page *start_page,
		unsigned long offset, void *buf, unsigned int len, int dir)
{
	unsigned int done_cnt;
	struct page *this_page;
	unsigned int this_off;
	unsigned int this_cnt;
	void *dsk_mem;

	done_cnt = 0;
	while (done_cnt < len) {
		/* Iterate each page */
		this_page = start_page + ((offset + done_cnt) >> PAGE_SHIFT);
		this_off = (offset + done_cnt) & ~PAGE_MASK;
		this_cnt = min(len - done_cnt, (unsigned int)PAGE_SIZE
			- this_off);

		/* Map this (possibly high memory) page into kernel space */
		dsk_mem = kmap(this_page);
		if (!dsk_mem) {
			printk(KERN_ERR SIMP_BLKDEV_DISKNAME
				": map device page failed: %p\n", this_page);
			return -ENOMEM;
		}
		dsk_mem += this_off;

		if (!dir)
			memcpy(buf + done_cnt, dsk_mem, this_cnt);
		else
			memcpy(dsk_mem, buf + done_cnt, this_cnt);

		/* Release the mapping as soon as the copy is done */
		kunmap(this_page);

		done_cnt += this_cnt;
	}

	return 0;
}

The key is to map each memory page into kernel space with kmap() before accessing it,
which is what makes operating on high memory possible.
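To see the two levels of splitting work together, here is a self-contained user-space simulation of the read/write path. It is only a model: each "page" is a separate calloc() buffer so that crossing a page really requires a second lookup, model_map() stands in for kmap(), there is no radix tree (data block n simply owns pages 4n to 4n+3), error handling is reduced to assert(), and all model_* names plus the 4 KiB / order-2 geometry are assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed geometry: 4 KiB pages, 4 pages (16 KiB) per data block, 4 blocks. */
#define M_PAGE_SHIFT	12
#define M_PAGE_SIZE	(1UL << M_PAGE_SHIFT)
#define M_SEG_ORDER	2
#define M_SEG_SHIFT	(M_PAGE_SHIFT + M_SEG_ORDER)
#define M_SEG_SIZE	(1UL << M_SEG_SHIFT)
#define M_NR_SEGS	4
#define M_NR_PAGES	(M_NR_SEGS << M_SEG_ORDER)

/* Each page is a separate allocation, so pages are not contiguous. */
static unsigned char *model_pages[M_NR_PAGES];

/* Stand-in for kmap(): return the memory behind page number page_nr. */
static unsigned char *model_map(unsigned long page_nr)
{
	return model_pages[page_nr];
}

/* Model of the new simp_blkdev_trans_oneseg(): copy inside one data block,
 * one page at a time. dir == 0 reads from the disk, dir != 0 writes to it. */
static void model_trans_oneseg(unsigned long first_page, unsigned long offset,
			       unsigned char *buf, unsigned int len, int dir)
{
	unsigned int done_cnt = 0;

	while (done_cnt < len) {
		unsigned long page_nr = first_page +
			((offset + done_cnt) >> M_PAGE_SHIFT);
		unsigned int this_off = (offset + done_cnt) & (M_PAGE_SIZE - 1);
		unsigned int this_cnt = len - done_cnt;
		unsigned char *mem;

		if (this_cnt > M_PAGE_SIZE - this_off)
			this_cnt = M_PAGE_SIZE - this_off;

		mem = model_map(page_nr) + this_off;
		if (!dir)
			memcpy(buf + done_cnt, mem, this_cnt);
		else
			memcpy(mem, buf + done_cnt, this_cnt);
		/* the driver calls kunmap() at this point */

		done_cnt += this_cnt;
	}
}

/* Model of simp_blkdev_trans(): split the request at data block boundaries. */
static void model_trans(unsigned long long dsk_offset, unsigned char *buf,
			unsigned int len, int dir)
{
	unsigned int done_cnt = 0;

	assert(dsk_offset + len <= (unsigned long long)M_NR_SEGS * M_SEG_SIZE);
	while (done_cnt < len) {
		unsigned long long pos = dsk_offset + done_cnt;
		unsigned int this_off = (unsigned int)(pos & (M_SEG_SIZE - 1));
		unsigned int this_cnt = len - done_cnt;

		if (this_cnt > M_SEG_SIZE - this_off)
			this_cnt = M_SEG_SIZE - this_off;

		/* data block n owns pages 4n .. 4n + 3 in this model */
		model_trans_oneseg((unsigned long)(pos >> M_SEG_SHIFT) << M_SEG_ORDER,
				   this_off, buf + done_cnt, this_cnt, dir);
		done_cnt += this_cnt;
	}
}

/* Allocate zero-filled pages (like __GFP_ZERO); returns 0 on success. */
static int model_init(void)
{
	size_t i;

	for (i = 0; i < M_NR_PAGES; i++) {
		model_pages[i] = calloc(1, M_PAGE_SIZE);
		if (!model_pages[i])
			return -1;
	}
	return 0;
}

/* Free every page allocated by model_init(). */
static void model_exit(void)
{
	size_t i;

	for (i = 0; i < M_NR_PAGES; i++) {
		free(model_pages[i]);
		model_pages[i] = NULL;
	}
}
```

Writing a buffer that straddles the end of block 0 and reading it back should return the same bytes, with the first byte of block 1 landing at the start of page 4.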

At this point, the problem we have been working on for several chapters is finally solved.
This change brings at least two benefits:
1: Avoiding competition for precious low memory
For a large memory consumer, hogging low memory is unacceptable,
for the reasons discussed in the previous chapters.
From now on, at least, our program won't be looked down on in this respect.
2: Increasing the maximum capacity of the block device
With the old program, on i386 there was no way to create a block device larger than 896M,
and in practice the limit was even lower, because the block device cannot take all of low memory for its data.
Now the program can use all free memory, including high memory,
which greatly increases the maximum capacity of the block device.

We can finally run the experiments we skipped in the previous chapters.

First, let's confirm that the program still compiles after so many chapters of tinkering:
# make
make -C /lib/modules/2.6.18-53.el5/build SUBDIRS=/root/test/simp_blkdev/simp_blkdev_step12 modules
make[1]: Entering directory `/usr/src/kernels/2.6.18-53.el5-i686'
  CC [M]  /root/test/simp_blkdev/simp_blkdev_step12/simp_blkdev.o
  Building modules, stage 2.
  MODPOST
  CC      /root/test/simp_blkdev/simp_blkdev_step12/simp_blkdev.mod.o
  LD [M]  /root/test/simp_blkdev/simp_blkdev_step12/simp_blkdev.ko
make[1]: Leaving directory `/usr/src/kernels/2.6.18-53.el5-i686'
#

Next, look at the current memory state:
# cat /proc/meminfo
...
HighTotal:     1146816 kB
HighFree:       509320 kB
LowTotal:       896356 kB
LowFree:        872612 kB
...
#
Free high memory and free low memory are about 509M and 872M respectively.

Next, load the module. To make its memory consumption more conspicuous,
we use the size parameter to specify a larger block device capacity:
# insmod simp_blkdev.ko size=500M
#

Now look at the memory changes:
# cat /proc/meminfo
...
HighTotal:     1146816 kB
HighFree:         1652 kB
LowTotal:       896356 kB
LowFree:        863696 kB
...
#
The result shows that the module consumed about 500M of high memory, as we expected.
Low memory also dropped a little, by about 8.9M, and the memory occupied by the module itself cannot explain this,
because the module's code and static data come nowhere near 8.9M.
Perhaps we could attribute it to caching from some file operations and to the memory used by the radix tree,
whose footprint grows with the capacity of the block device; we could even try to calculate it...
But we won't pay too much attention to this small issue now, because it is beside the point,
just as a certain much-hyped incident turned out, after investigation, to be nothing more than public money spent on cigarettes.
So we will not dwell on these 8.9M; the bulk of the change is clearly the 500-odd M of high memory that disappeared,
and that 500M alone is enough to justify the changes made over these chapters.

Let's remove the module and look at the state of the memory:
# rmmod simp_blkdev
# cat /proc/meminfo
...
HighTotal:     1146816 kB
HighFree:       504684 kB
LowTotal:       896356 kB
LowFree:        868480 kB
...
#
The high memory that had just been occupied has come back,
and everything seems so harmonious.

As a final test, let's do something we couldn't do before this chapter:
allocate more than 896M of memory.
We just saw that free low memory and free high memory add up to about 1.37G,
so let's ask for 1.3G:
# insmod simp_blkdev.ko size=1300M
#
To our pleasant surprise, the system did not crash.

Then look at the memory situation at this time:
# cat /proc/meminfo
...
HighTotal:     1146816 kB
HighFree:        41204 kB
LowTotal:       896356 kB
LowFree:         48284 kB
...
#
Almost all of the high memory and most of the low memory have been consumed,
about 1.3G in total, which matches our expectation.

It's not a good idea to leave the module holding so much memory,
so let's unload it:
# rmmod simp_blkdev
#

With this chapter, the discussion around high memory finally comes to an end.
But our improvements to this driver are not finished, because we want to carry forward the spirit of striving for perfection.
The revival of a nation does not depend on cramming political ideology into schoolchildren, nor on officials sending their families to study abroad,
nor on public servants who "lead by example" by sacrificing their own health to eat and drink their way to 900 billion of GDP,
but on the honesty, seriousness, diligence, courage, creativity, dedication and pursuit of excellence of every ordinary "little person".

<To be continued>