Analysis of the block device read/write process in Linux 0.11


Linux treats every device as a file, and block devices are no exception. Common block devices are floppy disks and hard disks. Although modern hard drives transfer data quickly, they are still far slower than memory, so the CPU goes through a buffer when it accesses a block device in order to get the best efficiency. Before analyzing how data moves to and from a block device, we therefore need to understand the structure of the buffer area (the buffer cache).

Linux 0.11 sets aside part of memory as the buffer cache. This region runs from the end of the kernel up to the 4 MB mark and holds a little over 3,000 logical blocks. Each logical block is 1 KB, which matches the disk block size defined by the MINIX 1.0 file system.

When a user calls the high-level block device write function (block_write), the function first works out from the number of bytes to be written which buffer blocks it needs and requests them from the cache; a partial block counts as a whole block. It then copies the user's data into the corresponding buffer block and sets the block's b_dirt member, which marks the block as not yet synchronized with the device. Finally it releases the block, which wakes any process waiting for a buffer.

Reading (block_read) is similar. The function first works out from the number of bytes to be read which buffer blocks it needs and requests them from the cache (again, a partial block counts as a whole block), then copies the data from the buffer block to the user buffer, and finally releases the block, which wakes any waiting process.
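The "partial block counts as a whole block" rule can be made concrete with a small sketch. It is not the kernel's code: the names split_into_blocks and struct chunk are made up for illustration, and it only assumes the 1 KB logical block size stated above.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE      1024   /* logical block size, as stated above */
#define BLOCK_SIZE_BITS 10     /* log2(BLOCK_SIZE) */

/* One piece of a transfer: which block, where in it, how many bytes.
 * (Hypothetical type, for this sketch only.) */
struct chunk {
    unsigned long block;   /* logical block number */
    unsigned int  offset;  /* starting byte inside the block */
    unsigned int  chars;   /* bytes handled in this block */
};

/*
 * Split the byte range [pos, pos + count) into per-block chunks.
 * Stores at most max entries in out and returns the number of blocks
 * the range touches. A partly covered block still counts as one block.
 */
size_t split_into_blocks(unsigned long pos, size_t count,
                         struct chunk *out, size_t max)
{
    size_t n = 0;

    assert(out != NULL || max == 0);
    while (count > 0) {
        unsigned int offset = (unsigned int)(pos & (BLOCK_SIZE - 1));
        unsigned int chars = BLOCK_SIZE - offset;

        if (chars > count)
            chars = (unsigned int)count;
        if (n < max) {
            out[n].block = pos >> BLOCK_SIZE_BITS;
            out[n].offset = offset;
            out[n].chars = chars;
        }
        n++;
        pos += chars;
        count -= chars;
    }
    return n;
}
```

For example, a 100-byte transfer starting at byte 1000 touches two blocks: 24 bytes at the end of block 0 and 76 bytes at the start of block 1.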

As you can see, reads and writes on a block device deal only with the cache and do not touch the device directly, which greatly improves their efficiency. There are, however, a few key points to note:

1. The first time a file is read, none of its data is in the buffer cache yet, so the operation still has to wait for the slow device I/O (the low-level ll_rw_block). This wait is a blocking operation: the task is suspended while it waits, so other processes keep running.

2. A write goes to the cache, so it completes very quickly, but the data must eventually be written to the device (again through the low-level ll_rw_block); that is, it must be synchronized. Synchronization happens in two cases. First, when a later request to the cache picks a block that was handed out earlier and has not been synchronized yet (b_dirt is set), that block is synchronized before it is reused. Second, when the user explicitly invokes the sync system call.
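The write-then-sync behaviour in point 2 can be illustrated with a toy write-back cache. This is only a sketch under simplifying assumptions: one buffer per device block, an in-memory array standing in for the disk, and hypothetical names (toy_write, toy_sync, device_writes) that do not exist in the kernel.

```c
#include <assert.h>
#include <string.h>

#define BLOCK_SIZE 1024
#define NR_BLOCKS  4            /* tiny toy device, hypothetical size */

char device[NR_BLOCKS][BLOCK_SIZE];   /* stands in for the disk */
int  device_writes;                   /* counts slow device writes */

struct toy_buffer {
    char data[BLOCK_SIZE];
    int  dirt;                  /* 1 = modified, not yet on the device */
};

struct toy_buffer cache[NR_BLOCKS];   /* one buffer per block */

/* Write into the cache only and mark the buffer dirty. */
void toy_write(int block, const char *src, size_t len)
{
    assert(block >= 0 && block < NR_BLOCKS && len <= BLOCK_SIZE);
    memcpy(cache[block].data, src, len);
    cache[block].dirt = 1;
}

/* Flush every dirty buffer to the device; returns how many were written. */
int toy_sync(void)
{
    int flushed = 0;

    for (int i = 0; i < NR_BLOCKS; i++) {
        if (!cache[i].dirt)
            continue;
        memcpy(device[i], cache[i].data, BLOCK_SIZE);
        cache[i].dirt = 0;
        device_writes++;
        flushed++;
    }
    return flushed;
}
```

Two calls to toy_write on the same block followed by one toy_sync cost a single device write, which is the point of deferring synchronization.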



How does Linux manage this buffer area?

A few questions come up first: How is a buffer block requested? How is it associated with a device and a block number? How is a buffer block found from its device number and block number?

Perhaps the first approach that comes to mind is to string the 3,000-odd buffers together in a linked list, scan from the head of the list when a block is needed, and allocate the first idle entry. Looking up an existing buffer block, however, also has to start from the head, so a lookup takes O(n) time in the worst case and O(1) only in the best case.

This method is simple, but its biggest problem is slow lookup. To speed lookups up, Linux manages the buffer blocks with a hash table. A hash table lookup takes O(1) time in theory. When the block is not on any hash queue, a free block is chosen with an LRU strategy. Compared with the plain linked list, the advantage is obvious.

A hash table locates an element by computing a hash value from the element's key with some algorithm (the hash function), so that the element can be accessed directly. In Linux 0.11 the key clearly has to involve the device number and the logical block number, and the hash function uses the remainder method: (device number XOR logical block number) mod p. A hash table should keep collisions to a minimum, so the choice of p matters. Linux 0.11 uses the same value as the hash table length, 307.
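The remainder-method hash described above fits in a few lines. The function name buffer_hash is made up for this sketch; only the formula and the value 307 come from the text.

```c
#include <assert.h>

#define NR_HASH 307   /* table length and modulus p, as given above */

/* Remainder-method hash: (device number XOR block number) mod NR_HASH. */
unsigned int buffer_hash(unsigned int dev, unsigned int block)
{
    unsigned int key = (dev ^ block) % NR_HASH;

    assert(key < NR_HASH);   /* the result is always a valid table index */
    return key;
}
```

For instance, with device number 0x0201 and block number 1500, 0x0201 XOR 1500 is 2013, and 2013 mod 307 is 171.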

How does the process work in detail?

First, at initialization, all the buffer blocks are linked one after another into a doubly linked circular list, as shown in the following figure.


When a higher layer reads or writes a block device, it calls getblk to allocate a buffer block. getblk first applies the hash function to the device number and logical block number to get the hash value (for device number 0x0201 and logical block number 1500, the value is 171). The hash value is the index into the hash_table array of pointers. getblk then checks whether hash_table[171] points to a valid buffer block for that device and block. If it does, that block is returned directly. If it does not, getblk first walks the blocks that a hash collision may have pushed further down the chain (find_buffer, tmp->b_next). If the block is still not found, getblk looks for an idle buffer block among the 3,000-odd buffers, starting from free_list (the head of the LRU list, that is, the least recently used block), and stores that block's pointer in hash_table[171]. At this point the buffer block is associated with the device number (0x0201) and the logical block number (1500), and the next time a higher layer asks for this block it can be found directly through hash_table.
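The lookup-then-allocate flow just described can be sketched with a toy pool. This is a simplified illustration, not the kernel's getblk: the toy_ names and the 8-entry pool are assumptions, the "free block" search is a plain scan for an unassigned entry rather than an LRU list, and there is no locking, sleeping, or write-back.

```c
#include <assert.h>
#include <stddef.h>

#define NR_HASH    307
#define NR_BUFFERS 8            /* toy pool; the real cache has 3,000+ */

struct toy_bh {
    unsigned int   dev;         /* 0 = not assigned to any device */
    unsigned int   block;
    struct toy_bh *next;        /* next buffer on the same hash chain */
};

struct toy_bh  pool[NR_BUFFERS];
struct toy_bh *hash_table[NR_HASH];

static unsigned int hashfn(unsigned int dev, unsigned int block)
{
    return (dev ^ block) % NR_HASH;
}

/* Walk the chain at hash_table[key]; NULL if the block is not cached. */
struct toy_bh *toy_find(unsigned int dev, unsigned int block)
{
    struct toy_bh *tmp;

    for (tmp = hash_table[hashfn(dev, block)]; tmp != NULL; tmp = tmp->next)
        if (tmp->dev == dev && tmp->block == block)
            return tmp;
    return NULL;
}

/*
 * Return the cached buffer, or bind an unassigned one to (dev, block)
 * and push it on the front of its hash chain. Returns NULL when the
 * toy pool is exhausted (the kernel would reuse a block or sleep).
 */
struct toy_bh *toy_getblk(unsigned int dev, unsigned int block)
{
    unsigned int key = hashfn(dev, block);
    struct toy_bh *bh;

    assert(dev != 0);           /* 0 is reserved to mean "free" */
    bh = toy_find(dev, block);
    if (bh != NULL)
        return bh;
    for (int i = 0; i < NR_BUFFERS; i++) {
        if (pool[i].dev != 0)
            continue;
        bh = &pool[i];
        bh->dev = dev;
        bh->block = block;
        bh->next = hash_table[key];
        hash_table[key] = bh;
        return bh;
    }
    return NULL;
}
```

Calling toy_getblk(0x0201, 1500) twice returns the same buffer, and the second call finds it through hash_table[171] without scanning the pool. Block 682 on the same device also hashes to 171, so it ends up on the same chain.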

Notes:

1. The free list is maintained by two functions, insert_into_queues and remove_from_queues. insert_into_queues links a block in at the tail of the list, so free_list->b_prev_free is always the most recently inserted block. remove_from_queues moves free_list on to the next block when the block being removed is the one free_list points to.

2. Each entry in the hash_table array heads a queue (a doubly linked list). When a logical block of a device is accessed, the pointer at the corresponding hash value points to the buffer block that was accessed. A newly allocated buffer block that hashes to the same entry is inserted in front of the older one. This is how hash collisions are resolved, so the block found at an existing hash entry is not necessarily the one being looked for (the wanted block may be further down the chain).
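One way a circular free list with a head pointer can be maintained is sketched below. It is an illustration under stated assumptions, not the kernel's code: the names free_bh, free_head, free_list_remove and free_list_append are hypothetical, removal moves the head only when the head itself is removed, insertion always appends at the tail, and the list is assumed to hold at least two blocks.

```c
#include <assert.h>
#include <stddef.h>

struct free_bh {
    int id;                         /* label for this sketch only */
    struct free_bh *prev_free;
    struct free_bh *next_free;
};

struct free_bh *free_head;          /* plays the role of free_list */

/* Link n buffers into a circular doubly linked list, head at bufs[0]. */
void free_list_init(struct free_bh *bufs, int n)
{
    assert(bufs != NULL && n >= 2);
    for (int i = 0; i < n; i++) {
        bufs[i].id = i;
        bufs[i].next_free = &bufs[(i + 1) % n];
        bufs[i].prev_free = &bufs[(i + n - 1) % n];
    }
    free_head = &bufs[0];
}

/* Unlink bh; if it was the head, the head moves to its successor. */
void free_list_remove(struct free_bh *bh)
{
    bh->prev_free->next_free = bh->next_free;
    bh->next_free->prev_free = bh->prev_free;
    if (free_head == bh)
        free_head = bh->next_free;
}

/* Re-link bh at the tail, that is, just before the head. */
void free_list_append(struct free_bh *bh)
{
    bh->next_free = free_head;
    bh->prev_free = free_head->prev_free;
    free_head->prev_free->next_free = bh;
    free_head->prev_free = bh;
}
```

With this arrangement, a block that is removed and re-appended goes to the back of the line, and the head is left at the block that has waited longest, which is what an LRU search wants to examine first.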

Appendix 1:

struct buffer_head {
	char * b_data;			/* pointer to data block (1024 bytes) */
	unsigned long b_blocknr;	/* block number */
	unsigned short b_dev;		/* device (0 = free) */
	unsigned char b_uptodate;
	unsigned char b_dirt;		/* 0-clean, 1-dirty */
	unsigned char b_count;		/* users using this block */
	unsigned char b_lock;		/* 0-ok, 1-locked */
	struct task_struct * b_wait;
	struct buffer_head * b_prev;
	struct buffer_head * b_next;
	struct buffer_head * b_prev_free;
	struct buffer_head * b_next_free;
};

b_uptodate: indicates whether the data in this buffer is up to date with the block device. In bitmap.c it is set and cleared as blocks are allocated and freed. It obviously has to be checked when a block is read.

b_dirt: indicates whether the data in the buffer has been modified. A value of 1 (set by block_write) means it has been modified; 0 means it has not.

b_count: incremented when a process successfully obtains the buffer block and decremented when the block is released (getblk increments it on success, brelse decrements it).

b_lock: set by lock_buffer and cleared by unlock_buffer in ll_rw_blk.c. make_request locks the buffer block while it checks whether the block's data has been modified (for WRITE) or is already up to date (for READ).
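The b_count bookkeeping can be pictured as plain reference counting. The sketch below is an assumption-laden illustration: ref_bh, ref_get, ref_release, ref_is_free and the wakeups counter are hypothetical stand-ins for what getblk, brelse and the wake-up of waiting processes do.

```c
#include <assert.h>

struct ref_bh {
    unsigned char count;    /* number of users holding this buffer */
};

int wakeups;                /* stands in for waking processes that wait for a buffer */

/* Take a reference, as a successful getblk does. */
void ref_get(struct ref_bh *bh)
{
    bh->count++;
}

/* Drop a reference, as brelse does, and signal anyone waiting. */
void ref_release(struct ref_bh *bh)
{
    assert(bh->count > 0);  /* releasing an unused buffer is a bug */
    bh->count--;
    wakeups++;
}

/* A buffer can be handed to a new (dev, block) only when nobody holds it. */
int ref_is_free(const struct ref_bh *bh)
{
    return bh->count == 0;
}
```

A buffer taken twice has to be released twice before ref_is_free reports it as reusable.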

Appendix 2:

#define BADNESS(bh) (((bh)->b_dirt << 1) + (bh)->b_lock)

struct buffer_head * getblk(int dev, int block)
{
	struct buffer_head * tmp, * bh;

repeat:
	/* If the block is already in hash_table, return it. That was easy. */
	if ((bh = get_hash_table(dev, block)))
		return bh;
	/* Not in hash_table: look for a free block in the circular doubly
	 * linked list of all buffer blocks, starting at free_list (the
	 * least recently used end of the list). */
	tmp = free_list;
	do {
		if (tmp->b_count)	/* in use by a process, try the next one */
			continue;
		/* Keep the candidate with the lowest weight (dirt, lock). */
		if (!bh || BADNESS(tmp) < BADNESS(bh)) {
			bh = tmp;
			if (!BADNESS(tmp))	/* clean and unlocked: found one */
				break;
		}
	/* and repeat until we find something good */
	} while ((tmp = tmp->b_next_free) != free_list);
	if (!bh) {
		/* No idle block in the whole list: sleep until brelse wakes
		 * us up, then search again. */
		sleep_on(&buffer_wait);
		goto repeat;
	}
	wait_on_buffer(bh);	/* wait for unlock; make_request may hold it */
	if (bh->b_count)	/* taken by someone else while we slept */
		goto repeat;
	while (bh->b_dirt) {
		/* Unused but modified: sync to disk, wait for unlock again. */
		sync_dev(bh->b_dev);
		wait_on_buffer(bh);
		if (bh->b_count)	/* taken again while we waited, bad luck */
			goto repeat;
	}
	/* NOTE!! While we slept waiting for this block, somebody else might
	 * already have added "this" block to the cache. Check it. */
	if (find_buffer(dev, block))
		goto repeat;
	/* OK, FINALLY we know that this buffer is the only one of its kind,
	 * and that it's unused (b_count=0), unlocked (b_lock=0), and clean. */
	bh->b_count = 1;
	bh->b_dirt = 0;
	bh->b_uptodate = 0;
	/* The block gets a new device number and block number, so take it
	 * out of hash_table first and reinsert it afterwards. Note where
	 * these two functions remove and insert the block. */
	remove_from_queues(bh);
	bh->b_dev = dev;
	bh->b_blocknr = block;
	insert_into_queues(bh);
	return bh;
}
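The candidate-selection loop in getblk can be isolated into a small, testable sketch. The weighting used here (dirty counts 2, locked counts 1) is an assumption based on how the kernel's BADNESS macro is commonly quoted; the names cand_bh, badness and pick_buffer are hypothetical, and an array scan replaces the walk around the free list.

```c
#include <assert.h>
#include <stddef.h>

struct cand_bh {
    unsigned char dirt;     /* 1 = modified, must be written back first */
    unsigned char lock;     /* 1 = locked by an I/O request */
    unsigned char count;    /* users holding the buffer */
};

/* Weight of reusing a buffer: dirty costs 2, locked costs 1 (assumed). */
static int badness(const struct cand_bh *bh)
{
    return (bh->dirt << 1) + bh->lock;
}

/*
 * Scan n candidates and return the index of the unused buffer with
 * the lowest badness, stopping early on a perfect (0) candidate.
 * Returns -1 when every buffer is in use.
 */
int pick_buffer(const struct cand_bh *bufs, int n)
{
    int best = -1;

    assert(bufs != NULL || n == 0);
    for (int i = 0; i < n; i++) {
        if (bufs[i].count)
            continue;
        if (best < 0 || badness(&bufs[i]) < badness(&bufs[best])) {
            best = i;
            if (badness(&bufs[i]) == 0)
                break;
        }
    }
    return best;
}
```

Under this weighting a locked but clean buffer is preferred over a dirty one, since reusing a dirty buffer means waiting for a write to the device first.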

Reference documents:

Zhao, Linux kernel fully commented.




