Analyzing ramfs to Design and Implement a Minimal File System

Author: Yang honggang, Lanzhou University

Email: eagle.rtlinux@gmail.com / rtlinux@163.com

--------------------------------------------------------------------

References:
1. linux-3.6.9/fs/ramfs
2. Documentation/filesystems/ramfs-rootfs-initramfs.txt

Contents:

0. Relationships between the file system data structures

1. Analysis of the ramfs implementation, the simplest Linux file system

2. Summary of minimal file system design

Distills the ramfs analysis into the key points of designing and implementing a minimal file system.

3. A minimal file system instance

This part largely copies the ramfs implementation. By working through it and through the implementation of the related functions, we can study the role of each function and deepen our understanding of file systems.

--------------------------------------------------------------------

Chapter 0: Relationships between the file system data structures

Before we start, a diagram that uses ext4 as an example shows the relationships among the data structures of the whole file system. The meaning of the diagram should become clear step by step as you read the following chapters.

[Figure: relationships between the file system data structures, using ext4 as an example]

Chapter 1: Analysis of the ramfs implementation, the simplest Linux file system

ramfs shows how to design a virtual file system.
It implements all the logic of a POSIX-compliant file system.
It defines almost no data structures of its own and relies on the data structures defined by the VFS.

ramfs exports the Linux disk caching mechanisms as a dynamically resizable, RAM-based file
system. ramfs has no backing store. Writing to a file in ramfs allocates dentries and
page cache as usual, but the data is never written back to any other storage medium.
This means the memory pages involved are never marked "clean", so the VM cannot
reclaim the memory allocated to ramfs.

ramfs is built entirely on the existing Linux caching mechanisms, so it needs very little
code (no more than 634 lines).

First, look at the Makefile to see which objects ramfs is built from.
obj-y += ramfs.o

file-mmu-y := file-nommu.o
file-mmu-$(CONFIG_MMU) := file-mmu.o
ramfs-objs += inode.o $(file-mmu-y)

This shows that ramfs-objs is built from inode.c. If CONFIG_MMU is set to y,
file-mmu.c is also included; otherwise, file-nommu.c is included.
The following discussion assumes that CONFIG_MMU is y.

--- include/linux/ramfs.h -----
This file is the interface that the ramfs module provides to other modules. It declares:

Functions
ramfs_get_inode()
ramfs_mount()
init_rootfs()
ramfs_fill_super()

Data structures
const struct file_operations ramfs_file_operations
const struct vm_operations_struct generic_file_vm_ops

------------------------------
---- fs/ramfs/file-mmu.c -----
------------------------------

This file defines ramfs_aops, ramfs_file_operations, and ramfs_file_inode_operations.

const struct address_space_operations ramfs_aops = {
	.readpage	= simple_readpage,
	.write_begin	= simple_write_begin,
	.write_end	= simple_write_end,
	.set_page_dirty	= __set_page_dirty_no_writeback,
};
This structure holds the "address space" operations of ramfs. An address space
associates cached data with its source. Through these operations a file system requests memory pages from the memory subsystem,
reads data from the backing store to fill the cache, and writes cached data back to the backing device.

const struct file_operations ramfs_file_operations = {
	.read		= do_sync_read,
	.aio_read	= generic_file_aio_read,
	.write		= do_sync_write,
	.aio_write	= generic_file_aio_write,
	.mmap		= generic_file_mmap,
	.fsync		= noop_fsync,
	.splice_read	= generic_file_splice_read,
	.splice_write	= generic_file_splice_write,
	.llseek		= generic_file_llseek,
};
This structure is the set of file operations for regular files in ramfs.

const struct inode_operations ramfs_file_inode_operations = {
	.setattr	= simple_setattr,
	.getattr	= simple_getattr,
};
This structure is the set of inode operations for regular files in ramfs.

-------------------------
---- fs/ramfs/inode.c ---
-------------------------

ramfs module initialization function:
static int __init init_ramfs_fs(void);
It calls the kernel function register_filesystem() to register ramfs.
The descriptor structure ramfs_fs_type is initialized as follows:
static struct file_system_type ramfs_fs_type = {
	.name		= "ramfs",
	.mount		= ramfs_mount,
	.kill_sb	= ramfs_kill_sb,
};
"ramfs" is the name of the file system.
ramfs_mount performs the ramfs mount operation.
ramfs_kill_sb cleans up when a ramfs file system is no longer needed.

------------------------------------------------------
Purpose: initialize the super block of ramfs.
Note: initialization covers the mount options, the root inode, and the root dentry.
int ramfs_fill_super(struct super_block *sb, void *data, int silent);

[1] Save the mount options of the file system in the sb->s_options field. generic_show_options()
uses this value to display the mount options of the file system.
[2] Allocate the ramfs_fs_info structure, which is specific to ramfs. Its definition is:

struct ramfs_mount_opts {
	umode_t mode;
};

struct ramfs_fs_info {
	struct ramfs_mount_opts mount_opts;
};
[3] Store the pointer to the ramfs_fs_info structure in the private field s_fs_info of the ramfs super block.
[4] Call ramfs_parse_options() to parse the mount options. The mount option
mode=XX is saved in the mode field of ramfs_mount_opts. The default value is 0755. All other mount options are ignored.
[5] Initialize s_maxbytes of the super block to MAX_LFS_FILESIZE. This field is the maximum file length.
Initialize the s_blocksize field of the super block to PAGE_CACHE_SIZE. This field is the file system block length
in bytes. A related field, s_blocksize_bits, also describes the block length:
it is the base-2 logarithm of s_blocksize.
Initialize the s_magic field of the super block. This field is the magic number of the super block and is used to detect a corrupted super block.
Initialize the s_op field of the super block to ramfs_ops. This structure contains the operations that act on the super block.

static const struct super_operations ramfs_ops = {
	.statfs		= simple_statfs,	// provides file system statistics, such as the number of used and unused data blocks or the maximum file name length
	.drop_inode	= generic_delete_inode,	// deletes the inode when its reference count drops to 0
	.show_options	= generic_show_options,	// displays the mount options of the file system
};

Initialize the s_time_gran field of the super block. It gives the granularity, in ns, of the timestamps the file system supports.
[6] Call ramfs_get_inode() to create the inode that represents the root of the ramfs super block and initialize its fields. It returns
a pointer to the root inode. (See the ramfs_get_inode() section below.)
[7] Call d_make_root() to create and initialize the root dentry for the root inode.
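
Step [4] can be tried outside the kernel. The following is a minimal user-space sketch, not the kernel's ramfs_parse_options(): it walks a comma-separated option string, stores an octal mode=XX value (masked to the permission bits), falls back to 0755, and ignores every other option. The names demo_mount_opts, demo_parse_options, and DEMO_DEFAULT_MODE are hypothetical and exist only for this illustration.

```c
#include <stdlib.h>
#include <string.h>

#define DEMO_DEFAULT_MODE 0755u   /* the default the text gives for ramfs */

struct demo_mount_opts {
    unsigned int mode;            /* permission bits for the root directory */
};

/*
 * Parse a comma-separated option string such as "size=1m,mode=700".
 * Only "mode=<octal>" is recognized; every other option is ignored.
 * Returns 0 on success, -1 if a mode= value is not a valid octal number.
 */
int demo_parse_options(const char *data, struct demo_mount_opts *opts)
{
    char buf[256];
    char *p, *end;

    opts->mode = DEMO_DEFAULT_MODE;
    if (data == NULL)
        return 0;

    /* Work on a private copy because strtok() modifies its input. */
    strncpy(buf, data, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    for (p = strtok(buf, ","); p != NULL; p = strtok(NULL, ",")) {
        unsigned long v;

        if (strncmp(p, "mode=", 5) != 0)
            continue;                     /* unknown option: ignore it */
        v = strtoul(p + 5, &end, 8);
        if (end == p + 5 || *end != '\0')
            return -1;                    /* empty or non-octal value */
        opts->mode = (unsigned int)(v & 07777);
    }
    return 0;
}
```

Calling demo_parse_options("size=1m,mode=700", &opts) leaves opts.mode at 0700, while an option string without mode= keeps the 0755 default.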

------------------------------------------------------
Purpose: find or create a VFS super_block structure, run ramfs_fill_super() to fill in the ramfs super
block, and then increment the reference count of the root dentry.
struct dentry *ramfs_mount(struct file_system_type *fs_type,
		int flags, const char *dev_name, void *data);

-----------------------------------------------

static void ramfs_kill_sb(struct super_block *sb)

[1] Free ramfs's private data structure, ramfs_fs_info.
[2] Call kill_litter_super() to tear down the super block.

------------------------------------------------------
struct inode *ramfs_get_inode(struct super_block *sb,
		const struct inode *dir, umode_t mode, dev_t dev)

[1] Call new_inode() to allocate an inode on the ramfs super block.
[2] Initialize the inode:
Initialize the inode number.
Initialize the uid, gid, and mode fields of the inode.
Set the inode's "address space" operation set to ramfs_aops, which handles reading and writing the contents of ramfs files.
Set the inode's backing device information to ramfs_backing_dev_info. A backing device is the external device associated with an address space; it is the source and destination of the data.

Set the GFP_HIGHUSER mask in the flags field of the inode's address space. The flags field stores information about the memory zone from which the mapped pages are allocated. It can also
hold error information produced during asynchronous I/O, because such errors cannot be passed directly to the caller: AS_EIO indicates a general I/O error, and AS_ENOSPC indicates that there was not enough space
to complete an asynchronous write. GFP_HIGHUSER means that the physical memory for the pages is preferably taken from the high memory zone.

Set the AS_UNEVICTABLE flag in the same flags field. The memory pages used by ramfs are then placed on the unevictable_list and are not reclaimed.

Initialize i_atime, i_mtime, and i_ctime of the inode to the current time.

[3] Depending on the file type in mode, set one of four operation sets for the inode:
- reg // regular file
Set i_op of the inode to ramfs_file_inode_operations.
Set i_fop of the inode to ramfs_file_operations.
- dir // directory
Set i_op of the inode to ramfs_dir_inode_operations.
Set i_fop of the inode to simple_dir_operations.
- link // symbolic link
Set i_op to page_symlink_inode_operations.
- any other type // special file
Call init_special_inode() to set up the inode.
[4] Return a pointer to the initialized inode.

--------- Address space operations -------------------
const struct address_space_operations ramfs_aops = {
	.readpage	= simple_readpage,
	.write_begin	= simple_write_begin,
	.write_end	= simple_write_end,
	.set_page_dirty	= __set_page_dirty_no_writeback,
};
simple_readpage: normally this method reads one page of data from the backing store. ramfs has no backing store, only RAM, so the page is simply zero-filled and marked up to date.

simple_write_begin:
Compute the page index within the file from the read/write position pos.
Find or allocate the page structure for that index.
Zero the parts of the page that the write will not cover, if the page is not yet up to date.

simple_write_end: called after the data has been copied into the page; updates the page state and, if needed, the file size.

__set_page_dirty_no_writeback: marks a page as "dirty" without triggering any write-back,
because ramfs has nothing to write back to. This is also why previously written data is lost after ramfs is unmounted and mounted again.
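
The first step of simple_write_begin, turning a byte position into a page index, is plain shift-and-mask arithmetic. The sketch below shows it in user space, assuming 4096-byte pages (DEMO_PAGE_SHIFT of 12). The demo_* names are hypothetical, and the kernel uses its own page-size constants.

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12                       /* assumption: 4096-byte pages */
#define DEMO_PAGE_SIZE  (1u << DEMO_PAGE_SHIFT)

struct demo_page_pos {
    uint64_t index;    /* which page of the file holds the byte */
    uint32_t offset;   /* byte offset inside that page */
};

/* Split a file position into a page index and an in-page offset. */
struct demo_page_pos demo_locate(uint64_t pos)
{
    struct demo_page_pos p;

    p.index  = pos >> DEMO_PAGE_SHIFT;
    p.offset = (uint32_t)(pos & (DEMO_PAGE_SIZE - 1));
    return p;
}

/* Number of pages touched by a write of len bytes at pos (len must be > 0). */
uint64_t demo_pages_spanned(uint64_t pos, uint64_t len)
{
    uint64_t first = pos >> DEMO_PAGE_SHIFT;
    uint64_t last  = (pos + len - 1) >> DEMO_PAGE_SHIFT;

    return last - first + 1;
}
```

For example, position 10000 falls in page 2 at offset 1808, and a 200-byte write starting at position 4000 touches two pages.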

------------ Backing store -----------------
static struct backing_dev_info ramfs_backing_dev_info = {
	.name		= "ramfs",
	.ra_pages	= 0,	/* No readahead */
	.capabilities	= BDI_CAP_NO_ACCT_AND_WRITEBACK |
			  BDI_CAP_MAP_DIRECT | BDI_CAP_MAP_COPY |
			  BDI_CAP_READ_MAP | BDI_CAP_WRITE_MAP | BDI_CAP_EXEC_MAP,
};
This structure describes the backing device that serves as the source and destination of the data.

ra_pages: the maximum number of pages to read ahead.
BDI_CAP_NO_ACCT_AND_WRITEBACK: no write-back is needed, dirty pages are not accounted, and pages under write-back are not accounted automatically.
BDI_CAP_MAP_DIRECT: can be mapped directly.
BDI_CAP_MAP_COPY: a copy can be mapped.
BDI_CAP_READ_MAP: can be mapped for reading.
BDI_CAP_WRITE_MAP: can be mapped for writing.
BDI_CAP_EXEC_MAP: can be mapped for executing code.
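
The capabilities field is an ordinary bitmask: each capability is one bit, and a combined flag such as the "no accounting and no write-back" one is the OR of several bits. The user-space sketch below models that idea. The DEMO_CAP_* names and their bit values are assumptions made for illustration and should not be read as the kernel's definitions.

```c
#include <stdbool.h>

/* Illustrative bit assignments; an assumption, not the kernel's header. */
#define DEMO_CAP_NO_ACCT_DIRTY 0x01u  /* do not account dirty pages */
#define DEMO_CAP_NO_WRITEBACK  0x02u  /* never write pages back */
#define DEMO_CAP_NO_ACCT_WB    0x04u  /* do not account pages under write-back */
#define DEMO_CAP_MAP_COPY      0x08u  /* a copy can be mapped */
#define DEMO_CAP_MAP_DIRECT    0x10u  /* can be mapped directly */
#define DEMO_CAP_READ_MAP      0x20u  /* can be mapped for reading */
#define DEMO_CAP_WRITE_MAP     0x40u  /* can be mapped for writing */
#define DEMO_CAP_EXEC_MAP      0x80u  /* can be mapped for execution */

/* Combined flag: the OR of the three "no accounting / no write-back" bits. */
#define DEMO_CAP_NO_ACCT_AND_WRITEBACK \
    (DEMO_CAP_NO_WRITEBACK | DEMO_CAP_NO_ACCT_DIRTY | DEMO_CAP_NO_ACCT_WB)

/* The same combination of capabilities the text lists for ramfs. */
unsigned int demo_ramfs_like_caps(void)
{
    return DEMO_CAP_NO_ACCT_AND_WRITEBACK |
           DEMO_CAP_MAP_DIRECT | DEMO_CAP_MAP_COPY |
           DEMO_CAP_READ_MAP | DEMO_CAP_WRITE_MAP | DEMO_CAP_EXEC_MAP;
}

/* True if pages of this device would ever need to be written back. */
bool demo_needs_writeback(unsigned int caps)
{
    return (caps & DEMO_CAP_NO_WRITEBACK) == 0;
}

/* True if every bit in wanted is present in caps. */
bool demo_has_all(unsigned int caps, unsigned int wanted)
{
    return (caps & wanted) == wanted;
}
```

With these definitions, demo_needs_writeback(demo_ramfs_like_caps()) is false, which matches the description of a device that never writes pages back.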

----------- File inode operation set ------------

const struct inode_operations ramfs_file_inode_operations = {
	.setattr	= simple_setattr,
	.getattr	= simple_getattr,
};

simple_setattr: a simple attribute-setting function, intended only for in-memory or special file systems.
If on-disk metadata must be modified when the file size changes, the file system has to provide its own
implementation.
simple_getattr: obtains the attributes of a file, such as ino, mode, nlink, uid, and gid.

------------- File operation set ----------------

const struct file_operations ramfs_file_operations = {
	.read		= do_sync_read,
	.aio_read	= generic_file_aio_read,
	.write		= do_sync_write,
	.aio_write	= generic_file_aio_write,
	.mmap		= generic_file_mmap,
	.fsync		= noop_fsync,
	.splice_read	= generic_file_splice_read,
	.splice_write	= generic_file_splice_write,
	.llseek		= generic_file_llseek,
};

These are all generic file operation methods provided by the VFS layer.

------------ Directory inode operation set -------

static const struct inode_operations ramfs_dir_inode_operations = {
	.create		= ramfs_create,
	.lookup		= simple_lookup,
	.link		= simple_link,
	.unlink		= simple_unlink,
	.symlink	= ramfs_symlink,
	.mkdir		= ramfs_mkdir,
	.rmdir		= simple_rmdir,
	.mknod		= ramfs_mknod,
	.rename		= simple_rename,
};

static int ramfs_create(struct inode *dir, struct dentry *dentry, umode_t mode, bool excl)
Create a regular file under the directory @dir and associate the dentry @dentry under @dir with the new file.

simple_lookup: a generic lookup operation. It looks up the inode for a file system object by its name (a string).
It sets the d_delete operation of the dentry to simple_delete_dentry()
and adds the dentry to the dentry hash table for fast lookup.
int simple_link(struct dentry *old_dentry, struct inode *dir, struct dentry *dentry)
Create a hard link: the dentry @dentry under @dir is linked to the inode referenced by the dentry @old_dentry.
simple_unlink: the inverse operation of simple_link.

static int ramfs_symlink(struct inode *dir, struct dentry *dentry, const char *symname)
Create and initialize a symbolic link inode under the directory @dir. The link target path is @symname.

static int ramfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
Create a directory inode under the directory @dir and associate it with @dentry.
simple_rmdir: the inverse operation of ramfs_mkdir.

ramfs_mknod: allocate and initialize an inode (for example, for a device file) under @dir and associate it with @dentry.

------------ simple_dir_operations ----------
const struct file_operations simple_dir_operations = {
	.open		= dcache_dir_open,
	.release	= dcache_dir_close,
	.llseek		= dcache_dir_lseek,
	.read		= generic_read_dir,
	.readdir	= dcache_readdir,
	.fsync		= noop_fsync,
};

ramfs uses simple_dir_operations directly for its directory file operations.

Chapter 2: Summary of minimal file system design

1. Required operations

- File system registration
Use register_filesystem() to register a file system with the kernel.
- File system unregistration
Use unregister_filesystem() to remove the file system from the kernel.
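
Registration amounts to keeping a list of file system types keyed by name. The toy user-space model below illustrates that bookkeeping with a singly linked list. It is a sketch under assumptions, not the kernel's code: the names demo_fs_type, demo_register, and demo_unregister are hypothetical, the error codes are chosen to mirror common kernel conventions, and no locking is modeled.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct file_system_type: just a name and a link. */
struct demo_fs_type {
    const char *name;
    struct demo_fs_type *next;
};

static struct demo_fs_type *demo_fs_list;   /* head of the registry */

/* Return the slot holding the entry called name, or the list's tail slot. */
static struct demo_fs_type **demo_find(const char *name)
{
    struct demo_fs_type **p;

    for (p = &demo_fs_list; *p != NULL; p = &(*p)->next)
        if (strcmp((*p)->name, name) == 0)
            break;
    return p;
}

/* Add fs to the registry; fail with -EBUSY if the name is already taken. */
int demo_register(struct demo_fs_type *fs)
{
    struct demo_fs_type **p = demo_find(fs->name);

    if (*p != NULL)
        return -EBUSY;
    fs->next = NULL;
    *p = fs;
    return 0;
}

/* Remove fs from the registry; fail with -EINVAL if it was never added. */
int demo_unregister(struct demo_fs_type *fs)
{
    struct demo_fs_type **p;

    for (p = &demo_fs_list; *p != NULL; p = &(*p)->next) {
        if (*p == fs) {
            *p = fs->next;
            fs->next = NULL;
            return 0;
        }
    }
    return -EINVAL;
}
```

In this model, registering a second type with the name "wendyfs" is rejected until the first one has been unregistered, which is the behaviour a module's init and exit functions rely on.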

2. Data structures to be provided

When you register a file system with register_filesystem(), you must provide an instance of [1] struct file_system_type.
This instance describes the basic information of the file system. Its main members are @name, the name of the file system; @mount, the operation that mounts the file system,
which must initialize the super_block during mounting; and @kill_sb, which cleans up the super block.

When initializing the file system super block, you must specify a set of super block operations. This requires a [2] struct super_operations structure.
Its main members are @statfs, which provides file system statistics such as the number of unused data blocks and the maximum file name length;
@drop_inode, which deletes the inode when its reference count drops to 0;
and @show_options, which displays the mount options of the file system.
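
The statistics that @statfs reports can be observed from user space. As a minimal sketch, assuming a POSIX system, the statvfs() call returns the block size, block counts, and maximum file name length of the file system that holds a given path. The wrapper demo_query_fs and the struct demo_fs_stats are hypothetical names for this illustration.

```c
#include <sys/statvfs.h>

/* Hypothetical summary of the values discussed in the text. */
struct demo_fs_stats {
    unsigned long block_size;          /* file system block size in bytes */
    unsigned long long total_blocks;   /* total number of data blocks */
    unsigned long long free_blocks;    /* number of unused data blocks */
    unsigned long max_name_len;        /* maximum file name length */
};

/* Fill out with statistics for the file system containing path.
 * Returns 0 on success and -1 if statvfs() fails. */
int demo_query_fs(const char *path, struct demo_fs_stats *out)
{
    struct statvfs sv;

    if (statvfs(path, &sv) != 0)
        return -1;
    out->block_size   = sv.f_bsize;
    out->total_blocks = sv.f_blocks;
    out->free_blocks  = sv.f_bfree;
    out->max_name_len = sv.f_namemax;
    return 0;
}
```

Pointing demo_query_fs() at a mount point of the file system under study is a quick way to see what its @statfs method reports.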

When allocating a new inode, you need to initialize the inode's operations. This requires a [3] struct address_space_operations,
the operation set of the "address space" structure. The address space associates data in memory with its data source. When more physical
memory is needed, it is responsible for requesting memory from the memory subsystem. It also provides the means to read data from the backing store to fill the memory buffer and to write
updated data back to the backing device. @readpage reads one page of data. It is usually set to the kernel's standard function mpage_readpage;
in ramfs it is set to simple_readpage. @write_begin is called by the generic buffered write code to tell the file system to prepare for
writing a number of bytes at a given offset in the file. The file system allocates memory as needed.
After a successful @write_begin and data copy, @write_end must be called.
@set_page_dirty is called by the VM to mark a page as dirty. It sets the PageDirty flag of the page and the PAGECACHE_TAG_DIRTY tag in the radix tree.
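
A set_page_dirty method that performs no write-back reduces to a test-and-set on one flag: it reports whether this call was the one that turned the page dirty. The sketch below models that in user space with a C11 atomic flag. The struct demo_page and the function demo_set_page_dirty_no_writeback are hypothetical stand-ins, not kernel types, and a C11 compiler with <stdatomic.h> is assumed.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page: only the dirty flag is modeled. */
struct demo_page {
    atomic_bool dirty;
};

/*
 * Mark the page dirty without scheduling any write-back.
 * Returns 1 if this call changed the page from clean to dirty, else 0.
 */
int demo_set_page_dirty_no_writeback(struct demo_page *page)
{
    /* Cheap check first: an already dirty page needs no atomic update. */
    if (!atomic_load(&page->dirty))
        /* atomic_exchange returns the old value; clean (false) yields 1. */
        return !atomic_exchange(&page->dirty, true);
    return 0;
}
```

The first call on a clean page returns 1 and every later call returns 0, so a caller can tell whether it was the one that dirtied the page.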

[4] struct backing_dev_info contains information about the backing store associated with the address space. The backing store is
the external device that serves as the source of the data in an address space, usually a block device. @name is the name of the backing device. @ra_pages is the maximum readahead,
in units of PAGE_CACHE_SIZE. The most important information in @capabilities is whether data pages can be written back. For example, BDI_CAP_NO_ACCT_AND_WRITEBACK
means that no write-back is needed, dirty pages are not accounted, and pages under write-back are not accounted automatically.

[5] A struct inode_operations for directory inodes. It mainly covers creating and deleting regular files, device files, directories,
and links, as well as lookup, rename, and related operations.
[6] A struct file_operations for directories.

[7] A struct inode_operations for regular file inodes.
[8] A struct file_operations for regular files.

Chapter 3: A minimal file system instance

wendyfs is a RAM-based file system extracted from ramfs to help readers quickly grasp the design of a Linux file system. See
wendyfs.c for the implementation and the README for usage.
The .rename operation in wendyfs_dir_inode_operations is commented out in the implementation, so rename cannot be used in wendyfs, for example
mv filea fileb. Removing individual methods in this way lets us study the role of each one.

// wendyfs.c
/*
 * Most of the code in this module is copied from fs/ramfs/inode.c
 * and fs/ramfs/file-mmu.c.
 * It mainly implements the basic functions of a RAM file system
 * (except renaming a file, e.g. mv filea fileb).
 * This module can help us learn about the Linux file system.
 */
#include <linux/fs.h>		/* super_operations */
#include <linux/module.h>
#include <linux/init.h>
#include <linux/pagemap.h>	/* mapping_set_unevictable() */
#include <linux/time.h>		/* CURRENT_TIME */
#include <linux/backing-dev.h>

/* should be defined in the header files */
#define WENDYFS_MAGIC ...	/* magic number; the value is not preserved in the source text */

struct inode *wendyfs_get_inode(struct super_block *sb,
		const struct inode *dir, umode_t mode, dev_t dev);
int myset_page_dirty_no_writeback(struct page *page);

/* wendyfs's address space operations */
const struct address_space_operations wendyfs_aops = {
	.readpage	= simple_readpage,
	.write_begin	= simple_write_begin,
	.write_end	= simple_write_end,
	.set_page_dirty	= myset_page_dirty_no_writeback,
};

static struct backing_dev_info wendyfs_backing_dev_info = {
	.name		= "wendyfs",
	.ra_pages	= 0,	/* No readahead */
	.capabilities	= BDI_CAP_NO_ACCT_AND_WRITEBACK |
			  BDI_CAP_MAP_DIRECT | BDI_CAP_MAP_COPY |
			  BDI_CAP_READ_MAP | BDI_CAP_WRITE_MAP | BDI_CAP_EXEC_MAP,
};

/*
 * For address_spaces which do not use buffers nor write back.
 */
int myset_page_dirty_no_writeback(struct page *page)
{
	if (!PageDirty(page))
		return !TestSetPageDirty(page);
	return 0;
}

/* file creation */
static int wendyfs_mknod(struct inode *dir, struct dentry *dentry,
		umode_t mode, dev_t dev)
{
	int error = -ENOSPC;
	struct inode *inode = wendyfs_get_inode(dir->i_sb, dir, mode, dev);

	if (inode) {
		d_instantiate(dentry, inode);
		dget(dentry);	/* extra count - pin the dentry in core */
		error = 0;
		dir->i_mtime = dir->i_ctime = CURRENT_TIME;
	}
	return error;
}

static int wendyfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
{
	/* dir inode */
	int retval = wendyfs_mknod(dir, dentry, mode | S_IFDIR, 0);

	if (!retval)
		inc_nlink(dir);
	return retval;
}

static int wendyfs_create(struct inode *dir, struct dentry *dentry,
		umode_t mode, bool excl)
{
	return wendyfs_mknod(dir, dentry, mode | S_IFREG, 0);
}

static const struct inode_operations wendyfs_dir_inode_operations = {
	.create		= wendyfs_create,
	.lookup		= simple_lookup,
	.mkdir		= wendyfs_mkdir,
	.rmdir		= simple_rmdir,
	.link		= simple_link,
	.unlink		= simple_unlink,
//	.rename		= simple_rename,
};

const struct inode_operations wendyfs_file_inode_operations = {
	.setattr	= simple_setattr,
	.getattr	= simple_getattr,
};

const struct file_operations wendyfs_file_operations = {
	.read		= do_sync_read,
	.aio_read	= generic_file_aio_read,
	.write		= do_sync_write,
	.aio_write	= generic_file_aio_write,
	.fsync		= noop_fsync,
	.llseek		= generic_file_llseek,
};

struct inode *wendyfs_get_inode(struct super_block *sb,
		const struct inode *dir, umode_t mode, dev_t dev)
{
	/* allocate one inode */
	struct inode *inode = new_inode(sb);

	/* init the inode */
	if (inode) {
		inode->i_ino = get_next_ino();
		/* init uid, gid, mode for new inode according to POSIX standards */
		inode_init_owner(inode, dir, mode);
		/* set the address space operation set */
		inode->i_mapping->a_ops = &wendyfs_aops;
		/* set the backing device info */
		inode->i_mapping->backing_dev_info = &wendyfs_backing_dev_info;
		/*
		 * The pages wendyfs covered will be placed on unevictable_list,
		 * so these pages will not be reclaimed.
		 */
		mapping_set_unevictable(inode->i_mapping);
		inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
		/* set inode and file operation sets */
		switch (mode & S_IFMT) {
		default:
			init_special_inode(inode, mode, dev);
			break;
		case S_IFDIR:
			/* dir inode operation set */
			inode->i_op = &wendyfs_dir_inode_operations;
			/* dir operation set */
			inode->i_fop = &simple_dir_operations;
			inc_nlink(inode);
			break;
		case S_IFREG:
			/* regular file inode operation set */
			inode->i_op = &wendyfs_file_inode_operations;
			/* regular file operation set */
			inode->i_fop = &wendyfs_file_operations;
			break;
		}
	}
	return inode;
}

/* super block related operations */
static const struct super_operations wendyfs_ops = {
	.statfs		= simple_statfs,
	.drop_inode	= generic_delete_inode,
	.show_options	= generic_show_options,
};

int wendyfs_fill_super(struct super_block *sb, void *data, int silent)
{
	struct inode *inode;

	sb->s_maxbytes = 4096;
	sb->s_blocksize = 4096;
	sb->s_blocksize_bits = 12;
	sb->s_magic = WENDYFS_MAGIC;
	/* set super block operations */
	sb->s_op = &wendyfs_ops;
	sb->s_time_gran = 1;
	/* create and initialize the root inode */
	inode = wendyfs_get_inode(sb, NULL, S_IFDIR, 0);
	sb->s_root = d_make_root(inode);
	if (!sb->s_root)
		return -ENOMEM;
	return 0;
}

struct dentry *wendyfs_mount(struct file_system_type *fs_type,
		int flags, const char *dev_name, void *data)
{
	return mount_nodev(fs_type, flags, data, wendyfs_fill_super);
}

static void wendyfs_kill_sb(struct super_block *sb)
{
	kill_litter_super(sb);
}

static struct file_system_type wendy_fs_type = {
	.name		= "wendyfs",
	.mount		= wendyfs_mount,
	.kill_sb	= wendyfs_kill_sb,
};

static int __init init_wendy_fs(void)
{
	return register_filesystem(&wendy_fs_type);
}

static void __exit exit_wendy_fs(void)
{
	unregister_filesystem(&wendy_fs_type);
}

module_init(init_wendy_fs)
module_exit(exit_wendy_fs)
MODULE_AUTHOR("Yang honggang, <eagle.rtlinux@gmail.com>");
MODULE_LICENSE("GPL");

// README
How to use:
0. Compile the wendyfs module
   $ make
1. Insert the wendyfs module
   # insmod wendyfs.ko
2. Check the filesystem list
   # cat /proc/filesystems
   ...
   nodev	wendyfs
3. Mount wendyfs
   # mount -t wendyfs none /mnt
4. Create a dir in the mount point
   # cd /mnt
   # mkdir hello
5. Delete the hello dir
   # rm -rf hello
6. Create a file and rename it
   # echo "hello" > halo
   # cat halo
   # mv halo hello
   mv: cannot move `halo' to `hello': Operation not permitted
   ...

// Makefile

obj-m := wendyfs.o
default:
	make -C /lib/modules/`uname -r`/build M=`pwd` modules
clean:
	rm modules.order Module.symvers *.ko *.mod.* *.o .tmp_versions/ .*cmd -rf
