Operating System Learning Notes: File system implementation

Last Update: 2015-11-11 Source: Internet

Author: User

I. File system structure
To provide efficient and convenient access to the disk, the operating system uses a file system to store, locate, and retrieve data easily. A file system poses two distinct design problems: 1. how the file system should look to the user, which involves defining files and their attributes, the operations allowed on files, and the directory structure used to organize files; 2. creating the data structures and algorithms that map the logical file system onto the physical secondary-storage devices.

There are many file systems currently in use. Most operating systems support multiple file systems.

II. File system implementation

Implementing a file system requires several on-disk and in-memory structures.
On disk, the file system holds information on how to boot the operating system stored there, the total number of blocks, the number and location of free blocks, the directory structure, and the individual files.
In memory, structures are kept for file system management and for caching to improve performance. Opening a file adds an entry to the in-memory open-file table, and closing the file removes it. A call to open() returns a pointer to the entry in the open-file table; UNIX calls this a file descriptor and Windows calls it a file handle.
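
The open-file table bookkeeping described above can be pictured with a small sketch. It is an illustration only, not how any particular kernel does it; the class name OpenFileTable, its fields, and the choice to start descriptors at 3 are assumptions made for the example.

```python
class OpenFileTable:
    """Toy open-file table: a descriptor is just a key into the table."""

    def __init__(self):
        self.entries = {}   # descriptor -> {"name": ..., "offset": ...}
        self.next_fd = 3    # assumption: 0-2 are reserved for stdin/stdout/stderr

    def open(self, name):
        # Opening a file adds an entry and hands back its descriptor.
        fd = self.next_fd
        self.next_fd += 1
        self.entries[fd] = {"name": name, "offset": 0}
        return fd

    def close(self, fd):
        # Closing a file removes the entry again.
        del self.entries[fd]


table = OpenFileTable()
fd = table.open("notes.txt")   # entry added
table.close(fd)                # entry removed
```

All later operations on the file would go through the descriptor rather than the file name, which avoids searching the directory again.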

Disk partitions are divided into the root partition and other partitions. The root partition contains the operating system kernel and sometimes other system files, and the kernel is loaded into memory at boot time. Other partitions can be mounted automatically at boot or manually later. Boot information is stored in its own format, because at that stage the file system code has not been loaded and the file system format cannot yet be interpreted. The boot loader is loaded first; it then finds and loads the operating system from disk, and from that point on the operating system interprets and uses the file system.

A partition may also contain no file system, that is, it has not been formatted; such a partition is called raw. A partition that contains a file system is called cooked. Raw disks are used where no file system is appropriate: the operating system may use them for special purposes, some databases bypass the operating system's file system and write their own format directly to the raw disk, and RAID systems may also use raw disks.

An operating system can support several file system types at the same time. Most operating systems achieve this with object-oriented techniques: each file system type provides its own implementation of a common file system interface. Network file system types such as NFS are handled in the same way.
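
A minimal sketch of that idea, assuming a single read operation and a longest-prefix mount table. The names FileSystem, LocalFS, RemoteFS, and VFS are hypothetical, and RemoteFS only stands in for a network file system; it performs no real network I/O.

```python
from abc import ABC, abstractmethod


class FileSystem(ABC):
    """Common interface that every file system type implements."""

    @abstractmethod
    def read(self, path: str) -> bytes: ...


class LocalFS(FileSystem):
    def __init__(self, files):
        self.files = files            # path -> contents

    def read(self, path):
        return self.files[path]


class RemoteFS(FileSystem):
    """Placeholder for a network file system; data lives in a plain dict."""

    def __init__(self, server_files):
        self.server_files = server_files

    def read(self, path):
        return self.server_files[path]


class VFS:
    """Dispatches each call to whichever file system is mounted on the path."""

    def __init__(self):
        self.mounts = {}              # mount prefix -> FileSystem

    def mount(self, prefix, fs):
        self.mounts[prefix] = fs

    def read(self, path):
        # Pick the longest mount prefix that matches the path.
        prefix = max((p for p in self.mounts if path.startswith(p)), key=len)
        return self.mounts[prefix].read(path[len(prefix):])


vfs = VFS()
vfs.mount("/", LocalFS({"etc/hosts": b"local"}))
vfs.mount("/mnt/remote/", RemoteFS({"data.txt": b"remote"}))
```

The caller uses vfs.read() for both paths and never needs to know which implementation serves the request.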

III. Directory implementation
Directory allocation and directory management algorithms have a significant impact on file system efficiency, performance, and reliability.

1. Linear list
The directory is stored as a linear list of file names with pointers to their data blocks. This is simple to implement but slow. Creating a file requires a linear search to make sure no existing file has the same name, and deleting a file requires a linear search to find its entry. There are many ways to improve on this, such as caching recently used entries, keeping the list sorted, marking deleted entries as unused, or moving the last entry into the freed slot.
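
A linear-list directory can be sketched as follows. The class and method names are assumptions for illustration, and the entries hold only a name and a first-block number.

```python
class LinearDirectory:
    """Directory kept as an unsorted list of (name, first_block) entries."""

    def __init__(self):
        self.entries = []

    def lookup(self, name):
        # Every lookup is a linear scan over the entries.
        for entry_name, block in self.entries:
            if entry_name == name:
                return block
        return None

    def create(self, name, block):
        # The scan is needed to rule out a duplicate name.
        if self.lookup(name) is not None:
            raise FileExistsError(name)
        self.entries.append((name, block))

    def delete(self, name):
        # Another scan is needed to find the entry to remove.
        for i, (entry_name, _) in enumerate(self.entries):
            if entry_name == name:
                del self.entries[i]
                return
        raise FileNotFoundError(name)
```

Each operation costs time proportional to the number of entries, which is the performance problem noted above.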

2. Hash table
A hash table greatly reduces directory search time, and insertion and deletion are also relatively simple. The implementation must, however, deal with collisions, where two file names hash to the same location.
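
One common way to handle collisions is chaining, where each bucket holds a short list of entries. The sketch below assumes chaining and a fixed bucket count; the names are hypothetical and other collision strategies are equally possible.

```python
class HashDirectory:
    """Directory kept as a hash table with chained buckets."""

    def __init__(self, buckets=8):
        self.buckets = [[] for _ in range(buckets)]

    def _bucket(self, name):
        # Map the file name to one bucket.
        return self.buckets[hash(name) % len(self.buckets)]

    def create(self, name, block):
        bucket = self._bucket(name)
        if any(n == name for n, _ in bucket):
            raise FileExistsError(name)
        bucket.append((name, block))   # colliding names share the bucket

    def lookup(self, name):
        # Only the entries in one bucket are scanned.
        for n, block in self._bucket(name):
            if n == name:
                return block
        return None
```

With a reasonable number of buckets, a lookup scans only a few entries instead of the whole directory.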

IV. Allocation methods
A disk stores many files, so how should space be allocated to them so that disk space is used effectively and files can be accessed quickly? There are three common methods: contiguous, linked, and indexed allocation.
1. Contiguous allocation
Each file occupies a set of contiguous blocks on the disk, so head movement during access is minimal, and both the number of seeks and the seek time are minimal.
However, this method is prone to external fragmentation, and a file is difficult to extend once it has been allocated.
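
The fragmentation problem can be seen in a small first-fit search. First fit is only one possible placement policy and is an assumption here, as are the function name and the list-of-booleans free map.

```python
def first_fit(free_map, length):
    """Return the start of the first run of `length` free blocks, or None.

    free_map[i] is True when block i is free.
    """
    run_start, run_len = 0, 0
    for i, is_free in enumerate(free_map):
        if is_free:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == length:
                return run_start
        else:
            run_len = 0
    return None


# Four blocks are free in total, but never three in a row.
disk = [True, True, False, True, False, True]
print(first_fit(disk, 2))   # 0
print(first_fit(disk, 3))   # None: enough free space, but it is fragmented
```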

2. Linked allocation
Linked allocation solves all the problems of contiguous allocation: each file is a linked list of blocks, and the blocks can be scattered anywhere on the disk. The directory holds pointers to the first and last blocks of each file. The disadvantages are that it is effective only for sequential access and cannot support direct access efficiently, that the pointers take up extra space, and that a lost or damaged pointer can make the rest of the file inaccessible.
Common remedies are to allocate clusters of several blocks (for example, 4 blocks per cluster) instead of single blocks, which reduces the pointer overhead, and to keep the links in a file allocation table (FAT) instead of inside the blocks themselves.
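
A sketch of how a FAT chain is followed, assuming the table maps each block number to the next block of the same file and uses a sentinel for end of file. The block numbers and the END_OF_FILE value are made up for the example.

```python
END_OF_FILE = -1   # assumption: sentinel marking the last block of a file


def file_blocks(fat, first_block):
    """Follow the FAT chain from first_block and list the file's blocks in order."""
    blocks = []
    block = first_block
    while block != END_OF_FILE:
        blocks.append(block)
        block = fat[block]     # the table, not the data block, holds the link
    return blocks


fat = {217: 618, 618: 339, 339: END_OF_FILE}
print(file_blocks(fat, 217))   # [217, 618, 339]
```

Because the links live in one table that can be cached in memory, reaching block i of a file no longer requires reading the i-1 data blocks before it.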

3. Indexed allocation
Linked allocation scatters the pointers across the disk along with the blocks, whereas indexed allocation brings all the pointers together. Each file has its own index block, which is an array of disk block addresses: entry i points to block i of the file, so block i can be accessed directly through index entry i.
Indexed allocation has no external fragmentation. The question is how large the index block should be. Every file needs an index block, so it should be as small as possible, but if it is too small it cannot hold enough pointers for a large file. Mechanisms for dealing with this include linking several index blocks together, multilevel indexes, and combined schemes such as the one used by Linux.
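
The trade-off can be made concrete with a little arithmetic. The sketch assumes 4 KB blocks and 4-byte block pointers, which are example values rather than figures from the text; the function names are hypothetical.

```python
def max_file_blocks(block_size, pointer_size, levels):
    """Data blocks addressable through an index that is `levels` deep."""
    pointers_per_block = block_size // pointer_size
    return pointers_per_block ** levels


def locate(logical_block, pointers_per_block):
    """Split a logical block number into (outer, inner) slots of a two-level index."""
    return divmod(logical_block, pointers_per_block)


print(max_file_blocks(4096, 4, 1))   # 1024 blocks, i.e. a 4 MB file at most
print(max_file_blocks(4096, 4, 2))   # 1048576 blocks, i.e. a 4 GB file at most
print(locate(1500, 1024))            # (1, 476)
```

A single index block limits the file size sharply, while a second level raises the limit by a factor equal to the number of pointers per block at the cost of one extra lookup.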

4. Performance
Which allocation method is best depends on how the system is used. A system with mostly sequential access should not use the same method as a system with mostly random access.

V. Free space management
Free space consists of blocks that have never been used and blocks released by deleted files. To keep track of free disk space, so that it can be allocated when files are created and reclaimed when files are deleted, the system maintains a free-space list, which can take several forms:

1. Bit vector
Each disk block is represented by one bit: 0 means allocated and 1 means free. The advantage is that finding the first free block, or a run of consecutive free blocks, is relatively simple and efficient, because computers provide bit-manipulation instructions. The disadvantage is memory consumption: a 40 GB disk with 1 KB blocks needs about 5 MB to store the bitmap.
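
Both the search and the size estimate can be sketched briefly. The sketch assumes the bitmap is held as a list of 32-bit words in which the least significant bit of each word stands for the lowest-numbered block; that layout and the function names are assumptions.

```python
def first_free(bitmap_words, word_bits=32):
    """Return the number of the first free block (bit value 1), or None."""
    for word_index, word in enumerate(bitmap_words):
        if word != 0:                       # skip words with no free block
            lowest_set = word & -word       # isolate the lowest 1 bit
            offset = lowest_set.bit_length() - 1
            return word_index * word_bits + offset
    return None


def bitmap_bytes(disk_bytes, block_bytes):
    """Bytes needed for a bitmap with one bit per block."""
    return disk_bytes // block_bytes // 8


print(first_free([0, 0b1000]))              # 35
print(bitmap_bytes(40 * 2**30, 2**10))      # 5242880 bytes, i.e. 5 MB
```

Whole words equal to zero are skipped in one comparison, which is why the search is fast when the bitmap fits in memory.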

2. Linked list
All free blocks are linked together in a list. A pointer to the first free block is stored in a special location on disk and cached in memory; each free block points to the next one, and so on. Traversing the whole list is inefficient because every block must be read. However, traversal is rarely needed: usually the operating system simply needs one free block for a file, and it takes the first block in the list.
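
A sketch of the linked free list, with a dictionary standing in for the next-pointer that would be stored inside each free block on a real disk. The class name and methods are hypothetical.

```python
class FreeList:
    """Free blocks chained together; only the head is needed to allocate."""

    def __init__(self, free_blocks):
        self.next_free = {}     # block -> next free block (None at the end)
        self.head = None
        for block in reversed(free_blocks):
            self.next_free[block] = self.head
            self.head = block

    def allocate(self):
        # Take the first block and advance the head to its successor.
        if self.head is None:
            raise RuntimeError("no free blocks")
        block = self.head
        self.head = self.next_free.pop(block)
        return block

    def release(self, block):
        # A freed block is pushed onto the front of the list.
        self.next_free[block] = self.head
        self.head = block


free = FreeList([2, 3, 8])
print(free.allocate())   # 2
```

Allocation and release touch only the head of the list, so the costly full traversal is avoided in the common case.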

3. Grouping
A refinement of the linked list. The first free block stores the addresses of n free blocks. The first n-1 of these are actually free, and the last one holds the addresses of another n free blocks, and so on. This makes it possible to find the addresses of a large number of free blocks quickly.

4. Counting
Each entry records the address of the first free block and the number of contiguous free blocks that follow it. Every entry in the free-space list therefore holds a disk address and a count. An entry is larger than a plain address, but the list as a whole is shorter as long as the runs are usually longer than one block. This suits cases where several contiguous blocks are allocated or freed at the same time.
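
The counting representation amounts to run-length encoding of the free block numbers. A minimal sketch, assuming the input is a sorted list of free block numbers; the function name is hypothetical.

```python
def to_runs(free_blocks):
    """Compress sorted free block numbers into (start, count) entries."""
    runs = []
    for block in free_blocks:
        if runs and runs[-1][0] + runs[-1][1] == block:
            # The block extends the current run by one.
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)
        else:
            # The block starts a new run.
            runs.append((block, 1))
    return runs


print(to_runs([2, 3, 4, 5, 8, 9, 17]))   # [(2, 4), (8, 2), (17, 1)]
```

Seven free blocks are described by three entries here; the saving grows with the length of the runs.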

VI. Efficiency and performance
Disks are the slowest of a computer's major components and often become the bottleneck for system performance.
1. Efficiency
Efficient use of disk space depends mainly on the disk allocation and directory management algorithms. Relevant choices include whether to preallocate index structures, whether to use large or small clusters depending on file size, what data to keep in a file's directory entry, and how large file pointers should be.

2. Performance
Even after the basic file system algorithms have been chosen, performance can still be improved in several ways: disk caching, page caching (through virtual memory), asynchronous writes, read-ahead, and so on.
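
As an illustration of disk caching, the sketch below keeps recently read blocks in memory and evicts the least recently used one when full. LRU replacement is an assumption (the text does not name a policy), and read_block is a hypothetical stand-in for a real disk read.

```python
from collections import OrderedDict


class BlockCache:
    """Small in-memory cache of disk blocks with LRU replacement."""

    def __init__(self, capacity, read_block):
        self.capacity = capacity
        self.read_block = read_block   # hypothetical function: block number -> data
        self.blocks = OrderedDict()    # oldest entry first
        self.disk_reads = 0

    def get(self, number):
        if number in self.blocks:
            # Cache hit: mark the block as most recently used.
            self.blocks.move_to_end(number)
            return self.blocks[number]
        # Cache miss: go to the disk and remember the result.
        data = self.read_block(number)
        self.disk_reads += 1
        self.blocks[number] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)   # evict the least recently used
        return data


cache = BlockCache(2, lambda n: n * 10)
for n in (1, 2, 1, 3, 1):
    cache.get(n)
print(cache.disk_reads)   # 3: the repeated reads of block 1 were served from memory
```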

VII. Recovery
Files and directories are kept both in memory and on disk, so care must be taken that a system failure does not cause data loss or inconsistency.
1. Consistency checking
Because of caching and asynchronous writes, changes to files and directories may be lost when the computer crashes, leaving the file system in an inconsistent state. A consistency checker is often run when the system restarts; it looks for inconsistencies and attempts to fix them. How much it can repair depends on the file allocation method.
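
One check such a tool might perform is to compare the blocks claimed by files with the free-space list. The sketch below is an assumed, simplified example of that single check, with hypothetical names; real checkers examine far more than this.

```python
def check_blocks(total_blocks, files, free_blocks):
    """Report blocks whose allocation state is inconsistent.

    files maps a file name to the list of block numbers it claims.
    """
    owners = {}
    for name, blocks in files.items():
        for block in blocks:
            owners.setdefault(block, []).append(name)
    free = set(free_blocks)
    return {
        # Neither in a file nor on the free list: space has leaked.
        "missing": [b for b in range(total_blocks) if b not in owners and b not in free],
        # Claimed by a file but also listed as free: data could be overwritten.
        "used_and_free": sorted(b for b in owners if b in free),
        # Claimed by more than one file.
        "multiply_used": sorted(b for b, names in owners.items() if len(names) > 1),
    }


report = check_blocks(6, {"a": [0, 1], "b": [1, 2]}, [2, 3])
print(report)
# {'missing': [4, 5], 'used_and_free': [2], 'multiply_used': [1]}
```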

2. Backup and Recovery
The safest protection is to back up data regularly so that it can be restored after a failure.

VIII. Log-structured file systems
These file systems apply logging techniques similar to those of databases to deal with system crashes, inconsistencies, and related problems.
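
The idea can be sketched as a write-ahead log that is replayed after a crash, applying only the transactions that reached their commit record. The record layout and the function name are assumptions for illustration, not the format of any real file system.

```python
def recover(disk, log):
    """Replay committed transactions from the log onto disk (block -> data).

    Log records are ("write", txid, block, data) or ("commit", txid).
    """
    pending = {}
    for record in log:
        kind, txid = record[0], record[1]
        if kind == "write":
            _, _, block, data = record
            pending.setdefault(txid, []).append((block, data))
        elif kind == "commit":
            # Only a committed transaction is applied, and all of it at once.
            for block, data in pending.pop(txid, []):
                disk[block] = data
    # Writes of uncommitted transactions are simply dropped.
    return disk


log = [
    ("write", 1, 10, "A"),
    ("write", 1, 11, "B"),
    ("commit", 1),
    ("write", 2, 10, "X"),   # the crash happened before transaction 2 committed
]
print(recover({}, log))      # {10: 'A', 11: 'B'}
```

Each transaction is applied either completely or not at all, so the structures on disk stay consistent without a full scan.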

IX. NFS
NFS is a widely used network file system. It is a software implementation and specification for accessing remote files over a local area network (or even WAN).

1. Overview
NFS treats a set of interconnected workstations as a set of independent machines with independent file systems, and allows those file systems to be shared transparently. Sharing follows the client-server model, and a machine can be both a client and a server. Sharing is allowed between any pair of machines.
To access a remote file system, a machine must first mount the remote directory locally. Once mounted, the remote directory is integrated seamlessly into the local file system, replacing the subtree under the local directory; the local directory becomes the name of the root of the newly mounted directory. The NFS specification has two parts: the mount protocol and the remote file access protocol.

2. Mount protocol
The mount protocol establishes the initial logical connection between a client and a server. A mount operation includes the name of the remote directory to be mounted and the name of the server that stores it. The server maintains an export list naming the file systems it exports for mounting, which may include access permissions.

3. NFS Protocol
The NFS protocol provides a set of RPCs for remote file operations, including:
Searching for a file within a directory
Reading a set of directory entries
Manipulating links and directories
Accessing file attributes
Reading and writing files

A notable feature of NFS servers is that they are stateless: every request must supply a full set of arguments. Each request also carries a sequence number, so the server can detect duplicated or missing requests. A server crash and recovery should be invisible to the client, so data written by a client must be committed to the server's disk synchronously before the server replies; non-volatile caches are used to reduce the cost of this.
A single NFS write is atomic, but NFS provides no concurrency control, so a service above NFS must provide locking.
NFS is integrated into the operating system through VFS, and the client and server are peers: a machine can act as either.
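
What stateless means for reads and writes can be shown with a toy server that keeps no open-file table or file position: the client names the file, the offset, and the data on every call. The class name and the integer file handle are assumptions, and this is not the real NFS wire format.

```python
class StatelessServer:
    """Toy file server that keeps no per-client state between requests."""

    def __init__(self, files):
        self.files = files   # file handle -> bytearray with the contents

    def read(self, handle, offset, count):
        # Everything needed to serve the request is in the arguments.
        return bytes(self.files[handle][offset:offset + count])

    def write(self, handle, offset, data):
        buf = self.files[handle]
        if len(buf) < offset:
            buf.extend(b"\0" * (offset - len(buf)))
        buf[offset:offset + len(data)] = data
        return len(data)


server = StatelessServer({1: bytearray(b"hello world")})
server.write(1, 6, b"there")
server.write(1, 6, b"there")      # a retransmitted request leaves the same result
print(server.read(1, 0, 11))      # b'hello there'
```

Because a repeated request leaves the file in the same state, a client can simply resend after a timeout without knowing whether the server crashed in between.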

4. Path name translation
Path name translation parses a path name into its separate directory entries, or components. A separate NFS lookup call is made for each pair of component name and directory vnode; once a mount point has been crossed, each component lookup causes a separate RPC to the server. Clients can cache remote directory information to speed this up, and a cached entry is refreshed when the attributes returned by the server no longer match the cached attributes.
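
The component-by-component walk can be sketched as below. The lookup function stands in for the NFS lookup call, and the handles and directory tree are made up for the example.

```python
def resolve(path, lookup, root):
    """Resolve a path one component at a time.

    lookup(directory_handle, name) returns the handle of `name` in that directory.
    """
    handle = root
    for component in path.strip("/").split("/"):
        if component:                      # ignore empty components
            handle = lookup(handle, component)
    return handle


# Hypothetical directory tree: (directory handle, name) -> handle.
tree = {("root", "usr"): "h1", ("h1", "local"): "h2"}
calls = []


def lookup(directory, name):
    calls.append(name)                     # one call per component
    return tree[(directory, name)]


print(resolve("/usr/local", lookup, "root"))   # h2
print(calls)                                   # ['usr', 'local']
```

One call per component is expensive when each call is an RPC, which is why caching directory information on the client pays off.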

5. Remote operation
NFS follows the remote-service paradigm, but uses buffering and caching techniques to improve performance.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
