Disk, Partition, and Linux File Systems

Last Update: 2015-05-22 Source: Internet

Author: User


1. Basic Disk Knowledge

1.1 Physical Structure

A hard disk is typically made up of the platters, the read/write heads, the motors, the main control chip, and the cabling. The spindle motor spins the platters, while a second motor (the actuator) moves the head assembly to the required track and selects which side of which platter to read. A head held over a spinning platter traces a concentric circle called a track. The head senses the magnetic signal on the platter surface and uses the timing or data gaps defined by the manufacturer to locate sectors and read their data. All platters are fixed on one rotating shaft, the spindle, and are exactly parallel to each other. Each recording surface has its own head, and the gap between a head and its platter is smaller than the diameter of a human hair. All heads are attached to a head controller, which moves them together. The heads move along the radius of the platters while the platters spin at several thousand to more than ten thousand revolutions per minute, so a head can reach any position on the platter to read or write data.

Track

When the disk rotates while a head stays in one position, the head traces a circle on the disk surface called a track. Data is recorded along the track as a series of magnetic pulses. A track is not one continuous recording; it is divided into arc-shaped segments called sectors, which all pass under the head at the same angular velocity.

Cylinder

In a disk with several platters, the tracks with the same radius on all surfaces together form a cylinder. Heads are numbered from top to bottom, starting at 0. Data is read and written cylinder by cylinder: the drive starts with head 0 in a cylinder and works through the heads on the different surfaces of that same cylinder in turn, and only after every head in the cylinder has finished does the head assembly move to the next cylinder. The reason is that switching to a different head is done electronically, while moving to another cylinder requires mechanical movement. Electronic switching is much faster than mechanically moving the heads to an adjacent track, so data is laid out by cylinder rather than by surface. In other words, once a track is full, writing continues on the next surface of the same cylinder, and only when the whole cylinder is full does writing move on to the next cylinder. Data is read in the same order, which improves the drive's read/write efficiency.

Sector

Each track on the disk is divided into segments called sectors. The first sector of a hard disk is called the boot sector. The operating system stores data on the hard disk in units of sectors; each sector traditionally holds 512 bytes of data plus some additional information.

Head

Each platter has two surfaces (sides), the upper and the lower, and normally both are used to store data. The surface number is also called the head number, because each usable surface has its own read/write head. In Linux, fdisk -l shows a disk's geometry. For the example disk it reports 255 heads (255 surfaces), 3,263 cylinders (3,263 tracks per surface), and 63 sectors per track, with a sector size of 512 bytes. (These CHS values are the logical geometry the disk reports for compatibility; a real drive has far fewer than 255 surfaces.) The capacity is therefore 255 surfaces × 3,263 cylinders × 63 sectors × 512 bytes per sector = 26,839,088,640 bytes, about 26.8 GB, which matches the total size of the disk.
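The capacity arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the function name chs_capacity_bytes is hypothetical, and the inputs are the example geometry reported by fdisk -l.

```python
def chs_capacity_bytes(heads: int, cylinders: int, sectors_per_track: int,
                       sector_size: int = 512) -> int:
    """Capacity = heads (surfaces) x cylinders x sectors per track x bytes per sector."""
    return heads * cylinders * sectors_per_track * sector_size


if __name__ == "__main__":
    size = chs_capacity_bytes(heads=255, cylinders=3263, sectors_per_track=63)
    print(size)                        # 26839088640
    print(f"{size / 1e9:.1f} GB")      # 26.8 GB (decimal units)
    print(f"{size / 2**30:.1f} GiB")   # 25.0 GiB (binary units)
```

The binary-unit figure explains why some tools may report the same disk as roughly 25 GB rather than 26.8 GB.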

1.2 Disk Read and Write Principles

The system stores a file on disk in cylinder, head, sector order: it first fills all sectors under the first head of the first cylinder (that is, the first track of the first surface), then moves to the next head in the same cylinder, and so on; when a cylinder is full it moves on to the next cylinder, until the whole file has been written. Data is read back in the same order. To read data, the system gives the disk controller the cylinder number, head number, and sector number of the sector (the three components of its physical address). The controller steps the head assembly directly to the corresponding cylinder, selects the corresponding head, and waits for the required sector to rotate under it. As sectors pass, the controller reads each sector header and compares its address information with the expected head and cylinder numbers (verifying the seek), then looks for the required sector number. When the controller finds the right sector header, it either switches on the write circuit or reads the data and trailer, depending on whether the task is a write or a read. After a sector has been found, the controller must post-process that sector before looking for the next one. For a read, the controller computes the ECC of the data and compares it with the ECC recorded on disk. For a write, it computes the ECC of the data and stores it together with the data. The disk keeps rotating while the controller does this processing.
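The cylinder, head, sector fill order described above is exactly what the standard CHS-to-LBA formula encodes: LBA = (C × heads + H) × sectors_per_track + (S - 1). The sketch below assumes the 255-head, 63-sector logical geometry from the fdisk example; the article itself does not discuss LBA, so treat this as an illustration of the ordering rather than of how any particular controller works.

```python
from __future__ import annotations

HEADS = 255               # logical heads (surfaces) per cylinder
SECTORS_PER_TRACK = 63    # sectors are numbered 1..63 within a track


def chs_to_lba(c: int, h: int, s: int) -> int:
    """Linear sector index; consecutive LBAs fill a cylinder head by head."""
    if not (0 <= h < HEADS and 1 <= s <= SECTORS_PER_TRACK):
        raise ValueError("head or sector out of range")
    return (c * HEADS + h) * SECTORS_PER_TRACK + (s - 1)


def lba_to_chs(lba: int) -> tuple[int, int, int]:
    """Inverse mapping: split the index back into cylinder, head, sector."""
    c, rem = divmod(lba, HEADS * SECTORS_PER_TRACK)
    h, s0 = divmod(rem, SECTORS_PER_TRACK)
    return c, h, s0 + 1


if __name__ == "__main__":
    for lba in (62, 63, 16064, 16065):
        print(lba, lba_to_chs(lba))
    # 62 (0, 0, 63)       last sector under head 0
    # 63 (0, 1, 1)        next head, same cylinder
    # 16064 (0, 254, 63)  last sector of cylinder 0
    # 16065 (1, 0, 1)     only now does the arm move to cylinder 1
```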
In practice, many files are fragmented. If a file is stored contiguously, the arm only has to seek to its track once and the head can read the file in a single pass. If the file is split into 11 fragments, the arm has to seek 11 times and the head has to perform 11 reads, so reading the file takes much longer than for an unfragmented file. In general, one disk I/O consists of the following steps:

Step 1: move the head radially to the track where the data is located. This time is called the seek time.
Step 2: once the head is on the target track, wait for the platter to rotate the target sector directly under the head. This time is the rotational delay.
Step 3: read or write the data in the target sector (the transfer time). This completes one disk I/O.

Therefore, the time for a single disk I/O = seek time + rotational delay + transfer time.

Rotational delay: servers commonly use 10,000 RPM disks. One revolution takes 60 × 1000 / 10,000 = 6 ms, so the rotational delay is between 0 and 6 ms (3 ms on average).
Transfer time: usually short, a fraction of a millisecond.
Seek time: typically 3-15 ms on modern disks, depending mainly on the distance between the head's current position and the target track.
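The I/O time formula can be turned into a rough estimate. The numbers used here (a 10,000 RPM disk, a 5 ms seek, a 0.1 ms transfer) are assumptions picked from the ranges quoted above, the average rotational delay is taken as half a revolution, and each file fragment is assumed to cost one full I/O, so the output is only an order-of-magnitude illustration of why fragmentation hurts.

```python
def rotation_time_ms(rpm: float) -> float:
    """Time for one full revolution in milliseconds: 60 * 1000 / rpm."""
    return 60 * 1000 / rpm


def io_time_ms(seek_ms: float, rpm: float, transfer_ms: float) -> float:
    """Seek + average rotational delay (half a revolution) + transfer."""
    return seek_ms + rotation_time_ms(rpm) / 2 + transfer_ms


def file_read_ms(fragments: int, seek_ms: float = 5.0, rpm: float = 10_000,
                 transfer_ms: float = 0.1) -> float:
    """Assumes every fragment needs its own seek, rotation, and transfer."""
    return fragments * io_time_ms(seek_ms, rpm, transfer_ms)


if __name__ == "__main__":
    print(rotation_time_ms(10_000))          # 6.0 ms per revolution
    print(round(file_read_ms(1), 1))         # 8.1 ms: contiguous file, one I/O
    print(round(file_read_ms(11), 1))        # 89.1 ms: 11 fragments, 11 I/Os
```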

Two candidate partitioning schemes:

Scheme 1: partition by surface (the disk has 255 surfaces): drive C uses surfaces 0-100, drive D uses surfaces 101-200, ...
Scheme 2: partition by cylinder (the disk has 3,263 cylinders): drive C uses cylinders 0-1000, drive D uses cylinders 1001-2000, ...

Which scheme is used comes down to performance. Data in the same partition is often read together. With scheme 1, each partition spans every cylinder on its surfaces, so the head may have to travel across more than 3,000 tracks, which increases seek time and degrades performance. With scheme 2, accessing drive C only requires the head to move among tracks 0-1000, which greatly reduces seek time. (In practice a partition does not start at cylinder 0: the beginning of the disk holds the boot loader and the partition table.) Because scheme 2 reduces the seek-time component of disk I/O, operating systems use scheme 2, and none uses scheme 1.
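To put a number on the seek-distance argument, the sketch below computes the average distance the head travels between two accesses when both land uniformly at random inside the tracks a partition spans. That uniform-access model is a simplifying assumption; real workloads are not uniform.

```python
def mean_distance(n_tracks: int) -> float:
    """Exact mean of |i - j| for i, j uniform over 0..n_tracks-1, i.e. (n^2 - 1) / (3n)."""
    return (n_tracks ** 2 - 1) / (3 * n_tracks)


def mean_distance_brute(n_tracks: int) -> float:
    """Brute-force check of the closed form (only practical for small n)."""
    total = sum(abs(i - j) for i in range(n_tracks) for j in range(n_tracks))
    return total / n_tracks ** 2


if __name__ == "__main__":
    assert abs(mean_distance(30) - mean_distance_brute(30)) < 1e-9
    # Scheme 1: a partition made of whole surfaces spans all 3,263 cylinders.
    print(round(mean_distance(3263), 1))   # 1087.7 tracks
    # Scheme 2: drive C is cylinders 0-1000, i.e. 1,001 cylinders.
    print(round(mean_distance(1001), 1))   # 333.7 tracks
```

Under this model, confining a partition to about a third of the cylinders cuts the average head travel to about a third. Seek time is not strictly proportional to distance (there is a fixed settling overhead), so the time saving is somewhat smaller than the distance ratio.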

2. Disk Naming and Partitioning under Linux

Before adding a hard disk to a host, you should first understand how disks and partitions are named under Linux.

2.1 Disk Naming

Under Linux, SCSI and SATA devices are named sd*: the first is sda, the second is sdb, and so on, in the order the kernel detects them. IDE devices are named hd*. A motherboard traditionally has two IDE channels (primary and secondary), each with a master and a slave device, so up to four IDE devices can be installed: the two devices on the primary channel are hda and hdb, and the two on the secondary channel are hdc and hdd. The hard disk is usually attached to the primary channel, so it is hda or hdb, and the optical drive is usually attached to the secondary channel, so it is hdc.

IDE disk	Description	Configuration
/dev/hda	1st (primary) IDE controller	Master
/dev/hdb	1st (primary) IDE controller	Slave
/dev/hdc	2nd (secondary) IDE controller	Master
/dev/hdd	2nd (secondary) IDE controller	Slave
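The naming rules can be expressed as a small function. This sketch assumes the classic IDE layout from the table (two channels, master before slave) and models the sd* lettering continuing past sdz as sdaa, sdab, and so on, which is how the Linux sd driver names disks beyond the 26th; the function names are hypothetical.

```python
import string


def disk_letters(index: int) -> str:
    """Bijective base-26: 0 -> 'a', 25 -> 'z', 26 -> 'aa', 27 -> 'ab', ..."""
    letters = ""
    index += 1
    while index > 0:
        index, rem = divmod(index - 1, 26)
        letters = string.ascii_lowercase[rem] + letters
    return letters


def sd_name(index: int) -> str:
    """Name of the index-th detected SCSI/SATA disk (0-based)."""
    return "sd" + disk_letters(index)


def ide_name(channel: int, slave: bool) -> str:
    """channel 0 = primary, 1 = secondary; the master comes before the slave."""
    return "hd" + disk_letters(channel * 2 + int(slave))


if __name__ == "__main__":
    print([sd_name(i) for i in (0, 1, 25, 26)])                 # ['sda', 'sdb', 'sdz', 'sdaa']
    print([ide_name(c, s) for c in (0, 1) for s in (False, True)])  # ['hda', 'hdb', 'hdc', 'hdd']
```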

2.2 Partition Naming

Partitioning means telling the operating system "on this disk, the region from cylinder A to cylinder B is usable", so that it knows it can read, write, and search file data within that region. In other words, a partition is defined by its starting and ending cylinders. Partitions are named with the device name plus a number; for example, hda1 is the first partition on the disk hda. An MBR disk can have at most four primary partitions, numbered 1-4 (one of them may be an extended partition). Among the primary partitions, only one can be marked active as the boot partition. Logical partitions, which live inside the extended partition, are numbered starting from 5.

2.3 Partitioning

Step 1. Run fdisk to partition the disk. In the screenshot, the first and second disks are already partitioned, and the third disk has no partitions yet.

    [root]# fdisk /dev/sdb
    Command (m for help): m          (enter "m" to list the commands)
    Command action
       a   toggle a bootable flag
       b   edit bsd disklabel
       c   toggle the dos compatibility flag
       d   delete a partition
       l   list known partition types
       m   print this menu
       n   add a new partition
       o   create a new empty DOS partition table
       p   print the partition table
       q   quit without saving changes
       s   create a new empty Sun disklabel
       t   change a partition's system id
       u   change display/entry units
       v   verify the partition table
       w   write table to disk and exit
       x   extra functionality (experts only)

    Command (m for help): n
    Command action
       e   extended
       p   primary partition (1-4)
    p
    Partition number (1-4): 1
    First cylinder (1-9729, default 1):
    Using default value 1
    Last cylinder, +cylinders or +size{K,M,G} (1-9729, default 9729):
    Using default value 9729

    Command (m for help): w          (write the partition table and exit)
    [root]# mkfs.ext4 -L Disk2 /dev/sdb1
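fdisk stores the partitions it creates in the partition table of the disk's first sector. For a classic MBR, that sector holds 446 bytes of boot code, four 16-byte partition entries starting at offset 446 (which is why there are at most four primary partitions), and the signature bytes 0x55 0xAA at offsets 510-511. The sketch below parses those four entries; the sample sector is synthetic, and slot numbers 1-4 correspond to names like sdb1-sdb4. Logical partitions (numbers 5 and up) are described by a chain inside the extended partition and are not parsed here.

```python
import struct

# One 16-byte MBR entry: status, CHS of first sector, type, CHS of last sector,
# first sector as LBA, number of sectors (little-endian).
ENTRY = struct.Struct("<B3sB3sII")
MBR_SIGNATURE = b"\x55\xaa"


def parse_mbr(sector: bytes) -> list:
    """Return the used primary-partition slots (1-4) of a 512-byte MBR sector."""
    if len(sector) < 512 or sector[510:512] != MBR_SIGNATURE:
        raise ValueError("not an MBR sector")
    parts = []
    for i in range(4):
        status, _, ptype, _, start, count = ENTRY.unpack_from(sector, 446 + 16 * i)
        if ptype != 0:                       # type 0x00 marks an empty slot
            parts.append({
                "number": i + 1,             # e.g. sdb1 .. sdb4
                "bootable": status == 0x80,  # the "active" flag
                "type": hex(ptype),          # 0x83 = Linux, 0x05 = extended
                "start_lba": start,
                "sectors": count,
            })
    return parts


if __name__ == "__main__":
    mbr = bytearray(512)
    mbr[510:512] = MBR_SIGNATURE
    # Synthetic partition 1: Linux (0x83), starting at LBA 2048.
    ENTRY.pack_into(mbr, 446, 0x00, bytes(3), 0x83, bytes(3), 2048, 1_000_000)
    print(parse_mbr(bytes(mbr)))
```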

There are several reasons to divide a disk into multiple partitions:

The system can be reinstalled without losing data: for example, if /home is a separate partition, it can simply be mounted at /home again after the system is reinstalled, and none of its data is lost.
Each mount point can use a file system suited to it for better performance, for example ReiserFS for /var and XFS or ext4 for /home.
Different mount points can use different mount options, such as whether writes are synchronous, whether journaling is enabled, or whether compression is enabled.
A single large partition means a large search range and lower efficiency.
Disk quotas can only be set per partition.
/home, /var, and /usr/local are often separate partitions because they are modified frequently and are prone to fragmentation.

Step 2. Format the partition:

    mkfs -t ext3 /dev/sda1

Each drive is divided into partitions, and each partition has its own file system. Windows assigns a drive letter to each of these file systems. GNU/Linux instead uses a single tree structure to manage files, and each file system is mounted somewhere in that tree. Just as Windows needs a C: drive, GNU/Linux must mount a root file system at the root (/) of the file tree. Once the root is mounted, additional file systems can be mounted at mount points elsewhere in the tree. Any directory under the root can serve as a mount point, and the same file system can be mounted at several mount points at the same time. A mount point is thus the entry directory through which a disk file system is accessed in Linux.

Three easily confused concepts related to file systems:

Create: formatting a disk in a particular way is the process of building a file system on it. Creating a file system writes the file system's control information to specific locations on the disk.
Register: the file system type is reported to the kernel, declaring that the kernel supports it. Registration usually happens when the kernel is compiled, or it can be done manually by loading a module. Registration is essentially the instantiation of struct file_system_type, the data structure that represents each actual file system type.
Install: the familiar mount operation, which attaches the file system to the Linux root directory tree so that the file system can be accessed.
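The "install" step is what makes a path resolve to a particular file system. As a toy illustration, the sketch below picks the mount point that is the deepest whole-component ancestor of a path; the mount table is hypothetical (only /dev/sda1 on /test comes from the article), and the real kernel walks its mount structures during path lookup rather than doing string manipulation.

```python
import posixpath

# Hypothetical mount table: mount point -> device.
MOUNTS = {
    "/": "/dev/sdb1",
    "/home": "/dev/sdb2",
    "/test": "/dev/sda1",   # from the mount example in the article
}


def device_for(path: str, mounts: dict = MOUNTS) -> str:
    """Return the device whose mount point is the deepest ancestor of path."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path")
    path = posixpath.normpath("/" + path.lstrip("/"))   # collapse leading slashes
    while True:
        if path in mounts:
            return mounts[path]
        if path == "/":
            raise LookupError("no root file system mounted")
        path = posixpath.dirname(path)                   # drop the last component


if __name__ == "__main__":
    print(device_for("/home/sammy/notes.txt"))   # /dev/sdb2
    print(device_for("/homework/a.txt"))         # /dev/sdb1 (whole components only)
```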

Everything is a file under Linux. In other words, Linux manages everything in the system as a file. Hardware that Windows presents as devices (printers, network cards, sound cards, ...), disk partitions, and so on are all treated as files in Linux, so accessing a device or a partition means reading and writing the corresponding file.

Format commands:

    mkfs.ext3 /dev/sdb1    # format the partition as ext3
    mkfs.ext2 /dev/sdb1    # format the partition as ext2

Step 3. Mount the partition:

    mount /dev/sda1 /test

The df command shows the total capacity, used space, and free space of mounted file systems. It can be run without arguments, in which case sizes are shown in 1K blocks. The du command shows how much space a directory occupies.

Step 4. Mount automatically at boot: edit /etc/fstab and add the line

    /dev/sda1 /test ext3 defaults 1 2

After a reboot, the partition is mounted automatically. The sixth field is the fsck pass number; the fstab(5) manual page recommends 1 for the root file system and 2 for other file systems.

Summary

The mount point must be a directory.
A partition is mounted on an existing directory. The directory does not have to be empty, but its previous contents are hidden while the partition is mounted. The same applies when mounting file systems created by other operating systems. After unmounting, the directory's original files are visible again; nothing is lost.
A directory itself occupies only one inode on disk, which stores information such as its attributes.
Any partition must be mounted on a directory.
A directory is a logical division; a partition is a physical division.
A Linux disk partition must be mounted on a specific directory in the directory tree before it can be read or written.
The root directory is where all Linux files and directories reside, and the partition that holds it must be mounted first.
A partition can be mounted on multiple directories, but a directory can be the mount point of only one partition at a time.
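The automatic mount in step 4 relies on the six whitespace-separated fields of an /etc/fstab line: device, mount point, file system type, mount options, dump frequency, and fsck pass number. The sketch below splits a line into those fields using the names from the fstab(5) manual page; it is a simplified parser (it requires the first four fields and ignores escapes such as \040 in paths).

```python
FIELDS = ("fs_spec", "fs_file", "fs_vfstype", "fs_mntops", "fs_freq", "fs_passno")


def parse_fstab_line(line: str):
    """Return a dict of the six fstab fields, or None for blank/comment lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    parts = line.split()
    if not 4 <= len(parts) <= 6:
        raise ValueError(f"expected 4 to 6 fields, got {len(parts)}: {line!r}")
    parts += ["0"] * (6 - len(parts))   # fs_freq and fs_passno default to 0
    entry = dict(zip(FIELDS, parts))
    entry["fs_mntops"] = entry["fs_mntops"].split(",")
    entry["fs_freq"] = int(entry["fs_freq"])
    entry["fs_passno"] = int(entry["fs_passno"])
    return entry


if __name__ == "__main__":
    print(parse_fstab_line("/dev/sda1 /test ext3 defaults 1 2"))
```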

3. Linux File System

A file system is a mechanism for organizing data and metadata on a storage device. Its ultimate goal is to organize large amounts of data on persistent storage devices such as hard disks. A file system is the logical organization of files: it stores individual files in an orderly way. The data itself lives in a partition.

[Figure: layout of a typical Linux partition]

To a file system, a file is the unit in which data is organized. The file system uses directories to organize files into a hierarchy, and the key to implementing this hierarchy on disk is the inode, which represents both regular files and directory files as objects. In Linux, a directory is also a file, so /home/sammy is the absolute path of the directory file sammy.

[Figure: disks and file systems]

3.1 Inode

Inodes are the key to implementing file storage. In Linux, every object (file or directory) managed by a file system is represented by an inode. The inode contains all the metadata needed to manage the object in the file system, including the operations that can be performed on it. In a Linux system, a file may be split into several data blocks stored in a partition. To gather those blocks, we need the file's inode: each file corresponds to one inode, which contains pointers to the data blocks that belong to the file. When the operating system needs to read the file, it only has to find the corresponding inode and collect the scattered data blocks to reassemble the file.

Reading a file: in Linux, a file is located by parsing its path and consulting each directory file along the way. Each directory entry contains a file name and an inode number. When we run cat /var/test.txt, Linux finds the inode number of the var directory file in the root directory file and then assembles var's data using that inode. Next, using the entries in var, it finds the inode number of test.txt and follows the pointers in that inode to collect the data blocks and assemble the data of test.txt. The whole process involves three inodes:

Root directory inode 2: used to find the inode number of var
var directory inode 10747905: used to find the inode number of test.txt
test.txt inode 10749034: used to locate the file's data blocks
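The lookup and creation steps can be modeled with a few dictionaries. In this sketch the inode numbers 2, 10747905, and 10749034 come from the example above, while the block numbers, file contents, 8-byte block size, and allocation policy (next unused number) are made up for illustration; a real file system keeps directory entries and block pointers on disk.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    number: int
    is_dir: bool
    blocks: list = field(default_factory=list)   # pointers to data blocks


# Hypothetical on-disk state, held in dicts for illustration.
DATA_BLOCKS = {100: b"hello, ", 101: b"world\n"}       # block number -> bytes
DIRECTORIES = {2: {"var": 10747905},                    # directory inode -> {name: inode}
               10747905: {"test.txt": 10749034}}
INODES = {2: Inode(2, True),
          10747905: Inode(10747905, True),
          10749034: Inode(10749034, False, [100, 101])}


def lookup(path: str) -> Inode:
    """Walk an absolute path from the root inode (2), one directory entry at a time."""
    inode = INODES[2]
    for name in path.strip("/").split("/"):
        if not name:
            continue
        if not inode.is_dir:
            raise NotADirectoryError(name)
        inode = INODES[DIRECTORIES[inode.number][name]]   # KeyError if name is absent
    return inode


def read_file(path: str) -> bytes:
    """Gather the file's data blocks, in pointer order, into one byte string."""
    return b"".join(DATA_BLOCKS[b] for b in lookup(path).blocks)


def create_file(parent: str, name: str, data: bytes, block_size: int = 8) -> Inode:
    """Take a free inode, record it in the parent directory, then fill free blocks."""
    dir_inode = lookup(parent)
    if not dir_inode.is_dir:
        raise NotADirectoryError(parent)
    inode = Inode(max(INODES) + 1, False)                 # 1. allocate an unused inode
    INODES[inode.number] = inode
    DIRECTORIES[dir_inode.number][name] = inode.number    # 2. add the directory entry
    for off in range(0, len(data), block_size):           # 3. write data to free blocks
        free_block = max(DATA_BLOCKS) + 1
        DATA_BLOCKS[free_block] = data[off:off + block_size]
        inode.blocks.append(free_block)                   #    and point the inode at them
    return inode


if __name__ == "__main__":
    print(read_file("/var/test.txt"))        # b'hello, world\n'
    new = create_file("/var", "new.txt", b"abcdefghij")
    print(new.number, new.blocks)            # 10749035 [102, 103]
```

On a real system, ls -i or Python's os.stat(path).st_ino shows a file's actual inode number.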

Therefore, reading a file means finding the file's inode number in its directory and then following the pointers in that inode to gather the data blocks into memory for further processing. Creating a file means assigning a free inode to it, writing its inode number into the directory it belongs to, then selecting free blocks, pointing the inode's pointers at them, and writing the data from memory into those blocks.

3.2 Loop Devices

In Unix-like systems, a loop device (/dev/loop0, /dev/loop1, and so on) is a pseudo-device that allows a file to be accessed as if it were a block device. Mounting a file that contains a file system on a directory typically takes two steps:

Attach the file to a loop device node.
Mount the loop device on the directory.

Specific steps:

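    # /dev/loop0 and /mnt/loop are example names
    dd if=/dev/zero of=file.img bs=1k count=10000   # create a zero-filled file of 10,000 1 KB blocks
    losetup /dev/loop0 file.img                      # attach the file to a loop device
    mkfs.ext3 /dev/loop0 10000                       # create a file system on the loop device
    mkdir /mnt/loop                                  # create the mount point
    mount -t ext3 /dev/loop0 /mnt/loop               # mount the loop device on the directory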
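The dd step simply produces a file of fixed-size, zero-filled blocks, and a loop device lets the kernel address that file block by block. The sketch below mimics both ideas in user space: block n lives at byte offset n × block size. The helper names are hypothetical, and nothing here touches /dev/loop*.

```python
import os
import tempfile

BLOCK_SIZE = 1024   # matches bs=1k


def make_image(path: str, count: int, bs: int = BLOCK_SIZE) -> None:
    """Like `dd if=/dev/zero of=path bs=1k count=N`: write N zero-filled blocks."""
    zero = bytes(bs)
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(zero)


def write_block(path: str, n: int, data: bytes, bs: int = BLOCK_SIZE) -> None:
    """Overwrite block n in place, padding the data with zeros to a full block."""
    if len(data) > bs:
        raise ValueError("data larger than one block")
    with open(path, "r+b") as f:
        f.seek(n * bs)              # block n starts at byte offset n * bs
        f.write(data.ljust(bs, b"\0"))


def read_block(path: str, n: int, bs: int = BLOCK_SIZE) -> bytes:
    """Read block n (shorter than bs only past the end of the file)."""
    with open(path, "rb") as f:
        f.seek(n * bs)
        return f.read(bs)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        img = os.path.join(d, "file.img")
        make_image(img, 10000)
        print(os.path.getsize(img))        # 10240000
        write_block(img, 42, b"hello")
        print(read_block(img, 42)[:5])     # b'hello'
```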

3.3 Structure of the File System

User space contains the applications (the users of the file system) and the GNU C Library (glibc), which provides the user interface to file system calls (open, read, write, and close). The system call interface acts like a switch, routing system calls from user space to the appropriate endpoints in kernel space. The VFS is the primary interface to the underlying file systems: it is a software abstraction layer in the Linux kernel that exports a common set of interfaces and maps them onto the individual file systems, whose behavior can vary greatly. The VFS keeps two caches of file system objects, the inode cache and the dentry cache, which hold recently used objects. Because of the VFS, Linux allows many different file systems to coexist and supports file operations across file systems; through a set of data structures and their methods, it provides a uniform interface to actual file systems such as ext2 and vfat. Each file system implementation (ext2, JFS, and so on) exports a common set of interfaces for the VFS to use. The buffer cache caches requests between the file systems and the block devices they manage: read and write requests to the underlying device drivers pass through it. Caching requests this way reduces the number of physical device accesses and speeds up access. The sync command flushes the buffer cache to the storage media: it forces all uncommitted data out to the device drivers, which then send it to the storage devices.

3.4 VFS (Virtual File System)

Linux allows many different file systems, such as ext2, ext3, and vfat, to coexist. Any file can be manipulated with the same set of file I/O system calls regardless of the file system format it resides on, and a file operation can even span file systems.
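The dispatch idea behind the VFS can be illustrated with a toy model: callers use one interface, and each mounted file system supplies its own read and write. VfatFS and Ext3FS below are hypothetical in-memory stand-ins (the only behavior they imitate is that FAT names are case-insensitive while ext3 names are case-sensitive), and mounts are flat directories only.

```python
from abc import ABC, abstractmethod


class FileSystem(ABC):
    """The common interface every concrete file system exports to the VFS."""

    @abstractmethod
    def read(self, name: str) -> bytes: ...

    @abstractmethod
    def write(self, name: str, data: bytes) -> None: ...


class VfatFS(FileSystem):
    """FAT-style stand-in: file names are case-insensitive."""

    def __init__(self) -> None:
        self._files: dict = {}

    def read(self, name: str) -> bytes:
        return self._files[name.lower()]

    def write(self, name: str, data: bytes) -> None:
        self._files[name.lower()] = data


class Ext3FS(FileSystem):
    """ext3-style stand-in: file names are case-sensitive."""

    def __init__(self) -> None:
        self._files: dict = {}

    def read(self, name: str) -> bytes:
        return self._files[name]

    def write(self, name: str, data: bytes) -> None:
        self._files[name] = data


class VFS:
    """Routes a path to the file system mounted at its directory (flat mounts only)."""

    def __init__(self) -> None:
        self._mounts: dict = {}

    def mount(self, point: str, fs: FileSystem) -> None:
        self._mounts[point.rstrip("/")] = fs

    def _resolve(self, path: str):
        point, _, name = path.rpartition("/")
        return self._mounts[point], name

    def read(self, path: str) -> bytes:
        fs, name = self._resolve(path)
        return fs.read(name)

    def write(self, path: str, data: bytes) -> None:
        fs, name = self._resolve(path)
        fs.write(name, data)


def cp(vfs: VFS, src: str, dst: str) -> None:
    """Cross-file-system copy: read through one driver, write through another."""
    vfs.write(dst, vfs.read(src))
```

For example, mounting a VfatFS at /mnt/win and an Ext3FS at /mnt/linux, writing /mnt/win/A.TXT, and then calling cp(vfs, "/mnt/win/a.txt", "/mnt/linux/b.txt") reads through the vfat-style driver and writes through the ext3-style one, mirroring the cp example in the next paragraph.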
As shown in the figure, the cp command can copy data from a disk formatted with vfat to a disk formatted with ext3, an operation that involves two different file systems. The VFS calls vfat's read method to read the data of a.txt into memory, maps the in-memory data of a.txt to the memory space for b.txt, and then calls ext3's write method to write b.txt to disk, completing the cross-file-system copy.

"Everything is a file" is one of the basic philosophies of Unix/Linux. Not only regular files, but also directories, character devices, block devices, sockets, and so on are treated as files; although they are of different types, they all share the same set of operating interfaces. When a file is opened, the VFS knows the file system format of the file; when the VFS passes control to the actual file system, that file system distinguishes among file types and performs the appropriate operations for each. This is the essence of "everything is a file".

Reading a file from the physical medium works as follows: when an application calls read(), the sys_read() system call is triggered; sys_read() identifies the file system on which the file resides and passes control to it; finally, that file system interacts with the physical medium and reads the data.

3.5 Linux File System Types

3.5.1 ReiserFS

ReiserFS is a file system format. Linux kernel support for ReiserFS began with version 2.4.1. ReiserFS was the default file system of Novell's SUSE Linux Enterprise until October 12, 2006, when Novell announced that future versions would use ext3 as the default. Compared with ext2 and ext3 under Linux kernel 2.4, ReiserFS is 10 to 15 times faster when handling files smaller than 4 KB (with tail packing enabled).[3] However, some directory operations on ReiserFS, including unlink(2), are not synchronous, which can corrupt data for applications that depend heavily on file-based locks. ReiserFS stores file metadata (stat items), directory entries (directory items), and inode block lists (indirect items) in a single combined B+ tree, each keyed by a unique identifier.

3.5.2 ext2 File System

The ext2 file system (the second extended file system) was designed to overcome the limitations of the Minix file system used in early versions of Linux. It was widely used on Linux for many years. However, ext2 has no journal, and it has now largely been superseded by ext3 and the newer ext4.

3.5.3 ext3 File System

The ext3 file system adds journaling to the standard ext2 file system and is therefore an evolution of a very stable file system. It offers reasonable performance in most cases and is still being improved. Because it adds journaling on top of the proven ext2 format, an existing ext2 file system can be converted to ext3, and converted back if necessary.

3.5.4 ext4 File System

ext4 was developed as an extension of ext3, meeting the needs of larger file systems by raising storage limits and improving performance. To preserve the stability of ext3, these extensions were split off into a new file system, ext4, in June 2006. ext4 was officially released in December 2008 and is included in the 2.6.28 kernel.

3.5.5 vfat File System

vfat (the Linux driver for FAT file systems such as FAT32) has no journaling and lacks many of the features required of a complete Linux file system. Because both Windows and Linux can read it, it is useful for exchanging data between the two systems. Do not use it as a Linux file system unless you need to share data between Windows and Linux.

3.5.6 XFS File System

XFS has journaling, includes some robust features, and is optimized for scalability. XFS is usually quite fast and has led in tests of large-file operations.

3.5.7 IBM JFS File System

IBM's Journaled File System (JFS), currently used on IBM enterprise servers, is designed for high-throughput server environments. It is available for Linux and is included in several distributions. To create a JFS file system, use the mkfs.jfs command.

3.6 Choosing a File System

Until recently, choosing the right next-generation Linux file system was easy: those seeking raw performance tended to use ReiserFS, while those more concerned with data integrity features preferred ext3/4. With the release of XFS for Linux, however, things suddenly became confusing; in particular, people began to wonder whether ReiserFS is still the performance leader among next-generation file systems.

XFS's performance is very close to ReiserFS and exceeds ext3 on most test metrics.
Currently, ReiserFS and ext3 delete files much faster than XFS.

3.7 Creating a File System

Linux uses the mkfs command to create file systems and the mkswap command to create swap space. mkfs is actually a front end to file-system-specific commands, such as mkfs.ext3 for ext3, mkfs.ext4 for ext4, and mkfs.reiserfs for ReiserFS. To find out which file system creation tools are installed on your system, run ls /sbin/mk*.

References:
http://djt.qq.com/article/view/620
http://my.oschina.net/leejun2005/blog/290073
http://vbird.dic.ksu.edu.tw/linux_basic/0230filesystem.php
http://www.cnblogs.com/vamei/p/3506566.html
http://www.ibm.com/developerworks/cn/linux/l-linux-kernel/
http://www.ibm.com/developerworks/cn/linux/l-cn-vfs/
http://www.ibm.com/developerworks/cn/linux/filesystem/l-fs9/
http://zh.wikipedia.org/wiki/reiserfs

This article is an English version of an article originally published in Chinese on aliyun.com.