NTFS file system structure analysis
In the NTFS file system, disk space is allocated in clusters. The cluster size must be an integer multiple of the physical sector size and is always a power of 2. NTFS does not itself deal with sectors or with their size (for example, whether a sector is 512 bytes). When a volume is formatted, the formatting program chooses the cluster size automatically based on the volume size.
NTFS stores files on disk by means of the master file table (MFT). The MFT is a database made up of a series of file records. Every file on the volume has a file record (a large file may need several records), and the MFT itself also has a file record.
Each file on an NTFS volume has a unique 64-bit identifier called the file reference number (also known as the file index number). The file reference number has two parts: a file number and a sequence number. The file number is 48 bits and corresponds to the position of the file's record in the MFT. The sequence number is incremented each time the file record is reused, which lets NTFS perform internal consistency checks.
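The following Python sketch shows how the two parts of a file reference fit together. It assumes the common layout: the low 48 bits hold the file (record) number and the remaining high 16 bits hold the sequence number. The helper names are hypothetical and are not part of any Windows API.

```python
from typing import Tuple

RECORD_BITS = 48
RECORD_MASK = (1 << RECORD_BITS) - 1  # low 48 bits: file (record) number


def split_file_reference(ref: int) -> Tuple[int, int]:
    """Split a 64-bit file reference into (file_number, sequence_number)."""
    if not 0 <= ref < (1 << 64):
        raise ValueError("file reference must fit in 64 bits")
    file_number = ref & RECORD_MASK        # position of the file record in the MFT
    sequence_number = ref >> RECORD_BITS   # high 16 bits, incremented when the record is reused
    return file_number, sequence_number


def make_file_reference(file_number: int, sequence_number: int) -> int:
    """Inverse of split_file_reference."""
    if not 0 <= file_number <= RECORD_MASK or not 0 <= sequence_number <= 0xFFFF:
        raise ValueError("field out of range")
    return (sequence_number << RECORD_BITS) | file_number


ref = make_file_reference(36, 3)
print(hex(ref))                    # 0x3000000000024
print(split_file_reference(ref))   # (36, 3)
```

When a file record is reused, its sequence number changes. A stale reference that still carries the old sequence number no longer matches the record, and NTFS can detect the mismatch.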
NTFS locates clusters by logical cluster number (LCN) and virtual cluster number (VCN). LCNs simply number all clusters on the volume from beginning to end. Multiplying an LCN by the cluster factor (the number of bytes per cluster) gives the byte offset on the volume, and the physical disk address follows from that offset. VCNs number the clusters belonging to a particular file from beginning to end, so that data within the file can be referenced. Because a VCN can be mapped to any LCN, a file's clusters need not be physically contiguous.
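Here is a minimal sketch of the LCN arithmetic. It assumes the cluster size is computed as bytes per sector times sectors per cluster, and it also checks the power-of-two rule for cluster sizes stated at the start of this section. The helper names are hypothetical.

```python
def cluster_size(bytes_per_sector: int, sectors_per_cluster: int) -> int:
    """Bytes per cluster (the cluster factor); must be a power of two."""
    size = bytes_per_sector * sectors_per_cluster
    # A positive power of two has exactly one bit set, so size & (size - 1) == 0.
    if size <= 0 or size & (size - 1):
        raise ValueError(f"cluster size {size} is not a power of two")
    return size


def lcn_to_byte_offset(lcn: int, bytes_per_cluster: int) -> int:
    """Byte offset of logical cluster `lcn` from the start of the volume."""
    if lcn < 0:
        raise ValueError("LCN must be non-negative")
    return lcn * bytes_per_cluster


bpc = cluster_size(512, 8)             # 4096-byte clusters
print(lcn_to_byte_offset(1355, bpc))   # 5550080
```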
An NTFS directory is simply an index of file names and their file reference numbers. If the directory's entries fit within a single file record, all information about the directory is stored in its MFT record. Directories too large for one record are managed with a B+ tree.
For such a directory, the base file record in the MFT contains a pointer to external clusters. These clusters hold a non-resident index buffer that lists all the subdirectories and files in the directory. The B+ tree structure makes it fast to find files and subdirectories in a large directory.
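To show why a sorted index speeds up lookups, here is a deliberately simplified Python sketch. It uses a single sorted level searched with binary search as a stand-in for the multi-level B+ tree. The class and its methods are hypothetical. Comparing names with `str.upper()` is an assumption that only approximates the case-insensitive collation NTFS performs with its $UpCase table.

```python
import bisect
from typing import List, Optional, Tuple


class DirectoryIndex:
    """Hypothetical single-level stand-in for an NTFS directory index.

    Entries are kept sorted by an upper-cased key so lookups can use binary
    search; real NTFS indexes are multi-level trees collated with $UpCase.
    """

    def __init__(self) -> None:
        self._keys: List[str] = []                  # sorted, upper-cased names
        self._entries: List[Tuple[str, int]] = []   # (original name, file reference)

    def _find(self, name: str) -> Tuple[int, bool]:
        key = name.upper()
        pos = bisect.bisect_left(self._keys, key)
        return pos, pos < len(self._keys) and self._keys[pos] == key

    def add(self, name: str, file_reference: int) -> None:
        pos, found = self._find(name)
        if found:
            raise KeyError(f"{name!r} already exists in this directory")
        self._keys.insert(pos, name.upper())
        self._entries.insert(pos, (name, file_reference))

    def lookup(self, name: str) -> Optional[int]:
        """Return the file reference for `name`, ignoring case, or None."""
        pos, found = self._find(name)
        return self._entries[pos][1] if found else None


root = DirectoryIndex()
root.add("Readme.txt", 40)
root.add("boot.ini", 41)
print(root.lookup("README.TXT"))   # 40
print(root.lookup("missing.txt"))  # None
```

Lookups take O(log n) comparisons. Inserting into a Python list, however, shifts every later element. That cost is one reason real implementations spread the sorted entries over a tree of fixed-size nodes (index buffers) instead of one flat array.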
In NTFS, all data stored on the volume is contained in files. This includes the data structures used to locate and retrieve files, the bootstrap program, and the records that describe the state of the volume (the NTFS metadata). It embodies the NTFS principle that everything on the disk is a file, and storing everything in files makes it easy for the file system to locate and maintain its data. All of this is tracked in an array of file records called the master file table (MFT), which is created by high-level formatting. A file record has a fixed size, usually 1 KB, regardless of the cluster size. The concept is equivalent to the inode in Linux. File records are physically consecutive in the MFT array and are numbered from 0.
The files that the system uses to organize and structure the file system are called metadata in NTFS. The first 16 records of the MFT are the most basic and important metadata files used by the operating system. Their names start with $ (dollar sign), and they are hidden: in Windows 2000, the dir command (even with the /AH switch) cannot list them the way it lists normal files. In fact, the file system driver (Ntfs.sys) maintains a system variable, NtfsProtectSystemFiles, that hides the metadata. However, Microsoft also provides an OEM tool, NFI.EXE, that can dump the important metadata files of the MFT. (Metadata is data stored on a volume to support management of the file system format. Applications cannot access it, and it serves only the system.) The NFI output is as follows:
C:\> nfi c: | more
The file system driver needs these metadata files to mount the volume. In Windows 2000, assigning a drive letter to a partition does not mean the partition contains a file system that Windows 2000 can recognize. If the MFT is corrupted, the partition cannot be read in Windows 2000. To make the partition readable, you must first create a file system format that Windows 2000 recognizes, namely the MFT, which is done by high-level formatting. Windows locates a file's data on disk by cluster number. In the FAT file system, the cluster-number pointers are held in the FAT. In NTFS, they are held in $MFT and $MFTMirr.
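Because file records have a fixed size and are numbered from 0, record n begins at byte n times the record size within the MFT's data. The sketch below reads one record from a raw copy of the $MFT data stream (a hypothetical dump file, or any binary stream) and checks the "FILE" signature at the start of the record. It makes three simplifying assumptions:

- records are 1 KB;
- it does not apply the update sequence (fixup) values that a real parser must restore;
- when reading from a raw volume, it does not translate offsets through the MFT's own VCN-to-LCN mapping.

```python
import io
from typing import BinaryIO

FILE_RECORD_SIZE = 1024  # assumed record size (1 KB)


def read_file_record(mft: BinaryIO, record_number: int,
                     record_size: int = FILE_RECORD_SIZE) -> bytes:
    """Read record `record_number` from a raw copy of the $MFT data stream."""
    mft.seek(record_number * record_size)   # records are consecutive, numbered from 0
    record = mft.read(record_size)
    if len(record) != record_size:
        raise EOFError(f"record {record_number} is past the end of the MFT")
    if record[:4] != b"FILE":               # a valid file record starts with this signature
        raise ValueError(f"record {record_number} has no FILE signature")
    return record


# Two fake 1 KB records standing in for a real $MFT dump.
fake_mft = io.BytesIO((b"FILE" + bytes(FILE_RECORD_SIZE - 4)) * 2)
print(len(read_file_record(fake_mft, 1)))  # 1024
```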
NTFS Metafile
These new features use additional metadata files to store their data. All metadata files in NTFS 5 are listed in the following table:
Each MFT record corresponds to a different file. If a file has many attributes or is split into many fragments, it may need several file records. In that case, the first record is called the base file record, and it stores the locations of the file's other records.
The 1st record (record 0) in the MFT is the MFT itself ($MFT). Because the MFT is so important, the system keeps a mirror file ($MFTMirr) for it to make the file system structure more reliable. The mirror file is the 2nd record (record 1).
The 3rd record (record 2) is the log file ($LogFile), which NTFS uses for recoverability and safety. While the system runs, NTFS writes to this log every operation that affects the structure of the volume, including commands that create files or change the directory structure, such as copying. If the system fails, NTFS can use the log to restore the volume.
The 4th record (record 3) is the volume file ($Volume). It contains the volume name, the NTFS version the volume was formatted with, and a flag indicating whether the volume is corrupted. NTFS uses this flag to decide whether to run chkdsk to repair the volume.
The 5th record (record 4) is the attribute definition table ($AttrDef). It lists all file attributes supported by the volume and indicates whether each can be indexed and recovered.
The 6th record (record 5) is the root directory (\), which stores the index of all files and directories in the root of the volume. Once a file has been accessed, NTFS keeps the file's MFT reference, so a second access can go directly to the file.
The 7th record (record 6) is the bitmap file ($Bitmap), which stores the allocation state of the volume. Each bit represents one cluster on the volume and indicates whether the cluster is free or allocated. Because this is an ordinary file that can easily grow, NTFS volumes can easily be extended dynamically. A FAT file system, by contrast, requires changes to the FAT itself, so its partition size cannot be adjusted freely.
The 8th record (record 7) is the boot file ($Boot), another important system file, which stores the Windows 2000/XP bootstrap code. It must be at a specific disk location for the system to boot correctly, and it is created when the format program runs. This again shows that NTFS treats everything on the disk as a file. It also means that, although the boot file enjoys the security protections of NTFS, it can still be modified through ordinary file I/O operations.
The 9th record (record 8) is the bad cluster file ($BadClus). It records the numbers of all bad clusters on the volume so that the system never allocates or uses them.
The 10th record (record 9) is the security file ($Secure), which stores the security descriptor database for the entire volume. Every NTFS file and directory has a security descriptor. To save space, NTFS stores each distinct descriptor only once in this shared file, even when many files and directories use it.
The 11th record (record 10) is the uppercase file ($UpCase). It contains the case-conversion table (lowercase to uppercase) used to compare file names without regard to case.
The 12th record (record 11) is the extended metadata directory ($Extend).
The $Extend directory holds four files: the reparse point file ($Extend\$Reparse), the change journal ($Extend\$UsnJrnl), the quota management file ($Extend\$Quota), and the object ID file ($Extend\$ObjId). NTFS finds these files through the $Extend directory index rather than at fixed record numbers.
The 13th through 24th records (records 12 to 23) are reserved by the system for future expansion.
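The fixed record numbers in the list above can be captured in a lookup table, and the $Bitmap rule of one bit per cluster can be tested directly. Record numbers count from 0, so the 1st record in the list is record 0. The following minimal Python sketch assumes that the bitmap's bits are ordered least-significant bit first within each byte. The helper names are hypothetical.

```python
# MFT record numbers (counting from 0) of the fixed NTFS metadata files.
METADATA_RECORDS = {
    0: "$MFT", 1: "$MFTMirr", 2: "$LogFile", 3: "$Volume",
    4: "$AttrDef", 5: ".",          # record 5 is the root directory
    6: "$Bitmap", 7: "$Boot", 8: "$BadClus", 9: "$Secure",
    10: "$UpCase", 11: "$Extend",
}


def is_cluster_allocated(bitmap: bytes, lcn: int) -> bool:
    """Return True if the $Bitmap bit for cluster `lcn` is set (cluster in use)."""
    byte_index, bit = divmod(lcn, 8)
    return (bitmap[byte_index] >> bit) & 1 == 1


def count_free_clusters(bitmap: bytes, total_clusters: int) -> int:
    """Count clear bits, i.e. free clusters, among the first `total_clusters` bits."""
    return sum(1 for lcn in range(total_clusters) if not is_cluster_allocated(bitmap, lcn))


bitmap = bytes([0b00000101])        # clusters 0 and 2 allocated
print(METADATA_RECORDS[6])          # $Bitmap
print(is_cluster_allocated(bitmap, 2), count_free_clusters(bitmap, 8))  # True 6
```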
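Because these metadata files are so important, NTFS keeps a backup to guard against data loss. $MFTMirr holds a copy of the first records of the MFT (the first four on a standard volume) and is stored in the middle of the volume's file storage area. See the figure.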
NTFS divides the disk into two parts. About 12% is reserved for the MFT so that it can grow as the number of files increases. To keep the MFT contiguous, the MFT has exclusive use of this 12%, and the remaining 88% is used to store files. Free disk space includes all remaining physical space, including the unused part of the MFT space. The MFT space is used as follows: when the file storage area is exhausted, Windows simply shrinks the MFT space and gives it to file storage, and when space becomes available again, it is returned to the MFT. The system tries hard to keep the MFT space reserved, but sometimes it has to give some up. MFT fragmentation is therefore undesirable but cannot always be prevented.
So how does NTFS access a volume through the MFT? First, NTFS must mount the volume. It reads the boot file (defined by the $Boot metadata file in the figure) to find the physical disk address of the MFT. It then reads the VCN-to-LCN mapping from the data attribute in the MFT's own file record and keeps this mapping in memory. The mapping gives the locations of the MFT's runs (extents) on disk. Next, NTFS opens the MFT records of several metadata files and opens those files. If necessary, NTFS then performs file system recovery. Once NTFS has opened the remaining metadata files, the volume can be accessed.
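The first step of mounting, finding the MFT from the boot sector, can be sketched in Python. The field offsets follow the commonly documented NTFS boot sector layout:

- bytes per sector at 0x0B;
- sectors per cluster at 0x0D;
- the MFT and $MFTMirr LCNs at 0x30 and 0x38;
- the signed clusters-per-file-record value at 0x40.

Treat the parser as an illustrative sketch. It ignores less common encodings such as very large clusters, and the sample sector values are made up.

```python
import struct
from dataclasses import dataclass


@dataclass
class BootInfo:
    bytes_per_sector: int
    sectors_per_cluster: int
    mft_lcn: int
    mftmirr_lcn: int
    file_record_size: int

    @property
    def bytes_per_cluster(self) -> int:
        return self.bytes_per_sector * self.sectors_per_cluster

    @property
    def mft_byte_offset(self) -> int:
        # LCN x cluster size = byte offset from the start of the volume
        return self.mft_lcn * self.bytes_per_cluster


def parse_boot_sector(sector: bytes) -> BootInfo:
    """Extract the fields needed to find the MFT from an NTFS boot sector."""
    if len(sector) < 512 or sector[3:11] != b"NTFS    " or sector[510:512] != b"\x55\xaa":
        raise ValueError("not an NTFS boot sector")
    bytes_per_sector, sectors_per_cluster = struct.unpack_from("<HB", sector, 0x0B)
    mft_lcn, mftmirr_lcn = struct.unpack_from("<QQ", sector, 0x30)
    (per_record,) = struct.unpack_from("<b", sector, 0x40)
    if per_record > 0:     # positive: record size given in clusters
        record_size = per_record * bytes_per_sector * sectors_per_cluster
    elif per_record < 0:   # negative: record size is 2**(-value) bytes, e.g. -10 -> 1024
        record_size = 1 << -per_record
    else:
        raise ValueError("invalid file record size field")
    return BootInfo(bytes_per_sector, sectors_per_cluster, mft_lcn, mftmirr_lcn, record_size)


# Synthetic boot sector for illustration (values are made up).
demo = bytearray(512)
demo[3:11] = b"NTFS    "
struct.pack_into("<HB", demo, 0x0B, 512, 8)
struct.pack_into("<QQ", demo, 0x30, 786432, 2)
struct.pack_into("<b", demo, 0x40, -10)
demo[510:512] = b"\x55\xaa"
info = parse_boot_sector(bytes(demo))
print(info.mft_byte_offset, info.file_record_size)  # 3221225472 1024
```

The MFT's first record sits at `mft_byte_offset`. Reading it gives the data attribute whose run list maps the rest of the MFT, which is the next step described above.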
File and directory records
Unlike other file systems, NTFS treats a file as a set of attributes and attribute values. A file's data is the value of its unnamed data attribute. Other attributes include the file name, the file's owner, and its timestamps. The figure shows the MFT record of a small file.
Each attribute is stored as a single stream, that is, a simple sequence of bytes. Strictly speaking, NTFS does not operate on files at all; it only reads and writes attribute streams. NTFS provides four operations on attribute streams: create, delete, read (a byte range), and write (a byte range). Reads and writes normally target the file's unnamed data attribute. Named attributes can be accessed with the named data stream syntax (filename:streamname).
A file usually occupies one file record. However, a file with many attributes or heavy fragmentation can need more than one. In that case the first file record is its base file record, which stores the locations of the other file records the file needs. Small files and folders (typically 1,500 bytes or smaller) are stored entirely in their MFT record.
Folder records contain index information. A small folder's record is stored entirely in the MFT. A large folder, however, is organized as a B+ tree, and a pointer leads to external clusters that store the parts of the folder that do not fit in the MFT.
Common attributes of files on NTFS volumes are listed in the following table (not every file has all of them).
Resident and non-resident attributes
When a file is very small, all its attributes and their values can be stored in its MFT file record. An attribute whose value is stored directly in the MFT is called a resident attribute. Some attributes are always resident so that NTFS can locate the non-resident attributes. For example, the standard information attribute and the index root attribute are always resident.
Each attribute begins with a standard header. The header contains information about the attribute that NTFS uses to manage it. The header is always resident and records whether the attribute value is resident. For a resident attribute, the header also contains the offset of the attribute value and the length of the value.
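Here is a sketch of reading an attribute header in Python. It assumes the commonly documented layout:

- the type code and the total attribute length are 32-bit values at offsets 0 and 4;
- the non-resident flag is the byte at offset 8;
- for a resident attribute, the value length (32-bit) is at 0x10 and the value offset (16-bit) is at 0x14.

A type code of 0xFFFFFFFF marks the end of the attributes in a file record. Only a few type codes are named in the table, and the helper names are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import Optional

ATTRIBUTE_NAMES = {
    0x10: "$STANDARD_INFORMATION", 0x20: "$ATTRIBUTE_LIST", 0x30: "$FILE_NAME",
    0x80: "$DATA", 0x90: "$INDEX_ROOT", 0xA0: "$INDEX_ALLOCATION", 0xB0: "$BITMAP",
}
END_OF_ATTRIBUTES = 0xFFFFFFFF


@dataclass
class AttributeHeader:
    type_code: int
    length: int                          # total length of the attribute, header included
    non_resident: bool
    value_length: Optional[int] = None   # resident attributes only
    value_offset: Optional[int] = None   # resident only: offset of value from header start

    @property
    def type_name(self) -> str:
        return ATTRIBUTE_NAMES.get(self.type_code, hex(self.type_code))


def parse_attribute_header(record: bytes, offset: int) -> Optional[AttributeHeader]:
    """Parse the header at `offset`; return None at the end-of-attributes marker."""
    (type_code,) = struct.unpack_from("<I", record, offset)
    if type_code == END_OF_ATTRIBUTES:
        return None
    length, non_resident = struct.unpack_from("<IB", record, offset + 4)
    header = AttributeHeader(type_code, length, bool(non_resident))
    if not header.non_resident:
        header.value_length, header.value_offset = struct.unpack_from("<IH", record, offset + 0x10)
    return header


def resident_value(record: bytes, offset: int, header: AttributeHeader) -> bytes:
    """Return the value bytes of a resident attribute."""
    if header.non_resident:
        raise ValueError("attribute value is stored in runs outside the MFT")
    start = offset + header.value_offset
    return record[start:start + header.value_length]


attr = bytearray(32)
struct.pack_into("<IIB", attr, 0, 0x80, 32, 0)   # $DATA, 32 bytes long, resident
struct.pack_into("<IH", attr, 0x10, 5, 0x18)     # value: 5 bytes at offset 0x18
attr[0x18:0x1D] = b"hello"
record = bytes(attr) + struct.pack("<I", END_OF_ATTRIBUTES)
hdr = parse_attribute_header(record, 0)
print(hdr.type_name, hdr.non_resident, resident_value(record, 0, hdr))  # $DATA False b'hello'
print(parse_attribute_header(record, hdr.length))   # None
```

Adding each header's `length` to its offset steps to the next attribute. This is how a parser would walk all attributes in a record, starting from the first-attribute offset given in the file record header.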
Storing an attribute value directly in the MFT greatly reduces NTFS access time, because NTFS needs only one disk access to retrieve the data. The FAT file system, by contrast, looks up the file in a table and then reads a succession of allocation units to find the file's data.
All attributes of a small file or directory can be resident in the MFT, and the unnamed data attribute of a small file can hold all of the file's data. Create a small file, as shown in the figure:
Content and attributes of a small file
For example, NFI shows the following for file record 36, which belongs to the file "Create a text file.txt":
File 36
\Create a text file.txt
$STANDARD_INFORMATION (resident)
$FILE_NAME (resident)
$FILE_NAME (resident)
$DATA (resident)
The output shows that all of the file's attributes, including the data attribute, are resident, and there are no non-resident attributes. Opening the MFT in WinHex and viewing the file record shows the content in the figure.
File records of small files
The index root attribute of a small directory can hold the index entries for all of its files and subdirectories. See the figure.
MFT record of a small directory
Not all attributes of a large file or directory can be resident in the MFT. If an attribute (such as a file's data attribute) is too large for a 1 KB MFT file record, NTFS allocates space for it outside the MFT. These areas are usually called runs or extents, and they hold attribute values such as file data. If the attribute value grows later, NTFS allocates another run for the additional data. Attributes stored in runs rather than in the MFT file record are called non-resident attributes. NTFS decides whether an attribute is resident or non-resident, and the location of the attribute value is transparent to the process accessing it.
When an attribute is non-resident, as with the data of a large file, its header contains the information NTFS needs to locate the attribute value on disk. The figure shows a non-resident data attribute stored in two runs.
Non-resident attribute stored in two runs
Among the standard attributes, only those that can grow can be non-resident. For files, these are the data attribute and the attribute list. The standard information and file name attributes are always resident.
A large directory can also have non-resident attributes, as the figure shows. In this example, the MFT file record does not have enough space to store the index of the large directory. Part of the index is stored in the index root attribute, and the rest is stored in non-resident runs called index buffers. The index root, index allocation, and bitmap attributes are shown here in simplified form and are described in detail later. For a directory, the header and part of the value of the index root attribute must be resident.
MFT record of a large directory
When an attribute of a file (or directory) does not fit in an MFT file record and must be allocated separately, NTFS records its runs (disk extents) through VCN-to-LCN mappings. LCNs number all clusters on the volume sequentially from 0 to N, while VCNs number the clusters belonging to a particular file in logical order from 0 to M. The figure shows the VCN and LCN numbers of the runs of a non-resident data attribute.
VCNs of a non-resident data attribute
If the file had more than two runs, the third run would start at VCN 8. The data attribute header contains the VCN-to-LCN mappings for the two runs, which lets NTFS find the file's disk allocations easily. For fast lookup, the header of a non-resident data attribute with multiple runs holds the VCN-to-LCN mapping, as the figure shows.
VCN-to-LCN mapping for a non-resident data attribute
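The VCN-to-LCN mapping in a non-resident attribute header is stored compactly as a run list. The Python sketch below decodes a run list and then maps a VCN to an LCN. It assumes the commonly documented encoding:

- each run starts with a byte whose low nibble gives the size of the length field;
- the high nibble of that byte gives the size of a signed LCN offset relative to the previous run (0 means a sparse run with no clusters on disk);
- a zero byte ends the list.

The LCN values in the example (two runs of four clusters at LCNs 1355 and 1588) are made up for illustration, and the helper names are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Run(NamedTuple):
    start_vcn: int
    length: int            # number of clusters in the run
    lcn: Optional[int]     # first LCN of the run, or None for a sparse run


def decode_runlist(data: bytes) -> List[Run]:
    """Decode an encoded run list into absolute (VCN, length, LCN) runs."""
    runs: List[Run] = []
    pos, vcn, lcn = 0, 0, 0
    while pos < len(data) and data[pos] != 0:     # a zero header byte ends the list
        len_size, off_size = data[pos] & 0x0F, data[pos] >> 4
        pos += 1
        length = int.from_bytes(data[pos:pos + len_size], "little")
        pos += len_size
        if off_size:
            # LCN is stored as a signed delta from the previous run's LCN
            lcn += int.from_bytes(data[pos:pos + off_size], "little", signed=True)
            runs.append(Run(vcn, length, lcn))
        else:
            runs.append(Run(vcn, length, None))   # sparse: no clusters allocated
        pos += off_size
        vcn += length                             # next run starts at the following VCN
    return runs


def vcn_to_lcn(runs: List[Run], vcn: int) -> Optional[int]:
    """Map a VCN to its LCN; None means the VCN falls in a sparse run."""
    for run in runs:
        if run.start_vcn <= vcn < run.start_vcn + run.length:
            return None if run.lcn is None else run.lcn + (vcn - run.start_vcn)
    raise ValueError(f"VCN {vcn} is not mapped by this run list")


runs = decode_runlist(bytes([0x21, 0x04, 0x4B, 0x05, 0x21, 0x04, 0xE9, 0x00, 0x00]))
print(runs)                 # [Run(start_vcn=0, length=4, lcn=1355), Run(start_vcn=4, length=4, lcn=1588)]
print(vcn_to_lcn(runs, 5))  # 1589
```

A third run in this list would begin at VCN 8, matching the numbering described above. Storing LCNs as relative offsets keeps the encoded list short when runs are near each other.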
Data attributes are the ones most often stored in runs, because they tend to be large. Other attributes may also need runs when the MFT file record lacks space. In addition, if a file has too many attributes to fit in one MFT record, a second MFT file record can hold the additional attributes (or non-resident attribute headers). In that case NTFS adds an attribute called the attribute list. The attribute list contains the name and type code of each of the file's attributes and the file reference of the MFT record where each attribute is located. Attribute lists are typically used for large or heavily fragmented files whose VCN-to-LCN mappings are too large for a single MFT file record. Files with more than about 200 runs usually need an attribute list.