NTFS file system detailed analysis

Source: Internet
Author: User

Part 1 What is an NTFS file system

To learn about NTFS, we should first understand FAT. "FAT" stands for "File Allocation Table"; for our purposes, its significance lies in how it manages hard disk partitions. FAT16, FAT32, and NTFS are currently the most common file systems.
FAT16: DOS and Windows 95 used the FAT16 file system, and Windows 98/2000/XP still support it. It can manage partitions of up to 2 GB, and each partition can contain at most 65525 clusters (the cluster is the unit of disk space allocation). As the disk or partition grows, each cluster must become larger, which wastes hard disk space.
FAT32: with the emergence of large-capacity hard disks, FAT32 became popular starting with Windows 98. It is an enhanced version of FAT16 and supports partitions of up to 2 TB (2048 GB). For a partition of the same size, FAT32 clusters are smaller than FAT16 clusters, which saves hard disk space.
NTFS: the file system supported by the operating systems built on the Microsoft Windows NT kernel, a disk format designed for networking and for management and security features such as disk quotas and file encryption. With the popularity of the NT-based Windows 2000/XP, many individual users began to use NTFS. NTFS also stores data in units of clusters, but the NTFS cluster size does not depend on the size of the disk or partition. Smaller clusters not only waste less disk space but also reduce the likelihood of fragmentation. NTFS supports file encryption to give users a high level of security.
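The effect of cluster size on wasted space can be illustrated with a little arithmetic. The sketch below is only an illustration: the function names and the sample sizes (a 10,000-byte file, 32 KB and 4 KB clusters) are chosen here and are not taken from any particular volume.

```python
def allocated_size(file_size, cluster_size):
    """Bytes consumed on disk: the file size rounded up to whole clusters."""
    if file_size == 0:
        return 0
    clusters = -(-file_size // cluster_size)  # ceiling division
    return clusters * cluster_size


def slack(file_size, cluster_size):
    """Bytes allocated to the file but not used by its data."""
    return allocated_size(file_size, cluster_size) - file_size


if __name__ == "__main__":
    # Same 10,000-byte file on a large-cluster and a small-cluster volume.
    for cluster_size in (32 * 1024, 4 * 1024):
        print(cluster_size, slack(10_000, cluster_size))
    # prints: 32768 22768
    #         4096 2288
```

With 32 KB clusters the file wastes more than twice its own size; with 4 KB clusters the waste drops to a little over 2 KB.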

In the NTFS file system, disk space is allocated by cluster. A cluster must be an integer multiple of the physical sector size, and that multiple is always a power of 2. NTFS does not care what a sector is or how large it is (for example, whether it is 512 bytes); when a volume is formatted, the format program chooses the cluster size automatically based on the volume size.
Files are located on disk through the master file table (MFT). The MFT is a database consisting of a series of file records. Each file on the volume has a file record (a large file may need several). The MFT also has a file record for itself.
Each file on an NTFS volume has a unique 64-bit identifier called the file reference number (also known as the file index number). The file reference number consists of two parts: the file number and the file sequence number. The file number is 48 bits long and corresponds to the position of the file in the MFT. The sequence number occupies the remaining 16 bits and is incremented each time the file record is reused; NTFS uses it for internal consistency checks.
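The split of a file reference number into its two parts can be sketched as follows. This is a minimal illustration that assumes the file number sits in the low 48 bits and the sequence number in the high 16 bits; the function names are made up for this example.

```python
FILE_NUMBER_BITS = 48
FILE_NUMBER_MASK = (1 << FILE_NUMBER_BITS) - 1


def split_file_reference(ref):
    """Return (file_number, sequence_number) for a 64-bit file reference."""
    if not 0 <= ref < (1 << 64):
        raise ValueError("file reference must fit in 64 bits")
    # Assumption: low 48 bits = file number, high 16 bits = sequence number.
    return ref & FILE_NUMBER_MASK, ref >> FILE_NUMBER_BITS


def make_file_reference(file_number, sequence_number):
    """Combine a 48-bit file number and a 16-bit sequence number."""
    if not 0 <= file_number <= FILE_NUMBER_MASK:
        raise ValueError("file number must fit in 48 bits")
    if not 0 <= sequence_number < (1 << 16):
        raise ValueError("sequence number must fit in 16 bits")
    return (sequence_number << FILE_NUMBER_BITS) | file_number


if __name__ == "__main__":
    ref = make_file_reference(36, 2)
    print(hex(ref), split_file_reference(ref))  # 0x2000000000024 (36, 2)
```

A reference whose sequence number no longer matches the one in the file record points at a record that has since been reused, which is what makes the consistency check possible.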
NTFS uses logical cluster numbers (LCNs) and virtual cluster numbers (VCNs) to locate clusters. LCNs simply number all the clusters on the volume from beginning to end. By multiplying the LCN by the cluster factor (the cluster size), NTFS gets the physical byte offset on the volume and, from it, the physical disk address. VCNs number the clusters that belong to a particular file from beginning to end, so that data within the file can be referenced. VCNs are mapped to LCNs, and the clusters they refer to need not be physically contiguous.
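The LCN-to-offset calculation is a single multiplication. The sketch below assumes the cluster size is given as sectors per cluster times bytes per sector; the function name and sample values are illustrative only.

```python
def lcn_to_byte_offset(lcn, sectors_per_cluster, bytes_per_sector=512):
    """Byte offset of a cluster from the start of the volume."""
    if lcn < 0:
        raise ValueError("LCN must be non-negative")
    cluster_size = sectors_per_cluster * bytes_per_sector
    return lcn * cluster_size


if __name__ == "__main__":
    # Hypothetical volume with 8 sectors of 512 bytes per cluster (4 KB clusters).
    print(lcn_to_byte_offset(1355, 8))  # 5550080
```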
An NTFS directory is simply an index of file names and file reference numbers. If the directory's attributes fit within one record, all of the directory's information is stored in its MFT record. Directories larger than one record are managed with a B+ tree.
In that case the base file record in the MFT contains a pointer to external clusters that hold non-resident index buffers covering all the subdirectories and files under the directory. The B+ tree structure makes it fast to look up files and subdirectories in large directories.
In NTFS, all data stored on the volume is contained in files, including the data structures used to locate and retrieve files, the bootstrap code, and the structures that record the state of the volume (the NTFS metadata). This embodies the NTFS principle that everything on the disk is a file. Storing everything in files makes it easy for the file system to locate and maintain the data. All of this data is tracked in an array of file records called the master file table (MFT), which is created by high-level formatting. The size of a file record is usually fixed at 1 KB, regardless of the cluster size. The concept is comparable to the inode in Linux. File records are physically contiguous in the MFT array and are numbered from 0.
The MFT is used only by the system itself to organize and structure the file system; in NTFS this kind of data is called metadata. The first 16 records are the most basic and hold the important metadata files used by the operating system. The names of these metadata files begin with $ (dollar sign), and the files are hidden. In Windows 2000, the dir command (even with the /AH parameter) cannot list them like normal files. In fact, the file system driver (ntfs.sys) maintains a system variable, NtfsProtectSystemFiles, that is used to hide this metadata.
However, Microsoft also provides an OEM tool called NFI.EXE, which can dump the important metadata files of the MFT. (Metadata here means data stored on the volume to support file system management; applications cannot access it, and it serves only the system.) NFI is run as follows:
C:\> nfi c: | more
These metadata files are required for the system driver to mount the volume. In Windows 2000, assigning a drive letter to a partition does not mean that the partition contains a file system that Windows 2000 can recognize; if the MFT is corrupted, the partition cannot be read. For Windows 2000 to recognize a partition, a recognizable file system format, that is, the MFT, must first be created on it, which is done by high-level formatting. As we all know, Windows uses cluster numbers to locate where a file is stored on disk. In the FAT file system the pointers to clusters are kept in the FAT table; in NTFS they are kept in the $MFT and $MFTMirr files.
NTFS metadata files
With the new features of NTFS 5, more metadata files are used to store feature-related data. The metadata files in NTFS 5 are as follows:
Each MFT record corresponds to a different file. If a file has many attributes or is split into many fragments, it may need more than one file record. In that case, the first record, which stores the locations of the other file records, is called the "base file record".
The 1st record in the MFT is the MFT itself ($MFT). Because the MFT is so important, the system keeps a mirror file ($MFTMirr) to protect the reliability of the file system structure; this is the 2nd record in the MFT.
The 3rd record is the log file ($LogFile). NTFS uses this file for recoverability and security. While the system is running, NTFS records in the log file all operations that affect the structure of the NTFS volume, including file creation and any commands that change the directory structure, such as copying, so that the volume can be recovered after a system failure.
The 4th record is the volume file ($Volume). It contains the volume name, the NTFS version the volume was formatted with, and a flag indicating whether the disk is damaged (NTFS uses this flag to decide whether to call the chkdsk program for repair).
The 5th record is the attribute definition table ($AttrDef), which lists all the file attributes supported by the volume and indicates whether they can be indexed and recovered.
The 6th record is the root directory (\), which stores the index of all files and directories in the root of the volume. Once a file has been accessed, NTFS keeps the file's MFT reference so that the file can be reached directly the next time.
The 7th record is the bitmap file ($Bitmap). It stores the allocation state of the NTFS volume: each bit represents one cluster on the volume and indicates whether that cluster is free or allocated. Because this file can easily be extended, NTFS volumes can easily be grown dynamically, whereas resizing a FAT partition involves changing the FAT table, so FAT partitions cannot be resized at will.
The 8th record is the boot file ($Boot). This is another important system file; it stores the boot code for Windows 2000/XP. The file must be at a specific location on the disk for the system to boot correctly, and it is created when the format program runs. This again shows that NTFS treats everything on the disk as a file. It also means that, although the file enjoys the security protections of NTFS, it can still be modified through ordinary file I/O operations.
The 9th record is the bad cluster file ($BadClus), which records the numbers of all damaged clusters on the volume so that the system does not allocate them.
The 10th record is the security file ($Secure), which stores the security descriptor database for the whole volume. NTFS files and directories have their own security descriptors; to save space, NTFS stores descriptors shared by several files and directories once in this common file.
The 11th record is the uppercase file ($UpCase), which contains the table used to convert characters to uppercase.
The 12th record is the extended metadata directory ($Extend).
The 13th record is the reparse point file ($Extend\$Reparse).
The 14th record is the change journal file ($Extend\$UsnJrnl).
The 15th record is the quota management file ($Extend\$Quota).
The 16th record is the object ID file ($Extend\$ObjId).
The 17th to 23rd records are reserved by the system for future expansion.
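The one-bit-per-cluster scheme of the bitmap file described above can be sketched in a few lines. The example assumes that cluster 0 is the least significant bit of the first byte and that a set bit means "allocated"; the function names and the sample bitmap are invented for illustration.

```python
def cluster_is_allocated(bitmap, lcn):
    """True if the bit for cluster `lcn` is set in the bitmap bytes."""
    byte_index, bit_index = divmod(lcn, 8)
    # Assumption: bits are numbered from the least significant bit of each byte.
    return bool((bitmap[byte_index] >> bit_index) & 1)


def count_free_clusters(bitmap, total_clusters):
    """Number of clusters whose bit is clear."""
    return sum(
        1 for lcn in range(total_clusters)
        if not cluster_is_allocated(bitmap, lcn)
    )


if __name__ == "__main__":
    # Hypothetical 16-cluster volume: clusters 0, 2 and 15 are allocated.
    sample = bytes([0b00000101, 0b10000000])
    print(cluster_is_allocated(sample, 2))   # True
    print(count_free_clusters(sample, 16))   # 13
```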
The first 16 metadata files of the MFT are so important that, to prevent data loss, NTFS keeps a backup of them in the center of the file storage area of the volume.
NTFS divides the disk into two parts. About 12% is reserved for the MFT so that it can grow as the number of files increases; to keep the MFT contiguous, the MFT has exclusive use of this 12%. The remaining 88% is allocated to file storage. Free disk space is reported as all physically free space, including the unused part of the MFT reservation. The MFT reservation works as follows: when the file storage area is exhausted, Windows simply shrinks the MFT reservation and gives the space to file storage; when space becomes available again, it is returned to the MFT reservation. Although the system does its best to keep the MFT space dedicated, it sometimes has to make sacrifices. MFT fragmentation can be hard to tolerate, but it cannot always be prevented.
So how does NTFS access a volume through the MFT? First, before NTFS can access a volume, it must mount it: NTFS looks at the boot file (the file described by the $Boot metadata file) and finds the physical disk address of the MFT. It then reads the VCN-to-LCN mapping information from the data attribute of the MFT's own file record and keeps it in memory. This mapping information locates the runs (or extents) of the MFT on disk. Next, NTFS reads the MFT records of several more metadata files and opens those files. If necessary, NTFS then performs file system recovery. Once NTFS has opened the remaining metadata files, the volume can be accessed.
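The first step of mounting, finding the MFT from the boot file, can be sketched as follows. The field offsets used here are not given in this article; they follow the commonly documented NTFS boot sector layout (OEM ID at 0x03, bytes per sector at 0x0B, sectors per cluster at 0x0D, $MFT LCN at 0x30, $MFTMirr LCN at 0x38) and should be treated as assumptions. The function name is hypothetical.

```python
import struct


def locate_mft(boot_sector):
    """Extract the cluster size and MFT location from an NTFS boot sector."""
    if len(boot_sector) < 0x40:
        raise ValueError("boot sector is too short")
    if boot_sector[3:11] != b"NTFS    ":
        raise ValueError("not an NTFS boot sector")
    # Assumed layout: uint16 bytes/sector at 0x0B, uint8 sectors/cluster at 0x0D.
    bytes_per_sector, sectors_per_cluster = struct.unpack_from("<HB", boot_sector, 0x0B)
    # Assumed layout: uint64 LCN of $MFT at 0x30, uint64 LCN of $MFTMirr at 0x38.
    mft_lcn, mftmirr_lcn = struct.unpack_from("<QQ", boot_sector, 0x30)
    cluster_size = bytes_per_sector * sectors_per_cluster
    return {
        "cluster_size": cluster_size,
        "mft_lcn": mft_lcn,
        "mftmirr_lcn": mftmirr_lcn,
        "mft_byte_offset": mft_lcn * cluster_size,
    }


if __name__ == "__main__":
    # Synthetic boot sector for demonstration; a real one would be read
    # from the first sector of the volume.
    sector = bytearray(512)
    sector[3:11] = b"NTFS    "
    struct.pack_into("<HB", sector, 0x0B, 512, 8)
    struct.pack_into("<QQ", sector, 0x30, 786432, 2)
    print(locate_mft(bytes(sector)))
```

The sketch ignores the special encoding some volumes use for very large clusters, so it applies only to the ordinary case.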


Part 2 file and directory records

Unlike other file systems, NTFS treats a file as a set of attribute/value pairs. The file's data is the value of the unnamed data attribute. Other attributes include the file name, the file owner, and the file time stamps.
[Figure: MFT record for a small file]
Each attribute consists of a single stream, that is, a simple sequence of bytes. Strictly speaking, NTFS does not operate on files; it only reads and writes attribute streams. NTFS provides these operations on attribute streams: create, delete, read (byte range), and write (byte range). Reads and writes normally operate on the unnamed attribute of a file. Named attributes can be accessed with the named data stream syntax.
A file usually occupies one file record. However, when a file has many attribute values or is heavily fragmented, it may occupy more than one file record. In that case the first file record is its base file record, which stores the locations of the other file records the file needs. Small files and folders (typically 1500 bytes or less) are stored entirely within the file's MFT record.
Folder records contain index information, and small folder records are stored entirely within the MFT structure. Large folders, however, are organized as B+ trees, with a pointer to external clusters that store the folder attributes that do not fit in the MFT.
Common attributes of files on NTFS volumes are listed in the following table (not all files have all of these attributes).
Resident and non-resident attributes
When a file is very small, all of its attributes and attribute values can be stored in its MFT file record. An attribute whose value is stored directly in the MFT is called a resident attribute. Some attributes are always resident so that NTFS can locate the non-resident ones. For example, the standard information attribute and the index root are always resident.
Each attribute starts with a standard header that contains information about the attribute, which NTFS uses to manage attributes in a generic way. The header is always resident and records whether the attribute value is resident. For resident attributes, the header also contains the offset of the attribute value and the length of the attribute value.
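Reading the resident flag, value offset, and value length from an attribute header might look like the sketch below. The byte layout is not given in this article; it follows the commonly documented NTFS attribute header (type at 0x00, total length at 0x04, non-resident flag at 0x08, and, for resident attributes, value length at 0x10 and value offset at 0x14) and is an assumption, as are the function and key names.

```python
import struct


def parse_attribute_header(buf, offset=0):
    """Parse the common attribute header; return the value if it is resident."""
    # Assumed common header: type, total length, non-resident flag,
    # name length, name offset, flags, attribute ID (16 bytes in all).
    (attr_type, total_length, non_resident,
     name_length, name_offset, flags, attr_id) = struct.unpack_from("<IIBBHHH", buf, offset)
    info = {
        "type": attr_type,
        "total_length": total_length,
        "resident": non_resident == 0,
        "attribute_id": attr_id,
        "value": None,
    }
    if info["resident"]:
        # Assumed resident part: uint32 value length at 0x10, uint16 value offset at 0x14.
        value_length, value_offset = struct.unpack_from("<IH", buf, offset + 0x10)
        start = offset + value_offset
        info["value"] = bytes(buf[start:start + value_length])
    return info


if __name__ == "__main__":
    # Synthetic resident data attribute (type 0x80) holding the bytes b"hello".
    attr = bytearray(0x20)
    struct.pack_into("<IIBBHHH", attr, 0, 0x80, 0x20, 0, 0, 0x18, 0, 1)
    struct.pack_into("<IH", attr, 0x10, 5, 0x18)
    attr[0x18:0x1D] = b"hello"
    print(parse_attribute_header(attr))
```

For a non-resident attribute the sketch returns no value, since the data then lives in runs outside the MFT record.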
If the attribute value is stored directly in the MFT, NTFS needs much less time to access it: a single disk access retrieves the data immediately. The FAT file system, by contrast, must look up the file in the FAT table, then follow the chain of allocation units, and only then reach the file data.
All attributes of a small file or directory can be resident in the MFT. The unnamed data attribute of a small file can hold all of the file's data.
[Figures: creating a small file; content of this file; file attributes]
For example, if NFI is used to view the file "New Text Document.txt", whose file record number is 36, the output is as follows:
File 36
\New Text Document.txt

    $STANDARD_INFORMATION (resident)
    $FILE_NAME (resident)
    $FILE_NAME (resident)
    $DATA (resident)
From the output we can see that all attributes of the file, including the data attribute, are resident; there are no non-resident attributes. The same file record can be inspected by opening the MFT with WinHex.
[Figure: file record of a small file]
The index root attribute of a small directory can hold the index entries of all of its files and subdirectories.
[Figure: MFT record of a small directory]
The attributes of a large file or directory cannot all be resident in the MFT. If an attribute (such as the file's data attribute) is too large to fit in a 1 KB MFT file record, NTFS allocates an area outside the MFT for it. Such an area is usually called a run (or extent) and stores the attribute value, such as the file data. If the attribute value grows later, NTFS allocates another run for the additional data. Attributes whose values are stored in runs rather than in the MFT file record are called non-resident attributes. NTFS decides whether an attribute is resident or non-resident, and the location of the attribute value is transparent to the process accessing it.
When an attribute is non-resident, as the data of a large file is, its header contains the information NTFS needs to locate the attribute value on disk.
[Figure: non-resident data attribute stored in two runs]
Among the standard attributes, only those that can grow can be non-resident. For files, the attributes that can grow are the data and the attribute list. The standard information and file name attributes are always resident.
A large directory may also have non-resident attributes (or parts of attributes). In this example, the MFT file record does not have enough space to store the whole file index of a large directory. Part of the index is stored in the index root attribute, and the rest is stored in non-resident runs called index buffers. The index root, index allocation, and bitmap attributes are shown here in simplified form and are described in detail later. For a directory, the index root header and part of its value are always resident.
[Figure: MFT record of a large directory]
When the attributes of a file (or directory) do not fit in an MFT file record and must be allocated separately, NTFS keeps track of the runs (or extents) through VCN-to-LCN mappings. LCNs number the clusters of the whole volume sequentially from 0 to n, while VCNs number the clusters of a particular file in logical order from 0 to m.
[Figure: VCNs and LCNs of the runs of a non-resident data attribute]
If this file had more than two runs, the third run would start at VCN 8. So that NTFS can quickly find where a file is allocated on disk, the header of a non-resident data attribute with multiple runs contains the VCN-to-LCN mappings for those runs.
[Figure: VCN-to-LCN mappings for a non-resident data attribute]
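The lookup that a VCN-to-LCN mapping enables can be sketched as a search through a list of runs. The run table below (two runs of four clusters each, starting at LCN 1355 and LCN 1588) is a made-up example consistent with a third run starting at VCN 8; the class and function names are hypothetical, and the sketch does not decode the on-disk run encoding.

```python
from typing import NamedTuple


class Run(NamedTuple):
    start_vcn: int  # first VCN covered by this run
    start_lcn: int  # LCN where the run begins on the volume
    length: int     # number of contiguous clusters in the run


def vcn_to_lcn(runs, vcn):
    """Translate a file-relative VCN to a volume-relative LCN."""
    for run in runs:
        if run.start_vcn <= vcn < run.start_vcn + run.length:
            return run.start_lcn + (vcn - run.start_vcn)
    raise ValueError("VCN %d is not mapped by any run" % vcn)


if __name__ == "__main__":
    runs = [Run(0, 1355, 4), Run(4, 1588, 4)]
    print(vcn_to_lcn(runs, 0))  # 1355
    print(vcn_to_lcn(runs, 5))  # 1589
```

Because each run is contiguous, one (VCN, LCN, length) triple describes any number of clusters, which is why a lightly fragmented file needs very little mapping information.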
Although data attributes are the ones most often stored in runs because they are too large, other attributes may also need to be stored in runs when the MFT file record does not have enough space. In addition, if a file has too many attributes to fit in its MFT record, a second MFT file record can be used to hold the additional attributes (or the headers of non-resident attributes). In that case an attribute called the "attribute list" is added. The attribute list contains the name and type code of each of the file's attributes and the file reference of the MFT record in which the attribute is located. Attribute lists are typically needed for files that are so large or so fragmented that their VCN-to-LCN mappings require more than one MFT file record. Files with more than 200 runs usually need an attribute list.
Structure Analysis of MFT file records
An MFT file record consists of a record header followed by the list of attributes, which ends with the marker "FF". A record is generally 1 KB in size, or the size of one cluster when clusters are larger. The record header contains the following fields:
Offset  Length (bytes)  Description
0x00    4               Signature, must be "FILE"
0x04    2               Offset of the update sequence
0x06    2               Size of the update sequence number and array, including the first entry
0x08    8               Log file sequence number (LSN)
0x10    2               Sequence number (SN)
0x12    2               Hard link count
0x14    2               Offset of the first attribute
0x16    2               Flags: 1 means the record is in use, 2 means the record is a directory
0x18    4               Total length of the record header and attributes, that is, the actual length of the file record
0x1C    4               Total length allocated to the record
0x20    8               File index number of the base file record
0x28    2               Next attribute ID
0x2A    2               Used in XP: boundary
0x2C    4               Used in XP: the record number of this file
The log file sequence number (LSN, the $LogFile sequence number) changes every time the record is modified.
The sequence number (SN) counts the number of times the MFT record has been reused.
The hard link count records the number of hard links to the file and appears only in base file records.
The actual length of a file record is the number of bytes the file record actually occupies on disk.
The file index number of the base file record is 0 for a base file record. If it is not 0, it is the MFT file index number of the base file record to which this record belongs. The base file record holds the information about its extension records in its attribute list attribute.
The list of attributes is a variable-length area ending with "FF". For 1 KB MFT records, the attributes start at offset 0x30 (the actual value is stored in the field at offset 0x14).
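The record header fields listed above can be unpacked with a fixed-format structure. The sketch below is an illustration only: it assumes little-endian fields packed without padding at the offsets in the table, and the field and function names are invented for the example. It does not apply the update sequence fix-ups that a complete reader would need.

```python
import struct
from collections import namedtuple

# One format code per table row, offsets 0x00 through 0x2C (48 bytes in total).
HEADER = struct.Struct("<4sHHQHHHHIIQHHI")

FileRecordHeader = namedtuple(
    "FileRecordHeader",
    "signature usa_offset usa_size lsn sequence_number hard_link_count "
    "first_attr_offset flags real_size allocated_size base_record_ref "
    "next_attr_id boundary record_number",
)

FLAG_IN_USE = 0x01
FLAG_DIRECTORY = 0x02


def parse_file_record_header(record):
    """Unpack the fixed part of an MFT file record header."""
    header = FileRecordHeader._make(HEADER.unpack_from(record))
    if header.signature != b"FILE":
        raise ValueError("missing FILE signature")
    return header


def is_in_use(header):
    return bool(header.flags & FLAG_IN_USE)


def is_directory(header):
    return bool(header.flags & FLAG_DIRECTORY)


if __name__ == "__main__":
    # Synthetic 1 KB record for demonstration only.
    raw = HEADER.pack(b"FILE", 0x30, 3, 1234, 2, 1, 0x38, FLAG_IN_USE,
                      0x150, 0x400, 0, 4, 0, 36)
    raw += bytes(1024 - len(raw))
    hdr = parse_file_record_header(raw)
    print(hdr.record_number, is_in_use(hdr), is_directory(hdr))  # 36 True False
```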

Part 3 index record Structure Analysis

Each index record consists of a standard index header followed by blocks containing index keys and index data. The size of an index record is defined in the boot record ($Boot) and is usually 4 KB.
The structure of the Standard Index header is as follows:

Offset  Length (bytes)  Description
0x00    4               Signature, always "INDX"
0x04    2               Offset of the update sequence
0x06    2               Size of the update sequence number and array, including the first entry
0x08    8               Log file sequence number (LSN)
0x10    8               VCN of this index buffer within the index allocation
0x18    4               Offset of the index entries (relative to 0x18)
0x1C    4               Size of the index entries (relative to 0x18)
0x20    4               Allocated size of the index entries (relative to 0x18)
0x24    1               1 if this is not a leaf node (it has sub-nodes)
0x25    3               Always 0
0x28    2               Update sequence number
0x2A    2S-2            Update sequence array
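The fixed part of the index header in the table above can be unpacked in the same way as a file record header. This is a minimal sketch that assumes little-endian fields at exactly the listed offsets; the function and key names are made up for the example, and the update sequence array is not processed.

```python
import struct

# Fields from offset 0x00 up to (but not including) the update sequence array at 0x2A.
INDX_HEADER = struct.Struct("<4sHHQQIIIB3sH")


def parse_index_header(buf):
    """Unpack the standard index header of an INDX record."""
    (signature, usa_offset, usa_size, lsn, vcn, entries_offset,
     entries_size, entries_allocated, non_leaf, _padding, usn) = INDX_HEADER.unpack_from(buf)
    if signature != b"INDX":
        raise ValueError("missing INDX signature")
    return {
        "lsn": lsn,
        "vcn": vcn,
        # The three entry fields are relative to offset 0x18, per the table.
        "first_entry_offset": 0x18 + entries_offset,
        "entries_end_offset": 0x18 + entries_size,
        "entries_allocated": entries_allocated,
        "has_sub_nodes": non_leaf == 1,
        "update_sequence_number": usn,
    }


if __name__ == "__main__":
    # Synthetic 4 KB index record for demonstration only.
    raw = INDX_HEADER.pack(b"INDX", 0x28, 9, 5000, 0, 0x28, 0x200, 0xFE8,
                           0, b"\x00\x00\x00", 1)
    raw += bytes(4096 - len(raw))
    print(parse_index_header(raw))
```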

Common Index list

Name   Index of              Used by
$I30   File names            Directories
$SDH   Security descriptors  $Secure
$SII   Security IDs          $Secure
$O     Object IDs            $ObjId
$O     Owner IDs             $Quota
$Q     Quotas                $Quota
$R     Reparse points        $Reparse

Part 4 how to convert FAT32 to NTFS

Windows 2000/XP provides a partition format conversion tool, convert.exe. Convert.exe is a command-line program included with Windows 2000 that converts a FAT volume to NTFS without destroying the files on it. It is easy to use: open a command prompt window in Windows 2000 and enter:
D:\> convert <drive letter to be converted>: /FS:NTFS
For example, if drive E: is currently FAT16/32 and needs to be converted to NTFS, use:
D:\> convert E: /FS:NTFS
All conversions will be completed after the system restarts.
You can also use a dedicated conversion tool, such as the well-known non-destructive partitioning tool Partition Magic, to convert the disk format easily. Select the partition to convert from the disk partition list. Then click the "Convert partition" button on the button bar, or choose the "Convert" command from the "Operations" menu, to start the function. Select "NTFS" as the conversion target and click "OK" to return to the main window. Click "Apply" in the lower-right corner to apply the settings. The system then restarts and completes the partition format conversion.
 
