1. Files, records, and data items
A data item is the basic unit of data in a file. A record is a group of related data items that together describe the attributes of one object, and a file is a collection of related records.
2. Logical and physical structures of files
The physical structure is the way a file is organized on external storage, and the logical structure is the file as seen from the user's perspective. The physical structure serves the system, while the logical structure keeps the system's file management transparent to the user.
3. By logical structure, files fall into two types: structured files and unstructured files (stream files). A structured file is made up of one or more records. An unstructured file is a stream of characters, such as a source program.
4. For structured files, a commonly used organization is the indexed sequential file.
All records of a sequential (key-ordered) file are divided into N groups of M records each, and an index holds one entry per group. To find a record, first search the index by key to locate the group, then scan the records of that group sequentially.
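The lookup just described can be sketched in a few lines of Python. This is only an illustration under assumed conventions: records are (key, value) pairs already sorted by key, and the names build_index and lookup are made up for this example.

```python
from bisect import bisect_right


def build_index(records, group_size):
    """One index entry per group: (key of the group's first record, start position)."""
    return [(records[i][0], i) for i in range(0, len(records), group_size)]


def lookup(records, index, group_size, key):
    """Search the index for the group, then scan that group sequentially."""
    first_keys = [k for k, _ in index]
    g = bisect_right(first_keys, key) - 1  # last group whose first key <= key
    if g < 0:
        return None  # key is smaller than every key in the file
    start = index[g][1]
    for k, value in records[start:start + group_size]:
        if k == key:
            return value
    return None


# Nine records sorted by key, split into N = 3 groups of M = 3 records.
records = [(k, f"rec{k}") for k in range(10, 100, 10)]
index = build_index(records, 3)
```

With N groups of M records, a lookup touches at most N index entries and M records instead of all N * M records.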
5. External storage allocation methods include contiguous allocation, linked allocation, and indexed allocation.
Contiguous allocation (rarely practical):
Each file is allocated one contiguous region of storage.
Advantage: fast access, because the blocks are adjacent and no jumping between scattered addresses is needed.
Disadvantage: the file size must be known in advance, so files that grow dynamically cannot be handled well. The system must also have a free contiguous region large enough for the file.
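A minimal sketch of both sides of contiguous allocation, assuming a hypothetical free map in which True marks a free block. The helper names block_of and first_fit are illustrative only.

```python
def block_of(start_block, length, n):
    """Disk block holding logical block n of a contiguous file: one addition."""
    if not 0 <= n < length:
        raise IndexError("logical block outside the file")
    return start_block + n


def first_fit(free_map, length):
    """Start of the first run of `length` (>= 1) consecutive free blocks, or None."""
    run = 0
    for i, free in enumerate(free_map):
        run = run + 1 if free else 0
        if run == length:
            return i - length + 1
    return None


# True = free block. Six blocks are free, but the longest free run is only three.
free_map = [True, False, True, True, False, True, True, True]
```

Here a file of three blocks fits at block 5, while a file of four blocks cannot be placed even though six blocks are free in total.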
Linked allocation (the external storage allocation policy used by the Windows FAT file system):
In essence, storage is discrete (non-contiguous). There are two variants: implicit linking and explicit linking.
1) Implicit link
The directory entry of each file holds pointers to the first and last disk blocks of the file, and each disk block holds a pointer to the next block.
The problem with implicit linking is that reaching a given record means following the chain from the first block, which takes O(n) block reads and is therefore inefficient.
2) Explicit link
In essence, the link pointers for the entire disk are stored explicitly in one file allocation table (FAT), which is kept in memory.
Advantage: solves the problem of the system not having enough contiguous space.
Disadvantage: locating a block is still a sequential search along the chain, so efficiency does not improve much, and keeping the whole FAT in memory is costly.
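As a rough illustration of explicit linking, the table can be modelled as a mapping from each block number to the next block of the same file. The END marker, the dictionary layout, and the function names are assumptions made for this sketch, not the on-disk FAT format.

```python
END = -1  # hypothetical end-of-chain marker


def file_blocks(fat, first_block):
    """Follow the chain from first_block and return the file's blocks in order."""
    blocks = []
    block = first_block
    while block != END:
        blocks.append(block)
        block = fat[block]
    return blocks


def nth_block(fat, first_block, n):
    """Disk block holding logical block n; needs n table lookups, hence O(n)."""
    block = first_block
    for _ in range(n):
        block = fat[block]
        if block == END:
            raise IndexError("logical block beyond end of file")
    return block


# fat[i] is the block that follows block i in its file.
fat = {2: 5, 5: 9, 9: 3, 3: END}
```

The blocks 2, 5, 9, 3 are scattered over the disk, yet the file is read in order by starting at block 2 and following the table.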
Indexed allocation:
Because the FAT used by linked allocation is too large, we can use index blocks instead. When a file is opened, only the disk block numbers occupied by that file need to be loaded into memory. So each file gets an index block that lists all of its disk blocks, and when the file is created, its directory entry only needs a pointer to that index block.
Advantage: avoids having to load an entire, possibly very large, FAT into memory. It also speeds up access to a file record, because all of the file's disk block numbers are in the index block and any block can be looked up directly.
Disadvantage: extra external storage (the index block) is required. Every file is allocated an index block when it is created, yet in practice most files are small or medium-sized and occupy only one or two disk blocks, so the index block is left almost entirely unused.
Indexed allocation is divided into three types: single-level, multi-level, and hybrid. They are essentially the same, but the latter two are meant for medium and large files.
Specifically, when a file is so large that it needs many index blocks, those index blocks must themselves be managed. So the same idea is applied again: create a higher-level index block that stores the block numbers of the index blocks.
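A small sketch of single-level and two-level index lookup. The in-memory layout (lists of block numbers, a dictionary standing in for the disk) and the names below are assumptions for illustration.

```python
def single_level(index_block, n):
    """Single-level index: the n-th entry is the disk block of logical block n."""
    return index_block[n]


def two_level(outer, index_blocks, n, per_block):
    """Two-level index: the outer block picks an index block, which picks the data block."""
    inner_no = outer[n // per_block]             # which index block to read
    return index_blocks[inner_no][n % per_block]  # entry inside that index block


# Hypothetical disk: index blocks 100 and 101 each hold up to 4 block numbers.
per_block = 4
index_blocks = {100: [7, 2, 9, 4], 101: [12, 30]}
outer = [100, 101]  # the higher-level index block
```

Unlike following a linked chain, either lookup takes a fixed number of steps regardless of how far into the file the block is.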
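6. FAT and NTFS
FAT and NTFS are two concrete file systems built on the external storage allocation methods above. FAT is an explicit linked allocation scheme. NTFS also allocates by cluster, but it records each file's clusters in its master file table.
FAT: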
FAT12 allocates by disk block. The system keeps one FAT, and each table entry stores the number of the next disk block of the file. FAT12 entries are 12 bits wide, so the table has at most 2^12 = 4096 entries. If each disk block is 512 B, a partition holds at most 4096 * 512 B = 2 MB. If a physical disk supports four logical disk partitions, the maximum disk capacity is 8 MB.
To adapt to increasing disk capacities, the "cluster" is used as the new basic unit of disk allocation.
A cluster is a group of consecutive sectors, and its size is 2^n disk blocks. For example, if a cluster contains one 512 B sector, the maximum disk capacity is 8 MB. If a cluster contains two sectors, the maximum disk capacity is 16 MB (each table entry now stores a cluster number: 4096 clusters * 1 KB * 4 partitions).
Disadvantage: although the maximum disk capacity can be raised by increasing the cluster size, internal fragmentation within clusters grows sharply as well.
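The capacity arithmetic and the fragmentation trade-off can be checked with a short calculation. It assumes 512 B sectors and four partitions per disk, as in the example above. The function names are illustrative.

```python
MB = 1024 * 1024


def max_disk_bytes(entry_bits, sector_bytes, sectors_per_cluster, partitions=4):
    """Largest disk addressable when every FAT entry names one cluster."""
    entries = 2 ** entry_bits
    return entries * sector_bytes * sectors_per_cluster * partitions


def wasted_bytes(file_bytes, cluster_bytes):
    """Unused space in the last cluster of a file (internal fragmentation)."""
    return (-file_bytes) % cluster_bytes


one_sector = max_disk_bytes(12, 512, 1)   # cluster = 1 sector
two_sectors = max_disk_bytes(12, 512, 2)  # cluster = 2 sectors
```

Doubling the cluster size doubles the addressable capacity, but a 100-byte file that wastes 412 bytes in a 512 B cluster wastes 3996 bytes in a 4 KB cluster.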
NTFS (New Technology File System):
The basic allocation unit of NTFS is also the cluster, typically 4 KB in size.
For locating clusters, NTFS uses logical cluster numbers (LCNs) together with virtual cluster numbers (VCNs). LCNs number all the clusters of a volume (such as drive C, D, or E) sequentially from start to end. To map an address, NTFS multiplies the LCN by the cluster size to get the byte offset within the volume. VCNs, by contrast, are per file: they number the clusters belonging to one file sequentially. Given where the file's clusters start, a VCN can be mapped to an LCN and from there to an address.
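A hedged sketch of the two mappings. It assumes a file's clusters are described by a list of runs, each written as (first VCN, first LCN, number of clusters). That run-list shape and the function names are assumptions for illustration, not the actual NTFS on-disk encoding.

```python
CLUSTER_BYTES = 4096  # assumed 4 KB clusters


def vcn_to_lcn(runs, vcn):
    """Map a file-relative cluster number to a volume-relative one."""
    for start_vcn, start_lcn, length in runs:
        if start_vcn <= vcn < start_vcn + length:
            return start_lcn + (vcn - start_vcn)
    raise IndexError("VCN beyond end of file")


def lcn_to_offset(lcn, cluster_bytes=CLUSTER_BYTES):
    """Byte offset of a cluster within the volume: LCN times cluster size."""
    return lcn * cluster_bytes


# Hypothetical file: VCNs 0-3 live at LCNs 1000-1003, VCNs 4-5 at LCNs 2500-2501.
runs = [(0, 1000, 4), (4, 2500, 2)]
```

For this file, VCN 5 maps to LCN 2501, which lies 2501 * 4096 bytes from the start of the volume.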
The core structure of NTFS is the master file table (MFT), which holds a record for every file on the volume, including the metadata files that track unallocated space.
Each MFT record is about 1 KB in size. If all of a file's information fits in its record, it is stored in the MFT itself. If not, the rest is stored in clusters allocated to that file, and the record points to them.
Advantage: because a volume can address about 2^32 clusters, a small cluster size (typically 4 KB) can still be used as capacity grows, which keeps internal fragmentation within clusters from increasing.
Disadvantage: poor compatibility. NTFS volumes cannot be accessed by systems that only understand FAT.
7. UNIX hybrid index allocation
In the index node (inode) of a UNIX system, 13 address items are set in total: addr(0) through addr(12).
1) Direct addresses
To speed up file retrieval, 10 direct address items are set in the index node, each storing the number of a disk block that holds file data. If each disk block is 4 KB, then for a file no larger than 40 KB all of its disk block numbers can be read directly from the index node.
2) Single indirect address
A single indirect address is used for medium-sized and large files. In essence, it is single-level index allocation. If each disk block is 4 KB and a disk block number occupies 4 bytes, one index block holds 4 KB / 4 B = 1024 block numbers, so the single-level index covers 1024 * 4 KB = 4 MB.
3) Double indirect address
When the file length exceeds 4 MB + 40 KB, the double indirect address is used. In essence, it is two-level index allocation. It covers 1024 * 1024 disk blocks, so the maximum length is 1024 * 1024 * 4 KB = 4 GB.
4) Triple indirect address
For even larger files, three-level index allocation is used. The maximum length is 1024 * 1024 * 1024 * 4 KB = 4 TB.
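The sizes above can be reproduced with a short calculation, together with a sketch of how a byte offset selects an addressing level. It assumes 4 KB blocks, 4-byte block numbers, and 10 direct items as in the text. The function names are illustrative.

```python
BLOCK = 4 * 1024     # bytes per disk block
PTRS = BLOCK // 4    # block numbers per index block = 1024
DIRECT = 10          # direct address items in the index node


def capacity_by_level():
    """Bytes addressable through each kind of address item."""
    return {
        "direct": DIRECT * BLOCK,
        "single": PTRS * BLOCK,
        "double": PTRS ** 2 * BLOCK,
        "triple": PTRS ** 3 * BLOCK,
    }


def locate(offset):
    """Return (level, indices to follow inside each index block) for a byte offset."""
    n = offset // BLOCK  # logical block number within the file
    if n < DIRECT:
        return ("direct", [n])
    n -= DIRECT
    if n < PTRS:
        return ("single", [n])
    n -= PTRS
    if n < PTRS ** 2:
        return ("double", [n // PTRS, n % PTRS])
    n -= PTRS ** 2
    if n < PTRS ** 3:
        return ("triple", [n // PTRS ** 2, (n // PTRS) % PTRS, n % PTRS])
    raise ValueError("offset beyond maximum file size")
```

Small files are resolved from the index node alone, and each further level costs one extra index block read.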