Operating System File Management

Last Update: 2016-05-01 Source: Internet

Author: User

Tags: file handling



Modern computer systems need to use large numbers of programs and data. Main memory has limited capacity and cannot retain information long term, so programs and data are normally stored as files in external memory and loaded into memory when needed. If users managed the files on external memory directly, they would need to know the characteristics of the storage devices. They would also need to know the attributes of every file and its location on the device. In a multi-user environment they would, in addition, have to maintain the security and consistency of the data. Users are clearly neither able nor willing to take on this job. The operating system therefore provides a file management function, that is, a file system. The file system manages the files on storage and gives users the means to access, share, and protect them. This is convenient for users, keeps files secure, and effectively improves the utilization of system resources.

1. The concept of a file

File:

A file is an ordered sequence of related elements with a symbolic name (the filename). It is a collection of programs or data.

File system:

The file system is the software in the operating system that manages information resources in a unified way. It manages the storage, retrieval, and updating of files, provides safe and reliable means of sharing and protection, and is convenient for users.
The file system consists of the file management program and all the files it manages (a collection of files and directories). It is the interface between the user and external memory: through it, the system gives users a uniform method of accessing information stored on physical media, in units of logical records.


2. Classification of files

By nature and purpose: system files, library files, user files.

System files: files that make up the system software. Users may only execute them through system calls or special commands provided by the system, and may not read, write, or modify them. They consist mainly of the operating system kernel and various system applications, utilities, and data.
Library files: files that users may read and execute but not modify. They consist mainly of the various standard subroutine libraries.
User files: files that users save through the operating system, to be used by the file's owner or by authorized users. They consist mainly of users' source programs, executable object programs, user database data, and so on.

By protection (permitted operations): read-only files, read-write files, executable files.
Read-only files: the owner and authorized users may read the file but not write it. Marked as: -r-----
Read-write files: the owner and authorized users may both read and write the file. Marked as: -rw----
Executable files: the owner and authorized users may execute the file but not read or write it. Marked as: ---x---

By the user's point of view (UNIX file classification):

Ordinary (regular) files: the most common kind of file in the system, usually an unstructured stream of characters.
Directory files: special files made up of the directory information of files. The operating system treats directories as files too, which makes unified management easier.
Special files: files that represent I/O devices (device files).

By logical structure: stream files (unstructured, e.g. operating system files) and record files (structured, e.g. database files).

Stream files: files made up directly of a sequence of characters (a character stream), hence the name.

Most source programs, executable files, library functions, and the like use the unstructured form, i.e. they are stream files. Their length is measured in bytes. When a stream file is accessed, a read/write pointer indicates the next character to be accessed. A stream file can be regarded as a special case of a record file. In UNIX, all files are treated as stream files: even a file with internal structure is treated as a stream, and the system imposes no format on it.

Record files: files made up of a number of records, also called record-structured files or database files.

Records can be organized in several ways, giving different kinds of file:

① Sequential file: records are arranged one after another in some order.

② Indexed file: when records are of variable length, an index table is usually built for them.

③ Indexed sequential file: the records are divided into groups, and the index table has one entry for the first record of each group.

By physical structure: sequential files (also called contiguous files), linked files, indexed files, hashed files, and indexed sequential files.

By access method: sequential-access files and random-access files.

In management information systems, by file organization: sequential files, indexed files, and direct-access files.

By the data they contain:
Source files: files consisting of source programs and data.
Object files: files produced by compiling a source program with the appropriate language compiler. They contain object code that has not yet been linked by the linker.

3. How files are accessed

How a file is accessed depends on the nature of the file and on how the user uses it. There are three access methods:
(1) sequential access;
(2) random access (also called direct access);
(3) indexed access.

Tape is a sequential-access device; disk is a random-access device.

3.1. Sequential Access
Sequential access reads and writes records in the order of their logical addresses in the file.

Sequential access to fixed-length records is straightforward. A read always reads the record following the one last read, and it automatically advances the read pointer to the next record to be read. If the file is both readable and writable, a write pointer is also kept. It always points to the place where the next record will be written, and each write appends a record at the end of the file. Such files also allow skipping forward or backward n records (n an integer). Sequential access is used mainly for tape files, but also for sequential files on disk.
For sequential files with variable-length records, the length of each record is stored in a unit just in front of it, and access takes two steps. When reading, the unit holding the record length is read first at the read pointer. The record is then read according to that length, and the read pointer is advanced past it. When writing, the record is written at the position indicated by the write pointer, and the write pointer is then adjusted.
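The two-step read can be sketched in Python. The sketch assumes a hypothetical layout that the source does not specify: each record's length is stored as a 2-byte big-endian integer immediately before the record. The names `write_record`, `read_record`, and `read_all` are illustrative only.

```python
from __future__ import annotations

import struct

# Hypothetical layout (an assumption): a 2-byte big-endian length unit, then the record.
LEN_FMT = ">H"
LEN_SIZE = struct.calcsize(LEN_FMT)


def write_record(buf: bytearray, record: bytes) -> None:
    """Write at the write pointer (the end of the file): length unit, then record."""
    buf.extend(struct.pack(LEN_FMT, len(record)))
    buf.extend(record)


def read_record(buf: bytes, read_ptr: int) -> tuple[bytes, int]:
    """Two-step read: the length unit first, then that many bytes of record.

    Returns the record and the advanced read pointer.
    """
    (length,) = struct.unpack_from(LEN_FMT, buf, read_ptr)
    start = read_ptr + LEN_SIZE
    return bytes(buf[start:start + length]), start + length


def read_all(buf: bytes) -> list[bytes]:
    """Read every record sequentially from the start of the file."""
    records, ptr = [], 0
    while ptr < len(buf):
        record, ptr = read_record(buf, ptr)
        records.append(record)
    return records


data = bytearray()
for r in (b"alpha", b"be", b"gamma!"):
    write_record(data, r)
print(read_all(data))  # [b'alpha', b'be', b'gamma!']
```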
Because sequential files are accessed in order, records can be blocked and deblocked to speed up input and output. Blocking groups several logical records into one physical block, and deblocking splits them apart again on reading.

3.2. Direct Access (Random Access)
Many applications need to read or write records directly, in any order. An airline reservation system is one example: all information about a particular flight is identified by its flight number and stored in one physical block, and when a user books a flight the system must fetch that flight's information directly. Direct access suits such applications and is typically used for disk files.
For direct access, a file can be viewed as a sequence of numbered physical blocks. The blocks are usually of equal length and are the smallest unit of positioning and access, for example 1024 or 4096 bytes depending on the system and the application. A user can therefore read block 22, then write block 48, then read block 9, and so on, because direct-access files place no restriction on the order in which blocks are read or written. The user gives the operating system a relative block number, which is the offset from the beginning of the file, and the system converts it to an absolute block number.
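Relative-block addressing can be sketched in Python with an in-memory buffer standing in for a disk file. The 1024-byte block size is one of the example sizes above. The helper names `read_block` and `write_block` are hypothetical. The sketch does not model the operating system's conversion from relative to absolute block numbers.

```python
import io

BLOCK_SIZE = 1024  # bytes per block; one of the example sizes in the text


def read_block(f: io.BytesIO, relative_block: int) -> bytes:
    """Read a block by its relative block number (offset from the file start)."""
    f.seek(relative_block * BLOCK_SIZE)
    return f.read(BLOCK_SIZE)


def write_block(f: io.BytesIO, relative_block: int, data: bytes) -> None:
    """Overwrite one block, zero-padding data to a full block."""
    if len(data) > BLOCK_SIZE:
        raise ValueError("data does not fit in one block")
    f.seek(relative_block * BLOCK_SIZE)
    f.write(data.ljust(BLOCK_SIZE, b"\0"))


# An in-memory stand-in for a 50-block disk file.
disk_file = io.BytesIO(bytes(50 * BLOCK_SIZE))
write_block(disk_file, 48, b"flight 48")
write_block(disk_file, 22, b"flight 22")
write_block(disk_file, 9, b"flight 9")
# Blocks are reached in any order, without reading the ones in between.
print(read_block(disk_file, 22).rstrip(b"\0"))  # b'flight 22'
```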
3.3. Indexed Access
The third access method is indexed access, which is based on indexed files. Records are addressed by their record key rather than by their position in the file: the user gives the operating system a key, and the system finds the required record.
Records are usually stored in some order of their keys, for example alphabetically. Besides access by key, such files can also be accessed sequentially or directly. The address of the block that holds a record is found by looking up its key, and real systems use multi-level indexes to speed up this lookup.

4. Common physical file structures

The common physical structures of files are:

sequential files (also called contiguous files), linked files, indexed files, hashed files, and indexed sequential files.

5. Sequential files

In a sequential file, physical records are stored one after another in the order of the logical records in the file. That is, the order of the physical records matches the order of the logical records.

A sequential file can be implemented on storage media with either of two structures: a contiguous structure or a chained structure.
Contiguous structure: the simplest physical file structure. Logically contiguous file information is stored in consecutively numbered physical blocks, so two records that are adjacent in order are also adjacent on the storage medium. Files stored this way are also called contiguous files.

Figure 5.19 illustrates a file with a contiguous structure: logical blocks 0, 1, 2, and 3 are stored in physical blocks 15, 16, 17, and 18 in turn.

Figure 5.19 A file with contiguous structure
A contiguous file can be accessed quickly once its starting address on the storage device (the first block number) and its length (the total number of blocks) are known. However, the file's length must be recorded in its descriptor when the file is created, and the file cannot grow dynamically afterwards. Deleting parts of the file also leaves unusable holes. Contiguous structures are therefore unsuitable for user files, database files, and other files that are modified frequently.

The advantages of the contiguous structure are:

(1) The structure is simple.

(2) Sequential access is fast. A sequential file of equal-length records can be accessed sequentially and also quasi-randomly, for example by binary search. A contiguous file of unequal-length records can only be accessed sequentially.
(3) Because the data are stored in contiguous blocks, access needs few seeks and little seek time.
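Both properties can be shown in a small Python sketch, under stated assumptions. The first part is the logical-to-physical mapping of a contiguous file, using the start block 15 and length 4 of Figure 5.19. The second part is a binary search over equal-length records, which works because record i starts at byte i * record_size. The record layout (a 4-byte key prefix) and the names `ContiguousFile` and `binary_search` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ContiguousFile:
    start_block: int  # first physical block number (from the file descriptor)
    length: int       # total number of blocks

    def physical_block(self, logical_block: int) -> int:
        """Logical block n lives in physical block start_block + n."""
        if not 0 <= logical_block < self.length:
            raise IndexError("logical block outside the file")
        return self.start_block + logical_block


def binary_search(data: bytes, record_size: int, key_size: int, key: bytes) -> int:
    """Binary search over equal-length records sorted by a key prefix.

    Record i starts at byte i * record_size, so the middle record can be
    reached directly. Returns the record index, or -1 if the key is absent.
    """
    lo, hi = 0, len(data) // record_size - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        offset = mid * record_size
        mid_key = data[offset:offset + key_size]
        if mid_key == key:
            return mid
        if mid_key < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


f = ContiguousFile(start_block=15, length=4)    # the file of Figure 5.19
print([f.physical_block(n) for n in range(4)])  # [15, 16, 17, 18]

# Four hypothetical 8-byte records: a 4-byte key followed by 4 bytes of data.
records = b"".join(k + b"data" for k in (b"0003", b"0010", b"0042", b"0100"))
print(binary_search(records, 8, 4, b"0042"))  # 2
```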

Disadvantages of contiguous storage:

(1) Inserting or deleting a record forces other records to move, and doing this in external memory makes the disk head move frequently. A contiguous file therefore only allows records to be inserted at the end. A deleted record is merely marked as logically deleted (a tombstone); records are physically removed and moved only when the user explicitly requests physical deletion.

(2) A sequential file needs contiguous blocks. Suppose a record must be inserted when the originally allocated blocks have no free space and the adjacent blocks are not free. A new, larger free area must then be found in external memory, and the existing data moved there, before the new data can be inserted. Contiguous structures therefore cannot easily grow dynamically, and external memory easily becomes fragmented.

The chained structure scatters logically contiguous file information over a number of non-contiguous physical blocks. Each block holds a pointer to the physical block that follows it, so the order of the physical records is expressed by a chain of pointers. Files stored this way are also called linked (chained) files.

Figure 5.20 shows the physical structure of a chained file. With a chained structure, the file length need not be stated in the file descriptor. Recording the file's first block number is enough, because the whole file can be retrieved by following the chain pointers. Another feature of the chained structure is that the file can grow dynamically: a block can be inserted or deleted between any two blocks just by adjusting the chain pointers.

Figure 5.20 A file with chained structure

For a chained file, the system converts a logical block number to a physical block number by following the chain. In the file of Figure 5.20, for example, suppose the user wants to operate on logical block 2. The system starts from the first physical block, 20, and follows the chain to the third block (logical block 2), whose physical block number is 22. The chained structure is therefore not suitable for random access.
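The chain-following lookup can be sketched in Python. Only the first block (20) and the result for logical block 2 (physical block 22) come from the text. The intermediate block 25, the trailing block 31, the free block 40 used for insertion, and the function names are hypothetical. The sketch also counts block reads, which shows why chained files are slow for random access.

```python
from typing import Dict, Optional, Tuple

# Next-block pointers: physical block -> following physical block (None = last).
# Only "first block 20" and "logical block 2 is physical block 22" come from the
# text; blocks 25 and 31 are hypothetical.
NEXT: Dict[int, Optional[int]] = {20: 25, 25: 22, 22: 31, 31: None}


def physical_block(first: int, logical: int,
                   next_ptr: Dict[int, Optional[int]]) -> Tuple[int, int]:
    """Map a logical block to a physical block by following the chain.

    Returns (physical block, blocks read). Each hop needs the pointer stored in
    the block before it, so reaching logical block n costs n + 1 block reads.
    """
    block, reads = first, 1            # read the first block
    for _ in range(logical):
        nxt = next_ptr[block]
        if nxt is None:
            raise IndexError("logical block beyond the end of the file")
        block, reads = nxt, reads + 1
    return block, reads


def insert_after(next_ptr: Dict[int, Optional[int]], after: int, new: int) -> None:
    """Link a free block into the chain by adjusting two pointers."""
    next_ptr[new] = next_ptr[after]
    next_ptr[after] = new


print(physical_block(20, 2, NEXT))   # (22, 3)
grown = dict(NEXT)
insert_after(grown, 25, 40)          # hypothetical free block 40
print(physical_block(20, 2, grown))  # (40, 3)
```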

The main advantages of the chained structure are:

(1) Better disk space utilization, with no external fragmentation problem;

(2) Records are easy to insert into and delete from the file;
(3) Files can grow dynamically.
In essence a sequential file is a linear list, so its operations resemble those on linear lists. However, external memory is much slower than main memory. Algorithms on sequential files should therefore aim to minimize the number of external-memory accesses, the number of seeks, and the seek time.

Tapes are typical sequential access devices, so files stored on tape can only be sequential files.

6. Index file

1. Index file
An index file consists of two parts: the data area (the main file) and an index table that maps logical records to physical records.
2. Composition of the index table
The index table consists of a number of index entries. A typical index entry holds a primary key and the physical address of the record with that key, as shown in Figure 6.1 (b). Note: the index table must be ordered by primary key, while the main file itself may or may not be ordered by primary key.
3. Indexed sequential files and indexed non-sequential files
(1) Indexed sequential file: an index file whose main file is ordered by primary key.
In an indexed sequential file, one index entry can be built for a whole group of records. Such an index table is called a sparse index.
(2) Indexed non-sequential file: an index file whose main file is not ordered by primary key.
In an indexed non-sequential file, an index entry must be built for every record. Such an index table is called a dense index.
Notes:
① An indexed non-sequential file is usually just called an index file.
② The main file of an indexed non-sequential file is unordered, so sequential access causes frequent head movement. It suits random access but not sequential access.
③ The main file of an indexed sequential file is ordered, so it suits both random and sequential access.
④ The index of an indexed sequential file is sparse and takes little space. This is the most common file organization.
⑤ The most commonly used indexed sequential files are ISAM files and VSAM files.

4. Index file operations:

1) Search: supports both direct access and access by key. A search takes two steps. First the index table is searched, for example by binary search. Then the record is read from external memory, using the pointer in the matching index entry (the record's physical address in external memory).

Notes: ① When the index table is small, it can be read into memory in one go, so a search of the index file accesses external memory only twice: once to read the index and once to read the record.
② Because the index table is ordered, it can be searched either sequentially or by binary search.
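The two-step search can be sketched in Python, with the standard `bisect` module doing the binary search on the ordered index table. The main file here is unordered, so the index is dense (one entry per record). The records, addresses, and the `search` function are hypothetical.

```python
import bisect

# Hypothetical main file: list position = physical address; records unordered.
main_file = [(17, "rec17"), (5, "rec5"), (42, "rec42"), (8, "rec8")]

# Dense index table: one (key, address) entry per record, ordered by key.
index_table = sorted((key, addr) for addr, (key, _) in enumerate(main_file))
index_keys = [key for key, _ in index_table]


def search(key):
    """Step 1: binary-search the index; step 2: read the record it points to."""
    i = bisect.bisect_left(index_keys, key)
    if i == len(index_keys) or index_keys[i] != key:
        return None                     # no index entry: no such record
    address = index_table[i][1]
    return main_file[address]           # the second external-memory access


print(search(42))  # (42, 'rec42')
print(search(9))   # None
```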

2) Insert: the record is appended to the end of the main file, and the corresponding index entry must be inserted at the right place in the index table. It is therefore advisable to leave some empty slots in the index table.

3) Delete: only the corresponding index entry in the index table needs to be deleted.

4) Update: the updated record is appended to the end of the main file, and the corresponding index entry is modified.

Figure 6.1 (a) Main file (data area); (b) index table; (c) index table as built during input
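The four operations can be sketched in Python. The sketch assumes an append-only main file and a dense index kept as two sorted lists, and the class `IndexFile` and its methods are hypothetical. `list.insert` shifts every later entry, and that shifting is the cost the suggested empty slots in an on-disk index table are meant to avoid.

```python
import bisect
from typing import List, Optional, Tuple


class IndexFile:
    """Hypothetical dense-index file: append-only main file + sorted index."""

    def __init__(self) -> None:
        self.main: List[Tuple[int, str]] = []  # main file; records are only appended
        self.keys: List[int] = []              # index table keys, kept sorted
        self.addrs: List[int] = []             # addrs[i]: main-file position for keys[i]

    def _find(self, key: int) -> Optional[int]:
        """Binary-search the index table; return the entry position or None."""
        i = bisect.bisect_left(self.keys, key)
        return i if i < len(self.keys) and self.keys[i] == key else None

    def search(self, key: int) -> Optional[str]:
        i = self._find(key)
        return None if i is None else self.main[self.addrs[i]][1]

    def insert(self, key: int, data: str) -> None:
        self.main.append((key, data))            # record goes at the end of the main file
        i = bisect.bisect_left(self.keys, key)   # index entry goes in key order
        self.keys.insert(i, key)
        self.addrs.insert(i, len(self.main) - 1)

    def delete(self, key: int) -> None:
        i = self._find(key)
        if i is not None:
            del self.keys[i], self.addrs[i]      # only the index entry is removed

    def update(self, key: int, data: str) -> None:
        i = self._find(key)
        if i is None:
            raise KeyError(key)
        self.main.append((key, data))            # new version at the end of the main file
        self.addrs[i] = len(self.main) - 1       # repoint the existing index entry


idx = IndexFile()
for k, v in [(30, "c"), (10, "a"), (20, "b")]:
    idx.insert(k, v)
idx.update(20, "b2")
idx.delete(10)
print(idx.search(20), idx.search(10), len(idx.main))  # b2 None 4
```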

5. Building multi-level indexes with lookup tables

1) Lookup table:

An index built on top of an index table is called a lookup table. When an index table occupies several page blocks, a lookup table reduces the number of external-memory accesses needed to search it.

Suppose the index table of Figure 6.1 (b) occupies three page blocks of external memory and each block holds three index entries; its lookup table is then as shown in Figure 6.2. To retrieve a record, the system searches the lookup table, then the index table, and then reads the record: three external-memory accesses.

Figure 6.2 Lookup table for the index table of Figure 6.1 (b)

2) Multi-level index
When the lookup table itself still has many entries, a higher-level index can be built on it, typically up to four levels:
data file → index table → first-level lookup table → second-level lookup table → third-level lookup table.
Example: a retrieval starts from the top-level index, the third-level lookup table, and needs 5 external-memory accesses (three lookup tables, the index table, and the record).

Notes:
① A multi-level index is a static index.
② Each level of a multi-level index is a sequential table. The structure is simple but awkward to modify, because every modification requires reorganizing the index.
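The access counts can be checked with a small Python sketch, with made-up keys and record addresses. It assumes three entries per block (as in the Figure 6.2 example) and lookup-table entries that store the largest key of the block they point to. Each level is built on the one below until the top fits in a single block. A search reads one block per level plus the record itself. The names `build_levels` and `lookup` are hypothetical.

```python
from bisect import bisect_left

B = 3  # entries per block (three per page block, as in the example above)


def chunk(items, size):
    """Split a list into blocks of at most `size` entries."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_levels(index_entries):
    """Build lookup tables over a non-empty index table.

    index_entries: (key, record_address) pairs sorted by key.
    Returns levels from the top down; each level is a list of blocks and each
    block a list of (key, pointer) pairs. In a lookup table the key is the
    largest key in the child block and the pointer is that block's number.
    """
    levels = [chunk(index_entries, B)]
    while len(levels[0]) > 1:
        below = levels[0]
        entries = [(block[-1][0], n) for n, block in enumerate(below)]
        levels.insert(0, chunk(entries, B))
    return levels


def lookup(levels, key):
    """Return (record_address or None, number of external-memory reads)."""
    reads, block_no = 0, 0
    for depth, level in enumerate(levels):
        block = level[block_no]
        reads += 1                          # read one block of this level
        i = bisect_left([k for k, _ in block], key)
        if i == len(block):
            return None, reads              # key larger than anything indexed here
        if depth == len(levels) - 1:        # bottom level: the index table itself
            if block[i][0] != key:
                return None, reads
            return block[i][1], reads + 1   # +1 to read the record
        block_no = block[i][1]
    return None, reads


small = build_levels([(k, 100 + k) for k in range(9)])   # 3 index blocks
print(len(small), lookup(small, 4))    # 2 (104, 3)
large = build_levels([(k, 100 + k) for k in range(81)])  # 27 index blocks
print(len(large), lookup(large, 50))   # 4 (150, 5)
```

With 9 index entries the sketch builds one lookup table and a search needs 3 accesses. With 81 entries it builds three lookup tables and a search needs 5 accesses, matching the counts in the text.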

3) Dynamic index
When records in the data file change frequently during use, the index can be organized as a tree table, such as a binary search tree (or AVL tree) or a B-tree (or one of its variants). This is a dynamic index.
(1) Features of tree-table indexes
① Insertion and deletion are easy.
② The tree is itself hierarchical, so no separate multi-level index is needed.
③ Building the index table is itself a sorting process.
(2) Choosing the tree structure
① When the data file has few enough records that the whole index table fits in memory, a binary search tree (or AVL tree) can serve as the index.
② When the file is large, the index table (the tree) is itself stored in external memory, and a search accesses external memory once for each node on the search path. A B-tree of order m (or a variant) is then the better choice for the index table. The value of m depends on the number of index entries and the buffer size.
(3) Evaluating the search performance of external-memory index tables
An external-memory access takes far longer than an in-memory search. The search performance of an external-memory index table is therefore measured mainly by the number of external-memory accesses, that is, by the depth of the index tree.
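A standard bound, not stated in the original, makes the depth argument concrete. In a B-tree of order m, every node except the root has at least ⌈m/2⌉ children, and the root has at least 2. A tree with h levels of nodes therefore has at least 2⌈m/2⌉^(h-1) failure (null) positions below its deepest level. A tree holding N keys has exactly N + 1 such positions. Together these give a bound on h, which is also the number of external-memory accesses per search:

```latex
% B-tree of order m holding N keys; h = number of node levels = reads per search
N + 1 \ge 2\left\lceil \tfrac{m}{2} \right\rceil^{\,h-1}
\quad\Longrightarrow\quad
h \le \log_{\lceil m/2 \rceil}\!\left(\frac{N+1}{2}\right) + 1
```

For instance, with m = 100 and N = 1,000,000 keys, log base 50 of 500000.5 is about 3.35, so h ≤ 4: a search needs at most four external-memory accesses.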

Advantages and disadvantages

The index structure is an extension of the chained structure. It keeps the advantages of the chained structure while overcoming its limitation to sequential access. Any record can be read or written directly, and records are easy to insert, delete, and modify.

The disadvantage of an index file is the extra space and search time taken by the index table. The index table may even hold more information than the file records themselves.

There are two typical indexed sequential files:

A. ISAM file: ISAM (Indexed Sequential Access Method) is a file organization designed for disk access.

B. VSAM file: a VSAM (Virtual Storage Access Method) file is organized using the virtual memory facility provided by the operating system. Users need not deal with reading and writing records in external memory directly. To the user, the file consists only of logical storage units such as control intervals and control areas.

7. ISAM files and VSAM files

7.1 ISAM File

1. Composition of an ISAM file

ISAM is short for Indexed Sequential Access Method. It is a file organization designed for files on disk, and it uses a static index structure.
A disk is addressed by a three-level address: disk group, cylinder, and track. A three-level index on disk group, cylinder, and track can therefore be built for the data file on the disks.

1) Track index

Each entry of the track index consists of two sub-entries, a base index entry and an overflow index entry. Each sub-entry in turn consists of a key and a pointer.
In the base index entry, the key is the largest key on the track (the key of its last record), and the pointer gives the position of the first record on the track.

In the overflow index entry, the key is the largest key among the track's overflow records, and the pointer gives the position of the first record of the track's overflow list in the overflow area.

2) Cylinder Index

Each entry of the cylinder index consists of a key and a pointer. The key is the largest key on the cylinder (the key of its last record), and the pointer gives the position of the cylinder's track index.

3) Primary index
The cylinder index is stored on one cylinder. If the cylinder index is so large that it occupies several tracks, an index on the cylinder index, called the primary index, is built.

Thus an ISAM file consists of a (possibly multi-level) primary index, a cylinder index, track indexes, and the main file. Records are placed according to the following rule:

Within one disk group, records fill one cylinder first and then continue on the adjacent cylinder. Within a cylinder, they are stored surface by surface (track by track) in order.

The structures of the various index entries are shown in Figure 7.1.

Figure 7.1 Structure of ISAM index entries

Figure 7.2 shows the structure of an ISAM file. Here the primary index is an index on the cylinder index, and there is only one level of primary index.


A multi-level primary index can be used when a file is so large that its cylinder index, and hence its primary index, becomes large. Conversely, if the cylinder index is small, the primary index can be omitted. Usually the primary index and the cylinder index are placed on the same cylinder (cylinder 0 in Figure 7.2). The primary index goes on the first track of that cylinder (track 0 of cylinder 0 in Figure 7.2), and the cylinder index goes on the following tracks.

On every cylinder that holds the main file, the layout is as follows. The track index is placed on the first track, T0. The next several tracks form the base area for main-file records, and the last few tracks on the cylinder form the overflow area. Records in the base area are stored in primary-key order. The overflow area is shared by all base-area tracks on the cylinder. When a base-area track overflows, its overflow records are linked in primary-key order into a linked list (the overflow list) in the overflow area.

2. ISAM file retrieval

Retrieving a record from an ISAM file proceeds as follows:

1) Search the primary index to find the relevant part of the cylinder index;

2) Search the cylinder index to find the track index of the cylinder that holds the record;

3) Search the track index to find the starting address of the track that holds the record, then search that track sequentially until the record is found.

If the record is not found on the track, the file contains no such record. If the record sought is in the overflow area, the head pointer of the overflow list is taken from the overflow index entry in the track index, and the list is then searched sequentially.

For example, to find record R136 in Figure 7.2, the system first searches the primary index, i.e. reads C0T0 into memory. Since 136 < 286, the relevant part of the cylinder index is on C0T1, so C0T1 is read. Since 136 < 145, the record is on cylinder 1, so C1T0, the track index of cylinder 1, is read into memory. In the track index, 90 < 136 < 145, so R136 is stored on track C1T2, and reading C1T2 finds R136.
To improve retrieval efficiency, the primary index is usually kept resident in memory. The cylinder index is placed on a cylinder in the middle of the data file, which minimizes the average head movement when going from the cylinder index to a track index.
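The three-level search can be sketched in Python. The search passes through the primary index, the cylinder index, and the track index. It then scans either the base-area track or the track's overflow list. The miniature layout (two cylinders with two tracks each) is invented for illustration and is not the data of Figure 7.2. As described above, each index stores the largest key of what it points to. The names `TrackEntry`, `first_ge`, and `isam_search` are hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TrackEntry:
    """One track-index entry plus the data it points to (all hypothetical)."""
    base_max: int                       # largest key left on the base-area track
    track: List[int]                    # keys on the base-area track, sorted
    overflow_max: Optional[int] = None  # largest key in this track's overflow list
    overflow: List[int] = field(default_factory=list)  # overflow list, key order

    def max_key(self) -> int:
        return self.base_max if self.overflow_max is None else self.overflow_max


def first_ge(max_keys, key):
    """Position of the first entry whose largest key is >= key, or None."""
    i = bisect_left(max_keys, key)
    return i if i < len(max_keys) else None


# Two cylinders with two base-area tracks each; cylinder 0, track 1 has overflowed.
cylinders = [
    [TrackEntry(20, [5, 12, 20]), TrackEntry(40, [25, 31, 40], 50, [45, 50])],
    [TrackEntry(70, [55, 63, 70]), TrackEntry(90, [77, 84, 90])],
]
cylinder_index = [[(50, 0)], [(90, 1)]]  # blocks of (largest key, cylinder number)
primary_index = [50, 90]                 # largest key in each cylinder-index block


def isam_search(key):
    """Return (cylinder, track, 'base' or 'overflow'), or None if absent."""
    b = first_ge(primary_index, key)                     # 1) primary index
    if b is None:
        return None
    block = cylinder_index[b]
    c = first_ge([k for k, _ in block], key)             # 2) cylinder index
    if c is None:
        return None
    cyl = block[c][1]
    tracks = cylinders[cyl]                              # 3) track index
    t = first_ge([e.max_key() for e in tracks], key)
    if t is None:
        return None
    entry = tracks[t]
    if key <= entry.base_max:                            # scan the base track
        return (cyl, t, "base") if key in entry.track else None
    return (cyl, t, "overflow") if key in entry.overflow else None  # scan overflow list


print(isam_search(31), isam_search(45), isam_search(60))
# (0, 1, 'base') (0, 1, 'overflow') None
```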

3. ISAM file insertion
To insert a new record, the system first finds the track where it belongs. If the track is not full, the new record is inserted at its proper position on the track. If the track is full, there are two cases. Either the new record is inserted on the track and the track's last record is pushed out into the overflow list, or the new record goes directly into the track's overflow list. After an insertion, the base and overflow index entries in the track index may need to be updated.

For example, insert R65, R95, and R83 in turn into the file of Figure 7.2.
(1) Insert R65. The records on cylinder 1, track 1 with keys greater than 65 each move back one position, which pushes R90 out into the overflow area at T11'0 (track 11, block 0). The largest key on track T1 becomes 80. In the track index, the largest key of the base entry therefore changes from 90 to 80, the largest key of the overflow entry becomes 90, and the overflow pointer is set to T11'0 (the head of the overflow list). R65 is inserted at its proper position.
(2) Insert R95. This pushes R145 on track T2 out into the overflow area at T11'1, and the corresponding track index entry is updated.
(3) Insert R83. Since 80 < 83 < 90, R83 goes directly into the overflow area at T11'2 and its pointer is set to T11'0. The head of track 1's overflow list is changed to point to T11'2. The result after all three insertions is shown in Figure 7.3.

Figure 7.3 ISAM file after the insertions
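The track-level insertion rule can be sketched in Python. The sketch assumes the track index has already chosen the track. It models the overflow list as a key-ordered Python list rather than as blocks linked by pointers. The track 60, 70, 80, 90 (capacity 4) and the name `isam_insert` are hypothetical. The track mirrors steps (1) and (3) above: inserting 65 pushes 90 into the overflow list and lowers the base maximum to 80, and 83 then goes straight into the overflow list.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TrackEntry:
    base_max: int                       # base index entry: largest key on the track
    track: List[int]                    # keys on the base-area track, sorted
    overflow_max: Optional[int] = None  # overflow index entry: largest overflow key
    overflow: List[int] = field(default_factory=list)  # overflow list in key order


def isam_insert(entry: TrackEntry, key: int, capacity: int) -> None:
    """Insert key into the track that the track index has already selected."""
    if entry.overflow and key > entry.base_max:
        bisect.insort(entry.overflow, key)      # belongs after the base area (like R83)
    else:
        bisect.insort(entry.track, key)
        if len(entry.track) > capacity:         # track was full: push out its last record
            bisect.insort(entry.overflow, entry.track.pop())
        entry.base_max = entry.track[-1]
    if entry.overflow:
        entry.overflow_max = entry.overflow[-1]


t = TrackEntry(base_max=90, track=[60, 70, 80, 90])
isam_insert(t, 65, capacity=4)   # like step (1): 90 overflows, base max becomes 80
isam_insert(t, 83, capacity=4)   # like step (3): 80 < 83 < 90, straight to overflow
print(t.track, t.base_max, t.overflow, t.overflow_max)
# [60, 65, 70, 80] 80 [83, 90] 90
```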

4. ISAM file deletion
Deleting a record from an ISAM file is much simpler than inserting one. The system finds the record and puts a deletion mark at its storage location, without moving records or changing pointers. After many insertions and deletions, however, the file's structure can degrade: many records end up in the overflow area, while much space in the base area is wasted. ISAM files therefore usually need periodic reorganization. The records are read into memory and rewritten as a new ISAM file in which the base area is full and the overflow area is empty.
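Mark-only deletion and periodic reorganization can be sketched in Python. Each record is assumed to be a hypothetical `[key, deleted]` pair, and the names `Track`, `isam_delete`, and `reorganize` are illustrative. Deleting only sets the flag. Reorganizing collects the live records in key order and refills the base-area tracks, which leaves every overflow list empty.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Track:
    base: List[list] = field(default_factory=list)      # [key, deleted] pairs
    overflow: List[list] = field(default_factory=list)  # overflow list, same form


def isam_delete(track: Track, key: int) -> bool:
    """Mark the record as deleted in place; nothing moves, no pointer changes."""
    for record in track.base + track.overflow:
        if record[0] == key and not record[1]:
            record[1] = True
            return True
    return False


def reorganize(tracks: List[Track], capacity: int) -> List[Track]:
    """Rewrite live records in key order into full base tracks, empty overflow."""
    live = sorted(r[0] for t in tracks for r in t.base + t.overflow if not r[1])
    return [Track(base=[[k, False] for k in live[i:i + capacity]])
            for i in range(0, len(live), capacity)]


old = [
    Track(base=[[10, False], [20, False], [30, False]], overflow=[[35, False], [40, False]]),
    Track(base=[[50, False], [60, False]]),
]
isam_delete(old[0], 20)
isam_delete(old[1], 60)
new = reorganize(old, capacity=3)
print([[r[0] for r in t.base] for t in new])  # [[10, 30, 35], [40, 50]]
```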

7.2 VSAM File

VSAM is short for Virtual Storage Access Method. It is also an indexed sequential file organization, and it uses a B+ tree as its dynamic index structure. This organization exploits the virtual memory facility provided by the operating system. Users can read and write records without considering physical storage details such as cylinders and tracks in external memory. The file consists only of logical storage units such as control intervals and control areas. A control interval may lie within one track or may span tracks.

1. VSAM File structure

A VSAM file consists of three parts: the index set, the sequence set, and the data set.

Figure 7.4

2. Structure of a control interval in a VSAM file

In a VSAM file, records may be fixed-length or variable-length. A control interval therefore holds three things: the records themselves, control information for each record (such as its length), and control information for the whole interval (such as the number of records stored in it). The structure of a control interval is shown in Figure 7.5. Accessing a record in a control interval requires scanning from both ends of the interval toward the middle.

Figure 7.5 Structure of a control interval in a VSAM file

3. VSAM file insertion
A VSAM file has no overflow area. Instead, free space is left for insertions when the file is first created, in two ways. First, control intervals are not filled completely, so there is a gap between the last record and the control information. Second, each control area contains some completely empty control intervals, which are recorded in the sequence-set index. A new record can usually be inserted into the appropriate control interval. Note that records within an interval must stay in ascending key order, so the records whose keys are larger than the new key are moved toward the control information.

Suppose a control interval becomes full after several insertions. The next insertion into it splits the interval: nearly half of its records are moved to an empty control interval in the same control area, and the corresponding entries in the sequence set are updated. If the control area has no empty control interval, the control area itself is split. This splits a node of the sequence set and requires updating node information in the index set. Because control areas are large, such splits are usually rare.
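Insertion into one control area can be modelled in a toy Python sketch under simplifying assumptions. Each control interval is a sorted list of keys with a fixed capacity, and the sequence set is a list of interval numbers in key order. A control-area split is not modelled; the sketch simply raises an error instead. The class `ControlArea` and its methods are hypothetical.

```python
import bisect
from typing import List


class ControlArea:
    """Toy model of one VSAM control area (hypothetical names throughout)."""

    def __init__(self, n_intervals: int, capacity: int) -> None:
        self.capacity = capacity                          # records per control interval
        self.intervals: List[List[int]] = [[] for _ in range(n_intervals)]
        self.sequence_set: List[int] = [0]                # CI numbers in key order

    def _target(self, key: int) -> int:
        """Pick the interval whose largest key is >= key, else the last one."""
        for ci in self.sequence_set:
            if self.intervals[ci] and key <= self.intervals[ci][-1]:
                return ci
        return self.sequence_set[-1]

    def insert(self, key: int) -> None:
        ci = self._target(key)
        records = self.intervals[ci]
        full = len(records) == self.capacity
        empty = [i for i in range(len(self.intervals)) if i not in self.sequence_set]
        if full and not empty:
            raise RuntimeError("control area split is not modelled in this sketch")
        bisect.insort(records, key)        # larger keys shift toward the control info
        if full:                           # the interval overflowed: split it
            new_ci, half = empty[0], len(records) // 2
            self.intervals[new_ci] = records[half:]   # move about half the records
            del records[half:]
            self.sequence_set.insert(self.sequence_set.index(ci) + 1, new_ci)

    def keys(self) -> List[int]:
        """All keys in order, read through the sequence set."""
        return [k for ci in self.sequence_set for k in self.intervals[ci]]


ca = ControlArea(n_intervals=3, capacity=4)
for k in (10, 20, 30, 40, 25):
    ca.insert(k)                           # inserting 25 splits interval 0
print(ca.sequence_set, ca.intervals)       # [0, 1] [[10, 20], [25, 30, 40], []]
```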

4. VSAM file deletion

When a record is deleted from a VSAM file, the records in the same control interval with larger keys are moved forward. This leaves the freed space available for later insertions. If a whole control interval becomes empty, it is reclaimed as a free interval, and its entry in the sequence set is deleted.

5. Advantages of VSAM files
Compared with ISAM files, VSAM files based on B+ trees have the following advantages:
- Search efficiency stays high: a record inserted later is found as quickly as an original one.
- Storage space is allocated and freed dynamically.
- Storage utilization averages about 75%.
- The file never needs to be reorganized.
For these reasons, B+-tree-based VSAM files are commonly used as the standard organization for large indexed sequential files.

8. Hash files (direct files)

1. Hash file

A hash file, also called a direct-access file, is organized by hashing, much like a hash table. A hash function and a collision-handling method are designed to suit the characteristics of the record keys, and records are hashed to locations in external memory. Because the storage location of each record is computed, records that are logically consecutive are not physically contiguous. Hash files are therefore unsuitable for tape and suitable only for disk storage. This structure also suits only files of fixed-length records that are accessed randomly by primary key.

Hash files are organized somewhat differently from hash tables. In a hash file, records on disk are usually stored in groups: several records form a storage unit called a bucket. If a bucket can hold m records, up to m records with the same hash value can be stored in the same bucket. A collision (overflow) occurs only when an (m+1)-th record with that hash value appears.

2. Resolving conflicts with the chained-address method
Any of the collision-handling methods used for hash tables can also be applied to hash files, but the chained-address method is the preferred one for hash files.
An "overflow" occurs when the number of records with the same hash value exceeds the bucket capacity m; an overflow bucket is then created dynamically to hold the records that do not fit. The bucket that stores the first m records with a given hash value is called the primary (base) bucket, and a bucket that stores overflow records is called an overflow bucket. Primary and overflow buckets have the same structure: an array of m records plus a pointer to the next bucket.

While a bucket has not overflowed, its pointer is null. When it overflows, an overflow bucket is created dynamically to hold the overflow records, and the bucket's pointer is set to point to it. If the overflow bucket in turn overflows, a second overflow bucket is created and the first overflow bucket's pointer is set to point to it. The buckets with the same hash address thus form a linked list.
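The bucket-and-overflow-chain structure just described can be modelled as a short Python sketch. It assumes integer keys, the division-remainder hash `key % num_buckets`, and in-memory objects standing in for disk blocks; `Bucket`, `HashFile`, `insert` and `chain` are hypothetical names, not part of any file-system API.

```python
class Bucket:
    """A bucket: an array of at most m records plus a pointer to an overflow bucket."""

    def __init__(self, m):
        self.m = m
        self.records = []
        self.next = None  # stays None until this bucket overflows


class HashFile:
    """Hash file with chained overflow buckets (in-memory sketch, hypothetical names)."""

    def __init__(self, num_buckets, m):
        self.buckets = [Bucket(m) for _ in range(num_buckets)]  # primary buckets

    def address(self, key):
        return key % len(self.buckets)  # assumed hash function: division remainder

    def insert(self, key):
        bucket = self.buckets[self.address(key)]
        # Follow the chain until we reach a bucket that still has room
        while len(bucket.records) == bucket.m:
            if bucket.next is None:
                bucket.next = Bucket(bucket.m)  # overflow: create a new bucket on demand
            bucket = bucket.next
        bucket.records.append(key)

    def chain(self, address):
        """Records of the primary bucket followed by each overflow bucket."""
        result, bucket = [], self.buckets[address]
        while bucket is not None:
            result.append(list(bucket.records))
            bucket = bucket.next
        return result


hf = HashFile(num_buckets=7, m=3)
for key in (2, 9, 16, 23, 30, 37, 44):  # all of these hash to address 2
    hf.insert(key)
print(hf.chain(2))  # [[2, 9, 16], [23, 30, 37], [44]]
```

Seven synonyms fill the primary bucket and a first overflow bucket, so a second overflow bucket is created and linked from the first, which is the chain described above.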

Figure 8.1 Hash file.

For example, suppose a file has 20 records with the key set {2, 23, 5, 26, 1, 3, 24, 18, 27, 12, 7, 9, 4, 19, 6, 16, 33, 11, 10, 13}, a bucket capacity of 3, and 7 buckets. Using the division-remainder method as the hash function, h(key) = key % 7, gives the hash file shown in Figure 8.1.
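The layout for this example can be reproduced with a short Python sketch, assuming the keys are inserted in the order listed (the figure is not reproduced here, so the order of keys inside each bucket is an assumption). `build_hash_file` is a hypothetical helper; each chain is represented as a list of buckets, primary bucket first.

```python
def build_hash_file(keys, num_buckets, m):
    """Insert keys in the given order; return, per bucket address, the chain of
    buckets (primary bucket first, then overflow buckets), each holding <= m keys."""
    table = [[[]] for _ in range(num_buckets)]
    for key in keys:
        chain = table[key % num_buckets]  # h(key) = key % 7 in the example
        if len(chain[-1]) == m:           # last bucket in the chain is full
            chain.append([])              # link in a new overflow bucket
        chain[-1].append(key)
    return table


keys = [2, 23, 5, 26, 1, 3, 24, 18, 27, 12, 7, 9, 4, 19, 6, 16, 33, 11, 10, 13]
for address, chain in enumerate(build_hash_file(keys, num_buckets=7, m=3)):
    print(address, chain)
# 0 [[7]]
# 1 [[1]]
# 2 [[2, 23, 9], [16]]
# 3 [[3, 24, 10]]
# 4 [[18, 4, 11]]
# 5 [[5, 26, 12], [19, 33]]
# 6 [[27, 6, 13]]
```

Only addresses 2 and 5 overflow: key 16 is the fourth synonym at address 2, and keys 19 and 33 are the fourth and fifth at address 5. Checking only the last bucket of a chain is enough here because, with insertions only, every earlier bucket in the chain is already full.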

3. Finding records in a hash file
First, compute the hash address (that is, the bucket address) from the key of the record being sought, read that bucket into memory, and search it sequentially. If a record whose key equals the search key is found, the search succeeds. If the record is not in the bucket and the bucket's pointer is null, the file does not contain the record and the search fails. If the record is not in the bucket but the pointer is not null, the overflow buckets are read into memory in turn and searched sequentially: the search succeeds if the record is found in some overflow bucket, and fails if it is not found anywhere in the overflow chain.
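The lookup procedure can be sketched as follows, assuming integer keys, the hash function `key % number_of_buckets`, and an in-memory `Bucket` type standing in for a disk block (both hypothetical). The sample chain uses the keys from the 20-record example that hash to address 5.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Bucket:
    records: List[int] = field(default_factory=list)
    next: Optional["Bucket"] = None  # pointer to the next overflow bucket, if any


def search(buckets, key):
    """Return True if key is in the hash file whose primary buckets are `buckets`."""
    bucket = buckets[key % len(buckets)]  # hash address = bucket address
    while bucket is not None:
        # "Read the bucket into memory" and scan its records sequentially
        if key in bucket.records:
            return True
        bucket = bucket.next  # a null pointer ends the loop: the search fails
    return False


# Address 5 of the 7-bucket example: a full primary bucket plus one overflow bucket
buckets = [Bucket() for _ in range(7)]
buckets[5].records = [5, 26, 12]
buckets[5].next = Bucket([19, 33])
print(search(buckets, 33), search(buckets, 40))  # True False
```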

4. Deleting a record from a hash file
To delete a record, it is enough to mark it as deleted.
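Deletion by marking can be sketched like this. The `Record` and `Bucket` types, the `deleted` flag and the `key % number_of_buckets` hash are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Record:
    key: int
    deleted: bool = False  # the deletion mark


@dataclass
class Bucket:
    records: List[Record] = field(default_factory=list)
    next: Optional["Bucket"] = None  # pointer to the next overflow bucket


def find(buckets, key):
    """Return the live record with this key, skipping records marked deleted."""
    bucket = buckets[key % len(buckets)]
    while bucket is not None:
        for rec in bucket.records:
            if rec.key == key and not rec.deleted:
                return rec
        bucket = bucket.next
    return None


def delete(buckets, key):
    """Logical deletion: set the mark; the slot and the overflow chain stay intact."""
    rec = find(buckets, key)
    if rec is None:
        return False
    rec.deleted = True
    return True


buckets = [Bucket() for _ in range(7)]
buckets[5].records = [Record(k) for k in (5, 26, 12)]
buckets[5].next = Bucket([Record(19), Record(33)])
delete(buckets, 12)
print(find(buckets, 12), find(buckets, 33).key)  # None 33
```

Because the marked record keeps its slot, the overflow chain is never broken and key 33 stays reachable. The cost, in this sketch, is that marked slots are not reused, which is one way a hash file degrades after many insertions and deletions and eventually needs reorganizing.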

5. The advantages of a hash file are:

(1) Records are stored in random order and do not need to be sorted;

(2) Insertion and deletion are easy;

(3) Access is fast;

(4) No index area is needed, which saves storage space.

6. The disadvantages of a hash file are:
(1) Records cannot be accessed sequentially; only random access by key is possible;
(2) Only simple queries are supported;
(3) After many insertions and deletions, the file structure may become inefficient, and the file may need to be reorganized.

9. Multi-key files

1. Multi-table files
A multi-table file combines indexing with linking. A primary index is built on the primary key, and an index is built on each secondary key that needs to be queried. Records with the same secondary-key value are linked into a list, and the list's head pointer, its length, and the secondary-key value together form one entry of that key's index table. The main file of a multi-table file is usually a sequential file.
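A minimal sketch of one secondary-key index of a multi-table file follows. It assumes a small hypothetical employee file whose main file is kept in primary-key (`id`) order and whose list position serves as the record number. In a real multi-table file the link field lives inside each record (one per indexed secondary key); here it is modelled as a separate `next_ptr` list. All names are hypothetical.

```python
def build_multilist(records, field_name):
    """Link records sharing a secondary-key value into lists.

    records: the main file, a list of dicts in primary-key order; the list
    position is the record number. Returns (index, next_ptr) where
    index[value] = (head record number, list length) and next_ptr[i] is the
    record number of the next record with the same value, or None.
    """
    index, tail = {}, {}
    next_ptr = [None] * len(records)
    for recno, rec in enumerate(records):
        value = rec[field_name]
        if value in tail:
            next_ptr[tail[value]] = recno  # append recno to this value's list
            head, length = index[value]
            index[value] = (head, length + 1)
        else:
            index[value] = (recno, 1)  # first record with this value is the head
        tail[value] = recno
    return index, next_ptr


def records_with(index, next_ptr, value):
    """Follow one linked list from its head pointer in the index."""
    recno, _ = index.get(value, (None, 0))
    result = []
    while recno is not None:
        result.append(recno)
        recno = next_ptr[recno]
    return result


employees = [  # hypothetical main file, sorted by primary key "id"
    {"id": 101, "dept": "sales"},
    {"id": 102, "dept": "dev"},
    {"id": 103, "dept": "sales"},
    {"id": 104, "dept": "dev"},
    {"id": 105, "dept": "sales"},
]
dept_index, dept_next = build_multilist(employees, "dept")
print(dept_index)                                    # {'sales': (0, 3), 'dev': (1, 2)}
print(records_with(dept_index, dept_next, "sales"))  # [0, 2, 4]
```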

2. Inverted files
An inverted file is constructed much like a multi-table file. The main difference lies in the secondary-key index: records with the same secondary-key value are not linked by pointers; instead, the physical record numbers of all records having that value are listed in the index. This secondary-key index is called the inverted table (inverted list), and the inverted table together with the main file forms the inverted file. The inverted table for the earlier example file is shown in Figure 9.2.

Figure 9.2 Inverted table
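The same kind of hypothetical employee file can be given inverted tables instead of linked lists. The sketch below assumes list position is the physical record number; `build_inverted_table` and the sample data are hypothetical.

```python
from collections import defaultdict


def build_inverted_table(records, field_name):
    """Map each secondary-key value to the record numbers that carry it."""
    table = defaultdict(list)
    for recno, rec in enumerate(records):  # record numbers come out in ascending order
        table[rec[field_name]].append(recno)
    return dict(table)


employees = [  # hypothetical main file; list position = record number
    {"id": 101, "dept": "sales", "title": "engineer"},
    {"id": 102, "dept": "dev", "title": "engineer"},
    {"id": 103, "dept": "sales", "title": "manager"},
    {"id": 104, "dept": "dev", "title": "engineer"},
]
by_dept = build_inverted_table(employees, "dept")
by_title = build_inverted_table(employees, "title")
print(by_dept)  # {'sales': [0, 2], 'dev': [1, 3]}

# "dept = dev AND title = engineer": intersect record-number lists, no main-file scan
print(sorted(set(by_dept["dev"]) & set(by_title["engineer"])))  # [1, 3]
```

Because each index entry lists record numbers directly, a compound condition can be answered by intersecting lists before any record of the main file is read, whereas a multi-table file would have to follow a pointer chain through the records themselves.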

3. Applications of inverted files
Inverted files are very widely used. For example, a Web or other full-text search engine preprocesses the data it collects and then needs an efficient data structure to serve queries; the inverted file is the most effective such structure and is one of the core components of a search engine.

For details, see: Inverted index-the cornerstone of a search engine
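For full-text search the same idea maps each term to the list of documents that contain it (its posting list). The sketch below is a toy: the regular-expression tokenizer, the function names and the sample documents are assumptions, and production engines additionally store term positions and frequencies and merge sorted posting lists rather than building Python sets.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Very simple tokenizer (assumption): lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the sorted list of document ids containing it."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all(index, query):
    """AND query: ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


docs = [
    "Hash files use buckets",
    "Inverted files power search engines",
    "Search engines use inverted files",
]
index = build_index(docs)
print(index["inverted"])                    # [1, 2]
print(search_all(index, "inverted files"))  # [1, 2]
print(search_all(index, "hash search"))     # []
```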



This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
