Introduction to advanced programming in UNIX
---- Programming related to the file system (I)
I. About Directories
In any operating system, the file system brings two things to mind first: directories and files. In Unix, everything can be treated as a file, and a directory is simply a special kind of file. My previous article, "Getting Started and Basics of UNIX Operating Systems", introduced the user's home directory, the working directory, and absolute and relative paths, and showed that the pwd command prints the absolute path of the current working directory. How can a program do what pwd does? It calls the getcwd() function, which is declared as follows:
#include <unistd.h>
char *getcwd(char *buf, size_t size); // returns buf on success, NULL on failure
The first parameter, buf, is the array that receives the path name of the current working directory as a string. The second parameter, size, is the capacity of buf in bytes. On success the function returns buf itself. Note that the buffer must be large enough to hold the absolute path name plus the terminating "\0" character; otherwise the call fails and returns NULL.
The most common directory operations are opening a directory, reading its entries, and closing it. The corresponding functions are:
#include <sys/types.h>
#include <dirent.h>
DIR *opendir(const char *dirname); // returns a pointer on success, NULL on failure
struct dirent *readdir(DIR *dirp); // returns a pointer on success, NULL at the end of the directory or on failure
int closedir(DIR *dirp); // returns 0 on success, -1 on failure
opendir() returns NULL if the directory does not exist, if the caller has no permission to access it, or if the name refers to an ordinary file. On success it returns a pointer to a DIR structure, which holds information about the directory being read. Each call to readdir() returns a pointer to a dirent structure, defined in <dirent.h>; its most commonly used member is d_name, which holds the entry's file name.
Let's look at the following example:
[Program 1]
#include <iostream>
#include <cstring>      // memset, strerror
#include <unistd.h>
#include <sys/types.h>
#include <dirent.h>
#include <errno.h>
using namespace std;

int main()
{
    DIR *dp;
    cout << "Please enter a dir name: ";
    char name[255];
    memset(name, 0x00, 255);
    cin.width(sizeof(name));    // never read more than the buffer can hold
    cin >> name;
    cout << "-----------------" << endl;
    dp = opendir(name);
    if (dp == NULL)
    {
        // opendir failed: print the error number and its description
        cout << errno << "[" << strerror(errno) << "]" << endl;
        return -1;
    }
    dirent *dirp;
    // readdir returns NULL once every entry has been read
    while ((dirp = readdir(dp)) != NULL)
    {
        cout << dirp->d_name << endl;
    }
    closedir(dp);
    return 0;
}
In Unix, when a system call or library function fails, it records an error number in the global variable errno, and strerror(errno) returns a text description of that error. To use errno you must include the header <errno.h>; strerror is declared in <string.h>. When you compile and run Program 1 and enter the name of an existing directory that you have permission to access, the program lists the names of all subdirectories and files in that directory (including the . and .. entries). If you enter the name of an ordinary file, it prints 20[Not a directory]; if you enter the name of a directory that does not exist, it prints 2[No such file or directory].
II. About Files
Before discussing files, we first introduce the concept of a file descriptor. A file descriptor is a non-negative integer that the kernel uses to identify a file opened by a particular process. Whenever a process opens an existing file or creates a new one, the kernel returns a file descriptor, which the process then uses to read and write the file. The kernel maintains a file descriptor table for each process, and every open file is referenced through an entry in that table.
By convention, file descriptor 0 is standard input (cin in C++), 1 is standard output (cout), and 2 is standard error (cerr). Each entry in the file descriptor table points to a file table entry. The kernel maintains a single file table for all open files, and each file table entry contains the file status flags (read, write, append, synchronous, non-blocking, and so on), the current file offset, and a v-node pointer that points into the v-node table. Each v-node table entry holds the v-node information and the i-node information (including the device the file resides on and where its data is stored on disk), as well as the current file size. In Unix, a file has exactly one entry in the v-node table. That is, if two independent processes open the same file, the file table contains two entries, but both point to the same v-node table entry. Figure 1 shows the relationship among the file descriptor table, the file table, and the v-node table.
If two independent processes open the same file, the relationship among the file descriptor table, the file table, and the v-node table is as shown in Figure 2.
In Unix, most file I/O needs only five functions: open, read, write, lseek, and close. They are introduced one by one below.
1. The open function opens or creates a file. It is declared as follows:
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int open(const char *pathname, int oflag, ...); // returns the file descriptor on success, -1 on failure
When open succeeds, the file descriptor it returns is guaranteed to be the lowest-numbered descriptor not currently in use. The pathname parameter is the name of the file to open or create; it may be a relative or an absolute path, and a relative path is resolved against the current working directory. The oflag parameter combines one or more options that specify how the file is opened. The access modes are O_RDONLY (read-only), O_WRONLY (write-only), and O_RDWR (read and write); these three are mutually exclusive, and exactly one should be specified. The following options can be added with bitwise OR: O_APPEND (every write appends to the end of the file), O_CREAT (create the file if it does not exist; the third parameter must then be supplied to give the access permissions of the new file), and O_EXCL (used together with O_CREAT, makes open fail if the file already exists). The third parameter is written as ... because it is used only when a new file is created, to specify that file's access permissions. For example, 0700 means that only the owner can read, write, and execute the file, and nobody else can access it.
[Program 2]
#include <iostream>
#include <cstring>      // strerror
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
using namespace std;

int main()
{
    // Open an existing file for reading and writing
    int i = open("info.txt", O_RDWR);
    cout << "FD = " << i << endl;
    // Create new.txt; because of O_EXCL the call fails if it already exists
    int ii = open("new.txt", O_WRONLY | O_CREAT | O_EXCL, 0700);
    if (ii < 0) cout << strerror(errno) << endl;
    else cout << "Create File successful" << endl;
    return 0;
}
Assuming that info.txt already exists in the current directory, compiling and running Program 2 produces the following output:
FD = 3
Create File successful
Execute Program 2 again, and the running result is:
FD = 3
File exists
2. The close function closes an open file. It is declared as follows:
#include <unistd.h>
int close(int fd); // returns 0 on success, -1 on failure
Closing a file with close also releases any record locks the process holds on that file. When a process terminates, the kernel automatically closes all of its open files, so many programs rely on this behavior rather than calling close explicitly.
3. The read function reads data from an open file. It is declared as follows:
#include <unistd.h>
ssize_t read(int fd, void *buf, size_t nbytes); // returns the number of bytes actually read (0 at end of file) on success, -1 on failure
Given a file descriptor, read looks up the current file offset in the corresponding file table entry, reads up to nbytes bytes starting at that offset, and stores them in buf.
4. The write function writes data to an open file. It is declared as follows:
#include <unistd.h>
ssize_t write(int fd, const void *buf, size_t nbytes); // returns the number of bytes actually written on success, -1 on failure
The write function writes the data in buf to the file, starting at the current file offset. Next, we modify Program 2; assume that info.txt already contains the single line ABCDE.
[Program 3]
#include <iostream>
#include <cstring>      // memset, strlen, strerror
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
using namespace std;

int main()
{
    int i = open("info.txt", O_RDWR);
    cout << "FD = " << i << endl;
    int ii = open("new.txt", O_WRONLY | O_CREAT | O_EXCL, 0700);
    if (ii < 0) cout << strerror(errno) << endl;
    else cout << "Create File successful" << endl;
    char buf[500];
    memset(buf, 0x00, 500);
    // Leave one byte free so that buf always stays '\0'-terminated
    cout << read(i, buf, sizeof(buf) - 1) << "\t";
    cout << buf << endl;
    memset(buf, 0x00, 500);
    read(0, buf, sizeof(buf) - 1);              // read data from the keyboard
    cout << write(1, buf, strlen(buf)) << endl; // write data to the screen
    return 0;
}
Compiling and running Program 3 produces the following output:
FD = 3
File exists
6 ABCDE
12345 (keyboard input)
12345
6
5. The lseek function explicitly sets the current offset of an open file. It is declared as follows:
#include <sys/types.h>
#include <unistd.h>
off_t lseek(int fd, off_t offset, int whence); // returns the new file offset on success, -1 on failure
The current file offset (the file's displacement from its beginning) has been mentioned several times above, so a brief explanation is in order. The current file offset is a non-negative integer that measures the number of bytes from the beginning of the file to a given position. Every open file has an associated current file offset. When a file is opened, the system sets the offset to 0 unless the O_APPEND option is specified. Read and write operations normally start at the current file offset and advance it by the number of bytes read or written. The second parameter of lseek, offset, is the number of bytes to move, and the third parameter, whence, determines how offset is interpreted. If whence is SEEK_SET, the file offset is set to offset bytes from the beginning of the file. If whence is SEEK_CUR, the file offset is set to its current value plus offset, which may be positive or negative. If whence is SEEK_END, the file offset is set to the file size plus offset, which may be positive or negative.
One of the cases above raises an interesting question: what happens when SEEK_END is used to move the offset past the end of the file? To find out, we make the following changes to Program 3.
[Program 4]
#include <iostream>
#include <cstring>      // memset, strlen, strerror
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
using namespace std;

int main()
{
    int i = open("info.txt", O_RDWR);
    cout << "FD = " << i << endl;
    int ii = open("new.txt", O_WRONLY | O_CREAT | O_EXCL, 0700);
    if (ii < 0) cout << strerror(errno) << endl;
    else cout << "Create File successful" << endl;
    char buf[500];
    memset(buf, 0x00, 500);
    // Leave one byte free so that buf always stays '\0'-terminated
    cout << read(i, buf, sizeof(buf) - 1) << "\t";
    cout << buf << endl;
    memset(buf, 0x00, 500);
    read(0, buf, sizeof(buf) - 1);
    cout << write(1, buf, strlen(buf)) << endl;
    errno = 0;
    // Variants to try: lseek(i, -3, SEEK_SET); or lseek(i, 3, SEEK_END);
    lseek(i, 3, SEEK_SET);
    if (errno != 0) cout << strerror(errno) << endl;
    write(i, buf, strlen(buf) - 1);   // strlen(buf) - 1 drops the trailing newline
    return 0;
}
Compile and run Program 4. The output is as follows (info.txt initially contains ABCDE):
FD = 3
File exists
6 ABCDE
ABCDE (keyboard input)
ABCDE
6
Running cat info.txt shows that the file content is now ABCABCDE.
If you replace the lseek statement in the program with "lseek(i, -3, SEEK_SET);", recompile, run the program, and enter 12345 on the keyboard, the program prints Invalid argument. Looking at info.txt, you will find that the content has become ABCABCDE12345. A negative offset cannot be used with SEEK_SET, because the resulting file offset would be negative. Since the failed lseek leaves the file offset unchanged, and the earlier read had already advanced it to the end of the file, the data is written at the end of the file.
Now change the lseek statement to lseek(i, 3, SEEK_END), recompile, and run the program. Enter xyz on the keyboard; cat shows the file content as ABCABCDE12345xyz, yet the file size has grown from 13 to 19 bytes. The command od -c info.txt reveals three \0 bytes before xyz. This shows that the file offset can be moved past the end of the file; if data is then written, the gap in between reads back as \0 bytes. Such a gap is called a hole in the file.