Linux Files, File Descriptors, and dup() and dup2()

Last Update:2018-12-04 Source: Internet

Author: User


I. Files in Linux

There are four main types of files: regular files, directory files, link files, and device files.

1. Regular files

These are the most commonly used files, including text files, shell scripts, binary executables, and data files of all kinds.

Use ls -lh to view a file's properties. A regular file shows a mode string such as -rw-r--r--. Note that the first character is -, which marks a regular file in Linux. Such files are usually created by related applications, such as document tools and archiving tools, or by copying with cp. To delete them, use the rm command.

2. Directory files

In Linux, a directory is also a file. It contains the names of the files and subdirectories it holds, along with pointers (inode numbers) to those files and subdirectories.

When you run ls -l in a directory, an entry whose mode string looks like drwxr-xr-x is a directory, which is a special kind of file in Linux. Note that its first character is d. Create a directory with mkdir, or copy an existing directory to another location with cp -r. Delete a directory with rmdir (which only removes empty directories) or with rm -r.

3. Link files

A symbolic link file is similar to a "shortcut" in Windows. In ls -l output, its first character is l.

It is created with ln -s source_file new_name.

4. Device Files

There are two types: block device files and character device files.

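Block device files read and write data in fixed-size blocks; hard drives are a typical example.

Character device files transfer data as a stream of bytes; serial ports and terminals are typical examples.

In ls -l output, block devices show b as the first character and character devices show c.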

II. File descriptors

1. File descriptors and their role

The kernel uses file descriptors to access files. A file descriptor is a non-negative integer: an index into the table of open files that the kernel keeps for each process. When a process opens an existing file or creates a new one, the kernel returns a file descriptor to it. To read or write that file, the process passes the descriptor as an argument to the corresponding function. In Linux, all operations on files and devices are performed through file descriptors.

Normally, a process starts with three files already open: standard input, standard output, and standard error. They correspond to file descriptors 0, 1, and 2 respectively, for which <unistd.h> defines the macros STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO. Using these macros instead of the bare numbers is encouraged.

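To check the default per-process limit on open file descriptors, use ulimit -n. The default is commonly 1024, which is sufficient in most cases:

# ulimit -n

To find a process's ID:

# ps aux

To see the file descriptors a process has open, look in:

/proc/[pid]/fd
where [pid] is the PID of the process, for example:

# cd /proc/1473/fd

To view the system-wide file handle settings:

# sysctl -a | grep fs.file

In the output, fs.file-max is the system-wide maximum number of file handles, and fs.file-nr shows the number of allocated handles, the number of allocated but unused handles, and the maximum.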

Reference: Baidu Baike, http://baike.baidu.com/view/1303430.htm.

III. dup and dup2

dup and dup2 are two very useful system calls for duplicating a file descriptor.
They are often used to redirect a process's stdin, stdout, and stderr.
Their prototypes are as follows:
#include <unistd.h>
int dup(int oldfd);
int dup2(int oldfd, int newfd);
With dup we can copy a descriptor: pass it an existing descriptor and it returns a new descriptor that is a copy of the one passed in. The two descriptors share the same underlying data structure (the file table entry). For example, if we call lseek on one descriptor, the file offset seen through the other descriptor moves as well, because there is only one offset.
The following snippet shows how to use dup:
int fd1, fd2;
...
fd2 = dup(fd1);

Note that a descriptor opened before calling fork behaves as if it had been copied with dup: the child process receives its own copy of the descriptor, and that copy shares the same file table entry (and therefore the same file offset) as the parent's.
dup2 is similar to dup, but it lets the caller specify both a valid source descriptor and the target descriptor. When dup2 returns successfully, the target descriptor (the second argument) has become a copy of the source descriptor (the first argument). In other words, both file descriptors now refer to the same file, namely the one referred to by the first argument. The following code illustrates this:
int oldfd;
oldfd = open("app_log", (O_RDWR | O_CREAT), 0644);  /* open() requires <fcntl.h> */
dup2(oldfd, 1);                                     /* fd 1 (stdout) now refers to app_log */
close(oldfd);                                       /* fd 1 keeps the file open */
In this example, we open a new file called "app_log" and receive a file descriptor, oldfd. We then call dup2 with oldfd and 1, which makes file descriptor 1 (stdout, since the standard output descriptor is 1) refer to the newly opened file.
From now on, everything written to stdout goes into the file "app_log".
Note that after dup2 has copied oldfd, we close oldfd immediately ourselves; dup2 does not close it. Closing oldfd does not close the newly opened file, because file descriptor 1 still refers to it.
Next, let's look at a more in-depth example. Consider the shell pipeline ls -1 | wc -l, which connects the standard output of the ls -1 command to the standard input of the wc -l command. The C program below implements the same pipeline.

The program first creates a pipe with pipe(pfds), where pfds[0] is the read end and pfds[1] is the write end, and then forks into two processes: a child and a parent. The child first closes its stdout descriptor (1), then uses dup2 to make descriptor 1 refer to the write end of the pipe (pfds[1]), and closes the read end (pfds[0]), which it does not need. It then calls execlp to replace its process image with that of the ls -1 command. Because ls writes to descriptor 1, everything it outputs goes into the pipe instead of the terminal.

Now look at the receiving side of the pipe, which is handled by the parent. The parent first closes its stdin descriptor (0), because it will read not from a standard device such as the keyboard but from the output of another program. It then calls dup2 again to make descriptor 0 (the conventional stdin) refer to the read end of the pipe (pfds[0]), and closes the write end (pfds[1]), which it does not use. Closing the write end also matters because wc only sees end-of-file once every write end of the pipe has been closed. Finally, it calls execlp to replace its process image with that of the wc -l command, which counts the lines it reads from the pipe.
Sample code: using C to implement the command line ls -1 | wc -l
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    int pfds[2];

    if (pipe(pfds) == 0) {                  // create a pipe: pfds[0] is the read end, pfds[1] the write end

        if (fork() == 0) {                  // child process

            close(1);                       // close the stdout descriptor (dup2 would also close it)
            dup2(pfds[1], 1);               // redirect stdout to the write end of the pipe (pfds[1])
            close(pfds[0]);                 // the child never reads from the pipe: close the read end
            close(pfds[1]);                 // fd 1 now refers to the write end, so drop the extra copy
            execlp("ls", "ls", "-1", NULL); // replace the child's process image with ls -1
            perror("execlp ls");            // only reached if execlp fails
            exit(EXIT_FAILURE);

        } else {                            // parent process

            close(0);                       // close the stdin descriptor
            dup2(pfds[0], 0);               // make stdin the read end of the pipe (pfds[0])
            close(pfds[1]);                 // close the write end, or wc would never see end-of-file
            close(pfds[0]);                 // fd 0 now refers to the read end, so drop the extra copy
            execlp("wc", "wc", "-l", NULL); // replace the parent's process image with wc -l
            perror("execlp wc");            // only reached if execlp fails
            exit(EXIT_FAILURE);
        }
    }

    perror("pipe");                         // only reached if pipe() failed
    return EXIT_FAILURE;
}

In this program, note in particular that the child process redirects its standard output to the write end of the pipe, and the parent process redirects its standard input to the read end of the pipe.
This is a very useful technique in real-world application development.
1. Kernel data structures behind file descriptors
Before explaining dup/dup2, I think it is worth first understanding how file descriptors are represented in the kernel.
A running process has some files open, and each open file is identified by a file descriptor.
By default, three file descriptors exist (0, 1, 2): 0 is associated with the process's standard input, 1 with its standard output, and 2 with its standard error.
You can list a process's descriptors in the /proc/<PID>/fd directory. The following figure shows the structure:
Process table entry
            fd flags     file pointer
          +------------+--------------+
   fd 0:  |            |      --------+------> file table
   fd 1:  |            |              |
   fd 2:  |            |              |
   fd 3:  |            |              |
          |    ...     |      ...     |
          +------------+--------------+
Figure 1
The file table entry contains the file status flags, the current file offset, and a v-node pointer; these are not discussed further in this article.
The key point is that each open file descriptor slot in the process table entry holds its descriptor flags (fd flags) and its own pointer to a file table entry.
2. The dup/dup2 functions
APUE and the man pages sum up both functions in one short sentence: they duplicate an existing file descriptor.
#include <unistd.h>
int dup(int oldfd);
int dup2(int oldfd, int newfd);
Let's analyze this using Figure 1. When dup is called, the kernel allocates a new file descriptor in the process, namely the lowest-numbered descriptor not currently in use. This new descriptor points to the file table entry owned by oldfd.
Process table entry
            fd flags     file pointer
          +------------+--------------+
   fd 0:  |            |              |
   fd 1:  |            |      --------+----+
   fd 2:  |            |              |    +--> file table
   fd 3:  |            |      --------+----+
          |    ...     |      ...     |
          +------------+--------------+
Figure 2
In Figure 2, oldfd is 1 and the lowest available descriptor is 3, so the new descriptor 3 points to the file table entry owned by descriptor 1.
dup2 differs from dup in that the newfd parameter specifies the value of the new descriptor. If newfd is already open, it is closed first. If newfd equals oldfd, dup2 returns newfd without closing it. The new file descriptor returned by dup2 shares the same file table entry as oldfd.
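APUE illustrates the same point another way. Calling
dup(oldfd);
is equivalent to
fcntl(oldfd, F_DUPFD, 0);
and calling
dup2(oldfd, newfd);
is equivalent to
close(newfd);
fcntl(oldfd, F_DUPFD, newfd);
except that dup2 is atomic, whereas the close-then-fcntl form is two separate calls, so a signal handler or another thread could change the descriptors in between. (APUE also notes some errno differences between the two forms.)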
3. dup2 in CGI
Anyone who has written CGI programs knows that when a browser submits form data with the POST method, the CGI program reads the data from standard input (stdin) and writes its response to standard output (stdout); in C this is usually done with printf. By the usual rules, printf output should appear on the terminal. It reaches the browser instead because the web server, before running the CGI program, uses dup2 to redirect the file descriptor STDOUT_FILENO (a macro defined in unistd.h whose value is 1) to the connection socket:
dup2(connfd, STDOUT_FILENO); /* in practice pipes are also involved, but they are not the focus of this article */
As stated in section 1, a process's file descriptor 1 (STDOUT_FILENO) is associated by default with the standard output stream stdout. The kernel refers to every open file by its file descriptor and knows nothing about streams such as stdin and stdout, so the data printf writes to stdout is ultimately written to file descriptor 1. The association of file descriptors 0, 1, and 2 with standard input, standard output, and standard error is a convention followed by the shell and many applications; it has nothing to do with the kernel.
The following chain illustrates this (strictly speaking it is not a flow chart, but it helps understanding):
printf -> stdout -> STDOUT_FILENO (1) -> terminal (TTY)
The output of printf ultimately goes to the terminal device. File descriptor 1 refers to the current terminal, which you can think of as if it had been set up like this (conceptually only, since STDOUT_FILENO is a constant macro and cannot be assigned):
STDOUT_FILENO = open("/dev/tty", O_RDWR);
After the dup2 call, STDOUT_FILENO no longer refers to the terminal device but to connfd, so printf output is written to connfd. Elegant, isn't it? :)
4. Restoring STDOUT_FILENO in a child process forked by a CGI program
If you have read this far, thank you for your patience. I know many readers may find this a little complicated, but a complex problem is just a collection of small problems, so it is fine to work through them one at a time. In section 3, STDOUT_FILENO was redirected to the connfd socket. Sometimes we may want to fork a child process from the CGI program to run some scripts, and those scripts inevitably perform some input and output. Since a child process inherits all of its parent's file descriptors after fork, the scripts' output does not go to the terminal as expected; it goes to connfd instead, which clearly corrupts the web page output. So how do we restore the association between STDOUT_FILENO and the terminal?
Method 1: save the original file descriptor before calling dup2, and restore it afterwards.
The code looks like this:
fflush(stdout);                 /* flush stdio buffers before switching fd 1 */
savefd = dup(STDOUT_FILENO);    /* savefd refers to the terminal */
dup2(connfd, STDOUT_FILENO);    /* STDOUT_FILENO (1) is redirected to connfd */
...                             /* do some work */
fflush(stdout);
dup2(savefd, STDOUT_FILENO);    /* STDOUT_FILENO (1) is restored from savefd */
close(savefd);                  /* the saved copy is no longer needed */
Unfortunately, a CGI program cannot use this method, because the dup2 call is made not in the CGI program but in the web server, and modifying the web server is not a good idea.
Method 2: go back to the source and reopen the current terminal to restore STDOUT_FILENO.
Looking at the chain in section 3, how did STDOUT_FILENO become associated with the terminal? We simply do the same thing again:
int ttyfd = open("/dev/tty", O_RDWR);   /* the process's controlling terminal */
if (ttyfd != -1) {
    dup2(ttyfd, STDOUT_FILENO);         /* fd 1 refers to the terminal again */
    close(ttyfd);                       /* fd 1 keeps the terminal open */
}
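/dev/tty refers to the controlling terminal of the process in which the program runs, so opening it should get us back to the terminal. In practice this method works, but it still feels a bit off to me. I'm not sure why; perhaps some potential problems just haven't shown up yet. (One known limitation: a process with no controlling terminal, such as a web server running as a daemon, cannot open /dev/tty at all.)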

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
