Introduction: Pipes are the oldest IPC method on Unix systems, and they provide an elegant solution to a common problem: given two processes running different programs, how can the output of one process be used as the input of the other? Pipes can be used to pass data between related processes (processes with a common ancestor, which created the pipe). A FIFO is a variant of the pipe concept; the important difference between them is that a FIFO can be used for communication between arbitrary, unrelated processes.
Overview
Every shell user is familiar with using pipes in commands, for example, to count the number of files in a directory:

```
$ ls | wc -l
```

To execute this command, the shell creates two processes, running ls and wc respectively (using fork() and exec()).
Characteristics of pipes
- A pipe is a byte stream (no message boundaries; data is ordered)

There is no notion of a message or message boundary when using a pipe: a process reading from a pipe can read blocks of any size, regardless of the size of the blocks the writing process wrote. The data passes through the pipe sequentially; bytes are read from the pipe in exactly the order in which they were written. It is not possible to use lseek() to randomly access the data in a pipe.
If you need to transfer discrete messages through a pipe, you must implement message framing in your application; for such requirements it is usually better to use another IPC mechanism, such as message queues or datagram sockets.
- Reading from a pipe (reading an empty pipe blocks; read() returning 0 means end-of-file)

An attempt to read from a pipe that is currently empty blocks until at least one byte has been written to the pipe. If the write end of the pipe is closed, a process reading from the pipe sees end-of-file after it has read all remaining data in the pipe (that is, read() returns 0).
- Pipes are unidirectional

Data travels through a pipe in one direction only: one end of the pipe is used for writing, the other for reading.
- Writes of up to PIPE_BUF bytes are guaranteed to be atomic

If multiple processes write to the same pipe, and each process writes no more than PIPE_BUF bytes at a time, then the kernel guarantees that the data from different writers will not be interleaved. SUSv3 requires PIPE_BUF to be at least _POSIX_PIPE_BUF (512); its value differs among UNIX implementations. On Linux, PIPE_BUF is 4096.
- The capacity of a pipe is limited

A pipe is simply a buffer maintained in the kernel, and its storage capacity is limited. Once a pipe is full, further writes to it block until the reader removes some data from the pipe.

SUSv3 does not specify the storage capacity of a pipe. Since Linux 2.6.11 the capacity is 65536 bytes (64 KB); other UNIX implementations may differ. In general, an application does not need to know the actual capacity of a pipe; if it is necessary to prevent the writing process from blocking, the reading process should be designed to consume data from the pipe as quickly as possible.

The reason the kernel uses a comparatively large buffer for pipes is efficiency: each time a writer fills the pipe, the kernel must perform a context switch so that the reader can be scheduled to drain some of the data. A larger buffer means fewer context switches.
Since Linux 2.6.35, the capacity of a pipe can be changed: the Linux-specific call fcntl(fd, F_SETPIPE_SZ, size) sets the capacity of the pipe referred to by fd to at least size bytes. An unprivileged process can set the capacity to any value between the system page size and the value in /proc/sys/fs/pipe-max-size, whose default is 1048576 bytes (1 MB). The call fcntl(fd, F_GETPIPE_SZ) returns the capacity actually allocated for the pipe.
Using pipes
```c
#include <unistd.h>

int pipe(int filedes[2]);
```
A successful pipe() call returns two open file descriptors in the array filedes: one for the read end of the pipe (filedes[0]) and one for the write end (filedes[1]). As with any file descriptor, the read() and write() system calls can be used to perform I/O on the pipe. A read() from a pipe returns the lesser of the number of bytes requested and the number of bytes currently in the pipe, and blocks if the pipe is empty.
The call ioctl(fd, FIONREAD, &cnt) returns the number of unread bytes in the pipe or FIFO referred to by the file descriptor fd. Some other implementations also provide this feature, but SUSv3 does not require it.
A pipe is typically used to connect two processes. To connect two processes through a pipe, call pipe() and then fork(). During fork(), the child inherits the parent's file descriptors. Although both the parent and the child could read from and write to the pipe, this is uncommon; normally, immediately after the fork(), one process closes its descriptor for the write end of the pipe and the other closes its descriptor for the read end.
```c
int filedes[2];

if (pipe(filedes) == -1)
    errExit("pipe");

switch (fork()) {
case -1:
    errExit("fork");

case 0:     /* Child */
    if (close(filedes[1]) == -1)        /* Close unused write end */
        errExit("close");
    /* Child now reads from pipe */
    break;

default:    /* Parent */
    if (close(filedes[0]) == -1)        /* Close unused read end */
        errExit("close");
    /* Parent now writes to pipe */
    break;
}
```
If bidirectional communication is needed, a simple approach is to create two pipes, one for each direction of data flow between the two processes. (When using this technique, beware of deadlock: if both processes simultaneously try to read from empty pipes, or to write to full pipes, a deadlock occurs.)
- Starting with kernel 2.6.27, Linux supports a new, nonstandard system call, pipe2(), which performs the same task as pipe() but supports an additional flags argument.
- Pipes can only be used for communication between related processes; the exception is passing a pipe file descriptor to an unrelated process over a UNIX domain socket.
Why close the unused file descriptors of a pipe?
The process reading from the pipe closes its descriptor for the write end so that, when the other process finishes its output and closes its write descriptor, the reader sees end-of-file. If the reading process did not close the write end, it would never see end-of-file after the other process closed its write descriptor, even once it had read all the data in the pipe: read() would block waiting for data, because the kernel knows that at least one write descriptor for the pipe is still open (the one held by the reading process itself).
The writing process closes its descriptor for the read end for a different reason. When a process tries to write to a pipe for which no process holds an open read descriptor, the kernel sends the SIGPIPE signal to the writer. By default this signal kills the process, but the process can catch or ignore it, in which case the write() fails with the error EPIPE. Receiving SIGPIPE or getting EPIPE is a useful indication of the state of the pipe. If the writer did not close the read end of the pipe, then even after the other process closed its read descriptor, the writer could still write to the pipe; eventually it would fill the pipe, and further writes would block forever.
A final reason for closing unused file descriptors is that a pipe is destroyed, and its resources released for reuse by other processes, only after all file descriptors referring to it have been closed; at that point, any unread data in the pipe is lost.
Example: Use pipeline communication between parent and child processes.
https://github.com/gerryyang/TLPI/blob/master/src/pipes/simple_pipe.c
FIFO (Named pipes)
A FIFO is similar to a pipe; the biggest difference is that a FIFO has a name in the file system and is opened in the same way as an ordinary file. This allows a FIFO to be used for communication between unrelated processes.
```
# A FIFO can be created in the shell with the mkfifo command
$ mkfifo [-m mode] pathname
```
The mkfifo() function creates a new FIFO with the name pathname. Most UNIX implementations provide mkfifo() as a library function layered on top of mknod(). Once a FIFO has been created, any process can open it, subject to the usual file permission checks.
```c
#include <sys/stat.h>

int mkfifo(const char *pathname, mode_t mode);
```
On most UNIX implementations (including Linux), the blocking behavior when opening a FIFO can be circumvented by specifying the O_RDWR flag: open() returns immediately, and the returned file descriptor can be used both to read from and to write to the FIFO. However, this subverts the FIFO I/O model, and SUSv3 explicitly states that the result of opening a FIFO with O_RDWR is unspecified; for portability, developers should not use this technique. For requirements that need to avoid blocking when opening a FIFO, open()'s O_NONBLOCK flag provides a standardized way to accomplish the task.
Using FIFOs to implement a client/server application
All clients send their requests to the server over a single server FIFO. The header file defines the well-known name (/tmp/seqnum_sv) that the server's FIFO will use; because this name is fixed, every client knows how to contact the server. (Creating files in a publicly writable directory such as /tmp raises various security concerns, so real applications should not use this directory.)
It is not possible to send responses to all clients through a single FIFO, because multiple clients would compete when reading from it, and one client might read a response intended for another. Instead, each client creates its own unique FIFO, which the server uses to deliver that client's response; the server then needs a way to find each client's FIFO.

One way to solve this is to have the client generate its own FIFO pathname and pass that pathname to the server as part of its request message. Alternatively, the client and server can agree on a convention for constructing a client's FIFO pathname, and the client sends the information needed to construct its pathname (for example, its process ID) as part of the request.
Remember that the data in pipes and FIFOs is a byte stream; there are no boundaries between messages. This means that when multiple messages are delivered to a process, the sender and receiver must agree on a convention for separating messages. Several approaches are possible:

- End each message with a delimiter character, such as a newline.
Characteristic: the reading process must scan the data from the FIFO byte by byte until it finds the delimiter.

- Include a fixed-size header in each message containing a length field that specifies the size of the remainder of the message. The reading process first reads the header from the FIFO, then uses the header's length field to determine how many bytes of the rest of the message to read.
Characteristic: this method can efficiently handle messages of arbitrary size.

- Use fixed-length messages, and have the server always read messages of this fixed size.
Characteristic: this is simple, but it imposes an upper limit on message size, and some channel capacity is wasted (shorter messages must be padded to the fixed length). Furthermore, if one client accidentally or deliberately sends a message of the wrong length, all subsequent messages will be out of step, and the server cannot easily recover.
Note that whichever of these three techniques is used, the total length of each message must be no more than PIPE_BUF bytes, to prevent the kernel from splitting a message and thereby interleaving it with messages from other writers.
Todo
Non-blocking I/O
When a process opens one end of a FIFO, it blocks if the other end of the FIFO has not yet been opened. Sometimes blocking is not the desired behavior; it can be avoided by specifying the O_NONBLOCK flag in the open() call.
```c
fd = open("filepath", O_RDONLY | O_NONBLOCK);
if (fd == -1)
    errExit("open");
```
If the other end of the FIFO is already open, O_NONBLOCK has no effect on the open() call. The flag matters only when the other end of the FIFO has not yet been opened, and its effect depends on whether the FIFO is being opened for reading or for writing:

- If opening for reading, open() succeeds immediately, regardless of whether the write end of the FIFO is currently open.
- If opening for writing, and the read end of the FIFO has not yet been opened, open() fails, setting errno to ENXIO.
Opening a FIFO with the O_NONBLOCK flag serves two purposes:

1. It allows a single process to open both ends of a FIFO.
2. It prevents deadlock between two processes that are each trying to open two FIFOs.
Semantics of open() for FIFOs

Semantics of read() and write() on pipes and FIFOs

Semantics of reading n bytes from a pipe or FIFO containing p bytes

Semantics of writing n bytes to a pipe or FIFO
Linux IPC Pipeline and FIFO