1. Background Knowledge
1.1. Linux Background Processes
UNIX is a multitasking system that allows multiple users to run several programs at the same time. The shell metacharacter & lets you run programs that do not need keyboard input in the background. When a command is followed by the & character, it is sent to the background and the terminal can immediately accept the next command.
For example:
sh a.sh &
sh b.sh &
sh c.sh &
The shell sends these three commands to the background one after another without waiting for any of them, so the three scripts run concurrently.
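The sketch below shows the effect. It replaces the hypothetical a.sh, b.sh and c.sh with `sleep 1`. `wait` is the bash builtin that blocks until all background children of the current shell have exited. If the three jobs ran one after another, the total would be about 3 seconds. Run in the background, they take about 1 second.

```shell
#!/bin/bash
# sleep 1 stands in for the hypothetical a.sh, b.sh and c.sh.
start=$(date +%s)

sleep 1 &    # job 1 goes to the background; the shell moves on at once
sleep 1 &    # job 2
sleep 1 &    # job 3

wait         # block until every background job has exited
end=$(date +%s)
elapsed=$((end - start))
echo "three 1-second jobs took about ${elapsed}s in total"
```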
1.2. Linux File Descriptors
A file descriptor (abbreviated FD) is formally a non-negative integer. In practice, it is an index into the table of open files that the kernel maintains for each process. When a program opens an existing file or creates a new one, the kernel returns a file descriptor to the process. Every UNIX process has three standard file descriptors, corresponding to three different streams:
File descriptor | Name
0 | Standard input
1 | Standard output
2 | Standard error
Each file descriptor refers to an open file. Different file descriptors can refer to the same open file. The same file can also be opened by different processes, or several times by one process.
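The last case, one process opening the same file more than once, is easy to try. In this sketch, data written through one descriptor is read back through another. The scratch file and the descriptor numbers 3 and 4 are arbitrary choices for the demo, and the /proc listing only works on Linux.

```shell
#!/bin/bash
tmp=$(mktemp)              # scratch file created only for this demo

exec 3>"$tmp"              # fd 3: the file, opened for writing
exec 4<"$tmp"              # fd 4: the same file, opened again for reading

echo "line one" >&3        # write through fd 3
read -r line <&4           # read the same data back through fd 4
echo "read back: $line"

ls -l /proc/$$/fd          # fds 3 and 4 both point at $tmp (Linux only)

exec 3>&- 4<&-             # close both descriptors
rm -f "$tmp"
```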
The directory /proc/PID/fd lists the file descriptors owned by the process with that PID. For example, take the script learn_redirect.sh:
#!/bin/bash
source /etc/profile;

# $$ is the PID of the current process
PID=$$

# Show where the current process's file descriptors point
ls -l /proc/$PID/fd
echo "-------------------"; echo

# Bind file descriptor 1 to the file tempfd1 (create the file if it is missing)
( [ -e ./tempfd1 ] || touch ./tempfd1 ) && exec 1<>./tempfd1

# Show where the current process's file descriptors point
ls -l /proc/$PID/fd
echo "-------------------"; echo;
[ouyangyewei@localhost learn_linux]$ sh learn_redirect.sh
total 0
lrwx------. 1 ouyangyewei ouyangyewei ... Jan  4 ... 0 -> /dev/pts/0
lrwx------. 1 ouyangyewei ouyangyewei ... Jan  4 ... 1 -> /dev/pts/0
lrwx------. 1 ouyangyewei ouyangyewei ... Jan  4 ... 2 -> /dev/pts/0
lr-x------. 1 ouyangyewei ouyangyewei ... Jan  4 ... 255 -> /home/ouyangyewei/workspace/learn_linux/learn_redirect.sh
-------------------

[ouyangyewei@localhost learn_linux]$ cat tempfd1
total 0
lrwx------. 1 ouyangyewei ouyangyewei ... Jan  4 ... 0 -> /dev/pts/0
lrwx------. 1 ouyangyewei ouyangyewei ... Jan  4 ... 1 -> /home/ouyangyewei/workspace/learn_linux/tempfd1
lrwx------. 1 ouyangyewei ouyangyewei ... Jan  4 ... 2 -> /dev/pts/0
lr-x------. 1 ouyangyewei ouyangyewei ... Jan  4 ... 255 -> /home/ouyangyewei/workspace/learn_linux/learn_redirect.sh
-------------------
In the example above, line 12 binds file descriptor 1 to the file tempfd1. From then on, file descriptor 1 points to tempfd1, so standard output is redirected into that file. This is why the second listing does not appear on the terminal and only shows up with cat tempfd1.
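After line 12 runs, the script can no longer print to the terminal. A common variation, which is not part of the original script, first saves the original standard output on a spare descriptor so it can be restored later. In this sketch, fd 5 is an arbitrary choice, and the plain `>` opens the file write-only instead of the read-write `<>` used above.

```shell
#!/bin/bash
out=$(mktemp)             # scratch file standing in for tempfd1

exec 5>&1                 # keep a copy of the current stdout on fd 5
exec 1>"$out"             # fd 1 now points at the file
echo "this line goes to the file"

exec 1>&5 5>&-            # restore stdout from fd 5, then close fd 5
echo "this line goes to the terminal again"

saved=$(cat "$out")
echo "file contains: $saved"
rm -f "$out"
```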
1.3. Linux Pipelines
In Unix and Unix-like operating systems, a pipeline is a chain of processes connected through their standard streams, so that the output of each process is fed directly into the input of the next.
Linux has two kinds of pipes:
- Anonymous pipes
- Named pipes
Pipes have one important property: reading from a pipe blocks while the pipe holds no data, and the read completes only after data has been written into it. A writer likewise needs a reader. Opening a named pipe for writing blocks until some process opens it for reading, and a write blocks once the pipe buffer is full and nobody reads from it.
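The blocking read is easy to observe. In this sketch, the writer sleeps for 2 seconds before producing its line. The reader's `read` waits on the empty pipe the whole time and then returns the line as soon as it arrives.

```shell
#!/bin/bash
start=$(date +%s)

# The writer sleeps 2 seconds before writing; read blocks on the empty pipe until then.
result=$( { sleep 2; echo "data"; } | { read -r x; echo "$x"; } )

end=$(date +%s)
waited=$((end - start))
echo "reader got '$result' after about ${waited}s"
```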
1.3.1. Anonymous Pipelines
On the command line of Unix and Unix-like systems, an anonymous pipe is written as the ASCII vertical bar |. An anonymous pipe ends up as two ordinary, unnamed, open file descriptors: a read-only end and a write-only end. Processes connect to the pipe through these two ends.
For example:
cat file | less
To run this command, the shell creates two processes, one running cat and one running less.
(Figure: how the two processes use the pipe.)
Note how the two processes attach to the pipe. The writing process, cat, connects its standard output (file descriptor 1) to the write end of the pipe. The reading process, less, connects its standard input (file descriptor 0) to the read end. Neither process actually knows the pipe exists; each one simply reads from and writes to its standard file descriptors. The shell does all the work of setting up the connection.
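On Linux you can watch the shell do this wiring. /proc/self/fd/N is a symlink describing what descriptor N of the calling process refers to, and a pipe end shows up as pipe:[inode]. In this sketch, readlink first stands in for cat (the writer) and then for less (the reader).

```shell
#!/bin/bash
# Writer side: readlink's fd 1 is the write end of the pipe into cat.
writer_side=$(readlink /proc/self/fd/1 | cat)

# Reader side: readlink's fd 0 is the read end of the pipe from true.
reader_side=$(true | readlink /proc/self/fd/0)

echo "writer's stdout -> $writer_side"   # something like pipe:[123456]
echo "reader's stdin  -> $reader_side"   # something like pipe:[123457]
```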
1.3.2. Named Pipes (FIFO, First In First Out)
Named pipes are also called FIFOs. Semantically, a FIFO is similar to an anonymous pipe, with the following differences:
- In the file system, a FIFO has a name and appears as a special file (ls -l shows its type as p);
- Any process that can open that name can share data through the FIFO;
- Data cannot flow through a FIFO unless it has both a reading and a writing process; until then the operation blocks;
- Anonymous pipes are created automatically by the shell and exist only in the kernel, whereas a FIFO is created by a program (such as the mkfifo command) and exists in the file system;
- Like an anonymous pipe, a FIFO carries a one-way byte stream; the difference is that unrelated processes can reach it by name.
For example, a FIFO can be used to build a single-server, multi-client application.
(Figure: a single server and multiple clients communicating through a FIFO.)
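A minimal sketch of the FIFO properties listed above. The FIFO path is a hypothetical name inside a temporary directory. The background writer blocks when it opens the FIFO, and the reader, which reaches the FIFO only by its name, releases it.

```shell
#!/bin/bash
dir=$(mktemp -d)
fifo="$dir/demo.fifo"        # hypothetical FIFO name for this demo
mkfifo "$fifo"

ls -l "$fifo"                # the mode string starts with 'p': a named pipe

# Writer: opening the FIFO for writing blocks until a reader opens it.
echo "hello through the FIFO" > "$fifo" &

# Reader: a separate command that finds the same pipe by its path.
read -r msg < "$fifo"
wait                         # reap the background writer

echo "received: $msg"
rm -rf "$dir"
```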
With this background in place, we can now look at how to control the number of processes that run concurrently when Linux runs multiple processes in parallel.
2. Multi-process Concurrency Control in Linux
Recently, Xiao A needed to produce the KPI data report for the whole of 2015. Xiao A has already written the production script, but it can only produce the KPI data for one specified day. Assuming one run of the script takes 5 minutes:
* Running the days one after another in a loop takes 5 * 365 = 1825 minutes, about 30 hours (roughly 1.3 days).
* Putting all 365 tasks into the Linux background at once to run concurrently is more than the system can handle.
Since we cannot put all 365 tasks into the background at once, can we automatically keep N tasks running concurrently in the background at a time? Of course we can.
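As a rough, idealized estimate, assume each run takes exactly 5 minutes no matter how many run at once. Real concurrent jobs compete for resources, so actual times will be longer. Running N tasks at a time then needs about ⌈365/N⌉ rounds:

```latex
T(N) \approx \left\lceil \frac{365}{N} \right\rceil \times 5\ \text{min},
\qquad
T(1) = 1825\ \text{min} \approx 30.4\ \text{h},
\qquad
T(8) = 46 \times 5 = 230\ \text{min} \approx 3.8\ \text{h}
```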
#!/bin/bash
source /etc/profile;

# -----------------------------

tempfifo=$$.fifo        # $$ is the PID of the currently running script
begin_date=$1           # start date
end_date=$2             # end date (the loop stops before processing it)

# Check the arguments
if [ $# -eq 2 ]
then
    if [ "$begin_date" \> "$end_date" ]
    then
        echo "Error! $begin_date is greater than $end_date"
        exit 1;
    fi
else
    echo "Error! Not enough params."
    echo "Sample: sh loop_kpi_report.sh 2015-12-01 2015-12-07"
    exit 2;
fi

# -----------------------------

trap "exec 1000>&-;exec 1000<&-;exit 0" 2
mkfifo $tempfifo
exec 1000<>$tempfifo
rm -rf $tempfifo

for ((i=1; i<=8; i++))
do
    echo >&1000
done
while [ "$begin_date" != "$end_date" ]
do
    read -u1000
    {
        echo $begin_date
        hive -f kpi_report.sql --hivevar date=$begin_date
        echo >&1000
    } &

    begin_date=$(date -d "+1 day $begin_date" +"%Y-%m-%d")
done

wait
echo "done!!!!!!!!!!"
- Lines 6 to 22 handle the script arguments. For example, for the call:
  sh loop_kpi_report.sh 2015-01-01 2015-12-01
  $1 is the first argument passed to the script, here 2015-01-01
  $2 is the second argument, here 2015-12-01
  $# is the number of arguments, here 2
- Line 13 compares the two dates as strings. Inside [ ], the > must be escaped as \>; otherwise the shell treats it as an output redirection.
- Line 26: if the script receives an interrupt while running (Ctrl+C, which sends signal 2, SIGINT), it closes file descriptor 1000 for reading and writing and exits normally:
  exec 1000>&- closes file descriptor 1000 as an output descriptor
  exec 1000<&- closes it as an input descriptor (since fd 1000 was opened read-write on line 28, the first command already closes it and the second is harmless)
- trap catches the signal and runs the given commands
- Lines 27 to 29:
  - Line 27 creates the pipe (FIFO) file.
  - Line 28 binds file descriptor 1000 to the FIFO. < binds it for reading, > for writing, and <> for both. From then on, every operation on file descriptor 1000 is an operation on the pipe file $tempfifo.
  - Line 29 deletes the pipe file. You may wonder why the script does not simply keep using the pipe file directly. This step is not redundant. A key property of a pipe is that reading and writing must both be present; if one side is missing, the other blocks. Binding file descriptor 1000 for both reading and writing on line 28 solves exactly this problem. Because the descriptor stays open, the FIFO keeps working after its name is removed from the file system.
- Lines 31 to 34 write 8 blank lines to file descriptor 1000. This 8 is the number of processes we allow to run concurrently in the background. Why blank lines rather than other characters? Because read consumes the pipe one line at a time, so each line serves as one token.
- Lines 37 to 42:
  - Line 37, read -u1000, reads one line (here a blank line) from the pipe. Every read removes one blank line from the pipe.
  - Lines 39 to 41 are the task itself, grouped with { } on lines 38 and 42. Notice the & at the end of line 42: it runs the whole group in the Linux background.
  - Line 41 writes a blank line back to file descriptor 1000 after the background task finishes. This is the key step. Every read -u1000 removes one blank line, so once 8 tasks are running in the background, file descriptor 1000 has no blank line left to read. read -u1000 then waits until a finished task writes one back.
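You can try the whole technique without Hive. The sketch below keeps the structure of loop_kpi_report.sh: fd 1000, one blank line per slot, read -u1000 before each job, and echo >&1000 after it. It replaces the hive call with sleep 0.2. MAX_JOBS, TOTAL_JOBS and the marker directory used to measure how many jobs overlap are hypothetical additions for the demo. As in the original, fd 1000 assumes the open-file limit (ulimit -n) is above 1000.

```shell
#!/bin/bash
# Demo values (hypothetical): at most MAX_JOBS of TOTAL_JOBS jobs run at once.
MAX_JOBS=3
TOTAL_JOBS=10

work=$(mktemp -d)
mkdir "$work/running"             # one marker file per job currently running

fifo="$work/tokens.fifo"
mkfifo "$fifo"
exec 1000<>"$fifo"                # open read+write so neither side blocks
rm -f "$fifo"                     # the open descriptor keeps the pipe usable

for ((i = 1; i <= MAX_JOBS; i++)); do
    echo >&1000                   # one blank line = one free slot
done

for ((job = 1; job <= TOTAL_JOBS; job++)); do
    read -u1000                   # take a slot; blocks while none is free
    {
        touch "$work/running/$job"
        ls "$work/running" | wc -l >> "$work/counts"   # jobs running right now
        sleep 0.2                                      # stands in for hive
        rm -f "$work/running/$job"
        echo >&1000               # hand the slot back
    } &
done

wait                              # wait for the remaining background jobs
exec 1000>&-                      # close the token pipe

peak=$(sort -n "$work/counts" | tail -n 1)
runs=$(wc -l < "$work/counts")
echo "ran $runs jobs, at most $peak at the same time"
rm -rf "$work"
```

Raising TOTAL_JOBS or the sleep time should not change the result: the peak stays at or below MAX_JOBS, because a job removes its marker before returning its token, and the next job takes a token before creating its marker.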
Linux Shell Multi-process Concurrency and Concurrency Limits