Pipeline and xargs commands

1. stdin, stdout, and stderr

Every newly created process can use the stdin, stdout, and stderr file pointers to access its standard input, standard output, and standard error. All three are of type FILE * and belong to the C runtime library. The kernel, on the other hand, represents open files with file descriptors; STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO are defined as 0, 1, and 2 respectively. dup2(srcfd, destfd) duplicates the file descriptor srcfd onto destfd, so that afterwards both descriptors refer to the same open file table entry and therefore the same inode. In addition to ordinary file descriptors, destfd can be STDIN_FILENO, STDOUT_FILENO, or STDERR_FILENO, which is what provides the underpinning for the pipe command. Note: if destfd is already a valid file descriptor when dup2 is called, it is closed first (as if close had been called on it).

2. fork

The fork function creates a child process that gets a copy of all of the parent's file descriptors, data, heap, and stack; the child is essentially a clone of the parent (Linux in fact implements fork with clone). Combining fork, dup2, and pipe is what lets the shell implement the pipeline operator |:

[1] The parent process calls pipe, which returns fd0 (used to read from the pipe) and fd1 (used to write to the pipe).
[2] The parent calls fork to create a child process; the child now shares the parent's fd0 and fd1.
[3] The child calls dup2(fd0, STDIN_FILENO), so that when it reads from stdin it is actually reading from the pipe.
[4] The child calls one of the exec family of functions to run the target program.
[5] The parent writes data into the pipe through fd1.
[6] The child reads the data the parent wrote into the pipe through stdin.

This is, in essence, how the shell executes a pipeline; a minimal C sketch of these six steps is given below, after section 3.

3. The pipeline operator |

A console program generally reads its parameters or the data to be processed from stdin and writes its results to stdout. The Unix philosophy is:

[1] Rule of Modularity: write simple parts connected by clean interfaces.
[2] Rule of Composition: design programs to be connected to other programs.
[3] Rule of Clarity: clarity is better than cleverness.
[4] Rule of Simplicity: design for simplicity; add complexity only where you must.
[5] Rule of Transparency: design for visibility to make inspection and debugging easier.
[6] Rule of Robustness: robustness is the child of transparency and simplicity.
[7] Rule of Least Surprise: in interface design, always do the least surprising thing.
[8] Rule of Repair: when you must fail, fail noisily and as soon as possible.
[9] Rule of Economy: programmer time is expensive; conserve it in preference to machine time.
[10] Rule of Generation: avoid hand-hacking; write programs to write programs when you can.
[11] Rule of Representation: fold knowledge into data so program logic can be stupid and robust.
[12] Rule of Separation: separate policy from mechanism; separate interfaces from engines.
[13] Rule of Optimization: prototype before polishing; get it working before you optimize it.
[14] Rule of Diversity: distrust all claims for "one true way".
[15] Rule of Extensibility: design for the future, because it will be here sooner than you think.

This philosophy is why Unix systems contain so many small tools, each with a single function, that can be combined to accomplish complex tasks: find, grep, awk, xargs, and so on.
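To make steps [1] through [6] concrete, here is a minimal C sketch of a parent process that pipes data into a child running grep. The child command (grep -i hello) and the sample text written into the pipe are illustrative choices for this sketch, not part of the original article:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void)
    {
        int fd[2];                          /* fd[0]: read end, fd[1]: write end    */

        if (pipe(fd) == -1) {               /* [1] the parent creates the pipe      */
            perror("pipe");
            exit(EXIT_FAILURE);
        }

        pid_t pid = fork();                 /* [2] the child shares fd[0] and fd[1] */
        if (pid == -1) {
            perror("fork");
            exit(EXIT_FAILURE);
        }

        if (pid == 0) {                     /* child                                */
            close(fd[1]);                   /* the child only reads from the pipe   */
            dup2(fd[0], STDIN_FILENO);      /* [3] stdin now refers to the pipe     */
            close(fd[0]);
            /* [4] run the target program; grep -i hello is only an example         */
            execlp("grep", "grep", "-i", "hello", (char *)NULL);
            perror("execlp");               /* reached only if exec fails           */
            _exit(127);
        }

        close(fd[0]);                       /* the parent only writes to the pipe   */
        const char *text = "abcdefghi\nhello\nhi hello\nstop\n";
        write(fd[1], text, strlen(text));   /* [5] the parent writes into the pipe  */
        close(fd[1]);                       /* closing fd1 gives the child EOF      */
        waitpid(pid, NULL, 0);              /* [6] the child reads via stdin        */
        return 0;
    }

Running it prints the two matching lines, hello and hi hello, exactly as if the same text had been piped into grep -i hello at the shell prompt.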
These tools are combined through pipes: the output of one program becomes the input of the next. Let's take a look at grep's command line usage:

Usage: grep [OPTION]... PATTERN [FILE]...
Search for PATTERN in each FILE or standard input.
Example: grep -i 'hello world' menu.h main.c

So grep can search for a string in one or more files, and it can also read data from stdin and search there. In the example above it searches for "hello world" in menu.h and main.c. If you type grep -i 'hello' and press Enter, grep waits for you to enter a line of data, searches that line for "hello", and prints the line if it matches:

$ grep -i 'hel'
abcdefghi
hello
hi hello
stop
^D
$

(In the original article the lines typed by the user were shown in blue and the matching lines echoed back by grep in red; the colours are lost in this plain-text copy, so grep's echoes of "hello" and "hi hello" are not shown separately.)

We can guess how grep is implemented: if one or more file names or paths appear at the end of the command line, grep reads from those files and matches their contents against the regex; otherwise it reads from stdin and matches that. Connecting this with the pipe, fork, and dup2 functions described above, we can see the basic execution flow of a shell pipeline. Note that grep must support reading data from stdin, otherwise the pipeline could not work; any program that wants to support pipeline operation must follow the same rule.

4. xargs

Suppose we need to search the entire file system for the string "hello" and write the following shell command: $ grep -i 'hello' /* . This produces too many command line arguments. Every system limits the size of the argument list; ARG_MAX, for example, is generally defined to be at least 4096 bytes. If the limit is exceeded, the shell reports an error: Argument list too long. To avoid this problem you can use the xargs command. Its format is xargs [options] [command [initial-arguments]], where options are xargs's own command line options. Its job is to build and execute command lines from standard input: it reads whitespace-separated strings from stdin (call them arg0, arg1, ...) and runs command initial-arguments arg0 arg1 .... If there are too many arguments, xargs makes sure each invocation stays within the system's ARG_MAX limit and runs command initial-arguments one or more times. For example, run the following command:

$ find / -name '*.h' | xargs grep -i 'stdin' | less

If grep is executed twice, the first invocation is grep -i 'stdin' a1.h a2.h ... a3000.h and the second is grep -i 'stdin' a3001.h a3002.h ... a4000.h (the file names are illustrative). The actual execution is, roughly: the shell starts the find, xargs, and less programs, and xargs then runs the grep program twice in sequence. That xargs reads its data from find's output needs no further explanation. The data path from xargs to less looks a little puzzling at first, but it is actually quite simple: xargs reads the pipe data from stdin, splits it into batches according to ARG_MAX, and calls fork and exec (e.g. execv of grep) once or several times. grep prints its results with an ordinary printf, and that output becomes less's input: although xargs and grep have a parent-child relationship, their stdout is the same stdout, so from less's point of view there is no difference between grep's output and xargs's output.
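The loop just described, read whitespace-separated tokens from stdin, collect them into batches that fit the size limit, then fork and exec the command once per batch, can be sketched in C roughly as follows. This is a minimal illustration only: the fixed command grep -i stdin, the 4096-byte batch limit standing in for the real ARG_MAX check, and the program name mini_xargs used afterwards are all assumptions for the example; real xargs additionally handles quoting, its -n/-s options, exit statuses, and much more.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/wait.h>

    #define BATCH_LIMIT 4096            /* stand-in for the real ARG_MAX check      */
    #define MAX_ARGS    1024

    /* Run "grep -i stdin" with one batch of file-name arguments. */
    static void run_batch(char **files, int nfiles)
    {
        char *argv[MAX_ARGS + 4];
        int argc = 0;

        argv[argc++] = "grep";          /* command [initial-arguments]              */
        argv[argc++] = "-i";
        argv[argc++] = "stdin";
        for (int i = 0; i < nfiles; i++)
            argv[argc++] = files[i];    /* arg0 arg1 ... read from stdin            */
        argv[argc] = NULL;

        pid_t pid = fork();
        if (pid == 0) {
            /* The child inherits our stdout, so grep's matches go wherever our
             * stdout goes, for example a pipe to less.                             */
            execvp(argv[0], argv);
            perror("execvp");
            _exit(127);
        }
        waitpid(pid, NULL, 0);          /* run the batches one after another        */
    }

    int main(void)
    {
        char *files[MAX_ARGS];
        int nfiles = 0;
        size_t used = 0;
        char token[4096];

        /* Read whitespace-separated tokens (e.g. find's output) from stdin. */
        while (scanf("%4095s", token) == 1) {
            size_t len = strlen(token) + 1;
            if (nfiles == MAX_ARGS || used + len > BATCH_LIMIT) {
                run_batch(files, nfiles);       /* batch is full: run grep once     */
                for (int i = 0; i < nfiles; i++)
                    free(files[i]);
                nfiles = 0;
                used = 0;
            }
            files[nfiles++] = strdup(token);
            used += len;
        }
        if (nfiles > 0)                         /* run the final, partial batch     */
            run_batch(files, nfiles);
        for (int i = 0; i < nfiles; i++)
            free(files[i]);
        return 0;
    }

Used as find / -name '*.h' | ./mini_xargs | less, this would behave roughly like the find | xargs grep | less pipeline above: each child grep inherits the program's stdout, so its matches flow straight on to less.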
Note the difference between the following two forms:

[1] XXX | grep -i 'hello'
[2] XXX | xargs grep -i 'hello'

In case [1], grep reads XXX's output through the pipe and searches for "hello" inside that output. In case [2], xargs reads XXX's output through the pipe and turns it into the trailing file arguments of grep; combined with grep -i 'hello' this forms a complete command such as grep -i 'hello' stdio.h stdlib.h, and grep then searches for "hello" inside the files stdio.h and stdlib.h. The two forms therefore process the pipe output differently. In the former it is grep that reads the pipe data directly from stdin and searches it; in the latter it is xargs (whose command line arguments are grep -i 'hello') that reads the pipe data directly from stdin, combines it with grep -i 'hello', and then executes the resulting command via exec. The difference lies in how grep and xargs handle the data arriving on the pipe, not in the pipe mechanism itself.

 
