I have recently been reading APUE, and reading only pays off when it is paired with hands-on practice. Many Linux commands are open source, so you can read their source code directly. GNU coreutils is a good choice: the source package contains the commands we use most often, such as ls and cat, and each command is relatively short, which makes it suitable for reading. Here are some notes from reading the source of the cat command.
Download the source code here. In the root directory of the source tree, run ./configure; make to build, and run make again after modifying the code. The command sources are in the src/ directory; some auxiliary functions and constants are defined in the lib/ directory.
1. Command-line parsing
Nearly all Linux commands use the getopt family of functions to parse command-line arguments, and cat is no exception: it calls getopt_long so that long options are accepted, and stores the option values in a handful of bool variables. There is not much more to say here.
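To make the pattern concrete, here is a minimal sketch of getopt_long filling in bool flags. It is not the code from cat.c: the struct cat_flags container and the parse_cat_flags function are hypothetical names for this example, and only four of cat's options are handled.

```c
#include <assert.h>
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the flags; this is an assumption of the
   sketch, not how cat.c lays out its variables.  */
struct cat_flags
{
  bool number;
  bool show_ends;
  bool show_tabs;
  bool squeeze_blank;
};

/* Parse ARGV into FLAGS.  Returns the index of the first non-option
   argument, or -1 if an unknown option was seen.  */
static int
parse_cat_flags (int argc, char **argv, struct cat_flags *flags)
{
  static struct option const long_options[] = {
    {"number", no_argument, NULL, 'n'},
    {"show-ends", no_argument, NULL, 'E'},
    {"show-tabs", no_argument, NULL, 'T'},
    {"squeeze-blank", no_argument, NULL, 's'},
    {NULL, 0, NULL, 0}
  };
  int c;

  assert (flags != NULL);
  while ((c = getopt_long (argc, argv, "nETs", long_options, NULL)) != -1)
    {
      switch (c)
        {
        case 'n': flags->number = true; break;
        case 'E': flags->show_ends = true; break;
        case 'T': flags->show_tabs = true; break;
        case 's': flags->squeeze_blank = true; break;
        default: return -1;   /* getopt_long has already printed a message */
        }
    }
  return optind;
}
```

Called with the arguments of cat -n --show-ends file.txt, the function sets number and show_ends and returns the index of file.txt.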
2. Detect if the input and output files are the same
For example, in cat test.txt > test.txt the input and output files are the same, which is not allowed.
cat's input streams are named on the command line, with standard input (stdin) as the default; the output stream is standard output (stdout). Because a shell redirection never appears in cat's arguments, comparing file names as strings cannot tell whether the input and the output are the same file. Some special files, such as a TTY, are also allowed to serve as both input and output: cat /dev/tty > /dev/tty is legal. cat therefore compares the device number and inode number, and only when the output is a regular file; for non-regular files the check is skipped. The code for this part is as follows:
Get the attributes of the output file.
    if (fstat (STDOUT_FILENO, &stat_buf) < 0)
      error (EXIT_FAILURE, errno, _("standard output"));
Extract the device number and inode number. For non-regular files the check is skipped.
    if (S_ISREG (stat_buf.st_mode))
      {
        out_dev = stat_buf.st_dev;
        out_ino = stat_buf.st_ino;
      }
    else
      {
        check_redirection = false;
      }
Perform the check. It is skipped when check_redirection is false.
    if (fstat (input_desc, &stat_buf) < 0)  /* input_desc is the input file descriptor */
      {
        error (0, errno, "%s", infile);
        ok = false;
        goto contin;
      }
    if (check_redirection
        && stat_buf.st_dev == out_dev && stat_buf.st_ino == out_ino
        && (input_desc != STDIN_FILENO))
      {
        error (0, 0, _("%s: input file is output file"), infile);
        ok = false;
        goto contin;
      }
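The same idea can be tried outside of cat. The following is a minimal sketch, assuming a POSIX system; input_is_output is a hypothetical helper written for this note, and unlike cat it simply reports "no clash" when fstat fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return true if IN_FD and OUT_FD refer to the same regular file.
   As in the check described above, a non-regular output (tty, pipe)
   and standard input are never reported as a clash.  NAME is only
   used in messages.  */
static bool
input_is_output (int in_fd, int out_fd, char const *name)
{
  struct stat in_st, out_st;

  assert (name != NULL);
  if (fstat (out_fd, &out_st) < 0 || fstat (in_fd, &in_st) < 0)
    {
      perror (name);          /* simplification: treat errors as "no clash" */
      return false;
    }
  if (!S_ISREG (out_st.st_mode) || in_fd == STDIN_FILENO)
    return false;
  if (in_st.st_dev == out_st.st_dev && in_st.st_ino == out_st.st_ino)
    {
      fprintf (stderr, "%s: input file is output file\n", name);
      return true;
    }
  return false;
}
```

Two descriptors obtained from the same file (for example with dup) compare equal, while descriptors for two different temporary files do not.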
Tips: '-' stands for standard input, so the command cat - actually reads bytes from standard input. This lets cat work in a pipeline like this: echo abcd | cat file1 - file2. Running cat with no arguments also reads from standard input by default.
3. Number of bytes read and write at one time
cat is built on the read and write functions, so the number of bytes read or written per call also affects the program's performance.
The insize and outsize variables represent the number of bytes read and written, respectively.
    insize = io_blksize (stat_buf);

    enum { IO_BUFSIZE = 128*1024 };
    static inline size_t
    io_blksize (struct stat sb)
    {
      /* The value of the ST_BLKSIZE () macro depends on the system;
         it is defined in lib/stat-size.h */
      return MAX (IO_BUFSIZE, ST_BLKSIZE (sb));
    }
The value of outsize is set in a similar way to insize.
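As a rough illustration of the "at least 128 KiB, or the file system's preferred block size if larger" rule, here is a simplified stand-in. It reads st_blksize directly instead of using the ST_BLKSIZE macro, and the names sketch_io_blksize and SKETCH_IO_BUFSIZE, as well as the 512-byte fallback, are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

enum { SKETCH_IO_BUFSIZE = 128 * 1024 };

/* Pick a transfer size: the preferred block size reported by the
   file system, but never less than SKETCH_IO_BUFSIZE.  */
static size_t
sketch_io_blksize (struct stat const *sb)
{
  assert (sb != NULL);
  /* Fall back to 512 if the reported value is not positive
     (an assumption of this sketch).  */
  size_t preferred = sb->st_blksize > 0 ? (size_t) sb->st_blksize : 512;
  return preferred > SKETCH_IO_BUFSIZE ? preferred : SKETCH_IO_BUFSIZE;
}
```

With a typical 4096-byte block size the result is 131072; a file system reporting a 1 MiB block size would get 1 MiB.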
4. simple_cat
If cat is invoked without any formatting options such as -v or -T, it calls simple_cat to do the work. The advantage of simple_cat is speed, and on some systems the files can then be read and written in binary mode. See man 3 freopen.
    if (! (number || show_ends || squeeze_blank))
      {
        /* O_BINARY is 0 on Linux and has no effect, but some systems
           open files in binary mode */
        file_open_mode |= O_BINARY;
        if (O_BINARY && ! isatty (STDOUT_FILENO))
          /* Calls freopen with error handling; changes the output
             stream mode to "wb" */
          xfreopen (NULL, "wb", stdout);
      }
When no formatting options are given, simple_cat is called:
    if (! (number || show_ends || show_nonprinting
           || show_tabs || squeeze_blank))
      {
        insize = MAX (insize, outsize);
        /* xmalloc allocates memory; on failure it calls xalloc_die ()
           to terminate the program and report the error */
        inbuf = xmalloc (insize + page_size - 1);
        ok &= simple_cat (ptr_align (inbuf, page_size), insize);
      }
ptr_align is a helper function. Because I/O works a page at a time, ptr_align rounds the buffer's starting address up to an integer multiple of the page size to improve I/O efficiency. This is why the buffer is allocated with page_size - 1 extra bytes.
    static inline void *
    ptr_align (void const *ptr, size_t alignment)
    {
      char const *p0 = ptr;
      char const *p1 = p0 + alignment - 1;
      return (void *) (p1 - (size_t) p1 % alignment);
    }
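The over-allocate-then-round-up pattern can be sketched on its own as follows. The helper names align_up and alloc_aligned are made up for this example, and the arithmetic is done on uintptr_t rather than on char pointers as in the original.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Round PTR up to the next multiple of ALIGNMENT (any nonzero value).  */
static void *
align_up (void *ptr, size_t alignment)
{
  assert (alignment != 0);
  uintptr_t addr = (uintptr_t) ptr + alignment - 1;
  return (void *) (addr - addr % alignment);
}

/* Allocate SIZE usable bytes starting at an ALIGNMENT boundary.
   *RAW receives the pointer that must later be passed to free ().  */
static char *
alloc_aligned (size_t size, size_t alignment, void **raw)
{
  assert (raw != NULL);
  *raw = malloc (size + alignment - 1);   /* slack for rounding up */
  if (*raw == NULL)
    return NULL;
  return align_up (*raw, alignment);
}
```

As a worked example with alignment 0x1000: an address of 0x1003 becomes 0x1003 + 0xFFF = 0x2002, the remainder 0x2002 % 0x1000 is 2, and subtracting it gives 0x2000, the next page boundary. An address that is already aligned is returned unchanged.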
The simple_cat function itself is straightforward:
    static bool
    simple_cat (
         /* Pointer to the buffer, used by reads and writes.  */
         char *buf,

         /* Number of characters preferably read or written by each read
            and write call.  */
         size_t bufsize)
    {
      /* Actual number of characters read, and therefore written.  */
      size_t n_read;

      /* Loop until the end of the file.  */
      while (true)
        {
          /* Read a block of input.  */
          /* A plain read may be interrupted by a signal.  */
          n_read = safe_read (input_desc, buf, bufsize);
          if (n_read == SAFE_READ_ERROR)
            {
              error (0, errno, "%s", infile);
              return false;
            }

          /* End of this file?  */
          if (n_read == 0)
            return true;

          /* Write this block out.  */
          {
            /* The following is ok, since we know that 0 < n_read.  */
            size_t n = n_read;
            /* full_write and safe_read are generated from full_rw and
               safe_rw by macros; safe-write.c shows the key to how
               this is done.  */
            if (full_write (STDOUT_FILENO, buf, n) != n)
              error (EXIT_FAILURE, errno, _("write error"));
          }
        }
    }
5. The safe_rw and full_rw functions
The read and write functions may be interrupted by a signal before the first byte is transferred; safe_rw restarts the interrupted call. The function is written with a trick: the names safe_rw and rw are actually macros, and conditional compilation turns the same source into two functions, safe_read and safe_write.
    size_t    /* the original read () function returns ssize_t */
    safe_rw (int fd, void const *buf, size_t count)
    {
      /* Work around a bug in Tru64 5.1.  Attempting to read more than
         INT_MAX bytes fails with errno == EINVAL.  See
A read or write may also be cut short by a signal partway through. full_rw keeps resuming the transfer until the requested number of bytes has been read or written, end of file (EOF) is reached, or an error occurs, and returns the number of bytes transferred so far. The name full_rw is also defined by a macro; the function actually implements full_read () and full_write ().
    /* Write (read) COUNT bytes at BUF to (from) descriptor FD, retrying if
       interrupted or if a partial write (read) occurs.  Return the number
       of bytes transferred.
       When writing, set errno if fewer than COUNT bytes are written.
       When reading, if fewer than COUNT bytes are read, you must examine
       errno to distinguish failure from EOF (errno == 0).  */
    size_t
    full_rw (int fd, const void *buf, size_t count)
    {
      size_t total = 0;
      const char *ptr = (const char *) buf;

      while (count > 0)
        {
          size_t n_rw = safe_rw (fd, ptr, count);
          if (n_rw == (size_t) -1)  /* error */
            break;
          if (n_rw == 0)            /* reached EOF */
            {
              errno = ZERO_BYTE_TRANSFER_ERRNO;
              break;
            }
          total += n_rw;
          ptr += n_rw;
          count -= n_rw;
        }

      return total;
    }
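As a rough illustration of the two retry layers described in this section, here is a minimal sketch built directly on read and write. The names retry_read and write_all are made up for this example and are not the coreutils functions; reporting ENOSPC for a zero-byte write is likewise an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Like read (), but restart when a signal interrupts the call before
   any data is transferred.  Returns (size_t) -1 on error.  */
static size_t
retry_read (int fd, void *buf, size_t count)
{
  for (;;)
    {
      ssize_t n = read (fd, buf, count);
      if (n >= 0)
        return (size_t) n;
      if (errno != EINTR)
        return (size_t) -1;
    }
}

/* Write COUNT bytes, continuing after partial writes and EINTR.
   Returns the number of bytes written; fewer than COUNT means an
   error occurred and errno is set.  */
static size_t
write_all (int fd, void const *buf, size_t count)
{
  char const *p = buf;
  size_t total = 0;

  assert (buf != NULL || count == 0);
  while (total < count)
    {
      ssize_t n = write (fd, p + total, count - total);
      if (n < 0)
        {
          if (errno == EINTR)
            continue;         /* interrupted before writing: retry */
          break;              /* real error */
        }
      if (n == 0)
        {
          errno = ENOSPC;     /* assumption: no progress means no space */
          break;
        }
      total += (size_t) n;
    }
  return total;
}
```

A caller compares the return value of write_all with the requested count, the same way simple_cat compares the result of full_write with n.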
Tips: see the safe-read.c and safe-write.c files in the lib directory to see how this one function is expanded into two different functions.
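The effect of generating two functions from one body can be imitated in a single file. Note that this sketch uses a different mechanism from the one in the lib directory: a function-like macro that is expanded twice, rather than one source file compiled under two sets of macro definitions. All names here (DEFINE_SAFE_RW, demo_safe_read, demo_safe_write) are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Generate one EINTR-retrying wrapper.  NAME is the function to
   define, RW the system call it wraps, BUF_T the buffer type.  */
#define DEFINE_SAFE_RW(NAME, RW, BUF_T)                 \
  static size_t                                         \
  NAME (int fd, BUF_T buf, size_t count)                \
  {                                                     \
    assert (buf != NULL || count == 0);                 \
    for (;;)                                            \
      {                                                 \
        ssize_t result = RW (fd, buf, count);           \
        if (0 <= result)                                \
          return (size_t) result;                       \
        if (errno != EINTR)                             \
          return (size_t) -1;                           \
      }                                                 \
  }

/* One body, two functions: only the name, the system call, and the
   constness of the buffer differ.  */
DEFINE_SAFE_RW (demo_safe_read, read, void *)
DEFINE_SAFE_RW (demo_safe_write, write, void const *)
```

Either approach avoids maintaining two nearly identical copies of the retry loop.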
6. The cat function: formatted output
simple_cat copies its input to the output unchanged, without any processing; everything related to formatted output lives in the cat function.
The implementation of the cat function contains many tricks. For example, it uses a sentinel '\n' to mark the end of the input buffer. In addition, the line counter is kept in a character array, so that systems without 64-bit integers can still count over a large range.
The following is the code for this line counter.
    /* Position in 'line_buf' where printing starts.  This will not change
       unless the number of lines is larger than 999999.  */
    static char *line_num_print = line_buf + LINE_COUNTER_BUF_LEN - 8;

    /* Position of the first digit in 'line_buf'.  */
    static char *line_num_start = line_buf + LINE_COUNTER_BUF_LEN - 3;

    /* Position of the last digit in 'line_buf'.  */
    static char *line_num_end = line_buf + LINE_COUNTER_BUF_LEN - 3;

    static void
    next_line_num (void)
    {
      char *endp = line_num_end;
      do
        {
          if ((*endp)++ < '9')
            return;
          *endp-- = '0';
        }
      while (endp >= line_num_start);
      if (line_num_start > line_buf)
        *--line_num_start = '1';
      else
        *line_buf = '>';
      if (line_num_start < line_num_print)
        line_num_print--;
    }
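To watch the carry logic work in isolation, here is a small self-contained variant of a text-based counter. It follows the same idea as the line counter but is not the coreutils code: it uses array indices instead of pointers, a shorter buffer, and the hypothetical names text_counter, text_counter_init and text_counter_next.

```c
#include <assert.h>
#include <string.h>

enum { DEMO_LEN = 8 };            /* 7 character slots plus the NUL */

/* Decimal counter stored as text, right-aligned in BUF.  FIRST is the
   index of the most significant digit currently in use.  */
struct text_counter
{
  char buf[DEMO_LEN];
  size_t first;
};

static void
text_counter_init (struct text_counter *c)
{
  assert (c != NULL);
  memset (c->buf, ' ', DEMO_LEN - 1);
  c->buf[DEMO_LEN - 1] = '\0';
  c->buf[DEMO_LEN - 2] = '0';
  c->first = DEMO_LEN - 2;
}

/* Add one, propagating carries from the last digit leftwards.  */
static void
text_counter_next (struct text_counter *c)
{
  size_t i = DEMO_LEN - 2;        /* index of the last digit */
  for (;;)
    {
      if (c->buf[i]++ < '9')
        return;                   /* no carry needed */
      c->buf[i] = '0';            /* 9 -> 0, carry to the left */
      if (i == c->first)
        break;
      i--;
    }
  if (c->first > 0)
    c->buf[--c->first] = '1';     /* the number grows by one digit */
  else
    c->buf[0] = '>';              /* overflow marker, as in the original */
}
```

Because the counter is already text, printing a line number is just a copy of the buffer; no integer-to-string conversion is needed for each line.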
The key to understanding the cat function is the role of the newlines variable. The main job of formatted output is to recognize line boundaries and runs of consecutive blank lines, and newlines records the number of empty lines seen: 0 means the read position in inbuf is at the beginning of a line, 1 means one blank line has been seen, and -1 means a line has just been parsed and the next line is about to begin. You can see that both break statements in the final while (true) loop of the cat function set newlines to -1.
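A counter of this kind is enough to squeeze repeated blank lines. The following is a simplified sketch in the spirit of the -s option, not the logic from cat.c; squeeze_blank_lines is a hypothetical function, and its newlines variable only loosely mirrors the states described above (-1 inside a line, 0 at the start of a line, positive for blank lines seen in a row).

```c
#include <assert.h>
#include <stddef.h>

/* Copy N bytes from IN to OUT, keeping at most one blank line in any
   run of blank lines.  OUT must have room for N bytes.  Returns the
   number of bytes stored in OUT.  */
static size_t
squeeze_blank_lines (char const *in, size_t n, char *out)
{
  int newlines = 0;     /* the input starts at the beginning of a line */
  size_t o = 0;

  assert (in != NULL && out != NULL);
  for (size_t i = 0; i < n; i++)
    {
      if (in[i] != '\n')
        {
          newlines = -1;          /* inside a line with content */
          out[o++] = in[i];
        }
      else if (newlines < 0)
        {
          newlines = 0;           /* this newline ends a non-empty line */
          out[o++] = '\n';
        }
      else if (++newlines < 2)
        out[o++] = '\n';          /* first blank line in a run: keep it */
      /* otherwise: a repeated blank line, which is dropped */
    }
  return o;
}
```

For the input "a", three blank lines, then "b", the output keeps "a", a single blank line, and "b".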
Formatted output in cat is essentially a scan of the input buffer, character by character, during which the converted characters are stored in the output buffer.
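A stripped-down version of that scan might look like the sketch below. It handles only two conversions (tabs shown as ^I and a $ before each newline, as the -T and -E options do) and leaves out the sentinel, line numbering and the newlines state; format_chunk is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Copy N bytes from IN to OUT, applying two conversions: SHOW_TABS
   writes a tab as "^I", SHOW_ENDS writes '$' before each newline.
   OUT must have room for 2 * N bytes.  Returns the number of bytes
   stored in OUT.  */
static size_t
format_chunk (char const *in, size_t n, char *out,
              bool show_tabs, bool show_ends)
{
  size_t o = 0;

  assert (in != NULL && out != NULL);
  for (size_t i = 0; i < n; i++)
    {
      char ch = in[i];
      if (ch == '\t' && show_tabs)
        {
          out[o++] = '^';
          out[o++] = 'I';
        }
      else if (ch == '\n' && show_ends)
        {
          out[o++] = '$';
          out[o++] = '\n';
        }
      else
        out[o++] = ch;
    }
  return o;
}
```

Since each input byte produces at most two output bytes here, sizing the output buffer at twice the input size is sufficient for this sketch.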
Analysis of Linux Cat command source code