Three major text-processing tools: grep, sed, and awk
I. Use grep to search for text in files
grep accepts regular expressions and can produce output in various formats. It also has many useful options.
1. Search for lines of text that contain a specific pattern:
2. Read from stdin:
3. A single grep command can search multiple files:
4. The --color option highlights the matched words in the output lines:
5. Use extended regular expressions in grep (grep -E or egrep):
6. To output only the matching part of the text, use -o:
7. To display all lines except the matching ones, use the -v option:
8. To count the lines in a file or text that contain the matching string, use -c (a line with several matches is counted only once):
9. To print the line number of each line containing the matching string, use -n:
10. To search multiple files and list the files that contain the matching text, use -l (-L does the opposite):
11. To search files recursively, use -r (-R serves the same purpose):
12. To ignore case in the pattern, use -i:
13. To match multiple patterns with grep, use -e:
14. Specify files to include (--include) or exclude (--exclude) in a grep search:
Recursively search all .c and .cpp files in the directory:
Exclude all README files from the search:
To exclude directories, use the --exclude-dir option.
15. Silent grep output, -q:
No content is output. If a match is found, 0 is returned; otherwise a non-zero value is returned.
16. Print the lines before or after the matching text:
[root@localhost tmp]# seq 10
1
2
3
4
5
6
7
8
9
10
[root@localhost tmp]# seq 10 | grep 5 -A 3    # print the match and the 3 lines after it
5
6
7
8
[root@localhost tmp]# seq 10 | grep 5 -B 3    # print the match and the 3 lines before it
2
3
4
5
[root@localhost tmp]# seq 10 | grep 5 -C 3    # print the match and 3 lines on each side of it
2
3
4
5
6
7
8
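The options listed above can be tried out with a short script. This is only a sketch, assuming GNU grep and a POSIX shell; the directory layout and the file names (a.txt, main.c, README, build/out.txt) are hypothetical sample data created just for the demonstration.

```shell
# Build throwaway sample data (hypothetical file names).
tmp=$(mktemp -d)
mkdir "$tmp/build"
printf 'this is a line\nAnother LINE here\nno match\nline and line again\n' > "$tmp/a.txt"
printf 'int main(void) { return 0; } /* line */\n' > "$tmp/main.c"
printf 'this line is documentation\n' > "$tmp/README"
printf 'generated line\n' > "$tmp/build/out.txt"

# 1-3: search one file, stdin, or several files at once
grep "line" "$tmp/a.txt"
echo "a line from stdin" | grep "line"
grep "line" "$tmp/a.txt" "$tmp/main.c"

# 4: highlight matches (colors only appear on a terminal)
grep --color=auto "line" "$tmp/a.txt"

# 5: extended regular expressions
grep -E '[a-z]+ again' "$tmp/a.txt"

# 6: print only the matching part
only=$(echo "this is a line." | grep -oE '[a-z]+\.')

# 7: print the lines that do NOT match
grep -v "line" "$tmp/a.txt"

# 8: count matching lines (line 4 matches twice but counts once)
count=$(grep -c "line" "$tmp/a.txt")

# 9: prefix each match with its line number
numbered=$(grep -n "again" "$tmp/a.txt")

# 10: -l lists files with a match, -L lists files without one
without_match=$(cd "$tmp" && grep -L "again" a.txt main.c)

# 11: recursive search from the current directory
(cd "$tmp" && grep -rl "line" .)

# 12: ignore case, so "LINE" on line 2 now matches too
icount=$(grep -ci "line" "$tmp/a.txt")

# 13: several patterns; a line matches if any pattern matches
multi=$(grep -c -e "Another" -e "again" "$tmp/a.txt")

# 14: restrict or exclude files and directories during a recursive search
c_only=$(cd "$tmp" && grep -rl --include="*.c" "line" .)
(cd "$tmp" && grep -rl --exclude="README" "line" .)
no_build=$(cd "$tmp" && grep -rl --exclude-dir="build" --include="*.txt" "line" .)

# 15: quiet mode, only the exit status is used
if grep -q "line" "$tmp/a.txt"; then found=yes; else found=no; fi

rm -rf "$tmp"
```

Capturing results in variables, as done for several options here, is one convenient way to reuse grep output in a larger script; printing directly works just as well.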
II. Use sed for text replacement
sed is short for stream editor. It is most commonly used for text replacement.
[root@cairui ~]# sed --help
Usage: sed [OPTION]... {script-only-if-no-other-script} [input-file]...

  -n, --quiet, --silent
                 suppress automatic printing of pattern space
                 # do not print the pattern space automatically
  -e script, --expression=script
                 add the script to the commands to be executed
                 # add "script" to the list of commands to run
  -f script-file, --file=script-file
                 add the contents of script-file to the commands to be executed
                 # add the contents of the script file to the list of commands to run
  --follow-symlinks
                 follow symlinks when processing in place; hard links
                 will still be broken.
  -i[SUFFIX], --in-place[=SUFFIX]
                 edit files in place (makes backup if extension supplied).
                 The default operation mode is to break symbolic and hard links.
                 This can be changed with --follow-symlinks and --copy.
  -c, --copy
                 use copy instead of rename when shuffling files in -i mode.
                 While this will avoid breaking links (symbolic or hard), the
                 resulting editing operation is not atomic.  This is rarely
                 the desired mode; --follow-symlinks is usually enough, and
                 it is both faster and more secure.
  -l N, --line-length=N
                 specify the desired line-wrap length for the `l' command
  --posix
                 disable all GNU extensions.
  -r, --regexp-extended
                 use extended regular expressions in the script.
  -s, --separate
                 consider files as separate rather than as a single continuous
                 long stream.
  -u, --unbuffered
                 load minimal amounts of data from the input files and flush
                 the output buffers more often
      --help     display this help and exit
      --version  output version information and exit

If no -e, --expression, -f, or --file option is given, then the first
non-option argument is taken as the sed script to interpret.  All
remaining arguments are names of input files; if no input files are
specified, then the standard input is read.

GNU sed home page:
1. sed can replace a string in the given text:
sed can also read its input from stdin; either way, the original content is left unchanged.
2. By default, sed only prints the text after substitution. To save the changes back to the original file, use the -i option:
3. The commands above replace only the first match on each line. To replace every match, append the g flag:
To replace from the Nth match on each line onward, use the Ng flag, for example 2g (GNU sed).
The '/' in a sed command is a delimiter; any other character can be used in its place, for example s|text|replace|g.
4. Remove blank lines:
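The sed usages above can be sketched as follows. This assumes GNU sed (the Ng flag and the bare -i form are GNU behavior; BSD/macOS sed expects -i ''), and file.txt is a hypothetical sample file created only for the demonstration.

```shell
# Build a throwaway sample file (hypothetical name) with one blank line in it.
tmp=$(mktemp -d)
printf 'this is this and this\n\nlast this\n' > "$tmp/file.txt"

# 1: replace the first match on each line; the file itself is not modified
first=$(sed 's/this/THAT/' "$tmp/file.txt" | head -n 1)

# sed also reads from stdin
from_stdin=$(echo "this is this" | sed 's/this/THAT/')

# 3: the g flag replaces every match on the line
all=$(echo "this is this and this" | sed 's/this/THAT/g')

# Ng (GNU sed): replace the 2nd match and every match after it
from_second=$(echo "this is this and this" | sed 's/this/THAT/2g')

# Another delimiter avoids escaping slashes in paths
alt=$(echo "/usr/bin" | sed 's|/usr|/opt|')

# 4: delete blank lines, then count how many blank lines are left
blank_left=$(sed '/^$/d' "$tmp/file.txt" | grep -c '^$')

# 2: -i writes the result back into the file (GNU sed syntax)
sed -i 's/this/THAT/g' "$tmp/file.txt"
in_place=$(head -n 1 "$tmp/file.txt")

rm -rf "$tmp"
```

When editing in place, supplying a suffix such as -i.bak keeps a backup copy of the original file, as the help text above describes.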
III. Use awk for advanced text processing
awk is a tool designed for processing data streams. It operates on columns (fields) and rows (records). awk has many built-in features, such as arrays and functions, and it shares many similarities with C. Its biggest advantage is flexibility.
[root@cairui ~]# awk --help
Usage: awk [POSIX or GNU style options] -f progfile [--] file ...
Usage: awk [POSIX or GNU style options] [--] 'program' file ...
POSIX options:                  GNU long options:
        -f progfile             --file=progfile
        -F fs                   --field-separator=fs
        -v var=val              --assign=var=val
        -m[fr] val
        -O                      --optimize
        -W compat               --compat
        -W copyleft             --copyleft
        -W copyright            --copyright
        -W dump-variables[=file]        --dump-variables[=file]
        -W exec=file            --exec=file
        -W gen-po               --gen-po
        -W help                 --help
        -W lint[=fatal]         --lint[=fatal]
        -W lint-old             --lint-old
        -W non-decimal-data     --non-decimal-data
        -W profile[=file]       --profile[=file]
        -W posix                --posix
        -W re-interval          --re-interval
        -W source=program-text  --source=program-text
        -W traditional          --traditional
        -W usage                --usage
        -W use-lc-numeric       --use-lc-numeric
        -W version              --version

To report bugs, see node `Bugs' in `gawk.info', which is
section `Reporting Problems and Bugs' in the printed version.

gawk is a pattern scanning and processing language.
By default it reads standard input and writes standard output.

Examples:
        gawk '{ sum += $1 }; END { print sum }' file
        gawk -F: '{ print $1 }' /etc/passwd
The structure of the awk script is basically as follows:
awk 'BEGIN {print "start"} pattern {commands} END {print "end"}' file
An awk script usually consists of three parts: a BEGIN block, an END block, and a main statement block with an optional pattern. All three parts are optional.
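A concrete instance of this structure might look like the sketch below. The two input lines and the /2$/ pattern are made up purely for illustration.

```shell
# BEGIN prints once before any input, the main block prints every line,
# and END prints once after the last line.
out=$(printf 'line1\nline2\n' | awk 'BEGIN { print "start" } { print } END { print "end" }')
echo "$out"

# With a pattern, the main block runs only for the lines that match it.
matched=$(printf 'line1\nline2\n' | awk '/2$/ { print "matched: " $0 }')
echo "$matched"
```

Here the first command prints four lines (start, line1, line2, end), and the second prints only the line ending in 2.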
1. How it works
(1) Execute the statements in the BEGIN {commands} block.
(2) Read a line from a file or stdin and execute pattern {commands}. Repeat this step until all input has been read.
(3) On reaching the end of the input stream, execute the END {commands} block.
The most important part is the main block, that is, the commands attached to the pattern. This block is also optional: if the action is omitted, {print} is executed by default, printing each line that matches the pattern. awk executes the block once for every line it reads, much like a while loop that reads lines and runs the loop body on each one.
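The while-loop analogy and the default {print} action can be checked with a small sketch. It assumes a POSIX shell with seq and paste available; the variable names are hypothetical.

```shell
# awk runs the main block once per input line...
awk_out=$(seq 3 | awk '{ print "got " $0 }')

# ...which behaves like a shell while loop reading lines one at a time.
loop_out=$(seq 3 | while IFS= read -r line; do echo "got $line"; done)

# A pattern with no action falls back to the default { print }:
# only the even numbers are printed, then joined with commas.
evens=$(seq 6 | awk '$1 % 2 == 0' | paste -sd, -)

# All three phases together: BEGIN initializes, the main block
# accumulates on every line, END reports the result.
total=$(seq 5 | awk 'BEGIN { n = 0 } { n += $1 } END { print n }')

echo "$awk_out"
echo "$evens"
echo "$total"
```

The awk one-liner and the shell loop produce identical output here; awk is usually the more convenient choice once fields or arithmetic are involved.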