4.1 Getting started with regular expressions
1. Regular expressions are the key to text processing based on pattern matching.
2. Regular expressions are used in most text processing tools.
3. Basic components of a regular expression:

Regular expression    Description
^          Matches the start of a line
$          Matches the end of a line
.          Matches any single character
[]         Matches any one of the characters contained in []
[^]        Matches any character not contained in []
[-]        Matches any character in the range specified inside []
?          Matches the preceding item 0 or 1 times
+          Matches the preceding item 1 or more times
*          Matches the preceding item 0 or more times
{n}        Matches the preceding item exactly n times
{n,}       Matches the preceding item at least n times
{n,m}      Matches the preceding item at least n and at most m times
4. To match an IP address, the following regular expression can be used (tried out in the example after this list):
[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}
An IP address is usually four integers separated by dots, each integer in the range 0-255. Note that this pattern only checks that each group has 1 to 3 digits; it does not enforce the 0-255 range.
5. Placing a "\" in front of a character is called escaping; it lets a special character such as "." be matched literally.
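As a quick check, the pattern can be tried with grep -E (extended regular expressions, so the braces need no escaping); the sample text below is made up purely for illustration:

echo "server 192.168.1.10 up, 999.999.999.999 also listed" | grep -oE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'
# prints both 192.168.1.10 and 999.999.999.999, because the pattern only
# checks the digit count per group, not the 0-255 value range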
4.2 Searching text in a file with grep
1. grep is the workhorse tool for text search in UNIX. It accepts both regular expressions and wildcards.
2. Search for a word in a file: grep word file
3. grep can also search several files at once: grep word file1 file2 file3 ...
4. To print all lines that do not contain word, use: grep -v word file
5. Count the number of lines in a file or text that contain a match: grep -c word file
6. Print matching lines together with their line numbers: grep -n word file
7. To search text recursively through a multi-level directory, use: grep word path -R -n
8. Ignore case in the pattern: grep -i word file
9. Print the lines before or after a match (see the sample session after this list):
To print the three lines after a match: grep word -A 3 file
To print the three lines before a match: grep word -B 3 file
To print the three lines before and after a match: grep word -C 3 file
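A short sample session illustrating these options; sample.txt and its contents are hypothetical and created only for the demonstration:

printf 'alpha\nbeta\nerror: disk full\ngamma\nerror: timeout\n' > sample.txt   # throw-away test file
grep -n error sample.txt          # prints "3:error: disk full" and "5:error: timeout"
grep -c error sample.txt          # prints 2, the number of matching lines
grep -v error sample.txt          # prints the three lines that do not contain "error"
grep -A 1 'disk full' sample.txt  # prints the matching line plus the one line after it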
4.3 Splitting files by column with cut
1. cut is a small utility for splitting text by column. The delimiter to split on can also be specified.
2. To extract a particular field or column, use the following form:
cut -f field_list file        # field_list is the list of columns to display
3. cut -f 2,3 file            # displays the 2nd and 3rd columns
4. The --complement option inverts the selection of the extracted fields. If there are many fields and you want to print every column except the 3rd, use: cut -f3 --complement file
5. Use the -d option to set the delimiter: cut -f2 -d ";" file      # displays the 2nd column, using ";" as the delimiter
6. To extract characters n through m of each line, use: cut -c n-m file
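A small worked example, assuming nothing beyond a throw-away file in /etc/passwd-like colon-separated format:

printf 'root:x:0:0:root:/root:/bin/bash\nbin:x:1:1:bin:/bin:/sbin/nologin\n' > users.txt   # hypothetical sample data
cut -d ':' -f 1,7 users.txt                # prints "root:/bin/bash" and "bin:/sbin/nologin"
cut -d ':' -f 3 --complement users.txt     # prints every field except the 3rd
cut -c 1-4 users.txt                       # prints the first 4 characters of each line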
4.4 Getting started with sed
1. sed is short for "stream editor". It is a very important tool in text processing, works well with regular expressions, and has many functions.
2. sed can replace strings in text, with the match described by a regular expression:
sed 's/pattern/replace_string/' file
3. With the -i option, the replacement is applied to the original file in place: sed -i 's/pattern/replace_string/' file
4. Use sed to remove blank lines: sed '/^$/d' file
5. Multiple sed commands chained in a pipeline can be combined into a single invocation: sed 'expression' | sed 'expression' is equivalent to sed 'expression; expression'
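For instance, on a small throw-away file (the file name and contents are assumptions for the demo):

printf 'hello world\n\nhello sed\n' > demo.txt   # sample file with one blank line
sed 's/hello/hi/' demo.txt                # replaces the first "hello" on each line
sed '/^$/d' demo.txt                      # deletes the blank line
sed 's/hello/hi/' demo.txt | sed '/^$/d'  # two piped sed commands ...
sed 's/hello/hi/; /^$/d' demo.txt         # ... give the same result as one sed with both expressions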
4.5 Getting started with awk
1. The structure of an awk script is basically: awk 'BEGIN {statements} pattern {statements} END {statements}' file
2. An awk script usually consists of three parts: the BEGIN block, the END block, and the common block that can use pattern matching. All three parts are optional; any of them may be left out of the script. The script is usually enclosed in single or double quotation marks.
3. How the awk command works:
(1) Execute the statements in the BEGIN {statements} block.
(2) Read one line from the file or stdin and execute pattern {statements}. Repeat this step until the whole input has been read.
(3) When the end of the input stream is reached, execute the END {statements} block.
4. The BEGIN block is executed before awk starts reading lines from the input stream. It is optional and typically holds statements such as variable initialization and printing a table header.
5. The END block is similar to the BEGIN block. It is executed after awk has read all lines from the input stream; information such as results aggregated over all lines is usually printed in the END block.
6. The last part is the pattern block, which is also optional. If it is not provided, print is executed by default, i.e., every line that is read is printed.
7. When print is used without arguments, it prints the current line. When the arguments of print are separated by commas, they are printed separated by spaces. In awk's print, double quotation marks act as the concatenation operator.
8. Some special variables of awk:
NR: the number of records, which during execution corresponds to the current line number.
NF: the number of fields, which during execution corresponds to the number of fields in the current line.
$0: this variable contains the text of the current line during execution.
$1: this variable contains the text of the first field.
$2: this variable contains the text of the second field.
9. Normally, awk reads every line of a file. If you only want to read one line on demand, you can use getline.
10. awk has many built-in functions (a small combined example follows this list):
length(string): returns the length of the string.
index(string, search_string): returns the position at which search_string appears in string.
split(string, array, delimiter): splits the string on the delimiter and stores the resulting pieces in the array.
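A minimal sketch tying these pieces together; the input is generated inline with printf, so no real file is assumed:

printf 'one two three\nfour five\n' | awk '
BEGIN { print "start" }                 # runs once, before any input is read
{ print NR, NF, $1 }                    # per line: line number, field count, first field
END   { print "total lines:", NR }      # runs once, after the last line
'
# output: start / 1 3 one / 2 2 four / total lines: 2
echo 'a,b,c' | awk '{ n = split($0, parts, ","); print n, parts[2], length($0) }'   # prints "3 b 5"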
4.6 Replacing strings in text or files
1. You can replace a string or pattern as follows:
sed 's/pattern/replace_string/g' file     # replaces every matched item; the /g means "global", i.e., all matches in the file are replaced
2. When a file name is passed to sed, sed writes its output to stdout. If you do not want the output to go to stdout but instead want the changes saved back to the original file, use the -i option: sed -i 's/pattern/replace_string/g' file
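The effect of /g is easy to see on a one-line sample (colors.txt is a hypothetical file):

echo 'red fish, red hat' > colors.txt
sed 's/red/blue/' colors.txt        # prints "blue fish, red hat"  (only the first match on the line)
sed 's/red/blue/g' colors.txt       # prints "blue fish, blue hat" (every match)
sed -i 's/red/blue/g' colors.txt    # same global replacement, written back into colors.txt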
4.7 Merging files by column
1. You can use the paste command to join files column by column: paste file1 file2 file3
2. The default delimiter is the tab character. You can also specify the delimiter explicitly with -d:
paste file1 file2 -d ","        # specifies "," as the delimiter
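For example, with two small throw-away files:

printf '1\n2\n3\n' > nums.txt           # hypothetical sample files
printf 'a\nb\nc\n' > letters.txt
paste nums.txt letters.txt              # prints "1<TAB>a", "2<TAB>b", "3<TAB>c"
paste -d ',' nums.txt letters.txt       # prints "1,a", "2,b", "3,c"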
4.8 Printing the nth word or column of a file or line
1. The following command prints the fifth column:
awk '{print $5}' file           # prints the fifth column of the file
2. Use the following syntax to print all lines in the range from line M to line N:
awk 'NR==M, NR==N' file
3. The command tac prints a file in reverse line order:
seq 5 | tac                     # outputs 5 4 3 2 1, one number per line
tac file                        # prints the file with its lines reversed
4. head, tail, and tac can be imitated with awk (the tail version below buffers the last 10 lines and also handles inputs shorter than 10 lines):
awk 'NR <= 10' file             # prints the first 10 lines of file, like head
awk '{buffer[NR % 10] = $0} END {for (i = NR - 9; i <= NR; i++) if (i > 0) print buffer[i % 10]}' file    # prints the last 10 lines of file, like tail
awk '{buffer[NR] = $0} END {for (i = NR; i > 0; i--) print buffer[i]}' file     # prints the file in reverse line order, like tac
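These one-liners are easy to verify against seq, whose output order is obvious:

seq 15 | awk 'NR <= 10'     # prints 1 through 10, like head
seq 15 | awk '{buffer[NR % 10] = $0} END {for (i = NR - 9; i <= NR; i++) if (i > 0) print buffer[i % 10]}'   # prints 6 through 15, like tail
seq 3 | awk '{buffer[NR] = $0} END {for (i = NR; i > 0; i--) print buffer[i]}'    # prints 3 2 1, like tac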