The tools below all rely on basic regular expressions. If you are not yet comfortable with regular expressions, review them on your own first.
Sed
Sed is a stream editor, a text editing tool that processes text one line at a time. By default, sed matches with basic regular expressions.
The common command format is as follows:
sed option '/pattern/action' file
Pattern: a regular expression that selects the lines to operate on.
Action: the operation to perform. Common actions are:
- p: print the matched line. Without the -n option, matched lines appear twice (once from p and once from the default output) and unmatched lines appear once.
- d: delete the matched line.
- s: substitute matched text. The common command format is as follows:
sed option '/pattern/s/pattern1/pattern2/g' file
On lines that match pattern, replace pattern1 with pattern2. With the g flag, every occurrence on the line is replaced; without it, only the first occurrence is replaced.
- n: read the next line into the pattern space, replacing its original contents.
- N: append the next line of input to the pattern space, keeping the original contents.
Option: options that control how the data is processed. Commonly used ones are:
- -n: output only the lines that are explicitly printed (for example with p); other lines are not output.
- -i: write the changes back to the source file.
- -e: specify multiple editing commands, so that several sed instructions can be given on the same command line.
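The actions and options above can be tried out with a minimal sketch like the one below. The sample text and the variable names are made up for illustration; they are not from the original article.

```shell
# Hypothetical sample input, created only for this sketch.
tmp=$(mktemp)
printf 'hello world\nfoo bar\nhello foo foo\n' > "$tmp"

# p with -n: print only the lines that match /hello/.
matched=$(sed -n '/hello/p' "$tmp")

# d: delete the lines that match /hello/ and print the rest.
remaining=$(sed '/hello/d' "$tmp")

# s with g: on lines matching /hello/, replace every "foo" with "baz".
replaced_all=$(sed '/hello/s/foo/baz/g' "$tmp")

# s without g: only the first "foo" on each matching line is replaced.
replaced_first=$(sed '/hello/s/foo/baz/' "$tmp")

# -e: give two editing commands in one invocation.
chained=$(sed -e 's/hello/HELLO/' -e '/bar/d' "$tmp")

printf '%s\n' "$matched" "---" "$remaining" "---" "$replaced_all"
rm -f "$tmp"
```

The -i option is left out on purpose, since it rewrites the file it is given.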
Addressing
Addressing decides which lines are edited. An address can be a line number, a regular expression, or a combination of both. If no address is specified, sed processes every line of the input file.
eg
sed -n '3p' file prints the third line of the file.
sed -n '100,200p' file prints lines 100 through 200.
Two addresses separated by a comma select a range: every line between the two addresses, including both end lines. A range can be given with line numbers, regular expressions, or a combination of the two.
sed '2,5d' file deletes lines 2 through 5.
sed '/start/,/end/d' file deletes from a line containing 'start' through the next line containing 'end'.
sed '/start/,10d' file deletes from the first line containing 'start' through line 10.
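The addressing forms above can be checked with a small sketch. The ten-line input is hypothetical: it is generated with seq, and the words "start" and "end" are added to lines 4 and 7 only so that the regular-expression addresses have something to match.

```shell
# Hypothetical input: the numbers 1..10, with "start" on line 4 and "end" on line 7.
input=$(seq 10 | sed -e '4s/$/ start/' -e '7s/$/ end/')

# A single line number.
third=$(printf '%s\n' "$input" | sed -n '3p')

# A numeric range, both ends included.
range=$(printf '%s\n' "$input" | sed -n '2,4p')

# A range given by two regular expressions: deletes lines 4 through 7.
no_block=$(printf '%s\n' "$input" | sed '/start/,/end/d')

# A regular expression combined with a line number: deletes lines 4 through 9.
mixed=$(printf '%s\n' "$input" | sed '/start/,9d')

printf '%s\n' "$no_block"
```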
Pattern space
When sed operates on a file, it reads the file one line at a time into a special buffer called the pattern space. It then checks the line against the address or regular expression: if the line matches, the action is performed; if not, the action is skipped. The next line is read as soon as processing finishes. All of sed's editing therefore happens in the pattern space, and the source file is not modified.
Hold space
We can think of the hold space as a warehouse, a staging area for data. Remember, though, that data can only be processed in the pattern space, so it must first be loaded from the hold space into the pattern space.
The hold space is not used often; only the following instructions involve it.
g: Copy the contents of the hold space into the pattern space, replacing the original contents of the pattern space.
G: Append the contents of the hold space to the pattern space without erasing the original contents.
h: Copy the contents of the pattern space to the hold space, replacing the original contents of the hold space.
H: Append the contents of the pattern space to the hold space without erasing the original contents.
d: Delete everything in the pattern space and read the next line into the pattern space.
D: Delete the first line of a multiline pattern space, without reading the next line.
x: Swap the contents of the hold space and the pattern space.
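A minimal sketch of h, G and x, assuming the hold space starts out empty (as it does at the beginning of a sed run). The input lines are hypothetical.

```shell
# h on line 1 saves "one" in the hold space; G on line 2 appends it after "two".
dup=$(printf 'one\ntwo\n' | sed '1h;2G')

# x on every line swaps the current line with whatever was held before,
# so each line comes out one cycle late and the first output line is empty.
shifted=$(printf 'one\ntwo\nthree\n' | sed 'x')

printf '%s\n' "$dup" "---" "$shifted"
```

Because x only swaps, the last input line ("three") is still sitting in the hold space when sed exits and is never printed.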
eg
① Add a blank line to the end of the file
② Reverse the file output (simulate the tac command)
③ Append the matching lines to the end of the file.
sed -e '/hello/H' -e '$G' file    ### similar to a copy operation
sed -e '/hello/{H;d}' -e '$G' file    ### similar to a cut operation
④ Row and column conversion (join all lines into one)
sed -n 'H;${x;s/\n/ /g;p}' file
When a line is read into the pattern space, its trailing \n is removed, so replacing \n in the pattern space alone does nothing. Once the hold space contains two or more lines appended with H, those lines are separated by \n. So we first collect every line in the hold space, then on the last line execute x to swap the hold space into the pattern space, and do the replacement there.
⑤ Sum the numbers from 1 to 100
seq 100 | sed -n 'H;${x;s/\n/+/g;s/^+//;p}' | bc    ### bc evaluates the resulting expression
s/^+// removes the extra leading plus sign by replacing it with nothing.
⑥ Print the odd or even lines
The n command is used here to read the next line into the pattern space.
sed -n 'p;n' file    ### print the odd lines
sed -n 'n;p' file    ### print the even lines
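Examples ① and ② above are listed without a command. One common way to write them is sketched below; these are widely used sed idioms, not necessarily the commands the author had in mind, and the three-line input is hypothetical.

```shell
# ① On the last line ($), G appends the still-empty hold space,
#    which adds one empty line at the end of the output.
line_count=$(printf 'a\nb\nc\n' | sed '$G' | wc -l | tr -d ' ')

# ② Reverse the input: 1!G appends the text held so far to every line
#    except the first, h saves the result, and $p prints it at the end.
reversed=$(printf 'a\nb\nc\n' | sed -n '1!G;h;$p')

printf '%s\n' "$reversed"
```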
Label
Define a label:
:a    ### a label is a colon followed by the label name; here the label name is a
Jump to a label: b followed by the label name
ba    ### jump to label a
Compute the sum of 1 to 100 again:
seq 100 | sed -n ':a;N;s/\n/+/g;$!ba;p' | bc    ### $!ba means: on every line except the last, jump to label a
N appends the next line to the pattern space. On the first pass, 1 is read into the pattern space; N then appends the next line, so the pattern space becomes 1\n2, and the \n is replaced with a + sign. This repeats until the last line.
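The label loop can be checked with the sketch below. It makes two substitutions that are not in the article: each command is passed with its own -e (so it does not depend on how a given sed parses a label followed by a semicolon), and shell arithmetic is used instead of bc to evaluate the result.

```shell
# Build "1+2+...+100" with the :a / N / $!ba loop.
expr_str=$(seq 100 | sed -n -e ':a' -e 'N' -e 's/\n/+/' -e '$!ba' -e 'p')

# Evaluate the expression with shell arithmetic (bc would work as well).
total=$(( $expr_str ))

echo "$total"
```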
Awk
Awk is both a text analysis tool and a scripting language. As a text analysis tool, it is much more powerful than grep or sed, although its usage is similar to sed. As a scripting language, its syntax resembles C, with the same branch and loop structures, so it can be called a C-like language.
Compared with sed, the strength of awk is that it can process text not only line by line but also column by column. The default line delimiter for awk is the newline \n, and the default column delimiter is a run of spaces or tabs.
In addition to spaces and tabs, you can specify a custom delimiter, such as a colon.
When working by column, $0 represents the contents of the entire line, $1 represents the first column, and so on; $n represents the nth column.
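A minimal sketch of the column variables, using one whitespace-separated line made up for illustration and one colon-separated line in the style of the article's sample data.

```shell
# Default delimiter: any run of spaces or tabs separates columns.
second=$(printf 'alpha   beta\tgamma\n' | awk '{print $2}')
count=$(printf 'alpha   beta\tgamma\n' | awk '{print NF}')

# Custom delimiter: -F: splits the line on colons.
first=$(printf 'producta:123:1\n' | awk -F: '{print $1}')
whole=$(printf 'producta:123:1\n' | awk -F: '{print $0}')

echo "$second $count $first $whole"
```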
Format of the awk command line:
awk option '/pattern/{action}' file
awk option -f scriptfile file    ### use -f to specify a script file
Pattern is a regular expression that matches the lines to operate on. Action is the operation to perform.
A note on the -F option: -F specifies the input field delimiter. When a file uses a delimiter of our own choosing, awk does not recognize it by default, so we need -F to tell awk which delimiter to use. Suppose the file uses a colon ':' as its delimiter and we want to print the contents of the second column:
awk '{print $2;}' file    ### fails: awk does not recognize the delimiter
awk -F: '{print $2;}' file    ### succeeds: the delimiter is set to :
Regular expressions
Use regular expressions to match lines:
① Find the line that contains productc:
② Find the line whose number is 2 (the third column, so the line ends with 2):
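The two searches above are listed without a command. A minimal sketch follows; the data file is hypothetical (colon-separated, in the style of the article's sample, with made-up values in the second column), and the patterns are one possible way to write the matches.

```shell
# Hypothetical data file: name:value:number
data=$(mktemp)
printf 'producta:123:1\nproductb:45:2\nproductc:200:3\n' > "$data"

# ① Lines that contain "productc".
by_name=$(awk '/productc/{print $0}' "$data")

# ② Lines that end with 2, that is, whose last column is 2.
by_number=$(awk '/2$/{print $0}' "$data")

echo "$by_name"
echo "$by_number"
rm -f "$data"
```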
Matching a regular expression against a specific field: ~ and !~
You can use ~ to match a regular expression against a specific field (column); !~ is the negated form and is used the same way. Both can be used with the if statement.
① Find the lines whose second column starts with 1.
awk '{if($2 ~ /^1/){print $0;}}' file
② Find the lines whose second column does not start with 1.
awk '{if ($2 !~ /^1/){print $0}}' file
Condition matching
In addition to using regular expressions to match lines, you can also match on a condition, with the following command format:
awk option 'condition{action}' file
For example, mark the lines whose second column is less than 100 with NO, and mark the others with YES.
awk -F: '$2<100{print $0,"NO";}$2>=100{print $0,"YES";}' file
Note the {} notation. The comma ',' in print separates the output fields and is converted to a space on output.
BEGIN and END
To understand BEGIN and END, first understand the three phases that awk goes through: before text processing, text processing, and after text processing.
BEGIN is the action performed before the text is processed, and END is the action performed after the text is processed.
Eg: count the number of lines using BEGIN and END.
awk -F: 'BEGIN{x=0}{print $0;x++}END{print "total:",x}' file
BEGIN and END can each be used on their own. For example, END alone can output the line count.
As mentioned above, awk is a C-like, weakly typed language: variables do not need to be declared and can be used directly. A variable such as x that has never been assigned has a default initial value of 0.
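A minimal sketch of END used on its own, relying on the default initial value of an unassigned variable. The three-line input is hypothetical.

```shell
# x is never initialized; awk treats it as 0 the first time x++ runs.
lines=$(printf 'a\nb\nc\n' | awk '{x++} END{print x}')

# The built-in NR holds the same count once all input has been read.
lines_nr=$(printf 'a\nb\nc\n' | awk 'END{print NR}')

# An unassigned variable used as a number evaluates to 0.
zero=$(awk 'BEGIN{print y+0}')

echo "$lines $lines_nr $zero"
```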
awk Script
Besides the command-line usage described above, awk can also be used to write scripts, in the same way as shell scripts. Because awk is a scripting language, it has its own command interpreter, /bin/awk; the first line of a script is #!/bin/awk -f.
test.awk:
#!/bin/awk -f
BEGIN {
    count1 = 0;    # note the format used to define variables
    count2 = 0;
    count3 = 0;
    total = 0;
}
{
    print $0;
    if ($2 < 100) {
        count1++;
    } else if ($2 >= 100 && $2 < 200) {
        count2++;
    } else if ($2 >= 200) {
        count3++;
    }
    total++;
}
END {
    printf("<100: %d\n", count1);    # C-like language: printf can be used directly
    printf(">=100 && <200: %d\n", count2);
    printf(">=200: %d\n", count3);
    printf("Total: %d\n", total);
}
The awk script file is invoked in the following format:
awk option -f awkfile file
For example:
awk -F: -f test.awk file
awk built-in variables
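ARGC        number of command-line arguments
ENVIRON     array that gives access to the system environment variables
FILENAME    name of the file awk is currently reading
FNR         number of records (lines) read so far in the current file
FS          input field delimiter, equivalent to the -F command-line option
NF          number of fields in the current record
NR          number of records read so far
OFS         output field delimiter
ORS         output record delimiter
RS          input record delimiter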
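A few of these variables in action, as a minimal sketch. The input lines and the GREETING environment variable are made up for illustration.

```shell
# FS splits input on ":", OFS joins the printed fields with "-",
# NR is the record number and NF the number of fields in that record.
report=$(printf 'a:b:c\nd:e\n' | awk 'BEGIN{FS=":"; OFS="-"} {print NR, NF, $1}')

# ORS replaces the newline that print normally writes after each record.
joined=$(printf 'a\nb\n' | awk 'BEGIN{ORS=";"} {print}')

# ENVIRON is indexed by environment variable name.
greeting=$(GREETING=hi awk 'BEGIN{print ENVIRON["GREETING"]}')

echo "$report"
echo "$joined $greeting"
```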
printf and print
Awk is a C-like language, so printf can be used in a script or on the command line. Using printf often makes the output format neater.
eg
awk -F: '{printf("filename:%s count:%d data:%s\n",FILENAME,FNR,$0)}' file
Results:
[lzk@localhost ~]$ awk -F: '{printf("filename:%s count:%d data:%s\n",FILENAME,FNR,$0)}' file
filename:file count:1 data:producta:123:1
filename:file count:2 data:productb:…:2
filename:file count:3 data:productc:…:3
filename:file count:4 data:productd:3:4
filename:file count:5 data:producte:223:5
Exercise: count the total number of bytes in the regular files under a directory
ls -l | grep '^-' | awk '{print $9,$5;total+=$5}END{print total}'
In addition, you can use the find command to find files in a given size range, but find cannot sum their sizes.
find . -size +100 -a -size -1000 -exec ls -l {} \;
Cut
As its name says, cut cuts out parts of the text, processing it line by line. The command format is as follows:
cut option file
There are three main options:
-b: cut by byte.
-c: cut by character.
The difference between -b and -c is that -b counts bytes and so cannot cut Chinese text correctly, while -c counts characters, and one Chinese character is one character, so -c can cut Chinese text. For English characters the two options behave the same.
-f: cut by field. It is used together with -d: -d specifies the delimiter and -f specifies the field.
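A minimal sketch of the three options on a single ASCII line in the style of the article's sample data. With ASCII input, -b and -c select the same positions.

```shell
line='producta:123:1'

# -d sets the delimiter, -f picks the field(s).
second_field=$(printf '%s\n' "$line" | cut -d: -f2)
fields_1_and_3=$(printf '%s\n' "$line" | cut -d: -f1,3)

# -b cuts by byte position, -c by character position.
first_three_bytes=$(printf '%s\n' "$line" | cut -b1-3)
chars_4_to_7=$(printf '%s\n' "$line" | cut -c4-7)

echo "$second_field $fields_1_and_3 $first_three_bytes $chars_4_to_7"
```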
Sort
The function of sort is to sort the specified file according to given rules. The format is:
sort option file
1. By default, sort orders lines by the ASCII code values of their characters.
2. -u: sort in ascending order by ASCII code and remove duplicate lines.
3. -r: sort in reverse order.
4. sort file -o file: sort and write the result back to the source file.
5. -n: sort by numeric value.
6. Sort by a specified column: -t specifies the delimiter, -k specifies the column number.
7. -f: convert lowercase letters to uppercase for comparison, that is, ignore case.
8. -c: check whether the file is already sorted; if it is not, output information about the first out-of-order line and return 1.
9. -C: check whether the file is sorted; if it is not, output nothing and only return 1.
10. -M: sort by month.
11. -b: ignore the blanks at the start of each line and begin comparing at the first visible character.
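Several of these options can be compared side by side in a minimal sketch. The input values are hypothetical, and LC_ALL=C is set here as an assumption so that the default ordering is plain byte order regardless of the system locale.

```shell
# Assumption for this sketch: force byte-order comparison.
export LC_ALL=C

nums=$(printf '10\n9\n100\n9\n')

# Default: compared as strings, so "100" sorts before "9".
default_order=$(printf '%s\n' "$nums" | sort)

# -n: compared as numbers.
numeric=$(printf '%s\n' "$nums" | sort -n)

# -n -u -r: numeric, duplicates removed, largest first.
unique_rev=$(printf '%s\n' "$nums" | sort -n -u -r)

# -t and -k: sort colon-separated lines numerically on the second column.
by_field=$(printf 'b:20\na:100\nc:3\n' | sort -t: -k2,2n)

# -c: a non-zero exit status means the input is not sorted.
if printf '2\n1\n' | sort -c 2>/dev/null; then sorted=yes; else sorted=no; fi

printf '%s\n' "$numeric" "---" "$by_field" "---" "$sorted"
```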
eg
The sample file has three space-separated columns: company name, number of employees, and wage.
[lzk@localhost ~]$ cat file
… 110 5000
… 100 5000
… 50 3000
… 100 4500
1. Sort by the number of employees in the second column.
[lzk@localhost ~]$ sort -n -t ' ' -k 2 file
… 50 3000
… 100 5000
… 100 4500
… 110 5000
2. Sort by the number of employees; when the numbers are equal, sort by the wage in the third column.
[lzk@localhost ~]$ sort -n -t ' ' -k 2 -k 3 file
… 50 3000
… 100 4500
… 100 5000
… 110 5000
3. Sort by the company name starting from its second letter (that is, from the 2nd character of the first field to the end of that field).
[lzk@localhost ~]$ sort -t ' ' -k 1.2 file
… 100 5000
… 100 4500
… 110 5000
… 50 3000
4. Sort by the second letter of the company name only; if it is the same, sort by the number of employees.
Because only the second letter of the first column is compared, the key is written as 1.2,1.2. Likewise, 2,2 means that only the 2nd field is compared; if only a single 2 is written, the key runs from the 2nd field to the last field.
[lzk@localhost ~]$ sort -t ' ' -k 1.2,1.2 -k 2,2 file
… 100 5000
… 100 4500
… 110 5000
… 50 3000
Uniq
This command reads the input file and compares adjacent lines. Normally, the second and any further copies of a repeated line are removed; lines are compared according to the collating sequence of the character set in use. The result is written to the output file. The input file and the output file must be different. If the input file is given as '-', input is read from standard input.
The common options are as follows:
-c: remove consecutive duplicate lines and put, at the beginning of each line, the number of times that line occurred. It can take the place of the -u or -d option.
-d: display only the duplicated lines.
-u: display only the lines that are not repeated in the file.
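A minimal sketch of these options on hypothetical input in which the duplicates are already adjacent. When they are not, the input is usually passed through sort first.

```shell
input=$(printf 'a\na\nb\nc\nc\nc\n')

# No option: adjacent duplicates collapse to one line.
deduped=$(printf '%s\n' "$input" | uniq)

# -d: only the lines that were repeated.
dups=$(printf '%s\n' "$input" | uniq -d)

# -u: only the lines that were not repeated.
singles=$(printf '%s\n' "$input" | uniq -u)

# -c: prefix each line with its count; awk only normalizes the padding here.
counts=$(printf '%s\n' "$input" | uniq -c | awk '{print $1 ":" $2}')

printf '%s\n' "$counts"
```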