Common Linux text-processing tools and their commonly used commands

Source: Internet
Author: User

The tools below are introduced on the assumption that you know the basics of regular expressions. If you do not, brush up on them first.

Sed

Sed is a stream editor: a text editing tool that processes text one line at a time. By default, sed matches patterns using basic regular expressions.
The common command format is as follows:

sed option '/pattern/action' file

Pattern: a regular expression that selects the lines to operate on.

Action: the operation to perform. Common actions are:

    • p, print. Without -n, each matched line is printed twice and every unmatched line once.
    • d, delete the matched line.
    • s, substitute matched text. The common command format is as follows:
      sed option '/pattern/s/pattern1/pattern2/g' file
      On lines that match pattern, replace pattern1 with pattern2. With the g flag, every occurrence on the line is replaced; without it, only the first one.
    • n, read the next line into the pattern space, replacing its original contents.
    • N, append the next line of the file to the pattern space, keeping the original contents.

Option: options that control how the data is processed. Commonly used ones are:

    • -n, output only the lines that are explicitly printed (for example, matched lines with p); everything else is suppressed.
    • -i, write the changes back to the source file.
    • -e, supply multiple editing commands, so that several sed instructions can be combined on one command line.
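
The actions and options above can be tried out on a small file. This is a minimal sketch that assumes GNU sed; the sample file and its contents are made up for illustration.

```shell
# Hypothetical sample file; the contents are made up for illustration.
sample=$(mktemp)
printf 'hello world\nfoo bar\nhello foo foo\n' > "$sample"

# Without -n, p prints matching lines twice and all other lines once.
with_p=$(sed '/hello/p' "$sample")
# With -n, only the matching lines are printed.
only_match=$(sed -n '/hello/p' "$sample")
# d deletes every line that matches the pattern.
deleted=$(sed '/hello/d' "$sample")
# On lines matching "hello", replace every "foo" with "baz" (g = all occurrences).
replaced=$(sed '/hello/s/foo/baz/g' "$sample")
# -e chains several commands in one invocation.
chained=$(sed -e '1d' -e 's/foo/baz/' "$sample")

printf '%s\n---\n' "$with_p" "$only_match" "$deleted" "$replaced" "$chained"
rm -f "$sample"
```

None of these commands changes the sample file, because -i is not used.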

Addressing

Addressing decides which lines are edited. An address can be a line number, a regular expression, or a combination of both. If no address is specified, sed processes every line of the input file.
eg
sed -n '3p' file prints the third line of the file.
sed -n '100,200p' file prints lines 100 through 200.
Two addresses separated by a comma select a range: every line between the two addresses, including both end lines. Each end of the range can be a line number or a regular expression, in any combination.
sed '2,5d' file deletes lines 2 through 5.
sed '/start/,/end/d' file deletes everything from a line containing 'start' through the next line containing 'end'.
sed '/start/,10d' file deletes from the first line containing 'start' through line 10.
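
The address forms above can be checked against numbered input. A small sketch, assuming GNU sed, with the output of seq standing in for a real file:

```shell
# Ten numbered lines serve as hypothetical input.
third=$(seq 10 | sed -n '3p')            # a single line number
range=$(seq 10 | sed -n '4,6p')          # an inclusive range of line numbers
trimmed=$(seq 10 | sed '2,9d')           # delete lines 2 through 9
by_regex=$(seq 10 | sed '/^3$/,/^8$/d')  # from the line matching ^3$ through the line matching ^8$
mixed=$(seq 10 | sed '/^3$/,5d')         # from a regex match through line 5

printf '%s\n---\n' "$third" "$range" "$trimmed" "$by_regex" "$mixed"
```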

Pattern space

When sed operates on a file, it reads the file one line at a time into a special buffer called the pattern space. After a line is read, it is tested against the pattern; if it matches, the action is applied, and if not, the action is skipped. Once processing finishes, the next line is read. Because all of sed's work on a line happens inside the pattern space, the source file itself is not modified.

Hold space

We can think of the hold space as a warehouse, a staging area for data. But remember that data can only be processed in the pattern space, so it must first be moved there from the hold space.
The hold space is not used often, and only the instructions below work with it (d and D act on the pattern space alone, but they often appear in the same scripts).
g: Copy the contents of the hold space into the pattern space, overwriting the original contents of the pattern space.
G: Append the contents of the hold space to the pattern space, keeping the original contents.
h: Copy the contents of the pattern space into the hold space, overwriting the original contents of the hold space.
H: Append the contents of the pattern space to the hold space, keeping the original contents.
d: Delete the whole pattern space and read the next line into the pattern space.
D: Delete the first line of a multiline pattern space and restart the cycle on what remains, without reading in the next line.
x: Swap the contents of the hold space and the pattern space.

eg
① Add a blank line to the end of the file.

② Reverse the output of the file (simulating the tac command).

③ Append the matching lines to the end of the file.

sed -e '/hello/H' -e '$G' file   ### works like copy

sed -e '/hello/{H;d}' -e '$G' file   ### works like cut
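
The three hold-space tasks can be sketched as follows. The commands for ① and ② are one common way to do it and are not taken from the article; GNU sed is assumed, and the input file is made up.

```shell
# Hypothetical input; the contents are made up for illustration.
input=$(mktemp)
printf 'hello 1\nworld\nhello 2\nend\n' > "$input"

# (1) On the last line, G appends the still-empty hold space: one blank line is added.
line_count=$(sed '$G' "$input" | wc -l)

# (2) Reverse the file: each line gets the previously saved lines appended, then is saved.
reversed=$(sed '1!G;h;$!d' "$input")

# (3) Copy: collect matching lines with H, then append them after the last line.
copied=$(sed -e '/hello/H' -e '$G' "$input")

# (3) Cut: the same, but d also removes the matching lines from their original place.
moved=$(sed -e '/hello/{H;d}' -e '$G' "$input")

printf '%s\n---\n' "$line_count" "$reversed" "$copied" "$moved"
rm -f "$input"
```

Because H puts a newline in front of whatever it appends, even when the hold space is empty, the copied and moved lines are separated from the rest by one blank line.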

④ Convert rows to columns (join all lines into a single row)

sed -n 'H;${x;s/\n/ /g;p}' file

When sed reads a line into the pattern space, it strips the trailing \n, so a substitution on \n that only ever sees one line in the pattern space cannot work. The hold space, however, ends up containing two or more lines, because H puts a \n in front of every line it appends. So all the data is first collected in the hold space; on the last line the x instruction swaps the hold space with the pattern space, and only then is the substitution carried out.
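
A short sketch of this technique, assuming GNU sed and three made-up input lines. The extra s/^ // is an addition that removes the leading space left over from the first H:

```shell
# H collects every line in the hold space; on the last line ($) the spaces are
# swapped with x and the newlines are replaced with blanks.
joined=$(printf 'a\nb\nc\n' | sed -n 'H;${x;s/\n/ /g;s/^ //;p}')
echo "$joined"
```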

⑤ Sum the numbers from 1 to 100

seq 100 | sed -n 'H;${x;s/\n/+/g;s/^+//;p}' | bc   ### bc evaluates the resulting expression

s/^+// removes the extra plus sign at the start of the line by replacing it with nothing.
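
The same pipeline can be checked without bc. In this sketch (GNU sed assumed) shell arithmetic stands in for bc, an assumption made only so the example has no extra dependency:

```shell
# Build the text "1+2+...+100" with sed.
expression=$(seq 100 | sed -n 'H;${x;s/\n/+/g;s/^+//;p}')
# Shell arithmetic evaluates the expression here; piping it to bc gives the same number.
total=$(($expression))
echo "$total"
```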

⑥ Print the odd or even lines
The n command is used here to read the next line into the pattern space.

sed -n 'p;n' file   ### print the odd-numbered lines
sed -n 'n;p' file   ### print the even-numbered lines
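
Both commands can be verified on six numbered lines (a sketch assuming GNU sed):

```shell
# p prints the current line, then n loads the next one without printing it.
odd=$(seq 6 | sed -n 'p;n')
# n skips ahead to the next line first, then p prints it.
even=$(seq 6 | sed -n 'n;p')
printf '%s\n---\n' "$odd" "$even"
```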

Label

Define a label:

:a   ### a label is written as a colon followed by the label name; here the label name is a

Jump to a label: b followed by the label name

ba   ### jump to label a

Computing the sum of 1 to 100 again, this time with a label:

seq 100 | sed -n ':a;N;s/\n/+/g;$!ba;p' | bc   ### $!ba: jump to label a unless this is the last line

N appends the next line to the pattern space. On the first pass, 1 has been read into the pattern space; executing N appends the next line, so the pattern space becomes 1\n2, and the \n is then replaced with a + sign. This repeats until the last line.
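
The loop can be tried on a shorter range. A sketch assuming GNU sed, again with shell arithmetic standing in for bc:

```shell
# :a marks the loop start; N appends the next line; $!ba loops until the last line.
expression=$(seq 10 | sed -n ':a;N;s/\n/+/g;$!ba;p')
total=$(($expression))
printf '%s = %s\n' "$expression" "$total"
```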

Awk

Awk is both a text analysis tool and a scripting language. As a text analysis tool it is considerably more powerful than grep or sed, while its usage is similar to sed's. As a scripting language, its syntax resembles C, with the same branch and loop structures; it is a C-like language.

Compared with sed, the strength of awk is that it can process text by row and also by column. The default row (record) delimiter for awk is the line break \n, and the default column (field) delimiter is any run of spaces or tabs.

In addition to spaces and tabs, you can define your own delimiter, for example a colon.

When working by column, $0 represents the contents of the entire row, $1 represents the first column, and $n represents the nth column.
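
A brief sketch of the default field splitting, using a made-up input line that mixes several spaces and a tab:

```shell
# Runs of spaces and tabs both count as a single field delimiter.
second=$(printf 'alpha   beta\tgamma\n' | awk '{print $2}')
# NF holds the number of fields in the current row.
count=$(printf 'alpha   beta\tgamma\n' | awk '{print NF}')
printf '%s\n' "$second" "$count"
```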

Format of the awk command line:

awk option '/pattern/{action}' file
awk option -f scriptfile file   ### use -f to specify a script file

Pattern is a regular expression that matches the rows to operate on. Action is the operation to perform.

One option deserves mention here: -F, which specifies the input field delimiter. When a file uses a delimiter of our own choosing, awk does not recognize it by default, and we need -F to tell awk which delimiter to use. Suppose the file uses a colon ':' as its delimiter and we want to print the second column:

awk '{print $2;}' file   ### fails: awk does not recognize the delimiter

awk -F: '{print $2;}' file   ### succeeds: the delimiter is set to ':'
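
The difference is easy to see on a single colon-delimited line (the line itself is only sample data):

```shell
line='productA:123:1'
# Without -F the whole line is one field, so $2 is empty.
no_sep=$(printf '%s\n' "$line" | awk '{print $2}')
# With -F: the line splits into three fields.
with_sep=$(printf '%s\n' "$line" | awk -F: '{print $2}')
printf '[%s] [%s]\n' "$no_sep" "$with_sep"
```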

Regular expressions

Use a regular expression to match rows:

① Find the row that contains productC:

② Find the rows that end with the number 2 (that is, the third column is 2):
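
One possible way to write these two searches is sketched below. The data file is hypothetical, and the commands are a reconstruction, not the article's own.

```shell
# Hypothetical colon-delimited data; the values are made up.
data=$(mktemp)
printf 'productA:123:1\nproductB:20:2\nproductC:7:3\n' > "$data"

# (1) A bare /pattern/ prints every row that matches it.
match_c=$(awk '/productC/' "$data")
# (2) 2$ anchors the match at the end of the row, which here is the third column.
ends_with_2=$(awk '/2$/ {print $0}' "$data")

printf '%s\n' "$match_c" "$ends_with_2"
rm -f "$data"
```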

Matching a regular expression against a specific field: ~ and !~

Use ~ to apply a regular expression to one particular field (column). !~ is used the same way but selects the fields that do not match. Both can be used inside an if statement.
① Find the rows whose second column starts with 1.

awk '{if($2 ~ /^1/){print $0;}}' file

② Find the rows whose second column does not start with 1.

awk '{if ($2 !~ /^1/){print $0}}' file
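
A sketch of both operators on hypothetical colon-delimited data. Note the -F: option, which this made-up file needs so that $2 is the middle column:

```shell
# Hypothetical data; the values are made up.
data=$(mktemp)
printf 'productA:123:1\nproductB:45:2\nproductC:150:3\n' > "$data"

# Rows whose second column starts with 1.
starts_with_1=$(awk -F: '{ if ($2 ~ /^1/) print $0 }' "$data")
# Rows whose second column does not start with 1.
others=$(awk -F: '{ if ($2 !~ /^1/) print $0 }' "$data")

printf '%s\n---\n' "$starts_with_1" "$others"
rm -f "$data"
```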

Condition matching

In addition to using regular expressions for row matching, you can also match on a condition. The command format is:

awk option 'condition{action}' file

For example, mark the rows whose second column is less than 100 with NO, and all other rows with YES.

awk -F: '$2<100{print $0,"NO";}$2>=100{print $0,"YES";}' file

Note the placement of the {} braces. The comma ',' in print separates the output fields and is printed as a space.
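
The condition form can be run on a few made-up rows like this:

```shell
# Each condition{action} pair is tested against every row.
marked=$(printf 'productA:123:1\nproductB:45:2\nproductC:150:3\n' |
  awk -F: '$2 < 100 {print $0, "NO"} $2 >= 100 {print $0, "YES"}')
echo "$marked"
```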

BEGIN and END

To understand BEGIN and END, first understand that awk runs in three phases: before text processing, during text processing, and after text processing.
BEGIN is the action performed before any text is processed, and END is the action performed after all text has been processed.

Eg: count the number of rows using BEGIN and END.

awk -F: 'BEGIN{x=0}{print $0;x++}END{print "total:",x}' file

BEGIN and END can each be used on its own. For example, END alone is enough to output the line count.
As mentioned above, awk is a C-like language, but it is weakly typed: variables do not need to be declared before use, and a variable such as x that is never initialized starts out as 0.
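
Both variants can be sketched on three made-up input lines. The END-only command is one way to write what the text describes:

```shell
# BEGIN initializes the counter, the main block counts, END reports.
with_begin=$(printf 'a\nb\nc\n' | awk 'BEGIN{x=0} {x++} END{print "total:", x}')
# END alone: x was never initialized, so it starts at 0.
end_only=$(printf 'a\nb\nc\n' | awk '{x++} END{print x}')
printf '%s\n' "$with_begin" "$end_only"
```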

awk Script

Besides the command-line usage described above, awk can be used to write scripts, much like shell scripts. Because awk is a scripting language, it has its own command interpreter, /bin/awk, which a script names in its first line as /bin/awk -f.

test.awk:

#!/bin/awk -f
BEGIN {
    count1 = 0;    # note the format used to define variables
    count2 = 0;
    count3 = 0;
    total = 0;
}
{
    print $0;
    if ($2 < 100) {
        count1++;
    } else if ($2 >= 100 && $2 < 200) {
        count2++;
    } else if ($2 >= 200) {
        count3++;
    }
    total++;
}
END {
    printf("<100: %d\n", count1);    # a C-like language, so printf can be used directly
    printf(">=100 && <200: %d\n", count2);
    printf(">=200: %d\n", count3);
    printf("total: %d\n", total);
}

An awk script file is called in the following format:

awk -f awkfile file

For example:

awk -F: -f test.awk file

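awk built-in variables

ARGC      number of command-line arguments
ENVIRON   array that gives access to the system environment variables
FILENAME  name of the file awk is currently reading
FNR       number of records (lines) read so far in the current file
FS        input field separator, equivalent to the -F command-line option
NF        number of fields in the current record
NR        number of records read so far
OFS       output field separator
ORS       output record separator
RS        input record separator

printf and print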

Awk is a C-like language, so printf can be used in a script or on the command line. Using printf often makes the output format neater.
eg

awk -F: '{printf("filename:%s count:%d data:%s\n",FILENAME,FNR,$0)}' file

Results:

[lzk@localhost ~]$ awk -F: '{printf("filename:%s count:%d data:%s\n",FILENAME,FNR,$0)}' file
filename:file count:1 data:productA:123:1
filename:file count:2 data:productB:A:2
filename:file count:3 data:productC:at:3
filename:file count:4 data:productD:3:4
filename:file count:5 data:productE:223:5
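
To show how printf can align columns, here is a small sketch with width specifiers. The two input rows and the chosen widths are made up; NR and NF are the built-in variables listed earlier.

```shell
# %-10s left-aligns the name in 10 columns; %5d right-aligns the number in 5.
report=$(printf 'productA:123:1\nproductB:45:2\n' |
  awk -F: '{ printf("%d %-10s %5d fields=%d\n", NR, $1, $2, NF) }')
echo "$report"
```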

Exercise: compute the total number of bytes of the regular files in a directory

ls -l | grep '^-' | awk '{print $9,$5;total+=$5}END{print total}'

In addition, the find command can locate files in a given size range, but it cannot sum their sizes by itself.

find . -size +100 -a -size -1000 -exec ls -l {} \;
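
The ls pipeline from the exercise can be tried in a throwaway directory. This sketch assumes the usual Linux ls -l layout, in which the fifth column is the size in bytes; the file names and sizes are made up.

```shell
# Hypothetical directory with two regular files and one subdirectory.
dir=$(mktemp -d)
printf 'abc' > "$dir/a.txt"        # 3 bytes
printf '1234567' > "$dir/b.txt"    # 7 bytes
mkdir "$dir/sub"                   # not a regular file, so grep '^-' drops it

# Keep only the lines for regular files, then add up the size column.
total=$(ls -l "$dir" | grep '^-' | awk '{total += $5} END {print total}')
echo "$total"
rm -rf "$dir"
```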

Cut

The function of cut is to 'cut': it processes text line by line and extracts part of each line. The command format is as follows:

cut option file

There are three main options:
-b: cut by byte.
-c: cut by character.
The difference between -b and -c is that -b cannot cut Chinese text correctly, because one Chinese character occupies several bytes, while -c counts a Chinese character as a single character and so can cut it (provided the cut implementation in use supports multibyte characters). For English (single-byte) characters, the two options behave identically.

-f: cut by field. Used together with -d: -d specifies the delimiter and -f specifies the fields.
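
A quick sketch of the three options on one made-up, colon-delimited line of single-byte text:

```shell
line='root:x:0:0:root:/root:/bin/bash'
first_bytes=$(printf '%s\n' "$line" | cut -b 1-4)    # bytes 1 through 4
two_chars=$(printf '%s\n' "$line" | cut -c 1,6)      # characters 1 and 6
fields=$(printf '%s\n' "$line" | cut -d: -f 1,7)     # fields 1 and 7, split on ':'
printf '%s\n' "$first_bytes" "$two_chars" "$fields"
```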

Sort

The function of sort is to sort the lines of the specified file according to certain rules. The format is:

sort option file

1. By default, sort orders lines by the ASCII codes of their characters.
2. -u: sort in ascending ASCII order and remove duplicate lines.
3. -r: sort in reverse order.
4. sort file -o file: sort and write the result back to the source file.
5. -n: sort by numeric value.
6. Sort by a specified column: -t specifies the delimiter and -k specifies the column.
7. -f: treat lowercase letters as uppercase when comparing, that is, ignore case.
8. -c: check whether the file is already sorted; if it is not, print information about the first out-of-order line and return 1.
9. -C: check whether the file is already sorted; if it is not, print nothing and just return 1.
10. -M: sort by month.
11. -b: ignore the blanks at the start of each line and begin comparing at the first visible character.

eg

[lzk@localhost ~]$ cat file
… 110 5000
… 100 5000
… 50 3000
… 100 4500

1. Sort by the number of employees in the second column.

[lzk@localhost ~]$ sort -n -t ' ' -k 2 file
… 50 3000
… 100 5000
… 100 4500
… 110 5000

2. Sort by the number of employees; when the numbers are equal, sort by the salary in the third column.

[lzk@localhost ~]$ sort -n -t ' ' -k 2 -k 3 file
… 50 3000
… 100 4500
… 100 5000
… 110 5000

3. Sort starting from the second letter of the company name (that is, from the 2nd character of the first field; because no end position is given, the key runs to the end of the line).

[lzk@localhost ~]$ sort -t ' ' -k 1.2 file
… 100 5000
… 100 4500
… 110 5000
… 50 3000

4. Sort by only the second letter of the company name; if the letters are the same, sort by the number of employees.
Because only the second letter of the first column is used for sorting, the key is written as 1.2,1.2. Likewise, 2,2 means that only the 2nd field is used; if just 2 is written, the key runs from the 2nd field to the last field.

[lzk@localhost ~]$ sort -t ' ' -k 1.2,1.2 -k 2,2 file
… 100 5000
… 100 4500
… 110 5000
… 50 3000
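
The four sorts can be reproduced on a small data set. In this sketch the company names are invented for illustration, LC_ALL=C is added so the comparison does not depend on the locale, and the last command uses 2,2n (numeric) where the listing above uses 2,2.

```shell
# Hypothetical data: company name, number of employees, salary.
# The names are invented for illustration.
file=$(mktemp)
printf 'google 110 5000\nbaidu 100 5000\nguge 50 3000\nsohu 100 4500\n' > "$file"

# 1. Numeric sort on the second column.
by_staff=$(LC_ALL=C sort -n -t ' ' -k 2 "$file")
# 2. Ties on the second column are broken by the third column.
by_staff_then_pay=$(LC_ALL=C sort -n -t ' ' -k 2 -k 3 "$file")
# 3. Key starts at the 2nd character of field 1.
from_second_letter=$(LC_ALL=C sort -t ' ' -k 1.2 "$file")
# 4. Key is only the 2nd character of field 1, then field 2 compared numerically.
second_letter_then_staff=$(LC_ALL=C sort -t ' ' -k 1.2,1.2 -k 2,2n "$file")

printf '%s\n---\n' "$by_staff" "$by_staff_then_pay" \
  "$from_second_letter" "$second_letter_then_staff"
rm -f "$file"
```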

Uniq

This command reads the input file and compares adjacent lines. Normally, the second and any further copies of a repeated line are removed; the comparison follows the collating sequence of the character set in use. The result is written to the output file. The input file and the output file must be different. If the input file is given as '-', input is read from standard input.
The common options are as follows:
-c: collapse consecutive duplicate lines and prefix each line with the number of times it occurred. Because the counts show which lines are repeated, it can stand in for the -u or -d option.

-d: display only the duplicated lines.

-u: display only the lines that are not duplicated.
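
Because uniq only compares adjacent lines, it is usually fed sorted input. A small sketch on made-up data:

```shell
# Sort first so that equal lines become adjacent: a a b b b c
sorted=$(printf 'b\na\nb\nc\na\nb\n' | sort)

plain=$(printf '%s\n' "$sorted" | uniq)       # one copy of each line
counts=$(printf '%s\n' "$sorted" | uniq -c)   # each line prefixed with its count
dups=$(printf '%s\n' "$sorted" | uniq -d)     # only lines that were repeated
singles=$(printf '%s\n' "$sorted" | uniq -u)  # only lines that were not repeated

printf '%s\n---\n' "$plain" "$counts" "$dups" "$singles"
```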
