Shell knowledge supplement (3): changing the locale, regular expression special characters, printf, the sed tool, the awk tool, diff and cmp

Last Update:2018-12-05 Source: Internet

Author: User

Tags print format egrep


1. Changing the language (locale) setting:

[root@test root]# LANG=en    (specify another locale as needed, such as C)

[root@test root]# export LANG

Deleting every occurrence of a character in vi

Press ESC to return to command mode.

Then enter a substitute command. For example, to delete every | symbol in the file:

:%s/|//g

Other characters are handled the same way.

2. Special characters in regular expressions (RE)

RE character, meaning and example

^word
The string to search for (word) is at the beginning of the line. Example: grep -n '^#' regular_express.txt finds the lines that begin with #.

word$
The string to search for (word) is at the end of the line. Example: grep -n '!$' regular_express.txt prints the lines that end with !.

. (dot)
Matches exactly one character, and that character must be present. Example: grep -n 'e.e' regular_express.txt
The matched string can be (eve), (eae), (eee) or (e e), but not (ee). There must be exactly one character between the two e's, and a space also counts as a character.

\
Escape character: removes the special meaning of the special symbol that follows it.
Example: grep -n \' regular_express.txt finds the lines that contain a single quotation mark.

*
Repeats the previous RE character zero or more times.
Example: grep -n 'ess*' regular_express.txt finds strings containing (es), (ess), (esss) and so on. Because * can stand for zero repetitions, es also matches. In addition, because * repeats the previous RE character, an RE character must come immediately before it. For example, "any run of characters" is written .*

\{n,m\}
Matches n to m consecutive occurrences of the previous RE character. \{n\} means exactly n consecutive occurrences, and \{n,\} means n or more.
Example: grep -n 'go\{2,3\}g' regular_express.txt finds strings with two or three o's between g and g, that is, (goog) and (gooog).

[]
Character set: the bracket expression is itself an RE special character.
[list] Example: grep -n 'g[ld]' regular_express.txt finds the lines containing (gl) or (gd). Note that a [] always stands for exactly one character to match. For example, a[afl]y matches aay, afy or aly, because [afl] means a, f or l.
[ch1-ch2] Example: grep -n '[0-9]' regular_express.txt finds the lines containing any digit. Pay special attention to the minus sign - inside []: it stands for all consecutive characters between the two given characters. What counts as consecutive depends on the character encoding, so the locale must be set correctly (in bash, check that the LANG and LANGUAGE variables are correct). For example, all uppercase characters are [A-Z].
[^list] Example: grep -n 'oo[^t]' regular_express.txt matches (oog) and (ood) but not (oot). A ^ inside [] means "reverse selection". For example, "not an uppercase character" is [^A-Z]. Be careful, though: if you search with grep -n '[^A-Z]' regular_express.txt, every line in the file is listed. Why? Because [^A-Z] means "a character that is not uppercase", and every line contains such a character. The first line, "Open Source", already has the lowercase characters p, e, n, o and so on, as well as the double quotes ("), so of course it matches [^A-Z].

Note that the special characters of regular expressions are not the same as the wildcards used when typing commands on the command line. As a wildcard, * stands for zero or more arbitrary characters, but in a regular expression * repeats the previous RE character zero or more times. The meanings differ, so do not confuse them.
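The basic patterns above can be tried on a small file. The following is a minimal sketch: the sample lines are hypothetical stand-ins for regular_express.txt (which is not reproduced in this article), and grep -c is used so that each pattern reports how many lines it matches.

```shell
# Hypothetical sample data standing in for regular_express.txt.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# a comment line
"Open Source" is good!
google and goooogle
I can't stop
glad to see eve
EOF

begins_hash=$(grep -c '^#' "$sample")          # lines that start with #
ends_bang=$(grep -c '!$' "$sample")            # lines that end with !
one_between=$(grep -c 'e.e' "$sample")         # e, exactly one character, e
has_quote=$(grep -c \' "$sample")              # lines containing a single quote
two_three_o=$(grep -c 'go\{2,3\}g' "$sample")  # g, two or three o, g
not_upper=$(grep -c '[^A-Z]' "$sample")        # lines with any non-uppercase character

echo "$begins_hash $ends_bang $one_between $has_quote $two_three_o $not_upper"
rm -f "$sample"
```

The last count equals the number of lines in the file, which is the "[^A-Z] matches everything" effect described above.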

Extended regular expressions:

+
Repeats the previous RE character one or more times. Example: egrep -n 'go+d' regular_express.txt finds (god), (good), (goood) and so on. The o+ means "one or more o", so the command above lists lines 1, 9 and 13.

?
Matches zero or one occurrence of the previous RE character. Example: egrep -n 'go?d' regular_express.txt finds the two strings (gd) and (god). The o? means "no o or exactly one o", so the command above lists lines 13 and 14. Have you noticed that the results of these two cases ('go+d' and 'go?d') taken together are the same as the result of 'go*d'? Think about why. ^_^

|
Finds several strings using "or". Example: egrep -n 'gd|good' regular_express.txt finds the string gd or the string good. Note that this is an "or", so lines 1, 9 and 14 are all printed. What if you also want to find dog? Then use: egrep -n 'gd|good|dog' regular_express.txt

()
Finds a "group" string. Example: egrep -n 'g(la|oo)d' regular_express.txt finds the two strings (glad) or (good). Because g and d are common to both, la and oo can be listed inside () and separated with |. This feature can also be used to match a repeated group. For example: echo 'AxyzxyzxyzC' | egrep 'A(xyz)+C'. This looks for a string that starts with A, ends with C, and has one or more "xyz" strings in between.
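The extended operators can be compared side by side on a handful of words. This is a sketch under stated assumptions: the word list is made up for illustration, and grep -E is used as the equivalent of the egrep command shown above.

```shell
# Hypothetical word list, one word per line.
sample=$(mktemp)
printf '%s\n' gd god good goood glad dog > "$sample"

plus=$(grep -E -c 'go+d' "$sample")           # god, good, goood
optional=$(grep -E -c 'go?d' "$sample")       # gd, god
star=$(grep -c 'go*d' "$sample")              # gd, god, good, goood
either=$(grep -E -c 'gd|good|dog' "$sample")  # gd, good, dog
group=$(grep -E -c 'g(la|oo)d' "$sample")     # glad, good
repeat=$(echo 'AxyzxyzxyzC' | grep -E -c 'A(xyz)+C')

echo "$plus $optional $star $either $group $repeat"
rm -f "$sample"
```

Every word matched by 'go+d' or by 'go?d' is also matched by 'go*d', which is the relationship the text asks you to think about.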

3. Formatted printing: printf

[root@linux ~]# printf 'print format' actual content
Parameters:
Special escape sequences that can appear in the format:
\a    alert (bell) sound
\b    backspace
\f    form feed
\n    new line
\r    carriage return (the Enter key)
\t    horizontal [Tab]
\v    vertical [Tab]
\xNN  NN is a two-digit hexadecimal number, which is converted into the corresponding character
Common variable formats, as in the C programming language:
%ns   n is a number and s stands for string, that is, a string n characters wide;
%ni   n is a number and i stands for integer, that is, an integer n digits wide;
%N.nf n and N are numbers and f stands for floating point. If you want decimal places,
      for example ten positions in total with two of them after the decimal point, write %10.2f
Example:
Example 1: Take the data file from above (printf.txt) and list only the names and scores, separated by [Tab]:
[root@linux ~]# printf '%s\t %s\t %s\t %s\t %s\t \n' $(cat printf.txt)
Name     Chinese         English         Math    Average
Dmtsai   80      60      92      77.33
Vbird    75      55      80      70.00
Ken      60      90      70      73.33
# If the data above is saved in a file named printf.txt, the command above
# separates each word with a [Tab]. In the output, the lines from the second one on are fine,
# but the first line cannot be aligned because some of its words are long. %s stands for a string,
# and each field is followed by \t, that is, [Tab].
Example 2: Display the lines after the header, formatting the fields as string, integer and floating point:
[root@linux ~]# printf '%10s %5i %5i %5i %8.2f \n' $(cat printf.txt | \
> grep -v Name)
    Dmtsai    80    60    92    77.33
     Vbird    75    55    80    70.00
       Ken    60    90    70    73.33
# The output is more interesting this time: each field is printed with its own data format.
# The most interesting part is the %8.2f item, which controls the number of decimal places.
# For example, run the following yourself and look at the result:
# printf '%10s %5i %5i %5i %8.1f \n' $(cat printf.txt | grep -v Name)
Example 3: Which character does the hexadecimal value 45 represent?
[root@linux ~]# printf '\x45\n'
E
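One detail the examples above rely on is that printf reuses its format string until every argument has been consumed. The sketch below illustrates this with hypothetical in-line records instead of printf.txt. It writes the character code in octal (\105), which POSIX printf guarantees; bash also accepts the \x45 form used above.

```shell
# Hypothetical records: a name followed by three integer scores.
records='Dmtsai 80 60 92 Vbird 75 55 80'

# Four conversions per pass; the format repeats for each group of four words.
# $records is left unquoted on purpose so that it splits into words.
table=$(printf '%10s %5d %5d %5d\n' $records)
echo "$table"

rows=$(printf '%s\n' "$table" | grep -c '')
first_row=$(printf '%s\n' "$table" | head -n 1)
letter=$(printf '\105')    # octal 105 = hexadecimal 45 = E
echo "$rows rows, letter $letter"
```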

4. Introduction to the sed tool

sed is a tool that reads data from standard input (stdin), processes it, and writes the result to standard output (stdout).

[root@linux ~]# sed [-nefr] [action]
Parameters:
-n: silent mode. Normally sed prints every line from stdin to the screen.
    With -n, only the lines that are specially processed by sed
    (for example, by the p action) are printed.
-e: give the sed action directly on the command line;
-f: read the sed actions from a file. -f filename runs the
    sed actions stored in filename;
-r: use extended regular expression syntax. (The default is basic regular expression syntax.)
Action description: [n1[,n2]]function
n1, n2: optional. They usually give the line numbers the action applies to. For example, if the action
        applies to lines 10 through 20, write "10,20[action]".
The function is one of the following:
a: append. a is followed by a string, which appears on a new line after the current line.
c: change. c is followed by a string, which replaces the lines between n1 and n2.
d: delete. Because it deletes, d is usually followed by nothing;
i: insert. i is followed by a string, which appears on a new line before the current line;
p: print the selected lines. p is usually used together with sed -n.
s: substitute. It performs the replacement directly, and it is usually combined with
   regular expressions. For example, 1,20s/old/new/g
Example:
Example 1: List the content of /etc/passwd with line numbers, and delete lines 2 through 5.
[root@linux ~]# nl /etc/passwd | sed '2,5d'
     1  root:x:0:0:root:/root:/bin/bash
     6  sync:x:5:0:sync:/sbin:/bin/sync
     7  shutdown:x:6:0:shutdown:/sbin/shutdown
..... (omitted) .....
# See? Lines 2 through 5 were deleted, so they are not displayed.
# Also note that the action would normally be given with sed -e, but it works without -e as well.
# And note that the action following sed must be enclosed in single quotes.
# To delete only line 2, use nl /etc/passwd | sed '2d';
# to delete from line 3 to the last line, use nl /etc/passwd | sed '3,$d'.
Example 2: Add the words "drink tea" after the second line (that is, as the third line).
[root@linux ~]# nl /etc/passwd | sed '2a drink tea'
     1  root:x:0:0:root:/root:/bin/bash
     2  bin:x:1:1:bin:/bin:/sbin/nologin
drink tea
     3  daemon:x:2:2:daemon:/sbin/nologin
# The string given after a appears after line 2. What if it should go before the second line?
# Then nl /etc/passwd | sed '2i drink tea' is the answer.
Example 3: Add two lines after the second line: "Drink tea or ......" and "drink beer?"
[root@linux ~]# nl /etc/passwd | sed '2a Drink tea or ......\
> drink beer?'
     1  root:x:0:0:root:/root:/bin/bash
     2  bin:x:1:1:bin:/bin:/sbin/nologin
Drink tea or ......
drink beer?
     3  daemon:x:2:2:daemon:/sbin/nologin
# The point of this example is that more than one line can be added.
# However, every added line except the last must end with a backslash. That is why,
# in the example above, there is a \ at the end of the first line. It is required.
Example 4: Replace lines 2 through 5 with "No 2-5 number".
[root@linux ~]# nl /etc/passwd | sed '2,5c No 2-5 number'
     1  root:x:0:0:root:/root:/bin/bash
No 2-5 number
     6  sync:x:5:0:sync:/sbin:/bin/sync
# Lines 2 through 5 are gone, and the text we asked for appears in their place.
Example 5: List only lines 5 through 7.
[root@linux ~]# nl /etc/passwd | sed -n '5,7p'
     5  lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
     6  sync:x:5:0:sync:/sbin:/bin/sync
     7  shutdown:x:6:0:shutdown:/sbin/shutdown
# Why is the -n parameter needed? Run sed '5,7p' yourself to find out (lines 5 through 7 are printed twice).
# The output differs a lot depending on whether -n is given.
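The line-oriented actions are easier to check on a file whose content is known in advance. The sketch below assumes GNU sed, because the one-line forms of a and c used in this article ('2a text', '2,5c text') are GNU extensions. The six-line sample file is hypothetical, and paste is only used to join the output lines with commas for compact display.

```shell
# Hypothetical six-line sample file.
sample=$(mktemp)
printf '%s\n' one two three four five six > "$sample"

deleted=$(sed '2,5d' "$sample" | paste -sd, -)                # lines 2-5 removed
appended=$(sed '2a drink tea' "$sample" | paste -sd, -)       # new line after line 2
changed=$(sed '2,5c No 2-5 number' "$sample" | paste -sd, -)  # lines 2-5 replaced once
printed=$(sed -n '5,6p' "$sample" | paste -sd, -)             # only lines 5-6
doubled=$(sed '5,6p' "$sample" | grep -c '')                  # without -n: 6 + 2 lines

echo "$deleted"
echo "$appended"
echo "$changed"
echo "$printed"
echo "$doubled"
rm -f "$sample"
```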
Example 6: Use ifconfig to list only the IP address.
[root@linux ~]# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:51:FD:52:9A:CA
          inet addr:192.168.1.12  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::250:fcff:fe22:9acb/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
..... (omitted below) .....
# All we actually need is the "inet addr:..." line, so use grep and sed to extract it.
[root@linux ~]# ifconfig eth0 | grep 'inet ' | sed 's/^.*addr://g' | \
> sed 's/Bcast.*$//g'
# Run each stage of the pipeline (|) separately to see what it contributes.
# After trimming the beginning and the end, we are left with the IP address we need, 192.168.1.12.
Example 7: Extract the lines containing MAN from /etc/man.config, without the comments.
[root@linux ~]# cat /etc/man.config | grep 'MAN' | sed 's/#.*$//g' | \
> sed '/^$/d'
# A # in a line marks a comment. Note, however, that a comment
# does not always start at the first character. It can also follow a command, as in:
# "shutdown -h now # This is the command to shut down", where the comment # comes after the command.
# That is why the regular expression #.*$ is used.
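Both pipelines can be reproduced without a network interface or /etc/man.config. In this sketch the ifconfig line and the configuration file are hypothetical samples in the same format as above. The second sed expression is slightly tightened (' *Bcast.*$') so that the spaces before Bcast are removed as well.

```shell
# Hypothetical line in the old net-tools ifconfig format.
line='          inet addr:192.168.1.12  Bcast:192.168.1.255  Mask:255.255.255.0'
ip=$(echo "$line" | sed 's/^.*addr://' | sed 's/ *Bcast.*$//')
echo "$ip"

# Hypothetical config file with a full-line comment and a trailing comment.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# MANPATH is set below
MANPATH /usr/share/man   # trailing comment
MANSECT 1:8:2:3
EOF

# Strip comments first, then drop the lines that became empty.
settings=$(grep 'MAN' "$conf" | sed 's/#.*$//' | sed '/^$/d')
kept=$(printf '%s\n' "$settings" | grep -c '')
echo "$settings"
rm -f "$conf"
```

The first line of the sample file matches grep 'MAN' but consists only of a comment, so it is emptied by the first sed and then deleted by the second.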

Using sed to remove the digits at the beginning of each line:

sed 's/^[0-9]*//g' filename

5. Introduction to the awk tool

awk tends to split each line into several "fields" for processing;

[root@linux ~]# awk 'condition1 {action1} condition2 {action2} ...' filename
awk can process the files named after it, or read the standard output of the previous command through a pipe. As mentioned above, awk mainly works on the fields of each line, and the default field separator is the space character or the [Tab] character. (If the fields are separated by some other character, a variable has to be set in advance, as shown later.) For example, we can use last to list the login records. The result looks like this:
[root@linux ~]# last
dmtsai pts/0 192.168.1.12 Mon Aug 22 still logged in
root tty1 Mon Aug 15)
reboot system boot 2.6.11 Sun Aug 14 (7+15:41)
dmtsai pts/0 192.168.1.12 Fri Aug 12)
To print the account and the IP address of each login, separated by a [Tab], do this:
[root@linux ~]# last | awk '{print $1 "\t" $3}'
dmtsai  192.168.1.12
root    Mon
reboot  boot
dmtsai  192.168.1.12
Note that the braces go inside the single quotes; do not mix up the order. printf can be used instead of print, but printf does not add a line break by itself.
Every line is to be processed here, so no condition is needed. What we want is the first and third columns, but the content of the second and third lines looks strange. This is a data format problem: before using awk, make sure that each field in your data is a continuous string with no spaces or [Tab]s inside it. Otherwise the fields are misjudged, as in this example. The example also shows that every field of a line has a variable name: $1, $2 and so on. dmtsai is $1
because it is in the first column, 192.168.1.12 is in the third column, so it is $3, and so on. There is one more variable, $0, which stands for the whole line. In the example above, $0 of the first line is the entire line that begins with "dmtsai pts/0". For the four lines above, awk therefore works as follows:
1. Read the first line and fill the variables $0, $1, $2 and so on with its data;
2. Decide, based on the condition, whether the following action needs to be performed;
3. Carry out all the actions and conditions;
4. If there are more lines, repeat steps 1 to 3 until all the data has been read.
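The field variables can be observed on a two-line sample. The input below is a hypothetical, shortened imitation of last output; the second line shows how a missing column shifts the fields.

```shell
# Hypothetical login records; the second line has no IP column.
log=$(mktemp)
cat > "$log" <<'EOF'
dmtsai pts/0 192.168.1.12 Mon Aug 22
root tty1 Mon Aug 15
EOF

pairs=$(awk '{print $1 "\t" $3}' "$log")   # first and third field of every line
whole=$(awk '{print $0; exit}' "$log")     # $0 is the complete first line

echo "$pairs"
echo "$whole"
rm -f "$log"
```

On the second line $3 is "Mon" rather than an address, which is the misjudgment described above.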

How does awk know how many lines the data has, and how many columns? This is where the built-in variables of awk help.
Variable name, meaning
NF: the total number of fields in each line ($0)
NR: the number of the line that awk is currently processing
FS: the current field separator. The default is the space character.
Let's continue with the example above. To list the account on each line together with the number of the line being processed and the number of fields in that line, use the following command. (Note that the whole awk program is enclosed in single quotes. Therefore any literal text inside print, including format strings like the ones described for printf in the previous section, must be enclosed in double quotes.)
[root@linux ~]# last | awk '{print $1 "\t lines: " NR "\t columns: " NF}'
dmtsai   lines: 1        columns: 10
root     lines: 2        columns: 9
reboot   lines: 3        columns: 9
dmtsai   lines: 4        columns: 10
This shows the difference between NR and NF.
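A short sketch makes the two counters concrete. The two input lines are made up, and $NF (the value of the last field) is standard awk but is not used elsewhere in this article.

```shell
# NR counts lines, NF counts the fields of the current line,
# and $NF is therefore the last field of that line.
report=$(printf '%s\n' 'a b c' 'd e' |
  awk '{print "line " NR " has " NF " fields, last is " $NF}')
echo "$report"
```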

Logical operators in awk
Since conditions are needed, some logical operators are needed too. For example:
Operator, meaning
>   greater than
<   less than
>=  greater than or equal to
<=  less than or equal to
==  equal to
!=  not equal to
Pay attention to the == symbol. In a logical operation, that is, a comparison such as greater than, less than or equal to, "equal" is written ==. A single = assigns a value, for example when setting a variable. Now let's try a logical test. In /etc/passwd the fields are separated by colons ":". To look at the lines whose third column is below 10, and to list only the account and the third column, you might try this:
[root@linux ~]# cat /etc/passwd | \
> awk '{FS=":"} $3 < 10 {print $1 "\t " $3}'
# FS sets the character used to split the fields
root:x:0:0:root:/root:/bin/bash
bin      1
daemon   2
...... (omitted below) ......
Interesting. But why is the first line not displayed correctly? Because when the first line is read, the variables $1, $2 and so on are still split using the default separator, the space character. Although FS=":" is set, it only takes effect from the second line on. So what should we do? Set the awk variable in advance, using the BEGIN keyword:
[root@linux ~]# cat /etc/passwd | \
> awk 'BEGIN {FS=":"} $3 < 10 {print $1 "\t " $3}'
root     0
bin      1
daemon   2
...... (omitted below) ......
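The effect of BEGIN can be checked on a small passwd-style file. This is a sketch on hypothetical data; the -F: command-line option shown last is a standard awk alternative to setting FS in a BEGIN block, and it is not part of the original examples.

```shell
# Hypothetical passwd-style data.
pw=$(mktemp)
cat > "$pw" <<'EOF'
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
user:x:1000:1000:user:/home/user:/bin/bash
EOF

# FS is set before the first line is read, so every line is split on ":".
early=$(awk 'BEGIN {FS=":"} $3 < 10 {print $1 "\t" $3}' "$pw")

# Same result with the separator given as an option.
same=$(awk -F: '$3 < 10 {print $1 "\t" $3}' "$pw")

echo "$early"
rm -f "$pw"
```

Only root and bin are printed, because the third field of the last line (1000) is not below 10.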
Besides BEGIN, there is also END. And what if we want to use awk for calculations? In the following example, assume a salary table with this content:
Name    1st     2nd     3th
Vbird   23000   24000   25000
Dmtsai  21000   20000   23000
Bird2   43000   42000   41000
How can the total for each person be calculated, with formatted output as well? Store the data in a file named pay.txt, then:
[root@linux ~]# cat pay.txt | \
> awk 'NR==1 {printf "%10s %10s %10s %10s %10s\n", $1, $2, $3, $4, "Total"}
NR>=2 {total = $2 + $3 + $4
printf "%10s %10d %10d %10d %10.2f\n", $1, $2, $3, $4, total}'
      Name        1st        2nd        3th      Total
     Vbird      23000      24000      25000   72000.00
    Dmtsai      21000      20000      23000   64000.00
     Bird2      43000      42000      41000  126000.00
Several important points in the example above need explaining:
• Inside an action, that is, inside {}, multiple commands are separated either by semicolons (;) or by pressing [Enter] between them. In the action after NR>=2 above, total = ... computes the sum, and the printf on the next line formats the output.
• In logical operations, "equal to" must be written with two equal signs, ==.
• When formatting output with printf, \n must be included in the format string, otherwise the lines are not separated.
• Unlike bash shell variables, awk variables are used directly, without a leading $ symbol.
awk can help with a lot of daily work and is really handy. Because printf is often used to format awk output,
it is worth getting familiar with printf. In addition, the {} of an awk action also supports if (condition). For example, the command above can be rewritten as:
[root@linux ~]# cat pay.txt | \
> awk '{if (NR==1) printf "%10s %10s %10s %10s %10s\n", $1, $2, $3, $4, "Total"}
NR>=2 {total = $2 + $3 + $4
printf "%10s %10d %10d %10d %10.2f\n", $1, $2, $3, $4, total}'
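END is mentioned above but not demonstrated. The following sketch shows one plausible use: an END block that prints a grand total after all lines have been read. The variable name grand and the simplified output format are assumptions made for this illustration, and the data is the pay.txt table from above.

```shell
# The salary table from the text, written to a temporary file.
pay=$(mktemp)
cat > "$pay" <<'EOF'
Name 1st 2nd 3th
Vbird 23000 24000 25000
Dmtsai 21000 20000 23000
Bird2 43000 42000 41000
EOF

# Per-person totals are printed line by line; the END block runs once,
# after the last line, and prints the accumulated sum.
summary=$(awk 'NR >= 2 {total = $2 + $3 + $4; grand += total; printf "%s %d\n", $1, total}
END {printf "All %d\n", grand}' "$pay")

echo "$summary"
rm -f "$pay"
```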

Practice:

We know that the fields of /etc/passwd are separated by : and that the first column is the account name. Write a program that extracts the first column of /etc/passwd and prints each account on a line of the form: The 1 account is "root", where 1 is the line number.

Method 1:

#!/bin/bash
accounts=$(cat /etc/passwd | cut -d':' -f1)
for account in $accounts
do
    declare -i i=$i+1
    echo "The $i account is \"$account\""
done

Method 2:

LOONG:/home/Yee/shell # cat passwd | awk 'BEGIN {FS=":"} {print "The " NR " account is " $1}'
With awk, a single command is enough.

The 1 account is root
The 2 account is daemon
The 3 account is bin
The 4 account is sys
The 5 account is sync
The 6 account is games
The 7 account is man

.............

6. File comparison

diff
diff compares the differences between two files and is generally used on ASCII plain text files. Let's prepare a file first. Suppose we delete the fourth line of /etc/passwd, replace the sixth line with "no six line", and put the new file in /tmp/test. How is that done?
[root@linux ~]# mkdir -p /tmp/test
[root@linux ~]# cat /etc/passwd | \
> sed -e '4d' -e '6c no six line' > /tmp/test/passwd
# Note: when sed is given two or more actions, each action must be preceded by -e.
Next, let's see how to use diff.
[root@linux ~]# diff [-bBi] from-file to-file
Parameters:
from-file: a file name, the original file of the comparison;
to-file: a file name, the file to compare against;
Note: from-file or to-file can be replaced by -, which stands for standard input.
-b: ignore differences in the amount of blank space within a line (for example, "about me" and "about     me" are considered the same);
-B: ignore differences in blank lines.
-i: ignore differences in case.
Example:
Example 1: Compare /tmp/test/passwd with /etc/passwd:
[root@linux ~]# diff /etc/passwd /tmp/test/passwd
4d3    <== the fourth line of the left file (/etc/passwd) was deleted (d)
< adm:x:3:4:adm:/var/adm:/sbin/nologin
6c5    <== the sixth line of the left file was replaced by the fifth line of the right file (/tmp/test/passwd)
< sync:x:5:0:sync:/sbin:/bin/sync
---
> no six line
# diff reports exactly the two changes we just made.
Comparing files with diff is really easy. In addition, diff can compare entire directories:

[root@linux ~]# diff /etc /tmp/test

This compares the contents of the files that have the same name in both directories.
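The 4d3 and 6c5 markers can be reproduced on a small file. In this sketch the seven-line file is hypothetical, and the modified copy is produced with the substitution 6s/.*/no six line/ instead of the one-line c action, which gives the same file without relying on GNU sed.

```shell
dir=$(mktemp -d)
printf '%s\n' one two three four five six seven > "$dir/old"

# Delete line 4 and rewrite line 6, as in the passwd example above.
sed -e '4d' -e '6s/.*/no six line/' "$dir/old" > "$dir/new"

# diff exits with status 1 when the files differ, hence the "|| true".
changes=$(diff "$dir/old" "$dir/new" || true)
echo "$changes"

# With -b and -i, differences in spacing and case are ignored.
printf 'about me\n' > "$dir/a"
printf 'About     me\n' > "$dir/b"
if diff -b -i "$dir/a" "$dir/b" > /dev/null; then same=yes; else same=no; fi
echo "$same"
rm -rf "$dir"
```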

cmp
Compared with the widely used diff, cmp does not seem to be used as much. cmp also compares two files, but it compares them byte by byte, so it can be used on binary files as well. (As a reminder, diff compares line by line, while cmp compares byte by byte. They are not the same.)
[root@linux ~]# cmp [-ls] file1 file2
Parameters:
-l: list the positions of all differing bytes. By default cmp only reports the first difference it finds.
-s: silent. Print nothing and report the result only through the exit status.
Example:
Example 1: Compare /etc/passwd and /tmp/test/passwd with cmp
[root@linux ~]# cmp /etc/passwd /tmp/test/passwd
/etc/passwd /tmp/test/passwd differ: byte 106, line 4
See? The first difference found is on the fourth line, at byte 106.
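The default report, -l and -s can be compared on two tiny files. In this sketch the file contents are hypothetical and differ in a single byte on the second line.

```shell
dir=$(mktemp -d)
printf 'abc\ndef\n' > "$dir/a"
printf 'abc\ndXf\n' > "$dir/b"

# Default: report only the first difference (cmp exits with status 1, hence "|| true").
first=$(cmp "$dir/a" "$dir/b" || true)
echo "$first"

# -l lists every differing byte, one per line.
differing=$(cmp -l "$dir/a" "$dir/b" | grep -c '' || true)

# -s prints nothing; only the exit status tells whether the files are equal.
if cmp -s "$dir/a" "$dir/b"; then equal=yes; else equal=no; fi
echo "$differing $equal"
rm -rf "$dir"
```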
