Transferred from: https://www.cnblogs.com/sheeva/p/6406285.html
Introduction
As a programmer who leans toward Windows, I used to do all my text processing with graphical tools such as Notepad++. Even a simple operation, such as a global string replacement in a file on a Linux server, meant downloading the file, editing it locally, and uploading it back. I recently bought the book "Bird Brother's Linux Private Kitchen" and finally sat down to study Linux text processing systematically. It turned out to be less difficult than I had imagined, and had I learned it earlier, the time saved would certainly have far exceeded the time spent studying.
Overview
This article covers the following:
- A brief review of regular expressions. If you are already familiar with them, and at least know that they are divided into basic and extended regular expressions, you can skip that part.
- The main body of the article: four Linux text-processing commands: grep, sed, printf, and awk.
Let's get started.
Regular Expression Review
This part is a brief refresher for readers who already know regular expressions. If you have not learned them yet, find an introduction to regular expressions first and then come back to this article.
Regular expressions are divided into basic regular expressions and extended regular expressions, as follows:
Basic Regular Expressions
Pattern | Meaning
^word | Matches lines that start with "word"
word$ | Matches lines that end with "word"
. | Matches any single character
\ | Escape character
* | Zero or more repetitions of the preceding character
[abc] | Matches one character: a, b, or c
[a-z]; [0-9] | Matches one character from a to z; one digit from 0 to 9
[^abc] | Matches one character other than a, b, or c
\{m,n\} | m to n repetitions of the preceding character (in basic regular expressions the braces are escaped)
Extended Regular Expressions
Pattern | Meaning
+ | One or more repetitions of the preceding character
? | Zero or one occurrence of the preceding character
| | Alternation (or)
() | Grouping
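As a rough illustration of the difference between the two tables, the sketch below runs one word list through a basic pattern and an extended pattern. The word list and variable names are made up for this example, and `grep -E` is used here as the extended-regex form of grep.

```shell
# Hypothetical sample words, one per line.
words=$(printf '%s\n' gd god good goood glad)

# Basic regular expression: the repetition braces are written \{ \}.
bre=$(printf '%s\n' "$words" | grep 'o\{2,3\}')

# Extended regular expression: | and () work without backslashes.
ere=$(printf '%s\n' "$words" | grep -E 'g(la|oo)d|^gd$')

echo "basic match:"
echo "$bre"
echo "extended match:"
echo "$ere"
```

The basic pattern keeps the words with two or three consecutive o's, while the extended pattern combines grouping and alternation in a single expression.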
Text Processing Command: grep
grep searches its input line by line and outputs the lines that contain a match.
grep usage:
grep is generally used in two ways: searching a file directly, or reading from pipeline input:
- grep 'word' file.txt
- cat file.txt | grep 'word'
Common grep parameters:
Parameter | Meaning
-n | Prefix each output line with its line number
--color=auto | Highlight the matched keyword
-A 3 | Also output the three lines after each matching line
-B 2 | Also output the two lines before each matching line
-v | Invert the match: output the lines that do not contain the keyword
-i | Ignore case when matching the keyword
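The parameters above can be tried on a small throwaway file. This is only a sketch: the file contents and variable names are hypothetical, and the context options are written with a space (`-A 1`, `-B 1`).

```shell
# Hypothetical sample file for illustration.
sample=$(mktemp)
printf '%s\n' 'alpha' 'Beta' 'gamma' 'beta test' 'delta' > "$sample"

numbered=$(grep -n 'beta' "$sample")          # prefix matches with line numbers
nocase=$(grep -i 'beta' "$sample")            # ignore case
inverted=$(grep -v -i 'beta' "$sample")       # lines that do NOT match
context=$(grep -A 1 -B 1 'gamma' "$sample")   # one line after and one before

echo "$numbered"
echo "$nocase"
echo "$inverted"
echo "$context"

rm -f "$sample"
```

Without -i, only the lowercase "beta test" line matches, which is why -n reports a single line here.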
grep usage tip:
In most cases we want the keyword highlighted (the --color=auto parameter), so you can add the following line to the ~/.bashrc file:
alias grep='grep --color=auto'
and then run
source ~/.bashrc
to make the configuration take effect. From then on, grep applies --color=auto automatically.
grep examples:
grep matches using basic regular expressions by default. Here are some common examples for reference.
grep 't[ae]st'   # find "tast" or "test"
grep '[0-9]'   # find digits
grep '[^a-z]oo'   # find "Xoo", where X is any character other than a to z
grep '^the'   # find lines that start with "the"
Note that ^ inside [] means "not these characters", as in the previous example, while ^ outside [] means "starts with", as in this one.
grep '^$'   # find blank lines
grep 'o\{2\}'   # find two consecutive o's
Note that in basic regular expressions the braces { } must be escaped as \{ \} to act as a repetition count, which differs from most other regular expression flavors.
egrep
As noted above, regular expressions come in basic and extended forms. grep uses basic regular expressions by default; to use extended regular expressions, use the egrep command (equivalent to grep -E).
A few examples:
egrep 'gd|good'   # find "gd" or "good"
egrep 'g(la|oo)d'   # find "glad" or "good"
egrep 'a(xyz)+c'   # find "aXc", where X is one or more repetitions of "xyz"
sed
sed is a powerful command that supports five kinds of operations: deleting lines, adding lines, selecting lines, replacing lines, and substituting strings.
sed is a pipeline command, so it can process pipeline input.
1. Deleting lines
nl /etc/passwd | sed '2d'   # delete line 2
The pipeline input is omitted in the examples below.
sed '2,5d'   # delete lines 2 through 5
sed '3,$d'   # delete from line 3 to the last line; $ means the last line
sed '/^$/d'   # delete empty lines
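To see the delete forms side by side, here is a minimal sketch that feeds the same five lines (one of them empty) through each command. The input lines and variable names are hypothetical.

```shell
# Hypothetical input: five lines, the third one empty.
lines=$(printf '%s\n' one two '' four five)

del_second=$(printf '%s\n' "$lines" | sed '2d')      # drop line 2
del_tail=$(printf '%s\n' "$lines" | sed '3,$d')      # drop line 3 through the end
del_empty=$(printf '%s\n' "$lines" | sed '/^$/d')    # drop empty lines

echo "$del_second"
echo "---"
echo "$del_tail"
echo "---"
echo "$del_empty"
```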
2. Adding lines
sed '2a drink tea'   # add a line "drink tea" below line 2; a stands for append
sed '2i drink tea'   # add a line "drink tea" above line 2; i stands for insert
sed '2a a\
b\
c'   # add three lines "a", "b", "c" below line 2; every line except the last must end with "\"
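The append and insert commands can be checked on a three-line input. This sketch assumes GNU sed, which accepts the one-line `2a text` form used in this article; the input lines and variable names are hypothetical.

```shell
# Hypothetical three-line input.
base=$(printf '%s\n' first second third)

# Assumes GNU sed: the text follows the a / i command on the same line.
appended=$(printf '%s\n' "$base" | sed '2a drink tea')   # new line after line 2
inserted=$(printf '%s\n' "$base" | sed '2i drink tea')   # new line before line 2

echo "$appended"
echo "---"
echo "$inserted"
```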
3. Selecting lines
sed -n '5,7p'   # output lines 5 through 7; the -n parameter is required, otherwise every line is output and lines 5 to 7 are output twice
4. Replacing lines
sed '2,5c No 2~5 lines'   # replace lines 2 through 5 with the single line "No 2~5 lines"
5. String substitution
sed 's/string to be replaced/new string/g'   # fixed format: starts with s, ends with g, with three / separating the string to be replaced from the new string; the string to be replaced can be a regular expression
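Selection, line replacement, and string substitution can be compared on one small input. This is a sketch with hypothetical data and variable names; the one-line `c` form assumes GNU sed.

```shell
# Hypothetical four-line input.
text=$(printf '%s\n' 'line 1' 'line 2' 'line 3' 'line 4')

selected=$(printf '%s\n' "$text" | sed -n '2,3p')          # print only lines 2-3
changed=$(printf '%s\n' "$text" | sed '2,3c replaced')     # lines 2-3 become one line (GNU sed form)
substituted=$(printf '%s\n' "$text" | sed 's/line/row/g')  # replace every "line" with "row"

echo "$selected"
echo "---"
echo "$changed"
echo "---"
echo "$substituted"
```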
Writing the result directly to the file
By default, sed does not modify the file; it only prints the modified content. You can use > to write that output to a new file. However, do not redirect with > to the original file, because the result is that the original file is emptied. To modify the original file, use the -i parameter, for example:
sed -i '2d' file.txt   # delete the second line directly in the original file
Modifying the original file directly is dangerous: once a mistake is made, it cannot be undone. First run the command without -i to print the modified result, and add -i only after confirming that the result is correct.
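One cautious way to follow this advice is to preview first and then keep a backup when editing in place. The sketch below uses the `-i.bak` form, which tells sed to save the original under a `.bak` suffix; that suffix and the temporary directory are choices made for this example, not something the article prescribes.

```shell
# Work in a throwaway directory (hypothetical file contents).
workdir=$(mktemp -d)
printf '%s\n' a b c > "$workdir/file.txt"

# Step 1: preview the result; the file itself is untouched.
preview=$(sed '2d' "$workdir/file.txt")

# Step 2: edit in place, keeping the original as file.txt.bak.
sed -i.bak '2d' "$workdir/file.txt"

after=$(cat "$workdir/file.txt")
backup=$(cat "$workdir/file.txt.bak")

echo "preview: $preview"
echo "after:   $after"
echo "backup:  $backup"

rm -rf "$workdir"
```

If the in-place edit turns out to be wrong, the `.bak` copy can simply be moved back over the modified file.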
printf
The printf command is hard to describe in words but easy to understand once you try it.
Save the following content as printf.txt:
Name Chinese English Math Average
DmTsai 77.33
VBird 70.00
Ken 60 90 70 73.33
First view the file with cat; the result looks like this:
Now run the printf command with some parameters:
printf '%10s%10s%10s%10s%10s\n' $(cat printf.txt)
Output:
Much neater than the output of cat, isn't it?
%10s means that the column has a fixed width of 10 characters. Other formats are not covered here; for this article, %10s is enough.
printf is not a pipeline command. To feed it a file, you must use command substitution, $(cat printf.txt), to pass in the file's contents as in the command above.
printf is widely used, and it also appears later inside awk commands.
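The fixed-width behavior of %10s can be checked on a tiny two-column file. This is a sketch with hypothetical data; it relies on printf reusing its format string when it receives more arguments than the format consumes.

```shell
# Hypothetical two-column data file.
data=$(mktemp)
printf '%s\n' 'Name Score' 'Ken 73.33' > "$data"

# Unquoted $(cat ...) splits the file into words; printf consumes two
# words per pass and repeats the format until all words are used.
formatted=$(printf '%10s%10s\n' $(cat "$data"))

echo "$formatted"

rm -f "$data"
```

Each output line is exactly 20 characters wide, with every value right-aligned in its 10-character column.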
awk
awk mainly processes files by splitting each line into columns. It can also handle different lines differently through conditions, and can even perform numeric calculations.
Again, we learn by example.
Let's look at the five most recent logins (last -5):
In the output, the first column is the user name and the third column is the user's IP address. To pick out just these two columns, we can use awk:
last -5 | awk '{print $1 "\t" $3}'
Output:
The command looks complicated, but don't worry; it is simple.
First, awk has a fixed format: awk '{command}'. The single quotes and curly braces are part of that format.
The command itself is
print $1 "\t" $3   # by default awk splits each line into columns on spaces and tabs; $1 is the first column and $3 is the third
Seen this way, it is much easier to read.
The output of last is separated by whitespace, which awk splits on by default. Now let's look at another example. Run cat /etc/passwd:
Each line of this data is delimited by ":", so to output the first and third columns with awk you have to specify the delimiter:
cat /etc/passwd | awk -F ':' '{print $1 "\t" $3}'   # -F ':' specifies ":" as the delimiter
Execution Result:
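The effect of -F can be reproduced without touching the real /etc/passwd. The two records below are hypothetical, passwd-style lines made up for this sketch.

```shell
# Hypothetical passwd-style records (not read from the real file).
records=$(printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'alice:x:1000:1000::/home/alice:/bin/bash')

# Split on ":" and print the user name and the numeric ID, tab-separated.
cols=$(printf '%s\n' "$records" | awk -F ':' '{print $1 "\t" $3}')

echo "$cols"
```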
In addition to column variables such as $1 and $3, the following special variables can also be used in awk commands:
NF: the number of columns in the current line
NR: the current line number
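A short sketch of NF and NR on two hypothetical input lines follows. It also uses `$NF`, which is standard awk for "the last column" but is not covered in this article.

```shell
# Two hypothetical lines with different numbers of columns.
report=$(printf '%s\n' 'a b c' 'd e' |
  awk '{print NR ": " NF " columns, last = " $NF}')

echo "$report"
```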
Here is a comprehensive example of awk's conditions and numeric calculations. Save the following data as pay.txt:
Name 1st 2nd 3rd
VBird 23000 24000 25000
DMTsai 21000 20000 23000
bird2 43000 42000 41000
Now we want to add a "Total" column that holds the sum of the values in each line.
awk can do this:
cat pay.txt | awk 'NR==1 {printf "%10s%10s%10s%10s%10s\n", $1,$2,$3,$4,"Total"}; NR>1 {printf "%10s%10s%10s%10s%10s\n", $1,$2,$3,$4,$2+$3+$4}'
Result:
A few points to note:
- With conditions, the awk format is: awk 'condition1 {command1}; condition2 {command2}'
- Conditions support the following comparison operators:
  - >
  - <
  - >=
  - <=
  - ==   (note: equality uses two equals signs)
  - !=
- You can compute directly on the column values in a line ($2, $3, $4).
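The points above can be combined in a small sketch that uses the pay.txt data from this section, written to a temporary file. The 70000 threshold and the variable names are hypothetical, and `&&` is standard awk for joining two conditions, although the article does not cover it.

```shell
# Recreate the pay.txt data in a temporary file.
pay=$(mktemp)
cat > "$pay" <<'EOF'
Name 1st 2nd 3rd
VBird 23000 24000 25000
DMTsai 21000 20000 23000
bird2 43000 42000 41000
EOF

# Skip the header (NR>1) and print each name with the sum of its columns.
totals=$(awk 'NR>1 {print $1, $2+$3+$4}' "$pay")

# Combine a line-number condition with a comparison on the computed sum.
high=$(awk 'NR>1 && $2+$3+$4 > 70000 {print $1}' "$pay")

echo "$totals"
echo "above 70000: $(echo $high)"

rm -f "$pay"
```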
Summary
This article first reviewed regular expressions (basic and extended) and then introduced four common commands. Finally, here is a summary of what each command is for:
Command | Use
grep/egrep | Keyword search
sed | Delete, add, replace, and select lines; substitute keywords
printf | Formatted output of file content
awk | Split each line into columns by a delimiter and select some columns; process different lines differently through conditions; compute on the columns of a line
File Summary of common commands for Linux text processing