Regular expressions and three Linux text processing tools explained in detail
grep, sed, and awk are all text processing tools. Although they overlap, each has its own strengths and weaknesses. None of them can completely replace the others; otherwise there would be no need for three separate commands.
I. Regular Expressions
1. Types of matching characters
[a-z]: a lowercase letter
[A-Z]: an uppercase letter
[a-zA-Z]: a lowercase or uppercase letter
[0-9]: a digit
[a-zA-Z0-9]: a letter or a digit
.: matches any single character, including a space
[0-9a-fA-F]: a hexadecimal digit
abc|def: abc or def
a(bc|de)f: abcf or adef
\<: start of a word. Words are generally separated by spaces or special characters; a consecutive run of letters, digits, and underscores is treated as one word.
\>: end of a word
[^expression]: any character that is not in the bracketed set. For example, [^a-z] matches every character except lowercase letters, and so on.
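The character types above can be tried out directly with grep -E. The following is a minimal sketch, assuming GNU grep on Linux; the sample strings are made up for illustration and are not from any real file.

```shell
# Hypothetical sample input, one candidate string per line.
sample=$(printf '%s\n' 'abcf' 'adef' 'abdf' 'cat 7' 'concat')

# a(bc|de)f matches "abcf" or "adef" but not "abdf".
alt=$(printf '%s\n' "$sample" | grep -E 'a(bc|de)f')

# [0-9] selects the lines that contain a digit.
digits=$(printf '%s\n' "$sample" | grep -E '[0-9]')

# \<cat\> matches "cat" only as a whole word, so "concat" is skipped.
# \< and \> are not in POSIX; GNU grep supports them.
word=$(printf '%s\n' "$sample" | grep -E '\<cat\>')

# [^a-z] selects lines holding at least one character that is not a
# lowercase letter (here the space and the digit in "cat 7").
nonlower=$(printf '%s\n' "$sample" | grep -E '[^a-z]')

printf '%s\n' "$alt" "$digits" "$word" "$nonlower"
```

If the word anchors are not available, grep -w 'cat' is a portable way to get the same whole-word behavior.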
2. Use the following symbols to control how many times a match repeats
The expression being quantified must be written immediately to the left of the symbol.
expression*: 0 or more repetitions
expression+: 1 or more repetitions
expression?: 0 or 1 repetition
expression{n}: exactly n repetitions
expression{n,m}: n to m repetitions
expression{n,}: at least n repetitions
[Example] [a-z]* matches 0 or more lowercase letters.
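A short sketch of each quantifier, assuming GNU grep. The -o option prints only the matched text and -x requires the whole line to match, which makes the effect of each symbol easy to see. All input strings are invented for the example.

```shell
# * : "a" followed by zero or more "b"
star=$(echo 'a ab abbb' | grep -Eo 'ab*')

# + : one or more digits
plus=$(echo 'id=12345 x=7' | grep -Eo '[0-9]+')

# ? : the "u" is optional
opt=$(printf '%s\n' 'color' 'colour' 'colr' | grep -Ex 'colou?r')

# {n,m} : lines made of 2 to 3 lowercase letters
range=$(printf '%s\n' 'a' 'ab' 'abc' 'abcd' | grep -Ex '[a-z]{2,3}')

# {n} : exactly 4 digits in a row
exact=$(echo 'year 2024, code 99' | grep -Eo '[0-9]{4}')

# {n,} : at least 2 "x" in a row
atleast=$(echo 'x xx xxx' | grep -Eo 'x{2,}')

printf '%s\n' "$star" "$plus" "$opt" "$range" "$exact" "$atleast"
```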
3. Anchor a match to the beginning or end of a line
^expression: matches at the start of the line
expression$: matches at the end of the line
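The two anchors can be illustrated as follows. This is only a sketch with made-up input lines, run through grep -E.

```shell
# Hypothetical input: "root" appears at different positions in each line.
lines=$(printf '%s\n' 'root:x:0' 'chroot' 'rootless' 'bin:root')

# ^root : only lines that begin with "root"
head_match=$(printf '%s\n' "$lines" | grep -E '^root')

# root$ : only lines that end with "root"
tail_match=$(printf '%s\n' "$lines" | grep -E 'root$')

# Both anchors together: the line must be exactly "chroot"
whole=$(printf '%s\n' "$lines" | grep -E '^chroot$')

printf '%s\n' "$head_match" "$tail_match" "$whole"
```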
II. Three Linux text processing tools
1. egrep filtering tool
An extended version of grep that accepts extended regular expressions (equivalent to grep -E).
Syntax:
egrep -option 'regular expression' file name
Options:
-n: display the line number of each match.
-o: display only the matching text, not the whole line.
-q: quiet mode, no output. Check $? to determine whether the command succeeded, that is, whether the desired content was found.
-l: if a file contains a match, print only its file name; if it does not, print nothing. Usually combined with -r, for example: grep -rl 'root' /etc
-A n: print each matching line together with the n lines after it.
-B n: print each matching line together with the n lines before it.
-C n: print each matching line together with the n lines before and after it.
--color: highlight the matching text.
-c: print only the number of matching lines.
-i: ignore case.
-v: invert the match, printing the lines that do not match.
-w: match whole words only.
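The options above can be exercised on a small throwaway file. This is a sketch assuming GNU grep; the file contents are invented, and grep -E is used in place of egrep since the two accept the same patterns.

```shell
# Build a hypothetical five-line test file.
tmp=$(mktemp)
printf '%s\n' 'alpha' 'Root user' 'beta' 'root:x:0:0' 'gamma' > "$tmp"

numbered=$(grep -En 'root' "$tmp")    # -n: prefix the line number
count=$(grep -Eci 'root' "$tmp")      # -c with -i: count lines, ignoring case
inverted=$(grep -Eiv 'root' "$tmp")   # -v: lines that do NOT match
only=$(grep -Eio 'root' "$tmp")       # -o: just the matched text
after=$(grep -E -A 1 'root' "$tmp")   # -A 1: the match plus one line after it
word=$(grep -Ew 'root' "$tmp")        # -w: "root" as a whole word
listed=$(grep -El 'root' "$tmp")      # -l: only the file name

# -q prints nothing; the exit status says whether a match was found.
if grep -Eq 'gamma' "$tmp"; then found=yes; else found=no; fi

rm -f "$tmp"
printf '%s\n' "$numbered" "$count" "$inverted" "$only" "$after" "$word" "$found"
```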
2. sed stream editor
Syntax:
Syntax 1: sed -option 'address + command' file name
Options:
-n: quiet mode. Suppresses the automatic printing of every line, so only lines printed explicitly (for example with p) appear.
-e: adds an editing command. Repeat -e to apply several edits in one run.
-i: modifies the file in place instead of writing the result to standard output.
-r: extended mode. Extended regular expressions can be used.
-f: reads the editing commands from the named script file.
Commands:
a: append. The text that follows a appears on a new line after the current line.
c: change (replace the whole line).
d: delete.
i: insert. The text that follows i appears on a new line before the current line.
p: print.
s: substitute. Replaces text directly and is usually combined with a regular expression. For example, 1,20s/old/new/g
* Special note on commands:
Use {command1;command2;command3} to apply several commands to the same address.
Syntax 2: sed -r 's/regular expression/replacement/g' file name (s is the substitute command; the trailing g is the global option)
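Each command can be tried on a three-line input. The following is a sketch assuming GNU sed: the one-line forms of a, i, and c (text on the same line as the command) are a GNU convenience, and the input lines are invented for the example.

```shell
input=$(printf '%s\n' 'one' 'two' 'three')

deleted=$(printf '%s\n' "$input" | sed '2d')          # d: drop line 2
printed=$(printf '%s\n' "$input" | sed -n '2p')       # -n with p: print only line 2
replaced=$(printf '%s\n' "$input" | sed 's/o/0/g')    # s: replace every "o" with "0"
appended=$(printf '%s\n' "$input" | sed '1a added')   # a: new line after line 1
inserted=$(printf '%s\n' "$input" | sed '1i added')   # i: new line before line 1
changed=$(printf '%s\n' "$input" | sed '2c changed')  # c: replace line 2 entirely

# Several edits in one run with repeated -e
multi=$(printf '%s\n' "$input" | sed -e 's/one/1/' -e 's/two/2/')

# Several commands on the same address with { ... }
grouped=$(printf '%s\n' "$input" | sed -n '2{s/two/2/;p;}')

printf '%s\n' "$deleted" "$printed" "$replaced" "$multi" "$grouped"
```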
Two methods of addressing:
① Numeric addressing (select lines by line number)
Line numbers are decimal.
1: a single line
1,3: the range from the first line to the third line
2,+4: line 2 and the 4 lines that follow it
4,~3: from line 4 up to the next line whose number is a multiple of 3
2~3: every third line, starting from line 2
$: the last line
1!: every line except the first
[Example] sed -n '1p' /etc/passwd
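To see which lines each numeric address selects, the addresses can be applied to the numbers 1 to 10. This is a sketch assuming GNU sed (the +N, ,~N, and first~step forms are GNU extensions); pick is a hypothetical helper defined only for this example.

```shell
nums=$(seq 1 10)

# pick ADDRESS_AND_COMMAND: run sed -n on the numbers 1..10 and
# join the selected lines with spaces for easy reading.
pick() { printf '%s\n' "$nums" | sed -n "$1" | paste -s -d ' ' -; }

single=$(pick '1p')       # line 1 only
range=$(pick '1,3p')      # lines 1 through 3
plus=$(pick '2,+4p')      # line 2 and the 4 lines after it
multiple=$(pick '4,~3p')  # line 4 up to the next multiple of 3
step=$(pick '2~3p')       # every third line starting at line 2
last=$(pick '$p')         # the last line
not_first=$(pick '1!p')   # everything except line 1

printf '%s\n' "$single" "$range" "$plus" "$multiple" "$step" "$last" "$not_first"
```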
② Regular expression addressing
The regular expression must be wrapped in slashes: /regex/
To use extended regular expression syntax, either pass the -r option or escape the special characters.
The replacement can refer to sub-patterns of the regular expression, that is, the parts in parentheses (). \1 and \2 stand for the first and second sub-pattern.
[Example] sed -r 's/(.)(.)/\2\1/' file1 swaps the first and second characters of each line.
* Global option (g): append g to replace every match in a line instead of only the first.
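Regular expression addresses, sub-patterns, and the g option can be sketched as follows, assuming GNU sed for the -r option. The input lines are made up and only resemble /etc/passwd entries.

```shell
data=$(printf '%s\n' 'root:x:0' 'bin:x:1' '# comment')

# /regex/ as an address: delete every line that starts with "#".
no_comments=$(printf '%s\n' "$data" | sed '/^#/d')

# Sub-patterns with -r: swap the first two colon-separated fields.
swapped=$(echo 'root:x:0' | sed -r 's/^([^:]+):([^:]+)/\2:\1/')

# Same idea on single characters: swap the first two characters.
chars=$(echo 'abcd' | sed -r 's/(.)(.)/\2\1/')

# Without -r the parentheses must be escaped to act as a group.
escaped=$(echo 'abcd' | sed 's/\(.\)\(.\)/\2\1/')

# Without g only the first match in the line is replaced; with g, all of them.
first_only=$(echo 'a-a-a' | sed 's/a/X/')
all=$(echo 'a-a-a' | sed 's/a/X/g')

printf '%s\n' "$no_comments" "$swapped" "$chars" "$escaped" "$first_only" "$all"
```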
3. awk text analysis tool
awk combines commands, regular expressions (which must be enclosed in slashes, /regex/), and comparison and relational operations.
Use the -F option to define the field separator.
Each line of the input is split at the separator into columns, which are referenced in order as $1, $2, $3, and so on; $0 is the whole line. The NF variable holds the number of fields in the current record.
Syntax
awk -option 'condition {command variable1, variable2, variable3}' file name
Options
-F: defines the field separator. The default separator is a run of consecutive spaces or tabs.
-v: defines a variable and assigns it a value. Shell variables can also be passed in with the method shown in Example 2.
awk variables
NR: the number of the current record (counted across all input files)
FNR: the number of the current record within the current file only
FS: the input field separator. The default is a run of consecutive spaces or tabs. Several characters can serve as separators at once, for example -F '[:/]'
OFS: the output field separator. The default is a single space.
# awk -F: 'BEGIN{OFS="===="} {print $1, $2}' /etc/passwd
root====x
NF: the number of fields in the current record
ORS: the output record separator. The default is a newline.
# awk -F: 'BEGIN{ORS="===="} {print $1, $2}' /etc/passwd
root x====bin x====
FILENAME: the name of the current input file
[Example 1] Using awk variables
# awk '{print NR,FNR,$1}' file1 file2
1 1 aaaaa
2 2 bbbbb
3 3 ccccc
4 1 dddddd
5 2 eeeeee
6 3 ffffff
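The remaining built-in variables can be checked the same way. The sketch below uses a single made-up line in /etc/passwd format rather than the real file, so the results do not depend on the machine it runs on.

```shell
line='root:x:0:0:root:/root:/bin/bash'

nf=$(echo "$line" | awk -F: '{print NF}')      # number of fields
last=$(echo "$line" | awk -F: '{print $NF}')   # $NF is the last field

# A bracket expression as separator: split on ":" or "/".
multi=$(echo 'a:b/c' | awk -F'[:/]' '{print $1, $2, $3}')

# OFS is placed wherever print has a comma between its arguments.
ofs=$(echo "$line" | awk -F: 'BEGIN{OFS="-"} {print $1, $3}')

# ORS replaces the newline that print adds after each record.
ors=$(printf '%s\n' 'a' 'b' | awk 'BEGIN{ORS=";"} {print $1}')

# FILENAME needs a real file, so use a temporary one.
tmp=$(mktemp)
echo 'hello' > "$tmp"
name=$(awk '{print FILENAME}' "$tmp")
rm -f "$tmp"

printf '%s\n' "$nf" "$last" "$multi" "$ofs" "$ors"
```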
[Example 2] Referencing shell variables
# a=root
# awk -v var=$a -F: '$1 == var {print $0}' /etc/passwd
Alternatively, split the awk program apart at the quotes so that the shell variable is exposed to the shell:
# awk -F: '$1 == "'$a'" {print $0}' /etc/passwd
# a=NF
# awk -F: '{print $'$a'}' /etc/passwd
Operators (a field can be referenced directly in an operation)
= += -= /= *= : assignment
&& || ! : logical AND, logical OR, logical NOT
~ !~ : matches, or does not match, a regular expression. The regular expression must be enclosed in slashes: /regex/
< <= > >= != == : comparison. When comparing against a string, the string must be enclosed in double quotation marks.
$ : field reference. A field reference requires $; a variable is referenced by its name alone.
+ - * / % ++ -- : arithmetic operators
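A few of these operators in use, as a minimal sketch. The four records are invented sample data in /etc/passwd style, and q is a hypothetical helper that runs one awk program over them.

```shell
passwd=$(printf '%s\n' 'root:x:0:0' 'bin:x:1:1' 'alice:x:1000:1000' 'bob:x:1001:1001')

# q PROGRAM: run an awk program over the sample records with ":" as separator.
q() { printf '%s\n' "$passwd" | awk -F: "$1"; }

exact=$(q '$1 == "root" {print $3}')      # == against a quoted string
match=$(q '$1 ~ /^b/ {print $1}')         # ~  : name starts with "b"
nomatch=$(q '$1 !~ /^b/ {print $1}')      # !~ : name does not start with "b"
both=$(q '$3 >= 1000 && $1 != "bob" {print $1}')   # comparison plus logical AND
sum=$(q '{total += $3} END {print total}')         # += accumulates field 3
count=$(q '$3 < 1000 {n++} END {print n}')         # ++ counts matching records

printf '%s\n' "$exact" "$match" "$nomatch" "$both" "$sum" "$count"
```

Writing = instead of == in a condition assigns to the field rather than comparing it, which is a common source of surprising output.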
Escape sequences
\\ : a literal backslash
\$ : a literal $
\t : tab
\b : backspace
\r : carriage return
\n : newline
\c : suppress the line feed (as with echo -e; not a standard awk escape)
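The most common escape sequences can be seen in awk string literals and patterns. This is only a sketch with made-up strings; it covers \t, \n, \\, and \$ and leaves out the less portable ones.

```shell
# \t inside a string literal produces a tab between the two fields.
tabbed=$(awk 'BEGIN { printf "a\tb\n" }')

# \n starts a new line in the middle of the output.
two_lines=$(awk 'BEGIN { printf "x\ny\n" }')

# \\ produces a single literal backslash.
backslash=$(awk 'BEGIN { print "c:\\dir" }')

# \$ inside a /regex/ matches a literal dollar sign instead of end of line.
dollar=$(echo 'cost $5' | awk '/\$5/ {print "matched"}')

printf '%s\n' "$tabbed" "$two_lines" "$backslash" "$dollar"
```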