Detailed description of regular expressions and three Linux text processing tools


grep, sed, and awk are all text processing tools. Although they overlap, each has its own advantages and disadvantages, and none of them can completely replace another; otherwise there would be no need for three text processing commands.

I. Regular Expressions

1. Types of matching characters

[a-z]: a lowercase letter

[A-Z]: an uppercase letter

[a-zA-Z]: a lowercase or uppercase letter

[0-9]: a digit

[a-zA-Z0-9]: a letter or a digit

.: matches any single character, spaces included (it does not match the newline that ends the line)

[0-9a-fA-F]: a hexadecimal digit

abc|def: abc or def
a(bc|de)f: abcf or adef

\<: start of a word. Words are generally separated by spaces or special characters; a run of consecutive letters, digits, or underscores is treated as one word.

\>: end of a word

[^expression]: any character not in the set. For example, [^a-z] matches every character except lowercase letters, and so on.
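The classes, alternation, and word anchors above can be tried out with a short sketch like the following. The sample lines are made up for illustration, grep -E is used as the equivalent of egrep, and \< and \> are assumed to be available (they are GNU extensions). LC_ALL=C is set so that ranges such as [a-z] follow plain ASCII order.

```shell
export LC_ALL=C

# Made-up sample input, one entry per line.
sample=$(printf '%s\n' 'abcf' 'adef' 'abf' 'cat 42' 'concatenate' 'DEADBEEF')

# a(bc|de)f matches abcf or adef, but not abf.
alt=$(printf '%s\n' "$sample" | grep -E 'a(bc|de)f')

# \<cat\> matches "cat" only as a whole word, so "concatenate" is skipped.
word=$(printf '%s\n' "$sample" | grep -E '\<cat\>')

# [^0-9A-F] matches any character that is not an uppercase hex digit;
# -v inverts the match, keeping lines made up only of such digits.
hex=$(printf '%s\n' "$sample" | grep -Ev '[^0-9A-F]')

# Same idea with [^a-z]: keep lines that contain only lowercase letters.
lower=$(printf '%s\n' "$sample" | grep -Ev '[^a-z]')

printf '%s\n' "$alt" "$word" "$hex" "$lower"
```

The -v option used here is described in the egrep section.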

2. Use the following symbols to control how many times an expression matches.

The expression being quantified must be immediately to the left of the symbol.

expression*: zero or more occurrences

expression+: one or more occurrences

expression?: zero or one occurrence

expression{n}: exactly n occurrences

expression{n,m}: n to m occurrences

expression{n,}: at least n occurrences

[Example] [a-z]* matches zero or more lowercase letters.

3. Control matching characters at the beginning and end
^expression: matches at the start of the line

expression$: matches at the end of the line
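As a rough illustration of the quantifiers and the two line anchors together, the sketch below runs several patterns over a made-up list of strings. The input and variable names are invented for the example, and grep -E stands in for egrep.

```shell
export LC_ALL=C

# Made-up sample input.
sample=$(printf '%s\n' 'ac' 'abc' 'abbc' 'abbbc' 'xabc')

# Small helper (hypothetical name) that filters the sample with one pattern.
match() { printf '%s\n' "$sample" | grep -E "$1"; }

star=$(match '^ab*c$')       # zero or more b
plus=$(match '^ab+c$')       # one or more b
opt=$(match '^ab?c$')        # zero or one b
exact=$(match '^ab{2}c$')    # exactly two b
range=$(match '^ab{2,3}c$')  # two to three b
atleast=$(match '^ab{1,}c$') # at least one b
tail=$(match 'abc$')         # anchored at the end only

printf '%s\n' "$star"
```

Because ^ and $ pin both ends of the line, xabc is rejected by every pattern except the last one, which only anchors the end.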

II. Three Linux text processing tools

1. egrep filtering Tool

egrep is the extended version of grep (equivalent to grep -E) and supports extended regular expressions.

Syntax:

egrep -option 'regular expression' filename

Options:

-n: displays the line number.
-o: displays only the matching part of each line.
-q: quiet mode; prints nothing. Check $? to determine whether the command succeeded, that is, whether anything matched.
-l: prints only the names of files that contain a match; files without a match are not listed. It is usually combined with -r, for example: grep -rl 'root' /etc
-A n: prints each matching line together with the n lines after it.
-B n: prints each matching line together with the n lines before it.
-C n: prints each matching line together with the n lines before and after it.
--color: highlights the matching text.
-c: prints the number of matching lines.
-i: ignores case.
-v: inverts the match, printing the lines that do not match.
-w: matches whole words only.
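The options above can be exercised on a small temporary file, as in this sketch. The file contents are invented (loosely modeled on /etc/passwd), the variable names are only for illustration, and grep -E is used in place of egrep.

```shell
export LC_ALL=C

# Temporary file with made-up content.
tmp=$(mktemp)
printf '%s\n' 'root:x:0:0' 'daemon:x:1:1' 'Root backup' 'nobody:x:99:99' > "$tmp"

numbered=$(grep -En 'root' "$tmp")      # -n: prefix the line number
count=$(grep -Eci 'root' "$tmp")        # -c with -i: count lines, ignoring case
only=$(grep -Eo '^[a-z]+' "$tmp")       # -o: print only the matched text
inverted=$(grep -Ev ':' "$tmp")         # -v: lines that do not match
after=$(grep -E -A 1 'daemon' "$tmp")   # -A 1: the match plus one line after it
word=$(grep -Ew 'Root' "$tmp")          # -w: whole-word match
listed=$(grep -El 'root' "$tmp")        # -l: file name only

# -q prints nothing; the exit status tells whether anything matched.
if grep -Eq 'nobody' "$tmp"; then found=yes; else found=no; fi

printf '%s\n' "$numbered" "$count" "$found"
rm -f "$tmp"
```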

2. sed stream Editor

Syntax:

Syntax 1: sed -option 'line address + command' filename

Options:

-n: quiet mode; suppresses the automatic printing of each line.
-e: adds an editing command; repeat -e to apply several edits in one invocation.
-i: modifies the file in place instead of writing the result to the output.
-r: extended mode; enables extended regular expressions.
-f: reads the editing commands from the named script file.

Command:

a: append
c: change
d: delete
i: insert. i can be followed by a string, which is placed on a new line before the current line.
p: print
s: substitute. This command is usually combined with a regular expression, for example: 1,20s/old/new/g

* Special note on commands:

Use {command1;command2;command3} to apply multiple commands to the same address.

Syntax 2: sed -r 's/regular expression/replacement/g' filename (s is the substitute command; the trailing g is the global flag)

Two methods for locating lines:

① Numeric positioning (by line number)

Decimal numbers:
1: a single line
1,3: the range from the first line to the third line
2,+4: line 2 and the 4 lines that follow it
4,~3: from line 4 up to the next line whose number is a multiple of 3
2~3: every third line, starting from line 2
$: the last line
1!: every line except the first

The +n, ~n, and first~step forms are GNU sed extensions.

[Example] sed -n '1p' /etc/passwd
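The numeric addresses and the basic commands can be combined as in the sketch below. It assumes GNU sed (for the +n, ~n, and first~step addresses) and uses a made-up six-line input instead of /etc/passwd.

```shell
# Made-up input: six numbered words, one per line.
input=$(printf '%s\n' one two three four five six)

# Small helper (hypothetical name) that feeds the input to sed.
run() { printf '%s\n' "$input" | sed "$@"; }

first=$(run -n '1p')      # a single line
range=$(run -n '2,+2p')   # line 2 and the 2 lines after it
step=$(run -n '2~3p')     # every third line, starting at line 2
upto=$(run -n '4,~3p')    # line 4 up to the next multiple of 3
last=$(run -n '$p')       # the last line
rest=$(run -n '1!p')      # everything except line 1
kept=$(run '2,5d')        # delete lines 2 through 5

# Braces group several commands under one address.
multi=$(run -n '2{s/two/TWO/;p}')

printf '%s\n' "$range"
```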

② Regular Expression Positioning

Regular expressions must be wrapped in slashes: /regex/

To use extended regular expression syntax, either pass the -r option or escape the metacharacters.

The replacement can refer to subpatterns of the regular expression, that is, the parts in parentheses (). \1 and \2 stand for the first and second subpattern.

[Example] sed -r 's/(.)(.)/\2\1/' file1 swaps the first part and the second part (here, the first two characters of each line).

* Global flag: append g to replace every match on a line instead of only the first.
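A short sketch of regular expression addresses, subpattern references, and the g flag follows. The input lines are invented, and GNU sed is assumed for -r.

```shell
export LC_ALL=C

# Made-up, passwd-like input.
input=$(printf '%s\n' 'root:x:0' 'bin:x:1' 'daemon:x:2')

# /regex/ used as an address: print only lines starting with "root".
matched=$(printf '%s\n' "$input" | sed -n '/^root/p')

# With -r, ( ) create subpatterns; \2:\1 swaps the first two fields.
swapped=$(printf '%s\n' "$input" | sed -r 's/^([^:]+):([^:]+)/\2:\1/')

# Without -r the parentheses must be escaped instead.
chars=$(printf '%s\n' "$input" | sed 's/\(.\)\(.\)/\2\1/')

# Without g only the first match on each line is replaced; with g, all of them.
firstonly=$(printf '%s\n' 'a:b:c' | sed 's/:/-/')
global=$(printf '%s\n' 'a:b:c' | sed 's/:/-/g')

printf '%s\n' "$swapped"
```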

3. awk Text Analysis Tool

awk combines commands, regular expressions (which must be enclosed in //), comparisons, and relational operations.

Use the -F option to define the field delimiter.

Each line of the input is split by the delimiter into fields, referenced in order as $1, $2, $3, and so on. The NF variable holds the number of fields in the current record.

Syntax

awk -option 'condition {command variable1, variable2, variable3}' filename

Options

-F: defines the field separator. The default delimiter is a run of spaces or tabs.
-v: defines a variable and assigns it a value. This is also one way to pass in a shell variable.

awk variables

NR: number of the current record, counted across all input files
FNR: number of the current record within the current file only
FS: input field separator. The default is a run of spaces or tabs. Several characters can serve as separators, for example -F '[:/]'
OFS: output field separator. The default is a space.
# awk -F: 'BEGIN {OFS="===="} {print $1, $2}' /etc/passwd
root====x
NF: number of fields in the line currently being read
ORS: output record separator. The default is a line feed.
# awk -F: 'BEGIN {ORS="===="} {print $1, $2}' /etc/passwd
root x====bin x====
FILENAME: name of the current file

[Example 1] Using the awk variables

# awk '{print NR,FNR,$1}' file1 file2
1 1 aaaaa
2 2 bbbbb
3 3 ccccc
4 1 dddddd
5 2 eeeeee
6 3 ffffff
#
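The remaining built-in variables can be sketched in the same way. The input line below is made up to resemble a passwd entry, and the shell variable names are only illustrative.

```shell
# Made-up input line.
line='root:x:0:0:root:/root:/bin/bash'

# NF is the number of fields; $NF is therefore the last field.
nf=$(printf '%s\n' "$line" | awk -F: '{print NF}')
lastfield=$(printf '%s\n' "$line" | awk -F: '{print $NF}')

# A bracket expression as separator splits on either ":" or "/".
shellname=$(printf '%s\n' "$line" | awk -F '[:/]' '{print $NF}')

# OFS is placed between the comma-separated arguments of print.
joined=$(printf '%s\n' "$line" | awk -F: 'BEGIN {OFS="-"} {print $1, $3}')

# ORS replaces the newline that print normally appends.
records=$(printf '%s\n' a b | awk 'BEGIN {ORS=";"} {print $1}')

printf '%s\n' "$nf" "$lastfield" "$shellname" "$joined" "$records"
```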

[Example 2] Referencing shell variables

# a=root
# awk -v var=$a -F: '$1 == var {print $0}' /etc/passwd

Or break the command's quoting apart to expose the shell variable:

# awk -F: '$1 == "'$a'" {print $0}' /etc/passwd
# a=NF
# awk -F: '{print $'$a'}' /etc/passwd
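Both ways of passing a shell variable into awk can be checked without touching /etc/passwd, as in this sketch. The data and variable names are invented; the shell variables are double-quoted here as a precaution against word splitting.

```shell
# Made-up, passwd-like input.
data=$(printf '%s\n' 'root:x:0' 'bin:x:1')

user=root
col=3

# -v copies the shell value into an awk variable before the program runs.
byvar=$(printf '%s\n' "$data" | awk -v var="$user" -F: '$1 == var {print $0}')

# An awk variable can also select the field: $n is field number n.
field=$(printf '%s\n' "$data" | awk -v n="$col" -F: '{print $n}')

# Closing and reopening the single quotes splices the shell value
# directly into the awk program text.
spliced=$(printf '%s\n' "$data" | awk -F: '$1 == "'"$user"'" {print $3}')

printf '%s\n' "$byvar" "$field" "$spliced"
```

The -v form is usually the safer choice, since the value is treated as data rather than as part of the program.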

Logical operations (a field can be referenced directly in an operation)

= += -= /= *=: assignment

&& || !: logical AND, logical OR, logical NOT

~ !~: matches or does not match a regular expression. The regular expression must be enclosed in slashes: /regex/
< <= > >= != ==: comparison. When comparing against a string, the string must be enclosed in double quotation marks.

$: field reference. $ is required to reference a field; a variable is referenced directly by its name.

+ - * / % ++ --: arithmetic operators
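The operators above can be seen at work in the following sketch, which filters a made-up user list. The field layout (name, password, UID, shell) is an assumption chosen for the example.

```shell
# Made-up input: name:password:uid:shell
data=$(printf '%s\n' \
  'root:x:0:/bin/bash' \
  'bin:x:1:/sbin/nologin' \
  'alice:x:1000:/bin/bash' \
  'bob:x:1001:/bin/zsh')

# Small helper (hypothetical name) that runs one awk program over the data.
run() { printf '%s\n' "$data" | awk -F: "$1"; }

# ~ and !~ test a field against a /regex/.
bash_users=$(run '$4 ~ /bash$/ {print $1}')
other_users=$(run '$4 !~ /bash$/ {print $1}')

# Comparisons combined with &&; the string is in double quotes.
regular=$(run '$3 >= 1000 && $4 != "/sbin/nologin" {print $1}')

# || combined with the arithmetic % operator.
either=$(run '$1 == "root" || $3 % 2 == 1 {print $1}')

# += accumulates a sum that is printed after the last line.
total=$(run '{sum += $3} END {print sum}')

printf '%s\n' "$regular" "$total"
```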

Escape Sequence

\\: the backslash itself
\$: a literal $
\t: tab
\b: backspace
\r: carriage return
\n: line feed
\c: suppress the line feed
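A minimal sketch of the most common escape sequences inside an awk string is shown below. Only \t, \n, and \\ are used, since their behavior is the same across awk implementations; the variable name is illustrative.

```shell
# \t inserts a tab, \\ a literal backslash, \n ends the line.
out=$(awk 'BEGIN { printf "name\tuid\\gid\n" }')

printf '%s\n' "$out"
```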

That concludes this introduction to regular expressions and the three Linux text processing tools. I hope it helps.
