Linux Regular Expressions

Last Update:2013-12-30 Source: Internet

Author: User

Tags file url printable characters egrep

I. Linux text search commands

Before turning to regular expressions on Linux, here are three common commands for searching text files:

1. grep: The original text-matching program. It matches text using the Basic Regular Expressions (BRE) defined by POSIX.

2. egrep: Extended grep. It matches text using Extended Regular Expressions (ERE).

3. fgrep: Fast grep. This variant matches fixed strings rather than regular expressions. Historically, it was the only variant that could match multiple strings in parallel.
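As a rough illustration of how the three variants treat the same pattern, the sketch below runs a+b through each matching mode. The sample file and its contents are made up for this example, and grep -E and grep -F are used in place of the egrep and fgrep commands.

```shell
# Hypothetical sample data, created in a temporary directory.
dir=$(mktemp -d)
printf '%s\n' 'a+b' 'aab' 'ab' > "$dir/sample.txt"

# BRE: + is an ordinary character, so only the literal text a+b matches.
bre=$(grep 'a+b' "$dir/sample.txt")

# ERE: + means "one or more of the preceding item", so aab and ab match.
ere=$(grep -E 'a+b' "$dir/sample.txt")

# Fixed string: no character is special, so only the literal text a+b matches.
fixed=$(grep -F 'a+b' "$dir/sample.txt")

printf 'BRE:\n%s\n' "$bre"
printf 'ERE:\n%s\n' "$ere"
printf 'fixed:\n%s\n' "$fixed"

rm -rf "$dir"
```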

The following is a brief introduction to the grep command:

Syntax:

grep [options ...] pattern-spec [files ...]

Purpose:

Print the lines of text that match one or more patterns.

Options:

-E: Match using extended regular expressions. grep -E replaces the traditional egrep command.

-F: Match using fixed strings. grep -F replaces the traditional fgrep command.

-e pat-list: Normally the first non-option argument is taken as the pattern to match. Several patterns can be supplied at once by enclosing them in single quotes and separating them with newlines.

When a pattern starts with a minus sign, grep could mistake it for an option. The -e option states that its argument is a pattern, even if it starts with a minus sign.

-f pat-file: Read the patterns to match from the file pat-file.

-i: Ignore differences between uppercase and lowercase when matching.

-l: List the names of the files that contain a match instead of printing the matching lines.

-q: Quiet. Matching lines are not written to standard output; grep only reports through its exit status whether a match was found (success) or not (failure).

-s: Suppress error messages. Often used together with -q.

-v: Print the lines that do not match the pattern.

Note: grep can search several files at once. When more than one file is given, each output line is prefixed with the name of the file it came from, followed by a colon.

Multiple -e and -f options can be combined to build up a list of patterns to search for.
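The options above can be tried out with a short sketch like the following. The file names a.txt, b.txt, and pats, and their contents, are assumptions made up for this illustration.

```shell
# Hypothetical sample files, created in a temporary directory.
dir=$(mktemp -d)
printf '%s\n' 'Apple pie' '-dash' 'banana' > "$dir/a.txt"
printf '%s\n' 'cherry' 'apple tart' > "$dir/b.txt"
printf '%s\n' 'cherry' 'tart' > "$dir/pats"

# -i: ignore case, so the pattern apple matches "Apple pie".
ci=$(grep -i 'apple' "$dir/a.txt")

# -e: the pattern itself starts with a minus sign.
dash=$(grep -e '-dash' "$dir/a.txt")

# Several -e options build up a list of patterns.
both=$(grep -e 'banana' -e 'pie' "$dir/a.txt")

# -f: read the patterns from a file, one per line.
fromfile=$(grep -f "$dir/pats" "$dir/b.txt")

# -v: print the lines that do NOT match.
inverted=$(grep -v 'a' "$dir/b.txt")

# -l: print only the names of files that contain a match.
names=$(cd "$dir" && grep -l 'apple' a.txt b.txt)

# With several files, each output line is prefixed with "name:".
multi=$(cd "$dir" && grep -i 'apple' a.txt b.txt)

# -q: no output at all; only the exit status says whether a match exists.
if grep -q 'banana' "$dir/a.txt"; then found=yes; else found=no; fi

printf '%s\n' "$multi"
rm -rf "$dir"
```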

II. Regular expressions

1. Composition of a regular expression

(1) Ordinary characters: characters with no special meaning.

(2) Special characters (metacharacters): characters that have a special meaning in regular expressions.

2. Common metacharacters in regular expressions

(1) Metacharacters shared by POSIX BRE and ERE:

\: Usually turns off the special meaning of the character that follows. In some cases it turns on a special meaning instead, as in \(...\) and \{...\}.

.: Matches any single character (except NUL).

*: Matches zero or more occurrences of the single character that precedes it. For example, since . stands for any character, .* matches any number of any characters.

^: Anchors the regular expression that follows to the beginning of the line or string. In BRE it is special only at the beginning of a regular expression; in ERE it is special anywhere.

$: Anchors the preceding regular expression to the end of the line or string. In BRE it is special only at the end of a regular expression; in ERE it is special anywhere.

[]: Matches any one of the characters inside the square brackets. A hyphen (-) indicates a range of consecutive characters. If ^ is the first character inside the brackets, the expression matches any one character that is not in the list.
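The shared metacharacters can be checked against a small test file, as in this sketch. The file meta.txt and its lines are invented for the illustration; LC_ALL=C is set so that the range a-z behaves predictably.

```shell
# Hypothetical sample data, created in a temporary directory.
dir=$(mktemp -d)
printf '%s\n' 'a.c' 'abc' 'ac' 'a-c' 'abbc' > "$dir/meta.txt"

# . matches any single character: a.c, abc and a-c contain "a?c".
dot_count=$(grep -c 'a.c' "$dir/meta.txt")

# \ turns off the special meaning of the dot: only the literal a.c matches.
escaped=$(grep 'a\.c' "$dir/meta.txt")

# * allows zero or more b; ^ and $ anchor the match to the whole line.
star=$(grep '^ab*c$' "$dir/meta.txt")

# [^.] matches any one character except a dot.
negated=$(grep 'a[^.]c' "$dir/meta.txt")

# [a-z] matches one lowercase letter (range, evaluated in the C locale).
range=$(LC_ALL=C grep 'a[a-z]c' "$dir/meta.txt")

printf '%s\n' "$dot_count" "$escaped" "$star" "$negated" "$range"
rm -rf "$dir"
```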

(2) Metacharacters in POSIX BRE only:

\{n,m\}: Interval expression. It states how many times the single character before it may repeat: \{n\} means exactly n times, and \{n,m\} means n to m times.

\( \): Holding space. The pattern enclosed between \( and \) is saved; up to nine separate subpatterns can be saved in a single pattern. For example, \(ab\).*\1 matches two occurrences of ab with any number of characters in between.

\n: Repeats, at this point in the pattern, the text matched by the nth subpattern enclosed in \( and \).
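A minimal sketch of the BRE-only constructs, using a made-up file bre.txt, might look like this:

```shell
# Hypothetical sample data, created in a temporary directory.
dir=$(mktemp -d)
printf '%s\n' 'abxxab' 'abxx' 'aaa' 'a' 'aaaa' > "$dir/bre.txt"

# \(ab\) saves the subpattern, \1 replays it: ab ... ab on the same line.
backref=$(grep '\(ab\).*\1' "$dir/bre.txt")

# \{2,3\}: the preceding a must appear two to three times.
interval=$(grep '^a\{2,3\}$' "$dir/bre.txt")

# \{4\}: the preceding a must appear exactly four times.
exact=$(grep '^a\{4\}$' "$dir/bre.txt")

printf '%s\n' "$backref" "$interval" "$exact"
rm -rf "$dir"
```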

(3) Metacharacters in POSIX ERE only:

{n,m}: Same as \{n,m\} in BRE, but written without the backslashes.

+: Matches one or more occurrences of the preceding regular expression.

?: Matches zero or one occurrence of the preceding regular expression.

|: Matches the regular expression either before or after the | symbol.

(): Groups the regular expressions enclosed in the parentheses.
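The ERE-only metacharacters can be exercised with grep -E as sketched below. The file ere.txt and its words are assumptions chosen for the illustration.

```shell
# Hypothetical sample data, created in a temporary directory.
dir=$(mktemp -d)
printf '%s\n' 'color' 'colour' 'colouur' 'cat' 'dog' 'abab' > "$dir/ere.txt"

# ?: the u is optional (zero or one occurrence).
optional=$(grep -E '^colou?r$' "$dir/ere.txt")

# +: at least one u is required.
plus=$(grep -E '^colou+r$' "$dir/ere.txt")

# | inside ( ): either alternative may match.
alternation=$(grep -E '^(cat|dog)$' "$dir/ere.txt")

# {2} applied to a group: the group ab must appear exactly twice.
grouped=$(grep -E '^(ab){2}$' "$dir/ere.txt")

printf '%s\n' "$optional" "$plus" "$alternation" "$grouped"
rm -rf "$dir"
```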

(4) Bracket expressions ([])

4.1. Character classes [: :]

The following character classes are supported:

[:alnum:]: alphanumeric characters	[:digit:]: digits	[:punct:]: punctuation characters
[:alpha:]: letters	[:graph:]: visible (non-space) characters	[:space:]: whitespace characters
[:blank:]: space and tab	[:lower:]: lowercase letters	[:upper:]: uppercase letters
[:cntrl:]: control characters	[:print:]: printable characters	[:xdigit:]: hexadecimal digits

4.2. Collating symbols

A sequence of several characters is treated as a single unit. For example, [.ch.] treats ch as one collating element.

4.3. Equivalence classes

Several characters are treated as equivalent. For example, in a French locale [=e=] can match e and the accented letters equivalent to it, which are not listed here.

Note: Each of these three constructs has its own bracket delimiters and must additionally appear inside the square brackets of a bracket expression.

Examples: [[:alpha:]!] matches any single letter or an exclamation mark.

[[.ch.]] matches the collating element ch, but not the single letter c or h.
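Character classes can be tried with a sketch such as the one below. The file classes.txt is made up for the example, and LC_ALL=C is set so the classes cover only ASCII. Collating symbols and equivalence classes depend on the locale and on the grep implementation, so they are left out here.

```shell
# Hypothetical sample data, created in a temporary directory.
dir=$(mktemp -d)
printf '%s\n' 'hello!' '1234' 'abc123' 'Hello World' > "$dir/classes.txt"

# Lines made up only of letters and exclamation marks.
letters_bang=$(LC_ALL=C grep '^[[:alpha:]!]*$' "$dir/classes.txt")

# Lines made up only of digits.
digits=$(LC_ALL=C grep '^[[:digit:]]*$' "$dir/classes.txt")

# Count of lines made up only of letters and digits (1234 and abc123).
alnum_count=$(LC_ALL=C grep -c '^[[:alnum:]]*$' "$dir/classes.txt")

# Lines containing at least one uppercase letter.
upper=$(LC_ALL=C grep '[[:upper:]]' "$dir/classes.txt")

printf '%s\n' "$letters_bang" "$digits" "$alnum_count" "$upper"
rm -rf "$dir"
```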

3. Simple regular expression examples

china: matches lines that contain the five characters china anywhere in the line

^china: matches lines that begin with china

china$: matches lines that end with china

^china$: matches lines that consist of exactly the five characters china

[Cc]hina: matches lines that contain China or china

Ch.na: matches lines that contain the two characters Ch, followed by any single character, followed by the two characters na

Ch.*na: matches lines that contain Ch, followed by zero or more characters, followed by na
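The patterns listed above can be compared side by side by counting matching lines with grep -c. This is only a sketch; the file words.txt and its contents are assumptions made for the illustration.

```shell
# Hypothetical sample data, created in a temporary directory.
dir=$(mktemp -d)
printf '%s\n' 'china' 'China' 'I love china' 'china is big' \
    'Chxna' 'Ch123na' 'chna' > "$dir/words.txt"

# grep -c prints the number of matching lines for each pattern.
anywhere=$(grep -c 'china' "$dir/words.txt")      # china anywhere in the line
at_start=$(grep -c '^china' "$dir/words.txt")     # line begins with china
at_end=$(grep -c 'china$' "$dir/words.txt")       # line ends with china
whole=$(grep -c '^china$' "$dir/words.txt")       # line is exactly china
either=$(grep -c '[Cc]hina' "$dir/words.txt")     # China or china
one_char=$(grep -c 'Ch.na' "$dir/words.txt")      # exactly one character between
any_chars=$(grep -c 'Ch.*na' "$dir/words.txt")    # any number of characters between

printf '%s\n' "$anywhere" "$at_start" "$at_end" "$whole" \
    "$either" "$one_char" "$any_chars"
rm -rf "$dir"
```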

III. Examples

The following examples are used to practice BRE and ERE matching. The source file url.txt contains:

www.baidu.com
http://www.baidu.com
https://www.baidu.com
http://wwwbaiducom
baidu.com
baidu

1. URL matching

Match lines that start with http or https and contain a literal dot (.) somewhere after that prefix. The :// part is not checked explicitly; it is absorbed by .* in the patterns below.

BRE matching:

grep '^https\{0,1\}.*\..*' url.txt

ERE matching:

grep -E '^https?.*\..*' url.txt

The matching result is as follows:

http://www.baidu.com
https://www.baidu.com
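To reproduce this example, a sketch along the following lines recreates url.txt in a temporary directory (the temporary location is an assumption of the sketch) and checks that the BRE and ERE commands select the same lines.

```shell
# Recreate the sample file from the article in a temporary directory.
dir=$(mktemp -d)
printf '%s\n' 'www.baidu.com' 'http://www.baidu.com' 'https://www.baidu.com' \
    'http://wwwbaiducom' 'baidu.com' 'baidu' > "$dir/url.txt"

# BRE: s\{0,1\} makes the s optional; \. is a literal dot.
bre=$(grep '^https\{0,1\}.*\..*' "$dir/url.txt")

# ERE: s? is the shorter spelling of the same optional s.
ere=$(grep -E '^https?.*\..*' "$dir/url.txt")

printf '%s\n' "$bre"
if [ "$bre" = "$ere" ]; then same=yes; else same=no; fi
echo "BRE and ERE agree: $same"

rm -rf "$dir"
```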

2. Email matching

The content of the sample file email.txt is:

hfutwyy@qq.com
aaaa@
aaa@.com
aaa@gmail.com

@baidu.com

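Match lines that start with any number of letters, digits, or underscores, followed by @, followed by any number of letters, digits, or underscores, followed by a literal dot (.).

grep '^[[:alpha:][:digit:]_]*@[[:alpha:][:digit:]_]*\..*' email.txt

Matching result:

hfutwyy@qq.com
aaa@.com
aaa@gmail.com
@baidu.com

Because * also accepts zero occurrences, aaa@.com (nothing between @ and the dot) and @baidu.com (nothing before @) match as well.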
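Since * accepts zero occurrences, an address with nothing before the @ or nothing between the @ and the dot can pass this kind of pattern. A stricter variant, which is not part of the original example and is offered only as a sketch, requires at least one character in each position. It is shown in ERE with + and in BRE with \{1,\}; the temporary copy of email.txt is an assumption of the sketch.

```shell
# Recreate the sample file from the article in a temporary directory.
dir=$(mktemp -d)
printf '%s\n' 'hfutwyy@qq.com' 'aaaa@' 'aaa@.com' 'aaa@gmail.com' '' \
    '@baidu.com' > "$dir/email.txt"

# ERE: + demands at least one character before @, after @, and after the dot.
strict_ere=$(grep -E '^[[:alnum:]_]+@[[:alnum:]_]+\..+' "$dir/email.txt")

# BRE: \{1,\} is the equivalent of + in basic regular expressions.
strict_bre=$(grep '^[[:alnum:]_]\{1,\}@[[:alnum:]_]\{1,\}\..\{1,\}' "$dir/email.txt")

printf '%s\n' "$strict_ere"
rm -rf "$dir"
```

Neither pattern is a full email validator; both only tighten the quantifiers of the example above.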

That is all for now; more to follow later.

This article is an English version of an article originally written in Chinese on aliyun.com.
