Linux Regular Expressions

Last Update:2018-08-04 Source: Internet

Author: User


Chapter 1: What is a regular expression

Regular expressions are a set of rules and methods defined for processing large amounts of text and strings.
With the help of these special symbols, a system administrator can quickly filter, replace, or output the strings they need. Linux regular expressions generally process text one line at a time.

In short:

A set of rules and methods defined for processing large amounts of text and strings
Text is processed one line at a time

A regular expression is a pattern that describes a set of strings. Like an arithmetic expression, it is built from smaller expressions combined with various operators.
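
As a minimal sketch of line-at-a-time filtering, assuming a made-up three-line log (the log text and the variable names are illustrative only, not from any real system):

```shell
# Hypothetical sample log; each line is tested against the pattern on its own.
log='info: service started
error: disk full
info: service stopped'

# grep prints only the lines that contain a match for the pattern.
errors=$(printf '%s\n' "$log" | grep 'error')
printf '%s\n' "$errors"
```

Only the second line is printed, because it is the only line in which the pattern matches.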

Chapter 2: Why regular expressions are used

Linux operations and maintenance work involves a great deal of log filtering, and regular expressions make that complex work simple.
They are simple and efficient.
Regular expressions are an advanced tool supported by all of the "three musketeers" (grep, sed, and awk).

Chapter 3: Two points that are easy to confuse

Regular expressions are widely used across languages and tools: PHP, Perl, grep, sed, and awk all support them. The * in ls * is a wildcard, not a regular expression.
Here we are learning regular expressions in Linux, where the commands that use them most often are grep (egrep), sed, and awk.
Regular expressions and wildcards are fundamentally different:

Regular expressions are used to find file content, text, and strings. Generally only the three musketeers support them.
Wildcards are used to find file names. Common commands support them.

Chapter 4: Notes on using regular expressions

Linux regular expressions process strings one line at a time.
To make it easy to see which strings a pattern filters out, learn regular expressions together with the grep/egrep commands.

Pay attention to the character set (locale): run export LC_ALL=C. Whatever you do, keep the character set in mind, because ranges such as [a-z] can behave differently in other locales.
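
A small sketch of the character set note, assuming three single-character input lines made up for illustration. With LC_ALL=C the range [a-z] follows plain ASCII order:

```shell
# In the C locale, ranges such as [a-z] follow ASCII order,
# so an uppercase letter is not matched by accident.
export LC_ALL=C

# Hypothetical input: one character per line.
lower=$(printf 'a\nB\nc\n' | grep '^[a-z]$')
printf '%s\n' "$lower"
```

In some other locales the same range may also match uppercase letters, which is why setting the locale first is a safe habit.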

Chapter 5: Classification of regular expressions

The POSIX specification divides regular expressions into two kinds:

Basic regular expressions (BRE, basic regular expression)
Extended regular expressions (ERE, extended regular expression), which add advanced features

5.1 The only difference between BRE and ERE is the set of metacharacters:

BRE (basic regular expression) recognizes only ^ $ . [ ] * as metacharacters; other characters are treated as ordinary characters unless escaped, for example \( \).
ERE (extended regular expression) adds ( ) { } ? + | and so on.
In BRE, the characters ( ) { } are treated as metacharacters only when escaped with a backslash "\"; in ERE, any metacharacter preceded by a backslash is instead treated as an ordinary character.
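
The escaping rule can be checked with grep directly. This is a sketch with made-up input strings; grep -c prints the number of matching lines:

```shell
# BRE (plain grep): braces are metacharacters only when escaped.
bre_interval=$(printf 'aa\n' | grep -c 'a\{2\}')        # interval: two a's in a row
bre_literal=$(printf 'a{2}\n' | grep -c 'a{2}')         # the literal text a{2}

# ERE (grep -E): bare braces are metacharacters; a backslash makes them literal.
ere_interval=$(printf 'aa\n' | grep -E -c 'a{2}')       # interval: two a's in a row
ere_literal=$(printf 'a{2}\n' | grep -E -c 'a\{2\}')    # the literal text a{2}

echo "$bre_interval $bre_literal $ere_interval $ere_literal"
```

Each count is 1: the same symbol switches between "metacharacter" and "ordinary character" depending on the mode and on whether it is escaped.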

Chapter 6: How to tell wildcards from regular expressions

The no-thinking rule: patterns given to the three musketeers (awk, sed, grep, egrep) are regular expressions; everything else is a wildcard.
The simplest way to tell a wildcard from a regular expression:

(1) File and directory names ===> wildcard
(2) File contents (strings, text inside a file) ===> regular expression

Both wildcards and regular expressions use "*", "?", and "[]". In a wildcard, "*" and "?" stand for arbitrary characters by themselves; in a regular expression, "*" and "?" only say how many times the character in front of them repeats.
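
To make the difference concrete, here is a sketch that uses the shell's case statement for wildcard matching and grep for regular expression matching. The file name notes.txt and the text xyz are made-up examples:

```shell
# Wildcard (shell pattern): * by itself stands for any run of characters.
case 'notes.txt' in
  n*) glob_result='match' ;;
  *)  glob_result='no match' ;;
esac

# Regular expression: * only repeats the preceding character (zero or more times),
# so n* also matches a line that contains no n at all (zero repetitions).
regex_count=$(printf 'xyz\n' | grep -c 'n*')

# The regular expression that behaves like the wildcard n* is ^n.*
regex_equiv=$(printf 'notes.txt\nxyz\n' | grep -c '^n.*')

echo "$glob_result $regex_count $regex_equiv"
```

The same characters n* therefore mean "starts with n" as a wildcard but "zero or more n" as a regular expression.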

Chapter 7: Basic regular expressions

7.1 Basic regular expression metacharacters

Character	Description
^	^word matches lines that start with "word"
$	word$ matches lines that end with "word"
^$	Matches an empty line (a line that contains only spaces is not empty)
.	Matches exactly one character of any kind (does not match an empty line)
\	Escape character: strips a special character of its special meaning so that it matches literally; for example, \. matches only a literal dot
*	Repeats the preceding character zero or more times in a row
.*	Matches any number of characters
^.*	Matches from the start of the line through any number of characters; .* is greedy and matches as much as it can
[abc] [0-9] [\.,/]	Bracket expression: matches any one character in the set (a or b or c). [a-z] matches any lowercase letter. The whole bracket stands for a single character; [abc] can also be written [a-c]
[^abc]	Matches any one character that is not a, b, or c. Inside brackets ^ negates the set, which differs from ^ as the start-of-line anchor
a\{n,m\}	Repeats the preceding a from n to m times (with egrep or sed -r, drop the backslashes)
a\{n,\}	Repeats the preceding a at least n times (with egrep or sed -r, drop the backslashes)
a\{n\}	Repeats the preceding a exactly n times (with egrep or sed -r, drop the backslashes)
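
The basic metacharacters can be tried out with plain grep. The following is a sketch over a made-up five-line sample (the second line is empty); each count is the number of matching lines:

```shell
# Hypothetical sample text; the second line is empty.
text='hello world

aaa
a.b
axb'

starts=$(printf '%s\n' "$text" | grep -c '^hello')       # lines starting with hello
ends=$(printf '%s\n' "$text" | grep -c 'world$')         # lines ending with world
empty=$(printf '%s\n' "$text" | grep -c '^$')            # empty lines
any_char=$(printf '%s\n' "$text" | grep -c 'a.b')        # a, any one character, b
literal_dot=$(printf '%s\n' "$text" | grep -c 'a\.b')    # a, a literal dot, b
not_ab=$(printf '%s\n' "$text" | grep -c '^[^ab]')       # first character is neither a nor b
three_a=$(printf '%s\n' "$text" | grep -c 'a\{3\}')      # three a's in a row

echo "$starts $ends $empty $any_char $literal_dot $not_ab $three_a"
```

Note how a.b matches both a.b and axb, while a\.b matches only the line with a real dot.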

Chapter 8: Extended regular expressions (ERE)

Special character	Meaning and examples
+	Repeats the preceding character one or more times; useful for pulling out runs of consecutive characters
?	Repeats the preceding character zero or one time (whereas . matches exactly one character)
|	Alternation ("or"): matches any one of several patterns at the same time
()	Grouping: whatever is enclosed is treated as a single unit; also used for back references
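
A short sketch of the four extended metacharacters, using grep -E and sed -E on made-up words. The sed line assumes a sed that accepts -E (GNU sed also spells it -r):

```shell
# + : one or more of the preceding character
plus=$(printf 'gd\ngod\ngood\n' | grep -E -c 'go+d')       # god, good
# ? : zero or one of the preceding character
opt=$(printf 'gd\ngod\ngood\n' | grep -E -c 'go?d')        # gd, god
# | : alternation
alt=$(printf 'cat\ndog\nbird\n' | grep -E -c 'cat|dog')    # cat, dog
# () : grouping, so the quantifier applies to the whole group
group=$(printf 'ab\nabab\nb\n' | grep -E -c '^(ab)+$')     # ab, abab
# () also captures text that sed can reuse as \1 and \2 in the replacement
swapped=$(printf 'key=value\n' | sed -E 's/(.*)=(.*)/\2=\1/')

echo "$plus $opt $alt $group $swapped"
```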

Chapter 9: Summary of regular expressions

Basic regular expressions (BRE): ^ $ . * .* [abc] [^abc]
Extended regular expressions (ERE): + | ? () {} a{n,m} a{n,} a{n}
Escape character \: changes the meaning of a character. A symbol that the current mode does not treat as a metacharacter becomes one when escaped; a symbol that is already a metacharacter becomes an ordinary character when escaped.

Note:

By default grep supports only basic regular expressions, so the extended symbols are ordinary characters to grep. To use them as metacharacters with plain grep, escape them with a backslash, for example \{ \}.

grep -E makes grep recognize the extended symbols directly, with no need to escape them.

egrep is equivalent to grep -E and recognizes the extended symbols natively.

For backups we often write cp filename{,.bak} to avoid typing the file name twice. (This is shell brace expansion, not a regular expression.)
sed -r: makes sed support extended regular expressions.
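
The same idea can be sketched with sed and awk on made-up input. This assumes a sed that accepts -E, which is the portable spelling of the -r option mentioned above:

```shell
# sed -E (same as GNU sed -r) enables extended regular expressions.
masked=$(printf 'user42 logged in\n' | sed -E 's/[0-9]+/N/')

# Without -E, sed uses basic regular expressions, so the interval must be escaped.
masked_bre=$(printf 'user42 logged in\n' | sed 's/[0-9]\{1,\}/N/')

# awk uses extended regular expressions without any extra option.
awk_hit=$(printf 'ok\nfail\nfatal\n' | awk '/^(fail|fatal)$/ { n++ } END { print n }')

echo "$masked | $masked_bre | $awk_hit"
```

Both sed commands produce the same line; only the amount of escaping differs.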

Chapter 10: Differences between basic and extended regular expressions

Basic regular (BRE)	Extended regular (ERE)
\?	?
\+	+
\{\}	{}
\(\)	()
\|	|

So-called basic regular expressions are those in which these symbols must be escaped to act as metacharacters, while extended mode lets the command recognize the symbols directly (egrep, sed -r, and awk support this directly). Note that \?, \+, and \| in basic mode are GNU extensions rather than part of POSIX BRE.
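
A sketch of the table as paired commands, assuming GNU grep (the escaped forms \?, \+, and \| in basic mode are GNU extensions). The word lists are made up for illustration:

```shell
# Hypothetical word lists.
words='color
colour
colr'
colors='gray
grey
groy'

# Escaped ? in BRE versus bare ? in ERE.
bre_opt=$(printf '%s\n' "$words" | grep -c 'colou\?r')
ere_opt=$(printf '%s\n' "$words" | grep -E -c 'colou?r')

# Escaped \( \| \) in BRE versus bare ( | ) in ERE.
bre_alt=$(printf '%s\n' "$colors" | grep -c 'gr\(a\|e\)y')
ere_alt=$(printf '%s\n' "$colors" | grep -E -c 'gr(a|e)y')

echo "$bre_opt $ere_opt $bre_alt $ere_alt"
```

Each pair gives the same count, which shows that the two syntaxes describe the same patterns.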

Chapter 11: Supplement

11.1 Predefined character classes:

Regular expression	Description	Example
[:alnum:]	Matches any one letter or digit, like [a-zA-Z0-9]	[[:alnum:]]+
[:alpha:]	Matches any alphabetic character (uppercase or lowercase)	[[:alpha:]]{4}
[:blank:]	Matches a space or a tab	[[:blank:]]*
[:digit:]	Matches any digit	[[:digit:]]?
[:lower:]	Matches lowercase letters	[[:lower:]]{5,}
[:upper:]	Matches uppercase letters	([[:upper:]]+)?
[:punct:]	Matches punctuation	[[:punct:]]
[:space:]	Matches any whitespace character, including newline, carriage return, etc.	[[:space:]]+
[:graph:]	Matches any visible, printable character (space excluded)	[[:graph:]]
[:xdigit:]	Matches any one hexadecimal digit	[[:xdigit:]]+
[:cntrl:]	Matches any one control character (in ASCII, the first 32 characters and DEL)	[[:cntrl:]]
[:print:]	Matches any one printable character (space included)	[[:print:]]
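
A sketch of the classes in use with grep -E on made-up input. Note that a class name only works inside a bracket expression, hence the double brackets:

```shell
export LC_ALL=C   # keep class membership limited to ASCII for this sketch

# Hypothetical input lines: abc123, HELLO, a space followed by a tab, x_y
sample() { printf 'abc123\nHELLO\n \t\nx_y\n'; }

alnum_only=$(sample | grep -E -c '^[[:alnum:]]+$')    # abc123, HELLO
upper_only=$(sample | grep -E -c '^[[:upper:]]+$')    # HELLO
blank_only=$(sample | grep -E -c '^[[:blank:]]+$')    # the space-and-tab line
has_punct=$(sample | grep -c '[[:punct:]]')           # x_y (the underscore)
digits=$(sample | grep -E -o '[[:digit:]]+')          # the digit run in abc123

echo "$alnum_only $upper_only $blank_only $has_punct $digits"
```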

11.2 Metacharacters

These metacharacters come from Perl-style regular expressions. Only some text processing tools support them, not all.

Regular expression	Description	Example
\b	Word boundary	\bcool\b matches cool, does not match coolant
\B	Non-word boundary	cool\B matches coolant, does not match cool
\d	Single digit	b\db matches b2b, does not match bcb
\D	Single non-digit character	b\Db matches bcb, does not match b2b
\w	Single word character (letter, digit, or _)	\w matches 1 or a, does not match &
\W	Single non-word character	\W matches &, does not match 1 or a
\n	Newline	\n matches a newline
\s	Single whitespace character	x\sx matches "x x", does not match xx
\S	Single non-whitespace character	x\Sx matches xkx, does not match "x x"
\r	Carriage return	\r matches a carriage return
\t	Horizontal tab	\t matches a horizontal tab
\v	Vertical tab	\v matches a vertical tab
\f	Form feed	\f matches a form feed
Chapter 12: Summary of regular expressions

Use egrep/grep to learn regular expressions: it is easy to see the effect and the result.
The -o option of egrep/grep shows exactly what the pattern matched.
Practice a lot; regular expressions are most powerful when combined with grep, egrep, sed -r, and awk.
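
For example, -o can be used to check what a pattern really matched. This sketch uses a made-up log line and assumes a grep that supports -o (GNU and BSD grep both do):

```shell
# Hypothetical log line.
line='2018-08-04 10:15:30 login from 192.168.1.10'

# -o prints only the matched part instead of the whole line.
date_part=$(printf '%s\n' "$line" | grep -E -o '^[0-9]{4}-[0-9]{2}-[0-9]{2}')
ip_part=$(printf '%s\n' "$line" | grep -E -o '([0-9]{1,3}\.){3}[0-9]{1,3}')

echo "$date_part"
echo "$ip_part"
```

Seeing the exact match makes it much easier to notice when a pattern is too greedy or too narrow.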

