A detailed guide to Linux regular expressions: basic and extended regular expressions with command examples

Source: Internet
Author: User

Preface

Regular expressions are widely used and are supported by most programming languages. They are also very useful on Linux.

With regular expressions you can filter out exactly the text you need, and then hand it to whichever supporting tool or language fits the task.

This post uses grep/egrep to exercise regular expressions. Tools such as sed work as well, but sed relies heavily on regular expressions, so this material comes first as groundwork for the upcoming sed article. If you need both, read the two articles together.

Regular expression types

Regular expressions are implemented by a regular expression engine, the underlying software that interprets regular expression patterns and uses them to match text.

On Linux, the common regular expression engines are:

- the POSIX basic regular expression (BRE) engine

- the POSIX extended regular expression (ERE) engine
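
As a rough illustration of how the two engines differ in practice, the sketch below applies the same interval to one string, first with grep (BRE) and then with grep -E (ERE, the engine egrep uses). The sample string and variable names are made up for this example, and -o (print only the matched part) is assumed to be available, as it is in GNU and BSD grep.

```shell
# Hypothetical sample text, used only for this illustration.
sample='aaa bbb'

# BRE: interval braces must be escaped with backslashes.
bre_match=$(printf '%s\n' "$sample" | grep -o 'a\{2,3\}')

# ERE (grep -E): the same interval is written with plain braces.
ere_match=$(printf '%s\n' "$sample" | grep -E -o 'a{2,3}')

echo "BRE: $bre_match"
echo "ERE: $ere_match"
```

Both commands find the same text; only the way the braces are written changes.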

Basic usage of basic regular expressions

Preparing the test environment

[root@service99 ~]# mkdir /opt/regular
[root@service99 ~]# cd /opt/regular
[root@service99 regular]# pwd
/opt/regular
[root@service99 regular]# cp /etc/passwd temp_passwd

Plain text

A plain-text pattern matches the corresponding text exactly. Note that regular expression patterns are case sensitive.

// grep --color highlights the matched text, which makes the effect easy to see
[root@service99 regular]# grep --color "root" temp_passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin

A pattern is not limited to complete words. Wherever the defined text appears in the data stream, the regular expression matches.

[root@service99 regular]# ifconfig eth1 | grep --color "add"
eth1      Link encap:Ethernet  HWaddr 54:52:01:01:99:02
          inet addr:192.168.2.99  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::5652:1ff:fe01:9902/64 Scope:Link

Nor is a pattern limited to a single word: spaces and digits can also appear in the text string.

[root@service99 regular]# echo "This is line number 1" | grep --color "ber 1"
This is line number 1
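
The two points above (substring matching and case sensitivity) can be checked with a small sketch. It uses grep -c, which prints the number of matching lines; the variable names are illustrative only.

```shell
line='This is line number 1'

# A plain-text pattern matches anywhere in the line, even across a space.
substring_count=$(printf '%s\n' "$line" | grep -c 'ber 1')

# Matching is case sensitive: lowercase "this" does not occur in the line.
# grep exits non-zero when nothing matches, hence the "|| true".
lower_count=$(printf '%s\n' "$line" | grep -c 'this' || true)

echo "substring: $substring_count, lowercase this: $lower_count"
```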

Special characters

There is one thing to watch out for when using text strings in a regular expression pattern.

Some characters are exceptions: regular expressions give them a special meaning. If you use them in a pattern as if they were plain text, the result may not be what you expect.

Special characters recognized by regular expressions:

. * [ ] ^ $ { } + ? | ( )

To use one of these special characters as an ordinary text character, you must escape it: put a special character in front of it that tells the regular expression engine to interpret the next character as plain text.

The special character used to implement this function is the backslash (\).

[root@service99 regular]# echo "This cat is $4.99"     // double quotes do not shield special characters, so the shell expands the variable $4; no such variable is set, so it expands to nothing
This cat is .99
[root@service99 regular]# echo "This cat is \$4.99"    // use "\" to escape $
This cat is $4.99
[root@service99 regular]# echo 'This cat is \$4.99'    // single quotes shield metacharacters, so the backslash is printed as well
This cat is \$4.99
[root@service99 regular]# echo 'This cat is $4.99'
This cat is $4.99
[root@service99 regular]# cat price.txt
This price is $4.99
hello,world!
$5.00
#$#$
This is "\".
[root@service99 regular]# grep --color '\\' price.txt
This is "\".
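
To see why escaping matters inside the pattern itself, here is a minimal sketch comparing an unescaped and an escaped dot. The three sample lines and the variable names are invented for the example.

```shell
# Hypothetical sample lines; single quotes keep the shell from expanding "$".
prices=$(printf '%s\n' 'This price is $4.99' '$5.00' 'total 5x00')

# Unescaped, "." matches any character, so "5.00" also matches "5x00".
loose=$(printf '%s\n' "$prices" | grep -c '5.00')

# Escaped, the dot is a literal dot and only "$5.00" matches.
strict=$(printf '%s\n' "$prices" | grep -c '5\.00')

echo "loose=$loose strict=$strict"
```

Single quotes around the pattern keep the shell from touching the backslash, so grep receives it unchanged.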

Anchor characters

Matching the start of a line

The caret (^) anchors a pattern to the start of a text line in the data stream.

[root@service99 regular]# grep --color '^h' price.txt      // lines that start with the letter h
hello,world!
[root@service99 regular]# grep --color '^$' price.txt      // no output: the file has no empty lines
[root@service99 regular]# grep --color '^\$' price.txt     // lines that start with $
$5.00
[root@service99 regular]# echo "this is ^ test. " >> price.txt
[root@service99 regular]# cat price.txt
This price is $4.99
hello,world!
$5.00
#$#$
This is "\".
this is ^ test.
[root@service99 regular]# grep --color '^' price.txt       // used on its own, ^ matches the start of every line
This price is $4.99
hello,world!
$5.00
#$#$
This is "\".
this is ^ test.
[root@service99 regular]# grep --color '\^' price.txt      // to match a literal ^ on its own, escape it
this is ^ test.
[root@service99 regular]# grep --color 'is ^' price.txt    // when ^ is not at the start of the pattern, it is literal and needs no escaping
this is ^ test.

Matching the end of a line

The dollar sign ($) defines the end anchor. Placed after a text pattern, it means the line must end with that pattern.

[root@service99 regular]# grep --color '\.$' price.txt     // "." also has a special meaning in regular expressions, so escape it; details follow later
This is "\".
[root@service99 regular]# grep --color '\. $' price.txt    // I typed a trailing space when adding this line to the file, so the pattern needs it too; watch out for this
this is ^ test.
// In a regular expression, a space counts as an ordinary character.
[root@service99 regular]# grep --color '0$' price.txt
$5.00
[root@service99 regular]# grep --color '9$' price.txt
This price is $4.99
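
The anchor behaviour, including the trailing-space pitfall, can be reproduced without touching any file. This is only a sketch; the sample text and variable names are assumptions made for the example.

```shell
# Hypothetical sample; the last line deliberately ends with a space.
text=$(printf '%s\n' 'hello,world!' '$5.00' 'this is ^ test. ')

starts_h=$(printf '%s\n' "$text" | grep '^h')     # line beginning with h
ends_0=$(printf '%s\n' "$text" | grep '0$')       # line ending with 0

# '\.$' finds nothing: the only dot at a line end is followed by a space.
ends_dot=$(printf '%s\n' "$text" | grep -c '\.$' || true)

# Including the space in the pattern makes the last line match.
ends_dot_space=$(printf '%s\n' "$text" | grep -c '\. $')

echo "$starts_h | $ends_0 | $ends_dot | $ends_dot_space"
```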

Combining anchors

"^$" is commonly used to match empty lines.

It is often combined with "^#", because # starts a comment in Linux configuration files.

Together they print only the effective settings of a file:

[root@service99 regular]# cat -n /etc/vsftpd.conf | wc -l
121
[root@service99 regular]# grep -vE '^#|^$' /etc/vsftpd.conf     // -v inverts the selection and -E enables extended regular expressions; "|" is an extended regular expression operator meaning "or". Only the effective settings are shown:
response=YES
local_enable=YES
write_enable=YES
local_umask=affinity=YES
anon_umask=affinity=YES
xferlog_enable=affinity=YES
listen=YES
pam_service_name=response=YES
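
The same filter can be tried safely on a throwaway file. The sketch below assumes mktemp is available; the configuration keys and values are illustrative and are not taken from a real vsftpd.conf.

```shell
# Hypothetical config content written to a temporary file.
conf=$(mktemp)
printf '%s\n' '# sample comment' '' 'listen=YES' '# another comment' 'local_enable=YES' > "$conf"

# -v inverts the match, -E enables "|": drop comment lines and empty lines.
effective=$(grep -vE '^#|^$' "$conf")
echo "$effective"

rm -f "$conf"
```

Note that '^#' only removes lines whose first character is #; an indented comment would need a pattern that also allows leading whitespace.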

Character repetition range

{n,m} // the preceding character appears n to m times

{n,} // the preceding character appears at least n times

{n} // the preceding character appears exactly n times

In basic regular expressions the braces are escaped, as \{ and \}:

[root@service99 regular]# grep --color "12345\{0,1\}" price.txt
1234556
[root@service99 regular]# grep --color "12345\{0,2\}" price.txt
1234556
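
Because both commands print the same line, the difference between the two intervals is easier to see with -o, which prints only the matched part (assumed available, as in GNU and BSD grep). A minimal sketch with illustrative variable names:

```shell
# The interval applies only to the character right before it (the final "5").
m1=$(echo '1234556' | grep -o '12345\{0,1\}')   # at most one 5
m2=$(echo '1234556' | grep -o '12345\{0,2\}')   # at most two 5s
m3=$(echo '1234'    | grep -o '12345\{0,1\}')   # zero 5s is also allowed

echo "$m1 $m2 $m3"
```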

The dot character

The dot matches any single character except a newline, but it must match exactly one character. If there is no character at the dot's position, the match fails.

[root@service99 regular]# grep --color ".s" price.txt
This price is $4.99
This is "\".
this is ^ test.
[root@service99 regular]# grep --color ".or" price.txt
hello,world!
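
The requirement that the dot consumes a character can be shown with two one-word inputs. This is a small sketch; the inputs and variable names are made up.

```shell
# ".or" needs some character in front of "or".
with_char=$(echo 'world' | grep -c '.or')            # "wor" satisfies the pattern

# In the bare word "or" nothing precedes "or", so the match fails.
without_char=$(echo 'or' | grep -c '.or' || true)

echo "with_char=$with_char without_char=$without_char"
```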

Character class

A character class defines a set of characters that may match at one position in a pattern. If the character at that position in the data stream is one of the characters in the class, the pattern matches.
Character classes are defined with square brackets. Put every character of the class inside the brackets, then use the whole class in a pattern just like any other wildcard.

[root@service99 regular]# grep --color "[abcdsxyz]" price.txt
This price is $4.99
hello,world!
This is "\".
this is ^ test.
[root@service99 regular]# grep --color "[sxyz]" price.txt
This price is $4.99
This is "\".
this is ^ test.
[root@service99 regular]# grep --color "[abcd]" price.txt
This price is $4.99
hello,world!
[root@service99 regular]# grep --color "Th[ais]" price.txt       // matches Th followed by one character from [ais]
This price is $4.99
This is "\".
[root@service99 regular]# grep -i --color "th[ais]" price.txt    // -i ignores case
This price is $4.99
This is "\".
this is ^ test.

If you are not sure about the case of a character, you can use this pattern:

[root@service99 regular]# echo "Yes" | grep --color "[yY]es"     // the order of the characters inside [] does not matter
Yes
[root@service99 regular]# echo "yes" | grep --color "[Yy]es"
yes

You can use several character classes in a single expression:

[root@service99 regular]# echo "Yes/no" | grep "[Yy][Ee]"
Yes/no
[root@service99 regular]# echo "Yes/no" | grep "[Yy].*[Nn]"      // for the use of *, see the Asterisk section below
Yes/no

Character classes also work with digits:

[root@service99 regular]# echo "My phone number is 123456987" | grep --color "is [1234]"
My phone number is 123456987
[root@service99 regular]# echo "This is Phone1" | grep --color "e[1234]"
This is Phone1
[root@service99 regular]# echo "This is Phone1" | grep --color "[1]"
This is Phone1

Another very common use of character classes is matching words that may be misspelled:

[root@service99 regular]# echo "regular" | grep --color "r[ea]g[ua]l[ao]"
regular
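
To get a feel for which spellings such a pattern tolerates, the sketch below runs it over a few candidate words. The word list is invented for the example; grep -q prints nothing and only sets the exit status.

```shell
pattern='r[ea]g[ua]l[ao]'

# Try the pattern against a few hypothetical spellings.
results=$(
    for word in regular regalar ragulor rigular; do
        if echo "$word" | grep -q "$pattern"; then
            echo "$word: match"
        else
            echo "$word: no match"
        fi
    done
)
echo "$results"
```

Only the last word is rejected, because "i" is not in the first class [ea].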

Negated character class

A negated class matches any character that is not in the class. Just put a caret (^) at the start of the character class.

Even when negated, the character class must still match one character.

[root@service99 regular]# cat price.txt
This price is $4.99
hello,world!
$5.00
#$#$
This is "\".
this is ^ test.
cat
car
[root@service99 regular]# sed -n '/[^t]his/p' price.txt
This price is $4.99
This is "\".
[root@service99 regular]# grep --color "[^t]his" price.txt
This price is $4.99
This is "\".
[root@service99 regular]# grep --color "ca[tr]" price.txt
cat
car
[root@service99 regular]# grep --color "ca[^r]" price.txt
cat
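
The point that a negated class still has to consume a character is easy to miss, so here is a small sketch with three one-word inputs (chosen only for illustration):

```shell
# "[^t]his" needs some character other than "t" directly before "his".
upper=$(echo 'This' | grep -c '[^t]his')             # "T" is not "t"
lower=$(echo 'this' | grep -c '[^t]his' || true)     # "t" is excluded
bare=$(echo 'his'   | grep -c '[^t]his' || true)     # nothing precedes "his"

echo "upper=$upper lower=$lower bare=$bare"
```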

Using ranges

When you need to match many characters that follow a regular order, you can use a range:

[root@service99 regular]# cat price.txt
This price is $4.99
hello,world!
$5.00
#$#$
This is "\".
this is ^ test.
cat
car
1234556
911
11806
[root@service99 regular]# egrep --color '[a-z]' price.txt
This price is $4.99
hello,world!
This is "\".
this is ^ test.
cat
car
[root@service99 regular]# egrep --color '[A-Z]' price.txt
This price is $4.99
This is "\".
[root@service99 regular]# grep --color "[0-9]" price.txt
This price is $4.99
$5.00
1234556
911
11806
[root@service99 regular]# sed -n '/^[^a-Z]/p' price.txt
$5.00
#$#$
1234556
911
11806
[root@service99 regular]# grep --color "^[^a-Z]" price.txt
$5.00
#$#$
1234556
911
11806
[root@service99 regular]# echo $LANG     // when using [a-Z], mind the value of the LANG environment variable; if you change it, make sure the new value is a valid locale
zh_CN.UTF-8
[root@service99 regular]# LANG=en_US.UTF-8
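
Since the meaning of a range depends on the locale, one way to make it predictable is to set LC_ALL=C for the single grep command. The sketch below is an illustration under that assumption; it spells the letter range as [a-zA-Z] rather than [a-Z], because the latter depends on the locale's collation order and is rejected in some locales.

```shell
# Hypothetical three-line input.
input=$(printf '%s\n' 'abc' 'ABC' '123')

# With LC_ALL=C, [a-z] means exactly the 26 ASCII lowercase letters.
lower=$(printf '%s\n' "$input" | LC_ALL=C grep '^[a-z]')

# Lines whose first character is not an ASCII letter.
non_alpha=$(printf '%s\n' "$input" | LC_ALL=C grep '^[^a-zA-Z]')

echo "lower=$lower non_alpha=$non_alpha"
```

Setting the variable in front of the command affects only that command, so the session's LANG stays untouched.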

Special character classes

These classes match characters of a specific type.

[[:blank:]] space and tab characters

[[:cntrl:]] control characters

[[:graph:]] visible characters (printable characters other than the space)

[[:space:]] all whitespace characters

[[:print:]] printable characters

[[:xdigit:]] hexadecimal digits

[[:punct:]] all punctuation characters

[[:lower:]] lowercase letters

[[:upper:]] uppercase letters

[[:alpha:]] uppercase and lowercase letters

[[:digit:]] digits

[[:alnum:]] digits and uppercase/lowercase letters
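
A few of these classes in action, as a minimal sketch. The sample string is invented; -o (assumed available, as in GNU and BSD grep) prints each match on its own line, and tr -d '\n' joins them for display.

```shell
# Hypothetical sample string.
sample='Phone1: 555-0199!'

# The class name sits inside a bracket expression, hence the double brackets.
digits=$(echo "$sample" | grep -o '[[:digit:]]' | tr -d '\n')
puncts=$(echo "$sample" | grep -o '[[:punct:]]' | tr -d '\n')
uppers=$(echo "$sample" | grep -o '[[:upper:]]' | tr -d '\n')

echo "digits=$digits puncts=$puncts uppers=$uppers"
```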

Asterisk

An asterisk after a character means that the character may appear zero or more times in the matching text.

[root@service99 regular]# cat test.info
goole
go go
come on
goooooooooo
[root@service99 regular]# grep --color "o*" test.info      // o* also matches zero o's, so every line matches
goole
go go
come on
goooooooooo
[root@service99 regular]# grep --color "go*" test.info
goole
go go
goooooooooo
[root@service99 regular]# grep --color "w.*d" price.txt    // * is often used together with "."
hello,world!
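
What the asterisk actually consumes becomes visible with -o, which prints only the matched text (assumed available, as in GNU and BSD grep). The inputs below are invented for the sketch.

```shell
# "go*" is a g followed by any number of o's, including none.
# tr joins the matches with commas for display.
star=$(echo 'g go goo' | grep -o 'go*' | tr '\n' ',')

# ".*" stretches as far as it can between the w and the last d.
dotstar=$(echo 'hello,world!' | grep -o 'w.*d')

echo "star=$star dotstar=$dotstar"
```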

Extended Regular Expression

Question mark

The question mark means the preceding character appears zero times or once. It does not match repeated occurrences.

[root@service99 regular]# egrep --color "91?" price.txt
This price is $4.99
911

Plus sign

The plus sign means the preceding character appears one or more times; it must appear at least once. If the character is absent, the pattern does not match.

[root@service99 regular]# egrep --color "9+" price.txt
This price is $4.99
911
[root@service99 regular]# egrep --color "1+" price.txt
1234556
911
11806
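
The difference between ? and + shows up in what each one consumes. This sketch uses grep -E, which is equivalent to egrep, together with -o (assumed available, as in GNU and BSD grep); the sample numbers are illustrative.

```shell
# Hypothetical sample lines.
nums=$(printf '%s\n' '4.99' '911' '1234556')

# "91?": a 9 followed by at most one 1, so "99" yields two separate matches.
question=$(printf '%s\n' "$nums" | grep -E -o '91?' | tr '\n' ',')

# "9+": a run of one or more 9s is taken as a single match.
plus=$(printf '%s\n' "$nums" | grep -E -o '9+' | tr '\n' ',')

echo "question=$question plus=$plus"
```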

Using braces

Braces set a limit on how many times the preceding regular expression may repeat. This is usually called an interval.

- {m}: the regular expression appears exactly m times.

- {m,n}: the regular expression appears at least m times and at most n times.

[root@service99 regular]# echo "This is test,test is file." | egrep --color "test{0,1}"
This is test,test is file.
[root@service99 regular]# echo "This is test,test is file." | egrep --color "is{1,2}"
This is test,test is file.
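
One thing the examples above do not make obvious is that an interval binds only to the single item in front of it. The sketch below illustrates this with grep -E (equivalent to egrep) and -o, and also shows parentheses, listed earlier among the special characters, used to apply an interval to a whole sequence. The inputs are made up.

```shell
# "test{0,1}" means "tes" followed by zero or one "t".
single=$(echo 'tes test' | grep -E -o 'test{0,1}' | tr '\n' ',')

# Parentheses group "is" so that the interval repeats the whole pair.
grouped=$(echo 'isis' | grep -E -o '(is){2}')

echo "single=$single grouped=$grouped"
```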

Regular expression exercises

Here are some basic regular expression exercises with sample solutions.
Taken on their own, the concepts and theory of regular expressions are fairly simple, but applying them in practice is not so easy. Once you are comfortable with them, the gain in efficiency is considerable.

1. Find lines in the downloaded file that contain the keyword the

grep --color "the" regular_express.txt

2. Find lines in the downloaded file that do not contain the keyword the

grep --color -vn "the" regular_express.txt

3. Find lines that contain the keyword the, ignoring case

grep --color -in "the" regular_express.txt

4. Find the two words test or taste

grep --color -En 'test|taste' regular_express.txt
grep --color -i "t[ae]ste\{0,1\}" 1.txt

5. Find lines that contain oo

grep --color "oo" regular_express.txt

6. Find oo that is not preceded by g

grep --color [^g]"oo" regular_express.txt
grep --color "[^g]oo" regular_express.txt

7. Find oo that is not preceded by a lowercase letter

egrep --color "[^a-z]oo" regular_express.txt

8. Find lines that contain digits

egrep --color '[0-9]' regular_express.txt

9. Find lines that start with the

egrep --color '^the' regular_express.txt

10. Find lines that start with a lowercase letter

egrep --color '^[a-z]' regular_express.txt

11. Find lines that do not start with an English letter

egrep --color '^[^a-Z]' regular_express.txt

12. Find lines that end with a period

egrep --color '\.$' regular_express.txt

13. Find blank lines

egrep --color "^$" regular_express.txt

14. Find strings of the form g??d (g, any two characters, then d)

egrep --color "g..d" regular_express.txt

15. Find strings with at least two consecutive o's

egrep --color "ooo*" regular_express.txt
egrep --color 'o{2,}' regular_express.txt

16. Find strings that start and end with g, with at least one o between the two g's

egrep --color 'go{1,}g' regular_express.txt

17. Find lines that contain any digit

egrep --color '[0-9]' regular_express.txt

18. Find strings with two consecutive o's

egrep --color "oo" regular_express.txt

19. Find g followed by 2 to 5 o's and then another g

egrep --color 'go{2,5}g' regular_express.txt

20. Find g followed by at least two o's

egrep --color 'go{2,}' regular_express.txt
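
If regular_express.txt is not at hand, the interval exercises can be tried on inline data. The sketch below checks the "g, 2 to 5 o's, g" pattern with grep -E (equivalent to egrep) against four invented words that contain 1, 2, 5, and 6 o's.

```shell
# Hypothetical test words, one per line.
candidates=$(printf '%s\n' 'gog' 'goog' 'gooooog' 'goooooog')

# Keep only words where 2 to 5 o's sit between the two g's.
matched=$(printf '%s\n' "$candidates" | grep -E 'go{2,5}g')

echo "$matched"
```

The first word has too few o's and the last has too many, so only the two middle words are printed.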

That is all for this article. I hope it helps with your learning.
