How does grep perform regular expression search?

Last Update:2018-11-01 Source: Internet

Author: User

Tags character classes


Character class

Searching with a character class: Suppose I want to find the words test or taste. Both share the shape 't?st', where ? stands for a single character, so I can search like this:

    # grep -n 't[ae]st' regular_express.txt
    8:I can't finish the test.
    9:Oh! The soup taste good.

No matter how many characters appear inside [], the brackets stand for exactly one character. The pattern above therefore matches only two strings: "tast" and "test".
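The bracket behavior can be tried without the original file. The sketch below is a minimal example that assumes a small stand-in text (the `sample` variable and its third line are hypothetical, not part of regular_express.txt) and pipes it into grep.

```shell
# Hypothetical stand-in for regular_express.txt; the third line is
# added only to show a word that does not match.
sample="I can't finish the test.
Oh! The soup taste good.
That toast is burnt."

# [ae] matches exactly one character, either a or e, so only
# "test" and "tast" (inside "taste") can match.
matches=$(printf '%s\n' "$sample" | grep -n 't[ae]st')
printf '%s\n' "$matches"
```

The word toast is skipped because the character between t and st would have to be "oa", which is two characters.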

Negated character classes [^]: Suppose I want the lines that contain oo, but not where the oo is preceded by g:

    # grep -n '[^g]oo' regular_express.txt
    2:apple is my favorite food.
    3:Football game is not use feet only.
    18:google is the best tools for search keyword.
    19:goooooogle yes!

Lines 2 and 3 are no surprise, because both foo and Foo are acceptable.

Line 18 does contain the goo of google, but don't forget that the too of tools appears later on the same line, so the line is still listed. In other words, even though line 18 contains something we did not want (goo), it also contains something that satisfies the pattern (too), and one match is enough for the line to be printed.

The same is true of line 19: in goooooogle, an oo can be preceded by another o, for example go(ooo)oogle, so this line also qualifies.
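To see that a single acceptable match is enough for a line to be printed, here is a small sketch. It assumes a stand-in text in a hypothetical `sample` variable; the last line ("good luck") is invented for contrast and is not in the original file.

```shell
# Hypothetical stand-in text; "good luck" is an invented line whose
# only "oo" is preceded by g.
sample="apple is my favorite food.
google is the best tools for search keyword.
goooooogle yes!
good luck"

# [^g] must match one character that is NOT g, directly before "oo".
matches=$(printf '%s\n' "$sample" | grep -n '[^g]oo')
printf '%s\n' "$matches"
```

Line 4 is the only one left out: its single oo has a g in front of it, and nothing else on the line can satisfy the pattern.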

Ranges in character classes: If I do not want a lowercase letter before oo, I could write [^abcd....z]oo, but that is not very convenient. Because the lowercase letters are contiguous in ASCII, the class can be shortened to a range:

    # grep -n '[^a-z]oo' regular_express.txt
    3:Football game is not use feet only.

In other words, when the characters in a set are contiguous, such as lowercase letters, uppercase letters, or digits, you can write [a-z], [A-Z], [0-9], and so on. And what if the character may be either a letter or a digit? Just write the ranges together: [a-zA-Z0-9].

To get the lines that contain a digit:

    # grep -n '[0-9]' regular_express.txt
    5:However, this dress is about $ 3183 dollars.
    15:You are the best is mean you are the no. 1.
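The range examples can be reproduced with a short sketch. It assumes a three-line stand-in text (the `sample` variable is hypothetical) and sets LC_ALL=C, because the meaning of a range such as [a-z] depends on the locale and the C locale gives the plain ASCII ordering described above.

```shell
# Hypothetical stand-in text built from lines quoted in this article.
sample=$(printf '%s\n' \
  'apple is my favorite food.' \
  'Football game is not use feet only.' \
  'However, this dress is about $ 3183 dollars.')

# "oo" preceded by anything except a lowercase ASCII letter.
not_lower=$(printf '%s\n' "$sample" | LC_ALL=C grep -n '[^a-z]oo')

# Lines containing at least one digit.
has_digit=$(printf '%s\n' "$sample" | LC_ALL=C grep -n '[0-9]')

printf '%s\n' "$not_lower" "$has_digit"
```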

Line-start and line-end anchors ^ $
Start of line: What if I want "the" to be listed only when it appears at the beginning of a line? This calls for an anchor character:

    # grep -n '^the' regular_express.txt
    12:the symbol '*' is represented as start.

Only line 12 is left, because it is the only line that begins with "the". And what if I want the lines that start with a lowercase letter?

    # grep -n '^[a-z]' regular_express.txt
    2:apple is my favorite food.
    4:this dress doesn't fit me.
    10:motorcycle is cheap than car.
    12:the symbol '*' is represented as start.
    18:google is the best tools for search keyword.
    19:goooooogle yes!
    20:go! go! Let's go.

If I want the lines that do not start with an English letter:

    # grep -n '^[^a-zA-Z]' regular_express.txt
    1:"Open Source" is a good mechanism to develop programs.
    21:# I am VBird

Note that ^ means different things inside and outside a character class. Inside [], it means negation; outside [], it anchors the match to the beginning of the line.
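The two roles of ^ can be compared side by side in a small sketch. It assumes a four-line stand-in text in a hypothetical `sample` variable and uses LC_ALL=C so that the letter ranges follow ASCII order.

```shell
# Hypothetical stand-in text built from lines quoted in this article.
sample="apple is my favorite food.
Football game is not use feet only.
the symbol '*' is represented as start.
# I am VBird"

# ^ outside brackets: anchor to the start of the line.
starts_the=$(printf '%s\n' "$sample" | grep -n '^the')

# Anchor plus a class: the first character is a lowercase letter.
starts_lower=$(printf '%s\n' "$sample" | LC_ALL=C grep -n '^[a-z]')

# First ^ anchors, second ^ (inside brackets) negates the class.
starts_nonletter=$(printf '%s\n' "$sample" | LC_ALL=C grep -n '^[^a-zA-Z]')

printf '%s\n' "$starts_the" "$starts_lower" "$starts_nonletter"
```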

To find the lines that end with a period (.):

    # grep -n '\.$' regular_express.txt
    1:"Open Source" is a good mechanism to develop programs.
    2:apple is my favorite food.
    3:Football game is not use feet only.
    4:this dress doesn't fit me.
    10:motorcycle is cheap than car.
    11:This window is clear.
    12:the symbol '*' is represented as start.
    15:You are the best is mean you are the no. 1.
    16:The world <Happy> is the same with "glad".
    17:I like dog.
    18:google is the best tools for search keyword.
    20:go! go! Let's go.

Because the period has a special meaning of its own (introduced below), it must be preceded by the escape character (\) to remove that meaning.

To find blank lines:

    # grep -n '^$' regular_express.txt
    22:

The pattern contains only the line start and the line end (^$) with nothing between them, so it matches blank lines.
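The effect of escaping the period, and the blank-line pattern, can be checked with the sketch below. It assumes a four-line stand-in text (hypothetical `sample` variable) whose third line is empty.

```shell
# Hypothetical stand-in text; the third line is empty.
sample=$(printf '%s\n' \
  'This window is clear.' \
  'goooooogle yes!' \
  '' \
  'I like dog.')

# Escaped period: only lines whose last character is a literal ".".
ends_period=$(printf '%s\n' "$sample" | grep -n '\.$')

# Unescaped period: any last character, so every non-empty line counts.
ends_any=$(printf '%s\n' "$sample" | grep -c '.$')

# Start of line immediately followed by end of line: a blank line.
blank=$(printf '%s\n' "$sample" | grep -n '^$')

printf '%s\n' "$ends_period" "$ends_any" "$blank"
```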

Any single character . and the repetition character *
These two symbols mean the following in a regular expression:

. (period): matches exactly one character, whatever it is
* (asterisk): repeats the preceding character zero or more times, so it is always used in combination with the character before it

Suppose I need to find strings of the form g??d, that is, four characters that start with g and end with d. I can do this:

    # grep -n 'g..d' regular_express.txt
    1:"Open Source" is a good mechanism to develop programs.
    9:Oh! The soup taste good.
    16:The world <Happy> is the same with "glad".

There must be exactly two characters between g and d. Therefore the shorter strings on lines 13 and 14 (for example, the gd on line 14) are not listed.
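A short sketch of the "exactly two characters" rule follows. It assumes a four-line stand-in text in a hypothetical `sample` variable; the last line ("Oh! My god!") is an assumed example of a g and d separated by only one character.

```shell
# Hypothetical stand-in text; the last line is an assumed example.
sample=$(printf '%s\n' \
  'Oh! The soup taste good.' \
  'The world <Happy> is the same with "glad".' \
  'The gd software is a library for drafting programs.' \
  'Oh! My god!')

# Each "." consumes exactly one character, so "gd" (zero characters
# in between) and "god" (one character) cannot match.
matches=$(printf '%s\n' "$sample" | grep -n 'g..d')
printf '%s\n' "$matches"
```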

What if I want to list the lines containing oo, ooo, oooo, and so on, that is, at least two consecutive o's?

Because * means "repeat the preceding character zero or more times", 'o*' means "an empty string, or one or more o's". An empty string can be matched on any line, so grep -n 'o*' regular_express.txt prints every line of the file.

To require at least two o's in a row, the pattern needs to be ooo*, that is:

    # grep -n 'ooo*' regular_express.txt
    1:"Open Source" is a good mechanism to develop programs.
    2:apple is my favorite food.
    3:Football game is not use feet only.
    9:Oh! The soup taste good.
    18:google is the best tools for search keyword.
    19:goooooogle yes!
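The difference between o*, oo*, and ooo* is easy to confirm by counting matching lines. The sketch below assumes a three-line stand-in text (hypothetical `sample` variable) with two o's, one o, and no o respectively.

```shell
# Hypothetical stand-in text: "food" has two o's in a row, "dog" has
# one o, and the last line has none.
sample=$(printf '%s\n' \
  'apple is my favorite food.' \
  'I like dog.' \
  "I can't finish the test.")

# o* also matches the empty string, so every line is counted.
zero_or_more=$(printf '%s\n' "$sample" | grep -c 'o*')

# oo* needs one o followed by zero or more o's: at least one o.
one_or_more=$(printf '%s\n' "$sample" | grep -c 'oo*')

# ooo* needs two o's followed by zero or more o's: at least two.
two_or_more=$(printf '%s\n' "$sample" | grep -n 'ooo*')

printf '%s\n' "$zero_or_more" "$one_or_more" "$two_or_more"
```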

What if I want a string that starts and ends with g, with at least one o between the two g's, that is, gog, goog, gooog, and so on?

    # grep -n 'goo*g' regular_express.txt
    18:google is the best tools for search keyword.
    19:goooooogle yes!

What if I want the lines containing a string that starts with g and ends with g, where the characters in between are optional and can be anything?

    # grep -n 'g.*g' regular_express.txt
    1:"Open Source" is a good mechanism to develop programs.
    14:The gd software is a library for drafting programs.
    18:google is the best tools for search keyword.
    19:goooooogle yes!
    20:go! go! Let's go.

Because the pattern only requires a g at the start and a g at the end, with any characters in between, lines 1, 14, and 20 also qualify. This .* idiom, meaning "zero or more of any character", is very common.

What if I want the lines that contain a number of any length? Since a number consists only of digits, the pattern becomes:

    # grep -n '[0-9][0-9]*' regular_express.txt
    5:However, this dress is about $ 3183 dollars.
    15:You are the best is mean you are the no. 1.
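The three patterns above can be compared on one small input. This sketch assumes a four-line stand-in text (hypothetical `sample` variable) taken from lines quoted in this article.

```shell
# Hypothetical stand-in text built from lines quoted in this article.
sample=$(printf '%s\n' \
  'google is the best tools for search keyword.' \
  'goooooogle yes!' \
  'The gd software is a library for drafting programs.' \
  'You are the best is mean you are the no. 1.')

# g, at least one o, then g.
g_o_g=$(printf '%s\n' "$sample" | grep -n 'goo*g')

# g, any run of characters (possibly empty), then g.
g_any_g=$(printf '%s\n' "$sample" | grep -n 'g.*g')

# One digit followed by zero or more digits.
number=$(printf '%s\n' "$sample" | grep -n '[0-9][0-9]*')

printf '%s\n' "$g_o_g" "$g_any_g" "$number"
```

Line 3 shows the difference between the first two patterns: nothing on it fits goo*g, but it still fits g.*g because the g of gd is followed later on the line by another g.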

Limiting the repetition count with {}
With . and *, we can repeat a character from zero to an unlimited number of times. What if I want to limit the repetition to a specific range?

For example, how do I find strings of two to five consecutive o's? This calls for the interval braces {}. In grep's default (basic) regular expression syntax, however, a bare { or } is an ordinary character, so the interval has to be written with backslashes as \{ and \}; with grep -E the braces are written without backslashes. Suppose I want to find two o's in a row, which can be written as:

    # grep -n 'o\{2\}' regular_express.txt
    1:"Open Source" is a good mechanism to develop programs.
    2:apple is my favorite food.
    3:Football game is not use feet only.
    9:Oh! The soup taste good.
    18:google is the best tools for search keyword.
    19:goooooogle yes!

Suppose we want g followed by two to five o's and then another g. It looks like this:

    # grep -n 'go\{2,5\}g' regular_express.txt
    18:google is the best tools for search keyword.

What about two or more o's, as in goooo...g? Besides gooo*g, this also works:

    # grep -n 'go\{2,\}g' regular_express.txt
    18:google is the best tools for search keyword.
    19:goooooogle yes!
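The interval forms can be compared in a small sketch, including the unescaped spelling used with grep -E. It assumes a three-line stand-in text in a hypothetical `sample` variable.

```shell
# Hypothetical stand-in text built from lines quoted in this article.
sample="google is the best tools for search keyword.
goooooogle yes!
go! go! Let's go."

# Basic syntax: g, two to five o's, g. Six o's in a row do not fit.
two_to_five=$(printf '%s\n' "$sample" | grep -n 'go\{2,5\}g')

# Basic syntax: g, two or more o's, g (no upper limit).
two_or_more=$(printf '%s\n' "$sample" | grep -n 'go\{2,\}g')

# Extended syntax (-E): same interval, braces written without backslashes.
extended=$(printf '%s\n' "$sample" | grep -nE 'go{2,5}g')

printf '%s\n' "$two_to_five" "$two_or_more" "$extended"
```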

Extended grep (grep -E or egrep):
The main benefit of extended grep is its additional set of regular expression metacharacters.

Print all lines that contain NW or EA. With plain grep instead of egrep, the same pattern finds nothing, because an unescaped | is an ordinary character in basic regular expressions.

    # egrep 'NW|EA' testfile
    northwest       NW      Charles Main        3.0     .98     3       34
    eastern         EA      TB Savage           4.4     .84     5       20

With standard grep, the extended metacharacters can still be used by putting a \ in front of them. GNU grep then gives them the same meaning they have under -E.

    # grep 'NW\|EA' testfile
    northwest       NW      Charles Main        3.0     .98     3       34
    eastern         EA      TB Savage           4.4     .84     5       20
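The three spellings of alternation can be compared on a small input. This sketch assumes a three-line stand-in for testfile (the `data` variable is hypothetical and uses single spaces instead of the aligned columns). The \| form is a GNU extension to basic syntax, so its result may differ with other grep implementations.

```shell
# Hypothetical stand-in for testfile, using rows quoted in this article.
data=$(printf '%s\n' \
  'northwest NW Charles Main 3.0 .98 3 34' \
  'western WE Sharon Gray 5.3 .97 5 23' \
  'eastern EA TB Savage 4.4 .84 5 20')

# Extended syntax: | separates two alternatives.
extended=$(printf '%s\n' "$data" | grep -E 'NW|EA')

# Basic syntax: | is a literal character, so nothing matches here.
# "|| true" keeps grep's "no match" exit status from stopping a script.
basic_count=$(printf '%s\n' "$data" | grep -c 'NW|EA' || true)

# Basic syntax with \| (GNU extension; an assumption on other systems).
gnu_basic=$(printf '%s\n' "$data" | grep 'NW\|EA' || true)

printf '%s\n' "$extended" "$basic_count" "$gnu_basic"
```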

Search for all lines that contain one or more 3s.

    # egrep '3+' testfile
    # grep -E '3+' testfile
    # grep '3\+' testfile
    # All three commands return the same result:
    northwest       NW      Charles Main        3.0     .98     3       34
    western         WE      Sharon Gray         5.3     .97     5       23
    northeast       NE      AM Main Jr.         5.1     .94     3       13
    central         CT      Ann Stephen         5.7     .94     5       13

Search for all lines that contain a 2, followed by zero or one period, followed by a digit.

    # egrep '2\.?[0-9]' testfile
    # grep -E '2\.?[0-9]' testfile
    # grep '2\.\?[0-9]' testfile
    # Matches a 2, followed by zero or one period, followed by a digit from 0 to 9:
    western         WE      Sharon Gray         5.3     .97     5       23
    southwest       SW      Lewis Dalsass       2.7     .8      2       18
    eastern         EA      TB Savage           4.4     .84     5       20

Search for all lines that contain one or more consecutive occurrences of "no".

    # egrep '(no)+' testfile
    # grep -E '(no)+' testfile
    # grep '\(no\)\+' testfile
    # All three commands return the same result:
    northwest       NW      Charles Main        3.0     .98     3       34
    northeast       NE      AM Main Jr.         5.1     .94     3       13
    north           NO      Margot Weber        4.5     .89     5       9
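The +, ?, and grouping operators can be tried together in one sketch. It assumes a three-line stand-in for testfile (the `data` variable is hypothetical and uses single spaces instead of the aligned columns) and sticks to grep -E, which is the portable spelling.

```shell
# Hypothetical stand-in for testfile, using rows quoted in this article.
data=$(printf '%s\n' \
  'northwest NW Charles Main 3.0 .98 3 34' \
  'southwest SW Lewis Dalsass 2.7 .8 2 18' \
  'north NO Margot Weber 4.5 .89 5 9')

# + : one or more of the preceding character.
has_three=$(printf '%s\n' "$data" | grep -E '3+')

# ? : zero or one of the preceding character (here an escaped period).
two_then_digit=$(printf '%s\n' "$data" | grep -E '2\.?[0-9]')

# ( ) groups "no" so that + applies to the whole group.
has_no=$(printf '%s\n' "$data" | grep -E '(no)+')

printf '%s\n' "$has_three" "$two_then_digit" "$has_no"
```

The search is case-sensitive, so the last pattern matches the lowercase "no" in northwest and north, not the uppercase code NO.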


This article is an English version of an article originally written in Chinese on aliyun.com.
