Unveil the mystery of regular expression syntax

Author: Builder.com
Wednesday, October 23, 2002

Regular expressions (REs) are often mistakenly regarded as a mysterious language that only a few people understand. On the surface they do look messy, and if you don't know the syntax, the code is just a jumble of characters to you. In fact, regular expressions are simple and easy to understand. After reading this article, you will be familiar with their general syntax.

Supports multiple platforms



Regular expressions were first described by the mathematician Stephen Kleene in 1956, based on his research into regular sets and finite automata. They were created as a syntax for matching patterns in strings and were later applied in the field of information technology. Since then, regular expressions have gone through several stages of development, and the current standard has been ratified by ISO (the International Organization for Standardization).

Regular expressions are not a language of their own, but they can be used to search for and replace text in a file or string. There are two standards: basic regular expressions (BRE) and extended regular expressions (ERE). ERE includes the functionality of BRE plus some additional concepts.

Many programs use regular expressions, including xsh, egrep, sed, vi, and other programs on UNIX platforms. They have also been adopted by many languages, such as HTML and XML, which usually implement only a subset of the full standard.

It's more common than you think.

As regular expressions have been ported to cross-platform programming languages, they have become increasingly full-featured and widely used. Search engines on the web use them, e-mail programs use them, and even if you are not a UNIX programmer, you can use regular expressions to simplify your programs and shorten your development time.

Regular Expressions 101

Much of regular expression syntax will look familiar even if you have never studied it before. Wildcards, for example, are one kind of RE construct, namely repetition operators. Let's take a look at the most common basic syntax types of the ERE standard. To provide examples for specific purposes, I will use several different programs.

Character matching

The key to a regular expression is deciding what you want to search for. Without that, REs are useless.

Every expression contains instructions on what to look for, as shown in Table A.

Table A: Character-matching regular expressions

Operator  Explanation                                              Example                          Result
.         Match any one character                                  grep ".ord" sample.txt           Matches "ford", "lord", "2ord", etc. in the file sample.txt
[]        Match any one character listed between the brackets      grep "[cng]ord" sample.txt       Matches only "cord", "nord", and "gord"
[^]       Match any one character not listed between the brackets  grep "[^cn]ord" sample.txt       Matches "lord", "2ord", etc., but not "cord" or "nord"
[]                                                                 grep "[a-zA-Z]ord" sample.txt    Matches "aord", "bord", "Aord", "Bord", etc.
[^]                                                                grep "[^0-9]ord" sample.txt      Matches "aord", "Aord", etc., but not "2ord", etc.
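The character-matching operators above can also be tried outside grep. The following is a minimal sketch in Python, assuming its re module (whose syntax for these operators agrees with ERE) and a hypothetical word list standing in for the contents of sample.txt; the helper name matching is invented for illustration.

```python
import re

# Hypothetical stand-in for the words in sample.txt.
words = ["ford", "lord", "2ord", "cord", "nord", "gord", "Aord"]


def matching(pattern, candidates):
    """Return the candidates that contain a match for pattern."""
    return [w for w in candidates if re.search(pattern, w)]


any_char = matching(r".ord", words)         # any single character before "ord"
listed = matching(r"[cng]ord", words)       # only c, n, or g before "ord"
not_listed = matching(r"[^cn]ord", words)   # any character except c or n
letters = matching(r"[a-zA-Z]ord", words)   # any ASCII letter, either case
non_digit = matching(r"[^0-9]ord", words)   # any character except a digit

print(listed)      # ['cord', 'nord', 'gord']
print(not_listed)  # ['ford', 'lord', '2ord', 'gord', 'Aord']
```

Like grep, re.search reports a match anywhere in the candidate, so these patterns would also find "ord" preceded by a suitable character in the middle of a longer word.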

Repetition operators

Repetition operators, or quantifiers, describe how many times a specified element should be matched. They are used together with the character-matching syntax to look for multiple occurrences of characters. See Table B.

Table B: Regular expression repetition operators

Operator  Explanation                                                              Example                            Result
?         Match the preceding element zero or one time                             egrep ".?erd" sample.txt           Matches "berd", "herd", etc., and "erd"
*         Match the preceding element zero or more times                           egrep "n.*rd" sample.txt           Matches "nerd", "nrd", "neard", etc.
+         Match the preceding element one or more times                            egrep "[n]+erd" sample.txt         Matches "nerd", "nnerd", etc., but not "erd"
{n}       Match the preceding element exactly n times                              egrep "[a-z]{2}erd" sample.txt     Matches "cherd", "blerd", etc., but not "nerd", "erd", "buzzerd", etc.
{n,}      Match the preceding element at least n times                             egrep ".{2,}erd" sample.txt        Matches "cherd" and "buzzerd", but not "nerd"
{n,m}     Match the preceding element at least n times, but not more than m times  egrep "n[e]{1,2}rd" sample.txt     Matches "nerd" and "neerd"

The results above describe whole words. Because egrep reports any line that contains a match, an unanchored pattern such as "[a-z]{2}erd" also finds the "zzerd" inside "buzzerd"; use the anchors described in the next section to restrict a match to a whole word.
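To see the quantifiers at work, here is a small sketch in Python, assuming its re module and a hypothetical word list. It uses re.fullmatch so that each pattern is compared against the whole word, which is the reading under which the "but not" results in Table B hold; the helper name whole_word_matches is invented for illustration.

```python
import re

# Hypothetical candidate words; each pattern must match the entire word.
words = ["erd", "nrd", "nerd", "nnerd", "cherd", "buzzerd", "neerd"]


def whole_word_matches(pattern, candidates):
    """Return the candidates that pattern matches from start to end."""
    return [w for w in candidates if re.fullmatch(pattern, w)]


optional = whole_word_matches(r".?erd", words)           # zero or one character, then "erd"
zero_or_more = whole_word_matches(r"n.*rd", words)       # "n", anything, then "rd"
one_or_more = whole_word_matches(r"n+erd", words)        # at least one "n", then "erd"
exactly_two = whole_word_matches(r"[a-z]{2}erd", words)  # exactly two letters, then "erd"
at_least_two = whole_word_matches(r".{2,}erd", words)    # two or more characters, then "erd"
one_or_two = whole_word_matches(r"ne{1,2}rd", words)     # one or two "e" between "n" and "rd"

print(optional)     # ['erd', 'nerd']
print(exactly_two)  # ['nnerd', 'cherd', 'neerd']

# Without anchoring, the same pattern is found inside a longer word.
found_inside = re.search(r"[a-z]{2}erd", "buzzerd") is not None
print(found_inside)  # True
```

The last line shows why anchoring matters: a search for exactly two letters followed by "erd" still succeeds on "buzzerd" unless the pattern is tied to the start of the word.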

Anchors

Anchors specify where in a line or word the pattern must match, as shown in Table C. They make it easy to find common strings at particular positions. In the examples, I use the vi line editor command :s, which stands for substitute. The basic syntax of this command is:

s/pattern_to_match/pattern_to_substitute/
 

Table C: Regular expression anchors

Operator  Explanation                              Example                        Result
^         Match at the beginning of a line         s/^/blah/                      Inserts "blah" at the beginning of the line
$         Match at the end of a line               s/$/blah/                      Inserts "blah" at the end of the line
\<        Match at the beginning of a word         s/\</blah/                     Inserts "blah" at the beginning of the word
\<                                                 egrep "\<blah" sample.txt      Matches "blahfield", etc.
\>        Match at the end of a word               s/\>/blah/                     Inserts "blah" at the end of the word
\>                                                 egrep "blah\>" sample.txt      Matches "soupblah", etc.
\b        Match at the beginning or end of a word  egrep "\bblah" sample.txt      Matches "blahcake"
\b                                                 egrep "blah\b" sample.txt      Matches "countblah"
\B        Match in the middle of a word            egrep "\Bblah" sample.txt      Matches "sublahper", etc.

Note that \<, \>, \b, and \B are extensions supported by tools such as vi and GNU grep rather than part of the POSIX ERE standard.
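The anchors can be sketched in Python as well, with one caveat: Python's re module supports ^, $, \b, and \B but not \< or \>. The example below therefore emulates start-of-word and end-of-word with \b plus a lookaround, which is a substitution made for illustration and not something the vi or egrep examples use. The sample line and the helper name found are hypothetical.

```python
import re

line = "soup is hot"

# ^ and $ match positions rather than characters, so substituting
# them inserts text without removing anything.
at_line_start = re.sub(r"^", "blah", line)
at_line_end = re.sub(r"$", "blah", line)

# Python has no \< or \>. A boundary followed (or preceded) by a word
# character is used here as a stand-in for start (or end) of word.
at_word_starts = re.sub(r"\b(?=\w)", "blah", line)
at_word_ends = re.sub(r"\b(?<=\w)", "blah", line)

print(at_line_start)   # blahsoup is hot
print(at_line_end)     # soup is hotblah
print(at_word_starts)  # blahsoup blahis blahhot
print(at_word_ends)    # soupblah isblah hotblah


def found(pattern, text):
    """Report whether pattern matches anywhere in text."""
    return re.search(pattern, text) is not None


print(found(r"\bblah", "blahcake"))   # True: "blah" starts the word
print(found(r"\bblah", "countblah"))  # False: "blah" is not at a word start
print(found(r"blah\b", "countblah"))  # True: "blah" ends the word
print(found(r"\Bblah", "sublahper"))  # True: "blah" is inside the word
```

Unlike the vi command, which substitutes only the first match on a line unless the g flag is given, re.sub replaces every match by default; passing count=1 would limit it to the first.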

Alternation

Another construct in REs is alternation (sometimes called interval). In effect, it is an OR statement, written with the | symbol. The following statement returns the occurrences of "nerd" and "merd" in the file sample.txt:

egrep "(n|m)erd" sample.txt

Alternation is very powerful, especially when you are looking for different spellings of a word, but in this case you can get the same result with the following example:

egrep "[nm]erd" sample.txt

When you combine alternation with the more advanced features of REs, its real usefulness becomes more apparent.
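As a rough illustration of the point above, the following Python sketch (assuming the re module and a hypothetical word list) shows that a single-character alternation and a bracket expression select the same words, while alternation can also choose between multi-character alternatives, which a bracket expression cannot do.

```python
import re

# Hypothetical candidate words.
words = ["nerd", "merd", "herd", "nerds", "geeks"]

# Single-character alternatives: alternation and a bracket
# expression are interchangeable.
with_alternation = [w for w in words if re.fullmatch(r"(n|m)erd", w)]
with_class = [w for w in words if re.fullmatch(r"[nm]erd", w)]

# Multi-character alternatives need alternation; here it is combined
# with the ? quantifier to allow an optional plural "s".
whole_words = [w for w in words if re.fullmatch(r"(nerd|geek)s?", w)]

print(with_alternation)  # ['nerd', 'merd']
print(with_class)        # ['nerd', 'merd']
print(whole_words)       # ['nerd', 'nerds', 'geeks']
```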

Reserved characters

The last and most important feature of REs is reserved characters (also called special characters). For example, suppose you want to find the strings "ne*rd" and "ni*rd". The pattern "n[ei]*rd" matches "neeeeerd" and "nieieierd", but not the strings you are looking for. Because '*' (asterisk) is a reserved character, you must escape it with a backslash, that is, "n[ei]\*rd". Other reserved characters include:

    • ^ (Caret)
    • . (Period)
    • [ (Left bracket)
    • $ (Dollar sign)
    • ( (Left parenthesis)
    • ) (Right parenthesis)
    • | (Pipe)
    • * (Asterisk)
    • + (Plus symbol)
    • ? (Question mark)
    • { (Left curly bracket, or left brace)
    • \ (Backslash)
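The escaping rule for the asterisk can be checked with a short Python sketch, assuming the re module and a hypothetical list of candidate strings. It also shows re.escape, a standard-library helper that adds the backslashes for you when an entire string should be taken literally.

```python
import re

# Hypothetical candidate strings.
candidates = ["ne*rd", "ni*rd", "neeeeerd", "nieieierd"]

# Unescaped, * is a quantifier applied to [ei].
unescaped = [c for c in candidates if re.fullmatch(r"n[ei]*rd", c)]

# Escaped with a backslash, * is an ordinary asterisk.
escaped = [c for c in candidates if re.fullmatch(r"n[ei]\*rd", c)]

print(unescaped)  # ['neeeeerd', 'nieieierd']
print(escaped)    # ['ne*rd', 'ni*rd']

# re.escape builds a pattern that matches its argument literally.
literal_pattern = re.escape("ne*rd")
matches_itself = re.fullmatch(literal_pattern, "ne*rd") is not None
matches_repeats = re.fullmatch(literal_pattern, "neeerd") is not None
print(matches_itself, matches_repeats)  # True False
```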

Once you start including these characters in your searches, REs undoubtedly become very difficult to read. For example, the following eregi search code from PHP is hard to read:

eregi("^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$", $sendto)

As you can see, the intent of the code is hard to grasp. However, if you overlook the reserved characters, you will often misread what the code means.
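One way to make such a pattern easier to follow is to split it into commented pieces. The sketch below does this in Python, assuming its re module with the VERBOSE flag (which ignores whitespace and comments inside the pattern) and the IGNORECASE flag standing in for the case-insensitivity of eregi. Python's regex flavor is not the same as PHP's POSIX functions, so this is an illustration of the pattern's structure, not a drop-in equivalent; the name EMAIL is invented here.

```python
import re

EMAIL = re.compile(
    r"""
    ^                   # start of the string
    [_a-z0-9-]+         # first part of the name: letters, digits, _ or -
    (\.[_a-z0-9-]+)*    # further parts, each introduced by a literal period
    @                   # the at sign
    [a-z0-9-]+          # first label of the host name
    (\.[a-z0-9-]+)*     # further labels, each introduced by a literal period
    $                   # end of the string
    """,
    re.IGNORECASE | re.VERBOSE,
)

print(EMAIL.match("john.doe@example.com") is not None)   # True
print(EMAIL.match("John_Doe@Example.COM") is not None)   # True
print(EMAIL.match("john..doe@example.com") is not None)  # False
print(EMAIL.match("john.doe.example.com") is not None)   # False
```

Read this way, the escaped periods stand out: each \. demands a literal period, whereas an unescaped . would accept any character at that position.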

Summary

In this article, we unveiled the mystery of regular expressions and listed the general syntax of the ERE standard. For the complete specification, see the Open Group's documentation on regular expressions. You are welcome to post your questions or opinions in the discussion area.
