Normalization of PHP Regular Expressions

Last Update: 2018-12-05 Source: Internet

Author: User

PHP regular expression definition:

A regular expression is a syntax rule that describes how characters are arranged and matched. It is mainly used to split, match, search, and replace strings by pattern.

Regular expression functions in PHP:

PHP has had two sets of regular expression functions with similar capabilities:

One set is provided by the PCRE (Perl Compatible Regular Expressions) library. Its functions are named with the prefix "preg_".

The other set is provided by the POSIX (Portable Operating System Interface of Unix) extension. Its functions are named with the prefix "ereg". (The POSIX regular expression functions have been deprecated since PHP 5.3 and were removed in PHP 7.0.)

Since the POSIX regular expression functions are on their way out, and PCRE syntax is close to Perl's, which makes it easier to switch between Perl and PHP, this article covers the use of PCRE regular expressions.

PCRE Regular Expression

PCRE stands for Perl Compatible Regular Expressions, that is, regular expressions compatible with Perl.

In PCRE, a pattern (that is, a regular expression) is usually enclosed between two forward slashes "/", such as "/apple/".
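As a minimal sketch of the delimiter convention, the pattern is passed to preg_match() as a string that includes its delimiters. The sample strings and variable names here are illustrative and not from the original article. A delimiter other than "/" (here "#") can be convenient when the pattern itself contains slashes.

```php
<?php
// Illustrative sample text; not from the original article.
$text = 'An apple a day';

// The pattern string includes its delimiters: /apple/
// preg_match() returns 1 when the pattern matches and 0 when it does not.
$found = preg_match('/apple/', $text);

// Using "#" as the delimiter avoids escaping the slashes inside the pattern.
$isPath = preg_match('#^/usr/local/#', '/usr/local/bin/php');

var_dump($found, $isPath);
```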

There are several important concepts in regular expressions: metacharacters, escaping, pattern units (repetition), negation, references, and assertions. These concepts can be easily understood and mastered from article [1].

// If you republish this article, please credit the source. I do not like to see websites that republish without a copyright notice. Author: Seven {See7di # Gmail.com}

Frequently used metacharacters (meta-characters):

\A matches at the very beginning of a string

\Z matches at the end of a string (or just before a final line break)

\b matches a word boundary: /\bis/ matches words that start with "is", /is\b/ matches words that end with "is", and /\bis\b/ matches the whole word "is"

\B matches any position that is not a word boundary: /\Bis/ matches the "is" in the word "This"

\d matches a digit. It is equivalent to [0-9]

\D matches any character except a digit. It is equivalent to [^0-9]

\w matches an English letter, digit, or underscore. It is equivalent to [0-9a-zA-Z_]

\W matches any character except English letters, digits, and underscores. It is equivalent to [^0-9a-zA-Z_]

\s matches a whitespace character. It is equivalent to [ \f\n\r\t\x0b] in current PCRE versions

\S matches any character except whitespace. It is equivalent to [^ \f\n\r\t\x0b] in current PCRE versions

\f matches a form feed. It is equivalent to \x0c or \cL

\n matches a line feed. It is equivalent to \x0a or \cJ

\r matches a carriage return. It is equivalent to \x0d or \cM

\t matches a tab. It is equivalent to \x09 or \cI

\v matches a vertical tab (\x0b or \cK) in Perl-style notation. Note that in current PCRE, \v matches any vertical whitespace character, so use \x0b to match the vertical tab alone

\0NN matches the character whose octal code is NN

\xNN matches the character whose hexadecimal code is NN

\cX matches the control character Ctrl-X
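The following sketch exercises a few of these metacharacters. The sample strings and variable names are illustrative assumptions; PREG_OFFSET_CAPTURE is used only to show where each match starts.

```php
<?php
// Illustrative sample string; not from the original article.
$s = 'This is an island';

// \b requires a word boundary before "is":
// it matches the standalone "is" and the start of "island".
preg_match_all('/\bis/', $s, $atWordStart, PREG_OFFSET_CAPTURE);

// \B requires that there is NO boundary before "is":
// it matches only the "is" inside "This".
preg_match_all('/\Bis/', $s, $insideWord, PREG_OFFSET_CAPTURE);

// \d and \w are shorthand classes; \A and \Z anchor the whole string.
$isDate  = preg_match('/\A\d{4}-\d{2}-\d{2}\Z/', '2018-12-05');
$isIdent = preg_match('/\A\w+\Z/', 'user_name42');

print_r($atWordStart[0]);
print_r($insideWord[0]);
```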

Pattern modifiers (Pattern Modifiers):

Pattern modifiers enable behavior such as case-insensitive and multi-line matching. They often solve many problems.

i - matches uppercase and lowercase letters alike

m - treats the string as multiple lines, so "^" and "$" match at the start and end of every line

s - treats the string as a single line, and line breaks are treated as ordinary characters so that "." matches any character

x - whitespace in the pattern is ignored

U - ungreedy: quantifiers match the shortest possible string

e - evaluates the replacement string as a PHP expression (preg_replace() only; deprecated in PHP 5.5 and removed in PHP 7.0, use preg_replace_callback() instead)

Format: /apple/i matches "apple", "Apple", and so on, ignoring case.
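A short sketch of how these modifiers change matching. The sample text and variable names are made up for illustration; the "e" modifier is left out because it no longer exists in PHP 7 and later.

```php
<?php
// Illustrative sample text; not from the original article.
$text = "Apple\napple pie";

// i: case-insensitive, so both "Apple" and "apple" match.
$countI = preg_match_all('/apple/i', $text);

// m: "^" matches at the start of every line, not only at the start of the string.
$countM = preg_match_all('/^apple/im', $text);

// s: "." also matches the line break, so the match can span both lines.
$spans  = preg_match('/Apple.apple/s', $text);
$noSpan = preg_match('/Apple.apple/', $text);

// U: quantifiers become lazy, so ".*" stops at the first closing tag.
preg_match('/<b>.*<\/b>/U', '<b>one</b> and <b>two</b>', $lazy);

// x: whitespace and "#" comments inside the pattern are ignored.
$isZip = preg_match('/^ \d{5} $ # five digits/x', '12345');

var_dump($countI, $countM, $spans, $noSpan, $lazy[0], $isZip);
```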

PCRE pattern units:

// 1. Extract the first attribute

/^\d{2}([\W])\d{2}\1\d{4}$/ matches strings such as "12-31-2006", "09/27/1996", and "86 01 4321". However, this regular expression does not match the "12/34-5678" format. This is because the result "/" matched by the unit "([\W])" has been stored, so when "\1" refers back to it at the later position, it must match the same character "/".

When the matched result does not need to be stored, use the non-capturing pattern unit "(?:)".

For example, /(?:a|b|c)(D|E|F)\1g/ matches "aEEg". In some regular expressions non-capturing units are necessary; otherwise the numbering of later references has to change. The preceding example can also be written as /(a|b|c)(D|E|F)\2g/.
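The back-reference behavior described above can be checked with a small sketch. The patterns and test strings are the ones from the text; the variable names are illustrative.

```php
<?php
// \1 must repeat whatever separator the first group ([\W]) captured.
$datePattern = '/^\d{2}([\W])\d{2}\1\d{4}$/';

$results = [];
foreach (['12-31-2006', '09/27/1996', '86 01 4321', '12/34-5678'] as $candidate) {
    $results[$candidate] = preg_match($datePattern, $candidate);
}

// With a non-capturing first group, (D|E|F) is group 1.
preg_match('/(?:a|b|c)(D|E|F)\1g/', 'aEEg', $nonCapturing);

// With a capturing first group, (D|E|F) becomes group 2.
preg_match('/(a|b|c)(D|E|F)\2g/', 'aEEg', $capturing);

print_r($results);
print_r($nonCapturing);
print_r($capturing);
```

Only "12/34-5678" fails, because its two separators differ.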

PCRE regular expression functions:

preg_match() and preg_match_all()
preg_quote()
preg_split()
preg_grep()
preg_replace()
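A brief sketch of the functions not demonstrated elsewhere in this article. The input strings and variable names are illustrative assumptions.

```php
<?php
// preg_split(): split on commas or runs of whitespace.
$parts = preg_split('/[\s,]+/', 'red, green  blue');

// preg_grep(): keep only the array elements that match (original keys are preserved).
$numeric = preg_grep('/^\d+$/', ['10', 'abc', '42']);

// preg_quote(): escape regex metacharacters so text is matched literally.
$literal = preg_quote('1+1=2');
$hit = preg_match('/' . $literal . '/', 'so 1+1=2 holds');

// preg_replace(): collapse each run of whitespace into a single space.
$clean = preg_replace('/\s+/', ' ', "too   many\tspaces");

print_r($parts);
print_r($numeric);
var_dump($literal, $hit, $clean);
```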

The details of each function can be found in the PHP manual. Below are some regular expressions I have accumulated:

Matching action attributes

$str = '';
$match = '';
// Capture the value of every action attribute that does not start with "http:"
preg_match_all('/\s+action="(?!http:)(.*?)"\s/', $str, $match);
print_r($match);
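Because the sample string above is empty, here is a sketch with hypothetical markup to show what the pattern captures. The HTML and the example.com URL are assumptions made for illustration only.

```php
<?php
// Hypothetical markup: one relative action and one absolute action.
$str = '<form method="post" action="search.php" >'
     . '<form method="get" action="http://example.com/q" >';

// (?!http:) rejects values that start with "http:",
// so only the relative action is captured in group 1.
preg_match_all('/\s+action="(?!http:)(.*?)"\s/', $str, $match);

print_r($match[1]);
```

Note that the pattern expects whitespace after the closing quote, so an attribute written as action="search.php"> (with no space before ">") would not match.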

Use callback in regular expressions

/**
 * Replace some string by callback function
 */
function callback_replace() {
    $url = 'http://esfang.house.sina.com.cn';
    $str = '';
    // The /e modifier was removed in PHP 7.0, so the callback is passed
    // through preg_replace_callback() instead.
    $str = preg_replace_callback(
        '/(?<=\saction=")(?!http:)(.*?)(?="\s)/',
        function ($m) use ($url) {
            return search($url, $m[1]);
        },
        $str
    );

    echo $str;
}

function search($url, $match) {
    return $url . '/' . $match;
}

Regular expression matching with assertions

$match = '';
$str = 'xxxxxx.com.cn <b>bold font</b>
<p>Paragraph text</p>';
preg_match_all('/(?<=<(\w{1})>).*(?=<\/\1>)/', $str, $match);
echo "Matches the content in HTML tags without attributes:";
print_r($match);

Replace the address in the HTML source code

// The /e modifier was removed in PHP 7.0; preg_replace_callback() does the same job.
$form_html = preg_replace_callback(
    '/(?<=\saction="|\ssrc="|\shref=")(?!http:|javascript)(.*?)(?="\s)/',
    function ($m) use ($url) { return add_url($url, $m[1]); },
    $form_html
);
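The article does not show add_url() or the input markup, so the following is a sketch under stated assumptions: add_url() here is a hypothetical helper that joins a base URL and a relative path, and the base URL and HTML are invented for illustration. It uses preg_replace_callback(), since the "e" modifier is not available in PHP 7 and later.

```php
<?php
// Hypothetical helper: the original article calls add_url() but does not define it.
function add_url(string $url, string $path): string {
    return rtrim($url, '/') . '/' . ltrim($path, '/');
}

$url = 'http://example.com';   // assumed base URL
$form_html = '<a href="news/1.html" >n</a> '
           . '<img src="/img/a.png" > '
           . '<a href="http://other.example/x" >x</a>';

// The look-behind finds the attribute start, the negative look-ahead skips
// absolute and javascript: URLs, and the look-ahead stops at the closing quote.
$form_html = preg_replace_callback(
    '/(?<=\saction="|\ssrc="|\shref=")(?!http:|javascript)(.*?)(?="\s)/',
    function (array $m) use ($url): string {
        return add_url($url, $m[1]);
    },
    $form_html
);

echo $form_html;
```

The two relative addresses are prefixed with the base URL, while the address that already starts with "http:" is left unchanged.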

Although regular expressions are powerful, in terms of efficiency and writing time they are sometimes less direct than explode(). For urgent or undemanding tasks, a simple and crude approach such as str_ireplace() or strtr() may work better.
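For comparison, a sketch of the plain string functions mentioned above doing jobs that need no regular expression. The input strings are illustrative.

```php
<?php
// explode(): split on a fixed separator.
$fields = explode(',', 'a,b,c');

// str_ireplace(): case-insensitive replacement of a fixed substring.
$replaced = str_ireplace('APPLE', 'pear', 'Apple and apple');

// strtr(): replace several fixed substrings in one pass.
$mapped = strtr('Hi all', ['Hi' => 'Hello', 'all' => 'everyone']);

print_r($fields);
var_dump($replaced, $mapped);
```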

As for the relative performance of the preg and ereg families, I have read articles saying that preg is somewhat faster. I have rarely used ereg, it is about to leave the stage of history, and I personally prefer PCRE, so ereg is outside my consideration.

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
