PHP regular expressions: PCRE


The following is a summary of the PCRE regular expression syntax as used in PHP. For reference, see "Differences between PCRE and Perl".

 

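Anchors (^, $, \A, \Z, \z)

In multiline mode, ^ and $ are not anchored to the start and end of the subject: they also match at line boundaries. In single-line (default) mode, they are anchored. \A, \Z, and \z are anchored to the start and end of the subject in every mode. \G matches at the first matching position in the subject and is useful together with the $offset parameter.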
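The anchoring rules above can be sketched as follows. This is a minimal illustration, assuming PHP with the standard PCRE extension; the subject string and variable names are made up for the example.

```php
<?php
// Illustrative subject: two lines separated by a newline.
$subject = "one\ntwo";

// Default mode: ^ is anchored to the very start of the subject.
$caretDefault = preg_match('/^two/', $subject);

// Multiline mode (m): ^ also matches right after a newline.
$caretMultiline = preg_match('/^two/m', $subject);

// \A stays anchored to the start of the subject, even in multiline mode.
$startAnchor = preg_match('/\Atwo/m', $subject);

// \G matches at the position given by the offset argument (4 is the "t").
$atOffset = preg_match('/\Gtwo/', $subject, $m, 0, 4);

// ^ still refers to the start of the subject when an offset is given.
$caretAtOffset = preg_match('/^two/', $subject, $m2, 0, 4);

var_dump($caretDefault, $caretMultiline, $startAnchor, $atOffset, $caretAtOffset);
// int(0) int(1) int(0) int(1) int(0)
```

The last two calls show why \G, rather than ^, is the anchor to use when scanning a subject from a given offset.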

 

Assertions

An assertion specifies a condition that must hold at a specific position. It does not consume any characters from the subject string and therefore does not appear in the result. Assertion subpatterns are either lookahead assertions, written (?=...) or (?!...), or lookbehind assertions, written (?<=...) or (?<!...).

1) \b matches a word boundary (inside a character class, \b means the backspace character), and \B matches a non-word boundary.
2) When the subject ends with a newline, \Z also matches just before that final newline, whereas \z matches only at the very end of the subject.
3) The lookahead assertion in (?!foo)bar does not find "bar" preceded by something other than "foo". It finds every "bar", because the next three characters cannot be "foo" when they are "bar". A lookbehind assertion, (?<!foo)bar, is needed for that effect.
4) The content of a lookbehind assertion is restricted to strings of fixed length, although top-level alternatives may differ in length. For example, (?<=bullock|donkey) is allowed, while (?<!dogs?|cats?) and (?<=ab(c|de)) cause a compile-time error in PCRE versions that enforce this restriction.
5) Several assertions (of any kind) can appear in succession. For example, (?<=\d{3})(?<!999)foo matches "foo" preceded by three digits that are not "999".
6) Assertions can be nested in any combination. For example, (?<=(?<!foo)bar)baz matches "baz" preceded by "bar" that is not itself preceded by "foo", and (?<=\d{3}...(?<!999))foo matches "foo" preceded by three digits and then any three characters that are not "999".
7) If an assertion contains capturing subpatterns, they are counted when numbering the capturing subpatterns of the whole pattern. However, substrings are captured only for positive assertions, because capturing makes no sense for negative ones.
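The points in the list above can be checked with a short sketch. It assumes PHP with the standard PCRE extension; all subjects and variable names are illustrative.

```php
<?php
// \Z also matches before a final newline; \z matches only at the very end.
$bigZ   = preg_match('/abc\Z/', "abc\n");   // 1
$smallZ = preg_match('/abc\z/', "abc\n");   // 0

// (?!foo)bar finds any "bar"; (?<!foo)bar rejects "bar" preceded by "foo".
$lookahead  = preg_match('/(?!foo)bar/', 'foobar');    // 1
$lookbehind = preg_match('/(?<!foo)bar/', 'foobar');   // 0

// Two assertions at the same position: three digits, but not "999".
$digitsOk   = preg_match('/(?<=\d{3})(?<!999)foo/', '123foo');   // 1
$digitsFail = preg_match('/(?<=\d{3})(?<!999)foo/', '999foo');   // 0

// Nested assertions: "baz" after "bar" that is not preceded by "foo".
$nestedOk   = preg_match('/(?<=(?<!foo)bar)baz/', 'barbaz');      // 1
$nestedFail = preg_match('/(?<=(?<!foo)bar)baz/', 'foobarbaz');   // 0

// A positive lookahead can capture even though it consumes nothing.
preg_match('/(?=(\w+))/', 'hello', $captured);
// $captured[0] is '' and $captured[1] is 'hello'
```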

Internal option setting (?...)

If an option is set inside a subpattern, it changes only the remainder of that subpattern: (a(?i)b)c matches only "abc" and "aBc" (assuming that PCRE_CASELESS is not set). Within the same subpattern, however, an option set in one branch carries over into the following branches: (a(?i)b|c) matches "ab", "aB", "c", or "C". Options take effect at compile time, so "C" matches even though the first branch is abandoned before its option setting is reached.
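A minimal sketch of the two behaviours described above, assuming PHP with the standard PCRE extension. The patterns are anchored with ^ and $ only so that each test subject must match in full.

```php
<?php
// (?i) affects only the rest of the enclosing subpattern.
$scoped = '/^(a(?i)b)c$/';
$abc = preg_match($scoped, 'abc');   // 1
$aBc = preg_match($scoped, 'aBc');   // 1
$abC = preg_match($scoped, 'abC');   // 0: "c" is outside the subpattern
$Abc = preg_match($scoped, 'Abc');   // 0: "a" comes before (?i)

// An option set in one branch carries into later branches of the same group.
$branches = '/^(a(?i)b|c)$/';
$upperC = preg_match($branches, 'C');    // 1
$mixed  = preg_match($branches, 'aB');   // 1
$upperA = preg_match($branches, 'AB');   // 0: "a" is still case-sensitive
```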

 

Pattern modifiers

i (PCRE_CASELESS), m (PCRE_MULTILINE), s (PCRE_DOTALL), x (PCRE_EXTENDED), A (PCRE_ANCHORED), D (PCRE_DOLLAR_ENDONLY), U (PCRE_UNGREEDY), and so on. Spaces and newlines among the pattern modifiers are ignored; other characters cause an error.
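The effect of each modifier listed above can be sketched as follows, assuming PHP with the standard PCRE extension. The subjects are illustrative.

```php
<?php
$caseless = preg_match('/php/i', 'PHP');            // 1: i ignores case

$dotDefault = preg_match('/a.c/', "a\nc");          // 0: . skips newlines
$dotAll     = preg_match('/a.c/s', "a\nc");         // 1: s lets . match newline

$extended = preg_match('/a b c/x', 'abc');          // 1: x ignores pattern whitespace

$anchored = preg_match('/bc/A', 'abc');             // 0: A anchors at the start

$dollarDefault = preg_match('/abc$/', "abc\n");     // 1: $ matches before a final newline
$dollarEndOnly = preg_match('/abc$/D', "abc\n");    // 0: D restricts $ to the very end

preg_match('/a.*c/', 'abcabc', $greedy);            // $greedy[0] is 'abcabc'
preg_match('/a.*c/U', 'abcabc', $ungreedy);         // $ungreedy[0] is 'abc'
```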

 

Subpatterns (groups) and alternation (|)

Subpatterns are delimited by parentheses and can be nested. Plain parentheses localize a set of alternatives and capture the matched substring; (?:...) groups without capturing; and (?|...) lets several alternatives reuse the same back reference number.

1) cat(aract|erpillar|) matches "cat", "cataract", or "caterpillar".
2) Once-only subpatterns (non-capturing) can be combined with lookbehind assertions to specify efficient matching at the end of the subject. For example, ^(?>.*)(?<=abcd) is more efficient than ^.*abcd$, because .* cannot backtrack and the lookbehind performs a single test on the last four characters.
3) Conditional subpatterns (non-capturing) have the form (?(condition)yes-pattern) or (?(condition)yes-pattern|no-pattern). The condition is a number (true if the capturing subpattern with that number has matched), the string R (true inside a recursive call), or an assertion. For example, (\()?[^()]+(?(1)\)) matches a sequence of non-parenthesis characters, optionally enclosed in parentheses.
4) Named subpatterns (capturing) are written (?P<name>pattern), (?<name>pattern), or (?'name'pattern). (?P>name) and (?&name) apply the named subpattern again later in the pattern, but such calls do not capture.
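The grouping forms in the list above can be sketched as follows, assuming PHP with the standard PCRE extension. The date pattern and the group names year and month are hypothetical, chosen only for illustration.

```php
<?php
// An empty alternative makes the suffix optional.
$alt = '/^cat(aract|erpillar|)$/';
$cat         = preg_match($alt, 'cat');           // 1
$cataract    = preg_match($alt, 'cataract');      // 1
$caterpillar = preg_match($alt, 'caterpillar');   // 1
$catalog     = preg_match($alt, 'catalog');       // 0

// (?:...) groups without capturing, so (c) is group 1.
preg_match('/(?:ab)+(c)/', 'ababc', $nonCapturing);
// $nonCapturing is ['ababc', 'c']

// Conditional: require ")" only if group 1 matched "(".
$cond = '/^(\()?[^()]+(?(1)\))$/';
$wrapped  = preg_match($cond, '(abc)');   // 1
$bare     = preg_match($cond, 'abc');     // 1
$unclosed = preg_match($cond, '(abc');    // 0

// Named groups appear in the result under both name and number.
preg_match('/(?<year>\d{4})-(?<month>\d{2})/', '2024-05', $date);
// $date['year'] is '2024' and $date[2] is '05'
```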

 

Character classes and escape characters

The character types \d, \D, \s, \S, \w, and \W can also appear inside a character class, where they add the characters they match to the class. For example, [\dABCDEF] matches any (uppercase) hexadecimal digit, and [^\W_] matches any letter or digit but not the underscore. \Q and \E make the regular expression metacharacters between them literal. For example:

\w+\Q.$.\E$

This pattern matches one or more word characters followed by a dot, a $, and a dot, and is then anchored to the end of the string. \K resets the start of the reported match: foot\Kbar matches "footbar", but the reported match is "bar".
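A short sketch of these escapes in use, assuming PHP with the standard PCRE extension. The patterns are written in single-quoted strings so that PHP passes the backslashes and the $ through unchanged; the subjects are illustrative.

```php
<?php
// Character types inside a class.
$hexUpper = preg_match('/^[\dABCDEF]+$/', '1F4A');    // 1
$hexLower = preg_match('/^[\dABCDEF]+$/', '1f');      // 0: lowercase not listed
$alnum      = preg_match('/^[^\W_]+$/', 'abc1');      // 1
$underscore = preg_match('/^[^\W_]+$/', 'abc_1');     // 0

// \Q...\E: the dots and the $ between them are literal.
$quoted    = preg_match('/\w+\Q.$.\E$/', 'price.$.');   // 1
$notQuoted = preg_match('/\w+\Q.$.\E$/', 'priceX$X');   // 0

// \K: "foot" must be present but is dropped from the reported match.
preg_match('/foot\Kbar/', 'footbar', $kept);
// $kept[0] is 'bar'
```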

 

Comments

A comment has the form (?# comment ). If the PCRE_EXTENDED option is set, an unescaped # outside a character class also starts a comment, which runs to the end of the line.
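Both comment styles can be sketched as follows, assuming PHP with the standard PCRE extension. The date pattern is a made-up example.

```php
<?php
// Inline comment: everything from (?# to the closing parenthesis is ignored.
$inline = preg_match('/ab(?# this text is ignored)c/', 'abc');   // 1

// With the x modifier, whitespace is ignored and # starts a comment.
$pattern = '/
    (\d{4})   # year
    -         # literal hyphen
    (\d{2})   # month
/x';
preg_match($pattern, '2024-05', $m);
// $m is ['2024-05', '2024', '05']

// To match a literal # under x, escape it or put it in a character class.
$escapedHash = preg_match('/a\#b/x', 'a#b');    // 1
$classHash   = preg_match('/a[#]b/x', 'a#b');   // 1
```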

 

Recursive patterns (?R)

(?R) calls the whole pattern recursively. For example:

\(((?>[^()]+)|(?R))*\)

This pattern matches the string (ab(cd)ef).
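As a rough sketch of how the recursive pattern behaves on balanced and unbalanced input, assuming PHP with the standard PCRE extension (the subjects are illustrative):

```php
<?php
$pattern = '/\(((?>[^()]+)|(?R))*\)/';

// Balanced input: the outermost parenthesized group is matched as a whole.
preg_match($pattern, 'f(ab(cd)ef) tail', $balanced);
// $balanced[0] is '(ab(cd)ef)'

// Unbalanced input: the outer "(" is never closed, so only the
// inner, balanced group can match.
preg_match($pattern, 'x(a(b)c', $unbalanced);
// $unbalanced[0] is '(b)'
```

The atomic group (?>[^()]+) keeps the matcher from trying every way of splitting a run of non-parenthesis characters when a match fails.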

 

Delimiters

A delimiter can be any non-alphanumeric, non-backslash, non-whitespace character. Commonly used delimiters are the forward slash (/), the hash sign (#), and the tilde (~).
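A brief sketch of why the choice of delimiter matters, assuming PHP with the standard PCRE extension. The URL and the $needle value are illustrative.

```php
<?php
// With / as the delimiter, every literal slash must be escaped.
$slash = preg_match('/^https?:\/\//', 'https://example.com');   // 1

// A different delimiter avoids the escaping.
$hash  = preg_match('#^https?://#', 'https://example.com');     // 1
$tilde = preg_match('~^https?://~i', 'HTTPS://example.com');    // 1

// preg_quote() escapes metacharacters, and the delimiter if it is
// passed as the second argument.
$needle  = 'a/b.c';
$pattern = '/' . preg_quote($needle, '/') . '/';
$literalHit  = preg_match($pattern, 'xa/b.cx');   // 1
$literalMiss = preg_match($pattern, 'a/bXc');     // 0: the dot is literal
```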

 

Back references

The forms (?1), (?2), (?P>name), (?&name), (?P=name), \1, \k<name>, \k'name', \k{name}, \g{name}, and others refer to a previously defined subpattern. Note that a back reference such as \n matches the same text that the group matched, whereas (?n), (?P>name), and (?&name) reuse the group's pattern. (a|(bc))\2 always fails on a string that starts with "a" rather than "bc", so it can be changed to (a|(bc))(?2). When a digit follows a back reference, the reference must be terminated: with whitespace if the x modifier is set, or by writing it as \g{n} (the sequences \1, \g1, and \g{1} are synonymous). A negative number is a relative reference: (foo)(bar)\g{-1} matches "foobarbar". A back reference to the subpattern that contains it fails when first used, so (a\1) never matches, but (a|b\1)+ matches "aba". That is, the pattern must be written so that the first iteration does not need to match the back reference.
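The difference between reusing matched text and reusing a pattern can be sketched as follows, assuming PHP with the standard PCRE extension. The subjects and the group name word are illustrative.

```php
<?php
// \1 repeats the matched text; (?1) applies the pattern of group 1 again.
$sameText    = preg_match('/^(\d\d)-\1$/', '12-12');     // 1
$otherText   = preg_match('/^(\d\d)-\1$/', '12-34');     // 0
$samePattern = preg_match('/^(\d\d)-(?1)$/', '12-34');   // 1

// Group 2 took no part in the match, so \2 fails while (?2) still works.
$unsetRef   = preg_match('/^(a|(bc))\2/', 'abc');     // 0
$subroutine = preg_match('/^(a|(bc))(?2)/', 'abc');   // 1

// Relative and named back references.
$relative = preg_match('/(foo)(bar)\g{-1}/', 'foobarbar');          // 1
$named    = preg_match('/^(?<word>\w+) \k<word>$/', 'hey hey');     // 1

// A group that refers to itself fails on first use, unless another
// branch lets the first iteration succeed without the reference.
$selfRef  = preg_match('/^(a\1)$/', 'aa');       // 0
$iterated = preg_match('/^(a|b\1)+$/', 'aba');   // 1
```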

 

Quantifiers on lookahead assertions

(?!a){3} does not assert that the next three characters are not "a". It asserts three times that the next character is not "a".
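One possible way to express the intended check is to put the repetition inside the assertion rather than on it. The pattern below is an illustration and an assumption about what such a check would look like, not something taken from the PCRE documentation.

```php
<?php
// Negative lookahead over a span: fail if an "a" occurs among the
// next three characters (zero to two non-"a" characters, then "a").
$noAInNextThree = '/^(?![^a]{0,2}a)/';

$clean  = preg_match($noAInNextThree, 'xyza');   // 1: the "a" is the fourth character
$second = preg_match($noAInNextThree, 'xaz');    // 0
$first  = preg_match($noAInNextThree, 'axy');    // 0
```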

 

// Verify an email address
// $match: Array([0] => w12aqe_@124afasf.com [1] => 124afasf.)
$patt = '/^(?!_|-)(?>[\w-]+)@(?!-)((?>(?>[a-zA-Z0-9-]+)\.)+)[a-zA-Z]{2,46}$/';
preg_match($patt, 'w12aqe_@124afasf.com', $match);

// An option set in one branch also applies to the following branches
// $match: Array([0] => sunday [day] => sunday [1] => sunday)
$patt = '/(?<day>(?i)Saturday|Sunday)/';
preg_match($patt, 'sunday', $match);

// (?|...) lets both branches share capture group 1
// $match: Array([0] => Sunday [1] => Sun)
$patt = '/(?|(Sat)ur|(Sun))day/';
preg_match($patt, 'Sunday', $match);

// When a subpattern matches several times, the last value is captured
// $match: Array([0] => (ab(cd)ef) [1] => ab(cd)ef [2] => ef)
$patt = '/\( ( ( (?>[^()]+) | (?R) )* ) \)/x';
preg_match($patt, '(ab(cd)ef)', $match);

// A subpattern can be applied again by name or by number
// $match: Array([0] => 23ab45cd56 [number] => 23 [1] => 23)
$patt = '/^(?P<number>\d+)ab(?P>number)cd(?1)$/';
preg_match($patt, '23ab45cd56', $match);

// The following pattern is equivalent to ^.*abcd$
// $match: Array([0] => 23abcd)
$patt = '/^(?>.*)(?<=abcd)/';
preg_match($patt, '23abcd', $match);

 
