Regular expressions and pattern matching in PHP _ PHP Tutorial

Source: Internet
Author: User
Tags: processing text
PHP provides two ways to process text with regular expressions. One is PCRE: the PCRE library is a set of functions that implement regular expression pattern matching with syntax and semantics slightly different from Perl 5 (see below); the current implementation corresponds to Perl 5.005. The other is POSIX.

The pattern syntax used by the PCRE functions closely resembles Perl's. The expression must be enclosed in delimiters, for example forward slashes (/). Any non-alphanumeric character, including a non-ASCII one, can serve as the delimiter, except the backslash (\) and the null byte. If the delimiter character appears inside the expression, escape it with a backslash. Since PHP 4.0.4 you can also use the Perl-style matching pairs (), {}, [], and <> as delimiters. For a more detailed explanation, see Pattern Syntax.
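As a minimal sketch of the delimiter rules above, the following writes the same pattern with three different delimiters. The subject string and variable names are made up for illustration.

```php
<?php
// The same search written with three different delimiters.
$subject = 'path: /usr/local/bin';

// Slash delimiter: every literal slash in the pattern must be escaped.
$withSlash = preg_match('/\/usr\/local/', $subject);

// Hash delimiter: slashes no longer need escaping.
$withHash = preg_match('#/usr/local#', $subject);

// Perl-style bracket pair as delimiters, followed by the i modifier.
$withBraces = preg_match('{/USR/local}i', $subject);

var_dump($withSlash, $withHash, $withBraces); // each call reports one match
```

Choosing a delimiter that does not occur in the pattern usually keeps the expression easier to read than escaping each occurrence.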

The closing delimiter can be followed by pattern modifiers that change how matching works; see Pattern Modifiers.
PCRE pattern modifiers
i (PCRE_CASELESS)
If this modifier is set, letters in the pattern match both uppercase and lowercase letters.
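A short illustration of the lowercase i modifier, using an invented subject string:

```php
<?php
$subject = 'I like PHP';

// Without the modifier the match is case-sensitive.
$caseSensitive = preg_match('/php/', $subject);

// With i, "php" in the pattern also matches "PHP" in the subject.
$caseless = preg_match('/php/i', $subject);

var_dump($caseSensitive, $caseless); // int(0), int(1)
```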
s (PCRE_DOTALL)
If this modifier is set, the dot metacharacter (.) in the pattern matches all characters, including newlines. Without it, newlines are excluded. This is equivalent to Perl's /s modifier. A negated character class such as [^a] always matches a newline, regardless of whether this modifier is set.
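The effect on the dot, and the fact that a negated class is unaffected, can be sketched as follows (the two-line subject is an assumption for the example):

```php
<?php
$text = "first\nsecond";

// The dot does not match the newline between the two words by default.
$withoutS = preg_match('/first.second/', $text);

// With s, the dot matches the newline as well.
$withS = preg_match('/first.second/s', $text);

// A negated class matches the newline with or without the modifier.
$negatedClass = preg_match('/first[^a]second/', $text);

var_dump($withoutS, $withS, $negatedClass); // int(0), int(1), int(1)
```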
m (PCRE_MULTILINE)
By default, PCRE treats the subject string as a single "line" of characters, even if it actually contains several newlines. The start-of-line metacharacter (^) matches only at the start of the string, and the end-of-line metacharacter ($) matches only at the end of the string, or before a terminating newline (unless the D modifier is set). This is the same as Perl. When this modifier is set, the start-of-line and end-of-line constructs also match immediately after and immediately before any newline in the subject string, respectively, as well as at the very start and end. This is equivalent to Perl's /m modifier. If the subject string contains no "\n" characters, or the pattern contains no ^ or $, setting this modifier has no effect.
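To make the difference concrete, here is a small sketch that counts whole-line matches in a three-line string with and without the m modifier. The sample text is invented.

```php
<?php
$text = "alpha\nbeta\ngamma";

// Without m, ^ and $ refer to the whole string, which is not a single word.
$singleLineCount = preg_match_all('/^\w+$/', $text, $singleLine);

// With m, ^ and $ also match around each embedded newline.
$multiLineCount = preg_match_all('/^\w+$/m', $text, $multiLine);

var_dump($singleLineCount); // int(0)
var_dump($multiLine[0]);    // the three words, one per line of the subject
```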
x (PCRE_EXTENDED)
If this modifier is set, whitespace data characters in the pattern are ignored unless they are escaped or inside a character class. Characters between an unescaped # outside a character class and the next newline, inclusive, are also ignored. This is equivalent to Perl's /x modifier and makes it possible to include comments inside complicated patterns. Note that this applies only to data characters: whitespace may never appear within a special character sequence in a pattern, for example within the sequence (?( that introduces a conditional subpattern.
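A commented pattern under the x modifier might look like the sketch below. The date format and variable names are assumptions for illustration; note that the comments must not contain the delimiter character.

```php
<?php
// Whitespace and #-comments inside the pattern are ignored under x.
$datePattern = '/
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
/x';

preg_match($datePattern, 'Released on 2024-03-15.', $parts);

print_r($parts); // full match followed by year, month, and day
```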
e (PREG_REPLACE_EVAL)
If this modifier is set, preg_replace() performs the normal substitution of backreferences in the replacement string, evaluates the result as PHP code (in the manner of eval()), and uses the value of that evaluation as the replacement. Single quotes, double quotes, backslashes (\), and NULL characters in substituted backreferences are escaped with backslashes.
Only preg_replace() uses this modifier; the other PCRE functions ignore it. Note that it was deprecated in PHP 5.5.0 and removed in PHP 7.0.0.
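Because current PHP versions no longer accept the e modifier, the sketch below does not use it. Instead it shows the same kind of computed replacement with preg_replace_callback(), a swapped-in technique rather than the modifier itself; the old-style call appears only as a comment.

```php
<?php
// Old style, shown for comparison only (does not work on PHP 7 and later):
//   preg_replace('/\b[a-z]/e', 'strtoupper("$0")', $text);

$text = 'hello regex world';

// The callback receives the match array and returns the replacement text.
$result = preg_replace_callback(
    '/\b[a-z]/',
    function (array $match) {
        return strtoupper($match[0]);
    },
    $text
);

echo $result, "\n"; // Hello Regex World
```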
A (PCRE_ANCHORED)
If this modifier is set, the pattern is forced to be "anchored": it is constrained to match only at the start of the subject string. The same effect can be achieved with appropriate constructs in the pattern itself, which is the only way to do it in Perl.
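The following sketch contrasts an unanchored search with the A modifier and with an explicit ^ in the pattern; the subject string is made up.

```php
<?php
$subject = 'abc123';

// Unanchored: the digits are found in the middle of the string.
$unanchored = preg_match('/\d+/', $subject, $found);

// Anchored with A: the match must begin at the start of the subject.
$anchored = preg_match('/\d+/A', $subject);

// The same constraint expressed inside the pattern.
$explicit = preg_match('/^\d+/', $subject);

var_dump($unanchored, $found[0], $anchored, $explicit); // int(1), "123", int(0), int(0)
```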
D (PCRE_DOLLAR_ENDONLY)
If this modifier is set, a dollar sign in the pattern matches only at the very end of the subject string. Without it, when the string ends with a newline, the dollar sign also matches immediately before that final newline (but not before any other newline). This modifier is ignored if the m modifier is set. Perl has no equivalent.
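A typical place where this matters is validating a line that may carry a trailing newline. A minimal sketch, with an invented input:

```php
<?php
$line = "abc\n";

// By default, $ also matches just before a final newline.
$default = preg_match('/^abc$/', $line);

// With D, $ matches only at the very end, so the trailing newline blocks the match.
$endOnly = preg_match('/^abc$/D', $line);

// When m is also set, D is ignored.
$withM = preg_match('/^abc$/mD', $line);

var_dump($default, $endOnly, $withM); // int(1), int(0), int(1)
```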
S
When a pattern is going to be used several times, it is worth spending more time analyzing it in order to speed up matching. If this modifier is set, this extra analysis is performed. At present, studying a pattern is useful only for non-anchored patterns that do not have a single fixed starting character.
U (PCRE_UNGREEDY)
This modifier inverts the "greediness" of quantifiers: they are not greedy by default, but become greedy if followed by ?. It is not compatible with Perl. Ungreedy matching can also be enabled with an in-pattern option setting, (?U), or for a single quantifier by putting a question mark after it (for example .*?). In ungreedy mode it is usually not possible to match more than pcre.backtrack_limit characters.
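The inversion can be seen by running the same quantifier with and without U, and with and without a trailing question mark. The HTML fragment is a made-up example.

```php
<?php
$html = '<b>one</b> and <b>two</b>';

// Default: .* is greedy and runs to the last closing tag.
preg_match('/<b>.*<\/b>/', $html, $greedy);

// With U: .* becomes ungreedy and stops at the first closing tag.
preg_match('/<b>.*<\/b>/U', $html, $ungreedy);

// Without U, a question mark makes a single quantifier ungreedy.
preg_match('/<b>.*?<\/b>/', $html, $lazy);

// With U, the question mark flips the quantifier back to greedy.
preg_match('/<b>.*?<\/b>/U', $html, $flipped);

var_dump($greedy[0], $ungreedy[0], $lazy[0], $flipped[0]);
```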
X (PCRE_EXTRA)
This modifier turns on additional PCRE functionality that is incompatible with Perl. Any backslash in a pattern that is followed by a letter with no special meaning causes an error, which reserves these combinations for future expansion. By default, as in Perl, a backslash followed by a letter with no special meaning is treated as a literal. No other features are currently controlled by this modifier.
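J (PCRE_INFO_JCHANGED)
The (?J) internal option setting changes the local PCRE_DUPNAMES option, which allows duplicate names for subpatterns. In the PHP versions this article describes, the option can only be set internally, and using J as an external modifier (/J) causes an error. (PHP 7.2.0 and later also accept J as a modifier.)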
u (PCRE_UTF8)
This modifier turns on additional PCRE functionality that is incompatible with Perl: pattern strings are treated as UTF-8. It is available from PHP 4.1.0 on Unix and from PHP 4.2.3 on win32. The UTF-8 validity of the pattern is checked since PHP 4.3.5.
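A brief sketch of what the lowercase u modifier changes in practice. The word is written with explicit byte escapes so the example does not depend on how the source file is saved; the variable names are illustrative.

```php
<?php
// "naïve" with the ï written as its two UTF-8 bytes.
$word = "na\xC3\xAFve";

// Without u, the dot matches one byte at a time.
$byteCount = preg_match_all('/./', $word, $bytes);

// With u, the dot matches one UTF-8 character at a time.
$charCount = preg_match_all('/./u', $word, $chars);

// An invalid UTF-8 subject makes a u-mode match fail with an error code.
$invalid = preg_match('/./u', "\xFF");
$isUtf8Error = (preg_last_error() === PREG_BAD_UTF8_ERROR);

var_dump($byteCount, $charCount, $invalid, $isUtf8Error); // int(6), int(5), bool(false), bool(true)
```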
