Parsing Regular Expressions in PHP: Techniques for Pattern Matching

Source: Internet
Author: User
There are two ways to handle regular expressions in PHP: the PCRE approach and the POSIX approach. The PCRE library is a set of functions that implement regular expression pattern matching using the same syntax and semantics as Perl 5, with a few differences (see below). The current implementation corresponds to Perl 5.005.

Functions in the PCRE library use a pattern syntax that is very similar to Perl's. An expression must be enclosed in delimiters, such as the forward slash (/). A delimiter can be any character that is not alphanumeric, not whitespace, and not a backslash (\). If the delimiter character appears inside the expression itself, it must be escaped with a backslash. Since PHP 4.0.4, the Perl-style matching delimiters (), {}, [] and <> can also be used. For a more detailed explanation, see the pattern syntax.
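
The delimiter rules above can be illustrated with a short sketch. The URL and the test strings are made-up sample data; the functions used (preg_match and preg_quote) are standard PCRE functions in PHP.

```php
<?php
// Made-up sample input for illustration only.
$url = 'https://example.com/docs';

// With "/" as the delimiter, every literal slash must be escaped.
$slashDelim = preg_match('/^https?:\/\//', $url);

// With "#" as the delimiter, slashes need no escaping.
$hashDelim = preg_match('#^https?://#', $url);

// Perl-style bracket delimiters: an opening "{" is closed by "}".
$braceDelim = preg_match('{^https?://}', $url);

// preg_quote() escapes regex metacharacters and, when given a
// second argument, that delimiter character as well.
$needle = preg_quote('a/b.c', '/');
$quoted = preg_match('/' . $needle . '/', 'path a/b.c here');

var_dump($slashDelim, $hashDelim, $braceDelim, $needle, $quoted);
```

Picking a delimiter that does not occur in the pattern usually keeps the expression easier to read than escaping every occurrence.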

The closing delimiter can be followed by pattern modifiers that affect how matching is performed. See pattern modifiers.
Pattern Modifiers for PCRE
i (PCRE_CASELESS)
If this modifier is set, letters in the pattern match both uppercase and lowercase letters.
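
A minimal sketch of the effect, using a made-up subject string:

```php
<?php
$subject = 'I like PHP';

// Without the modifier, matching is case-sensitive.
$caseSensitive = preg_match('/php/', $subject);

// With the i modifier, "php" also matches "PHP".
$caseless = preg_match('/php/i', $subject);

var_dump($caseSensitive, $caseless); // int(0), int(1)
```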
s (PCRE_DOTALL)
If this modifier is set, the dot metacharacter (.) in the pattern matches all characters, including newlines. Without it, newlines are excluded. This is equivalent to Perl's /s modifier. A negated character class such as [^a] always matches a newline character, regardless of whether this modifier is set.
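
The following sketch shows the dot with and without the modifier, plus the negated-class case; the subject string is sample data.

```php
<?php
$text = "first\nsecond";

// By default the dot does not match the newline between the words.
$withoutS = preg_match('/first.second/', $text);

// With the s modifier the dot matches the newline too.
$withS = preg_match('/first.second/s', $text);

// A negated class matches the newline even without the modifier.
$negatedClass = preg_match('/first[^a]second/', $text);

var_dump($withoutS, $withS, $negatedClass); // int(0), int(1), int(1)
```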
m (PCRE_MULTILINE)
By default, PCRE treats the subject string as a single "line" of characters (even if it contains newline characters). The "start of line" metacharacter (^) matches only at the start of the string, and the "end of line" metacharacter ($) matches only at the end of the string, or before a terminating newline (unless the D modifier is set). This is the same as Perl. When this modifier is set, the "start of line" and "end of line" constructs also match immediately after and immediately before any newline in the subject string, respectively, as well as at the very start and end. This is equivalent to Perl's /m modifier. If there are no "\n" characters in the subject string, or no occurrences of ^ or $ in the pattern, setting this modifier has no effect.
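
As a small illustration, the same anchored pattern is applied to a three-line sample string with and without the modifier:

```php
<?php
$text = "alpha\nbeta\ngamma";

// Without m, ^ and $ refer to the whole string, which contains
// newlines that \w cannot match, so nothing matches.
$singleLineCount = preg_match_all('/^\w+$/', $text, $single);

// With m, ^ and $ also match around each embedded newline,
// so every line is matched on its own.
$multiLineCount = preg_match_all('/^\w+$/m', $text, $multi);

var_dump($singleLineCount, $multiLineCount, $multi[0]);
```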
x (PCRE_EXTENDED)
If this modifier is set, whitespace data characters in the pattern are ignored unless they are escaped or inside a character class, and characters between an unescaped # outside a character class and the next newline character (inclusive) are ignored as well. This is equivalent to Perl's /x modifier and makes it possible to include comments inside complicated patterns. Note: this applies to data characters only. Whitespace may never appear inside the special character sequences of a pattern, for example inside the sequence (?( that introduces a conditional subpattern; breaking up such a sequence with whitespace causes a compilation error.
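
A sketch of a commented pattern, assuming a simple made-up "year-month" input format:

```php
<?php
// Whitespace and "# ..." comments are ignored under the x modifier.
$pattern = '/
    ^ (\d{4})   # four-digit year
    -           # literal separator
    (\d{2}) $   # two-digit month
/x';

$matched = preg_match($pattern, '2024-05', $parts);

// An unescaped space in the pattern is ignored, so this matches "ab".
$spaceIgnored = preg_match('/^a b$/x', 'ab');

// An escaped space is kept as a literal space.
$spaceEscaped = preg_match('/^a\ b$/x', 'a b');

var_dump($matched, $parts, $spaceIgnored, $spaceEscaped);
```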
e (PREG_REPLACE_EVAL)
If this modifier is set, preg_replace() first substitutes the back references in the replacement string, then evaluates the resulting string as PHP code (as eval() does) and uses the result of that evaluation as the actual replacement. Single quotes, double quotes, backslashes (\) and NULL characters are escaped with backslashes in the substituted back references.
Only preg_replace() uses this modifier; other PCRE functions ignore it.
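
Note that the e modifier was deprecated in PHP 5.5.0 and removed in PHP 7.0.0, so the sketch below does not use it. It shows the same kind of computed replacement with preg_replace_callback(), which is a different technique from the one described above; the input string and the doubling logic are made-up examples.

```php
<?php
// Double every number in the subject. With the old e modifier this
// would have been written as PHP code inside the replacement string;
// here a callback receives the match array instead.
$result = preg_replace_callback(
    '/\d+/',
    function (array $m): string {
        return (string) ((int) $m[0] * 2);
    },
    'a1 b20 c300'
);

var_dump($result); // string(11) "a2 b40 c600"
```

Because the callback receives the matched text as data rather than as code to be evaluated, this form avoids the code-injection risk that eval-style replacement carries.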
A (PCRE_ANCHORED)
If this modifier is set, the pattern is forced to be "anchored", that is, it is constrained to match only at the start of the subject string. The same effect can be achieved with appropriate constructs in the pattern itself, which is the only way to do it in Perl.
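
A brief sketch comparing the modifier with an explicit ^ in the pattern, on made-up subject strings:

```php
<?php
// Unanchored: the digits are found in the middle of the string.
$unanchored = preg_match('/\d+/', 'abc123');

// Anchored with A: the match must begin at the start of the subject.
$anchoredMiss = preg_match('/\d+/A', 'abc123');
$anchoredHit  = preg_match('/\d+/A', '123abc');

// The equivalent written inside the pattern itself.
$caretMiss = preg_match('/^\d+/', 'abc123');

var_dump($unanchored, $anchoredMiss, $anchoredHit, $caretMiss);
```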
D (PCRE_DOLLAR_ENDONLY)
If this modifier is set, a dollar metacharacter in the pattern matches only at the very end of the subject string. Without it, a dollar also matches immediately before the final character if that character is a newline (but not before any other newline). This modifier is ignored if the m modifier is set. There is no equivalent to this modifier in Perl.
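
This difference matters when validating input, as the following sketch with a trailing newline suggests (the inputs are sample data):

```php
<?php
// By default, $ also matches just before a trailing newline,
// so "123\n" passes a digits-only check.
$defaultDollar = preg_match('/^\d+$/', "123\n");

// With D, $ matches only at the very end of the subject.
$endOnly = preg_match('/^\d+$/D', "123\n");

// A string without the trailing newline still matches.
$cleanInput = preg_match('/^\d+$/D', '123');

var_dump($defaultDollar, $endOnly, $cleanInput); // int(1), int(0), int(1)
```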
S
When a pattern is going to be used several times, it is worth spending more time analyzing it in order to speed up matching. If this modifier is set, this extra analysis is performed. At present, studying a pattern is useful only for non-anchored patterns that do not have a single fixed starting character.
U (PCRE_UNGREEDY)
This modifier inverts the "greediness" of quantifiers: they are not greedy by default, and become greedy when followed by ?. It is not compatible with Perl. It can also be set with the (?U) option inside the pattern, and an individual quantifier can be made non-greedy by placing a question mark after it (e.g. .*?). Note: in ungreedy mode it is usually not possible to match more than pcre.backtrack_limit characters.
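
The inversion can be seen in a sketch like the following, where the sample markup is made up for illustration:

```php
<?php
$html = '<b>one</b><b>two</b>';

// Default: .* is greedy and runs to the last closing tag.
preg_match('/<b>.*<\/b>/', $html, $greedy);

// With U, .* is non-greedy and stops at the first closing tag.
preg_match('/<b>.*<\/b>/U', $html, $ungreedy);

// Without U, a trailing ? makes a single quantifier non-greedy.
preg_match('/<b>.*?<\/b>/', $html, $lazy);

// With U, the trailing ? flips the quantifier back to greedy.
preg_match('/<b>.*?<\/b>/U', $html, $flipped);

var_dump($greedy[0], $ungreedy[0], $lazy[0], $flipped[0]);
```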
X (PCRE_EXTRA)
This modifier turns on additional functionality of PCRE that is incompatible with Perl. Any backslash in the pattern that is followed by a letter with no special meaning causes an error, which reserves these combinations for future expansion. By default, as in Perl, a backslash followed by a letter with no special meaning is treated as that literal letter. No other features are currently controlled by this modifier.
J (PCRE_INFO_JCHANGED)
The internal option setting (?J) changes the local PCRE_DUPNAMES option, which allows subpattern names to be duplicated. (In the PHP versions described here, this works only through the internal option setting; an external /J modifier produces an error.)
u (PCRE_UTF8)
This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern strings are treated as UTF-8. This modifier is available from PHP 4.1.0 on Unix and from PHP 4.2.3 on Win32. The UTF-8 validity of the pattern is checked since PHP 4.3.5.
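
A minimal sketch of the difference, using the two-byte UTF-8 encoding of "é" written as explicit bytes so that the example does not depend on the encoding of the source file:

```php
<?php
// "é" encoded in UTF-8 is the two bytes C3 A9.
$eAcute = "\xC3\xA9";

// Without u, the dot matches a single byte, so a two-byte
// string does not match "exactly one character".
$byteMode = preg_match('/^.$/', $eAcute);

// With u, the dot matches one whole UTF-8 character.
$utf8Mode = preg_match('/^.$/u', $eAcute);

var_dump($byteMode, $utf8Mode); // int(0), int(1)
```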
