PHP tutorial: sub-patterns in regular expressions

Source: Internet
Author: User
Tags: posix
This article describes sub-patterns in PHP regular expressions in detail, as a reference for anyone who needs to understand them.

mixed preg_replace(mixed pattern, mixed replacement, mixed subject [, int limit])

Searches subject for matches of pattern and replaces them with replacement. If limit is specified, at most limit matches are replaced; if limit is omitted or is -1, all matches are replaced.
The replacement can contain back-references of the form \n or $n, where n ranges from 0 to 99. \n stands for the text matched by the nth sub-pattern, and \0 stands for the text matched by the entire pattern.

A sub-pattern is a part of the regular expression in the $pattern argument that is enclosed in parentheses. Sub-patterns are numbered by counting their parentheses from left to right.
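As a minimal sketch of the behaviour described above, the following uses a made-up "key=value" subject (the strings and variable names are illustrative assumptions, not taken from the article) to show $n back-references, $0 for the whole match, and the effect of limit:

```php
<?php
// Illustrative subject: three "key=value" pairs.
$subject = "a=1, b=2, c=3";
$pattern = '/(\w+)=(\d+)/';

// $1 and $2 refer to the first and second sub-patterns.
$all = preg_replace($pattern, '$2:$1', $subject);

// With limit = 1, only the first match is replaced.
$first = preg_replace($pattern, '$2:$1', $subject, 1);

// $0 stands for the text matched by the entire pattern.
$whole = preg_replace($pattern, '[$0]', $subject, 2);

echo $all, "\n";   // 1:a, 2:b, 3:c
echo $first, "\n"; // 1:a, b=2, c=3
echo $whole, "\n"; // [a=1], [b=2], c=3
```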

First, let's look at some PHP code:

$time = date("Y-m-d H:i:s");
$pattern = "/\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/i";
if (preg_match($pattern, $time, $arr)) {
    echo "<pre>";
    print_r($arr);
    echo "</pre>";
}

Result:

[0] => 2012-06-23 03:08:45
Notice that the result contains only one item: the time string that matches the pattern. If there is only one record, why store it in an array? Wouldn't a plain string be simpler?

With this question in mind, let's look at sub-patterns in regular expressions.

In a regular expression, you can enclose part of the pattern in "(" and ")" to form a sub-pattern. A sub-pattern is treated as a whole and behaves like a single character: for example, a quantifier placed after it applies to the entire group.

For example, let's change the code above slightly to the following:

$time = date("Y-m-d H:i:s");
$pattern = "/(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})/i";
if (preg_match($pattern, $time, $arr)) {
    echo "<pre>";
    print_r($arr);
    echo "</pre>";
}

Note: only $pattern was modified; parentheses () were added to the matching pattern.

Execution result:

[0] => 2012-06-23 03:19:23
[1] => 2012
[2] => 06
[3] => 23
[4] => 03
[5] => 19
[6] => 23
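Two properties of groups are easy to check in isolation: a quantifier after a group applies to the whole group, and group numbers follow the order of the opening parentheses, which matters once groups are nested. The sketch below uses a fixed date string and an "ababab" subject chosen purely for illustration:

```php
<?php
$subject = "2012-06-23";

// Group 1 opens first (the whole date); groups 2, 3 and 4 open inside it.
preg_match('/((\d{4})-(\d{2})-(\d{2}))/', $subject, $nested);
print_r($nested);
// [0] => 2012-06-23, [1] => 2012-06-23, [2] => 2012, [3] => 06, [4] => 23

// Without parentheses, + applies only to the preceding character "b".
preg_match('/ab+/', 'ababab', $plain);

// With parentheses, + applies to the whole sub-pattern "ab".
preg_match('/(ab)+/', 'ababab', $grouped);

echo $plain[0], "\n";   // ab
echo $grouped[0], "\n"; // ababab
```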
Summary: parentheses group parts of the matching pattern. By default, each group is automatically assigned a number: counting from left to right by opening parenthesis, the first group is group 1, the second is group 2, and so on. Group 0 corresponds to the entire regular expression. Once the pattern is grouped, you can use back-references to refer to the text matched by an earlier group: \1 stands for the text matched by group 1, \2 for the text matched by group 2, and so on. We can modify the code further as follows:

$time = date("Y-m-d H:i:s");
$pattern = "/(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})/i";
$replacement = "\$time format: $0<br>The format after substitution is: \\1 year \\2 month \\3 day \\4 o'clock \\5 minutes \\6 seconds<br>";
print preg_replace($pattern, $replacement, $time);
if (preg_match($pattern, $time, $arr)) {
    echo "<pre>";
    print_r($arr);
    echo "</pre>";
}


Because the replacement string is in double quotes, a back-reference must be written with two backslashes, such as \\1; in single quotes, one backslash is enough, such as \1.
Here \\1 yields the content captured by group 1 (2012), and \\6 yields the content captured by group 6.
Execution Result:

$time format: 2012-06-23 03:30:31
The format after substitution is: 2012 year 06 month 23 day 03 o'clock 30 minutes 31 seconds
[0] => 2012-06-23 03:30:31
[1] => 2012
[2] => 06
[3] => 23
[4] => 03
[5] => 30
[6] => 31
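The quoting rule for back-references can be verified directly. The following sketch uses a fixed date string (an assumption made so the output is reproducible) and compares single quotes, double quotes with doubled backslashes, and double quotes with a single backslash:

```php
<?php
$subject = "2012-06-23";
$pattern = '/(\d{4})-(\d{2})-(\d{2})/';

// Single quotes: one backslash reaches preg_replace unchanged.
$single = preg_replace($pattern, '\3/\2/\1', $subject);

// Double quotes: the backslash must be doubled to survive string parsing.
$double = preg_replace($pattern, "\\3/\\2/\\1", $subject);

// Double quotes with a single backslash: PHP reads "\1" as an octal
// escape (the character with code 1), so no back-reference remains.
$broken = preg_replace($pattern, "\1", $subject);

echo $single, "\n";           // 23/06/2012
echo $double, "\n";           // 23/06/2012
var_dump($broken === chr(1)); // bool(true)
```

The $n form does not have this problem, because PHP does not treat "$1" as a variable inside double quotes.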

Advanced regular expressions

In addition to POSIX BRE and ERE, Libutilitis supports an advanced regular expression syntax (ARE) compatible with Tcl 8.2. Adding the prefix "***:" to the Stregex parameter turns on ARE mode, which overrides the bextended option. Basically, ARE is a superset of ERE. It adds the following extensions on top of ERE:

1. Support for "lazy matching" (also known as "non-greedy matching" or "shortest matching"): appending '?' after '?', '*', '+' or '{m,n}' enables the shortest match, so that the clause matches as few characters as possible while still satisfying the condition (the default is to match as many characters as possible). For example, "a.*b" applied to "abab" matches the entire string ("abab"), whereas "a.*?b" matches only the first two characters ("ab").

2. Support for back-references to sub-expressions: in Stregex, you can use '\n' to refer back to a previously defined sub-expression. For example, "(a.*)\1" matches "abcabc" and similar strings.

3. Unnamed sub-expressions: "(?:expression)" creates an unnamed sub-expression, which is not assigned a number and so cannot be referred to with '\n'.

4. Lookahead: for a match to succeed, the text ahead must satisfy a specified condition. Lookahead comes in two kinds, positive and negative. The positive syntax is "(?=expression)"; for example, "bai.*(?=yang)" matches the first four characters ("bai ") of "bai yang", while guaranteeing that "yang" follows the text matched by "bai.*". The negative syntax is "(?!expression)"; for example, "bai.*(?!yang)" matches "bai shan", while guaranteeing that "yang" does not appear immediately after the text matched by "bai.*".

5. Support for mode-switch prefixes: "***:" can be followed immediately by a mode string of the form "(?mode string)", which affects the semantics and behavior of the expression that follows. The mode string can be a combination of the following characters:

b - Switch to POSIX BRE mode, overriding the bextended option.
e - Switch to POSIX ERE mode, overriding the bextended option.
q - Switch to literal text matching mode: every character in the expression is searched for as plain text, and all regular expression semantics are cancelled. This mode degrades regular expression matching to a simple string search. The "***=" prefix is its shorthand: "***=" is equivalent to "***:(?q)".

c - Perform case-sensitive matching, overriding the bnocase option.
i - Perform case-insensitive matching, overriding the bnocase option.

n - Turn on line-sensitive matching: '^' and '$' match the beginning and end of each line; '.' and negated sets ('[^...]') do not match line breaks. This is equivalent to the 'pw' mode string. Overrides the bnewline option.
m - Equivalent to 'n'.
p - '^' and '$' match only the beginning and end of the entire string, not of lines; '.' and negated sets do not match line breaks. Overrides the bnewline option.
w - '^' and '$' match the beginning and end of each line; '.' and negated sets match line breaks. Overrides the bnewline option.
s - '^' and '$' match only the beginning and end of the entire string, not of lines; '.' and negated sets match line breaks. Overrides the bnewline option. This mode is the default in ARE mode.

x - Turn on extended mode: in extended mode, whitespace and '#' comments in the expression are ignored. For example:
@code @
(?x)
\s+ ([[:graph:]]+) # First number
\s+ ([[:graph:]]+) # Second number
@code @
is equivalent to "\s+([[:graph:]]+)\s+([[:graph:]]+)".
t - Turn off extended mode: whitespace and comments are not ignored. This mode is the default in ARE mode.

6. Perl-style character-class escape sequences, which differ from BRE/ERE mode:

Perl class   Equivalent POSIX expression   Description
\a           -                             Bell character
\A           -                             Matches only at the beginning of the entire string, regardless of the current mode
\b           -                             Backspace character ('\x08')
\B           -                             The escape character itself ('\')
\cX          -                             Control-X (= X & 037)
\d           [[:digit:]]                   Decimal digit ('0'-'9')
\D           [^[:digit:]]                  Non-digit
\e           -                             Escape character ('\x1b')
\f           -                             Form feed ('\x0c')
\m           [[:<:]]                       Beginning of a word
\M           [[:>:]]                       End of a word
\n           -                             Newline character ('\x0a')
\r           -                             Carriage return character ('\x0d')
\s           [[:space:]]                   Whitespace character
\S           [^[:space:]]                  Non-whitespace character
\t           -                             Tab ('\x09')
\uX          -                             16-bit Unicode character (X ∈ [0000..FFFF])
\UX          -                             32-bit Unicode character (X ∈ [00000000..FFFFFFFF])
\v           -                             Vertical tab ('\x0b')
\w           [[:alnum:]_]                  Character that can be part of a word
\W           [^[:alnum:]_]                 Non-word character
\xX          -                             8-bit character (X ∈ [00..FF])
\y           -                             Word boundary (\m or \M)
\Y           -                             Non-word boundary
\Z           -                             Matches only at the end of the entire string, regardless of the current mode
\0           -                             NUL, the null character
\X           -                             Back-reference to a sub-expression (X ∈ [1..9])
\XX          -                             Back-reference to a sub-expression, or an 8-bit character in octal notation
\XXX         -                             Back-reference to a sub-expression, or an 8-bit character in octal notation
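Several of the extensions listed above (lazy matching, back-references, unnamed groups, lookahead and the x mode) also exist in the PCRE syntax used by PHP's preg functions, so they can be tried from PHP. The sketch below runs against PHP's preg_match, not Libutilitis, and the two engines may differ in details; the "date: 2012" and " 12 34" subjects and all variable names are illustrative assumptions:

```php
<?php
// 1. Lazy matching: .*? stops at the first "b" instead of the last.
preg_match('/a.*b/', 'abab', $greedy);
preg_match('/a.*?b/', 'abab', $lazy);
echo $greedy[0], "\n"; // abab
echo $lazy[0], "\n";   // ab

// 2. Back-reference: \1 must repeat the text captured by group 1.
preg_match('/(a.*)\1/', 'abcabc', $backref);
echo $backref[0], "\n"; // abcabc
echo $backref[1], "\n"; // abc

// 3. Unnamed group: (?:...) groups without taking a number,
//    so the year is still group 1.
preg_match('/(?:date|day): (\d{4})/', 'date: 2012', $nocap);
echo count($nocap), "\n"; // 2
echo $nocap[1], "\n";     // 2012

// 4. Positive lookahead: "yang" must follow, but is not part of the match.
preg_match('/bai.*(?=yang)/', 'bai yang', $ahead);
var_dump($ahead[0] === 'bai '); // bool(true)

//    Negative lookahead: the greedy .* runs to the end of the string,
//    where nothing (and so no "yang") follows.
preg_match('/bai.*(?!yang)/', 'bai shan', $notAhead);
echo $notAhead[0], "\n"; // bai shan

// 5. Extended mode: whitespace and # comments in the pattern are ignored.
$pattern = '/(?x)
    \s+ ([[:graph:]]+)  # first number
    \s+ ([[:graph:]]+)  # second number
/';
preg_match($pattern, ' 12 34', $nums);
echo $nums[1], " ", $nums[2], "\n"; // 12 34
```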
