PHP5 regular expression

Source: Internet
Author: User

Introduction

A regular expression (abbreviated as regexp, regex, or regxp) is a single string used to describe or match a set of strings that conform to a certain syntax rule. Many text editors and other tools use regular expressions to search for and/or replace text that matches a pattern. Many programming languages support string operations based on regular expressions; Perl, for example, has a powerful regular expression engine built in. The concept was first popularized by Unix tools such as sed and grep. (From Wikipedia)

PHP supports two sets of regular expression rules. One is the POSIX Extended 1003.2 regular expression syntax defined by the Institute of Electrical and Electronics Engineers (IEEE); in practice, PHP does not fully support this standard. The other is the Perl-compatible syntax provided by the PCRE (Perl Compatible Regular Expressions) library, open source software written by Philip Hazel.

Functions using POSIX-compatible rules include:
ereg_replace()
ereg()
eregi()
eregi_replace()
split()
spliti()
sql_regcase()
mb_ereg_match()
mb_ereg_replace()
mb_ereg_search_getpos()
mb_ereg_search_getregs()
mb_ereg_search_init()
mb_ereg_search_pos()
mb_ereg_search_regs()
mb_ereg_search_setpos()
mb_ereg_search()
mb_ereg()
mb_eregi_replace()
mb_eregi()
mb_regex_encoding()
mb_regex_set_options()
mb_split()

Functions using Perl-compatible rules include:
preg_grep()
preg_replace_callback()
preg_match_all()
preg_match()
preg_quote()
preg_split()
preg_replace()
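
As a rough illustration of how the Perl-compatible functions listed above differ from one another, the sketch below calls each of them once. The sample log line and all variable names are made up for this example; they do not come from the original article.

```php
<?php
// Illustrative input only; this log line is made up for the sketch.
$line = "2024-01-15 ERROR disk full; 2024-01-16 WARN low memory";

// preg_match(): stops at the first match; returns 1 on a match, 0 otherwise.
$found = preg_match('/(\d{4})-(\d{2})-(\d{2})/', $line, $first);

// preg_match_all(): collects every match and returns how many were found.
$count = preg_match_all('/\d{4}-\d{2}-\d{2}/', $line, $all);

// preg_split(): splits on a pattern rather than on a fixed string.
$parts = preg_split('/;\s*/', $line);

// preg_grep(): keeps only the array elements that match.
$errors = preg_grep('/ERROR/', $parts);

// preg_quote(): escapes metacharacters so the text is matched literally.
$literal = preg_quote('1+1=2');

// preg_replace(): replaces every match with a fixed replacement string.
$masked = preg_replace('/\d{4}-\d{2}-\d{2}/', 'DATE', $line);

// preg_replace_callback(): computes each replacement in PHP code.
$tagged = preg_replace_callback('/\b(ERROR|WARN)\b/', function ($m) {
    return '[' . strtolower($m[1]) . ']';
}, $line);
```

Here $first[1] to $first[3] hold the year, month, and day of the first date, while $all[0] holds both dates.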

Delimiters:

POSIX-compatible regular expressions do not have delimiters; the whole corresponding function argument is treated as the regular expression.

Perl-compatible regular expressions can use any character that is not a letter, a digit, a backslash (\), or whitespace as the delimiter. If the delimiter character must appear in the expression itself, escape it with a backslash. The bracket pairs (), {}, [], and <> can also be used as delimiters.
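
The delimiter rules can be sketched with preg_match() as follows. The URL is the one used later in the article; the variable names are illustrative.

```php
<?php
$url = "http://www.163.com/";

// With "/" as the delimiter, every "/" inside the pattern must be escaped.
$slash = preg_match('/^http:\/\/www\./', $url);

// Choosing "#" as the delimiter avoids that escaping entirely.
$hash = preg_match('#^http://www\.#', $url);

// Bracket-style delimiters are written as a matching pair.
$brace = preg_match('{^http://www\.}', $url);

// Modifiers are written after the closing delimiter.
$caseless = preg_match('#^HTTP://WWW\.#i', $url);
```

All four calls return 1 for this input, so the choice of delimiter is mostly a matter of which characters the pattern itself needs.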

Modifiers:

POSIX-compatible regular expressions do not have modifiers.

Modifiers available in Perl-compatible regular expressions (spaces and line feeds in the modifier string are ignored; other characters cause an error):

i (PCRE_CASELESS):
Matching is case insensitive.

m (PCRE_MULTILINE):
By default, the line start (^) and line end ($) match only at the beginning and end of the whole string. When this modifier is set, they also match immediately after and immediately before each line break (\n), respectively.

s (PCRE_DOTALL):
If this modifier is set, the dot metacharacter (.) in the pattern matches all characters, including line breaks. Without it, line breaks are excluded.

x (PCRE_EXTENDED):
If this modifier is set, whitespace characters in the pattern are ignored unless they are escaped or inside a character class.

e:
If this modifier is set, preg_replace() performs the normal substitution of backreferences in the replacement string, evaluates the result as PHP code, and uses the return value to replace the matched text. Only preg_replace() uses this modifier; other PCRE functions ignore it. It was deprecated in PHP 5.5.0 and removed in PHP 7.0.0; preg_replace_callback() is the replacement.

A (PCRE_ANCHORED):
If this modifier is set, the pattern is forced to be "anchored", that is, it can match only at the beginning of the target string.

D (PCRE_DOLLAR_ENDONLY):
If this modifier is set, the line end ($) in the pattern matches only at the very end of the target string. Without it, $ also matches just before a final line break. This modifier is ignored if the m modifier is set.

S:
When a pattern is used several times, it is worth analyzing it further to speed up matching. If this modifier is set, that additional analysis is performed. Currently, analyzing a pattern is only useful for non-anchored patterns that do not have a single fixed starting character.

U (PCRE_UNGREEDY):
Inverts the greediness of quantifiers: they are not greedy by default and become greedy when followed by "?".

X (PCRE_EXTRA):
Any backslash followed by a letter that has no special meaning in the pattern causes an error, so that such combinations are reserved for future expansion. By default, a backslash followed by a letter without special meaning is treated as the letter itself.

u (PCRE_UTF8):
The pattern and subject strings are treated as UTF-8.
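
A minimal sketch of how some of these modifiers change the result of a match is shown below. The sample strings and variable names are assumptions chosen for illustration only.

```php
<?php
$text = "First line\nsecond LINE\n";

// i: case-insensitive matching.
$i = preg_match('/line$/i', "second LINE");

// m: ^ matches at the start of every line, not only at the start of the string.
$withM = preg_match_all('/^\w+/m', $text, $lineStarts);
$withoutM = preg_match_all('/^\w+/', $text, $ignored);

// s: the dot also matches line breaks.
$withS = preg_match('/First.*LINE/s', $text);
$withoutS = preg_match('/First.*LINE/', $text);

// x: unescaped whitespace in the pattern is ignored.
$x = preg_match('/ (\d+) \s* - \s* (\d+) /x', "10 - 20", $range);

// A: the match must start at the beginning of the subject.
$withA = preg_match('/\d+/A', "abc123");

// D: $ no longer matches before a trailing line break.
$withoutD = preg_match('/end$/', "end\n");
$withD = preg_match('/end$/D', "end\n");

// U: quantifiers become lazy by default.
preg_match('/<.+>/', '<b>bold</b>', $greedy);
preg_match('/<.+>/U', '<b>bold</b>', $lazy);

// u: the subject is read as UTF-8, so a two-byte character counts as one.
$eAcute = "\xC3\xA9"; // UTF-8 encoding of "é"
$withU = preg_match('/^.$/u', $eAcute);
$withoutU = preg_match('/^.$/', $eAcute);
```

With U set, $lazy[0] is "<b>", whereas $greedy[0] is the whole "<b>bold</b>" string.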

Logical separators:

The logical separator symbols work in exactly the same way in POSIX-compatible and Perl-compatible regular expressions:
[]: Encloses a set of characters, any one of which may match.
{}: Encloses the number of times to match.
(): Encloses a logical group, which can also be used for reference operations.
|: Means "or"; [ab] and a|b are equivalent.
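
The four separators can be tried out with a few small patterns. This is only a sketch; the sample strings and variable names are illustrative, and it uses the Perl-compatible functions only.

```php
<?php
// []: exactly one character out of the listed set.
$set = preg_match('/^gr[ae]y$/', 'grey');

// |: alternation; for single characters, (a|b) and [ab] accept the same input.
$alternation = preg_match('/^(a|b)$/', 'b');
$class = preg_match('/^[ab]$/', 'b');

// {}: how many times the preceding element must occur.
$repeat = preg_match('/^\d{3}$/', '163');

// (): a group whose matched text can be read back afterwards.
preg_match('/^(\w+)@(\w+)\.com$/', 'user@example.com', $m);
```

After the last call, $m[1] is "user" and $m[2] is "example".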

Metacharacters:

There are two groups of metacharacters: those recognized anywhere in the pattern except within square brackets, and those recognized within square brackets.

Metacharacters outside square brackets that behave the same in POSIX-compatible and Perl-compatible regular expressions:
\  General escape character with several uses
^  Matches the start of the string
$  Matches the end of the string
?  Matches the preceding element 0 or 1 times
*  Matches the preceding element 0 or more times
+  Matches the preceding element 1 or more times
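
The metacharacters listed above can be checked one at a time, as in this sketch. The test strings are chosen for illustration, and the variable names are hypothetical.

```php
<?php
$url = 'http://www.163.com/';

// ^ and $: anchor the match to the start and the end of the string.
$start = preg_match('/^http/', $url);
$end = preg_match('/com\/$/', $url);

// ?: the preceding element is optional (0 or 1 times).
$http = preg_match('/^https?:/', 'http://example.com');
$https = preg_match('/^https?:/', 'https://example.com');

// *: 0 or more times, so "ac" is accepted.
$star = preg_match('/^ab*c$/', 'ac');

// +: 1 or more times, so "ac" is rejected.
$plus = preg_match('/^ab+c$/', 'ac');

// \: removes the special meaning of the following metacharacter.
$escaped = preg_match('/^1\+1$/', '1+1');
```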

Metacharacters outside square brackets that behave differently in POSIX-compatible and Perl-compatible regular expressions:
.  In Perl-compatible regular expressions, matches any character except a line break (unless the s modifier is set)
.  In POSIX-compatible regular expressions, matches any character

Metacharacters inside square brackets that behave the same in POSIX-compatible and Perl-compatible regular expressions:
\  General escape character with several uses
^  Negates the character class, but only when it is the first character
-  Specifies a range of characters by ASCII code. A careful look at the ASCII table shows that [W-c] matches W, X, Y, Z, [, \, ], ^, _, `, a, b, and c

Metacharacters inside square brackets that behave differently in POSIX-compatible and Perl-compatible regular expressions:
-  In POSIX-compatible regular expressions, [a-c-e] throws an error.
-  In Perl-compatible regular expressions, [a-c-e] is accepted: the hyphen that directly follows the range a-c is taken literally, so the class matches a, b, c, "-", or e.
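
The character-class rules can be checked with the Perl-compatible functions, as in the sketch below. The loop simply tests every ASCII character against [W-c]; all variable names are illustrative.

```php
<?php
// Collect every ASCII character accepted by the range [W-c].
$inRange = '';
for ($code = 0; $code < 128; $code++) {
    if (preg_match('/^[W-c]$/', chr($code)) === 1) {
        $inRange .= chr($code);
    }
}

// ^ as the first character negates the class.
$negated = preg_match('/^[^0-9]+$/', 'abc');

// ^ anywhere else in the class is an ordinary character.
$literalCaret = preg_match('/^[0-9^]+$/', '2^8');

// In PCRE, the second hyphen in [a-c-e] is a literal hyphen.
$matchesD = preg_match('/^[a-c-e]$/', 'd');
$matchesHyphen = preg_match('/^[a-c-e]$/', '-');
```

The range includes the punctuation characters that sit between the uppercase and lowercase letters in ASCII, which is usually not what a pattern author intends.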

Number of matches:

POSIX-compatible and Perl-compatible regular expressions express the number of matches in exactly the same way:
{2}: Matches the preceding element exactly twice
{2,}: Matches the preceding element two or more times. By default, matching is greedy (as many as possible).
{2,4}: Matches the preceding element between two and four times
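
A short sketch of the three quantifier forms, using preg_match() on strings of repeated "a" characters (the inputs and variable names are illustrative):

```php
<?php
// {2}: exactly two occurrences.
$two = preg_match('/^a{2}$/', 'aa');
$notThree = preg_match('/^a{2}$/', 'aaa');

// {2,}: two or more; greedy by default, so it takes all five characters.
preg_match('/a{2,}/', 'aaaaa', $greedy);

// Adding "?" makes the quantifier lazy, so it stops at the minimum of two.
preg_match('/a{2,}?/', 'aaaaa', $lazy);

// {2,4}: anywhere from two to four occurrences.
$three = preg_match('/^a{2,4}$/', 'aaa');
$five = preg_match('/^a{2,4}$/', 'aaaaa');
```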

Logical groups:

The region enclosed in () is a logical group. Its main purpose is to express the logical order of some characters; its other use is for references (the value matched by the group can be referred to later). The latter is particularly useful:

<?php
$str = "http://www.163.com/";
// POSIX-compatible regular expression (the ereg family is only available before PHP 7):
echo ereg_replace("(.+)", '<a href="\\1">\\1</a>', $str);
// Perl-compatible regular expression:
echo preg_replace("/(.+)/", '<a href="$1">$1</a>', $str);
// Displays two links
?>

Groups can be nested when they are referenced; they are numbered according to the order in which their opening "(" appears.
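
The numbering of nested groups can be seen in a small sketch like the following. The pattern is an illustrative one written for this example, not taken from the original article.

```php
<?php
$str = "http://www.163.com/";

// Groups are numbered by the position of their opening parenthesis:
//   1: (\w+)               the scheme
//   2: ((\w+)\.([\w.]+))   the whole host
//   3: (\w+)               the first host label, nested inside group 2
//   4: ([\w.]+)            the rest of the host, nested inside group 2
preg_match('#^(\w+)://((\w+)\.([\w.]+))/$#', $str, $m);

// In a replacement string, $1, $2, ... refer to the captured groups.
$swapped = preg_replace('#^(\w+)://(.+?)/$#', '$2 via $1', $str);

// Inside the pattern itself, \1 refers back to what group 1 matched.
preg_match('/(\w)\1/', 'hello', $doubled);
```

Here $m[2] is "www.163.com", $m[3] is "www", $m[4] is "163.com", and $doubled[0] is "ll".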

Character types:

POSIX-compatible regular expressions (the named classes are used inside a bracket expression, for example [[:upper:]]):
[:upper:]: Matches all uppercase letters
[:lower:]: Matches all lowercase letters
[:alpha:]: Matches all letters
[:alnum:]: Matches all letters and digits
[:digit:]: Matches all digits
[:xdigit:]: Matches all hexadecimal characters, equivalent to [0-9A-Fa-f]
[:punct:]: Matches all punctuation marks, such as . , " ' ? ! ; :
[:blank:]: Matches spaces and tabs, equivalent to [ \t]
[:space:]: Matches all whitespace characters, equivalent to [ \t\n\r\f\v]
[:cntrl:]: Matches all control characters (ASCII 0 to 31, and 127)
[:graph:]: Matches all printable characters except the space
[:print:]: Matches all printable characters, including the space
[.c.]: Collating symbol (the collating element c)
[=c=]: Equivalence class (all characters equivalent to c in the current locale)
[[:<:]]: Matches the start of a word
[[:>:]]: Matches the end of a word
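
The named classes are also accepted inside square brackets by the Perl-compatible functions, which makes them easy to try on current PHP versions. The sketch below assumes the default "C" locale; the sample strings and variable names are illustrative.

```php
<?php
// [:xdigit:]: hexadecimal characters only.
$hex = preg_match('/^[[:xdigit:]]+$/', 'DEADbeef');
$notHex = preg_match('/^[[:xdigit:]]+$/', 'xyz');

// [:punct:]: strip punctuation marks.
$clean = preg_replace('/[[:punct:]]+/', '', 'Hello, world!');

// [:space:]: split on any run of whitespace.
$words = preg_split('/[[:space:]]+/', "a \t b\nc");

// Named classes can be combined with other characters in one class.
$identifier = preg_match('/^[[:alpha:]_][[:alnum:]_]*$/', 'my_var2');
```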

Perl-compatible regular expressions (here you can see how powerful Perl regular expressions are):
\a  Alarm, that is, the BEL character (hex 07)
\cx  "Control-x", where x is any character
\e  Escape (hex 1B)
\f  Form feed (hex 0C)
\n  Newline (hex 0A)
\r  Carriage return (hex 0D)
\t  Tab (hex 09)
\xhh  The character with hexadecimal code hh
\ddd  The character with octal code ddd, or a backreference
\d  Any decimal digit
\D  Any character that is not a decimal digit
\s  Any whitespace character
\S  Any character that is not a whitespace character
\w  Any "word" character
\W  Any "non-word" character
\b  Word boundary
\B  Not a word boundary
\A  Start of the subject (independent of multiline mode)
\Z  End of the subject, or the line break at the end (independent of multiline mode)
\z  End of the subject (independent of multiline mode)
\G  First matching position in the subject