PHP regular expressions

Source: Internet
Author: User
Tags: printable characters
Every PHP programmer knows that PHP has powerful regular expression functions. For future convenience, I have collected information about regular expressions from the Internet and organized it here.


A regular expression describes a pattern for matching strings. It can be used to check whether a string contains a given substring, to replace matched substrings, or to extract substrings that meet certain conditions from a string.


For example, one of the most common uses of regular expressions is to verify that an email address entered by a user has the correct format. If the address matches the regular expression, the form data is processed normally. If it does not match, a prompt asks the user to enter a correct email address. Regular expressions therefore play an important role in the logic of web applications.
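The check described above can be sketched in PHP with preg_match. The function name isValidEmail and the sample address are assumptions made for illustration, and the pattern is a simple format check rather than a complete validator.

```php
<?php
// Hypothetical helper: returns true when $email looks like an email address.
// The pattern is a deliberately simple format check, not a full validator.
function isValidEmail(string $email): bool
{
    $pattern = '/^[_.0-9a-z-]+@([0-9a-z][0-9a-z-]+\.)+[a-z]{2,3}$/i';
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $email) === 1;
}

$input = 'user@example.com'; // assumed sample input
if (isValidEmail($input)) {
    echo "Address accepted\n";
} else {
    echo "Please re-enter a valid email address\n";
}
```

Comparing the result with 1 keeps a pattern error (false) from being mistaken for a non-matching address.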

Regular expression
A regular expression is a text pattern made up of ordinary characters (such as the letters a to z) and special characters (called metacharacters). It serves as a template that is matched against the string being searched.

You construct a regular expression by placing the components of the pattern between a pair of delimiters, that is, /expression/
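In PHP, the preg_* functions receive the pattern as a string that includes the delimiters. A minimal sketch, with sample strings chosen only for illustration:

```php
<?php
// The pattern string includes the delimiters; here both are "/".
$found = preg_match('/cat/', 'concatenate'); // "cat" occurs inside the subject

// Other punctuation characters may serve as delimiters too. Using "#"
// avoids having to escape the slashes of a path.
$isImagePath = preg_match('#^/images/#', '/images/logo.png');

echo $found, ' ', $isImagePath, "\n";
```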
Ordinary characters
Ordinary characters are all printable and non-printable characters that are not explicitly designated as metacharacters. This includes all uppercase and lowercase letters, all digits, all punctuation marks, and some symbols.
Non-printable characters
Character    Description
\cx    Matches the control character specified by x. For example, \cM matches Control-M, the carriage return character. The value of x must be in the range A-Z or a-z. Otherwise, c is treated as a literal 'c' character.
\f    Matches a form feed. Equivalent to \x0c and \cL.
\n    Matches a line feed. Equivalent to \x0a and \cJ.
\r    Matches a carriage return. Equivalent to \x0d and \cM.
\s    Matches any whitespace character, including spaces, tabs, and form feeds. Equivalent to [ \f\n\r\t\v].
\S    Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t    Matches a tab. Equivalent to \x09 and \cI.
\v    Matches a vertical tab. Equivalent to \x0b and \cK. (In PHP's PCRE engine, \v matches any vertical whitespace character, of which the vertical tab is one.)

Special characters
A special character is a character with a special meaning, such as the * in *.txt, where it stands for any string. To find a file whose name contains a literal *, you must escape the * by putting a \ before it: ls \*.txt. Regular expressions have the following special characters.
Special character    Description
$    Matches the end position of the input string. In multiline mode (the Multiline property of the RegExp object, or the m modifier in PHP), $ also matches the position before a line break. To match the $ character itself, use \$.
( )    Mark the start and end of a subexpression. The text matched by a subexpression can be captured for later use. To match these characters, use \( and \).
*    Matches the preceding subexpression zero or more times. To match the * character, use \*.
+    Matches the preceding subexpression one or more times. To match the + character, use \+.
.    Matches any single character except the line feed \n. To match a literal dot, use \.
[    Marks the start of a bracket expression. To match [, use \[.
?    Matches the preceding subexpression zero or one time, or marks a quantifier as non-greedy. To match the ? character, use \?.
\    Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character "n", while '\n' matches a line break. The sequence '\\' matches "\", and '\(' matches "(".
^    Matches the start position of the input string, unless it is used inside a bracket expression, where it negates the character set. To match the ^ character itself, use \^.
{    Marks the start of a quantifier expression. To match {, use \{.
|    Specifies a choice between two items. To match |, use \|.
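When the text to search for comes from a variable, escaping each special character by hand is error-prone. The sketch below uses PHP's preg_quote for this; the helper name containsLiteral is an assumption made for illustration.

```php
<?php
// Hypothetical helper: does $subject contain $literal as plain text?
function containsLiteral(string $literal, string $subject): bool
{
    // preg_quote() puts a backslash before every regex special character;
    // the second argument also escapes the delimiter used below.
    $pattern = '/' . preg_quote($literal, '/') . '/';
    return preg_match($pattern, $subject) === 1;
}

var_dump(containsLiteral('*.txt', 'ls *.txt'));  // a literal "*.txt" is present
var_dump(containsLiteral('*.txt', 'notes.txt')); // no literal "*" in the subject
```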

A regular expression is constructed in the same way as a mathematical expression: smaller expressions are combined with metacharacters and operators to build larger ones. A component of a regular expression can be a single character, a character set, a character range, a choice between characters, or any combination of these.
Quantifiers
A quantifier specifies how many times a given component of a regular expression must occur for the match to succeed. There are six quantifiers: *, +, ?, {n}, {n,}, and {n,m}.
The *, +, and ? quantifiers are greedy: they match as much text as possible. Appending a ? to a quantifier makes it non-greedy, so that it matches as little as possible.
Regular expressions have the following quantifiers:
Character    Description
*    Matches the preceding subexpression zero or more times. For example, zo* matches "z" and "zoo". * is equivalent to {0,}.
+    Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?    Matches the preceding subexpression zero or one time. For example, "do(es)?" matches "do" or the "do" in "does". ? is equivalent to {0,1}.
{n}    n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,}    n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m}    m and n are non-negative integers, where n <= m. Matches at least n times and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the two numbers.
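The difference between greedy, non-greedy, and bounded quantifiers is easy to observe with preg_match. This is a small sketch; the variable names are chosen only for illustration.

```php
<?php
// Greedy: o+ takes as many "o" characters as it can.
preg_match('/o+/', 'foooood', $greedy);
// Non-greedy: o+? stops after the minimum of one "o".
preg_match('/o+?/', 'foooood', $lazy);
// Bounded: o{1,3} takes at most three "o" characters.
preg_match('/o{1,3}/', 'fooooood', $bounded);
// o{2} needs two consecutive "o" characters, so "Bob" does not match.
$inBob = preg_match('/o{2}/', 'Bob');

echo $greedy[0], ' ', $lazy[0], ' ', $bounded[0], ' ', $inBob, "\n";
```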

Anchors
Anchors describe the boundary of a string or a word. ^ and $ refer to the start and end of a string respectively, \b describes the boundary before or after a word, and \B denotes a non-word boundary. Quantifiers cannot be applied to anchors.
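A short sketch of how the anchors behave in PHP. The sample strings and variable names are assumptions chosen for illustration.

```php
<?php
// ^ and $ pin the pattern to the start and end of the subject.
$allDigits   = preg_match('/^[0-9]+$/', '2024');  // the whole string is digits
$mixedString = preg_match('/^[0-9]+$/', '2024a'); // the trailing "a" prevents a match

// \b requires a word boundary after "er"; \B requires that there is none.
$endOfWord  = preg_match('/er\b/', 'never'); // "er" ends the word
$insideWord = preg_match('/er\B/', 'never'); // no "er" sits inside the word

echo $allDigits, $mixedString, $endOfWord, $insideWord, "\n";
```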
Alternation
Enclose all the alternatives in parentheses and separate adjacent alternatives with |. Parentheses have a side effect, however: the matched text is captured and stored. Placing ?: at the start of the group, before the first alternative, removes this side effect.
?: is one of the non-capturing elements. The other two are ?= and ?!, which carry additional meaning. The former is a positive lookahead: it matches at any position where the pattern in the parentheses begins to match. The latter is a negative lookahead: it matches at any position where the pattern in the parentheses does not match.
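The effect of capturing and non-capturing groups, and of the two lookaheads, can be seen in the match arrays that preg_match fills in. A minimal sketch with illustrative sample strings:

```php
<?php
// Capturing group: the chosen alternative is stored in $m[1].
preg_match('/industr(y|ies)/', 'industries', $m);
// Non-capturing group: same match, but only the full match $n[0] is stored.
preg_match('/industr(?:y|ies)/', 'industries', $n);

// Positive lookahead: "Windows" only when " 2000" or " NT" follows.
$positive = preg_match('/Windows(?= 2000| NT)/', 'Windows 2000');
// Negative lookahead: "Windows" only when neither of them follows.
$negative = preg_match('/Windows(?! 2000| NT)/', 'Windows 2000');

echo count($m), ' ', count($n), ' ', $positive, ' ', $negative, "\n";
```

The lookahead text is checked but not consumed, so it is not part of the full match in either case.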
Backreferences
Placing parentheses around a regular expression, or around part of one, causes the matched text to be stored in a temporary buffer. Each captured submatch is stored in the order in which the groups appear from left to right in the pattern. Buffer numbers start at 1 and run consecutively up to a maximum of 99 subexpressions. Each buffer can be accessed with '\n', where n is a one- or two-digit decimal number that identifies the buffer.
You can use the non-capturing metacharacters '?:', '?=', or '?!' to prevent the match from being saved.
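A common use of a backreference is finding a word that is immediately repeated. The sketch below assumes a sample sentence; in the replacement string, $1 refers to the first captured group.

```php
<?php
// \1 refers back to whatever the first parenthesised group captured,
// so this pattern finds a word that is immediately repeated.
$pattern  = '/\b([a-z]+) \1\b/i';
$sentence = 'Is is the cost of of gasoline going up up?';

// Count every doubled word in the sentence.
$count = preg_match_all($pattern, $sentence, $matches);

// Replace each doubled word with a single copy of the captured word.
$fixed = preg_replace($pattern, '$1', $sentence);

echo $count, "\n", $fixed, "\n";
```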
Operator precedence
Operators of equal priority are evaluated from left to right. Operators of different priority are evaluated from highest to lowest. The priorities, from highest to lowest, are as follows:
Operator    Description
\    Escape character
(), (?:), (?=), []    Parentheses and square brackets
*, +, ?, {n}, {n,}, {n,m}    Quantifiers
^, $, \anymetacharacter    Position and sequence
|    "Or" operation
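Two consequences of these priorities are worth seeing in practice: alternation binds more loosely than anchors and sequence, and a quantifier applies only to the item directly before it. A small sketch with illustrative sample strings:

```php
<?php
// Alternation has the lowest priority, so each anchor belongs to the
// nearest alternative only: this reads as (^cat) or (dog$).
$loose = preg_match('/^cat|dog$/', 'catalog');      // subject starts with "cat"

// Grouping applies both anchors to the whole choice.
$strict = preg_match('/^(?:cat|dog)$/', 'catalog'); // not exactly "cat" or "dog"

// A quantifier binds tighter than sequence: in "ab+" only "b" repeats.
preg_match('/ab+/', 'ababbb', $single);
// With a group, the whole sequence "ab" repeats.
preg_match('/(?:ab)+/', 'ababbb', $grouped);

echo $loose, ' ', $strict, ' ', $single[0], ' ', $grouped[0], "\n";
```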

Complete list of symbols
Character    Description
\    Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character "n", while '\n' matches a line break. The sequence '\\' matches "\", and '\(' matches "(".
^    Matches the start position of the input string. In multiline mode (the Multiline property of the RegExp object, or the m modifier in PHP), ^ also matches the position after a line break.
$    Matches the end position of the input string. In multiline mode (the Multiline property of the RegExp object, or the m modifier in PHP), $ also matches the position before a line break.
*    Matches the preceding subexpression zero or more times. For example, zo* matches "z" and "zoo". * is equivalent to {0,}.
+    Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?    Matches the preceding subexpression zero or one time. For example, "do(es)?" matches "do" or the "do" in "does". ? is equivalent to {0,1}.
{n}    n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,}    n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m}    m and n are non-negative integers, where n <= m. Matches at least n times and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the two numbers.
?    When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match is non-greedy. A non-greedy match consumes as little of the searched string as possible, while the default greedy match consumes as much as possible. For example, in the string "oooo", 'o+?' matches a single "o", while 'o+' matches all the o's.
.    Matches any single character except "\n". To match any character including '\n', use a pattern such as '[\s\S]' (inside square brackets a dot is only a literal dot), or in PHP add the s modifier to the pattern.
(pattern)    Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection: the SubMatches collection in VBScript, the $0…$9 properties in JScript, and the matches array filled in by preg_match in PHP. To match parenthesis characters, use '\(' or '\)'.
(?:pattern)    Matches pattern but does not capture the match; that is, it is a non-capturing match and is not stored for later use. This is useful when combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a more economical expression than 'industry|industries'.
(?=pattern)    Positive lookahead: matches the search string at any position where a string matching pattern begins. This is a non-capturing match; the match is not stored for later use. For example, 'Windows (?=95|98|NT|2000)' matches "Windows" in "Windows 2000", but not "Windows" in "Windows 3.1". A lookahead does not consume characters: after a match, the search for the next match starts immediately after the last match, not after the characters that made up the lookahead.
(?!pattern)    Negative lookahead: matches the search string at any position where a string not matching pattern begins. This is a non-capturing match; the match is not stored for later use. For example, 'Windows (?!95|98|NT|2000)' matches "Windows" in "Windows 3.1", but not "Windows" in "Windows 2000". A lookahead does not consume characters: after a match, the search for the next match starts immediately after the last match, not after the characters that made up the lookahead.
x|y    Matches x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food".
[xyz]    Character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in "plain".
[^xyz]    Negated character set. Matches any character not enclosed. For example, '[^abc]' matches the 'p' in "plain".
[a-z]    Character range. Matches any character in the specified range. For example, '[a-z]' matches any lowercase letter in the range 'a' to 'z'.
[^a-z]    Negated character range. Matches any character not in the specified range. For example, '[^a-z]' matches any character that is not in the range 'a' to 'z'.
\b    Matches a word boundary, that is, the position between a word and a space. For example, 'er\b' matches the 'er' in "never", but not the 'er' in "verb".
\B    Matches a non-word boundary. 'er\B' matches the 'er' in "verb", but not the 'er' in "never".
\cx    Matches the control character specified by x. For example, \cM matches Control-M, the carriage return character. The value of x must be in the range A-Z or a-z. Otherwise, c is treated as a literal 'c' character.
\d    Matches a digit character. Equivalent to [0-9].
\D    Matches a non-digit character. Equivalent to [^0-9].
\f    Matches a form feed. Equivalent to \x0c and \cL.
\n    Matches a line feed. Equivalent to \x0a and \cJ.
\r    Matches a carriage return. Equivalent to \x0d and \cM.
\s    Matches any whitespace character, including spaces, tabs, and form feeds. Equivalent to [ \f\n\r\t\v].
\S    Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t    Matches a tab. Equivalent to \x09 and \cI.
\v    Matches a vertical tab. Equivalent to \x0b and \cK. (In PHP's PCRE engine, \v matches any vertical whitespace character, of which the vertical tab is one.)
\w    Matches any word character, including the underscore. Equivalent to '[A-Za-z0-9_]'.
\W    Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.
\xn    Matches n, where n is a hexadecimal escape value. The hexadecimal escape value must be exactly two digits long. For example, '\x41' matches "A". '\x041' is equivalent to '\x04' followed by "1". This allows ASCII codes to be used in regular expressions.
\num    Matches num, where num is a positive integer. A reference back to a captured match. For example, '(.)\1' matches two consecutive identical characters.
\n    Identifies an octal escape value or a backreference. If at least n subexpressions were captured before \n, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.
\nm    Identifies an octal escape value or a backreference. If at least nm subexpressions were captured before \nm, nm is a backreference. If at least n subexpressions were captured before \nm, n is a backreference followed by the literal m. If neither condition holds and n and m are both octal digits (0-7), \nm matches the octal escape value nm.
\nml    If n is an octal digit (0-3) and m and l are both octal digits (0-7), matches the octal escape value nml.
\un    Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©). PHP's PCRE functions do not support \u; in PHP, write \x{00A9} and add the u modifier instead.
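Several entries from this list can be tried directly in PHP. The sketch below uses sample strings that are assumptions chosen for illustration.

```php
<?php
preg_match('/[abc]/', 'plain', $set);      // first character from the set
preg_match('/[^abc]/', 'plain', $negated); // first character outside the set
preg_match('/\d+/', 'Order 66 shipped', $digits); // a run of digit characters
preg_match('/\x41/', 'ABC', $hex);         // hexadecimal escape for "A"

// \s+ treats any run of spaces, tabs, or line breaks as one separator.
$words = preg_split('/\s+/', "one two\tthree\nfour");

// By default the dot does not match a line feed; the s modifier or a
// class such as [\s\S] makes the pattern cross line breaks.
$subject    = "a\nb";
$dotDefault = preg_match('/a.b/', $subject);
$dotAll     = preg_match('/a.b/s', $subject);
$anyChar    = preg_match('/a[\s\S]b/', $subject);

echo $set[0], $negated[0], ' ', $digits[0], ' ', $hex[0], ' ', count($words), "\n";
```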

Some examples
Regular expression    Description
/\b([a-z]+) \1\b/gi    Finds places where a word occurs twice in a row
/(\w+):\/\/([^\/:]+)(:\d*)?([^# ]*)/    Parses a URL into protocol, domain, port, and relative path
/^(?:Chapter|Section) [1-9][0-9]{0,1}$/    Locates chapter and section headings
/[-a-z]/    The 26 letters a to z plus the hyphen
/ter\b/    Matches chapter, but not terminal
/\Bapt/    Matches chapter, but not aptitude
/Windows (?=95|98|NT)/    Matches "Windows" in Windows 95, Windows 98, or Windows NT. After a match is found, the search for the next match starts right after "Windows ".
^[_\.0-9a-z-]+@([0-9a-z][0-9a-z-]+\.)+[a-z]{2,3}$    Checks for a valid email format
^[0-9]+$    Digits-only check
^[0-9a-z]{1}[0-9a-z\-]{0,19}$    User name check: starts with a letter or digit and contains only letters, digits, and hyphens
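A few of these patterns can be exercised in PHP as follows. The sample URL, user names, and headings are assumptions chosen for illustration. Note that PHP has no g flag; preg_match_all plays that role when all matches are wanted.

```php
<?php
// Split a URL into protocol, domain, port, and relative path.
$urlPattern = '/(\w+):\/\/([^\/:]+)(:\d*)?([^# ]*)/';
preg_match($urlPattern, 'http://www.example.com:80/docs/page.html', $parts);
[, $protocol, $domain, $port, $path] = $parts;

// User name: starts with a letter or digit, at most 20 characters in total.
$userPattern = '/^[0-9a-z]{1}[0-9a-z\-]{0,19}$/';
$goodName = preg_match($userPattern, 'john-doe');
$badName  = preg_match($userPattern, '-john');

// Chapter or section heading followed by a number from 1 to 99.
$headingPattern = '/^(?:Chapter|Section) [1-9][0-9]{0,1}$/';
$heading    = preg_match($headingPattern, 'Chapter 12');
$notHeading = preg_match($headingPattern, 'Chapter 123');

echo "$protocol | $domain | $port | $path\n";
```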