PHP Regular Expression data

Source: Internet
Author: User
Tags: alphabetic character, php, regular expression, printable characters

Every PHP programmer knows that PHP has powerful regular expression functions. To make future work easier, I have collected information about regular expressions from the Internet so that it is easy to look up later.
A regular expression describes a string-matching pattern. It can be used to check whether a string contains a given substring, to replace a matched substring, or to extract from a string a substring that meets certain conditions.
For example, one of the most common uses of regular expressions is to verify that the e-mail addresses users enter online are in the correct format. If the address matches the regular expression, the form data the user submitted is processed normally; if it does not match, a prompt asks the user to re-enter a correct e-mail address. This shows that regular expressions play an important role in the decision logic of Web applications.
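The validation flow just described might look like this in PHP. This is only a minimal sketch: the pattern is a deliberately simplified assumption (real e-mail syntax is broader), and the helper name isPlausibleEmail and the sample inputs are hypothetical.

```php
<?php
// Hypothetical helper: returns true when $input looks like an e-mail address.
// The pattern is a simplified assumption, not a complete address grammar.
function isPlausibleEmail(string $input): bool
{
    $pattern = '/^[a-z0-9._-]+@(?:[a-z0-9-]+\.)+[a-z]{2,}$/i';
    return preg_match($pattern, $input) === 1;
}

foreach (['user@example.com', 'not-an-address'] as $candidate) {
    if (isPlausibleEmail($candidate)) {
        echo "$candidate: accepted, continue processing the form\n";
    } else {
        echo "$candidate: rejected, ask the user to re-enter the address\n";
    }
}
```

preg_match() returns 1 for a match and 0 for no match, so comparing with === 1 also treats an error result (false) as a rejection.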

Regular Expressions

A regular expression is a text pattern consisting of ordinary characters (such as the letters a through z) and special characters (called metacharacters). The pattern acts as a template that is matched against the string being searched.
You construct a regular expression by placing the components of the pattern between a pair of delimiters, that is, /expression/
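In PHP's preg_* functions the delimiters are part of the pattern string itself. The sketch below shows the idea; the sample strings are made up for illustration.

```php
<?php
// The pattern string carries its own delimiters: /expression/
$subject = 'Regular expressions in PHP';

// preg_match() returns 1 when the pattern is found, 0 when it is not.
$found = preg_match('/PHP/', $subject);

// Other delimiters work as well, which avoids escaping slashes in paths.
$hasPath = preg_match('#/usr/local#', '/usr/local/bin/php');

// Modifiers such as i (case-insensitive) follow the closing delimiter.
$foundAnyCase = preg_match('/php/i', $subject);

var_dump($found, $hasPath, $foundAnyCase);
```

Single-quoted PHP strings are used for the patterns so that backslashes reach the regular expression engine unchanged.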

Normal characters

Normal characters are all printable and non-printable characters that are not explicitly designated as metacharacters. This includes all uppercase and lowercase letters, all digits, all punctuation marks, and some symbols.

Non-printable characters
Character  Meaning
\cx        Matches the control character indicated by x. For example, \cM matches Control-M, the carriage return. The value of x must be in the range A-Z or a-z; otherwise, c is treated as a literal 'c' character.
\f         Matches a form feed. Equivalent to \x0c and \cL.
\n         Matches a line feed (newline). Equivalent to \x0a and \cJ.
\r         Matches a carriage return. Equivalent to \x0d and \cM.
\s         Matches any whitespace character, including spaces, tabs, form feeds, and so on. Equivalent to [ \f\n\r\t\v].
\S         Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t         Matches a tab character. Equivalent to \x09 and \cI.
\v         Matches a vertical tab. Equivalent to \x0b and \cK.
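A short sketch of these escapes in PHP follows. The sample line is made up; note that it is written in a double-quoted string so that PHP itself produces the control characters, while the patterns stay in single quotes.

```php
<?php
// Double-quoted PHP strings turn \r, \n and \t into real control characters.
$line = "name\tvalue\r\n";

// \cM and \x0d both denote the carriage return, so each finds it once.
$viaControl = preg_match_all('/\cM/', $line);
$viaHex     = preg_match_all('/\x0d/', $line);

// \s matches whitespace (tab, CR and LF here); \S matches everything else.
$whitespace    = preg_match_all('/\s/', $line);
$nonWhitespace = preg_match_all('/\S/', $line);

var_dump($viaControl, $viaHex, $whitespace, $nonWhitespace);
```

preg_match_all() returns the number of matches, which makes it convenient for counting characters of a given kind.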

Special characters
Special characters are characters with a special meaning, such as the * in "*.txt", which simply stands for any string. To find a file whose name contains a literal *, you must escape the * by preceding it with a \: ls \*.txt. Regular expressions have the following special characters.

Character  Description
$          Matches the end of the input string. If the Multiline property of the RegExp object is set (the m pattern modifier in PHP), $ also matches the position before '\n' or '\r'. To match the $ character itself, use \$.
( )        Mark the start and end of a subexpression. Subexpressions can be captured for later use. To match these characters, use \( and \).
*          Matches the preceding subexpression zero or more times. To match the * character, use \*.
+          Matches the preceding subexpression one or more times. To match the + character, use \+.
.          Matches any single character except the newline character \n. To match a literal ., use \.
[          Marks the beginning of a bracket expression. To match [, use \[.
?          Matches the preceding subexpression zero or one time, or marks a qualifier as non-greedy. To match the ? character, use \?.
\          Marks the next character as a special character, a literal character, a back reference, or an octal escape. For example, 'n' matches the character 'n', while '\n' matches a line break. The sequence '\\' matches '\', and '\(' matches '('.
^          Matches the start of the input string, unless it is used inside a bracket expression, where it negates the character set. To match the ^ character itself, use \^.
{          Marks the start of a qualifier expression. To match {, use \{.
|          Indicates a choice between two items. To match |, use \|.
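The effect of escaping can be sketched in PHP as follows. The file names are hypothetical; preg_quote() is PHP's built-in function for adding the backslashes automatically.

```php
<?php
$filename = 'notes*.txt';

// Unescaped, * is a qualifier and . matches any character,
// so this pattern also accepts names without a literal asterisk.
$loose = preg_match('/^notes*.txt$/', 'notesXtxt');

// Escaping both metacharacters makes them match literally.
$strict = preg_match('/^notes\*\.txt$/', $filename);

// preg_quote() adds the backslashes for you; the second argument
// names the delimiter so that it would be escaped too.
$quoted   = preg_quote($filename, '/');
$viaQuote = preg_match('/^' . $quoted . '$/', $filename);

var_dump($loose, $strict, $quoted, $viaQuote);
```

Using preg_quote() is usually the safer choice when part of a pattern comes from user input or from a variable.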

The method of constructing a regular expression is the same as the method for creating a mathematical expression. That is, using a variety of meta-characters and operators to combine small expressions together to create larger expressions. A component of a regular expression can be a single character, a character set, a range of characters, a selection between characters, or any combination of all of these components.

Qualifiers

Qualifiers specify how many times a given component of a regular expression must occur for a match to succeed. There are six of them: *, +, ?, {n}, {n,}, and {n,m}.
The *, +, and ? qualifiers are greedy: they match as much text as possible. Appending a ? to a qualifier makes it non-greedy, so that it matches as little as possible.
The qualifiers of a regular expression are:

Character  Description
*          Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
+          Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?          Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches "do" or "does". ? is equivalent to {0,1}.
{n}        n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the "o" in "Bob", but matches the two o's in "food".
{n,}       n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the "o" in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m}      m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the numbers.
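The difference between bounded, greedy and non-greedy qualifiers can be sketched in PHP as follows. The HTML fragment is a made-up sample.

```php
<?php
// {n,m}: o{1,3} takes at most the first three o's.
preg_match('/o{1,3}/', 'fooooood', $m);
$bounded = $m[0];            // "ooo"

// Greedy o+ consumes every o; non-greedy o+? stops after one.
preg_match('/o+/', 'oooo', $m);
$greedy = $m[0];             // "oooo"
preg_match('/o+?/', 'oooo', $m);
$lazy = $m[0];               // "o"

// The difference matters when matching delimited text.
$html = '<b>one</b> and <b>two</b>';
preg_match('/<b>.*<\/b>/', $html, $m);
$greedyTag = $m[0];          // the whole string
preg_match('/<b>.*?<\/b>/', $html, $m);
$lazyTag = $m[0];            // "<b>one</b>"

echo "$bounded | $greedy | $lazy\n$greedyTag\n$lazyTag\n";
```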

Locators (anchors)
Locators describe the boundaries of a string or a word: ^ and $ refer to the beginning and the end of the string, \b describes the leading or trailing boundary of a word, and \B represents a non-word boundary. Qualifiers cannot be applied to locators.
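A brief PHP sketch of these position markers, using made-up sample strings:

```php
<?php
// ^ and $ anchor the whole string: only digits from start to end.
$allDigits  = preg_match('/^[0-9]+$/', '2024');       // 1
$someDigits = preg_match('/^[0-9]+$/', 'year 2024');  // 0

// \b requires a word boundary after "er"; \B forbids one.
$atBoundary = preg_match('/er\b/', 'never');  // 1: "er" ends the word
$noBoundary = preg_match('/er\b/', 'verb');   // 0: "er" is followed by "b"
$inside     = preg_match('/er\B/', 'verb');   // 1

var_dump($allDigits, $someDigits, $atBoundary, $noBoundary, $inside);
```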

Alternation

Enclose all the alternatives in parentheses, separating adjacent alternatives with |. Parentheses have a side effect, however: the matched text is captured and stored. Placing ?: before the first alternative eliminates this side effect.
?: is one of the non-capturing elements; the other two are ?= and ?!, which carry additional meaning. The former is a positive lookahead: it matches at any position where the pattern inside the parentheses matches the text that follows. The latter is a negative lookahead: it matches at any position where the pattern inside the parentheses does not match the text that follows.
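The three group types can be compared in a small PHP sketch. The sample strings are illustrative only.

```php
<?php
// Capturing group: the chosen alternative is stored in $m[1].
preg_match('/industr(y|ies)/', 'industries', $m);
$captured = $m;              // ['industries', 'ies']

// Non-capturing group (?:...): same match, nothing stored.
preg_match('/industr(?:y|ies)/', 'industries', $m);
$uncaptured = $m;            // ['industries']

// Positive lookahead (?=...): "Windows " only when a listed version follows.
// The lookahead text itself is not part of the match.
preg_match('/Windows (?=95|98|NT|2000)/', 'Windows 2000', $m);
$ahead = $m[0];              // "Windows "

// Negative lookahead (?!...): "Windows " only when no listed version follows.
$negHit  = preg_match('/Windows (?!95|98|NT|2000)/', 'Windows 3.1');   // 1
$negMiss = preg_match('/Windows (?!95|98|NT|2000)/', 'Windows 2000');  // 0

var_dump($captured, $uncaptured, $ahead, $negHit, $negMiss);
```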

Back references

Adding parentheses around a regular expression pattern, or around part of a pattern, causes the corresponding match to be stored in a temporary buffer. Each captured submatch is stored in the order in which it is encountered, from left to right, in the pattern. The buffers are numbered consecutively starting from 1, up to a maximum of 99 subexpressions. Each buffer can be accessed using '\n', where n is a one- or two-digit decimal number that identifies a particular buffer.
You can use the non-capturing metacharacters '?:', '?=', or '?!' to prevent the related match from being saved.
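A back reference in use might look like the following PHP sketch, which finds a word that is immediately repeated. The sample sentence is made up.

```php
<?php
// \1 refers back to whatever the first group captured,
// so this finds a word that is immediately repeated.
$text = 'Paris in the the spring';
preg_match('/\b([a-z]+) \1\b/i', $text, $m);
$pair = $m[0];               // "the the"
$word = $m[1];               // "the"

// In a replacement string the same buffers are written $1, $2, ...
$fixed = preg_replace('/\b([a-z]+) \1\b/i', '$1', $text);

echo "$pair -> $word\n$fixed\n";   // second line: Paris in the spring
```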

Operator precedence

Operations of equal precedence are evaluated from left to right; operations of higher precedence are evaluated before those of lower precedence. The precedence of the various operators, from highest to lowest, is as follows:

Operator                      Description
\                             Escape character
(), (?:), (?=), []            Parentheses and square brackets
*, +, ?, {n}, {n,}, {n,m}     Qualifiers
^, $, \anymetacharacter       Anchors and sequence
|                             Alternation ("or")
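The practical consequence of this ordering can be sketched in PHP. The sample words are made up for illustration.

```php
<?php
// | has the lowest precedence, so /^cat|dog$/ means (^cat) or (dog$).
$loose = preg_match('/^cat|dog$/', 'catalog');        // 1: starts with "cat"

// Grouping applies both anchors to the whole alternation.
$strict = preg_match('/^(?:cat|dog)$/', 'catalog');   // 0
$exact  = preg_match('/^(?:cat|dog)$/', 'dog');       // 1

// A qualifier binds tighter than sequence: ab+ repeats only the "b".
preg_match('/ab+/', 'abbbab', $m);
$single = $m[0];                                      // "abbb"
preg_match('/(?:ab)+/', 'ababb', $m);
$group = $m[0];                                       // "abab"

var_dump($loose, $strict, $exact, $single, $group);
```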

All symbols explained

Character     Description
\             Marks the next character as a special character, a literal character, a back reference, or an octal escape. For example, 'n' matches the character "n". '\n' matches a line break. The sequence '\\' matches "\" and '\(' matches "(".
^             Matches the start of the input string. If the Multiline property of the RegExp object is set (the m pattern modifier in PHP), ^ also matches the position after '\n' or '\r'.
$             Matches the end of the input string. If the Multiline property of the RegExp object is set (the m pattern modifier in PHP), $ also matches the position before '\n' or '\r'.
*             Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
+             Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?             Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches "do" or "does". ? is equivalent to {0,1}.
{n}           n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the "o" in "Bob", but matches the two o's in "food".
{n,}          n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the "o" in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m}         m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the numbers.
?             When this character immediately follows any other qualifier (*, +, ?, {n}, {n,}, {n,m}), the match is non-greedy. A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much as possible. For example, in the string "oooo", 'o+?' matches a single "o", while 'o+' matches all the o's.
.             Matches any single character except "\n". To match any character including '\n', use a pattern such as '[\s\S]' (or, in PHP, the s pattern modifier).
(pattern)     Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection: the SubMatches collection in VBScript, the $0...$9 properties in JScript, or the $matches array filled by preg_match() in PHP. To match parenthesis characters, use '\(' or '\)'.
(?:pattern)   Matches pattern but does not capture the match; that is, it is a non-capturing match and is not stored for later use. This is useful when combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a more compact expression than 'industry|industries'.
(?=pattern)   Positive lookahead: matches the search string at any point where a string matching pattern begins. This is a non-capturing match; the match is not stored for later use. For example, 'Windows (?=95|98|NT|2000)' matches "Windows" in "Windows 2000" but not "Windows" in "Windows 3.1". A lookahead does not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
(?!pattern)   Negative lookahead: matches the search string at any point where a string not matching pattern begins. This is a non-capturing match; the match is not stored for later use. For example, 'Windows (?!95|98|NT|2000)' matches "Windows" in "Windows 3.1" but not "Windows" in "Windows 2000". A lookahead does not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
x|y           Matches x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food".
[xyz]         A character set. Matches any one of the enclosed characters. For example, '[abc]' matches the "a" in "plain".
[^xyz]        A negative character set. Matches any character that is not enclosed. For example, '[^abc]' matches the "p" in "plain".
[a-z]         A character range. Matches any character in the specified range. For example, '[a-z]' matches any lowercase letter in the range "a" through "z".
[^a-z]        A negative character range. Matches any character that is not in the specified range. For example, '[^a-z]' matches any character that is not in the range "a" through "z".
\b            Matches a word boundary, that is, the position between a word and a space. For example, 'er\b' matches the "er" in "never" but not the "er" in "verb".
\B            Matches a non-word boundary. 'er\B' matches the "er" in "verb" but not the "er" in "never".
\cx           Matches the control character indicated by x. For example, \cM matches Control-M, the carriage return. The value of x must be in the range A-Z or a-z; otherwise, c is treated as a literal 'c' character.
\d            Matches a digit character. Equivalent to [0-9].
\D            Matches a non-digit character. Equivalent to [^0-9].
\f            Matches a form feed. Equivalent to \x0c and \cL.
\n            Matches a line feed (newline). Equivalent to \x0a and \cJ.
\r            Matches a carriage return. Equivalent to \x0d and \cM.
\s            Matches any whitespace character, including spaces, tabs, form feeds, and so on. Equivalent to [ \f\n\r\t\v].
\S            Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t            Matches a tab character. Equivalent to \x09 and \cI.
\v            Matches a vertical tab. Equivalent to \x0b and \cK.
\w            Matches any word character, including the underscore. Equivalent to '[A-Za-z0-9_]'.
\W            Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.
\xn           Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, '\x41' matches "A". '\x041' is equivalent to '\x04' followed by "1". This allows ASCII codes to be used in regular expressions.
\num          Matches num, where num is a positive integer. A reference back to a captured match. For example, '(.)\1' matches two consecutive identical characters.
\n            (n a digit) Identifies either an octal escape value or a back reference. If \n is preceded by at least n captured subexpressions, n is a back reference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.
\nm           Identifies either an octal escape value or a back reference. If \nm is preceded by at least nm captured subexpressions, nm is a back reference. If \nm is preceded by at least n captures, n is a back reference followed by the literal m. If neither condition holds and both n and m are octal digits (0-7), \nm matches the octal escape value nm.
\nml          If n is an octal digit (0-3) and m and l are both octal digits (0-7), matches the octal escape value nml.
\un           Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©). PHP's PCRE functions do not support \u; use \x{00A9} together with the u pattern modifier instead.
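A few of the entries above can be tried out with the following PHP sketch. The sample strings are made up, and the "\u{00A9}" string escape assumes PHP 7 or later.

```php
<?php
// Character sets: [abc] and [^abc] applied to the word "plain".
preg_match('/[abc]/', 'plain', $m);
$firstInSet = $m[0];         // "a"
preg_match('/[^abc]/', 'plain', $m);
$firstOutside = $m[0];       // "p"

// Shorthand classes: \D is any non-digit, \W any non-word character.
$digits = preg_replace('/\D/', '', 'Tel: 555-0100');   // "5550100"
$words  = preg_split('/\W+/', 'one, two; three');      // ['one', 'two', 'three']

// Hexadecimal escape: \x41 is "A".
$hex = preg_match('/\x41/', 'ABC');                    // 1

// By default . stops at a newline; [\s\S] or the s modifier does not.
$dot  = preg_match('/a.b/', "a\nb");        // 0
$any  = preg_match('/a[\s\S]b/', "a\nb");   // 1
$dotS = preg_match('/a.b/s', "a\nb");       // 1

// Unicode code point in PHP: \x{...} together with the u modifier.
$copyright = preg_match('/\x{00A9}/u', "\u{00A9}");    // 1

var_dump($firstInSet, $firstOutside, $digits, $words, $hex, $dot, $any, $dotS, $copyright);
```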

Some examples

Regular expression                                  Description
/\b([a-z]+) \1\b/gi                                 Finds the position where a word appears twice in a row
/(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/                 Splits a URL into protocol, domain, port, and relative path
/^(?:Chapter|Section) [1-9][0-9]{0,1}$/             Locates a chapter or section heading
/[-a-z]/                                            The 26 letters a to z plus the hyphen (-)
/ter\b/                                             Matches "chapter", but not "terminal"
/\Bapt/                                             Matches "chapter", but not "aptitude"
/Windows(?=95|98|NT)/                               Matches the "Windows" in Windows95, Windows98, or WindowsNT; after a match is found, the search for the next match starts right after "Windows"
^[_\.0-9a-z-][email protected]([0-9a-z][0-9a-z-]+\.)+[a-z]{2,3}$   E-mail format check
^[0-9]+$                                            Digits-only check
^[0-9a-z]{1}[0-9a-z\-]{0,19}$                       User name check: starts with a letter or digit and contains only letters, digits, and hyphens

Note: the g (global) flag in the first pattern is JScript notation. PHP has no g modifier; preg_match_all() performs a global search instead.
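Two of the patterns above, tried in PHP. This is a sketch under a few assumptions: the URL and user names are made up, and ~ is used as the delimiter for the URL pattern because the unescaped / inside [^/:] would otherwise end a /-delimited pattern early.

```php
<?php
// URL pattern from the table, with ~ as delimiter so slashes need no escaping.
$pattern = '~(\w+)://([^/:]+)(:\d*)?([^# ]*)~';
$url = 'http://www.example.com:80/docs/index.php#top';   // made-up URL

$protocol = $domain = $port = $path = null;
if (preg_match($pattern, $url, $m)) {
    $protocol = $m[1];   // "http"
    $domain   = $m[2];   // "www.example.com"
    $port     = $m[3];   // ":80"
    $path     = $m[4];   // "/docs/index.php"
}

// User name pattern from the table: 1 to 20 characters,
// starting with a lowercase letter or digit.
$namePattern = '/^[0-9a-z]{1}[0-9a-z\-]{0,19}$/';
$okName  = preg_match($namePattern, 'user-01');   // 1
$badName = preg_match($namePattern, '-user');     // 0

var_dump($protocol, $domain, $port, $path, $okName, $badName);
```

Each pair of parentheses in the URL pattern fills one entry of $m, in the left-to-right order described under back references.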
