Basic rules for PHP regular expressions

Source: Internet
Author: User

Basic knowledge of regular expressions:

\ Marks the next character as a special character, a literal character, a backreference, or an octal escape.
For example, 'n' matches the character "n", while '\n' matches a line feed. The sequence '\\' matches "\" and '\(' matches "(".
^ Matches the start of the input string. With the m (multiline) modifier, ^ also matches at the start of each line,
that is, immediately after a line break.
$ Matches the end of the input string. With the m (multiline) modifier, $ also matches at the end of each line,
that is, immediately before a line break.
* Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
+ Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z".
+ is equivalent to {1,}.
? Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches "do" or "does".
? is equivalent to {0,1}.
{n} n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the "o" in "Bob", but
matches the two o's in "food".
{n,} n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the "o" in "Bob", but matches
all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m} m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}'
matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. There must be no spaces between the comma and the two numbers.
? When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match becomes non-greedy.
A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches
as much of it as possible. For example, against the string "oooo", 'o+?' matches a single "o", while 'o+' matches all four.
. Matches any single character except "\n". To match any character including "\n", use the s modifier or a pattern such as '[\s\S]'. (Inside a character class the dot is a literal dot, so '[.\n]' does not do this.)
x|y Matches x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food".
[xyz] Character set. Matches any one of the enclosed characters. For example, '[abc]' matches the "a" in "plain".
[^xyz] Negated character set. Matches any character that is not enclosed. For example, '[^abc]' matches the "p" in "plain".
[a-z] Character range. Matches any character within the specified range. For example, '[a-z]' matches any lowercase
alphabetic character in the range "a" to "z".
[^a-z] Negated character range. Matches any character that is not in the specified range. For example, '[^a-z]' matches
any character that is not in the range "a" to "z".
\b Matches a word boundary, that is, the position between a word character and a non-word character (or the start or end
of the string). For example, 'er\b' matches the "er" in "never", but not the "er" in "verb".
\B Matches a non-word boundary. 'er\B' matches the "er" in "verb", but not the "er" in "never".
\cx Matches the control character indicated by x. For example, \cM matches Control-M, the carriage return character.
The value of x is normally a letter in the range A-Z or a-z.
\d Matches a digit character. Equivalent to [0-9].
\D Matches a non-digit character. Equivalent to [^0-9].
\f Matches a form feed. Equivalent to \x0c and \cL.
\n Matches a line feed. Equivalent to \x0a and \cJ.
\r Matches a carriage return. Equivalent to \x0d and \cM.
\s Matches any whitespace character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\x0b].
\S Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\x0b].
\t Matches a tab character. Equivalent to \x09 and \cI.
\v Matches a vertical tab (\x0b, \cK). In PCRE, the engine behind PHP's preg functions, \v matches any vertical whitespace character, which also covers line feed, form feed, and carriage return.
\w Matches any word character, including the underscore. Equivalent to '[A-Za-z0-9_]'.
\W Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.
\xn Matches n, where n is a hexadecimal escape value of at most two digits. For example,
'\x41' matches "A". '\x041' is equivalent to '\x04' followed by "1". This allows ASCII codes to be used in regular expressions.
\num Matches num, where num is a positive integer: a backreference to a previously captured match. For example, '(.)\1' matches
two consecutive identical characters.
\n Identifies a backreference or an octal escape value. In PCRE, the engine behind PHP's preg functions, a single digit 1-9
outside a character class is always taken as a backreference; \0 is an octal escape.
\nm Identifies an octal escape value or a backreference. If at least nm captured subexpressions precede \nm, then nm is a
backreference. Otherwise, if both n and m are octal digits (0-7), \nm matches the octal escape value nm. For example,
'\11' is a backreference when the pattern has at least 11 capturing groups, and a tab character otherwise.
\nml If n is an octal digit (0-3) and both m and l are octal digits (0-7), matches the octal escape value nml.
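\un Matches n, where n is a Unicode character represented by four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©). PHP's PCRE engine does not support this form; use \x{n} together with the u modifier instead, for example '/\x{00A9}/u'.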
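
Several of the rules above can be checked directly with preg_match(). The following is a minimal sketch rather than a complete test suite; the helper name firstMatch() and the $cases table are made up for this illustration, and the subjects are taken from the examples in the list.

```php
<?php
// Hypothetical helper: returns the first match of $pattern in $subject,
// or null when the pattern does not match.
function firstMatch(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
}

// Each row: [pattern, subject, expected first match].
$cases = [
    ['/zo*/',     'zoo',      'zoo'],   // *     : zero or more "o"
    ['/zo+/',     'z',        null],    // +     : needs at least one "o"
    ['/do(es)?/', 'does',     'does'],  // ?     : optional group
    ['/o{2}/',    'Bob',      null],    // {n}   : exactly two "o"
    ['/o{1,3}/',  'fooooood', 'ooo'],   // {n,m} : at most three "o"
    ['/o+?/',     'oooo',     'o'],     // non-greedy quantifier
    ['/[^abc]/',  'plain',    'p'],     // negated character set
    ['/er\b/',    'never',    'er'],    // word boundary
    ['/er\b/',    'verb',     null],
    ['/er\B/',    'verb',     'er'],    // non-word boundary
    ['/(.)\1/',   'hello',    'll'],    // backreference
    ['/a.b/s',    "a\nb",     "a\nb"],  // s modifier: dot matches "\n"
    ['/^b$/m',    "a\nb",     'b'],     // m modifier: ^ and $ match per line
];

foreach ($cases as [$pattern, $subject, $expected]) {
    $actual = firstMatch($pattern, $subject);
    printf(
        "%-12s %-12s %-8s %s\n",
        $pattern,
        json_encode($subject),
        json_encode($actual),
        $actual === $expected ? 'ok' : 'MISMATCH'
    );
}
```

The patterns are written in single quotes so that PHP passes sequences such as \b and \1 to the regex engine unchanged.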

Sub-patterns:

(pattern) Matches the pattern and captures the match for later use. To match literal parenthesis characters, use '\(' or '\)'.
(?:pattern) Matches the pattern but does not capture the match; that is, it is a non-capturing group whose result
is not stored for later use. This is useful for combining parts of a pattern with the "or" character (|).
For example, 'industr(?:y|ies)' is a more compact expression than 'industry|industries'.
(?=pattern) Positive lookahead. Matches the search string at any position where a string matching the pattern begins.
This is a non-capturing match, which means the match is not stored for later use. For example,
'Windows (?=95|98|NT|2000)' matches the "Windows" in "Windows 2000",
but not the "Windows" in "Windows 3.1". A lookahead does not consume characters: after a match occurs,
the search for the next match starts immediately after the last match, not after
the characters that made up the lookahead.
(?!pattern) Negative lookahead. Matches the search string at any position where a string not matching the pattern begins.
This is a non-capturing match, which means the match is not stored for later use. For example, 'Windows (?!95|98|NT|2000)'
matches the "Windows" in "Windows 3.1", but not the "Windows" in "Windows 2000".
A lookahead does not consume characters: after a match occurs, the search for the next match starts immediately after the last match,
not after the characters that made up the lookahead.
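
The "Windows" lookahead example can be tried out as follows. This is a small sketch; the variable names $positive and $negative are chosen only for this illustration.

```php
<?php
// Patterns from the text; the space after "Windows" is part of the pattern.
$positive = '/Windows (?=95|98|NT|2000)/';
$negative = '/Windows (?!95|98|NT|2000)/';

foreach (['Windows 2000', 'Windows 3.1'] as $subject) {
    printf(
        "%-12s positive=%d negative=%d\n",
        $subject,
        preg_match($positive, $subject),
        preg_match($negative, $subject)
    );
}
// Windows 2000 positive=1 negative=0
// Windows 3.1  positive=0 negative=1

// The lookahead is checked but not consumed: the match is only "Windows ".
preg_match($positive, 'Windows 2000', $m);
var_dump($m[0]); // string(8) "Windows "
```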

Sub-pattern matching:

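<?php
$subject = "Im Ap<pple>la";
$pattern = "/<(.*)>/";   // capture whatever sits between < and >
$match   = [];
$result  = preg_match($pattern, $subject, $match);
print_r($match);
var_dump($result);

Output: $match[0] holds the full match "<pple>" and $match[1] holds the captured sub-pattern "pple"; preg_match() returns 1.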

Non-capturing sub-pattern:

<?php
$subject = "Im Ap<pple>la";
$pattern = "/<(?:.*)>/"; // (?:...) groups without capturing, so $match has only index 0
$match   = [];
$result  = preg_match($pattern, $subject, $match);
print_r($match);
var_dump($result);

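Positive lookahead: the text that follows the current position must match the sub-pattern in parentheses. The lookahead is non-capturing and is not part of the match.

<?php
$subject = "Im Appple";
$pattern = "/p(?=p)/";   // a "p" that is followed by another "p"
$match   = [];
$result  = preg_match($pattern, $subject, $match);
print_r($match);
var_dump($result);

Output: $match[0] is "p". Note that this is a single "p": the "p" inside the lookahead is checked but not included. preg_match() returns 1.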

Negative lookahead:

<?php
$subject = "Im Appple";
$pattern = "/p(?!p)le/"; // a "p" that is not followed by "p", then "le"
$match   = [];
$result  = preg_match($pattern, $subject, $match);
print_r($match);
var_dump($result);

Output: $match[0] is "ple"; preg_match() returns 1.

Lookbehind:

(?<=pattern) Positive lookbehind. For example, (?<=J)a matches an "a" that immediately follows the letter "J": the first "a" in each of "Java6" and "Java7".
(?<!pattern) Negative lookbehind. For example, (?<!J)a matches an "a" that does not immediately follow the letter "J": the second "a" in each of "Java6" and "Java7".
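
To see which "a" each lookbehind selects in "Java6 Java7", one option is to collect the match offsets with PREG_OFFSET_CAPTURE. This is only a sketch; the variable names are made up for the illustration.

```php
<?php
$subject = 'Java6 Java7';

// (?<=J)a : an "a" immediately preceded by "J".
preg_match_all('/(?<=J)a/', $subject, $after, PREG_OFFSET_CAPTURE);
// (?<!J)a : an "a" that is NOT immediately preceded by "J".
preg_match_all('/(?<!J)a/', $subject, $notAfter, PREG_OFFSET_CAPTURE);

// Each match is a [text, byte offset] pair; keep only the offsets.
$afterOffsets    = array_column($after[0], 1);
$notAfterOffsets = array_column($notAfter[0], 1);

print_r($afterOffsets);    // offsets 1 and 7: the "a" right after each "J"
print_r($notAfterOffsets); // offsets 3 and 9: the "a" after each "v"
```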

Positive lookbehind: the text immediately before the current position must match the sub-pattern.

<?php
$subject = "Im Appple";
$pattern = "/(?<=p)le/"; // "le" that is preceded by "p"
$match   = [];
$result  = preg_match($pattern, $subject, $match);
print_r($match);
var_dump($result);

Output: $match[0] is "le"; preg_match() returns 1.

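Negative lookbehind:

<?php
$subject = "Im Appple";
$pattern = "/A(?<!p)p/"; // after "A", the previous character must not be "p"; then match "p"
$match   = [];
$result  = preg_match($pattern, $subject, $match);
print_r($match);
var_dump($result);

Output: $match[0] is "Ap"; preg_match() returns 1.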

Greedy mode

The regular expression engine is greedy by default: when "*" is present, it tries to match as long a string as possible. To make the match lazy instead, follow the "*" with a question mark, as in ".*?". This tells the engine to repeat the preceding element as few times as possible.

In other words, the "?" suffix requests the shortest possible match.

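<?php
$subject    = "Im Appple";
$pattern    = "/Im.*p/";  // greedy: runs to the last "p"
$pattern_no = "/Im.*?p/"; // non-greedy: stops at the first "p"
$match      = [];
$match_no   = [];
preg_match($pattern, $subject, $match);
preg_match($pattern_no, $subject, $match_no);
print_r($match);
print_r($match_no);

Output: the greedy pattern gives $match[0] = "Im Appp"; the non-greedy pattern gives $match_no[0] = "Im Ap".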
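
The difference between greedy and non-greedy matching shows up most clearly when a delimiter occurs more than once. The sketch below assumes a small sample string, $html, invented for this illustration.

```php
<?php
// Hypothetical sample input with two separate <b> elements.
$html = '<b>one</b> and <b>two</b>';

// Greedy: .* runs to the last </b>, so both elements end up in one match.
preg_match_all('/<b>.*<\/b>/', $html, $greedy);
// Non-greedy: .*? stops at the first </b>, so each element matches on its own.
preg_match_all('/<b>.*?<\/b>/', $html, $lazy);

print_r($greedy[0]); // one element: "<b>one</b> and <b>two</b>"
print_r($lazy[0]);   // two elements: "<b>one</b>" and "<b>two</b>"
```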
