PHP Regular Expression Notes




What is a regular expression

On a computer we often use wildcards to find the files we need. For example, in *.doc the * matches zero or more characters. Regular expressions are also a tool for matching text, only a more powerful one. To quote the PHP manual: a regular expression is a pattern that is matched against a subject string from left to right, and most characters in a pattern stand for themselves and match the same characters in the subject.

Here are a few simple examples to give a first impression of regular expressions.

hi  // matches the letters "hi"; with case ignored it matches hi, Hi, hI and HI
\bhi\b  // matches the English word hi; '\b' is a special sequence (an assertion) that denotes a word boundary
\bhi\b.*\bLucy\b  // with case ignored, matches text such as 'Hi my name is Lucy'; '.' matches any character except a line break, and '*' is a quantifier meaning "repeat 0 or more times"
0\d{2}-\d{8}  // matches text such as 020-12345678; '\d' matches a digit (0-9), and '{n}' means "repeat n times", as in {2} and {8}

In the examples above, \b, ., *, \d and {2} have special meanings, which are described below.
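The four introductory patterns can be tried out with preg_match(). This is only a minimal sketch: the subject strings are made up for illustration, and the /i modifier is used wherever the text says that case is ignored.

```php
<?php
// Each pattern is wrapped in "/" delimiters; "i" makes it case-insensitive.
$anyCase    = preg_match('/hi/i', 'HI');                               // plain letters, case ignored
$wholeWord  = preg_match('/\bhi\b/', 'say hi');                        // "hi" as a whole word
$partOfWord = preg_match('/\bhi\b/', 'him');                           // "hi" inside another word
$sentence   = preg_match('/\bhi\b.*\bLucy\b/i', 'Hi my name is Lucy'); // "hi" ... "Lucy"
$phone      = preg_match('/0\d{2}-\d{8}/', '020-12345678');            // 0 + 2 digits, "-", 8 digits

// preg_match() returns 1 when the pattern matches and 0 when it does not.
var_dump($anyCase, $wholeWord, $partOfWord, $sentence, $phone);
```

Only the pattern against 'him' fails, because there is no word boundary after "hi" in that word.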

PHP regular expression syntax

1. Introduction

PHP supports two kinds of regular expressions: POSIX and PCRE. The POSIX regular expression extension has been deprecated since PHP 5.3.0 and was removed in PHP 7.0.0, so the following discussion is based on PCRE. The PHP manual describes how PCRE differs from POSIX regular expressions and from Perl.

2. Delimiters

When using the PCRE functions, the pattern must be enclosed in delimiters. A delimiter can be any non-alphanumeric, non-backslash, non-whitespace character. Frequently used delimiters are the forward slash (/), the hash sign (#) and the tilde (~). The following are all examples of validly delimited patterns.

/foo bar/
#^[^0-9]$#
+php+
%[a-zA-Z0-9_-]%

If the delimiter needs to be matched inside the pattern, it must be escaped with a backslash. If the delimiter appears often inside the pattern, it is better to choose another delimiter to improve readability. For example:

/http:\/\//
#http://#
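The effect of the delimiter choice can be seen in a short sketch. The subject URL below is only a placeholder, not something taken from the article.

```php
<?php
$url = 'http://example.com/';   // placeholder subject string

// With "/" as the delimiter, every "/" inside the pattern must be escaped.
$withSlashes = preg_match('/http:\/\//', $url);

// With "#" as the delimiter, the same pattern needs no escaping.
$withHashes = preg_match('#http://#', $url);

var_dump($withSlashes, $withHashes);
```

Both calls test for the same text; the second form is simply easier to read.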

3. Meta-Characters

The power of regular expressions comes from the ability to include alternatives and repetitions in a pattern. Some characters are given a special meaning so that they no longer simply stand for themselves; such characters encoded in the pattern are called metacharacters.
There are two different sets of metacharacters: those recognized anywhere in the pattern except inside square brackets, and those recognized inside square brackets.

The metacharacters used outside the square brackets are as follows:

Code Description
\ General escape character with several uses
^ Asserts the start of the subject (or the start of a line, in multiline mode)
$ Asserts the end of the subject, or just before a final line break (or the end of a line, in multiline mode)
. Matches any character except a line break (by default)
[ Starts a character class definition
] Ends a character class definition
| Starts an alternative branch
( Starts a subgroup
) Ends a subgroup
? As a quantifier, means 0 or 1 matches; placed after another quantifier, changes its greediness
* Quantifier: 0 or more matches
+ Quantifier: 1 or more matches
{ Starts a custom (min/max) quantifier
} Ends a custom (min/max) quantifier

The part of a pattern that is in square brackets is called a "character class". Only the following metacharacters are available in a character class:

Code Description
\ Escape character
^ Negates the character class, but only if it is the first character inside the square brackets
- Indicates a character range

Example:

    • \ba\w*\b matches a word beginning with the letter a: first the start of a word (\b), then the letter a, then any number of word characters (\w*, where a word character is any letter, digit or underscore), and finally the end of the word (\b).
    • \d+ matches 1 or more consecutive digits.
    • ^\d{5,12}$ matches a number of 5 to 12 digits. Because ^ and $ are used, the whole input string is matched against \d{5,12}, so the entire input must consist of 5 to 12 digits.
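The three example patterns can be checked with a small sketch. The subject strings and the helper name isFiveToTwelveDigits() are hypothetical and exist only for this illustration.

```php
<?php
// \ba\w*\b : collect every word that begins with the letter a.
preg_match_all('/\ba\w*\b/', 'an apple a day', $words);
// $words[0] holds the full matches: "an", "apple" and "a" ("day" does not start with a).

// \d+ : the first run of one or more digits.
preg_match('/\d+/', 'order 2024 shipped', $digits);
// $digits[0] is "2024".

// ^\d{5,12}$ : the whole input must be 5 to 12 digits.
// (Hypothetical helper, not part of PHP.)
function isFiveToTwelveDigits(string $input): bool
{
    return preg_match('/^\d{5,12}$/', $input) === 1;
}

var_dump($words[0], $digits[0], isFiveToTwelveDigits('12345'), isFiveToTwelveDigits('1234'));
```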

4. Escape sequence (backslash)

The backslash has four uses; the "Escape sequences (backslash)" page of the PHP manual describes them in detail.

"1" As an escape character. For example, to match a * character, write \* in the pattern. This works whether or not the following character would otherwise have a special meaning, so it is always safe to put a backslash in front of a non-alphanumeric character to state that it stands for itself. To match a backslash, write \\ in the pattern.
Backslashes also have a special meaning in both single-quoted and double-quoted PHP strings, so to match a backslash the pattern must be written as \\\\ in PHP code. The reason: the string literal is parsed first, which turns each \\ into one backslash; the regular expression engine then treats the remaining \\ as one escaped backslash. Four backslashes are therefore required to match one backslash.

"2" As a way of encoding non-printing characters in a visible form (for example \n for a newline or \t for a tab).

"3" To specify generic character types:

Code Description
\d Any decimal digit
\D Any character that is not a decimal digit
\h Any horizontal whitespace character (since PHP 5.2.4)
\H Any character that is not a horizontal whitespace character (since PHP 5.2.4)
\s Any whitespace character
\S Any character that is not a whitespace character
\v Any vertical whitespace character (since PHP 5.2.4)
\V Any character that is not a vertical whitespace character (since PHP 5.2.4)
\w Any word character; a word character is any letter, digit or underscore
\W Any non-word character

"4" For some simple assertions. An assertion specifies a condition that must hold at a particular position in the match, without consuming any characters from the subject string. The backslash assertions are:

    • \b  Word boundary
    • \B  Not a word boundary
    • \A  Start of the subject (independent of multiline mode)
    • \Z  End of the subject, or a line break at the end (independent of multiline mode)
    • \z  End of the subject (independent of multiline mode)
    • \G  First matching position in the subject
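The uses of the backslash can be sketched as follows. The subject strings are invented for illustration; the patterns are written in single-quoted PHP strings so that the only backslash doubling needed is the one for a literal backslash.

```php
<?php
// Escape character: \* matches a literal asterisk.
$star = preg_match('/2\*3/', '2*3');

// Four backslashes in PHP source become the regex \\ , which matches one backslash.
$subject     = 'C:\temp';                            // contains exactly one backslash
$backslashes = preg_match_all('/\\\\/', $subject);   // number of backslashes found

// Generic character types: \D removes everything that is not a digit,
// \W+ splits on runs of non-word characters.
$digitsOnly = preg_replace('/\D/', '', 'tel: 020-1234');
$words      = preg_split('/\W+/', 'red, white & blue');

// Assertions: \Z accepts a final line break, \z does not.
$upperZ = preg_match('/end\Z/', "the end\n");
$lowerZ = preg_match('/end\z/', "the end\n");

// \A ignores multiline mode, while ^ with the m modifier matches at each line start.
$caret  = preg_match('/^two/m', "one\ntwo");
$upperA = preg_match('/\Atwo/m', "one\ntwo");

var_dump($star, $backslashes, $digitsOnly, $words, $upperZ, $lowerZ, $caret, $upperA);
```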

5. Repetition/quantifier

Code Description
* Repeat 0 or more times, equivalent to {0,}
+ Repeat 1 or more times, equivalent to {1,}
? Repeat 0 or 1 times, equivalent to {0,1}
{n} Repeat exactly n times
{n,} Repeat n or more times
{n,m} Repeat n to m times

Quantifiers are "greedy" by default: they match as many characters as possible (up to the maximum number of repetitions allowed) without causing the rest of the pattern to fail. However, if a quantifier is followed by a ?, it becomes lazy (non-greedy): it no longer matches as much as possible, but as little as possible.
The example below shows how greedy and non-greedy matching differ.

For the string "aa<b>test1</b>bb<b>test2</b>cc":
Regular expression "<b>.*</b>" gives the match "<b>test1</b>bb<b>test2</b>"
Regular expression "<b>.*?</b>" gives the match "<b>test1</b>"

For more on greedy and non-greedy matching, see http://php.net/manual/zh/regexp.reference.repetition.php
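A minimal sketch of the difference between a greedy and a lazy quantifier. The subject string with <b> tags is chosen for illustration, and # is used as the delimiter so that the slash in </b> does not need escaping.

```php
<?php
$subject = 'aa<b>test1</b>bb<b>test2</b>cc';

// Greedy: .* runs as far as it can, so the match ends at the LAST </b>.
preg_match('#<b>.*</b>#', $subject, $greedy);

// Lazy: .*? stops as early as it can, so the match ends at the FIRST </b>.
preg_match('#<b>.*?</b>#', $subject, $lazy);

var_dump($greedy[0], $lazy[0]);
```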

6. Character class (square brackets)

Description in the PHP manual:

    • A left square bracket begins a character class, which is terminated by a right square bracket. A right square bracket on its own has no special meaning. If a right square bracket is needed as a member of the class, it can be written as the first character of the class (or the second, if the class is negated with ^), or it can be escaped with a backslash.

    • A character class matches a single character in the subject string. The character must be in the set defined by the class, unless the class is negated with a leading ^, in which case it must not be in the set. If ^ itself is needed as a member of the class, make sure it is not the first character of the class, or escape it.

Example:

[aeiou]    // matches any lowercase vowel
[^aeiou]   // matches any character that is not a lowercase vowel
[.?!]      // matches a punctuation mark (. or ? or !)

Note: ^ in a character class is just a convenient notation for specifying the characters that are in the class by listing those that are not. It is not an assertion: it still consumes one character from the subject string, and the match fails if the current position is at the end of the subject.

A hyphen can be used to specify a range of characters conveniently. Ranges follow the ASCII collating sequence, and they can also be used with characters specified numerically, for example [\000-\037].

[0-9]    // means exactly the same as '\d'
[a-zA-Z0-9_]    // exactly equivalent to '\w' if only English text is considered

The following is a more complex expression: \(?0\d{2}[) -]?\d{8}
This expression matches phone numbers in several formats, such as (010)88886666, 022-22334455 or 02912345678.
A brief analysis: first comes an escaped \( , which may appear 0 or 1 times (?); then the digit 0, followed by 2 digits (\d{2}); then one of ), - or a space, which may appear 0 or 1 times; and finally 8 digits (\d{8}).
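Character classes and the phone number pattern can be tried out as follows. This is a sketch under two assumptions that are not in the text: the helper name looksLikePhone() is hypothetical, and the pattern is anchored with ^ and $ so that the whole string must match.

```php
<?php
// A character class matches exactly one character from its set.
$noVowels = preg_replace('/[aeiou]/', '', 'regular');

// A negated class still consumes a character, so it cannot match at the end of the subject.
$atEnd = preg_match('/a[^b]/', 'a');

// Hypothetical helper: the article's phone pattern, anchored to the whole string.
function looksLikePhone(string $input): bool
{
    return preg_match('/^\(?0\d{2}[) -]?\d{8}$/', $input) === 1;
}

foreach (['(010)88886666', '022-22334455', '02912345678'] as $sample) {
    var_dump(looksLikePhone($sample));
}
```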

7. Branch (|)

The vertical bar character separates alternatives in a pattern. For example, the pattern gilbert|sullivan matches either "gilbert" or "sullivan". Any number of alternatives may appear, and an empty alternative is permitted (it matches the empty string). The matching process tries each alternative in turn, from left to right, and uses the first one that succeeds. If the alternatives are inside a subgroup (defined below), "succeeds" means that the alternative in the subgroup and the rest of the main pattern both match.

Look back at the earlier example \(?0\d{2}[) -]?\d{8}: this expression also matches "incorrect" formats such as 010)12345678 or (022-87654321. Alternation solves this problem, as follows:

\({1}0\d{2}\){1}[- ]?\d{8}|0\d{2}[- ]?\d{8}

This expression matches a phone number with a 3-digit area code, where the area code may or may not be enclosed in parentheses, and the area code and the local number may be separated by a hyphen or a space, or not separated at all.

When using alternation, pay attention to the order of the alternatives, because they are tried from left to right and the first one that succeeds is used.
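A sketch of the alternation-based phone pattern. Two things here are assumptions rather than part of the article: the helper name isPhoneNumber() is hypothetical, and the two alternatives are wrapped in a non-capturing group with ^ and $ anchors so that the whole input must match (unanchored, the pattern would still find a valid number inside a malformed string). The ZIP-code-like patterns at the end are also only an illustration of why the order of alternatives matters.

```php
<?php
// Hypothetical helper: either "(0xx)" or "0xx", an optional "-" or space, then 8 digits.
function isPhoneNumber(string $input): bool
{
    $pattern = '/^(?:\({1}0\d{2}\){1}[- ]?\d{8}|0\d{2}[- ]?\d{8})$/';
    return preg_match($pattern, $input) === 1;
}

var_dump(isPhoneNumber('(010)88886666'));  // parentheses correctly paired
var_dump(isPhoneNumber('010)12345678'));   // stray closing parenthesis
var_dump(isPhoneNumber('(022-87654321'));  // missing closing parenthesis

// Order matters: the first alternative that succeeds wins.
preg_match('/\d{5}|\d{5}-\d{4}/', '12345-6789', $shortFirst);
preg_match('/\d{5}-\d{4}|\d{5}/', '12345-6789', $longFirst);
var_dump($shortFirst[0], $longFirst[0]);
```

With the shorter alternative first, the match stops after five digits; putting the longer alternative first lets it match the full string.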

8. Internal option settings

The same regular expression can give different results under different pattern modifiers, and these modifiers can be changed from inside the pattern. The syntax is: (?modifiers)

For example, (?im) turns on caseless and multiline matching. Options can also be unset by putting a hyphen before the letter. A combined setting such as (?im-sx) sets PCRE_CASELESS and PCRE_MULTILINE while unsetting PCRE_DOTALL and PCRE_EXTENDED. If a letter appears both before and after the hyphen, the option is unset.

Below is a simple example; for more, see the "Internal option setting" and "Pattern modifiers" pages of the PHP manual.

Example: /ab(?i)c/ matches only "abc" and "abC"
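The scope of an internal option can be verified with a short sketch. The ^ and $ anchors are added here (they are not in the article's pattern) so that each test string must match in full.

```php
<?php
// (?i) only affects the part of the pattern that follows it, here the letter c.
$pattern = '/^ab(?i)c$/';

$results = [];
foreach (['abc', 'abC', 'ABC', 'Abc'] as $candidate) {
    $results[$candidate] = preg_match($pattern, $candidate);
}

// "abc" and "abC" match; "ABC" and "Abc" do not, because "ab" stays case-sensitive.
var_dump($results);
```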

9. Subgroups (subpatterns)

Subgroups are delimited by parentheses, and they can be nested.

Example:

String: "the red king"
Regular expression: ((red|white) (king|queen))
Match result: Array("red king", "red king", "red", "king")
Description: element 0 is the match for the whole pattern, and the next three elements are the matches for the three subgroups, with indexes 1, 2 and 3 respectively.

Often we need a subgroup for grouping only, without capturing it separately. If the opening parenthesis of a subgroup is immediately followed by ?:, the subgroup is not captured and is not counted when numbering the subsequent capturing subgroups. For example:

String: "the red king"
Regular expression: ((?:red|white) (king|queen))
Match result: Array("red king", "red king", "king")
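Capturing and non-capturing subgroups can be compared directly with preg_match(), which fills its third argument with the whole match followed by each captured subgroup.

```php
<?php
$subject = 'the red king';

// Three capturing subgroups: the outer group, the colour and the title.
preg_match('/((red|white) (king|queen))/', $subject, $capturing);

// (?:...) groups without capturing, so the colour no longer gets an index of its own.
preg_match('/((?:red|white) (king|queen))/', $subject, $nonCapturing);

var_dump($capturing, $nonCapturing);
```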

As a convenient shorthand, if options need to be set at the start of a non-capturing subgroup, the option letters may be placed between the ? and the :, for example:

(?i:saturday|sunday)
(?:(?i)saturday|sunday)

These two patterns match exactly the same set of strings. Alternatives are tried from left to right, options are not reset until the end of the subgroup, and an option set in one alternative carries over into the alternatives that follow it, so the patterns above match "SUNDAY" as well as "Saturday".
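The equivalence of the two notations can be checked with a sketch like the following. The ^ and $ anchors and the test words are assumptions added for the demonstration.

```php
<?php
$shorthand = '/^(?i:saturday|sunday)$/';
$longForm  = '/^(?:(?i)saturday|sunday)$/';

$outcomes = [];
foreach (['SUNDAY', 'Saturday', 'monday'] as $day) {
    // Record the result of both patterns for each word.
    $outcomes[$day] = [preg_match($shorthand, $day), preg_match($longForm, $day)];
}

// Both patterns accept "SUNDAY" and "Saturday" in any letter case and reject "monday".
var_dump($outcomes);
```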

Finally, here is a regular expression that matches an IP address:

((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)
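A sketch of the IP address pattern in use. The helper name looksLikeIpv4() is hypothetical, and the ^ and $ anchors are an addition: without them the pattern would also match a valid-looking address inside a longer invalid string, such as "56.1.1.1" inside "256.1.1.1".

```php
<?php
// Hypothetical helper: four octets in the range 0-255, separated by dots.
function looksLikeIpv4(string $input): bool
{
    $octet   = '(2[0-4]\d|25[0-5]|[01]?\d\d?)';
    $pattern = '/^(' . $octet . '\.){3}' . $octet . '$/';
    return preg_match($pattern, $input) === 1;
}

var_dump(looksLikeIpv4('192.168.1.1'));  // every octet is in range
var_dump(looksLikeIpv4('256.1.1.1'));    // first octet is above 255
var_dump(looksLikeIpv4('1.2.3'));        // only three octets
```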

Conclusion

The sections above describe the regular expression syntax most commonly used in PHP. Some syntax is not covered in detail, or at all: pattern modifiers, back references, assertions, recursive patterns, and so on. See the PHP manual for these topics.

Tip: in general, for the same task, regular expression functions run less efficiently than string functions. If the task is simple, use string functions. However, for a task that a single regular expression can do, replacing it with several string functions is not the right choice. (From the book "PHP and MySQL Web Development".)

Resources

http://php.net/manual/zh/book.pcre.php
https://msdn.microsoft.com/zh-cn/library/d9eze55x%28v=vs.80%29.aspx
http://deerchao.net/tutorials/regex/regex.htm
http://tool.chinaz.com/regex/
http://www.regexlab.com/zh/regref.htm
