Explanation of special characters in PHP regular expressions

Source: Internet
Author: User


Character \
Meaning: for characters that are normally treated literally, a preceding backslash indicates that the next character is special and should not be interpreted literally.
For example, /b/ matches the character 'b'. Placing a backslash before b, as in /\b/, turns it into a special character that matches a word boundary.
Or:
For characters that are normally treated as special, a preceding backslash indicates that the next character is not special and should be interpreted literally.
For example, * is a special character that matches the preceding character 0 or more times; /a*/ matches 0 or more a characters. To match a literal *, put a backslash before it: /a\*/ matches 'a*'.
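To see both uses of the backslash in PHP, here is a minimal sketch with the PCRE functions preg_match() and preg_quote(); the sample strings and variable names are illustrative only.

```php
<?php
// Unescaped, * is a quantifier: /a*/ means "zero or more a", so it matches even "xyz"
// (it finds the empty string at position 0).
$quantifier = preg_match('/a*/', 'xyz');      // 1

// Escaped, \* is a literal asterisk.
$literalHit  = preg_match('/a\*/', 'a*');     // 1
$literalMiss = preg_match('/a\*/', 'aaa');    // 0

// The reverse case: b is literal, but \b is a word-boundary assertion.
$boundary = preg_match('/\bno/', 'noonday');  // 1

// preg_quote() backslash-escapes every regex metacharacter (plus the delimiter),
// a safe way to embed arbitrary text in a pattern.
$pattern = '/' . preg_quote('a*b+c?', '/') . '/';
echo $pattern, "\n";                          // /a\*b\+c\?/
echo preg_match($pattern, 'xa*b+c?y'), "\n";  // 1
```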

Character ^
Meaning: matches at the beginning of the input.
For example, /^A/ does not match the 'A' in "an A", but matches the first 'A' in "An A".

Character $
Meaning: matches at the end of the input, analogous to ^.
For example, /t$/ does not match the 't' in "eater", but matches the 't' in "eat".
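A small PHP sketch of the two anchors, reusing the sample strings above (assumes the preg_* functions; variable names are illustrative):

```php
<?php
// ^ anchors the pattern at the start of the subject.
$startMiss = preg_match('/^A/', 'an A');  // 0: the string starts with lowercase "a"
$startHit  = preg_match('/^A/', 'An A');  // 1

// $ anchors the pattern at the end of the subject.
$endMiss = preg_match('/t$/', 'eater');   // 0
$endHit  = preg_match('/t$/', 'eat');     // 1

// PREG_OFFSET_CAPTURE also reports where the match starts (byte offset).
preg_match('/^A/', 'An A', $m, PREG_OFFSET_CAPTURE);
echo $m[0][1], "\n";                      // 0
```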

Character *
Meaning: matches the preceding character 0 or more times.
For example, /bo*/ matches 'boooo' in "A ghost booooed" and 'b' in "A bird warbled", but matches nothing in "A goat grunted".

Character +
Meaning: matches the preceding character 1 or more times. It is equivalent to {1,}.
For example, /a+/ matches the 'a' in "candy" and all of the a's in "caaaaaaandy".

Character ?
Meaning: matches the preceding character 0 or 1 time.
For example, /e?le?/ matches the 'el' in "angel" and the 'le' in "angle".
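The quantifier examples can be checked with a short PHP sketch. The helper firstMatch() is a hypothetical convenience function, not part of PHP; it just returns the first matched substring or null.

```php
<?php
// Hypothetical helper: returns the first matched substring, or null if there is none.
function firstMatch(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
}

var_dump(firstMatch('/bo*/', 'A ghost booooed')); // string(5) "boooo"
var_dump(firstMatch('/bo*/', 'A bird warbled'));  // string(1) "b"
var_dump(firstMatch('/bo*/', 'A goat grunted'));  // NULL
var_dump(firstMatch('/a+/', 'candy'));            // string(1) "a"
var_dump(firstMatch('/a+/', 'caaandy'));          // string(3) "aaa"
var_dump(firstMatch('/e?le?/', 'angel'));         // string(2) "el"
var_dump(firstMatch('/e?le?/', 'angle'));         // string(2) "le"
```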

Character .
Meaning: (decimal point) matches any single character except a newline.
For example, /.n/ matches 'an' and 'on' in "nay, an apple is on the tree", but not 'nay'.
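A minimal PHP sketch of the dot, using preg_match_all() to collect every match; the s modifier shown at the end is PCRE's standard way to let . match newlines as well:

```php
<?php
// /.n/ needs one character (any except a newline) immediately followed by "n".
// preg_match_all() collects every non-overlapping match into $m[0].
preg_match_all('/.n/', 'nay, an apple is on the tree', $m);
echo implode(', ', $m[0]), "\n";            // an, on

// By default . does not match "\n"; the s (dotall) modifier lifts that restriction.
$noDotAll = preg_match('/a.b/', "a\nb");    // 0
$dotAll   = preg_match('/a.b/s', "a\nb");   // 1
```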

Character (x)
Meaning: matches 'x' and remembers (captures) the match.
For example, /(foo)/ matches and remembers 'foo' in "foo bar". The captured substrings can be retrieved from elements [1], ..., [n] of the result array or, in JavaScript, from properties of the RegExp object.
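In PHP the captured groups end up in the array passed to preg_match() by reference. A minimal sketch (the second pattern is an illustrative addition):

```php
<?php
// preg_match() fills $m[0] with the whole match and $m[1] ... $m[n] with the text
// captured by each parenthesized group, numbered from the left.
preg_match('/(foo)/', 'foo bar', $m);
echo $m[1], "\n";                                  // foo

// Illustrative two-group pattern: each group gets its own slot.
preg_match('/(\w+) (\w+)/', 'foo bar', $m2);
echo $m2[0], '|', $m2[1], '|', $m2[2], "\n";      // foo bar|foo|bar
```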

Character x|y
Meaning: matches either 'x' or 'y'.
For example, /green|red/ matches 'green' in "green apple" and 'red' in "red apple".
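A short sketch of alternation in PHP; the cat/dog strings are made up to show that | has the lowest precedence of the operators:

```php
<?php
// | separates alternatives; at each position they are tried from left to right.
preg_match('/green|red/', 'green apple', $a);
preg_match('/green|red/', 'red apple', $b);
echo $a[0], ' ', $b[0], "\n";                          // green red

// | binds more loosely than anything else: /^cat|dog$/ means (^cat) or (dog$).
// Grouping makes both anchors apply to both alternatives.
$loose   = preg_match('/^cat|dog$/', 'catalog dog');   // 1
$grouped = preg_match('/^(cat|dog)$/', 'catalog dog'); // 0
```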

Character {n}
Meaning: n is a positive integer. Matches exactly n occurrences of the preceding character.
For example, /a{2}/ does not match the 'a' in "candy", but matches all of the a's in "caandy" and the first two a's in "caaandy".

Character {n,}
Meaning: n is a positive integer. Matches at least n occurrences of the preceding character.
For example, /a{2,}/ does not match the 'a' in "candy", but matches all of the a's in "caandy" and in "caaaaaaandy".

Character {n,m}
Meaning: n and m are positive integers. Matches at least n and at most m occurrences of the preceding character.
For example, /a{1,3}/ matches nothing in "cndy", the 'a' in "candy", the first two a's in "caandy", and the first three a's in "caaaaaaandy". Note that although "caaaaaaandy" contains more a's, the match is only the first three, "aaa".
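The brace quantifiers can be tried out with preg_match() as below; this sketch reuses the sample words above, with "caaaaandy" standing in for any string with more than three a's.

```php
<?php
$none = preg_match('/a{2}/', 'candy');           // 0: "candy" has only one a
preg_match('/a{2}/', 'caaandy', $exact);         // exactly two a's
preg_match('/a{2,}/', 'caaandy', $atLeast);      // two or more a's
$none2 = preg_match('/a{1,3}/', 'cndy');         // 0: no a at all
preg_match('/a{1,3}/', 'caaaaandy', $capped);    // greedy, but never more than three
echo $exact[0], ' ', $atLeast[0], ' ', $capped[0], "\n"; // aa aaa aaa
```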

Character [xyz]
Meaning: a character set that matches any one of the listed characters. A hyphen can be used to specify a range of characters.
For example, [abcd] is the same as [a-d]. Both match the 'b' in "brisket" and the 'a' in "ache".

Character [^xyz]
Meaning: a negated (complemented) character set, which matches any character except the listed ones. A hyphen can be used to specify a range of characters.
For example, [^abc] is equivalent to [^a-c]. Both first match the 'r' in "brisket" and the 'h' in "chop".
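A minimal PHP sketch of a character set and its negation on the words above; $m1 to $m4 are illustrative names:

```php
<?php
// A set matches exactly one character; [abcd] and [a-d] are interchangeable.
preg_match('/[abcd]/', 'brisket', $m1);   // first character in the set
preg_match('/[a-d]/', 'ache', $m2);
// A negated set matches the first character that is NOT listed.
preg_match('/[^a-c]/', 'brisket', $m3);
preg_match('/[^abc]/', 'chop', $m4);
echo $m1[0], ' ', $m2[0], ' ', $m3[0], ' ', $m4[0], "\n"; // b a r h
```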

Character [\b]
Meaning: matches a backspace (not to be confused with \b).

Character \b
Meaning: matches a word boundary, such as the position next to a space (not to be confused with [\b]).
For example, /\bn\w/ matches 'no' in "noonday", and /\wy\b/ matches 'ly' in "possibly yesterday".

Character \B
Meaning: matches a non-word boundary.
For example, /\w\Bn/ matches 'on' in "noonday", and /y\B\w/ matches 'ye' in "possibly yesterday".
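Word boundaries are easiest to understand by running the examples. This PHP sketch assumes single-quoted pattern strings, so the backslashes reach the regex engine unchanged:

```php
<?php
preg_match('/\bn\w/', 'noonday', $a);            // word boundary before n
preg_match('/\wy\b/', 'possibly yesterday', $b); // word boundary after y
preg_match('/\w\Bn/', 'noonday', $c);            // no boundary before n
preg_match('/y\B\w/', 'possibly yesterday', $d); // no boundary after y
echo $a[0], ' ', $b[0], ' ', $c[0], ' ', $d[0], "\n"; // no ly on ye

// Inside a character class, [\b] is the backspace character (0x08), not a boundary.
$backspace = preg_match('/[\b]/', "a\x08b");     // 1
```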

Character \cX
Meaning: X is a letter; \cX matches the corresponding control character (Control-X) in a string.
For example, /\cM/ matches Control-M (a carriage return) in a string.

Character \d
Meaning: matches a digit; equivalent to [0-9].
For example, /\d/ or /[0-9]/ matches '2' in "B2 is the suite number".

Character \D
Meaning: matches any non-digit; equivalent to [^0-9].
For example, /\D/ or /[^0-9]/ matches 'B' in "B2 is the suite number".
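A brief PHP sketch of \d and \D; the last two lines (an illustrative addition) use preg_match_all() to extract every run of digits:

```php
<?php
$s = 'B2 is the suite number.';
preg_match('/\d/', $s, $digit);      // same as /[0-9]/
preg_match('/\D/', $s, $nonDigit);   // same as /[^0-9]/
echo $digit[0], ' ', $nonDigit[0], "\n";  // 2 B

// Illustrative extra: \d+ with preg_match_all() extracts every run of digits.
preg_match_all('/\d+/', 'Room 12, floor 3', $all);
echo implode(',', $all[0]), "\n";         // 12,3
```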

Character \f
Meaning: matches a form feed.

Character \n
Meaning: matches a linefeed (newline).
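The control-character escapes can be checked in PHP as sketched below. Note the assumption about quoting: patterns are single-quoted so PCRE handles the escapes, while subjects are double-quoted so PHP inserts the real control bytes.

```php
<?php
$ctrlM    = preg_match('/\cM/', "line\r");      // 1: Control-M is carriage return (0x0D)
$formFeed = preg_match('/\f/', "page\fbreak");  // 1
$newline  = preg_match('/\n/', "one\ntwo");     // 1
// Single quotes keep a literal backslash and "n", so there is no newline to find.
$noNewline = preg_match('/\n/', 'one\ntwo');    // 0
```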

Anyone who has used regular expressions in Perl knows that they are very powerful but not easy to learn. For example:

 

^.+@.+\..+$

 

This pattern works, but it is cryptic enough to give some programmers (myself included) a headache, or to make them give up on regular expressions altogether. After reading this tutorial, you should be able to understand what it means.

Basic pattern matching

Everything starts with the basics. Patterns are the most basic element of regular expressions: a pattern is a set of characters that describes a string. A pattern can be very simple, made of ordinary characters, or very complex, using special characters to express ranges of characters, repetition, or context. For example:

 

^once

 

This pattern contains the special character ^, which means it only matches strings that begin with "once". For example, it matches the string "once upon a time" but not "There once was a man from NewYork". Just as ^ anchors a pattern to the start, $ matches strings that end with the given pattern:

 

bucket$

 

This pattern matches "Who kept all of this cash in a bucket" but not "buckets". When ^ and $ are used together, the pattern must match the entire string exactly. For example:

 

^bucket$

 

matches only the string "bucket". If a pattern contains neither ^ nor $, it matches any string that contains the pattern anywhere. For example, the pattern

once

matches the string

There once was a man from NewYork
Who kept all of his cash in a bucket.

because "once" appears inside it.
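The anchoring rules in this section can be verified with a small PHP sketch; "a bucket" is an extra illustrative subject:

```php
<?php
// Each row is [pattern, subject], taken from the examples in this section.
$cases = [
    ['/^once/', 'once upon a time'],
    ['/^once/', 'There once was a man from NewYork'],
    ['/bucket$/', 'Who kept all of this cash in a bucket'],
    ['/bucket$/', 'buckets'],
    ['/^bucket$/', 'bucket'],
    ['/^bucket$/', 'a bucket'],
    ['/once/', 'There once was a man from NewYork'],
];
$results = [];
foreach ($cases as [$pattern, $subject]) {
    $results[] = preg_match($pattern, $subject);
}
echo implode(' ', $results), "\n"; // 1 0 1 0 1 0 1
```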

In this pattern, the letters (o-n-c-e) are literal characters: they stand for the letters themselves, and digits work the same way. Slightly more complex characters, such as punctuation and whitespace (spaces and tabs), are written with escape sequences. All escape sequences begin with a backslash (\). The escape sequence for a tab is \t, so to check whether a string begins with a tab character, we can use this pattern:

 

^\t

 

Similarly, \n represents a newline and \r a carriage return. Other special characters can be matched literally by putting a backslash in front of them: the backslash itself is written \\, the period \., and so on.
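A PHP sketch of these escapes. The subtle part is that PHP string quoting and regex escaping stack on top of each other; the sample strings are illustrative.

```php
<?php
// ^\t: the string must begin with a tab. '\t' in single quotes stays backslash + t,
// and PCRE reads that as the tab escape.
$leadingTab = preg_match('/^\t/', "\tindented");  // 1
$noTab      = preg_match('/^\t/', 'plain');       // 0

// A literal backslash is \\ in the regex; in a single-quoted PHP string each of
// those two backslashes is doubled again, giving four.
$backslash = preg_match('/\\\\/', 'C:\\temp');    // 1 ('C:\\temp' is C:\temp)

// \. matches only a period; an unescaped . would accept any character.
$period    = preg_match('/^\d+\.\d+$/', '3.14');  // 1
$notPeriod = preg_match('/^\d+\.\d+$/', '3x14');  // 0
```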

Character classes

In Internet programs, regular expressions are often used to validate user input. When a user submits a form, literal characters alone are not enough to check whether the phone number, address, email address, or credit card number entered is valid.

So we need a more flexible way to describe the patterns we want: the character class. To create a character class that matches any vowel, put all the vowels inside square brackets:

 

[AaEeIiOoUu]

 

This pattern matches any vowel, but only a single character. A hyphen can be used to specify a range of characters, for example:

 

[a-z]        // Matches any lowercase letter
[A-Z]        // Matches any uppercase letter
[a-zA-Z]     // Matches any letter
[0-9]        // Matches any digit
[0-9.-]      // Matches any digit, period, or minus sign
[\f\r\t\n]   // Matches a form feed, carriage return, tab, or newline
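As a sketch, each class in the list can be tested against one sample character with preg_match(); the sample characters are arbitrary, and the final line shows why [A-z] is not a safe substitute for [a-zA-Z].

```php
<?php
// Each class is anchored with ^...$ so the whole subject must be one matching character.
$classes = [
    '[a-z]'      => 'q',
    '[A-Z]'      => 'Q',
    '[a-zA-Z]'   => 'Q',
    '[0-9]'      => '7',
    '[0-9.-]'    => '-',   // a trailing - inside brackets is literal
    '[\f\r\t\n]' => "\t",
];
foreach ($classes as $class => $char) {
    echo $class, ': ', preg_match('/^' . $class . '$/', $char), "\n"; // prints 1 for each
}

// [A-z] is a common slip: ASCII puts [ \ ] ^ _ and ` between Z and a.
var_dump(preg_match('/^[A-z]$/', '_'));   // int(1)
```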

 

Again, each of these matches only a single character, which is important to remember. To match a string consisting of one lowercase letter followed by one digit, such as "z2", "t6", or "g7", but not "ab2", "r2d3", or "b52", use this pattern:

 

^[a-z][0-9]$

 

Although [a-z] covers a range of 26 letters, here it only allows strings whose first character is a lowercase letter.

^ indicates the start of a string, but it has another meaning: inside square brackets, ^ means "not" or "excluding" and is used to rule characters out. Going back to the previous example, to require instead that the first character not be a digit:

 

^[^0-9][0-9]$

 

This pattern matches "&5", "g7", and "-2", but not "12" or "66". Here are some examples of excluding specific characters:

 

[^a-z]    // All characters except lowercase letters
[^/^]     // All characters except "/" and "^"
[^"']     // All characters except double quotation marks (") and single quotation marks (')
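The two patterns above, plus the quote-excluding class from the list, can be checked with a short PHP sketch (assumes PHP 7.4+ for the fn arrow functions; the last two sample strings are illustrative):

```php
<?php
$lowerDigit    = '/^[a-z][0-9]$/';   // one lowercase letter, then one digit
$nonDigitDigit = '/^[^0-9][0-9]$/';  // one non-digit, then one digit
$r1 = array_map(fn($s) => preg_match($lowerDigit, $s), ['z2', 't6', 'g7', 'ab2', 'r2d3', 'b52']);
$r2 = array_map(fn($s) => preg_match($nonDigitDigit, $s), ['&5', 'g7', '-2', '12', '66']);
echo implode(' ', $r1), "\n";  // 1 1 1 0 0 0
echo implode(' ', $r2), "\n";  // 1 1 1 0 0

// The quote-excluding class: no " or ' anywhere in the subject.
// In a single-quoted PHP string, \' is how the ' character is written.
$noQuotes = '/^[^"\']+$/';
echo preg_match($noQuotes, 'plain text'), preg_match($noQuotes, 'say "hi"'), "\n"; // 10
```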

 


The special character "." (dot, or period) stands for any character except a newline. Therefore the pattern "^.5$" matches any two-character string that ends with the digit 5 and begins with any character other than a newline. The pattern "." on its own matches any string that contains at least one character other than a newline; it fails only on the empty string and on strings made up entirely of newlines.

PHP regular expressions provide several built-in POSIX character classes, listed below:
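A minimal PHP sketch of the dot rules just described; the sample subjects are illustrative:

```php
<?php
$results = [];
foreach (['a5', '55', '#5', '5', 'ab5', "\n5"] as $s) {
    $results[] = preg_match('/^.5$/', $s);
}
echo implode(' ', $results), "\n"; // 1 1 1 0 0 0

// A lone . succeeds on any subject containing at least one non-newline character.
echo preg_match('/./', ''), preg_match('/./', "\n\n"), preg_match('/./', 'x'), "\n"; // 001
```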

 

Character class   Description
[[:alpha:]]       Any letter
[[:digit:]]       Any digit
[[:alnum:]]       Any letter or digit
[[:space:]]       Any whitespace character
[[:upper:]]       Any uppercase letter
[[:lower:]]       Any lowercase letter
[[:punct:]]       Any punctuation character
[[:xdigit:]]      Any hexadecimal digit, equivalent to [0-9a-fA-F]
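A note for modern PHP: the POSIX ereg_* functions were removed in PHP 7.0, so this sketch uses the PCRE preg_* functions, which accept the same class names inside a bracket expression. The sample characters are arbitrary.

```php
<?php
$samples = [
    'alpha' => 'Q', 'digit' => '7', 'alnum' => 'x', 'space' => "\t",
    'upper' => 'Q', 'lower' => 'q', 'punct' => '!', 'xdigit' => 'F',
];
foreach ($samples as $name => $char) {
    // Note the doubled brackets: [:name:] is only valid inside a bracket expression.
    $pattern = '/^[[:' . $name . ':]]$/';
    echo $pattern, ' => ', preg_match($pattern, $char), "\n"; // 1 for every sample
}

// [[:xdigit:]] behaves like [0-9a-fA-F]:
echo preg_match('/^[[:xdigit:]]+$/', 'DeadBeef01'), "\n"; // 1
echo preg_match('/^[[:xdigit:]]+$/', 'xyz'), "\n";        // 0
```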

 

 

Matching repeated occurrences

So far you know how to match a single letter or digit, but more often you need to match a word or a group of digits. A word consists of several letters, and a group of digits consists of several individual digits. Braces ({}) following a character or character class specify how many times the preceding content may occur.

 

Pattern             Description
^[a-zA-Z_]$         A single letter or underscore
^[[:alpha:]]{3}$    Any three-letter word
^a$                 The letter a
^a{4}$              aaaa
^a{2,4}$            aa, aaa, or aaaa
^a{1,3}$            a, aa, or aaa
^a{2,}$             A string of two or more a's
^a{2,}              Strings starting with two or more a's, e.g. aardvark and aaab, but not apple
a{2,}               Strings containing two or more consecutive a's, e.g. baad and aaa, but not Nantucket
\t{2}               Two tabs
.{2}                Any two characters (other than newlines)
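Several rows of the table can be checked mechanically. This sketch runs each pattern with preg_match() and flags any result that differs from the expected value written next to it (the subjects "cat", "cats", and the tab-separated string are illustrative):

```php
<?php
$rows = [
    ['/^[[:alpha:]]{3}$/', 'cat', 1],
    ['/^[[:alpha:]]{3}$/', 'cats', 0],
    ['/^a{2,4}$/', 'aaa', 1],
    ['/^a{2,4}$/', 'aaaaa', 0],
    ['/^a{2,}/', 'aardvark', 1],
    ['/^a{2,}/', 'apple', 0],
    ['/a{2,}/', 'baad', 1],
    ['/a{2,}/', 'Nantucket', 0],
    ['/\t{2}/', "col1\t\tcol2", 1],
];
foreach ($rows as [$pattern, $subject, $expected]) {
    $got = preg_match($pattern, $subject);
    echo $pattern, ' on ', json_encode($subject), ': ', $got,
        $got === $expected ? '' : ' (unexpected)', "\n";
}
```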

 

These examples show three different uses of braces. With a single number, {x} means "the preceding character or character class occurs exactly x times"; with a number followed by a comma, {x,} means "the preceding content occurs x or more times"; with two numbers separated by a comma, {x,y} means "the preceding content occurs at least x times but no more than y times". We can extend these patterns to match longer words or numbers:

 

^[a-zA-Z0-9_]{1,}$                   // All strings of one or more letters, digits, or underscores
^[0-9]{1,}$                          // All non-negative integers
^-{0,1}[0-9]{1,}$                    // All integers
^-{0,1}[0-9]{0,}\.{0,1}[0-9]{0,}$    // All decimals

 

The last example is hard to understand, right? Read it this way: from the start of the string (^), an optional minus sign (-{0,1}), followed by zero or more digits ([0-9]{0,}), an optional decimal point (\.{0,1}), zero or more digits ([0-9]{0,}), and nothing else ($). Next you will see a simpler way to write it.
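A PHP sketch of the brace-form decimal pattern. It also illustrates two caveats: every part is optional, so strings such as "-" or "" pass, and the dot must be escaped or it matches any character.

```php
<?php
// Brace-form decimal pattern; note the escaped dot \. so it means a literal period.
$decimal = '/^-{0,1}[0-9]{0,}\.{0,1}[0-9]{0,}$/';
foreach (['-12.5', '3', '.75', '-', ''] as $s) {
    echo "'$s' => ", preg_match($decimal, $s), "\n"; // 1 for all five
}

// Without the backslash the dot matches any character, so "1x5" slips through.
echo preg_match('/^-{0,1}[0-9]{0,}.{0,1}[0-9]{0,}$/', '1x5'), "\n"; // 1
echo preg_match($decimal, '1x5'), "\n";                               // 0
```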

The special character "?" is equivalent to {0,1}; both mean "zero or one occurrence of the preceding content", that is, the preceding content is optional. So the previous example can be simplified to:

 

^-?[0-9]{0,}\.?[0-9]{0,}$

 

The special character "*" is equivalent to {0,}; both mean "zero or more occurrences of the preceding content". Finally, "+" is equivalent to {1,} and means "one or more occurrences of the preceding content". So the four examples above can be written as:

 

^[a-zA-Z0-9_]+$          // All strings of one or more letters, digits, or underscores
^[0-9]+$                 // All non-negative integers
^-?[0-9]+$               // All integers
^-?[0-9]*\.?[0-9]*$      // All decimals
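To close, a PHP sketch that (1) confirms on a few sample strings that each shorthand pattern behaves like its brace-form original, and (2) runs the pattern from the start of the tutorial. The sample strings are illustrative.

```php
<?php
// Part 1: shorthand and brace forms should agree on every sample.
$pairs = [
    ['/^[a-zA-Z0-9_]{1,}$/', '/^[a-zA-Z0-9_]+$/'],
    ['/^[0-9]{1,}$/', '/^[0-9]+$/'],
    ['/^-{0,1}[0-9]{1,}$/', '/^-?[0-9]+$/'],
    ['/^-{0,1}[0-9]{0,}\.{0,1}[0-9]{0,}$/', '/^-?[0-9]*\.?[0-9]*$/'],
];
$samples = ['', 'abc_1', '42', '-42', '-3.14', '.5', '1.2.3', 'x y'];
$mismatches = 0;
foreach ($pairs as [$long, $short]) {
    foreach ($samples as $s) {
        if (preg_match($long, $s) !== preg_match($short, $s)) {
            $mismatches++;
        }
    }
}
echo "mismatches: $mismatches\n"; // mismatches: 0

// Part 2: ^.+@.+\..+$ reads as "one or more characters, @, one or more characters,
// a literal dot, one or more characters". It is a rough shape check only.
$email = '/^.+@.+\..+$/';
foreach (['user@example.com', 'user@localhost', '@example.com'] as $s) {
    echo $s, ' => ', preg_match($email, $s), "\n"; // 1, 0, 0 respectively
}
```

For real address validation, PHP's filter_var() with the FILTER_VALIDATE_EMAIL filter is usually a better starting point than a hand-written pattern like this one.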
