PHP regular expressions: PHP regular expression syntax

PHP regular expression syntax, reprinted from: http://blog.csdn.net/kkobebryant/article/details/267527

Basic regular expression syntax

First, let's look at two special characters: '^' and '$'. They match the start and the end of the string, respectively.


"^The": matches any string that starts with "The";
"of despair$": matches any string that ends with "of despair";
"^abc$": matches a string that starts and ends with "abc"; in fact, only "abc" itself matches;
"notice": matches any string that contains "notice".

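To try these four patterns in PHP, here is a minimal sketch using preg_match(). It is an illustration rather than code from the original article: preg_match() expects delimiters, so the helper wraps each pattern in "/", and the name anchorMatches() is hypothetical.

```php
<?php
// Hypothetical helper: does $regex (written without delimiters) match $subject?
function anchorMatches(string $regex, string $subject): bool
{
    // preg_match() needs delimiters, so the pattern is wrapped in "/.../".
    // It returns 1 on a match, 0 on no match and false on error.
    return preg_match('/' . $regex . '/', $subject) === 1;
}

var_dump(anchorMatches('^The', 'The end'));               // bool(true)
var_dump(anchorMatches('^The', 'Not The end'));           // bool(false): "The" is not at the start
var_dump(anchorMatches('of despair$', 'sea of despair')); // bool(true)
var_dump(anchorMatches('^abc$', 'abcabc'));               // bool(false): only "abc" itself matches
var_dump(anchorMatches('notice', 'please notice this'));  // bool(true): unanchored, matches anywhere
```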

As the last example shows, if you use neither of these two characters, the pattern (the regular expression) may occur anywhere in the string being tested; it is not anchored to either end.
There are also the characters '*', '+', and '?', which specify how many times a character or sequence may occur: "zero or more", "one or more", and "zero or one", respectively. Here are some examples:


"ab*": matches an "a" followed by zero or more "b"s ("a", "ab", "abbb", etc.);
"ab+": same as above, but with at least one "b" ("ab", "abbb", etc.);
"ab?": matches an "a" followed by zero or one "b";
"a?b+$": matches a string ending with zero or one "a" followed by one or more "b"s.

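A small sketch (not from the original article) that runs three of these quantifiers over a list of made-up sample strings with preg_grep(). The patterns are anchored with ^ and $ here, an assumption made so that the whole string must fit; unanchored, "ab*" would match any string containing an "a".

```php
<?php
// Made-up sample strings to test against each quantifier.
$candidates = ['a', 'ab', 'abbb', 'b', 'abab'];

// Anchored with ^...$ so that the whole string has to fit the pattern.
$patterns = ['/^ab*$/', '/^ab+$/', '/^ab?$/'];

foreach ($patterns as $pattern) {
    // preg_grep() keeps the matching elements; array_values() renumbers them.
    $hits = array_values(preg_grep($pattern, $candidates));
    echo $pattern, ' => ', implode(', ', $hits), PHP_EOL;
}
// Prints:
// /^ab*$/ => a, ab, abbb
// /^ab+$/ => ab, abbb
// /^ab?$/ => a, ab
```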

You can also specify the number of repetitions with braces, for example:


"ab{2}": matches an "a" followed by exactly two "b"s ("abb");
"ab{2,}": an "a" followed by at least two "b"s ("abb", "abbbb", etc.);
"ab{3,5}": an "a" followed by three to five "b"s ("abbb", "abbbb", or "abbbbb").

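The counted forms can be checked the same way. This is again an illustrative sketch with made-up sample strings, anchored so that the number of "b"s is exact.

```php
<?php
// Made-up strings with 1, 2, 3, 5 and 6 "b"s after the "a".
$words = ['ab', 'abb', 'abbb', 'abbbbb', 'abbbbbb'];

$exactlyTwo  = array_values(preg_grep('/^ab{2}$/', $words));   // exactly two "b"s
$atLeastTwo  = array_values(preg_grep('/^ab{2,}$/', $words));  // two or more
$threeToFive = array_values(preg_grep('/^ab{3,5}$/', $words)); // three to five

echo implode(', ', $exactlyTwo), PHP_EOL;   // abb
echo implode(', ', $atLeastTwo), PHP_EOL;   // abb, abbb, abbbbb, abbbbbb
echo implode(', ', $threeToFive), PHP_EOL;  // abbb, abbbbb
```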
Note that you must always specify the first number (i.e., "{0,2}", not "{,2}"). Likewise, note that '*', '+', and '?' are equivalent to the range notations "{0,}", "{1,}", and "{0,1}", respectively.

To apply a quantifier to a whole sequence of characters, put the sequence in parentheses, for example:


"a(bc)*": matches an "a" followed by zero or more copies of "bc";
"a(bc){1,5}": an "a" followed by one to five copies of "bc".

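A quick illustration (with made-up inputs, anchored so the whole string is checked) of why the parentheses matter: without them the quantifier applies only to the last character.

```php
<?php
// Parentheses make the quantifier apply to the whole group "bc".
var_dump(preg_match('/^a(bc)*$/', 'abcbcbc'));    // int(1): "bc" three times
var_dump(preg_match('/^abc*$/', 'abcbcbc'));      // int(0): without parentheses only "c" repeats
var_dump(preg_match('/^a(bc){1,5}$/', 'a'));      // int(0): at least one "bc" is required
var_dump(preg_match('/^a(bc){1,5}$/', 'abcbc'));  // int(1): two copies of "bc"
```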

There is also the character '|', which works as an OR operator:


"hi|hello": matches a string containing "hi" or "hello";
"(b|cd)ef": matches a string containing "bef" or "cdef";
"(a|b)*c": matches a string containing any number (including zero) of "a"s or "b"s followed by a "c".
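The following sketch filters a few made-up strings with each of the three alternation patterns. Note that "(a|b)*c" accepts zero repetitions, so any string containing a "c" matches.

```php
<?php
// Made-up subjects; preg_grep() keeps those that contain a match.
$subjects = ['say hi', 'hello there', 'bef', 'cdef', 'xef', 'aababc', 'ddd'];

$hiOrHello = array_values(preg_grep('/hi|hello/', $subjects));
$befOrCdef = array_values(preg_grep('/(b|cd)ef/', $subjects));
$abThenC   = array_values(preg_grep('/(a|b)*c/', $subjects));

echo implode(' | ', $hiOrHello), PHP_EOL;  // say hi | hello there
echo implode(' | ', $befOrCdef), PHP_EOL;  // bef | cdef
echo implode(' | ', $abThenC), PHP_EOL;    // cdef | aababc
```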


A period ('.') stands for any single character:


"a.[0-9]": matches an "a" followed by any one character and then a digit (any string that contains such a substring matches; this qualification is omitted from here on);
"^.{3}$": matches a string of exactly three characters.

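A minimal check of the two dot patterns with preg_match(), using made-up inputs:

```php
<?php
// "." stands for any single character (except a newline unless "s" is used).
var_dump(preg_match('/a.[0-9]/', 'xa-7y'));  // int(1): "a", any character, then a digit
var_dump(preg_match('/a.[0-9]/', 'a7'));     // int(0): nothing left for the digit after "."
var_dump(preg_match('/^.{3}$/', 'abc'));     // int(1): exactly three characters
var_dump(preg_match('/^.{3}$/', 'abcd'));    // int(0)
```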

A bracket expression matches exactly one character from the set it encloses:


"[ab]": matches "a" or "b" (same as "a|b");
"[a-d]": matches a single character from "a" to "d" (same as "a|b|c|d" and "[abcd]");
"^[a-zA-Z]": matches a string that starts with a letter;
"[0-9]%": matches a string containing a digit followed by a percent sign;
",[a-zA-Z0-9]$": matches a string ending with a comma followed by a letter or digit.


You can also list the characters you do not want: just put '^' first inside the brackets (i.e., "%[^a-zA-Z]%" matches a string containing a single non-letter character between two percent signs).
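A short sketch exercising the bracket expressions above, including a negated one, with preg_match(); the sample strings are made up for illustration.

```php
<?php
// Each bracket expression matches exactly one character from its set.
var_dump(preg_match('/^[a-d]$/', 'c'));            // int(1)
var_dump(preg_match('/^[a-d]$/', 'e'));            // int(0)
var_dump(preg_match('/^[a-zA-Z]/', 'Zebra1'));     // int(1): starts with a letter
var_dump(preg_match('/[0-9]%/', 'up 5% today'));   // int(1): a digit followed by "%"
var_dump(preg_match('/,[a-zA-Z0-9]$/', 'x,y,7'));  // int(1): ends with a comma plus a letter or digit
// A leading "^" negates the set: one non-letter between two percent signs.
var_dump(preg_match('/%[^a-zA-Z]%/', '50%-%'));    // int(1)
var_dump(preg_match('/%[^a-zA-Z]%/', '%a%'));      // int(0)
```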

To match the characters ^.[$()|*+?{\ literally, you must put a backslash '\' in front of them. Note also that in a double-quoted PHP string the backslash must itself be escaped; for example, the regular expression "(\$|¥)[0-9]+" has to be written as ereg("(\\$|¥)[0-9]+", $str) (the same holds in PHP 4 and later).

Remember that bracket expressions are an exception to this rule: in POSIX (ereg) syntax, all special characters inside brackets, including the backslash '\', lose their special meaning (i.e., "[*\+?{}.]" matches any one of these characters). Also, as the regex manual tells us: "If the list contains ']', it is best to put it first in the list (possibly after '^'). If it contains '-', it is best to put it first or last, or make it the second endpoint of a range (i.e., the '-' in the middle of [a-d-0-9] will then be valid)."
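With the preg functions, PHP's preg_quote() escapes special characters for you, which avoids hand-escaping mistakes. One caveat worth knowing: unlike the POSIX rule quoted above, in PCRE a backslash inside brackets still acts as an escape character; the other metacharacters listed do lose their meaning there. The sample text below is made up.

```php
<?php
// Made-up text containing several regex metacharacters.
$literal = '$5.00 (approx.)';

// preg_quote() backslash-escapes the special characters; the second argument
// adds the delimiter to the set of characters to escape.
$pattern = '/' . preg_quote($literal, '/') . '/';
echo $pattern, PHP_EOL;                                    // /\$5\.00 \(approx\.\)/

var_dump(preg_match($pattern, 'Price: $5.00 (approx.)'));  // int(1)
var_dump(preg_match($pattern, 'Price: $5x00 (approx.)'));  // int(0): "\." matches only a period

// Inside brackets *, +, ?, {, } and . are literal, so no escaping is needed here.
var_dump(preg_match('/[*+?{}.]/', 'a+b'));                 // int(1)
```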

For completeness I should also cover collating sequences, character classes, and equivalence classes, but I do not want to go into them here, as this article does not need them. You can find more in the regex man pages.

How to build a pattern that matches amounts of money

Now let's use what we have learned to do something useful: build a pattern that checks whether the input is a number representing an amount of money. We assume there are four ways to write such an amount: "10000.00" and "10,000.00", or, without the decimal part, "10000" and "10,000". Let's start building the pattern:

^[1-9][0-9]*$

This requires the value to start with a digit other than 0, which also means that a single "0" does not pass the test. Here is a fix:

^(0|[1-9][0-9]*)$

This reads "only 0, or a number that does not start with 0". We can also allow a minus sign in front of the number:

^(0|-?[1-9][0-9]*)$

This reads: "0, or a number that does not start with 0 and may be preceded by a minus sign." Now let's be less strict: we will allow numbers to start with 0, and we drop the minus sign, since we don't need it for amounts of money. Next, we add a pattern for the fractional part:

^[0-9]+(\.[0-9]+)?$

This means the string must start with at least one digit. Note that with this pattern "10." does not match; only "10" and "10.2" do. (Do you know why?)

^[0-9]+(\.[0-9]{2})?$

This requires exactly two decimal places. If you think that is too strict, you can change it to:

^[0-9]+(\.[0-9]{1,2})?$

This allows one or two decimal places. Now we add commas (one every three digits) for readability, which can be expressed as follows:

^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{1,2})?$

Do not forget that the plus sign '+' can be replaced by '*' if you want to allow an empty string (why would you?). Also, do not forget the backslash '\': in PHP strings it must itself be escaped, which is a common source of errors. Once the string has been validated, we can remove all the commas with str_replace(",", "", $money), treat the result as a double, and use it in calculations.
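Putting the steps together, here is a sketch of a validator using preg_match() (ereg() no longer exists in PHP 7). The function name parseMoney() is hypothetical, and one change is an assumption of this sketch: the final pattern above only accepts comma-grouped numbers, so "10000" would fail; the sketch joins it with the plain-digit form using "|" so that all four formats listed at the start pass.

```php
<?php
// Hypothetical helper: validate an amount, then convert it to a number.
function parseMoney(string $input): ?float
{
    // Assumption: plain digits OR comma-grouped digits, joined with "|",
    // followed by an optional one- or two-digit fraction.
    $pattern = '/^([0-9]+|[0-9]{1,3}(,[0-9]{3})*)(\.[0-9]{1,2})?$/';
    if (preg_match($pattern, $input) !== 1) {
        return null;  // not a valid amount
    }
    // Remove the thousands separators, then treat the value as a double.
    return (float) str_replace(',', '', $input);
}

var_dump(parseMoney('10,000.00'));  // float(10000)
var_dump(parseMoney('10000'));      // float(10000)
var_dump(parseMoney('10.'));        // NULL
var_dump(parseMoney('1,0000'));     // NULL
```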
Constructing a regular expression to validate email addresses

OK, let's move on to validating an email address. A complete email address has three parts: the POP3 user name (everything to the left of '@'), the '@' itself, and the server name (the rest). The user name may contain uppercase and lowercase letters, digits, periods ('.'), hyphens ('-'), and underscores ('_'). The server name follows the same rule, except that underscores are not allowed.

The user name cannot start or end with a period, and the same goes for the server name. Also, two periods cannot be adjacent: there must be at least one character between them. Now let's write a pattern for the user name:

^[_a-zA-Z0-9-]+$

This does not allow periods yet. We can add them like this:

^[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*$

This means: "at least one allowed character (other than '.'), followed by zero or more strings that each start with a period."

To simplify this, we can use eregi() instead of ereg(). eregi() is case-insensitive, so we don't need to specify both ranges "a-z" and "A-Z"; one is enough:

^[_a-z0-9-]+(\.[_a-z0-9-]+)*$

The server name pattern is the same, but without the underscore:

^[a-z0-9-]+(\.[a-z0-9-]+)*$

Done. Now just join the two parts with '@':

^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$


This is the complete email validation pattern. You only need to call

eregi('^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$', $email)

to check whether the string is a valid email address.
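eregi() was deprecated in PHP 5.3 and removed in PHP 7.0. A minimal sketch of the same check with preg_match() follows; the "i" modifier supplies the case-insensitivity that eregi() provided, and isEmail() is a hypothetical name. (PHP's filter_var($email, FILTER_VALIDATE_EMAIL) is a common built-in alternative.)

```php
<?php
// Hypothetical helper: the pattern built above, wrapped in "/" delimiters,
// with the "i" modifier standing in for eregi()'s case-insensitivity.
function isEmail(string $email): bool
{
    $pattern = '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$/i';
    return preg_match($pattern, $email) === 1;
}

var_dump(isEmail('John.Doe@Example.com'));  // bool(true): "i" makes it case-insensitive
var_dump(isEmail('.john@example.com'));     // bool(false): cannot start with a period
var_dump(isEmail('john..doe@example.com')); // bool(false): no consecutive periods
var_dump(isEmail('john@exa_mple.com'));     // bool(false): no underscore in the server name
```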
Other regular expressions

Extracting strings

ereg() and eregi() can also extract the parts of a string matched by a regular expression (see the manual for details). For example, to extract the file name from a path or URL, you can use the following code:

ereg("([^\\/]*)$", $pathOrUrl, $regs);
echo $regs[1];
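The same extraction with preg_match(), as a sketch; fileNameOf() is a hypothetical name. Here "#" is used as the delimiter so "/" needs no escaping. For local paths, PHP's built-in basename() does a similar job.

```php
<?php
// Hypothetical helper: return the file name at the end of a path or URL.
function fileNameOf(string $pathOrUrl): string
{
    // Capture the trailing run of characters that are neither "/" nor "\".
    // '\\\\' in this single-quoted PHP string becomes "\\" in the regex,
    // i.e. one literal backslash inside the bracket expression.
    preg_match('#([^\\\\/]*)$#', $pathOrUrl, $regs);
    return $regs[1];
}

echo fileNameOf('http://example.com/docs/manual.html'), PHP_EOL;  // manual.html
echo fileNameOf('C:\\temp\\report.pdf'), PHP_EOL;                 // report.pdf
```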

Advanced replacement

ereg_replace() and eregi_replace() are also very useful. For example, to replace every run of whitespace (spaces, newlines, carriage returns, and tabs) with a comma:


ereg_replace("[ \n\r\t]+", ",", trim($str));
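Since ereg_replace() is gone in PHP 7, the equivalent call uses preg_replace(). A sketch with a made-up input string:

```php
<?php
// Made-up input with mixed whitespace around and between the words.
$str = "  apples \t pears\r\nplums  ";

// trim() strips the outer whitespace; each inner run becomes one comma.
// PCRE itself interprets \n, \r and \t inside the single-quoted pattern.
$result = preg_replace('/[ \n\r\t]+/', ',', trim($str));
echo $result, PHP_EOL;  // apples,pears,plums
```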


PHP is widely used for server-side web (CGI) development, where it usually processes data submitted by users. If the user enters invalid data, problems can arise: for example, someone's birthday might be entered as "February 30"! How can we check whether such data is correct? PHP supports regular expressions, which make this kind of data matching easy.

2. What is a regular expression?
In short, regular expressions are a powerful tool for pattern matching and replacement. They can be found in almost all UNIX/Linux-based software tools and in scripting languages such as Perl and PHP, and client-side JavaScript supports them too. Regular expressions have become a common concept and tool, widely used by all kinds of technical people.
A Linux website has this saying: "If you ask Linux fans what they like most, they may answer 'regular expressions'; if you ask what they fear most, then apart from tedious installation and configuration, they will certainly say 'regular expressions'."
As this suggests, regular expressions look complex and intimidating, and most PHP beginners skip this topic and move on. However, PHP regular expressions let you find matching strings, check whether a string meets certain conditions, or replace matches with a given string, so it would be a pity not to learn them.


3. Basic syntax of a regular expression:
A (PCRE) regular expression consists of three parts: the delimiter, the expression, and the modifiers.
The delimiter can be any character that is not alphanumeric, a backslash, or whitespace (for example "/" or "!"); the most commonly used delimiter is "/". The expression consists of special characters (described below) and ordinary characters; for example, "[a-z0-9_-]+@[a-z0-9_.-]+" matches a simple email string. Modifiers turn particular features or modes on or off. Here is an example of a complete regular expression:
/hello.+?hello/is
In this regular expression, "/" is the delimiter, the expression is between the two "/", and the string "is" after the second "/" contains the modifiers.
If the expression contains the delimiter, escape it with a backslash "\", as in "/hello.+?\/hello/is". Besides escaping delimiters, the backslash also introduces special characters: special characters written with letters must be preceded by "\"; for example, "\d" matches any digit.
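As an illustration (the subject string is made up), the example pattern can be written either with an escaped "/" or with a different delimiter such as "#", which avoids the escape. Dropping the "s" modifier shows that "." then refuses to cross the newline.

```php
<?php
// Made-up subject: the text between the two "hello"s spans a newline.
$subject = "Hello,\nworld/hello";

$slashDelimited = preg_match('/hello.+?\/hello/is', $subject);  // "/" inside is escaped
$hashDelimited  = preg_match('#hello.+?/hello#is', $subject);   // "#" delimiter: no escape needed
$withoutS       = preg_match('/hello.+?\/hello/i', $subject);   // "." cannot cross "\n" now

var_dump($slashDelimited, $hashDelimited, $withoutS);  // int(1), int(1), int(0), one per line
```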


4. Special characters in regular expressions:
Special characters in regular expressions include metacharacters and positioning characters.
A metacharacter is a character with a special meaning in a regular expression; it describes how its leading character (the character before the metacharacter) appears in the matched string. Each metacharacter is a single character, but different or identical metacharacters can be combined to form larger constructs.
Metacharacters:
Braces: braces specify exactly how many times the preceding character may occur. For example, "/pre{1,5}/" matches "pre", "pree", up to "preeeee", i.e., one to five "e"s after "pr", while "/pre{0,5}/" allows zero to five "e"s after "pr".
Plus sign: "+" matches the preceding character one or more times. For example, "/ac+/" matches strings such as "act", "account", and "acccc", in which one or more "c"s follow an "a". "+" is equivalent to "{1,}".
Asterisk: "*" matches the preceding character zero or more times. For example, "/ac*/" matches strings such as "app", "acp", and "accp", in which zero or more "c"s follow an "a". "*" is equivalent to "{0,}".
Question mark: "?" matches the preceding character zero times or once. For example, "/ac?/" matches strings such as "a", "acp", and "acwp", in which zero or one "c" follows an "a". Placed after another quantifier, "?" has a second important role: it switches that quantifier from greedy to non-greedy (lazy) matching.

There are two other important special characters, "[]". They match any one of the characters they enclose; for example, "/[az]/" matches the single character "a" or "z". If you change the expression to "/[a-z]/", it matches any single lowercase letter, such as "a" or "b".
If "^" appears first inside "[]", the expression matches any character that is not listed; for example, "/[^a-z]/" matches any character that is not a lowercase letter. In addition, regular expressions provide the following predefined POSIX character classes, which are used inside a bracket expression (e.g., "/[[:digit:]]/"):
[:alpha:]: matches any letter
[:alnum:]: matches any letter or digit
[:digit:]: matches any digit
[:space:]: matches any whitespace character
[:upper:]: matches any uppercase letter
[:lower:]: matches any lowercase letter
[:punct:]: matches any punctuation character
[:xdigit:]: matches any hexadecimal digit

In addition, the following letters take on these meanings when escaped with a backslash "\":
\s: matches a single whitespace character.
\S: matches any single character that is not whitespace.
\d: matches a digit from 0 to 9, equivalent to "/[0-9]/".
\w: matches a letter, digit, or underscore, equivalent to "/[a-zA-Z0-9_]/".
\W: matches any character not matched by \w, equivalent to "/[^a-zA-Z0-9_]/".
\D: matches any character that is not a decimal digit.
.: matches any character except a newline; with the "s" modifier, "." matches any character.

These special characters make complex patterns easy to express. For example, "/\d0000/" matches a digit followed by "0000", such as "10000" or "90000" (or any string containing such a sequence).
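A sketch of these escapes at work on a made-up string; preg_match_all() (called here without its optional third argument) returns the number of matches it found.

```php
<?php
$text = 'Order 66 shipped_on 2024';  // made-up sample text

var_dump(preg_match_all('/\d/', $text));          // int(6): every digit separately
var_dump(preg_match_all('/\d+/', $text));         // int(2): "66" and "2024"
var_dump(preg_match_all('/\s/', $text));          // int(3): the three spaces
var_dump(preg_match('/^\w+$/', 'shipped_on'));    // int(1): letters and "_" only
var_dump(preg_match('/\d0000/', 'price 50000'));  // int(1): a digit followed by "0000"
var_dump(preg_match('/\d0000/', 'price 5000'));   // int(0)
```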

Positioning characters:
Positioning characters (anchors) are another very important kind of special character. They describe where in the matched string a pattern must occur.
^: the pattern must occur at the start of the string (this differs from "^" inside "[]").
$: the pattern must occur at the end of the string.
\b: the pattern must occur at a word boundary, i.e., at the start or end of a word.
"/^he/": matches strings that start with "he", such as "hello" and "height";
"/he$/": matches strings that end with "he", such as "she";
"/\bhe/": "he" at the start of a word; like "^", but applied to each word;
"/he\b/": "he" at the end of a word; like "$", but applied to each word;
"/^he$/": matches only the string "he".

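The anchors and the word boundary can be compared side by side; the inputs below are made up for illustration.

```php
<?php
// ^ and $ anchor to the whole string; \b anchors to a word boundary.
var_dump(preg_match('/^he/', 'hello'));       // int(1)
var_dump(preg_match('/he$/', 'she'));         // int(1)
var_dump(preg_match('/\bhe/', 'ask hello'));  // int(1): a word starts with "he"
var_dump(preg_match('/\bhe/', 'the'));        // int(0): "he" is inside the word
var_dump(preg_match('/he\b/', 'she said'));   // int(1): a word ends with "he"
var_dump(preg_match('/^he$/', 'hehe'));       // int(0): only "he" itself matches
```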
Parentheses:
Besides matching, regular expressions can use parentheses () to capture parts of the match, store them, and refer to them later in the expression. For example:
/^([a-zA-Z0-9_-]+)@([a-zA-Z0-9_-]+)(\.[a-zA-Z0-9_-]+)$/
captures the user name and the server of an email address (of the form username@server.com). To refer to a captured string later, write a backslash followed by the group number: "\1" refers to the first group "([a-zA-Z0-9_-]+)", "\2" to the second "([a-zA-Z0-9_-]+)", and "\3" to the third "(\.[a-zA-Z0-9_-]+)". However, "\" is also a special character in PHP strings and needs to be escaped, so in PHP code it should be written as "\\1".
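Two short illustrations, neither from the original article: "\1" inside the pattern detects a doubled word, and "$1", "$2", "$3" reuse the email groups above in a preg_replace() replacement. The patterns sit in single-quoted PHP strings, where a backslash before a digit is left alone, so "\1" needs no doubling there.

```php
<?php
// "\1" inside the pattern must repeat exactly what group 1 captured: a doubled word.
$doubled = preg_match('/\b(\w+) \1\b/', 'it is the the end', $m);
echo $m[1], PHP_EOL;  // the

// Groups reused in a replacement: "$2$3" is the server, "$1" the user name.
$swapped = preg_replace(
    '/^([a-zA-Z0-9_-]+)@([a-zA-Z0-9_-]+)(\.[a-zA-Z0-9_-]+)$/',
    '$2$3 <- $1',
    'username@server.com'
);
echo $swapped, PHP_EOL;  // server.com <- username
```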
Other special symbols:
"|": the alternation symbol "|" means "or", like the OR operator in PHP, except that it is a single "|" rather than two ("||"). Each alternative can be a single character or a longer string; for example, "/abcd|dcba/" matches "abcd" or "dcba".


5. Greedy mode:
As mentioned under metacharacters, "?" has another important role: it controls "greedy mode". What is greedy mode?
Suppose we want to match a string that starts with the letter "a" and ends with the letter "b", but the string being matched contains many "b"s after the "a", such as "a bbbbbbbbbbbbbbbbb". Will the regular expression match up to the first "b" or up to the last one? In greedy mode (the default), it matches up to the last "b"; in non-greedy (lazy) mode, it stops at the first "b".
The following expressions are non-greedy:
/a.+?b/
/a.+b/U
The following expression is greedy:
/a.+b/
The U modifier used above is described in the next section.
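A sketch comparing the three expressions on a shortened version of the sample string above, using preg_match() with a third argument to capture the matched text:

```php
<?php
$subject = 'a bbbbb';  // a shortened version of the sample string

preg_match('/a.+b/', $subject, $greedy);     // greedy: runs on to the last "b"
preg_match('/a.+?b/', $subject, $lazy);      // lazy: stops at the first possible "b"
preg_match('/a.+b/U', $subject, $ungreedy);  // "U" flips the default, so this is lazy too

echo $greedy[0], PHP_EOL;    // a bbbbb
echo $lazy[0], PHP_EOL;      // a b
echo $ungreedy[0], PHP_EOL;  // a b
```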


6. Modifiers:
Modifiers change many aspects of how a regular expression behaves, so you can adapt it to your needs (note: modifiers are case-sensitive, so "e" is not the same as "E"). The modifiers are as follows:
i: the regular expression becomes case-insensitive, i.e., "a" and "A" are treated as the same.
m: by default, "^" and "$" match only at the start and end of the whole string; with "m", they also match at the start and end of every line in the string.
s: by default, "." matches any character except a newline; with "s", it matches any character, including a newline.
x: whitespace in the expression is ignored unless it is escaped.
e: only meaningful for replacement; the replacement is evaluated as PHP code. (This modifier was deprecated in PHP 5.5 and removed in PHP 7.0; preg_replace_callback() replaces it.)
A: the match must start at the beginning of the subject string. For example, "/a/A" matches "abcd" but not "bacd".
D: "$" matches only at the absolute end of the string, not before a final newline (by default it matches in both places). This modifier is ignored when "m" is set.
U: inverts greediness, making quantifiers non-greedy by default, so "/a.+b/U" behaves like "/a.+?b/".
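A compact sketch of several modifiers, each compared with the same pattern without it (inputs made up for illustration):

```php
<?php
$text = "First line\nsecond LINE";  // made-up two-line subject

var_dump(preg_match('/line/', 'LINE'));                // int(0): case-sensitive by default
var_dump(preg_match('/line/i', 'LINE'));               // int(1): "i" ignores case
var_dump(preg_match_all('/^\w+/', $text));             // int(1): "^" only at the very start
var_dump(preg_match_all('/^\w+/m', $text));            // int(2): "m" makes "^" match at each line
var_dump(preg_match('/First.line/', "First\nline"));   // int(0): "." does not match "\n"
var_dump(preg_match('/First.line/s', "First\nline"));  // int(1): with "s" it does
var_dump(preg_match('/a b c/x', 'abc'));               // int(1): "x" ignores pattern whitespace
var_dump(preg_match('/a/A', 'bac'));                   // int(0): "A" anchors at the start
var_dump(preg_match('/c$/', "abc\n"));                 // int(1): "$" also matches before a final "\n"
var_dump(preg_match('/c$/D', "abc\n"));                // int(0): "D" allows only the very end
```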


7. PCRE-related regular expression functions:
PHP's Perl-compatible regular expression (PCRE) functions provide pattern matching, replacement, counting matches, and more:
1. preg_match:
Function format: int preg_match(string pattern, string subject [, array matches]);
This function matches the expression pattern against the string subject. If matches is given, the text matched by the whole pattern is stored in matches[0], matches[1] holds the string captured by the first pair of parentheses "()", matches[2] the second, and so on. The function returns 1 if the pattern matches and 0 if it does not (or false if an error occurs).
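As an illustration tied to the "February 30" example earlier, this sketch captures the parts of a made-up date string. It also shows that a regular expression checks only the format: PHP's checkdate() is still needed to reject a date that does not exist.

```php
<?php
// Made-up date string in YYYY-MM-DD form.
$result = preg_match('/^(\d{4})-(\d{2})-(\d{2})$/', '2008-02-30', $matches);

var_dump($result);                     // int(1): the format is fine
echo implode(' ', $matches), PHP_EOL;  // 2008-02-30 2008 02 30

// The regex cannot know that February 30 does not exist; checkdate() can.
var_dump(checkdate((int) $matches[2], (int) $matches[3], (int) $matches[1]));  // bool(false)
```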

2. preg_replace:
Function format: mixed preg_replace(mixed pattern, mixed replacement, mixed subject);
This function replaces every part of subject that matches pattern with replacement. If the replacement needs to include part of the matched text, capture it with "()" and refer to it in the replacement as "\1" (or "$1").
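A sketch of reusing captured groups in the replacement, with a made-up input: dates written as YYYY-MM-DD are rewritten as DD/MM/YYYY.

```php
<?php
// Rewrite made-up YYYY-MM-DD dates as DD/MM/YYYY using the captured groups.
// In a single-quoted PHP string '\3' is kept as-is; '$3' would work too.
$dates     = 'from 2008-02-28 to 2008-03-01';
$reordered = preg_replace('/(\d{4})-(\d{2})-(\d{2})/', '\3/\2/\1', $dates);

echo $reordered, PHP_EOL;  // from 28/02/2008 to 01/03/2008
```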

3. preg_split:
Function format: array preg_split(string pattern, string subject [, int limit]);
This function works like split(); the difference is that split() only accepts simple (POSIX) regular expressions, while preg_split() accepts fully Perl-compatible regular expressions. The third parameter, limit, caps the number of substrings returned; the rest of the string is placed in the last substring.
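A sketch of preg_split() with and without limit; the input string is made up. (split() itself was removed in PHP 7.0, so preg_split() or explode() is what remains.)

```php
<?php
$csv = 'red, green;blue ,  yellow';  // made-up input with mixed separators

// Split on a comma or semicolon together with any whitespace around it.
$all = preg_split('/\s*[,;]\s*/', $csv);
// With limit = 2, at most two pieces are returned; the rest stays in the last one.
$two = preg_split('/\s*[,;]\s*/', $csv, 2);

echo implode('|', $all), PHP_EOL;  // red|green|blue|yellow
echo implode('|', $two), PHP_EOL;  // red|green;blue ,  yellow
```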

4. preg_grep:
Function format: array preg_grep(string pattern, array input);
This function works much like preg_match(), except that it tests every element of the input array and returns a new array containing the elements that match.
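A sketch of preg_grep() on a made-up associative array. Unlike a hand-written loop that builds a fresh list, it keeps the original keys, and the PREG_GREP_INVERT flag returns the non-matching elements instead.

```php
<?php
// Made-up form input.
$inputs = ['id' => '42', 'name' => 'Alice', 'age' => '7', 'zip' => '0x1F'];

// Keep only the values made entirely of digits; the original keys survive.
$numeric = preg_grep('/^\d+$/', $inputs);
// PREG_GREP_INVERT returns the elements that do NOT match instead.
$other = preg_grep('/^\d+$/', $inputs, PREG_GREP_INVERT);

echo implode(',', array_keys($numeric)), PHP_EOL;  // id,age
echo implode(',', array_keys($other)), PHP_EOL;    // name,zip
```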

Here is an example that checks whether an email address is correctly formatted:


<?php
// Returns 1 if $email looks like user@host.tld, 0 otherwise.
// preg_match() requires delimiters, so the pattern is wrapped in "/".
function emailIsRight($email) {
    if (preg_match('/^[_\.0-9a-z-]+@([0-9a-z][0-9a-z-]+\.)+[a-z]{2,3}$/', $email)) {
        return 1;
    }
    return 0;
}
if (emailIsRight('y10k@963.net')) echo 'correct<br>';
if (!emailIsRight('y10k@fffff')) echo 'incorrect<br>';
?>

The above program outputs "correct" followed by "incorrect".

8. Differences between PHP's Perl-compatible regular expressions and Perl/ereg regular expressions:
Although they are called "Perl-Compatible Regular Expressions", PHP's regular expressions differ from Perl's. For example, in Perl the modifier "g" means "find all matches", but PHP does not support this modifier (preg_match_all() is used instead).
They also differ from the ereg family of functions. ereg is another set of regular expression functions in PHP, but it is much weaker than preg (and it was removed entirely in PHP 7.0).

1. ereg neither needs nor supports delimiters and modifiers, which is one reason ereg is much weaker than preg.
2. About ".": in regular expressions "." normally matches any character except a newline, but in ereg "." matches any character, including a newline. To make "." match newlines in preg, add the "s" modifier.
3. ereg always uses greedy matching, and this cannot be changed, which causes a lot of trouble in replacement and matching.
4. Speed: many people will wonder whether preg's power comes at the cost of speed. Don't worry: preg is much faster than ereg. I ran a test program:

Time test:

PHP code:

<?php
// time() has one-second resolution, so these timings are coarse.
echo "Preg_replace used time: ";
$start = time();
for ($i = 1; $i <= 100000; $i++) {
    $str = "ssssssssssssssssssssssssssssss";
    preg_replace("/s/", "", $str);
}
$ended = time() - $start;
echo $ended;
// ereg_replace() only exists in PHP versions before 7.0.
echo "<br>Ereg_replace used time: ";
$start = time();
for ($i = 1; $i <= 100000; $i++) {
    $str = "ssssssssssssssssssssssssssssss";
    ereg_replace("s", "", $str);
}
$ended = time() - $start;
echo $ended;
echo "<br>Str_replace used time: ";
$start = time();
for ($i = 1; $i <= 100000; $i++) {
    $str = "sssssssssssssssssssssssssssssssss";
    str_replace("s", "", $str);
}
$ended = time() - $start;
echo $ended;
?>
Result:
Preg_replace used time: 5
Ereg_replace used time: 15
Str_replace used time: 2


str_replace() is the fastest of the three because it does no pattern matching at all.


9. PCRE support in PHP 3.0:
PCRE support is included by default in PHP 4.0, but it is not available in 3.0. To use the preg functions in 3.0, you must load php3_pcre.dll: add "extension=php3_pcre.dll" to the extensions section of php.ini and then restart PHP.
In practice, regular expressions are also often used to implement UBB code. Many PHP forums do this (such as zForum at zphp.com or vBulletin at vbulletin.com), but the code involved is fairly long.
