Using Perl-Compatible Regular Expressions in PHP
Preface
PHP is widely used for server-side Web (CGI) development, where it usually produces results from data submitted by users. If the user's input is invalid, problems can follow. For example, someone might enter February 30 as a birthday! How can we check whether a date like this is correct? PHP supports regular expressions, which make this kind of data matching easy.
What is a regular expression?
In short, regular expressions are a powerful tool for pattern matching and replacement. They appear in almost all UNIX/Linux-based tools and languages, such as Perl and PHP, and client-side JavaScript supports them as well. Today, regular expressions are a common concept and tool, widely used by all kinds of technical people.
On a Linux website there is a saying: "If you ask Linux fans what they like most, they may answer regular expressions; if you ask them what they fear most, apart from tedious installation and configuration, they will certainly say regular expressions."
As that saying suggests, regular expressions look complex and intimidating, so many PHP beginners skip this topic and move on. That is a pity: in PHP, regular expressions let you find strings that match a pattern, check whether a string meets certain conditions, and replace matching text with other strings, so they are well worth learning.
Basic regular expression syntax
A regular expression has three parts: the delimiter, the expression, and the modifiers.
The delimiter can be any non-alphanumeric, non-backslash, non-whitespace character (for example "/" or "!"); the most commonly used delimiter is "/". The expression consists of special characters (described below) and ordinary characters; for example, "[a-z0-9_-]+@[a-z0-9_.-]+" matches a simple email string. Modifiers switch particular features or modes on or off. Here is an example of a complete regular expression:
/Hello.+?Hello/is
In this regular expression, "/" is the delimiter, the expression is everything between the two "/" characters, and the string "is" after the second "/" contains the modifiers.
If the expression contains the delimiter, the delimiter must be escaped with "\", as in "/hello.+?\/hello/is". Besides escaping delimiters, "\" also introduces special sequences: special characters written as letters must be preceded by "\". For example, "\d" matches any digit.
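A minimal sketch of the three parts in practice, assuming PHP 7 or later. It uses the example pattern above, shows the effect of the "s" modifier, escapes a delimiter that appears inside the expression, switches to "#" as an alternative delimiter, and uses preg_quote() to escape literal text. The sample strings are invented for illustration.

```php
<?php
// Delimiter "/", expression "Hello.+?Hello", modifiers "i" (ignore case) and "s" ("." also matches newlines).
$withNewline = "hello\nworld, HELLO again";
$matched  = preg_match('/Hello.+?Hello/is', $withNewline);  // 1
// Without "s", "." cannot cross the newline, so the same text does not match.
$withoutS = preg_match('/Hello.+?Hello/i', $withNewline);   // 0

// A delimiter inside the expression must be escaped with "\"...
$escaped   = preg_match('/a\/b/', 'path a/b');              // 1
// ...or you can choose another delimiter, such as "#", and leave "/" alone.
$hashDelim = preg_match('#a/b#', 'path a/b');               // 1

// preg_quote() escapes regex special characters plus the given delimiter.
$literal      = preg_quote('1+1=2?/x', '/');                // 1\+1\=2\?\/x
$literalMatch = preg_match('/' . $literal . '/', 'is 1+1=2?/x true'); // 1
```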
Special characters of a regular expression
Special characters in regular expressions include metacharacters and positioning characters (anchors).
A metacharacter is a special character that describes how the preceding character (the character just before the metacharacter) may occur in the matched text. Each metacharacter is a single character, but different or identical metacharacters can be combined into larger constructs.
Metacharacters:
Braces: braces specify exactly how many times the preceding character may occur. For example, "/pre{1,5}/" matches "pre", "pree", and so on up to "preeeee": one to five "e" characters after "pr". Similarly, "/pre{0,5}/" allows "e" to occur zero to five times. (Write "{0,5}" rather than "{,5}": many PCRE versions treat "{,5}" as literal text rather than a quantifier.)
Plus sign: "+" matches the preceding character one or more times. For example, "/ac+/" matches "act", "account", and "acccc": one or more "c" characters after "a". "+" is equivalent to "{1,}".
Asterisk: "*" matches the preceding character zero or more times. For example, "/ac*/" matches "app", "acp", and "accp": zero or more "c" characters after "a". "*" is equivalent to "{0,}".
Question mark: "?" matches the preceding character zero times or once. For example, "/ac?/" matches "a", "acp", and "acwp": zero or one "c" after "a". "?" also has another important role in regular expressions: controlling "greedy mode".
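The quantifiers above can be tried side by side with a small filter. This is a sketch assuming PHP 7.4 or later (for arrow functions); the helper name matching() and the candidate strings are hypothetical. The anchors "^" and "$", covered later, force the whole string to fit the quantifier.

```php
<?php
// Hypothetical helper: returns the strings in $candidates that $pattern matches.
function matching(string $pattern, array $candidates): array
{
    return array_values(array_filter($candidates, fn ($s) => preg_match($pattern, $s) === 1));
}

$braces   = matching('/^pre{1,5}$/', ['pr', 'pre', 'preeeee', 'preeeeee']); // ["pre", "preeeee"]
$plus     = matching('/ac+/', ['act', 'account', 'acccc', 'apple']);        // ["act", "account", "acccc"]
$star     = matching('/^ac*p/', ['app', 'acp', 'accp', 'bcp']);             // ["app", "acp", "accp"]
$optional = matching('/^ac?p$/', ['ap', 'acp', 'accp']);                   // ["ap", "acp"]
```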
Two more important special characters are the square brackets "[]". They match any single character listed inside them: for example, "/[az]/" matches a single "a" or "z". If you change the expression to "/[a-z]/", it matches any single lowercase letter, such as "a" or "b".
If "^" is the first character inside "[]", the class is negated and matches any character that is not listed: for example, "/[^a-z]/" matches any character that is not a lowercase letter. In addition, regular expressions provide the following predefined classes, which must be written inside a bracket expression (for example, "[[:alpha:]]"):
[:alpha:]: matches any letter
[:alnum:]: matches any letter or digit
[:digit:]: matches any digit
[:space:]: matches any whitespace character
[:upper:]: matches any uppercase letter
[:lower:]: matches any lowercase letter
[:punct:]: matches any punctuation character
[:xdigit:]: matches any hexadecimal digit
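Here is a short sketch of bracket expressions, negation, and POSIX classes, assuming PHP 7 or later. The input strings are invented. Note the double brackets: a POSIX class such as "[:digit:]" is only valid inside "[...]".

```php
<?php
// "[az]" lists two characters; "[^a-z]" matches anything that is not a lowercase letter.
preg_match_all('/[az]/', 'amazing', $m1);      // $m1[0]: "a", "a", "z"
preg_match_all('/[^a-z]/', 'abc-12 X', $m2);   // $m2[0]: "-", "1", "2", " ", "X"

// POSIX classes go inside a bracket expression: "[[:digit:]]", not "[:digit:]".
$digitsOnly = preg_match('/^[[:digit:]]+$/', '2024');      // 1
$isHex      = preg_match('/^[[:xdigit:]]+$/', 'DeadBeef'); // 1
$notHex     = preg_match('/^[[:xdigit:]]+$/', 'C0FFEE!');  // 0, "!" is not a hex digit
```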
In addition, the following letters take on special meanings when preceded by the escape character "\":
\s: matches a single whitespace character.
\S: matches any single character that is not whitespace.
\d: matches a digit from 0 to 9, equivalent to "/[0-9]/".
\w: matches a letter, digit, or underscore, equivalent to "/[a-zA-Z0-9_]/".
\W: matches any character that "\w" does not match, equivalent to "/[^a-zA-Z0-9_]/".
\D: matches any character that is not a decimal digit.
.: matches any character except a newline. With the "s" modifier, "." matches any character, including newlines.
These special characters make it easy to express fairly complex patterns. For example, "/\d0000/" matches any digit followed by "0000", such as "10000" or "90000".
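The escape sequences and "." can be exercised with a few calls, as in the sketch below (PHP 7 or later assumed; the sample strings are made up). Single-quoted PHP strings are used so that "\d", "\s", and "\w" reach PCRE unchanged.

```php
<?php
preg_match_all('/\d0000/', 'cities: 10000, 90000, 5000', $m); // $m[0]: "10000", "90000"
$words    = preg_split('/\s+/', "one  two\tthree");          // ["one", "two", "three"]
$nonWord  = preg_replace('/\W/', '', 'user_name-42!');       // "user_name42"
$nonDigit = preg_replace('/\D/', '', 'tel: 555-0100');       // "5550100"

// "." stops at a newline unless the "s" modifier is used.
$dotNoS   = preg_match('/a.b/', "a\nb");  // 0
$dotWithS = preg_match('/a.b/s', "a\nb"); // 1
```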
Positioning characters (anchors):
Positioning characters are another very important kind of special character in regular expressions. They describe where in the matched text the pattern must occur.
^: the pattern must occur at the beginning of the subject (this is different from "^" inside "[]").
$: the pattern must occur at the end of the subject.
\b: the pattern must occur at a word boundary, that is, at the beginning or end of a word.
"/^he/": matches strings starting with "he", such as "hello" and "height";
"/he$/": matches strings ending with "he", such as "she";
"/\bhe/": "\b" at the start; matches "he" at the beginning of a word, like "^" but for words;
"/he\b/": "\b" at the end; matches "he" at the end of a word, like "$" but for words;
"/^he$/": matches only the string "he".
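A sketch that runs each anchor pattern above against the same word list, assuming PHP 7.4 or later; the helper matchEach() and the word list are hypothetical. Each result is 1 (match) or 0 (no match) per word, in order.

```php
<?php
// Hypothetical helper: returns preg_match()'s 1/0 result for each word.
function matchEach(string $pattern, array $words): array
{
    return array_map(fn ($w) => preg_match($pattern, $w), $words);
}

$words = ['hello', 'she', 'he', 'the hero'];
$start     = matchEach('/^he/', $words);  // [1, 0, 1, 0]
$end       = matchEach('/he$/', $words);  // [0, 1, 1, 0]
$wordStart = matchEach('/\bhe/', $words); // [1, 0, 1, 1]  "hero" starts a word
$wordEnd   = matchEach('/he\b/', $words); // [0, 1, 1, 1]  "the" ends with "he"
$exact     = matchEach('/^he$/', $words); // [0, 0, 1, 0]
```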
Besides matching, regular expressions can use parentheses "()" to capture parts of the match, store them, and refer to them later. For example:
/^([a-zA-Z0-9_-]+)@([a-zA-Z0-9_-]+)(\.[a-zA-Z0-9_-]+)$/
captures the user name and the server address of an email address of the form username@server.com. To refer to a captured string later, write the escape character followed by the group number: "\1" is the first group, ([a-zA-Z0-9_-]+); "\2" is the second, ([a-zA-Z0-9_-]+); and "\3" is the third, (\.[a-zA-Z0-9_-]+). However, "\" is also a special character in PHP strings, so inside a double-quoted PHP string "\1" must be written as "\\1".
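A minimal sketch of both uses of captured groups, assuming PHP 7 or later. It reads the groups back from preg_match()'s matches array and then uses "\1" inside a pattern to find a doubled word. The sample strings are invented.

```php
<?php
$pattern = '/^([a-zA-Z0-9_-]+)@([a-zA-Z0-9_-]+)(\.[a-zA-Z0-9_-]+)$/';

if (preg_match($pattern, 'username@server.com', $m)) {
    // $m[0] is the whole match; $m[1], $m[2], $m[3] are the captured groups.
    $user   = $m[1]; // "username"
    $server = $m[2]; // "server"
    $suffix = $m[3]; // ".com"
}

// Inside a pattern, "\1" refers back to group 1; this finds a repeated word.
// In single quotes '\1' works as written; in double quotes it must be "\\1".
preg_match('/\b(\w+) \1\b/', 'this is is a typo', $dup);                    // $dup[1] is "is"
$sameInDoubleQuotes = preg_match("/\\b(\\w+) \\1\\b/", 'this is is a typo'); // 1
```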
Other special symbols:
"|": the "or" (alternation) symbol. It works like "or" in PHP, but it is a single "|" rather than "||". Each alternative can be a single character or a whole string; for example, "/abcd|dcba/" matches "abcd" or "dcba".
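A short sketch of alternation, assuming PHP 7 or later, with invented strings. It also shows that "|" has the lowest precedence, so anchors need parentheses to apply to every alternative.

```php
<?php
preg_match_all('/abcd|dcba/', 'xxabcdyydcbazz', $m); // $m[0]: "abcd", "dcba"

// "^cat|dog$" means "^cat" OR "dog$"; group the alternatives to anchor both.
$loose  = preg_match('/^cat|dog$/', 'cat food');   // 1, "^cat" matches
$strict = preg_match('/^(cat|dog)$/', 'cat food'); // 0, the whole string must be "cat" or "dog"
```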
Greedy mode
As mentioned under metacharacters, "?" has another important role: controlling "greedy mode". What is greedy mode?
Suppose we want to match text that starts with the letter "a" and ends with the letter "b", but the subject contains many "b" characters after the "a", for example "a bbbbbbbbbbbbbbbbb". Will the regular expression stop at the first "b" or the last "b"? In greedy mode it matches up to the last "b"; in non-greedy (lazy) mode it matches only up to the first "b".
The following expressions are non-greedy (lazy):
/a.+?b/
/a.+b/U
The following expression is greedy, which is the default:
/a.+b/
The "U" modifier used above is described in the next section.
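The difference is easiest to see by printing what each form captures. A sketch assuming PHP 7 or later, with a shortened subject string:

```php
<?php
$subject = 'a bbbb';

preg_match('/a.+b/', $subject, $greedy);    // greedy (default): runs to the last "b"
preg_match('/a.+?b/', $subject, $lazy);     // "?" after "+": stops at the first "b"
preg_match('/a.+b/U', $subject, $ungreedy); // "U" makes "+" lazy by default
preg_match('/a.+?b/U', $subject, $flipped); // under "U", "?" makes it greedy again

// $greedy[0] is "a bbbb", $lazy[0] is "a b", $ungreedy[0] is "a b", $flipped[0] is "a bbbb"
```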
Modifiers
Modifiers change many aspects of how a regular expression behaves, so you can tailor it to your needs. (Note that modifiers are case-sensitive: "e" is not the same as "E".) The modifiers are:
i: with "i", matching is case-insensitive, so "a" and "A" are treated as the same.
m: by default, "^" and "$" match only at the start and end of the whole string. With "m", they also match at the start and end of each line in the string.
s: by default, "." matches any character except a newline. With "s", "." matches any character, including newlines.
x: with this modifier, whitespace in the expression is ignored unless it is escaped, and "#" starts a comment that runs to the end of the line.
e: only meaningful for replacement: the replacement string is evaluated as PHP code. (This modifier was deprecated in PHP 5.5 and removed in PHP 7.0; use preg_replace_callback() instead.)
A: the pattern is anchored, so it must match at the start of the subject. For example, "/a/A" matches "abcd" but not "bacd".
D: "$" matches only at the absolute end of the subject, not before a final newline. This modifier is off by default and is ignored when "m" is set.
U: inverts greediness: quantifiers become non-greedy by default and become greedy only when followed by "?".
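A sketch that exercises the modifiers one at a time, assuming PHP 7.4 or later (for the arrow function). The sample text is invented. The last line shows preg_replace_callback(), the replacement for the removed "e" modifier.

```php
<?php
$text = "first line\nsecond line";

$caseless = preg_match('/FIRST/i', $text);         // 1: "i" ignores case
$noM   = preg_match_all('/^\w+/', $text, $m1);     // 1: "^" only at the start of the string
$withM = preg_match_all('/^\w+/m', $text, $m2);    // 2: "^" at the start of every line
$s = preg_match('/line.second/s', $text);          // 1: "." also matches the newline
$x = preg_match('/ \d{3} - \d{4}  # local phone number
                 /x', '555-0100');                 // 1: whitespace and the "#" comment are ignored
$a = preg_match('/line/A', $text);                 // 0: must match at the very start
$dollar  = preg_match('/line$/', "a line\n");      // 1: "$" also matches before a final newline
$dollarD = preg_match('/line$/D', "a line\n");     // 0: with "D", "$" is the absolute end

// Instead of "e": compute each replacement in a callback.
$doubled = preg_replace_callback('/\d+/', fn ($m) => (string) ($m[0] * 2), 'a1 b20'); // "a2 b40"
```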
PCRE-related regular expression functions
PHP provides several functions for Perl-compatible regular expressions, covering matching, replacement, splitting, and filtering:
1. preg_match:
Function format: int preg_match(string pattern, string subject [, array matches]);
This function matches pattern against the subject string. If matches is given, matches[0] holds the text that matched the whole pattern, matches[1] holds the first string captured with parentheses "()", matches[2] the second, and so on. It returns 1 if the pattern matches and 0 if it does not (or false if an error occurs).
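Returning to the February 30 birthday from the preface, here is a minimal sketch assuming PHP 7 or later. The helper isValidDate() and the YYYY-MM-DD input format are hypothetical choices. preg_match() checks the format and captures the parts, and PHP's checkdate() rejects impossible dates, which a regular expression alone cannot easily do. The "D" modifier stops "$" from accepting a trailing newline.

```php
<?php
// Hypothetical helper: accepts YYYY-MM-DD and then checks the calendar.
function isValidDate(string $input): bool
{
    if (!preg_match('/^(\d{4})-(\d{2})-(\d{2})$/D', $input, $m)) {
        return false; // wrong format
    }
    // $m[1], $m[2], $m[3] are the captured year, month and day.
    return checkdate((int) $m[2], (int) $m[3], (int) $m[1]);
}

var_dump(isValidDate('2024-02-29')); // bool(true)
var_dump(isValidDate('2023-02-30')); // bool(false): right format, impossible date
var_dump(isValidDate('30/02/2023')); // bool(false): wrong format
```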
2. preg_replace:
Function format: mixed preg_replace(mixed pattern, mixed replacement, mixed subject);
This function replaces every substring that matches pattern with replacement. If replacement needs to reuse parts of the match, capture them with "()" in pattern and refer to them in replacement as "\1" (or "$1"), "\2", and so on.
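A sketch of backreferences in the replacement string, assuming PHP 7 or later; the date strings are invented. It also shows the "${1}" form, which separates a group number from a digit that follows it, and array arguments, which apply several pattern/replacement pairs in order.

```php
<?php
$iso = '2024-02-29 and 2023-12-31';

// "\3/\2/\1" reorders the captured year, month and day.
$european = preg_replace('/(\d{4})-(\d{2})-(\d{2})/', '\3/\2/\1', $iso);
// "29/02/2024 and 31/12/2023"

// "${1}0" means "group 1, then a literal 0", not group 10.
$padded = preg_replace('/(\d+)/', '${1}0', 'x5 y7'); // "x50 y70"

// Arrays: collapse runs of whitespace, then strip one leading/trailing space.
$clean = preg_replace(['/\s+/', '/^ | $/'], [' ', ''], "  too   many   spaces ");
// "too many spaces"
```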
3. preg_split:
Function format: array preg_split(string pattern, string subject [, int limit]);
This function works like split(). The difference is that split() accepts only simple (POSIX) regular expressions, whereas preg_split() accepts fully Perl-compatible regular expressions. (split() was removed in PHP 7.0.) The optional third parameter, limit, caps the number of pieces returned.
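A sketch of preg_split() with and without limit, assuming PHP 7 or later; the input strings are invented. PREG_SPLIT_NO_EMPTY is a real flag, passed as the fourth argument together with limit -1 (no limit).

```php
<?php
$csvish = 'red, green;blue  yellow';

$all = preg_split('/[\s,;]+/', $csvish);    // ["red", "green", "blue", "yellow"]
$two = preg_split('/[\s,;]+/', $csvish, 2); // ["red", "green;blue  yellow"]: at most 2 pieces

// PREG_SPLIT_NO_EMPTY drops empty pieces, such as the one before a leading separator.
$noEmpty = preg_split('/,/', ',a,,b', -1, PREG_SPLIT_NO_EMPTY); // ["a", "b"]
```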
4. preg_grep:
Function format: array preg_grep(string pattern, array input);
This function works much like preg_match(), except that preg_grep() tests every element of the input array and returns a new array containing the elements that match. The following example uses preg_match() to check whether an email address is well formed:
<?php
// Returns true if $email looks like a simple address such as name@host.tld.
function emailIsRight($email) {
    // preg_match() requires delimiters ("/"); "\." matches a literal dot.
    return preg_match('/^[_.0-9a-z-]+@([0-9a-z][0-9a-z-]+\.)+[a-z]{2,3}$/', $email) === 1;
}
if (emailIsRight('y10k@963.net')) echo "correct\n";
if (!emailIsRight('y10k@fffff')) echo "incorrect\n";
?>
The program above outputs "correct" followed by "incorrect".
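preg_grep() itself can be sketched like this, assuming PHP 7 or later and an invented input array. Keys of the matching elements are preserved, and the real PREG_GREP_INVERT flag returns the elements that do not match. The loop at the end spells out the same filtering with preg_match().

```php
<?php
$input = ['10', 'abc', '42', '7x'];

$numbers = preg_grep('/^\d+$/', $input);                   // [0 => "10", 2 => "42"]
$others  = preg_grep('/^\d+$/', $input, PREG_GREP_INVERT); // [1 => "abc", 3 => "7x"]

// The same result with preg_match(), one element at a time.
$manual = [];
foreach ($input as $key => $value) {
    if (preg_match('/^\d+$/', $value)) {
        $manual[$key] = $value;
    }
}
```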
Differences between PHP's Perl-compatible regular expressions, Perl, and ereg
Although they are called "Perl-compatible regular expressions", PHP's regular expressions differ from Perl's. For example, in Perl the modifier "g" means "find all matches", but PHP does not support this modifier (use preg_match_all() instead).
They also differ from the ereg family of functions. ereg is another set of regular expression functions provided by PHP, but it is much weaker than preg. (The ereg functions were deprecated in PHP 5.3 and removed in PHP 7.0.)
1. ereg neither needs nor supports delimiters and modifiers, which makes it much less capable than preg.
2. About ".": in most regular expression engines, "." matches any character except a newline, but in ereg "." matches any character, including newlines. If you want "." to match newlines in preg, add the "s" modifier.
3. ereg always uses greedy matching, and this cannot be changed, which makes many replacements and matches awkward.
4. Speed: many people worry that preg trades speed for power. Don't worry: in my test, preg was considerably faster than ereg. Here is the test program:
<?php
// time() has one-second resolution, so each result is a whole number of seconds.
// ereg_replace() exists only before PHP 7.0.
echo "preg_replace used time: ";
$start = time();
for ($i = 1; $i <= 100000; $i++) {
    $str = "ssssssssssssssssssssssssssssss";
    preg_replace("/s/", "", $str);
}
$ended = time() - $start;
echo $ended, "\n";

echo "ereg_replace used time: ";
$start = time();
for ($i = 1; $i <= 100000; $i++) {
    $str = "ssssssssssssssssssssssssssssss";
    ereg_replace("s", "", $str);
}
$ended = time() - $start;
echo $ended, "\n";

echo "str_replace used time: ";
$start = time();
for ($i = 1; $i <= 100000; $i++) {
    $str = "sssssssssssssssssssssssssssssssss";
    str_replace("s", "", $str);
}
$ended = time() - $start;
echo $ended, "\n";
?>
Result (in seconds):
preg_replace used time: 5
ereg_replace used time: 15
str_replace used time: 2
str_replace() is the fastest because it performs a plain string replacement without any regular expression matching.
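Because time() only counts whole seconds, the figures above are coarse. A finer-grained timer can be sketched as follows, assuming PHP 7.4 or later (hrtime() needs 7.3, arrow functions need 7.4). It leaves out ereg_replace() because that function no longer exists in PHP 7, and it first checks that both calls produce the same result. The helper timeIt() is hypothetical. No timings are claimed here, since they depend on the machine and the PHP version.

```php
<?php
// Hypothetical helper: runs $fn $iterations times and returns elapsed milliseconds.
function timeIt(callable $fn, int $iterations = 100000): float
{
    $start = hrtime(true); // nanoseconds from a monotonic clock
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e6;
}

$str = str_repeat('s', 30);

// Both calls must produce the same output before comparing their speed.
$sameResult = preg_replace('/s/', '', $str) === str_replace('s', '', $str);

$pregMs = timeIt(fn () => preg_replace('/s/', '', $str));
$strMs  = timeIt(fn () => str_replace('s', '', $str));
printf("preg_replace: %.1f ms, str_replace: %.1f ms\n", $pregMs, $strMs);
```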
Support for preg in PHP 3.0
preg support is built into PHP 4.0 by default, but it is absent from PHP 3.0. To use the preg functions in PHP 3.0, you must load php3_pcre.dll: add "extension=php3_pcre.dll" to the extensions section of php.ini and restart PHP.
Regular expressions are also commonly used to implement UBB code. Many PHP forums take this approach (such as zForum at zphp.com or vBulletin at vbulletin.com), but the code involved is fairly long.
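The core idea behind such UBB converters fits in a short sketch, assuming PHP 7 or later. The function ubbToHtml() and its two tags ([b] and [url=...]) are hypothetical and far simpler than what real forums ship. The text is HTML-escaped first so users cannot inject their own tags, and lazy ".+?" keeps each tag pair from swallowing the next one.

```php
<?php
// Hypothetical converter for two UBB tags: [b]...[/b] and [url=...]...[/url].
function ubbToHtml(string $text): string
{
    // Escape HTML first so raw tags typed by users are shown, not executed.
    $text = htmlspecialchars($text, ENT_QUOTES);

    $patterns = [
        '/\[b\](.+?)\[\/b\]/is',                           // lazy: stops at the first [/b]
        '/\[url=(https?:\/\/[^\]\s]+)\](.+?)\[\/url\]/is', // only http(s) links
    ];
    $replacements = [
        '<b>$1</b>',
        '<a href="$1">$2</a>',
    ];
    return preg_replace($patterns, $replacements, $text);
}

echo ubbToHtml('[b]bold[/b] and [b]more[/b]'), "\n";
echo ubbToHtml('[url=http://example.com]site[/url]'), "\n";
```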