1 Objective:
PHP is widely used for back-end CGI development of web applications. Such programs usually take some data from the user, and problems arise when that data is wrong, for example when someone gives a birthday of "February 30". How can we check that the input is valid? PHP has built-in support for regular expressions, which makes this kind of data matching very convenient.
2 What are regular expressions:
In short, regular expressions are a powerful tool for pattern matching and substitution. Traces of regular expressions can be found in almost every software tool on Unix/Linux systems, including scripting languages such as Perl and PHP. JavaScript, a client-side scripting language, supports them as well. Regular expressions have become a common concept and tool, widely used by all kinds of technical people.
A Linux website puts it this way: "If you ask a Linux enthusiast what he likes most, he may answer regular expressions; if you ask him what he fears most, then apart from tedious installation and configuration he will certainly say regular expressions."
As the quotation suggests, regular expressions look complex and intimidating, and most PHP beginners skip this topic and move on. But regular expressions in PHP can find strings that fit a pattern, test whether a string meets a condition, and replace matching strings with a specified string. It would be a shame not to learn such a powerful feature.
3 Basic syntax of regular expressions:
A regular expression has three parts: delimiters, the expression, and modifiers.
The delimiter can be any character that is not a letter, a digit, a backslash, or whitespace (for example "/", "!", or "#"); the most commonly used delimiter is "/". The expression consists of special characters (described below) and ordinary strings; for example, "[a-z0-9_-]+@[a-z0-9_.-]+" matches a simple email string. Modifiers turn a feature or mode on or off. The following is an example of a complete regular expression:
/hello.+?hello/is
In the regular expression above, "/" is the delimiter, the text between the two "/" characters is the expression, and the string "is" after the second "/" holds the modifiers.
If the expression itself contains the delimiter, escape it with the escape symbol "\", as in "/hello.+?\/hello/is". Besides escaping the delimiter, the escape symbol also introduces special characters: every special character made up of a letter needs it, such as "\d" for a digit.
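The three parts can be tried out directly with preg_match. The following is only a minimal sketch: the sample subject strings are made up for illustration, and "#" is just one of several characters that could serve as an alternative delimiter.

```php
<?php
// The complete expression from the text: "/" delimiters, "hello.+?hello", modifiers "is".
$pattern = '/hello.+?hello/is';
$subject = "Hello\nworld, HELLO again"; // made-up sample text

// "i" ignores case and "s" lets "." cross the line break, so this matches.
$found = preg_match($pattern, $subject, $m);
echo $found, "\n";

// A "/" inside the expression must be escaped, or a different delimiter chosen.
$escaped    = preg_match('/hello.+?\/hello/is', 'hello there /hello');
$otherDelim = preg_match('#hello.+?/hello#is', 'hello there /hello');
echo $escaped, $otherDelim, "\n";
```

Choosing another delimiter is often easier to read than escaping when the expression contains many "/" characters, as in URLs.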
4 Special characters in regular expressions:
Special characters in regular expressions are divided into metacharacters, position characters, and so on.
A metacharacter is a special kind of character that describes how the character before it (its leading character) may occur in the matched object. A metacharacter is itself a single character, but different or identical metacharacters can be combined into larger ones.
Metacharacters:
Braces: braces specify exactly how many times the preceding character may occur. For example, "/pre{1,5}/" matches "pre", "pree", "preeeee", and any other string with 1 to 5 "e" after "pr". To allow 0 to 5 occurrences, write "/pre{0,5}/"; the shorthand "{,5}" is taken literally by many PCRE versions, so it is safer to spell out the 0.
Plus: the "+" character matches one or more occurrences of the preceding character. For example, "/ac+/" matches "act", "account", "acccc", and any other string with one or more "c" after "a". "+" is equivalent to "{1,}".
Asterisk: the "*" character matches zero or more occurrences of the preceding character. For example, "/ac*/" matches "app", "acp", "accp", and any other string with zero or more "c" after "a". "*" is equivalent to "{0,}".
Question mark: the "?" character matches zero or one occurrence of the preceding character. For example, "/ac?/" matches "a", "acp", "acwp", and any other string with zero or one "c" after "a". "?" also plays a very important role in "greedy mode", described later.
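The four quantifiers can be compared side by side. This is an illustrative sketch with made-up subject strings; the anchors "^" and "$" (explained in the position character part of this section) are added so that each pattern has to cover the whole subject.

```php
<?php
// pattern => subject; every pair below is expected to match.
$cases = [
    '/^pre{1,5}$/' => 'preee', // 1 to 5 "e" after "pr"
    '/^ac+$/'      => 'accc',  // one or more "c"
    '/^ac*$/'      => 'a',     // zero or more "c"
    '/^ac?$/'      => 'ac',    // zero or one "c"
];

$results = [];
foreach ($cases as $pattern => $subject) {
    $results[$pattern] = preg_match($pattern, $subject);
}
print_r($results);

// "+" needs at least one "c", so a bare "a" does not match.
$plusOnA = preg_match('/^ac+$/', 'a');
echo $plusOnA, "\n";
```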
Two other very important special characters are the square brackets "[]". They match any one of the characters listed inside them: "/[az]/" matches a single "a" or "z", and "/[a-z]/" matches any single lowercase letter, such as "a" or "b".
If "^" appears at the start of "[]", the expression matches any character that is not listed: "/[^a-z]/" matches any character that is not a lowercase letter. Regular expressions also provide several predefined classes for use inside "[]":
[:alpha:]: matches any letter
[:alnum:]: matches any letter or digit
[:digit:]: matches any digit
[:space:]: matches any whitespace character
[:upper:]: matches any uppercase letter
[:lower:]: matches any lowercase letter
[:punct:]: matches any punctuation character
[:xdigit:]: matches any hexadecimal digit
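A short sketch of bracket expressions in use follows; the sample strings are invented. Note that in PCRE a predefined class such as [:xdigit:] is written inside an outer pair of brackets.

```php
<?php
// A bracket expression matches one character out of the listed set.
$lower    = preg_match('/^[a-z]+$/', 'hello');          // only lowercase letters
$notLower = preg_match('/[^a-z]/', 'hello');            // looks for a char outside a-z

// Predefined classes sit inside an outer pair of brackets.
$hex      = preg_match('/^[[:xdigit:]]+$/', '1aF9');    // hexadecimal digits only
$alnum    = preg_match('/^[[:alnum:]]+$/', 'abc 123');  // the space is not alphanumeric

echo $lower, $notLower, $hex, $alnum, "\n";
```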
In addition, the following letters take on special meanings when preceded by the escape symbol "\":
\s: matches a single whitespace character.
\S: matches any single character that is not whitespace.
\d: matches a digit from 0 to 9, equivalent to "/[0-9]/".
\w: matches a letter, digit, or underscore, equivalent to "/[a-zA-Z0-9_]/".
\W: matches any character that \w does not match, equivalent to "/[^a-zA-Z0-9_]/".
\D: matches any character that is not a decimal digit.
.: matches any character except a line break; with the modifier "s", "." matches any character at all.
These special characters make it easy to express otherwise cumbersome patterns. For example, "/\d0000/" matches a digit followed by four zeros, such as "10000" or "50000".
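The escaped classes can be checked in the same way. The subjects below are made-up examples, and the patterns are written in single-quoted PHP strings so that the backslashes reach the regular expression engine unchanged.

```php
<?php
$digits    = preg_match('/^\d+$/', '20000');    // digits only
$word      = preg_match('/^\w+$/', 'user_01');  // letters, digits, underscore
$nonWord   = preg_match('/\W/', 'user_01');     // no non-word character present
$space     = preg_match('/\s/', "a\tb");        // a tab counts as whitespace
$fiveDigit = preg_match('/^\d0000$/', '50000'); // one digit followed by four zeros

// "." stops at a line break unless the "s" modifier is set.
$dotPlain  = preg_match('/^a.b$/', "a\nb");
$dotAll    = preg_match('/^a.b$/s', "a\nb");

echo $digits, $word, $nonWord, $space, $fiveDigit, $dotPlain, $dotAll, "\n";
```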
Position characters:
Position characters are another very important kind of character in regular expressions. Their main function is to describe where the pattern appears in the matched object.
^: the pattern appears at the beginning of the matched object (this differs from its meaning inside "[]")
$: the pattern appears at the end of the matched object
\b: the pattern appears at a word boundary, that is, at the beginning or the end of a word
"/^he/": matches a string that begins with "he", such as "hello" or "height";
"/he$/": matches a string that ends with "he", such as "she";
"/\bhe/": matches at the beginning of a word; like "^", but it applies to any word starting with "he", not only to the start of the string;
"/he\b/": matches at the end of a word; like "$", but it applies to any word ending with "he";
"/^he$/": matches only the string "he".
Parentheses:
Besides matching, a regular expression can record the information you need with parentheses "()", storing it so that a later part of the expression can read it. For example:
/^([a-zA-Z0-9_-]+)@([a-zA-Z0-9_-]+)(\.[a-zA-Z0-9_-]+)$/
This records the user name and the server address of an email address of the form username@server.com. To read a recorded string later, write "escape symbol + record number": "\1" is what the first "([a-zA-Z0-9_-]+)" matched, "\2" is what the second "([a-zA-Z0-9_-]+)" matched, and "\3" is what the third group "(\.[a-zA-Z0-9_-]+)" matched. In a PHP string, however, "\" is itself a special character that needs escaping, so "\1" should be written as "\\1" in PHP code.
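As a rough illustration of recording with parentheses, the sketch below applies an email pattern of this shape to the sample address username@server.com, reads the recorded parts, and reuses them in a replacement. The replacement wording and the "hello" subject are invented for the example.

```php
<?php
$pattern = '/^([a-zA-Z0-9_-]+)@([a-zA-Z0-9_-]+)(\.[a-zA-Z0-9_-]+)$/';

// $m[1], $m[2], $m[3] hold what the three pairs of parentheses recorded.
preg_match($pattern, 'username@server.com', $m);
echo $m[1], ' | ', $m[2], ' | ', $m[3], "\n";

// In the replacement, "\\1" in PHP source reaches the engine as "\1".
$swapped = preg_replace($pattern, '\\2\\3 has user \\1', 'username@server.com');
echo $swapped, "\n";

// A recorded group can also be read back inside the expression itself:
// (\w)\1 finds a word character that is immediately repeated.
$doubled = preg_match('/(\w)\1/', 'hello');
echo $doubled, "\n";
```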
Other special symbols:
"|": the or symbol "|" works like "or" in PHP, but it is a single "|", not the double "||" of PHP. It means the match can be one string or another; for example, "/abcd|dcba/" matches "abcd" or "dcba".
5 Greedy mode:
As mentioned above, the metacharacter "?" also plays an important role in "greedy mode". So what is greedy mode?
Suppose we want to match a string that begins with the letter "a" and ends with the letter "b", but the subject has many "b" characters after the "a", such as "a bbbbbbbbbbbbbbbbb". Does the regular expression match up to the first "b" or the last "b"? In greedy mode it matches up to the last "b"; otherwise it matches only up to the first "b".
The following expression uses greedy mode, which is the default:
/a.+b/
The following expressions do not use greedy mode:
/a.+?b/
/a.+b/U
The last expression uses the modifier U, which is described in the next section.
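The difference is easiest to see by printing what each variant actually captures. This is a small sketch; the subject is built with str_repeat only to avoid miscounting the "b" characters.

```php
<?php
$subject = 'a ' . str_repeat('b', 17); // "a bbbbbbbbbbbbbbbbb"

preg_match('/a.+b/', $subject, $greedy);    // default: runs to the last "b"
preg_match('/a.+?b/', $subject, $lazy);     // "?" after "+": stops at the first "b"
preg_match('/a.+b/U', $subject, $ungreedy); // modifier U: same effect as the "?"

echo strlen($greedy[0]), "\n"; // length of the greedy match
echo $lazy[0], "\n";
echo $ungreedy[0], "\n";
```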
6 Modifiers:
Modifiers change many features of a regular expression so that it better fits your needs (note: modifiers are case-sensitive, so "e" is not the same as "E"). The modifiers are as follows:
i: ignores case, so "a" and "A" are treated as the same.
m: by default, "^" and "$" refer to the start and end of the whole string. With "m", they refer to each line of the string: "^" matches at the beginning of every line and "$" at the end of every line.
s: by default "." matches any character except a line break. With "s", "." matches any character, including a line break.
x: whitespace in the expression is ignored unless it is escaped.
e: only meaningful for replacement; the replacement string is evaluated as PHP code. This modifier was deprecated in PHP 5.5 and removed in PHP 7, where preg_replace_callback takes its place.
A: the expression must match at the very beginning of the subject string. For example, "/a/A" matches "abcd".
D: with this modifier, "$" matches only at the absolute end of the string; without it, "$" also matches just before a final line break. It is ignored when "m" is set.
U: inverts greedy mode, much like the question mark does for a single quantifier.
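The effect of several modifiers can be seen by running the same expression with and without them. The sketch below uses invented sample text; the last part shows preg_replace_callback as one possible substitute for the "e" modifier on current PHP versions.

```php
<?php
$text = "first line\nSecond line"; // made-up two-line subject

$withI    = preg_match('/second/i', $text);        // case ignored
$withoutI = preg_match('/second/', $text);
$withM    = preg_match('/^Second/m', $text);       // "^" also matches after the line break
$withoutM = preg_match('/^Second/', $text);
$withS    = preg_match('/first.+Second/s', $text); // "." crosses the line break
$withX    = preg_match('/ \d+ \s* px /x', '12px'); // spaces in the pattern are ignored
$withA    = preg_match('/line/A', $text);          // "line" is not at the very start
$plainEnd = preg_match('/line$/', "line\n");       // "$" accepts a final line break
$withD    = preg_match('/line$/D', "line\n");      // with D it does not

// Instead of "e": compute the replacement in a callback.
$capitalized = preg_replace_callback(
    '/\b[a-z]/',
    function ($match) { return strtoupper($match[0]); },
    'hello world'
);

echo $withI, $withoutI, $withM, $withoutM, $withS, $withX, $withA, $plainEnd, $withD, "\n";
echo $capitalized, "\n";
```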
7 PCRE-related regular expression functions:
PHP's Perl-compatible regular expressions provide several functions for pattern matching, substitution, splitting, and filtering:
1, preg_match:
Function format: int preg_match(string pattern, string subject, array [matches]);
This function searches subject for the expression pattern. If [matches] is given, the whole matched string is stored in matches[0], matches[1] holds the first string recorded with parentheses "()", matches[2] holds the second, and so on. preg_match returns 1 if the pattern matches in subject, and 0 otherwise.
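As one possible use of the matches array, the sketch below returns to the "February 30" problem from the introduction. The helper name isValidDate and the "YYYY-MM-DD" input format are assumptions made for this example; checkdate is PHP's built-in calendar check.

```php
<?php
// Hypothetical helper: checks the shape of a "YYYY-MM-DD" string with
// preg_match, then asks checkdate() whether that calendar day exists.
function isValidDate(string $date): bool
{
    if (preg_match('/^(\d{4})-(\d{2})-(\d{2})$/', $date, $matches) !== 1) {
        return false; // not in the expected shape
    }
    // $matches[0] is the whole match; [1], [2], [3] are year, month, day.
    return checkdate((int) $matches[2], (int) $matches[3], (int) $matches[1]);
}

var_dump(isValidDate('2024-02-29')); // 2024 is a leap year
var_dump(isValidDate('2024-02-30'));
var_dump(isValidDate('30 February'));
```

The regular expression only checks the format; deciding whether the day really exists is left to checkdate, which keeps the pattern short.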
2, preg_replace:
Function format: mixed preg_replace(mixed pattern, mixed replacement, mixed subject);
This function replaces every part of subject that matches the expression pattern with replacement. To reuse part of the match in the replacement, record it with "()" in the pattern and read it in the replacement with "\1" (written "\\1" in a PHP string).
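A brief sketch of preg_replace with recorded groups follows. The date strings are invented; the array form at the end works because pattern and replacement are declared as mixed.

```php
<?php
// Reorder recorded groups: year-month-day becomes day/month/year.
$reordered = preg_replace(
    '/(\d{4})-(\d{2})-(\d{2})/',
    '\\3/\\2/\\1',
    'Due 2024-03-15 and 2024-04-01'
);
echo $reordered, "\n";

// Arrays of patterns and replacements are applied in order.
$mapped = preg_replace(['/a/', '/b/'], ['1', '2'], 'aabb');
echo $mapped, "\n";
```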
3, preg_split:
Function format: array preg_split(string pattern, string subject, int [limit]);
This function works like the function split. The difference is that split accepts only simple regular expressions for splitting a string, while preg_split accepts full Perl-compatible regular expressions. The third parameter, limit, sets the maximum number of pieces to return.
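The following sketch splits a made-up list on any run of commas and whitespace, first without and then with a limit.

```php
<?php
$list = 'apple, banana  cherry,date'; // invented sample input

// Split wherever one or more commas or whitespace characters occur.
$parts = preg_split('/[\s,]+/', $list);
print_r($parts);

// With limit 2, the rest of the string stays in the last piece.
$limited = preg_split('/[\s,]+/', $list, 2);
print_r($limited);
```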
4, preg_grep:
Function format: array preg_grep(string pattern, array input);
This function works much like preg_match, but preg_grep tests every element of the array input and returns a new array containing the elements that match.
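A small sketch of preg_grep on an invented array is shown below. The optional third argument PREG_GREP_INVERT, not mentioned in the function format above, returns the elements that do not match.

```php
<?php
$input = ['apple', '42', 'banana7', '2024']; // made-up sample data

// Keep only the elements made up entirely of digits; keys are preserved.
$numeric = preg_grep('/^\d+$/', $input);
print_r($numeric);

// The inverse selection.
$nonNumeric = preg_grep('/^\d+$/', $input, PREG_GREP_INVERT);
print_r($nonNumeric);
```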
Here is an example that checks whether an email address is in the correct format:
<?php
// Returns 1 if $email looks like a valid address, 0 otherwise.
function emailIsRight($email) {
    if (preg_match('/^[_.0-9a-z-]+@([0-9a-z][0-9a-z-]+\.)+[a-z]{2,3}$/', $email)) {
        return 1;
    }
    return 0;
}
if (emailIsRight('y10k@963.net')) echo 'correct <br/>';
if (!emailIsRight('y10k@fffff')) echo 'Incorrect <br/>';
?>
The program above outputs "correct <br/>Incorrect <br/>".
8 Differences between PHP's Perl-compatible regular expressions and Perl/ereg regular expressions:
Although they are called "Perl-compatible regular expressions", PHP's regular expressions still differ from Perl's. For example, the modifier "g" means "match all" in Perl, but PHP does not support this modifier.
They also differ from the ereg family of functions. ereg is another set of regular expression functions provided by PHP, but it is much weaker than preg (the ereg functions were deprecated in PHP 5.3 and removed in PHP 7).
1, ereg neither needs nor allows delimiters and modifiers, so the ereg functions are far less capable than preg.
2, About ".": in regular expressions the dot generally matches every character except a line break, but in ereg "." matches any character, including line breaks. If you want "." in preg to include line breaks, add "s" to the modifiers.
3, ereg uses greedy mode by default and this cannot be changed, which causes trouble for many substitutions and matches.
4, Speed: this may be what many people care about. Is preg's power paid for with speed? Don't worry, preg is much faster than ereg. The author ran a test program:
Time test:
PHP code:
<?php
// Run 100,000 replacements with each function; time() has one-second resolution.
echo "preg_replace used time:";
$start = time();
for ($i = 1; $i <= 100000; $i++) {
    $str = "ssssssssssssssssssssssssssss";
    preg_replace("/s/", "", $str);
}
$ended = time() - $start;
echo $ended;
echo "\nereg_replace used time:";
$start = time();
for ($i = 1; $i <= 100000; $i++) {
    $str = "ssssssssssssssssssssssssssss";
    ereg_replace("s", "", $str); // available only before PHP 7
}
$ended = time() - $start;
echo $ended;
echo "\nstr_replace used time:";
$start = time();
for ($i = 1; $i <= 100000; $i++) {
    $str = "sssssssssssssssssssssssssssss";
    str_replace("s", "", $str);
}
$ended = time() - $start;
echo $ended;
?>
Results:
preg_replace used time:5
ereg_replace used time:15
str_replace used time:2
str_replace is the fastest because it does no pattern matching, and preg_replace is much faster than ereg_replace.
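On current PHP versions the comparison can be repeated with finer resolution. The sketch below is a variation on the author's test, not the original program: it uses microtime(true) instead of time(), leaves out ereg_replace because that function no longer exists in PHP 7 and later, and introduces a hypothetical helper timeIt. Absolute timings depend on the machine, so none are shown.

```php
<?php
// Hypothetical helper: runs $fn $n times, returns [elapsed seconds, last result].
function timeIt(callable $fn, int $n): array
{
    $start = microtime(true);
    $result = null;
    for ($i = 0; $i < $n; $i++) {
        $result = $fn();
    }
    return [microtime(true) - $start, $result];
}

$str = str_repeat('s', 28);

list($pregTime, $pregOut) = timeIt(function () use ($str) {
    return preg_replace('/s/', '', $str);
}, 100000);

list($strTime, $strOut) = timeIt(function () use ($str) {
    return str_replace('s', '', $str);
}, 100000);

printf("preg_replace used time: %.4f s\n", $pregTime);
printf("str_replace used time: %.4f s\n", $strTime);
```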
9 PHP 3.0 support for preg:
preg support is included by default in PHP 4.0, but not in 3.0. To use the preg functions in 3.0, you must load the php3_pcre.dll file: add "extension = php3_pcre.dll" to the extension section of php.ini and then restart PHP.
In fact, regular expressions are often used to implement UBB code, and many PHP forums use this approach (such as zForum zphp.com or vB vbullent.com), but the specific code is rather long.