Objective
PHP is widely used for server-side (CGI) web development, where a script usually takes user data and produces some result from it. If the data the user entered is invalid, problems follow: someone's birthday might be given as "February 30"! How can we check that such input is correct? PHP includes support for regular expressions, which let us match data against a pattern with little effort.
What is a regular expression
In short, regular expressions are a powerful tool for pattern matching and substitution. You can find traces of regular expressions in almost all Unix/Linux-based software tools and in scripting languages such as Perl or PHP. JavaScript, a client-side scripting language, also supports regular expressions. Regular expressions have become a common concept and tool, widely used by all kinds of technical people.
A Linux website has this saying: "If you ask a Linux enthusiast what he likes most, he may answer regular expressions; if you ask him what he fears most, then apart from tedious installation and configuration, he will certainly say regular expressions."
As the saying suggests, regular expressions look complex and intimidating, and most PHP beginners skip this topic and move on. But regular expressions in PHP can find strings that match a pattern, judge whether a string meets given conditions, and replace matching strings with a specified string. It would be a shame not to learn such a powerful feature.
Basic syntax for regular expressions
A regular expression is divided into three parts: delimiters, the expression, and modifiers.
The delimiter can be any character that is not alphanumeric, whitespace, or a backslash (such as "/" or "!"); the most commonly used delimiter is "/". The expression consists of special characters (see below) and ordinary strings; for example, "[a-z0-9_-]+@[a-z0-9_.-]+" can match a simple email string. Modifiers turn a feature or mode on or off. The following is an example of a complete regular expression:
/hello.+?hello/is
In the regular expression above, "/" is the delimiter, the text between the two "/" characters is the expression, and the string "is" after the second "/" is the set of modifiers.
If the expression itself contains the delimiter, escape it with the escape symbol "\", as in "/hello.+?\/hello/is". The escape symbol is also used to write special characters: every special character made up of a letter needs a leading "\", for example "\d" represents a digit.
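As a minimal sketch of the three parts in use, the snippet below runs the complete expression above through preg_match. The subject strings and the alternative "#" delimiter are assumptions chosen only for illustration.

```php
<?php
// Pattern from the text: delimiter "/", expression "hello.+?hello", modifiers "is".
$pattern = '/hello.+?hello/is';

// "i" ignores case and "s" lets "." cross the line break in this sample subject.
$subject = "HELLO world\nhello again";
$found = preg_match($pattern, $subject);

// A literal "/" inside the expression has to be escaped as "\/" ...
$escaped = preg_match('/hello.+?\/hello/is', 'hello there/hello');

// ... or a different delimiter such as "#" can be chosen instead.
$altDelimiter = preg_match('#hello.+?/hello#is', 'hello there/hello');

var_dump($found, $escaped, $altDelimiter);
```

Choosing a delimiter that does not occur in the expression usually reads better than escaping every "/".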
Special characters for regular expressions
Special characters in regular expressions are divided into metacharacters, positional characters, and so on.
A metacharacter is a special character in a regular expression that describes how its leading character (the character before the metacharacter) may appear in the matched object. A metacharacter is itself a single character, but different or identical metacharacters can be combined into larger ones.
Metacharacters
Braces: braces specify precisely how many times the preceding character may occur. For example, "/pre{1,5}/" matches "pre", "pree", "preeeee", and so on: strings in which 1 to 5 "e" characters follow "pr". Likewise, "/pre{0,5}/" means that "e" occurs between 0 and 5 times after "pr". Some PCRE versions treat the shorthand "{,5}" as literal text, so the explicit "{0,5}" is the safer spelling.
Plus: the "+" character matches one or more occurrences of the preceding character. For example, "/ac+/" matches "act", "account", "acccc", and so on: strings in which one or more "c" characters follow "a". "+" is equivalent to "{1,}".
Asterisk: the "*" character matches zero or more occurrences of the preceding character. For example, "/ac*/" matches "app", "acp", "accp", and so on: strings in which zero or more "c" characters follow "a". "*" is equivalent to "{0,}".
Question mark: the "?" character matches zero or one occurrence of the preceding character. For example, "/ac?/" matches "a", "acp", "acwp", and so on: strings in which zero or one "c" follows "a". The "?" also plays a very important role in "greedy mode", described later.
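The quantifiers can be compared side by side with a small table-driven check. This is only a sketch: the anchors "^" and "$" are added here as an assumption, so that the whole subject has to fit the quantifier rather than just a part of it.

```php
<?php
// Map each pattern to a few sample subjects and record which ones match.
$cases = [
    '/^pre{1,5}$/' => ['pr', 'pre', 'preeeee', 'preeeeee'],
    '/^ac+$/'      => ['a', 'ac', 'accc'],
    '/^ac*$/'      => ['a', 'ac', 'accc'],
    '/^ac?$/'      => ['a', 'ac', 'acc'],
];

$results = [];
foreach ($cases as $pattern => $subjects) {
    foreach ($subjects as $subject) {
        // preg_match returns 1 on a match, so compare against 1 to get a bool.
        $results[$pattern][$subject] = preg_match($pattern, $subject) === 1;
    }
}

var_dump($results);
```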
Two other very important special characters are the square brackets "[]". They match any one of the characters listed inside them: "/[az]/" matches a single "a" or "z", and if you change the expression to "/[a-z]/" it matches any single lowercase letter, such as "a", "b", and so on.
If "^" appears at the start of "[]", the expression matches any character that is not listed inside the brackets; for example, "/[^a-z]/" matches any character except a lowercase letter. Regular expressions also provide several predefined classes for use inside "[]":
[:alpha:]: matches any letter
[:alnum:]: matches any letter or digit
[:digit:]: matches any digit
[:space:]: matches whitespace
[:upper:]: matches any uppercase letter
[:lower:]: matches any lowercase letter
[:punct:]: matches any punctuation character
[:xdigit:]: matches any hexadecimal digit
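The bracket forms above can be tried out with a short sketch. Note that a predefined class is itself written inside a bracket expression, for example "[[:alpha:]]". The helper name consistsOf is hypothetical and exists only for this illustration.

```php
<?php
// Hypothetical helper: true when every character of $text belongs to $class,
// where $class is the inside of a bracket expression.
function consistsOf(string $class, string $text): bool
{
    return preg_match('/^[' . $class . ']+$/', $text) === 1;
}

$lower    = consistsOf('a-z', 'hello');        // only lowercase letters
$notLower = consistsOf('^a-z', 'HELLO 123');   // no lowercase letters at all
$alpha    = consistsOf('[:alpha:]', 'Hello');  // becomes /^[[:alpha:]]+$/
$hex      = consistsOf('[:xdigit:]', '1aF9');  // hexadecimal digits
$digits   = consistsOf('[:digit:]', '12a');    // "a" is not a digit

var_dump($lower, $notLower, $alpha, $hex, $digits);
```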
In addition, the following characters take on these meanings when preceded by the escape symbol "\":
\s: matches a single whitespace character.
\S: matches any single character except whitespace.
\d: matches a digit from 0 to 9, equivalent to "/[0-9]/".
\w: matches a letter, digit, or underscore, equivalent to "/[a-zA-Z0-9_]/".
\W: matches any character that \w does not match, equivalent to "/[^a-zA-Z0-9_]/".
\D: matches any character that is not a decimal digit.
.: matches any character except a line break; with the "s" modifier, "." matches any character at all.
These special characters make it convenient to express otherwise cumbersome patterns. For example, "/\d0000/" matches a digit followed by four zeros, that is, integer strings in the tens of thousands such as "10000" or "20000".
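A brief sketch of these escapes in action follows. The sample text is made up for illustration, and preg_match_all is used so that every occurrence is collected rather than only the first.

```php
<?php
// Sample text (an assumption for this sketch); "\t" is a tab character.
$text = "Order 66: 3 items,\tready_to_ship!";

preg_match_all('/\d/', $text, $digits);   // single digits
preg_match_all('/\w+/', $text, $words);   // runs of letters, digits, "_"
preg_match_all('/\s/', $text, $spaces);   // whitespace, including the tab

// Removing every \w character leaves exactly the \W characters.
$nonWord = preg_replace('/\w/', '', $text);

// The pattern from the text: one digit followed by four zeros.
$tenThousands = preg_match('/\d0000/', '20000');

var_dump($digits[0], $words[0], count($spaces[0]), $nonWord, $tenThousands);
```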
Position characters:
A positional character is another very important kind of character in a regular expression. Its main function is to describe where the pattern must appear in the matched object.
^: the pattern must appear at the beginning of the matched object (its meaning inside "[]" is different)
$: the pattern must appear at the end of the matched object
\b: the pattern must appear at a word boundary, either the beginning or the end of a word
"/^he/": matches a string that starts with "he", such as "hello", "height", and so on;
"/he$/": matches a string that ends with "he", such as "she";
"/\bhe/": \b at the start works like ^ but applies to words, so this matches a word beginning with "he";
"/he\b/": \b at the end works like $ but applies to words, so this matches a word ending with "he";
"/^he$/": matches only the string "he".
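The positional characters can be checked against a few sample subjects, as in the sketch below. The subjects are assumptions chosen for illustration; each row states the pattern, the subject, and whether a match is expected.

```php
<?php
// Each row: pattern, subject, expected result.
$checks = [
    ['/^he/',  'hello',     true],   // starts with "he"
    ['/^he/',  'she',       false],
    ['/he$/',  'she',       true],   // ends with "he"
    ['/he$/',  'hello',     false],
    ['/\bhe/', 'say hello', true],   // "he" at the start of a word
    ['/\bhe/', 'she',       false],  // "he" is preceded by a word character
    ['/he\b/', 'she said',  true],   // "he" at the end of a word
    ['/^he$/', 'he',        true],   // exactly "he"
    ['/^he$/', 'hello',     false],
];

$mismatches = [];
foreach ($checks as [$pattern, $subject, $expected]) {
    $actual = preg_match($pattern, $subject) === 1;
    if ($actual !== $expected) {
        $mismatches[] = "$pattern on '$subject'";
    }
    printf("%-8s %-10s %s\n", $pattern, $subject, $actual ? 'match' : 'no match');
}
```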
Besides matching, a regular expression can record the information you need with parentheses "()", storing it so that later parts of the expression can read it. For example:
/^([a-zA-Z0-9_-]+)@([a-zA-Z0-9_-]+)(\.[a-zA-Z0-9_-]+)$/
This records the user name of an email address and the server address of the email address (in the form username@server.com, and so on). To read a recorded string later, use "escape character + record order": "\1" refers to what the first "([a-zA-Z0-9_-]+)" matched, "\2" to the second "([a-zA-Z0-9_-]+)", and "\3" to the third group, "(\.[a-zA-Z0-9_-]+)". In a PHP string, however, "\" is itself a special character that needs to be escaped, so write "\\1" in PHP code.
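A minimal sketch of recording and reading groups follows, assuming the third group is written with an escaped dot and a "+" so that an address such as username@server.com matches. The repeated-word pattern at the end is an extra illustration that is not taken from the text.

```php
<?php
$pattern = '/^([a-zA-Z0-9_-]+)@([a-zA-Z0-9_-]+)(\.[a-zA-Z0-9_-]+)$/';

// $matches[0] holds the whole match; [1], [2], [3] hold the recorded groups.
preg_match($pattern, 'username@server.com', $matches);

// In a replacement string the groups are read back as \\1, \\2, \\3.
$rewritten = preg_replace($pattern, '\\2\\3 hosts \\1', 'username@server.com');

// Inside the pattern itself, \1 refers to what group 1 matched;
// here it finds a word that is immediately repeated.
preg_match('/\b(\w+) \1\b/', 'this is is a test', $repeated);

var_dump($matches, $rewritten, $repeated);
```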
Other special symbols:
"|": the or symbol. It works like "or" in PHP, but it is a single "|", not PHP's double "||". It means that either one string or another may match; for example, "/abcd|dcba/" matches "abcd" or "dcba".
Greedy mode
The "?" metacharacter mentioned earlier also plays an important role in "greedy mode". What is greedy mode?
Suppose we want to match a string that begins with the letter "a" and ends with the letter "b", but the string to be matched has many "b" characters after the "a", such as "a bbbbbbbbbbbbbbbbb". Does the regular expression match up to the first "b" or the last "b"? In greedy mode it matches up to the last "b"; otherwise it matches only up to the first "b".
The following expressions do not use greedy mode:
/a.+?b/
/a.+b/U
The following expression uses greedy mode, which is the default:
/a.+b/
The second expression above uses the modifier "U", which is described in the following section.
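To see which "b" each form stops at, the three patterns can be run against a short subject. This is a small sketch; the subject "a bbbb" is a shortened stand-in for the longer example string.

```php
<?php
$subject = 'a bbbb';

// Default (greedy): ".+" takes as much as it can, so the match ends at the last "b".
preg_match('/a.+b/', $subject, $greedy);

// "?" after the quantifier makes it non-greedy: the match ends at the first "b".
preg_match('/a.+?b/', $subject, $lazy);

// The "U" modifier makes quantifiers non-greedy by default, with the same effect here.
preg_match('/a.+b/U', $subject, $ungreedy);

var_dump($greedy[0], $lazy[0], $ungreedy[0]);
```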
Modifiers
Modifiers can change many features of a regular expression, making it fit your needs more closely (note: modifiers are case sensitive, so "e" is not the same as "E"). The modifiers are as follows:
i: with "i", matching is no longer case sensitive, that is, "a" and "A" are the same.
m: by default "^" and "$" refer to the start and end of the whole string. With "m", they apply to each line of the string: "^" matches at the beginning of every line and "$" at the end of every line.
s: with "s", the ".", which by default matches any character except a line break, matches any character at all, including a newline character.
x: with "x", whitespace characters in the expression are ignored unless they are escaped.
e: this modifier is only useful for replacement; it causes the replacement string to be treated as PHP code. It was deprecated in PHP 5.5 and removed in PHP 7.0.
A: with this modifier, the expression must match at the beginning of the subject string. For example, "/a/A" matches "abcd".
D: in contrast to "m", with this modifier "$" matches only at the very end of the string, and not before a trailing newline character. It is ignored if "m" is set.
U: similar to the question mark, used to switch "greedy mode": quantifiers become non-greedy by default.
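The effect of several modifiers can be observed by running the same pattern with and without them. The sketch below uses made-up subjects, and each array key names the modifier being demonstrated.

```php
<?php
$text = "First line\nsecond LINE";

$results = [
    // Without "i" the final "LINE" does not match, and without "m" the "$"
    // does not match before the newline in the middle of the string.
    'plain' => preg_match('/line$/', $text),
    'i'     => preg_match('/line$/i', $text),
    'm'     => preg_match('/line$/m', $text),
    // "." stops at the line break unless "s" is given.
    'dot'   => preg_match('/First.+LINE/', $text),
    's'     => preg_match('/First.+LINE/s', $text),
    // With "x" the unescaped spaces in the pattern are ignored.
    'x'     => preg_match('/ \d+ \s* items /x', '12 items'),
    // "A" anchors the match at the start, so "b" is not found in "abcd".
    'A'     => preg_match('/b/A', 'abcd'),
    // By default "$" also matches before a trailing newline; "D" forbids that.
    'no D'  => preg_match('/abc$/', "abc\n"),
    'D'     => preg_match('/abc$/D', "abc\n"),
];

var_dump($results);
```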
PCRE-related regular expression functions
PHP's Perl-compatible regular expression (PCRE) extension provides several functions for pattern matching, substitution, counting matches, and so on:
1, preg_match:
Function format: int preg_match(string pattern, string subject, array [matches]);
This function searches the string subject for the expression pattern. If [matches] is given, the whole matched string is stored in [matches][0], [matches][1] holds the first substring recorded with parentheses "()", [matches][2] holds the second, and so on. preg_match returns 1 if the pattern is found in subject and 0 otherwise.
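As a small sketch of preg_match with the matches array, the snippet below pulls a date out of a sentence and then checks it with PHP's checkdate function. The sample sentence and the year-month-day layout are assumptions; the point is that a regular expression validates the format, while a date such as February 30 still needs a separate check.

```php
<?php
// Record year, month and day with three pairs of parentheses.
$found = preg_match('/(\d{4})-(\d{2})-(\d{2})/', 'Born on 1990-02-30.', $matches);

// $matches[0] is the whole match; [1], [2], [3] are the recorded parts.
[$whole, $year, $month, $day] = $matches;

// The format is fine, but checkdate(month, day, year) rejects February 30.
$isRealDate = checkdate((int) $month, (int) $day, (int) $year);

var_dump($found, $matches, $isRealDate);
```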
2, preg_replace:
Function format: mixed preg_replace(mixed pattern, mixed replacement, mixed subject);
This function replaces every part of subject that matches the expression pattern with replacement. To reuse part of the matched text in the replacement, record it with "()" in the pattern and read it in replacement with "\1" (written "\\1" in a PHP string).
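A short sketch of preg_replace follows, with and without recorded groups. The sample strings are made up; the "$1" form shown alongside "\\1" is an alternative spelling that PHP also accepts in replacement strings.

```php
<?php
// Swap "Last, First" into "First Last" by reading back the recorded groups.
$swapped = preg_replace('/(\w+), (\w+)/', '\\2 \\1', 'Doe, John');

// The same replacement written with the "$n" notation.
$dollar = preg_replace('/(\w+), (\w+)/', '$2 $1', 'Doe, John');

// Every match is replaced, not only the first one.
$masked = preg_replace('/\d/', '#', 'Call 555-1234');

var_dump($swapped, $dollar, $masked);
```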
3, preg_split:
Function format: array preg_split(string pattern, string subject, int [limit]);
This function does the same job as the function split. The difference is that split accepts only simple regular expressions for splitting a string, whereas preg_split accepts full Perl-compatible regular expressions. The third parameter, limit, sets the maximum number of pieces returned. (split itself was removed in PHP 7.0.)
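A minimal sketch of preg_split and its limit parameter, using a made-up list separated by commas and whitespace:

```php
<?php
$list = 'apple, banana  cherry,date';

// Split on any run of commas and whitespace.
$all = preg_split('/[\s,]+/', $list);

// With a limit of 2, the second piece keeps the rest of the string unsplit.
$firstAndRest = preg_split('/[\s,]+/', $list, 2);

var_dump($all, $firstAndRest);
```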
4, preg_grep:
Function format: array preg_grep(string pattern, array input);
This function is basically the same as preg_match, but preg_grep tests every element of the given array input and returns a new array of the elements that match. Here is an example that uses preg_match to check whether an email address has the correct format:
<?php
// Return 1 when $email looks like name@host.tld, otherwise 0.
function emailisright($email) {
    if (preg_match("/^[_.0-9a-z-]+@([0-9a-z][0-9a-z-]+\.)+[a-z]{2,3}$/", $email)) {
        return 1;
    }
    return 0;
}
if (emailisright('y10k@963.net')) echo 'correct <br>';
if (!emailisright('y10k@fffff')) echo 'incorrect <br>';
?>
The above program will output "correct <br>incorrect <br>".
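The email check can also be applied to a whole array with preg_grep, as in the sketch below. The address list is made up for illustration, and the pattern is written here in a corrected, delimited form of the email expression.

```php
<?php
$pattern = '/^[_.0-9a-z-]+@([0-9a-z][0-9a-z-]+\.)+[a-z]{2,3}$/';
$emails  = ['y10k@963.net', 'y10k@fffff', 'someone@example.com'];

// preg_grep keeps the elements that match and preserves their original keys.
$valid = preg_grep($pattern, $emails);

// PREG_GREP_INVERT returns the elements that do not match instead.
$invalid = preg_grep($pattern, $emails, PREG_GREP_INVERT);

var_dump($valid, $invalid);
```

Because the keys are preserved, array_values can be applied when a freshly numbered list is needed.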
The difference between Perl-compatible regular expressions in PHP and Perl/ereg regular expressions
Although they are called "Perl-compatible regular expressions", PHP's regular expressions differ from Perl's in places. For example, the modifier "g" means "match all" in Perl, but PHP does not support this modifier.
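Where Perl code would rely on "g", PHP code typically calls preg_match_all to collect every match (preg_replace already replaces all matches on its own). A minimal sketch with a made-up subject:

```php
<?php
// preg_match_all returns the number of matches and fills $all[0] with them.
$count = preg_match_all('/\d+/', 'a1 b22 c333', $all);

var_dump($count, $all[0]);
```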
There are also differences from the ereg family of functions. ereg is another set of regular expression functions provided by PHP, but it is much weaker than preg. (The ereg functions were deprecated in PHP 5.3 and removed in PHP 7.0.)
1, ereg neither needs nor allows delimiters and modifiers, so it offers far fewer features than preg.
2, About ".": in regular expressions the dot normally matches every character except a line break, but in ereg "." matches any character, including line breaks. If you want "." to include line breaks in preg, add "s" to the modifiers.
3, ereg uses greedy mode by default and this cannot be changed, which causes trouble for many substitutions and matches.
4, Speed: many people may wonder whether preg's power comes at the cost of speed. Do not worry: preg is much faster than ereg. The author ran a test program:
<?php
// Time 100000 replacements with each function. time() has one-second
// resolution, so each result is a whole number of seconds.
echo "Preg_replace used time:";
$start = time();
for ($i = 1; $i <= 100000; $i++) {
    $str = "ssssssssssssssssssssssssssss";
    preg_replace("/s/", "", $str);
}
$ended = time() - $start;
echo $ended;
echo "Ereg_replace used time:";
$start = time();
for ($i = 1; $i <= 100000; $i++) {
    $str = "ssssssssssssssssssssssssssss";
    ereg_replace("s", "", $str); // ereg_replace exists only before PHP 7.0
}
$ended = time() - $start;
echo $ended;
echo "Str_replace used time:";
$start = time();
for ($i = 1; $i <= 100000; $i++) {
    $str = "sssssssssssssssssssssssssssss";
    str_replace("s", "", $str);
}
$ended = time() - $start;
echo $ended;
?>
Results:
Preg_replace used time:5
Ereg_replace used time:15
Str_replace used time:2
str_replace is the fastest because it does no pattern matching at all, and preg_replace is much faster than ereg_replace.
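On current PHP versions the comparison can be repeated with finer timing. The sketch below is not the author's test: it uses microtime(true) instead of time(), leaves out ereg_replace because that function no longer exists in PHP 7.0 and later, and relies on a hypothetical helper named timeIt. Absolute numbers will differ from machine to machine.

```php
<?php
// Hypothetical helper: call $fn $iterations times and return elapsed seconds.
function timeIt(callable $fn, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

$str = str_repeat('s', 28);

$timings = [
    'preg_replace' => timeIt(function () use ($str) {
        return preg_replace('/s/', '', $str);
    }, 100000),
    'str_replace' => timeIt(function () use ($str) {
        return str_replace('s', '', $str);
    }, 100000),
];

foreach ($timings as $name => $seconds) {
    printf("%s used time: %.4f s\n", $name, $seconds);
}
```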
About PHP 3.0 support for preg
preg support is included by default in PHP 4.0, but not in 3.0. To use the preg functions in 3.0 you must load the php3_pcre.dll file: add "extension = php3_pcre.dll" to the extension section of php.ini and restart PHP.
In practice, regular expressions are often used to implement UBB code, and many PHP forums use this approach (such as zForum zphp.com or VB vbullent.com), but the actual code is rather long.