Using Perl-compatible regular expressions in PHP

Source: Internet
Author: User

PHP is widely used for server-side CGI development on the web, usually to process data supplied by the user and produce some result. If the data the user enters is incorrect, problems follow, such as someone's birthday being "February 30"! How can we check whether the input is correct? With the regular expression support in PHP, validating data by pattern matching becomes very convenient.

What is a regular expression

In short, regular expressions are a powerful tool for pattern matching and substitution. You can find traces of regular expressions in almost every software tool on Unix/Linux systems, and in scripting languages such as Perl and PHP. JavaScript, a client-side scripting language, also supports regular expressions. Regular expressions have become a common concept and tool, widely used by all kinds of technical staff.

One Linux site put it this way: "If you ask a Linux enthusiast what he likes best, he may answer regular expressions; if you ask him what he fears most, then apart from tedious installation and configuration, he will say regular expressions."

As the quote suggests, regular expressions look complex and intimidating, so most PHP beginners skip this topic and move on. That is a pity, because regular expressions in PHP are powerful: they can use pattern matching to find strings that meet a condition, judge whether a string meets a condition, or replace the matching parts of a string with a specified string.

Basic syntax for regular expressions

A regular expression is divided into three parts: the delimiters, the expression, and the modifiers.

The delimiter can be any character that is not a letter, a digit, a backslash, or whitespace (such as "/", "!" or "#"); the most commonly used delimiter is "/". The expression is made up of special characters (see below) and ordinary strings; for example, "[a-z0-9_-]+@[a-z0-9_.-]+" can match a simple e-mail string. Modifiers turn a feature or mode on or off. Here is an example of a complete regular expression:
/hello.+?hello/is
In the regular expression above, "/" is the delimiter, the text between the two "/" is the expression, and the string "is" after the second "/" is the modifier.

If the delimiter itself appears inside the expression, it must be escaped with the escape symbol "\", as in "/hello.+?\/hello/is". Besides escaping delimiters, the escape symbol also introduces special characters: every special character made from a letter must be prefixed with "\". For example, "\d" represents any digit.
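The three parts can be tried out directly with preg_match. This is only a minimal sketch: the sample strings and variable names are assumptions made for illustration, not part of the original article.

```php
<?php
// Sample text chosen for illustration only.
$text = "Hello world,\nsay HELLO again";

// "/" delimiters, the expression "hello.+?hello", and the modifiers
// "i" (ignore case) and "s" (let "." match newlines too).
$found = preg_match('/hello.+?hello/is', $text, $m);

// A literal "/" inside the expression must be escaped when "/" is the delimiter...
$escaped = preg_match('/hello.+?\/hello/is', 'hello </hello');

// ...or another delimiter such as "#" can be used, so no escaping is needed.
$altDelimiter = preg_match('#hello.+?/hello#is', 'hello </hello');

var_dump($found, $m[0], $escaped, $altDelimiter);
```

Choosing a delimiter that does not occur in the expression usually keeps the pattern easier to read than escaping every occurrence.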

Special characters for regular expressions

Special characters in regular expressions are divided into metacharacters, positional characters, and so on.

Metacharacters are a special kind of character in a regular expression. They describe how the preceding character (the character just before the metacharacter) may appear in the matched object. A metacharacter is itself a single character, but different or identical metacharacters can be combined to form larger metacharacters.


Curly braces: curly braces specify precisely how many times the preceding character may appear. For example, "/pre{1,5}/" matches strings such as "pre", "pree", or "preeeee", in which 1 to 5 "e" characters follow "pr". To allow the "e" to appear 0 to 5 times, write "/pre{0,5}/"; state the lower bound explicitly, because older PCRE versions treat "{,5}" as literal text rather than as a quantifier.

Plus: the "+" character matches the preceding character one or more times. For example, "/ac+/" matches strings in which one or more "c" characters follow "a", such as "act", "account", or "acccc". "+" is equivalent to "{1,}".

Asterisk: the "*" character matches the preceding character 0 or more times. For example, "/ac*/" matches strings in which 0 or more "c" characters follow "a", such as "app", "acp", or "accp". "*" is equivalent to "{0,}".

Question mark: the "?" character matches the preceding character 0 or 1 times. For example, "/ac?/" matches strings in which 0 or 1 "c" characters follow "a", such as "a", "acp", or "acwp". "?" also has another very important role in regular expressions, namely controlling "greedy mode".
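The four quantifiers above can be compared side by side. In this sketch every pattern is anchored with "^" and "$" so that the whole sample string has to fit the quantifier; the sample strings and the $checks table are illustrative assumptions.

```php
<?php
// Each row: pattern, subject, and the result we expect from preg_match.
$checks = [
    ['/^pre{1,5}$/', 'pree', 1], // two "e" is within 1..5
    ['/^pre{1,5}$/', 'pr',   0], // zero "e" is too few
    ['/^ac+$/',      'accc', 1], // "+" accepts one or more "c"
    ['/^ac+$/',      'a',    0], // "+" needs at least one "c"
    ['/^ac*$/',      'a',    1], // "*" accepts zero "c"
    ['/^ac?$/',      'ac',   1], // "?" accepts zero or one "c"
    ['/^ac?$/',      'acc',  0], // two "c" is too many for "?"
];

$results = [];
foreach ($checks as [$pattern, $subject, $expected]) {
    $result = preg_match($pattern, $subject);
    $results[] = $result;
    printf("%-14s %-6s => %d\n", $pattern, $subject, $result);
}
```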

There are also two very important special characters, "[" and "]". Together they match any one of the characters that appear inside "[]". For example, "/[az]/" matches the single character "a" or "z"; if the expression is changed to "/[a-z]/", it matches any single lowercase letter, such as "a", "b", and so on.

If "^" appears first inside "[]", the expression matches any character that does not appear in "[]"; for example, "/[^a-z]/" matches any character that is not a lowercase letter. Regular expressions also provide several predefined classes for use inside "[]", written for example as "/[[:digit:]]/":
[:alpha:]: matches any letter
[:alnum:]: matches any letter or digit
[:digit:]: matches any digit
[:space:]: matches whitespace
[:upper:]: matches any uppercase letter
[:lower:]: matches any lowercase letter
[:punct:]: matches any punctuation character
[:xdigit:]: matches any hexadecimal digit
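A short sketch of bracket expressions in PHP follows. Note that the predefined classes go inside an outer pair of brackets, as in "[[:digit:]]". The sample strings and variable names are assumptions for illustration.

```php
<?php
$hasAOrZ      = preg_match('/[az]/', 'zebra');      // 1: contains "z" (and "a")
$allLowercase = preg_match('/^[a-z]+$/', 'regex');  // 1: only lowercase letters
$nonLowerA    = preg_match('/[^a-z]/', 'regex');    // 0: nothing outside a-z
$nonLowerB    = preg_match('/[^a-z]/', 'Regex');    // 1: "R" is outside a-z

// Predefined classes are written inside an outer "[]".
$allDigits = preg_match('/^[[:digit:]]+$/', '2024');     // 1
$allHex    = preg_match('/^[[:xdigit:]]+$/', 'ff00AA');  // 1
// Several classes can share one bracket expression.
$alnumMix  = preg_match('/^[[:alpha:][:digit:]]+$/', 'abc123'); // 1

var_dump($hasAOrZ, $allLowercase, $nonLowerA, $nonLowerB, $allDigits, $allHex, $alnumMix);
```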

In addition, the following letters take on special meanings when they are preceded by the escape symbol "\":
\s: matches a single whitespace character.
\S: matches any single character that is not whitespace.
\d: matches a digit from 0 to 9, equivalent to "/[0-9]/".
\w: matches a letter, digit, or underscore, equivalent to "/[a-zA-Z0-9_]/".
\W: matches any character that \w does not match, equivalent to "/[^a-zA-Z0-9_]/".
\D: matches any character that is not a decimal digit.
.: matches any character except the newline character; with the modifier "s", "." matches any character at all.

These special characters make it easy to express patterns that would otherwise be cumbersome. For example, "/\d0000/" matches a digit followed by four zeros, so it finds round numbers such as "10000" or "90000" in a string.
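The escaped letters can be exercised with a few one-line calls. This is a sketch only; the sample strings are made up for illustration.

```php
<?php
$sample = "Order 10000 shipped\tto user_42"; // illustrative input

// \d0000: one digit followed by four zeros.
preg_match('/\d0000/', $sample, $round);

// \s+: split on any run of whitespace (spaces, tabs, newlines).
$words = preg_split('/\s+/', $sample);

// \W and \D are the negated forms of \w and \d.
$wordCharsOnly = preg_replace('/\W/', '', 'a-b c_d!');
$digitsOnly    = preg_replace('/\D/', '', 'tel: 555-0100');

// "." skips newlines unless the "s" modifier is present.
$dotDefault = preg_match('/a.b/', "a\nb");
$dotWithS   = preg_match('/a.b/s', "a\nb");

print_r([$round[0], $words, $wordCharsOnly, $digitsOnly, $dotDefault, $dotWithS]);
```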

Positioning characters:

Positioning characters are another very important kind of character in a regular expression. Their main purpose is to describe where the pattern appears in the matched object.
^: the pattern appears at the beginning of the matched object (this differs from the meaning of "^" inside "[]")
$: the pattern appears at the end of the matched object
\b: the pattern appears at a word boundary, that is, at the beginning or the end of a word
"/^he/": matches a string that begins with "he", such as "hello", "height", and so on;
"/he$/": matches a string that ends with "he", such as "she";
"/\bhe/": the boundary comes first, so like "^" it matches at a beginning, here a word that begins with "he";
"/he\b/": the boundary comes last, so like "$" it matches at an end, here a word that ends with "he";
"/^he$/": matches only the string "he".
Besides matching, a regular expression can use parentheses "()" to record the information you need and store it for later parts of the expression to read. For example:
/^([a-zA-Z0-9_-]+)@([a-zA-Z0-9_-]+)(\.[a-zA-Z0-9_-]+)$/
This records the user name of an e-mail address and the server address of that e-mail address. To read a recorded string, use "escape character + record order": "\1" is the text matched by the first "([a-zA-Z0-9_-]+)", "\2" is the text matched by the second "([a-zA-Z0-9_-]+)", and "\3" is the text matched by the third group, "(\.[a-zA-Z0-9_-]+)". However, in a PHP string "\" is itself a special character that needs escaping, so "\1" should be written as "\\1" in PHP code.
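The positioning characters "^", "$" and the word-boundary assertion "\b", together with recording parentheses, can be tried out as follows. This is a hedged sketch: the address is a made-up placeholder, and the variable names are assumptions.

```php
<?php
// Positioning characters: preg_match returns 1 on a match and 0 otherwise.
$startsWithHe = preg_match('/^he/', 'hello');      // 1: "he" is at the start
$endsWithHe   = preg_match('/he$/', 'she');        // 1: "he" is at the end
$wordStart    = preg_match('/\bhe/', 'say hello'); // 1: "hello" begins a word
$insideWord   = preg_match('/\bhe/', 'the');       // 0: "he" is inside a word
$exactlyHe    = preg_match('/^he$/', 'hey');       // 0: only "he" itself matches

// Parentheses record parts of the match.
$pattern = '/^([a-zA-Z0-9_-]+)@([a-zA-Z0-9_-]+)(\.[a-zA-Z0-9_-]+)$/';
preg_match($pattern, 'webmaster@example.com', $parts);
// $parts[1] is the user name, $parts[2] the server name, $parts[3] the suffix.

// "\\2" and "\\3" in the replacement read the second and third recorded strings.
$masked = preg_replace($pattern, '***@\\2\\3', 'webmaster@example.com');

// Inside the pattern itself, \1 must repeat whatever the first group matched.
preg_match('/(\w)\1/', 'letter', $doubled);

print_r([$parts, $masked, $doubled[0]]);
```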
Other special symbols:
"|": the or symbol. It works like "or" in PHP, but it is a single "|", not the double "||" that PHP uses. It means the match can be one string or another; for example, "/abcd|dcba/" matches "abcd" or "dcba".

Greedy mode

The earlier discussion of metacharacters mentioned that "?" has another important role, namely "greedy mode". What is "greedy mode"?

Suppose we want to match a string that begins with the letter "a" and ends with the letter "b", but the text after "a" contains many "b" characters, such as "a bbbbbbbbbbbbbbbbb". Will the regular expression match up to the first "b" or up to the last "b"? In greedy mode it matches up to the last "b"; otherwise it matches only up to the first "b".
An expression that uses greedy mode looks like this:
/a.+b/
Expressions that do not use greedy mode look like this:
/a.+?b/
/a.+b/U
The last expression uses the modifier U, which is described in the following section.
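The difference is easy to see by capturing what each form actually matches. This sketch assumes the patterns "/a.+b/", "/a.+?b/" and "/a.+b/U" as representative greedy and non-greedy forms; the variable names are illustrative.

```php
<?php
$subject = 'a bbbbbbbbbbbbbbbbb';

// Greedy: ".+" takes as much as possible, so the match ends at the last "b".
preg_match('/a.+b/', $subject, $greedy);

// Non-greedy via "?": ".+?" takes as little as possible, stopping at the first "b".
preg_match('/a.+?b/', $subject, $lazy);

// Non-greedy via the U modifier: quantifiers become non-greedy by default.
preg_match('/a.+b/U', $subject, $ungreedy);

// With U in effect, a trailing "?" switches a quantifier back to greedy.
preg_match('/a.+?b/U', $subject, $greedyAgain);

print_r([$greedy[0], $lazy[0], $ungreedy[0], $greedyAgain[0]]);
```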


Modifiers

Modifiers can change many features of a regular expression, making it fit your needs more closely (note: modifiers are case-sensitive, so "e" is not the same as "E"). The available modifiers include:
i: with "i", matching is no longer case-sensitive, so "a" and "A" are treated as the same.
m: by default, "^" and "$" refer only to the start and end of the whole string. With "m", they refer to each line of the string: "^" is the beginning of every line and "$" is the end of every line.
s: by default, "." matches any character except the newline character. With "s", "." matches any character, including line breaks.
x: with "x", whitespace characters in the expression are ignored unless they are escaped.
e: this modifier only applies to replacement and means that the replacement is evaluated as PHP code. It was deprecated in PHP 5.5 and removed in PHP 7.0, where preg_replace_callback takes its place.
A: with this modifier, the expression must match at the beginning of the string. For example, "/a/A" matches "abcd".
D: in contrast to "m", with this modifier "$" matches only at the very end of the string. Without it, "$" also matches just before a final newline.
U: similar to the question mark; it sets "greedy mode" by making quantifiers non-greedy by default.
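Most of these modifiers can be checked with a single call each. The following is a minimal sketch with made-up sample text; the variable names are assumptions. It uses "D" for the end-of-string behavior, which is the letter PHP assigns to that mode.

```php
<?php
$text = "First line\nsecond LINE\n"; // illustrative two-line input

$ignoreCase = preg_match('/line/i', 'LINE');                // 1
$lineCount  = preg_match_all('/^\w+/m', $text, $starts);     // 2: one hit per line
$dotAll     = preg_match('/First.+LINE/s', $text);          // 1: "." crosses "\n"
$dotDefault = preg_match('/First.+LINE/', $text);           // 0: "." stops at "\n"
$extended   = preg_match('/ \d{4} - \d{2} /x', '2024-05');  // 1: spaces ignored
$anchored   = preg_match('/a/A', 'abcd');                   // 1: "a" is at the start
$notAtStart = preg_match('/b/A', 'abcd');                   // 0: "b" is not
$dollar     = preg_match('/LINE$/', $text);                 // 1: before final "\n"
$dollarEnd  = preg_match('/LINE$/D', $text);                // 0: only the very end

print_r($starts[0]);
```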

PCRE-related regular expression functions

PHP's Perl-compatible regular expression support provides several functions for tasks such as pattern matching, replacement, and splitting:

1, preg_match:
function format: int preg_match(string pattern, string subject, array [matches]);
This function matches the pattern expression against the string subject. If [matches] is given, the whole matched string is recorded in [matches][0], [matches][1] holds the first string recorded with parentheses "()", [matches][2] holds the second recorded string, and so on. preg_match returns 1 if the pattern is found in the string and 0 if it is not; it returns false only when an error occurs.
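As a sketch of preg_match in use, the following checks a date string like the "February 30" birthday from the introduction. The date format and variable names are assumptions; checkdate is PHP's standard calendar check, used here because a pattern can verify only the shape of the input, not whether the day exists.

```php
<?php
$date = '2001-02-30'; // illustrative input: well-formed, but not a real day

// Three groups record the year, month, and day.
$ok = preg_match('/^(\d{4})-(\d{2})-(\d{2})$/', $date, $matches);

// No digits at all: preg_match reports 0, not false.
$none = preg_match('/^\d+$/', 'abc');

// The pattern accepts the format; checkdate(month, day, year) rejects the date.
$valid = $ok === 1
    && checkdate((int) $matches[2], (int) $matches[3], (int) $matches[1]);

var_dump($ok, $matches, $none, $valid);
```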

2, preg_replace:
function format: mixed preg_replace(mixed pattern, mixed replacement, mixed subject);
This function replaces every part of subject that matches pattern with replacement. To reuse parts of the matched text in replacement, record them with "()" in the pattern and refer to them in replacement as "\\1" (or "$1").
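A minimal sketch of replacement with recorded strings follows; the input text is made up. It also shows preg_replace_callback, the standard function for replacements that need PHP code to compute the new text.

```php
<?php
$input = 'Posted on 2001-02-28 and 2001-03-01'; // illustrative text

// "$3/$2/$1" reorders the three recorded strings; every match is replaced.
$swapped = preg_replace('/(\d{4})-(\d{2})-(\d{2})/', '$3/$2/$1', $input);

// The backslash form does the same; "\\3" in PHP source is "\3" in the string.
$sameResult = preg_replace('/(\d{4})-(\d{2})-(\d{2})/', '\\3/\\2/\\1', $input);

// A callback receives the match array and returns the replacement text.
$titleCase = preg_replace_callback(
    '/\b[a-z]/',
    function (array $m): string {
        return strtoupper($m[0]);
    },
    'perl compatible regex'
);

print_r([$swapped, $sameResult, $titleCase]);
```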

3, preg_split:
function format: array preg_split(string pattern, string subject, int [limit]);
This function works like the function split. The difference is that split accepts only simple regular expressions, while preg_split uses full Perl-compatible regular expressions. The third parameter, limit, is the maximum number of pieces allowed in the returned array.
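The effect of the limit parameter is easiest to see on a small input. This is a sketch with an invented separator-laden string; PREG_SPLIT_NO_EMPTY is a standard flag that drops empty pieces.

```php
<?php
$list = 'apple, banana;cherry ,  date'; // illustrative input

// Split on "," or ";" together with any surrounding whitespace.
$all = preg_split('/\s*[,;]\s*/', $list);

// With limit 2, the second piece keeps the rest of the string unsplit.
$limited = preg_split('/\s*[,;]\s*/', $list, 2);

// An empty pattern splits between characters; -1 means "no limit".
$chars = preg_split('//', 'abc', -1, PREG_SPLIT_NO_EMPTY);

print_r([$all, $limited, $chars]);
```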

4, preg_grep:
function format: array preg_grep(string pattern, array input);
This function is basically the same as preg_match, but preg_grep tests every element of the given array input and returns a new array containing the elements that match. For example, let's use preg_match to check whether an e-mail address is in the correct format:

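function emailIsRight($email) {
    // preg_match needs delimiters around the pattern, and the dot must be escaped.
    if (preg_match('/^[_.0-9a-z-]+@([0-9a-z][0-9a-z-]+\.)+[a-z]{2,3}$/', $email)) {
        return 1;
    }
    return 0;
}
// The first address was lost from the original listing; this one is a placeholder.
if (emailIsRight('y10k@example.com')) echo "correct\n";
if (!emailIsRight('y10k@fffff')) echo "incorrect\n";

The program above outputs "correct" followed by "incorrect".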
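preg_grep applies the same kind of check to a whole array at once. The following is a minimal sketch that assumes a made-up list of addresses and an "i" modifier added for case-insensitive matching; PREG_GREP_INVERT is the standard flag that returns the elements that do not match.

```php
<?php
// Illustrative input; none of these addresses come from the original article.
$addresses = ['y10k@example.com', 'y10k@fffff', 'no-at-sign', 'admin@mail.example.org'];

$pattern = '/^[_.0-9a-z-]+@([0-9a-z][0-9a-z-]+\.)+[a-z]{2,3}$/i';

// Matching elements are returned with their original keys preserved.
$valid = preg_grep($pattern, $addresses);

// PREG_GREP_INVERT returns the elements that do NOT match instead.
$invalid = preg_grep($pattern, $addresses, PREG_GREP_INVERT);

print_r($valid);
print_r($invalid);
```

Because the keys are preserved, array_values can be applied when a consecutively indexed result is needed.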

The difference between Perl-compatible regular expressions and perl/ereg regular expressions in PHP

Although they are called "Perl-compatible regular expressions", PHP's regular expressions still differ a little from Perl's. For example, the modifier "g" means "match all occurrences" in Perl, but PHP does not support this modifier; the function preg_match_all finds all matches instead.
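In PHP, finding every occurrence is done with the standard function preg_match_all rather than with a modifier. A minimal sketch, with an invented input string:

```php
<?php
$text = 'ids: 17, 42 and 108'; // illustrative input

// preg_match stops at the first match...
preg_match('/\d+/', $text, $first);

// ...while preg_match_all collects every match and returns how many it found.
$count = preg_match_all('/\d+/', $text, $found);

print_r([$first[0], $count, $found[0]]);
```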
They also differ from the ereg family of functions. ereg is another set of regular expression functions provided by PHP, but it is much weaker than preg. (The ereg functions were deprecated in PHP 5.3 and removed in PHP 7.0.)

1, ereg patterns neither need nor allow delimiters and modifiers, so the ereg functions are much less capable than preg.
2, About ".": in regular expressions the dot usually matches any character except the newline character, but in ereg "." matches any character, including line breaks. If you want "." to include line breaks in preg, add "s" to the modifiers.
3, ereg uses greedy mode by default and this cannot be changed, which causes trouble for many replacements and matches.
4, Speed: many people may wonder whether preg pays for its power with speed. Do not worry: preg is much faster than ereg. The author ran the following test program:

echo "preg_replace used time:";
$start = time();
for ($i = 1; $i <= 100000; $i++) {
    $str = "ssssssssssssssssssssssssssss";
    preg_replace("/s/", "", $str);
}
$ended = time() - $start;
echo $ended;
// ereg_replace() exists only before PHP 7.0, so this part is skipped on newer versions.
if (function_exists('ereg_replace')) {
    echo "ereg_replace used time:";
    $start = time();
    for ($i = 1; $i <= 100000; $i++) {
        $str = "ssssssssssssssssssssssssssss";
        ereg_replace("s", "", $str);
    }
    $ended = time() - $start;
    echo $ended;
}
echo "str_replace used time:";
$start = time();
for ($i = 1; $i <= 100000; $i++) {
    $str = "ssssssssssssssssssssssssssss";
    str_replace("s", "", $str);
}
$ended = time() - $start;
echo $ended;

The author's run reported the following times, in seconds:
preg_replace used time:5
ereg_replace used time:15
str_replace used time:2
str_replace is very fast because it does no pattern matching at all, and preg_replace is much faster than ereg_replace.
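Because time() counts whole seconds, short runs are hard to compare with it. The sketch below repeats the comparison with microtime(true), which returns fractional seconds. timeIt is a hypothetical helper introduced here for illustration, ereg_replace is left out because it no longer exists in current PHP, and the numbers depend entirely on the machine, so no expected output is shown.

```php
<?php
// Hypothetical helper: run $fn $iterations times and return elapsed seconds.
function timeIt(callable $fn, int $iterations = 100000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

$str = str_repeat('s', 28); // same kind of input as the test above

$timings = [
    'preg_replace' => timeIt(function () use ($str) {
        preg_replace('/s/', '', $str);
    }),
    'str_replace' => timeIt(function () use ($str) {
        str_replace('s', '', $str);
    }),
];

foreach ($timings as $name => $seconds) {
    printf("%s used time: %.4f s\n", $name, $seconds);
}
```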

About PHP 3.0 support for preg

preg support is included by default in PHP 4.0, but not in 3.0. To use the preg functions in 3.0, you must load the php3_pcre.dll file: add "extension = php3_pcre.dll" to the extension section of php.ini and then restart PHP.

In fact, regular expressions are often used to implement UBBCode, and many PHP forums use this method (such as zForum or vB), but the specific code is relatively long.
