[Reprint] PHP regular expressions: PHP regular expression syntax

Reprinted from: http://blog.csdn.net/kkobebryant/article/details/267527

Basic syntax for regular expressions

First, let's look at two special characters, '^' and '$'. They match the beginning and the end of the string, respectively. For example:


"^the": matches a string that begins with "the";
"of despair$": matches a string that ends with "of despair";
"^abc$": matches a string that starts and ends with "abc"; in fact, only "abc" itself matches;
"notice": matches a string that contains "notice".


As the last example shows, if you use neither of these two characters, the pattern (regular expression) can appear anywhere in the string being checked; it is not anchored to either end.
There are also the characters '*', '+', and '?', which indicate how many times a character may appear: "zero or more", "one or more", and "zero or one", respectively. Here are some examples:


"ab*": matches an "a" followed by zero or more "b"s ("a", "ab", "abbb", etc.);
"ab+": same as above, but with at least one "b" ("ab", "abbb", etc.);
"ab?": matches an "a" followed by zero or one "b";
"a?b+$": matches a string that ends with zero or one "a" followed by one or more "b"s.


You can also use curly braces to limit how many times a character may appear, for example:


"ab{2}": matches an "a" followed by exactly two "b"s ("abb");
"ab{2,}": at least two "b"s ("abb", "abbbb", etc.);
"ab{3,5}": three to five "b"s ("abbb", "abbbb", or "abbbbb").

Note that you must always specify the first number of a range (i.e., "{0,2}", not "{,2}"). Also note that '*', '+', and '?' are equivalent to the three ranges "{0,}", "{1,}", and "{0,1}", respectively.
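You can try the quantifiers above in modern PHP. Note that ereg() was removed in PHP 7.0, so this sketch assumes PHP 7 or later and uses preg_match(), whose patterns need delimiters (here "/"). The helper matches() is a hypothetical name, not a PHP built-in.

```php
<?php
// matches() is a hypothetical helper: true if $pattern (written without
// delimiters and without any "/") matches somewhere in $subject.
function matches(string $pattern, string $subject): bool
{
    // preg_match() returns 1 on a match, 0 on no match, false on error
    return preg_match('/' . $pattern . '/', $subject) === 1;
}

var_dump(matches('^ab*$', 'a'));        // bool(true): zero "b"s is fine
var_dump(matches('^ab+$', 'a'));        // bool(false): "+" needs at least one "b"
var_dump(matches('^ab?$', 'abb'));      // bool(false): "?" allows at most one "b"
var_dump(matches('a?b+$', 'xxabbb'));   // bool(true)
var_dump(matches('^ab{2}$', 'abb'));    // bool(true): exactly two "b"s
var_dump(matches('^ab{3,5}$', 'abb'));  // bool(false): needs three to five
// "*" behaves like "{0,}" (and "+" like "{1,}", "?" like "{0,1}")
foreach (['a', 'ab', 'abbbb'] as $s) {
    var_dump(matches('^ab*$', $s) === matches('^ab{0,}$', $s)); // bool(true) each time
}
```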

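You can also put a sequence of characters in parentheses so that a quantifier applies to the whole group, for example:


"a(bc)*": matches an "a" followed by zero or more copies of "bc";
"a(bc){1,5}": an "a" followed by one to five copies of "bc".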


There is also the '|' character, which works like an OR operation:


"hi|hello": matches a string containing "hi" or "hello";
"(b|cd)ef": matches a string containing "bef" or "cdef";
"(a|b)*c": matches a string containing any number (including zero) of "a"s or "b"s followed by a "c";
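The same kind of check with preg_match() for grouping and alternation (again a sketch assuming PHP 7 or later); the optional third argument of preg_match() receives the matched text:

```php
<?php
// A quantifier after "(...)" applies to the whole group
var_dump(preg_match('/^a(bc)*$/', 'abcbc'));   // int(1)
var_dump(preg_match('/^a(bc){1,5}$/', 'a'));   // int(0): needs at least one "bc"
// "(b|cd)ef" finds "bef" or "cdef" anywhere in the string
var_dump(preg_match('/(b|cd)ef/', 'xcdefx'));  // int(1)
// "(a|b)*c": any run of "a"s and "b"s followed by "c"
preg_match('/(a|b)*c/', 'xxababc', $m);
echo $m[0], "\n";                              // ababc
```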


A period ('.') matches any single character:


"a.[0-9]": an "a" followed by any character and then a digit (any string containing such a substring matches; this remark is omitted from now on);
"^.{3}$": a string of exactly three characters.


A bracket expression matches exactly one character:


"[ab]": matches a single "a" or "b" (same as "a|b");
"[a-d]": matches a single character from "a" to "d" (same as "a|b|c|d" and "[abcd]");
"^[a-zA-Z]": matches a string that begins with a letter;
"[0-9]%": matches a string containing a digit followed by a percent sign, such as "5%";
",[a-zA-Z0-9]$": matches a string that ends with a comma followed by a letter or digit.


You can also list the characters you do not want to match: just put '^' as the first character inside the brackets (e.g., "%[^a-zA-Z]%" matches a string in which a character that is not a letter appears between two percent signs).
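A quick preg_match() sketch of the period and bracket examples above (assuming PHP 7 or later; preg_match() returns 1 for a match and 0 otherwise):

```php
<?php
// "." is any single character, so "^.{3}$" means "exactly three characters"
var_dump(preg_match('/^.{3}$/', 'abc'));          // int(1)
var_dump(preg_match('/^.{3}$/', 'abcd'));         // int(0)
var_dump(preg_match('/a.[0-9]/', 'xa-7'));        // int(1): "a", "-", then "7"
// A bracket expression matches exactly one character from the set
var_dump(preg_match('/^[a-d]$/', 'c'));           // int(1)
var_dump(preg_match('/^[a-zA-Z]/', '9lives'));    // int(0): must begin with a letter
var_dump(preg_match('/[0-9]%/', 'up 5% today'));  // int(1)
var_dump(preg_match('/,[a-zA-Z0-9]$/', 'x,y'));   // int(1)
// A leading "^" inside the brackets negates the set
var_dump(preg_match('/%[^a-zA-Z]%/', '%3%'));     // int(1)
var_dump(preg_match('/%[^a-zA-Z]%/', '%x%'));     // int(0)
```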

To match the characters "^.[$()|*+?{\" literally, you must precede them with a backslash ('\'), because they have special meaning. In PHP3 you also have to escape the backslash itself inside the PHP string: for example, the regular expression "(\$|¥)[0-9]+" must be passed as ereg("(\\$|¥)[0-9]+", $str). (I don't know whether PHP4 behaves the same way.)
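With the preg functions, PHP's preg_quote() can do this escaping for you. The sketch below (assuming PHP 7 or later; the price string is only an example) also shows why the backslash needs doubling in double-quoted PHP strings but not in single-quoted ones:

```php
<?php
// preg_quote() backslash-escapes regex special characters; the second
// argument also escapes the delimiter you intend to use.
$literal = '$5.00';
$pattern = '/^' . preg_quote($literal, '/') . '$/';
echo $pattern, "\n";                      // /^\$5\.00$/
var_dump(preg_match($pattern, '$5.00'));  // int(1)
var_dump(preg_match($pattern, '$5x00'));  // int(0): "." is now a literal period

// Single quotes keep '\$' as backslash + "$"; in double quotes "\$" is just "$",
// so a double-quoted pattern needs "\\$" to pass \$ to the regex engine.
var_dump('\$' === "\\$");                 // bool(true)
```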

Don't forget that bracket expressions are an exception to this rule: in this (POSIX) syntax, all special characters inside brackets, including the backslash ('\'), lose their special meaning (e.g., "[*\+?{}.]" matches any one of the characters inside the brackets). Also, as the regex man pages tell us: "To include a literal ']' in the list, make it the first character (possibly following a '^'). To include a literal '-', make it the first or last character, or the second endpoint of a range." (A '-' in the middle of a list, as in "[a-d-0-9]", is in none of these positions.)

For completeness I should also cover collating sequences, character classes, and equivalence classes, but I won't go into detail here, since the rest of the article doesn't need them. You can find more information in the regex man pages.

Building a pattern to validate a monetary amount

Now let's use what we've learned to do something useful: build a pattern that checks whether the input is a number representing an amount of money. We'll assume there are four ways to write such an amount: "10000.00" and "10,000.00", or, without the decimal part, "10000" and "10,000". Let's start building the pattern:

^[1-9][0-9]*$

This requires the value to start with a digit other than 0. But it also means that a single "0" fails the test. Here's how to fix that:

^(0|[1-9][0-9]*)$

This reads: "0, or a number that does not start with 0". We can also allow a minus sign before the number:

^(0|-?[1-9][0-9]*)$

This reads: "0, or a number that does not start with 0 and may be preceded by a minus sign." All right, let's be less strict and allow numbers that start with 0. Let's also drop the minus sign, since we don't need it for amounts of money. Next, we specify a pattern for the fractional part:

^[0-9]+(\.[0-9]+)?$

This means the string must begin with at least one digit. Note that with this pattern "10." does not match; only "10" and "10.2" do. (Do you know why?)

^[0-9]+(\.[0-9]{2})?$

Here we require exactly two digits after the decimal point. If you think this is too strict, you can change it to:

^[0-9]+(\.[0-9]{1,2})?$

This allows one or two digits after the decimal point. Now let's add a comma every three digits for readability:

^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{1,2})?$

Don't forget that the plus '+' can be replaced by the asterisk '*' if you want empty input to be accepted as well (why?). Also don't forget to escape the backslash in the PHP string passed to the function (a very common mistake). Once the string has been validated, we can remove all the commas with str_replace(",", "", $money), treat the result as a double, and do math with it.
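A minimal sketch, assuming PHP 7 or later (where ereg() no longer exists), that runs the final pattern through preg_match(). Note that this pattern insists on comma grouping once a number has more than three digits, so "10000" and "10000.00" are rejected; MONEY_ANY is our own variant, not from the text, that also accepts ungrouped digits. The constant and function names are hypothetical.

```php
<?php
// Final pattern from the text: commas required between groups of three digits
const MONEY_GROUPED = '/^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{1,2})?$/';
// Variant (assumption): either plain digits or comma-grouped digits
const MONEY_ANY = '/^([0-9]+|[0-9]{1,3}(,[0-9]{3})*)(\.[0-9]{1,2})?$/';

function isMoney(string $money, string $pattern = MONEY_ANY): bool
{
    return preg_match($pattern, $money) === 1;
}

function moneyToFloat(string $money): float
{
    // Remove the thousands separators, then treat the rest as a number
    return (float) str_replace(',', '', $money);
}

foreach (['10000.00', '10,000.00', '10000', '10,000', '10.', '1,00'] as $m) {
    printf("%-10s grouped:%s any:%s\n", $m,
        isMoney($m, MONEY_GROUPED) ? 'yes' : 'no',
        isMoney($m) ? 'yes' : 'no');
}
echo moneyToFloat('10,000.50') + 1, "\n"; // 10001.5
```

Validating first and converting afterwards keeps str_replace() from turning malformed input such as "1,00" into a plausible-looking number.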
Building a regular expression to validate an e-mail address

OK, let's move on to validating an e-mail address. A full e-mail address has three parts: the POP3 user name (everything to the left of the '@'), the '@' itself, and the server name (the rest). The user name can contain uppercase and lowercase letters, digits, periods ('.'), minus signs ('-'), and underscores ('_'). The server name follows the same rule, except that it cannot contain underscores.

Also, a user name cannot start or end with a period, and the same is true for server names. Nor can there be two consecutive periods; there must be at least one character between them. Let's see how to write a pattern for the user name:

^[_a-zA-Z0-9-]+$

This doesn't allow periods yet. Let's add them:

^[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*$

This means: "start with at least one allowed character (not a period), followed by zero or more strings that each start with a period."

To make it simpler, we can use eregi() instead of ereg(). eregi() is case-insensitive, so we don't need to specify both ranges "a-z" and "A-Z"; one is enough:

^[_a-z0-9-]+(\.[_a-z0-9-]+)*$

The server name works the same way, except that the underscore is removed:

^[a-z0-9-]+(\.[a-z0-9-]+)*$

Done. Now just use "@" to connect the two parts:

^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$


This is the complete e-mail validation pattern. You only need to call

eregi('^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$', $email)

to check whether a string is an e-mail address.
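The same pattern with preg_match(), as a sketch assuming PHP 7 or later: the "i" modifier takes the place of eregi(). isEmail() is a hypothetical helper name, and the check is only as strict as the pattern above.

```php
<?php
// The article's e-mail pattern; "/.../i" adds delimiters and case-insensitivity
function isEmail(string $email): bool
{
    $pattern = '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$/i';
    return preg_match($pattern, $email) === 1;
}

var_dump(isEmail('John.Smith@Example.com'));  // bool(true): case ignored via "i"
var_dump(isEmail('john..smith@example.com')); // bool(false): two periods in a row
var_dump(isEmail('.john@example.com'));       // bool(false): starts with a period
var_dump(isEmail('john@exa_mple.com'));       // bool(false): "_" not allowed in server
```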
Other uses of regular expressions

Extracting a string

ereg() and eregi() can also extract the parts of a string that match the groups in a regular expression (see the manual for details). For example, suppose we want to extract the file name from a path or URL. The following code does it:

ereg("([^\\/]*)$", $pathOrUrl, $regs);
echo $regs[1];
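A preg_match() version of the same extraction, sketched for PHP 7 or later. fileName() is a hypothetical helper; using "#" as the delimiter means the "/" inside the pattern needs no escaping:

```php
<?php
function fileName(string $pathOrUrl): string
{
    // The class excludes "\" and "/"; the pattern can match an empty string
    // at the very end, so $regs[1] is always set.
    preg_match('#([^\\\\/]*)$#', $pathOrUrl, $regs);
    return $regs[1];
}

echo fileName('/var/www/html/index.php'), "\n";            // index.php
echo fileName('C:\\docs\\report.txt'), "\n";               // report.txt
echo fileName('http://example.com/images/logo.png'), "\n"; // logo.png
```

For local file paths, PHP's built-in basename() is usually simpler than a regular expression.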

High-level substitution

ereg_replace() and eregi_replace() are also very useful. For example, suppose we want to separate all the words in a string with commas, replacing every run of spaces, newlines, carriage returns, and tabs:


ereg_replace("[ \n\r\t]+", ",", trim($str));
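The preg_replace() equivalent, as a sketch assuming PHP 7 or later; the sample string is made up:

```php
<?php
$str = "  one two\tthree\r\nfour  ";
// Each run of spaces, tabs, CRs and LFs becomes a single comma
echo preg_replace('/[ \n\r\t]+/', ',', trim($str)), "\n"; // one,two,three,four
// "\s" is shorter; in PCRE it also covers a few other whitespace characters
echo preg_replace('/\s+/', ',', trim($str)), "\n";        // one,two,three,four
```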


PHP is widely used for server-side CGI development on the web. It usually processes the data a user submits to produce some result, but if that data is invalid there will be problems: someone might enter a birthday of "February 30"! How can we check whether the input is correct? With PHP's support for regular expressions, this kind of data matching becomes very convenient.

2 What is a regular expression:
In short, regular expressions are a powerful tool for pattern matching and substitution. You can find them in almost every software tool on Unix/Linux systems, and in scripting languages such as Perl and PHP. JavaScript, a client-side scripting language, also supports regular expressions, and they have become a common concept and tool used by all kinds of technical staff.
A Linux website once put it this way: "If you ask a Linux enthusiast what he likes best, he may answer regular expressions; if you ask him what he fears most, apart from tedious installation and configuration, he will say regular expressions."
As that quote suggests, regular expressions look complex and intimidating, and most PHP beginners skip them and move on. That would be a pity: with pattern matching, PHP's regular expressions can find strings that meet given criteria, check whether a string qualifies, or replace matching text with a specified string.


3 Basic syntax for regular expressions:
A regular expression has three parts: delimiters, the expression, and modifiers.
The delimiter can be any character that is not a letter, digit, backslash, or whitespace, such as "/" or "!"; the most common delimiter is "/". The expression is made up of special characters (described below) and ordinary characters; for example, "[a-z0-9_-]+@[a-z0-9_.-]+" can match a simple e-mail string. Modifiers turn a feature or mode on or off. Here is an example of a complete regular expression:
/hello.+?hello/is
In the regular expression above, "/" is the delimiter, the part between the two "/" characters is the expression, and the string "is" after the second "/" holds the modifiers.
If the delimiter appears in the expression, you must escape it with the escape character "\", as in "/hello.+?\/hello/is". The escape character can also escape special characters other than the delimiter, and special sequences made of letters all require "\", for example "\d" for any digit.
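A small sketch (assuming PHP 7 or later) that runs the example expression through preg_match() and shows the effect of the "s" modifier and of escaping the delimiter; the sample text is invented:

```php
<?php
$text = "Hello there,\nhello again";
// "/hello.+?hello/is": "/" delimiters, "i" ignores case, "s" lets "." match "\n"
var_dump(preg_match('/hello.+?hello/is', $text)); // int(1)
// Without "s", ".+?" cannot cross the newline, so there is no match
var_dump(preg_match('/hello.+?hello/i', $text));  // int(0)
// A delimiter inside the expression must be escaped: "\/" matches a literal "/"
var_dump(preg_match('/a\/b/', 'a/b'));            // int(1)
// Another delimiter such as "#" avoids the escaping
var_dump(preg_match('#a/b#', 'a/b'));             // int(1)
```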


4 Special characters in regular expressions:
Special characters in regular expressions fall into metacharacters, positioning characters, and so on.
Metacharacters are characters with special meaning in a regular expression; they describe how their leading character (the character before the metacharacter) may appear in the matched object. A metacharacter is itself a single character, but different or identical metacharacters can be combined to form larger ones.
Metacharacters:
Curly braces: curly braces specify exactly how many times the leading character may occur. For example, "/pre{1,5}/" matches "pre", "pree", up to "preeeee", that is, "pr" followed by 1 to 5 "e" characters, and "/pre{0,5}/" means "e" appears 0 to 5 times after "pr".
Plus sign: "+" matches its leading character one or more times. For example, "/ac+/" matches in "act", "account", "acccc", and so on, wherever an "a" is followed by one or more "c" characters. "+" is equivalent to "{1,}".
Asterisk: "*" matches its leading character zero or more times. For example, "/ac*/" matches in "app", "acp", "accp", and so on, wherever an "a" is followed by zero or more "c" characters. "*" is equivalent to "{0,}".
Question mark: "?" matches its leading character zero or one time. For example, "/ac?/" matches in "a", "acp", "acwp", and so on, wherever an "a" is followed by zero or one "c". "?" also has another very important role in regular expressions: controlling "greedy mode".

There are also two very important special characters, "[]". They match any one of the characters listed inside them; for example, "/[az]/" matches a single "a" or "z". If the expression is changed to "/[a-z]/", it matches any single lowercase letter, such as "a", "b", and so on.
If "^" appears at the start of "[]", the expression matches any character that is not listed; for example, "/[^a-z]/" matches any character that is not a lowercase letter. Regular expressions also provide several predefined "[]" classes:
[:alpha:]: matches any letter
[:alnum:]: matches any letter or digit
[:digit:]: matches any digit
[:space:]: matches whitespace
[:upper:]: matches any uppercase letter
[:lower:]: matches any lowercase letter
[:punct:]: matches any punctuation character
[:xdigit:]: matches any hexadecimal digit
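In PHP's PCRE functions these classes are only valid inside a bracket expression, so "[:alpha:]" is written "[[:alpha:]]". A short sketch, assuming PHP 7 or later:

```php
<?php
var_dump(preg_match('/^[[:alpha:]]+$/', 'Hello'));   // int(1)
var_dump(preg_match('/^[[:alpha:]]+$/', 'Hello1'));  // int(0): "1" is not a letter
var_dump(preg_match('/^[[:digit:]]+$/', '2024'));    // int(1)
var_dump(preg_match('/^[[:xdigit:]]+$/', 'ff0A'));   // int(1)
var_dump(preg_match('/[az]/', 'pizza'));             // int(1): finds a "z"
var_dump(preg_match('/[^a-z]/', 'abc'));             // int(0): nothing outside a-z
```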

In addition, the following characters take on special meanings when escaped with "\":
\s: matches a single whitespace character.
\S: matches any single character except a whitespace character.
\d: matches a digit from 0 to 9, equivalent to "/[0-9]/".
\w: matches a letter, digit, or underscore, equivalent to "/[a-zA-Z0-9_]/".
\W: matches any character not matched by \w, equivalent to "/[^a-zA-Z0-9_]/".
\D: matches any character that is not a decimal digit.
.: matches any character except a newline; with the "s" modifier, "." matches any character.

These special characters make some otherwise cumbersome patterns easy to express. For example, "/\d0000/" matches a digit followed by four zeros, such as "10000" or "90000".
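The escape sequences above in action, sketched with preg_match() under PHP 7 or later; $m receives the matched text:

```php
<?php
var_dump(preg_match('/^\d+$/', '12345'));       // int(1)
var_dump(preg_match('/\D/', '12345'));          // int(0): no non-digit character
var_dump(preg_match('/^\w+$/', 'user_name1'));  // int(1)
var_dump(preg_match('/\s/', 'no-space-here'));  // int(0)
// "\d0000": a digit followed by four zeros, anywhere in the string
preg_match('/\d0000/', 'price: 30000 yuan', $m);
echo $m[0], "\n";                               // 30000
```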

Positioning characters:
A positioning character is another very important kind of character in regular expressions; its main purpose is to describe where a character appears in the matched object.
^: the pattern must appear at the beginning of the matched object (inside "[]" it means something different)
$: the pattern must appear at the end of the matched object
\b: the pattern must appear at a word boundary, that is, at the beginning or end of a word
"/^he/": matches a string that begins with "he", such as "hello" or "height";
"/he$/": matches a string that ends with "he", such as "she";
"/\bhe/": matches "he" at the beginning of a word; unlike "^", this is not limited to the start of the whole string;
"/he\b/": matches "he" at the end of a word; unlike "$", this is not limited to the end of the whole string;
"/^he$/": matches only the string "he".

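A sketch of the positioning characters with preg_match(), assuming PHP 7 or later. It shows that "\b" is a word boundary that can match in the middle of the string, while "^" only matches at its start:

```php
<?php
var_dump(preg_match('/^he/', 'hello'));          // int(1)
var_dump(preg_match('/he$/', 'she'));            // int(1)
var_dump(preg_match('/^he$/', 'hehe'));          // int(0)
var_dump(preg_match('/\bhe/', 'ask the hero'));  // int(1): "hero" starts a word
var_dump(preg_match('/^he/', 'ask the hero'));   // int(0): the string starts with "a"
var_dump(preg_match('/he\b/', 'the end'));       // int(1): "the" ends a word
```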
Parentheses:
Besides matching, a regular expression can use parentheses "()" to capture the information you need and store it so that later parts of the expression can read it. For example:
/^([a-zA-Z0-9_-]+)@([a-zA-Z0-9_-]+)(\.[a-zA-Z0-9_-]+)$/
captures the user name and the server address of an e-mail address of the form username@server.com. To read a captured string, write the escape character followed by the capture number. For example, "\1" refers to the first group "([a-zA-Z0-9_-]+)", "\2" to the second "([a-zA-Z0-9_-]+)", and "\3" to the third "(\.[a-zA-Z0-9_-]+)". However, "\" is also a special character in PHP strings and needs escaping, so inside a double-quoted PHP string "\1" must be written "\\1".
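Reading the captured groups from PHP, sketched with preg_match() under PHP 7 or later: the third argument collects the groups, and "\1" inside a pattern refers back to the first group (the doubled-word check is our own example):

```php
<?php
$email = 'username@server.com';
$pattern = '/^([a-zA-Z0-9_-]+)@([a-zA-Z0-9_-]+)(\.[a-zA-Z0-9_-]+)$/';
if (preg_match($pattern, $email, $m)) {
    echo $m[1], "\n"; // username
    echo $m[2], "\n"; // server
    echo $m[3], "\n"; // .com
}
// Inside a pattern, "\1" refers back to group 1: here it finds a doubled word.
// Single quotes keep '\1' intact; a double-quoted pattern would need "\\1".
var_dump(preg_match('/\b(\w+) \1\b/', 'this is is a test')); // int(1)
var_dump(preg_match('/\b(\w+) \1\b/', 'this is a test'));    // int(0)
```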
Other special symbols:
"|": the OR symbol. It means "or", as in PHP, but it is a single "|", not PHP's "||". The match can be one string or another; for example, "/abcd|dcba/" matches "abcd" or "dcba".


5 Greedy mode:
The metacharacter section mentioned that "?" has another important role, "greedy mode". What is greedy mode?
Suppose we want to match a string that starts with the letter "a" and ends with the letter "b", but the text after the "a" contains many "b" characters, as in "a bbbbbbbbbbbbbbbbb". Will the regular expression match up to the first "b" or the last one? In greedy mode it matches up to the last "b"; in non-greedy (lazy) mode it matches only up to the first "b".
Expressions that use non-greedy mode look like this:
/a.+?b/
/a.+b/U
The following uses greedy mode (the default):
/a.+b/
The second non-greedy example above uses the modifier U, described in the next section.
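A sketch comparing the three expressions above on a short sample string (assuming PHP 7 or later):

```php
<?php
$text = 'a bbbb';
preg_match('/a.+b/', $text, $greedy);    // default: ".+" grabs as much as it can
preg_match('/a.+?b/', $text, $lazy);     // "?" after "+" makes it non-greedy
preg_match('/a.+b/U', $text, $ungreedy); // "U" makes every quantifier non-greedy
echo $greedy[0], "\n";   // a bbbb
echo $lazy[0], "\n";     // a b
echo $ungreedy[0], "\n"; // a b
```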


6 Modifiers:
Modifiers change many aspects of how a regular expression behaves, making it better suited to your needs (note: modifiers are case-sensitive, so "e" is not the same as "E"). The modifiers are as follows:
i: case-insensitive matching, i.e. "a" and "A" are treated as the same.
m: by default "^" and "$" refer only to the start and end of the whole string; with "m", they refer to the start and end of each line: "^" matches at the beginning of every line and "$" at the end of every line.
s: by default "." matches any character except a newline; with "s", it matches any character, including newlines!
x: whitespace in the expression is ignored unless it is escaped.
e: useful only for replacement; the replacement is evaluated as PHP code. (This modifier was removed in PHP 7.0; preg_replace_callback() replaces it.)
A: the match must start at the beginning of the subject string. For example, "/a/A" matches "abcd".
D: "$" matches only at the absolute end of the string, not before a trailing newline (without "D", "$" also matches just before a final newline). It has no effect when "m" is set.
U: like the question mark, it controls greediness: with "U", quantifiers are non-greedy by default.
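A sketch of the main modifiers with preg_match(), assuming PHP 7 or later; the sample text is invented:

```php
<?php
$text = "first line\nsecond line";
var_dump(preg_match('/FIRST/', $text));              // int(0): case-sensitive by default
var_dump(preg_match('/FIRST/i', $text));             // int(1)
var_dump(preg_match('/^second/', $text));            // int(0): "^" is the start of the string
var_dump(preg_match('/^second/m', $text));           // int(1): "^" is the start of any line
var_dump(preg_match('/line.second/', $text));        // int(0): "." does not match "\n"
var_dump(preg_match('/line.second/s', $text));       // int(1)
var_dump(preg_match('/f i r s t \s line/x', $text)); // int(1): literal spaces ignored
var_dump(preg_match('/b/A', 'abc'));                 // int(0): must match at offset 0
var_dump(preg_match('/line$/', "line\n"));           // int(1): "$" also matches before a final "\n"
var_dump(preg_match('/line$/D', "line\n"));          // int(0)
```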


7 PCRE-related regular expression functions:
PHP's Perl-compatible regular expressions come with several functions, covering pattern matching, replacement, counting matches, and more:
1. preg_match:
Syntax: int preg_match(string pattern, string subject [, array matches]);
This function matches subject against the pattern expression. If matches is given, the whole matched string is stored in matches[0], matches[1] holds the string captured by the first pair of parentheses "()", matches[2] the second captured string, and so on. preg_match returns 1 if the pattern is found in the string and 0 otherwise (false on error).

2. preg_replace:
Syntax: mixed preg_replace(mixed pattern, mixed replacement, mixed subject);
This function replaces every substring of subject that matches pattern with replacement. If replacement needs to include part of the matched text, capture it with "()" and refer to it in replacement as "\1".

3. preg_split:
Syntax: array preg_split(string pattern, string subject [, int limit]);
This function works like split(), except that split() can only use simple regular expressions to separate the string, while preg_split() uses full Perl-compatible regular expressions. The third parameter, limit, is the maximum number of pieces to return.

4. preg_grep:
Syntax: array preg_grep(string pattern, array input);
This function is basically like preg_match, except that preg_grep matches every element of the array input and returns a new array of the elements that match.
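A compact sketch of the four functions, assuming PHP 7 or later; all sample strings are made up:

```php
<?php
// preg_match(): $m[0] is the whole match, $m[1] and $m[2] the groups
preg_match('/(\d+)-(\d+)/', 'tel: 010-12345', $m);
echo $m[0], ' ', $m[1], ' ', $m[2], "\n";     // 010-12345 010 12345

// preg_replace(): "\2" and "\1" in the replacement refer to the groups
echo preg_replace('/(\w+)@(\w+)\.com/', '\2 at \1', 'mail john@example.com'), "\n";
// prints: mail example at john

// preg_split(): split on a comma plus optional spaces; limit 2 keeps the rest whole
print_r(preg_split('/,\s*/', 'a, b,c,  d', 2)); // two pieces: "a" and "b,c,  d"

// preg_grep(): keep only the elements that match (original keys are preserved)
print_r(preg_grep('/^\d+$/', ['10', 'abc', '42'])); // keys 0 and 2: "10" and "42"
```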

For example, let's check whether an e-mail address is in the correct format:


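<?php
// Validate an e-mail address with a Perl-compatible regular expression.
// The pattern needs delimiters ("/.../"); without them preg_match() fails.
function emailIsRight($email) {
    if (preg_match('/^[_\.0-9a-z-]+@([0-9a-z][0-9a-z-]+\.)+[a-z]{2,3}$/', $email)) {
        return 1;
    }
    return 0;
}
if (emailIsRight('y10k@963.net')) echo "correct\n";
if (!emailIsRight('y10k@fffff')) echo "incorrect\n";
?>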

The program above outputs "correct" followed by "incorrect".

8. Differences between PHP's Perl-compatible regular expressions and Perl's regular expressions or ereg:
Although they are called "Perl-compatible regular expressions", PHP's regular expressions still differ somewhat from Perl's. For example, the modifier "g", which means "match all" in Perl, is not supported in PHP (preg_match_all() does that job instead).
Then there are the differences from the ereg family of functions. ereg is also a regular expression function provided by PHP, but it is much weaker than preg.

1. ereg neither needs nor accepts delimiters and modifiers, which makes it much weaker than preg.
2. About ".": in regular expressions the period usually matches any character except a newline, but in ereg "." matches any character, including newlines! If you want "." to include newlines in preg, add the "s" modifier.
3. ereg always uses greedy matching and this cannot be changed, which makes many replacements and matches awkward.
4. Speed: many people may wonder whether preg's power comes at the cost of speed. Don't worry: preg is much faster than ereg. The author ran a test program:

Time Test:

PHP Code:

<?php
// Run each replacement 100,000 times and print the elapsed seconds.
// time() has one-second resolution, so the results are rough.
echo "preg_replace used time: ";
$start = time();
for ($i = 1; $i <= 100000; $i++) {
    $str = "ssssssssssssssssssssssssssss";
    preg_replace("/s/", "", $str);
}
$ended = time() - $start;
echo $ended;
echo "\nereg_replace used time: ";
$start = time();
for ($i = 1; $i <= 100000; $i++) {
    $str = "ssssssssssssssssssssssssssss";
    ereg_replace("s", "", $str); // ereg_replace() no longer exists as of PHP 7.0
}
$ended = time() - $start;
echo $ended;
echo "\nstr_replace used time: ";
$start = time();
for ($i = 1; $i <= 100000; $i++) {
    $str = "sssssssssssssssssssssssssssss";
    str_replace("s", "", $str);
}
$ended = time() - $start;
echo $ended;
?>
Results:
Preg_replace used Time:5
Ereg_replace used Time:15
Str_replace used Time:2


str_replace is very fast because it does no pattern matching, and preg_replace is much faster than ereg_replace.
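Two of the differences above, sketched with the preg functions under PHP 7 or later (where ereg no longer exists, so only the preg side can be shown):

```php
<?php
// PHP has no "g" modifier: preg_match() stops at the first match,
// while preg_match_all() finds every match and returns how many there were.
$count = preg_match_all('/\d+/', 'a1 b22 c333', $m);
echo $count, "\n";                 // 3
echo implode(',', $m[0]), "\n";    // 1,22,333

// In preg, "." skips "\n" unless the "s" modifier is given
var_dump(preg_match('/a.b/', "a\nb"));   // int(0)
var_dump(preg_match('/a.b/s', "a\nb"));  // int(1)

// Unlike ereg, greediness can be switched per quantifier with "?"
preg_match('/<.+?>/', '<b>bold</b>', $tag);
echo $tag[0], "\n";                // <b>
```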


9. About preg support in PHP 3.0:
preg support is built in by default as of PHP 4.0, but not in 3.0. To use the preg functions in 3.0, you must load php3_pcre.dll: just add "extension = php3_pcre.dll" to the extension section of php.ini and restart PHP!
In fact, regular expressions are often used to implement UBB code, and many PHP forums use this approach (such as Zforum at zphp.com or vBulletin at vbulletin.com), but the code involved is fairly long.
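To give an idea of the approach, here is a deliberately tiny sketch, not code from any forum: the function name ubbToHtml() and the choice of supporting only [b] and [url=...] are assumptions, and it assumes PHP 7 or later. It escapes HTML first so user input cannot inject tags:

```php
<?php
function ubbToHtml(string $text): string
{
    // Escape <, >, &, and quotes before adding our own markup
    $text = htmlspecialchars($text, ENT_QUOTES);
    // [b]...[/b] -> <b>...</b>; the lazy ".+?" stops at the first closing tag
    $text = preg_replace('/\[b\](.+?)\[\/b\]/is', '<b>$1</b>', $text);
    // [url=http://...]label[/url] -> <a href="...">label</a> (http/https only)
    $text = preg_replace('/\[url=(https?:\/\/[^\]\s]+)\](.+?)\[\/url\]/is',
        '<a href="$1">$2</a>', $text);
    return $text;
}

echo ubbToHtml('[b]bold[/b] and [url=http://example.com]a link[/url]'), "\n";
// <b>bold</b> and <a href="http://example.com">a link</a>
```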
