Basic syntax summary of PHP regular expressions

Source: Internet
Author: User

First, let's look at two special characters, '^' and '$'. They match the start and the end of a string, respectively. For example:

"^the": matches a string that starts with "the";
"of despair$": matches a string that ends with "of despair";
"^abc$": matches a string that both starts and ends with "abc", which can only be "abc" itself;
"notice": matches a string that contains "notice".

As the last example shows, if you use neither of these two characters, the pattern (the regular expression) may occur anywhere in the string being tested; it is not anchored to either end.
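
The four patterns above can be tried out with a short script. This is a minimal sketch that assumes a current PHP version, so it uses preg_match() (which needs "/" delimiters around the pattern) instead of the older ereg(); the helper name containsMatch is made up for illustration.

```php
<?php
// Hypothetical helper: wraps a bare pattern in "/" delimiters and reports
// whether it matches anywhere in $subject.
function containsMatch(string $pattern, string $subject): bool
{
    return preg_match('/' . $pattern . '/', $subject) === 1;
}

var_dump(containsMatch('^the', 'the end'));               // bool(true)
var_dump(containsMatch('^the', 'in the end'));            // bool(false)
var_dump(containsMatch('of despair$', 'pit of despair')); // bool(true)
var_dump(containsMatch('^abc$', 'abcabc'));               // bool(false)
var_dump(containsMatch('notice', 'take notice of it'));   // bool(true)
```
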
There are also the characters '*', '+', and '?', which specify how many times the preceding character may occur. They mean "zero or more", "one or more", and "zero or one", respectively. Here are some examples:

"ab*": matches an "a" followed by zero or more "b"s ("a", "ab", "abbb", etc.);
"ab+": same as above, but requires at least one "b" ("ab", "abbb", etc.);
"ab?": matches an "a" followed by zero or one "b";
"a?b+$": matches a string that ends with zero or one "a" followed by one or more "b"s.

You can also limit the number of occurrences with a range in curly braces, for example:

"ab{2}": matches an "a" followed by exactly two "b"s ("abb");
"ab{2,}": at least two "b"s ("abb", "abbbb", etc.);
"ab{3,5}": three to five "b"s ("abbb", "abbbb", or "abbbbb").

Note that you must always specify the first number of a range (i.e., "{0,2}", not "{,2}"). Also note that '*', '+', and '?' are equivalent to the ranges "{0,}", "{1,}", and "{0,1}", respectively.
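
The quantifiers above can be checked against whole strings. The following is a sketch only: it uses preg_match() instead of ereg(), and matchesWhole is a hypothetical helper that anchors the pattern at both ends so that only a complete match counts.

```php
<?php
// Hypothetical helper: true only if the whole of $subject matches $pattern.
function matchesWhole(string $pattern, string $subject): bool
{
    return preg_match('/^(?:' . $pattern . ')$/', $subject) === 1;
}

var_dump(matchesWhole('ab*', 'a'));           // bool(true): zero "b"s allowed
var_dump(matchesWhole('ab+', 'a'));           // bool(false): needs one "b"
var_dump(matchesWhole('ab?', 'abb'));         // bool(false): at most one "b"
var_dump(matchesWhole('ab{2}', 'abb'));       // bool(true)
var_dump(matchesWhole('ab{2,}', 'ab'));       // bool(false)
var_dump(matchesWhole('ab{3,5}', 'abbbb'));   // bool(true): four "b"s
var_dump(matchesWhole('ab{3,5}', 'abbbbbb')); // bool(false): six "b"s
// '*' and '{0,}' give the same answer:
var_dump(matchesWhole('ab{0,}', 'abbb') === matchesWhole('ab*', 'abbb')); // bool(true)
```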

To quantify a sequence of characters, put it in parentheses, for example:

"a(bc)*": matches an "a" followed by zero or more copies of "bc";
"a(bc){1,5}": matches an "a" followed by one to five copies of "bc".

There is also the character '|', which works as an OR operator:

"hi|hello": matches a string that contains "hi" or "hello";
"(b|cd)ef": matches a string that contains "bef" or "cdef";
"(a|b)*c": matches a string that contains any number (including zero) of "a"s or "b"s followed by a "c".
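
To see which part of a string the grouping and OR patterns actually pick out, the matched text can be printed. A small sketch, assuming preg_match() with its optional third argument (the variable names are arbitrary):

```php
<?php
// Index 0 of the result array is the whole match; index 1 is the text
// captured by the first pair of parentheses.
preg_match('/a(bc)*/', 'abcbcd', $repeat);
echo $repeat[0], "\n"; // abcbc

preg_match('/(b|cd)ef/', 'abcdef', $either);
echo $either[0], "\n"; // cdef

preg_match('/(a|b)*c/', 'xxabbac', $mixed);
echo $mixed[0], "\n";  // abbac

var_dump(preg_match('/hi|hello/', 'say hello')); // int(1)
```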

A period ('.') stands for any single character:

"a.[0-9]": matches an "a" followed by any one character and a digit (strictly, a string containing such a sequence; this qualification is omitted from here on);
"^.{3}$": matches a string of exactly three characters.

A bracket expression matches exactly one character out of those listed inside the brackets:

"[ab]": matches a single "a" or "b" (the same as "a|b");
"[a-d]": matches a single character from "a" to "d" (the same as "a|b|c|d" and "[abcd]");
"^[a-zA-Z]": matches a string that begins with a letter;
"[0-9]%": matches a string that contains a single digit followed by a percent sign;
",[a-zA-Z0-9]$": matches a string that ends with a comma followed by a letter or digit.


You can also list the characters you do not want: just put a '^' as the first character inside the brackets (e.g., "%[^a-zA-Z]%" matches a string containing a non-letter character between two percent signs).
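
The bracket expressions above, including the negated form, behave as follows. This is an illustrative sketch using preg_match(), which returns 1 for a match and 0 for no match; the test strings are invented.

```php
<?php
var_dump(preg_match('/^[a-zA-Z]/', 'hello'));     // int(1): starts with a letter
var_dump(preg_match('/^[a-zA-Z]/', '9 lives'));   // int(0): starts with a digit
var_dump(preg_match('/[0-9]%/', 'up 5% today'));  // int(1)
var_dump(preg_match('/,[a-zA-Z0-9]$/', 'a,b'));   // int(1)
var_dump(preg_match('/%[^a-zA-Z]%/', '%7%'));     // int(1): non-letter between %
var_dump(preg_match('/%[^a-zA-Z]%/', '%a%'));     // int(0): letter between %
```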

To be taken literally, the characters "^.[$()|*+?{\" must be escaped with a backslash ('\'), since they have special meaning. On top of that, in PHP3 you must escape the backslash itself inside the PHP string, so, for instance, the regular expression "(\$|¥)[0-9]+" needs the function call ereg("(\\$|¥)[0-9]+", $str) (I don't know whether PHP4 behaves the same way).

Don't forget that bracket expressions are an exception to this rule: inside brackets, all special characters, including the backslash ('\'), lose their special meaning (e.g., "[*\+?{}.]" matches any one of the characters listed inside the brackets). This holds for the POSIX syntax used by ereg(); in the Perl-compatible preg functions the backslash still escapes inside brackets. And, as the regex man pages tell us: "To include a literal ']' in the list, make it the first character (following a possible '^'). To include a literal '-', make it the first or last character, or the second endpoint of a range" (i.e., in "[a-d-0-9]" the '-' in the middle is taken literally).

For completeness, I should mention collating sequences, character classes, and equivalence classes. I will not go into detail on them, since they are not needed in the rest of this article. You can find more information in the regex man pages.

How to build a pattern to match the amount of money entered

Well, now we're going to use what we've learned to do something useful: build a matching pattern to check whether the input information is a number that represents money. We think there are four ways to represent money: "10000.00" and "10,000.00", or no decimal parts, "10000" and "10,000". Now let's start building this matching pattern:

^[1-9][0-9]*$

This requires the value to begin with a digit other than 0. But it also means that a single "0" fails the test. Here is how to fix it:

^(0|[1-9][0-9]*)$

This reads "a single 0, or a number that does not start with 0". We can also allow a minus sign in front of the number:

^(0|-?[1-9][0-9]*)$

This means "0, or a number that does not start with 0 and may have a minus sign in front". OK, let's be less strict now and allow the number to start with 0. Let's also drop the minus sign, since we don't need it for money. We now extend the pattern to match the decimal part:

^[0-9]+(\.[0-9]+)?$

This implies that the string must begin with at least one digit. Note, however, that the pattern above does not match "10.", only "10" and "10.2". (Do you know why?)

^[0-9]+(\.[0-9]{2})?$

Here we specify that the decimal point must be followed by exactly two digits. If you think this is too strict, you can change it to:

^[0-9]+(\.[0-9]{1,2})?$

This allows one or two digits after the decimal point. Now we add commas (one every three digits) to improve readability:

^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{1,2})?$

Don't forget that the '+' can be replaced by '*' if you want to allow an empty string as input (why?). Also don't forget that the backslash '\' can cause errors inside PHP strings (a common mistake). Now that the string is validated, we can strip out all the commas with str_replace(",", "", $money), cast the result to double, and then do math with it.

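The validate-then-convert step can be sketched as below. This is an assumption-laden example, not the article's own code: it uses preg_match() instead of ereg(), parseMoney is a hypothetical helper, and because the final comma pattern on its own rejects "10000" (it allows at most three digits before the first comma), the sketch also tries the earlier comma-free pattern.

```php
<?php
// Hypothetical helper: returns the amount as a float, or null if $input
// is not in one of the accepted formats.
function parseMoney(string $input): ?float
{
    $plain   = '/^[0-9]+(\.[0-9]{1,2})?$/';
    $grouped = '/^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{1,2})?$/';
    if (preg_match($plain, $input) !== 1 && preg_match($grouped, $input) !== 1) {
        return null;
    }
    // Validation passed, so stripping commas leaves a plain decimal number.
    return (float) str_replace(',', '', $input);
}

var_dump(parseMoney('10,000.00')); // float(10000)
var_dump(parseMoney('10000'));     // float(10000)
var_dump(parseMoney('10.'));       // NULL: a digit must follow the point
var_dump(parseMoney('1,00'));      // NULL: a comma needs three digits after it
```
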
Construct regular expressions to check email

OK, let's move on to validating an email address. A complete email address has three parts: the POP3 user name (everything to the left of the '@'), the '@', and the server name (the rest). The user name may contain uppercase and lowercase letters, digits, periods ('.'), minus signs ('-'), and underscores ('_'). The server name follows the same rules, except that underscores are not allowed.

Now, a user name cannot start or end with a period. The same is true for the server name. Nor can there be two consecutive periods; there must be at least one other character between them. Let's see how to write a pattern for the user name:

^[_a-zA-Z0-9-]+$

This does not allow periods yet. Let's add them:

^[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*$

This means: "at least one valid character (anything except a period), followed by zero or more groups that each consist of a period and at least one valid character."

To simplify, we can use eregi() instead of ereg(). eregi() is case-insensitive, so we don't need to specify both ranges "a-z" and "A-Z"; one is enough:

^[_a-z0-9-]+(\.[_a-z0-9-]+)*$

The server name is the same, but without the underscore:

^[a-z0-9-]+(\.[a-z0-9-]+)*$

Done. Now just join the two parts with "@":

^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$

This is the complete email validation pattern. All we need is to call

eregi('^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$', $email)

to find out whether we have a valid email address.

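On PHP versions where eregi() is not available (the ereg functions were removed in PHP 7.0), the same pattern can be used with preg_match() and the "i" modifier. The sketch below assumes that setup; looksLikeEmail is a hypothetical helper name and the sample addresses are invented.

```php
<?php
// Hypothetical helper; the "i" modifier plays the role of eregi().
function looksLikeEmail(string $email): bool
{
    $pattern = '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$/i';
    return preg_match($pattern, $email) === 1;
}

var_dump(looksLikeEmail('John.Doe@Example.com'));  // bool(true)
var_dump(looksLikeEmail('.john@example.com'));     // bool(false): leading period
var_dump(looksLikeEmail('john..doe@example.com')); // bool(false): double period
var_dump(looksLikeEmail('john@example.com.'));     // bool(false): trailing period
```
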
Other uses of regular expressions

Extract string

ereg() and eregi() have a feature that lets us extract parts of a string with a regular expression (see the manual). For example, say we want to extract the file name from a path or URL. The following code does that:

ereg("([^\\/]*)$", $pathOrUrl, $regs);
echo $regs[1];
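
A comparable extraction with preg_match() might look like the sketch below. The helper name fileNameOf and the sample paths are assumptions for illustration; "#" is used as the delimiter so that "/" needs no escaping.

```php
<?php
// Hypothetical helper: returns everything after the last "/" or "\".
function fileNameOf(string $pathOrUrl): string
{
    // In this single-quoted string, \\\\ reaches the regex engine as \\,
    // i.e. one literal backslash inside the bracket expression.
    preg_match('#([^\\\\/]*)$#', $pathOrUrl, $regs);
    return $regs[1];
}

echo fileNameOf('http://example.com/docs/file.txt'), "\n"; // file.txt
echo fileNameOf('C:\\temp\\notes.md'), "\n";               // notes.md
var_dump(fileNameOf('/var/www/'));                         // string(0) ""
```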

Advanced replacement

ereg_replace() and eregi_replace() are also very useful. Say we want to replace every run of separator characters with a comma:

ereg_replace("[ \n\r\t]+", ",", trim($str));
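
The same replacement can be written with preg_replace(); a minimal sketch with an invented input string:

```php
<?php
$str = "  alpha beta\tgamma\r\n\r\ndelta  ";

// trim() removes the leading and trailing whitespace first; each remaining
// run of spaces, tabs, and line breaks then becomes a single comma.
$csv = preg_replace('/[ \n\r\t]+/', ',', trim($str));
echo $csv, "\n"; // alpha,beta,gamma,delta
```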

PHP is widely used for back-end CGI development on the web. A script typically takes data from the user and produces some result, but if the user's input is wrong there will be problems, for example a birthday entered as "February 30". How can we check that such input is valid? PHP includes regular expression support so that we can match data conveniently.

2 What is a regular expression:

In short, a regular expression is a powerful tool for pattern matching and replacement. Regular expressions can be found in almost every UNIX/Linux-based software tool, including scripting languages such as Perl and PHP. JavaScript, a client-side scripting language, also supports them. Regular expressions have become a general concept and tool used by technical people of all kinds.

A Linux website has this saying: "If you ask a Linux enthusiast what he likes most, he may answer regular expressions; if you ask him what he fears most, then apart from tedious installation and configuration he will certainly say regular expressions."

As the saying suggests, regular expressions look complicated and intimidating, and most PHP beginners skip this topic and move on. That is a pity, because regular expressions in PHP can find strings that match a pattern, check whether a string meets a condition, and replace matching strings with a specified string.

3 Basic syntax of regular expressions:

A regular expression has three parts: delimiters, the expression, and modifiers.

The delimiter can be any character that is not alphanumeric, a backslash, or whitespace (for example "/" or "!"); the most common one is "/". The expression consists of special characters (described below) and literal strings; for example, "[a-z0-9_-]+@[a-z0-9_.-]+" matches a simple email string. Modifiers turn a feature or mode on or off. Here is an example of a complete regular expression:

/hello.+?hello/is

In the regular expression above, "/" is the delimiter, the text between the two "/" characters is the expression, and the string "is" after the second "/" is the modifiers.

If the expression contains the delimiter, it must be escaped with the escape character "\", as in "/hello.+?\/hello/is". The escape character is used not only for delimiters but also to form special characters: every special character made of a letter needs a leading "\", such as "\d", which stands for any digit.
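
The following sketch shows delimiter escaping in practice, with an invented subject string. It also uses preg_quote(), a standard PHP function not mentioned in the article, which escapes special characters in text that should be matched literally.

```php
<?php
$subject = "Hello\nworld /HELLO";

// The delimiter "/" inside the expression must be escaped...
var_dump(preg_match('/hello.+?\/hello/is', $subject)); // int(1)
// ...or a different delimiter can be chosen so that no escaping is needed.
var_dump(preg_match('#hello.+?/hello#is', $subject));  // int(1)

// preg_quote() escapes special characters, plus the delimiter passed as
// its second argument.
$literal = preg_quote('a/b.c', '/');
echo $literal, "\n";                                   // a\/b\.c
var_dump(preg_match('/' . $literal . '/', 'xa/b.cx')); // int(1)
var_dump(preg_match('/' . $literal . '/', 'xa/bzcx')); // int(0)
```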


4 Special characters in regular expressions:

Special characters in regular expressions include metacharacters, position characters, and so on.

A metacharacter is a special character that describes how the character before it may occur in the subject string. A metacharacter is itself a single character, but metacharacters can be combined into larger ones.

Metacharacters

Braces: braces specify exactly how many times the preceding character may occur. For example, "/pre{1,5}/" matches "pre", "pree", "preeeee", and so on: strings in which 1 to 5 "e"s follow "pr". Likewise, "/pre{0,5}/" allows 0 to 5 "e"s after "pr".

Plus: "+" matches the preceding character one or more times. For example, "/ac+/" matches "act", "account", "acccc", and so on: strings in which one or more "c"s follow "a". "+" is equivalent to "{1,}".

Asterisk: "*" matches the preceding character zero or more times. For example, "/ac*/" matches "app", "acp", "accp", and so on: strings in which zero or more "c"s follow "a". "*" is equivalent to "{0,}".

Question mark: "?" matches the preceding character zero or one time. For example, "/ac?/" matches "a", "acp", "acwp": strings in which zero or one "c" follows "a". "?" also plays a very important role in "greedy mode", described in section 5.

Brackets "[]" are two more very important special characters. They match any one of the characters listed inside them. For example, "/[az]/" matches a single "a" or "z"; if you change the expression to "/[a-z]/", it matches any single lowercase letter, such as "a", "b", and so on.

If "^" appears first inside "[]", the expression matches any character that is not listed, e.g., "/[^a-z]/" matches any character that is not a lowercase letter. Regular expressions also predefine several named classes for use inside "[]":

[:alpha:]: matches any letter
[:alnum:]: matches any letter or digit
[:digit:]: matches any digit
[:space:]: matches any whitespace character
[:upper:]: matches any uppercase letter
[:lower:]: matches any lowercase letter
[:punct:]: matches any punctuation character
[:xdigit:]: matches any hexadecimal digit

In addition, the following letters take on special meanings when preceded by the escape character "\":

\s: matches a single whitespace character.
\S: matches any single character that is not whitespace.
\d: matches a digit from 0 to 9, equivalent to "/[0-9]/".
\w: matches a letter, digit, or underscore, equivalent to "/[a-zA-Z0-9_]/".
\W: matches any character not matched by \w, equivalent to "/[^a-zA-Z0-9_]/".
\D: matches any character that is not a decimal digit.
.: matches any character except a newline; with the "s" modifier, "." matches any character at all.

These special characters make some otherwise cumbersome patterns easy to write. For example, "/\d0000/" matches a digit followed by four zeros, that is, integer strings from ten thousand up to one hundred thousand such as "10000" or "50000".
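
A few of these escape sequences in action. This is an illustrative sketch with invented strings; it also uses preg_match_all(), a standard PHP function not covered in the article, to collect every match.

```php
<?php
// Collect every run of digits in the subject.
preg_match_all('/\d+/', 'Order 66 shipped 2 items', $found);
print_r($found[0]); // the two runs of digits: "66" and "2"

// Collapse any run of whitespace into a single space.
$squeezed = preg_replace('/\s+/', ' ', "a \t\n b");
var_dump($squeezed);                          // string(3) "a b"

var_dump(preg_match('/^\w+$/', 'user_01'));   // int(1): only word characters
var_dump(preg_match('/\W/', 'user_01'));      // int(0): no non-word character
var_dump(preg_match('/\D/', '12a4'));         // int(1): contains a non-digit
var_dump(preg_match('/^\d0000$/', '50000'));  // int(1)
```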

Position characters:

Position characters are another very important kind of character in regular expressions. Their main job is to describe where the pattern must occur in the subject string.

^: the pattern must occur at the beginning of the subject (this differs from its meaning inside "[]")

$: the pattern must occur at the end of the subject

\b: a word boundary, that is, the position where a word starts or ends

"/^he/": matches a string that starts with "he", such as "hello", "height", etc.;

"/he$/": matches a string that ends with "he", such as "she";

"/\bhe/": matches "he" at the start of a word; at the very beginning of the subject it acts like ^;

"/he\b/": matches "he" at the end of a word; at the very end of the subject it acts like $;

"/^he$/": matches only the string "he".

Parentheses:

Besides grouping, parentheses "()" in a regular expression record the text they match so that it can be stored and read back later. For example:

/^([a-zA-Z0-9_-]+)@([a-zA-Z0-9_-]+)(\.[a-zA-Z0-9_-]+)$/

This records the user name of an email address and its server address (for addresses of the form username@server.com). To read a recorded string later, use "escape character + record number". For example, "\1" is what the first "([a-zA-Z0-9_-]+)" matched, "\2" is what the second "([a-zA-Z0-9_-]+)" matched, and "\3" is what the third group "(\.[a-zA-Z0-9_-]+)" matched. In a PHP double-quoted string, however, "\" is itself a special character that must be escaped, so write "\\1" in PHP code.
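
The recorded groups can be read in three places: the result array of preg_match(), the replacement string of preg_replace(), and the pattern itself. The sketch below assumes the third group is written as (\.[a-zA-Z0-9_-]+) so that addresses like username@server.com match; the variable names are arbitrary.

```php
<?php
$pattern = '/^([a-zA-Z0-9_-]+)@([a-zA-Z0-9_-]+)(\.[a-zA-Z0-9_-]+)$/';

// With preg_match(), the recorded substrings land in $m[1], $m[2], $m[3].
preg_match($pattern, 'username@server.com', $m);
echo $m[1], ' | ', $m[2], ' | ', $m[3], "\n"; // username | server | .com

// In a double-quoted replacement string the backslash is doubled: "\\1".
$spoken = preg_replace($pattern, "\\1 at \\2\\3", 'username@server.com');
echo $spoken, "\n"; // username at server.com

// Inside the pattern itself, \1 matches the same text a second time.
preg_match('/(\w)\1/', 'hello', $double);
echo $double[0], "\n"; // ll
```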

Other special symbols:
"|": the OR symbol works like OR in PHP, but it is a single "|", not PHP's double "||". It means either one string or the other; for example, "/abcd|dcba/" matches "abcd" or "dcba".


5 Greedy mode:

We mentioned earlier that the metacharacter "?" also plays an important role in "greedy mode". So what is greedy mode?

Suppose we want to match a string that starts with the letter "a" and ends with the letter "b", but the subject has many "b"s after the "a", such as "a bbbbbbbbbbbbbbbbb". Will the regular expression match up to the first "b" or the last "b"? In greedy mode it matches up to the last "b"; otherwise it stops at the first "b".

The following expression uses greedy mode, which is the default:
/a.+b/
The following expressions do not use greedy mode:
/a.+?b/
/a.+b/U
The second of these uses the modifier U, which is described in the next section.
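
Running the three expressions against the same subject shows which ones run to the last "b" and which stop at the first. A minimal sketch with an invented subject string:

```php
<?php
$subject = 'a1b2b3b';

preg_match('/a.+b/', $subject, $plain);    // no "?" and no U
preg_match('/a.+?b/', $subject, $lazy);    // "?" after the quantifier
preg_match('/a.+b/U', $subject, $flagged); // U modifier

echo $plain[0], "\n";   // a1b2b3b (runs to the last "b")
echo $lazy[0], "\n";    // a1b     (stops at the first "b")
echo $flagged[0], "\n"; // a1b     (stops at the first "b")
```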

6 Modifiers:
Modifiers change how a regular expression behaves so that it better fits your needs (note: modifiers are case-sensitive, so "e" is not the same as "E"). The modifiers are:

i: makes matching case-insensitive, so "a" and "A" are treated as the same.

m: by default "^" and "$" refer to the start and end of the whole subject string. With "m", they refer to each line of the string: "^" matches at the start of every line and "$" at the end of every line.

s: by default "." matches any character except a newline. With "s", "." matches any character, including a newline.

x: whitespace in the expression is ignored unless it is escaped.

e: only meaningful for replacement; it causes the replacement string to be evaluated as PHP code. (This modifier was deprecated in PHP 5.5 and removed in PHP 7.0; preg_replace_callback() is the replacement.)

A: the expression must match at the very start of the subject string. For example, "/a/A" matches "abcd".

D: roughly the opposite of "m": with this modifier "$" matches only at the absolute end of the string, whereas by default it also matches just before a final newline.

U: like the question mark, it switches off greedy mode, so quantifiers match as little as possible.
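
The effect of most of these modifiers can be seen by running the same expression with and without them. The sketch below uses invented strings; the modifier PHP calls D (dollar matches only at the very end) is the one described above as the opposite of "m", and "e" is left out because it is not available in current PHP versions.

```php
<?php
var_dump(preg_match('/abc/i', 'ABC'));               // int(1): case ignored

var_dump(preg_match('/^\w+$/', "one\ntwo"));         // int(0): not a single line
var_dump(preg_match_all('/^\w+$/m', "one\ntwo"));    // int(2): one match per line

var_dump(preg_match('/a.b/', "a\nb"));               // int(0): "." skips newline
var_dump(preg_match('/a.b/s', "a\nb"));              // int(1)

var_dump(preg_match('/a b c/x', 'abc'));             // int(1): spaces ignored

var_dump(preg_match('/a/A', 'abcd'));                // int(1): match at start
var_dump(preg_match('/b/A', 'abcd'));                // int(0): "b" is not first

var_dump(preg_match('/abc$/', "abc\n"));             // int(1): "$" before final newline
var_dump(preg_match('/abc$/D', "abc\n"));            // int(0): "$" only at very end
```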

7 PCRE regular expression functions:
PHP's Perl-compatible regular expression (PCRE) extension provides several functions for pattern matching, replacement, splitting, and so on:

1. preg_match:
Function format: int preg_match(string pattern, string subject, array [matches]);
This function searches subject for a match to the expression pattern. If [matches] is given, the whole match is stored in [matches][0], [matches][1] holds the first substring recorded by parentheses "()", [matches][2] the second, and so on. preg_match returns 1 if a match is found in subject and 0 otherwise.

2. preg_replace:
Function format: mixed preg_replace(mixed pattern, mixed replacement, mixed subject);
This function replaces every part of subject that matches the expression pattern with replacement. To reuse part of the match in the replacement, record it with "()" and refer to it in replacement as "\1".

3. preg_split:
Function format: array preg_split(string pattern, string subject, int [limit]);
This function works like split, except that split accepts only simple regular expressions while preg_split accepts full Perl-compatible ones. The third parameter, limit, sets the maximum number of pieces returned.

4. preg_grep:
Function format: array preg_grep(string pattern, array input);
This function works much like preg_match, but it tests every element of the array input and returns a new array containing the elements that match.
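
preg_split and preg_grep are the two functions above that have no example elsewhere in this article, so here is a brief sketch of both with invented input data:

```php
<?php
// Split on any run of whitespace and/or commas.
$parts = preg_split('/[\s,]+/', 'a, b  c,d');
print_r($parts);    // four elements: a, b, c, d

// With a limit of 2, the second piece holds the unsplit remainder.
$firstTwo = preg_split('/[\s,]+/', 'a, b  c,d', 2);
print_r($firstTwo); // two elements: "a" and "b  c,d"

// preg_grep() keeps the matching elements and preserves their keys.
$numeric = preg_grep('/^\d+$/', ['12', 'ab', '7']);
print_r($numeric);  // keys 0 and 2: "12" and "7"
```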

Here is an example that checks whether an email address is well formed:

<?php
// Returns 1 if $email looks like a valid address, 0 otherwise.
function emailIsRight($email) {
    if (preg_match("/^[_\.0-9a-z-]+@([0-9a-z][0-9a-z-]+\.)+[a-z]{2,3}$/", $email)) {
        return 1;
    }
    return 0;
}
if (emailIsRight('y10k@963.net')) echo 'correct <br>';
if (!emailIsRight('y10k@fffff')) echo 'incorrect <br>';
?>

The program above prints "correct <br>incorrect <br>".

8 Differences between PHP's Perl-compatible regular expressions and Perl/ereg regular expressions:

Although they are called "Perl-compatible regular expressions", PHP's differ from Perl's in some ways. For example, the modifier "g" means "match all" in Perl, but PHP does not support this modifier.

They also differ from the ereg family of functions. ereg is another set of regular expression functions provided by PHP, but it is much weaker than preg:

1. ereg neither needs nor allows delimiters and modifiers, so it is far less capable than preg.

2. About ".": in regular expressions the period generally matches any character except a newline, but in ereg "." matches any character, including a newline. If you want "." to match a newline in preg, add the "s" modifier.

3. ereg is greedy by default and this cannot be changed, which causes trouble for many replacements and matches.

4. Speed: many people may wonder whether preg's power comes at the cost of speed. Don't worry, preg is much faster than ereg. The author ran a test program:

Time test:

PHP code:

<?php
// Note: ereg_replace() exists only in PHP versions before 7.0.
// time() has one-second resolution, so the results are whole seconds.
echo "preg_replace used time:";
$start = time();
for ($i = 1; $i <= 100000; $i++) {
    $str = "ssssssssssssssssssssssssssss";
    preg_replace("/s/", "", $str);
}
$ended = time() - $start;
echo $ended;

echo "\nereg_replace used time:";
$start = time();
for ($i = 1; $i <= 100000; $i++) {
    $str = "ssssssssssssssssssssssssssss";
    ereg_replace("s", "", $str);
}
$ended = time() - $start;
echo $ended;

echo "\nstr_replace used time:";
$start = time();
for ($i = 1; $i <= 100000; $i++) {
    $str = "sssssssssssssssssssssssssssss";
    str_replace("s", "", $str);
}
$ended = time() - $start;
echo $ended;
?>

Results:
preg_replace used time:5
ereg_replace used time:15
str_replace used time:2

str_replace is the fastest because it does no pattern matching at all, and preg_replace is much faster than ereg_replace.

9 PHP 3.0 support for preg:
preg support is included by default in PHP 4.0, but not in 3.0. To use the preg functions in 3.0 you must load the php3_pcre.dll file: add "extension = php3_pcre.dll" to the extension section of php.ini and restart PHP.

In practice, regular expressions are often used to implement UBB code, and many PHP forums use this approach (such as zForum zphp.com or VB vbullent.com), but the actual code is rather long.
