Master PHP Regular Expressions in Half an Hour

Presumably many people find regular expressions a headache. Today I am combining my own knowledge with some articles from the Internet, and I hope to explain things in a way that ordinary people can understand, so that I can share my learning experience with you.

To begin, we still have to talk about ^ and $. They match the start and the end of a string, respectively:

"^the": the string must begin with "the";
"of despair$": the string must end with "of despair";

"^abc$": the string must begin and end with "abc", so only "abc" itself matches;
"notice": matches any string that contains "notice".

As the last example shows, if you use neither of these two characters, the pattern (regular expression) may appear anywhere in the string being tested; you have not anchored it to either end.
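The anchoring rules above can be tried out with a short script. This is only a sketch: the ereg functions used later in this article were removed in PHP 7.0, so it relies on preg_match() instead, and the helper name matchesPattern is made up for illustration.

```php
<?php
// Hypothetical helper (not from the article): wraps a bare pattern in '/'
// delimiters and reports whether it matches anywhere in $subject.
function matchesPattern(string $pattern, string $subject): bool
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match('/' . $pattern . '/', $subject) === 1;
}

var_dump(matchesPattern('^the', 'the end'));               // bool(true)
var_dump(matchesPattern('^the', 'at the end'));            // bool(false)
var_dump(matchesPattern('of despair$', 'cry of despair')); // bool(true)
var_dump(matchesPattern('^abc$', 'abc'));                  // bool(true)
var_dump(matchesPattern('^abc$', 'abcabc'));               // bool(false)
var_dump(matchesPattern('notice', 'take notice of this')); // bool(true)
```

The helper only works for patterns that do not contain '/', because that character is used as the delimiter here.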

Next come '*', '+', and '?'.
They specify how many times a character or a sequence of characters may appear:
'*' means "zero or more", the same as {0,};
'+' means "one or more", the same as {1,};
'?' means "zero or one", the same as {0,1}.

Here are some examples:

"ab*": same as "ab{0,}"; matches an a followed by zero or more b's ("a", "ab", "abbb", etc.);
"ab+": same as "ab{1,}"; as above, but at least one b must be present ("ab", "abbb", etc.);
"ab?": same as "ab{0,1}"; matches an a followed by zero or one b;
"a?b+$": matches a string that ends with zero or one a followed by one or more b's.

Key point: '*', '+', and '?' apply only to the character immediately before them.

You can also put bounds inside curly braces to limit how many times a character may appear:

"ab{2}": an a followed by exactly two b's, no more and no fewer ("abb");
"ab{2,}": an a followed by two or more b's ("abb", "abbbb", etc.);
"ab{3,5}": an a followed by three to five b's ("abbb", "abbbb", or "abbbbb").
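To see where each quantifier draws the line, it helps to test whole strings rather than substrings. The sketch below assumes preg_match() in place of the ereg functions (removed in PHP 7.0); matchesWhole is a hypothetical helper that anchors the pattern at both ends.

```php
<?php
// Hypothetical helper (not from the article): true only when the entire
// subject matches the bare pattern, so each quantifier's limits are visible.
function matchesWhole(string $pattern, string $subject): bool
{
    // (?:...) groups without capturing; the D modifier makes $ match only
    // at the very end of the subject.
    return preg_match('/^(?:' . $pattern . ')$/D', $subject) === 1;
}

var_dump(matchesWhole('ab*', 'a'));           // bool(true): zero b's allowed
var_dump(matchesWhole('ab+', 'a'));           // bool(false): needs at least one b
var_dump(matchesWhole('ab?', 'abb'));         // bool(false): at most one b
var_dump(matchesWhole('ab{2}', 'abb'));       // bool(true): exactly two b's
var_dump(matchesWhole('ab{2,}', 'abbbb'));    // bool(true): two or more b's
var_dump(matchesWhole('ab{3,5}', 'abb'));     // bool(false): fewer than three b's
var_dump(matchesWhole('ab{3,5}', 'abbbbbb')); // bool(false): six b's is too many
```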

To quantify a sequence of characters, put it in parentheses:

"a(bc)*": matches an a followed by zero or more copies of "bc";
"a(bc){1,5}": matches an a followed by one to five copies of "bc".

There is also the character '|', which works as an OR operator:

"hi|hello": matches a string containing "hi" or "hello";
"(b|cd)ef": matches a string containing "bef" or "cdef";
"(a|b)*c": matches a string containing any number (including zero) of a's and b's followed by a c.

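The grouping and alternation patterns above can be explored by printing the part of the string that actually matched. This is a sketch using preg_match() (the ereg functions were removed in PHP 7.0); firstMatch is a hypothetical helper name.

```php
<?php
// Hypothetical helper (not from the article): returns the text of the first
// (leftmost) match of a delimited regex, or null when nothing matches.
function firstMatch(string $regex, string $subject): ?string
{
    return preg_match($regex, $subject, $m) === 1 ? $m[0] : null;
}

var_dump(firstMatch('/a(bc)*/', 'xabcbcd'));    // string(5) "abcbc"
var_dump(firstMatch('/a(bc){1,5}/', 'ab'));     // NULL: at least one "bc" is required
var_dump(firstMatch('/hi|hello/', 'say hello')); // string(5) "hello"
var_dump(firstMatch('/(b|cd)ef/', 'abcdefg'));  // string(4) "cdef"
var_dump(firstMatch('/(a|b)*c/', 'xxabbac'));   // string(5) "abbac"
```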
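A period ('.') stands for any single character, excluding "\n".

What if you want to match any single character, including "\n"?

The pattern '[\n.]' is often suggested for this, but inside brackets a period is just a literal period, so that pattern matches only a newline or a period. An alternation such as '(.|\n)' does the job.

"a.[0-9]": an a, followed by any one character, followed by a digit from 0 to 9;
"^.{3}$": a string of exactly three characters.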

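The behaviour of the period around newlines is easy to check. The sketch below assumes PCRE via preg_match() rather than the ereg functions (removed in PHP 7.0); isMatch is a hypothetical helper. Note that inside brackets a period is a literal period, so a bracket expression is not a way to match "any character".

```php
<?php
// Hypothetical helper (not from the article): true when the delimited regex
// matches somewhere in $subject.
function isMatch(string $regex, string $subject): bool
{
    return preg_match($regex, $subject) === 1;
}

var_dump(isMatch('/a.[0-9]/', 'xab7'));        // bool(true)
var_dump(isMatch('/^.{3}$/', 'abc'));          // bool(true)
var_dump(isMatch('/^a.c$/', "a\nc"));          // bool(false): '.' skips "\n"
var_dump(isMatch('/^a(.|\n)c$/', "a\nc"));     // bool(true): alternation adds "\n"
var_dump(isMatch('/^a.c$/s', "a\nc"));         // bool(true): PCRE's s modifier
var_dump(isMatch('/^a[\n.]c$/', 'abc'));       // bool(false): '.' is literal in brackets
```

The s modifier is specific to PCRE and is one more way to let the period match a newline.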
A bracket expression matches exactly one character from those listed inside it:

"[ab]": matches a single a or b (same as "a|b");
"[a-d]": matches a single character from 'a' to 'd' (same as "a|b|c|d" and "[abcd]").

We usually use [a-zA-Z] to specify a single English letter of either case:

"^[a-zA-Z]": matches a string that begins with a letter;
"[0-9]%": matches a string containing a digit followed by a percent sign;
",[a-zA-Z0-9]$": matches a string that ends with a comma followed by a digit or letter.

You can also list the characters you do not want: just put '^' as the first character inside the brackets. For example, "%[^a-zA-Z]%" matches a string containing a single non-letter character between two percent signs.

Key point: when '^' is the first character inside brackets, it excludes the characters listed in the brackets.
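The bracket expressions above translate directly into small checks. This is a sketch with preg_match() (the ereg functions were removed in PHP 7.0); the three function names are invented for illustration.

```php
<?php
// Hypothetical helpers (not from the article), one per bracket pattern.

// "^[a-zA-Z]": the string begins with a letter of either case.
function startsWithLetter(string $s): bool
{
    return preg_match('/^[a-zA-Z]/', $s) === 1;
}

// "[0-9]%": the string contains a digit followed by a percent sign.
function containsPercentage(string $s): bool
{
    return preg_match('/[0-9]%/', $s) === 1;
}

// "%[^a-zA-Z]%": one non-letter character sits between two percent signs.
function hasNonLetterBetweenPercents(string $s): bool
{
    return preg_match('/%[^a-zA-Z]%/', $s) === 1;
}

var_dump(startsWithLetter('Hello'));            // bool(true)
var_dump(startsWithLetter('9 lives'));          // bool(false)
var_dump(containsPercentage('50% off'));        // bool(true)
var_dump(hasNonLetterBetweenPercents('%5%'));   // bool(true)
var_dump(hasNonLetterBetweenPercents('%a%'));   // bool(false)
```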

To be taken literally, characters that have a special meaning in a pattern (such as ^ . [ $ ( ) | * + ? { and \) must be escaped with a backslash ('\'), and the backslash itself may also need escaping inside a PHP string.

Don't forget that bracket expressions are an exception to this rule: inside brackets, all special characters, including the backslash ('\'), lose their special meaning. For example, "[*\+?{}.]" matches a string containing any one of these characters.

And, as the regex manual tells us: "To include a literal ']' in the list, make it the first character (following a possible '^'). To include a literal '-', make it the first or last character, or the second endpoint of a range."

Having seen the examples above, you should understand {n,m}. Note that n and m must be non-negative integers and n must not be greater than m. The preceding item then matches at least n times and at most m times. For example, "p{1,5}" matches a run of one to five p's: in "pvpppppp" it first matches the single "p" before the "v", and it can take at most five of the six p's that follow in one match.
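Listing every match makes the upper bound of {n,m} visible. The sketch below uses preg_match_all() from PCRE (not the ereg functions, which were removed in PHP 7.0); allMatches is a hypothetical helper name.

```php
<?php
// Hypothetical helper (not from the article): returns every non-overlapping
// match of a delimited regex, scanning the subject from left to right.
function allMatches(string $regex, string $subject): array
{
    preg_match_all($regex, $subject, $m);
    return $m[0]; // index 0 holds the full-pattern matches
}

// The lone "p" before the "v" matches first; the run of six p's is then
// split into a match of five (the upper bound) and a match of one.
print_r(allMatches('/p{1,5}/', 'pvpppppp'));
```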

Now let's talk about the sequences that begin with '\'.

\b matches a word boundary. For example, 've\b' matches the "ve" in "love" but not the "ve" in "very".

\B is exactly the opposite of \b: it matches anywhere that is not a word boundary. I won't give an example here.

For other syntax that begins with '\', consult the reference documentation.
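The two boundary assertions can be compared side by side. This sketch assumes PCRE via preg_match() (the ereg functions were removed in PHP 7.0), and both function names are invented for illustration.

```php
<?php
// Hypothetical helpers (not from the article).

// 've\b': "ve" must be followed by a word boundary (end of a word).
function hasVeAtWordEnd(string $text): bool
{
    return preg_match('/ve\b/', $text) === 1;
}

// 've\B': "ve" must NOT be followed by a word boundary (the word goes on).
function hasVeInsideWord(string $text): bool
{
    return preg_match('/ve\B/', $text) === 1;
}

var_dump(hasVeAtWordEnd('love'));   // bool(true)
var_dump(hasVeAtWordEnd('very'));   // bool(false)
var_dump(hasVeInsideWord('very'));  // bool(true)
var_dump(hasVeInsideWord('love'));  // bool(false)
```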

OK, let's do an application: how to build a pattern that validates an amount of money.

We want a pattern that checks whether the input is a number representing money. We accept four ways of writing an amount: "10000.00" and "10,000.00", or, without the decimal part, "10000" and "10,000". Let's start building the pattern:


^[1-9][0-9]*$

This matches any number that begins with a digit other than 0. But it also means that a single "0" fails the test. Here is how to fix that:

^(0|[1-9][0-9]*)$

This matches either "0" or a number that does not begin with 0. We can also allow a minus sign before the number:

^(0|-?[1-9][0-9]*)$

This means: 0, or a number that does not begin with 0 and may have a minus sign in front of it. Now let's be less strict and allow a leading 0. Let's also drop the minus sign, since we don't need it for amounts of money. Next we extend the pattern to match the decimal part:

^[0-9]+(\.[0-9]+)?$

This means that the string must begin with at least one digit. But note that with this pattern "10." does not match; only "10" and "10.2" do. Do you know why?

^[0-9]+(\.[0-9]{2})?$

Here we require exactly two digits after the decimal point. If you think this is too strict, you can change it to:

^[0-9]+(\.[0-9]{1,2})?$

This allows one or two digits after the decimal point. Now we add commas (every three digits) to improve readability:

^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{1,2})?$

Don't forget that '+' can be replaced with '*' if you also want to accept an empty string. And don't forget that the backslash ('\') can cause errors inside a PHP string (a common mistake), so escape it where necessary.

Once the string has been validated, we remove all the commas with str_replace(",", "", $money), then cast the result to a double so that we can do arithmetic with it.
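Putting the validation and the clean-up together might look like the sketch below. It assumes preg_match() instead of the ereg functions (removed in PHP 7.0), and parseMoney is a hypothetical name. Because the comma pattern on its own rejects an ungrouped amount such as "10000.00", the sketch accepts a string that matches either the plain pattern or the comma pattern from the text.

```php
<?php
// Hypothetical helper (not from the article): validates a money string and
// returns it as a float, or null when the format is not accepted.
function parseMoney(string $input): ?float
{
    // Both patterns come from the walkthrough above; the D modifier makes
    // $ match only at the very end of the string.
    $plain   = '/^[0-9]+(\.[0-9]{1,2})?$/D';
    $grouped = '/^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{1,2})?$/D';

    if (preg_match($plain, $input) !== 1 && preg_match($grouped, $input) !== 1) {
        return null;
    }

    // Strip the thousands separators, then convert for arithmetic.
    return (float) str_replace(',', '', $input);
}

var_dump(parseMoney('10,000.00')); // float 10000
var_dump(parseMoney('10000'));     // float 10000
var_dump(parseMoney('10.'));       // NULL: a decimal point needs digits after it
var_dump(parseMoney('1,00'));      // NULL: commas must separate groups of three
```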

One more:

Construct regular expressions to check email

There are three parts in a full email address:

1. User name (everything to the left of the '@')
2. '@'
3. Server name (the rest)

User names can contain uppercase and lowercase letters, digits, periods ('.'), minus signs ('-'), and underscores ('_'). The server name follows the same rule, except that underscores are not allowed.

A user name cannot begin or end with a period, and the same goes for the server name. Two periods cannot be adjacent either; there must be at least one other character between them. Now let's see how to write a pattern for the user name:


^[_a-zA-Z0-9-]+$

This does not yet allow periods. We add them like this:

^[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*$

This means: start with at least one valid character (other than '.'), followed by zero or more groups, each consisting of a period and at least one valid character.

To simplify, we can use eregi() instead of ereg(). Since eregi() is case-insensitive, we do not need to specify both ranges "a-z" and "A-Z"; one of them is enough:

^[_a-z0-9-]+(\.[_a-z0-9-]+)*$

The server name follows the same pattern, but without the underscore:

^[a-z0-9-]+(\.[a-z0-9-]+)*$

Good. Now just use "@" to join the two parts:

^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$

This is the complete email validation pattern. We only need to call:

eregi("^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$", $email)

to find out whether $email is a valid address.
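On PHP 7.0 and later, where eregi() no longer exists, the same pattern can be run through preg_match() with the i (case-insensitive) modifier. This is a sketch under that assumption; looksLikeEmail is a hypothetical name, and the pattern is only the simplified one built above, not a full address validator.

```php
<?php
// Hypothetical helper (not from the article): applies the pattern built in
// the text. i = case-insensitive (replaces eregi), D = $ matches only at
// the very end of the string.
function looksLikeEmail(string $email): bool
{
    $pattern = '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$/iD';
    return preg_match($pattern, $email) === 1;
}

var_dump(looksLikeEmail('john.doe@example.com'));   // bool(true)
var_dump(looksLikeEmail('John_Doe@Example.COM'));   // bool(true)
var_dump(looksLikeEmail('.john@example.com'));      // bool(false): leading period
var_dump(looksLikeEmail('john..doe@example.com'));  // bool(false): adjacent periods
var_dump(looksLikeEmail('john@exa_mple.com'));      // bool(false): underscore in server name
```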

Other uses of regular expressions

Extracting strings

ereg() and eregi() have a feature that lets you extract part of a string with a regular expression (see the manual). For example, to extract the file name from a path or URL, the following code is what you need:

ereg("([^\\/]*)$", $PATHORURL, $regs);
echo $regs[1];
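A possible equivalent on PHP 7.0 and later, where ereg() has been removed, uses preg_match() and its third argument for the captured groups. This is a sketch; fileNameOf is a hypothetical name, and '#' is chosen as the delimiter so the slash does not need escaping.

```php
<?php
// Hypothetical helper (not from the article): returns everything after the
// last slash or backslash, or an empty string when the path ends with one.
function fileNameOf(string $pathOrUrl): string
{
    // In this single-quoted string, \\\\ reaches PCRE as \\ (one literal
    // backslash), so the class excludes both '\' and '/'.
    if (preg_match('#([^\\\\/]*)$#', $pathOrUrl, $regs) === 1) {
        return $regs[1]; // first parenthesised group
    }
    return '';
}

echo fileNameOf('/var/www/index.php'), "\n";         // index.php
echo fileNameOf('http://example.com/a/b.html'), "\n"; // b.html
echo fileNameOf('C:\\dir\\file.txt'), "\n";          // file.txt
```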

Advanced substitution

ereg_replace() and eregi_replace() are also useful. Suppose we want to replace every run of separator characters (newlines, carriage returns, tabs) with a comma:

ereg_replace("[\n\r\t]+", ",", trim($STR));
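With ereg_replace() removed in PHP 7.0, the same substitution can be sketched with preg_replace(). The function name commaSeparate is made up for illustration.

```php
<?php
// Hypothetical helper (not from the article): trims the text, then turns
// each run of newlines, carriage returns, and tabs into a single comma.
function commaSeparate(string $text): string
{
    // preg_replace() returns null on error; fall back to an empty string.
    return preg_replace('/[\n\r\t]+/', ',', trim($text)) ?? '';
}

echo commaSeparate("apple\n\nbanana\tcherry\r\n"), "\n"; // apple,banana,cherry
```

Calling trim() first keeps leading or trailing separators from producing a stray comma at either end.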

Finally, here is another regular expression for checking email addresses, left for you to analyze:

'^[-!#$%&\'*+\\./0-9=?A-Z^_`a-z{|}~]+'.'@'.'[-!#$%&\'*+\\/0-9=?A-Z^_`a-z{|}~]+\.'.'[-!#$%&\'*+\\./0-9=?A-Z^_`a-z{|}~]+$'

If you can read it easily, this article has achieved its purpose.
