Master regular expressions in half an hour

Source: Internet
Author: User
Regular expressions surely give many people a headache. Here I combine my own understanding with some articles from the Internet, in the hope of explaining them in a way the average person can follow, and to share my learning experience with you.

To begin with, we have to mention ^ and $. They match the start and the end of a string, respectively, as illustrated below:


"^the": the string must begin with "the";
"of despair$": the string must end with "of despair";

So:
"^abc$": a string that both begins and ends with "abc", which in practice only "abc" itself matches;
"notice": matches any string containing "notice".


As you can see, if you use neither of these two characters (as in the last example), the pattern (the regular expression) can appear anywhere in the string being tested; you have not anchored it to either end.
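The anchors are easy to try out. The sketch below uses Python's re module as a stand-in for the PHP functions discussed later (the anchor syntax is the same); the helper name matches() is made up for illustration.

```python
import re

def matches(pattern, text):
    """Hypothetical helper: True if the pattern is found anywhere in text."""
    return re.search(pattern, text) is not None

starts_with_the = matches(r"^the", "the end")                   # anchored at the start
ends_with_despair = matches(r"of despair$", "full of despair")  # anchored at the end
only_abc = matches(r"^abc$", "abc")                # the whole string is "abc"
abc_with_extra = matches(r"^abc$", "abcabc")       # extra characters break the match
anywhere = matches(r"notice", "take notice now")   # unanchored: may appear anywhere

print(starts_with_the, ends_with_despair, only_abc, abc_with_extra, anywhere)
# True True True False True
```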

Next come '*', '+', and '?'.
They indicate how many times a character may occur. Respectively they mean:
"zero or more", equivalent to {0,};
"one or more", equivalent to {1,};
"zero or one", equivalent to {0,1}. Here are some examples:


"ab*": synonymous with ab{0,}; matches a string with an "a" followed by zero or more "b"s ("a", "ab", "abbb", etc.);
"ab+": synonymous with ab{1,}; same as above, but at least one "b" must be present ("ab", "abbb", etc.);
"ab?": synonymous with ab{0,1}; there may be no "b" or exactly one;
"a?b+$": matches a string that ends with zero or one "a" followed by one or more "b"s.

The key point: '*', '+', and '?' apply only to the character immediately before them.


You can also limit the number of occurrences with curly braces, for example:


"ab{2}": the "a" must be followed by exactly two "b"s, no fewer ("abb");
"ab{2,}": the "a" must be followed by two or more "b"s (such as "abb", "abbbb", etc.);
"ab{3,5}": the "a" must be followed by three to five "b"s ("abbb", "abbbb", or "abbbbb").
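A quick way to check the quantifiers is to run a few candidate strings through each pattern. This is only a sketch in Python's re module, whose quantifier syntax matches the one described here; full() is a hypothetical helper that requires the whole string to match.

```python
import re

def full(pattern, text):
    """Hypothetical helper: True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

star = [s for s in ["a", "ab", "abbb"] if full(r"ab*", s)]        # zero or more b
plus = [s for s in ["a", "ab", "abbb"] if full(r"ab+", s)]        # at least one b
optional = [s for s in ["a", "ab", "abb"] if full(r"ab?", s)]     # zero or one b
exact = [s for s in ["ab", "abb", "abbb"] if full(r"ab{2}", s)]   # exactly two b
ranged = [s for s in ["abb", "abbb", "abbbbb", "abbbbbb"]
          if full(r"ab{3,5}", s)]                                 # three to five b

print(star)      # ['a', 'ab', 'abbb']
print(plus)      # ['ab', 'abbb']
print(optional)  # ['a', 'ab']
print(exact)     # ['abb']
print(ranged)    # ['abbb', 'abbbbb']
```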

Now we put several characters in parentheses, for example:

"a(bc)*": matches an "a" followed by zero or more "bc";
"a(bc){1,5}": an "a" followed by one to five "bc".


There is also the character '|', which works as an OR operator:


"hi|hello": matches a string containing "hi" or "hello";
"(b|cd)ef": matches a string containing "bef" or "cdef";
"(a|b)*c": matches a string containing any number (including zero) of "a"s or "b"s followed by a "c".
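Grouping with parentheses and alternation with '|' can be tried the same way. The sketch below is in Python's re module, which treats these two constructs as described; full() is a hypothetical helper that requires the whole string to match, so the results are stricter than "contains".

```python
import re

def full(pattern, text):
    """Hypothetical helper: True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

group_star = [s for s in ["a", "abc", "abcbc", "ab"] if full(r"a(bc)*", s)]
alternation = [s for s in ["hi", "hello", "hey"] if full(r"hi|hello", s)]
grouped_alt = [s for s in ["bef", "cdef", "bcdef", "ef"] if full(r"(b|cd)ef", s)]
mixed = [s for s in ["c", "abbac", "abd"] if full(r"(a|b)*c", s)]

print(group_star)   # ['a', 'abc', 'abcbc']
print(alternation)  # ['hi', 'hello']
print(grouped_alt)  # ['bef', 'cdef']
print(mixed)        # ['c', 'abbac']
```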


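A dot ('.') stands for any single character except "\n".

What if you want to match any single character, including "\n"?

One way is the pattern '(.|\n)'. Note that '[\n.]' does not do this: inside brackets the dot is a literal period, so that pattern matches only a newline or a period.


"a.[0-9]": an "a", followed by any one character, followed by a digit from 0 to 9;
"^.{3}$": a string of exactly three arbitrary characters.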


The content enclosed in square brackets matches exactly one character:


"[ab]": matches a single "a" or "b" (same as "a|b");
"[a-d]": matches a single character from 'a' to 'd' (same effect as "a|b|c|d" and "[abcd]"). We generally use [a-zA-Z] to specify a single English letter of either case;
"^[a-zA-Z]": matches a string that starts with a letter, upper or lower case;
"[0-9]%": matches a string containing a digit followed by a percent sign;
",[a-zA-Z0-9]$": matches a string that ends with a comma followed by a digit or a letter.


You can also list the characters you do not want inside the brackets; just put a '^' as the first character inside the brackets. For example, "%[^a-zA-Z]%" matches a string that contains a single non-letter character between two percent signs.

Important: when ^ is the first character inside the brackets, it excludes the characters listed in the brackets.
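The single-character matchers (the dot and the bracket expressions) can be checked with a short sketch. It is written with Python's re module as a stand-in, and found() is a hypothetical helper. One detail worth seeing in action: inside brackets the dot is a literal period, so to include "\n" the sketch uses the alternation (.|\n) instead.

```python
import re

def found(pattern, text):
    """Hypothetical helper: True if the pattern matches anywhere in text."""
    return re.search(pattern, text) is not None

# The dot matches any single character except a newline.
dot_plain = found(r"^a.c$", "abc")          # True
dot_newline = found(r"^a.c$", "a\nc")       # False
any_char = found(r"^a(.|\n)c$", "a\nc")     # True: the alternation adds "\n"
literal_dot = found(r"^a[\n.]c$", "abc")    # False: '.' in brackets is a period

# A bracket expression matches exactly one character from the listed set.
letter_start = found(r"^[a-zA-Z]", "Regex 101")   # True
digit_start = found(r"^[a-zA-Z]", "101 Regex")    # False
percent = found(r"[0-9]%", "up 5% today")         # True
comma_end = found(r",[a-zA-Z0-9]$", "a,b,c")      # True: ends with ",c"

# A leading ^ inside the brackets negates the set.
negated = found(r"%[^a-zA-Z]%", "100%5%")         # True: "%5%"
negated_letter = found(r"%[^a-zA-Z]%", "%a%")     # False: 'a' is a letter
```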

For PHP to treat the special characters literally (characters such as ^ . [ $ ( ) | * + ? { \), you have to put a backslash ('\') in front of them to escape them.

Don't forget that characters inside brackets are an exception to this rule: inside brackets, all special characters, including the backslash ('\'), lose their special meaning. "[*\+?{}.]" matches a string containing any one of these characters.

And, as the regex manual tells us: "If the list contains ']', it is best to make it the first character in the list (possibly following a '^'). If it contains '-', it is best to put it first or last, or as the second endpoint of a range": in [a-d-0-9] the '-' in the middle is taken literally.
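A small sketch of why escaping matters, again using Python's re module as a stand-in. One difference to keep in mind: in Python the backslash still escapes inside brackets, so the bracket example below lists the characters without one.

```python
import re

# Unescaped, '+' is a quantifier: the pattern means "one or more 1, then 1=2".
unescaped = re.search(r"1+1=2", "1+1=2") is not None

# Escaped, '\+' is a literal plus sign.
escaped = re.search(r"1\+1=2", "1+1=2") is not None

# Inside brackets these characters are ordinary and need no escaping.
in_brackets = re.findall(r"[*+?{}.]", "a*b+c?d.e")

print(unescaped)    # False
print(escaped)      # True
print(in_brackets)  # ['*', '+', '?', '.']
```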

Looking at the examples above, you should now understand {n,m}. Note that n and m cannot be negative integers, and that n must not be greater than m. The pattern then matches at least n times and at most m times. For example, "p{1,5}" matches at most five consecutive p's, so in the run of six p's in "pvpppppp" it takes the first five.
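To see how {n,m} splits a long run, here is a sketch using Python's re module (an assumption made for illustration; the brace syntax is the same). findall() lists every non-overlapping match from left to right.

```python
import re

# Each match takes as many p's as it can, but never more than five.
runs = re.findall(r"p{1,5}", "pvpppppp")
print(runs)  # ['p', 'ppppp', 'p']

# A range whose minimum exceeds its maximum is rejected when compiling.
try:
    re.compile(r"p{5,1}")
    bad_range_rejected = False
except re.error:
    bad_range_rejected = True
print(bad_range_rejected)  # True
```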

Next, let's talk about the syntax that begins with '\'.

\b matches a word boundary. For example, 've\b' matches the "ve" in "love" but not the "ve" in "very".

\B is exactly the opposite of \b above. I'm not going to give you an example.
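For readers who want to try both, here is a sketch in Python's re module, which supports \b and \B with the meaning described above. The word list is made up for illustration.

```python
import re

words = ["love", "very", "have"]

# \b: "ve" must be followed by a word boundary (here, the end of the word).
at_boundary = [w for w in words if re.search(r"ve\b", w)]

# \B: "ve" must NOT be followed by a word boundary.
not_at_boundary = [w for w in words if re.search(r"ve\B", w)]

print(at_boundary)      # ['love', 'have']
print(not_at_boundary)  # ['very']
```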

..... It suddenly occurred to me that .... you can go to http://www.phpv.net/article.php/251 to see other syntax that starts with '\'.

Okay, let's work through an application:

How to build a pattern to match an amount of money entered by the user

We want a pattern that checks whether the input is a number representing money. We consider four acceptable forms: "10000.00" and "10,000.00", or, without the decimal part, "10000" and "10,000". Now let's start building the pattern:

^[1-9][0-9]*$

This matches any number that does not begin with 0. But it also means that a single "0" fails the test. Here's how to fix it:

^(0|[1-9][0-9]*)$

"Either a lone 0 or a number that does not start with 0." We can also allow a minus sign before the number:

^(0|-?[1-9][0-9]*)$

This means: "0, or a number that does not begin with 0 and may have a minus sign in front of it." Now let's be less strict and allow a leading 0. Let's also drop the minus sign, because we don't need it when we're talking about money. We now extend the pattern to match the decimal part:

^[0-9]+(\.[0-9]+)?$

This means the matching string must begin with at least one digit. Note, however, that this pattern does not match "10."; only "10" and "10.2" pass. (Do you know why?)
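A short check of this pattern, written with Python's re module as a stand-in. The optional group requires a dot followed by at least one digit, so a dot on its own at the end leaves a character that nothing in the pattern can consume.

```python
import re

pattern = re.compile(r"^[0-9]+(\.[0-9]+)?$")

# Map each candidate string to whether the pattern accepts it.
results = {s: pattern.search(s) is not None for s in ["10", "10.2", "10.", ".5"]}
print(results)
# {'10': True, '10.2': True, '10.': False, '.5': False}
```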

^[0-9]+(\.[0-9]{2})?$

Here we require exactly two digits after the decimal point. If you think this is too strict, you can change it to:

^[0-9]+(\.[0-9]{1,2})?$

This allows one or two digits after the decimal point. Now we add commas (every three digits) for readability, like this:

^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{1,2})?$

Don't forget that '+' can be replaced by '*' if you want to allow an empty string as input (why?). Also don't forget that the backslash '\' can cause errors inside a PHP string (a common mistake).

Now that we can validate the string, we remove all the commas with str_replace(",", "", $money), then treat the value as a double so that we can do arithmetic with it.
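Putting validation and conversion together might look like the sketch below, written in Python rather than PHP. The names GROUPED, EITHER and parse_money() are made up for illustration. Taken literally, the comma pattern rejects an ungrouped "10000", so the sketch also shows one possible combination with the earlier pattern through '|'; that combination is an assumption about the intent, not something the text spells out.

```python
import re

# The final pattern from the text: comma-grouped digits, optional 1-2 decimals.
GROUPED = re.compile(r"^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{1,2})?$")

# Assumption: also accept ungrouped digits by adding an alternative with '|'.
EITHER = re.compile(r"^([0-9]+|[0-9]{1,3}(,[0-9]{3})*)(\.[0-9]{1,2})?$")

def parse_money(text, pattern=EITHER):
    """Hypothetical helper: return the amount as a float, or None if invalid."""
    if pattern.search(text) is None:
        return None
    # Same idea as str_replace(",", "", $money) followed by a cast to double.
    return float(text.replace(",", ""))

print(parse_money("10,000.00"))       # 10000.0
print(parse_money("10000"))           # 10000.0
print(parse_money("10000", GROUPED))  # None
print(parse_money("10,00.5"))         # None
```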

One more:

Constructing a regular expression to check email addresses

A full email address has three parts:
1. the user name (everything to the left of '@'),
2. the '@',
3. the server name (the rest of the address).

User names can contain upper- and lowercase letters, digits, periods ('.'), minus signs ('-'), and underscores ('_'). The server name follows the same rule, except that underscores are not allowed.

Now, a user name cannot start or end with a period. The same is true for the server name. Nor can there be two consecutive periods; there must be at least one character between them. Let's see how to write a matching pattern for the user name:

^[_a-zA-Z0-9-]+$

This does not yet allow a period. Let's add it:

^[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*$

This means: "at least one valid character (other than '.'), followed by zero or more groups that each start with a dot."

To simplify, we can use eregi() instead of ereg(). eregi() is case-insensitive, so we don't need to specify both ranges "a-z" and "A-Z"; one is enough:

^[_a-z0-9-]+(\.[_a-z0-9-]+)*$

The server name follows the same pattern, but without the underscore:

^[a-z0-9-]+(\.[a-z0-9-]+)*$

OK, now you just need to connect the two parts with "@":

^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$


This is the complete pattern for validating an email address. We only need to call

eregi('^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$', $email)

to find out whether $email is an email address.
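The same pattern can be tried outside PHP. The sketch below uses Python's re module with the IGNORECASE flag, which plays the role of eregi()'s case insensitivity here; is_email() and the sample addresses are made up for illustration.

```python
import re

EMAIL = re.compile(
    r"^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$",
    re.IGNORECASE,
)

def is_email(text):
    """Hypothetical helper: True if text fits the pattern built above."""
    return EMAIL.search(text) is not None

print(is_email("John.Doe@Example.com"))    # True
print(is_email(".john@example.com"))       # False: starts with a period
print(is_email("john..doe@example.com"))   # False: two consecutive periods
print(is_email("john_doe@ex_ample.com"))   # False: underscore in server name
```

Note that the pattern is only as strict as the rules listed above; for example, it accepts a server name without any period.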
Other uses of regular expressions

Extracting strings

ereg() and eregi() have a feature that lets you extract part of a string through a regular expression (see the manual). For example, to extract the file name from a path or URL, the following code is what you need:

ereg("([^\\/]*)$", $pathOrUrl, $regs);
echo $regs[1];
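For comparison, here is roughly the same extraction sketched with Python's re module, where the captured group is read with group(1) instead of $regs[1]. The helper name file_name() and the sample paths are hypothetical.

```python
import re

def file_name(path_or_url):
    """Hypothetical helper: the text after the last slash or backslash."""
    # The group collects every trailing character that is neither / nor \.
    match = re.search(r"([^\\/]*)$", path_or_url)
    return match.group(1)

print(file_name("/var/www/index.php"))               # index.php
print(file_name("http://example.com/img/logo.png"))  # logo.png
print(file_name(r"C:\temp\notes.txt"))               # notes.txt
```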

Advanced substitution

ereg_replace() and eregi_replace() are also useful. Suppose we want to replace every run of separator characters (newlines, carriage returns, tabs) with a comma:


ereg_replace("[\n\r\t]+", ",", trim($str));
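The same replacement can be sketched in Python, with re.sub() in place of ereg_replace() and str.strip() standing in for trim(). The helper name join_with_commas() is made up for illustration.

```python
import re

def join_with_commas(text):
    """Hypothetical helper: collapse runs of \\n, \\r and \\t into one comma."""
    # strip() removes leading and trailing whitespace first, like trim().
    return re.sub(r"[\n\r\t]+", ",", text.strip())

print(join_with_commas("one\ttwo\n\nthree\r\n"))  # one,two,three
```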

Finally, here is another regular expression for checking email addresses, left for you to read and analyze:

"^[-!#$%&\'*+\\./0-9=?A-Z^_`a-z{|}~]+'.'@'.'[-!#$%&\'*+\\/0-9=?A-Z^_`a-z{|}~]+\.'.'[-!#$%&\'*+\\./0-9=?A-Z^_`a-z{|}~]+$"

If you can read it easily, then this article has achieved its purpose.

Also: if you find any errors in the above article, please correct me. If you want to reprint it, please make sure to include a link to this page.


