Rules and interpretations of regular expressions

Last Update:2015-02-05 Source: Internet

Author: User

Regular expression rules

Character Description:

\: Marks the next character as a special character or as a literal. For example, "n" matches the character "n".

"\n" matches a line break. The sequence "\\" matches "\", and "\(" matches "(".

^: Matches the start position of the input.

$: Matches the end of the input.

*: Matches the preceding character zero or more times. For example, "zo*" matches "z" and "zoo".

+: Matches the preceding character one or more times. For example, "zo+" matches "zoo", but does not match "z".

?: Matches the preceding character zero or one time. For example, "a?ve?" matches "ve" in "never".

.: Matches any single character other than the line break.

(pattern): Matches the pattern and remembers the match. The matched substring can be retrieved from the resulting Matches collection through Item [0] ... [n]. To match the parenthesis characters ( and ), use "\(" or "\)".

x|y: Matches x or y. For example, "z|food" matches "z" or "food". "(z|f)ood" matches "zood" or "food".

{n}: n is a non-negative integer. Matches exactly n times. For example, "o{2}" does not match the "o" in "Bob", but matches the first two o's in "foooood".

{n,}: n is a non-negative integer. Matches at least n times. For example, "o{2,}" does not match the "o" in "Bob", but matches all the o's in "foooood". "o{1,}" is equivalent to "o+". "o{0,}" is equivalent to "o*".

{n,m}: m and n are non-negative integers. Matches at least n times and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". "o{0,1}" is equivalent to "o?".

[xyz]: A character set. Matches any one of the characters in the brackets. For example, "[abc]" matches the "a" in "plain".

[^xyz]: A negative character set. Matches any character that is not in the brackets. For example, "[^abc]" matches the "p" in "plain".

[a-z]: A range of characters. Matches any character within the specified range. For example, "[a-z]" matches any lowercase alphabetic character from "a" to "z".

[^m-z]: A negative character range. Matches any character that is not within the specified range. For example, "[^m-z]" matches any character that is not between "m" and "z".

\b: Matches a word boundary, that is, the position between a word and a space. For example, "er\b" matches the "er" in "never", but does not match the "er" in "verb".

\B: Matches a non-word boundary. "ea*r\B" matches the "ear" in "never early".

\d: Matches a numeric character. Equivalent to [0-9].

\D: Matches a non-numeric character. Equivalent to [^0-9].

\f: Matches the form feed (page break) character.

\n: Matches the line break character.

\r: Matches the carriage return character.

\s: Matches any white-space character, including spaces, tabs, page breaks, and so on. Equivalent to "[ \f\n\r\t\v]".

\S: Matches any non-whitespace character. Equivalent to "[^ \f\n\r\t\v]".

\t: Matches the tab character.

\v: Matches the vertical tab character.

\w: Matches any word character, including the underscore. Equivalent to "[A-Za-z0-9_]".

\W: Matches any non-word character. Equivalent to "[^A-Za-z0-9_]".

\num: Matches num, where num is a positive integer. A reference back to a remembered match. For example, "(.)\1" matches two consecutive identical characters.

\n: Matches n, where n is an octal escape value. The octal escape value must be 1, 2, or 3 digits long. For example, "\11" and "\011" both match a tab character. "\0011" is equivalent to "\001" followed by "1". The octal escape value must not exceed 256. Otherwise, only the first two characters are considered part of the expression. Allows ASCII codes to be used in regular expressions.

\xn: Matches n, where n is a hexadecimal escape value. The hexadecimal escape value must be exactly two digits long. For example, "\x41" matches "A". "\x041" is equivalent to "\x04" followed by "1". Allows ASCII codes to be used in regular expressions.
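
A few of the table entries can be tried out directly. The sketch below assumes Python's re module, which agrees with the table for these particular constructs; the variable names are illustrative only and other engines may differ in detail.

```python
import re

# Quantifiers: "zo*" allows zero or more "o", "zo+" requires at least one.
star_accepts_z = re.fullmatch(r"zo*", "z") is not None
plus_accepts_z = re.fullmatch(r"zo+", "z") is not None

# {n,m}: "o{1,3}" takes at most three consecutive "o" characters.
first_run = re.search(r"o{1,3}", "fooooood").group()

# Backreference: "(.)\1" finds two consecutive identical characters.
doubled = re.search(r"(.)\1", "letter").group()

# Hexadecimal escape: "\x41" stands for the letter "A".
hex_matches_a = re.fullmatch(r"\x41", "A") is not None

# Negative character set: "[^abc]" finds the first character that is not a, b, or c.
first_non_abc = re.search(r"[^abc]", "plain").group()
```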

The RegularExpressionValidator control has two main properties for validation. ControlToValidate names the control whose value is validated, such as a text box, for example ControlToValidate="TextBox1". ValidationExpression contains the regular expression used for validation. With the rules above in hand, let's work through an example. Suppose we want to verify an e-mail address that the user enters. What kind of data counts as a legitimate e-mail address? An input such as [email protected] is acceptable, and so is [email protected], but inputs such as [email protected] @com.cn or @xxx.com.cn are not. So we conclude that a legitimate e-mail address should meet at least the following conditions:

1. It must contain one and only one "@" symbol.

2. The first character must not be "@" or ".".

3. The sequence "@." is not allowed to appear, nor is [email protected]

4. The last character must not be "@" or ".".
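
The conditions above can be restated as plain string checks before any regular expression is involved. This is only a sketch in Python: the function name is hypothetical, and condition 3 is read here as forbidding the sequence "@." (the second half of that condition is not legible in the text, so it is left out).

```python
def meets_basic_conditions(address: str) -> bool:
    """Check the minimal e-mail conditions listed in the text."""
    # Condition 1: exactly one "@".
    if address.count("@") != 1:
        return False
    # Condition 2: the first character is neither "@" nor ".".
    if address[0] in "@.":
        return False
    # Condition 3 (assumed reading): "@" must not be followed directly by ".".
    if "@." in address:
        return False
    # Condition 4: the last character is neither "@" nor ".".
    if address[-1] in "@.":
        return False
    return True
```

These checks are necessary rather than sufficient: a string can pass all of them and still not be a usable address.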

Based on the principles above and the syntax in the table, it is easy to arrive at the following template:

"^\w+((-\w+)|(\.\w+))*\@[A-Za-z0-9]+((\.|-)[A-Za-z0-9]+)*\.[A-Za-z0-9]+$"
Many people find regular expressions a headache. Today I will combine what I know with some articles from the web and try to explain them in terms that ordinary people can understand, and share what I have learned.

To begin with, we still have to talk about ^ and $. They match the start and the end of a string respectively, as the following examples illustrate:
"^the": the string must begin with "the";
"of despair$": the string must end with "of despair";
So
"^abc$": the string must both start and end with "abc", so in fact only "abc" itself matches;
"notice": matches any string that contains "notice".
As you can see, if you use neither of the two characters (as in the last example), the pattern can appear anywhere in the string being checked; you have not locked it to either end.
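
The anchored and unanchored cases can be compared side by side. The following is a small sketch using Python's re.search, which looks for the pattern anywhere in the string unless an anchor pins it down; the helper name `found` is hypothetical.

```python
import re

def found(pattern: str, text: str) -> bool:
    """Return True when the pattern matches somewhere in the text."""
    return re.search(pattern, text) is not None

# "^the" only succeeds at the very start of the string.
starts_ok = found(r"^the", "the end of the road")
starts_bad = found(r"^the", "at the end")

# "^abc$" pins both ends, so only "abc" itself passes.
exact_ok = found(r"^abc$", "abc")
exact_bad = found(r"^abc$", "abcabc")

# No anchors: the pattern may appear anywhere.
anywhere = found(r"notice", "did you notice it")
```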
Next come '*', '+', and '?'.
They indicate how many times a character may appear. They mean, respectively:
"zero or more", equivalent to {0,},
"one or more", equivalent to {1,},
"zero or one", equivalent to {0,1}. Here are some examples:
"ab*": same as "ab{0,}"; matches a string with an "a" followed by zero or more "b"s ("a", "ab", "abbb", etc.);
"ab+": same as "ab{1,}"; as above, but at least one "b" must be present ("ab", "abbb", etc.);
"ab?": same as "ab{0,1}"; there may be no "b" or exactly one;
"a?b+$": matches a string that ends with zero or one "a" followed by one or more "b"s.
Key point: '*', '+', and '?' apply only to the character immediately before them.
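
The point that a quantifier applies only to the character right before it is easy to check. This sketch assumes Python's re.fullmatch, which requires the whole string to match; `whole` is a hypothetical helper name.

```python
import re

def whole(pattern: str, text: str) -> bool:
    """Return True when the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# "*" repeats only the "b", never the "a" before it.
star_a = whole(r"ab*", "a")        # zero "b"s is allowed
star_abbb = whole(r"ab*", "abbb")
star_abab = whole(r"ab*", "abab")  # "ab" is not repeated as a unit

# "+" needs at least one "b"; "?" allows at most one.
plus_a = whole(r"ab+", "a")
opt_abb = whole(r"ab?", "abb")

# "a?b+$" used as a search: the string must end with an optional "a" and then "b"s.
ends_ok = re.search(r"a?b+$", "xxabbb") is not None
```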
You can also limit the number of repetitions with curly braces, for example:
"ab{2}": the "a" must be followed by exactly two "b"s, no fewer ("abb");
"ab{2,}": the "a" must be followed by two or more "b"s (such as "abb", "abbbb", etc.);
"ab{3,5}": the "a" must be followed by 3 to 5 "b"s ("abbb", "abbbb", or "abbbbb").
Now let's put a few characters into parentheses, for example:
"a(bc)*": matches "a" followed by zero or more "bc";
"a(bc){1,5}": "a" followed by one to five "bc".
There is also the character '|', which acts as an OR operation:
"hi|hello": matches a string containing "hi" or "hello";
"(b|cd)ef": matches a string containing "bef" or "cdef";
"(a|b)*c": matches a string containing any number (including zero) of "a"s or "b"s followed by a "c".
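
Braces, parentheses, and '|' combine naturally, so a few of the patterns above are exercised together here. This is a sketch with Python's re module; `whole` is a hypothetical helper that requires the entire string to match, which makes the repetition limits visible.

```python
import re

def whole(pattern: str, text: str) -> bool:
    """Return True when the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# Braces count repetitions of the preceding character only.
two_b = whole(r"ab{2}", "abb")
one_b = whole(r"ab{2}", "ab")
six_b = whole(r"ab{3,5}", "abbbbbb")   # six "b"s is one too many

# Parentheses make the quantifier apply to the whole group "bc".
group_zero = whole(r"a(bc)*", "a")
group_many = whole(r"a(bc)*", "abcbc")
group_need_one = whole(r"a(bc){1,5}", "a")

# "|" chooses between alternatives; parentheses limit its reach.
alt_cdef = whole(r"(b|cd)ef", "cdef")
alt_mixed = whole(r"(a|b)*c", "abbac")
```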
A dot ('.') stands for any single character except "\n".
What if you want to match any single character, including "\n"?
Use an alternation such as '(.|\n)'. A bracket expression such as '[\n.]' does not do this, because inside brackets the dot is just a literal period.
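
How the dot treats a line break depends on the engine, so it is worth testing. The sketch below uses Python's re module, where a dot inside brackets is a literal period; it therefore relies on an alternation, or on Python's own re.DOTALL flag, to let the wildcard cross a line break.

```python
import re

text = "a\nc"

# By default the dot does not match the line break.
plain_dot = re.fullmatch(r"a.c", text) is not None

# An alternation of "." and "\n" accepts any single character.
dot_or_newline = re.fullmatch(r"a(.|\n)c", text) is not None

# Inside brackets the dot is literal: "[\n.]" means "line break or period".
bracket_matches_b = re.fullmatch(r"a[\n.]c", "abc") is not None
bracket_matches_period = re.fullmatch(r"a[\n.]c", "a.c") is not None

# Python-specific alternative: the DOTALL flag makes "." match everything.
with_flag = re.fullmatch(r"a.c", text, re.DOTALL) is not None
```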
"a.[0-9]": an "a" followed by any one character and then a digit from 0 to 9;
"^.{3}$": a string of exactly three characters.
A bracket expression matches exactly one character:
"[ab]": matches a single "a" or "b" (the same as "a|b");
"[a-d]": matches a single character from "a" to "d" (the same as "a|b|c|d" and "[abcd]"). In general, we use [a-zA-Z] to specify one English letter of either case.
"^[a-zA-Z]": matches a string that begins with a letter;
"[0-9]%": matches a string containing a digit followed by a percent sign;
",[a-zA-Z0-9]$": matches a string that ends with a comma followed by a digit or a letter.
You can also list the characters you do not want inside the brackets; just use '^' as the first character. "%[^a-zA-Z]%" matches a string containing a non-letter character between two percent signs.
Important: when '^' is the first character inside the brackets, it excludes the characters listed in the brackets.
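
The bracket examples, including the negated form, can be checked as follows. This is a minimal sketch with Python's re.search; the helper name `found` is hypothetical.

```python
import re

def found(pattern: str, text: str) -> bool:
    """Return True when the pattern matches somewhere in the text."""
    return re.search(pattern, text) is not None

# A string that begins with a letter of either case.
starts_with_letter = found(r"^[a-zA-Z]", "Hello")
starts_with_digit = found(r"^[a-zA-Z]", "9 lives")

# A digit directly followed by a percent sign.
has_percentage = found(r"[0-9]%", "50% off")

# "^" as the first character inside brackets negates the set.
non_letter_between = found(r"%[^a-zA-Z]%", "100%5%")
letter_between = found(r"%[^a-zA-Z]%", "%a%")
```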
For PHP to interpret them correctly, you must put a backslash '\' in front of these special characters to escape them.
Do not forget that characters inside brackets are an exception to this rule: inside brackets, all special characters, including '\', lose their special meaning. "[*\+?{}.]" matches a string containing any of these characters.
Also, as the regex manual tells us: "If the list contains ']', it is best to make it the first character in the list (possibly following the '^'). If it contains '-', it is best to put it first or last, or as the second endpoint of a range; the '-' in the middle of [a-d-0-9] is valid."
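
The bracket rules can be tried in Python as well, with one caveat: Python's re module still treats a backslash inside brackets as an escape character, unlike the POSIX behaviour described above, so the backslash itself is not part of the set in this sketch. The variable names are illustrative.

```python
import re

# Metacharacters listed inside brackets are matched literally.
specials = re.compile(r"[*\+?{}.]")
has_special = specials.search("3*4") is not None
no_special = specials.search("abc") is not None

# "]" placed first in the list is taken as a literal character.
closing_bracket = re.fullmatch(r"[]a]", "]") is not None

# "-" placed last in the list is a literal hyphen, not a range operator.
with_hyphen = re.fullmatch(r"[a-d0-9-]+", "b-7") is not None
```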
After the examples above, you should understand {n,m}. Note that neither n nor m can be negative, and n must not be greater than m. The pattern then matches at least n times and at most m times. For example, "p{1,5}" never takes more than five consecutive p's: in "pvpppppp" it first matches the single leading "p", and then the first five of the six p's after the "v".
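
To see how the upper bound of {n,m} splits a long run, all matches can be listed in order. This is a small sketch using Python's re.findall, which returns every non-overlapping match from left to right.

```python
import re

# Each match takes as many "p"s as it can, but never more than five.
runs = re.findall(r"p{1,5}", "pvpppppp")
# runs == ["p", "ppppp", "p"]: the lone leading "p", then five of the
# six "p"s after the "v", then the one that is left over.
```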
Now for the sequences that start with a backslash.
\b: the book says it is used to match a word boundary, that is ... for example, 've\b' matches the "ve" in "love" but not the "ve" in "very".
\B is exactly the opposite of \b above. I am not going to give an example.
..... It suddenly occurred to me that .... you can go to http://www.phpv.net/article.php/251 to look at the other syntax that starts with \
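
Since no example is given for \B, here is a brief sketch of both sequences using Python's re module, which supports them with the same meaning; `found` is a hypothetical helper name.

```python
import re

def found(pattern: str, text: str) -> bool:
    """Return True when the pattern matches somewhere in the text."""
    return re.search(pattern, text) is not None

# \b: "ve" must sit at the end of a word.
boundary_love = found(r"ve\b", "love")
boundary_very = found(r"ve\b", "very")

# \B: "ve" must be followed by another word character.
inner_very = found(r"ve\B", "very")
inner_love = found(r"ve\B", "love")
```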
OK, let's work through an application:
How to build a pattern to match a currency amount
Let's build a matching pattern to check whether the input is a number that represents money. We assume there are four ways to write an amount: "10000.00" and "10,000.00", or, with no decimal part, "10000" and "10,000". Let's start building the pattern:
^[1-9][0-9]*$
This means the value must start with a digit other than 0. But it also means that a single "0" cannot pass the test. Here is the fix:
^(0|[1-9][0-9]*)$
"Only 0 and numbers that do not start with 0 match." We can also allow a minus sign before the number:
^(0|-?[1-9][0-9]*)$
This means: "0, or a number that does not start with 0 and may have a minus sign in front of it." Now let's be less strict and allow a leading 0. Let's also drop the minus sign, because we do not need it to represent money. Next we extend the pattern to match the fractional part:
^[0-9]+(\.[0-9]+)?$
This means the matched string must begin with at least one digit. Note that with this pattern "10." does not match; only "10" and "10.2" do. (Do you know why?)
^[0-9]+(\.[0-9]{2})?$
Here we require exactly two digits after the decimal point. If you think this is too strict, you can change it to:
^[0-9]+(\.[0-9]{1,2})?$
This allows one or two digits after the decimal point. Now we add commas for readability (one every three digits), so we can write:
^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{1,2})?$
Do not forget that '+' can be replaced by '*' if you want to allow empty strings (why?). Also do not forget that the backslash '\' can cause errors (very common ones) inside a PHP string.
Now that we can validate the string, we strip out all the commas with str_replace(",", "", $money), then treat the value as a double, and we can do arithmetic with it.
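
The whole currency walk-through, validation followed by comma removal and conversion, might look like this in Python. It is a sketch under assumptions: the function names are hypothetical, and accepting either the plain pattern or the comma-grouped pattern is a choice made here so that all four forms listed at the start are covered.

```python
import re

# The two final patterns from the text: digits without commas, and
# digits grouped in threes by commas, each with an optional fraction.
PLAIN = re.compile(r"^[0-9]+(\.[0-9]{1,2})?$")
GROUPED = re.compile(r"^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{1,2})?$")

def is_money(text: str) -> bool:
    """Return True when the text has one of the accepted amount forms."""
    # Assumption: either form is acceptable.
    return (PLAIN.fullmatch(text) is not None
            or GROUPED.fullmatch(text) is not None)

def parse_money(text: str) -> float:
    """Validate the text, drop the commas, and convert it to a float."""
    if not is_money(text):
        raise ValueError(f"not a currency amount: {text!r}")
    return float(text.replace(",", ""))
```

For instance, `parse_money("10,000.00")` gives 10000.0, while "10." and "1,00" are rejected because neither pattern accepts them.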
One more:
Constructing a regular expression to check e-mail
There are three parts in a full e-mail address:
1. The user name (everything to the left of '@'),
2. '@',
3. The name of the server (the remaining part).
The user name can contain uppercase and lowercase letters, Arabic numerals, a period ('.'), a minus sign ('-'), and an underscore ('_'). The server name follows the same rule, except that the underscore is not allowed.
Now, the user name cannot start or end with a period. The same is true for the server name. You also cannot have two consecutive periods; there must be at least one character between them. So let's see how to write a matching pattern for the user name:
^[_a-zA-Z0-9-]+$
This does not yet allow a period. Let's add it:
^[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*$
This means: "begins with at least one valid character (other than '.'), followed by zero or more groups that each start with a dot."
To make it simpler, we can replace ereg() with eregi(). eregi() is not case sensitive, so we do not need to specify both ranges "a-z" and "A-Z"; one is enough:
^[_a-z0-9-]+(\.[_a-z0-9-]+)*$
The server name is built the same way, but without the underscore:
^[a-z0-9-]+(\.[a-z0-9-]+)*$
All right, now just use "@" to connect the two parts:
^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$
This is the complete e-mail validation pattern. You only need to call
eregi('^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$', $email)
to find out whether the string is an e-mail address.
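
The same pattern can be exercised outside PHP. The sketch below assumes Python's re module, with the re.IGNORECASE flag standing in for the case-insensitive behaviour of eregi(); the function name is hypothetical.

```python
import re

# The pattern built in the text; IGNORECASE replaces the need for "A-Z".
EMAIL = re.compile(
    r"^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$",
    re.IGNORECASE,
)

def is_email(text: str) -> bool:
    """Return True when the text fits the pattern built in the article."""
    return EMAIL.match(text) is not None
```

Note what the pattern does and does not enforce: `is_email("John.Doe@Example.com")` is True, a leading period or two consecutive periods are rejected, but a server name without any dot, as in "john@example", is still accepted.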
Other uses of regular expressions
Extracting a string
ereg() and eregi() have a feature that lets the user extract part of a string with a regular expression (see the manual for the specific usage). For example, to extract the file name from a path or URL, the following code is what you need:
ereg("([^\\/]*)$", $pathOrUrl, $regs);
echo $regs[1];
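
The extraction idea carries over to other engines. Here is a sketch in Python, where the captured group is read from the match object instead of a $regs array; the function name and the sample paths are hypothetical.

```python
import re

# Capture everything after the last slash or backslash.
LAST_SEGMENT = re.compile(r"([^\\/]*)$")

def file_name(path_or_url: str) -> str:
    """Return the part of the text after the last "/" or "\\"."""
    # The group may be empty, so the pattern always finds a match.
    return LAST_SEGMENT.search(path_or_url).group(1)
```

A path that ends with a slash yields an empty string, because nothing follows the last separator.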
Advanced substitution
ereg_replace() and eregi_replace() are also useful. Suppose we want to replace every run of separator characters (line breaks, carriage returns, tabs) with a comma:
ereg_replace("[\n\r\t]+", ",", trim($str));
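
An equivalent substitution, sketched in Python: re.sub plays the role of ereg_replace() and str.strip() the role of trim(). The function name is hypothetical.

```python
import re

def separators_to_commas(text: str) -> str:
    """Replace each run of line breaks, carriage returns, or tabs with one comma."""
    # strip() removes leading and trailing whitespace first, like trim().
    return re.sub(r"[\n\r\t]+", ",", text.strip())
```

Because the character class is followed by '+', a sequence such as "\r\n" becomes a single comma rather than two.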
Finally, here is another regular expression for checking e-mail, left for you, the reader, to analyze.
"^[-!#$%&\'*+\\./0-9=?A-Z^_`a-z{|}~]+'.'@'.'[-!#$%&\'*+\\/0-9=?A-Z^_`a-z{|}~]+\.'.'[-!#$%&\'*+\\./0-9=?A-Z^_`a-z{|}~]+$"

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
