Getting started with regular expressions

Last Update:2016-04-29 Source: Internet

Author: User

Regular expressions

Suppose you are looking for hi in an English novel; you can use the regular expression hi. Unfortunately, many words contain these two consecutive letters, such as him, history, and high, and the expression hi finds them too. To find exactly the word hi, use \bhi\b.

\b is a special code (metacharacter) in regular expressions that represents the beginning or end of a word, that is, a word boundary. Although English words are usually delimited by spaces, punctuation, or line breaks, \b does not match any of these delimiter characters; it matches only a position.

Example

\bhi\b.*\blucy\b matches the word hi, followed by any number of characters (other than line breaks), followed by the word lucy.
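The difference between hi and \bhi\b is easy to check by hand. The following is a minimal sketch using Python's re module; the sample sentence is made up for illustration.

```python
import re

# Made-up sample text for illustration.
text = "hi there, this is history; hi again, lucy said high five"

# Plain "hi" also matches inside other words such as "this" and "history".
plain = re.findall(r"hi", text)

# \bhi\b matches only the standalone word "hi".
whole_word = re.findall(r"\bhi\b", text)

# The word hi, then any characters (no line breaks), then the word lucy.
hi_then_lucy = re.search(r"\bhi\b.*\blucy\b", text)

print(len(plain), len(whole_word))
print(hi_then_lucy.group(0))
```

Raw strings (the r prefix) keep Python from interpreting \b as a backspace character before the pattern reaches the regex engine.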

0\d\d-\d\d\d\d\d\d\d\d can be shortened to 0\d{2}-\d{8}

It starts with 0, then two digits, then a hyphen "-", and finally 8 digits.
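As a quick check that the long and the short form behave the same, here is a small sketch in Python. The sample strings are invented, and re.fullmatch is used so the whole string has to fit the pattern.

```python
import re

long_form = r"0\d\d-\d\d\d\d\d\d\d\d"
short_form = r"0\d{2}-\d{8}"

# Invented samples: only the first one has the expected shape.
samples = ["010-12345678", "010-1234567", "110-12345678", "01012345678"]

results = {}
for s in samples:
    by_long = re.fullmatch(long_form, s) is not None
    by_short = re.fullmatch(short_form, s) is not None
    results[s] = (by_long, by_short)
    print(s, by_long, by_short)
```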

Regular Expression Test Tool

· RegexBuddy

· JavaScript Regular expression online test tool

\s matches any whitespace character, including spaces, tabs, line breaks, Chinese full-width spaces, and so on. \w matches a letter, digit, underscore, or Chinese character.

\ba\w*\b matches a word that begins with the letter a: first the start of a word (\b), then the letter a, then any number of letters or digits (\w*), and finally the end of the word (\b).

\d+ matches 1 or more consecutive digits. Here + is a metacharacter similar to *; the difference is that * matches any number of repetitions (possibly 0), while + matches 1 or more repetitions.

\b\w{6}\b matches a word of exactly 6 characters.
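These three patterns can be tried on a short sentence. This is only a sketch in Python with a made-up sample string.

```python
import re

# Made-up sample text for illustration.
text = "an apple and a banana cost 120 yen, almost 7 dollars"

# Words that begin with the letter a.
words_with_a = re.findall(r"\ba\w*\b", text)

# Runs of one or more digits.
numbers = re.findall(r"\d+", text)

# Words that are exactly six characters long.
six_letter_words = re.findall(r"\b\w{6}\b", text)

print(words_with_a)
print(numbers)
print(six_letter_words)
```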

. matches any character except a line break

\w matches a letter, digit, underscore, or Chinese character

\s matches any whitespace character

\d matches a digit

\b matches the beginning or end of a word

^ matches the start of the string

$ matches the end of the string

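For example, if a website requires the QQ number you enter to be 5 to 12 digits, you can use: ^\d{5,12}$

{5,12} means the preceding item must repeat at least 5 and at most 12 times; otherwise there is no match. Because ^ and $ anchor the expression to the start and end of the string, the whole input must consist of 5 to 12 digits.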
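A form check along these lines might look like the following sketch in Python. The helper name is_valid_qq is hypothetical; only the pattern comes from the text.

```python
import re

# Pattern from the text; the helper name below is hypothetical.
QQ_PATTERN = re.compile(r"^\d{5,12}$")

def is_valid_qq(number: str) -> bool:
    """Return True if number is made up of 5 to 12 digits."""
    # Note: in Python, $ also matches just before a single trailing
    # newline; re.fullmatch(r"\d{5,12}", number) would rule that out.
    return QQ_PATTERN.match(number) is not None

for candidate in ["12345", "1234", "1234567890123", "12345abc"]:
    print(candidate, is_valid_qq(candidate))
```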

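If you want to find a metacharacter itself, such as . or *, a problem arises: you cannot specify it directly, because it is interpreted as something else. Use \ to cancel the special meaning: write \. and \*. To find \ itself, write \\.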
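Escaping can be sketched in Python as follows. The sample string is invented; re.escape is a standard library helper that adds the backslashes for you when the text to find comes from elsewhere.

```python
import re

# Invented sample containing a backslash, dots, and a star.
text = r"C:\notes.txt says 3*4 = 12."

# Unescaped . matches every character of a short string.
any_char = re.findall(r".", "a.b")

# Escaped forms match only the literal characters.
dots = re.findall(r"\.", text)
stars = re.findall(r"\*", text)
backslashes = re.findall(r"\\", text)

# re.escape builds a pattern that matches the given text literally.
literal = re.search(re.escape("3*4"), text)

print(any_char, dots, stars, backslashes, literal.group(0))
```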

All the quantifiers in regular expressions (codes that specify a repeat count, such as *, {5,12}, and so on):

* repeats 0 or more times

+ repeats 1 or more times

? repeats 0 or 1 time

{n} repeats n times

{n,} repeats n or more times

{n,m} repeats n to m times

Windows\d+ matches Windows followed by 1 or more digits

^\w+ matches the first word of a line (or the first word of the entire string, depending on the option settings)
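Both examples can be tried in Python. In this sketch the sample strings are made up, and the option that makes ^ match at the start of every line is Python's re.MULTILINE flag; other regex engines name the option differently.

```python
import re

# Made-up sample text for illustration.
text = "Windows98 and Windows2000, but not Windows alone"
versions = re.findall(r"Windows\d+", text)

lines = "first line\nsecond line"

# Without options, ^ matches only at the start of the whole string.
first_of_string = re.findall(r"^\w+", lines)

# With re.MULTILINE, ^ also matches at the start of each line.
first_of_each_line = re.findall(r"^\w+", lines, re.MULTILINE)

print(versions)
print(first_of_string)
print(first_of_each_line)
```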

Example

To match a set of characters that has no predefined metacharacter, such as the vowels (a, e, i, o, u), simply list them inside square brackets: [aeiou]

[.?!] matches a punctuation mark (. or ? or !)

[0-9] means exactly the same as \d: a single digit

Similarly, [a-z0-9A-Z_] is exactly the same as \w (if only English is considered)

The following is a more complex expression: \(?0\d{2}[) -]?\d{8}

Let's analyze it: first an escaped \( that can occur 0 or 1 times (?), then a 0, followed by 2 digits (\d{2}), then one of ), - or a space, which appears once or not at all (?), and finally 8 digits (\d{8}).

It matches phone numbers in several formats, such as (010)88886666, 022-22334455, or 02912345678.

But it also matches incorrectly formatted input such as 010)12345678.
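The phone number expression can be checked against both the intended and the unintended formats. A small sketch in Python, assuming the pattern is \(?0\d{2}[) -]?\d{8} and using re.fullmatch so the whole string must fit:

```python
import re

PHONE = re.compile(r"\(?0\d{2}[) -]?\d{8}")

# Intended formats.
well_formed = ["(010)88886666", "022-22334455", "02912345678"]

# Unbalanced parentheses that the pattern accepts anyway.
badly_formed = ["010)12345678", "(022-87654321"]

accepted = [s for s in well_formed + badly_formed if PHONE.fullmatch(s)]
print(accepted)
```

All five strings are accepted, because the opening parenthesis and the closing one are optional independently of each other.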

Branch conditions:

0\d{2}-\d{8}|0\d{3}-\d{7} matches two kinds of hyphen-separated phone numbers:

one has a 3-digit area code and an 8-digit local number (such as 010-12345678); the other has a 4-digit area code and a 7-digit local number (such as 0101-1234567).

\(0\d{2}\)[- ]?\d{8}|0\d{2}[- ]?\d{8} matches phone numbers with a 3-digit area code, where the area code may or may not be enclosed in parentheses, and the area code and local number may be separated by a hyphen, a space, or nothing.

\d{5}-\d{4}|\d{5} is used to match U.S. ZIP codes. A U.S. ZIP code is either 5 digits, or 9 digits with a hyphen after the fifth. This example is here because it illustrates a problem: when using branch conditions, pay attention to the order of the branches. If you change it to \d{5}|\d{5}-\d{4}, it will only match 5-digit ZIP codes (and the first 5 digits of 9-digit ZIP codes). The reason is that the branches are tested from left to right, and as soon as one branch is satisfied, the remaining branches are not tried.
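The effect of branch order shows up as soon as both versions are run on a 9-digit ZIP code. A minimal sketch in Python, with an invented ZIP code:

```python
import re

zip_code = "12345-6789"  # invented 9-digit ZIP code

# Longer branch first: the whole ZIP code is matched.
long_first = re.search(r"\d{5}-\d{4}|\d{5}", zip_code).group(0)

# Shorter branch first: it succeeds immediately, so the second
# branch is never tried and only the first 5 digits are matched.
short_first = re.search(r"\d{5}|\d{5}-\d{4}", zip_code).group(0)

print(long_first)
print(short_first)
```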

Grouping: we have already seen how to repeat a single character (just put a quantifier after it). To repeat several characters, use parentheses to specify a subexpression (also called a group); you can then specify a repeat count for that subexpression, and you can also perform other operations on it (described later).

(\d{1,3}\.){3}\d{1,3} is a simple expression for matching IP addresses. To understand it, read it in this order: \d{1,3} matches a number of 1 to 3 digits; (\d{1,3}\.){3} matches such a number followed by a period (together they form the group), repeated 3 times; and finally comes one more number of 1 to 3 digits (\d{1,3}).

No number in an IP address can be greater than 255. People often ask whether an address such as 01.02.03.04, with leading zeros, is a valid IP address. The answer is yes: the numbers in an IP address may contain leading zeros.

Unfortunately, the simple expression also matches 256.300.888.999, an IP address that cannot exist. If you could use arithmetic comparisons, this would be easy to solve, but regular expressions provide no arithmetic, so you have to describe a correct IP address with a lengthy combination of groups, alternation, and character classes: ((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?).

The key to understanding this expression is understanding 2[0-4]\d|25[0-5]|[01]?\d\d?. I will not elaborate here; you should be able to work out what it means.
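The two IP expressions can be compared side by side. The sketch below uses Python; the constant and function names are hypothetical, and the last function shows the arithmetic alternative mentioned in the text, done in ordinary code rather than inside the regular expression.

```python
import re

# Names are hypothetical; the patterns are the ones discussed in the text.
SIMPLE_IP = re.compile(r"(\d{1,3}\.){3}\d{1,3}")
OCTET = r"(2[0-4]\d|25[0-5]|[01]?\d\d?)"
STRICT_IP = re.compile(r"(" + OCTET + r"\.){3}" + OCTET)

def is_ip_by_arithmetic(s: str) -> bool:
    """Check the shape with the simple pattern, then compare numbers."""
    if SIMPLE_IP.fullmatch(s) is None:
        return False
    return all(int(part) <= 255 for part in s.split("."))

for address in ["192.168.0.1", "01.02.03.04", "256.300.888.999"]:
    print(
        address,
        SIMPLE_IP.fullmatch(address) is not None,
        STRICT_IP.fullmatch(address) is not None,
        is_ip_by_arithmetic(address),
    )
```

In each octet, 2[0-4]\d covers 200 to 249, 25[0-5] covers 250 to 255, and [01]?\d\d? covers everything from 0 to 199, with or without leading zeros.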

Negation

Sometimes you need to find characters that do not belong to an easily defined character class. For example, to find any character other than a digit, you need negation:

\W matches any character that is not a letter, digit, underscore, or Chinese character

\S matches any character that is not whitespace

\D matches any non-digit character

\B matches a position that is not the beginning or end of a word

[^x] matches any character except x

[^aeiou] matches any character except the letters a, e, i, o, u

Example

\S+ matches a string that contains no whitespace characters

<a[^>]+> matches a string enclosed in angle brackets that starts with a
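A few of the negated forms can be tried in Python. This is a sketch with invented sample strings; the tag pattern is assumed to be <a[^>]+>.

```python
import re

# \S+ : runs of non-whitespace characters.
tokens = re.findall(r"\S+", "a b\tc")

# [^aeiou] : any single character that is not one of these vowels.
non_vowels = re.findall(r"[^aeiou]", "audio")

# \D : remove every non-digit character.
digits_only = re.sub(r"\D", "", "tel: 010-1234")

# <a[^>]+> : an opening tag that starts with a, up to the first >.
html = 'see <a href="x.html">link</a> now'
tags = re.findall(r"<a[^>]+>", html)

print(tokens, non_vowels, digits_only, tags)
```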

Backreferences

After you specify a subexpression with parentheses, the text that matches it (that is, the content captured by the group) can be processed further in the expression or in other programs. By default, each group automatically gets a group number. The rule is: counting from left to right by each group's opening parenthesis, the first group is number 1, the second is number 2, and so on.

A backreference searches again for the text matched by an earlier group. For example, \1 stands for the text matched by group 1.

\b(\w+)\b\s+\1\b can be used to match repeated words, such as go go or kitty kitty. The expression starts with a word, that is, one or more letters or digits between the beginning and end of a word (\b(\w+)\b). The word is captured in group number 1, followed by 1 or more whitespace characters (\s+), and finally by the content captured in group 1, that is, the word just matched (\1).
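The repeated-word expression can be sketched in Python like this. The sample sentence is made up; re.finditer is used because re.findall would return only the captured group rather than the whole match.

```python
import re

REPEATED_WORD = re.compile(r"\b(\w+)\b\s+\1\b")

# Made-up sample text for illustration.
text = "go go, kitty kitty, this is fine"

# group(0) is the whole match, group(1) is the captured word.
whole_matches = [m.group(0) for m in REPEATED_WORD.finditer(text)]
captured_words = [m.group(1) for m in REPEATED_WORD.finditer(text)]

print(whole_matches)
print(captured_words)
```

Note that "this is" is not reported: the trailing "is" of "this" does not start at a word boundary, so it cannot be captured as a word of its own.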

You can also specify the group name of a subexpression yourself. To give a subexpression a group name, use this syntax: (?<Word>\w+) (or replace the angle brackets with single quotes: (?'Word'\w+)), so that the group name of \w+ is specified as Word. To backreference the content captured by this group, use \k<Word>, so the previous example can also be written like this: \b(?<Word>\w+)\b\s+\k<Word>\b.
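Named group syntax varies between regex engines. Python's re module, for instance, does not accept the angle-bracket form with \k; it spells the same thing as (?P<Word>...) for the group and (?P=Word) for the backreference. A minimal sketch with a made-up sentence:

```python
import re

# Python spelling of a named group and a named backreference.
REPEATED_WORD = re.compile(r"\b(?P<Word>\w+)\b\s+(?P=Word)\b")

match = REPEATED_WORD.search("it was a kitty kitty day")

whole = match.group(0)            # the entire repeated pair
captured = match.group("Word")    # the text captured by the named group

print(whole)
print(captured)
```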

Common grouping syntax

Capture:

(exp) matches exp and captures the text into an automatically numbered group

(?<name>exp) matches exp and captures the text into a group named name; can also be written (?'name'exp)

(?:exp) matches exp without capturing the matched text and without assigning a group number

Zero-width assertions:

(?=exp) matches the position before exp

(?<=exp) matches the position after exp

(?!exp) matches a position that is not followed by exp

(?<!exp) matches a position that is not preceded by exp

Comment:

(?#comment) has no effect on how the regular expression is processed; it provides a comment for people reading the expression

(?:exp) does not change how the regular expression is processed; the only difference is that the content matched by such a group is not captured the way it is for the first two forms, and the group is not given a group number. "Why would I want that?" Good question; what do you think?

Zero-width assertions

The next four forms are used to find things that come before or after certain content (without including that content). Like \b, ^ and $, they specify a position that must satisfy certain conditions (the assertions), so they are also known as zero-width assertions. They are best explained with examples:

(?=exp), also called a zero-width positive lookahead assertion, asserts that what follows its position matches the expression exp. For example, \b\w+(?=ing\b) matches the front part of words that end in ing (everything except the ing); when searching I'm singing while you're dancing., it matches sing and danc.

(?<=exp), also called a zero-width positive lookbehind assertion, asserts that what precedes its position matches the expression exp. For example, (?<=\bre)\w+\b matches the latter part of words that begin with re (everything except the re); when searching reading a book, it matches ading.

If you want to add a comma after every three digits of a long number (counting from the right, of course), you can look for the part that needs commas in front of it and inside it: ((?<=\d)\d{3})+\b, which matches 234567890 when searching 1234567890.

The following example uses both assertions: (?<=\s)\d+(?=\s) matches numbers delimited by whitespace (again, the whitespace characters themselves are not included).
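The lookahead and lookbehind examples can be reproduced in Python. This is a sketch; the last line is an assumption about how one might actually insert the commas, using a substitution at zero-width positions rather than the expression given in the text.

```python
import re

# Lookahead: the part of each word that comes before a final "ing".
stems = re.findall(r"\b\w+(?=ing\b)", "I'm singing while you're dancing.")

# Lookbehind: the part of each word that comes after a leading "re".
tails = re.findall(r"(?<=\bre)\w+\b", "reading a book")

# The digits that need commas in front of or between them.
needs_commas = re.search(r"((?<=\d)\d{3})+\b", "1234567890").group(0)

# Both assertions: numbers with whitespace on each side.
spaced_numbers = re.findall(r"(?<=\s)\d+(?=\s)", "a 12 b 345 c6 7")

# Assumed follow-up: insert a comma at every position that has a digit
# before it and a multiple of three digits after it.
with_commas = re.sub(r"(?<=\d)(?=(\d{3})+\b)", ",", "1234567890")

print(stems, tails, needs_commas, spaced_numbers, with_commas)
```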

Negative zero-width assertions:

We mentioned earlier how to find a character that is not a particular character, or not in a particular character class (negation). But what if we only want to make sure that a character does not appear, without matching anything in its place? For example, suppose we want to find words that contain the letter q where the q is not followed by the letter u. We can try this:

\b\w*q[^u]\w*\b matches a word containing the letter q that is not followed by the letter u. But if you test it more (or if you are sharp enough to see it directly), you will find that the expression goes wrong when q is at the end of the word, as in Iraq or Benq. This is because [^u] always matches one character, so if q is the last character of the word, [^u] matches the word delimiter after the q (a space, a period, or something else), and the following \w*\b then matches the next word. As a result, \b\w*q[^u]\w*\b matches the whole of Iraq fighting. A negative zero-width assertion solves this problem because it only matches a position and does not consume any characters. Now we can write: \b\w*q(?!u)\w*\b.

The zero-width negative lookahead assertion (?!exp) asserts that what follows this position does not match the expression exp. For example, \d{3}(?!\d) matches three digits that are not followed by another digit; \b((?!abc)\w)+\b matches words that do not contain the consecutive string abc.
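The difference between [^u] and (?!u) can be seen on the Iraq fighting example. A small sketch in Python with invented sample strings:

```python
import re

# [^u] consumes a character, so the match runs on into the next word.
too_much = re.search(r"\b\w*q[^u]\w*\b", "Iraq fighting").group(0)

# (?!u) only checks the position, so each word is matched on its own.
q_without_u = re.findall(r"\b\w*q(?!u)\w*\b", "Iraq fighting Benq quick")

# Three digits that are not followed by another digit.
three_digits = re.findall(r"\d{3}(?!\d)", "123 4567")

# Words that do not contain the consecutive string abc.
no_abc = [m.group(0) for m in re.finditer(r"\b((?!abc)\w)+\b", "hello xabcy world")]

print(too_much)
print(q_without_u)
print(three_digits)
print(no_abc)
```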

Similarly, we can use (?<!exp), the zero-width negative lookbehind assertion, to assert that what precedes this position does not match the expression exp.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.