[Review] Regular expressions

Source: Internet
Author: User
Tags: printable characters

My teacher spent a full lecture on regular expressions last semester. They are useful but hard to remember, so this post reviews the material.

Why study regular expressions? They are a text-matching tool that makes it easy to find and extract the information you need from a piece of content.

1. Getting Started

1.1 First, some examples of regular expressions:

\d+ (matches one or more digits; + means "one or more")

hi (matches the characters "hi" wherever they occur, whether as a separate word or inside another word) \bhi\b (\b marks the beginning or end of a word; it does not match the space between words, only a position)

.* (. matches any character except a line break, * means "zero or more", so .* matches any number of characters)

0\d\d-\d\d\d\d\d\d\d\d (a three-digit area code, a hyphen, and an eight-digit phone number). For a three-digit or four-digit area code: 0\d\d\d?-\d\d\d\d\d\d\d\d, where ? means "zero or one".

\d{2} (\d must occur exactly 2 times) \d{1,2} (\d occurs at least once and at most 2 times)

[0-9] (matches a single digit, the same as \d for ASCII text; [1-9] excludes 0)
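
The examples above can be tried out in a few lines. This is a minimal sketch that assumes Python's built-in re module (the original names no language); the sample strings are made up for illustration.

```python
import re

# \d+ : one or more digits
digits = re.findall(r"\d+", "room 12, floor 3")        # ['12', '3']

# Plain "hi" also matches inside other words; \bhi\b matches only the whole word.
loose = re.findall(r"hi", "hi, this is him")           # ['hi', 'hi', 'hi']
strict = re.findall(r"\bhi\b", "hi, this is him")      # ['hi']

# Three- or four-digit area code, a hyphen, then eight digits.
phone = re.compile(r"0\d\d\d?-\d{8}")
ok_three = phone.fullmatch("010-12345678") is not None   # True
ok_four = phone.fullmatch("0376-12345678") is not None   # True
bad = phone.fullmatch("10-12345678") is not None         # False, no leading 0
```

Using fullmatch here requires the whole string to fit the pattern, which is usually what a validity check wants.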

1.2 Testing regular expressions: http://tool.oschina.net/regex#

1.3 Special codes (metacharacters): for example ., \d, {}, and so on.

1.4 Non-printable characters

Character Meaning
\cx Matches the control character indicated by x. For example, \cM matches Control-M, the carriage return. The value of x must be in the range A-Z or a-z. Otherwise, c is treated as a literal 'c' character.
\f Matches a form feed (page break). Equivalent to \x0c and \cL.
\n Matches a line feed (line break). Equivalent to \x0a and \cJ.
\r Matches a carriage return. Equivalent to \x0d and \cM.
\s Matches any whitespace character, including spaces, tabs, form feeds, and so on. Equivalent to [ \f\n\r\t\v].
\S Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t Matches a tab. Equivalent to \x09 and \cI.
\v Matches a vertical tab. Equivalent to \x0b and \cK.
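
A short check of the equivalences in this table, again as a sketch assuming Python's re module. One caveat: Python's re does not support the \cx form at all, so the control characters are written with \r, \t, or \xhh instead. The sample text is invented.

```python
import re

text = "a\tb\nc d"

# \t and \x09 denote the same tab character.
same_tab = re.findall(r"\t", text) == re.findall(r"\x09", text) == ["\t"]

# \s matches any whitespace character; \S matches everything else.
whitespace = re.findall(r"\s", text)     # ['\t', '\n', ' ']
non_space = re.findall(r"\S", text)      # ['a', 'b', 'c', 'd']

# Control-M (carriage return) written as \x0d, since \cM is unavailable here.
carriage_returns = re.findall(r"\x0d", "line\r\n")   # ['\r']
```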

1.5 Quantifiers

Character Description
* Matches the preceding subexpression zero or more times. For example, zo* matches "z" and "zoo". * is equivalent to {0,}.
+ Matches the preceding subexpression one or more times. For example, zo+ matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
? Matches the preceding subexpression zero or one time. For example, do(es)? matches "do" and "does". ? is equivalent to {0,1}.
{n} n is a non-negative integer. Matches exactly n times. For example, o{2} cannot match the "o" in "Bob", but matches the two o's in "food".
{n,} n is a non-negative integer. Matches at least n times. For example, o{2,} cannot match the "o" in "Bob", but matches all the o's in "foooood". o{1,} is equivalent to o+, and o{0,} is equivalent to o*.
{n,m} m and n are non-negative integers with n <= m. Matches at least n times and at most m times. For example, o{1,3} matches the first three o's in "fooooood". o{0,1} is equivalent to o?. Note that there must be no space between the comma and the two numbers.
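
The table rows can be verified directly. The sketch below assumes Python's re module; the helper name full is hypothetical and only wraps re.fullmatch for readability.

```python
import re

def full(pattern, s):
    """Return True when the whole string s fits the pattern."""
    return re.fullmatch(pattern, s) is not None

star = [full(r"zo*", s) for s in ("z", "zo", "zoo")]          # [True, True, True]
plus = [full(r"zo+", s) for s in ("z", "zo", "zoo")]          # [False, True, True]
optional = [full(r"do(es)?", s) for s in ("do", "does", "doe")]  # [True, True, False]

# {n}, {n,} and {n,m} on the words from the table
exact_bob = re.search(r"o{2}", "Bob")                  # None, only one o
exact_food = re.search(r"o{2}", "food").group()        # 'oo'
at_least = re.search(r"o{2,}", "foooood").group()      # 'ooooo'
bounded = re.search(r"o{1,3}", "fooooood").group()     # 'ooo'
```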

Quantifiers cannot be applied to \b, because \b matches a position rather than a character, so repeating it has no meaning.

1.6 Character escapes: to search for one of the special codes itself, put a \ in front of it. For example, to find ( you should search for \(.
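
A small illustration of escaping, assuming Python's re module. The standard-library function re.escape adds the backslashes automatically, which helps when the text to search for comes from user input; the sample strings are invented.

```python
import re

# "\(" and "\)" match literal parentheses instead of opening a group.
calls = re.findall(r"\(\d+\)", "call f(1) then g(22)")    # ['(1)', '(22)']

needle = "1+1=2?"
haystack = "is 1+1=2? yes"

# Unescaped, + and ? act as quantifiers, so the literal text is not found.
unescaped_hit = re.search(needle, haystack) is not None            # False
# re.escape backslash-escapes the special characters for us.
escaped_hit = re.search(re.escape(needle), haystack) is not None   # True
```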

1.7 The roles of the three kinds of brackets

Because regular expression syntax is complex, I have summarized how I use parentheses, square brackets, and curly braces, for easier understanding.

Parentheses: 1. Grouping: parentheses delimit a subexpression, and each parenthesized subexpression becomes a numbered group by default (group 1, group 2, and so on). For example, to match repeated words: \b(\w+)\b\s+\1\b. Here \b(\w+)\b matches a whole word made of one or more letters or digits and captures it as group 1, and \s+\1\b matches one or more whitespace characters followed by the text that group 1 captured. To name a group, write (?<word>\w+), which gives the group \w+ the name word. Parentheses have a side effect: the text they match is cached (captured). Putting ?: at the start of the group removes this side effect. For example, in (?:\w+)|(\d+) the text matched by the first alternative is not cached.
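
The grouping features can be sketched as follows, assuming Python's re module. Note one difference from the syntax quoted above: Python spells a named group (?P<word>...) and its backreference (?P=word), not (?<word>...). The sample sentence is made up.

```python
import re

text = "this is is a test test of the the parser"

# Group 1 captures a word; \1 demands the same text again.
repeated = re.findall(r"\b(\w+)\b\s+\1\b", text)     # ['is', 'test', 'the']

# The same idea with a named group (Python spelling).
named = re.search(r"\b(?P<word>\w+)\s+(?P=word)\b", text)
first_repeat = named.group("word")                   # 'is'

# (?:...) groups without capturing, so it does not take a group number.
capturing = re.compile(r"(\d+)-(\d+)")
non_capturing = re.compile(r"(?:\d+)-(\d+)")
group_counts = (capturing.groups, non_capturing.groups)          # (2, 1)
number = non_capturing.search("010-12345678").group(1)           # '12345678'
```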

2. Position assertions: (?=exp), also called a zero-width positive lookahead assertion, matches a position in the text that is followed by text matching exp. For example, \b\w+(?=ing\b) matches the front part of a word that ends in ing (everything except the ing). When searching "I'm singing while you're dancing.", it matches "sing" and "danc".

(?<=exp), also called a zero-width positive lookbehind assertion, matches a position in the text that is preceded by text matching exp. For example, (?<=\bre)\w+\b matches the rest of a word that begins with re (everything except the re). When searching "reading a book", it matches "ading".

The zero-width negative lookahead assertion (?!exp) matches only a position that is not followed by text matching exp. For example, \d{3}(?!\d) matches three digits that are not followed by another digit.

Similarly, the zero-width negative lookbehind assertion (?<!exp) matches a position that is not preceded by text matching exp. For example, (?<![a-z])\d{7} matches seven digits that are not preceded by a lowercase letter. (If this gives unexpected results when you test it, check the case-sensitivity option of your testing tool.)
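
The four assertions can be checked with the sentences from the text. This is a sketch assuming Python's re module; the digit strings are invented. The last two lines illustrate the case-sensitivity remark: with case-insensitive matching, [a-z] also covers uppercase letters.

```python
import re

# Positive lookahead: the part of each word before a final "ing".
stems = re.findall(r"\b\w+(?=ing\b)", "I'm singing while you're dancing.")
# ['sing', 'danc']

# Positive lookbehind: the part of a word after a leading "re".
rest = re.findall(r"(?<=\bre)\w+\b", "reading a book")       # ['ading']

# Negative lookahead: three digits not followed by a digit.
# In "4567" this picks the last three digits, not the first three.
three = re.findall(r"\d{3}(?!\d)", "123 4567 89")            # ['123', '567']

# Negative lookbehind: seven digits not preceded by a lowercase letter.
seven = re.findall(r"(?<![a-z])\d{7}", "x1234567 7654321")   # ['7654321']

upper = re.findall(r"(?<![a-z])\d{7}", "X1234567")                     # ['1234567']
upper_ignorecase = re.findall(r"(?<![a-z])\d{7}", "X1234567", re.I)    # []
```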

3. Comments: parentheses can also hold a comment, using the syntax (?#comment).

Square brackets: square brackets hold a set of characters, and most special symbols lose their special meaning inside them. [aeiou] matches any one vowel, and [*+?] matches *, +, or ?. [1-9] matches any one digit from 1 to 9 (use [0-9] for any digit). [1-9]{2} matches two such digits in a row.
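
A quick sketch of these character classes, assuming Python's re module and invented sample strings:

```python
import re

vowels = re.findall(r"[aeiou]", "regular")          # ['e', 'u', 'a']

# Inside brackets, * + ? are ordinary characters.
symbols = re.findall(r"[*+?]", "a*b+c?d.")          # ['*', '+', '?']

# [1-9] excludes 0, so "10" is not matched by [1-9]{2}.
two_digits = re.findall(r"[1-9]{2}", "10 25 99")    # ['25', '99']
```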

Curly braces: {n} repeats n times, {n,} repeats n or more times, and {n,m} repeats from n to m times.

1.8 Greedy and lazy matching: a.*b matches the longest possible string that starts with a and ends with b. a.*?b matches as few characters as possible. For the string aabab, the greedy pattern matches the whole of aabab, while the lazy pattern matches aab and then ab.
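
The aabab example, reproduced as a sketch with Python's re module (an assumption; the original names no language):

```python
import re

# Greedy: .* grabs as much as possible, then backs off to the last b.
greedy = re.findall(r"a.*b", "aabab")     # ['aabab']

# Lazy: .*? grabs as little as possible, stopping at the first b each time.
lazy = re.findall(r"a.*?b", "aabab")      # ['aab', 'ab']
```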

1.9 Negation:

Table 3. Commonly used negated codes
Code/Syntax Description
\W Matches any character that is not a letter, digit, or underscore
\S Matches any character that is not a whitespace character
\D Matches any non-digit character
\B Matches a position that is not the beginning or end of a word
[^x] Matches any character except x
[^aeiou] Matches any character except the letters a, e, i, o, u
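
The negated codes in Table 3 can be sketched like this, assuming Python's re module; the sample strings are invented for illustration.

```python
import re

text = "ab_1 c-2"

non_word = re.findall(r"\W", text)        # [' ', '-']
non_space = re.findall(r"\S+", text)      # ['ab_1', 'c-2']
non_digit = re.findall(r"\D+", text)      # ['ab_', ' c-']

# \B: only the b inside "ab" is at a non-boundary position (index 1).
inner_b = [m.start() for m in re.finditer(r"\Bb", "ab b")]    # [1]

not_vowel = re.findall(r"[^aeiou]", "audio")                  # ['d']
```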

2. Applications

(1) Test whether a string contains a pattern. For example, you can test an input string to see whether it contains a phone number pattern or a credit card number pattern. This is called data validation.
(2) Replace text. You can use a regular expression to identify specific text in a document, and then delete it or replace it with other text.
(3) Extract a substring from a string based on a pattern match. This can be used to find specific text in a document or an input field.
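
The three uses map onto three standard functions in Python's re module (an assumption, since the original names no language). The sketch reuses the phone number pattern from section 1.1 with {n} shorthand; the sample sentence is invented.

```python
import re

text = "Call 010-12345678 or 0376-87654321 before Friday."
phone = r"\b0\d{2,3}-\d{8}\b"

# (1) Test: does the text contain a phone number at all?
has_phone = re.search(phone, text) is not None

# (2) Replace: mask every phone number.
masked = re.sub(phone, "[phone]", text)

# (3) Extract: pull out every phone number.
numbers = re.findall(phone, text)    # ['010-12345678', '0376-87654321']
```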

A regular expression is a text pattern made up of ordinary characters (such as the letters a through z) and special characters (called metacharacters). The pattern describes one or more strings to match when searching a body of text. A regular expression acts as a template that is matched against the string being searched.
