Last semester our teacher spent one lecture on regular expressions. They are very useful but hard to remember, so here I review the material.
Why study regular expressions? A regular expression is a text-matching tool that makes it easy to capture the information you need when parsing a piece of content.
1. Getting Started
1.1 First, some examples of regular expressions:
\d+ (matches one or more digits; + means one or more)
hi (matches the characters hi anywhere in the string, whether as a separate word or inside another word)
\bhi\b (\b marks the beginning or end of a word; it does not match the space between words, it is only a position)
.* (. matches any character except a line break, * means zero or more, so .* matches any number of characters)
0\d\d-\d\d\d\d\d\d\d\d (a three-digit area code, a hyphen, then an eight-digit phone number; for a three- or four-digit area code use 0\d\d\d?-\d\d\d\d\d\d\d\d, where ? means zero or one)
\d{2} (\d must occur exactly 2 times), \d{1,2} (\d occurs at least once and at most twice)
[1-9] (matches a single digit from 1 to 9; [0-9] is the class usually equivalent to \d)
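The examples above can be tried out directly. Here is a minimal sketch using Python's built-in re module; the sample strings are made up for illustration and are not from the lecture.

```python
import re

# \d+ matches each run of one or more digits.
digits = re.findall(r"\d+", "order 66 shipped 1200 units")

# Plain "hi" also matches inside other words; \bhi\b matches only the whole word.
loose = re.findall(r"hi", "hi, this is him")
strict = re.findall(r"\bhi\b", "hi, this is him")

# Three- or four-digit area code, a hyphen, then eight digits.
phone = re.compile(r"0\d\d\d?-\d{8}")
short_ok = phone.fullmatch("010-12345678") is not None
long_ok = phone.fullmatch("0376-12345678") is not None
bad = phone.fullmatch("10-12345678") is not None

print(digits, len(loose), len(strict), short_ok, long_ok, bad)
```

Note that fullmatch requires the whole string to match, which is usually what you want when validating a single field.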
1.2 Testing regular expressions: http://tool.oschina.net/regex#
1.3 Special codes (metacharacters): for example ., \d, {}, and so on.
1.4 Non-printable characters
Character | Meaning
\cx | Matches the control character indicated by x. For example, \cM matches Control-M, the carriage return. The value of x must be in A-Z or a-z; otherwise c is treated as a literal 'c' character.
\f | Matches a form feed (page break). Equivalent to \x0c and \cL.
\n | Matches a line feed. Equivalent to \x0a and \cJ.
\r | Matches a carriage return. Equivalent to \x0d and \cM.
\s | Matches any whitespace character, including spaces, tabs, form feeds, and so on. Equivalent to [ \f\n\r\t\v].
\S | Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t | Matches a tab character. Equivalent to \x09 and \cI.
\v | Matches a vertical tab. Equivalent to \x0b and \cK.
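A small sketch of these escapes in Python's re module, with a made-up sample string. As far as I know, Python's re does not accept the \cx form, so the hexadecimal form \x0d is used here instead; treat that as an assumption about this one engine.

```python
import re

text = "name:\tAda\r\nlang:\tPython\n"

# \s+ matches runs of whitespace (tab, carriage return, line feed, space).
tokens = re.split(r"\s+", text.strip())

# \S+ matches runs of non-whitespace characters.
words = re.findall(r"\S+", text)

# \x0d and \r denote the same carriage return character.
cr_by_hex = re.fullmatch(r"\x0d", "\r") is not None

# \t matches each tab character.
tab_count = len(re.findall(r"\t", text))

print(tokens, words, cr_by_hex, tab_count)
```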
1.5 Qualifiers (quantifiers)
Character | Description
* | Matches the preceding subexpression zero or more times. For example, zo* matches "z" and "zoo". * is equivalent to {0,}.
+ | Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
? | Matches the preceding subexpression zero or one time. For example, "do(es)?" matches "do" or "does". ? is equivalent to {0,1}.
{n} | n is a non-negative integer. Matches exactly n times. For example, 'o{2}' cannot match the 'o' in "Bob", but matches the two o's in "food".
{n,} | n is a non-negative integer. Matches at least n times. For example, 'o{2,}' cannot match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m} | Both m and n are non-negative integers, where n <= m. Matches at least n times and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there can be no spaces between the comma and the two numbers.
You cannot use qualifiers on \b.
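To check the table for myself, here is a short Python sketch that runs each qualifier against the same sample words. The long words are built with string repetition so the number of o's is unambiguous.

```python
import re

samples = ["z", "zo", "zoo"]
star = [s for s in samples if re.fullmatch(r"zo*", s)]   # zero or more o
plus = [s for s in samples if re.fullmatch(r"zo+", s)]   # one or more o
optional = [s for s in ["do", "does", "doe"] if re.fullmatch(r"do(es)?", s)]

exact = re.findall(r"o{2}", "Bob food")                  # exactly two o's
at_least = re.findall(r"o{2,}", "Bob f" + "o" * 5 + "d") # two or more o's
bounded = re.findall(r"o{1,3}", "f" + "o" * 6 + "d")     # one to three o's

print(star, plus, optional, exact, at_least, bounded)
```

With six o's, the {1,3} pattern matches the first three and then the next three as a second match, because findall keeps scanning after each match.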
1.6 Character escapes: to match a special code itself, put a \ in front of it. For example, to find ( you should search for \(.
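A brief sketch of escaping in Python. The sample strings are invented; re.escape is a standard library helper that adds the backslashes for you when the text to find comes from elsewhere.

```python
import re

# Unescaped, "." matches any character, so the pattern also matches "1x5".
unescaped = re.findall(r"1.5", "1.5 1x5")
# Escaped, "\." matches only a literal dot.
escaped = re.findall(r"1\.5", "1.5 1x5")

# Literal parentheses must be escaped as \( and \).
call = re.search(r"\(lookup\)", "run (lookup) now")

# re.escape builds a pattern that matches the given text literally.
literal = re.escape("(a+b)*c")
auto = re.search(literal, "x = (a+b)*c")

print(unescaped, escaped, call.group(), auto.group())
```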
1.7 The roles of the three kinds of brackets
Since regular expression syntax is complex, I have summed up how I use parentheses, square brackets, and curly braces, for easier comprehension.
Parentheses: 1. Grouping: parentheses specify a subexpression, and by default each parenthesized subexpression is assigned a group number, starting from 1 and counted from left to right. For example, to match repeated words I can use \b(\w+)\b\s+\1\b: here \b(\w+)\b matches a word made of one or more letters or digits, and \s+\1\b matches one or more whitespace characters followed by whatever group 1 captured. Naming a group: (?<word>\w+) gives the group \w+ the name word. Parentheses have a side effect: the text they match is captured and stored. Putting ?: first inside the parentheses removes this side effect. That is, in (?:\w+)|(\d+) the text matched by the first alternative is not stored.
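Grouping is easier to see with a small experiment. The sketch below uses Python, with one caveat: Python's re module spells a named group (?P<word>...) and its backreference (?P=word), rather than the (?<word>...) form above. The sample sentences are made up.

```python
import re

# Numbered group: \1 refers back to whatever group 1 captured.
repeated = re.compile(r"\b(\w+)\b\s+\1\b")
m = repeated.search("this is is a test")

# Named group, in Python's (?P<name>...) spelling.
named = re.compile(r"\b(?P<word>\w+)\b\s+(?P=word)\b")
nm = named.search("it was the the best")

# (?:...) groups without capturing, so only the digits are stored.
capturing = re.search(r"(ab)+(\d+)", "ababab42")
non_capturing = re.search(r"(?:ab)+(\d+)", "ababab42")

print(m.group(), m.group(1), nm.group("word"))
print(capturing.groups(), non_capturing.groups())
```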
2. Position assertions: (?=exp), also called the zero-width positive lookahead assertion, matches positions in the text that are followed by something matching exp. For example, \b\w+(?=ing\b) matches the front part of a word ending in ing (everything except the ing); when searching I'm singing while you're dancing. it matches sing and danc.
(?<=exp), also called the zero-width positive lookbehind assertion, matches positions in the text that are preceded by something matching exp. For example, (?<=\bre)\w+\b matches the rest of a word that begins with re (everything except the re); when searching reading a book it matches ading.
The zero-width negative lookahead assertion (?!exp) matches only positions that are not followed by exp. \d{3}(?!\d) matches three digits that are not followed by another digit.
Similarly, the zero-width negative lookbehind assertion (?<!exp) matches positions that are not preceded by exp: (?<![a-z])\d{7} matches seven digits that are not preceded by a lowercase letter. (If your experiment gives an unexpected result, check the case-sensitivity option of your testing tool.)
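The four assertions can be checked with a short Python sketch. The first two inputs are the sentences from the text; the digit strings are made up. The last two lines illustrate the case-sensitivity remark, using re.IGNORECASE as a stand-in for a tool's "ignore case" option.

```python
import re

# Positive lookahead: the part of a word before a final "ing".
stems = re.findall(r"\b\w+(?=ing\b)", "I'm singing while you're dancing.")

# Positive lookbehind: the part of a word after a leading "re".
rest = re.findall(r"(?<=\bre)\w+\b", "reading a book")

# Negative lookahead: three digits not followed by another digit.
# In "4567" this picks the last three digits, not the first three.
triples = re.findall(r"\d{3}(?!\d)", "123 4567 89")

# Negative lookbehind: seven digits not preceded by a lowercase letter.
sevens = re.findall(r"(?<![a-z])\d{7}", "id1234567 and 7654321")

# Case sensitivity changes the result for an uppercase prefix.
sensitive = re.findall(r"(?<![a-z])\d{7}", "ID1234567")
insensitive = re.findall(r"(?<![a-z])\d{7}", "ID1234567", re.IGNORECASE)

print(stems, rest, triples, sevens, sensitive, insensitive)
```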
3. Comments: another use of parentheses is the syntax (?#comment), which embeds a comment in the pattern.
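A tiny sketch of the comment syntax in Python, reusing the phone number pattern from section 1.1. The comment text is ignored by the engine, so both patterns behave the same.

```python
import re

plain = re.compile(r"0\d\d-\d{8}")
commented = re.compile(r"0\d\d(?#area code)-\d{8}(?#local number)")

same_result = (
    plain.fullmatch("010-12345678") is not None
    and commented.fullmatch("010-12345678") is not None
)
print(same_result)
```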
Square brackets: square brackets enclose a set of characters, and most special symbols lose their special meaning inside them. [aeiou] matches any one vowel, and [*+?] matches *, +, or ?. [1-9] matches any digit from 1 to 9, and [1-9]{2} matches two such digits in a row.
Curly braces: {n} repeats n times, {n,} repeats n or more times, and {n,m} repeats between n and m times.
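The bracket examples can be sketched in Python as follows, with made-up input strings. Note how [1-9] skips the digit 0.

```python
import re

vowels = re.findall(r"[aeiou]", "regular")
symbols = re.findall(r"[*+?]", "a*b+c?d.")   # the dot is not in the set
nonzero = re.findall(r"[1-9]", "2024")       # 0 is outside 1-9
pairs = re.findall(r"[1-9]{2}", "10 25 99")  # "10" contains a 0

print(vowels, symbols, nonzero, pairs)
```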
1.8 Greedy and lazy matching: a.*b matches the longest possible string that starts with a and ends with b. a.*?b matches as few characters as possible. Given the string aabab, the greedy pattern matches aabab, while the lazy pattern matches aab and then ab.
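The aabab example can be reproduced with a two-line Python sketch:

```python
import re

greedy = re.findall(r"a.*b", "aabab")   # longest possible match
lazy = re.findall(r"a.*?b", "aabab")    # shortest possible matches

print(greedy, lazy)
```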
1.9 Negation:
Table 3. Commonly used negated codes
Code/Syntax | Description
\W | Matches any character that is not a word character (letter, digit, or underscore)
\S | Matches any character that is not a whitespace character
\D | Matches any non-digit character
\B | Matches a position that is not the beginning or end of a word
[^x] | Matches any character except x
[^aeiou] | Matches any character except the letters a, e, i, o, u
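Each negated code can be tried with a short Python sketch; the input strings are made up for illustration.

```python
import re

non_word = re.findall(r"\W", "a-b c!")             # punctuation and the space
non_space = re.findall(r"\S+", "a b\tc")           # runs of non-whitespace
non_digit = re.findall(r"\D+", "tel: 555-1234")    # runs of non-digits
inner_hi = re.findall(r"\Bhi\B", "hi this chin")   # "hi" only inside a word
only_x = re.sub(r"[^x]", "", "xaxbx")              # delete everything but x
consonants = re.findall(r"[^aeiou]", "regex")

print(non_word, non_space, non_digit, len(inner_hi), only_x, consonants)
```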
2. Applications
(1) Testing a string for a pattern. For example, you can test an input string to see whether it contains a phone number pattern or a credit card pattern; this is known as data validation.
(2) Replacing text. You can use a regular expression to identify specific words in a document, and then delete them all or replace them with other text.
(3) Extracting a substring from a string based on a pattern match. This can be used to find specific text in a document or an input field.
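The three uses above map onto three functions in Python's re module. This is only a sketch: the card format (four groups of four digits separated by hyphens), the helper name looks_like_card, and the sample strings are all assumptions for illustration, not a real validation rule.

```python
import re

# (1) Test: does the string contain something shaped like a card number?
# The 4-4-4-4 format is a hypothetical example.
card_like = re.compile(r"\b\d{4}-\d{4}-\d{4}-\d{4}\b")

def looks_like_card(text: str) -> bool:
    """Return True if text contains a 4-4-4-4 digit group."""
    return card_like.search(text) is not None

# (2) Replace: swap every whole-word "foo" for "bar".
replaced = re.sub(r"\bfoo\b", "bar", "foo food foo")

# (3) Extract: pull year, month, and day out of a date.
found = re.search(r"(\d{4})-(\d{2})-(\d{2})", "released 2024-03-15")
year, month, day = found.groups()

print(looks_like_card("card 1234-5678-9012-3456"), replaced, year, month, day)
```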
A regular expression is a text pattern consisting of ordinary characters (such as the letters a through z) and special characters (called metacharacters). The pattern describes one or more strings to match when searching a body of text. A regular expression acts as a template that matches a character pattern against the string being searched.