Learn the basics of regular expressions

Source: Internet
Author: User
This article covers the basics of regular expressions in JavaScript. If you are not yet familiar with them, read on.

Learning regular expressions (continuously updated)

While studying JavaScript today I reached the RegExp object, so I took the opportunity to learn regular expressions. I had never worked with them before, and it turned out to be a very satisfying session.

Reference: Regular Expression 30-minute introductory tutorial

1. What is a regular expression

A regular expression is a pattern that describes which strings should match.

2. Related concepts of regular expressions

2.1 Meta characters

A metacharacter is a symbol with a special meaning in a regular expression. It stands for a rule rather than for a literal character.

\b Matches the beginning or end of a word
. Matches any character except a newline
* Repeats the preceding item 0 or more times; for example, a* matches any number of a characters, including none
+ Repeats the preceding item 1 or more times; for example, a+ matches one or more a characters
? Repeats the preceding item 0 or 1 times
{n} Repeats the preceding item exactly n times
{n,} Repeats the preceding item n or more times
{n,m} Repeats the preceding item n to m times
\d Matches a digit, 0-9
\w Matches a letter, digit, or underscore (some engines also match Chinese characters; JavaScript's \w matches only [A-Za-z0-9_])
\s Matches any whitespace character, including spaces, tabs, line breaks, Chinese full-width spaces, and so on
^ Matches the start of the string
$ Matches the end of the string
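
To make the list above concrete, here is a small sketch that tries a few of the metacharacters with RegExp.prototype.test. The sample strings and the property names in the samples object are made up for illustration.

```javascript
// Each entry tests one metacharacter from the list above.
const samples = {
  wordBoundary: /\bcat\b/.test("a cat sat"),   // true: "cat" stands alone as a word
  insideWord: /\bcat\b/.test("concatenate"),   // false: "cat" is inside a longer word
  anyChar: /a.c/.test("abc"),                  // true: "." matches "b"
  star: /^ab*$/.test("a"),                     // true: "b" may repeat 0 times
  plus: /^ab+$/.test("a"),                     // false: "+" needs at least one "b"
  optional: /^colou?r$/.test("color"),         // true: "u" may appear 0 times
  exactCount: /^\d{3}$/.test("123"),           // true: exactly 3 digits
  atLeast: /^\d{2,}$/.test("7"),               // false: fewer than 2 digits
  range: /^\d{2,4}$/.test("12345"),            // false: more than 4 digits
  whitespace: /^\s+$/.test(" \t\n"),           // true: space, tab, newline
  wordChars: /^\w+$/.test("user_01"),          // true: letters, digits, underscore
};

console.log(samples);
```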

3. Simple Regular Expressions

Let's go straight to examples:

Example 1: To match the word hello, the regular expression (the matching rule) is simply: hello

This matches every string that contains hello, so helloWorld also matches. To match only the standalone word hello, put the metacharacter \b on both sides so that hello must stand alone as a word. The regular expression is then: \bhello\b

Example 2: To find the word hello followed, after any number of other characters, by the word world, use the metacharacters . and *. The regular expression is \bhello\b.*\bworld\b

Example 3: To match a phone number of the form 021-xxxxxxx, use 021-\d\d\d\d\d\d\d. Here "021-" consists of ordinary characters with no special meaning, and each \d that follows is a metacharacter. This regular expression can be abbreviated to 021-\d{7}, which means \d repeated 7 times.

Example 4: Matching 1 or more consecutive digits, \d+

Example 5: Match a word that starts with a, \ba\w*\b

Example 6: Match a QQ number of 5 to 12 digits: ^\d{5,12}$
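
The six examples can be tried out in JavaScript roughly as follows. This is only a sketch: the variable names and test strings are invented for illustration, and the patterns are written in their corrected forms (\bworld\b, \d{7}, and ^\d{5,12}$).

```javascript
// Example 1: the standalone word "hello"
const helloWord = /\bhello\b/;
console.log(helloWord.test("say hello there")); // true
console.log(helloWord.test("helloWorld"));      // false: no word boundary after "hello"

// Example 2: "hello", then anything, then "world"
const helloThenWorld = /\bhello\b.*\bworld\b/;
console.log(helloThenWorld.test("hello big wide world")); // true

// Example 3: phone number 021-xxxxxxx
const phone = /021-\d{7}/;
console.log(phone.test("021-1234567")); // true

// Example 4: runs of one or more digits
const digitRuns = "abc 123 de 45".match(/\d+/g);
console.log(digitRuns); // ["123", "45"]

// Example 5: words that start with "a"
const aWords = "an apple and a banana".match(/\ba\w*\b/g);
console.log(aWords); // ["an", "apple", "and", "a"]

// Example 6: a QQ number of 5 to 12 digits
const qq = /^\d{5,12}$/;
console.log(qq.test("12345")); // true
console.log(qq.test("1234"));  // false: too short
```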

4. Character Escapes

If the string you are looking for contains a metacharacter, put \ in front of it to turn the metacharacter into an ordinary character. For example, \. matches a literal dot and \\ matches a literal backslash.
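
A short sketch of the difference escaping makes, using made-up sample strings. It also shows that a pattern passed to the RegExp constructor as a string needs its backslashes doubled, because the string literal consumes one level of escaping.

```javascript
const unescapedDot = /a.c/;  // "." is a metacharacter: any character
const escapedDot = /a\.c/;   // "\." is an ordinary dot

console.log(unescapedDot.test("abc")); // true
console.log(escapedDot.test("abc"));   // false
console.log(escapedDot.test("a.c"));   // true

// The same escaped pattern built from a string: note the doubled backslash.
const fromString = new RegExp("a\\.c");
console.log(fromString.test("abc")); // false
console.log(fromString.test("a.c")); // true
```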

5. Character classes

This part covers what to do when no predefined metacharacter matches the set of characters you want: you define a character class yourself.

For example, if \d did not exist and we wanted to match any digit from 0 to 9, we could write the character class [0-9], which behaves exactly like \d.

For example, the regular expression \(?0\d{2}[),-]?\d{8} can be used to match a phone number. Reading from left to right: \( is an escaped (, and the ? after it means it appears 0 or 1 times; 0\d{2} is a 0 followed by two digits; [),-] is a character class that matches ), a comma, or -, and the ? after it again means 0 or 1 times; finally \d{8} matches 8 digits.
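
The following sketch compares a hand-written character class with \d and tries the phone-number pattern on a few inputs. The inputs and variable names are invented for illustration.

```javascript
// [0-9] and \d accept and reject the same inputs here.
const digitClass = /^[0-9]+$/;
const digitMeta = /^\d+$/;
const inputs = ["2024", "12a", ""];
const sameResults = inputs.every(
  (s) => digitClass.test(s) === digitMeta.test(s)
);
console.log(sameResults); // true

// A custom class with no metacharacter equivalent: the vowels.
const vowels = "regular expression".match(/[aeiou]/g);
console.log(vowels.length); // 7

// The phone-number pattern from the text.
const phoneLoose = /\(?0\d{2}[),-]?\d{8}/;
console.log(phoneLoose.test("(010)12345678")); // true
console.log(phoneLoose.test("010-12345678"));  // true
console.log(phoneLoose.test("01012345678"));   // true
```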

6. Branching conditions

A regular expression such as \(?0\d{2}[),-]?\d{8} also matches incorrectly formatted strings such as (01012345678 or (010-12345678, where the opening parenthesis is never closed. For this case you can use a branching condition (alternation), written |. It resembles the logical OR operator || in JS and likewise short-circuits: the alternatives are tried from left to right, and as soon as one of them matches, the rest are not tried.

The case above can be written as \(0\d{2}\)\d{8}|0\d{2}-\d{8}
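
A sketch of the difference between the loose pattern and the version with branches. The ^ and $ anchors and the surrounding group are additions for this example (an assumption, not part of the original patterns) so that the whole string has to match. The last two lines use a made-up zip-code-style pattern to show the left-to-right behavior.

```javascript
// Anchored versions of the two patterns, so the entire input must match.
const loose = /^\(?0\d{2}[),-]?\d{8}$/;
const strict = /^(\(0\d{2}\)\d{8}|0\d{2}-\d{8})$/;

console.log(loose.test("(01012345678"));   // true: unclosed parenthesis accepted
console.log(strict.test("(01012345678"));  // false
console.log(strict.test("(010)12345678")); // true
console.log(strict.test("010-12345678"));  // true

// Alternatives are tried left to right; the first one that matches wins.
const shortFirst = "12345-6789".match(/\d{5}|\d{5}-\d{4}/)[0];
const longFirst = "12345-6789".match(/\d{5}-\d{4}|\d{5}/)[0];
console.log(shortFirst); // "12345"
console.log(longFirst);  // "12345-6789"
```

Because of this ordering, the more specific alternative usually goes first.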

7. Grouping

This part solves the problem of repeating several characters rather than a single one. To repeat a single character, we put a quantifier directly after it; to repeat several characters, we wrap them in () and put the quantifier after the group. For example, the following regular expression can be used to match an IP address.

((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)
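
A sketch of grouping in JavaScript. The IP pattern is assembled from a string for readability (hence the doubled backslashes), and the ^ and $ anchors are an addition for this example so that the whole input must be an address. The names octet and ipv4 are invented here.

```javascript
// A quantifier after a group repeats the whole group, not just one character.
console.log(/^(ab){3}$/.test("ababab")); // true
console.log(/^ab{3}$/.test("ababab"));   // false: only "b" is repeated here
console.log(/^ab{3}$/.test("abbb"));     // true

// One number from 0 to 255, as in the pattern from the text.
const octet = "(2[0-4]\\d|25[0-5]|[01]?\\d\\d?)";
// Three "number plus dot" groups followed by a final number.
const ipv4 = new RegExp("^(" + octet + "\\.){3}" + octet + "$");

console.log(ipv4.test("192.168.0.1")); // true
console.log(ipv4.test("256.1.1.1"));   // false: 256 is out of range
console.log(ipv4.test("1.2.3"));       // false: only three numbers
```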

8. Negation

When you need to find characters that do not belong to an easily defined class, such as any character other than x, you need negation.

\W Matches any character that is not a letter, digit, underscore, or Chinese character
\D Matches any character that is not a digit
\B Matches a position that is not the beginning or end of a word
\S Matches any character that is not whitespace
[^x] Matches any character other than x
[^aeiou] Matches any character other than the letters a, e, i, o, u

For example, the regular expression ^\S+$ matches a string that contains no whitespace characters.
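
The negated forms can be tried out like this. The sample strings are made up for illustration.

```javascript
// \D: everything that is not a digit is removed, leaving the digits.
const digitsOnly = "a1 b2".replace(/\D/g, "");
console.log(digitsOnly); // "12"

// \W: everything that is not a word character is removed.
const wordCharsOnly = "a1 b2".replace(/\W/g, "");
console.log(wordCharsOnly); // "a1b2"

// \S with anchors: the string must consist of non-whitespace only.
const noWhitespace = /^\S+$/;
console.log(noWhitespace.test("no_spaces")); // true
console.log(noWhitespace.test("has space")); // false

// [^aeiou]: every character that is not one of these vowels.
const consonants = "hello".match(/[^aeiou]/g);
console.log(consonants); // ["h", "l", "l"]

// \B: "cat" must not touch a word boundary on either side.
console.log(/\Bcat\B/.test("concatenate")); // true
console.log(/\Bcat\B/.test("cat"));         // false
```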

9. Back references

This part builds on grouping. Once characters are grouped with (), the group can be referred to later by number. Groups are numbered from 1 in the order in which they appear. For example, the regular expression \b(\w+)\s+\1\b matches a word repeated twice in a row, such as go go; the \1 refers back to the text captured by the first group.

Other grouping syntaxes related to back references are:

(exp) Matches exp and captures the text into an automatically numbered group
(?<name>exp) Matches exp and captures the text into a group named name
(?:exp) Matches exp without capturing the text or assigning a group number
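
A sketch of the three group forms in JavaScript. Named groups need an ES2018-capable engine; the sample strings and the group names year and month are invented for illustration.

```javascript
// Numbered back reference: \1 must repeat what group 1 captured.
const repeatedWord = /\b(\w+)\s+\1\b/;
const repeatMatch = "we go go home".match(repeatedWord);
console.log(repeatMatch[0]); // "go go"
console.log(repeatMatch[1]); // "go"

// Named groups are read from the "groups" property of the match.
const yearMonth = /(?<year>\d{4})-(?<month>\d{2})/;
const dateMatch = "released 2024-05".match(yearMonth);
console.log(dateMatch.groups.year);  // "2024"
console.log(dateMatch.groups.month); // "05"

// (?:...) groups without capturing, so (c) is still group 1.
const nonCapturing = /(?:ab)+(c)/;
const ncMatch = "ababc".match(nonCapturing);
console.log(ncMatch[0]); // "ababc"
console.log(ncMatch[1]); // "c"
```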

10. Zero-width assertions

Zero-width assertions find content that comes before or after something else, without including that something in the match.

The regular expression (?=exp) asserts that the text following the current position matches the expression exp. For example, \b\w+(?=ing\b) matches the front part of a word that ends in ing: searching I'm dancing and singing matches danc and sing.

The regular expression (?<=exp) asserts that the text preceding the current position matches the expression exp. For example, (?<=\bre)\w+\b matches the second half of a word that begins with re: searching reading matches ading.

If you want to add a comma after every three digits of a very long number, for example turning 123456789 into 123,456,789, you can look for the parts that need a comma in front of them with the regular expression ((?<=\d)\d{3})+\b. Searching 1234567890 with it gives 234567890. (I have not fully worked through this search rule yet.)

The following example uses both assertions: (?<=\s)\d+(?=\s) matches a number between two whitespace characters, without including the whitespace.

Overall, the purpose of a zero-width assertion is to fix the starting or ending position of a match according to some rule.
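
The assertions above can be checked in JavaScript as sketched below. Lookbehind, (?<=exp), needs an ES2018-capable engine. The last pattern, which actually inserts the commas, is not from the text; it is one possible approach, shown here as an assumption: it matches the empty position that has a digit before it and a whole number of three-digit groups after it.

```javascript
// Lookahead: the "ing" must follow but is not part of the match.
const stems = "I'm dancing and singing".match(/\b\w+(?=ing\b)/g);
console.log(stems); // ["danc", "sing"]

// Lookbehind: the "re" must come first but is not part of the match.
const afterRe = "reading".match(/(?<=\bre)\w+\b/)[0];
console.log(afterRe); // "ading"

// The comma example from the text: groups of 3 digits, each preceded by a digit.
const needsCommas = "1234567890".match(/((?<=\d)\d{3})+\b/)[0];
console.log(needsCommas); // "234567890"

// Both assertions: digits with whitespace on each side.
const spaced = " 42 is 7x ".match(/(?<=\s)\d+(?=\s)/g);
console.log(spaced); // ["42"]

// Assumed approach for inserting the commas (not taken from the text).
const withCommas = "1234567890".replace(/(?<=\d)(?=(\d{3})+\b)/g, ",");
console.log(withCommas); // "1,234,567,890"
```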

11. Negative zero-width assertions

As mentioned earlier, negation lets us find a character that is not a particular character or is not in a particular character class.

For example, suppose you want to find a word that contains the letter q not followed by u. You might write \b\w*q[^u]\w*\b. This expression goes wrong when q is the last letter of a word: [^u] always consumes one character, so it matches the separator after the word, and the \w*\b that follows then matches the next word. As a result it matches a string such as Iraq fighting.

To solve this problem with negation, we can use a negative zero-width assertion, because it only checks a position and consumes no characters. The expression above can be written as \b\w*q(?!u)\w*\b.

Similarly, (?<!exp) asserts that the text before the current position does not match exp. For example, (?<![a-z])\d{7} matches 7 digits that are not preceded by a lowercase letter a-z.
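
The Iraq fighting problem and its fix can be reproduced as follows. This sketch uses the forms \b\w*q[^u]\w*\b and \b\w*q(?!u)\w*\b so that q may appear anywhere in the word; negative lookbehind needs an ES2018-capable engine, and the sample strings are made up.

```javascript
// With a negated class, [^u] consumes the space and the match spills over.
const withClass = /\b\w*q[^u]\w*\b/;
const spilled = "Iraq fighting".match(withClass)[0];
console.log(spilled); // "Iraq fighting"

// With a negative lookahead, nothing is consumed after the q.
const withLookahead = /\b\w*q(?!u)\w*\b/;
const exact = "Iraq fighting".match(withLookahead)[0];
console.log(exact); // "Iraq"
console.log(withLookahead.test("quit")); // false: q is followed by u

// Negative lookbehind: 7 digits not preceded by a lowercase letter.
const sevenDigits = "id1234567 tel 7654321".match(/(?<![a-z])\d{7}/g);
console.log(sevenDigits); // ["7654321"]
```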

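A more complex example: (?<=<(\w+)>).*(?=<\/\1>)

Seeing (?<= at the front and (?= at the back, you know that both are zero-width assertions. <(\w+)> stands for an HTML tag, so if the part before the match is a tag such as <b>, the part after it must be the corresponding closing tag </b>, and the expression matches the content between the two.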

12. Comments

Comments are included with the syntax (?#comment), for example 2[0-4]\d(?#200-249). Note that JavaScript's RegExp does not support this syntax.

13. Greedy and lazy matching

When matching, the default behavior of a quantifier is to match as many characters as possible. For example, applied to the string aabab, the expression a.*b matches the whole of aabab rather than just ab. This is called greedy matching.

Sometimes we need lazy matching instead, which matches as few characters as possible. To get it, add a ? after the quantifier. For example, a.*?b turns the greedy match into a lazy one: in aabab it matches aab (characters 1 to 3) and ab (characters 4 to 5). (Why the first match is aab rather than ab comes from the matching rules of regular expressions: the match that starts earliest takes priority.)
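
The aabab example can be checked directly. A minimal sketch:

```javascript
// Greedy: .* takes as much as it can while still letting "b" match.
const greedy = "aabab".match(/a.*b/)[0];
console.log(greedy); // "aabab"

// Lazy: .*? takes as little as it can; the g flag collects every match.
const lazy = "aabab".match(/a.*?b/g);
console.log(lazy); // ["aab", "ab"]
```

The first lazy match is aab rather than ab because matching starts at the first a in the string, and from there the shortest way to reach a b is aab.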

14. Processing Options

Processing options correspond to the flags in JS, such as ignore case (i), multiline mode (m), and global mode (g).
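
A short sketch of the three flags mentioned above, with made-up sample strings:

```javascript
// i: ignore case
console.log(/hello/.test("HELLO"));  // false
console.log(/hello/i.test("HELLO")); // true

// g: global, so match() returns every match instead of only the first
const firstDigit = "a1b2".match(/\d/)[0];
const allDigits = "a1b2".match(/\d/g);
console.log(firstDigit); // "1"
console.log(allDigits);  // ["1", "2"]

// m: multiline, so ^ and $ also match at the start and end of each line
const text = "one\ntwo";
console.log(/^\w+$/.test(text)); // false: the newline is not a word character
const lines = text.match(/^\w+$/gm);
console.log(lines); // ["one", "two"]
```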

15. Balancing groups/recursive matching

Some matching problems involve nested pairs, such as the parentheses in the mathematical expression (5*3)). This cannot be written simply as \(.*\), because that matches the entire expression, including the unbalanced final ). The matching strategy is the same as in the classic bracket-matching problem, which is solved with a stack: push when ( is encountered and pop when ) is encountered. If the stack is empty at the end, the parentheses inside the match are exactly balanced; if it is not empty, the regular expression engine backtracks until the parentheses balance.
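
JavaScript's RegExp has no balancing groups, so the sketch below implements the stack idea in plain code instead of a pattern. The function names isBalanced and firstBalancedGroup are hypothetical helpers written for this example.

```javascript
// Returns true when every "(" has a matching ")" in the right order.
function isBalanced(text) {
  const stack = [];
  for (const ch of text) {
    if (ch === "(") {
      stack.push(ch);                        // push on "("
    } else if (ch === ")") {
      if (stack.length === 0) return false;  // ")" with nothing to close
      stack.pop();                           // pop on ")"
    }
  }
  return stack.length === 0;                 // empty stack: balanced
}

// Returns the first parenthesized group that closes with the stack empty,
// or null if there is none. Unmatched ")" characters are skipped.
function firstBalancedGroup(text) {
  const stack = [];
  for (let i = 0; i < text.length; i++) {
    if (text[i] === "(") {
      stack.push(i);                         // remember where the group opened
    } else if (text[i] === ")" && stack.length > 0) {
      const start = stack.pop();
      if (stack.length === 0) return text.slice(start, i + 1);
    }
  }
  return null;
}

console.log("(5*3))".match(/\(.*\)/)[0]);    // "(5*3))": the naive pattern takes too much
console.log(isBalanced("(5*3))"));           // false
console.log(firstBalancedGroup("(5*3))"));   // "(5*3)"
console.log(firstBalancedGroup("x = (1 + (2 * 3)) )")); // "(1 + (2 * 3))"
```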
