JavaScript Advanced Programming (3rd Edition) Study Notes 12: JavaScript Regular Expressions

Source: Internet
Author: User
While analyzing the PhoneGap source code, we summarized the usage of regular expressions once before. To keep each series of articles complete on its own, the topic is covered again here. Note that only the commonly used and relatively simple regular expression syntax is summarized, not all of it; in my opinion, mastering this common syntax is enough for everyday use. Regular expressions are used not only in ECMAScript but also in Java, .NET, Unix, and elsewhere. This article is based on regular expressions in ECMAScript.

I. Regular Expression Basics

1. Ordinary characters: letters, digits, underscores, Chinese characters, and any other character without a special meaning, such as ABC123. An ordinary character matches the same character in the input.

2. Special characters: (escape them with the backslash "\" when necessary)

Character      Description
\a             Bell = \x07
\f             Form feed = \x0C
\n             Line feed = \x0A
\r             Carriage return = \x0D
\t             Tab = \x09
\v             Vertical tab = \x0B
\e             ESC character = \x1B
\xXX           Two-digit hexadecimal notation; matches the character with that code
\uXXXX         Four-digit hexadecimal notation; matches the character with that code
\x{XXXXXX}     Hexadecimal notation of any length; matches the character with that code

^              Start position of the string
$              End position of the string
()             Marks the start and end of a subexpression
[]             Custom character class
{}             Marks the number of repetitions
.              Matches any character except a line break
?              Match 0 or 1 time
+              Match 1 or more times
*              Match 0 or more times
|              "Or" relationship between the expressions on both sides

\b             Matches the start or end of a word
\B             Matches a position that is not the start or end of a word
\d             Matches a digit
\D             Matches any character that is not a digit
\s             Matches any whitespace character
\S             Matches any non-whitespace character
\w             Matches letters, digits, underscores, or Chinese characters
\W             Matches any character that is not a letter, digit, underscore, or Chinese character
[^x]           Matches any character except x
[^aeiou]       Matches any character except a, e, i, o, u

{n}            Match n times
{n,}           Match at least n times
{n,m}          Match n to m times
[0-9]          Matches any digit from 0 to 9
[f-m]          Matches any letter from f to m

The special characters listed above can be roughly divided into the following categories:

(1) Characters that are inconvenient to type, such as bell (\a), form feed (\f), line feed (\n), carriage return (\r), tab (\t), and ESC (\e)

(2) Hexadecimal characters, such as two-digit (\x02), four-digit (\u012B), and any-length (\x{A34D1})

(3) Position characters, such as string start (^), string end ($), word start or end (\b), and inside a word (\B)

(4) Quantifier characters: for example, 0 or 1 time (?), 1 or more times (+), 0 or more times (*), n times ({n}), at least n times ({n,}), and n to m times ({n,m})

(5) Modifier characters: such as the repetition count ({}), custom character class ([]), and subexpression (())

(6) Negation characters:

(A) Negation by letter case: \b and \B, \d and \D, \s and \S, \w and \W

(B) Negation by [^]: for example, [^x], [^aeiou]

(C) Other special cases: for example, \n and . are also opposites of each other.

(7) Range characters: for example, digits ([0-9]) and letters ([f-m])

(8) Logical characters: such as "or" (|)
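Several of these categories can be tried directly in a JavaScript console. The following is a minimal sketch; the sample strings and variable names are made up for illustration. It uses only syntax that ECMAScript supports: \a, \e and \x{...} from the table are not part of ECMAScript regular expressions, and in ECMAScript \w covers only ASCII letters, digits and the underscore.

```javascript
// Position characters: ^ and $ anchor the match to the start and end of the string.
var onlyDigits = /^\d+$/;
var allDigits = onlyDigits.test('12345');    // true
var notAllDigits = onlyDigits.test('123a5'); // false

// Hexadecimal escapes: \x41 and \u0041 both denote the letter "A".
var hexMatch = /\x41\u0041/.test('AA');      // true

// Negation: \D and [^0-9] both mean "any character that is not a digit".
var keepDigits1 = 'a1b2c3'.replace(/\D/g, '');     // '123'
var keepDigits2 = 'a1b2c3'.replace(/[^0-9]/g, ''); // '123'

// Word boundary: \b only matches at the start or end of a word,
// so the "cat" inside "concat" is skipped.
var wholeWords = 'cat concat'.match(/\bcat\b/g);   // ['cat']

// Range and logical characters: [f-m] is a range, | means "or".
var inRange = /^[f-m]+$/.test('high');       // true
var yesOrNo = /^(yes|no)$/.test('no');       // true

// In ECMAScript, \w does not match a Chinese character (U+4E2D here).
var wordCharChinese = /\w/.test('\u4e2d');   // false
```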

3. Escape

(1) Escape a single character with the backslash "\".

(2) Use "\Q...\E" to treat all characters inside the expression as ordinary characters.

(3) Use "\U...\E" to treat all characters inside the expression as ordinary characters and convert lowercase letters to uppercase for matching.

(4) Use "\L...\E" to treat all characters inside the expression as ordinary characters and convert uppercase letters to lowercase for matching.
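Of these, ECMAScript regular expressions support only the single-character backslash escape from item (1); \Q...\E, \U...\E and \L...\E are not available. A similar effect to \Q...\E can be obtained by escaping every special character before building the pattern. The sketch below assumes a helper named escapeRegExp, which is hypothetical and not a built-in function.

```javascript
// Hypothetical helper (not built in): put a backslash in front of every
// character that has a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

var literalText = '1+1=2 (really?)';
var escapedText = escapeRegExp(literalText);   // 1\+1=2 \(really\?\)

// With escaping, the pattern matches the text literally.
var exactPattern = new RegExp('^' + escapedText + '$');
var exactHit = exactPattern.test(literalText);      // true

// Without escaping, +, ( ) and ? keep their special meaning,
// so the same text no longer matches itself.
var rawPattern = new RegExp('^' + literalText + '$');
var rawHit = rawPattern.test(literalText);          // false
```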

4. Greedy and lazy Modes

If a regular expression contains a quantifier, it usually matches as many characters as possible. For example, if l.*n is used to match linjisong, it matches linjison instead of lin. This is the greedy mode of regular expressions. Correspondingly, you can append the character "?" to a quantifier to switch it to lazy mode, that is, to match as few characters as possible. For example, *? means repeat 0 or more times, but as few times as possible.
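The linjisong example above can be checked with a short sketch (variable names are illustrative):

```javascript
var sampleName = 'linjisong';

// Greedy: .* runs to the last "n" that still lets the pattern match.
var greedyResult = /l.*n/.exec(sampleName)[0];  // 'linjison'

// Lazy: .*? stops at the first "n" it can reach.
var lazyResult = /l.*?n/.exec(sampleName)[0];   // 'lin'
```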

5. Grouping and backreferences

(1) Enclosing an expression in parentheses () lets it be processed as a whole, which is how grouping is achieved.

(2) By default, each group automatically receives a group number, counted from 1 upward in the order of the opening parentheses.

(3) During processing, the engine saves the text matched by the expression inside the parentheses so that it can be used during or after matching. A backslash followed by the group number references this text. For example, \1 refers to the text matched by the first group.

(4) You can also give a group a custom name. The syntax is (?<name>exp), and \k<name> can then be used as the backreference.

(5) You can also choose not to save the matched text and not to allocate a group number. The syntax is (?:exp).
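A minimal sketch of these three kinds of groups in JavaScript follows; the sample strings and the group name ch are made up for illustration. Numbered groups and (?:exp) work in every ECMAScript version, while the named form (?<name>exp) with \k<name> requires an ES2018 or later engine.

```javascript
// Numbered group and backreference: (\w) captures one character and
// \1 requires the same character again, so this finds a doubled letter.
var doubledLetter = /(\w)\1/.exec('hello');
// doubledLetter[0] is 'll', doubledLetter[1] is 'l'

// Non-capturing group: (?:ab)+ is repeated as a unit but gets no group
// number, so (c) is group 1.
var nonCapturing = /(?:ab)+(c)/.exec('ababc');
// nonCapturing is ['ababc', 'c']

// Named group (ES2018+): the captured text is also available by name.
var namedGroup = /(?<ch>\w)\k<ch>/.exec('hello');
// namedGroup.groups.ch is 'l'
```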

(6) Parentheses have some other special syntaxes, which are listed here but not discussed in depth:

Category               Syntax           Description
Capture                (exp)            Match exp and capture the text into an automatically numbered group
                       (?<name>exp)     Match exp and capture the text into the group named name. You can also write (?'name'exp)
                       (?:exp)          Match exp without capturing the matched text and without assigning a group number
Zero-width assertion   (?=exp)          Match the position before exp
                       (?<=exp)         Match the position after exp
                       (?!exp)          Match a position that is not followed by exp
                       (?<!exp)         Match a position that is not preceded by exp
Comment                (?#comment)      This type of group does not affect how the regular expression is processed. It only provides a comment for readers.
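The zero-width assertions in the table can be sketched as follows (sample strings are made up). The two lookahead forms work in every ECMAScript version; the two lookbehind forms require an ES2018 or later engine, and the (?'name'exp) and (?#comment) forms are not part of ECMAScript regular expressions.

```javascript
// (?=exp): match digits only at a position followed by "px".
var followedByPx = '12px 34em'.match(/\d+(?=px)/g);      // ['12']

// (?!exp): match "q" only when it is not followed by "u".
var qWithoutU = /q(?!u)/.test('Iraq');                   // true
var qWithU = /q(?!u)/.test('quit');                      // false

// (?<=exp), ES2018+: match digits only when preceded by "$".
var afterDollar = 'cost: $10, qty 20'.match(/(?<=\$)\d+/g); // ['10']

// (?<!exp), ES2018+: match a digit that is not preceded by "$".
var notAfterDollar = 'a1 $2'.match(/(?<!\$)\d/g);        // ['1']
```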

This is enough to understand commonly used regular expressions. If you want to learn more, refer to the "30-minute getting started tutorial" on regular expressions. The following describes how regular expressions are implemented in JavaScript.

II. RegExp, the regular expression object in JavaScript

1. Create a regular expression

(1) Using a literal. Syntax: var exp = /pattern/flags;

A. pattern is any regular expression.

B. There are three flags: g indicates global mode, i indicates case-insensitive mode, and m indicates multiline mode.

(2) Using the built-in RegExp constructor. Syntax: var exp = new RegExp(pattern, flags);

A. With the constructor, pattern and flags are both strings, so every backslash in the pattern must be escaped a second time. For example:

Literal                Constructor string
/\[bc\]/               "\\[bc\\]"
/\./                   "\\."
/name\/age/            "name\\/age"
/\d.\d{1,2}/           "\\d.\\d{1,2}"
/\w\\hello\\123/       "\\w\\\\hello\\\\123"
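The double escaping can be verified with a short sketch; the pattern and variable names below are illustrative only.

```javascript
// The same pattern written as a literal and as a constructor string.
var fromLiteral = /\[bc\]at/i;
var fromConstructor = new RegExp('\\[bc\\]at', 'i');

var sameSource = fromLiteral.source === fromConstructor.source; // true
var matchesBrackets = fromConstructor.test('[BC]at');           // true

// With a single backslash, the string literal '\d+' is just 'd+',
// so the pattern looks for the letter "d" instead of digits.
var singleEscape = new RegExp('\d+');
var findsDigits = singleEscape.test('123');   // false
var findsLetterD = singleEscape.test('ddd');  // true
```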

Note: In ECMAScript 3, a regular expression literal always shares the same RegExp instance, whereas new RegExp(pattern, flags) creates a new instance each time it is called. ECMAScript 5 specifies that a literal must also create a new instance each time it is evaluated.

2. Instance properties

(1) global: Boolean value, indicating whether the g flag is set.

(2) ignoreCase: Boolean value, indicating whether the i flag is set.

(3) multiline: Boolean value, indicating whether the m flag is set.

(4) lastIndex: integer, the character position at which the search for the next match starts, counted from 0.

(5) source: string, the text of the pattern in literal form. Even if the instance was created with the constructor, source holds the pattern as it would be written in a literal.
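A minimal sketch that reads these properties (the patterns are made up for illustration):

```javascript
var reProps = /o/gi;
var propGlobal = reProps.global;         // true, g is set
var propIgnoreCase = reProps.ignoreCase; // true, i is set
var propMultiline = reProps.multiline;   // false, m is not set
var propLastIndex = reProps.lastIndex;   // 0 before any match
var propSource = reProps.source;         // 'o'

// source is in literal form even when the constructor is used:
// the string '\\d+' becomes the pattern text \d+.
var builtSource = new RegExp('\\d+', 'm').source;
var literalSource = /\d+/m.source;
var sourcesEqual = builtSource === literalSource; // true
```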

3. Instance methods

(1) exec() method

A. Takes one argument, the string to apply the pattern to, and returns an array with information about the first match. If there is no match, null is returned.

B. The returned array is an Array instance with two additional properties: index, the position of the match in the string, and input, the string the regular expression was applied to.

C. In the returned array, the first item is the string that matches the whole pattern, and the other items are the strings that match the groups in the pattern (if the pattern has no groups, the array contains only one item).

D. exec() returns only one match per call, even if g is set. The difference is that with g set, each call continues searching from where the previous match ended, whereas without g every call starts searching from the beginning of the string.
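The behavior described in A to D can be observed with a short sketch; the sample string and variable names are illustrative.

```javascript
var execText = 'cat, bat, sat';

// Without g: every call starts from the beginning and finds "cat" again.
var plain = /(.)at/;
var first = plain.exec(execText);   // ['cat', 'c'], first.index is 0
var again = plain.exec(execText);   // same match, index 0 again

// With g: each call continues from lastIndex.
var globalRe = /(.)at/g;
var g1 = globalRe.exec(execText);   // 'cat', index 0
var afterG1 = globalRe.lastIndex;   // 3
var g2 = globalRe.exec(execText);   // 'bat', index 5
var afterG2 = globalRe.lastIndex;   // 8
var g3 = globalRe.exec(execText);   // 'sat', index 10
var g4 = globalRe.exec(execText);   // null, no more matches
var afterG4 = globalRe.lastIndex;   // reset to 0
```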

(2) test() method

Takes a string argument and returns true if the pattern matches the string, or false if it does not.
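test() is convenient when only a yes/no answer is needed, for example to validate input. A minimal sketch, with a made-up format check:

```javascript
// Hypothetical check: three digits, two digits, four digits, separated by "-".
var idFormat = /^\d{3}-\d{2}-\d{4}$/;

var validId = idFormat.test('000-00-0000');   // true
var invalidId = idFormat.test('000-00-000');  // false, last part too short
```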

III. Example analysis

The following shows a regular expression used for formatting in the source code of PhoneGap.

The code is as follows:


// Group 1 is lazy: it stops before the first %.
var pattern = /(.*?)%(.)(.*)/;
var str = 'lin%%jisong';
var match = pattern.exec(str);
console.info(match.join(',')); // lin%%jisong,lin,%,jisong

// Group 1 is greedy: it extends up to the last % it can leave for the literal %.
var pattern2 = /(.*)%(.)(.*)/;
var match2 = pattern2.exec(str);
console.info(match2.join(',')); // lin%%jisong,lin%,j,isong



Analysis: both pattern and pattern2 contain three groups. Groups 2 and 3 are the same in both patterns: group 2, (.), matches any single character other than a line break, and group 3, (.*), matches as many non-line-break characters as possible (greedy mode). The difference lies in group 1: in pattern, (.*?) matches as few non-line-break characters as possible (lazy mode), while in pattern2, (.*) matches as many as possible (greedy mode). For the whole pattern to match, one % character must be left over for the literal % in the regular expression. As a result, group 1 matches lin in pattern, leaving the second % to group 2, and matches lin% in pattern2, so the output in the example above is not hard to understand.
