Regular Expression Parsing in JavaScript

Source: Internet
Author: User
Tags: character classes

 

A regular expression is an object that describes a pattern of characters.

The RegExp and String objects in JavaScript define methods that use regular expressions to perform powerful pattern matching and text search-and-replace operations.

In JavaScript, regular expressions are represented by RegExp objects. You can create a RegExp object with the RegExp() constructor, or with the special literal syntax added in JavaScript 1.2. Just as a string literal is written as characters inside quotation marks, a regular expression literal is written as characters inside a pair of slashes (/). So JavaScript code may contain a line like this:

var pattern = /s$/;

This line of code creates a new RegExp object and assigns it to the variable pattern. This particular RegExp object matches any string that ends with the letter "s". The RegExp() constructor can define an equivalent regular expression, as follows:

var pattern = new RegExp("s$");

Creating a RegExp object is easy, whether you use a regular expression literal or the RegExp() constructor. The harder task is describing the desired pattern of characters in regular expression syntax. JavaScript uses a fairly complete subset of Perl's regular expression syntax.

A regular expression pattern is made up of a series of characters. Most characters, including all letters and digits, simply describe characters to be matched literally. Thus the regular expression /java/ matches every string that contains the substring "java". Other characters in a regular expression are not matched literally but have special meanings. The regular expression /s$/ contains two characters.

The first character, "s", matches itself literally. The second character, "$", is a special character that matches the end of a string. So the regular expression /s$/ matches any string that ends with the letter "s".
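The two ways of creating a pattern can be compared side by side. The following is a minimal sketch; the variable names and sample strings are made up for illustration.

```javascript
// Two equivalent ways to build the same pattern.
const literal = /s$/;
const constructed = new RegExp("s$");

// Both accept strings that end in "s" and reject the rest.
const endsWithS = ["regular expressions", "class"].every(
  (text) => literal.test(text) && constructed.test(text)
);
const rejectsOther = !literal.test("pattern") && !constructed.test("pattern");

// /java/ has no anchor, so it matches anywhere inside the string.
const containsJava = /java/.test("I like javascript");

console.log(endsWithS, rejectsOther, containsJava); // true true true
```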

  1. Literal characters

We have seen that all letters and digits in a regular expression match themselves literally. JavaScript regular expressions also support certain non-alphabetic characters through escape sequences that begin with a backslash (\). For example, the sequence "\n" matches a literal newline character in a string. Many punctuation characters have special meanings in regular expressions. The following table lists these characters and their meanings:

Literal characters in regular expressions

Character             Match
________________________________
Letters and digits    Themselves
\f                    Form feed
\n                    Newline
\r                    Carriage return
\t                    Tab
\v                    Vertical tab
\/                    A literal /
\\                    A literal \
\.                    A literal .
\*                    A literal *
\+                    A literal +
\?                    A literal ?
\|                    A literal |
\(                    A literal (
\)                    A literal )
\[                    A literal [
\]                    A literal ]
\{                    A literal {
\}                    A literal }
\XXX                  The ASCII character specified by the octal number XXX
\xnn                  The ASCII character specified by the hexadecimal number nn
\cX                   The control character ^X. For example, \cI is equivalent to \t, and \cJ is equivalent to \n

___________________________________________________

To use one of these special punctuation characters literally in a regular expression, you must prefix it with a "\".
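The effect of escaping can be seen in a short sketch. The helper escapeLiteral below is hypothetical (it is not part of the article or of the language); it simply prefixes the punctuation characters from the table with a backslash.

```javascript
// An unescaped "." matches any character; "\." matches only a period.
const looseDot = /1.5/;
const literalDot = /1\.5/;

console.log(looseDot.test("1x5"));   // true
console.log(literalDot.test("1x5")); // false
console.log(literalDot.test("1.5")); // true

// Hypothetical helper: escape the special punctuation characters so that
// arbitrary text can be matched literally.
function escapeLiteral(text) {
  return text.replace(/[\/\\.*+?|()\[\]{}^$]/g, "\\$&");
}

const price = new RegExp(escapeLiteral("$5.98 (approx.)"));
console.log(price.test("It costs $5.98 (approx.) today")); // true

// \x09 (hexadecimal) and \cI (control character) both denote a tab.
console.log(/\x09/.test("\t"), /\cI/.test("\t")); // true true
```

Building the pattern with new RegExp() after escaping is useful when the text to search for comes from user input rather than from the program source.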

  2. Character classes

Individual literal characters can be combined into a character class by placing them in square brackets. A character class matches any one character that it contains, so the regular expression /[abc]/ matches any one of the letters "a", "b", or "c". You can also define a negated character class, which matches any character except those in the brackets. A negated character class is specified by placing a ^ as the first character inside the left bracket. Character classes can use a hyphen to indicate a range of characters; for example, /[a-zA-Z0-9]/ matches any letter or digit.

Because certain character classes are used so often, JavaScript's regular expression syntax includes special characters and escape sequences to represent them. For example, \s matches the space character, the tab character, and other whitespace characters, and \S matches any character that is not whitespace.

Regular expression character classes

Character    Match
____________________________________________________
[...]        Any one character between the brackets
[^...]       Any one character not between the brackets
.            Any character except a newline; roughly equivalent to [^\n]
\w           Any word character; equivalent to [a-zA-Z0-9_]
\W           Any non-word character; equivalent to [^a-zA-Z0-9_]
\s           Any whitespace character; for ASCII input, equivalent to [ \t\n\r\f\v]
\S           Any non-whitespace character; for ASCII input, equivalent to [^ \t\n\r\f\v]
\d           Any digit; equivalent to [0-9]
\D           Any character other than a digit; equivalent to [^0-9]
[\b]         A literal backspace (special case)
________________________________________________________________
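The classes in the table can be tried out on a sample string. This is a small sketch; the helper allMatches and the sample text are illustrative names, not anything defined by the article.

```javascript
// Collect every match of a pattern in a string by re-creating the pattern
// with the global flag.
function allMatches(pattern, text) {
  return text.match(new RegExp(pattern.source, "g")) || [];
}

const sample = "Room 42b, floor_3";

console.log(allMatches(/[abc]/, sample));   // ["b"]
console.log(allMatches(/\d/, sample));      // ["4", "2", "3"]
console.log(allMatches(/[^\w\s]/, sample)); // [","]

console.log(/\w/.test("_"));    // true: \w includes the underscore
console.log(/[\b]/.test("\b")); // true: inside brackets, \b is a backspace
```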

  3. Repetition

With the syntax described so far, we can describe a two-digit number as /\d\d/ and a four-digit number as /\d\d\d\d/. But there is no way to describe a number with an arbitrary number of digits, or a string made up of three letters followed by an optional digit. These more complex patterns use regular expression syntax that specifies how many times an element of the expression may be repeated.

The characters that specify repetition always follow the pattern to which they apply. Because certain kinds of repetition are so common, special characters are reserved for them. For example, + matches one or more occurrences of the preceding pattern. The table below summarizes the repetition syntax. First, some examples:

/\d{2,4}/       // Match between two and four digits.

/\w{3}\d?/      // Match exactly three word characters and an optional digit.

/\s+java\s+/    // Match the string "java" with one or more whitespace characters before and after it.

/[^"]*/         // Match zero or more characters that are not quotation marks.

Regular expression repetition characters

Character    Description
__________________________________________________________________
{n,m}        Match the previous item at least n times but no more than m times
{n,}         Match the previous item n or more times
{n}          Match the previous item exactly n times
?            Match the previous item zero or one time; that is, the previous item is optional. Equivalent to {0,1}
+            Match the previous item one or more times. Equivalent to {1,}
*            Match the previous item zero or more times. Equivalent to {0,}
___________________________________________________________________
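The example patterns above can be run against sample input to see what each repetition count actually consumes. A minimal sketch follows; the helper firstMatch and the input strings are invented for the illustration.

```javascript
// Return the text of the first match, or null if there is none.
const firstMatch = (pattern, text) => {
  const found = text.match(pattern);
  return found ? found[0] : null;
};

console.log(firstMatch(/\d{2,4}/, "In 2024 we shipped 7 releases")); // "2024"
console.log(firstMatch(/\d{2,4}/, "123456"));                         // "1234"
console.log(firstMatch(/\w{3}\d?/, "abc1"));                          // "abc1"
console.log(firstMatch(/\w{3}\d?/, "abcd"));                          // "abc"
console.log(firstMatch(/\s+java\s+/, "learn   java   now"));          // "   java   "
console.log(firstMatch(/[^"]*/, 'say "hi"'));                         // "say "
```

Note that repetition is greedy: /\d{2,4}/ takes four digits from "123456" even though two would have been enough for a match.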

  4. Alternation, grouping, and references

Regular expression syntax also includes special characters for specifying alternatives, grouping subexpressions, and referring to previous subexpressions. The | character separates alternatives. For example, /ab|cd|ef/ matches the string "ab", the string "cd", or the string "ef", and /\d{3}|[a-z]{4}/ matches either three digits or four lowercase letters. Parentheses have several purposes in regular expressions. Their main purpose is to group separate items into a single subexpression so that the items can be treated as a single unit by *, +, or ?. For example, /java(script)?/ matches the string "java" followed by an optional "script", and /(ab|cd)+|ef/ matches either the string "ef" or one or more repetitions of the strings "ab" or "cd".

The second purpose of parentheses in a regular expression is to define subpatterns within the complete pattern. When a regular expression is successfully matched against a target string, you can extract the portions of the target string that matched any particular parenthesized subpattern. For example, suppose we are looking for one or more lowercase letters followed by one or more digits. We might use the pattern /[a-z]+\d+/. But suppose we only really care about the digits at the end of each match. If we put that part of the pattern in parentheses, as in /[a-z]+(\d+)/, we can extract the digits from any match we find and then process them.
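Grouping and extraction can be sketched as follows. The input strings are invented for the example, and String.prototype.match is used here as one convenient way to read the captured text.

```javascript
const text = "order abc123 shipped";

// Without parentheses only the whole match is available.
const whole = text.match(/[a-z]+\d+/);
console.log(whole[0]); // "abc123"

// With parentheses, element 1 of the result holds the digits alone.
const grouped = text.match(/[a-z]+(\d+)/);
const quantity = Number(grouped[1]);
console.log(grouped[1], quantity); // "123" 123

// A group followed by ? is optional; an unused group is reported as undefined.
console.log("java".match(/java(script)?/)[1]);       // undefined
console.log("javascript".match(/java(script)?/)[1]); // "script"

// A group followed by + repeats the whole alternation.
console.log("abcdab".match(/(ab|cd)+|ef/)[0]); // "abcdab"
```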

Another use of parenthesized subexpressions is to allow us to refer back to a previous subexpression later in the same regular expression. This is done by following a \ character with one or more digits. The digits refer to the position of the parenthesized subexpression within the regular expression. For example, \1 refers back to the first parenthesized subexpression and \3 refers to the third. Note that, because subexpressions can be nested within others, it is the position of the left parenthesis that is counted.

For example, in the following regular expression the nested subexpression ([Ss]cript) is referred to as \2:

/([Jj]ava([Ss]cript))\sis\s(fun\w*)/

A reference to a previous subexpression does not refer to the pattern of that subexpression but to the text that matched it. Thus, references are not just a shortcut for retyping a repeated part of a regular expression; they also enforce a constraint that separate parts of a string contain exactly the same characters. For example, the following regular expression matches zero or more characters within single or double quotes. However, it does not require the opening and closing quotes to match (that is, both single quotes or both double quotes):

/['"][^'"]*['"]/

To require the quotes to match, we can use a reference:

/(['"])[^'"]*\1/

The \1 matches whatever the first parenthesized subexpression matched. In this example, it enforces the constraint that the closing quote match the opening quote. Note that if the number following the backslash is greater than the number of parenthesized subexpressions, it is parsed as an octal escape sequence rather than a reference. You can avoid the confusion by always writing escape sequences with the full three digits, for example \044 instead of \44. The following table summarizes the alternation, grouping, and reference characters:

Character    Description
______________________________________
|            Alternation. Match either the subexpression to the left of the symbol or the subexpression to its right.
(...)        Grouping. Group several items into a single unit that can be used with *, +, ?, and |. Also remember the characters that match this group for use with later references.
\n           Match the same characters that were matched by group number n. Groups are subexpressions within (possibly nested) parentheses. Group numbers are assigned by counting left parentheses from left to right.
______________________________________
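The difference between the two quote patterns, and the numbering of nested groups, can be checked with a short sketch. The sample strings are assumptions chosen for the demonstration.

```javascript
const anyQuotes = /['"][^'"]*['"]/;      // opening and closing quote may differ
const matchingQuotes = /(['"])[^'"]*\1/; // \1 must repeat the opening quote

const mixed = `'mixed"`;
console.log(anyQuotes.test(mixed));           // true
console.log(matchingQuotes.test(mixed));      // false
console.log(matchingQuotes.test(`"same"`));   // true

// Groups are numbered by the position of their left parenthesis.
const nested = /([Jj]ava([Ss]cript))\sis\s(fun\w*)/;
const groups = "JavaScript is funny".match(nested).slice(1);
console.log(groups); // ["JavaScript", "Script", "funny"]

// \2 repeats the text matched by the nested group, not its pattern.
const repeated = /([Jj]ava([Ss]cript)) and \2/;
console.log(repeated.test("JavaScript and Script")); // true
console.log(repeated.test("JavaScript and script")); // false
```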

  5. Specifying match position

As we have seen, many elements of a regular expression match a single character in a string. For example, \s matches a single whitespace character. Other regular expression elements match the zero-width positions between characters rather than actual characters. \b, for example, matches a word boundary: the boundary between a \w (word) character and a \W (non-word) character. Elements such as \b do not specify any characters to be used in a matched string; they specify legal positions at which a match can occur. These elements are sometimes called regular expression anchors because they anchor the pattern to a specific position in the search string. The most commonly used anchor elements are ^, which ties the pattern to the beginning of the string, and $, which anchors the pattern to the end of the string.

For example, to match the word "JavaScript" on a line by itself, we can use the regular expression /^JavaScript$/. If we want to search for the word "java" as a word by itself (not as a prefix, as it is in "javascript"), we might try the pattern /\sjava\s/, which requires whitespace before and after the word. But there are two problems with this solution. First, it does not match "java" if that word appears at the beginning or the end of a string, but only if it appears with whitespace on either side. Second, when this pattern does find a match, the matched string it returns has leading and trailing whitespace, which is not quite what we want. So instead of matching actual whitespace characters with \s, we match word boundaries with \b. The resulting expression is /\bjava\b/.

The following are the regular expression anchor characters:

Character    Description
____________________________________________________________________
^            Match the beginning of the string and, in multiline searches, the beginning of a line
$            Match the end of the string and, in multiline searches, the end of a line
\b           Match a word boundary, that is, the position between a \w character and a \W character. (Note: [\b] matches a backspace.)
\B           Match a position that is not a word boundary
_____________________________________________________________________
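The two problems with /\sjava\s/ and the way \b avoids them can be sketched like this; the sample sentences are made up for the illustration.

```javascript
const sentence = "I use java daily";

// \s consumes the surrounding spaces; \b only asserts a position.
console.log(JSON.stringify(sentence.match(/\sjava\s/)[0])); // " java "
console.log(JSON.stringify(sentence.match(/\bjava\b/)[0])); // "java"

// At the start of a string there is no leading space for \s to match.
const atStart = "java and javascript";
console.log(/\sjava\s/.test(atStart)); // false
console.log(/\bjava\b/.test(atStart)); // true

// ^ and $ tie the pattern to both ends of the string.
console.log(/^JavaScript$/.test("JavaScript"));       // true
console.log(/^JavaScript$/.test("JavaScript rocks")); // false

// \B requires a position that is not a word boundary.
console.log(/\Bscript/.test("javascript")); // true
console.log(/\Bscript/.test("script"));     // false
```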

  6. Attributes


There is one final element of regular expression syntax: attributes (also called flags), which specify high-level pattern-matching rules. Unlike the rest of regular expression syntax, attributes are specified outside the / characters; instead of appearing between the two slashes, they appear after the second slash. JavaScript 1.2 supports two attributes. The i attribute specifies that pattern matching should be case insensitive. The g attribute specifies that pattern matching should be global, that is, all matches within the searched string should be found. The two attributes may be combined to perform a global, case-insensitive match.

For example, to do a case-insensitive search for the word "java" (or "Java", "JAVA", and so on), we can use the case-insensitive regular expression /\bjava\b/i. To find every occurrence of the word in a string, we add the g attribute: /\bjava\b/gi.

The following are the regular expression attributes:

Character    Description
_________________________________________
i            Perform case-insensitive matching.
g            Perform a global match, that is, find all matches rather than stopping after the first.
_________________________________________
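The effect of each attribute on the same input can be compared directly. A minimal sketch, with an invented sample string:

```javascript
const text = "Java, JAVA and java";

// No attributes: case-sensitive, first match only.
console.log(text.match(/\bjava\b/)[0]); // "java"

// i: case-insensitive, still only the first match.
console.log(text.match(/\bjava\b/i)[0]); // "Java"

// gi: every match, regardless of case.
console.log(text.match(/\bjava\b/gi)); // ["Java", "JAVA", "java"]

// g also affects replace(): all occurrences are substituted.
console.log(text.replace(/\bjava\b/gi, "JS")); // "JS, JS and JS"
```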

Apart from the g and i attributes, regular expressions in JavaScript 1.2 have no other attribute-like features. If you set the static multiline property of the RegExp constructor to true, pattern matching is performed in multiline mode. In this mode, the anchor characters ^ and $ match not only the beginning and end of the search string but also the beginning and end of each line within it. For example, the pattern /java$/ matches "java" but does not match "java\nis fun". If we set the multiline property, the latter is matched as well:

RegExp.multiline = true;

Later versions of the language replaced this non-standard static property with the m attribute, so in current engines the equivalent pattern is written /java$/m.
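Multiline matching can be sketched as follows. Note the substitution: current JavaScript engines request multiline mode per pattern with the m flag instead of the static property shown above, so the example assumes such an engine and uses /java$/m.

```javascript
const text = "java\nis fun";

// Without m, $ and ^ only match at the ends of the whole string.
console.log(/java$/.test(text)); // false
console.log(/^is/.test(text));   // false

// With m, they also match at the end and start of each line.
console.log(/java$/m.test(text)); // true
console.log(/^is/m.test(text));   // true

// The multiline property of a pattern reports whether m was given.
console.log(/java$/.multiline, /java$/m.multiline); // false true
```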

A regular expression (RegExp) object contains the pattern of a regular expression. It has properties and methods that use the pattern to find and replace matching characters (or combinations of characters) in strings. In addition to the properties of an individual regular expression object that you create with the RegExp constructor, the predefined RegExp object has static properties that are set whenever any regular expression is used.

  • Create:
    Use either the literal format or the RegExp constructor
    Literal format: /pattern/flags
    Constructor: new RegExp("pattern"[, "flags"]);
  • Parameters:
    pattern -- the text of the regular expression
    flags -- if present, one of the following values:
    g: global match
    i: case insensitive
    gi: both of the above

[Note] The parameters of the literal format are not enclosed in quotation marks, while the parameters of the constructor are. For example, /ab+c/i and new RegExp("ab+c", "i") do the same thing. In the constructor, special characters must be escaped by adding a "\" in front of them, because the pattern is first read as a string. For example: re = new RegExp("\\w+")
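Why the extra backslash is needed in the constructor form can be seen by comparing the resulting patterns. This is a small sketch; the variable names are illustrative.

```javascript
const fromLiteral = /\w+/;
const fromString = new RegExp("\\w+"); // the string contains \w+

// Both describe the same pattern.
console.log(fromLiteral.source === fromString.source); // true

// With a single backslash, the string literal "\w+" is just "w+",
// so the pattern matches the letter w instead of word characters.
const unescaped = new RegExp("\w+");
console.log(unescaped.source);      // "w+"
console.log(unescaped.test("abc")); // false
console.log(fromString.test("abc")); // true

// Flags are passed as a second string argument.
console.log(new RegExp("ab+c", "i").test("ABBC")); // true
```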

Special characters in regular expressions

Character    Meaning
\            Escape. For a character that is normally matched literally, the backslash gives the following character a special meaning. For example, /b/ matches the character "b", while /\b/ matches a word boundary.
             -or-
             For a character that is normally special, the backslash makes it literal. For example, * matches the preceding item zero or more times, so /a*/ matches a, aa, aaa; with the backslash, /a\*/ matches only "a*".
^            Match the beginning of the input or of a line. /^A/ matches the "A" in "An a" but not in "an A"
$            Match the end of the input or of a line. /a$/ matches the "a" in "An a" but not in "an A"
*            Match the preceding item zero or more times. /ba*/ matches b, ba, baa, baaa
+            Match the preceding item one or more times. /ba+/ matches ba, baa, baaa
?            Match the preceding item zero or one time. /ba?/ matches b, ba
(x)          Match x and remember the match in $1...$9
x|y          Match x or y
{n}          Match exactly n times
{n,}         Match n or more times
{n,m}        Match at least n and at most m times
[xyz]        Character set. Matches any one of the enclosed characters
[^xyz]       Negated character set. Matches any character that is not enclosed
[\b]         Match a backspace
\b           Match a word boundary
\B           Match a non-word boundary
\cX          Here X is a control character. /\cM/ matches Ctrl-M
\d           Match a digit. /\d/ = /[0-9]/
\D           Match a non-digit. /\D/ = /[^0-9]/
\n           Match a linefeed
\r           Match a carriage return
\s           Match a whitespace character, including \n, \r, \f, \t, \v, and so on
\S           Match a non-whitespace character, equal to /[^ \n\f\r\t\v]/
\t           Match a tab
\v           Match a vertical tab
\w           Match a word character (a letter, a digit, or the underscore). For example, /\w/ matches the 5 in "$5.98". Equal to [a-zA-Z0-9_]
\W           Match a non-word character. For example, /\W/ matches the $ in "$5.98". Equal to [^a-zA-Z0-9_]
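A few rows of the table can be verified with a short sketch. The helper accepted and the input list are hypothetical, added only to show which strings each quantifier admits as a whole.

```javascript
const inputs = ["b", "ba", "baa"];

// Which inputs does a pattern accept in full? (^ and $ anchor both ends.)
const accepted = (pattern) => inputs.filter((s) => pattern.test(s));

console.log(accepted(/^ba*$/)); // ["b", "ba", "baa"]
console.log(accepted(/^ba+$/)); // ["ba", "baa"]
console.log(accepted(/^ba?$/)); // ["b", "ba"]

// \w and \W on the "$5.98" example.
console.log("$5.98".match(/\w/)[0]); // "5"
console.log("$5.98".match(/\W/)[0]); // "$"

// An escaped * is literal; \cM is the carriage return (Ctrl-M).
console.log(/a\*/.test("a*"), /a\*/.test("aaa")); // true false
console.log(/\cM/.test("\r"));                    // true
```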

 

With the syntax covered, here are some examples of regular expressions in practical use:

Email address verification:
function test_email(strEmail) {
    // One or more name characters, "@", one or more dot-terminated labels,
    // then a final label of two or three characters.
    var myReg = /^[_a-z0-9]+@([_a-z0-9]+\.)+[a-z0-9]{2,3}$/;
    return myReg.test(strEmail);
}
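HTML code shielding:
function mask_HTMLCode(strInput) {
    // Replace the angle brackets of simple tags such as <b> with HTML entities.
    // The g attribute masks every tag in the input, not just the first.
    var myReg = /<(\w+)>/g;
    return strInput.replace(myReg, "&lt;$1&gt;");
}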

Properties and methods of the regular expression object

The predefined RegExp object has the following static properties: input, multiline, lastMatch, lastParen, leftContext, rightContext, and $1 through $9. The input and multiline properties can be preset. The values of the other properties are set after the exec or test method runs, depending on the outcome. Many of these properties have both a long name and a short (Perl-style) name that refer to the same value. (JavaScript models its regular expressions on Perl's.)
Properties of the regular expression object

Property       Description
$1...$9        The parenthesized substring matches, if any
$_             See input
$*             See multiline
$&             See lastMatch
$+             See lastParen
$`             See leftContext
$'             See rightContext
constructor    Specifies the function that creates an object's prototype
global         Whether to match throughout the entire string (boolean)
ignoreCase     Whether to ignore case when matching (boolean)
input          The string against which the pattern is matched
lastIndex      The index at which to start the next match
lastMatch      The last matched characters
lastParen      The last parenthesized substring match, if any
leftContext    The substring preceding the most recent match
multiline      Whether to match across multiple lines (boolean)
prototype      Allows properties to be added to all objects
rightContext   The substring following the most recent match
source         The text of the pattern
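The per-pattern properties (source, global, ignoreCase, multiline, lastIndex) can be inspected as in the sketch below. It deliberately leaves out the static properties such as $1 and lastMatch, which are legacy features, and the sample text is invented.

```javascript
const digits = /\d+/gi;

console.log(digits.source);     // "\d+"
console.log(digits.global);     // true
console.log(digits.ignoreCase); // true
console.log(digits.multiline);  // false

// With the g attribute, exec() resumes from lastIndex on each call.
const text = "10 apples, 20 pears";
const steps = [];
let found;
while ((found = digits.exec(text)) !== null) {
  steps.push({ match: found[0], index: found.index, lastIndex: digits.lastIndex });
}

console.log(steps);
// [{ match: "10", index: 0, lastIndex: 2 }, { match: "20", index: 11, lastIndex: 13 }]

// After a failed match, lastIndex is reset to 0.
console.log(digits.lastIndex); // 0
```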


Methods of the regular expression object

Method      Description
compile     Compiles a regular expression object
exec        Executes a search for a match in a string
test        Tests for a match in a string
toSource    Returns an object literal representing the object, which can be used to create a new object. Overrides the Object.toSource method. (Non-standard; not available in all engines.)
toString    Returns a string representing the object. Overrides the Object.toString method.
valueOf     Returns the primitive value of the object. Overrides the Object.valueOf method.
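The most commonly used methods, exec, test, and toString, behave as in the following sketch. The pattern and names are chosen for illustration; compile and toSource are omitted because they are not available in every engine.

```javascript
const namePattern = /(\w+)\s(\w+)/;

// exec() returns an array: the whole match first, then each group.
const result = namePattern.exec("John Smith");
console.log(result[0]);    // "John Smith"
console.log(result[1]);    // "John"
console.log(result[2]);    // "Smith"
console.log(result.index); // 0

// exec() returns null when nothing matches.
console.log(namePattern.exec("Madonna")); // null

// test() only reports whether a match exists.
console.log(namePattern.test("John Smith")); // true
console.log(namePattern.test("Madonna"));    // false

// toString() gives the pattern back in literal form.
console.log(namePattern.toString()); // "/(\w+)\s(\w+)/"
```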

Example
<script language="JavaScript">
var myReg = /(\w+)\s(\w+)/;
var str = "John Smith";
var newstr = str.replace(myReg, "$2, $1");
document.write(newstr);
</script>
Output: "Smith, John"
