Regular Expression (2)

Source: Internet
Author: User
Tags: character classes

Regular expression syntax in JavaScript

Author: no power
Source: http://www.blog.edu.cn/user2/afan/archives/2006/1215847.shtml

 

A regular expression is an object that describes a pattern of characters.
The RegExp and String objects in JavaScript define methods that use regular expressions to perform powerful pattern-matching and text search-and-replace functions.

In JavaScript, regular expressions are represented by RegExp objects. You can create a RegExp object with the RegExp() constructor, or with the special literal syntax added in JavaScript 1.2. Just as a string literal is written as characters enclosed in quotation marks, a regular expression literal is written as characters enclosed in a pair of slashes (/). JavaScript code may therefore contain a line like this:

var pattern = /s$/;

This line creates a new RegExp object and assigns it to the variable pattern. This particular RegExp object matches any string that ends with the letter "s". You can also define an equivalent regular expression with the RegExp() constructor:

var pattern = new RegExp("s$");
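
As a minimal sketch, the two forms can be compared side by side. The helper name endsWithS and the sample strings are hypothetical and are not part of the original article; the pattern is assumed to be a lowercase s followed by the end-of-string anchor.

```javascript
// Two ways to build the same pattern: a literal and the constructor.
const literal = /s$/;                  // regular expression literal
const constructed = new RegExp("s$");  // equivalent constructor call

// Hypothetical helper: true only when both forms agree that text ends in "s".
function endsWithS(text) {
  return literal.test(text) && constructed.test(text);
}

console.log(endsWithS("expressions"));               // true
console.log(endsWithS("pattern"));                   // false
console.log(literal.source === constructed.source);  // true
```

Both objects report the same source text, which suggests that the choice between the two forms is mostly a matter of convenience when the pattern is known in advance.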

Creating a RegExp object is easy, whether you use a regular expression literal or the RegExp() constructor. The harder task is describing the desired pattern of characters in regular expression syntax.
JavaScript uses a fairly complete subset of Perl's regular expression syntax.

A regular expression pattern is a series of characters. Most characters, including all letters and digits, simply match themselves literally. Thus the regular expression /Java/ matches any string that contains the substring "Java". Other characters in a regular expression are not matched literally but have special meanings. The regular expression /s$/ contains two characters. The first, "s", matches itself literally. The second, "$", is a special character that matches the end of a string. The regular expression /s$/ therefore matches any string that ends with the letter "s".

1. Literal characters

We have seen that all letters and digits in a regular expression match themselves literally. JavaScript regular expressions also support certain non-alphabetic characters through escape sequences that begin with a backslash (\). For example, the sequence "\n" matches a literal line feed in a string. Many punctuation characters have special meanings in regular expressions. The following table lists these characters and their meanings:

Literal characters in regular expressions

Character          Matches
________________________________
Letter or digit    Itself
\f                 Form feed
\n                 Line feed
\r                 Carriage return
\t                 Tab
\v                 Vertical tab
\/                 A literal /
\\                 A literal \
\.                 A literal .
\*                 A literal *
\+                 A literal +
\?                 A literal ?
\|                 A literal |
\(                 A literal (
\)                 A literal )
\[                 A literal [
\]                 A literal ]
\{                 A literal {
\}                 A literal }
\XXX               The ASCII character specified by the octal number XXX
\xnn               The ASCII character specified by the hexadecimal number nn
\cX                The control character ^X. For example, \cI is equivalent to \t, and \cJ is equivalent to \n

___________________________________________________

To use any of these special punctuation characters literally in a regular expression, you must prefix it with a backslash (\).
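
The effect of the backslash can be seen in a short sketch. The variable names and sample strings below are illustrative assumptions, not taken from the original article.

```javascript
// An unescaped "." is a special character; "\." is a literal period.
const unescapedDot = /1.5/;
const escapedDot = /1\.5/;

console.log(unescapedDot.test("1x5")); // true, "." accepts the "x"
console.log(escapedDot.test("1x5"));   // false, a real period is required
console.log(escapedDot.test("1.5"));   // true

// "$" normally anchors to the end of the string; "\$" matches a dollar sign.
const price = /\$\d+/;
console.log(price.test("cost: $42"));  // true
console.log(price.test("cost: 42"));   // false
```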

2. Character classes

Individual literal characters can be combined into a character class by placing them within square brackets. A character class matches any one character contained within it, so the regular expression /[ABC]/ matches any one of the letters "A", "B", or "C". You can also define a negated character class, which matches any character except those contained within the brackets. A negated character class is specified by placing a ^ as the first character inside the left bracket. A character class can also use a hyphen to indicate a range of characters, as in /[a-zA-Z0-9]/.

Because certain character classes are so commonly used, JavaScript's regular expression syntax includes special characters and escape sequences to represent them. For example, \s matches the space character, the tab character, and other whitespace characters, while \S matches any character that is not whitespace.

Character classes in regular expressions

Character    Matches
____________________________________________________
[...]        Any one character between the brackets
[^...]       Any one character not between the brackets
.            Any character except a line break; roughly equivalent to [^\n]
\w           Any word character; equivalent to [a-zA-Z0-9_]
\W           Any non-word character; equivalent to [^a-zA-Z0-9_]
\s           Any whitespace character; equivalent to [ \t\n\r\f\v]
\S           Any non-whitespace character; equivalent to [^ \t\n\r\f\v]
\d           Any digit; equivalent to [0-9]
\D           Any character other than a digit; equivalent to [^0-9]
[\b]         A literal backspace (special case)
________________________________________________________________
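
The classes in the table can be tried out with a small sketch. The helper firstMatch is a hypothetical name introduced here for illustration; it simply returns the first substring that a pattern matches, or null.

```javascript
// Hypothetical helper: return the first substring matched by pattern, or null.
function firstMatch(pattern, text) {
  const result = text.match(pattern);
  return result === null ? null : result[0];
}

console.log(firstMatch(/[abc]/, "xyzb"));   // "b"  (any one of a, b, c)
console.log(firstMatch(/[^abc]/, "abcd"));  // "d"  (anything except a, b, c)
console.log(firstMatch(/\d/, "room 7"));    // "7"  (a digit)
console.log(firstMatch(/\S/, "  hi"));      // "h"  (first non-whitespace)
console.log(firstMatch(/\W/, "a_b-c"));     // "-"  (underscore counts as a word character)
```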

3. Repetition

With the regular expression syntax described so far, we can describe a two-digit number as /\d\d/ and a four-digit number as /\d\d\d\d/. But we have no way to describe a number with any number of digits, or a string of three letters followed by an optional digit. These more complex patterns use regular expression syntax that specifies how many times an element of the expression may be repeated.

The characters that specify repetition always follow the pattern to which they apply. Because certain types of repetition are quite common, special characters are used to represent them. For example, + matches one or more occurrences of the preceding pattern. The table below summarizes the repetition syntax. Let's look at some examples first:

/\d{2,4}/      // Matches between two and four digits.

/\w{3}\d?/     // Matches exactly three word characters and an optional digit.

/\s+Java\s+/   // Matches "Java" with one or more whitespace characters before and after it.

/[^"]*/        // Matches zero or more characters that are not quotation marks.

Repetition characters in regular expressions

Character    Meaning
__________________________________________________________________
{n,m}        Match the previous item at least n times but no more than m times
{n,}         Match the previous item n or more times
{n}          Match the previous item exactly n times
?            Match zero or one occurrence of the previous item; that is, the previous item is optional. Equivalent to {0,1}
+            Match one or more occurrences of the previous item. Equivalent to {1,}
*            Match zero or more occurrences of the previous item. Equivalent to {0,}
___________________________________________________________________
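
A brief sketch of the repetition counts in action follows. The ^ and $ anchors (covered in section 5) are added here as an assumption so that the whole string must fit the pattern; the variable names are illustrative.

```javascript
// Between two and four digits, and nothing else.
const twoToFourDigits = /^\d{2,4}$/;
console.log(twoToFourDigits.test("7"));      // false, too few digits
console.log(twoToFourDigits.test("2024"));   // true
console.log(twoToFourDigits.test("12345"));  // false, too many digits

// Exactly three word characters followed by an optional digit.
const threeWordCharsOptionalDigit = /^\w{3}\d?$/;
console.log(threeWordCharsOptionalDigit.test("abc"));   // true
console.log(threeWordCharsOptionalDigit.test("abc1"));  // true
console.log(threeWordCharsOptionalDigit.test("ab"));    // false

// "+" needs at least one occurrence; "*" accepts zero.
console.log(/^a+$/.test(""));  // false
console.log(/^a*$/.test(""));  // true
```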

4. Alternation, grouping, and references

Regular expression syntax also includes special characters for specifying alternatives, grouping subexpressions, and referring to previous subexpressions. The | character separates alternatives. For example, /ab|cd|ef/ matches the string "ab", the string "cd", or the string "ef", and /\d{3}|[a-z]{4}/ matches either three digits or four lowercase letters.

Parentheses have several purposes in regular expressions. One is to group separate items into a single subexpression so that the items can be treated as a single unit by *, +, ?, and |. For example, /Java(Script)?/ matches "Java" followed by an optional "Script", and /(ab|cd)+|ef/ matches either the string "ef" or one or more repetitions of the strings "ab" and "cd".

A second purpose of parentheses is to define subpatterns within the complete pattern. When a regular expression is matched against a target string, you can extract the portions of the target string that matched any particular parenthesized subpattern. For example, suppose we are looking for one or more lowercase letters followed by one or more digits. We might use the pattern /[a-z]+\d+/. But suppose we only really care about the digits at the end of each match. If we put that part of the pattern in parentheses, as in /[a-z]+(\d+)/, we can extract the digits from any match we find and then process them.
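
Extracting the parenthesized digits might look like the following sketch. The function name trailingNumber is hypothetical; it relies on the standard behavior of exec, which returns an array whose element 1 holds the text matched by the first group.

```javascript
// Hypothetical helper: pull out the digits captured by the parenthesized group.
function trailingNumber(text) {
  const result = /[a-z]+(\d+)/.exec(text);
  // result[0] is the whole match; result[1] is the first captured group.
  return result === null ? null : Number(result[1]);
}

console.log(trailingNumber("item42")); // 42
console.log(trailingNumber("42"));     // null, no leading letters
```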

Another use of parenthesized subexpressions is to let us refer back to a subexpression later in the same regular expression. This is done by following a backslash (\) with one or more digits. The digits refer to the position of the parenthesized subexpression within the regular expression. For example, \1 refers to the first parenthesized subexpression and \3 refers to the third. Note that because subexpressions can be nested within others, it is the position of the left parenthesis that is counted.
In the following regular expression, for example, the nested subexpression ([Ss]cript) is referred to as \2:
/([Jj]ava([Ss]cript))\sis\s(fun\w*)/

A reference to a previous subexpression does not refer to the pattern of that subexpression but to the text that matched it. References are therefore more than a shortcut for typing a repeated portion of a regular expression: they enforce the constraint that separate portions of a string contain exactly the same characters. For example, the following regular expression matches zero or more characters within single or double quotation marks. However, it does not require the opening and closing quotation marks to match (that is, both single or both double):
/['"][^'"]*['"]/

To require the quotation marks to match, we can use a reference:
/(['"])[^'"]*\1/

Here \1 matches whatever the first parenthesized subexpression matched, which enforces the constraint that the closing quotation mark match the opening one. Note that if the number following the backslash is greater than the number of parenthesized subexpressions, it is parsed as an octal escape sequence rather than a reference. You can avoid the confusion by always writing the escape sequence with all three digits, for example \044 instead of \44. The following table summarizes the alternation, grouping, and reference characters:

Character    Meaning
____________________________________________________________________
|            Alternation. Match either the subexpression to the left or the subexpression to the right.
(...)        Grouping. Group items into a single unit that can be used with *, +, ?, and |.
\n           Match the same characters that were matched by group number n. Groups are subexpressions within (possibly nested) parentheses. Group numbers are assigned by counting left parentheses from left to right.
____________________________________________________________________
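
The difference that a reference makes can be checked with a short sketch. The variable names and test strings are illustrative assumptions.

```javascript
// Without a reference, any closing quotation mark is accepted.
const looseQuotes = /['"][^'"]*['"]/;
// With \1, the closing mark must equal whatever the group (['"]) matched.
const matchedQuotes = /(['"])[^'"]*\1/;

const mixed = "'mixed\"";  // opens with ' and closes with "

console.log(looseQuotes.test(mixed));     // true
console.log(matchedQuotes.test(mixed));   // false
console.log(matchedQuotes.test('"ok"'));  // true
console.log(matchedQuotes.test("'ok'"));  // true

// Grouping lets + apply to the whole alternation (ab|cd).
console.log("abcdab".match(/(ab|cd)+|ef/)[0]); // "abcdab"
```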

5. Specifying match position

As we have seen, many elements of a regular expression match a single character in a string. For example, \s matches a single whitespace character. Other elements match the zero-width space between characters rather than actual characters. For example, \b matches a word boundary, that is, the boundary between a \w (word) character and a \W (non-word) character. Characters such as \b do not specify any characters to be used in the matched string; they specify legal positions at which a match can occur. These elements are sometimes called regular expression anchors because they anchor the pattern to a specific position in the search string. The most commonly used anchors are ^, which ties the pattern to the beginning of the string, and $, which ties the pattern to the end of the string.

For example, to match the word "JavaScript" on a line by itself, we can use the regular expression /^JavaScript$/. If we want to search for "Java" used as a word by itself (not as a prefix, as it is in "JavaScript"), we might try the pattern /\sJava\s/, which requires a space before and after the word. But there are two problems with this solution. First, it does not match "Java" at the beginning or the end of a string, where there is no space on one side. Second, when the pattern does find a match, the matched string it returns has leading and trailing spaces, which is not what we want. So instead of matching actual space characters with \s, we match word boundaries with \b. The resulting expression is /\bJava\b/.
The following table lists the regular expression anchor characters:

Character    Meaning
____________________________________________________________________
^            Match the beginning of the string and, in multiline searches, the beginning of a line
$            Match the end of the string and, in multiline searches, the end of a line
\b           Match a word boundary, that is, the position between a \w character and a \W character. (Note: [\b] matches backspace.)
\B           Match a position that is not a word boundary
_____________________________________________________________________
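
The two problems with the whitespace-based pattern, and how word boundaries avoid them, can be seen in this sketch. The variable names and sample sentences are illustrative assumptions.

```javascript
const spaced = /\sJava\s/;   // requires whitespace on both sides
const bounded = /\bJava\b/;  // requires word boundaries on both sides

// Problem 1: no whitespace before "Java" at the start of the string.
console.log(spaced.test("Java is fun"));         // false
console.log(bounded.test("Java is fun"));        // true

// Problem 2: the whitespace becomes part of the matched text.
console.log("I like Java a lot".match(spaced)[0]);   // " Java "
console.log("I like Java a lot".match(bounded)[0]);  // "Java"

// The boundary version still rejects "Java" used as a prefix.
console.log(bounded.test("JavaScript is fun"));  // false

// ^ and $ together require the whole string to be the word.
console.log(/^JavaScript$/.test("JavaScript"));        // true
console.log(/^JavaScript$/.test("JavaScript rocks"));  // false
```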

6. Attributes

There is one final element of regular expression grammar: the attributes (flags) of a regular expression, which specify high-level pattern-matching rules. Unlike the rest of the syntax, attributes are specified outside the / characters; they do not appear between the two slashes but after the second slash. JavaScript 1.2 supports two attributes. The i attribute specifies that pattern matching should be case-insensitive. The g attribute specifies that pattern matching should be global, that is, all matches within the searched string should be found. The two attributes can be combined to perform a global, case-insensitive match.

For example, to do a case-insensitive search for the first occurrence of the word "Java" (or "java", "JAVA", and so on), we can use the case-insensitive regular expression /\bJava\b/i. To find all occurrences of the word in a string, we add the g attribute: /\bJava\b/gi.

The following table lists the regular expression attributes:

Character    Meaning
_________________________________________
i            Perform case-insensitive matching.
g            Perform a global match, that is, find all matches rather than stopping after the first.
_________________________________________
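
A short sketch of the two attributes follows, using the String match method. The sample sentence is an illustrative assumption.

```javascript
const sentence = "java, JAVA, and Java";

// No attributes: case-sensitive, first match only.
console.log(sentence.match(/\bJava\b/)[0]);   // "Java"

// i: case-insensitive, still only the first match.
console.log(sentence.match(/\bJava\b/i)[0]);  // "java"

// g and i together: every occurrence, regardless of case.
console.log(sentence.match(/\bJava\b/gi));    // ["java", "JAVA", "Java"]
```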

Apart from the g and i attributes, regular expressions in JavaScript 1.2 have no other attribute-like features. However, if you set the static multiline property of the RegExp constructor to true, pattern matching is performed in multiline mode. In this mode, the anchors ^ and $ match not only the beginning and end of the search string but also the beginning and end of a line within it. For example, the pattern /Java$/ matches "Java" but does not match "Java\nis fun". If the multiline property is set, the latter is matched as well:

RegExp.multiline = true;
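
The static multiline property dates from early implementations and may not be available in current engines. The sketch below therefore assumes a current engine and uses the standard m attribute, which requests the same multiline behavior for one pattern at a time. The variable names are illustrative.

```javascript
const text = "Java\nis fun";

const endOfString = /Java$/;   // $ matches only at the end of the whole string
const endOfLine = /Java$/m;    // m: $ also matches at the end of each line

console.log(endOfString.test("Java")); // true
console.log(endOfString.test(text));   // false
console.log(endOfLine.test(text));     // true

// The same applies to ^ at the beginning of a line.
console.log(/^is/.test(text));   // false
console.log(/^is/m.test(text));  // true
```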

A regular expression object contains the pattern of a regular expression. It has properties and methods that use the pattern to match or replace specific characters (or sets of characters) in a string. To add properties to an individual regular expression, use the regular expression constructor function. The predefined RegExp object also has static properties that are set whenever any regular expression is used.

  • Create:
    Use either the literal format or the RegExp constructor.
    Literal format: /pattern/flags
    RegExp constructor: new RegExp("pattern"[, "flags"]);
  • Parameters:
    pattern -- the text of the regular expression
    flags -- if present, one of the following values:
    g: global match
    i: case-insensitive match
    gi: both of the above

[Note] Parameters in the literal format are not enclosed in quotation marks, whereas parameters to the constructor must be. For example, /ab+c/i and new RegExp("ab+c", "i") do the same thing. In the constructor, backslashes must themselves be escaped, because the pattern is written as a string (add a "\" before each "\"). For example: re = new RegExp("\\w+")
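
The need for the doubled backslash can be demonstrated with a small sketch. The variable names are illustrative; the behavior shown relies on the fact that, inside an ordinary string literal, "\w" is just the letter "w".

```javascript
const fromLiteral = /\w+/;
const fromString = new RegExp("\\w+");  // string "\\w+" holds the text \w+
const unescaped = new RegExp("\w+");    // string "\w+" holds only w+

console.log(fromLiteral.source === fromString.source); // true
console.log(unescaped.source);                         // "w+"

console.log(fromString.test("hello")); // true, word characters found
console.log(unescaped.test("hello"));  // false, there is no "w" in "hello"

// Attributes are passed as a second string argument.
console.log(new RegExp("ab+c", "i").test("ABBC")); // true
```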

Special characters in regular expressions

Character    Meaning
\            Either of the following:
             Escape: the character after "\" is not interpreted literally. For example, /b/ matches the character "b", but with a backslash in front, /\b/ matches a word boundary.
             -or-
             Literal: a character that normally has a special function in a regular expression is matched literally. For example, "*" matches the preceding item zero or more times, so /a*/ matches a, aa, aaa; with "\" added, /a\*/ matches only "a*".
^            Matches the beginning of the input or of a line. /^A/ matches "An A" but does not match "an A".
$            Matches the end of the input or of a line. /a$/ matches "An a" but does not match "an A".
*            Matches the preceding item zero or more times. /ba*/ matches b, ba, baa, baaa.
+            Matches the preceding item one or more times. /ba+/ matches ba, baa, baaa.
?            Matches the preceding item zero or one time. /ba?/ matches b, ba.
(x)          Matches x and saves the match in $1...$9.
x|y          Matches x or y.
{n}          Matches exactly n times.
{n,}         Matches n or more times.
{n,m}        Matches between n and m times.
[xyz]        Character set; matches any one of the characters in the set.
[^xyz]       Matches any character not in the set.
[\b]         Matches a backspace.
\b           Matches a word boundary.
\B           Matches a position that is not a word boundary.
\cX          Here X is a control character; /\cM/ matches Ctrl-M.
\d           Matches a digit character; /\d/ = /[0-9]/.
\D           Matches a non-digit character; /\D/ = /[^0-9]/.
\n           Matches a line feed.
\r           Matches a carriage return.
\s           Matches a whitespace character, including \n, \r, \f, \t, \v, and so on.
\S           Matches a non-whitespace character; equal to /[^ \n\f\r\t\v]/.
\t           Matches a tab.
\v           Matches a vertical tab.
\w           Matches a character that can make up a word (a letter, a digit, or an underscore). For example, /\w/ matches "5" in "$5.98". Equal to [a-zA-Z0-9_].
\W           Matches a character that cannot make up a word. For example, /\W/ matches "$" in "$5.98". Equal to [^a-zA-Z0-9_].

With that background, let's look at some practical applications of regular expressions:

Email address verification:
function test_email(stremail) {
    // One or more name characters, "@", one or more dot-terminated
    // labels, then a final label of two or three characters.
    var myreg = /^[_a-z0-9]+@([_a-z0-9]+\.)+[a-z0-9]{2,3}$/;
    if (myreg.test(stremail)) return true;
    return false;
}
HTML code masking:
function mask_htmlcode(strinput) {
    // Matches a simple tag such as <b> and captures the tag name.
    var myreg = /<(\w+)>/;
    // Without the g attribute, only the first tag is replaced.
    return strinput.replace(myreg, "&lt;$1&gt;");
}

Properties and methods of a regular expression object
The predefined RegExp object has the following static properties: input, multiline, lastMatch, lastParen, leftContext, rightContext, and $1 through $9. The input and multiline properties can be preset. The values of the other properties are set after the exec or test method is executed, according to the outcome. Many properties have both a long name and a short (Perl-style) name that refer to the same value. (JavaScript's regular expressions are modeled on Perl's.)
Properties of a regular expression object

Property        Description
$1...$9         The parenthesized substring matches, if any
$_              See input
$*              See multiline
$&              See lastMatch
$+              See lastParen
$`              See leftContext
$'              See rightContext
constructor     Specifies the function that creates an object's prototype
global          Whether to match throughout the entire string (Boolean)
ignoreCase      Whether to ignore case when matching (Boolean)
input           The string being matched
lastIndex       The index at which to start the next match
lastParen       The last parenthesized substring match
leftContext     The substring to the left of the most recent match
multiline       Whether to match across multiple lines (Boolean)
prototype       Allows properties to be added to all objects
rightContext    The substring to the right of the most recent match
source          The text of the pattern


Methods of a regular expression object

Method      Description
compile     Compiles a regular expression object
exec        Executes a search for a match
test        Tests for a match
toSource    Returns an object literal representing the specified object; the value can be used to create a new object. Overrides the Object.toSource method.
toString    Returns a string representing the specified object. Overrides the Object.toString method.
valueOf     Returns the primitive value of the specified object. Overrides the Object.valueOf method.
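
As a rough sketch of how exec and the lastIndex property work together, the hypothetical helper below collects every match of a global pattern. The name findAll and the sample string are assumptions made for illustration; the helper assumes the pattern carries the g attribute, since only then does exec advance lastIndex between calls.

```javascript
// Hypothetical helper: list each match and its position using exec.
function findAll(pattern, text) {
  if (!pattern.global) {
    throw new Error("findAll expects a pattern with the g attribute");
  }
  const found = [];
  pattern.lastIndex = 0;  // start searching from the beginning
  let result;
  while ((result = pattern.exec(text)) !== null) {
    found.push({ match: result[0], index: result.index });
    // An empty match does not advance lastIndex, so step forward manually.
    if (result[0] === "") pattern.lastIndex++;
  }
  return found;
}

const digits = /\d+/g;
console.log(findAll(digits, "a1 b22 c333"));
// [ { match: "1", index: 1 }, { match: "22", index: 4 }, { match: "333", index: 8 } ]

// test reports only whether a match exists.
console.log(/\d+/.test("a1 b22 c333")); // true
```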

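Example
<script language="JavaScript">
// Capture two words separated by a single whitespace character.
var myreg = /(\w+)\s(\w+)/;
var str = "John Smith";
// $1 and $2 refer to the first and second captured words.
var newstr = str.replace(myreg, "$2, $1");
document.write(newstr);
</script>
Output: "Smith, John"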

 
