Regular expression tutorial: character matching

Source: Internet
Author: User

Ordinary characters

Ordinary characters consist of all printing and nonprinting characters that are not explicitly designated as metacharacters. This includes all uppercase and lowercase alphabetic characters, all digits, all punctuation marks, and some symbols.

The simplest regular expression is a single ordinary character that matches itself in the searched string. For example, the single-character pattern 'a' matches the letter 'a' wherever it appears in the searched string. Here are some examples of single-character patterns:

/a/ /7/ /M/

The equivalent VBScript single-character expressions are:

"a" "7" "M"

You can combine several single characters to form a larger expression. For example, the following JScript regular expression is simply the combination of the single-character expressions 'a', '7', and 'M'.

/a7M/

The equivalent VBScript expression is:

"a7M"

Note that there is no concatenation operator. You simply place one character after another.
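
As a minimal sketch, the patterns above can be tried in any current JavaScript engine. The variable names and sample strings are only illustrative and do not come from the original text.

```javascript
// Each ordinary character matches itself; writing characters side by side
// matches the same sequence of characters in the searched string.
const singleA = /a/;
const combined = /a7M/;

console.log(singleA.test("banana"));    // true: 'a' appears in the string
console.log(combined.test("xxa7Mxx"));  // true: the sequence a7M appears
console.log(combined.test("a7m"));      // false: matching is case-sensitive
```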

Special characters

A number of metacharacters require special handling when you want to match them literally. To match one of these special characters, you must escape it, that is, precede it with a backslash (\). The following table lists these special characters and their meanings:

Special character    Description
$      Matches the end position of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before '\n' or '\r'. To match the $ character itself, use \$.
( )    Mark the start and end of a subexpression. The subexpression can be captured for later use. To match these characters, use \( and \).
*      Matches the preceding subexpression zero or more times. To match the * character, use \*.
+      Matches the preceding subexpression one or more times. To match the + character, use \+.
.      Matches any single character except the newline character \n. To match the . character, use \.
[      Marks the beginning of a bracket expression. To match [, use \[.
?      Matches the preceding subexpression zero or one time, or indicates a non-greedy quantifier. To match the ? character, use \?.
\      Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character 'n', '\n' matches a newline character, the sequence '\\' matches '\', and '\(' matches '('.
^      Matches the start position of the input string, unless it is the first character in a bracket expression, where it negates the character set. To match the ^ character itself, use \^.
{      Marks the beginning of a quantifier expression. To match {, use \{.
|      Indicates a choice between two items. To match |, use \|.
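
The following JavaScript sketch shows a few of these escapes at work. The pattern names and sample strings are assumptions made for illustration only.

```javascript
// A backslash in front of a metacharacter makes it match literally.
const price = /\$5\.00/;   // \$ and \. match a literal '$' and '.'
const call = /f\(x\)/;     // \( and \) match literal parentheses
const path = /C:\\temp/;   // \\ matches one literal backslash

console.log(price.test("Total: $5.00"));      // true
console.log(price.test("Total: $5x00"));      // false: \. only matches '.'
console.log(/5.00/.test("5x00"));             // true: unescaped '.' matches 'x'
console.log(call.test("y = f(x)"));           // true
console.log(path.test("C:\\temp\\log.txt"));  // true
```

Note that inside a JavaScript string literal the backslash must itself be doubled, which is why the last sample string is written with "\\".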


Non-printable characters

A number of useful nonprinting characters must occasionally be matched. The following table shows the escape sequences used to represent these nonprinting characters:

Character    Meaning
\cx    Matches the control character indicated by x. For example, \cM matches Control-M, the carriage return character. The value of x must be in the range A-Z or a-z. Otherwise, c is treated as a literal 'c' character.
\f     Matches a form feed character. Equivalent to \x0c and \cL.
\n     Matches a newline (line feed) character. Equivalent to \x0a and \cJ.
\r     Matches a carriage return character. Equivalent to \x0d and \cM.
\s     Matches any white space character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\v].
\S     Matches any non-white-space character. Equivalent to [^ \f\n\r\t\v].
\t     Matches a tab character. Equivalent to \x09 and \cI.
\v     Matches a vertical tab character. Equivalent to \x0b and \cK.
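
A short JavaScript sketch can confirm some of the equivalences listed in the table. The names and sample strings are illustrative assumptions.

```javascript
// Control-letter and hexadecimal escapes name the same characters.
const carriageReturn = /\cM/;   // Control-M
const lineFeed = /\x0a/;        // hexadecimal code of the line feed
const tab = /\cI/;              // Control-I

console.log(carriageReturn.test("\r"));  // true
console.log(lineFeed.test("\n"));        // true
console.log(tab.test("\t"));             // true

// \s matches white space, \S matches anything that is not white space.
const fields = "a\tb\nc".split(/\s/);
console.log(fields);                     // [ 'a', 'b', 'c' ]
console.log(/\S/.test(" \t\n"));         // false: only white space present
```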


Character matching

A period (.) matches any single printing or nonprinting character in a string, except the newline character (\n). The following JScript regular expression matches 'aac', 'abc', 'acc', 'adc', and so on, and also matches 'a1c', 'a2c', 'a-c', and 'a#c':

/a.c/

The equivalent VBScript regular expression is:

"a.c"

If you are trying to match a string that contains a file name, where a period (.) is part of the input string, precede the period in the regular expression with a backslash (\) character. For example, the following JScript regular expression matches 'filename.ext':

/filename\.ext/

For VBScript, an equivalent expression looks like this:

"filename\.ext"

These expressions are still quite limited. They only let you match any single character. In many cases, it is useful to match specific characters from a list. For example, if your input text contains chapter headings numbered Chapter 1, Chapter 2, and so on, you may want to find those chapter headings.
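
The period patterns from this section can be checked with a small JavaScript sketch. The variable names and test strings are assumptions chosen for illustration.

```javascript
// An unescaped period matches any single character except a newline.
const anyMiddle = /a.c/;
console.log(anyMiddle.test("abc"));    // true
console.log(anyMiddle.test("a#c"));    // true
console.log(anyMiddle.test("a\nc"));   // false: '.' does not match '\n'
console.log(anyMiddle.test("ac"));     // false: one character is required

// An escaped period matches only a literal '.'.
const literalDot = /filename\.ext/;
console.log(literalDot.test("filename.ext"));        // true
console.log(literalDot.test("filenameXext"));        // false
console.log(/filename.ext/.test("filenameXext"));    // true: '.' matches 'X'
```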

Bracket expression

You can create a list of characters to match by placing one or more single characters inside square brackets ([ and ]). When characters are enclosed in brackets, the list is called a bracket expression. Inside brackets, as anywhere else, ordinary characters represent themselves, that is, they match an occurrence of themselves in the input text. Most special characters lose their meaning when they appear in a bracket expression. Here are some exceptions:

    • The ']' character ends a list unless it is the first item. To match the ']' character in a list, place it first, immediately after the opening '['. Current ECMAScript (JavaScript) engines treat '[]' as an empty list, so there the character must be escaped as '\]' instead.
    • '\' is still the escape character. To match the '\' character, use '\\'.
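
A minimal JavaScript sketch of these rules follows. It uses the escaped form \] because current JavaScript engines read '[]' as an empty list; the names and sample strings are illustrative.

```javascript
// Inside brackets, ']' and '\' still need escaping.
const bracketOrBackslash = /[\]\\]/;
console.log(bracketOrBackslash.test("a]b"));   // true: contains ']'
console.log(bracketOrBackslash.test("a\\b"));  // true: contains '\'
console.log(bracketOrBackslash.test("a[b"));   // false: '[' is not in the list

// Most other metacharacters are literal inside brackets.
const literalMeta = /[.*+?]/;
console.log(literalMeta.test("abc"));          // false: '.' is not a wildcard here
console.log(literalMeta.test("a*c"));          // true: contains a literal '*'
```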

A bracket expression matches exactly one character, at the position in the regular expression where the bracket expression appears. The following JScript regular expression matches 'Chapter 1', 'Chapter 2', 'Chapter 3', 'Chapter 4', and 'Chapter 5':

/Chapter [12345]/

To match the same chapter headings in VBScript, use the following expression:

"Chapter [12345]"

Note that the word 'Chapter' and the space that follows it are fixed in position relative to the bracketed characters. The bracket expression therefore specifies only the set of characters that may occupy the single character position immediately after the word 'Chapter' and a space. This is the ninth character position.

If you want to express the characters to match as a range instead of listing them individually, use a hyphen to separate the start and end characters of the range. The character value of each character determines its relative order within a range. The following JScript regular expression contains a range expression that is equivalent to the bracketed list shown above.
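
As an illustration, the chapter pattern can be applied to a few hypothetical headings in JavaScript. The heading strings are made up for this sketch.

```javascript
// The bracket expression fills exactly one position after "Chapter ".
const chapterList = /Chapter [12345]/;
const headings = ["Chapter 1", "Chapter 3", "Chapter 7", "Chapter12"];

const matched = headings.filter((heading) => chapterList.test(heading));
console.log(matched);  // [ 'Chapter 1', 'Chapter 3' ]

// The digit sits at zero-based index 8, the ninth character position.
const digitIndex = "Chapter 3".search(/[12345]/);
console.log(digitIndex);  // 8
```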

/Chapter [1-5]/

An expression with the same functionality in VBScript is shown below:

"Chapter [1-5]"

When you specify a range in this manner, both the start and end values are included in the range. Note that the start value must precede the end value in Unicode sort order.
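
The sketch below, written for a current JavaScript engine, compares the list and range forms and shows what happens when the range is written backwards. The sample strings and variable names are assumptions.

```javascript
// A range includes both of its end points and matches like the full list.
const byList = /Chapter [12345]/;
const byRange = /Chapter [1-5]/;
const samples = ["Chapter 0", "Chapter 1", "Chapter 5", "Chapter 6"];

const sameResults = samples.every(
  (sample) => byList.test(sample) === byRange.test(sample)
);
console.log(sameResults);  // true

// A range whose start value follows its end value is rejected.
let reversedRangeError = null;
try {
  new RegExp("[5-1]");
} catch (error) {
  reversedRangeError = error;
}
console.log(reversedRangeError instanceof SyntaxError);  // true
```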

If you want to include the hyphen character in a bracket expression, you must use one of the following methods:

    • Use a backslash to escape it:
      [\-]
    • Place the hyphen at the beginning or the end of the bracketed list. The following expressions match all lowercase letters and the hyphen:
      [-a-z] [a-z-]
    • Create a range in which the start character's value is less than that of the hyphen and the end character's value is equal to or greater than that of the hyphen. Both of the following regular expressions meet this requirement:
      [!--] [!-~]
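
Each of the three methods can be tried with a short JavaScript sketch. The pattern names are illustrative only.

```javascript
// Three ways to put a literal hyphen into a bracket expression.
const escaped = /[\-]/;     // escaped hyphen
const leading = /[-a-z]/;   // hyphen as the first item
const trailing = /[a-z-]/;  // hyphen as the last item
const inRange = /[!--]/;    // range from '!' (0x21) to '-' (0x2D)

const allMatchHyphen = [escaped, leading, trailing, inRange].every(
  (pattern) => pattern.test("-")
);
console.log(allMatchHyphen);     // true

console.log(leading.test("A"));  // false: only lowercase letters and '-'
console.log(inRange.test("+"));  // true: '+' (0x2B) lies inside the range
console.log(inRange.test("a"));  // false: 'a' (0x61) lies outside the range
```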

Similarly, by placing a caret (^) at the beginning of the list, you can match any character that is not in the list or range. If the caret appears anywhere else in the list, it matches itself and has no special meaning. The following JScript regular expression matches chapter headings whose number is not 1 through 5, for example those numbered 6 through 9:

/Chapter [^12345]/

For VBScript use:

"Chapter [^12345]"

In the example shown above, the expression matches any character except 1, 2, 3, 4, or 5 in the ninth position, which for a chapter number means any digit other than 1 through 5. So 'Chapter 7' is a match, as is 'Chapter 9'.
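
The negated list can be exercised with a few made-up headings in JavaScript. This is only a sketch; the titles and names are assumptions.

```javascript
// A caret at the start of the list negates it.
const negatedList = /Chapter [^12345]/;
const titles = ["Chapter 3", "Chapter 7", "Chapter 9", "Chapter x"];

const notOneToFive = titles.filter((title) => negatedList.test(title));
console.log(notOneToFive);  // [ 'Chapter 7', 'Chapter 9', 'Chapter x' ]

// A caret anywhere else in the list is an ordinary character.
const caretInside = /[1^]/;
console.log(caretInside.test("^"));  // true
console.log(caretInside.test("2"));  // false
```

The 'Chapter x' result shows that a negated list accepts any character outside the list, not only digits.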

The same expression can be written with a range, using a hyphen (-). For JScript:

/Chapter [^1-5]/

Or, for VBScript:

"Chapter [^1-5]"

A typical use of bracket expressions is to match any uppercase or lowercase alphabetic character or any digit. The following JScript expression specifies such a match:

/[A-Za-z0-9]/
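
As a brief JavaScript sketch, this pattern can be used to test for, or collect, alphanumeric characters. The sample strings are hypothetical.

```javascript
// Three ranges in one bracket expression: uppercase, lowercase, digits.
const alphanumeric = /[A-Za-z0-9]/;
console.log(alphanumeric.test("---x---"));  // true: contains 'x'
console.log(alphanumeric.test("!?-"));      // false: no letter or digit

// With the global flag, match() returns every alphanumeric character.
const found = "a1-B2_c3".match(/[A-Za-z0-9]/g);
console.log(found);  // [ 'a', '1', 'B', '2', 'c', '3' ]
```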

The equivalent VBScript expression is:

"[A-Za-z0-9]"
