Regular expression tutorial: using metacharacters

Source: Internet
Author: User
Tags: character classes, control characters, printable characters, alphanumeric characters

This article describes how to use metacharacters in regular expressions and is shared here for reference. The details are as follows:

Note: In all examples, the text matched by the regular expression is enclosed between [ and ] in the source text. Some examples are implemented in Java; Java-specific usage is described in the relevant section. All Java examples were tested under JDK 1.6.0_13.

1. Escape special characters

Metacharacters are characters that have a special meaning in regular expressions, so they cannot be used to represent themselves. To match a metacharacter literally, put a backslash before it; the resulting escape sequence matches the character itself rather than invoking its special meaning. For example, to match [ and ] you must escape them:

\[ and \]

Escaping a metacharacter requires a backslash (\), which means that \ is itself a metacharacter. To match a literal \ character, escape it as \\. A typical use is matching a Windows file path.
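The two escaping rules above can be sketched in Java as follows. This is only an illustration: the class name EscapeDemo, the helper firstMatch, and the sample strings are assumptions made for this example. Note that a Java string literal needs its own layer of escaping, so the regular expression \[ is written "\\[" and the regular expression \\ is written "\\\\".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeDemo {
    // Returns the first substring of text matched by regex, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Regex \[0\] matches the literal text [0]; each backslash is doubled in Java source.
        System.out.println(firstMatch("\\[0\\]", "array[0] = 1"));
        // Regex C:\\temp matches C:\temp; the regex \\ is written "\\\\" in Java source.
        System.out.println(firstMatch("C:\\\\temp", "path=C:\\temp\\a.txt"));
    }
}
```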

2. Match blank characters

Metacharacters fall roughly into two groups: those used to match text (such as .) and those required by regular expression syntax (such as [ and ]).

When searching with regular expressions, we often need to match non-printing whitespace characters in the source text. For example, we may need to find all the tabs, or all the line breaks. Such characters are difficult to type directly into a regular expression, so we use the special metacharacters listed below instead:

\b    Backspace (in many flavors only inside a character class, written [\b]; elsewhere \b is a word boundary)
\f    Form feed
\n    Line feed
\r    Carriage return
\t    Tab
\v    Vertical tab

Let's look at an example that removes the blank lines from a file:

Text:

8 5 4 1 6 3 2 7 9
7 6 2 9 5 8 3 4 1
9 3 1 4 2 7 8 5 6

6 9 3 8 7 5 1 2 4
5 1 8 3 4 2 6 9 7
2 4 7 6 1 9 5 3 8

3 2 6 7 8 4 9 1 5
4 8 9 5 3 1 7 6 2
1 7 5 2 9 6 4 8 3

Regular expression: \r\n\r\n

Analysis: \r\n matches a carriage return followed by a line feed, the combination Windows uses to terminate a line of text. Searching with the regular expression \r\n\r\n therefore matches two consecutive line terminators, which is exactly a blank line.

Note: Unix and Linux end a line of text with a single line feed. In other words, on Unix or Linux only \n is needed to match blank lines, and \r is not required. A regular expression that works on both Windows and Unix/Linux should contain an optional \r and a mandatory \n, that is, \r?\n\r?\n, which will be discussed in later articles.

The Java code is as follows:

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;

    public static void matchBlankLine() throws Exception {
        BufferedReader br = new BufferedReader(new FileReader(new File("E:/ .txt")));
        StringBuilder sb = new StringBuilder();
        char[] cbuf = new char[1024];
        int len = 0;
        // Read the whole file; read() returns -1 at the end of the stream.
        while ((len = br.read(cbuf)) > 0) {
            sb.append(cbuf, 0, len);
        }
        br.close();
        // Two consecutive CRLF line terminators mark a blank line.
        String reg = "\r\n\r\n";
        System.out.println("original content:\n" + sb.toString());
        System.out.println("after processing: -----------------------------");
        // Replace each pair of terminators with a single one to drop the blank line.
        System.out.println(sb.toString().replaceAll(reg, "\r\n"));
    }

The running result is as follows:

original content:
8 5 4 1 6 3 2 7 9
7 6 2 9 5 8 3 4 1
9 3 1 4 2 7 8 5 6

6 9 3 8 7 5 1 2 4
5 1 8 3 4 2 6 9 7
2 4 7 6 1 9 5 3 8

3 2 6 7 8 4 9 1 5
4 8 9 5 3 1 7 6 2
1 7 5 2 9 6 4 8 3
after processing: -----------------------------
8 5 4 1 6 3 2 7 9
7 6 2 9 5 8 3 4 1
9 3 1 4 2 7 8 5 6
6 9 3 8 7 5 1 2 4
5 1 8 3 4 2 6 9 7
2 4 7 6 1 9 5 3 8
3 2 6 7 8 4 9 1 5
4 8 9 5 3 1 7 6 2
1 7 5 2 9 6 4 8 3
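The cross-platform pattern mentioned in the note above, with an optional \r before each \n, might be used as in the following sketch. The class name BlankLineDemo and the method removeBlankLines are hypothetical, and capturing the first terminator so it can be put back unchanged is a choice made for this example, not something the original code does.

```java
public class BlankLineDemo {
    // Removes a blank line whether the text uses \r\n (Windows) or \n (Unix/Linux).
    static String removeBlankLines(String text) {
        // (\r?\n) captures the first line terminator; the second \r?\n is the blank line.
        // "$1" puts the captured terminator back, so the original line ending is kept.
        return text.replaceAll("(\\r?\\n)\\r?\\n", "$1");
    }

    public static void main(String[] args) {
        System.out.println(removeBlankLines("1 2 3\r\n\r\n4 5 6"));
        System.out.println(removeBlankLines("1 2 3\n\n4 5 6"));
    }
}
```

Like the original example, this removes one blank line per pair of terminators; a run of several blank lines in a row would need a quantifier, which this article has not introduced yet.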

3. Match a specific character class

Character sets (matching one of several characters) are the most common form of matching, and some frequently used sets can be replaced by special metacharacters, each of which matches a whole class of characters. These class metacharacters are not essential: you can always enumerate the characters one by one or define a character range instead. However, regular expressions built with them are shorter and easier to read, so they are used often in practice.

1. Match digits and non-digits

\d    Any digit, equivalent to [0-9] or [0123456789]
\D    Any non-digit, equivalent to [^0-9] or [^0123456789]
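As a small illustration of the two digit classes, the sketch below masks every digit and, separately, strips everything that is not a digit. The class name DigitDemo, the method names, and the sample strings are assumptions made for this example.

```java
public class DigitDemo {
    // Replaces every digit (\d) with '#'.
    static String maskDigits(String text) {
        return text.replaceAll("\\d", "#");
    }

    // Removes every non-digit (\D), keeping only the digits.
    static String keepDigits(String text) {
        return text.replaceAll("\\D", "");
    }

    public static void main(String[] args) {
        System.out.println(maskDigits("Room 42, floor 3"));
        System.out.println(keepDigits("+1 (555) 010-9999"));
    }
}
```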

2. Match alphanumeric and non-alphanumeric characters

Letters (A-Z, in either case), digits, and the underscore form a commonly used character set, for which the following metacharacters are available:

\w    Any letter (upper or lower case), digit, or underscore, equivalent to [0-9a-zA-Z_]
\W    Any character that is not a letter, digit, or underscore, equivalent to [^0-9a-zA-Z_]
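The following sketch shows \w and \W side by side. The class name WordCharDemo, its method names, and the sample strings are hypothetical and chosen only for illustration.

```java
public class WordCharDemo {
    // Replaces every character outside [0-9a-zA-Z_] (\W) with an underscore.
    static String sanitize(String text) {
        return text.replaceAll("\\W", "_");
    }

    // Removes every letter, digit, and underscore (\w), leaving the rest.
    static String stripWordChars(String text) {
        return text.replaceAll("\\w", "");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("my file-name.txt"));
        System.out.println(stripWordChars("a_b-c 1!"));
    }
}
```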

3. Match blank and non-blank characters

\s    Any whitespace character, equivalent to [ \f\n\r\t\v] (note the leading space)
\S    Any non-whitespace character, equivalent to [^ \f\n\r\t\v]

Note: The backspace metacharacter [\b] is not included in \s.
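A short Java sketch of \s, including a check that the backspace character is not covered by it. The class name WhitespaceDemo and both helper methods are assumptions for this example; in a Java string literal, "\b" is the backspace character itself (U+0008).

```java
public class WhitespaceDemo {
    // Removes every whitespace character (\s) from the text.
    static String removeWhitespace(String text) {
        return text.replaceAll("\\s", "");
    }

    // True if the whole string is exactly one whitespace character.
    static boolean isWhitespace(String oneChar) {
        return oneChar.matches("\\s");
    }

    public static void main(String[] args) {
        System.out.println(removeWhitespace("a \t b\nc"));
        System.out.println(isWhitespace("\t"));
        // "\b" is the backspace character; \s does not match it.
        System.out.println(isWhitespace("\b"));
    }
}
```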

4. Match hexadecimal or octal values

Hexadecimal: given by the prefix \x. For example, \x0A corresponds to ASCII character 10 (line feed), and its effect is equivalent to \n.
Octal: given by the prefix \0, followed by a value of two or three digits. For example, \011 corresponds to ASCII character 9 (tab), and its effect is equivalent to \t.
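Both notations are accepted by Java's java.util.regex, so the two equivalences above can be checked directly. This is a minimal sketch; the class name CodeValueDemo and its method names are hypothetical.

```java
public class CodeValueDemo {
    // True if s is exactly the character with hexadecimal code 0A (line feed).
    static boolean isLineFeed(String s) {
        return s.matches("\\x0A");
    }

    // True if s is exactly the character with octal code 011 (tab).
    static boolean isTab(String s) {
        return s.matches("\\011");
    }

    public static void main(String[] args) {
        System.out.println(isLineFeed("\n"));
        System.out.println(isTab("\t"));
        System.out.println(isTab(" "));
    }
}
```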

4. Use POSIX character classes

POSIX character classes are a shorthand supported by many regular expression implementations. Java supports them (with a different syntax, described below), but JavaScript does not. The POSIX character classes are as follows:

[:alnum:]     Any letter or digit, equivalent to [a-zA-Z0-9]
[:alpha:]     Any letter, equivalent to [a-zA-Z]
[:blank:]     Space or tab, equivalent to [ \t]
[:cntrl:]     ASCII control characters (ASCII 0 to 31, plus ASCII 127)
[:digit:]     Any digit, equivalent to [0-9]
[:graph:]     Any printable character, excluding space
[:lower:]     Any lowercase letter, equivalent to [a-z]
[:print:]     Any printable character
[:punct:]     Any punctuation character: a [:graph:] character that is not in [:alnum:]
[:space:]     Any whitespace character, including space, equivalent to [ \f\n\r\t\v]
[:upper:]     Any uppercase letter, equivalent to [A-Z]
[:xdigit:]    Any hexadecimal digit, equivalent to [a-fA-F0-9]

POSIX character classes are written differently from the metacharacters we have seen so far. Here is an example that uses a regular expression to match a color value in a web page:

Text: <span style="background-color:#3636FF; height:30px; width:60px;">test</span>

Regular expression: #[[:xdigit:]][[:xdigit:]][[:xdigit:]][[:xdigit:]][[:xdigit:]][[:xdigit:]]

Result: <span style="background-color:[#3636FF]; height:30px; width:60px;">test</span>

Note: The pattern used here starts with [[ and ends with ]]. This is required when using POSIX character classes. A POSIX class name must be enclosed between [: and :]. The outer [ and ] define a character set, while the inner [ and ] are part of the POSIX character class itself.

POSIX character classes are written differently in Java: instead of being enclosed between [: and :], they start with \p and the class name goes between { and }. The names are case sensitive, and Java adds \p{ASCII}. The classes are as follows:

\p{Alnum}     An alphanumeric character: [\p{Alpha}\p{Digit}]
\p{Alpha}     An alphabetic character: [\p{Lower}\p{Upper}]
\p{ASCII}     All ASCII: [\x00-\x7F]
\p{Blank}     A space or a tab: [ \t]
\p{Cntrl}     A control character: [\x00-\x1F\x7F]
\p{Digit}     A decimal digit: [0-9]
\p{Graph}     A visible character: [\p{Alnum}\p{Punct}]
\p{Lower}     A lowercase letter: [a-z]
\p{Print}     A printable character: [\p{Graph}\x20]
\p{Punct}     Punctuation: one of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Space}     A whitespace character: [ \t\n\x0B\f\r]
\p{Upper}     An uppercase letter: [A-Z]
\p{XDigit}    A hexadecimal digit: [0-9a-fA-F]
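The color-matching example from earlier in this section could be written with Java's syntax roughly as follows. This is a sketch, not the author's code: the class name ColorDemo and the method findColor are assumptions, and \p{XDigit} is simply repeated six times to mirror the POSIX pattern shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ColorDemo {
    // '#' followed by exactly six hexadecimal digits, written out one class at a time.
    private static final Pattern COLOR = Pattern.compile(
            "#\\p{XDigit}\\p{XDigit}\\p{XDigit}\\p{XDigit}\\p{XDigit}\\p{XDigit}");

    // Returns the first color value found in the text, or null if there is none.
    static String findColor(String text) {
        Matcher m = COLOR.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<span style=\"background-color:#3636FF; height:30px; width:60px;\">test</span>";
        System.out.println(findColor(html));
    }
}
```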

I hope this article will help you learn regular expressions.
