Regular Expression (2) -- metacharacters

Source: Internet
Author: User

The characters in a regular expression are divided into metacharacters and ordinary characters. An ordinary character matches itself: writing 'a' in a regular expression matches the character 'a' in the text. Metacharacters are the most basic and core content of regular expressions, and almost every practical regular expression relies on them. This section describes character groups, multiple-choice structures, the dot, and predefined character sets.

What are metacharacters?

Metacharacters are characters that have special meanings in regular expressions. A metacharacter (a special character or combination of characters) can stand for one or more characters. The following figure shows the basic metacharacters.

The figure above contains the basic metacharacters, including quantifiers, boundary matching, character matching, logic, grouping, lookaround, and some special structures.

Metacharacters-character groups

There are two kinds of character groups: normal character groups and excluded character groups.

1. Normal character group

"[...]": a construct of this form is called a normal character group. It matches any single character listed inside the [...] structure.

Example: [abc]

The 'a' in the group can match 'a', the 'b' can match 'b', and the 'c' can match 'c', so [abc] matches any one of the three characters 'a', 'b', and 'c'.

From a programming point of view, this is an 'or' condition, equivalent to the regular expression (a|b|c): it matches 'a', or 'b', or otherwise 'c'.
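The 'or' reading of a character group can be checked with a small sketch. It assumes Python's standard re module and lowercase letters in the group; the sample string is made up for illustration.

```python
import re

# A character group and the equivalent alternation between single characters.
group = re.compile(r"[abc]")
alternation = re.compile(r"(a|b|c)")

# Hypothetical sample text, chosen only for illustration.
sample = "a1b2c3d4"

# With one capturing group, findall returns the text captured by that group,
# so both calls yield a list of single characters.
group_hits = group.findall(sample)
alt_hits = alternation.findall(sample)

print(group_hits)
print(alt_hits)
```

Both patterns pick out the same characters here; the character group is usually the shorter way to write a choice between single characters.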

Let's take a simple example.

Regular Expression: 'gr[ae]y'

First match 'g', then 'r', then an 'a' or an 'e', and then a 'y'. We can see from this process that the text this regular expression can match is 'gray' or 'grey'.
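The walk-through above can be tried out as follows. This is a minimal sketch assuming Python's standard re module and the lowercase pattern gr[ae]y; the word list is hypothetical.

```python
import re

# The third character may be 'a' or 'e'; every other position is fixed.
pattern = re.compile(r"gr[ae]y")

# Hypothetical candidate words, chosen only for illustration.
words = ["gray", "grey", "groy", "gry"]

# fullmatch succeeds only when the whole word fits the pattern.
matched = [w for w in words if pattern.fullmatch(w)]
print(matched)
```

'groy' fails because 'o' is not in the group, and 'gry' fails because the group must consume exactly one character.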

We can also use the hyphen '-' inside the [...] structure to represent a range, so that the group matches any one character in that range. For example:

Regular Expression: [a-d0-9]

This regular expression can match any one of the characters 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, and d. A character group that contains two or more ranges is called 'multi-range'.
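A short sketch of a multi-range group, assuming Python's standard re module and lowercase letters in the range; the sample string is invented for illustration.

```python
import re

# Two ranges in one group: the letters a through d and the digits 0 through 9.
ranges = re.compile(r"[a-d0-9]")

# Hypothetical sample text. The '-' in the text is not matched, because inside
# the group the hyphen only describes the ranges.
sample = "a-e7z0d"

hits = ranges.findall(sample)
print(hits)
```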

The hyphen '-' is a metacharacter only inside the [...] structure; anywhere else it simply matches an ordinary hyphen. Even inside the [...] structure, a '-' that comes first in the group is an ordinary character, for example:

Regular Expression: [-!.?abc]

In this case, '-' stands for itself. This regular expression can match any one of '-', '!', '.', '?', 'a', 'b', and 'c'.
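The two roles of the hyphen can be compared in a small sketch. It assumes Python's standard re module; the sample strings are hypothetical.

```python
import re

# A leading '-' inside the group is an ordinary character, not a range marker.
leading_hyphen = re.compile(r"[-!.?abc]")
group_hits = leading_hyphen.findall("a-b!x.y?c")
print(group_hits)

# Outside a character group, '-' only matches a literal hyphen.
outside_hits = re.findall(r"a-c", "a-c abc")
print(outside_hits)
```

In the first pattern 'x' and 'y' are skipped because they are not listed; in the second, 'abc' is not matched because 'a-c' outside brackets is not a range.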

2. Excluded character group

"[^...]": a construct of this form is called an excluded character group. It matches any single character that is not listed inside the [^...] structure.

Example: [^abc]

The above regular expression can match 'd', 'e', any other letter from d to z, any letter from A to Z, any digit, or '?', but it cannot match any of the three characters 'a', 'b', and 'c'.
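A minimal sketch of an excluded character group, assuming Python's standard re module and lowercase letters in the group; the sample strings are chosen only for illustration.

```python
import re

# Matches any single character that is NOT 'a', 'b', or 'c'.
excluded = re.compile(r"[^abc]")

hits = excluded.findall("abcdZ9?")
print(hits)

# An excluded group still has to consume one character. In "Iraq" nothing
# follows the 'q', so "q followed by a non-u character" finds no match.
no_match = re.search(r"q[^u]", "Iraq")
print(no_match)
```

The second check shows that "not one of these characters" does not mean "no character at all".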

Metacharacters-the dot matches any character

'.' can match any character except the line feed \n, as long as the dot is not inside a character group (inside a character group the dot simply stands for itself). If you want to match the dot itself outside a character group, escape it with the escape character '\', that is, '\.'.
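The behaviour of the dot can be sketched as follows, assuming Python's standard re module. The sample string is hypothetical, and re.DOTALL is a Python-specific flag that is not discussed in the text above.

```python
import re

# Hypothetical sample: three candidates separated by spaces, one with a newline.
sample = "abc a\nc a.c"

# Unescaped dot: any character except the line feed.
any_char = re.findall(r"a.c", sample)

# Escaped dot, or a dot inside a character group: a literal '.' only.
escaped = re.findall(r"a\.c", sample)
in_group = re.findall(r"a[.]c", sample)

# Assumption beyond the text: Python's re.DOTALL lets '.' match '\n' as well.
with_dotall = re.findall(r"a.c", sample, re.DOTALL)

print(any_char, escaped, in_group, with_dotall)
```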

Metacharacters-multiple-choice Structure

The multiple-choice structure is similar to the character group: it also means "or", and one of the alternatives is selected. The biggest difference is that a character group can only choose between single characters, while the multiple-choice structure chooses between whole expressions. For example:

Regular Expression: '(gray|grey)'

The above regular expression matches either gray or grey. The matching result is equivalent to 'gr[ae]y'. Another example is as follows:

'Jeff(rey|ery)'

Match 'J' first, then 'e', then 'f', then 'f', and then 'rey' or 'ery'. This is the process of executing the multiple-choice structure.
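Both multiple-choice examples can be tried in a short sketch. It assumes Python's standard re module and the patterns (gray|grey) and Jeff(rey|ery) written without spaces; the candidate strings are invented for illustration.

```python
import re

# Alternation between whole sub-expressions after a fixed prefix.
name = re.compile(r"Jeff(rey|ery)")
candidates = ["Jeffrey", "Jeffery", "Jeffry"]
matched_names = [c for c in candidates if name.fullmatch(c)]
print(matched_names)

# (gray|grey) and gr[ae]y find the same words in the same text.
text = "gray grey griy"
by_alternation = [m.group() for m in re.finditer(r"(gray|grey)", text)]
by_group = re.findall(r"gr[ae]y", text)
print(by_alternation, by_group)
```

m.group() is used for the alternation so that the whole match is compared, not only the text captured by the parentheses.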

Metacharacters-predefined character sets

Predefined character sets use the '\letter' format to represent a whole class of characters. \d stands for the digits 0 to 9, \s stands for whitespace characters, and \w represents 0 to 9, a to z, A to Z, and the underscore. Their uppercase forms (\D, \S, and \W) denote the complements: every character that the lowercase form does not match.
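A minimal sketch of the predefined character sets, assuming Python's standard re module. The re.ASCII flag is an assumption added here so that \d and \w are limited to the ASCII characters described above (without it, Python also matches other Unicode digits and letters); the sample string is hypothetical.

```python
import re

# Hypothetical sample containing letters, an underscore, digits, a space, a tab.
sample = "ab_1 2\tC!"

digits = re.findall(r"\d", sample, re.ASCII)      # digits only
spaces = re.findall(r"\s", sample, re.ASCII)      # whitespace only
word_chars = re.findall(r"\w", sample, re.ASCII)  # letters, digits, underscore

# The uppercase form matches exactly what the lowercase form does not.
non_digits = re.findall(r"\D", sample, re.ASCII)

print(digits, spaces, word_chars, non_digits)
```

Every character of the sample lands in either the \d list or the \D list, never in both.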

Metacharacters also include quantifiers, position-matching characters, parentheses, and backreferences. These will be explained separately later.

Reference: Mastering Regular Expressions, Third Edition

Python Regular Expression tutorial
