Regular Expression Basics (Reading Notes)

Source: Internet
Author: User

A regular expression (regex) is a tool for describing and matching patterns in text.

Two basic functions of a regular expression: search and replace.

 

The . character (period) matches any single character: a letter, a digit, or even the . character itself.

\ is the escape character. It is a metacharacter, that is, a character with a special meaning rather than a literal one. To match a metacharacter literally, precede it with \ (for example, \. matches an actual period).

 

(Conclusion: . matches any single character; \ escapes a character.)
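The difference between . and \. can be sketched with Python's built-in re module. The file names in the sample text are made up for illustration and do not come from the notes.

```python
import re

# Hypothetical sample text: three real file names and one look-alike.
text = "sales1.xls, sales2.xls, salesX.xls, sales1Axls"

# An unescaped . matches ANY single character, so "sales1Axls" matches too.
unescaped = re.findall(r"sales..xls", text)

# \. matches only a literal period, so the look-alike is rejected.
escaped = re.findall(r"sales.\.xls", text)

print(unescaped)  # four matches, including 'sales1Axls'
print(escaped)    # three matches, all with a real period
```

Raw strings (the r prefix) keep Python from interpreting the backslash before the regex engine sees it.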

 

[ and ] do not match any characters themselves. They only delimit a character set.

 

- (hyphen) is a metacharacter used to define a character range. It acts as a metacharacter only between [ and ]; outside a character set it is an ordinary character.

 

Valid character range:

A-Z matches all uppercase letters from A to Z;

a-z matches all lowercase letters from a to z;

A-z matches every character from ASCII A to ASCII z (rarely used, because the range also includes [, \, ], ^, _ and `).

 

^ (caret) is the negation character. It is also a metacharacter: placed immediately after the opening [, it negates the character set, so the set matches any character not listed in it.
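Character sets, ranges, and negation can be tried out together. The following is a minimal sketch using Python's re module; the sample strings are invented for the example.

```python
import re

# Hypothetical file names used only for illustration.
text = "na1.xls sa1.xls ca1.xls nb2.xls sam.xls"

# [ns] matches exactly one character: either n or s.
in_set = re.findall(r"[ns]a.\.xls", text)

# [0-9] is a range: any single digit.
with_digit = re.findall(r"[ns]a[0-9]\.xls", text)

# [^0-9] negates the set: any single character that is NOT a digit.
without_digit = re.findall(r"[ns]a[^0-9]\.xls", text)

# A-z spans the ASCII codes between Z and a as well, so it also
# accepts punctuation such as [ and _ (one reason it is rarely used).
a_to_z = re.findall(r"[A-z]", "a_Z[9")

# Outside [ and ], the hyphen is an ordinary character.
literal_hyphen = re.findall(r"a-z", "a-z abc")

print(in_set, with_digit, without_digit, a_to_z, literal_hyphen)
```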

 

Metacharacters fall roughly into two types: those used to match text (for example, .) and those required by regular expression syntax (for example, [ and ]).

 

// 2015.02.17

Whitespace metacharacters:

[\b]

Backspace (moves back and deletes one character)

\f

Form feed

\n

Line feed (newline)

\r

Carriage return

\t

Tab (Tab key)

\v

Vertical tab
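As a rough illustration of the whitespace metacharacters above, the sketch below uses Python's re module on an invented record with Windows-style line endings. Python accepts [\b] as a backspace inside a character set; other engines may differ.

```python
import re

# Hypothetical tab-separated text with \r\n line endings and one blank line.
record = "id\tname\r\n1\tAda\r\n\r\n2\tBob\r\n"

# \r\n\r\n is an end of line followed directly by another: a blank line.
blocks = re.split(r"\r\n\r\n", record)

# \t matches each tab character.
tab_count = len(re.findall(r"\t", record))

# [\b] matches a backspace character (\x08) so it can be stripped out.
cleaned = re.sub(r"[\b]", "", "ab\x08c")

print(blocks, tab_count, cleaned)
```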

 

Numeric metacharacters:

\d

Any digit (equivalent to [0-9])

\D

Any non-digit character (equivalent to [^0-9])

 

Alphanumeric metacharacters:

\w

Any letter (uppercase or lowercase), digit, or underscore (equivalent to [a-zA-Z0-9_])

\W

Any character that is not a letter, digit, or underscore (equivalent to [^a-zA-Z0-9_])

 

Whitespace class metacharacters:

\s

Any whitespace character (equivalent to [\f\n\r\t\v]; most engines also include the space character)

\S

Any non-whitespace character (equivalent to [^\f\n\r\t\v])
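The class metacharacters can be checked against a short string. This sketch assumes Python's re module, where \d, \w and \s are Unicode-aware by default; passing re.ASCII restricts them to the ASCII sets listed above (with the space character included in \s). The sample text is invented.

```python
import re

text = "Order_42 shipped\tto zone 7B"

# \d: single digits.
digits = re.findall(r"\d", text, flags=re.ASCII)

# \w+: runs of letters, digits and underscores.
words = re.findall(r"\w+", text, flags=re.ASCII)

# \W: every character that is not a letter, digit or underscore.
non_word = re.findall(r"\W", text, flags=re.ASCII)

# \s+: runs of whitespace (space and tab here) used as separators.
fields = re.split(r"\s+", text, flags=re.ASCII)

# \S+: runs of non-whitespace characters.
non_space = re.findall(r"\S+", text, flags=re.ASCII)

print(digits, words, non_word, fields, non_space)
```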

 

+ matches one or more occurrences of the preceding character or character set (at least one; it never matches zero occurrences).

 

* matches zero or more occurrences of the preceding character or character set.

 

? matches zero or one occurrence of the preceding character or character set.

 

{n} sets an exact number of repetitions (for example, {3} means that the preceding character or character set must appear exactly three times in a row).

 

{n,m} sets a range for the number of repetitions (for example, {2,4} means that the preceding character or character set must appear at least twice and at most four times in a row; {3,} means that it must appear at least three times).
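The quantifiers above can be compared side by side by applying each one to the letter o in the same set of strings. This is a minimal sketch in Python; the helper name matching and the sample strings are hypothetical.

```python
import re

# Strings with zero, one, two and three o's between "g" and "gle".
samples = ["ggle", "gogle", "google", "gooogle"]

def matching(pattern):
    """Return the samples that the pattern matches in full (illustrative helper)."""
    return [s for s in samples if re.fullmatch(pattern, s)]

one_or_more = matching(r"go+gle")      # + : at least one o
zero_or_more = matching(r"go*gle")     # * : any number of o's, including none
zero_or_one = matching(r"go?gle")      # ? : no o or exactly one o
exactly_two = matching(r"go{2}gle")    # {2} : exactly two o's
two_to_three = matching(r"go{2,3}gle") # {2,3} : two or three o's
three_plus = matching(r"go{3,}gle")    # {3,} : at least three o's

print(one_or_more, zero_or_more, zero_or_one)
print(exactly_two, two_to_three, three_plus)
```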

 

Greedy metacharacters and their lazy versions:

* (greedy) and *? (lazy)

+ (greedy) and +? (lazy)

{n,} (greedy) and {n,}? (lazy)

 

(Conclusion: the real power of regular expressions shows in repeated matching. + matches one or more occurrences of a character or character set, * matches zero or more occurrences, and ? matches zero or one occurrence. For finer control, use the {} syntax to set an exact repeat count or a minimum and maximum. These metacharacters come in two kinds, "greedy" and "lazy". To prevent over-matching, build the regular expression with the lazy versions.)
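Over-matching is easiest to see on text that contains the same pair of delimiters twice. The sketch below uses Python's re module; the sample sentence and its tags are assumptions chosen for the example.

```python
import re

# Hypothetical text with two separate <B>...</B> pairs.
html = "This offer is <B>not</B> valid in <B>Ohio</B>."

# Greedy .* grabs as much as possible: from the first <B> to the LAST </B>.
greedy = re.findall(r"<B>.*</B>", html)

# Lazy .*? grabs as little as possible: each pair is matched separately.
lazy = re.findall(r"<B>.*?</B>", html)

print(greedy)  # one long match spanning both pairs
print(lazy)    # two short matches
```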

 

\b matches a word boundary (the start or end of a word).

 

\B matches any position that is not a word boundary.

 

^ matches the start of a string, and $ matches the end of a string.

 

(?m) enables multiline mode. (?m) must appear at the very start of the pattern.

 

(Conclusion: regular expressions can match not only text blocks of any length but also text at a specific position in a string. \b specifies a word boundary (\B is the opposite). ^ and $ specify string boundaries (the start and the end of a string). When combined with (?m), ^ and $ also match at the start and end of each line, that is, immediately after and before a line break (the line break is then treated as a string separator).)
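Position matching can be sketched as follows with Python's re module, which accepts (?m) at the start of a pattern. The sample sentence and the three-line text are invented for illustration.

```python
import re

sentence = "The cat scattered his food. cat"

# Without boundaries, "cat" is also found inside "scattered".
anywhere = re.findall(r"cat", sentence)

# \b on both sides keeps only the whole word.
whole_word = re.findall(r"\bcat\b", sentence)

# \B on both sides keeps only the occurrence buried inside another word.
inside_word = re.findall(r"\Bcat\B", sentence)

# Hypothetical three-line text: two comment lines and one other line.
lines = "# header\ncode line\n# trailing comment"

# By default ^ and $ anchor to the whole string, so nothing matches here.
single_line = re.findall(r"^#.*$", lines)

# With (?m), ^ and $ also match at the start and end of every line.
multi_line = re.findall(r"(?m)^#.*$", lines)

print(anywhere, whole_word, inside_word)
print(single_line, multi_line)
```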
